Introduction

The coronavirus disease of 2019 (COVID-19) is a severe acute respiratory syndrome that officially appeared for the first time in Wuhan, a city in the Hubei province of China, in December 2019. From the end of February 2020, the virus was rapidly spreading across the globe, dramatically changing every aspect of people’s lives. As of 1 November 2021, the COVID-19 pandemic had affected almost all countries in the world, with about 250 million confirmed cases and more than 5 million deaths1. At the time of writing, the virus has been mutating by generating new forms or variants of itself—the most important of which were first found in the UK, South Africa, Brazil, and India2—making the fight against the outbreak even more difficult. In fact, many countries which are approaching the third or even fourth wave of infections have had to reintroduce or extend their lockdowns and social distancing measures. The worst-hit countries include both advanced and developing ones, such as Brazil, France, India, Italy, Russia, Turkey, the UK, and the US.

In these circumstances, it has become crucial to identify the optimal containment and mitigation policies to prevent and manage the spread of the outbreak and prepare a plan to tackle the risk of future epidemics and pandemics. In the last year, a closer look has been taken at the potential adverse impact of air pollution on the spread dynamics and death toll of COVID-19. In fact, it is widely recognized that several air pollutants, such as benzo[a]pyrene (BaP), nitrogen dioxide (NO2), ozone (O3), particulate matter (PM), and sulfur dioxide (SO2), can cause irritation, inflammation, and serious infections and diseases of the lungs and airways3,4,5. This is a matter of great concern, considering that according to an EEA report [Ref.6, pp. 40, 42], the annual emissions of PM2.5 and PM10 in 2018 exceeded the limits set by the World Health Organization7 Air Quality Guidelines (AQG) at 70% and 53% of the stations spread across European countries, respectively.

In particular, the relationship between air pollution exposure and COVID-19 revealed that poor air quality may have favored COVID-19 transmissibility around the world8,9,10,11,12,13 and may have enhanced the risk of severe and fatal COVID-1914,15,16,17,18,19.

This study may be of interest for two main reasons. First, as of 1 November 2021, Italy is one of the most affected countries worldwide, with 4,796,929 confirmed cases, that is, about 8% of the whole resident population, and 132,263 confirmed deaths. Second, although the literature has already established a positive and significant relationship between air pollution and COVID-19 spread/mortality in Italy9,12,19,20,21,22,23,24,25,26, these studies may have suffered from some limitations: (i) they mainly focused on a number of regions and provinces and referred to the early phase of the outbreak; (ii) in many cases, they focused on the impact of short-term exposure to common air pollutants (NO2, O3, PM, and SO2) on COVID-19 infections and deaths; (iii) they did not consider other potentially dangerous air pollutants, such as polycyclic aromatic hydrocarbons (PAHs) and heavy metals; (iv) they did not consider other important covariates (except for Refs.12,19), such as demographic characteristics, weather conditions, population habits and structure, and industrial centers; and (v) finally, they did not explicitly consider the spatial dependency of COVID-19 infections, that is, the possibility that neighboring territories may have affected each other through the movement of people.

In this study, I try to fill this gap by jointly considering all these aspects. Thus, the goals of this study are the following: (i) I investigate the general air quality in the Italian provinces in the period 2014–2019, trying to assess the main sources of outdoor air pollution and identifying the most polluted territories in the country; and (ii) I use a negative binomial (NB) regression model, an ordinary least squares (OLS) econometric approach, and a spatial autoregressive (SAR) model to assess the relationship between long-term exposure to nine air pollutants in the period 2014–2019 (NO2, O3, PM2.5, PM10, benzene, BaP, arsenic (As), cadmium (Cd), and nickel (Ni)) and COVID-19 spread and related mortality at the second peak of the outbreak. [Note 1: This is an important task because the risk of multiple COVID-19 waves is real27,28,29].

The rest of the paper is organized as follows. “Environmental pollution in the Italian provinces” discusses the air quality in the Italian provinces; “Literature review” discusses the related literature; “Data” presents the data used in the empirical analysis; “Empirical strategy” discusses the empirical strategy; “Results and discussion” presents and discusses the results; “Limitations” discusses the main study limitations; and finally, “Conclusions” provides some concluding considerations.

Environmental pollution in the Italian provinces

In this section, the main sources of nine air pollutants and the general quality of air in the 107 Italian provinces are investigated. According to the European Environment Agency30, industrial processes, road transport, agricultural activities, waste management, energy production and distribution (especially from fossil sources), natural phenomena (e.g., volcanic eruptions and sandstorms), public buildings, and households are the main causes of outdoor air pollution. For instance, exhaust emissions from vehicles and the abrasion of tires and brakes can release benzene, Cd, carbon monoxide (CO), lead (Pb), mercury (Hg), NO2, PM2.5, PM10, and sulfur oxides (SOx) into the atmosphere31,32 and favor chemical reactions that increase the likelihood of O3 formation. Business activities, livestock buildings, and households are the major factors responsible for the production of PM2.533. Industrial activities burning fuels (coal, petroleum, wood, etc.), components of cigarette smoke, forest fires, and vehicle exhaust emissions are the main causes of benzene and BaP34,35.

Thus, in Table 1, I report the Spearman’s rank correlation coefficient between the nine investigated air pollutants (described in Table B1, Appendix B) and six potential sources of environmental pollution, all measured in the period 2014–2019: large firms (over 250 employees) per square kilometer; final consumption of energy and natural gas expressed as tons of oil equivalent per square kilometer; number of vehicles used to transport goods and passengers (cars, motorcycles, and other vehicles) per square kilometer; overall supply of local public transport expressed as number of seats per inhabitant; the production of cattle fodder from permanent grassland expressed as quintals per square kilometer; and the number of livestock (bovines, buffalos, and pigs) per square kilometer. [Note 2: Data on energy and gas consumption, vehicle density, and public transport refer to the provincial capital, while data on large firms and production of cattle fodder are at the provincial level. Only data on livestock density are at the regional level]. Data were extracted from the I.Stat database36, except for energy and gas consumption37,38, supply of local public transport39, and number of vehicles used to transport goods and passengers38,40. The results show that common air pollutants are positively and significantly correlated with large firms, energy and gas consumption, density of vehicles, public transport, cattle fodder, and livestock density. Large firms, energy and gas consumption, and livestock density had the highest rank correlation coefficients. Notably, NO2, O3 (>120), O3 (>180), PM2.5, and PM10 (>50) showed rank correlation coefficients ranging from 0.58 to 0.68 for large firms, from 0.51 to 0.72 for energy and gas consumption, and from 0.38 to 0.71 for livestock density. The correlation with livestock density may have been partially caused by the ammonia (NH3) generated in the urine and feces of cattle41,42, which contributes to the formation of two relevant (secondary) components of particulate matter, ammonium nitrate and ammonium sulphate43. In fact, according to Greenpeace and the Italian Institute for Environmental Protection and Research (ISPRA)44, animal husbandry was the second leading cause of air pollution in Italy in the period 1990–2018, accounting for 17% of all PM2.5 formation.
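As an illustration of how a Spearman rank correlation of this kind is computed, the following is a minimal sketch in plain Python. It is not the code used in the study: the function names and the five data points are hypothetical, and the significance levels reported in Table 1 are not reproduced here. The coefficient is obtained as the Pearson correlation of the ranks, with tied values receiving the average of their rank positions.

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the block while the next sorted value equals the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # mean of positions start+1 .. end+1
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equally long sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman's rho: Pearson correlation applied to the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical values for five provinces (not taken from the paper).
pm25 = [12.0, 18.5, 25.1, 9.8, 21.3]           # average PM2.5, µg/m3
livestock = [30.0, 55.0, 140.0, 12.0, 90.0]    # livestock per km2
rho = spearman(pm25, livestock)
```

Because the coefficient depends only on ranks, it captures any monotonic association and is not affected by the different units of the pollutants and the pollution sources.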

Table 1 Spearman’s rank correlation coefficients between nine air pollutants and six potential sources of environmental pollution.

Among PAHs, benzene is positively correlated with large firms, energy and gas consumption, and vehicle density at the 1% level of significance, and with public transport at the 5% level of significance. BaP is positively associated with energy and gas consumption and cattle fodder production at the 1% level of significance, and with livestock density at the 5% level of significance. Heavy metals are significantly and positively correlated especially with large firms and energy and gas consumption. Notably, the Spearman’s rank correlation coefficients for PAHs and heavy metals are lower than those for common air pollutants. Although these correlations do not imply causation, they warn of the potentially dangerous effects of large firms, vehicles, energy and gas consumption, and the livestock sector.

This is particularly worrying because according to the Air Quality Standards established by the European Commission45, the legal threshold for key air pollutants was violated multiple times by most of the Italian provinces in the period 2014–2019 (Table 2). Specifically, almost all provinces (106 out of 107) violated the PM10 limit of 50 µg/m3 both in the short- and long-term, resulting in a national average of 25.15 violations per year. Notably, the legal thresholds for both measures of O3 were also violated several times both in the short- and long-term, with a maximum of 95 provinces involved. Regarding the average concentrations of NO2, PM2.5, and PM10, the violations were fewer, respectively involving 15, 17, and five provinces in the short-term and 11, four, and no provinces in the long-term. Among the PAHs, the legal limit for BaP was violated by 13 provinces in the short-term and seven provinces in the long-term, while the legal threshold for benzene was never exceeded. No provinces registered violations for heavy metals, except Aosta and Terni, which exceeded the legal limit of Ni in the short-term.

Table 2 Provinces that exceeded the EU legal threshold in the period 2014–2019.

The situation becomes even worse when we consider the more restrictive thresholds set by the World Health Organization46. In this case, the threshold for PM2.5 and PM10 was violated respectively by 88 and 93 provinces in the short-term and by 85 and 86 provinces in the long-term (Table 3). Unlike EU law, the WHO has not established safe limits for the PAHs (benzene and BaP) and heavy metals (As and Ni) considered, except for Cd, which remains unchanged. This is not very surprising because according to the EEA47, Italian and Polish cities were the ones with the highest levels of PM2.5 in the period 2019–2020, among 323 investigated localities. In fact, among Europe’s 53 worst cities for PM2.5 levels, 20 were in Italy.

Table 3 Provinces that exceeded the WHO AQG threshold in the period 2014–2019.

In Table 4, I also calculate a synthetic environmental pollution index for the Italian provinces in the period 2014–2019, using data on NO2, O3 (>120), PM2.5, and PM10, for which there are sufficient observations. Specifically, the index is compiled by converting the data on each of the four air pollutants considered to fixed-base indexes (with average = 1), whose arithmetic mean gives the final standardized index. Provinces are ranked from the most polluted to the cleanest.
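The construction of the index can be sketched as follows. This is a minimal illustration under stated assumptions rather than the procedure used for Table 4: the three provinces and their values are hypothetical, every province is assumed to have all four pollutants observed, and each fixed-base index is assumed to be the provincial value divided by the cross-province mean of that pollutant.

```python
from statistics import mean

# Hypothetical long-run averages for three provinces (not real data).
data = {
    "A": {"NO2": 40.0, "O3_120": 60.0, "PM25": 25.0, "PM10": 35.0},
    "B": {"NO2": 20.0, "O3_120": 30.0, "PM25": 15.0, "PM10": 25.0},
    "C": {"NO2": 30.0, "O3_120": 30.0, "PM25": 20.0, "PM10": 30.0},
}


def synthetic_index(data):
    """Average of fixed-base indexes (each pollutant rescaled to mean 1)."""
    pollutants = sorted({p for row in data.values() for p in row})
    # Base of each index: the cross-province mean of that pollutant.
    base = {p: mean(row[p] for row in data.values()) for p in pollutants}
    return {
        prov: mean(row[p] / base[p] for p in pollutants)
        for prov, row in data.items()
    }


index = synthetic_index(data)
# Rank provinces from the most polluted to the cleanest.
ranking = sorted(index, key=index.get, reverse=True)
```

Rescaling each pollutant to a mean of 1 before averaging puts concentrations (µg/m3) and counts of exceedance days on a common scale, so no single pollutant dominates the index merely because of its unit of measurement.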

Table 4 A synthetic environmental pollution index for the Italian provinces in the period 2014–2019.

The output shows that the top positions are all in Northern Italy. In particular, the 29 most polluted Italian provinces are all concentrated in the eight northern regions of Italy. Among them, the top six positions are held by provinces within Lombardy, that is, the Italian region which has been most severely hit by the COVID-19 outbreak.

On the contrary, the southern provinces hold the lowest positions in the ranking. In the bottom 20 positions of the ranking, 16 are southern provinces, only four provinces are in Central Italy (Fermo, Macerata, Pistoia, and Viterbo), and none are in Northern Italy.

The most polluted southern provinces, Naples and Chieti, rank only 29th and 41st, respectively. The results reflect the deep historical gap in industrialization and development between the north and south of Italy48,49.

An air pollution map for the average long-term concentrations or violations of each air pollutant in the Italian provinces is given in Fig. 1.

Figure 1

Average long-term outdoor concentrations (or violations) of NO2, O3, PM2.5, PM10, benzene, BaP, As, Cd, and Ni, in the 107 Italian provinces. When no data are available, the province is colored grey. The map was generated using Microsoft Excel software 2021. All the sources used to collect the data are reported in detail in Appendix B.

Literature review

It is well established that air pollution exposure can adversely affect lung function. NO2, O3, PM2.5, and PM10 can be risk factors for several respiratory diseases, such as asthma50, bronchiectasis51, chronic obstructive pulmonary disease (COPD)52, invasive pneumococcal disease (IPD)53, lung cancer54, and general respiratory infections55. Meanwhile, exposure to airborne PAHs can worsen respiratory infections and increase the risk of several non-malignant respiratory diseases associated with exposure to other air pollutants, such as particulate matter56. Exposure to heavy metals, such as As, Cd, chromium (Cr), mercury (Hg), Ni, and zinc (Zn), may induce airway inflammation, lung irritation, and pulmonary oedema57,58,59,60, contributing to oxidative stress in lung tissue61,62,63. For example, Cd and Ni exposure may lead to emphysema and asthma, respectively57, while arsenic exposure may increase the risk of developing pulmonary fibrosis64.

Therefore, in the last year and a half, a large body of literature has focused its attention on the relationship between air quality and the COVID-19 pandemic propagation pattern and mortality. Bashir et al.8 used two non-parametric statistical techniques (Kendall and Spearman rank-order correlation coefficients) to investigate the association between seven air pollutants and COVID-19 cases and deaths in California. Specifically, they analyzed the concentrations of CO, NO2, Pb, PM2.5, PM10, SO2, and volatile organic compounds (VOC) from 4 March 2020 to 24 April 2020. They found that short-term exposure to CO, NO2, PM2.5, PM10, and SO2 was significantly and positively correlated with COVID-19 cases and deaths, and the highest correlation coefficients were shown by NO2 and PM2.5.

Becchetti et al.19 used several statistical techniques, such as the difference-in-difference (DID) approach, ordinary least square (OLS) panel regression, and cross-sectional and panel fixed-effect spatial autoregressive combined models (SAC), to investigate the role of three major air pollutants in the spread of COVID-19 in 96 Italian provinces from 24 February 2020 to 15 April 2020. They found that average concentrations of NO2, PM2.5, and PM10 (registered in 2018) were highly significant and positively associated both with COVID-19 mortality and infections. The results were also confirmed after controlling for a number of demographic, environmental, economic, and healthcare covariates.

By using a mixed linear multiple regression approach, Hendryx and Luo15 analyzed the effect of long-term exposure (in the period 2014–2019) to diesel particulate matter (DPM), O3, and PM2.5 in relation to COVID-19 susceptibility or outcomes in the US. Specifically, they investigated the cumulative confirmed cases as of 31 May 2020, finding that DPM alone was significantly and positively associated with COVID-19 prevalence, and robust enough against changes in the specifications. Although positive, the coefficient of PM2.5 was not robust enough.

Cole et al.14 examined the link between confirmed COVID-19 cases, deaths, hospitalizations, and long-term exposure (in the period 2010–2019) to three major air pollutants (O3, PM2.5, and SO2) in 355 municipalities in The Netherlands. By using instrumental variable (IV) regressions, NB approaches, and spatial autoregressive models with autoregressive disturbances (SARAR), they found that only the PM2.5 coefficient was significant and robust against changes in the specifications. Specifically, for every 1 µg/m3 increase in PM2.5 concentrations, there was an increase of 9.4 cases, 2.3 deaths, and three hospitalizations.

Liang et al.65 used zero-inflated negative binomial (ZINB) models to analyze the association between long-term exposure (in the period 2010–2016) to NO2, O3, and PM2.5, and COVID-19 case-fatality and mortality rates in 3,076 US counties. They found that only NO2 had a significant and positive association with both COVID-19 case-fatality rate and mortality rate from 22 January 2020 to 17 July 2020.

By using a generalized additive model (GAM), Zhu et al.11 investigated the short-term relationship between several air pollutants and daily confirmed COVID-19 cases in 120 Chinese cities from 23 January 2020 to 29 February 2020. They found that 1-unit µg/m3 increases in NO2, O3, PM2.5, and PM10 were associated with 0.69%, 0.48%, 0.22%, and 0.18% increases respectively in daily confirmed COVID-19 cases. On the contrary, a 1-unit µg/m3 increase in SO2 was linked with a 0.78% decrease in daily confirmed COVID-19 cases.

Dales et al.66 analyzed the relationship between short-term exposure to CO, NO2, and PM2.5 and COVID-19 related mortality in Santiago (Chile). They used a two-stage random effects model for count data in the period 16 March 2020–31 August 2020. In particular, they found that daily deaths from COVID-19 related mortality grew by 6% for an interquartile range (IQR) increase in CO, NO2, and PM2.5. No significant effects were detected for O3.

Solimini et al.13 used negative binomial mixed–effect models (NBMM) to investigate the association between long-term exposure (in the period 2015–2018) to PM10 and PM2.5 and COVID-19 cases in a large sample of countries. The data came from 63 countries, 730 regions, and five continents, and was updated on 30 May 2020. After adjusting the models for several regional and country covariates and spatial correlation, they found that 1-unit µg/m3 increases in the PM2.5 and PM10 concentrations were significantly correlated with increases of 0.81% and 1.15% respectively in the total number of confirmed COVID-19 cases in a 14-day window.

Table 5 summarizes 25 international studies on the relationship between environmental pollution and the spread of COVID-19 infections.

Table 5 25 Selected studies on the relationship between exposure to air pollution and COVID-19 spread and related mortality across the world.

Data

In this section, I report the variables used in the empirical analysis. First, to avoid spurious correlations and mitigate the problem of omitted variables, I include 18 covariates to account for geographical proximity, demographic characteristics, population habits and structure, industrial centers, and weather conditions:

  • four dummy variables to identify the provinces that border Austria, France, Slovenia, and Switzerland respectively;

  • a dummy variable to identify the provinces that are also the regional capital;

  • the size of each province expressed in square kilometers;

  • the distance between the provincial capital’s center and the nearest airport with at least 50,000 passengers in the period from January to November 2020;

  • the foreign-born population as a percentage of total resident population in each province, in 2020;

  • the share of population aged 0–19 in each province, in 2020;

  • the share of male population in each province, in 2020;

  • the degree of urbanization of the population in each province;

  • the average share of obese individuals at regional level, in the period 2016–2019;

  • the average share of smokers at regional level, in the period 2016–2019;

  • the average deaths from chronic lower respiratory tract disease (per 100,000 inhabitants) in each province, in the period 2014–2019;

  • the number of firms with 250 or more employees per 100 square kilometers in each province, in the period 2014–2019;

  • the average altitude of the capital of the province;

  • the average annual days of rain in each province, in the period 2007–2018;

  • the average annual temperature in each province, in the period 2008–2018.

[Note 3: In Table A1 (Appendix A), I considered the pairwise correlation between the main control variables. The reported correlation coefficients for each pair of variables were always lower than the typical cutoff of 0.80 (Ref.74), and only 4 (out of 72) correlations were greater than the most restrictive cutoff of 0.5 (Ref.75), ranging from 0.51 to 0.61 (in absolute value). In Table A2 (Appendix A), I considered the pairwise correlation between control variables and air pollutants. Only 6 (out of 117) correlations were greater than the restrictive cutoff of 0.5 (Ref.75), ranging from 0.55 to 0.62 (in absolute value). This supports the simultaneous inclusion of the covariates and single air pollutants].

What follows is a brief literature summary of the relationship between the main control variables and the spread and mortality of COVID-19. First, the sex and age composition of the population may be an important parameter in explaining the current outbreak. Some studies found that the male population was more susceptible to contracting COVID-19 infection76, and to having fatal outcomes, than the female population77,78. Young people and children are less likely than adults to have severe and mild symptoms of COVID-19, such as fever and respiratory symptoms. Since they usually escape detection by the health surveillance system, they could act as silent vectors of COVID-19 transmission79,80.

Population distribution may also affect transmission patterns of COVID-19 because in the most densely populated and urban areas spatial proximity means that people are more likely to come into contact with other individuals81. This may contribute to spreading the contagion and exacerbate COVID-19 related mortality, as observed in Brazil82, India83, and Italy84.

A number of studies found that the presence of at least one comorbidity, such as chronic lung disease and obesity, may have an adverse impact on patients with COVID-1985,86. In particular, comorbid respiratory and lung disease were found to be associated with higher COVID-19 prevalence87, and with higher risk for severe disease and mortality in COVID-1987,88. Similarly, higher prevalence of obesity may have increased the risk of severe COVID-19 outcomes for hospitalized patients in Milan, Italy89, in the UK90 and in New York City, US91.

The smoking habit in the population may have also played a role in the spread of COVID-19. In fact, even if the relationship between smoking and COVID-19 disease remains substantially unclear92,93, several studies found a partially unexpected protective effect of smoking/nicotine against COVID-1994,95,96. Active smokers were less likely to be infected with COVID-19 than non-smokers, suggesting the existence of a smokers’ paradox in COVID-1996.

Predictive meteorological and geographical factors were also widely investigated. A number of studies found that higher altitude can mitigate the adverse effect of COVID-19 (transmission and related deaths) in Colombia97, Peru98, and the US99. Huamaní et al.100 suggested that, in Peruvian districts, this may be caused by the combination of low population density and smaller population. The results on the impact of average temperature on COVID-19 cases were mixed. While some studies found that temperature may increase the spread of COVID-198,101, other research found a negative statistical association between the two variables102,103. By contrast, there is a substantial consensus in the literature that warmer climate conditions may reduce COVID-19 mortality103,104,105 and case-fatality rate17. Even if rainfall was not found to be an important predictive factor in COVID-19 spread and mortality in most of the literature106, a recent paper pointed out that rainfall may lead to higher social distancing and help to mitigate the adverse effects of the outbreak107. The provinces with international borders, the share of foreign population, the provincial capital’s distance from the nearest airport, and the size of the province are used to control for the effect of the movement of people. Finally, large firms can be seen as a proxy for greenhouse gases, such as CO2 and methane.

Regarding the explanatory variables, I chose the following nine air pollutants, calculated—when data are available—for each Italian province:

  • the average concentrations of NO2, expressed in micrograms per cubic meter of air (µg/m3), in the period 2014–2019;

  • the average number of days in which Ozone exceeded the limit of 120 µg/m3, in the period 2014–2019;

  • the average number of hours in which Ozone exceeded the limit of 180 µg/m3, in the period 2014–2018;

  • the average concentrations of PM2.5, expressed in µg/m3, in the period 2014–2019;

  • the average concentrations of PM10, expressed in µg/m3, in the period 2014–2019;

  • the average number of days in which PM10 exceeded the limit of 50 µg/m3, in the period 2014–2018;

  • the average concentrations of benzene, expressed in µg/m3, in the period 2014–2016;

  • the average concentrations of BaP, expressed in nanogram per cubic meter of air (ng/m3), in the period 2014–2018;

  • the average concentrations of As, expressed in ng/m3, in the period 2014–2016;

  • the average concentrations of Cd, expressed in ng/m3, in the period 2014–2016;

  • the average concentrations of Ni, expressed in ng/m3, in the period 2014–2016.

As dependent variables, I use (i) the number of cumulative confirmed COVID-19 cases on 30 November 2020, in each province; (ii) the proportion of the total resident population infected by COVID-19 on 30 November 2020, in each province; and (iii) the difference, absolute and standardized for population size, between the number of deaths from all causes from March 2020 to November 2020, and the number of deaths from all causes in the March-November five-year average (from 2015 to 2019). [Note 4: Data on COVID-19 prevalence and excess mortality rate (on 30 November 2020) are graphically represented in Fig. 2].
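The excess mortality measure in (iii) amounts to simple arithmetic, sketched below. The function name, the input figures, and the scaling per 100,000 inhabitants are assumptions made for illustration; the text only states that the difference is taken in absolute terms and standardized for population size.

```python
def excess_deaths(deaths_2020, deaths_2015_2019, population, per=100_000):
    """Absolute and population-standardized excess deaths for one province.

    deaths_2020      : deaths from all causes, March to November 2020
    deaths_2015_2019 : deaths from all causes in the same months of 2015..2019
    population       : resident population of the province
    per              : scaling of the rate (assumed here: per 100,000 inhabitants)
    """
    # Baseline: five-year average of March-November deaths.
    baseline = sum(deaths_2015_2019) / len(deaths_2015_2019)
    absolute = deaths_2020 - baseline
    rate = absolute / population * per
    return absolute, rate


# Hypothetical province (figures are illustrative, not from the paper).
absolute, rate = excess_deaths(
    deaths_2020=5600,
    deaths_2015_2019=[5000, 5100, 4900, 5050, 4950],
    population=400_000,
)
```

Using deaths from all causes sidesteps differences in how COVID-19 deaths were certified across provinces, at the cost of also capturing deaths only indirectly related to the outbreak.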

Figure 2

Source: own elaborations on data from Italian Ministry of Health109, I.Stat36, and Istat108. The map was generated using Microsoft Excel software 2021.

COVID-19 prevalence and related mortality in 107 Italian provinces (on 30 November 2020).

Since the national number of excess deaths from all causes was exceptionally high in the period 1 March 2020–30 November 2020 (91,416), and equal to 11,427 excess deaths per month, many of them may be reasonably attributed to COVID-19108. The detailed definitions and the sources of all the independent and dependent variables used in this paper are reported in Table B1 (Appendix B). A summary of the main descriptive statistics is also provided in Table C1 (Appendix C).

Empirical strategy

The main goal of this paper is to estimate the relationship between long-term exposure to nine air pollutants and COVID-19 transmissibility and mortality across 107 Italian provinces, using different econometric techniques. To measure the spread of COVID-19, I use both the absolute confirmed cases of the disease and its prevalence, expressed as a percentage of the population, as of 30 November 2020. This date was chosen by looking at the peak in the daily use of nasal swabs for COVID-19 testing at the second peak of the epidemic, which can be dated to the end of November 2020. In fact, at that time, more than 200,000 swabs were used daily110, and it is possible to hypothesize that they presented a reliable snapshot of reality. Although the number of daily swabs was even higher during the third wave of the COVID-19 epidemic, I preferred not to use these data. In fact, the third peak of the epidemic occurred around 8 April 2021, when more than 14% of the Italian population had received at least one dose of a COVID-19 vaccine111. Moreover, in a cross-sectional analysis the differences across units are more important than the number of infections. This choice may mitigate the inevitable bias in detecting infected people, which probably also increased in early 2021 due to the start of the nationwide COVID-19 vaccination campaign.

Regarding the empirical strategy, I used a negative binomial regression, which fits well when the dependent variable is a count variable, such as the COVID-19 cumulative confirmed cases and related deaths. [Note 5: In fact, the standard deviation exceeds the mean both for COVID-19 confirmed cases and related deaths (Table C1, Appendix C). In particular, the coefficient of variation, which is the ratio of the standard deviation to the mean, is 1.43 for COVID-19 cases and 1.54 for excess deaths, suggesting a certain degree of variability in the dependent variables]. The choice of a negative binomial approach instead of a standard Poisson regression is based on the evaluation of the likelihood-ratio (LR) test on the overdispersion parameter alpha and is consistent with earlier similar studies on the same matter14,18. The negative binomial regression can be considered a generalization of Poisson regression that allows the conditional variance to exceed the conditional mean. To do this, the negative binomial approach considers an extra parameter that corrects the effects of the larger variance on the p-values112. To avoid biased results, I also include the size of the provincial population as an exposure variable. This is a pivotal point, because it allows the cumulative confirmed cases and excess deaths to be standardized, that is, it converts each observation from a count variable into a rate. As a result, I estimate the following basic equation:

$${Covid}_{i}={\beta }_{0}+{\beta }_{1}{D}_{i}+{\beta }_{2}{DE}_{i}+{\beta }_{3}{E}_{i}+{\beta }_{4}{M}_{i}+{\beta }_{5}{Pollutant}_{i}+{\varepsilon }_{i},$$
(1)

where \(i\) identifies each province, \({\beta }_{0}\) is a constant, \({D}_{i}\) is a vector of dummy variables for identifying Italian provinces with international borders, \({DE}_{i}\) is a vector of demographic and economic factors, \({E}_{i}\) is a vector of epidemiological features, \({M}_{i}\) is a vector of meteorological conditions, \({Pollutant}_{i}\) refers to the concentrations or violations of nine selected air pollutants (NO2, O3, PM2.5, PM10, benzene, BaP, As, Cd, and Ni), and \({\varepsilon }_{i}\) is the error term.
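To make the role of the exposure variable explicit, the negative binomial specification can be written on the log scale as below. This is a sketch under stated assumptions: the symbol \({Pop}_{i}\) for the provincial population is introduced here for illustration, and the quadratic (NB2) variance function is assumed because it is the most common parameterization; the text does not state which one was used.

```latex
\begin{aligned}
\mu_i &= \mathrm{E}\left[{Covid}_i \mid D_i, {DE}_i, E_i, M_i, {Pollutant}_i\right],\\
\log \mu_i &= \log\left({Pop}_i\right) + \beta_0 + \beta_1 D_i + \beta_2 {DE}_i
             + \beta_3 E_i + \beta_4 M_i + \beta_5 {Pollutant}_i,\\
\mathrm{Var}\left[{Covid}_i \mid \cdot\,\right] &= \mu_i + \alpha \mu_i^{2}, \qquad \alpha \ge 0 .
\end{aligned}
```

Because \(\log({Pop}_{i})\) enters with its coefficient fixed at 1, the linear predictor equals \(\log(\mu_i / {Pop}_{i})\), so the coefficients describe effects on a rate rather than on a raw count. Setting \(\alpha = 0\) recovers the Poisson model, which is the null hypothesis of the LR test on the overdispersion parameter.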

As sensitivity checks, I modeled the cases and deaths of COVID-19 using a standard ordinary least squares (OLS) approach and a spatial-autoregressive (SAR) framework. OLS can be considered the most widely used econometric technique for linear statistical models. It takes the same form as Eq. (1), with the only exception of the dependent variables, which are the prevalence and the excess mortality at the provincial level.

However, this procedure is not immune from issues, because from a theoretical point of view it is unlikely that neighboring provinces did not affect each other. In fact, the transmission within neighboring territories may have been affected by the movement of people, which is easier and faster across provinces’ borders. The presence of spatial dependence in the dependent variable may lead to substantial bias in OLS models113, resulting in inconsistent outcomes. Thus, I controlled for possible spatial effects in the dependent variable by following two sequential steps: (i) I investigated the map of COVID-19 prevalence on 30 November 2020 to check whether any spatial pattern was visible; and (ii) I calculated a common measure of spatial autocorrelation, the global Moran’s I statistic114,115, to verify whether each infection had the same likelihood of occurring at any location. Based on the evaluation of these metrics, I implemented a spatial-autoregressive model (SAR). In particular, the model was estimated with a maximum likelihood (ML) approach instead of the more common generalized spatial two-stage least squares (GS2SLS) approach. This choice is justified by performing Cameron and Trivedi’s116 decomposition of White’s information matrix (IM) test to check the assumptions of normality and homoscedasticity of the errors, which need to be met to implement the ML estimator [Ref.117, p. 236].
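The global Moran’s I statistic mentioned in step (ii) can be sketched as follows. This is a minimal illustration, not the computation performed in the study: the four locations arranged in a line, the binary contiguity weights, and the values are all hypothetical, and no significance test is included.

```python
def morans_i(values, weights):
    """Global Moran's I for a list of values and a square weight matrix.

    I = (n / S0) * sum_ij w_ij (x_i - m)(x_j - m) / sum_i (x_i - m)^2,
    where m is the mean of the values and S0 is the sum of all weights.
    """
    n = len(values)
    m = sum(values) / n
    dev = [v - m for v in values]
    s0 = sum(sum(row) for row in weights)
    num = sum(
        weights[i][j] * dev[i] * dev[j]
        for i in range(n)
        for j in range(n)
    )
    den = sum(d * d for d in dev)
    return (n / s0) * num / den


# Four hypothetical provinces in a row; neighbors share a border (weight 1).
w = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
clustered = morans_i([1.0, 2.0, 3.0, 4.0], w)    # similar values sit together
alternating = morans_i([1.0, 4.0, 1.0, 4.0], w)  # neighbors are dissimilar
```

Positive values indicate that neighboring locations tend to have similar values, negative values indicate that they tend to differ, and values near the expectation under no spatial autocorrelation, \(-1/(n-1)\), are consistent with a spatially random pattern.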

The equation estimated for the SAR model was obtained by adding to the basic Eq. (1) a spatially lagged dependent variable that accounts for the endogenous interaction effects, which yields Eq. (2). The spatially lagged dependent variable was included to verify whether, and by how much, a given province was influenced by the COVID-19 prevalence and excess mortality rate of the neighboring provinces. The final equation takes the following form:

$${Covid}_{i}={\beta }_{0}+{\beta }_{1}{D}_{i}+{\beta }_{2}{DE}_{i}+{\beta }_{3}{E}_{i}+{\beta }_{4}{M}_{i}+{\beta }_{5}{Pollutant}_{i}+\rho {w}_{i}{Covid}_{i}+{\varepsilon }_{i},$$
(2)

where \(i\) identifies each province, \({\beta }_{0}\) is a constant, \({D}_{i}\) is a vector of dummy variables identifying Italian provinces with international borders, \({DE}_{i}\) is a vector of demographic and economic characteristics, \({E}_{i}\) is a vector of epidemiological features, \({M}_{i}\) is a vector of meteorological conditions, \({Pollutant}_{i}\) refers to the average concentrations (or violations) of nine selected air pollutants (NO2, O3, PM2.5, PM10, benzene, BaP, As, Cd, and Ni), \(\rho\) is the scalar coefficient on the spatially lagged dependent variable \({w}_{i}{Covid}_{i}\), \({w}_{i}\) denotes the weights from an inverse-distance weight matrix specified, in turn, with a 50 km cut-off, a 75 km cut-off, a 100 km cut-off, and no cut-off, and finally \({\varepsilon }_{i}\) is the error term. The matrix was row standardized because: (i) this allows spatial parameters from different models to be compared; and (ii) since the weights in each row sum to 1, the fact that one province may have two neighbors while another has many more does not have a large effect on the results.
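The construction of a row-standardized inverse-distance weight matrix with an optional cut-off can be sketched as follows. This is a minimal illustration, not the routine used in this study: the function name `inverse_distance_weights`, the use of planar coordinates expressed in km, and the three toy locations are all assumptions made for the example.

```python
import math


def inverse_distance_weights(coords, cutoff_km=None):
    """Row-standardized inverse-distance weights from planar (x, y) points in km.

    Assumes all points are distinct (a zero distance would divide by zero).
    """
    n = len(coords)
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue  # a region is not its own neighbor
            d = math.dist(coords[i], coords[j])
            if cutoff_km is None or d <= cutoff_km:
                w[i][j] = 1.0 / d
    # Row standardization: each row with at least one neighbor sums to 1
    for row in w:
        total = sum(row)
        if total > 0:
            for j in range(n):
                row[j] /= total
    return w


# Hypothetical centroids located 0, 30, and 90 km along a line
points = [(0.0, 0.0), (30.0, 0.0), (90.0, 0.0)]
w_all = inverse_distance_weights(points)                 # no cut-off
w_50 = inverse_distance_weights(points, cutoff_km=50.0)  # 50 km cut-off
print(w_all[0], w_50[2])
```

With the 50 km cut-off, the third point has no neighbor and its row stays at zero, which shows why short cut-offs can leave isolated units; without a cut-off, every row sums to 1 and nearer points receive larger weights.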

Finally, as a further sensitivity check, I used the data on COVID-19 prevalence rates and excess mortality on 28 February 2021, that is, approximately one year after the start of the COVID-19 outbreak in Italy. The aim was to test whether the relationship between major air pollutants and COVID-19 spread and related mortality was maintained over time.

Results and discussion

Negative binomial regressions

In Tables 6 and 7, I present the negative binomial model estimations for the Italian provinces. All models were significant; in fact, the Fisher–Snedecor test statistics were far higher than the tabulated critical values at the 1% level of significance. McFadden’s118 pseudo-R2 is substantially homogeneous across specifications and ranges from 0.07 to 0.09 for confirmed cases and from 0.07 to 0.1 for excess deaths. [Note 6: Although these values are low, it should be noted that pseudo-R2 values are usually much lower than those of the classic R-square133. However, OLS and SAR models are used to strengthen the results in “OLS regression models” and “Robustness checks: spatial-autoregressive analysis”, respectively.] Moreover, the likelihood-ratio (LR) chi-square test allows us to strongly reject the null hypothesis that the dispersion parameter alpha is equal to zero. Thus, the negative binomial approach is a better fit for the data than the Poisson regression.
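The two diagnostics mentioned above can be illustrated with a short Python sketch that works directly on log-likelihood values. The function names and the log-likelihoods below are hypothetical and chosen only for illustration. Halving the chi-square(1) p-value, because alpha = 0 lies on the boundary of the parameter space, is a common convention that is assumed here rather than taken from the text.

```python
import math


def mcfadden_pseudo_r2(ll_model, ll_null):
    """McFadden's pseudo-R2: 1 - lnL(full model) / lnL(intercept-only model)."""
    return 1.0 - ll_model / ll_null


def lr_test_alpha_zero(ll_poisson, ll_negbin):
    """LR test of H0: alpha = 0 (Poisson) against the negative binomial model."""
    stat = 2.0 * (ll_negbin - ll_poisson)
    if stat <= 0.0:
        return stat, 1.0
    # Survival function of a chi-square(1) variable, written with erfc
    p_chi2 = math.erfc(math.sqrt(stat / 2.0))
    # Boundary correction (assumed convention): halve the chi-square(1) p-value
    return stat, 0.5 * p_chi2


# Hypothetical log-likelihoods, for illustration only
r2 = mcfadden_pseudo_r2(ll_model=-920.0, ll_null=-1000.0)
stat, p_value = lr_test_alpha_zero(ll_poisson=-1500.0, ll_negbin=-920.0)
print(r2, stat, p_value)
```

A large LR statistic with a p-value close to zero, as in this toy case, is the pattern that supports the negative binomial specification over the Poisson one.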

Table 6 Results from negative binomial regressions on COVID-19 cumulative cases registered on 30 November 2020.
Table 7 Results from negative binomial regressions on COVID-19 cumulative excess deaths registered on 30 November 2020.

Regarding control variables, the results showed that borders with Austria, France, and Switzerland, the share of foreigners, population density, and altitude were significantly and positively correlated with cumulative confirmed COVID-19 cases on 30 November 2020 (Table 6). [Note 7: The meaning of the relationship between control variables and COVID-19 cases and deaths is explained in the section “OLS regression models”]. Conversely, distance from the nearest main airport and average temperature were significantly and negatively associated with total confirmed COVID-19 cases. Regarding air pollutants, NO2, O3(>120), PM2.5, PM10, benzene, and Cd showed a positive and statistically significant relationship with COVID-19 infections. The remaining pollutants (BaP, As, and Ni) were not significant.

Since coefficients from negative binomial models cannot easily be interpreted, I computed the marginal effects for the air pollutants that were statistically significant (Table 8). The most significant coefficients for COVID-19 cases were those of NO2, O3(>120), PM2.5, PM10, and Cd, which were significant at the 1% level, followed by benzene, which was significant at the 5% level. Regarding primary pollutants, a 1 μg/m3 increase in PM2.5, PM10, and NO2 concentrations was associated with average increases of 463.2, 405, and 194.2 COVID-19 infections, respectively, while for PAHs and heavy metals, a 0.1 μg/m3 increase in benzene and a 0.1 ng/m3 increase in Cd were associated with average increments of 211.6 and 366.7 COVID-19 infections, respectively. [Note 8: I chose 0.1 units for benzene and BaP because their legal threshold was comparatively much lower than that for common air pollutants]. Thus, among common air pollutants, PM2.5 and PM10 seemed to have the most adverse effects on COVID-19 spread, while Cd was the most dangerous among the remaining pollutants.
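As an illustration of how such marginal effects arise, the sketch below computes the average marginal effect of a continuous regressor in a count model with a log link, where the expected count is exp(x'β) and its derivative with respect to regressor k is β_k exp(x'β). The function name, the coefficients, and the design matrix are hypothetical. The sketch ignores any exposure or offset term, and whether the software used in this study computes the effects in exactly this way is an assumption.

```python
import math


def average_marginal_effect(X, beta, k):
    """Average over observations of d E[y|x] / d x_k = beta_k * exp(x'beta)."""
    mu = [math.exp(sum(b * x for b, x in zip(beta, row))) for row in X]
    return beta[k] * sum(mu) / len(mu)


# Hypothetical design matrix: intercept and one pollutant concentration
X_toy = [[1.0, 10.0], [1.0, 20.0], [1.0, 30.0]]
beta_toy = [math.log(1000.0), 0.02]  # illustrative coefficients only
ame = average_marginal_effect(X_toy, beta_toy, k=1)
print(ame)  # expected change in the count per one-unit increase in the pollutant
```

Because the effect depends on the level of the expected count, the same coefficient translates into a larger change in cases for provinces with higher predicted counts, which is why an average over observations is reported.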

Table 8 The average marginal effects obtained from negative binomial regressions.

With regard to COVID-19 related deaths, male population, urbanization, large firms, LRT disease, and altitude (although barely) were positively and significantly correlated with the excess deaths. Conversely, population density, rainy days, and temperature were negatively and significantly correlated with the excess deaths (Table 7).

The most significant air pollutants were NO2, O3(>120), O3(>180), BaP, and As, which were significant at the 1% level, followed by PM2.5 and PM10, which were significant at the 5% level. Marginal effects (Table 8) showed that a 1 μg/m3 increase in PM2.5, PM10, and NO2 concentrations was correlated with an average increase of 29.3, 23.4, and 20.8 COVID-19 related deaths, respectively. Among the remaining pollutants, a 0.1 ng/m3 increase in As was associated with an average increment of 37.7 COVID-19 related deaths, while a 0.1 ng/m3 increase in BaP was correlated with an average decrease of 57 COVID-19 related deaths. Thus, As, PM2.5, and PM10 showed the largest positive effect on COVID-19 related deaths.

OLS regression models

To strengthen the results, in Tables 9 and 10, I estimated OLS regression models for COVID-19 prevalence and excess mortality in the Italian provinces. Since standard errors are usually biased in small samples, I corrected them for heteroscedasticity by applying the HC2 estimator proposed by MacKinnon and White119, which performs well even when the sample size is not large [Ref.120, p. 533]. The Fisher–Snedecor test statistic was significant at the 1% level for all the OLS models; therefore, the choice of the independent variables can be justified. In Tables D1 and D2 (Appendix D), I also report Cameron and Trivedi’s116 decomposition of the IM-test for heteroscedasticity, skewness, and kurtosis. The tests show that the null hypothesis cannot be rejected, i.e., the residuals were homoscedastic and normally distributed. [Note 9: It is necessary to stress that in model 5 (excess mortality), the null hypothesis of residual normality was rejected (Table D2, Appendix D). However, this does not seem to be a matter of concern because the histogram of the residuals suggests that their distribution was not skewed (Fig. D3, Appendix D)]. Moreover, the R-square ranged from 0.73 to 0.81 for prevalence, and from 0.39 to 0.62 for excess mortality. Thus, the models were a good fit and explained a large and a moderate fraction of the variability of COVID-19 prevalence and related mortality, respectively. The variance inflation factors (VIF) were always less than the threshold of 5, suggesting that there were no severe multicollinearity issues121. The only exception was temperature in model 1, which was excluded from the other models.
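The logic of the HC2 correction can be illustrated for the simplest case of one regressor plus an intercept, where each squared residual is inflated by 1/(1 − h_i), with h_i the leverage of observation i. The sketch below is a deliberately simplified, hypothetical example: the models in this study include many covariates, and the function name and toy data are assumptions made for illustration.

```python
import math


def ols_slope_hc2(x, y):
    """OLS slope of y on x (with intercept) and its HC2 standard error.

    Requires more than two observations and no observation with leverage 1.
    """
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    intercept = y_bar - slope * x_bar
    var = 0.0
    for xi, yi in zip(x, y):
        resid = yi - intercept - slope * xi
        leverage = 1.0 / n + (xi - x_bar) ** 2 / sxx  # hat-matrix diagonal h_i
        # HC2: squared residual rescaled by 1 / (1 - h_i)
        var += (xi - x_bar) ** 2 * resid ** 2 / (1.0 - leverage)
    return slope, math.sqrt(var / sxx ** 2)


# Hypothetical toy data
slope, se_hc2 = ols_slope_hc2([0.0, 1.0, 2.0, 3.0], [0.0, 1.0, 2.0, 4.0])
print(slope, se_hc2)
```

The rescaling matters most for high-leverage observations, whose residuals tend to be too small in absolute value, which is one reason this correction is often preferred in samples of limited size.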

Table 9 Results from OLS models on COVID-19 prevalence rate registered on 30 November 2020.
Table 10 Results from OLS models on COVID-19 excess mortality registered on 30 November 2020.

The results are similar to those obtained from the negative binomial regression models. Concerning the control variables, borders with Austria, France, and Switzerland, foreign population, population density, deaths from respiratory disease, and altitude were significantly and positively correlated with COVID-19 prevalence; meanwhile, distance from the nearest airport and temperature were significantly and negatively associated with infection rates (Table 9). [Note 10: Obesity had an unexpected negative and significant association with COVID-19 prevalence, while smokers were not significant at all]. Notably, the coefficient of the border with Switzerland was more significant and larger than those of the borders with Austria and Slovenia. This may be due to the flow of the 65,000 cross-border workers who reside in Italy and work in Switzerland, and who account for 63.73% of all Italian cross-border commuters [Ref.122, pp. 184–185]. The significance of foreign population could be explained by foreigners’ greater propensity to travel to their native countries, which could have increased the probability of meeting infected people.

The direction of the correlation between population density and COVID-19 cases is consistent with recent literature123,124, suggesting the importance of keeping a safe physical distance from others to limit the spread of the outbreak. The positive significance of altitude, conversely, is in contrast with most of the recent literature97,125,126,127. However, these studies mainly focused on Latin American countries, such as Colombia, Peru, and Brazil, which have cities with altitude differences of more than 3000 m. As shown by Table C1 (Appendix C), the difference between the lowest-altitude city (Venice) and the highest-altitude city (L’Aquila) is just 1167.3 m, suggesting a lower isolation of the population. Moreover, the size of the regression coefficient of altitude is extremely low. The positive effect of the prevalence of deaths from respiratory diseases in the period 2014–2019 seems to stress the greater vulnerability of people with comorbidities, who are more likely to get infected87,128.

On the contrary, higher temperatures may have favored a reduction of COVID-19 transmission, and this result appears consistent with several recent studies102,103,129,130,131. The negative relationship between transmission and distance from the nearest airport seems to advocate the beneficial effect of travel restrictions.

Regarding air pollutants, NO2, O3(>120), PM2.5, PM10, PM10 (>50), and benzene were statistically significant at the 1% level, while Cd was significant at the 10% level (Table 9). Among common air pollutants, 10 μg/m3 increases in the concentrations of NO2, PM2.5, and PM10 were associated respectively with average increments of 0.27% (95% CI 0.08–0.45), 0.44% (95% CI 0.16–0.72), and 0.54% (95% CI 0.32–0.76) in COVID-19 prevalence. Among significant PAHs and heavy metals, a 1 μg/m3 increase in benzene and a 1 ng/m3 increase in Cd were associated respectively with increments of 0.3% (95% CI 0.08–0.53) and 0.42% (95% CI −0.05 to 0.89) in nationwide COVID-19 prevalence [Note 11: 95% CI stands for 95% confidence interval]. Thus, PM10 exhibited the largest adverse effect on COVID-19 spread.

For models 1–12 (Table 10), the results showed that, among control variables, male population, LRT disease, and big firms were significantly and positively correlated with COVID-19 excess mortality. Conversely, rainy days and temperature were significantly and negatively associated with COVID-19 excess mortality. [Note 12: Obesity and smokers had a negative and significant association with the excess mortality rate. The beneficial impact of smoking seems to confirm the existence of a smokers’ paradox in COVID-1996. However, since data on obesity and smokers are available only at the regional level, these outcomes should be treated with caution]. The adverse impact of COVID-19 disease on the male population is large and consistent with other studies77,78. [Note 13: In fact, a 1 percentage point increase in male population was associated with an increase of up to 96 excess deaths per 100,000 people]. The positive effect of LRT deaths on COVID-19 excess mortality stresses the importance of comorbidities for COVID-19 patient outcomes85,86,87,88. The positive relationship between big firms and excess mortality due to COVID-19 seems to reinforce the idea that ambient air pollution can increase the severity of the disease. The beneficial effect of historical rainy days can be explained by the arguments put forward by Shenoy et al.107, who argued that rainfall may lead to greater social distancing, which could have mitigated the negative impact of the outbreak. The beneficial impact of higher temperatures is consistent with the literature18,103,104,105.

Regarding the air pollutants, NO2, O3(>120), O3(>180), PM10, and PM10 (>50) were statistically significant at the 1% level, while PM2.5 and As were significant at the 5% level (Table 10). In particular, a 10 μg/m3 increase in the concentrations of NO2, PM2.5, and PM10 was associated with an average increment of 40.2 (95% CI 14.8–65.5), 63.7 (95% CI 14–113.4), and 81.6 (95% CI 36–127.1) excess deaths per 100,000 people, respectively. Among the remaining air pollutants, a 1 ng/m3 increase in As concentration was correlated with an average increment of 47.1 (95% CI 10.8–83.4) excess deaths per 100,000 people. Although BaP had an unexpected negative association with COVID-19 excess mortality, it was significant only at the 10% level. Thus, the results confirm the adverse impact of outdoor air pollution on COVID-19 spread and mortality.

Robustness checks: Spatial-autoregressive analysis

In Tables 11 and 12, I present the results of the SAR models on COVID-19 prevalence and excess mortality on 30 November 2020, respectively. The use of the SAR approach is justified by the global Moran’s I, which allowed me to reject the null hypothesis that the data were randomly distributed in space, both for the dependent and the main independent variables. The statistic ranges from −1 (dispersion) to 1 (clustering). Specifically, the global Moran’s I was always positive and statistically significant at the 1% level. Since the prevalence and excess mortality on 30 November 2020 showed a Moran’s I of 0.341 and 0.362, respectively, both were positively spatially autocorrelated. This means that high (HH) or low (LL) values of prevalence and excess mortality tended to be clustered spatially (Table E1, Appendix E).

Table 11 Results from SAR models on COVID-19 prevalence rate registered on 30 November 2020.
Table 12 Results from SAR models on COVID-19 excess mortality registered on 30 November 2020.

Moreover, since the coefficient on the spatially lagged dependent variable (\(\rho\)) was highly significant in almost all the specifications (Tables 11 and 12), the SAR approach is more appropriate than the classical OLS technique. [Note 14: The use of an ML estimator was also justified by Cameron and Trivedi’s116 decomposition of the IM-test over the OLS models, reported in Tables D1 and D2 (Appendix D). All the tests supported the hypothesis that the OLS errors were homoscedastic and close to a normal distribution, which favors the ML approach [Ref.117, p. 236]].

Specifically, the outcomes showed that the scalar parameter ρ was large, positive, and significant at the 1% level when a weight matrix with no cut-off was used, suggesting that neighboring provinces tended to display similar patterns in terms of the spread of COVID-19 and excess mortality. [Note 15: Since the scalar parameter ρ always ranged from −1 to 1 (a sufficient condition for a row-standardized weight matrix), the covariance matrix is symmetric positive-definite. Thus, the covariance matrix is correct132].

Holding the other variables constant, an increase of 1% in the local prevalence resulted in an average increment of 0.87% in COVID-19 prevalence in the adjacent provinces. Similarly, an increase of 1% in the local excess mortality rate resulted in an average increment of 0.92% in the excess mortality rate of the neighboring provinces. Notably, when weight matrices with different cut-offs were used (50 km, 75 km, and 100 km), the scalar parameter ρ for the excess mortality rate was larger and in some cases more significant than that for the prevalence rate.

The high significance of the spatially lagged dependent variable may largely be due to the fact that people usually move more easily to neighboring provinces, increasing the likelihood of meeting someone with COVID-19 and spreading the infection.

Moreover, the pseudo R2 ranged from 0.78 to 0.91 for COVID-19 prevalence and from 0.45 to 0.68 for excess mortality rate. Since they were significantly higher than 0.2, the models represent an excellent fit [Ref.133, p. 35].

For the prevalence rate, NO2, PM2.5, PM10, benzene, and Cd remained positive and statistically significant despite the inclusion of the spillover effect. Notably, the coefficient of Cd increased in statistical significance from the 10% to the 5% level, while O3 moved from the 1% to the 10% level of significance when a cut-off larger than 50 km was applied (Table 11).

For the excess mortality rate, none of the ambient air pollutants lost its statistical significance. NO2, O3, PM2.5, PM10, and As remained positive and highly significant in most cases. BaP increased its level of significance from 10 to 5% (Table 12).

Tables 13 and 14 report the direct, indirect, and total effects of each air pollutant on the prevalence rate and excess mortality rate. The direct effect was almost always significant, while the indirect and total effects were significant especially when weight matrices with distance cut-offs (50 km, 75 km, and 100 km) were used. In other words, air pollutant concentrations in a given province had a significant and positive spillover (indirect) effect on COVID-19 spread and related mortality in the nearby provinces. For example, as concerns COVID-19 prevalence, PM10 had a direct effect ranging from 0.035 to 0.048, an indirect effect ranging from 0.023 to 0.026, and a total effect ranging from 0.052 to 0.062 (Table 13). [Note 16: Only statistically significant coefficients are considered]. Thus, a 1 μg/m3 increase in PM10 concentrations was associated with an increment of COVID-19 prevalence ranging from 0.05 to 0.06%. [Note 17: Similarly, for excess mortality, As had a direct effect ranging from 33.4 to 44.9, an indirect effect ranging from 15.1 to 34, and a total effect ranging from 45.3 to 67.8 (Table 14). Consequently, a 1 ng/m3 increase in As concentrations was associated with an increment of the excess mortality rate ranging from 45.3 to 67.8 deaths per 100,000 inhabitants]. Generally, the direct effects were greater than the spillover effects, suggesting that air pollution concentrations in a province had a larger adverse effect on the same province than on the neighboring provinces. [Note 18: The statistical significance of the spillover indirect effect of air pollutants may also indicate a certain degree of industrial clustering]. Moreover, among common air pollutants, PM10 and PM2.5 showed the highest total positive effect for both prevalence and excess mortality rate, while among PAHs and heavy metals, Cd and As showed the highest total effect for prevalence and excess mortality rate, respectively.
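The decomposition into direct, indirect, and total effects can be illustrated with the usual summary measures for a SAR model, which are based on the matrix (I − ρW)^(-1) β: the direct effect is the average of its diagonal elements, the total effect is the average of its row sums, and the indirect effect is the difference between the two. The sketch below is a hypothetical, standard-library illustration; the function names, the two-region weight matrix, and the parameter values are assumptions and do not reproduce the estimates in Tables 13 and 14.

```python
def invert(a):
    """Invert a small square matrix by Gauss-Jordan elimination with pivoting."""
    n = len(a)
    aug = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [vr - f * vc for vr, vc in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def sar_effects(beta, rho, w):
    """Average direct, indirect, and total effects of one regressor in a SAR model."""
    n = len(w)
    a = [[(1.0 if i == j else 0.0) - rho * w[i][j] for j in range(n)]
         for i in range(n)]
    m = invert(a)  # spatial multiplier (I - rho W)^(-1)
    direct = beta * sum(m[i][i] for i in range(n)) / n
    total = beta * sum(sum(row) for row in m) / n
    return direct, total - direct, total


# Hypothetical example: two mutually neighboring regions
direct, indirect, total = sar_effects(beta=0.03, rho=0.5, w=[[0, 1], [1, 0]])
print(direct, indirect, total)
```

In this toy case the direct effect exceeds the coefficient itself because part of the impact feeds back through the neighbor, and the total effect equals β/(1 − ρ), as expected when every row of the weight matrix sums to 1.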

Table 13 Direct, indirect, and total effects of air pollutants after fitting SAR models on COVID-19 prevalence (on 30 November 2020).
Table 14 Direct, indirect, and total effects of air pollutants after fitting SAR models on COVID-19 related mortality (on 30 November 2020).

Finally, as a further sensitivity check, in Tables 15 and 16, I computed the SAR models for the prevalence and excess mortality rate registered approximately 1 year after the start of the outbreak, that is, on 28 February 2021. [Note 19: The formula used for calculating the excess mortality rate on 28 February 2021 was: \({Excess}_{mortality}=100{,}000\times \left( \frac{{deaths}_{2020-2021}}{{\overline{pop}}_{2020-2021}}-\frac{{\overline{deaths}}_{2015-2019}}{{\overline{pop}}_{2015-2019}}\right)\), where \({deaths}_{2020-2021}\) refers to the cumulative deaths from all causes registered from 1 March 2020 to 28 February 2021, \({\overline{deaths}}_{2015-2019}\) is the five-year average of deaths (2015–2019) from all causes (from 1 January to 31 December), \({\overline{pop}}_{2020-2021}\) is the average population in the two-year period 2020–2021, and \({\overline{pop}}_{2015-2019}\) is the average population in the five-year period 2015–2019].
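The formula in Note 19 translates directly into a few lines of Python. The sketch below is illustrative only: the function name, argument names, and the figures for the example province are hypothetical and are not taken from the data used in this study.

```python
def excess_mortality_rate(deaths_period, pop_period_years,
                          deaths_baseline_years, pop_baseline_years):
    """Excess deaths per 100,000 inhabitants relative to a multi-year baseline."""
    mean_pop_period = sum(pop_period_years) / len(pop_period_years)
    mean_deaths_baseline = sum(deaths_baseline_years) / len(deaths_baseline_years)
    mean_pop_baseline = sum(pop_baseline_years) / len(pop_baseline_years)
    return 100_000 * (deaths_period / mean_pop_period
                      - mean_deaths_baseline / mean_pop_baseline)


# Hypothetical province: 12,000 deaths in the 12-month window against a
# five-year baseline averaging 10,000 deaths, with a stable population
rate = excess_mortality_rate(
    deaths_period=12_000,
    pop_period_years=[1_000_000, 1_000_000],
    deaths_baseline_years=[9_800, 9_900, 10_000, 10_100, 10_200],
    pop_baseline_years=[1_000_000] * 5,
)
print(rate)  # excess deaths per 100,000 inhabitants
```

Dividing each death count by the average population of its own period keeps the comparison meaningful when a province’s population changed between the baseline years and the pandemic window.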

Table 15 Results from SAR models on COVID-19 prevalence registered on 28 February 2021.
Table 16 Results from SAR models on COVID-19 excess mortality registered on 28 February 2021.

The results confirmed the statistical significance of the spatially lagged dependent variable (ρ), which was almost always large and positive. Moreover, outdoor air pollutants substantially maintained a high statistical significance, although the level changed in some cases. Regarding COVID-19 prevalence, benzene was no longer significant, and BaP increased in statistical significance from the 5% to the 1% level (Table 15). [Note 20: Notably, the coefficient of Ni became statistically significant, even though the association with COVID-19 prevalence was negative]. Table 16 shows that only the impact of BaP on excess mortality was not confirmed. In fact, its coefficient, although still negative, was no longer significant.

Thus, the results are robust to changes in the specifications and show the persistence of the link between environmental pollution and the transmission and mortality of COVID-19, also suggesting the potentially dangerous effect of PAHs and heavy metals, such as benzene, BaP, As, and Cd.

Limitations

This study has three main limitations: (1) the sample size is not large, ranging from 60 to 107 observations (the Italian provinces); (2) since pollution monitors are sparsely located and concentrated in specific areas, such as traffic and industrial zones of provincial capitals, the study may suffer from exposure measurement error, that is, the discrepancy between outdoor air pollutant concentrations and personal air pollution exposure; (3) although the study considers a wide range of potential covariates, it is not possible to capture and include all the aspects that may affect COVID-19 spread and related mortality.

Conclusions

In this article, I investigated the common sources of outdoor air pollution and the overall air quality in the 107 Italian provinces in the period 2014–2019, and the link between long-term exposure to nine air pollutants in the same period and COVID-19 spread and related mortality. The major strengths of this study are the inclusion of nine air pollutants and 18 potential covariates, and the use of three different statistical methodologies (NB, OLS, and SAR) to assess the robustness of the associations.

The results showed that: (i) common air pollutants (NO2, O3, PM2.5, and PM10) and PAHs (benzene and BaP) exhibited a positive and significant correlation with the presence of large firms, energy and gas consumption, vehicle density, public transport, cattle fodder, and livestock density; (ii) the provinces located in the north of Italy were generally much more polluted than the southern ones; (iii) long-term exposure to NO2, PM2.5, PM10, benzene, BaP, and Cd was positively correlated with the spread of COVID-19 infections across the Italian provinces; and (iv) long-term exposure to NO2, O3, PM2.5, PM10, and As was positively associated with excess mortality due to COVID-19.

The dangerous effect of the common air pollutants NO2, O3, PM2.5 and PM10 was consistent with recent literature11,13,14,17,19,66,67,72. Moreover, this study found that as well as the common air pollutants, PAHs and heavy metals may also have played a key role in explaining the variability of COVID-19 spread and related mortality. This outcome seems interesting and of relevance, given that these air pollutants have not been considered at all by recent scientific literature. Finally, the results suggest the need for national strategies and economic policies that aim at reducing air pollutant concentrations to improve air quality levels (especially in Northern Italy) and to cope more effectively with similar unexpected pandemics in the future.