Introduction

Long-term exposure to air pollution has been recognized as an important modifiable risk factor for cardiovascular diseases1,2. An increasing number of epidemiological studies support positive associations between long-term air pollution and the occurrence of cardiovascular events, although specific cardiovascular outcomes have been less investigated relative to overall cardiovascular mortality and morbidity. Stroke, which is characterized by high incidence and mortality, is the second leading cause of death worldwide3. Researchers have reported that long-term exposure to air pollution, particularly fine particulate matter with an aerodynamic diameter <2.5 μm (PM2.5), could be associated with an increased risk of hospitalization, incidence, and mortality due to stroke4. Heart failure (HF) and atrial fibrillation (AF) are two other major cardiovascular diseases, and both are important risk factors for stroke onset5. Several studies demonstrated the adverse effect of long-term air pollution on the risk of HF6,7,8 and AF9,10, although these two endpoints have been understudied as primary outcomes of interest. Overall, the evidence for the hypothesized association, especially with HF and AF, remains scarce and inconsistent. In addition, with the predominant focus on PM2.5, the potential cardiovascular effects of long-term exposure to other major air pollutants such as nitrogen dioxide (NO2) and ozone (O3) have been under-examined, and the correlations between different pollutants have also been overlooked. Taken together, the potential causal relationships between multiple air pollutants and specific cardiovascular events need to be further elucidated.

Most of the existing studies linking long-term exposure to air pollution to cardiovascular events examined the entire range of exposure. The average pollution levels may differ substantially by region and therefore partially account for the geographical differences in the estimated associations. However, differences in average pollution levels can explain geographical differences only if the association is nonlinear. Our understanding of the health impacts of air pollution at concentrations below regulatory standards remains limited, which has important implications for air pollution regulations in regions such as the United States (US), where populations generally experience low air pollutant exposures. Previous studies found the shape of the exposure–response (E–R) curves for long-term PM2.5 and all-cause and cardiovascular mortality to be curvilinear with no evidence of a threshold11,12. According to several studies of large cohorts in the US9,13 and Europe14,15, the risk of cardiovascular diseases per mass unit could persist and even become stronger at lower exposure levels below the annual limit values recommended by the US Environmental Protection Agency (EPA; 9 μg/m3 for PM2.5 and 53 ppb [99.6 μg/m3] for NO2) and European Union (10 μg/m3 for PM2.5 and 20 μg/m3 for NO2). Further research specifically at lower concentrations can help quantify the disease burden attributable to low-level air pollution and elucidate the true “safe” level. This could further inform recommendations for even stricter air quality guidelines, as suggested by the World Health Organization (5 μg/m3 for PM2.5 and 10 μg/m3 for NO2).

Furthermore, all observational studies are subject to the possibility of confounding by omitted variables, and causal modeling methods that can capture some omitted confounders are therefore valuable. Propensity scores are widely adopted in air pollution research by balancing measured covariates across different levels of continuous exposure. However, this method is weakened by its stringent requirement for precisely specified regression of exposure on measured covariates and its inability to control for unmeasured covariates. Negative controls have been suggested as a useful tool to enhance causal inference independently of covariate distributions and to tackle unmeasured confounding bias16. The negative exposure control is a variable known not to be causally related to the outcome of interest, while the negative outcome control is a variable known not to be caused by the exposure of interest. Both of them may share a common confounding mechanism with the exposure and outcome17. Therefore, they can serve as instruments for reducing bias by unmeasured confounders. In prior air pollution and health studies, researchers have used future air pollution as a negative control exposure18,19,20,21, or a negative outcome due to causes other than primary exposure as a negative control outcome22,23. More recently, double negative control adjustment has been employed to strengthen causal inference in studies examining short- and long-term effects of air pollution24,25,26.

To address these research gaps, the present study used a double negative control approach to analyze the relationships of long-term exposure to PM2.5, NO2, and warm-season O3 at low concentrations with the risk of hospitalization for three major cardiovascular diseases (stroke, HF, and AF) in the Medicare population aged ≥65 years across the contiguous US from 2000 to 2016. We focused on the areas where populations were consistently exposed to low pollutant concentrations (PM2.5 < 9 μg/m3, NO2 < 75.2 μg/m3 [40 ppb], warm-season O3 < 88.2 μg/m3 [45 ppb]). Furthermore, we conducted stratified analyses to identify potentially susceptible demographic subpopulations.
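The ppb equivalents quoted for the NO2 and O3 thresholds can be checked with a short calculation. The sketch below assumes conversion factors of 1.88 μg/m3 per ppb for NO2 and 1.96 μg/m3 per ppb for O3 (roughly 25 °C and 1 atm). These factors are not stated in the text; they are inferred here because they reproduce the quoted values, and the function and dictionary names are illustrative only.

```python
# Assumed conversion factors (ug/m3 per ppb) at about 25 degrees C and 1 atm.
# They are inferred from the quoted thresholds, not taken from the paper.
UG_M3_PER_PPB = {"NO2": 1.88, "O3": 1.96}


def ppb_to_ug_m3(pollutant: str, ppb: float) -> float:
    """Convert a mixing ratio in ppb to a mass concentration in ug/m3."""
    return ppb * UG_M3_PER_PPB[pollutant]


if __name__ == "__main__":
    # Study thresholds (40 ppb NO2, 45 ppb O3) and the EPA annual NO2 limit (53 ppb).
    for pollutant, ppb in [("NO2", 40), ("O3", 45), ("NO2", 53)]:
        print(f"{pollutant}: {ppb} ppb = {ppb_to_ug_m3(pollutant, ppb):.1f} ug/m3")
```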

Results

Table 1 shows the summary statistics of ZIP code-level air pollution and covariates in the low-pollution areas from 2000 through 2016. In low PM2.5 areas, the annual average concentrations of PM2.5, NO2, and warm-season O3 were 5.2 ± 1.6 μg/m3, 23.2 ± 14.7 μg/m3, and 88.8 ± 14.5 μg/m3, respectively. In the areas with either NO2 or O3 deemed low in our analyses, the mean annual PM2.5 concentration was higher and closer to the typical range. The Pearson correlation coefficients (r) for three air pollutants are presented in Supplementary Table 1. We observed a moderate-to-low positive correlation between annual PM2.5 and NO2 in low NO2 areas (r = 0.38) and in low PM2.5 areas (r = 0.13). In contrast, there was a strong correlation between annual PM2.5 and NO2 in areas with low warm-season O3 (r = 0.66). Warm-season O3 exhibited a moderate-to-low correlation with both annual PM2.5 and NO2 in areas with low levels of PM2.5 and NO2, while in areas with lower warm-season O3, the correlations were negligible.

Table 1 Summary of ZIP code-level air pollution, meteorological covariates, and socioeconomic status (SES) covariates in the low pollution areas from 2000 through 2016

Supplementary Table 2 presents the total number of hospitalizations and the annual rate for stroke, HF, and AF in the low pollution areas during the study period. The annual hospitalization rates for stroke, HF, and AF among the Medicare participants were 0.87%, 0.84%, and 0.41%, respectively, in low PM2.5 areas where low NO2 and warm-season O3 exposures concurrently occurred. The corresponding hospitalization rates were similar in low O3 areas. However, the hospitalization rates were higher in low NO2 areas where people experienced more normal PM2.5 exposures. Nevertheless, the pattern of the hospitalization rates for each cardiovascular outcome within demographic groups was generally similar across all the defined low-pollution areas. Overall, we observed higher annual hospitalization rates for stroke and HF among those aged 85 years and older and those eligible for Medicaid. However, there were some inconsistencies in the pattern by sex and race across specific outcomes. While the annual hospitalization rates for stroke and HF were higher in males and black individuals, this was not seen for AF.

We compared the estimated associations of long-term exposures to PM2.5, NO2, and warm-season O3 at low concentrations with the rates of hospitalizations for stroke, HF, and AF as determined from three-pollutant double negative control models and GLM (Fig. 1). The results from single-pollutant models are illustrated in Supplementary Fig. 1. Overall, the adjustments for co-pollutants resulted in stronger estimates for PM2.5, while those for NO2 and warm-season O3 remained similar. When examining the associations between PM2.5 and all three outcomes, we found that the GLM yielded estimates that were modestly comparable but lower than those derived from the double negative control models. While both modeling approaches produced relatively similar estimates for the associations of NO2 and warm-season O3 with AF, there were slight differences in the estimates for stroke and HF. All the numeric results can be found in Supplementary Table 3.

Fig. 1: Percent change in hospitalization rate for cardiovascular diseases associated with 1-μg/m3 increase in long-term exposure to air pollution at low concentrations using double negative control models and generalized linear models adjusted for co-pollutants.

Error bars indicate the 95% confidence intervals. Source data are provided as a Source Data file.

In this study, we focused on the results adjusted for co-pollutants using double negative control adjustment. For long-term PM2.5 exposure below 9 μg/m3, each 1-μg/m3 increase in PM2.5 was associated with the percent increases of 1.82% (95% confidence interval [CI]: 1.44%, 2.19%), 2.83% (95% CI: 2.36%, 3.30%), and 0.13% (95% CI: −0.39%, 0.65%) in the hospitalization rates for stroke, HF, and AF, respectively. For each 1-µg/m3 increase in annual NO2 below 75.2 µg/m3, the percent increases in the hospitalization rates for stroke, HF, and AF were 0.01% (95% CI: −0.002%, 0.03%), 0.18% (95% CI: 0.16%, 0.19%), and 0.09% (95% CI: 0.07%, 0.10%), respectively. For long-term exposure to warm-season O3 below 88.2 μg/m3, we found adverse associations with the three outcomes with percent increases in the hospitalization rates of 0.32% (95% CI: 0.27%, 0.38%), 0.05% (95% CI: −0.01%, 0.12%), and 0.12% (95% CI: 0.04%, 0.20%) per 1-µg/m3 increase in warm-season O3. The estimates remained very similar after excluding additional confounding adjustments from the prediction model for the negative outcome control (Supplementary Table 4).
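To compare these per-1-μg/m3 estimates with results reported per 5 or 10 μg/m3 elsewhere, the percent changes can be rescaled. The sketch below assumes the hospitalization rate is log-linear in the exposure, so that a percent change equals (exp(β) − 1) × 100 for a log rate ratio β. That functional form is an assumption here, since the model specification is not given in this section, and the function name is hypothetical.

```python
import math


def rescale_percent_change(pct_per_unit: float, increment: float) -> float:
    """Rescale a percent change per 1 unit of exposure to `increment` units,
    assuming the rate is log-linear in the exposure."""
    beta = math.log1p(pct_per_unit / 100.0)      # log rate ratio per 1 unit
    return math.expm1(beta * increment) * 100.0  # percent change per `increment` units


# Reported three-pollutant estimate for PM2.5 and stroke: point estimate, lower and upper 95% CI.
stroke_pm25 = (1.82, 1.44, 2.19)

# Same estimate expressed per 5 ug/m3 (the transform is monotone, so CI bounds map directly).
per_5 = tuple(rescale_percent_change(p, 5) for p in stroke_pm25)

if __name__ == "__main__":
    print("PM2.5 and stroke, per 5 ug/m3: %.2f%% (%.2f%%, %.2f%%)" % per_5)
```

Because the rescaling compounds on the multiplicative scale, the per-5-μg/m3 value is slightly larger than five times the per-1-μg/m3 value.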

We conducted stratified analyses by individual demographic characteristics to identify the vulnerable subgroups. The results of the stratified analyses for stroke, HF, and AF from three-pollutant models are shown in Figs. 2–4, respectively. The observed positive associations in the overall analyses generally persisted in demographic subgroups. Similar patterns of the potential effect modification by demographics were found in both single- and three-pollutant models, despite some differences in the magnitude and statistical significance of the subgroup-specific effect estimates (Supplementary Figs. 2–4 and Tables 5–7).

Fig. 2: Percent change in hospitalization rate for stroke associated with 1-μg/m3 increase in long-term exposure to air pollution at low concentrations in stratified analyses in three-pollutant double negative control models.

Statistical significance is calculated via a two-tailed t-test (*P < 0.05, **P < 0.01, ***P < 0.001). Error bars indicate the 95% confidence intervals. Source data are provided as a Source Data file.

Fig. 3: Percent change in hospitalization rate for heart failure associated with 1-μg/m3 increase in long-term exposure to air pollution at low concentrations in stratified analyses in three-pollutant models using double negative control adjustment.

Statistical significance is calculated via a two-tailed t-test (*P < 0.05, **P < 0.01, ***P < 0.001). Error bars indicate the 95% confidence intervals. Source data are provided as a Source Data file.

Fig. 4: Percent change in hospitalization rate for atrial fibrillation and flutter associated with 1-μg/m3 increase in long-term exposure to air pollution at low concentrations in stratified analyses in three-pollutant models using double negative control adjustment.

Statistical significance is calculated via a two-tailed t-test (*P < 0.05, **P < 0.01, ***P < 0.001). Error bars indicate the 95% confidence intervals. Source data are provided as a Source Data file.

We found a significantly larger effect of long-term PM2.5 below 9 μg/m3 on all three outcomes for black people compared to white people. In the association of PM2.5 with stroke and AF, we identified Medicaid eligibility as a significant modifier, with a higher risk seen in individuals who were eligible for Medicaid than those who were not. In addition, age modified the PM2.5 association for HF with a stronger effect in the younger group (aged 65–74 years), but this modification pattern was not observed for stroke or AF. In contrast, we found no evidence of any effect modification by sex on the association of all outcomes in relation to PM2.5.

For long-term exposure to NO2 below 75.2 µg/m3, individuals aged over 84 years and those who were not Medicaid-eligible were at greater risk of stroke. We observed similar effect modification patterns by age and Medicaid eligibility in the associations of HF and AF with NO2. Regarding the modification by sex, males were at greater NO2-associated risk of HF compared to females. At the same time, white people exhibited a significantly higher NO2-associated risk of HF and AF compared to black people.

In terms of long-term exposure to warm-season O3 below 88.2 μg/m3, individuals aged 65–74 years had greater risks of all three outcomes compared to older age groups. Black individuals were found to be more susceptible to stroke and HF, while females were more susceptible to HF and AF. Additionally, individuals eligible for Medicaid were at greater risk of HF compared to those who were not.

The E–R curves for the main associations from three-pollutant GLM models with natural spline function are provided in Supplementary Fig. 5. For PM2.5, a positive association with stroke and HF was apparent down to the lowest concentrations. The curve for AF was more complex, with a positive association above 5 μg/m3 but a negative association at lower concentrations. The E–R curve for NO2 was almost linear and positive for HF, with a steeper slope below 20 μg/m3; for stroke and AF, however, the curves appeared negative before increasing linearly from around 25 μg/m3. For warm-season O3, the E–R curve showed a linear positive association with stroke, while the curves for HF and AF depicted a non-linear relationship, with effects increasing when approaching the highest concentrations.

Discussion

Among US Medicare participants, we found that long-term exposure to low-level PM2.5 (<9 μg/m3), NO2 (<75.2 µg/m3), and warm-season O3 (<88.2 μg/m3) could be positively associated with increased rate of hospitalizations for stroke in three-pollutant models that accounted for correlations between co-existing air pollutants and controlled for unmeasured confounders using negative controls. PM2.5 and NO2 were most strongly associated with HF, whereas the strongest effect of warm-season O3 was seen on stroke. Black people and Medicaid-eligible people appeared to be more vulnerable to the cardiovascular risk attributable to PM2.5 and warm-season O3. The youngest-old and females were also found to be more vulnerable to the warm-season O3-related risk. However, the NO2-related risk showed contradictory effect modification patterns.

We designed a pair of negative control exposure and outcome variables in an attempt to capture uncontrolled confounding. The selection of negative exposure control (or negative outcome control) relies on the absence of a causal relationship with the true outcome (or exposure) due to chronological order. However, correlations between them are likely present due to their relations with unmeasured confounders. If the assumption of linearity between unmeasured covariates with exposure and negative exposure control holds, regressing the negative outcome control on the exposure and negative exposure control is expected to reveal confounded associations. Therefore, adjusting for the expected counts of hospitalization in the preceding year aims to capture confounding bias and strengthen the causal interpretation of our observed associations. Given that concurrent-year air pollution exposure and subsequent-year exposure can be highly correlated, an alternative assumption of the two variables sharing the same magnitude of a correlation with the omitted confounders is rendered more reasonable. The GLM method yielded comparable results with the double negative control approach, exhibiting only slight differences in the effect size estimates. Such discrepancies may be attributable to unadjusted confounding bias. The consistent findings derived from these two statistical methods demonstrate the robustness of our results to different model adjustments, suggesting that any omitted confounding bias is small, and negative in the case of PM2.5 but positive for NO2. A previous study reported that greater control for SES resulted in increased effect sizes for PM2.527.
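The adjustment logic described above can be illustrated with a toy simulation. The sketch below is a linear, Gaussian analogue and not the count model used in this study: all variable names, coefficients, and the data-generating process are assumptions chosen for illustration. An unmeasured confounder u drives the exposure, the negative control exposure (next-year exposure), the negative control outcome (previous-year outcome), and the outcome. The exposure and its negative control are given the same dependence on u, mirroring the equal-correlation assumption.

```python
import random


def ols2(x1, x2, y):
    """Least-squares coefficients of y on x1 and x2 (no intercept; inputs are zero-mean by construction)."""
    s11 = sum(a * a for a in x1)
    s22 = sum(b * b for b in x2)
    s12 = sum(a * b for a, b in zip(x1, x2))
    s1y = sum(a * c for a, c in zip(x1, y))
    s2y = sum(b * c for b, c in zip(x2, y))
    det = s11 * s22 - s12 * s12
    return (s22 * s1y - s12 * s2y) / det, (s11 * s2y - s12 * s1y) / det


def simulate(n=20000, beta=0.5, seed=1):
    """Return (naive, adjusted) estimates of the exposure effect `beta` in a toy linear setting."""
    rng = random.Random(seed)
    u = [rng.gauss(0, 1) for _ in range(n)]   # unmeasured confounder
    a = [ui + rng.gauss(0, 1) for ui in u]    # exposure (current year)
    z = [ui + rng.gauss(0, 1) for ui in u]    # negative control exposure (next year)
    w = [ui + rng.gauss(0, 1) for ui in u]    # negative control outcome (previous-year outcome)
    y = [beta * ai + ui + rng.gauss(0, 1) for ai, ui in zip(a, u)]  # outcome

    # Naive estimate: regress the outcome on the exposure alone (confounded by u).
    naive = sum(ai * yi for ai, yi in zip(a, y)) / sum(ai * ai for ai in a)

    # Step 1: predict the negative control outcome from exposure and negative control exposure.
    g1, g2 = ols2(a, z, w)
    w_hat = [g1 * ai + g2 * zi for ai, zi in zip(a, z)]

    # Step 2: regress the outcome on the exposure, adjusting for the predicted negative control outcome.
    adjusted, _ = ols2(a, w_hat, y)
    return naive, adjusted


if __name__ == "__main__":
    naive, adjusted = simulate()
    print(f"naive: {naive:.3f}  adjusted: {adjusted:.3f}  true: 0.500")
```

Under these assumptions the unadjusted slope is biased upward (about 1.0 in expectation with the default settings), while the slope adjusted for the predicted negative control outcome is close to the true value of 0.5. The correction relies on the equal dependence of the two exposures on u; if that symmetry is broken in the simulation, some bias remains.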

Our study has a special emphasis on long-term exposure to low-level air pollution below the annual US EPA limits. While a growing number of prior studies have revealed increased health risks at lower levels of air pollution exposure below regulatory standards, most have focused on all-cause and cardiovascular mortality11,12,28,29. However, the available evidence concerning cardiovascular disease risk at these lower pollution levels remains limited. Consistent with our findings, several previous studies of the Medicare population have found a steeper E–R curve for a range of cardiovascular outcomes when restricted to lower exposures9,13,26,30. Specifically, we found no threshold below which cardiovascular effects were absent in our E–R curves for PM2.5 in relation to stroke and HF. In addition, the positive curves for NO2 and HF displayed a steeper increase in risk below 20 μg/m3. Our E–R curves for warm-season O3 also showed an increasing tendency at higher concentrations, which was the most pronounced for stroke second to AF. These findings indicate that substantial health benefits can likely be obtained by lowering ambient air pollution levels even at low concentrations. Similarly, in a large population-based Canadian cohort, Bai et al.7 found the concentration-response curves for congestive HF with long-term exposure to PM2.5 and NO2 to be supralinear with no discernible threshold values. They also observed a sublinear relationship for O3 with an indicative threshold. A meta-analysis of 102 coefficients from 53 cohort studies reporting associations with all-cause or cause-specific mortality found a steeper E–R curve at lower PM2.5 concentrations for cardiovascular mortality, which also supports these findings12. Overall, our findings indicate the need to reassess the current air quality guidelines and tighten pollution control policies and measures.

This study also supplemented the limited epidemiologic evidence regarding the long-term effects of multiple air pollutants on cause-specific cardiovascular morbidity. Our findings of adverse associations are in accordance with some of the existing literature, despite some slight difference in the statistical significance. Prior studies of the Medicare population using diverse methodologies and different ranges of exposure have reported significant positive associations of our studied outcomes with annual PM2.5, NO2, and warm-season O39,13,31. A review and meta-analysis identified five studies of long-term exposure to PM2.5 and stroke incidence from North America and Europe and found a 6.4% (95% CI: 2.1%, 10.9%) increase in the hazard for each 5-μg/m3 increase in PM2.532. A more recent review article reported that each 10-μg/m3 increase in long-term PM2.5 exposure could be associated with an increased risk of 13% (95% CI: 11%, 15%) for incident stroke, synthesizing the results of fourteen studies across the globe33. Additionally, other studies conducted in Canada7,34, the UK6,8, and Sweden35 indicated an increased risk of HF associated with PM2.5 at relatively low exposures. According to state-of-the-art evidence, the odds ratios of HF associated with each 10-μg/m3 increase in long-term PM2.5 and NO2 exposure were estimated to be 1.019 (95% CI: 1.008–1.030) and 1.012 (95% CI: 1.007–1.017), respectively36. Yue et al.10 conducted a systematic review and meta-analysis to quantify the association between air pollutants and AF based on eighteen studies. They indicated that exposure to all air pollutants including PM2.5, NO2, and O3 had a deleterious impact on AF onset in the general population. By contrast, several other studies reported null relationships between air pollution and the risk of these outcomes37,38,39,40.

It is worth noting that direct comparisons across these studies might be challenging because of potentially heterogeneous air pollution ranges and diverse demographic characteristics of study populations. In addition, in the presence of correlations between air pollutants, considering the influence of co-pollutants in the air pollution mixture is crucial for the validity of the estimated association. Recent studies have increasingly emphasized the importance of utilizing multi-pollutant models to better disentangle the individual effect of a certain pollutant6,13,31,35,37, which is the most widely used way to adjust for confounding bias by co-pollutants41. In contrast, results obtained from single-pollutant models in other studies are more likely confounded by the impact of other pollutants that share similar sources with the pollutant under investigation7,34,39,40. Therefore, our use of multi-pollutant models likely yields more accurate estimates in describing the individual cardiovascular effect of each air pollutant.

Multiple pathophysiological mechanisms have been proposed to explain the detrimental cardiovascular effects of air pollution. It is widely accepted that air pollution can trigger systemic inflammation, oxidative stress reactions, and dysfunction of the autonomic nervous system1. The autonomic imbalance can further result in increases in cardiac frequency and arterial pressure, and a reduction in heart rate variability42. Numerous experimental studies have demonstrated that these responses may further instigate endothelial dysfunction, atherosclerosis, and vascular dysfunction42,43. Another plausible mechanism underlying the onset of cardiovascular diseases is that inhaled irritants can traverse the pulmonary epithelium and directly enter the blood circulation and cardiac organs, which may alter blood coagulability and contribute to thrombus formation44. A higher PM2.5- and NO2- associated risk appeared to be seen for HF hospitalization possibly because it was the common consequence of most cardiovascular diseases, especially for elderly people.

Environmental justice is an increasing concern and we found evidence that independent of differences in exposure, some disadvantaged groups had worse responses to any given level of air pollution. Specifically, we identified Medicaid eligibility as a positive modifier of the association of low-level PM2.5 and warm-season O3 with at least one studied cardiovascular outcome. This suggests a greater vulnerability for lower-SES individuals even when residing in low-pollution regions, as Medicaid coverage is provided for low-income elderly beneficiaries to expand their healthcare access45. Low SES has been determined as a significant risk factor for cardiovascular diseases because socio-economically disadvantaged individuals tend to have poorer health, higher psychosocial stress, and a propensity for unhealthy behaviors and lifestyles46. In addition to Medicaid eligibility, we found that the effect sizes for effects of PM2.5 and warm-season O3 on all outcomes were more pronounced for Black individuals compared to white individuals. The tendency of a higher susceptibility among Blacks is consistent with much of the existing evidence13,47. Black populations have been disproportionately affected by the detrimental health impacts of historic discrimination and ongoing racial segregation, and this study demonstrates additional susceptibility to air pollution. Additionally, while we observed increased susceptibility to warm-season O3 in individuals aged 65–74 years, the specific underlying reasons for this pattern remain unclear. It is likely that a lower baseline risk in this age group may influence these findings. We also found greater susceptibility among females compared to males, which is similar to the sex difference reported in some previous studies and could be explained by physiological differences48,49. This pattern and the specific reasons are worth attention in future research.

In terms of the adverse effects of NO2, our results indicated that people aged ≥85 years, males, white people, and those who were not Medicaid-eligible may be more vulnerable to at least one cardiovascular disease we studied. First, an increased risk in the oldest group is understandable, given that advanced age significantly drives the deterioration of cardiovascular functionality in older people50. Relative to age differences, sex as a potential modifier of cardiovascular risk in relation to air pollution as well as the relevant biological mechanisms has been more underappreciated. While some researchers found a more prominent NO2-attributed cardiovascular risk among males51,52, which is comparable to our finding for HF, there is no consensus on this question53,54. Our findings of a higher susceptibility among the very elderly and males are not conclusive, but we think that paying more attention to these questions can be meaningful to improve the distribution of preventive medical care in the future. Interestingly, when we looked at the modification by race and Medicaid eligibility, the greater susceptibility for NO2 seen in white individuals and non-Medicaid eligible individuals contrasts with our findings for PM2.5 and warm-season O3. Such inconsistent results in the modifying roles of demographics and SES exist in the literature examining the association between air pollution and cardiovascular health, which may have to do with specific air pollutants and outcomes9,55,56. For example, a study utilizing US nationwide survey data found an adverse association between PM2.5 and hypertension among non-Hispanic white adults but a nearly null association among non-Hispanic black and Hispanic adults, although the latter two groups are generally thought to experience higher exposure and are more vulnerable56. 
Moreover, there is also a controversy over the presence and direction of modification by community-level SES in the current empirical-based literature, which could be explained by discrepancies in underlying vulnerable factors in diverse neighborhood samples55,57,58,59. In fact, the specious modification patterns we found for NO2 are unlikely but still possible. As a pollutant predominantly coming from urban origins and often transported on a local scale, NO2 can vary by urbanicity level60. It is reasonable to assume that NO2 might be more of a proxy for commercial activities, since its emissions from other major sources (e.g., diesel traffic, fuel combustion, power plants) have been reduced in recent years61,62. Therefore, the observed higher vulnerability in white and non-Medicaid eligible individuals might be partially accounted for by their higher access to urbanization or commercial activities. In addition, we should also note that our estimate is a measure relative to the baseline risk and does not necessarily represent the magnitude of its absolute attributable risk. For example, the lower baseline risk of hospitalization rates in non-Medicaid eligible beneficiaries might have exaggerated the magnitude of relative risk, although the difference in rate is unlikely to be the major explanation.

Our study has multiple strengths. Foremost is the use of a double negative control approach. This methodology provides an alternative tool to instrumental variables to control for omitted confounding and thus enhance the credibility of the estimated associations. We also thoroughly considered a variety of cardiovascular risk factors to reinforce the confounding adjustment. Another notable strength is that we leveraged the data from the Medicare population. The data that we used was from a very large nationwide cohort, which ensured sufficient statistical power and increased the generalizability of our results to the population that suffers over three quarters of the deaths in the US. Furthermore, the exposure data were derived from high-quality models with a fine resolution and satisfactory predictive accuracy, further assuring the reliability of our analyses. Moreover, compared to restricting the analyses to low exposures in ZIP code-year combinations in prior Medicare studies13,63, the selection criteria applied in this study are somewhat more rigorous by imposing low-exposure constraints over the 17-year study duration. Hence, the possibility of mistakenly including the individuals impacted by past higher exposures was reduced, despite that the exposure history due to migration and travel patterns was not fully accounted for. Last, we attempted to address the correlations among air pollutants and more accurately estimate the independent effect of each exposure by constructing both single- and three-pollutant models.

Some limitations of this study should also be cautioned. First, we may not generalize the conclusions to younger populations or highly polluted regions. Second, there could be residual or unmeasured confounding by omitted cardiovascular risk factors such as diet when the assumption of the same magnitude of linear correlations of them with true exposure and subsequent-year exposure is violated. However, we considered a series of major confounders, ranging from possible meteorological conditions, and area-level health behavioral factors, to socioeconomic measures, which should have captured most of the confounding bias. It is noteworthy that we controlled for co-exposures to other air pollutants using the three-pollutant models as well. Admittedly, the moderate correlation between annual PM2.5 and NO2 concentrations may indicate potential collinearity and the risk of over-controlling issues. Third, the ZIP code-level air pollution data derived from exposure models may not fully represent true personal exposures. Specifically, our exposure metrics did not account for the exposures occurring distant from the participants’ residences. However, the National Human Activity Pattern Survey reported that US adults spent 69% of their time at home and 8% of the time immediately outside their home64. Older people may spend even more time at home, implying that the exposure misclassification would be relatively minor. Another concern is that the variations in personal exposures caused by different indoor activity patterns and building features might not be captured by the neighborhood metrics. Nevertheless, the resulting error is likely a Berksonian exposure error and may cause little bias65. In addition, ambient concentration serves as an instrumental variable for personal exposure and thus personal behavioral factors which were not available would not confound the association between ambient exposure and outcomes66. 
That is, neighborhood-level pollution can be correlated with neighborhood-level covariates, but if, for example, neighborhoods with a high intake of saturated fats had higher exposure to a pollutant, a vegetarian living in such a neighborhood would receive the same ambient exposure despite not eating any saturated fats. Therefore, the confounding is with neighborhood characteristics, not individual ones. Some residual prediction errors of exposure models may be present, but they should be minimal because we studied low air pollutant concentrations. Last, we accessed hospital discharge diagnoses from the administrative Medicare database as the morbidity measure, which may not capture some cases with milder symptoms. This outcome classification might be differential because it can be related to SES factors such as healthcare accessibility.

In conclusion, using a double negative control approach, we found positive associations of long-term exposure to PM2.5, NO2, and warm-season O3 at low concentrations with the hospitalization rates of stroke, HF, and AF in US Medicare older adults. Our findings suggest that the current National Ambient Air Quality Standards (NAAQS) for annual PM2.5 and NO2 may not be adequate to minimize the cardiovascular disease burden. Future guidelines for warm-season O3 could be warranted.

Methods

Study population and outcome assessment

We used data from a national cohort of fee-for-service (FFS) Medicare beneficiaries aged 65 years and older across the contiguous US from January 1st, 2000 to December 31st, 2016. The beneficiaries were followed up from January 1st of the year after their Medicare enrollment until the development of the outcome of interest, death, censoring, or the end of the follow-up time. In this study, we restricted the analyses to individuals whose annual air pollution exposure stayed below pollutant-specific thresholds for the entire period (2000–2016): PM2.5 < 9 μg/m3, NO2 < 75.2 μg/m3 [40 ppb], and warm-season O3 < 88.2 μg/m3 [45 ppb]. Therefore, separate datasets were created for each pollutant according to its specified threshold. We further restricted the datasets to ZIP code areas with more than 100 beneficiaries.
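
The restriction described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' code (the analyses were run in R): the record layout, the function name `low_exposure_zips`, and the choice to apply the beneficiary criterion to every ZIP code-year are hypothetical.

```python
from collections import defaultdict

# Thresholds quoted in the text, in ug/m3.
THRESHOLDS = {"pm25": 9.0, "no2": 75.2, "o3_warm": 88.2}
MIN_BENEFICIARIES = 100  # ZIP codes need more than this many beneficiaries


def low_exposure_zips(records, pollutant, years):
    """Return ZIP codes that stay below the pollutant threshold in every year.

    `records` is a hypothetical list of dicts with keys 'zip', 'year',
    'beneficiaries', and one key per pollutant. Applying the beneficiary
    criterion per ZIP code-year is an assumption of this sketch.
    """
    limit = THRESHOLDS[pollutant]
    years_below = defaultdict(set)
    excluded = set()
    for row in records:
        if row[pollutant] < limit and row["beneficiaries"] > MIN_BENEFICIARIES:
            years_below[row["zip"]].add(row["year"])
        else:
            excluded.add(row["zip"])  # one failing year removes the ZIP code
    required = set(years)
    return sorted(z for z, seen in years_below.items()
                  if z not in excluded and seen >= required)


# Made-up example records for two study years.
records = [
    {"zip": "00001", "year": 2000, "beneficiaries": 250, "pm25": 7.1},
    {"zip": "00001", "year": 2001, "beneficiaries": 260, "pm25": 8.4},
    {"zip": "00002", "year": 2000, "beneficiaries": 300, "pm25": 8.0},
    {"zip": "00002", "year": 2001, "beneficiaries": 310, "pm25": 9.6},
    {"zip": "00003", "year": 2000, "beneficiaries": 80, "pm25": 6.0},
    {"zip": "00003", "year": 2001, "beneficiaries": 85, "pm25": 6.2},
]
print(low_exposure_zips(records, "pm25", years=[2000, 2001]))
```

Because the filter is applied per pollutant, the same ZIP code can appear in one pollutant's dataset and be absent from another's.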

Beneficiary records were provided by the Medicare denominator file from the Centers for Medicare and Medicaid Services, which contained information on age, self-reported sex, self-reported race, Medicaid eligibility, date of death, and residential ZIP code for each beneficiary. Information on age, Medicaid eligibility, and residential ZIP code is updated each year. We obtained the hospital discharge claims of Medicare enrollees from the Medicare Provider Analysis and Review (MEDPAR) file. The International Classification of Diseases (ICD) codes were used to identify the primary discharge diagnosis for each of our three cardiovascular outcomes of interest: stroke (ICD-9 codes: 430–438, ICD-10 codes: I60–I69), heart failure (ICD-9 code: 428, ICD-10 code: I50; hereafter referred to as HF), and atrial fibrillation and flutter (ICD-9 code: 427.3, ICD-10 code: I48; hereafter referred to as AF). For each cardiovascular outcome, we computed the ZIP code-level annual counts based on the beneficiaries’ residential addresses. All hospitalizations of each beneficiary occurring after enrollment during the multi-year follow-up period were counted as cases.
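
As an illustration of the outcome definition, the following Python sketch maps a primary discharge code to one of the three outcomes and tallies ZIP code-level annual counts. The function names and the claim layout are hypothetical. The sketch also assumes that codes arrive as strings, with or without a decimal point, and that any code starting with "I" is ICD-10.

```python
from collections import Counter

ICD10_STROKE = {f"I6{d}" for d in range(10)}  # I60-I69


def classify_primary_diagnosis(code):
    """Map a primary discharge ICD code to 'stroke', 'HF', 'AF', or None."""
    c = code.strip().upper().replace(".", "")
    if c.startswith("I"):  # assumed ICD-10 (circulatory chapter)
        if c[:3] in ICD10_STROKE:
            return "stroke"
        if c.startswith("I50"):
            return "HF"
        if c.startswith("I48"):
            return "AF"
        return None
    # Otherwise treat the code as ICD-9.
    if c[:3].isdigit() and 430 <= int(c[:3]) <= 438:
        return "stroke"
    if c.startswith("428"):
        return "HF"
    if c.startswith("4273"):  # 427.3x
        return "AF"
    return None


def annual_counts(claims):
    """Count hospitalizations per (zip, year, outcome).

    `claims` is a hypothetical iterable of (zip, year, primary_code) tuples.
    """
    counts = Counter()
    for zip_code, year, code in claims:
        outcome = classify_primary_diagnosis(code)
        if outcome is not None:
            counts[(zip_code, year, outcome)] += 1
    return counts


claims = [("00001", 2005, "434.91"), ("00001", 2005, "I63.9"),
          ("00001", 2005, "4280"), ("00002", 2016, "I48.0"),
          ("00002", 2016, "410.1")]
counts = annual_counts(claims)
```

Every qualifying admission is counted, which matches the statement that all hospitalizations after enrollment were treated as cases.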

This study was approved by the institutional review board at Harvard T. H. Chan School of Public Health. Our study was exempt from consent requirements as it is considered non-human subject research.

Exposure assessment

We obtained the daily concentrations of ambient PM2.5, NO2, and O3 at 1 km × 1 km spatial resolution across the contiguous US from three ensemble prediction models that combined multiple machine learning algorithms67,68,69. The exposure models incorporated meteorological variables, chemical transport model simulations, land-use features, and satellite remote sensing data. They were well validated using 10-fold cross-validation. We aggregated the daily predictions of PM2.5 and NO2 to annual averages. For long-term O3, we calculated its warm-season levels based on the daily predictions from April 1st through September 30th, since the health impacts of O3 are suggested to be more observable during warm seasons compared to throughout the year13,31,54. We then computed the ZIP code-level exposures by averaging the 1 km × 1 km grid cell predictions whose centroids were within the boundary of ZIP code polygons or assigning the nearest grid cell predictions for the ZIP codes that do not have polygon representations. Annual average exposures were then linked to Medicare beneficiaries based on their residential ZIP codes for each calendar year over the study period.
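
The temporal and spatial aggregation steps can be sketched as follows. This is a simplified illustration, not the authors' pipeline: the function names and the dictionary-based data layout are hypothetical, and the point-in-polygon step that decides which grid-cell centroids fall inside a ZIP code is assumed to have been done already.

```python
from datetime import date, timedelta
from statistics import fmean


def annual_mean(daily):
    """Mean of all daily values; `daily` maps datetime.date -> concentration."""
    return fmean(daily.values())


def warm_season_mean(daily, year):
    """Mean of daily values from April 1st through September 30th of `year`."""
    start, end = date(year, 4, 1), date(year, 9, 30)
    return fmean(v for d, v in daily.items() if start <= d <= end)


def zip_level_exposure(cell_values, cells_inside, nearest_cell):
    """Average the grid cells whose centroids lie inside the ZIP code polygon,
    or fall back to the nearest grid cell when there are none."""
    if cells_inside:
        return fmean(cell_values[c] for c in cells_inside)
    return cell_values[nearest_cell]


# Made-up daily series for 2001: 10 in the warm season, 20 otherwise.
daily = {}
day = date(2001, 1, 1)
while day.year == 2001:
    in_warm_season = date(2001, 4, 1) <= day <= date(2001, 9, 30)
    daily[day] = 10.0 if in_warm_season else 20.0
    day += timedelta(days=1)

cells = {"c1": 6.0, "c2": 8.0, "c3": 11.0}
print(warm_season_mean(daily, 2001), annual_mean(daily))
print(zip_level_exposure(cells, ["c1", "c2"], nearest_cell="c3"))
print(zip_level_exposure(cells, [], nearest_cell="c3"))
```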

For each exposure, we limited our dataset to the ZIP code areas where the populations were always exposed to air pollution below the thresholds we set over the study period of 2000–2016. The threshold was determined for each pollutant individually to more directly assess its specific health effects and allow for greater statistical power given the different distributions of exposures for the different pollutants. We chose 9 μg/m3 as the threshold for annual average PM2.5 concentration, which is the limit set by the US EPA on February 7, 2024 to replace the previous NAAQS of 12 μg/m3. For NO2, we chose an annual limit of 75.2 μg/m3 [40 ppb] for our analysis, well below the NAAQS of 99.6 μg/m3 [53 ppb], as the annual NO2 concentrations in the US rarely exceeded this standard. Although there is no formal annual regulatory standard for long-term O3, we selected 88.2 μg/m3 [45 ppb] as the threshold value to define low-level warm-season O3, which has been chosen as a plausible pollution target in previous studies to evaluate its effectiveness in reducing health risk70,71. We did not examine lower thresholds due to potentially insufficient statistical power from fewer observations.
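
The paired ppb and μg/m3 values above are consistent with the usual conversion at 25 °C and 1 atm (molecular weight divided by a molar volume of 24.45 L/mol). The short check below uses factors rounded to two decimals. These factors are inferred from the values quoted in the text and are an assumption of this sketch, not a statement of how the authors converted units.

```python
# Approximate ppb -> ug/m3 factors at 25 degrees C and 1 atm:
# molecular weight / 24.45, rounded to two decimals (assumed, see lead-in).
PPB_TO_UGM3 = {"no2": 1.88, "o3": 1.96}


def ppb_to_ugm3(ppb, pollutant):
    """Convert a mixing ratio in ppb to a mass concentration in ug/m3."""
    return ppb * PPB_TO_UGM3[pollutant]


for pollutant, ppb in [("no2", 40), ("no2", 53), ("o3", 45)]:
    value = round(ppb_to_ugm3(ppb, pollutant), 1)
    print(f"{pollutant}: {ppb} ppb is about {value} ug/m3")
```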

Covariates

We considered a variety of SES covariates at the ZIP code tabulation area (ZCTA) level as important predictors for cardiovascular disease72, including percent of the population self-reporting as Black, percent of the population self-reporting as Hispanic, percent of the population ≥65 years of age living in poverty, population density, percent of the population ≥65 years of age who had not graduated from high school, median home value, median household income, and percent of owner-occupied housing units. These data were obtained from the US Census Bureau 2000 and 2010 Census Summary File 3 and the American Community Survey from 2011 through 2016. To account for long-term smoking behaviors, we included lung cancer hospitalization rates as a surrogate measure for each ZIP code from the MEDPAR file. We also accessed county-level data on the yearly percentage of residents who ever smoked and mean body mass index (BMI) from the Centers for Disease Control and Prevention (CDC) Behavioral Risk Factor Surveillance System (BRFSS)73. These county-level lifestyle data were assigned to ZIP codes. Additionally, from the Dartmouth Atlas of Health Care74, we obtained several access-to-care covariates in each hospital service area and assigned them to ZIP codes: proportion of Medicare beneficiaries with at least 1 hemoglobin A1c test per year, proportion of diabetic beneficiaries who had a lipid panel test in a year, proportion of beneficiaries who had an eye examination in a year, proportion of beneficiaries with at least 1 ambulatory doctor visit in a year, and proportion of female beneficiaries who had a mammogram during a 2-year period. We also calculated the distance from the centroid of each ZIP code to the nearest hospital, a proxy for healthcare accessibility, using data on hospital locations derived from an ESRI dataset75.
Given that seasonal meteorological conditions have been known to impact cardiovascular health76,77, we assessed the average temperature and relative humidity (RH) during the summer (June-August) and the winter (December-February) for each ZIP code and each year based on the 4 km Gridded Surface Meteorological (gridMET) dataset78,79.

Missing values for all area-level risk factors were filled in using linear interpolation and extrapolation. Observations with any other missing values, which accounted for <1% of the observations, were assumed to be missing at random and were excluded from our analyses.
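
A minimal sketch of linear interpolation and extrapolation for one area-level covariate is given below. The text does not specify how the extrapolation was anchored, so extending the line through the two nearest observed years is an assumption here, and the function name `fill_linear` is hypothetical.

```python
def fill_linear(years, values):
    """Fill None entries in a yearly series (years sorted ascending).

    Gaps between observed years are linearly interpolated. Gaps before the
    first or after the last observed year are extrapolated along the line
    through the two nearest observed points (an assumption of this sketch).
    """
    known = [(y, v) for y, v in zip(years, values) if v is not None]
    if len(known) < 2:
        raise ValueError("need at least two observed values")
    filled = []
    for y, v in zip(years, values):
        if v is not None:
            filled.append(v)
            continue
        before = [p for p in known if p[0] < y]
        after = [p for p in known if p[0] > y]
        if before and after:   # interpolate between the neighbours
            (y0, v0), (y1, v1) = before[-1], after[0]
        elif before:           # extrapolate forward
            (y0, v0), (y1, v1) = known[-2], known[-1]
        else:                  # extrapolate backward
            (y0, v0), (y1, v1) = known[0], known[1]
        filled.append(v0 + (v1 - v0) * (y - y0) / (y1 - y0))
    return filled


years = [2000, 2001, 2002, 2003, 2004, 2005]
observed = [None, 10, None, 14, 16, None]
print(fill_linear(years, observed))
```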

Statistical analysis

In this study, we analyzed the association between long-term exposure to low-level air pollution and the hospitalization rate of major cardiovascular diseases among the US Medicare population. As mentioned above, the analysis was restricted to the low-pollution ZIP code areas with more than 100 Medicare beneficiaries. We used a double negative control strategy16, which has been recommended to address unmeasured confounding in observational settings, to strengthen the causal evidence for a potential relationship79. A detailed description of this double negative control approach can be found elsewhere26. A summary of the principles is given below. First, we consider a quasi-Poisson regression model to obtain the unbiased association between the exposure (A) and the outcome (Y), adjusting for unmeasured confounders (U):

$${{\mathrm{ln}}}[E(Y)]={\beta }_{Y0}+{\beta }_{{YA}}A+{\beta }_{{YU}}U$$
(1)

The negative exposure control (Z) and negative outcome control (W) are designed to capture confounding bias introduced by U. In this study, we chose the exposure to air pollution in the year after cause-specific hospitalizations as Z. Because it occurs after the outcome, it cannot cause the hospitalizations; however, it could be influenced by unmeasured or measured confounders that are correlated with the air pollution level in the year of the hospitalization outcome. Similarly, we defined the count of cause-specific hospitalizations in the year before exposure as W, as it cannot be affected by the later exposure but may be correlated with omitted confounders. Given the hypothesized correlations of U with A and Z, and non-causality between A and W, the formulas (2) and (3) can be derived:
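
To make the timing of the two negative controls concrete, the sketch below builds Z and W for each ZIP code-year from a small panel. The panel layout and the function name `add_negative_controls` are hypothetical, and aligning A and Y in the same calendar year is an assumption based on the description above.

```python
def add_negative_controls(panel):
    """Attach Z (next-year exposure) and W (previous-year count).

    `panel` maps (zip, year) -> {'exposure': float, 'count': int}.
    ZIP code-years without a following or a preceding year are dropped.
    """
    rows = []
    for (zip_code, year), rec in sorted(panel.items()):
        following = panel.get((zip_code, year + 1))
        previous = panel.get((zip_code, year - 1))
        if following is None or previous is None:
            continue  # the first and last years lack W or Z
        rows.append({
            "zip": zip_code,
            "year": year,
            "A": rec["exposure"],        # exposure in the outcome year
            "Y": rec["count"],           # hospitalization count
            "Z": following["exposure"],  # negative exposure control
            "W": previous["count"],      # negative outcome control
        })
    return rows


panel = {
    ("00001", 2000): {"exposure": 7.0, "count": 12},
    ("00001", 2001): {"exposure": 7.5, "count": 15},
    ("00001", 2002): {"exposure": 8.0, "count": 11},
}
rows = add_negative_controls(panel)
print(rows)
```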

$$E\left(U\right)={\beta }_{U0}+{\beta }_{{UA}}A+{\beta }_{{Uz}}Z$$
(2)
$${{{\mathrm{ln}}}}[E(W)]={\beta }_{W0}+{\beta }_{{WU}}U$$
(3)

If we substitute U with its expected value regressed on A and Z from the formula (2), the formula (1) can be rewritten as:

$${{{\mathrm{ln}}}}[E(Y)]=({\beta }_{Y0}+{\beta }_{{YU}}{\beta }_{U0})+({\beta }_{{YA}}+{\beta }_{{YU}}{\beta }_{{UA}})\,A+{\beta }_{{YU}}{\beta }_{{Uz}}Z$$
(4)

where \({\beta }_{{YU}}{\beta }_{{UA}}\) is exactly equal to the bias due to unmeasured confounders. Thus, if the equation \({\beta }_{{Uz}}\) = \({\beta }_{{UA}}\) holds, subtracting the coefficient of Z from the coefficient of A leaves \({\beta }_{{YA}}\), the causal effect of A on Y.

If we substitute U with its expected value again in the formula (3), \(W\) as a surrogate for U can be predicted by A and Z based on:

$${{{\mathrm{ln}}}}[E(W)]=({\beta }_{W0}+{\beta }_{{WU}}{\beta }_{U0})+{\beta }_{{WU}}{\beta }_{{UA}}A+{\beta }_{{WU}}{\beta }_{{Uz}}Z$$
(5)

Alternatively, assuming that U is linearly correlated with A and Z, which makes the formulas (2) and (5) valid, we can mitigate the confounding effect of U by including the predicted W in the outcome regression model.
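
One way to see why adjusting for the predicted W can remove the bias is the following short derivation. It is a sketch under two assumptions that the text does not state explicitly: that \({\beta }_{{WU}}\ne 0\), and that the predicted negative outcome control \(\hat{W}\) from the formula (5) enters the outcome model on the log scale. Since \({\mathrm{ln}}\,\hat{W}={\beta }_{W0}+{\beta }_{{WU}}E(U)\), the expected value of U can be written in terms of \(\hat{W}\) and substituted into the formula (1):

$$E\left(U\right)=\frac{{\mathrm{ln}}\,\hat{W}-{\beta }_{W0}}{{\beta }_{{WU}}}$$

$${\mathrm{ln}}[E(Y)]=\left({\beta }_{Y0}-\frac{{\beta }_{{YU}}{\beta }_{W0}}{{\beta }_{{WU}}}\right)+{\beta }_{{YA}}A+\frac{{\beta }_{{YU}}}{{\beta }_{{WU}}}\,{\mathrm{ln}}\,\hat{W}$$

Under these assumptions, the coefficient of A in a model that also includes \({\mathrm{ln}}\,\hat{W}\) is \({\beta }_{{YA}}\), without the bias term \({\beta }_{{YU}}{\beta }_{{UA}}\) that appears in the formula (4).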

To summarize, unmeasured confounding bias can be captured if either of the following two assumptions holds true:

  1. We assume that U is linearly correlated with both A and Z. Although W, by definition, cannot be caused by the exposure variables, it can be correlated with A and Z through its connection with U. Therefore, the predicted value of W obtained by regressing it on A and Z represents the part of U that is related to A and Z. Because predicted W is a surrogate for U, adjusting for it is equivalent to removing omitted confounding bias.

  2. Alternatively, we assume that U has the same magnitude of correlations with A and Z (\({\beta }_{{Uz}}\) = \({\beta }_{{UA}}\)). According to the formula (4), if \({\beta }_{{Uz}}\) = \({\beta }_{{UA}}\), the causal effect of A on Y can be derived by subtracting the coefficient for Z from the coefficient for A. If either assumption holds, omitted confounder U is controlled for.

Conversely, unmeasured confounding bias may remain if neither of these two assumptions is satisfied.
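
The role of the equal-magnitude assumption can be checked with simple arithmetic on the coefficients in the formula (4). The parameter values below are made up purely for illustration, and the function names are hypothetical.

```python
def observed_coefficients(b_ya, b_yu, b_ua, b_uz):
    """Coefficients of A and Z implied by formula (4)."""
    coef_a = b_ya + b_yu * b_ua  # true effect plus confounding bias
    coef_z = b_yu * b_uz         # confounding picked up by Z
    return coef_a, coef_z


def corrected_effect(coef_a, coef_z):
    """Subtract the coefficient of Z from the coefficient of A."""
    return coef_a - coef_z


TRUE_EFFECT = 0.020  # hypothetical beta_YA on the log scale

# Equal-magnitude assumption holds: beta_Uz == beta_UA.
coef_a, coef_z = observed_coefficients(TRUE_EFFECT, b_yu=0.5, b_ua=0.03, b_uz=0.03)
recovered = corrected_effect(coef_a, coef_z)

# Assumption violated: beta_Uz < beta_UA leaves residual bias
# equal to beta_YU * (beta_UA - beta_Uz).
coef_a2, coef_z2 = observed_coefficients(TRUE_EFFECT, b_yu=0.5, b_ua=0.03, b_uz=0.01)
biased = corrected_effect(coef_a2, coef_z2)

print(f"assumption holds:    {recovered:.3f}")
print(f"assumption violated: {biased:.3f}")
```

In the second case the subtraction removes only part of the confounding, which is why the predicted W from the first assumption serves as a complementary safeguard.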

In both the model used to predict the negative outcome control and the outcome regression model, we adjusted for a variety of area-level risk factors for cardiovascular diseases selected a priori, including SES, behavioral, and meteorological covariates, which are described in the covariates section, to relax our assumptions and to mitigate residual confounding bias as comprehensively as possible. Confounding bias by other unmeasured area-level and individual-level factors was assumed to be addressed given the assumptions described above. We also included the admission year as a categorical indicator in the models to control for the time trends of omitted confounders that might drive an association. We analyzed the effect of each air pollutant separately using both a single-pollutant model and a three-pollutant model. A directed acyclic graph for the double negative control approach considering measured confounders altogether is shown in Supplementary Fig. 6. We judged the performance of the outcome regression models to be satisfactory based on the quasi-Akaike information criterion and pseudo-R2 values (Supplementary Table 8).

As a secondary analysis, we repeated the main analyses using generalized linear models (GLM) without the negative controls. To examine the shape of E–R curves, we further applied natural spline functions with three degrees of freedom to the GLM adjusted for co-pollutants.

We examined the potential effect measure modification by individual demographic characteristics, namely, age (65–74 years, 75–84 years, 85+ years), sex (male or female), race (White or Black), and Medicaid eligibility (yes or no), using stratified analyses. We conducted comparisons of coefficients within the strata of each factor to detect any statistically significant differences, assuming the difference between the coefficients to follow a normal distribution with a mean of zero and a variance of the sum of the strata variances. To assess the robustness of the results, we repeated the primary analysis by removing confounding adjustments from the prediction model for the negative outcome control. In the above analyses, we reported the effect as the percent change in hospitalization rate and its 95% CIs for each cardiovascular outcome per μg/m3 increase in annual exposure to PM2.5 and per ppb increase in annual exposure to NO2 and O3.
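
The two reporting steps in this paragraph can be sketched in Python as follows: converting a log-link coefficient into a percent change with a 95% CI, and comparing two stratum-specific coefficients with a z-test whose variance is the sum of the stratum variances. The function names and the numerical inputs are hypothetical.

```python
import math


def percent_change(beta, se, z_crit=1.96):
    """Percent change in rate per unit exposure, with lower and upper 95% limits."""
    def to_pct(b):
        return (math.exp(b) - 1.0) * 100.0
    return to_pct(beta), to_pct(beta - z_crit * se), to_pct(beta + z_crit * se)


def compare_strata(beta1, se1, beta2, se2):
    """Two-sided z-test for the difference between two stratum coefficients."""
    diff = beta1 - beta2
    se_diff = math.sqrt(se1 ** 2 + se2 ** 2)  # variances add across strata
    z = diff / se_diff
    p = math.erfc(abs(z) / math.sqrt(2.0))    # two-sided normal p-value
    return z, p


estimate, lower, upper = percent_change(beta=0.02, se=0.005)
z_stat, p_value = compare_strata(0.02, 0.005, 0.01, 0.005)
print(f"{estimate:.2f}% (95% CI: {lower:.2f}%, {upper:.2f}%)")
print(f"z = {z_stat:.3f}, p = {p_value:.4f}")
```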

All analyses were performed using R software version 4.2.3 on the Research Computing Environment as part of Research Computer at Harvard University Faculty of Arts and Sciences. A two-sided P-value < 0.05 was considered statistically significant.

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.