Chronic kidney disease (CKD) is a significant public health problem in the US, affecting 14.8% of the US adult population [1]. Common risk factors for CKD include history of diabetes or hypertension, older age, male sex, black race, and lower socioeconomic status [2]. In addition, CKD is associated with increased risk of cardiovascular disease, frailty, poor quality of life, and premature death, and is costly in the private and public sectors [3]. The risk of adverse events increases with CKD progression [4]. Patients with CKD are also at increased risk of severe illness from SARS-CoV-2 infection [5, 6]. The burden of CKD risk is not equitably distributed, with greater prevalence among low-income countries and socioeconomically disadvantaged and minority communities in the US [7]. However, even among these groups, it has been observed that CKD risk shows substantial spatial variation that cannot be explained by individual risk factors, such as incident hypertension or diabetes [8, 9]. This indicates that potentially unmeasured environmental exposures may be contributing to unexplained CKD incidence.

A growing literature has documented that air pollution exposure, notably fine particulate matter (PM2.5), may be a novel risk factor for CKD [10,11,12]. PM2.5 can induce systemic effects through mechanisms involving inflammation, oxidation, and direct toxicity from particle constituents [13, 14]. In animal models, inhalation of diesel exhaust has been shown to induce renal oxidative stress, inflammation, and DNA damage, while a 6-year study of Boston hospital stroke patients found that individuals living closer to roadways showed reduced renal function [15, 16]. On a global scale, Bowe et al. estimated that 17–20% of the CKD burden can be attributed to PM2.5, with an estimated 6.95 million annual cases caused by PM2.5 exposure [8]. In the US, the estimated attributable burden is closer to 4% [8]. Previous studies in the US have looked at the association between PM2.5 and CKD in specific populations (veterans and established cohorts), but few have examined the general population and the spatial variation in risk of CKD [17, 18].

To gain a better understanding of the PM2.5 and CKD association at a local and actionable level, we used electronic health record (EHR) data of a health care system servicing the 7-county Twin Cities area in Minnesota. This study aims to use fine-scale environmental data and individual patient cohort data to characterize the association between PM2.5 and CKD prevalence and incidence, as well as evaluate the spatial clustering of CKD cases. Our goal is to add to the current body of literature and estimate the effect of PM2.5 on kidney outcomes.



Patients were identified from the EHR database of Fairview Health Services, the primary hospital affiliate of the University of Minnesota, from 1 January 2012 to 1 April 2019. The EHR data incorporates all outpatient visits, labs, and billing data, geocoded addresses (latitude and longitude), and census tract of patients’ residences [19]. We manually abstracted clinical and laboratory data for 100 random patient charts and compared it with the EHR to ensure data quality. This study was approved by the Institutional Review Board at the University of Minnesota.

Study population

The population included adult patients (≥18 years of age) with a geocoded address from the 7-county Twin Cities area in Minnesota. Included patients’ EHR data was restricted to outpatient clinic visits. Patients had at least one measure of outpatient creatinine between 1/1/2012 and 1/1/2014 (baseline period), and at least one primary care physician visit in the Fairview health system at or before the time of index creatinine. We defined index creatinine as the first available creatinine measure during the baseline period. For longitudinal analysis, we included all outpatient creatinines obtained up until 4/1/2019. Patients who opted out of research or moved between 2012 and 2019 were excluded. To assess CKD incidence, we additionally excluded all patients who had CKD during the baseline period and those who had <3 creatinine measurements (baseline creatinine and two repeat creatinine measures), to ensure adequate measures and patient clinic visits.

Air pollution exposure

We acquired publicly available daily PM2.5 air pollution data from the Environmental Protection Agency (EPA) downscaler model at the US census tract level (~4000 people) from 2011 to 2014 [20]. Geocoded home address was determined as the latest available address at or before the time of index creatinine. Subsequently, each patient was assigned a census tract based on their geocoded home location and we estimated the 365-day mean PM2.5 concentration prior to their first estimated glomerular filtration rate (eGFR) during the baseline period (2012–2014). A 7- and 30-day mean prior to the date of their first eGFR visit was also estimated to examine short-term exposure.
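The exposure windows described above can be sketched in a few lines. This is a minimal illustration, not the study's code: the function name `mean_exposure`, the dictionary layout of the daily tract series, and the choice to exclude the index date itself from the window are all assumptions made for the example.

```python
from datetime import date, timedelta


def mean_exposure(daily_pm25, index_date, window_days):
    """Mean daily PM2.5 (ug/m3) over the `window_days` days before index_date.

    daily_pm25 maps a date to the tract-level concentration for that day.
    Assumption: the window ends the day before the index date.
    Returns None when no daily value falls inside the window.
    """
    values = []
    for offset in range(1, window_days + 1):
        day = index_date - timedelta(days=offset)
        if day in daily_pm25:
            values.append(daily_pm25[day])
    if not values:
        return None
    return sum(values) / len(values)


# Hypothetical tract series: ten days with concentrations 0.0, 1.0, ..., 9.0
series = {date(2012, 1, 1) + timedelta(days=i): float(i) for i in range(10)}
week_mean = mean_exposure(series, date(2012, 1, 11), 7)
```

Calling the same function with `window_days` set to 365, 30, or 7 gives the long-term and short-term exposure metrics for a patient's tract.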

Health outcomes

All creatinine measures were traceable to an isotope dilution mass spectrometry reference measurement. Of note, creatinine is a marker of kidney function and is derived from the metabolism of creatine in skeletal muscle and from dietary meat intake. Creatinine is freely filtered across the glomerulus but is not reabsorbed or metabolized by the kidney. According to guidelines, kidney function should be evaluated using eGFR, which is calculated using the CKD Epidemiology Collaboration equation that includes sex, age, creatinine, and race [21,22,23,24,25]. Lower eGFR levels reflect worse kidney function (normal eGFR ≥ 60 ml/min/1.73 m2). CKD prevalence was defined as having an eGFR <60 ml/min/1.73 m2 during the baseline period (1/1/2012–1/1/2014). In the secondary analysis, we used a more restrictive definition of CKD prevalence, an eGFR <45 ml/min/1.73 m2 during the baseline period; this cutoff reflects moderate to severe kidney disease associated with worse clinical outcomes [4, 22, 25,26,27,28]. CKD incidence was assessed between 1/2/2014 and 4/1/2019. CKD incidence was defined using four different criteria: (a) first eGFR <60, (b) two consecutive eGFR <60, (c) first eGFR <45, and (d) two consecutive eGFR <45 ml/min/1.73 m2. Consecutive measures were taken at least one day apart.
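The outcome definitions above can be illustrated with a short sketch. It assumes the 2009 CKD-EPI creatinine equation (the version that includes a race coefficient) with creatinine in mg/dl; the function names are hypothetical, and dating an incident event at the second of two consecutive low values is an assumption of the example, not something the text specifies.

```python
from datetime import date


def ckd_epi_2009(scr_mg_dl, age_years, female, black):
    """eGFR (ml/min/1.73 m2) from the 2009 CKD-EPI creatinine equation."""
    kappa = 0.7 if female else 0.9
    alpha = -0.329 if female else -0.411
    ratio = scr_mg_dl / kappa
    egfr = (141.0 * min(ratio, 1.0) ** alpha
            * max(ratio, 1.0) ** -1.209
            * 0.993 ** age_years)
    if female:
        egfr *= 1.018
    if black:
        egfr *= 1.159
    return egfr


def incident_ckd_date(measurements, threshold, need_two):
    """Date on which an incidence criterion is met, or None.

    measurements: iterable of (date, egfr) pairs from the follow-up period.
    need_two=False: first eGFR below threshold (criteria a and c).
    need_two=True: two consecutive eGFR values below threshold taken at
    least one day apart (criteria b and d).
    """
    prev_low = None  # date of the low value that started the current run
    for day, egfr in sorted(measurements):
        if egfr >= threshold:
            prev_low = None  # a normal value breaks the run
            continue
        if not need_two:
            return day
        if prev_low is not None and (day - prev_low).days >= 1:
            return day
        if prev_low is None:
            prev_low = day
    return None


# Hypothetical follow-up series for one patient
visits = [(date(2015, 1, 1), 70.0), (date(2015, 6, 1), 55.0),
          (date(2015, 6, 1), 58.0), (date(2016, 1, 1), 52.0),
          (date(2017, 1, 1), 40.0)]
```

For this hypothetical patient, criterion (a) is met on the first low value, while criterion (b) is only met at the next visit on a different day, which shows how the stricter definitions delay or remove events.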

Individual and neighborhood covariates

We identified sex and race from the EHR. Age, smoking status, body mass index, and insurance status as a measure of individual socioeconomic status (private vs. other [Medicaid or Medicare with no supplemental insurance status]) were defined using the last value before or at the time of the index creatinine. Comorbidities (hypertension, diabetes, cardiovascular disease, stroke, hyperlipidemia, and cancer) were considered present if at least two International Classification of Disease (ICD)−9/10 codes for that condition were present at or before the date of the index creatinine (Supplemental Data). The number of hospitalizations per patient over follow-up was calculated. We assigned neighborhood sociodemographic characteristics of wealth, education, and income for each patient based on census tract of residence using the 2012 American Community Survey (ACS) 5-year Data [2008–2012] [29,30,31]. Measures included: (1) median value of owner-occupied housing units, (2) percentage of residents over age 25 with a Bachelor’s degree or more, and (3) median household income.

Individual covariates that presented >5% missing data, including obesity and insurance, were adjusted with an additional “missingness” variable rather than by imputation, which is limited to the assumption of randomness (missing completely at random) [32, 33].
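As a small illustration of the missing-indicator approach, a covariate with gaps can be split into a filled column and a 0/1 missingness flag that both enter the regression. The function name and the fill value of 0 are assumptions of this sketch.

```python
def encode_with_missing_indicator(values, fill=0.0):
    """Return (filled, missing) lists for a covariate that may contain None.

    filled: original values with None replaced by `fill`.
    missing: 1 where the original value was None, else 0.
    """
    filled = [fill if v is None else float(v) for v in values]
    missing = [1 if v is None else 0 for v in values]
    return filled, missing


# Hypothetical obesity indicator with one unknown value
obese, obese_missing = encode_with_missing_indicator([1, None, 0])
```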

Statistical analyses

Spatial analyses

We explored spatial patterns of CKD intensity and clustering using the 2012 EHR data for two patient populations: (1) individuals residing in the 5-county Twin Cities area (we included only 5 of the 7 counties because only these had sufficient patient density for analysis) and (2) patients residing in a sub-region of the 5-county Twin Cities area encompassing a high density of patients (Supplemental Fig. 1). The targeted sub-region captures 93% of the Twin Cities 5-county area patients. We did not examine the area beyond the 7-county Twin Cities region to avoid spurious spatial results that can arise when patient locations are sparsely distributed near census tract margins.

We evaluated the spatial variation in CKD risk by estimating the spatial intensity of both CKD patients and non-CKD patients (i.e., controls). Spatial intensity represents the number of CKD cases or controls per unit area, which we estimated with a Gaussian weighted kernel. We estimated an optimal kernel bandwidth using a cross-validation approach; however, upon consideration of the geographic coverage of the Fairview System (7-county Twin Cities area) and after sensitivity testing of multiple distances (1.0, 1.25, 1.50, 1.75, and 2.5 km), we selected a diameter of 2 km for both case and control bandwidths to provide sensible smoothing. Under a null association, we expect a constant relative risk across our study region. A spatial relative risk was estimated by taking the log ratio of the spatial intensity of CKD cases and the spatial intensity of controls, with a value of zero representing an equal case-control spatial distribution [34, 35]. As a sensitivity test, we estimated a non-log transformed CKD relative risk and performed estimates for the Twin Cities sub-region, and found our distribution of estimates to be very similar.
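The log relative risk surface can be sketched as follows. This is a simplified illustration under stated assumptions: coordinates are planar and in km, no edge correction is applied, each intensity is divided by its sample size so that zero corresponds to equal case and control distributions, and the bandwidth is treated as the Gaussian standard deviation (how the 2 km diameter maps to that parameter is not specified in the text). Function names are hypothetical.

```python
import math


def gaussian_intensity(x, y, points, bandwidth_km):
    """Gaussian kernel estimate of points per km2 at location (x, y)."""
    norm = 1.0 / (2.0 * math.pi * bandwidth_km ** 2)
    total = 0.0
    for px, py in points:
        d2 = (x - px) ** 2 + (y - py) ** 2
        total += norm * math.exp(-d2 / (2.0 * bandwidth_km ** 2))
    return total


def log_relative_risk(x, y, cases, controls, bandwidth_km):
    """Log ratio of case density to control density at (x, y).

    Dividing by the sample sizes is an assumption of this sketch; it makes
    the value zero wherever cases and controls are distributed alike.
    """
    f = gaussian_intensity(x, y, cases, bandwidth_km) / len(cases)
    g = gaussian_intensity(x, y, controls, bandwidth_km) / len(controls)
    return math.log(f / g)


# Hypothetical locations: one case at the origin, one control 5 km east
risk_at_case = log_relative_risk(0.0, 0.0, [(0.0, 0.0)], [(5.0, 0.0)], 1.0)
```

Evaluating `log_relative_risk` over a regular grid of locations yields a surface comparable in spirit to panel C of Fig. 1.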

To evaluate CKD spatial clustering in the 5-county Twin Cities region and sub-region, we applied a K-function test statistic defined as the K-function of CKD cases minus the K-function of controls [36, 37]. A difference in K-function value of zero indicates the null hypothesis of equal spatial risk, a positive departure from 0 indicates clustering of cases more than controls, and a negative value indicates clustering of controls over cases. For our study, clustering means that a CKD patient (or control) is more likely to reside close to another CKD patient (or control), excluding those within the same household. We evaluated clustering at distances of up to 10 km to capture most neighborhood-level effects. The significance of clustering was assessed by simulating 95% confidence envelopes using a Monte Carlo random re-assignment of the case and control spatial labels while maintaining the same underlying population at risk (N = 500 iterations). A calculated p-value represents the overall departure from the null [37].
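The clustering test can be sketched with a naive K-function estimator. This is an illustration only: it omits edge correction, uses planar coordinates in km, evaluates a single distance `r`, and takes approximate 2.5th and 97.5th percentiles of the permuted differences as the envelope. All function names and the `area` argument are assumptions of the example.

```python
import math
import random


def k_function(points, r, area):
    """Naive Ripley's K at distance r (no edge correction); needs >= 2 points."""
    n = len(points)
    pairs = 0
    for i in range(n):
        for j in range(n):
            if i != j and math.dist(points[i], points[j]) <= r:
                pairs += 1
    return area * pairs / (n * (n - 1))


def k_difference(cases, controls, r, area):
    """K for cases minus K for controls; positive means cases cluster more."""
    return k_function(cases, r, area) - k_function(controls, r, area)


def permutation_envelope(cases, controls, r, area, n_sim=500, seed=0):
    """Approximate 95% envelope of the K difference under random relabeling."""
    rng = random.Random(seed)
    pooled = list(cases) + list(controls)
    n_cases = len(cases)
    sims = []
    for _ in range(n_sim):
        rng.shuffle(pooled)  # re-assign case/control labels at fixed locations
        sims.append(k_difference(pooled[:n_cases], pooled[n_cases:], r, area))
    sims.sort()
    lower = sims[int(0.025 * n_sim)]
    upper = sims[int(0.975 * n_sim) - 1]
    return lower, upper


# Hypothetical data: tightly grouped cases, widely spread controls
cases = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1)]
controls = [(0.0, 5.0), (5.0, 5.0), (9.0, 1.0)]
observed = k_difference(cases, controls, 1.0, 100.0)
lower, upper = permutation_envelope(cases, controls, 1.0, 100.0, n_sim=40)
```

An observed difference above the upper envelope at a given distance would be read as evidence that cases cluster more than controls at that scale.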

Cross-sectional analyses

We used a multilevel model with a random intercept at the census tract level to estimate the PM2.5 association with CKD prevalence from 2012 to 2014. Since CKD prevalence (eGFR < 60 ml/min/1.73 m2) was a common outcome, we estimated the prevalence ratio (PR) and 95% confidence interval using Poisson regression with robust error variance [38,39,40]. We adjusted for demographic (age, race, sex), clinical (obesity, smoking history, history of hypertension, diabetes, cardiovascular disease, hyperlipidemia, and cancer), insurance status, and census tract characteristics of education and income. Potentially unaccounted-for spatial variation was checked by estimating a semivariogram of residuals for each regression model.
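One way to write a model of this form is shown below. The notation is introduced here for illustration and is not taken from the original: Y_ij is the CKD indicator for patient i in census tract j, x_ij collects the adjustment covariates, and u_j is the tract-level random intercept, assumed normally distributed.

```latex
\log \mathrm{E}\left[ Y_{ij} \mid u_j \right]
  = \beta_0 + \beta_1 \, \mathrm{PM}_{2.5,j}
  + \boldsymbol{\gamma}^{\top} \mathbf{x}_{ij} + u_j ,
\qquad u_j \sim \mathcal{N}\left(0, \sigma_u^2\right),
\qquad \mathrm{PR} = \exp(\beta_1)
```

Under this formulation, exp(β1) is the prevalence ratio per 1 µg/m3 higher annual PM2.5, and the robust variance corrects the standard errors for applying a Poisson likelihood to a binary outcome.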

Longitudinal analyses

We used a mixed-effects Cox proportional hazards model with a random intercept at the census tract level to estimate the PM2.5 association with CKD incidence. We adjusted for the same demographic, clinical, and census tract characteristics as in the cross-sectional analysis, and additionally for the number of times the patient was hospitalized over follow-up and the number of eGFR measures available.
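A model of this kind can be written as follows. As before, the notation is an assumption made for illustration: h_ij(t) is the hazard of incident CKD for patient i in tract j, h_0(t) is the unspecified baseline hazard, and b_j is the tract-level random effect.

```latex
h_{ij}(t) = h_0(t) \,
  \exp\left( \beta_1 \, \mathrm{PM}_{2.5,j}
  + \boldsymbol{\gamma}^{\top} \mathbf{x}_{ij} + b_j \right),
\qquad b_j \sim \mathcal{N}\left(0, \sigma_b^2\right),
\qquad \mathrm{HR} = \exp(\beta_1)
```

Here exp(β1) is the hazard ratio per 1 µg/m3 higher baseline annual PM2.5, conditional on the covariates and the tract effect.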

For longitudinal and cross-sectional analyses, we did not adjust for tract wealth due to the high correlation between tract wealth and education. We also examined effect modification of the PM2.5-CKD relationship by hypertension, diabetes, sex, age (< vs. ≥65 years), and race (black vs. white). PM2.5 exposure was modeled both as a continuous variable (per 1 µg/m3) and as a categorical variable (by quartile). All statistical analyses were conducted using R Statistical Software (version 3.3.1) and Stata [41, 42].
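The categorical exposure coding can be illustrated with the baseline quartile cut points reported in the results (<9.5, 9.5–10.1, 10.1–10.7, and ≥10.7 µg/m3). The function name is hypothetical, and assigning a value that falls exactly on a cut point to the higher quartile is an assumption of this sketch.

```python
from bisect import bisect_right

# Baseline cut points between quartiles, in ug/m3
PM25_CUTS = [9.5, 10.1, 10.7]


def pm25_quartile(annual_mean):
    """Quartile (1 to 4) for a 365-day mean PM2.5 concentration."""
    return bisect_right(PM25_CUTS, annual_mean) + 1


# Hypothetical annual means for four patients
quartiles = [pm25_quartile(v) for v in (9.0, 9.5, 10.3, 11.2)]
```

In a regression, quartile 1 would serve as the reference category, with indicator variables for quartiles 2 to 4.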


Baseline cohort characteristics

There were 345,042 patients identified from the EHR at baseline. We excluded 94,463 patients due to no primary care physician visit or who moved during the study period. We further excluded 60,399 patients who lived outside the metro area and 2998 patients <18 years at the time of index creatinine. We included 113,725 patients for our baseline analyses.

Patients were on average 50 years of age, 8% were black, 45% were male, and 11% had CKD (eGFR < 60 ml/min/1.73 m2) (Table 1). Compared to non-CKD patients, patients with CKD were older, less likely to be black or male, more obese, and more likely to have comorbidities, but had a similar distribution across quartiles of SES census tract measures. This was true irrespective of whether the eGFR cutoff to define CKD was 60 or 45 ml/min/1.73 m2 (Table 1, Supplemental Table 1). Tract income was moderately correlated with tract wealth and education (r = 0.33 and 0.30). However, tract wealth was highly correlated with tract education (r = 0.68).

Table 1 Baseline characteristics of Fairview cohort and of those with and without CKD (eGFR < 60 ml/min/1.73 m2) (2012–2014).

Spatial analyses

Within the Twin Cities sub-region, we visually interpret spatial heterogeneities in the intensity of both CKD cases and control patients (Fig. 1A, B). As anticipated, areas with high intensity correspond to neighborhoods with dense populations and a greater number of Fairview patients. In a corresponding step, we estimated spatial variation in CKD risk by taking the log ratio of case and control intensities (Fig. 1C). A heterogeneous CKD risk is observed that is different from case or control intensities alone. This indicates that elevated risk of CKD is specific to certain geographic areas and is not a function of patient density. Hotspots of CKD are shown near certain neighborhoods. It should be noted that high relative risk is observed in a few isolated spots near the boundaries of our study region; however, we consider these results to be artifacts of small case and control numbers and do not consider them valid risks. Evidence for spatial clustering of CKD cases compared to non-CKD patients was observed (Fig. 2). Our estimated difference in the K-function line was positive, indicating cases clustering more than controls, and outside the simulated 95% confidence envelopes, providing evidence for statistically significant CKD clustering for all distances up to 10 km (p < 0.01). Both Figs. 1 and 2 provide evidence that the risk of CKD varies significantly across the Twin Cities region and that the risk is not equally distributed, with apparent geographic disparities.

Fig. 1: Spatial intensity of CKD by County in the Twin Cities.

Spatial intensity of (A) CKD cases, (B) non-CKD patients (i.e., controls), and (C) log ratio of CKD case and control intensities (i.e., the spatial relative risk for CKD), using 2012 Fairview Health System records from the Twin Cities subregion, Minnesota. In panel (C), light colors denote regions with greater risk of CKD, while dark colors denote less risk of CKD. Gray lines denote county boundaries found in the inset of Fig. 1. Hotspots of CKD are shown near certain neighborhoods, such as Maplewood, North St. Paul, downtown St. Paul, Arden Hills, Bloomington, and portions west of Bredesen Park.

Fig. 2: K-function by CKD status.

The difference in K-function for cases and controls of chronic kidney disease from the Fairview System in the 5-county Twin Cities Area.

CKD prevalence

At baseline, PM2.5 quartiles were <9.5, 9.5–10.1, 10.1–10.7, and ≥10.7 µg/m3 respectively. There was no cross-sectional association between PM2.5, for each 1 µg/m3 and by quartiles, with CKD prevalence (eGFR < 60 ml/min/1.73 m2) (Table 2). However, when defining CKD as having an eGFR of <45 ml/min/1.73 m2, higher quartiles of PM2.5 were associated with greater odds of CKD compared to the first quartile after adjustment for demographic, clinical, and SES census tract characteristics (Supplemental Table 2). For example, the fourth quartile of PM2.5 was associated with 18% greater odds [1.05, 1.33] of CKD compared to the first quartile after adjustment. Similarly, a 1 µg/m3 higher annual average PM2.5 was associated with 7% greater odds [1.03, 1.11] of CKD even after adjustment. We found no evidence of effect modification of the PM2.5-CKD association with hypertension, diabetes, age, or sex.

Table 2 The prevalence ratio (PR) of chronic kidney disease (CKD) associated with annual PM2.5 using the Fairview Health System cohort (2012–2014).

CKD incidence

A total of 20,289 patients had no CKD at baseline and had at least three outpatient creatinine measures following index eGFR. Of those, 3042 patients had incident CKD (eGFR < 60 ml/min/1.73 m2) over a median of 8.5 years (Supplemental Table 3). The overall CKD incidence was 2 per 100 person-years (Supplemental Table 4). We observed increased CKD incidence rates for both continuous and categorical measures of baseline PM2.5. Each 1 µg/m3 increase in baseline annual average PM2.5 was associated with increased risk of CKD after adjustment for demographic, clinical, and SES characteristics (HR: 1.78 [1.65, 1.89]). Comparing the second, third, and fourth quartiles of baseline PM2.5 exposure to the first quartile, we observed HRs for CKD incidence of 1.72, 2.15, and 2.49, respectively (Table 3). Similar trends were observed when CKD was defined based on having two consecutive eGFR <60 or <45 or having an eGFR <45 ml/min/1.73 m2 (Supplemental Tables 5–7). The magnitude of the association between PM2.5 and CKD incidence was highest when defining CKD incidence as having two consecutive eGFR <45 ml/min/1.73 m2 and comparing the fourth quartile of PM2.5 to the first quartile, with an HR of 3.83 [2.04, 7.19]. There was no effect modification by hypertension, diabetes, race, age, or sex for any of the models.

Table 3 Hazard ratios and 95% CI of incident CKD (first eGFR <60 ml/min/1.73 m2) associated with annual PM2.5 using the Fairview Health System cohort (2012–2019).

Sensitivity analysis

Spatial variograms of residuals from the CKD prevalence model revealed no evidence for unexplained spatial dependence in our regression analyses, lending confidence to the variance of the risk estimates. This also indicates that spatial patterns of CKD are likely explained by air pollution concentrations and individual or community characteristics. When restricting our spatial evaluation to the Twin Cities sub-region, we observed similar clustering trends of CKD cases compared to the full study area, although the degree of clustering was greater (Supplemental Fig. 2). We did not observe any association between short-term PM2.5 exposure, using 7- and 30-day mean exposures, and prevalent CKD.


We found that among patients seen at a healthcare system servicing the 7-county Twin Cities area in Minnesota, higher levels of annual PM2.5 were associated with greater CKD prevalence (eGFR <45 ml/min/1.73 m2). Similarly, patients living in census tracts with higher levels of PM2.5 at baseline had a greater incidence of CKD. We also observed clustering of CKD cases, indicating geographic heterogeneity of disease in the 7-county Twin Cities area, but these spatial relationships disappeared after controlling for environmental and sociodemographic factors. These findings demonstrate the presence of neighborhood disparities in CKD and hint that improvements in air quality, specifically at CKD hotspots, may reduce CKD.

Previous epidemiological studies have shown an association between PM2.5 and incident CKD. In a large cohort study, Bowe et al. studied the association between annual average PM2.5 and risk of kidney disease among 2,482,737 veterans over a median of 8.5 years. The authors found that a 10 µg/m3 increase in PM2.5 was associated with an increased risk of eGFR <60 ml/min/1.73 m2 (HR: 1.2 [1.1, 1.3]), incident eGFR decline ≥30% (HR: 1.3 [1.2, 1.4]), and incident end-stage renal disease (ESRD) (HR: 1.3 [1.2, 1.4]) [11]. In a recent study of 10,997 participants from the Atherosclerosis Risk in Communities (ARIC) study, each 1 µg/m3 increase in annual average PM2.5 was associated with: (1) greater risk of incident CKD (follow-up eGFR <60 ml/min/1.73 m2 or ≥25% decrease in eGFR relative to baseline); (2) higher CKD-related hospitalization or ESRD (HR: 1.1 [1.0, 1.1]); and (3) increased albuminuria (percentage difference: 7% [3%, 11%]) [18]. Our results add further support that PM2.5 is associated with incident CKD, as has been shown in veterans and the ARIC study. Additionally, we observed that the strength of the association between PM2.5 and CKD was greatest when CKD was defined as having two consecutive eGFR <45 ml/min/1.73 m2. This may reflect that air pollution impacts are greatest among patients with advanced CKD and that using two consecutive eGFR measures confirms a severe diagnosis.

Conversely, the association between PM2.5 and CKD prevalence and eGFR in cross-sectional studies is unclear. Blum et al. [18] found no significant association between baseline PM2.5 and eGFR, which differed from our results indicating an association between annual PM2.5 and prevalent CKD, eGFR <45 ml/min/1.73 m2 (18% greater odds). Two previous cross-sectional studies looking at the Medicare population and veterans also showed an association between PM2.5 and CKD prevalence. Among 669 older veteran men living in the Boston area, 1-year PM2.5 was associated with lower eGFR [12]. A 2.1 interquartile range higher 1-year PM2.5 was associated with a 1.9 ml/min/1.73 m2 [−3, −0.8] lower eGFR. Similarly, among 1.1 million persons from the 2010 Medicare sample, every 4 µg/m3 increase in PM2.5 was associated with a 3% [1.02, 1.05] greater prevalence of CKD (ICD-9 codes). These differences could be due to the underlying population assessed (age, race, clinical comorbidities).

Findings from our study are consistent with those reported worldwide in showing a greater risk of CKD as PM2.5 concentrations increase. In China, a national renal biopsy study found that each 10 µg/m3 increase in annual PM2.5 concentration was associated with 14% greater odds [1.1, 1.2] of a patient developing membranous nephropathy in high PM2.5 regions (>70 µg/m3 annual average) [43]. This association was attenuated in areas where PM2.5 levels were <60 µg/m3 (OR: 1.0 [0.99, 1.0]). In Taiwan, a study among the general population showed that a 10 µg/m3 increase in PM2.5 was associated with a 6% higher risk [1.0, 1.1] of developing CKD (eGFR < 60 ml/min/1.73 m2). It is important to note that PM2.5 concentrations vary greatly between countries. For instance, the annual average PM2.5 concentration in China and India exceeds 50 µg/m3, compared to the US, UK, and Japan, where the levels are between 10 and 14 µg/m3 [44]. Therefore, it would be of value not only to further understand the association of PM2.5 and CKD but also to identify clusters and areas where this association is more apparent. In the 7-county Twin Cities area, a closer look at the military and industrial facilities is needed to identify whether PM2.5 is contributing to this clustering, especially since CKD risk is greatest in these historically industrial areas.

The mechanism by which PM2.5 affects the kidneys is unclear. Several hypotheses have been proposed [45]. PM2.5 can trigger autonomic system imbalance and proinflammatory responses and can activate endothelial dysfunction and arterial vasoconstriction. PM2.5 may also increase blood pressure and decrease nitrous oxide excretion and can induce further insulin resistance. All these possible pathways could in turn cause vascular damage, glomerulosclerosis, tubulointerstitial damage, and intraglomerular hypertension which could lead to CKD and ESRD. We observed in models including both patient hypertension and diabetes (model 4), a slight attenuation of the magnitude of the association in all our analyses, indicating potential mediation. Future work should focus on discerning the exact mechanism through which air pollution affects CKD and the role of co-morbidities.

Our study has several limitations. First, we studied a single healthcare system in the 7-county Twin Cities area, which may not be representative of the general population and other states. However, the Fairview health system covers 161 ZIP codes and nine counties, and represents a population of almost 2.7 million people, therefore providing a good representation of most of Minnesota. Second, we used an indicator variable to account for missingness for obesity (9% missing) and insurance (5% missing) [32, 33]. Third, we have a relatively small cohort of patients assessed for CKD incidence, which contributes to the wide CI. However, our results were consistent across all definitions of CKD incidence, indicating a toxicological phenomenon. Fourth, we were unable to measure the association of PM2.5 with albuminuria since most patients in our cohort did not have urine albumin assessed. Fifth, we only looked at one source of modeled PM2.5 air pollution, but the selection of the EPA Downscaler model was critical to support the fine spatial and daily temporal resolution needed for our geographic study area. Furthermore, the downscaler model has been shown to provide better prediction performance compared to both modeled (e.g., CMAQ) and interpolated (e.g., kriging) surfaces [46]. Sixth, by including only patients with ≥3 creatinine measures to study the association of PM2.5 with incident CKD, we are subject to selection bias. Seventh, we used ICD codes to determine comorbidities. In order to minimize misclassification bias by ICD codes, we required at least two ICD codes to confirm diagnoses. We also adjudicated 100 patient charts and found that ICD code diagnoses were consistent with clinic notes (data not shown). Finally, our study included individual and neighborhood characteristics at baseline and did not have the capacity to longitudinally evaluate changes over time.

Strengths of our study include the use of an EHR data set from routine clinical practice that, unlike structured cohort studies, reflects the population this healthcare system serves. We also assessed spatial clustering in the Twin Cities in addition to accounting for clustering by tracts in the regression analyses.

In conclusion, we found spatial clustering of CKD cases in the Twin Cities region and identified that higher levels of annual PM2.5 were associated with increased CKD prevalence and incidence. Future research should focus on identifying areas of CKD case clusters and determine whether efforts to improve air quality in these hotspots will reduce the burden of disease. Exploring the association between CKD hotspots with current and historical industrial sites may further reveal relationships between local hazards and kidney-related morbidity. While this has global importance, efforts should focus on settings worldwide where elevated air pollution levels put populations at increased risk [47].