INTRODUCTION

Particulate air pollution has consistently been associated with a variety of adverse health outcomes, including increased mortality1 and hospital admissions.2 Results from a growing number of epidemiological studies suggest that some fine particulate matter (PM2.5) constituents are more toxic than others.3 Multicity studies of acute PM effects, for example, show that the relation between ambient PM2.5 and hospital admissions varies by city,4, 5 with this variation attributed in part to differences in particle composition.6, 7

Several factor analysis and source apportionment methods have been developed to apportion ambient PM2.5 to its sources,8, 9, 10 and the resulting estimates have subsequently been used in epidemiological studies of the association between source-specific PM2.5 and health. Ozkaynak and Thurston,11 for example, found an increased risk of mortality after exposure to particles associated with iron/steel emissions and coal combustion. Laden et al.12 used specific rotation factor analysis to show that particles from mobile and coal combustion sources are associated with significant increases in daily mortality in six US cities. Using positive matrix factorization (PMF) and chemical mass balance, Sarnat et al.13 found significant associations of both mobile sources and biomass burning with cardiovascular disease (CVD)-related hospital admissions in Atlanta, GA, USA. Lall et al.14 also observed significant associations between mobile sources and cardiovascular hospital admissions in New York, NY, USA, using PMF. Although these studies used different source apportionment methods, an intercomparison of methods suggests that they produce similar findings in health effect studies.15

None of these studies, however, has accounted for the uncertainty in the predicted source contributions. Ignoring the uncertainty associated with factor estimation (that is, with source contribution estimation) can lead to underestimation of the uncertainty (i.e., the standard errors) of the estimated health effects.16, 17 To address this issue, we investigate the impact of propagating the uncertainty in the predicted source contributions on the association between specific PM2.5 sources and emergency CVD hospital admissions, using two source apportionment methods and air pollution and health data from Boston, MA, USA.

METHODS

Data Collection

Hospital admissions data

Daily counts of emergency hospital admissions for 2003–2010 were obtained from billing claims of Medicare enrollees >64 years old residing in the Boston, MA, USA, area counties of Middlesex, Essex, Norfolk, and Suffolk. Only admissions through the emergency department were included, as scheduled admissions are unlikely to be related to short-term air pollution exposures. Each billing claim contains information on the date of hospitalization, age, county of residence, and primary and secondary diagnoses. Using codes from the International Classification of Diseases, 9th Revision (ICD-9; Centers for Disease Control and Prevention 2008), we considered hospital admissions for all CVD conditions (codes 390–429).

Air pollution and weather data

PM2.5 and its components were measured at the EPA PM Center stationary ambient monitoring supersite, located in downtown Boston on the roof of the Countway Library at Harvard Medical School. Integrated 24-h PM2.5 samples were collected using a sequential sampler (Partisol Model 2300, Rupprecht and Patashnick, Albany, NY, USA) at a flow rate of 16.7 L/min. PM2.5 samples were analyzed for mass by gravimetric analysis and for elemental concentrations by X-ray fluorescence (XRF) at the Harvard School of Public Health. Black carbon (BC) concentrations were measured continuously at 5-min intervals using an Aethalometer (model AE-14, Magee Scientific, Berkeley, CA, USA). Temperature and dew point were obtained from the hourly surface observations of the National Weather Service First Order Station at Logan Airport (East Boston).

Data Analysis

For our study, we restricted analyses to concentrations measured between 2003 and 2010. During this period, we observed strong source-specific PM2.5 effects, which allowed us to assess how propagating uncertainty affects non-null estimates. Data from a longer period (1998–2010) showed weaker associations, with a significant association only for mobile sources. Strong effects were crucial for the purposes of this study, because assessing uncertainty propagation for null effect estimates would not be informative.

All statistical analyses were conducted using R statistical software, version 2.14.1 (R Foundation for Statistical Computing, Vienna, Austria).

Source apportionment

Sources of PM2.5 in downtown Boston were identified using two different methods: US Environmental Protection Agency PMF 3.0 (ref. 9) and absolute principal component analysis (APCA).10 These methods were selected because both have been used extensively for fine particulate source apportionment.14, 18, 19

Briefly, neither method requires the source profiles to be known in advance. APCA is an extension of principal component analysis (PCA): after PCA is conducted, the principal components are rotated and the component scores are rescaled relative to a reference sample with zero pollutant concentrations. Total PM2.5 mass concentrations are then regressed on these absolute component scores, so that the scores can be converted into mass contributions and both source contributions and loadings are obtained.10 PMF, on the other hand, minimizes a least-squares objective function in which each residual is weighted by the uncertainty of the corresponding daily elemental measurement. PMF further constrains both source profiles and source contributions to be non-negative.9
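The APCA steps just described can be sketched in Python/NumPy. This is a minimal illustration, not the SAS implementation used in the study: varimax rotation and least-squares component scores are assumptions, and the function names (`varimax`, `apca`) and the synthetic data are hypothetical.

```python
import numpy as np


def varimax(loadings, max_iter=50, tol=1e-6):
    """Varimax rotation of a (species x components) loading matrix."""
    p, k = loadings.shape
    rotation = np.eye(k)
    total = 0.0
    for _ in range(max_iter):
        rotated = loadings @ rotation
        target = rotated ** 3 - rotated @ np.diag(np.sum(rotated ** 2, axis=0)) / p
        u, s, vt = np.linalg.svd(loadings.T @ target)
        rotation = u @ vt
        previous, total = total, s.sum()
        if previous != 0 and total / previous < 1 + tol:
            break
    return loadings @ rotation


def apca(species, pm25, n_factors):
    """Sketch of absolute principal component analysis (APCA).

    species: (days x elements) concentrations; pm25: (days,) total mass.
    Returns daily source contributions (days x n_factors) and the intercept.
    """
    mean = species.mean(axis=0)
    sd = species.std(axis=0, ddof=1)
    z = (species - mean) / sd
    # PCA on the correlation matrix: keep the n_factors largest components
    eigval, eigvec = np.linalg.eigh(np.corrcoef(species, rowvar=False))
    keep = np.argsort(eigval)[::-1][:n_factors]
    loadings = varimax(eigvec[:, keep] * np.sqrt(eigval[keep]))
    # Least-squares coefficients that turn standardized data into scores
    score_coef = loadings @ np.linalg.inv(loadings.T @ loadings)
    scores = z @ score_coef
    # Scores of an artificial day on which every element is zero
    zero_scores = ((0.0 - mean) / sd) @ score_coef
    apcs = scores - zero_scores  # absolute principal component scores
    # Regress total PM2.5 mass on the APCS; slope x APCS = source contribution
    design = np.column_stack([np.ones(len(pm25)), apcs])
    coef, *_ = np.linalg.lstsq(design, pm25, rcond=None)
    return apcs * coef[1:], coef[0]


# Hypothetical example: 365 days, 4 elements, 2 underlying sources
rng = np.random.default_rng(0)
true_g = rng.gamma(2.0, 1.0, size=(365, 2))
profiles = np.array([[5.0, 1.0, 0.2, 3.0], [0.5, 4.0, 2.0, 0.1]])
species = true_g @ profiles + rng.normal(0, 0.1, size=(365, 4))
pm25 = 3.0 * true_g.sum(axis=1) + rng.normal(0, 0.1, size=365)
contributions, intercept = apca(species, pm25, n_factors=2)
```

The regression step is what makes the decomposition "absolute": each slope converts an abstract component score into µg/m³ of PM2.5, so the contributions plus the intercept reproduce the fitted mass.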

For the APCA method, we used the SAS software, version 9.3 (SAS Institute, Cary, NC, USA), whereas for PMF we used the ME-2 multilinear engine executable and script files under the EPA public license, available with installation of EPA PMF 3.0.
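Because ME-2 is distributed as a compiled executable, the PMF objective is easiest to illustrate with a toy solver. The sketch below minimizes the same kind of uncertainty-weighted objective, Q = Σ((x_ij − Σ_k g_ik f_kj)/σ_ij)², under non-negativity constraints, but it uses weighted multiplicative updates, a standard non-negative matrix factorization technique swapped in for illustration; it is not the ME-2 algorithm. The function names, data, and uncertainty model are hypothetical.

```python
import numpy as np


def pmf_q(x, sigma, g, f):
    """PMF objective Q: sum of squared residuals scaled by their uncertainties."""
    return float(np.sum(((x - g @ f) / sigma) ** 2))


def weighted_nmf(x, sigma, n_factors, n_iter=500, seed=0, eps=1e-12):
    """Toy PMF-style solver using weighted multiplicative updates.

    Minimizes pmf_q while keeping contributions g (days x factors) and
    profiles f (factors x species) non-negative. This is NOT the ME-2
    algorithm; it only illustrates the same weighted, non-negative objective.
    """
    rng = np.random.default_rng(seed)
    w = 1.0 / sigma ** 2  # weights: precise measurements count more
    g = rng.uniform(0.1, 1.0, size=(x.shape[0], n_factors))
    f = rng.uniform(0.1, 1.0, size=(n_factors, x.shape[1]))
    for _ in range(n_iter):
        # Multiplicative updates keep non-negative entries non-negative
        g *= ((w * x) @ f.T) / ((w * (g @ f)) @ f.T + eps)
        f *= (g.T @ (w * x)) / (g.T @ (w * (g @ f)) + eps)
    return g, f


# Hypothetical example: 200 days, 6 elements, 3 sources
rng = np.random.default_rng(1)
x = np.clip(rng.gamma(2.0, 1.0, (200, 3)) @ rng.uniform(0, 2, (3, 6))
            + rng.normal(0, 0.05, (200, 6)), 0, None)
sigma = 0.05 + 0.05 * x  # assumed measurement uncertainties
g0, f0 = weighted_nmf(x, sigma, n_factors=3, n_iter=0)  # starting values
g, f = weighted_nmf(x, sigma, n_factors=3)
print(pmf_q(x, sigma, g0, f0), pmf_q(x, sigma, g, f))
```

The weighting by 1/σ² is the feature that distinguishes PMF from unweighted factorizations: days or elements measured with large uncertainty have less influence on the fitted profiles.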

Base health analyses

We refer to all analyses conducted without accounting for uncertainty in the source contributions as “base analyses.”

We used a time-stratified case-crossover design; for each subject, control days were selected bidirectionally as the days falling on the same day of the week within the same year and month as the admission.20, 21 This design eliminates confounding by personal characteristics that do not change over time22 and limits confounding by seasonality and long-term trends.23
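The referent-selection rule above can be sketched with the Python standard library. This is a minimal illustration assuming calendar-month strata; the function name `time_stratified_referents` is hypothetical and not part of the study's code.

```python
from datetime import date, timedelta


def time_stratified_referents(event_day):
    """Control days for a time-stratified case-crossover design.

    Returns every day in the same calendar month and year that falls on the
    same day of the week as the admission, excluding the admission day
    itself. Controls may lie before or after the event (bidirectional).
    """
    first = event_day.replace(day=1)
    # First day of the month with the same weekday as the event
    day = first + timedelta(days=(event_day.weekday() - first.weekday()) % 7)
    referents = []
    while day.month == event_day.month:
        if day != event_day:
            referents.append(day)
        day += timedelta(days=7)
    return referents


print(time_stratified_referents(date(2005, 3, 16)))
# [date(2005, 3, 2), date(2005, 3, 9), date(2005, 3, 23), date(2005, 3, 30)]
```

Because each admission is compared only with days from the same month and weekday, slowly varying confounders (season, long-term trends, day-of-week patterns) are matched by design rather than modeled.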

We fit conditional logistic regression models for total CVD admissions for each of the two source contribution data sets, generated by PMF and APCA. We ran models both with all factors included simultaneously, to account for potential confounding by the other factors, and with each factor separately. We used linear adjustments for same-day temperature, same-day dew point, and 2-day moving averaged temperature starting from the day before to 3 days before admission. We additionally adjusted for PM2.5, given its previously reported associations with the health outcomes and its differential correlation both with the factors included in the model and with other pollutants, not included in the model, that could act as confounders.24

We examined associations for moving averages of 1, 2, 4, 6, and 7 days. Results are presented as % change in CVD admissions per IQR increase of each estimated source contribution.
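The exposure windows and the reported effect scale can be sketched as follows. This is a minimal illustration under stated assumptions: moving averages are taken to be trailing windows that include the admission day (so a 1-day window is the same-day exposure), the percent change per IQR is computed as 100·(exp(β·IQR) − 1) from a conditional logistic coefficient β, and the Wald interval in `percent_change_ci` is an assumed construction for the base CIs; all function names are hypothetical.

```python
import math


def trailing_moving_average(series, window):
    """Average of the event day and the preceding window - 1 days.

    Returns None for days without a complete window.
    """
    return [None if i + 1 < window else sum(series[i + 1 - window:i + 1]) / window
            for i in range(len(series))]


def percent_change_per_iqr(beta, iqr):
    """% change in the admission rate for an IQR increase in exposure."""
    return 100.0 * (math.exp(beta * iqr) - 1.0)


def percent_change_ci(beta, se, iqr, z=1.96):
    """Assumed Wald 95% CI on the coefficient scale, mapped to % change per IQR."""
    return (percent_change_per_iqr(beta - z * se, iqr),
            percent_change_per_iqr(beta + z * se, iqr))


print(trailing_moving_average([1.0, 2.0, 3.0, 4.0], 2))  # [None, 1.5, 2.5, 3.5]
print(round(percent_change_per_iqr(0.01, 2.0), 2))       # 2.02
```

Because the transformation is monotone, a CI computed on the coefficient scale maps directly onto the % change scale, although the resulting interval is no longer symmetric around the point estimate.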

Uncertainty propagation

To assess the uncertainty associated with the estimation of the source contributions, we used a block bootstrap procedure with overlapping blocks to account for serial correlation among observations in the time series.25 We selected the block size by examining the autocorrelation and deseasonalized autocorrelation of the factors identified in the base analysis. No serial correlation remained beyond a lag of 15 days for any of the factors, and we therefore conservatively selected blocks of 20 days.
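One way to draw an overlapping-block (moving block) bootstrap resample of day indices is sketched below. This is a hypothetical illustration: the function name, the random seed, and the assumption that blocks are concatenated and trimmed to the original series length are not taken from the study.

```python
import random


def moving_block_bootstrap_indices(n_days, block_length, rng):
    """Day indices for one moving (overlapping) block bootstrap resample.

    Each block is block_length consecutive days starting at a uniformly chosen
    day, so blocks may overlap; blocks are concatenated and trimmed to n_days.
    """
    indices = []
    while len(indices) < n_days:
        start = rng.randrange(n_days - block_length + 1)
        indices.extend(range(start, start + block_length))
    return indices[:n_days]


# Hypothetical example: 2922 days (2003-2010) resampled in 20-day blocks
rng = random.Random(2003)
idx = moving_block_bootstrap_indices(2922, 20, rng)
# The rows of the daily species matrix would then be taken in this order
```

Resampling whole 20-day blocks preserves the within-block serial correlation of the daily concentrations, which resampling individual days would destroy.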

We created 1500 bootstrap samples. For each bootstrap sample, we apportioned PM2.5 sources using both the PMF and APCA methods, fixing a priori the number of factors to that identified in the base analysis. We then “mapped” the identified factors to the base factors, that is, matched each bootstrap factor to the base factor with which it had the strongest correlation. We designated a factor as unmapped if its correlation with the base factor was <0.60. Subsequently, for both PMF and APCA and each bootstrap sample, we ran the health models, including unmapped factors. For each factor and source apportionment method, the health effect estimate was taken as the median of the distribution of the 1500 effect estimates, and confidence intervals (CIs) were calculated from the 2.5th and 97.5th percentiles of that distribution. For both source apportionment methods, we assessed the % change in CI width relative to the base results on the scale of the regression coefficients, rather than on the scale of % change per IQR increase of each factor.

In addition, we calculated an across-methods average factor-specific effect estimate. For each bootstrap sample, we averaged the health effect estimates of each factor across the two methods. The across-methods average estimate for each factor was then taken as the median of the distribution of the 1500 averages, and its CI was calculated from the 2.5th and 97.5th percentiles of that distribution.
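The across-methods average can be sketched as below, assuming the PMF and APCA estimates for a factor are paired by bootstrap sample and averaged with equal weights. The function name and toy inputs are hypothetical.

```python
import statistics


def across_methods_average(pmf_estimates, apca_estimates):
    """Across-methods average effect from paired bootstrap estimates.

    Element b of each list is the factor's coefficient in bootstrap sample b,
    so the two methods are averaged within the same resample.
    Returns (median, 2.5th percentile, 97.5th percentile) of the averages.
    """
    if len(pmf_estimates) != len(apca_estimates):
        raise ValueError("need one PMF and one APCA estimate per bootstrap sample")
    averages = [(p + a) / 2.0 for p, a in zip(pmf_estimates, apca_estimates)]
    cuts = statistics.quantiles(averages, n=40, method="inclusive")
    return statistics.median(averages), cuts[0], cuts[-1]


# Hypothetical coefficients from 100 bootstrap samples
pmf = [0.001 * i for i in range(100)]
apca = [0.001 * i + 0.002 for i in range(100)]
median, lower, upper = across_methods_average(pmf, apca)
```

Averaging within each resample, rather than averaging the two final estimates, carries the joint source apportionment uncertainty of both methods into the percentile interval without needing an analytical variance formula.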

A flow diagram of the methods employed is presented in the Supplementary Material.

RESULTS

Descriptive statistics for PM2.5 and the species included in our analyses are presented in Supplementary Table S1. In Boston, the median number of daily CVD-related hospitalizations was 58.

Base Source Apportionment

Six factors were identified by both PMF and APCA: regional, mobile, and crustal sources, residual oil combustion, road dust, and sea salt (Supplementary Table S2 and Supplementary Figure S1). Overall, the factors identified by the two methods were strongly correlated, with across-method, factor-by-factor correlation coefficients ranging from 0.79 to 0.98, except for road dust (r=0.33) (Supplementary Figure S2). Within methods, correlations between factors were larger in the PMF solution, with the maximum correlation between road dust and mobile sources (r=0.45), whereas the APCA factors were orthogonal (Supplementary Table S3).

Base Health Analyses

For both source apportionment methods, stronger associations were observed for mobile and regional sources and residual oil combustion, whereas inconsistent associations across methods were observed for road dust and crustal sources (Figures 1, 2, 3, 4, 5, 6 and Supplementary Table S4).

Figure 1

Percent change in total CVD hospital admissions per IQR increase in regional PM2.5 factor for all exposure windows when all factors were simultaneously included in the health model.

Figure 2

Percent change in total CVD hospital admissions per IQR increase in mobile PM2.5 factor for all exposure windows when all factors were simultaneously included in the health model.

Figure 3

Percent change in total CVD hospital admissions per IQR increase in residual oil PM2.5 factor for all exposure windows when all factors were simultaneously included in the health model.

Figure 4

Percent change in total CVD hospital admissions per IQR increase in crustal PM2.5 factor for all exposure windows when all factors were simultaneously included in the health model.

Figure 5

Percent change in total CVD hospital admissions per IQR increase in road dust PM2.5 factor for all exposure windows when all factors were simultaneously included in the health model.

Figure 6

Percent change in total CVD hospital admissions per IQR increase in salt PM2.5 factor for all exposure windows when all factors were simultaneously included in the health model.

When all factors were included in the health model simultaneously, we observed strong positive associations between regional PM2.5 sources and CVD admissions for the shorter exposure durations; for instance, same-day exposure to regional PM2.5 was associated with a 1.44% (95% CI: −0.01, 2.90%) increase in the rate of admissions for PMF and a 2.35% (0.77, 3.95%) increase for APCA.

We also observed positive effects of same-day exposures to mobile sources: per IQR increase of the factor, the rate of CVD admissions increased by 0.89% (95% CI: −0.06, 1.86%) for PMF and by 0.76% (−0.04, 1.58%) for APCA. Similar increases were observed for 4-day averaged exposures. For 6- and 7-day averaged exposures, we observed strong positive effects only for PMF; the APCA effects were weaker.

For residual oil combustion, we observed strong positive associations for all exposure windows, with effects increasing with exposure duration. The strongest associations were observed for weekly averaged exposures (PMF: 1.89% (0.52, 3.29%); APCA: 2.15% (0.71, 3.62%)).

We found strong negative effects for APCA-identified road dust for same-day and 2-day averaged exposures. We saw null effects for longer moving averages, and for all exposure windows when PMF-identified road dust was used. We also found no association between crustal or salt PM2.5 and the rate of admissions, except for same-day exposure to the APCA-identified crustal factor.

We found similar results when we ran models containing PM2.5 contributions from a single source (Supplementary Figures S4–S8).

Inferences based on these effect estimates, however, are likely invalid, as the uncertainty in the source contributions has not been propagated into the health models.

Uncertainty Propagation

All PMF-identified factors were mapped to the PMF base factors in all 1500 bootstrap samples. For APCA, 6–10% of the bootstrap samples (across all exposure windows) yielded unmapped factors; of these, 96% involved the sea salt factor and <4% the regional sources factor. The distributions of the correlations between the base and bootstrap factors are presented in Supplementary Table S5.

The impact of accounting for uncertainty in source contributions on the effect estimates and their CIs, when all factors were simultaneously included in one model, is shown in Figures 1, 2, 3, 4, 5, 6 and Supplementary Table S4. The % change in the CI widths of the regression coefficients is presented in Table 1.

Table 1 Percent change in the CI width of the regression coefficients for each factor, as compared with the base case, when all factors were simultaneously included in the health model.

We did not observe any systematic changes in the effect estimates compared with the base case when we accounted for the uncertainty in the source contributions, suggesting that the original estimates did not incur noticeable bias. We observed strong associations between mobile and regional sources and an increased rate of CVD admissions for shorter exposure durations (same-day exposure and 2- and 4-day moving averages), and between residual oil combustion and the rate of CVD hospitalizations for 4- to 7-day moving averages (Figures 1, 2, 3, 4, 5, 6 and Supplementary Table S4).

After propagating uncertainty, when all factors were included in the health model simultaneously, we observed strong associations between regional PM2.5 sources and CVD admissions for the shorter exposure durations. Same-day exposure to regional PM2.5 was associated with a 2.01% (0.21, 3.69%) increase in the rate of admissions for PMF and a 1.96% (−0.07, 4.19%) increase for APCA; for 4-day averaged exposures, the corresponding increases were 1.58% (−0.36, 3.67%) for PMF and 1.60% (−0.60, 4.09%) for APCA.

Same-day and 2-day averaged exposures to mobile sources were also associated with increased rates of CVD admissions; for 2-day exposures, the increase per IQR increase of the factor was 1.44% (95% CI: 0.02, 3.11%) for PMF and 0.95% (−0.31, 2.20%) for APCA. Similar effects were observed for 4-day averaged exposures. Contrary to the base results, no strong effects were observed for longer moving averages for either method when uncertainty was propagated.

For residual oil combustion, we no longer observed strong positive associations for all exposure windows. In contrast to the base results, in which APCA estimates were also significant for same-day and 2-day exposures, we observed strong associations for both methods only for exposures longer than 4 days, with the strongest effects for weekly exposures: 2.20% (0.25, 4.30%) for PMF and 2.04% (0.08, 4.15%) for APCA.

For crustal, road dust, and salt PM2.5, we observed mostly null effects, with the exception of 2-day exposure to APCA-identified road dust (−0.69% (−1.67, 0.09%)). This contrasts with our base findings, in which same-day and 2-day exposures to APCA-identified road dust and same-day exposure to the APCA-identified crustal factor were strongly related to hospital admissions, whereas the corresponding PMF-identified factors were not.

Average effects across methods

When averaging across PMF and APCA, the associations between mobile and regional sources and an increased rate of CVD admissions remained strong for shorter exposure durations (same-day exposure and 2- and 4-day moving averages), as did the associations between residual oil combustion and the rate of CVD hospitalizations for 4- to 7-day moving averages (Figures 1, 2, 3, 4, 5, 6). For example, same-day exposures to mobile and regional sources were associated with 1.07% (0.08, 2.28%) and 2.00% (0.18, 3.78%) increases in the rate of CVD admissions, respectively, whereas weekly exposures to residual oil combustion were associated with a 2.12% (0.19, 4.22%) increase in the rate of hospitalizations (Supplementary Table S7). The results were consistent when the factors were included separately in the health models (Supplementary Tables S6 and S8 and Supplementary Figures S4–S8).

Confidence intervals

For both source apportionment methods and for all factors and exposure windows, the CIs widened when uncertainty was propagated, although the increase showed no apparent relationship with exposure duration. The increase in CI width was smallest for factors that were identified more stably and consistently across source apportionment methods, such as regional and crustal sources; for both PMF- and APCA-identified factors, the correlation between base and bootstrap factors for these two sources was close to one (0.981–0.997). Compared with the base results, after propagating uncertainty, the CI widths of the regression coefficients for regional sources increased by 16.1–21.5% for PMF and 18.2–33.0% for APCA across exposure windows. The corresponding increases for crustal sources ranged from 10.6 to 31.3% for PMF and from 14.8 to 41.3% for APCA.

Conversely, the CI widths increased much more when the correlations between base and bootstrap factors were weaker. For example, for PMF-identified residual oil, for which the 2.5th percentile of the distribution of correlations was 0.790, the CI widths of the regression coefficients increased by 89.3–200.8%. For APCA-identified salt, for which we observed unmapped factors, the CI widths of the regression coefficients increased by 341.4–803.6% compared with the base results.

DISCUSSION

We investigated the impact of uncertainty in source contributions on the association between PM2.5 sources identified using two source apportionment methods and CVD emergency hospital admissions. For our selected study period, we observed strong associations of shorter exposures to mobile and regional sources, and of longer exposures to residual oil combustion, with an increased rate of CVD emergency admissions. These associations remained strong after propagating the uncertainty in the source contributions. In contrast, the strong associations observed in the base analysis between same-day exposures to road dust and crustal sources and CVD admissions were no longer significant after accounting for this uncertainty.

Because the two source apportionment methods we employed differ in approach, it is not surprising that their source contributions and the resulting health effect estimates also differed, although in most cases not significantly, with similar effect estimates and widely overlapping CIs. This agrees with findings from the workshop on the source apportionment of PM health effects,15, 19, 26 which reported that the choice of source apportionment method contributes 15% to the uncertainty of the health effect estimates.26

Furthermore, we demonstrated that failure to account for uncertainty in the source contributions, regardless of the choice of method, leads to invalid inferences, that is, CIs of the health effect estimates that are too narrow. The increase in CI width depends mainly on how consistently a factor is identified, and also on the statistical power available to detect any effects of that factor. For example, in our study, regional sources were very consistently identified across methods and bootstrap samples, and we therefore saw the smallest increase in CI width. Conversely, although salt was consistently identified with PMF, the limited day-to-day variability of the factor, and the consequently reduced statistical power, led to large increases in CI width. Moreover, accounting for uncertainty yielded fewer spurious significant results and more consistent findings across methods, as was the case with our road dust and crustal findings.

To account for the differences in the effect estimates obtained by the two different source apportionment methods, we also estimated average across-methods source-specific effects using the bootstrap effect distributions. Estimation of effects across methods would otherwise not be possible, as the derivation of the analytical form of the variance of an average effect, which would also incorporate the uncertainty in the estimation of the source contributions by both methods, is not straightforward. Use of bootstrap effect estimate distributions allows us to consider differences in the source apportionment methods while accounting for uncertainty in their estimation, akin to an ensemble of their resulting health effect estimates. Although ensemble-trained source apportionment methods have been previously developed,27, 28 to the best of our knowledge, this is the first effort to combine across-methods source-specific health effect estimates.

An across-methods source-specific effects ensemble reflects the weight of the evidence and addresses potentially conflicting signals at the same time. For example, for mobile sources, both source apportionment methods yielded positive and more similar effect estimates for shorter exposure durations, whereas for longer moving averages the base effects were conflicting, with significantly positive PMF, but not APCA, effects. After accounting for uncertainty, the pattern became clearer, with better agreement between the effects estimated by the two source apportionment methods. The same pattern was also evident in the across-methods average effect, with consistency across the two methods yielding reduced variability compared with the individual effect estimates.

Regarding potential bias when failing to account for uncertainty, Nikolov et al.16 observed biased health effect estimates in a simulation study when they did not account for uncertainty in the source contributions. However, their study design, study population, and statistical analyses were very different from ours, as they investigated cardiovascular effects in a much smaller sample of dogs and the bias occurred in a linear health model. In this study, we did not observe any significant bias in the health effect estimates of our base analyses. Although the bootstrap effect estimates differed somewhat from the base estimates in some cases, these differences fell well within the CIs and thus do not suggest systematic bias.

Our study is limited by the use of measurements from a single monitor; because some PM2.5 components and sources are more spatially variable than others, this may introduce exposure error. A study that also addressed the impact of such exposure error while propagating uncertainty would provide a more comprehensive picture, as our analyses could not address exposure error. Furthermore, the health effects of PM2.5 sources have been shown to be heterogeneous in time and space, and more studies are needed to examine the factors contributing to this heterogeneity. Finally, our goal was to address the impact of the uncertainty specifically associated with the estimation of source contributions; we therefore did not examine other sources of uncertainty, such as measurement error, spatial misalignment, and error associated with XRF analysis.

To our knowledge, this is the first study to quantify the impact of uncertainty in the source contributions on health effect estimates, employing two widely used source apportionment methods. We demonstrated that uncertainty propagation results in wider CIs. Finally, our findings extend qualitatively to other settings in which estimated exposures are treated as “true” exposures in health analyses.