Introduction

Air pollutant concentrations in indoor environments, where people spend most of their time,1 may be the most relevant exposure metrics to include in health studies.2 Because of the high cost and labor demands of indoor sampling, however, studies of indoor air quality usually include data from only a few homes that may not be representative of relevant exposures. This technical limitation has led researchers to deploy models to predict indoor exposure levels using more readily available outdoor measurements.3 Typically, predictive models of indoor exposure to fine particulate matter (PM2.5) are derived from mass balance equations. A key model parameter is the infiltration factor, which varies by housing characteristics (e.g., insulation and age of house) and human activities (e.g., open windows and use of a fan, air purifier, or air conditioning).2, 4, 5 However, examination of various methods and results suggests that a mass balance approach that uses reliable estimates of infiltration and includes detailed information about individual homes is not available.6 As the exposure error associated with poorly characterized infiltration was found to bias health effects assessment, the statistical power of epidemiological studies to detect health effects of exposure to indoor air pollutants remains limited.7

A common quantitative approach in estimating infiltration of fine particles is to incorporate a tracer element that (a) is predominantly of outdoor origin, (b) is measured accurately in both indoor and outdoor environments, (c) is present in relatively high levels to ensure small measurement error, (d) has a similar size distribution to PM2.5, and (e) is chemically stable.8 The indoor-to-outdoor ratio of tracer elements that meet these criteria can be used as a proxy of the infiltration rate. Among major constituents of PM2.5, sulfur or sulfate is generated mostly from outdoor sources such as power plants and industrial activities.9 Moreover, studies have shown that indoor sulfur sources are scarce and that indoor sulfur concentrations are highly correlated with outdoor concentrations.10, 11 For these reasons, sulfur has been utilized as a tracer element to assess infiltration rates;3, 6, 9, 12, 13 however, infiltration rates estimated with the sulfur tracer method vary significantly and quantifying Sindoor/Soutdoor remains a challenging task.14 Uncensored or underreported indoor sulfur sources (e.g., kerosene heater use and cigarette smoking) may violate the common assumption of zero indoor sulfur sources and subsequently bias the Sindoor/Soutdoor estimate.15 Furthermore, previous studies often assumed the spatial variation of outdoor sulfur to be negligible and relied on sulfur measurements from central monitoring sites as a surrogate of sulfur concentration outdoors. Although a substantial amount of sulfur is generated from regional sources, local sulfur emissions have been found to contribute significantly to outdoor sulfur concentrations. 
For instance, in the New England area, the frequent use of oil as a home heating fuel is a notable local sulfur source and causes outdoor sulfur levels to vary significantly in this region.16 Thus, ignoring the spatial variability of outdoor sulfur concentrations likely introduces exposure errors and subsequently lowers the predictive power of Sindoor/Soutdoor models.

Numerous investigations have linked increased health risks with exposure to PM2.5,17, 18, 19, 20 and a reliable estimation of indoor PM2.5 exposure levels may inform the development of effective regulation and control strategies to reduce health risks. The aim of this work is to develop a paradigm to estimate indoor air pollution levels in specific homes at times when direct measurements are not available, using sulfur as a measure of infiltration, and accounting for the spatial variability of outdoor sulfur concentrations. We constructed a robust model that accounts for potential sources of exposure errors to estimate Sindoor/Soutdoor. We then applied the modeled Sindoor/Soutdoor to predict indoor PM2.5 and black carbon (BC) concentrations. Finally, we conducted cross-validation on all models to examine their predictive power specifically in situations where indoor measurements are not available.

Materials and methods

Study Design

Between November 2012 and December 2014, we collected 405 indoor samples in 130 homes of subjects with COPD who were participating in a study assessing the health effects of indoor particulate matter. Subjects were veterans living in Eastern Massachusetts who received care through the VA Boston Healthcare System and non-veterans who were recruited via advertisement. Subjects who reported current smoking, indoor sources of smoke (including regular candle burning or use of a fireplace, wood stove, or other indoor combustion source), or known sources of indoor sulfur (such as a humidifier) were excluded from the study. After excluding homes with reported indoor sulfur sources and those with fewer than two samples, our analysis included 328 indoor samples from weeklong sampling sessions collected at 102 residences. Samples were collected at least twice per home in different seasons to measure indoor sulfur, PM2.5, and BC concentrations.

Data Collection

Outdoor air pollution

Daily outdoor PM2.5 samples were collected at the central monitoring site located on the roof of the Countway Library of the Harvard Medical School in downtown Boston throughout the study period. Details on outdoor PM2.5 measurements at the supersite were reported in a previous study.21

Indoor air pollution

A Harvard School of Public Health Micro-environmental Automated Particle Sampler (MAPS) was placed in each subject’s home for indoor sampling. The MAPS includes an inertial impactor that collects PM2.5 on Teflon filters at a low flow rate of 1.8 l/min. Teflon filters were weighed on an electronic microbalance (MT-5 Mettler Toledo, Columbus, OH, USA) before and after field measurements. Indoor BC concentrations were analyzed by measuring filter blackness of the Teflon filter using a smoke stain reflectometer (model EEL M43D, Diffusion Systems, UK). Sulfur concentration in the indoor PM2.5 samples was determined using X-ray fluorescence spectroscopy (model Epsilon 5, PANalytical, The Netherlands).22 We defined the season in which each sample was obtained as winter (December–February), spring (March–May), summer (June–August), or fall (September–November).
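The season definition above can be sketched as a small lookup. This is an illustrative helper only: the function name `season_of` is hypothetical, and which date of a weeklong session (start, midpoint, or end) determines its season is not stated in the text, so the caller is assumed to pass whichever date the analysis uses.

```python
from datetime import date

# Month-to-season mapping as defined in the text.
SEASON_BY_MONTH = {
    12: "winter", 1: "winter", 2: "winter",
    3: "spring", 4: "spring", 5: "spring",
    6: "summer", 7: "summer", 8: "summer",
    9: "fall", 10: "fall", 11: "fall",
}


def season_of(sample_date: date) -> str:
    """Return the season label for a sampling date (hypothetical helper).

    Assumption: the caller decides which date represents a weeklong
    sampling session; the text does not specify this.
    """
    return SEASON_BY_MONTH[sample_date.month]
```

For example, `season_of(date(2013, 12, 3))` returns `"winter"`.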

Questionnaires

Subjects completed a questionnaire at study entry to collect information about building age, type of heating fuel, number of air conditioning units, and type of heating system (i.e., forced air heating). Following each sampling period, participants were asked to report specific activities and home characteristics during the sampling period that could generate indoor air pollution or alter the penetration of outdoor air pollution into the home. The following questions were included: (1) How many hours were the windows open during the sampling session? (2) How many hours did you use an electric space heater during the sampling session? (3) Did you use an air purifier during the sampling session and for how long? Responses to these questions and the baseline questionnaire were used to examine the impact of activities on the sulfur ratio for each residence.

Land use parameters

Land use parameters often serve as surrogates of anthropogenic PM2.5 sources; in this study, land use parameters were used to quantify the spatial variability of PM and BC concentrations outside participating homes. The percentage of urban spaces in a grid of 1 km × 1 km cells covering the study area was obtained from the 2011 collection of the National Land Cover Database. Major road (A1–A3) density was gathered from the StreetMap USA database using the Feature Class Code (A1–A4) classification from the U.S. Census Bureau Topologically Integrated Geographic Encoding and Referencing system. Annual average traffic count for major roads was obtained from the Highway Performance Monitoring System database. The built-in kernel density algorithm23 from ArcMap was used to calculate traffic count weighted for major road density within 1 km2. Population density was calculated within 1 km2 from the census tract database of year 2000.

Statistical Analysis

We considered the dynamics of PM2.5 involving infiltration (or inflow), exfiltration (or outflow), indoor emission, and deposition removal (Figure 1).

Figure 1. Dynamics of inflow, outflow, emission, and removal of PM2.5 in the indoor environment.

On the basis of the relationships illustrated in Figure 1, we can obtain a simplified mass balance equation as follows:

where dCindoor(t) is the change in indoor concentration of particle mass (μg/m3) during the time interval dt, Fin represents particles infiltrated from outdoors, E is the indoor emission, Fout is particle exfiltration to the outdoors, and R is the indoor removal by deposition, with d as the deposition rate. Dockery and Spengler24 first developed the mass balance equation for indoor particles based on the assumption that air exchange rate (α), penetration rate (p), and deposition rate (d) remain constant over time period dt to obtain the following equation:

where dCoutdoor(t) is the change in particle concentration outside the household (μg/m3), V is the volume of the house (m3), p is the penetration factor (dimensionless), and d is the deposition rate (dimensionless). The product of the penetration factor and air exchange rate is equivalent to the infiltration factor, Finf, illustrated in Figure 1. Under the assumption that indoor air is well mixed and outdoor concentration is constant,14 the steady-state indoor concentration can be expressed as follows:

As air exchange rate is usually much faster than the indoor deposition rate (d),25 we can further rearrange Eq. 3 as follows:

From the above derivation, it is apparent that the key parameter allowing us to use outdoor particle concentrations to predict indoor concentrations is the infiltration factor (Finf). The traditional method to estimate infiltration is to measure the building tightness of each house, which is unrealistic due to the large number of complex physical factors involved. For this reason, the indoor-to-outdoor ratio of tracer elements is often used to quantify infiltration. In this study, we used sulfur as the tracer element and constructed a sulfur model to estimate the indoor-to-outdoor sulfur ratio (Sindoor/Soutdoor) for individual households. Subsequently, we incorporated the estimated Sindoor/Soutdoor to predict indoor PM2.5 and BC concentrations based on the mass balance concept (Eq. 4). Finally, we examined the predictive ability of all three models using cross-validation by home and season.
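The displayed equations of this derivation (Eqs. 1 to 4) are not reproduced in this copy of the text. The block below is a sketch of the standard single-compartment mass balance written with the symbols defined above, with the deposition rate written as d. The forms shown are assumptions and not necessarily the published ones. In particular, the emission term is assumed to be normalized by house volume, d is treated as a first-order loss rate in the same units as α, and the coefficient multiplying the outdoor concentration is what plays the role of the infiltration factor.

```latex
% Eq. 1 (assumed form): rate of change = gains - losses,
% with every term expressed per unit indoor volume
\frac{dC_{indoor}(t)}{dt} = F_{in} + E - F_{out} - R

% Eq. 2 (assumed form): constant \alpha, p, and d over dt, with
% F_{in} = p \alpha C_{outdoor}, F_{out} = \alpha C_{indoor}, R = d\,C_{indoor}
\frac{dC_{indoor}(t)}{dt}
  = p\,\alpha\,C_{outdoor}(t) + \frac{E}{V}
  - \alpha\,C_{indoor}(t) - d\,C_{indoor}(t)

% Eq. 3 (assumed form): steady state, dC_{indoor}/dt = 0
C_{indoor} = \frac{p\,\alpha}{\alpha + d}\,C_{outdoor} + \frac{E}{V(\alpha + d)}

% Eq. 4 (assumed form): air exchange much faster than deposition, \alpha \gg d
C_{indoor} \approx p\,C_{outdoor} + \frac{E}{V\,\alpha}
```

In each steady-state form, indoor concentration is the sum of an outdoor contribution scaled by an infiltration coefficient and an indoor-generated contribution, which is the structure the sulfur ratio is meant to exploit.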

Sulfur model

As previously described, in situations where there are no indoor sulfur emissions, we can calculate the indoor-to-outdoor sulfur ratio (Eq. 5) directly and use it as a surrogate of the infiltration factor (Finf):

However, given that indoor sulfur emissions may still have occurred despite efforts to exclude these homes by questionnaire data, we estimated the infiltration factor with the following model (Eq. 6):

where Sindoor and Soutdoor are the sulfur concentrations measured indoors and outdoors, respectively, i is the identifier of the homes included in the study, and j is the season identifier. Ij is an indicator variable for season j. As sulfur was not measured outside each individual household, we used the sulfur concentration measured at the central site as a surrogate of the sulfur concentration outside homes. The exposure error introduced by using sulfur measured at a central site was accounted for in the subsequent indoor PM2.5 and BC model. Here we included a random slope by home (α1i) to generate Sindoor/Soutdoor for individual homes and to take into account the spatial variability of the sulfur concentrations outside homes that is not captured when using central site measurements. We also included a fixed intercept (β0) and a random intercept by home (α0i) to capture potential indoor sulfur emissions such as underreported indoor smoking and humidifier use. Finally, we included an interaction term between outdoor sulfur concentration and season to capture the seasonal variation of the infiltration. Thus, the estimated Sindoor/Soutdoor equals the sum of the estimated fixed slope (β1), the random slope for the home (α1i), and the slope of the season interaction term (βj).
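Equations 5 and 6 are not reproduced in this copy of the text. A sketch consistent with the terms described above might read as follows. The residual term, the primed summation index, and the implied reference season (whose interaction slope is zero) are assumptions introduced for illustration, not the published notation.

```latex
% Eq. 5 (assumed form): with no indoor sulfur sources,
% the sulfur ratio stands in for the infiltration factor
F_{inf} \approx \frac{S_{indoor}}{S_{outdoor}}

% Eq. 6 (assumed form): mixed-effects model for home i in season j
S_{indoor,ij} = (\beta_0 + \alpha_{0i})
              + (\beta_1 + \alpha_{1i})\,S_{outdoor,ij}
              + \sum_{j'} \beta_{j'}\,I_{j'}\,S_{outdoor,ij}
              + \varepsilon_{ij}

% Estimated home- and season-specific ratio: the total slope on S_{outdoor}
\widehat{\left(\frac{S_{indoor}}{S_{outdoor}}\right)}_{ij}
  = \hat{\beta}_1 + \hat{\alpha}_{1i} + \hat{\beta}_j
```

Written this way, the intercept terms absorb any sulfur that does not scale with the outdoor concentration, so only the slope terms enter the ratio.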

Indoor PM2.5 and BC model

Combining the estimated Sindoor/Soutdoor (Eq. 6) and mass balance (Eq. 3) we can predict indoor concentrations of PM2.5 and BC with the following equation:

where Cindoor (μg/m3) is the indoor concentration of particle mass and Coutdoor (μg/m3) is the particle concentration outside the individual household. In this model, we included a random intercept (α0i) by home (i) to account for indoor PM2.5 and BC emissions. The random slope (α1i) by home corrects any bias in the sulfur ratio due to underestimating the spatial variation of sulfur by using central site sulfur measurement as a surrogate of outdoor sulfur concentrations.
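The prediction step described above can be sketched in a few lines. Because the model equation itself is not reproduced in this copy, the functional form below (home-specific intercept plus home-specific slope applied to the product of the sulfur ratio and the outdoor concentration) is an assumption, and the function name, argument names, and coefficient values are hypothetical.

```python
def predict_indoor(ratio, c_outdoor, beta0, beta1, alpha0_i=0.0, alpha1_i=0.0):
    """Predict an indoor concentration (ug/m3) for one home and period.

    Assumed form (not taken from the source equation):
        C_indoor = (beta0 + alpha0_i) + (beta1 + alpha1_i) * ratio * C_outdoor

    ratio      -- estimated S_indoor/S_outdoor for this home and season
    c_outdoor  -- outdoor concentration (ug/m3)
    beta0      -- fixed intercept
    beta1      -- fixed slope
    alpha0_i   -- random intercept for home i (indoor emissions)
    alpha1_i   -- random slope for home i (correction to the sulfur ratio)
    """
    infiltrated = ratio * c_outdoor  # outdoor-origin particles reaching indoors
    return (beta0 + alpha0_i) + (beta1 + alpha1_i) * infiltrated


# Illustrative values only; these are not estimates from the study.
example = predict_indoor(ratio=0.5, c_outdoor=10.0, beta0=2.0, beta1=1.0)
```

Setting both random effects to zero, as in the example call, gives the population-average prediction for a home with no prior measurements.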

To a greater extent than sulfur, the spatial distribution of PM2.5 and BC can vary by location significantly, even within a small metropolitan area. The PM2.5 and BC concentrations measured at the central site may not reflect the local generation of particles outside participating homes, as BC, and to a lesser extent PM2.5, are highly associated with traffic volume and metropolitan activities. For this reason, we used land use variables, including major road density, percent urban space, and the distance between home and the central measurement site to predict outdoor PM2.5 and BC concentrations in addition to the measured concentrations. The final indoor model (Eq. 8) is therefore formulated as

where Ccountway (μg/m3) is the measured concentration at the central monitoring site, road density (number of vehicles‧distance traveled in km/1 km2) is the traffic density of A1–A3 roads within a 1 km2 grid in which the participating home falls, and %urban is the percentage of urban spaces within a 1 km2 grid surrounding the homes.

Out-of-sample cross-validation

We conducted cross-validation by home and season to evaluate the predictive power of the sulfur model (Eq. 6) and indoor PM2.5 and BC model (Eq. 8) described above. For each home, we first held out one measurement out of the two to four total samples measured in the same home during the study period. We then trained the sulfur, indoor PM2.5, and BC models using the rest of the samples from that specific home and all samples from other homes to predict the held-out data. We iterated the same procedure until all data were predicted once and examined the R2 of the cross-validated prediction vs the observed measurements. We designed the cross-validation to utilize measurements from the same home to provide home-specific information and samples from other homes to provide data that were temporally unavailable. In essence, this cross-validation examined whether our models could accurately predict Sindoor/Soutdoor, indoor PM2.5, and indoor BC during the periods with no indoor measurements.
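The hold-out procedure described above can be sketched as follows. This is a minimal illustration under stated assumptions: one sample is held out per iteration, R2 is computed as the squared Pearson correlation between observed and predicted values, and the `fit` and `predict` callables, the dictionary keys `"home"` and `"y"`, and all function names are hypothetical placeholders for the actual models.

```python
def leave_one_out_by_home(samples):
    """Yield (train, held_out) pairs, holding out each sample exactly once.

    The training set keeps the remaining samples from the same home plus
    every sample from all other homes.
    """
    for k, held_out in enumerate(samples):
        train = samples[:k] + samples[k + 1:]
        # The design needs at least one other sample from the same home.
        if not any(s["home"] == held_out["home"] for s in train):
            raise ValueError(f"home {held_out['home']!r} has only one sample")
        yield train, held_out


def r_squared(observed, predicted):
    """Squared Pearson correlation between observed and predicted values."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov * cov / (var_o * var_p)


def cross_validate(samples, fit, predict):
    """Refit on each training set, predict the held-out sample, return R2."""
    observed, predicted = [], []
    for train, held_out in leave_one_out_by_home(samples):
        model = fit(train)
        observed.append(held_out["y"])
        predicted.append(predict(model, held_out))
    return r_squared(observed, predicted)
```

Any model can be plugged in through `fit` and `predict`; the splitter only enforces that the held-out home still contributes at least one training sample, which is what allows home-specific terms to be estimated.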

Results and discussion

General Characteristics

Samples with an observed sulfur ratio higher than 1.12 (95th percentile of the total 405 samples) and homes with fewer than 2 samples were excluded for model predictability and cross-validation purposes. This resulted in a data set collected during 328 weeklong sampling sessions at 102 residences that were matched by date to a central monitoring site (Countway supersite). The distribution of measured indoor and outdoor concentrations is summarized in Table 1. Indoor PM2.5 concentrations were generally higher than those measured outdoors, suggesting significant indoor sources of particles. In contrast, indoor BC concentrations were lower than those measured outdoors, which is likely due to the lack of indoor sources of BC and potentially larger amount of traffic emissions and biomass burning activities at the central site.26, 27 More importantly, the mean difference of the indoor and outdoor concentration of PM2.5 and BC varied considerably. Previous studies have attributed this variation to multiple factors, including indoor particle emission, air exchange rate, and natural ventilation.9 Together, these findings indicate the need to consider variability of the infiltration factors in the modeling process of indoor PM2.5 and BC exposure levels.
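The two exclusion rules described above can be sketched as a simple filter. The order of operations (ratio filter first, then the per-home sample count) is an assumption, as are the function name and the dictionary keys `"home"`, `"s_indoor"`, and `"s_outdoor"`; the 1.12 threshold is taken from the text rather than recomputed.

```python
from collections import Counter


def apply_exclusions(samples, max_ratio=1.12, min_samples_per_home=2):
    """Drop samples with a high sulfur ratio, then homes with too few samples.

    samples -- list of dicts with keys "home", "s_indoor", "s_outdoor"
    """
    # Step 1: exclude samples whose observed indoor/outdoor ratio exceeds the cutoff.
    kept = [s for s in samples if s["s_indoor"] / s["s_outdoor"] <= max_ratio]
    # Step 2: exclude homes left with fewer than the minimum number of samples.
    counts = Counter(s["home"] for s in kept)
    return [s for s in kept if counts[s["home"]] >= min_samples_per_home]
```

Applying the count rule after the ratio rule means a home with two samples is dropped entirely if one of them exceeds the cutoff.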

Table 1 Distribution of measured indoor and outdoor (central site) PM2.5 concentrations and its BC and sulfur constituents.

Table 2 displays the land use environment and household characteristics of the 102 homes in the Greater Boston area included in the study. The land use setting of the study homes varied significantly, suggesting that outdoor PM2.5 and BC concentrations are likely to vary from home to home. Furthermore, the prevalence of air conditioning and forced air heating varied considerably. Moderate variability was also found in natural ventilation, specifically window-opening hours. These factors are expected to influence the infiltration of particles and were found to lower the model predictive power in previous studies.28 More importantly, the large variation among land use and household characteristics reinforces the notion that assuming no indoor sulfur and homogeneous outdoor sulfur concentration may contribute to reduced model fit.

Table 2 Distribution of home characteristics, surrounding land use, and questionnaire variables related to indoor air pollution.

Estimated Sindoor/Soutdoor

The performance of the mixed-effects model (Eq. 6) to predict indoor sulfur using central site measurements was excellent, with a high cross-validated R2 of 0.89, a non-biased slope, and a negligible intercept bias when comparing the predicted to the measured indoor sulfur concentration (Table 3).

Table 3 Cross-validated R2 and corresponding RMSE values for predicted indoor Sindoor/Soutdoor, PM2.5, and BC.

Table 4 displays the relationship between the predicted sulfur ratio and home characteristics and human activities. The Sindoor/Soutdoor increased with higher natural ventilation (window opening) and urban density. Use of an air purifier, air conditioning, and forced air heating had the opposite effect on the predicted Sindoor/Soutdoor (−0.06 to −0.28). These findings are consistent with those of previous investigations.29, 30 As expected, homes using oil as primary heating fuel had a higher sulfur ratio (0.07) compared to those using gas or electricity. Outdoor furnaces emitting sulfur-rich particles may have added to the spatial variability of the sulfur concentration outdoors. Consequently, the Sindoor/Soutdoor for these oil-using homes was slightly overestimated (0.07), due to the relatively lower concentrations of sulfur at the central site compared to outside the home (Table 4). However, the random slope by home in the indoor PM2.5 and BC model (Eq. 8) was included to account for the bias induced by this sulfur heterogeneity. Finally, the predicted sulfur ratio did not vary by electric space heater use and there was a borderline association with traffic density.

Table 4 Relationship between the predicted Sindoor/Soutdoor ratio and household characteristics and behaviors.

Predicted Indoor PM2.5 and BC

Using the estimated Sindoor/Soutdoor as the infiltration proxy, we predicted indoor PM2.5 and BC concentrations. The cross-validated R2 for indoor PM2.5 and BC was 0.79 and 0.76, respectively, showing strong predictive power (Table 3). Model performance was superior to previous indoor models, specifically those that did not use a sulfur tracer as an infiltration surrogate (Table 5). We relaxed the assumption regarding absence of indoor sulfur sources and used fixed and random intercepts (Eq. 6) to filter noise attributable to potential indoor sulfur emissions. This approach likely improved our model’s performance. Furthermore, the number of participating homes in this study was higher than that of previous studies, and the relatively larger sample size therefore enhanced our model’s predictive ability.

Table 5 Comparison of the current study and previous approaches to modeling infiltration and indoor PM2.5 concentrations.

While indoor PM2.5 concentrations differed across households, this variability was not explained by housing characteristics except for the significantly lower PM2.5 concentration (−6.3 μg/m3) when air purifiers were used. Unexplained variations in indoor particle concentrations could have been generated by cooking and unreported incense burning, which was found to contribute to indoor particle concentrations in previous studies.28, 31, 32 On the other hand, we found a significant relationship between indoor BC of outdoor origin and traffic density surrounding homes. This is consistent with previous reports that traffic pollutants, including BC, can easily penetrate buildings.21, 33, 34, 35

Our model’s predictive ability could be further improved by extending the sample size, either by measuring daily instead of weeklong indoor concentrations or by expanding the study area. We also did not take into account the influence of chemical transformations, such as changes in gas-to-particle partitioning during the infiltration of volatile organic compounds, nitrate, or ammonium,36, 37 which may also affect the performance of indoor PM2.5 and BC exposure models. Nevertheless, the models developed in this study had superior predictive power compared to models described in the literature and, therefore, may provide more reliable exposure data to study the health effects of particles of indoor vs outdoor origin.

Conclusion

In this paper, we presented a new paradigm that utilizes a relatively small number of indoor measurements to predict exposures for individual households when indoor measurements are not available. Our methods strengthen the estimation of infiltration factor, indoor PM2.5, and BC concentrations for individual residences in the Greater Boston Area. Our models may improve indoor PM2.5 exposure assessments for future health effects studies and could serve as an exposure model framework more generally for other locations.