Skip to main content

Thank you for visiting You are using a browser version with limited support for CSS. To obtain the best experience, we recommend you use a more up to date browser (or turn off compatibility mode in Internet Explorer). In the meantime, to ensure continued support, we are displaying the site without styles and JavaScript.

Impact of limited residential address on health effect analysis of predicted air pollution in a simulation study

A Correction to this article was published on 31 March 2022

This article has been updated



Recent epidemiological studies of air pollution have adopted spatially-resolved prediction models to estimate air pollution concentrations at people’s homes. However, the benefit of these models was limited in many studies that used existing health data relying on incomplete addresses resulting from confidentiality concerns or lack of interest when designed.


This simulation study aimed to understand the impact of incomplete addresses on health effect estimation based on the association between particulate matter with diameter ≤10 µm (PM10) and low birth weight (LBW).


We generated true annual average concentrations of PM10 at 46,007 mothers’ homes and their LBW status, using the parameters obtained from our data analysis and a previous study in Seoul, Korea. Then, we hypothesized that mothers’ address information is limited to the district and compared the properties of their health effect estimates of PM10 with those using complete addresses. We performed this comparison across eight environmental scenarios that represent various spatial distributions of PM10 and nine exposure prediction methods that provide different sets of predicted PM10 concentrations of mothers.


We observed increased bias and root mean square error consistently across all environmental scenarios and prediction methods using incomplete addresses compared to complete addresses. However, the bias related to incomplete addresses decreased when we used population-representative exposures averaged to the district from predicted PM10 at census tract centroids.


Our simulation study suggested that individual exposure estimated by prediction approaches and averaged across population-representative points can provide improved accuracy in health effect estimates when complete address data are unavailable.

Impact statement

Our simulation study focused on a common and practical challenge of limited address information in air pollution epidemiology, and investigated its impact on health effect analysis. Cohort studies of air pollution have developed advanced exposure prediction model to allow the estimation of individual-level long-term air pollution concentrations at people’s addresses. However, it is common that address information of existing health data is available at the coarse spatial scale such as city, district, and zip code area. Our findings can help understand the possible consequences of limited address information and provide practical guidance in achieving the accuracy in health effect analysis.


Long-term exposure to air pollution was associated with mortality and morbidity in many epidemiological studies and the investigation was expanded to large health data such as multi-city or multi-country cohorts [1,2,3,4]. Accurate assessment of individual exposure to long-term air pollution has been crucial in these studies, as individual air pollution measurements are not available given financial and technical constraints. Recent studies developed exposure prediction models to represent substantial spatial variability of exposures across study participants and enhanced the capacity to assess the association with human health. These models produced air pollution concentrations estimated at people’s homes or workplaces as their individual-level exposures [5, 6]. Specifically, physicochemical models relied on emissions and meteorology data and estimated air pollution concentrations on the grid [7, 8]. Statistical models were mostly constructed in pointwise regression including geographical and/or meteorological characteristics, named land use regression [9,10,11]. Additional spatial correlation structure was modelled by using geostatistical techniques such as kriging [12, 13] and spatial smoothing as applied in generalized additive model [14].

The benefit of these advanced exposure prediction approaches could be limited when complete address data are unavailable. It is common that address information of existing health data is available at the coarse spatial scale such as city, district, and zip code [15,16,17,18,19,20,21,22]. Existing cohorts were often not designed to collect full address data [1, 23,24,25,26]. This limitation is more common in administrative health data constructed based on census or public health insurance. Despite their strength of large representative populations that allow examining the association at the national or regional scale, address data were restricted given the concerns of confidentiality [16,17,18,19,20,21,22,23,24, 27]. For instance, studies using the U.S. Medicare cohort, the Canadian Census cohort, and the Taiwan National Health Insurance Database cohort assessed individual exposure to air pollution at the zip code or postal code area which is the finest spatial resolution of available address data [17, 19, 21, 22]. This incomplete address data may increase exposure misclassification and affect the accuracy and/or precision of health effect estimates.

This simulation study aimed to understand the impact of incomplete address information on outdoor exposure prediction and health effect estimation. In order to achieve the applicability and generalizability of the simulation, we designed our study based on a previous epidemiological study of long-term exposure to particulate matter with a diameter less than or equal to 10 micrometers (PM10) and low birth weight (LBW) in Seoul, Korea [28]. We designed our simulation to make our finding applicable to other pollutants than PM10 by constructing various exposure scenarios using modified simulation parameters.

Materials/subjects and methods

Our simulation procedure consists of four steps (Fig. S1): (1) exploratory data analyses to obtain parameters for the underlying distributions of PM10 and LBW; (2) generation of true PM10 exposure and LBW status; (3) application of incomplete addresses and estimation of mothers’ exposures; and (4) health effect estimation of LBW for PM10 and comparison of the performance of health effect estimates by complete and incomplete addresses. We constructed eight environmental scenarios representing different distributions of air pollution of 46,007 mothers, and nine exposure prediction methods under either complete or incomplete residential addresses. The following sub-sections provide detailed information on each step. Further details including formulas are provided in the Supplementary Information.

Data analysis and parameter acquisition

We obtained parameters to be used for generating exposure and outcome from the exploratory analysis of air quality regulatory monitoring data for PM10, geographic variables, and birth certificate data in Seoul, Korea, during 2010. [29, 30] Using the annual average concentrations of PM10, we fitted empirical variogram models and estimated mean and variance parameters. Mean parameters were regression coefficients for five geographic variables that were highly associated with particulate matter in Seoul [31]. Three variance parameters include range, partial sill, and nugget that indicate the distance in which spatial correlation exists, spatial variability, and non-spatial variability, respectively. [31,32,33] For LBW, we obtained birth certificate data from the Statistical Geographic Information Service operated by Statistics Korea and computed the proportion of LBW cases to the total births. [28] To focus on the spatial variation, we restricted our study period to a single year in 2010 and selected 46,007 mothers who had births in 2010.

Generation of true exposure and outcome

Before generating true exposure and outcome of each mother, we generated the locations of mothers’ homes based on the spatial distribution of the number of births in Seoul (median area and average population in 2010: 21.59 km [2] and 412,520 people), the Capital of South Korea, which is composed of 25 districts, 422 neighbourhoods, and 16,230 census tracts. Because mothers’ addresses in birth certificate data are available at the district level, we treated census tract centroids as mothers’ potential home addresses. We randomly sampled the same number of centroids to those of mothers in each district with the weight of live births across neighbourhoods in each district. These locations were fixed over the simulation.

Assuming that exposure to PM10 follows a Gaussian random field with spatial dependency, we generated true annual-average PM10 concentrations using mean and variance parameters at all locations (Table 1). These locations included 46,007 mothers’ homes, 37 air quality regulatory monitoring sites, 25 district governmental offices, 422 neighbourhood community centres, 16,230 census tract centroids, and 610 centroids on the 1-km grid in Seoul (Fig. S2). To represent possibly different spatial structures of true PM10 annual-average concentrations, we used different combinations of mean and variance parameters and constructed eight environmental scenarios (ES1–ES8). Eight combinations of parameters gave varying contributions of the mean structure, spatial variability, and non-spatial variability of PM10 to total variability (Table 1, Fig. S3). While ES1–ES4 was defined based on different spatial correlation structures, ES5-ES8 additionally included different mean structures characterized by five geographic variables that were highly associated with particulate matter in Seoul [31]. ES8 was constructed by the optimal parameters from our data analysis. These various and extended environmental scenarios can also represent different pollutants other than PM10.

Table 1 Spatial characteristics of eight environmental scenarios (ESs) based on their variability components and variance parameters used for simulating true PM10 annual average concentrations.

For outcome, we assumed an inverse logit function as the underlying distribution of LBW. Then, we generated LBW status of mothers using simulated true PM10 concentrations, the proportion of LBW cases, and the effect estimate of LBW for PM10 obtained from our previous study [28].

Exposure prediction

Using simulated PM10 at 37 regulatory monitoring sites, we applied nine prediction methods to estimate mothers’ individual exposure to PM10 by complete and incomplete address conditions (Table S1). When mothers’ complete home addresses were available, we applied four prediction methods commonly used in previous studies [6, 10, 11]. In the nearest monitor (NM) and inverse distance weighted average (IDWA) methods solely based on (“simulated”) measurements, we assigned PM10 at the monitoring site nearest to a home of each mother and averaged across the sites weighted by inverse squared Euclidean distance from each home, respectively. The other two approaches employed modelling approaches including geographic characteristics that represent direct or indirect pollution sources. Land use regression (LUR) includes these characteristics as predictors in regression equations. Universal kriging (UK) additionally includes spatial correlation as a geostatistical method that optimally derives interpolated concentrations based on mean structure and spatial correlation. We built LUR and UK models using the same five geographic variables to those used in the generation of true PM10 (See the “Data Analysis and Parameter Acquisition” section). Out of the 37 regulatory monitoring sites in Seoul, we used underlying PM10 concentrations from 25 urban-background sites for NM, IDWA, and area averaging (AA), and from all 37 sites including 12 urban roadside sites for LUR and UK. We predicted PM10 concentrations of mothers using estimated regression and/or variance parameters in LUR and UK along with geographic variables at mothers’ homes.

When address data were assumed to be incomplete and available at the district level, we applied one measurement-based and four model-based prediction approaches. In AA as a measurement-based approach, we computed the average concentration across all monitoring sites in a district to all mothers living in the same area as often used in earlier cohort studies of air pollution [34, 35]. Since the regulatory monitoring network in Seoul had one urban-background site in every district, we treated the PM10 concentration at a single site as a special case of AA. In addition, we applied UK to compute area-level representative exposure and developed four approaches. Here, we assumed when a pointwise prediction model is available but complete address data are unavailable, a preferred option could be the aggregation of predictions at many representative points [15, 36]. We used three representative locations for aggregation: 422 neighbourhood community centres (UKNA), 16,230 census tract centroids (UKCA), and 610 1-km grid coordinates (UKGA). We predicted PM10 concentrations using UK at these three types of locations, computed district averages, and assigned to the mothers living in the same districts. UKNA and UKCA predictions represent population exposure at the fine spatial scale, whereas UKGA predictions focus on spatially-representative exposure based on spatially even distribution of PM10. We also used predictions at 25 district governmental offices without aggregation (UKD) for comparison.

Health effect estimation and comparison of properties

Using true and predicted PM10 as well as true LBW status of mothers, we estimated the health effects of LBW for PM10 using logistic regression. Then, we repeated the whole procedure from exposure generation to health effect estimation 1000 times, and computed properties of health effect estimates over 1,000 simulations as bias, root mean square error (RMSE), average standard error (ASE), coverage probability (CP), and true positive rate (TPR). CP was computed as the proportion of the simulations where the 95% confidence intervals of health effect estimates contain the true effect. TPR was the ratio of the number of simulations that provide significantly positive health effect estimates for each predicted PM10 (p value < 0.05) to those for true PM10. While bias, RMSE, ASE, and CP aim to evaluate the accuracy or uncertainty of the estimates, TPR can provide the insight into statistical power. Finally, we compared the health effect estimate properties between complete and incomplete addresses depending on the exposure prediction method and pollution environment.


True and predicted PM10

Table S2 and Fig. 1 summarize true and predicted annual-average PM10 concentrations at home addresses of 46,007 mothers by different ESs and exposure prediction methods. Mothers’ PM10 concentrations predicted at their homes (mean = 47.25–60.52 µg/m3, standard deviation (SD) = 1.04–6.45 µg/m3) were generally similar on average but less variable compared to true concentrations (46.90–58.55 µg/m3, 4.66–6.45 µg/m3). Variability was even smaller when address information was restricted to the district (SD = 1.04–4.56 µg/m3), compared to complete addresses (1.56–6.45 µg/m3). This pattern was similar across eight ESs with the smallest mean and variability in ES4 where there is no spatial variability in exposure. Across nine prediction methods, the correlation with true exposure was generally higher in UK and UKCA (Pearson correlation coefficient = 0.26–0.70), with complete and incomplete address data respectively, compared to the other prediction methods (0.00–0.65) across all ESs except ES4 that showed little correlation (Table S2, Fig. S3). NM, IDWA, AA, and UKD gave relatively high correlation and slopes close to 1 when there was no mean structure with some spatial correlation in ES1-ES3, but low correlation otherwise. In contrast, LUR provided high correlations (0.60–0.65) when there was a dominant mean structure as shown in ES7 and ES8, but low correlations less than 0.1 without a mean structure.

Fig. 1
figure 1

Box-plots of true (TE true exposure) and predicted (NM nearest monitor, IDWA inverse distance weight average, LUR land-use regression, AA area average, UK universal kriging, UKD UK prediction at governmental offices, UKNA district average based on UK predictions at 422 neighborhood community centers, UKCA district average of UK predictions at 16,230 census tract centroids, UKGA district average of UK predictions at 610 1-km grid coordinates) annual-average PM10 concentrations at home addresses of 46,007 mothers by eight environmental scenarios (ES1–ES8) in the 1st simulation (blue boxes for true exposure; yellow and red boxes for predicted exposure with complete and incomplete addresses, respectively).

Health effect estimate properties by address availability

Performance of effect estimates of LBW for true and predicted annual-average PM10 concentrations became worse when data availability for PM10 or address was limited. Table 2 shows the average relative risk, as well as bias, RMSE, ASW, and CP of health effect estimates in four ESs including ES2, ES3, ES5, and ES8 where different exposure environments are more distinct: Tables S3 and S4, and Figs. S4 and S5 show all eight scenarios. Bias and RMSE tended to increase using predicted exposure compared to true exposure, while there was a slightly larger increase with incomplete addresses than with complete addresses. Larger bias is also seen in scatter plots of health effect estimates of predicted exposures against those of true exposures in Fig. S6. Performance varied more across different prediction methods and environmental scenarios under complete addresses than incomplete addresses. Regardless of address availability, CPs were close to 0.95 (Fig. S7). TPR was generally lower with incomplete addresses than complete addresses (Fig. S8).

Table 2 Estimated relative risks and their properties (Bias, RMSE, ASE, and CP) of health effect estimates of true and predicted PM10 concentrations on low birth weight over 1,000 simulations by address availability, exposure prediction methods, and environmental scenarios (ES2, ES3, ES5, and ES8).

Health effect estimate properties by exposure prediction

All prediction methods under complete addresses, and AA and UKD under incomplete addresses mostly showed negative bias indicating under-estimated health effects (Table 2, S3, and S4). Bias was particularly large for NM, AA, and UKD that relied on the measurement or prediction at a single monitoring site to assess individual exposure. However, UKD using predictions estimated at governmental offices that were largely located in highly populated areas gave smaller bias than NM and AA based on measurements at regulatory monitoring sites. When address data were fully available, UK gave lower bias and RMSE than other approaches across all ESs. This good performance was notably prominent in ES5 to ES8 which includes a mean structure. NM tended to provide small RMSE and ASE, but relatively large negative bias, while IDWA gave relatively small bias but large RMSE and ASE. LUR provided good performance only when there is a mean structure in the true environment (ES5 to ES8). When address data were limited to the district, three UK-based district averages showed much smaller bias with either direction and slightly higher RMSE and ASE compared to the other two prediction methods of AA and UKD. Among these three approaches, UKCA as the exposure averaged over a large number of population-representative points at the fine spatial scale showed better performance than UKNA and UKGA based on coarse spatial-scale population-representative points and spatially-representative points, respectively. TPR was also generally higher in UK and UK-based district averages (Fig. S8).

Health effect estimate properties by environmental scenarios

Better performance of UK and UKCA under complete and incomplete address conditions, respectively, were consistent across all eight ESs (Tables S3S5, and Figs. S5S8). Large bias in NM, AA, and UKD, and large uncertainty in UKNA and UKGA, possibly resulting from small variability in PM10, were also consistent across all ESs. ES4 showed large bias and RMSE and small TPR across all prediction methods.


This study focused on the impact of limited availability of address data on health effect estimation compared to complete availability. After hypothesizing that address data availability affects health effect analysis of predicted exposure, we explored the impact of address availability on the performance of health effect estimates depending on exposure prediction methods and environmental scenarios based on the real-world example of the association between PM10 and LBW. Eight environmental scenarios represented various pollution environments related to the different contributions of geographic features and spatial dependency. Furthermore, nine prediction methods exhibited commonly applied approaches of individual exposure assessment given limited monitoring data with and without additional limitation in address data. Our findings showed that when address data are limited, individual exposure modelled by geographic characteristics and averaged across population-representative points, as shown in UK-based averaging, can provide comparable accuracy in health effect estimation to those using complete address information. This improved accuracy was prominent compared to other exposure prediction approaches and generally consistent across different environmental scenarios.

Our simulation study intended to answer an important question that can help inference of epidemiological studies of air pollution relying on limited address data of subjects. Even though many recent studies developed advanced exposure prediction models and allowed the estimation of air pollution concentrations at people’s homes or workplaces, the benefit of this advance could be limited in many epidemiological studies that are based on existing cohorts and/or administrative health data with incomplete address information. As recent epidemiological studies of air pollution expanded their spatial and temporal coverage to the national or regional scale and to the past several decades, the reliance on existing health data lacking complete address information has become even greater. However, there have been few studies that investigated the impact of limited address information on health analysis and provided realistic guidance. For example, recent two nationwide cohort studies including limited address data applied prediction models to estimate individual-level long-term PM2.5 concentrations at zip code-level addresses of Medicare [4] and Canadian Census Health and Environment [37] cohort participants, and reported the association with total mortality. Our simulation findings of negative bias using single points in administrative areas suggest the possibility of underestimated health effects in such studies.

Our findings generally showed that kriging-based approaches gave good performance in health effect estimates consistently across different air pollution environments, when individual air pollution measurements are not available. While UK showed better performance compared to other prediction approaches when complete address data are available, UK averaging approaches outperformed with individual address data limited to the district. A possible explanation is that UK modelled by using both mean and variance structures well characterizes air pollution conditions at people’s residences even when there is no mean structure [38]. In addition, employment of population-representative locations and the following averaging process under the unavailability of precise residential addresses possibly minimized the impact of exposure misclassification. Bias was the smallest and also non-systematic as opposed to other prediction methods that consistently gave negative bias. Out of three UK averaging approaches, UKCA based on UK predictions at census tract centroids gave the lowest RMSE and ASE which were comparable to those of other prediction approaches under the complete address condition. UKCA also showed comparable TPR to those with complete addresses, while it was less likely to detect statistically significant health effect estimates overall with incomplete addresses. However, the benefit of UK-based averaging could be reduced, when we use predictions at the locations including those poorly represented for population as shown in UKGA. CPs close to 95% nominal level across all prediction models might have induced statistically non-significant health effect estimates. However, our finding of the true positive rate distinctively higher in kriging-based approaches compared to others indicates the advantage of kriging.

All prediction methods except for UK-based averaging generally showed underestimated health effects given limited PM10 or address data. This underestimation can be explained by exposure measurement error derived by poor characterization of individual exposure in prediction models [39, 40]. In our simulation, prediction methods heavily relying on a mean structure such as LUR gave greater underestimation when there is no mean structure in true exposure scenarios, while simple prediction approaches using measurements only shown as NM and IDWA gave larger underestimation when there is a mean structure. Prediction methods using a single location based on the nearest monitor (NM), or district governmental office (UKD) also gave larger negative bias than other methods. Bias was larger in AA and NM based on regulatory monitoring sites than UKD using population-representative locations. In addition, poor assessment of individual exposure can result from poor representativeness of prediction points used for averaging. Our study showed increased positive or negative bias in UKGA using grid coordinates than UKCA based on census tract centroids. Previous simulation studies reported that measurement error derived by a spatial misalignment between regulatory monitoring sites and people’s residences affected misspecification of prediction models and resulted in positive or negative bias in following health effect analysis [41,42,43]. Our findings of large bias in AA, NM, LUR, and UKGA possibly suggest the impact of this classical-type measurement error resulted from poor model specification. Relatively large uncertainty in kriging suggests the impact of the Berkson-type measurement error driven by spatial smoothing [42].

Our simulation using various environmental scenarios and parameters obtained from data analyses suggests a possible generalization of our findings to other pollutants and/or study areas. Although we focused on PM10 which is well known as a regional pollutant with relatively weak mean structure and large spatial correlation, we constructed seven environmental scenarios by assuming different spatial structures in addition to the ES8 based on the parameters estimated directly from the regulatory monitoring data in Seoul. This variation of spatial structure possibly represents more local or regional pollutants with different impacts of local sources or spatial homogeneity, and allows us to apply our findings to other pollutants such as PM2.5 and NO2 and/or different regions. In addition, our reliance on real-world data can improve the practical applicability of our simulation findings.

Our study includes several limitations to be further investigated in future research. First, we focused on ambient exposure and did not consider the impact of indoor exposure. However, this impact could be small for PM which showed relatively high infiltration compared to other pollutants [44]. Besides, our application of diverse environmental scenarios including locally heterogeneous exposure may also represent indoor and/or personal exposure. Second, we created mothers’ residential addresses using census tract centroids and assumed them fixed over the simulation. Future studies that apply real addresses of participants and/or incorporate mobility should investigate the sensitivity of our findings. Third, we did not consider multi-pollutant models and correlated exposure measurement error could affect bias [45]. Future studies should investigate this impact in cohort-study design. Lastly, we used low birth weight and logistic regression. Future studies should confirm whether our suggestions are consistent with different health outcomes and health analysis models.

In conclusion, this simulation study suggests that exposure prediction approaches well representing geographic environments and supplemented with population-representative prediction locations can improve accuracy in health effect estimation when complete individual address data are not available. Our findings also provide guidance for a preferred approach to improve the inference in future large-scale epidemiological studies of long-term air pollution.

Data availability

Simulated data and sample code are available from the github page of the first author YB Jun ( on reasonable request. Also, supplementary results are provided in the supplemental material.

Change history


  1. Beelen R, Raaschou-Nielsen O, Stafoggia M, Andersen ZJ, Weinmayr G, Hoffmann B, et al. Effects of long-term exposure to air pollution on natural-cause mortality: an analysis of 22 European cohorts within the multicentre ESCAPE project. Lancet. 2014;383:785–95.

    CAS  Article  Google Scholar 

  2. Cesaroni G, Forastiere F, Stafoggia M, Andersen ZJ, Badaloni C, Beelen R, et al. Long term exposure to ambient air pollution and incidence of acute coronary events: prospective cohort study and meta-analysis in 11 European cohorts from the ESCAPE Project. BMJ. 2014;348:f7412.

    Article  Google Scholar 

  3. Hoek G, Krishnan RM, Beelen R, Peters A, Ostro B, Brunekreef B, et al. Long-term air pollution exposure and cardio- respiratory mortality: a review. Environ Health. 2013;12:43.

    CAS  Article  Google Scholar 

  4. Di Q, Wang Y, Zanobetti A, Wang Y, Koutrakis P, Choirat C, et al. Air pollution and mortality in the medicare population. N Engl J Med. 2017;376:2513–22.

    CAS  Article  Google Scholar 

  5. Hoek G, Ranzi A, Alimehmeti I, Ardeleanu E-R, Arrebola JP, Ávila P, et al. A review of exposure assessment methods for epidemiological studies of health effects related to industrially contaminated sites. Epidemiologia e Prev. 2018;42:21–36.

    Google Scholar 

  6. Hoek G. Methods for assessing long-term exposures to outdoor air pollutants. Curr Environ Health Rep. 2017;4:450–62.

    CAS  Article  Google Scholar 

  7. Binkowski FS, Roselle SJ Models-3 Community Multiscale Air Quality (CMAQ) model aerosol component 1. Model description. J Geophys Res Atmos. 2003; 108.

  8. Hanha SR. Air quality model evaluation and uncertainty. JAPCA. 1988;38:406–12.

    Article  Google Scholar 

  9. Brauer M, Hoek G, van Vliet P, Meliefste K, Fischer P, Gehring U, et al. Estimating long-term average particulate air pollution concentrations: application of traffic indicators and geographic information systems. Epidemiology. 2003;14:228–39.

    PubMed  Google Scholar 

  10. Hoek G, Beelen R, de Hoogh K, Vienneau D, Gulliver J, Fischer P, et al. A review of land-use regression models to assess spatial variation of outdoor air pollution. Atmos Environ. 2008;42:7561–78.

    CAS  Article  Google Scholar 

  11. Jerrett M, Arain A, Kanaroglou P, Beckerman B, Potoglou D, Sahsuvaroglu T, et al. A review and evaluation of intraurban air pollution exposure models. J Exposure Sci Environ Epidemiol. 2005;15:185–204.

    CAS  Article  Google Scholar 

  12. Jerrett M, Burnett RT, Ma R, Pope CA, Krewski D, Newbold KB, et al. Spatial analysis of air pollution and mortality in Los Angeles. Epidemiology. 2005;16:727–36.

    Article  Google Scholar 

  13. Sampson PD, Richards M, Szpiro AA, Bergen S, Sheppard L, Larson TV, et al. A regionalized national universal kriging model using Partial Least Squares regression for estimating annual PM2.5 concentrations in epidemiology. Atmos Environ. 2013;75:383–92.

    CAS  Article  Google Scholar 

  14. Paciorek CJ, Yanosky JD, Puett RC, Laden F, Suh HH. Practical large-scale spatio-temporal modeling of particulate matter concentrations. Ann Appl Stat. 2009;3:370–97.

    Article  Google Scholar 

  15. Kim OJ, Lee SH, Kang SH, Kim SY. Incident cardiovascular disease and particulate matter air pollution in South Korea using a population-based and nationwide cohort of 0.2 million adults. Environ Health: A Glob Access Sci Source. 2020;19:1–12.

    Article  Google Scholar 

  16. Kim OJ, Kim SY, Kim H Association between long-term exposure to particulate matter air pollution and mortality in a South Korean national cohort: Comparison across different exposure assessment approaches. Int J Environ Res Public Health. 2017; 14.

  17. Zeger SL, Dominici F, McDermott A, Samet JM. Mortality in the medicare population and chronic exposure to fine particulate air pollution in urban centers (2000-2005). Environ Health Perspect. 2008;116:1614–9.

    Article  Google Scholar 

  18. Carey IM, Atkinson RW, Kent AJ, van Staa T, Cook DG, Anderson HR. Mortality associations with long-term exposure to outdoor air pollution in a national english cohort. Am J Respiratory Crit Care Med. 2013;187:1226–33.

    CAS  Article  Google Scholar 

  19. Crouse DL, Peters PA, Hystad P, Brook JR, van Donkelaar A, Martin RV, et al. Ambient PM2.5, O3, and NO2 exposures and associations with mortality over 16 years of follow-up in the Canadian census health and environment cohort (CanCHEC). Environ Health Perspect. 2015;123:1180–6.

    CAS  Article  Google Scholar 

  20. Ueda K, Nagasawa SY, Nitta H, Miura K, Ueshima H. Exposure to particulate matter and long-term risk of cardiovascular mortality in Japan: NIPPON DATA80. J Atherosclerosis Thrombosis. 2012;19:246–54.

    CAS  Article  Google Scholar 

  21. Peters PA, Tjepkema M, Wilkins R, Fines P, Crouse DL, Chan PCW, et al. Data resource profile: 1991 Canadian Census Cohort. Int J Epidemiol. 2013;42:1319–26.

    Article  Google Scholar 

  22. Jung C-R, Lin Y-T, Hwang B-F. Ozone, particulate matter, and newly diagnosed Alzheimer’s disease: a population-based cohort study in Taiwan. J Alzheimer’s Dis. 2015;44:573–84.

    CAS  Article  Google Scholar 

  23. Laden F, Schwartz J, Speizer FE, Dockery DW. Reduction in fine particulate air pollution and mortality. Am J Respiratory Crit Care Med. 2006;173:667–72.

    CAS  Article  Google Scholar 

  24. H. FP, Marten M, B. AC, Gerard H, Rob B, Kees de H, et al. Air pollution and mortality in seven million adults: the Dutch environmental longitudinal study (DUELS). Environ Health Perspect. 2015;123:697–704.

    Article  Google Scholar 

  25. Huss A, Spoerri A, Egger M, Röösli M, Group SNCS. Aircraft noise, air pollution, and mortality from myocardial infarction. Epidemiol. 2010;21:829–36.

    Article  Google Scholar 

  26. Hansell A, Ghosh RE, Blangiardo M, Perkins C, Vienneau D, Goffe K, et al. Historic air pollution exposure and long-term mortality risks in England and Wales: prospective longitudinal cohort study. Thorax. 2016;71:330–8.

    Article  Google Scholar 

  27. Seong SC, Kim Y-Y, Park SK, Khang YH, Kim HC, Park JH, et al. Cohort profile: the National Health Insurance Service-National Health Screening Cohort (NHIS-HEALS) in Korea. BMJ Open. 2017;7:e016640.

    Article  Google Scholar 

  28. Choe S-A, Jang J, Kim MJ, Jun Y-B, Kim S-Y. Association between ambient particulate matter concentration and fetal growth restriction stratified by maternal employment. BMC Pregnancy Childbirth. 2019;19:246.

    Article  Google Scholar 

  29. National Institute of Environmental Research. 2016 NIER Annual Report. 2017.

  30. Yi S-J, Shon C, Min K-D, Kim H-C, Leem J-H, Kwon H-J, et al. Association between exposure to traffic-related air pollution and prevalence of allergic diseases in children, Seoul, Korea. BioMed Res Int. 2017;2017:4216107.

  31. Min KD, Kwon HJ, Kim KS, Kim SY. Air pollution monitoring design for epidemiological application in a densely populated city. Int J Environ Res Public Health. 2017;14:1–12.

    Article  Google Scholar 

  32. Cressie N Statistics for Spatial Data. Revised ed. Wiley-Interscience: Hoboken, NJ, 2015.

  33. Eum Y, Song I, Kim H-C, Leem J-H, Kim S-Y. Computation of geographic variables for air pollution prediction models in South Korea. Environ Health Toxicol. 2015; 30.

  34. Dockery DW, Pope CA 3rd, Xu X, et al. Association between air pollution and mortality in six U.S. cities. N. Engl J Med. 1993;29:1230–5.

    Google Scholar 

  35. Pope CA, Thun MJ, Namboodiri MM, Dockery DW, Evans JS, Speizer FE, et al. Particulate air pollution as a predictor of mortality in a prospective study of U.S. Adults. Am J Respiratory Crit Care Med. 1995;151:669–74.

    Article  Google Scholar 

  36. Pope CA, Lefler JS, Ezzati M, Higbee JD, Marshall JD, Kim SY et al. Erratum: Mortality risk and fine particulate air pollution in a large, representative cohort of U.S. adults (Environ Health Perspect, (2019), 127, 7, 10.1289/EHP4438). Environ Health Perspect. 2019; 127: 099002–1.

  37. Cakmak S, Hebbern C, Pinault L, Lavigne E, Vanos J, Crouse DL, et al. Associations between long-term PM2.5 and ozone exposure and mortality in the Canadian Census Health and Environment Cohort (CANCHEC), by spatial synoptic classification zone. Environ Int. 2018;111:200–11.

    CAS  Article  Google Scholar 

  38. Kim SY, Sheppard L, Kim H. Health effects of long-term air pollution: Influence of exposure prediction methods. Epidemiology. 2009;20:442–50.

    Article  Google Scholar 

  39. Szpiro AA, Sheppard L, Lumley T. Efficient measurement error correction with spatially misaligned data. Biostatistics. 2011;12:610–23.

    Article  Google Scholar 

  40. Sheppard L, Burnett RT, Szpiro AA, Kim S-Y, Jerrett M, Pope CA, et al. Confounding and exposure measurement error in air pollution epidemiology. Air Qual Atmos Health. 2012;5:203–16.

    Article  Google Scholar 

  41. Szpiro AA, Paciorek CJ, Sheppard L. Does more accurate exposure prediction necessarily improve health effect estimates? Epidemiol. 2011;22:680–5.

    Article  Google Scholar 

  42. Szpiro AA, Paciorek CJ. Measurement error in two-stage analyses, with application to air pollution epidemiology. Environmetrics. 2013;24:501–17.

    Article  Google Scholar 

  43. Lee A, Szpiro A, Kim SY, Sheppard L. Impact of preferential sampling on exposure prediction and health effect inference in the context of air pollution epidemiology. Environmetrics. 2015;26:255–67.

    CAS  Article  Google Scholar 

  44. Chen C, Zhao B. Review of relationship between indoor and outdoor particles: I/O ratio, infiltration factor and penetration factor. Atmos Environ. 2011;45:275–88.

    CAS  Article  Google Scholar 

  45. Dionisio KL, Chang HH, Baxter LK A simulation study to quantify the impacts of exposure measurement error on air pollution health risk estimates in copollutant time-series models. Environ Health: A Global Access Sci Source. 2016; 15.

Download references


This research was supported by the National Research Foundation of Korea (NRF) [grant number 2018R1A2B6004608], the National Cancer Center of Korea [NCC-2110570], and the Korean Foundation for Cancer Research [2020-A-2].

Author information

Authors and Affiliations



Y-BJ designed the study, performed statistical analyses, and wrote the manuscript. IS processed geographical data and interpreted study results. O-JK helped study design and interpreted study results. S-YK supervised study design, statistical analyses, interpretation of results, and manuscript writing. All authors critically reviewed and edited the manuscript.

Corresponding author

Correspondence to Sun-Young Kim.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

The original online version of this article was revised due to a retrospective Open Access order.

Supplementary information

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit

Reprints and Permissions

About this article

Verify currency and authenticity via CrossMark

Cite this article

Jun, YB., Song, I., Kim, OJ. et al. Impact of limited residential address on health effect analysis of predicted air pollution in a simulation study. J Expo Sci Environ Epidemiol 32, 637–643 (2022).

Download citation

  • Received:

  • Revised:

  • Accepted:

  • Published:

  • Issue Date:

  • DOI:


  • Address
  • Exposure prediction
  • Health effect
  • Long-term exposure
  • Particulate matter


Quick links