Accurately assessing individual exposure to ambient air pollution is a crucial part of epidemiological studies of the adverse health effects of poor air quality. This is particularly challenging in developing countries with high levels of air pollution, mostly because monitoring networks are sparse and lack consistent data.
We evaluated the performance of six different machine learning algorithms in predicting fine particulate matter (PM2.5) concentrations in Ulaanbaatar, Mongolia, using data from 2010 to 2018. We found that the algorithms produced robust results based on performance metrics.
Random forest (RF) and gradient boosting models performed best, with a leave-one-location-out cross-validated R² of 0.82 when using data from the entire study period. After applying the tuned models to the hold-out test set, R² increased to 0.96 for the RF model and 0.90 for the gradient boosting model. We also predicted PM2.5 concentrations for each administrative area (khoroo) of the city using RF; maps of the predictions show spatiotemporal variations that are in line with the location of the high-emission area (ger district), the city center, and population density.
Our results provide evidence of the advantage and feasibility of machine learning approaches in predicting ambient PM2.5 levels in a setting with limited resources and extreme air pollution levels.
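The leave-one-location-out cross-validated R² mentioned above can be illustrated with a minimal Python sketch. This is not the authors' code: a one-predictor least-squares line stands in for the random forest and gradient boosting models, the function names are hypothetical, and the station data are made up purely for illustration.

```python
from collections import defaultdict
from statistics import mean


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    obs_mean = mean(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - obs_mean) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def fit_line(xs, ys):
    """Stand-in model (assumption): ordinary least squares on one predictor."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
    intercept = y_bar - slope * x_bar
    return lambda x: intercept + slope * x


def leave_one_location_out(records):
    """records: list of (location, predictor, pm25) tuples.

    Each fold holds out every record from one location, fits the model on
    the remaining locations, and predicts the held-out location. Returns
    the R2 computed over the pooled out-of-fold predictions.
    """
    by_location = defaultdict(list)
    for loc, x, y in records:
        by_location[loc].append((x, y))

    observed, predicted = [], []
    for held_out in by_location:
        train = [
            pair
            for loc, pairs in by_location.items()
            if loc != held_out
            for pair in pairs
        ]
        model = fit_line([x for x, _ in train], [y for _, y in train])
        for x, y in by_location[held_out]:
            observed.append(y)
            predicted.append(model(x))
    return r_squared(observed, predicted)


# Made-up toy data, NOT from the study:
# (station id, temperature in deg C, PM2.5 in ug/m3)
toy = [
    ("A", -20.0, 210.0), ("A", -5.0, 120.0), ("A", 15.0, 30.0),
    ("B", -18.0, 190.0), ("B", 0.0, 95.0), ("B", 20.0, 20.0),
    ("C", -25.0, 250.0), ("C", -10.0, 150.0), ("C", 10.0, 45.0),
]
loo_r2 = leave_one_location_out(toy)
print(round(loo_r2, 3))
```

Holding out whole locations, rather than random rows, means each prediction is scored at a site the model never saw during fitting, which is closer to the task of estimating exposure away from the monitors.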
TE would like to express his gratitude toward Dr. David Warburton of Saban Research Institute, Children’s Hospital Los Angeles and Dr. Rima Habre of Department of Preventive Medicine, University of Southern California for their support and advice. We also would like to thank Unurbat Dorj from NAMEM and Sanchir Dash from APRA for their help and support in acquiring and understanding UB air pollution data.
Doctoral training of TE was supported by the National Institutes of Health Fogarty International Center/National Institute of Environmental Health Sciences demonstration and education grant (1D43ES022862-01A1) between 2014 and 2017.
Conflict of interest
The authors declare that they have no conflict of interest.
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Cite this article
Enebish, T., Chau, K., Jadamba, B. et al. Predicting ambient PM2.5 concentrations in Ulaanbaatar, Mongolia with machine learning approaches. J Expo Sci Environ Epidemiol 31, 699–708 (2021). https://doi.org/10.1038/s41370-020-0257-8
- Air Pollution
- Environmental Monitoring
- Exposure Modeling
- Particulate Matter