Predicting ambient PM2.5 concentrations in Ulaanbaatar, Mongolia with machine learning approaches

Abstract

Background

Accurately assessing individual exposure to ambient air pollution is a crucial part of epidemiological studies examining the adverse health effects of poor air quality. This is particularly challenging in developing countries with high levels of air pollution, mostly because monitoring networks are sparse and lack consistent data.

Methods

We evaluated the performance of six machine learning algorithms in predicting fine particulate matter (PM2.5) concentrations in Ulaanbaatar, Mongolia, using data collected between 2010 and 2018. Judged by their performance metrics, the algorithms produced robust results.

Results

Random forest (RF) and gradient boosting models performed best, with a leave-one-location-out cross-validated R² of 0.82 when using data from the entire study period. When the tuned models were applied to the hold-out test set, R² increased to 0.96 for the RF model and 0.90 for the gradient boosting model. We also predicted PM2.5 concentrations for each administrative area (khoroo) of the city using RF; maps of the predictions show spatiotemporal variations that are in line with the locations of the high-emission area (ger district) and the city center, and with population density.
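
As an illustration of the leave-one-location-out scheme mentioned above, the following is a minimal Python sketch, not the authors' code. Each monitoring location is held out in turn, a model is fitted on the remaining locations, and R² is computed on the pooled out-of-fold predictions. Everything concrete here is an assumption made for the example: the station names, the single temperature predictor, and the PM2.5 values are synthetic, and an ordinary least-squares line stands in for the random forest and gradient boosting models used in the study.

```python
def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def fit_line(xs, ys):
    """Least-squares fit of y = intercept + slope * x.

    A deliberately simple stand-in learner (assumption); the study
    itself used random forest, gradient boosting and other models.
    """
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def leave_one_location_out(records):
    """Hold out every location once; return pooled (observed, predicted)."""
    locations = sorted({loc for loc, _, _ in records})
    observed, predicted = [], []
    for held_out in locations:
        train = [(x, y) for loc, x, y in records if loc != held_out]
        test = [(x, y) for loc, x, y in records if loc == held_out]
        intercept, slope = fit_line([x for x, _ in train], [y for _, y in train])
        for x, y in test:
            observed.append(y)
            predicted.append(intercept + slope * x)
    return observed, predicted


# Synthetic records: (hypothetical station, temperature in deg C, PM2.5 in ug/m3).
records = [
    ("station_A", -20.0, 232.0), ("station_A", -5.0, 153.0), ("station_A", 15.0, 58.0),
    ("station_B", -25.0, 250.0), ("station_B", 0.0, 128.0), ("station_B", 10.0, 82.0),
    ("station_C", -15.0, 208.0), ("station_C", 5.0, 102.0), ("station_C", 20.0, 28.0),
]

obs, pred = leave_one_location_out(records)
cv_r2 = r_squared(obs, pred)
print(f"Leave-one-location-out R2: {cv_r2:.3f}")
```

Because every prediction is made for a location the model never saw during fitting, this score reflects how well a model transfers to unmonitored sites, which is the situation when predicting for areas without a monitor.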

Conclusion

Our results provide evidence of the advantage and feasibility of machine learning approaches in predicting ambient PM2.5 levels in a setting with limited resources and extreme air pollution levels.

Fig. 1: Study area and observed PM2.5.
Fig. 2: Random forest model performance.
Fig. 3: Random forest model prediction.
Fig. 4: Ranked importance of model variables.

Acknowledgements

TE would like to express his gratitude toward Dr. David Warburton of Saban Research Institute, Children’s Hospital Los Angeles and Dr. Rima Habre of Department of Preventive Medicine, University of Southern California for their support and advice. We also would like to thank Unurbat Dorj from NAMEM and Sanchir Dash from APRA for their help and support in acquiring and understanding UB air pollution data.

Funding

Doctoral training of TE was supported by the National Institutes of Health Fogarty International Center/National Institute of Environmental Health Sciences demonstration and education grant (1D43ES022862-01A1) between 2014 and 2017.

Author information

Corresponding author

Correspondence to Temuulen Enebish.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Cite this article

Enebish, T., Chau, K., Jadamba, B. et al. Predicting ambient PM2.5 concentrations in Ulaanbaatar, Mongolia with machine learning approaches. J Expo Sci Environ Epidemiol 31, 699–708 (2021). https://doi.org/10.1038/s41370-020-0257-8

  • DOI: https://doi.org/10.1038/s41370-020-0257-8
