Unmasking the sky: high-resolution PM2.5 prediction in Texas using machine learning techniques

Abstract

Background

Although PM2.5 (fine particulate matter with an aerodynamic diameter less than 2.5 µm) is an air pollutant of great concern in Texas, limited regulatory monitors pose a significant challenge for decision-making and environmental studies.

Objective

This study aimed to predict PM2.5 concentrations at a fine spatial scale on a daily basis by using novel machine learning approaches and incorporating satellite-derived Aerosol Optical Depth (AOD) and a variety of weather and land use variables.

Methods

We compiled a comprehensive dataset for Texas covering 2013 to 2017, including ground-level PM2.5 concentrations from regulatory monitors; AOD values at 1-km resolution based on images retrieved from the MODIS satellite; and weather, land-use, and population density variables, among others. We built separate predictive models for each year to estimate PM2.5 concentrations using two machine learning approaches, gradient boosted trees and random forest. We evaluated prediction performance using in-sample and out-of-sample validation.
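The validation design described in the Methods can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not taken from the study: the data are synthetic (AOD, PM2.5) pairs, the out-of-sample scheme is a simple random hold-out split, and the learner is a one-nearest-neighbor stand-in rather than gradient boosted trees or random forest. The stand-in memorizes its training data, which makes the gap between in-sample and out-of-sample R2 easy to see.

```python
import random
from statistics import mean


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    obs_mean = mean(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - obs_mean) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def fit_nearest_neighbor(train_rows):
    """Stand-in learner (NOT the study's model): return the PM2.5 value
    of the training row whose AOD is closest to the query AOD."""
    def predict(aod):
        return min(train_rows, key=lambda row: abs(row[0] - aod))[1]
    return predict


def validate_one_year(rows, fit, holdout_fraction=0.2, seed=0):
    """Fit on a random training subset; score in-sample and on held-out rows.
    The hold-out split is an assumed scheme, chosen only for illustration."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * holdout_fraction)
    test_rows, train_rows = shuffled[:n_test], shuffled[n_test:]
    predict = fit(train_rows)

    def score(subset):
        return r_squared([y for _, y in subset], [predict(x) for x, _ in subset])

    return score(train_rows), score(test_rows)


def synthetic_year(n, seed):
    """Hypothetical (AOD, PM2.5) pairs: a linear signal plus Gaussian noise."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        aod = rng.uniform(0.0, 1.0)
        rows.append((aod, 5.0 + 20.0 * aod + rng.gauss(0.0, 3.0)))
    return rows


# One model per year, mirroring the year-by-year design described above.
results = {
    year: validate_one_year(synthetic_year(200, seed=year), fit_nearest_neighbor)
    for year in range(2013, 2018)
}
for year, (r2_in, r2_out) in results.items():
    print(f"{year}: in-sample R2 = {r2_in:.2f}, out-of-sample R2 = {r2_out:.2f}")
```

Because `validate_one_year` accepts any `fit` function that returns a predictor, a tree-ensemble learner could be substituted for the stand-in without changing the validation logic.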

Results

Our predictive models demonstrate excellent in-sample model performance, as indicated by high R2 values generated from the gradient boosting models (0.94–0.97) and random forest models (0.81–0.90). However, the out-of-sample R2 values fall within a range of 0.52–0.75 for gradient boosting models and 0.44–0.69 for random forest models. Model performance varies slightly across years. A generally decreasing trend in predicted PM2.5 concentrations over time is observed in Eastern Texas.

Impact statement

We used machine learning approaches to predict PM2.5 levels in Texas. Both gradient boosting and random forest models perform well, with gradient boosting performing slightly better. The gradient boosting models showed excellent in-sample prediction performance (R2 > 0.9).

Fig. 1: Comparison between PM2.5 observations and predictions calculated from gradient boosting models through out-of-sample validation.
Fig. 2

Data availability

The datasets generated during and/or analyzed during the current study are available from the corresponding author on reasonable request.

Acknowledgements

The study was supported by the Environmental Defense Fund. KZ was also supported by the American Heart Association grant (19TPA34830085) and the Empire Innovation Program (EIP) of the State University of New York. This paper does not necessarily reflect the views of the Environmental Defense Fund or the University at Albany.

Author information

Contributions

Kai Zhang: Conceptualization, Methodology, Writing – review & editing, Supervision, Funding acquisition. Jeffrey Lin: Formal analysis, Writing – review & editing, Visualization. Yuanfei Li: Writing – review & editing. Yue Sun: Data curation. Weitian Tong: Methodology, Writing – review & editing. Fangyu Li: Writing – review & editing. Lung-Chang Chien: Methodology, Writing – review & editing. Yiping Yang: Formal analysis. Wei-Chung Su: Writing – review & editing. Hezhong Tian: Writing – review & editing. Peng Fu: Data curation. Fengxiang Qiao: Writing – review & editing. Xiaobo Xue Romeiko: Writing – review & editing. Shao Lin: Writing – review & editing. Sheng Luo: Methodology, Writing – review & editing. Elena Craft: Conceptualization, Supervision, Funding acquisition.

Corresponding author

Correspondence to Kai Zhang.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Zhang, K., Lin, J., Li, Y. et al. Unmasking the sky: high-resolution PM2.5 prediction in Texas using machine learning techniques. J Expo Sci Environ Epidemiol (2024). https://doi.org/10.1038/s41370-024-00659-w

  • DOI: https://doi.org/10.1038/s41370-024-00659-w
