Abstract
Background
Although PM2.5 (fine particulate matter with an aerodynamic diameter less than 2.5 µm) is an air pollutant of great concern in Texas, the limited number of regulatory monitors poses a significant challenge for decision-making and environmental studies.
Objective
This study aimed to predict PM2.5 concentrations at a fine spatial scale on a daily basis by using novel machine learning approaches and incorporating satellite-derived Aerosol Optical Depth (AOD) and a variety of weather and land use variables.
Methods
We compiled a comprehensive dataset for Texas from 2013 to 2017, including ground-level PM2.5 concentrations from regulatory monitors; AOD values at 1-km resolution based on MODIS satellite imagery; and weather, land-use, and population density variables, among others. We built predictive models for each year separately to estimate PM2.5 concentrations using two machine learning approaches: gradient boosted trees and random forest. We evaluated model prediction performance using in-sample and out-of-sample validation.
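The workflow described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not name the software used or describe the out-of-sample scheme, so the use of scikit-learn, the synthetic data, the three predictor names (`aod`, `temperature`, `road_density`), and the choice to hold out whole monitors are all assumptions made for illustration. In the study's setup, a fit like this would be repeated separately for each year.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import GroupShuffleSplit

rng = np.random.default_rng(0)

# Synthetic stand-in data (NOT the study's dataset): 40 hypothetical monitors,
# 60 days each, with three made-up predictors.
n_monitors, n_days = 40, 60
monitor_id = np.repeat(np.arange(n_monitors), n_days)
n_obs = monitor_id.size
aod = rng.gamma(shape=2.0, scale=0.1, size=n_obs)
temperature = rng.normal(25.0, 8.0, size=n_obs)
road_density = np.repeat(rng.uniform(0.0, 1.0, n_monitors), n_days)
X = np.column_stack([aod, temperature, road_density])

# Hypothetical PM2.5 response: a linear signal plus measurement noise.
pm25 = (5.0 + 30.0 * aod + 0.1 * temperature + 4.0 * road_density
        + rng.normal(0.0, 2.0, size=n_obs))

# Assumed validation scheme: hold out whole monitors, so the test locations
# are never seen during training.
splitter = GroupShuffleSplit(n_splits=1, test_size=0.25, random_state=0)
train_idx, test_idx = next(splitter.split(X, pm25, groups=monitor_id))

models = {
    "gradient_boosting": GradientBoostingRegressor(random_state=0),
    "random_forest": RandomForestRegressor(n_estimators=100, random_state=0),
}

scores = {}
for name, model in models.items():
    model.fit(X[train_idx], pm25[train_idx])
    scores[name] = {
        # R2 on the data the model was fitted to.
        "in_sample_r2": r2_score(pm25[train_idx], model.predict(X[train_idx])),
        # R2 on the held-out monitors.
        "out_of_sample_r2": r2_score(pm25[test_idx], model.predict(X[test_idx])),
    }

for name, s in scores.items():
    print(f"{name}: in-sample R2 = {s['in_sample_r2']:.2f}, "
          f"out-of-sample R2 = {s['out_of_sample_r2']:.2f}")
```

Splitting by monitor rather than by row is one plausible way to test prediction at unmonitored locations; a random row-level split would usually give more optimistic out-of-sample scores, because observations from the same monitor would appear in both sets.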
Results
Our predictive models demonstrate excellent in-sample model performance, as indicated by high R2 values generated from the gradient boosting models (0.94–0.97) and random forest models (0.81–0.90). However, the out-of-sample R2 values fall within a range of 0.52–0.75 for gradient boosting models and 0.44–0.69 for random forest models. Model performance varies slightly across years. A generally decreasing trend in predicted PM2.5 concentrations over time is observed in Eastern Texas.
Impact statement
We utilized machine learning approaches to predict PM2.5 levels in Texas. Both gradient boosting and random forest models perform well. Gradient boosting models perform slightly better than random forest models. Our models showed excellent in-sample prediction performance (R2 > 0.9).
Data availability
The datasets generated during and/or analyzed during the current study are available from the corresponding author on reasonable request.
Acknowledgements
The study was supported by the Environmental Defense Fund. KZ was also supported by the American Heart Association grant (19TPA34830085) and the Empire Innovation Program (EIP) of the State University of New York. This paper does not necessarily reflect the views of the Environmental Defense Fund or the University at Albany.
Author information
Contributions
Kai Zhang: Conceptualization, Methodology, Writing – review & editing, Supervision, Funding acquisition. Jeffrey Lin: Formal analysis, Writing – review & editing, Visualization. Yuanfei Li: Writing – review & editing. Yue Sun: Data curation. Weitian Tong: Methodology, Writing – review & editing. Fangyu Li: Writing – review & editing. Lung-Chang Chien: Methodology, Writing – review & editing. Yiping Yang: Formal analysis. Wei-Chung Su: Writing – review & editing. Hezhong Tian: Writing – review & editing. Peng Fu: Data curation. Fengxiang Qiao: Writing – review & editing. Xiaobo Xue Romeiko: Writing – review & editing. Shao Lin: Writing – review & editing. Sheng Luo: Methodology, Writing – review & editing. Elena Craft: Conceptualization, Supervision, and Funding acquisition.
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Zhang, K., Lin, J., Li, Y. et al. Unmasking the sky: high-resolution PM2.5 prediction in Texas using machine learning techniques. J Expo Sci Environ Epidemiol (2024). https://doi.org/10.1038/s41370-024-00659-w
DOI: https://doi.org/10.1038/s41370-024-00659-w