Abstract
Background
Low-cost sensor networks for monitoring air pollution are an effective tool for expanding spatial resolution beyond the capabilities of existing state and federal reference monitoring stations. However, low-cost sensor data commonly exhibit non-linear biases with respect to environmental conditions that linear models cannot capture, and they therefore require extensive lab calibration. Further, these calibration models traditionally produce point estimates or uniform-variance predictions, which limits their downstream use in exposure assessment.
Objective
Build direct field-calibration models using probabilistic gradient boosted decision trees (GBDT) that eliminate the need for resource-intensive lab calibration and that can be used to conduct probabilistic exposure assessments at the neighborhood level.
Methods
We used data from Plantower A003 particulate matter (PM) sensors deployed in Baltimore, MD from November 2018 through November 2019. A fully probabilistic NGBoost GBDT was trained on raw data from sensors co-located with a federal reference monitoring station and compared against linear regression trained on lab-calibrated sensor data. The NGBoost predictions were then used in a Monte Carlo interpolation process to generate high-spatial-resolution probabilistic exposure gradients across Baltimore.
Results
We demonstrate that direct field calibration of the raw PM2.5 sensor data using a probabilistic GBDT improves both point and distributional accuracy relative to the linear model, particularly at reference measurements exceeding 25 μg/m³, and that the improvement also holds for monitors not included in the training set.
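The distinction between point accuracy and distributional accuracy can be illustrated with a minimal Python sketch. It assumes Gaussian predictive distributions and uses root-mean-square error and the mean negative log-likelihood (a strictly proper scoring rule) as stand-ins; the abstract does not state which metrics the study used, and every number below is hypothetical, not study data.

```python
import math

def rmse(y_true, mu):
    """Point accuracy: root-mean-square error of the predicted means."""
    return math.sqrt(sum((y - m) ** 2 for y, m in zip(y_true, mu)) / len(y_true))

def gaussian_nll(y_true, mu, sigma):
    """Distributional accuracy: mean negative log-likelihood under Normal(mu_i, sigma_i)."""
    total = 0.0
    for y, m, s in zip(y_true, mu, sigma):
        total += 0.5 * math.log(2 * math.pi * s ** 2) + (y - m) ** 2 / (2 * s ** 2)
    return total / len(y_true)

# Hypothetical reference PM2.5 readings and calibrated predictions (ug/m3).
reference = [8.0, 12.0, 30.0, 45.0]
pred_mean = [9.0, 11.0, 26.0, 52.0]

# Same predicted means, two different uncertainty models (both hypothetical):
uniform_sigma = [4.0, 4.0, 4.0, 4.0]   # one variance for every prediction
varying_sigma = [1.5, 1.5, 5.0, 8.0]   # wider uncertainty at high concentrations

point_error = rmse(reference, pred_mean)
nll_uniform = gaussian_nll(reference, pred_mean, uniform_sigma)
nll_varying = gaussian_nll(reference, pred_mean, varying_sigma)

print(f"RMSE (identical for both models): {point_error:.2f}")
print(f"NLL, uniform variance: {nll_uniform:.3f}")
print(f"NLL, varying variance: {nll_varying:.3f}")
```

In this toy case the two models share the same point error, so only a distributional score can tell them apart; the model whose predicted spread tracks the size of its errors receives the lower (better) negative log-likelihood.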
Significance
We provide a framework for using the GBDT to conduct probabilistic spatial assessments of human exposure with inverse distance weighting. The framework predicts the probability that a given location exceeds an exposure threshold and provides percentiles of exposure. These probabilistic spatial exposure assessments can be scaled in time and space with minimal modifications. Here, we used the probabilistic exposure assessment methodology to create high-quality spatiotemporal PM2.5 maps at the neighborhood scale in Baltimore, MD.
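A Monte Carlo interpolation of this kind can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: each sensor's calibrated prediction is taken to be a Normal distribution, draws are independent across sensors, the inverse distance weighting power is 2, and the coordinates, means, standard deviations, and threshold are all hypothetical.

```python
import math
import random

def idw(target, sites, values, power=2.0):
    """Inverse-distance-weighted average of sensor values at a target (x, y) point."""
    weights = []
    for (sx, sy), v in zip(sites, values):
        d = math.hypot(target[0] - sx, target[1] - sy)
        if d == 0.0:
            return v  # target coincides with a sensor location
        weights.append(1.0 / d ** power)
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)

def monte_carlo_idw(target, sites, means, sds, n_draws=5000, seed=0):
    """Sample each sensor's predictive distribution, interpolate each draw, return sorted results."""
    rng = random.Random(seed)
    draws = []
    for _ in range(n_draws):
        sampled = [rng.gauss(m, s) for m, s in zip(means, sds)]
        draws.append(idw(target, sites, sampled))
    draws.sort()
    return draws

def percentile(sorted_draws, q):
    """Nearest-rank percentile of an ascending list, q in [0, 100]."""
    n = len(sorted_draws)
    idx = min(n - 1, max(0, math.ceil(q / 100 * n) - 1))
    return sorted_draws[idx]

def exceedance_probability(draws, threshold):
    """Fraction of Monte Carlo draws above the threshold."""
    return sum(d > threshold for d in draws) / len(draws)

# Hypothetical sensor layout and calibrated predictive distributions (ug/m3).
sites = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
means = [10.0, 14.0, 20.0]
sds = [2.0, 3.0, 4.0]
target = (0.25, 0.25)   # unmonitored location of interest
threshold = 15.0        # hypothetical exposure threshold

draws = monte_carlo_idw(target, sites, means, sds)
p5, p50, p95 = (percentile(draws, q) for q in (5, 50, 95))
p_exceed = exceedance_probability(draws, threshold)
print(f"5th/50th/95th percentiles: {p5:.1f}, {p50:.1f}, {p95:.1f}")
print(f"P(exposure > {threshold}): {p_exceed:.3f}")
```

Repeating the calculation over a grid of target points would give an exceedance-probability or percentile map for one time step. Because each location is summarized from the full set of draws rather than from a single interpolated value, the sensors' calibration uncertainty carries through to the exposure estimate.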
Impact statement
We demonstrate that open-source probabilistic machine learning models for in-place sensor calibration outperform traditional linear models and do not require an initial laboratory calibration step. Further, these probabilistic models can produce fully probabilistic spatial exposure assessments through a Monte Carlo interpolation process.
Data availability
The datasets generated and/or analyzed during the current study are not publicly available because they are part of ongoing research, but they are available from the corresponding author on reasonable request.
Acknowledgements
This manuscript has not been formally reviewed by the Environmental Protection Agency (EPA). The views expressed in this document are solely those of the authors and do not necessarily reflect those of the Agency. The EPA does not endorse any products or commercial services mentioned in this publication. Further, any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.
Funding
This publication was developed under Assistance Agreement no. RD835871 awarded by the U.S. Environmental Protection Agency to Yale University. AP was supported by a grant from the U.S. Centers for Disease Control and Prevention, National Institute for Occupational Safety and Health to the Johns Hopkins Education and Research Center for Occupational Safety and Health (award number T42 OH0008428). AD was supported by the National Science Foundation DMS-1915803 and the National Institute of Environmental Health Sciences (NIEHS) grant R01ES033739. CB was supported by the National Science Foundation Graduate Research Fellowship Program under Grant No. DGE1752134. DG and FX would also like to acknowledge support from HKF Technology and Ken Hu. MLZ was supported by the National Institute of Environmental Health Sciences of the National Institutes of Health under award numbers K99ES029116 and R00ES029116.
Author information
Contributions
AP: Conceptualization, methodology, software, formal analysis, data curation, writing – original draft, writing – review & editing, visualization. AD: Methodology, software, formal analysis, data curation, writing – original draft, writing – review & editing. MLZ: Data curation, writing – review & editing, investigation. CB: Writing – review & editing, investigation. FX: Writing – review & editing, investigation. DG: Writing – review & editing, investigation, funding acquisition. KK: Conceptualization, methodology, data curation, writing – original draft, writing – review & editing, supervision, funding acquisition.
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Patton, A., Datta, A., Zamora, M.L. et al. Non-linear probabilistic calibration of low-cost environmental air pollution sensor networks for neighborhood level spatiotemporal exposure assessment. J Expo Sci Environ Epidemiol 32, 908–916 (2022). https://doi.org/10.1038/s41370-022-00493-y