
Non-linear probabilistic calibration of low-cost environmental air pollution sensor networks for neighborhood level spatiotemporal exposure assessment



Low-cost sensor networks for monitoring air pollution are an effective tool for expanding spatial resolution beyond the capabilities of existing state and federal reference monitoring stations. However, low-cost sensor data commonly exhibit non-linear biases with respect to environmental conditions that cannot be captured by linear models, therefore requiring extensive lab calibration. Further, these calibration models traditionally produce point estimates or uniform variance predictions, which limits their downstream use in exposure assessment.


Build direct field-calibration models using probabilistic gradient boosted decision trees (GBDT) that eliminate the need for resource-intensive lab calibration and that can be used to conduct probabilistic exposure assessments on the neighborhood level.


Using data from Plantower A003 particulate matter (PM) sensors deployed in Baltimore, MD from November 2018 through November 2019, a fully probabilistic NGBoost GBDT was trained on raw data from sensors co-located with a federal reference monitoring station and compared against a linear regression trained on lab-calibrated sensor data. The NGBoost predictions were then used in a Monte Carlo interpolation process to generate high spatial resolution probabilistic exposure gradients across Baltimore.


We demonstrate that direct field-calibration of the raw PM2.5 sensor data using a probabilistic GBDT improves both point and distributional accuracy relative to the linear model, particularly at reference measurements exceeding 25 μg/m3, and that the improvement also holds on monitors not included in the training set.
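To make the distinction between point accuracy and distributional accuracy concrete, the sketch below scores a set of predictions two ways. It is an illustration only: the abstract does not state which metrics were used, so root mean squared error (RMSE) and the continuous ranked probability score (CRPS) for a Gaussian predictive distribution are assumptions here, and the function names and numbers are hypothetical.

```python
import math


def rmse(y_true, y_pred):
    """Point accuracy: root mean squared error of the predicted means."""
    n = len(y_true)
    return math.sqrt(sum((y - p) ** 2 for y, p in zip(y_true, y_pred)) / n)


def gaussian_crps(mu, sigma, y):
    """Closed-form CRPS of a Normal(mu, sigma) forecast at observation y.

    Lower is better. The score penalizes both a misplaced mean and a
    predictive spread that is too wide or too narrow.
    """
    z = (y - mu) / sigma
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return sigma * (z * (2.0 * cdf - 1.0) + 2.0 * pdf - 1.0 / math.sqrt(math.pi))


def mean_crps(mus, sigmas, ys):
    """Average CRPS over a set of forecasts."""
    scores = [gaussian_crps(m, s, y) for m, s, y in zip(mus, sigmas, ys)]
    return sum(scores) / len(scores)


# Hypothetical reference measurements and calibrated predictions (ug/m3).
reference = [8.0, 12.0, 30.0]
pred_mean = [9.0, 11.0, 27.0]
pred_sd = [1.5, 1.5, 4.0]  # per-observation spread, wider at high concentration

print("RMSE:", rmse(reference, pred_mean))
print("Mean CRPS:", mean_crps(pred_mean, pred_sd, reference))
```

Two models with identical predicted means have the same RMSE, but the one whose predicted spread better matches the actual errors obtains the lower CRPS. That is the sense in which a model with per-observation variance can outscore one with uniform variance.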


We provide a framework for using the GBDT with inverse distance weighting to conduct probabilistic spatial assessments of human exposure. The framework predicts the probability that a given location exceeds an exposure threshold and provides percentiles of exposure. These probabilistic spatial exposure assessments can be scaled in time and space with minimal modifications. Here, we used the probabilistic exposure assessment methodology to create high-quality spatiotemporal PM2.5 maps at the neighborhood scale in Baltimore, MD.
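The Monte Carlo interpolation idea can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes each sensor's calibrated prediction is Gaussian with a known mean and standard deviation, treats sensors as independent, and uses an inverse distance weighting power of 2. The function names, coordinates, predictive values, and the 25 μg/m3 threshold are all hypothetical.

```python
import math
import random


def idw_weights(target, sites, power=2.0):
    """Normalized inverse distance weights from each sensor site to a target."""
    dists = [math.dist(target, s) for s in sites]
    # If the target coincides with a sensor, that sensor gets all the weight.
    for i, d in enumerate(dists):
        if d == 0.0:
            return [1.0 if j == i else 0.0 for j in range(len(sites))]
    raw = [1.0 / d ** power for d in dists]
    total = sum(raw)
    return [w / total for w in raw]


def monte_carlo_idw(target, sites, means, sds, n_draws=5000, seed=0):
    """Sample every sensor's predictive distribution, then interpolate each draw."""
    rng = random.Random(seed)
    weights = idw_weights(target, sites)
    draws = []
    for _ in range(n_draws):
        sample = [rng.gauss(m, s) for m, s in zip(means, sds)]
        draws.append(sum(w * x for w, x in zip(weights, sample)))
    return draws


def exceedance_probability(draws, threshold):
    """Fraction of interpolated draws above the exposure threshold."""
    return sum(d > threshold for d in draws) / len(draws)


def percentile(draws, q):
    """Nearest-rank percentile of the draws, with q in [0, 100]."""
    ordered = sorted(draws)
    k = max(0, math.ceil(q / 100.0 * len(ordered)) - 1)
    return ordered[k]


# Hypothetical sensor locations (km) and calibrated predictions (ug/m3).
sites = [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0)]
means = [10.0, 20.0, 30.0]
sds = [2.0, 3.0, 4.0]

draws = monte_carlo_idw((1.0, 1.0), sites, means, sds)
print("P(exposure > 25):", exceedance_probability(draws, 25.0))
print("5th/50th/95th percentiles:", [percentile(draws, q) for q in (5, 50, 95)])
```

Repeating the same calculation over a grid of target points and over successive time steps would give a map of exceedance probabilities or exposure percentiles, which is one way to read the claim that the assessment scales in time and space.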

Impact statement

  • We demonstrate how the use of open-source probabilistic machine learning models for in-place sensor calibration outperforms traditional linear models and does not require an initial laboratory calibration step. Further, these probabilistic models can create uniquely probabilistic spatial exposure assessments following a Monte Carlo interpolation process.


Data availability

The datasets generated and/or analyzed during the current study are not publicly available because they are part of ongoing research, but they are available from the corresponding author on reasonable request.




This manuscript has not been formally reviewed by the Environmental Protection Agency (EPA). The views expressed in this document are solely those of the authors and do not necessarily reflect those of the Agency. The EPA does not endorse any products or commercial services mentioned in this publication. Further, any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.


This publication was developed under Assistance Agreement no. RD835871 awarded by the U.S. Environmental Protection Agency to Yale University. AP was supported by a grant from the U.S. Centers for Disease Control and Prevention, National Institute for Occupational Safety and Health to the Johns Hopkins Education and Research Center for Occupational Safety and Health (award number T42 OH0008428). AD was supported by the National Science Foundation DMS-1915803 and the National Institute of Environmental Health Sciences (NIEHS) grant R01ES033739. CB was supported by the National Science Foundation Graduate Research Fellowship Program under Grant No. DGE1752134. DG and FX would also like to acknowledge support from HKF Technology and Ken Hu. MLZ was supported by the National Institute of Environmental Health Sciences of the National Institutes of Health under award numbers K99ES029116 and R00ES029116.

Author information


AP: conceptualization, methodology, software, formal analysis, data curation, writing – original draft, writing – review & editing, visualization. AD: methodology, software, formal analysis, data curation, writing – original draft, writing – review & editing. MLZ: data curation, writing – review & editing, investigation. CB: writing – review & editing, investigation. FX: writing – review & editing, investigation. DG: writing – review & editing, investigation, funding acquisition. KK: conceptualization, methodology, data curation, writing – original draft, writing – review & editing, supervision, funding acquisition.

Corresponding author

Correspondence to Andrew Patton.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Patton, A., Datta, A., Zamora, M.L. et al. Non-linear probabilistic calibration of low-cost environmental air pollution sensor networks for neighborhood level spatiotemporal exposure assessment. J Expo Sci Environ Epidemiol 32, 908–916 (2022).
