Machine learning for environmental monitoring


Public agencies that enforce environmental regulation have limited resources to achieve their objectives. We demonstrate how machine-learning methods can inform the efficient use of these resources while accounting for real-world concerns, such as gaming the system and institutional constraints. Here, we predict the likelihood of a facility failing a water-pollution inspection and propose alternative inspection allocations that would target high-risk facilities. Implementing such a data-driven inspection allocation could detect over seven times as many violations, in expectation, as current practices. When we impose constraints, such as maintaining a minimum probability of inspection for all facilities and accounting for state-level differences in inspection budgets, our reallocation regimes double the number of violations detected through inspections. Leveraging the growing volume of electronic data can help public agencies to enhance their regulatory effectiveness and remedy environmental harms. Although algorithm-based resource allocation rules require care to avoid manipulation and unintentional error propagation, the principled use of predictive analytics can extend the beneficial reach of limited resources.
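The constrained reallocation described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the facility records, the risk scores and the greedy rule are all assumptions made for illustration. The sketch assumes each facility already has a predicted probability of failing an inspection, gives every facility a floor probability of being inspected, and spends the rest of each state's budget on the highest-risk facilities in that state.

```python
from collections import defaultdict


def inspection_probabilities(risks, budget, min_prob=0.0):
    """Assign each facility an inspection probability p_i.

    Maximizes sum(p_i * risk_i) subject to sum(p_i) == budget and
    min_prob <= p_i <= 1. A greedy top-up by descending risk solves this.
    """
    n = len(risks)
    if not 0.0 <= min_prob <= 1.0:
        raise ValueError("min_prob must lie in [0, 1]")
    if budget < min_prob * n or budget > n:
        raise ValueError("budget is incompatible with the floor or facility count")

    probs = [min_prob] * n            # every facility keeps the floor
    remaining = budget - min_prob * n  # expected inspections left to target
    for i in sorted(range(n), key=lambda k: risks[k], reverse=True):
        if remaining <= 0:
            break
        top_up = min(1.0 - min_prob, remaining)
        probs[i] += top_up
        remaining -= top_up
    return probs


def allocate_by_state(facilities, state_budgets, min_prob=0.0):
    """Apply the allocation separately within each state's budget.

    facilities: iterable of (facility_id, state, risk_score) tuples.
    Returns a dict mapping facility_id to inspection probability.
    """
    by_state = defaultdict(list)
    for facility_id, state, risk in facilities:
        by_state[state].append((facility_id, risk))

    plan = {}
    for state, members in by_state.items():
        probs = inspection_probabilities(
            [risk for _, risk in members], state_budgets[state], min_prob
        )
        for (facility_id, _), p in zip(members, probs):
            plan[facility_id] = p
    return plan


def expected_violations(facilities, plan):
    """Expected number of violations detected under an inspection plan."""
    return sum(plan[fid] * risk for fid, _, risk in facilities)


# Hypothetical facilities and budgets, invented for this example.
facilities = [
    ("A1", "CA", 0.8), ("A2", "CA", 0.5), ("A3", "CA", 0.25), ("A4", "CA", 0.125),
    ("B1", "TX", 0.75), ("B2", "TX", 0.25),
]
budgets = {"CA": 2, "TX": 1}

targeted = allocate_by_state(facilities, budgets, min_prob=0.25)
uniform = allocate_by_state(facilities, budgets, min_prob=0.5)  # floor uses the whole budget
print(expected_violations(facilities, targeted), expected_violations(facilities, uniform))
```

Keeping a non-zero floor means that no facility can be certain it will escape inspection, which is one way to address the gaming concern raised above. The cost is that fewer inspections remain for the highest-risk facilities.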

Fig. 1: Rate of inspection failure by predicted risk score.
Fig. 2: Inspection failure rates under different reallocations.
Fig. 3: Inspected facilities under different allocations.
Fig. 4: Influence of self-reported data on risk scores.

Data availability

The raw data used in this analysis can be downloaded from the EPA’s ECHO website. The processed datasets are also available with code at the Stanford Digital Repository.



We thank S. Athey, M. Burke, F. Burlig, K. Mach, A. D’Agostino, C. Anderson, K. Green, S. Hasan, D. Jiménez, H. Kim, A. R. Siders and A. Stock for comments. E.B. receives funding from the National Science Foundation Graduate Research Fellowship Program (DGE-114747), M.H. from the Department of Earth System Science at Stanford University, and N.B. from the Stanford Graduate Fellowship/David and Lucile Packard Foundation.

Author information

All three authors collaboratively designed the study, developed the methodology, assembled the data, wrote the code, performed the analysis, interpreted the results, and wrote the manuscript. E.B. and M.H. conducted the final analysis, with substantial input from N.B.

Corresponding author

Correspondence to E. Benami.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary Note 1, Supplementary Figures 1–6, Supplementary Tables 1–6, Supplementary References 1–4

About this article

Cite this article

Hino, M., Benami, E. & Brooks, N. Machine learning for environmental monitoring. Nat Sustain 1, 583–588 (2018).
