Machine learning for environmental monitoring

Abstract

Public agencies aiming to enforce environmental regulation have limited resources to achieve their objectives. We demonstrate how machine-learning methods can inform the efficient use of these limited resources while accounting for real-world concerns, such as gaming the system and institutional constraints. Here, we predict the likelihood of a facility failing a water-pollution inspection and propose alternative inspection allocations that would target high-risk facilities. Implementing such a data-driven inspection allocation could detect over seven times the expected number of violations that current practices detect. When we impose constraints, such as maintaining a minimum probability of inspection for all facilities and accounting for state-level differences in inspection budgets, our reallocation regimes double the number of violations detected through inspections. Leveraging increasing amounts of electronic data can help public agencies to enhance their regulatory effectiveness and remedy environmental harms. Although employing algorithm-based resource allocation rules requires care to avoid manipulation and unintentional error propagation, the principled use of predictive analytics can extend the beneficial reach of limited resources.
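The allocation idea in the abstract can be illustrated with a small sketch. This is not the authors' code or model: the function names, the greedy top-up rule, and the toy risk scores below are all assumptions made for illustration. It assumes each facility already has a predicted probability of failing an inspection, gives every facility a minimum inspection probability, and spends the remaining budget on the highest-risk facilities.

```python
def allocate_inspections(risk_scores, budget, min_prob=0.0):
    """Return per-facility inspection probabilities (hypothetical scheme).

    Every facility receives at least `min_prob`. The remaining budget
    (in expected number of inspections) tops facilities up to
    probability 1, in descending order of predicted risk.
    """
    n = len(risk_scores)
    if not 0.0 <= min_prob <= 1.0:
        raise ValueError("min_prob must lie in [0, 1]")
    if budget < min_prob * n or budget > n:
        raise ValueError("budget must lie between min_prob * n and n")

    probs = [min_prob] * n
    remaining = budget - min_prob * n
    # Visit facilities from highest to lowest predicted risk.
    order = sorted(range(n), key=lambda i: risk_scores[i], reverse=True)
    for i in order:
        if remaining <= 0:
            break
        top_up = min(1.0 - min_prob, remaining)
        probs[i] += top_up
        remaining -= top_up
    return probs


def expected_detections(risk_scores, probs):
    """Expected violations found: sum of P(inspect) * P(fail)."""
    return sum(r * p for r, p in zip(risk_scores, probs))


# Toy example with made-up risk scores (not data from the study).
risk = [0.9, 0.1, 0.5, 0.2]
budget = 2  # expected number of inspections available

uniform = [budget / len(risk)] * len(risk)
targeted = allocate_inspections(risk, budget)
with_floor = allocate_inspections(risk, budget, min_prob=0.25)

baseline = expected_detections(risk, uniform)
unconstrained = expected_detections(risk, targeted)
constrained = expected_detections(risk, with_floor)
```

With these toy numbers, the expected detections are roughly 0.85 for the uniform baseline, 1.4 for the unconstrained targeted allocation, and 1.225 once the minimum inspection probability is imposed, which shows how the floor trades some detection for coverage. State-level budgets could be handled, under the same assumptions, by calling `allocate_inspections` separately for each state's facilities and budget.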

Fig. 1: Rate of inspection failure by predicted risk score.
Fig. 2: Inspection failure rates under different reallocations.
Fig. 3: Inspected facilities under different allocations.
Fig. 4: Influence of self-reported data on risk scores.

Data availability

The raw data used in this analysis can be downloaded from the EPA’s ECHO website (https://echo.epa.gov/). The processed datasets are also available with code at the Stanford Digital Repository (https://purl.stanford.edu/hr919hp5420).

Acknowledgements

We thank S. Athey, M. Burke, F. Burlig, K. Mach, A. D’Agostino, C. Anderson, K. Green, S. Hasan, D. Jiménez, H. Kim, A. R. Siders and A. Stock for comments. E.B. receives funding from the National Science Foundation Graduate Research Fellowship Program (DGE-114747), M.H. from the Department of Earth System Science at Stanford University, and N.B. from the Stanford Graduate Fellowship/David and Lucile Packard Foundation.

Author information

Contributions

All three authors collaboratively designed the study, developed the methodology, assembled the data, wrote the code, performed the analysis, interpreted the results, and wrote the manuscript. E.B. and M.H. conducted the final analysis, with substantial input from N.B.

Corresponding author

Correspondence to E. Benami.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary Note 1, Supplementary Figures 1–6, Supplementary Tables 1–6, Supplementary References 1–4

Cite this article

Hino, M., Benami, E. & Brooks, N. Machine learning for environmental monitoring. Nat Sustain 1, 583–588 (2018). https://doi.org/10.1038/s41893-018-0142-9
