Empirical observation of negligible fairness–accuracy trade-offs in machine learning for public policy

A preprint version of the article is available at arXiv.

Abstract

The growing use of machine learning in policy and social impact settings has raised concerns over fairness implications, especially for racial minorities. These concerns have generated considerable interest among machine learning and artificial intelligence researchers, who have developed new methods and established theoretical bounds for improving fairness, focusing on the source data, regularization and model training, or post-hoc adjustments to model scores. However, few studies have examined the practical trade-offs between fairness and accuracy in real-world settings to understand how these bounds and methods translate into policy choices and impact on society. Our empirical study fills this gap by investigating the impact of mitigating disparities on accuracy, focusing on the common context of using machine learning to inform benefit allocation in resource-constrained programmes across education, mental health, criminal justice and housing safety. Here we describe applied work in which we find fairness–accuracy trade-offs to be negligible in practice. In each setting studied, by explicitly focusing on achieving equity and applying our proposed post-hoc disparity mitigation methods, we improved fairness substantially without sacrificing accuracy. This observation was robust across the policy contexts studied, the scale of resources available for intervention, time and the relative size of the protected groups. These empirical results challenge a commonly held assumption that reducing disparities requires either accepting an appreciable drop in accuracy or developing novel, complex methods, and they make disparity reduction in these applications more practical.
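
The post-hoc disparity mitigation referred to above operates after model training, adjusting how a fixed intervention budget is allocated across protected groups. As a rough illustration only (a minimal Python sketch of one such strategy, not the implementation used in this study; the code actually used is in the peeps-chili repository linked below), a list of k individuals to receive an intervention can be filled greedily, always drawing the next highest-scored candidate from the group whose recall, estimated from labelled validation data, is currently lowest:

```python
import pandas as pd

def equalize_recall_topk(scores, labels, groups, k):
    """Hypothetical sketch of post-hoc disparity mitigation: build an
    intervention list of size k, always taking the next highest-scored
    candidate from the group whose recall (share of its known positives
    already selected) is currently lowest. Not the authors' code."""
    df = pd.DataFrame({"score": scores, "label": labels, "group": groups})
    # Per-group candidate queues, ordered by model score, highest first.
    queues = {g: list(gdf.sort_values("score", ascending=False).index)
              for g, gdf in df.groupby("group")}
    positives = df.groupby("group")["label"].sum()  # known positives per group
    hits = {g: 0 for g in queues}                   # positives selected so far
    chosen = []
    while len(chosen) < k and any(queues.values()):
        # The group with the lowest estimated recall that still has candidates.
        g = min((grp for grp, q in queues.items() if q),
                key=lambda grp: hits[grp] / max(positives[grp], 1))
        idx = queues[g].pop(0)
        hits[g] += int(df.loc[idx, "label"])
        chosen.append(idx)
    return df.loc[chosen]

# Example: a budget of two interventions split across two groups.
print(equalize_recall_topk(scores=[0.9, 0.8, 0.7, 0.6],
                           labels=[1, 0, 1, 1],
                           groups=["a", "a", "b", "b"], k=2))
```

In a deployed setting, the recall estimates would come from a held-out validation cohort and be translated into group-specific score thresholds applied to new data; the repository linked in the code availability statement contains the methods actually evaluated in the paper.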

Fig. 1: Illustration of the methods used and motivating results.
Fig. 2: Comparing equity and performance metrics for different model selection strategies between policy contexts.
Fig. 3: Comparing disparity and performance metrics over time for different model selection strategies in the inmate mental health policy setting.
Fig. 4: Comparing disparity and performance metrics across programme scale and protected group size in the inmate mental health policy setting.

Data availability

Data from the inmate mental health context were shared through a partnership and data use agreement with the county government of Johnson County, KS (which collected and made available data from the county- and city-level agencies in its jurisdiction as described in the Methods). Data from the housing safety context were shared through a partnership and data use agreement with the Code Enforcement Division in the city of San Jose, CA. Data from the student outcomes setting were shared through a partnership and data use agreement with the Ministry of Education in El Salvador. Although the sensitive nature of the data for these three contexts required that the work be performed under strict data use agreements and the data cannot be made publicly available, researchers or practitioners interested in collaborating on these projects or with the agencies involved should contact the corresponding author for more information and introductions. The education crowdfunding dataset is publicly available at https://www.kaggle.com/c/kdd-cup-2014-predicting-excitement-at-donors-choose/data. A database extract with model outputs and disparity mitigation results using this dataset is available for download (see replication instructions in the GitHub repository linked in the code availability statement).

Code availability

The code used here for modelling, disparity mitigation and analysis for all four projects is available at https://github.com/dssg/peeps-chili (ref. 44). Complete instructions for replication of the education crowdfunding results reported here can be found in the README of this repository, along with a step-by-step Jupyter notebook for performing the analysis.

References

  1. Chouldechova, A. Fair prediction with disparate impact: a study of bias in recidivism prediction instruments. Big Data 5, 153–163 (2017).

  2. Skeem, J. L. & Lowenkamp, C. T. Risk, race, and recidivism: predictive bias and disparate impact. Criminology 54, 680–712 (2016).

  3. Angwin, J., Larson, J., Mattu, S. & Kirchner, L. Machine bias. ProPublica (23 May 2016); www.propublica.org/article/machine-bias-risk-assessments-in-criminal-sentencing

  4. Raghavan, M., Barocas, S., Kleinberg, J. & Levy, K. Mitigating bias in algorithmic hiring: evaluating claims and practices. In Proc. 2020 Conference on Fairness, Accountability, and Transparency (eds Hildebrandt, M. & Castillo, C.) 469–481 (ACM, 2020).

  5. Obermeyer, Z., Powers, B., Vogeli, C. & Mullainathan, S. Dissecting racial bias in an algorithm used to manage the health of populations. Science 366, 447–453 (2019).

  6. Ramachandran, A. et al. Predictive analytics for retention in care in an urban HIV clinic. Sci. Rep. https://doi.org/10.1038/s41598-020-62729-x (2020).

  7. Bauman, M. J. et al. Reducing incarceration through prioritized interventions. In Proc. 1st Conference on Computing and Sustainable Societies (COMPASS) (ed. Zegura, E.) 1–8 (ACM, 2018).

  8. Chouldechova, A. et al. A case study of algorithm-assisted decision making in child maltreatment hotline screening decisions. Proc. Mach. Learn. Res. 81, 134–148 (2018).

  9. Potash, E. et al. Predictive modeling for public health: preventing childhood lead poisoning. In Proc. 21st ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (eds Cao, L. & Zhang, C.) 2039–2047 (ACM, 2015).

  10. Chen, I. Y., Johansson, F. D. & Sontag, D. Why is my classifier discriminatory? In Proc. 32nd International Conference on Neural Information Processing Systems (eds Bengio, S., Wallach, H. M., Larochelle, H., Grauman, K. & Cesa-Bianchi, N.) 3539–3550 (NIPS, 2018).

  11. Celis, L. E., Huang, L., Keswani, V. & Vishnoi, N. K. Classification with fairness constraints: a meta-algorithm with provable guarantees. In Proc. 2019 Conference on Fairness, Accountability, and Transparency (eds Boyd, D. & Morgenstern, J.) 319–328 (ACM, 2019).

  12. Zafar, M. B., Valera, I., Rodriguez, M. G. & Gummadi, K. P. Fairness beyond disparate treatment and disparate impact: learning classification without disparate mistreatment. In 26th International World Wide Web Conference (eds Barrett, R. & Cummings, R.) 1171–1180 (WWW, 2017).

  13. Dwork, C., Immorlica, N., Kalai, A. T. & Leiserson, M. Decoupled classifiers for group-fair and efficient machine learning. Proc. Mach. Learn. Res. 81, 119–133 (2018).

  14. Hardt, M., Price, E. & Srebro, N. Equality of opportunity in supervised learning. In Proc. 30th International Conference on Neural Information Processing Systems (eds Lee, D. D., von Luxburg, U., Garnett, R., Sugiyama, M. & Guyon, I.) 3315–3323 (NIPS, 2016).

  15. Rodolfa, K. T. et al. Case study: predictive fairness to reduce misdemeanor recidivism through social service interventions. In Proc. 2020 Conference on Fairness, Accountability, and Transparency (eds Hildebrandt, M. & Castillo, C.) 142–153 (ACM, 2020).

  16. Heidari, H., Gummadi, K. P., Ferrari, C. & Krause, A. Fairness behind a veil of ignorance: a welfare analysis for automated decision making. In Proc. 32nd International Conference on Neural Information Processing Systems (eds Bengio, S., Wallach, H. M., Larochelle, H., Grauman, K. & Cesa-Bianchi, N.) 1265–1276 (NIPS, 2018).

  17. Friedler, S. A. et al. A comparative study of fairness-enhancing interventions in machine learning. In Proc. 2019 Conference on Fairness, Accountability, and Transparency (eds Boyd, D. & Morgenstern, J.) 329–338 (ACM, 2019).

  18. Kearns, M., Roth, A., Neel, S. & Wu, Z. S. An empirical study of rich subgroup fairness for machine learning. In Proc. 2019 Conference on Fairness, Accountability, and Transparency (eds Boyd, D. & Morgenstern, J.) 100–109 (ACM, 2019).

  19. Zafar, M. B., Valera, I., Rodriguez, M. G. & Gummadi, K. P. Fairness constraints: mechanisms for fair classification. In Proc. 20th International Conference on Artificial Intelligence and Statistics (eds Singh, A. & Zhu, J.) 962–970 (PMLR, 2017).

  20. Ghani, R., Walsh, J. & Wang, J. Top 10 ways your Machine Learning models may have leakage (Data Science for Social Good Blog, 2020); http://www.rayidghani.com/2020/01/24/top-10-ways-your-machine-learning-models-may-have-leakage

  21. Verma, S. & Rubin, J. Fairness definitions explained. In Proc. 2018 International Workshop on Software Fairness (eds Brun, Y., Johnson, B. & Meliou, A.) 1–7 (IEEE/ACM, 2018).

  22. Gajane, P. & Pechenizkiy, M. On formalizing fairness in prediction with machine learning. Preprint at https://arxiv.org/abs/1710.03184 (2018).

  23. Kleinberg, J. M., Mullainathan, S. & Raghavan, M. Inherent trade-offs in the fair determination of risk scores. In Proc. 8th Innovations in Theoretical Computer Science Conference (ed. Psounis, K.) 1–43 (ITCS, 2017).

  24. Krishna Menon, A. & Williamson, R. C. The cost of fairness in binary classification. In Proc. 1st Conference on Fairness, Accountability, and Transparency (eds Friedler, S. & Wilson, C.) 107–118 (PMLR, 2018).

  25. Huq, A. Racial equity in algorithmic criminal justice. Duke Law J. 68, 1043–1134 (2019).

  26. Hamilton, M. People with complex needs and the criminal justice system. Curr. Iss. Crim. Justice 22, 307–324 (2010).

  27. James, D. J. & Glaze, L. E. Mental Health Problems of Prison and Jail Inmates (Department of Justice, Bureau of Justice Statistics, 2006); https://www.bjs.gov/content/pub/pdf/mhppji.pdf

  28. Fuller Torrey, E., Kennard, A. D., Eslinger, D., Lamb, R. & Pavle, J. More Mentally Ill Persons Are in Jails and Prisons Than Hospitals: A Survey of the States (Treatment Advocacy Center and National Sheriffs’ Association, 2010); http://tulare.networkofcare.org/library/final_jails_v_hospitals_study1.pdf

  29. Holtzen, H., Klein, E. G., Keller, B. & Hood, N. Perceptions of physical inspections as a tool to protect housing quality and promote health equity. J. Health Care Poor Underserv. 27, 549–559 (2016).

  30. Klein, E., Keller, B., Hood, N. & Holtzen, H. Affordable housing and health: a health impact assessment on physical inspection frequency. J. Public Health Manage. Practice 21, 368–374 (2015).

  31. Athey, S. Beyond prediction: using big data for policy problems. Science 355, 483–485 (2017).

  32. Glaeser, E. L., Hillis, A., Kominers, S. D. & Luca, M. Crowdsourcing city government: using tournaments to improve inspection accuracy. Am. Econ. Rev. 106, 114–118 (2016).

  33. Levin, H. M. & Belfield, C. The Price We Pay: Economic and Social Consequences of Inadequate Education (Brookings Institution, 2007).

  34. Atwell, M. N., Balfanz, R., Bridgeland, J. & Ingram, E. Building a Grad Nation (America’s Promise Alliance, 2019); https://www.americaspromise.org/2019-building-grad-nation-report

  35. Lakkaraju, H. et al. A machine learning framework to identify students at risk of adverse academic outcomes. In Proc. 21st ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (eds Cao, L. & Zhang, C.) 1909–1918 (ACM, 2015).

  36. Aguiar, E. et al. Who, when, and why: a machine learning approach to prioritizing students at risk of not graduating high school on time. In Proc. Fifth International Conference on Learning Analytics and Knowledge (eds Baron, J., Lynch, G. & Maziarz, N.) 93–102 (ACM, 2015).

  37. Bowers, A. J., Sprott, R. & Taff, S. A. Do we know who will drop out? A review of the predictors of dropping out of high school: precision, sensitivity, and specificity. High School J. 96, 77–100 (2012).

  38. Morgan, I. & Amerikaner, A. Funding Gaps 2018 (The Education Trust, 2018); https://edtrust.org/wp-content/uploads/2014/09/FundingGapReport_2018_FINAL.pdf

  39. Hurza, M. What Do Teachers Spend on Supplies (Adopt a Classroom, 2015); https://www.adoptaclassroom.org/2015/09/15/infographic-recent-aac-survey-results-on-teacher-spending/

  40. Ghani, R. Triage (Center for Data Science and Public Policy, 2016); http://www.datasciencepublicpolicy.org/projects/triage/

  41. Pedregosa, F. et al. Scikit-learn: machine learning in Python. J. Mach. Learn. Res. 12, 2825–2830 (2011).

  42. Roberts, D. R. et al. Cross-validation strategies for data with temporal, spatial, hierarchical, or phylogenetic structure. Ecography 40, 913–929 (2017).

  43. Ye, T. et al. Using machine learning to help vulnerable tenants in New York City. In Proc. 2nd Conference on Computing and Sustainable Societies (COMPASS) (eds Chen, J., Mankoff, J. & Gomes C.) 248–258 (ACM, 2019).

  44. Rodolfa, K. T. & Lamba, H. dssg/peeps-chili: release for trade-offs submission. Zenodo https://doi.org/10.5281/zenodo.5173254 (2021).

Acknowledgements

We thank the Data Science for Social Good Fellowship fellows, project partners and funders, as well as our colleagues at the Center for Data Science and Public Policy at the University of Chicago for the initial work on projects that were extended and used in this study. We also thank K. Amarasinghe for helpful discussions on the study and drafts of this paper. Parts of this work were funded by the National Science Foundation under grant number IIS-2040929 (to K.T.R. and R.G.) and by a grant (unnumbered) from the C3.ai Digital Transformation Institute (to K.T.R., H.L. and R.G.).

Author information

Contributions

K.T.R. and R.G. conceptualized the study. K.T.R. designed the methodology, contributed to the software and investigation and wrote the original draft. H.L. contributed to the software and investigation, and reviewed and edited the manuscript. R.G. supervised the study, acquired funding and edited and reviewed the manuscript.

Corresponding author

Correspondence to Rayid Ghani.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Peer review information Nature Machine Intelligence thanks Nikhil Garg, Kristian Kersting and Allison Koenecke for their contribution to the peer review of this work.

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary Information

Supplementary Discussion, Figs. 1–7 and Tables 1–4.

About this article

Cite this article

Rodolfa, K.T., Lamba, H. & Ghani, R. Empirical observation of negligible fairness–accuracy trade-offs in machine learning for public policy. Nat Mach Intell 3, 896–904 (2021). https://doi.org/10.1038/s42256-021-00396-x
