The growing use of machine learning in policy and social impact settings has raised concerns over fairness implications, especially for racial minorities. These concerns have generated considerable interest among machine learning and artificial intelligence researchers, who have developed new methods and established theoretical bounds for improving fairness, focusing on the source data, regularization and model training, or post-hoc adjustments to model scores. However, few studies have examined the practical trade-offs between fairness and accuracy in real-world settings to understand how these bounds and methods translate into policy choices and societal impact. Our empirical study fills this gap by investigating the effect of mitigating disparities on accuracy, focusing on the common context of using machine learning to inform benefit allocation in resource-constrained programmes across education, mental health, criminal justice and housing safety. Here we describe applied work in which we find fairness–accuracy trade-offs to be negligible in practice. In each setting studied, by explicitly focusing on achieving equity and applying our proposed post-hoc disparity mitigation methods, we substantially improved fairness without sacrificing accuracy. This observation was robust across the policy contexts studied, the scale of resources available for intervention, time and the relative size of the protected groups. These empirical results challenge a commonly held assumption that reducing disparities requires either accepting an appreciable drop in accuracy or developing novel, complex methods, making disparity reduction more practical in these applications.
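The post-hoc mitigation idea referenced above can be illustrated with a minimal sketch. The function below is an assumption for illustration, not the authors' exact algorithm: it approximates equalized recall at a fixed intervention budget k by allocating each group a share of the budget proportional to its share of historical positives, then selecting the top-scored individuals within each group. The function name, inputs and allocation heuristic are hypothetical.

```python
from collections import defaultdict

def recall_equalizing_selection(scores, labels, groups, k):
    """Split a budget of k interventions across groups in proportion to each
    group's count of historical positives, then pick the highest-scored
    individuals within each group. Matching each group's share of the
    selected list to its share of need approximates equal recall across
    groups at the overall budget k. Returns the set of selected indices."""
    # Count historical positives per group.
    pos = defaultdict(int)
    for y, g in zip(labels, groups):
        pos[g] += y
    total_pos = sum(pos.values())

    # Proportional (floored) allocation, with the remainder handed to the
    # groups whose exact share was rounded down the most.
    exact = {g: k * p / total_pos for g, p in pos.items()}
    alloc = {g: int(exact[g]) for g in pos}
    remainder = k - sum(alloc.values())
    for g in sorted(pos, key=lambda g: exact[g] - alloc[g], reverse=True)[:remainder]:
        alloc[g] += 1

    # Within each group, take the highest-scored individuals.
    by_group = defaultdict(list)
    for i, g in enumerate(groups):
        by_group[g].append(i)
    selected = set()
    for g, idxs in by_group.items():
        idxs.sort(key=lambda i: scores[i], reverse=True)
        selected.update(idxs[:alloc[g]])
    return selected
```

For example, with a group holding three of four historical positives and a budget of four, this scheme assigns that group three selections and the other group one, regardless of where a single global score threshold would have fallen.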
Data from the inmate mental health context were shared through a partnership and data use agreement with the county government of Johnson County, KS (which collected and made available data from the county- and city-level agencies in their jurisdiction as described in the Methods). Data from the housing safety context were shared through a partnership and data use agreement with the Code Enforcement Division in the city of San Jose, CA. Data from the student outcomes setting were shared through a partnership and data use agreement with the Ministry of Education in El Salvador. Although the sensitive nature of the data for these three contexts required that the work was performed under strict data use agreements and the data cannot be made publicly available, researchers or practitioners interested in collaborating on these projects or with the agencies involved should contact the corresponding author for more information and introductions. The education crowdfunding dataset is publicly available at https://www.kaggle.com/c/kdd-cup-2014-predicting-excitement-at-donors-choose/data. A database extract with model outputs and disparity mitigation results using this dataset is available for download (see replication instructions in the GitHub repository linked in the code availability statement).
The code used here for modelling, disparity mitigation and analysis for all four projects is available at https://github.com/dssg/peeps-chili (ref. 44). Complete instructions for replicating the education crowdfunding results reported here can be found in the README of this repository, along with a step-by-step Jupyter notebook for performing the analysis.
Chouldechova, A. Fair prediction with disparate impact: a study of bias in recidivism prediction instruments. Big Data 5, 153–163 (2017).
Skeem, J. L. & Lowenkamp, C. T. Risk, race, and recidivism: predictive bias and disparate impact. Criminology 54, 680–712 (2016).
Angwin, J., Larson, J., Mattu, S. & Kirchner, L. Machine bias. ProPublica (23 May 2016); www.propublica.org/article/machine-bias-risk-assessments-in-criminal-sentencing
Raghavan, M., Barocas, S., Kleinberg, J. & Levy, K. Mitigating bias in algorithmic hiring: evaluating claims and practices. In Proc. 2020 Conference on Fairness, Accountability, and Transparency (eds Hildebrandt, M. & Castillo, C.) 469–481 (ACM, 2020).
Obermeyer, Z., Powers, B., Vogeli, C. & Mullainathan, S. Dissecting racial bias in an algorithm used to manage the health of populations. Science 366, 447–453 (2019).
Ramachandran, A. et al. Predictive analytics for retention in care in an urban HIV clinic. Sci. Rep. https://doi.org/10.1038/s41598-020-62729-x (2020).
Bauman, M. J. et al. Reducing incarceration through prioritized interventions. In Proc. 1st Conference on Computing and Sustainable Societies (COMPASS) (ed. Zegura, E.) 1–8 (ACM, 2018).
Chouldechova, A. et al. A case study of algorithm-assisted decision making in child maltreatment hotline screening decisions. Proc. Mach. Learn. Res. 81, 134–148 (2018).
Potash, E. et al. Predictive modeling for public health: preventing childhood lead poisoning. In Proc. 21st ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (eds Cao, L. & Zhang, C.) 2039–2047 (ACM, 2015).
Chen, I. Y., Johansson, F. D. & Sontag, D. Why is my classifier discriminatory? In Proc. 32nd International Conference on Neural Information Processing Systems (eds Bengio, S., Wallach, H. M., Larochelle, H., Grauman, K. & Cesa-Bianchi, N.) 3539–3550 (NIPS, 2018).
Celis, L. E., Huang, L., Keswani, V. & Vishnoi, N. K. Classification with fairness constraints: a meta-algorithm with provable guarantees. In Proc. 2019 Conference on Fairness, Accountability, and Transparency (eds Boyd, D. & Morgenstern, J.) 319–328 (ACM, 2019).
Zafar, M. B., Valera, I., Rodriguez, M. G. & Gummadi, K. P. Fairness beyond disparate treatment and disparate impact: learning classification without disparate mistreatment. In 26th International World Wide Web Conference (eds Barrett, R. & Cummings, R.) 1171–1180 (WWW, 2017).
Dwork, C., Immorlica, N., Kalai, A. T. & Leiserson, M. Decoupled classifiers for group-fair and efficient machine learning. Proc. Mach. Learn. Res. 81, 119–133 (2018).
Hardt, M., Price, E. & Srebro, N. Equality of opportunity in supervised learning. In Proc. 30th International Conference on Neural Information Processing Systems (eds Lee, D. D., von Luxburg, U., Garnett, R., Sugiyama, M. & Guyon, I.) 3315–3323 (NIPS, 2016).
Rodolfa, K. T. et al. Case study: predictive fairness to reduce misdemeanor recidivism through social service interventions. In Proc. 2020 Conference on Fairness, Accountability, and Transparency (eds Hildebrandt, M. & Castillo, C.) 142–153 (ACM, 2020).
Heidari, H., Gummadi, K. P., Ferrari, C. & Krause, A. Fairness behind a veil of ignorance: a welfare analysis for automated decision making. In Proc. 32nd International Conference on Neural Information Processing Systems (eds Bengio, S., Wallach, H. M., Larochelle, H., Grauman, K. & Cesa-Bianchi, N.) 1265–1276 (NIPS, 2018).
Friedler, S. A. et al. A comparative study of fairness-enhancing interventions in machine learning. In Proc. 2019 Conference on Fairness, Accountability, and Transparency (eds Boyd, D. & Morgenstern, J.) 329–338 (ACM, 2019).
Kearns, M., Roth, A., Neel, S. & Wu, Z. S. An empirical study of rich subgroup fairness for machine learning. In Proc. 2019 Conference on Fairness, Accountability, and Transparency (eds Boyd, D. & Morgenstern, J.) 100–109 (ACM, 2019).
Zafar, M. B., Valera, I., Rodriguez, M. G. & Gummadi, K. P. Fairness constraints: mechanisms for fair classification. In Proc. 20th International Conference on Artificial Intelligence and Statistics (eds Singh, A. & Zhu, J.) 962–970 (PMLR, 2017).
Ghani, R., Walsh, J. & Wang, J. Top 10 ways your Machine Learning models may have leakage (Data Science for Social Good Blog, 2020); http://www.rayidghani.com/2020/01/24/top-10-ways-your-machine-learning-models-may-have-leakage
Verma, S. & Rubin, J. Fairness definitions explained. In Proc. 2018 International Workshop on Software Fairness (eds Brun, Y., Johnson, B. & Meliou, A.) 1–7 (IEEE/ACM, 2018).
Gajane, P. & Pechenizkiy, M. On formalizing fairness in prediction with machine learning. Preprint at https://arxiv.org/abs/1710.03184 (2018).
Kleinberg, J. M., Mullainathan, S. & Raghavan, M. Inherent trade-offs in the fair determination of risk scores. In Proc. 8th Innovations in Theoretical Computer Science Conference (ed. Psounis, K.) 1–43 (ITCS, 2017).
Krishna Menon, A. & Williamson, R. C. The cost of fairness in binary classification. In Proc. 1st Conference on Fairness, Accountability, and Transparency (eds Friedler, S. & Wilson, C.) 107–118 (PMLR, 2018).
Huq, A. Racial equity in algorithmic criminal justice. Duke Law J. 68, 1043–1134 (2019).
Hamilton, M. People with complex needs and the criminal justice system. Curr. Iss. Crim. Justice 22, 307–324 (2010).
James, D. J. & Glaze, L. E. Mental Health Problems of Prison and Jail Inmates (Department of Justice, Bureau of Justice Statistics, 2006); https://www.bjs.gov/content/pub/pdf/mhppji.pdf
Fuller Torrey, E., Kennard, A. D., Eslinger, D., Lamb, R. & Pavle, J. More Mentally Ill Persons Are in Jails and Prisons Than Hospitals: A Survey of the States (Treatment Advocacy Center and National Sheriffs’ Association, 2010); http://tulare.networkofcare.org/library/final_jails_v_hospitals_study1.pdf
Holtzen, H., Klein, E. G., Keller, B. & Hood, N. Perceptions of physical inspections as a tool to protect housing quality and promote health equity. J. Health Care Poor Underserv. 27, 549–559 (2016).
Klein, E., Keller, B., Hood, N. & Holtzen, H. Affordable housing and health: a health impact assessment on physical inspection frequency. J. Public Health Manage. Practice 21, 368–374 (2015).
Athey, S. Beyond prediction: using big data for policy problems. Science 355, 483–485 (2017).
Glaeser, E. L., Hillis, A., Kominers, S. D. & Luca, M. Crowdsourcing city government: using tournaments to improve inspection accuracy. Am. Econ. Rev. 106, 114–118 (2016).
Levin, H. M. & Belfield, C. The Price We Pay: Economic and Social Consequences of Inadequate Education (Brookings Institution, 2007).
Atwell, M. N., Balfanz, R., Bridgeland, J. & Ingram, E. Building a Grad Nation (America’s Promise Alliance, 2019); https://www.americaspromise.org/2019-building-grad-nation-report
Lakkaraju, H. et al. A machine learning framework to identify students at risk of adverse academic outcomes. In Proc. 21st ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (eds Cao, L. & Zhang, C.) 1909–1918 (ACM, 2015).
Aguiar, E. et al. Who, when, and why: a machine learning approach to prioritizing students at risk of not graduating high school on time. In Proc. Fifth International Conference on Learning Analytics and Knowledge (eds Baron, J., Lynch, G. & Maziarz, N.) 93–102 (ACM, 2015).
Bowers, A. J., Sprott, R. & Taff, S. A. Do we know who will drop out? A review of the predictors of dropping out of high school: precision, sensitivity, and specificity. High School J. 96, 77–100 (2012).
Morgan, I. & Amerikaner, A. Funding Gaps 2018 (The Education Trust, 2018); https://edtrust.org/wp-content/uploads/2014/09/FundingGapReport_2018_FINAL.pdf
Hurza, M. What Do Teachers Spend on Supplies (Adopt a Classroom, 2015); https://www.adoptaclassroom.org/2015/09/15/infographic-recent-aac-survey-results-on-teacher-spending/
Ghani, R. Triage (Center for Data Science and Public Policy, 2016); http://www.datasciencepublicpolicy.org/projects/triage/
Pedregosa, F. et al. Scikit-learn: machine learning in Python. J. Mach. Learn. Res. 12, 2825–2830 (2011).
Roberts, D. R. et al. Cross-validation strategies for data with temporal, spatial, hierarchical, or phylogenetic structure. Ecography 40, 913–929 (2017).
Ye, T. et al. Using machine learning to help vulnerable tenants in New York City. In Proc. 2nd Conference on Computing and Sustainable Societies (COMPASS) (eds Chen, J., Mankoff, J. & Gomes C.) 248–258 (ACM, 2019).
Rodolfa, K. T. & Lamba, H. dssg/peeps-chili: release for trade-offs submission. Zenodo https://doi.org/10.5281/zenodo.5173254 (2021).
We thank the Data Science for Social Good Fellowship fellows, project partners and funders, as well as our colleagues at the Center for Data Science and Public Policy at University of Chicago for the initial work on projects that were extended and used in this study. We also thank K. Amarasinghe for helpful discussions on the study and drafts of this paper. Parts of this work were funded by the National Science Foundation under grant number IIS-2040929 (to K.T.R. and R.G.) and by a grant (unnumbered) from the C3.ai Digital Transformation Institute (to K.T.R., H.L. and R.G.).
The authors declare no competing interests.
Peer review information Nature Machine Intelligence thanks Nikhil Garg, Kristian Kersting and Allison Koenecke for their contribution to the peer review of this work.
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Cite this article
Rodolfa, K.T., Lamba, H. & Ghani, R. Empirical observation of negligible fairness–accuracy trade-offs in machine learning for public policy. Nat Mach Intell 3, 896–904 (2021). https://doi.org/10.1038/s42256-021-00396-x