The growing use of machine learning in policy and social impact settings has raised concerns over fairness implications, especially for racial minorities. These concerns have generated considerable interest among machine learning and artificial intelligence researchers, who have developed new methods and established theoretical bounds for improving fairness, focusing on the source data, regularization and model training, or post-hoc adjustments to model scores. However, few studies have examined the practical trade-offs between fairness and accuracy in real-world settings to understand how these bounds and methods translate into policy choices and impact on society. Our empirical study fills this gap by investigating the impact of mitigating disparities on accuracy, focusing on the common context of using machine learning to inform benefit allocation in resource-constrained programmes across education, mental health, criminal justice and housing safety. Here we describe applied work in which we find fairness–accuracy trade-offs to be negligible in practice. In each setting studied, explicitly focusing on achieving equity and using our proposed post-hoc disparity mitigation methods substantially improved fairness without sacrificing accuracy. This observation was robust across the policy contexts studied, the scale of resources available for intervention, time and the relative size of the protected groups. These empirical results challenge a commonly held assumption that reducing disparities requires either accepting an appreciable drop in accuracy or developing novel, complex methods, and they suggest that reducing disparities in these applications is more practical than that assumption implies.
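The abstract mentions post-hoc adjustments to model scores in resource-constrained programmes but does not spell out a procedure here. The Python sketch below is a hypothetical illustration of one such adjustment, not the authors' implementation (their code is in the dssg/peeps-chili repository). It assumes a fixed budget of k interventions and uses recall (the share of each group's truly in-need individuals who get selected) as the fairness metric; the budget is filled greedily so that recall is roughly equal across groups, which amounts to a separate score threshold per group. The function names, the greedy rule and the toy data are all assumptions. For simplicity it reads the true labels directly; in a real deployment the per-group cut-offs would have to be chosen on historical labelled data and then applied to new cohorts.

```python
import numpy as np


def recall_by_group(selected, labels, groups):
    """Fraction of each group's true positives that fall inside the selected set."""
    chosen = np.zeros(len(labels), dtype=bool)
    chosen[selected] = True
    out = {}
    for g in np.unique(groups):
        in_group = groups == g
        positives = labels[in_group].sum()
        # A group with no positives cannot be under-served; treat its recall as 1.
        out[g] = 1.0 if positives == 0 else labels[chosen & in_group].sum() / positives
    return out


def top_k(scores, k):
    """Baseline: indices of the k highest scores, ignoring group membership."""
    return np.argsort(-scores, kind="stable")[:k]


def equalize_recall_top_k(scores, labels, groups, k):
    """Hypothetical greedy post-hoc adjustment under a fixed budget k.

    Each slot goes to the group whose current recall is lowest; within a group,
    entities are taken in descending score order, which is equivalent to using
    a separate score threshold for each group.
    """
    uniq = np.unique(groups)
    ranked = {}
    for g in uniq:
        idx = np.flatnonzero(groups == g)
        ranked[g] = idx[np.argsort(-scores[idx], kind="stable")]
    positives = {g: labels[ranked[g]].sum() for g in uniq}
    taken = {g: 0 for g in uniq}  # how many members of each group are selected
    hits = {g: 0 for g in uniq}   # how many of those are true positives

    def current_recall(g):
        return 1.0 if positives[g] == 0 else hits[g] / positives[g]

    for _ in range(min(k, len(scores))):
        open_groups = [g for g in uniq if taken[g] < len(ranked[g])]
        g = min(open_groups, key=current_recall)
        hits[g] += labels[ranked[g][taken[g]]]
        taken[g] += 1
    return np.concatenate([ranked[g][: taken[g]] for g in uniq])


if __name__ == "__main__":
    # Toy data: scores, true outcomes and a binary group attribute.
    scores = np.array([0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2])
    labels = np.array([1, 1, 0, 1, 1, 0, 1, 0])
    groups = np.array(["a", "a", "a", "a", "b", "b", "b", "b"])

    for name, sel in [("top-k", top_k(scores, 4)),
                      ("equalized", equalize_recall_top_k(scores, labels, groups, 4))]:
        recall = {str(g): round(float(r), 2)
                  for g, r in recall_by_group(sel, labels, groups).items()}
        print(name, "recall:", recall, "precision@k:", labels[sel].mean())
```

On this toy data, plain top-4 selection picks only members of group a (recall 1.0 for a, 0.0 for b), while the greedy version picks two members of each group (recall about 0.67 for a and 0.5 for b) and precision at k stays at 0.75 in both cases. This is a constructed example for illustration only, not evidence for or against the study's findings.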
Data availability
Data from the inmate mental health context were shared through a partnership and data use agreement with the county government of Johnson County, KS (which collected and made available data from the county- and city-level agencies in their jurisdiction as described in the Methods). Data from the housing safety context were shared through a partnership and data use agreement with the Code Enforcement Division in the city of San Jose, CA. Data from the student outcomes setting were shared through a partnership and data use agreement with the Ministry of Education in El Salvador. Because of the sensitive nature of the data for these three contexts, the work was performed under strict data use agreements and the data cannot be made publicly available; researchers or practitioners interested in collaborating on these projects or with the agencies involved should contact the corresponding author for more information and introductions. The education crowdfunding dataset is publicly available at https://www.kaggle.com/c/kdd-cup-2014-predicting-excitement-at-donors-choose/data. A database extract with model outputs and disparity mitigation results using this dataset is available for download (see replication instructions in the GitHub repository linked in the code availability statement).
Code availability
The code used here for modelling, disparity mitigation and analysis for all four projects is available at https://github.com/dssg/peeps-chili (ref. 44). Complete instructions for replicating the education crowdfunding results reported here can be found in the README of this repository, along with a step-by-step Jupyter notebook for performing the analysis.
Chouldechova, A. Fair prediction with disparate impact: a study of bias in recidivism prediction instruments. Big Data 5, 153–163 (2017).
Skeem, J. L. & Lowenkamp, C. T. Risk, race, and recidivism: predictive bias and disparate impact. Criminology 54, 680–712 (2016).
Angwin, J., Larson, J., Mattu, S. & Kirchner, L. Machine bias. ProPublica (23 May 2016); www.propublica.org/article/machine-bias-risk-assessments-in-criminal-sentencing
Raghavan, M., Barocas, S., Kleinberg, J. & Levy, K. Mitigating bias in algorithmic hiring: evaluating claims and practices. In Proc. 2020 Conference on Fairness, Accountability, and Transparency (eds Hildebrandt, M. & Castillo, C.) 469–481 (ACM, 2020).
Obermeyer, Z., Powers, B., Vogeli, C. & Mullainathan, S. Dissecting racial bias in an algorithm used to manage the health of populations. Science 366, 447–453 (2019).
Ramachandran, A. et al. Predictive analytics for retention in care in an urban HIV clinic. Sci. Rep. https://doi.org/10.1038/s41598-020-62729-x (2020).
Bauman, M. J. et al. Reducing incarceration through prioritized interventions. In Proc. 1st Conference on Computing and Sustainable Societies (COMPASS) (ed. Zegura, E.) 1–8 (ACM, 2018).
Chouldechova, A. et al. A case study of algorithm-assisted decision making in child maltreatment hotline screening decisions. Proc. Mach. Learn. Res. 81, 134–148 (2018).
Potash, E. et al. Predictive modeling for public health: preventing childhood lead poisoning. In Proc. 21st ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (eds Cao, L. & Zhang, C.) 2039–2047 (ACM, 2015).
Chen, I. Y., Johansson, F. D. & Sontag, D. Why is my classifier discriminatory? In Proc. 32nd International Conference on Neural Information Processing Systems (eds Bengio, S., Wallach, H. M., Larochelle, H., Grauman, K. & Cesa-Bianchi, N.) 3539–3550 (NIPS, 2018).
Celis, L. E., Huang, L., Keswani, V. & Vishnoi, N. K. Classification with fairness constraints: a meta-algorithm with provable guarantees. In Proc. 2019 Conference on Fairness, Accountability, and Transparency (eds Boyd, D. & Morgenstern, J.) 319–328 (ACM, 2019).
Zafar, M. B., Valera, I., Rodriguez, M. G. & Gummadi, K. P. Fairness beyond disparate treatment and disparate impact: learning classification without disparate mistreatment. In 26th International World Wide Web Conference (eds Barrett, R. & Cummings, R.) 1171–1180 (WWW, 2017).
Dwork, C., Immorlica, N., Kalai, A. T. & Leiserson, M. Decoupled classifiers for group-fair and efficient machine learning. Proc. Mach. Learn. Res. 81, 119–133 (2018).
Hardt, M., Price, E. & Srebro, N. Equality of opportunity in supervised learning. In Proc. 30th International Conference on Neural Information Processing Systems (eds Lee, D. D., von Luxburg, U., Garnett, R., Sugiyama, M. & Guyon, I.) 3315–3323 (NIPS, 2016).
Rodolfa, K. T. et al. Case study: predictive fairness to reduce misdemeanor recidivism through social service interventions. In Proc. 2020 Conference on Fairness, Accountability, and Transparency (eds Hildebrandt, M. & Castillo, C.) 142–153 (ACM, 2020).
Heidari, H., Gummadi, K. P., Ferrari, C. & Krause, A. Fairness behind a veil of ignorance: a welfare analysis for automated decision making. In Proc. 32nd International Conference on Neural Information Processing Systems (eds Bengio, S., Wallach, H. M., Larochelle, H., Grauman, K. & Cesa-Bianchi, N.) 1265–1276 (NIPS, 2018).
Friedler, S. A. et al. A comparative study of fairness-enhancing interventions in machine learning. In Proc. 2019 Conference on Fairness, Accountability, and Transparency (eds Boyd, D. & Morgenstern, J.) 329–338 (ACM, 2019).
Kearns, M., Roth, A., Neel, S. & Wu, Z. S. An empirical study of rich subgroup fairness for machine learning. In Proc. 2019 Conference on Fairness, Accountability, and Transparency (eds Boyd, D. & Morgenstern, J.) 100–109 (ACM, 2019).
Zafar, M. B., Valera, I., Rodriguez, M. G. & Gummadi, K. P. Fairness constraints: mechanisms for fair classification. In Proc. 20th International Conference on Artificial Intelligence and Statistics (eds Singh, A. & Zhu, J.) 962–970 (PMLR, 2017).
Ghani, R., Walsh, J. & Wang, J. Top 10 ways your Machine Learning models may have leakage (Data Science for Social Good Blog, 2020); http://www.rayidghani.com/2020/01/24/top-10-ways-your-machine-learning-models-may-have-leakage
Verma, S. & Rubin, J. Fairness definitions explained. In Proc. 2018 International Workshop on Software Fairness (eds Brun, Y., Johnson, B. & Meliou, A.) 1–7 (IEEE/ACM, 2018).
Gajane, P. & Pechenizkiy, M. On formalizing fairness in prediction with machine learning. Preprint at https://arxiv.org/abs/1710.03184 (2018).
Kleinberg, J. M., Mullainathan, S. & Raghavan, M. Inherent trade-offs in the fair determination of risk scores. In Proc. 8th Innovations in Theoretical Computer Science Conference (ed. Psounis, K.) 1–43 (ITCS, 2017).
Krishna Menon, A. & Williamson, R. C. The cost of fairness in binary classification. In Proc. 1st Conference on Fairness, Accountability, and Transparency (eds Friedler, S. & Wilson, C.) 107–118 (PMLR, 2018).
Huq, A. Racial equity in algorithmic criminal justice. Duke Law J. 68, 1043–1134 (2019).
Hamilton, M. People with complex needs and the criminal justice system. Curr. Iss. Crim. Justice 22, 307–324 (2010).
James, D. J. & Glaze, L. E. Mental Health Problems of Prison and Jail Inmates (Department of Justice, Bureau of Justice Statistics, 2006); https://www.bjs.gov/content/pub/pdf/mhppji.pdf
Fuller Torrey, E., Kennard, A. D., Eslinger, D., Lamb, R. & Pavle, J. More Mentally Ill Persons Are in Jails and Prisons Than Hospitals: A Survey of the States (Treatment Advocacy Center and National Sheriffs’ Association, 2010); http://tulare.networkofcare.org/library/final_jails_v_hospitals_study1.pdf
Holtzen, H., Klein, E. G., Keller, B. & Hood, N. Perceptions of physical inspections as a tool to protect housing quality and promote health equity. J. Health Care Poor Underserv. 27, 549–559 (2016).
Klein, E., Keller, B., Hood, N. & Holtzen, H. Affordable housing and health: a health impact assessment on physical inspection frequency. J. Public Health Manage. Practice 21, 368–374 (2015).
Athey, S. Beyond prediction: using big data for policy problems. Science 355, 483–485 (2017).
Glaeser, E. L., Hillis, A., Kominers, S. D. & Luca, M. Crowdsourcing city government: using tournaments to improve inspection accuracy. Am. Econ. Rev. 106, 114–118 (2016).
Levin, H. M. & Belfield, C. The Price We Pay: Economic and Social Consequences of Inadequate Education (Brookings Institution, 2007).
Atwell, M. N., Balfanz, R., Bridgeland, J. & Ingram, E. Building a Grad Nation (America’s Promise Alliance, 2019); https://www.americaspromise.org/2019-building-grad-nation-report
Lakkaraju, H. et al. A machine learning framework to identify students at risk of adverse academic outcomes. In Proc. 21st ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (eds Cao, L. & Zhang, C.) 1909–1918 (ACM, 2015).
Aguiar, E. et al. Who, when, and why: a machine learning approach to prioritizing students at risk of not graduating high school on time. In Proc. Fifth International Conference on Learning Analytics and Knowledge (eds Baron, J., Lynch, G. & Maziarz, N.) 93–102 (ACM, 2015).
Bowers, A. J., Sprott, R. & Taff, S. A. Do we know who will drop out? A review of the predictors of dropping out of high school: precision, sensitivity, and specificity. High School J. 96, 77–100 (2012).
Morgan, I. & Amerikaner, A. Funding Gaps 2018 (The Education Trust, 2018); https://edtrust.org/wp-content/uploads/2014/09/FundingGapReport_2018_FINAL.pdf
Hurza, M. What Do Teachers Spend on Supplies (Adopt a Classroom, 2015); https://www.adoptaclassroom.org/2015/09/15/infographic-recent-aac-survey-results-on-teacher-spending/
Ghani, R. Triage (Center for Data Science and Public Policy, 2016); http://www.datasciencepublicpolicy.org/projects/triage/
Pedregosa, F. et al. Scikit-learn: machine learning in Python. J. Mach. Learn. Res. 12, 2825–2830 (2011).
Roberts, D. R. et al. Cross-validation strategies for data with temporal, spatial, hierarchical, or phylogenetic structure. Ecography 40, 913–929 (2017).
Ye, T. et al. Using machine learning to help vulnerable tenants in New York City. In Proc. 2nd Conference on Computing and Sustainable Societies (COMPASS) (eds Chen, J., Mankoff, J. & Gomes, C.) 248–258 (ACM, 2019).
Rodolfa, K. T. & Lamba, H. dssg/peeps-chili: release for trade-offs submission. Zenodo https://doi.org/10.5281/zenodo.5173254 (2021).
We thank the Data Science for Social Good Fellowship fellows, project partners and funders, as well as our colleagues at the Center for Data Science and Public Policy at the University of Chicago, for the initial work on projects that were extended and used in this study. We also thank K. Amarasinghe for helpful discussions on the study and drafts of this paper. Parts of this work were funded by the National Science Foundation under grant number IIS-2040929 (to K.T.R. and R.G.) and by a grant (unnumbered) from the C3.ai Digital Transformation Institute (to K.T.R., H.L. and R.G.).
The authors declare no competing interests.
Peer review information Nature Machine Intelligence thanks Nikhil Garg, Kristian Kersting and Allison Koenecke for their contribution to the peer review of this work.
About this article
Cite this article
Rodolfa, K.T., Lamba, H. & Ghani, R. Empirical observation of negligible fairness–accuracy trade-offs in machine learning for public policy. Nat Mach Intell 3, 896–904 (2021). https://doi.org/10.1038/s42256-021-00396-x