  • Comment
  • Published:

The curious case of the test set AUROC

A Publisher Correction to this article was published on 12 April 2024

This article has been updated

The area under the receiver operating characteristic curve (AUROC) on the test set is used throughout machine learning (ML) to assess a model’s performance. However, when concordance is not the only aim, it gives only partial insight into performance, masking distribution shifts in model outputs and model instability.
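The point that the test set AUROC measures only concordance can be illustrated with a short sketch. This is not the authors' implementation: it computes AUROC as the Mann-Whitney concordance probability and scores two hypothetical models on a synthetic test set (the data, `model_a` and `model_b` are all invented for illustration). Because model B's outputs are a strictly increasing transform of model A's, every pair of cases is ranked the same way, so the AUROC cannot tell them apart even though their output distributions differ sharply.

```python
import math
import random


def auroc(scores, labels):
    """AUROC as a concordance probability: the chance that a randomly chosen
    positive case scores higher than a randomly chosen negative case, with ties
    counted as one half (Mann-Whitney U divided by n_pos * n_neg)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    concordant = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                concordant += 1.0
            elif p == n:
                concordant += 0.5
    return concordant / (len(pos) * len(neg))


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


# Hypothetical synthetic test set: 200 positives, 200 negatives.
random.seed(0)
labels = [1] * 200 + [0] * 200
logits = [random.gauss(1.0, 1.0) if y == 1 else random.gauss(-1.0, 1.0) for y in labels]

# Two hypothetical "models" whose outputs are strictly increasing functions
# of the same logits, so they rank every case identically.
model_a = [sigmoid(z) for z in logits]               # probabilities spread over (0, 1)
model_b = [sigmoid(z / 10.0 + 3.0) for z in logits]  # squeezed into a narrow band near 0.95

# The two AUROC values are identical; the mean outputs are not.
print(f"model A: AUROC={auroc(model_a, labels):.4f}, mean output={sum(model_a) / len(model_a):.3f}")
print(f"model B: AUROC={auroc(model_b, labels):.4f}, mean output={sum(model_b) / len(model_b):.3f}")
```

Only the ranking of cases enters the AUROC, so a decision threshold of 0.5 that behaves sensibly for model A would label every case positive under model B, without any change in the reported AUROC.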

Fig. 1: Equivalence of ROC and AUROC for different distributions.
Fig. 2: Multiple data cohort discrepancies.
Fig. 3: Single-dataset discrepancy scores.
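Figures 2 and 3 concern discrepancies in model outputs that a single AUROC hides. The authors' discrepancy scores are not reproduced here. As a plainly swapped-in stand-in, the sketch below uses the two-sample Kolmogorov-Smirnov (KS) statistic, a standard distance between empirical distributions, to compare one hypothetical model's outputs on two synthetic cohorts whose AUROCs are identical by construction. The cohort data and the shift of 2 on the logit scale are assumptions chosen only to make the effect visible.

```python
import bisect
import math
import random


def auroc(scores, labels):
    """Concordance-based AUROC (ties count one half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    concordant = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return concordant / (len(pos) * len(neg))


def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic: the largest absolute gap between
    the empirical CDFs of samples a and b (0 = identical, 1 = fully separated)."""
    a, b = sorted(a), sorted(b)
    gap = 0.0
    for x in a + b:  # the supremum is attained at one of the pooled sample points
        cdf_a = bisect.bisect_right(a, x) / len(a)
        cdf_b = bisect.bisect_right(b, x) / len(b)
        gap = max(gap, abs(cdf_a - cdf_b))
    return gap


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


# Hypothetical cohort 1: synthetic logits standing in for a held-out test set.
random.seed(1)
labels = [1] * 200 + [0] * 200
logits = [random.gauss(1.0, 1.0) if y == 1 else random.gauss(-1.0, 1.0) for y in labels]
cohort_1 = [sigmoid(z) for z in logits]
# Hypothetical cohort 2: same cases, every output shifted up by 2 on the logit scale.
cohort_2 = [sigmoid(z + 2.0) for z in logits]

print(f"AUROC cohort 1: {auroc(cohort_1, labels):.4f}")
print(f"AUROC cohort 2: {auroc(cohort_2, labels):.4f}")
print(f"KS statistic between cohort outputs: {ks_statistic(cohort_1, cohort_2):.3f}")
```

Both cohorts report the same AUROC, while the KS statistic is large, flagging that the output distribution has moved. With real data one would compare outputs from genuinely different cohorts; `scipy.stats.ks_2samp` computes the same statistic together with a p-value.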

Change history

Author information

Corresponding author

Correspondence to Michael Roberts.

Ethics declarations

Competing interests

The authors declare no competing interests.

About this article

Cite this article

Roberts, M., Hazan, A., Dittmer, S. et al. The curious case of the test set AUROC. Nat Mach Intell 6, 373–376 (2024). https://doi.org/10.1038/s42256-024-00817-7

  • DOI: https://doi.org/10.1038/s42256-024-00817-7
