
A quality assessment tool for artificial intelligence-centered diagnostic test accuracy studies: QUADAS-AI

To the Editor — Over the next decade, systems that are centered on artificial intelligence (AI), particularly machine learning, are predicted to become key components of several workflows within the health sector. Medical diagnosis is seen as one of the first areas that may be revolutionized by AI innovations. Indeed, more than 90% of health-related AI systems that have reached regulatory approval by the US Food and Drug Administration belong to the field of diagnostics1.

In the current paradigm, most diagnostic investigations require interpretation by a clinician to identify the presence of a target condition — a crucial step in determining subsequent treatment strategies. Although this interpretation is an essential step in the provision of patient care, many health systems find it increasingly difficult to meet the demand for the interpretation of diagnostic tests. To address this issue, diagnostic AI systems have been characterized as medical devices that may alleviate the burden placed on diagnosticians by serving as case-triage tools, enhancing diagnostic accuracy and acting as a second reader when necessary. As AI-centered diagnostic test accuracy (AI DTA) studies emerge, there has been a concurrent rise in systematic reviews that synthesize the findings of comparable studies.

Notably, 94% of these published AI DTA systematic reviews have been conducted in the absence of an AI-specific quality assessment tool2. The most commonly used instrument is the Quality Assessment of Diagnostic Accuracy Studies (QUADAS-2) tool3. QUADAS-2 assesses risk of bias and applicability, and its use is encouraged by current PRISMA 2020 guidance4. However, QUADAS-2 neither accommodates the specialized terminology encountered in AI DTA studies nor alerts researchers to the sources of bias found within this class of study. Examples of such biases, framed against the established domains of QUADAS-2 (patient selection; index test; reference standard; and flow and timing), are listed in Table 1.

Table 1 Examples of bias within AI DTA studies

To tackle these sources of bias, as well as AI-specific examples such as algorithmic bias, we propose an AI-specific extension to QUADAS-2 and to QUADAS-C5, a risk-of-bias tool that has been developed for comparative accuracy studies. This new tool, termed QUADAS-AI, will provide researchers and policy-makers with a dedicated framework for evaluating risk of bias and applicability when conducting reviews of AI DTA studies and reviews of comparative accuracy studies in which at least one index test is AI-centered.

QUADAS-AI will complement ongoing reporting-guideline initiatives, such as STARD-AI6 and TRIPOD-AI7. QUADAS-AI is being coordinated by a global project team and steering committee that consists of clinician scientists, computer scientists, epidemiologists, statisticians, journal editors, representatives of the EQUATOR Network11, regulatory leaders, industry leaders, funders, health policy-makers and bioethicists. Given the reach of AI technologies, we believe that connecting global stakeholders is of the utmost importance for this initiative. In turn, we would welcome contact from any potential new collaborators.

References

  1. Benjamens, S., Dhunnoo, P. & Meskó, B. npj Digit. Med. 3, 118 (2020).

  2. Jayakumar, S. et al. npj Digit. Med. (in the press).

  3. Whiting, P. F. Ann. Intern. Med. 155, 529 (2011).

  4. Page, M. J. et al. BMJ 372, n71 (2021).

  5. Yang, B. et al. Open Science Framework https://doi.org/10.17605/OSF.IO/HQ8MF (2018).

  6. Sounderajah, V. et al. Nat. Med. 26, 807–808 (2020).

  7. Collins, G. & Moons, K. Lancet 393, 1577–1579 (2019).

  8. Liu, X. & Rivera, S. C. Nat. Med. 26, 1364–1374 (2020).

  9. Harris, M. et al. PLoS One 14, e0226134 (2019).

  10. Roberts, M. et al. Nat. Mach. Intell. 3, 199–217 (2021).

  11. The EQUATOR Network. Enhancing the QUAlity and Transparency Of Health Research; https://www.equator-network.org/ (accessed 27 September 2021).


Acknowledgements

Infrastructure support for this research was provided by the National Institute for Health Research (NIHR) Imperial Biomedical Research Centre (BRC). G.S.C. is supported by the NIHR Biomedical Research Centre and Cancer Research UK (programme grant C49297/A27294). D.T. is funded by National Pathology Imaging Co-operative, NPIC (project no. 104687), supported by a £50 million investment from the Data to Early Diagnosis and Precision Medicine strand of the government’s Industrial Strategy Challenge Fund, and managed and delivered by UK Research and Innovation (UKRI). F.G. is supported by the NIHR Applied Research Collaboration Northwest London. The views and opinions expressed herein are those of the authors and do not necessarily reflect the views of their employers or funders.

Author information


Contributions

V.S., S.R., N.H.S., M.G., R.G., C.E.K., X.L., G.S.C., D.W., A.E., H.A., D. Milea, D. McPherson, J.O., D. Treanor, J.F.C., M.L., M.M., M.D.F.M., M.D.A., S.M., P.W. and P.M.B. prepared the first draft of the manuscript. Critical edits, further direction and feedback were obtained from all co-authors (including A.K., B.M., D. Ting, D.C., D.K., F.G., L.H., J.D., M.D., P.N., S.M., S.C., S.S., A.D., D.M. and A.D.). The study described in the manuscript was conceptualized, discussed and agreed upon by all co-authors.

Corresponding authors

Correspondence to Viknesh Sounderajah, Patrick M. Bossuyt or Ara Darzi.

Ethics declarations

Competing interests

A.K., S.S. and D.W. are employees at Google. A.D. and H.A. are employees at Flagship Pioneering UK Ltd. A.E. is an employee at Salesforce. D.K. is an employee at Optum. None of the other authors have any competing interests.


Cite this article

Sounderajah, V., Ashrafian, H., Rose, S. et al. A quality assessment tool for artificial intelligence-centered diagnostic test accuracy studies: QUADAS-AI. Nat Med 27, 1663–1665 (2021). https://doi.org/10.1038/s41591-021-01517-0
