To the Editor: Over the next decade, systems centered on artificial intelligence (AI), particularly machine learning, are predicted to become key components of several workflows in the health sector. Medical diagnosis is seen as one of the first areas that may be transformed by AI innovations. Indeed, more than 90% of health-related AI systems that have received regulatory approval from the US Food and Drug Administration belong to the field of diagnostics1.
In the current paradigm, most diagnostic investigations require interpretation by a clinician to identify the presence of a target condition, a crucial step in determining subsequent treatment strategies. Although this interpretation is essential to patient care, many health systems find it increasingly difficult to meet the demand for it. To address this issue, diagnostic AI systems have been characterized as medical devices that may alleviate the burden on diagnosticians by triaging cases, enhancing diagnostic accuracy and acting as a second reader when necessary. As AI-centered diagnostic test accuracy (AI DTA) studies emerge, there has been a concurrent rise in systematic reviews that synthesize the findings of comparable studies.
Notably, 94% of published AI DTA systematic reviews have been conducted without an AI-specific quality assessment tool2. The most commonly used instrument is the Quality Assessment of Diagnostic Accuracy Studies (QUADAS-2) tool3. QUADAS-2 assesses risk of bias and applicability, and its use is encouraged by the current PRISMA 2020 guidance4. However, QUADAS-2 does not accommodate the specialized terminology encountered in AI DTA studies, nor does it alert researchers to the sources of bias specific to this class of study. Examples of such biases, framed against the established QUADAS-2 domains (patient selection, index test, reference standard, and flow and timing), are listed in Table 1.
To address these sources of bias, as well as AI-specific ones such as algorithmic bias, we propose an AI-specific extension to QUADAS-2 and to QUADAS-C5, a risk-of-bias tool developed for comparative accuracy studies. This new tool, termed QUADAS-AI, will give researchers and policy-makers a dedicated framework for evaluating risk of bias and applicability in reviews of AI DTA studies and in reviews of comparative accuracy studies that evaluate at least one AI-centered index test.
QUADAS-AI will complement ongoing reporting guideline initiatives such as STARD-AI6 and TRIPOD-AI7. QUADAS-AI is being coordinated by a global project team and steering committee consisting of clinician scientists, computer scientists, epidemiologists, statisticians, journal editors, representatives of the EQUATOR Network11, regulatory leaders, industry leaders, funders, health policy-makers and bioethicists. Given the reach of AI technologies, we believe that connecting stakeholders worldwide is of the utmost importance for this initiative. We therefore welcome contact from potential new collaborators.
1. Benjamens, S., Dhunnoo, P. & Meskó, B. npj Digit. Med. 3, 118 (2020).
2. Jayakumar, S. et al. npj Digit. Med. (in the press).
3. Whiting, P. F. Ann. Intern. Med. 155, 529 (2011).
4. Page, M. J. et al. BMJ 372, n71 (2021).
5. Yang, B. et al. Open Science Framework https://doi.org/10.17605/OSF.IO/HQ8MF (2018).
6. Sounderajah, V. et al. Nat. Med. 26, 807–808 (2020).
7. Collins, G. & Moons, K. Lancet 393, 1577–1579 (2019).
8. Liu, X. & Rivera, S. C. Nat. Med. 26, 1364–1374 (2020).
9. Harris, M. et al. PLoS One 14, e0226134 (2019).
10. Roberts, M. et al. Nat. Mach. Intell. 3, 199–217 (2021).
11. The EQUATOR Network. Enhancing the QUAlity and Transparency Of Health Research; https://www.equator-network.org/ (accessed 27 September 2021).
Infrastructure support for this research was provided by the National Institute for Health Research (NIHR) Imperial Biomedical Research Centre (BRC). G.S.C. is supported by the NIHR Biomedical Research Centre and Cancer Research UK (programme grant C49297/A27294). D.T. is funded by the National Pathology Imaging Co-operative (NPIC; project no. 104687), supported by a £50 million investment from the Data to Early Diagnosis and Precision Medicine strand of the government’s Industrial Strategy Challenge Fund, and managed and delivered by UK Research and Innovation (UKRI). F.G. is supported by the NIHR Applied Research Collaboration Northwest London. The views and opinions expressed herein are those of the authors and do not necessarily reflect the views of their employers or funders.
A.K., S.S. and D.W. are employees of Google. A.D. and H.A. are employees of Flagship Pioneering UK Ltd. A.E. is an employee of Salesforce. D.K. is an employee of Optum. None of the other authors has any competing interests.
About this article
Cite this article
Sounderajah, V., Ashrafian, H., Rose, S. et al. A quality assessment tool for artificial intelligence-centered diagnostic test accuracy studies: QUADAS-AI. Nat Med 27, 1663–1665 (2021). https://doi.org/10.1038/s41591-021-01517-0