Main

Breast cancer screening detects cancer at earlier stages1, leading to a meaningful reduction in breast cancer mortality2. Moreover, early detection can lead to less aggressive treatments, reducing treatment toxicity. Although breast screening reduces overall mortality, it has limitations that result in failure to detect cancer in a considerable number of screened individuals. In these cases, cancer may be found later between screening rounds (interval cancer)3 or at the next screening round4. Reported estimates for the rate of interval cancer detection vary widely between countries and screening programs with varying screening intervals, ranging from 0.7 to 4.9 per 1,000 screened women3. Of these cancers found after a negative screen, an estimated 22% could have been detected retrospectively at previous rounds4. In the past, computer-aided detection (CAD) systems were developed to improve cancer detection. However, the benefits of CAD found in experimental studies did not translate into real-world clinical benefits. The use of CAD resulted in increased recalls, more time needed to assess screens and more biopsies without improving cancer detection, ultimately conferring no screening benefit5.

Modern artificial intelligence (AI) based on deep learning is a different technology from past CAD systems and has demonstrated greater potential to support the quality of screening services and reduce workload, depending on how it is integrated into the workflow6,7,8,9,10. AI performance is most at risk for cases with less common characteristics; thus, AI requires assessment in large-scale studies. As retrospective studies make large-scale evaluations possible, they are crucial for validating the safety and effectiveness of AI before prospective use. However, retrospective results can be expected to translate to real clinical practice only when appropriate study methods ensure that the analyzed data are representative of what AI would process in real-world deployments. Otherwise, the usefulness of AI in clinical practice is not guaranteed4,11,12. Prospective evaluations are needed to assess the real-world performance of AI integrated into live clinical workflows; however, these have been limited to date13.

This service evaluation presents results from using a commercially available AI system, Mia (Kheiron Medical Technologies), configured with regulatory-cleared predetermined sensitivity and specificity operating points in pilot implementations and live use in daily practice. The performance and generalizability of the AI system used were previously confirmed in a large-scale retrospective AI generalizability study8,9,14. The current analysis used prospectively collected postmarket real-world data to assess the effectiveness of the AI system as an additional component to standard screening procedures and a quality-control safety net in the AI-assisted additional-reader workflow to support early cancer detection.

Results

A three-phase approach was used to implement the AI system in an AI-assisted additional-reader workflow at four sites of MaMMa Egészségügyi Zrt. (MaMMa Klinika), a breast cancer screening institution that serves urban and rural populations in Hungary. The institution implements a 2-year screening interval and invites women aged 45–65 years to undergo screening. All institution sites also offer opportunistic screening, in which women who are not invited to screening but choose to participate are screened. These women undergo the same procedure as those participating in the population screening program. At the institution sites, full-field digital mammography images were obtained using the IMS Giotto Image 3DL and IMS Giotto Class systems, following the standard operating procedures at the four sites. All sites follow the standard double-reading workflow (with strictly no AI involvement) in which two radiologists review every case. When discordance arises, an arbitrator makes the decision to either recall or not recall a woman for further assessment. In the implemented AI-assisted additional-reader workflow, the AI system flagged cases for additional review among those classified by double reading as ‘no recall’. These positive discordant cases (that is, cases that AI flagged as ‘positive’ and human readers marked as ‘negative’) were additionally reviewed by a human arbitrator (additional arbitrator) to possibly recall additional cases and detect more cancerous cases at an early stage (Fig. 1). The additional arbitrator was provided with images containing AI-generated regions of interest highlighting areas suggestive of malignancy for their review.

Fig. 1: AI as an additional reader.

The AI-assisted additional-reader workflow uses a standard double-reading process complemented by image assessment by AI. If double reading results in a ‘no recall’ decision but the AI system flags the case, the screen is assessed by an additional human arbitrator.
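
For clarity, the routing logic of this workflow can be summarized in a few lines of Python. This is a minimal sketch: the data structure and function names are illustrative, not part of the AI system's interface, and the additional arbitrator is modeled as a simple callable.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class ScreeningCase:
    # Illustrative fields, not the AI system's actual interface.
    double_read_recall: bool  # final decision of standard double reading (after any arbitration)
    ai_flagged: bool          # AI recommends 'recall' at the configured operating point

def route_case(case: ScreeningCase,
               additional_arbitrator: Callable[[ScreeningCase], bool]) -> str:
    """Route one screen through the AI-assisted additional-reader workflow."""
    if case.double_read_recall:
        # Standard pathway: double reading already recalls; the AI output does not change this.
        return "recall"
    if case.ai_flagged:
        # Positive discordant case: the AI flags a screen that double reading did not recall.
        # An additional human arbitrator reviews it with the AI-generated regions of interest.
        return "recall" if additional_arbitrator(case) else "no recall"
    # Concordant negative: no additional review is triggered.
    return "no recall"

# Example: a flagged, not-recalled case that the additional arbitrator decides to recall.
print(route_case(ScreeningCase(double_read_recall=False, ai_flagged=True), lambda c: True))
```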

The implementation of the AI system consisted of three phases to ensure the safe deployment of the AI-assisted additional-reader process into live use. The first phase aimed to demonstrate the clinical benefit of the AI-assisted additional-reader process in a limited pilot rollout in which only one senior radiologist reviewed the AI-flagged cases from a single site, with the original screening date between April 6 and September 28, 2021 inclusive. The second phase was launched as an extended multicenter pilot involving a wider rollout of the AI-assisted additional-reader process across four sites (including the initial pilot site) and three additional arbitrators (including the additional arbitrator from the first phase). In the second phase, the readers independently reviewed every case flagged by AI from April 6 through December 21, 2021, at the initial pilot site and from April 6 through June 30, 2021, at each of the other three sites. One of the additional arbitrators made the final decision on which cases to recall additionally based on the opinions of all three readers. The extended pilot also aimed to provide a training period for the three additional arbitrators before live use began.

Finally, the third phase involved a full live rollout of the AI system as an official addition to the standard of care across the four sites from July 4, 2022. In this phase, the three additional arbitrators independently made recall decisions. The live rollout is ongoing, and the results presented here cover cases through January 31, 2023. Results were also simulated with a predetermined higher-specificity operating point to inform the sites on how the AI-assisted additional-reader process may be further optimized to suit their needs. The summary details of the dataset periods are provided in Table 1. In live use, each AI-flagged case was independently reviewed by one of the three additional arbitrators who made the final recall decision on each case they reviewed. During the two pilot phases, additional recalls based on additional arbitration reviews were done after the screening participants had been informed of the double-reading decision. In the third phase involving implementation into daily practice, the screening participants were informed after the decision was finalized based on the additional arbitration reviews. All readers had specialist training and ≥14 years of screening mammography experience, with non-additional arbitrators reading approximately 12,000 screens per year and additional arbitrators reading 25,000 screens per year on average.

Table 1 Overview of screens per phase per site

Patient characteristics

Table 2 shows the characteristics of participants in each phase. The initial pilot included 3,746 women with an average age of 58.2 (s.d. 11.0) years. Among them, 126 (3.4%) reported a family history of cancer and 479 (12.7%) had a Tabár parenchymal pattern classification of 4 or 5, correlating with high density. In the extended pilot (n = 9,112), the mean age was also 58.2 (s.d. 10.7) years. Tabár classification 4 or 5 was identified in 1,094 women (12.0%), and 274 women (3.0%) reported a family history of cancer. Finally, in the live-use phase, 15,953 women were included. The mean age was 58.6 (s.d. 10.5) years, with 615 women (3.9%) having reported a family history of cancer and 1,733 women (10.8%) having a Tabár classification of 4 or 5.

Table 2 Participant characteristics per phase

Screening performance of the AI-assisted additional-reader workflow

Across the three phases, the implementation of the AI-assisted additional-reader workflow resulted in 24 more cancer cases detected (7% relative increase in cancer detection rate (CDR)) and 70 more women recalled (0.28% increase in absolute recall rate), at a positive predictive value (PPV) for screening of 20.0% (3% relative increase) (Table 3). The initial pilot, extended pilot and live-use assessments included 3,746 of 3,817 (98.1%), 9,112 of 9,266 (98.3%) and 15,953 of 16,256 (98.1%) double-read cases that the AI could process, respectively (Table 1). Table 3 shows the outcome metrics for each phase and reports the results of the McNemar test for sensitivity and CDR.

Standard double reading resulted in recall rates of 6.7% (initial pilot), 7.0% (extended pilot) and 7.7% (live use) and CDRs of 12.8 per 1,000 cases (initial pilot), 13.8 per 1,000 cases (extended pilot) and 14.9 per 1,000 cases (live use). For the initial and extended pilots, AI flagged for review 10.6% (396/3,746) and 11.2% (1,024/9,112) of cases, respectively. Before launching the AI system into live use, its decision threshold was adjusted to a more specific predetermined operating point to accommodate the sites' workload capacity, resulting in a smaller proportion of cases (7.4%, 1,186/15,953) flagged for additional review in live use.

The additional arbitration reviews resulted in six (initial pilot), 22 (extended pilot) and 48 (live use) additional recalled cases, increasing the recall rate by 0.16% (initial pilot), 0.23% (extended pilot) and 0.25% (live use), respectively. From the additional recalls, six (initial pilot), 13 (extended pilot) and 11 (live use) additional cancer cases were found, increasing the CDR by 1.6 per 1,000 cases (a 13% relative increase), 1.4 per 1,000 cases (a 10% relative increase) and 0.7 per 1,000 cases (a 5% relative increase) for the initial pilot, extended pilot and live-use phases, respectively (all statistically significant with P < 0.05) (Table 3). Of the additional cancer cases, four (66.7%) in the initial pilot, ten (76.9%) in the extended pilot and five (45.5%) in the live-use phase were confirmed to be invasive. In addition, one case (16.7%) in the initial pilot, one case (7.7%) in the extended pilot and two cases (18.2%) in live use were in situ cancer, whereas one case (16.7%) in the initial pilot, two cases (15.4%) in the extended pilot and four cases (36.4%) in live use had missing invasiveness information. Of the additional cancer cases with available data on either pathological or radiological tumor size, 50.0% (two of four) in the initial pilot, 40.0% (four of ten) in the extended pilot and 57.1% (four of seven) in live use were ≤10 mm.

Overall, the screening performance of double reading plus the AI-assisted additional-reader workflow resulted in recall rates of 6.8% (initial pilot), 7.3% (extended pilot) and 8.0% (live use); arbitration rates of 13.6% (initial pilot), 14.2% (extended pilot) and 10.8% (live use); and CDRs of 14.4 per 1,000 cases (initial pilot), 15.3 per 1,000 cases (extended pilot) and 15.6 per 1,000 cases (live use).

Table 3 Outcome metrics for standard double reading versus double reading plus the AI-assisted additional-reader workflow

Performance at a simulated higher-specificity operating point

When the performance of the AI system was evaluated at a predetermined higher-specificity operating point through simulations, the AI-assisted additional-reader workflow substantially reduced the proportion of cases requiring additional review to 2.4% (89/3,746), 3.0% (274/9,112) and 2.9% (457/15,953) for the initial pilot, extended pilot and live-use phases, respectively, while still detecting 5 of the 6 (1.3/1,000, a 10% relative increase) additional cancer cases found in the initial pilot, 11 of the 13 (1.2/1,000, a 9% relative increase) additional cancer cases found in the extended pilot and 10 of the 11 (0.6/1,000, a 4% relative increase) additional cancer cases found in live use (Table 4). Of these additional cancer cases, four (80.0%) in the initial pilot, nine (81.8%) in the extended pilot and five (50.0%) in live use were confirmed to be invasive; zero (0.0%) in the initial pilot, one (9.1%) in the extended pilot and two (20.0%) in live use were confirmed to be in situ cancer; and one (20.0%) in the initial pilot, one (9.1%) in the extended pilot and three (30.0%) in live use had missing invasiveness information.

Table 4 Outcome metrics for standard double reading versus double reading plus the AI-assisted additional-reader workflow at a higher-specificity operating point

Discussion

This analysis of prospective real-world usage data provides evidence that using AI in clinical practice results in a measurable increase in breast cancer detection. We analyzed the effects of the AI-assisted additional-reader workflow in two pilot phases and found that the results were maintained when AI was used in daily screening practice. Moreover, the observed clinical benefit (a significant 5–13% increase in the rate of early detection of mostly invasive and small cancerous tumors) was achieved with minimal impact on recall rates; in the initial pilot, every additional recall was a confirmed cancer, demonstrating the possibility of increasing cancer detection with no false-positive additional recalls. Although the double-reading recall rate (6.7–7.7%) in this evaluation is in line with previous results published in the UK and Europe9,15, the double-reading CDR is higher (14/1,000) than previously reported9, possibly reflecting the resumption of breast cancer screening programs after the coronavirus disease pandemic. Nevertheless, the AI-assisted additional-reader workflow supported the screening service by further increasing the rate of early cancer detection. Using a higher-specificity operating point, it can also potentially reduce the proportion of cases requiring additional arbitration review to <3% while still increasing cancer detection by 0.6–1.3 per 1,000 cases, corresponding to a 4–10% relative increase. Future work investigating the implementation of a variety of operating points would be needed to confirm the extent of achievable improvement in early cancer detection in the context of sites with different needs, capacities and screening population characteristics.

Implementing AI into the diagnostic workflow requires careful monitoring of continued performance over time16. For the AI-assisted additional-reader workflow, the effectiveness of downstream clinical assessments of recalled positive discordant cases should be examined to ensure that potential cancer cases are found. Moreover, the AI-assisted additional-reader workflow could be combined with workflows focused on workload savings, such as using AI as an independent second reader. Large-scale retrospective studies of the same AI system used in this assessment have demonstrated that AI as an independent second reader can offer up to 45% workload savings8,9, offsetting the 3–11% additional arbitration reads (1–6% additional overall reading workload) for the AI-assisted additional-reader workflow while providing the benefit of increased cancer detection.

The AI-assisted additional-reader workflow was designed to flag high-priority cases not recalled by standard double reading, likely making the flagged set of cases a more difficult or complex set to read. We believe that this would be helpful in the training of mammogram readers. The spectrum of disease detected with the AI-assisted additional-reader workflow will be assessed in future work covering features such as invasiveness, tumor size, grade and lymph node status.

Several limitations need to be considered when interpreting the presented results. First, data were collected from only one breast cancer screening institution (with four sites) in one country. As screening programs vary between clinical sites and countries, future studies must confirm the benefit of the AI-assisted additional-reader workflow in other settings and screening populations. Furthermore, as only one commercial AI system was evaluated, the results may not be representative of other commercially available systems. Additionally, given that the follow-up period in this prospective assessment ranged only from 2 to 9 months, no information is yet available about possible interval cancer cases in the studied population. A longer follow-up analysis is required for a more accurate assessment of AI’s potential for improving cancer detection in the context of interval cancer occurrence. Moreover, the impact of inter-reader variation on the AI-assisted additional-reader workflow’s screening outcomes remains unclear and needs to be assessed in follow-up work.

Despite the many challenges in developing, validating, deploying and monitoring AI to ensure patient safety, this evaluation shows that a commercially available AI system can be effectively deployed, with its previously predicted benefits realized in a prospective real-world assessment of a live clinical workflow. We believe that the findings highlight opportunities for using AI in breast screening while demonstrating concrete steps for its safe deployment. The phased prospective approach underlines the potential for various AI adoption pathways.

Methods

Datasets for analysis

This study is an analysis of postmarket data collected at MaMMa Klinika, a large breast cancer screening institution in Hungary. Structured query language (SQL) was used to collect the data. Custom code written in Python version 3.8.8, using open-source packages including pandas version 1.2.4, NumPy version 1.20.1, sklearn version 0.24.1 and statsmodels version 0.12.2, was used for data analysis. The analysis complied with all relevant ethical regulations. External ethical review was not required, as the AI system was used as part of the standard of care in the screening service at each implementation phase of this service evaluation. Ethical considerations were reviewed internally by the screening service provider, MaMMa Klinika. The evaluation used deidentified data and presented results in aggregate, without listing data of individual screening participants, to protect their anonymity. Consequently, the evaluation also did not require patient consent.

Metrics

Standard breast screening metrics, CDR and recall rate, were primarily used to assess the effects of the AI-assisted additional-reader workflow compared to standard double reading without AI. CDR was calculated as the number of screen-detected cancer cases divided by the number of all screened cases. Recall rate was calculated as the number of cases recalled divided by the number of all cases; this should not be confused with the term ‘recall’ often used as a metric for sensitivity in machine learning. Arbitration rate was calculated as the number of arbitrations conducted divided by the number of all cases, with the double-reading arbitration rate including only double-reading arbitrations and the total arbitration rate including both double-reading and additional-reader arbitrations. PPV was calculated as the number of screen-detected cancer cases divided by the number of recalled screens. Sensitivity was calculated as the number of screen-detected cancer cases divided by the number of all known positive screens. Specificity was calculated as the number of non-recalled screens divided by the number of all non-positive screens. Positive discordance rate was calculated as the number of AI-flagged positive discordant cases divided by the number of all cases. As the AI-assisted additional-reader workflow occurs after the double-reading workflow on the same cases, paired comparisons between the two workflows were possible, with an exact measurement of the impact of AI in terms of additional recalls and cancer cases found. All detected cancer cases were confirmed by biopsy or histopathological examination within 12 months of the original screen or judged to be cancer by the patient tumor board (multidisciplinary team).
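
To make these definitions concrete, the following sketch computes the metrics from aggregate counts. The function is illustrative rather than the actual analysis code; the example call uses counts approximating the initial pilot after the additional-reader workflow (6.8% recall rate, 14.4 cancers per 1,000), with the arbitration count back-calculated from the reported 13.6% total arbitration rate.

```python
def screening_metrics(n_screens: int, n_recalls: int, n_cancers: int,
                      n_known_positives: int, n_arbitrations: int) -> dict:
    """Standard screening metrics as defined in the text (case-level counts)."""
    return {
        "cdr_per_1000": 1000 * n_cancers / n_screens,  # screen-detected cancers per 1,000 screens
        "recall_rate": n_recalls / n_screens,
        "arbitration_rate": n_arbitrations / n_screens,
        "ppv": n_cancers / n_recalls,                  # cancers among recalled screens
        "sensitivity": n_cancers / n_known_positives,
        # Specificity per the formula above: non-recalled screens over non-positive screens.
        "specificity": (n_screens - n_recalls) / (n_screens - n_known_positives),
    }

# Illustrative counts approximating the initial pilot: 3,746 screens, ~255 recalls,
# 54 screen-detected cancers, ~509 arbitrations of either kind.
print(screening_metrics(3746, 255, 54, n_known_positives=54, n_arbitrations=509))
```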

Statistical analysis

No statistical method was used to predetermine sample sizes. No data were excluded from the analyses. Blinding was not required as randomization was not applied. The standard double-reading process did not involve the AI system, and readers were blinded to the AI system’s output during the double-reading process. The Wilson score method was used to calculate 95% CIs. The statistical significance of CDR differences was assessed using the McNemar test. A P value of <0.05 was defined as statistically significant.
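
Both procedures are available in the statsmodels package listed under ‘Datasets for analysis’. The sketch below illustrates them with counts consistent with the initial pilot (48 cancers detected by standard double reading plus six detected only through the AI-assisted workflow); the counts are illustrative, and the exact tables used in the analysis are not reproduced here.

```python
from statsmodels.stats.proportion import proportion_confint
from statsmodels.stats.contingency_tables import mcnemar

# 95% Wilson score interval for a proportion, e.g. a recall rate of 255/3,746.
low, high = proportion_confint(count=255, nobs=3746, alpha=0.05, method="wilson")
print(f"recall rate 95% CI: [{low:.4f}, {high:.4f}]")

# Exact McNemar test on paired detection outcomes. Rows: detected by double reading
# (yes/no); columns: detected by double reading plus the AI-assisted workflow (yes/no).
# Because the additional-reader workflow can only add detections, one discordant cell is 0.
table = [[48, 0],
         [6, 3692]]  # six cancers found only by the AI-assisted workflow
print(mcnemar(table, exact=True).pvalue)  # ~0.031, i.e. P < 0.05
```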

AI system

This evaluation used a commercially available AI system (Mia version 2.0, Kheiron Medical Technologies). The AI system is intended to process only cases from female participants and works with standard DICOM (Digital Imaging and Communications in Medicine) cases as inputs. The AI system analyzes four images with two standard full-field digital mammography views (craniocaudal and mediolateral oblique) per breast. Its primary output per case is a single binary recommendation of ‘recall’ (for further assessment based on findings suggestive of malignancy) or ‘no recall’ (no further assessment until the next screening interval). The AI system can provide binary recall recommendations for six predetermined operating points, ranging from a balanced trade-off between sensitivity and specificity to trade-offs that emphasize either sensitivity or specificity. The balanced and higher-specificity operating points are most relevant when the AI system is used in the AI-assisted additional-reader workflow. The set of cases flagged at the higher-specificity operating point is always a subset of the cases flagged at the balanced sensitivity/specificity operating point. Therefore, results at the higher-specificity operating point can be precisely simulated from the balanced operating point results. The ability to choose among these operating point trade-offs makes a significant difference to practical applicability at sites with differing workforce capacities. Additionally, the AI system provides regions of interest indicating image locations showing characteristics most suggestive of malignancy. Depending on the clinical workflow and exact integration of the AI system, the AI’s recommendation may be used independently or combined with human reader assessment.
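
The exactness of this simulation follows from thresholding: because each operating point is a threshold on the same per-case malignancy score (see the ensemble description below), raising the threshold can only shrink the flagged set. The following minimal sketch illustrates the subset property with placeholder scores and thresholds; the AI system's actual score distribution and threshold values are not public.

```python
import numpy as np

rng = np.random.default_rng(0)
scores = rng.random(10_000)               # placeholder per-case malignancy scores
balanced_thr, higher_spec_thr = 0.5, 0.8  # placeholder operating-point thresholds

flagged_balanced = scores >= balanced_thr
flagged_higher_spec = scores >= higher_spec_thr

# Every case flagged at the higher-specificity point is also flagged at the balanced
# point, so higher-specificity outcomes can be replayed exactly from balanced-point data.
assert not np.any(flagged_higher_spec & ~flagged_balanced)
print(int(flagged_balanced.sum()), int(flagged_higher_spec.sum()))
```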

The underlying technology of the AI system is based on deep convolutional neural networks (CNNs), which are state-of-the-art machine learning tools for image classification. The AI system is a combination (also known as an ensemble) of multiple models with a diverse set of different CNN architectures. Each model was trained for malignancy detection. The final prediction of the ensemble is obtained by aggregating individual model outputs, with a subsequent threshold applied to the malignancy detection score to generate a binary recommendation of ‘recall’ or ‘no recall’. The thresholds relate to one of the AI system’s six predetermined, clinically meaningful operating points according to desired sensitivity/specificity trade-offs.
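
In outline, this pipeline reduces to aggregating per-model scores and thresholding the result. The sketch below assumes mean aggregation for illustration; the text specifies only that individual model outputs are aggregated before a threshold is applied.

```python
import numpy as np

def recall_recommendation(model_scores: np.ndarray, threshold: float) -> str:
    """Binary recommendation from an ensemble of malignancy-detection models.

    model_scores: one malignancy score per CNN in the ensemble for a single case.
    threshold: one of the predetermined operating points. Mean aggregation is an
    assumption for illustration; the actual aggregation method is not described.
    """
    case_score = float(np.mean(model_scores))  # aggregate the ensemble's outputs
    return "recall" if case_score >= threshold else "no recall"

print(recall_recommendation(np.array([0.72, 0.64, 0.81]), threshold=0.5))  # -> "recall"
```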

The AI system was trained on a heterogeneous, large-scale collection of more than 1 million images from real-world screening programs across different countries, multiple sites and equipment from different vendors over a period of >10 years. Positive cases were defined as pathology-proven malignancies confirmed by fine-needle aspiration cytology, core needle biopsy, vacuum-assisted core biopsy and/or histological analysis of surgical specimens. Negative cases were confirmed through multiple years of follow-up.

The AI software version and operating points used in the present evaluation were fixed before each phase. None of the evaluation data were used in any aspect of algorithm development.

The AI system’s performance, generalizability and clinical utility were previously confirmed in a large-scale retrospective AI generalizability study8,9,14. The study demonstrated that double reading with the AI system, compared to human double reading, resulted in at least noninferior recall rate, CDR, sensitivity, specificity and PPV for each mammography vendor and site, with superior recall rate, specificity and PPV observed for some mammography vendors and sites9. The double-reading simulation with the AI system indicated that using AI as an independent reader (in all cases it could process) can result in a 3.3–12.3% increase in the arbitration rate9 but can reduce human workload by 30.0–44.8%. AI as a supporting reader (used as a second reader only when it agrees with the first human reader) was found to be superior or noninferior on all screening metrics compared to human double reading while nearly halving the number of arbitrations (from 3.4% to 1.8%) and reducing the number of cases requiring second human reading (by up to 87%)8. Additionally, no differences in prognostic features (invasiveness, grade, tumor size and lymph node status) were found between the cancer cases detected by the AI system and those detected by human readers14. These findings imply that cancer cases detected by the AI system and human readers are likely to have similar clinical courses and outcomes, with limited or no downstream effects on screening programs, supporting the potential role of AI as a reader in the double-reading workflow.

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.