Optimizing risk-based breast cancer screening policies with reinforcement learning

Abstract

Screening programs must balance the benefit of early detection with the cost of overscreening. Here, we introduce a novel reinforcement learning-based framework for personalized screening, Tempo, and demonstrate its efficacy in the context of breast cancer. We trained our risk-based screening policies on a large screening mammography dataset from Massachusetts General Hospital (MGH; USA) and validated the resulting policies on held-out patients from MGH and on external datasets from Emory University (Emory; USA), Karolinska Institute (Karolinska; Sweden) and Chang Gung Memorial Hospital (CGMH; Taiwan). Across all test sets, we find that the Tempo policy combined with an image-based artificial intelligence (AI) risk model is significantly more efficient than current regimens used in clinical practice in terms of simulated early detection per screen frequency. Moreover, we show that the same Tempo policy can be easily adapted to a wide range of possible screening preferences, allowing clinicians to select their desired trade-off between early detection and screening costs without training new policies. Finally, we demonstrate that Tempo policies based on AI-based risk models outperform Tempo policies based on less accurate clinical risk models. Altogether, our results show that pairing AI-based risk models with agile AI-designed screening policies has the potential to improve screening programs by advancing early detection while reducing overscreening.
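
As a rough illustration of the framework described in the abstract, and not the released Tempo implementation (which is linked under Code availability below), the sketch below shows how a preference-conditioned policy could map a patient's AI risk score and a clinician-chosen trade-off weight to a recommended follow-up interval. The candidate intervals, network sizes, reward scalarization and numeric values are all illustrative assumptions.

```python
# Minimal sketch (not the released Tempo code) of a preference-conditioned
# screening policy: the agent observes a patient's current AI risk score and
# a preference weight, and chooses how many months to wait before the next
# mammogram. The reward trades off simulated early detection against the
# cost of an extra screen, weighted by the preference. All names and numbers
# here are illustrative assumptions.
import torch
import torch.nn as nn

FOLLOWUP_MONTHS = [6, 12, 24, 36]          # hypothetical candidate recommendation gaps

class PreferenceConditionedQ(nn.Module):
    """Q-network that scores each follow-up interval given (risk, preference)."""
    def __init__(self, hidden: int = 32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(2, hidden), nn.ReLU(),
            nn.Linear(hidden, len(FOLLOWUP_MONTHS)),
        )

    def forward(self, risk: torch.Tensor, pref: torch.Tensor) -> torch.Tensor:
        return self.net(torch.stack([risk, pref], dim=-1))

def reward(early_detection_months: float, screens_per_year: float, pref: float) -> float:
    """Scalarized multi-objective reward: pref in [0, 1] sets the trade-off
    between early-detection benefit and screening cost (illustrative)."""
    return pref * early_detection_months - (1.0 - pref) * screens_per_year

# Usage: pick the interval with the highest Q-value for a chosen preference.
q_net = PreferenceConditionedQ()
risk = torch.tensor([0.12])                 # hypothetical 5-year risk score
pref = torch.tensor([0.7])                  # clinician-chosen trade-off weight
best = FOLLOWUP_MONTHS[q_net(risk, pref).argmax(dim=-1).item()]
print(f"Recommended follow-up in {best} months")
```

Because the preference weight is an input to the policy rather than fixed at training time, a single trained network can be queried under different early-detection versus screening-cost trade-offs, which is the adaptation property described in the abstract.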


Fig. 1: Retrospective patient trajectory from the MGH test set compared with the trajectories recommended by different guidelines.
Fig. 2: Overview of Tempo.
Fig. 3: Early detection versus the number of mammograms per year at MGH, Emory, Karolinska and CGMH.
Fig. 4: Histogram of screening frequency for each screening guideline on MGH, Emory, Karolinska and CGMH test sets.


Data availability

All datasets were used under license to the respective hospital system for the current study and are not publicly available. To access the MGH dataset, investigators should contact C.L. to apply for an IRB-approved research collaboration and obtain an appropriate data use agreement. To access the Karolinska dataset, investigators should contact F.S. to apply for an approved research collaboration and sign a data use agreement. To access the CGMH dataset, investigators should contact G.L. to apply for an IRB-approved research collaboration. To access the Emory dataset, investigators should contact H.T. to apply for an approved collaboration.

Code availability

All models and code used for training, evaluating and developing Tempo are publicly available at learningtocure.csail.mit.edu and github.com/yala/Tempo (https://doi.org/10.5281/zenodo.5585318).


Acknowledgements

This work was supported by grants from Susan G. Komen, the Breast Cancer Research Foundation, Quanta Computer, Anonymous Foundation and the MIT Jameel Clinic. This work was also supported by the Chang Gung Medical Foundation (grant SMRPG3K0051) and Stockholm Läns Landsting HMT (grant 201708002). We are grateful to the Cancer Center of Linkou CGMH for assistance with data collection under IRB no. 201901491B0C601 and R. Yang, J. Song and their team (Quanta Computer) for providing technical and computing support for analyzing the CGMH dataset.

Author information

Authors and Affiliations

Authors

Contributions

A.Y. and R.B. designed the research goals and aims. A.Y. and R.B. designed the model. A.Y. and R.B. designed the evaluation methodology. A.Y. wrote the software. C.L., G.L., F.S., Y.W., S.S., T.K., I.B., J.G. and H.T. curated the datasets. A.Y. and P.G.M. performed the analysis. P.G.M. created the visualizations. All authors contributed to manuscript writing. R.B. supervised the project.

Corresponding author

Correspondence to Adam Yala.

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review

Peer review information

Nature Medicine thanks William Lotter and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Javier Carmona was the primary editor on this article and managed its editorial process and peer review in collaboration with the rest of the editorial team.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data

Extended Data Fig. 1 Estimated (circle) and observed (square) Mirai 5-year risk for two random patients in the MGH test set.

We imputed unobserved risk assessments using an RNN, which was optimized to predict future risk assessments from past risk assessments on the MGH training set.
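
As a minimal sketch of how such an imputation model might look, the code below uses a simple GRU regressor over a patient's sequence of past risk scores; the architecture, dimensions and values are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch, not the authors' code: a GRU that reads a patient's past
# risk assessments and predicts the score at the next (unobserved) screening
# round. Shapes, dimensions and the training objective are illustrative.
import torch
import torch.nn as nn

class RiskImputer(nn.Module):
    def __init__(self, hidden: int = 16):
        super().__init__()
        self.rnn = nn.GRU(input_size=1, hidden_size=hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)

    def forward(self, past_risks: torch.Tensor) -> torch.Tensor:
        # past_risks: (batch, n_observed_assessments, 1)
        _, h = self.rnn(past_risks)
        return self.head(h[-1])              # predicted next risk score

# Example: impute the next assessment from three observed ones.
model = RiskImputer()
observed = torch.tensor([[[0.03], [0.05], [0.08]]])   # hypothetical scores
predicted_next = model(observed)
loss = nn.functional.mse_loss(predicted_next, torch.tensor([[0.09]]))
loss.backward()        # regression against a held-out future assessment
```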

Extended Data Fig. 2 Histograms of early detection in months for Tempo-Mirai.

Histogram of early detection benefit in months relative to historical screening for patients who developed cancer in the MGH (top left), Emory (top right), Karolinska (bottom left), and CGMH (bottom right) test sets.

Extended Data Fig. 3 Histogram of screening recommendations for each screening policy.

MGH (top left), Emory (top right), Karolinska (bottom left), CGMH (bottom right).

Extended Data Fig. 4 Our early detection metric assumed that a cancer could be caught up to 18 months before diagnosis.

To test the robustness of our results to this assumption, we also evaluated our screening policies when changing this assumption to 6 months, 12 months and 24 months. For each policy, we report its screening efficiency, defined as its early detection benefit in months divided by the number of mammograms it recommends per year. The asterisk denotes the policy with the highest screening efficiency.
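
For concreteness, the screening-efficiency summary described above reduces to a simple ratio; the values in the example below are illustrative only.

```python
# Screening efficiency as defined in the caption above: early detection
# benefit (in months) divided by the average number of mammograms the
# policy recommends per year. Inputs here are illustrative.
def screening_efficiency(early_detection_months: float, mammograms_per_year: float) -> float:
    return early_detection_months / mammograms_per_year

# E.g., a policy gaining 4 months of early detection at 0.8 screens per year
# is more efficient than one gaining 4.5 months at 1.0 screens per year.
print(screening_efficiency(4.0, 0.8))   # 5.0
print(screening_efficiency(4.5, 1.0))   # 4.5
```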

Extended Data Fig. 5 Dataset construction flowcharts.

Dataset construction flow chart for the MGH dataset (top left), Emory (top right), Karolinska test set (bottom left), and CGMH test set (bottom right).

Supplementary information


About this article


Cite this article

Yala, A., Mikhael, P.G., Lehman, C. et al. Optimizing risk-based breast cancer screening policies with reinforcement learning. Nat Med 28, 136–143 (2022). https://doi.org/10.1038/s41591-021-01599-w

