A population-level digital histologic biomarker for enhanced prognosis of invasive breast cancer

Abstract

Breast cancer is a heterogeneous disease with variable survival outcomes. Pathologists grade the microscopic appearance of breast tissue using the Nottingham criteria, which are qualitative and do not account for noncancerous elements within the tumor microenvironment. Here we present the Histomic Prognostic Signature (HiPS), a comprehensive, interpretable scoring of the survival risk incurred by breast tumor microenvironment morphology. HiPS uses deep learning to accurately map cellular and tissue structures to measure epithelial, stromal, immune, and spatial interaction features. It was developed using a population-level cohort from the Cancer Prevention Study-II and validated using data from three independent cohorts, including the Prostate, Lung, Colorectal, and Ovarian Cancer trial, Cancer Prevention Study-3, and The Cancer Genome Atlas. HiPS consistently outperformed pathologists in predicting survival outcomes, independent of tumor–node–metastasis stage and pertinent variables. This was largely driven by stromal and immune features. In conclusion, HiPS is a robustly validated biomarker to support pathologists and improve patient prognosis.

Fig. 1: Overview of the methodological approach and datasets used.
Fig. 2: Thematic categorization and selection of features using the CPS-II cohort.
Fig. 3: The HiPS.
Fig. 4: Stromal features critically impact the HiPS score and alter risk categorization in stage I cancers.
Fig. 5: Kaplan–Meier analysis of HiPS groups compared with the control groups.
Fig. 6: The HiPS score is consistent with established risk profiles.

Data availability

Supplementary Table 27 contains our calculated histomic feature values, HiPS scores and subscores, and related data for the TCGA cohort. We provide this to facilitate reproducibility and to act as a resource for the scientific community. TCGA clinical data and WSIs are publicly available at gdc.cancer.gov. The Breast Cancer Semantic Segmentation dataset is available at github.com/PathologyDataScience/BCSS, and the NuCLS dataset is available at sites.google.com/view/nucls. These datasets were combined to produce the PanopTILs dataset, available at sites.google.com/view/panoptils. Requests for ACS data from the CPS-II or CPS-3 studies should be submitted to maddison.hall@cancer.org. Requests for PLCO data should be submitted at cdas.cancer.gov/learn/plco. Breast cancer genomic subtypes, hypoxia scores, fraction genome altered, aneuploidy scores, and mRNA expression profiles were obtained from the Genomic Data Commons Pancancer Atlas: gdc.cancer.gov/about-data/publications/pancanatlas. Immune subtypes and related pathway activations, as well as angiogenesis and lymphangiogenesis scores, were obtained from the PanImmune dataset: gdc.cancer.gov/about-data/publications/panimmune. CAF subtype abundance data for TCGA were obtained from a previous study (ref. 81). xCell cell type abundance data for TCGA were obtained from a previous study (ref. 61).

Code availability

The code is publicly available at github.com/PathologyDataScience/HiPS. Processing of histology images was performed using HistomicsTK (v.1.2.10, github.com/DigitalSlideArchive/HistomicsTK), histolab (v.0.6.0, github.com/histolab/histolab), and scikit-image (v.0.18.0, scikit-image.org). Analysis of clinical outcomes data was performed using Lifelines (v.0.27.8, github.com/CamDavidsonPilon/lifelines). Enrichment analysis with RNA profiles was performed using GSEAPy (v.1.0.6, gseapy.rtfd.io). Additional Python libraries used for database management, graphical plotting, scientific calculations, and other tasks include numpy v.1.19.4, pandas v.1.1.5, SQLAlchemy v.1.3.21, scipy v.1.5.4, scikit-learn v.0.23.2, imageio v.2.9.0, pillow v.8.0.1, matplotlib v.3.3.3, seaborn v.0.11.0, torch v.1.7.1, torchvision v.0.8.2, and pyvips v.2.1.15.
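As a convenience for readers trying to reproduce the software environment, the pinned versions listed above can be compared against a local installation. The following is a minimal sketch and is not taken from the HiPS repository: the distribution names (for example `histomicstk` and `Pillow`) are assumed PyPI spellings, and `PINNED`, `installed_version`, and `find_mismatches` are hypothetical names introduced for illustration. Only the version numbers come from the statement above.

```python
from importlib import metadata

# Versions as listed in the Code availability statement.
# Distribution names are assumed PyPI spellings and may need adjusting.
PINNED = {
    "histomicstk": "1.2.10",
    "histolab": "0.6.0",
    "scikit-image": "0.18.0",
    "lifelines": "0.27.8",
    "gseapy": "1.0.6",
    "numpy": "1.19.4",
    "pandas": "1.1.5",
    "SQLAlchemy": "1.3.21",
    "scipy": "1.5.4",
    "scikit-learn": "0.23.2",
    "imageio": "2.9.0",
    "Pillow": "8.0.1",
    "matplotlib": "3.3.3",
    "seaborn": "0.11.0",
    "torch": "1.7.1",
    "torchvision": "0.8.2",
    "pyvips": "2.1.15",
}


def installed_version(name):
    """Return the installed version of a distribution, or None if absent."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


def find_mismatches(pinned, lookup=installed_version):
    """Map each package whose installed version differs from its pin
    to a (pinned, installed) pair; installed is None when not found."""
    mismatches = {}
    for name, wanted in pinned.items():
        found = lookup(name)
        if found != wanted:
            mismatches[name] = (wanted, found)
    return mismatches


if __name__ == "__main__":
    for name, (wanted, found) in sorted(find_mismatches(PINNED).items()):
        print(f"{name}: pinned {wanted}, installed {found or 'not installed'}")
```

The comparison is an exact string match, so builds that report a local suffix (some `torch` wheels do) would be flagged even when the base version agrees. The `lookup` parameter exists only so the check can be exercised without touching the real environment.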

References

  1. Global Cancer Facts & Figures 4th Edition (American Cancer Society, 2018).

  2. Siegel, R. L., Miller, K. D., Wagle, N. S. & Jemal, A. Cancer statistics, 2023. CA Cancer J. Clin. 73, 17–48 (2023).

  3. American Joint Committee on Cancer. AJCC Cancer Staging Manual 2017 (Springer International Publishing, 2017).

  4. Coughlin, S. S. Social determinants of breast cancer risk, stage, and survival. Breast Cancer Res. Treat. 177, 537–548 (2019).

  5. Li, X. et al. Validation of the newly proposed American Joint Committee on Cancer (AJCC) breast cancer prognostic staging group and proposing a new staging system using the National Cancer Database. Breast Cancer Res. Treat. 171, 303–313 (2018).

  6. Scarff, R. W. & Handley, R. S. Prognosis in carcinoma of the breast. Lancet 232, 582–583 (1938).

  7. Black, M. M., Opler, S. R. & Speer, F. D. Survival in breast cancer cases in relation to the structure of the primary tumor and regional lymph nodes. Surg. Gynecol. Obstet. 100, 543–551 (1955).

  8. Bloom, H. J. & Richardson, W. W. Histological grading and prognosis in breast cancer; a study of 1409 cases of which 359 have been followed for 15 years. Br. J. Cancer 11, 359–377 (1957).

  9. Elston, E. W. & Ellis, I. O. Method for grading breast cancer. J. Clin. Pathol. 46, 189–190 (1993).

  10. Elston, C. W. & Ellis, I. O. Pathological prognostic factors in breast cancer. I. The value of histological grade in breast cancer: experience from a large study with long-term follow-up. Histopathology 19, 403–410 (1991).

  11. Hanahan, D. & Weinberg, R. A. Hallmarks of cancer: the next generation. Cell 144, 646–674 (2011).

  12. Cardenas, M. A., Prokhnevska, N. & Kissick, H. T. Organized immune cell interactions within tumors sustain a productive T-cell response. Int. Immunol. 33, 27–37 (2021).

  13. Sahai, E. et al. A framework for advancing our understanding of cancer-associated fibroblasts. Nat. Rev. Cancer 20, 174–186 (2020).

  14. Liu, T., Zhou, L., Li, D., Andl, T. & Zhang, Y. Cancer-associated fibroblasts build and secure the tumor microenvironment. Front. Cell Dev. Biol. 7, 60 (2019).

  15. Savas, P. et al. Clinical relevance of host immunity in breast cancer: from TILs to the clinic. Nat. Rev. Clin. Oncol. 13, 228–241 (2016).

  16. Ha, S. Y., Yeo, S.-Y., Xuan, Y. & Kim, S.-H. The prognostic significance of cancer-associated fibroblasts in esophageal squamous cell carcinoma. PLoS ONE 9, e99955 (2014).

  17. Conklin, M. W. et al. Aligned collagen is a prognostic signature for survival in human breast carcinoma. Am. J. Pathol. 178, 1221–1232 (2011).

  18. Provenzano, P. P. et al. Collagen reorganization at the tumor–stromal interface facilitates local invasion. BMC Med. 4, 38 (2006).

  19. Shekhar, M. P., Werdell, J., Santner, S. J., Pauley, R. J. & Tait, L. Breast stroma plays a dominant regulatory role in breast epithelial growth and differentiation: implications for tumor development and progression. Cancer Res. 61, 1320–1326 (2001).

  20. Couture, H. D. et al. Image analysis with deep learning to predict breast cancer grade, ER status, histologic subtype, and intrinsic subtype. NPJ Breast Cancer 4, 30 (2018).

  21. Rawat, R. R. et al. Deep learned tissue “fingerprints” classify breast cancers by ER/PR/Her2 status from H&E images. Sci. Rep. 10, 7275 (2020).

  22. Gamble, P. et al. Determining breast cancer biomarker status and associated morphological features using deep learning. Commun. Med. 1, 14 (2021).

  23. Bychkov, D. et al. Outcome and biomarker supervised deep learning for survival prediction in two multicenter breast cancer series. J. Pathol. Inform. 13, 9 (2022).

  24. Calle, E. E. et al. The American Cancer Society Cancer Prevention Study II Nutrition Cohort: rationale, study design, and baseline characteristics. Cancer 94, 2490–2501 (2002).

  25. Cancer Genome Atlas Network. Comprehensive molecular portraits of human breast tumours. Nature 490, 61–70 (2012).

  26. Zhu, C. S. et al. The Prostate, Lung, Colorectal and Ovarian Cancer (PLCO) screening trial pathology tissue resource. Cancer Epidemiol. Biomark. Prev. 25, 1635–1642 (2016).

  27. Patel, A. V. et al. The American Cancer Society’s Cancer Prevention Study 3 (CPS-3): recruitment, study design, and baseline characteristics. Cancer 123, 2014–2024 (2017).

  28. Haralick, R. M., Shanmugam, K. & Dinstein, I. Textural features for image classification. IEEE Trans. Syst. Man Cybern. SMC-3, 610–621 (1973).

  29. Doyle, S., Agner, S., Madabhushi, A., Feldman, M. & Tomaszewski, J. Automated grading of breast cancer histopathology using spectral clustering with textural and architectural image features. In Proc. 2008 5th IEEE Int. Symposium on Biomedical Imaging: From Nano to Macro 496–499 (IEEE, 2008).

  30. Gurcan, M. N. et al. Histopathological image analysis: a review. IEEE Rev. Biomed. Eng. 2, 147–171 (2009).

  31. Litjens, G. et al. A survey on deep learning in medical image analysis. Med. Image Anal. 42, 60–88 (2017).

  32. Liu, Y., Han, D., Parwani, A. V. & Li, Z. Applications of artificial intelligence in breast pathology. Arch. Pathol. Lab. Med. https://doi.org/10.5858/arpa.2022-0457-RA (2023).

  33. Abels, E. et al. Computational pathology definitions, best practices, and recommendations for regulatory guidance: a white paper from the Digital Pathology Association. J. Pathol. 249, 286–294 (2019).

  34. Campanella, G. et al. Clinical-grade computational pathology using weakly supervised deep learning on whole slide images. Nat. Med. 25, 1301–1309 (2019).

  35. Lu, M. Y. et al. Data-efficient and weakly supervised computational pathology on whole-slide images. Nat. Biomed. Eng. 5, 555–570 (2021).

  36. Mobadersany, P. et al. Predicting cancer outcomes from histology and genomics using convolutional networks. Proc. Natl Acad. Sci. USA 115, E2970–E2979 (2018).

  37. Bychkov, D. et al. Deep learning based tissue analysis predicts outcome in colorectal cancer. Sci. Rep. 8, 3395 (2018).

  38. Chen, R. J. et al. Pathomic fusion: an integrated framework for fusing histopathology and genomic features for cancer diagnosis and prognosis. IEEE Trans. Med. Imaging 41, 757–770 (2022).

  39. Duanmu, H. et al. A spatial attention guided deep learning system for prediction of pathological complete response using breast cancer histopathology images. Bioinformatics 38, 4605–4612 (2022).

  40. Selvaraju, R. R. et al. Grad-CAM: visual explanations from deep networks via gradient-based localization. Int. J. Comput. Vis. 128, 336–359 (2020).

  41. Ribeiro, M. T. et al. “Why should I trust you?”: explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. on Knowledge Discovery and Data Mining 1135–1144 (ACM, 2016).

  42. Amgad, M. et al. Explainable nucleus classification using Decision Tree Approximation of Learned Embeddings. Bioinformatics 38, 513–519 (2022).

  43. Rudin, C. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nat. Mach. Intell. 1, 206–215 (2019).

  44. Leavitt, M. L. & Morcos, A. Towards falsifiable interpretability research. Preprint at arxiv.org/abs/2010.12016 (2020).

  45. Koh, P. W. et al. Concept bottleneck models. In Proc. 37th Int. Conf. on Machine Learning (eds Daumé III, H. & Singh, A.) Vol. 119, 5338–5348 (PMLR, 2020).

  46. Kirillov, A., He, K., Girshick, R., Rother, C. & Dollár, P. Panoptic segmentation. In Proc. IEEE/CVF Conf. on Computer Vision and Pattern Recognition (CVPR) (2019).

  47. Amgad, M., Salgado, R. & Cooper, L. A. D. A panoptic segmentation approach for tumor-infiltrating lymphocyte assessment: development of the MuTILs model and PanopTILs dataset. Preprint at medRxiv https://doi.org/10.1101/2022.01.08.22268814 (2023).

  48. Amgad, M., Salgado, R. & Cooper, L. A. D. MuTILs: a multiresolution deep-learning model for interpretable scoring of tumor-infiltrating lymphocytes in breast carcinomas using clinical guidelines. Preprint at medRxiv https://doi.org/10.1101/2022.01.08.22268814 (2022).

  49. Amgad, M. et al. Structured crowdsourcing enables convolutional segmentation of histology images. Bioinformatics 35, 3461–3467 (2019).

  50. Amgad, M. et al. NuCLS: a scalable crowdsourcing approach and dataset for nucleus classification and segmentation in breast cancer. Gigascience 11, giac037 (2022).

  51. Gutman, D. A. et al. The Digital Slide Archive: a software platform for management, integration, and analysis of histology for cancer research. Cancer Res. 77, e75–e78 (2017).

  52. Schmid, P. et al. Pembrolizumab plus chemotherapy as neoadjuvant treatment of high-risk, early-stage triple-negative breast cancer: results from the phase 1b open-label, multicohort KEYNOTE-173 study. Ann. Oncol. 31, 569–581 (2020).

  53. Liu, J. et al. An integrated TCGA pan-cancer clinical data resource to drive high-quality survival outcome analytics. Cell 173, 400–416.e11 (2018).

  54. Wang, X. et al. Characteristics of The Cancer Genome Atlas cases relative to U.S. general population cancer cases. Br. J. Cancer 119, 885–892 (2018).

  55. Kalinsky, K. et al. 21-gene assay to inform chemotherapy benefit in node-positive breast cancer. N. Engl. J. Med. 385, 2336–2347 (2021).

  56. Paik, S. et al. A multigene assay to predict recurrence of tamoxifen-treated, node-negative breast cancer. N. Engl. J. Med. 351, 2817–2826 (2004).

  57. van’t Veer, L. J. et al. Gene expression profiling predicts clinical outcome of breast cancer. Nature 415, 530–536 (2002).

  58. van de Vijver, M. J. et al. A gene-expression signature as a predictor of survival in breast cancer. N. Engl. J. Med. 347, 1999–2009 (2002).

  59. Howard, F. M. et al. Integration of clinical features and deep learning on pathology for the prediction of breast cancer recurrence assays and risk of recurrence. NPJ Breast Cancer 9, 25 (2023).

  60. Lehmann, B. D. et al. Multi-omics analysis identifies therapeutic vulnerabilities in triple-negative breast cancer subtypes. Nat. Commun. 12, 6276 (2021).

  61. Aran, D., Hu, Z. & Butte, A. J. xCell: digitally portraying the tissue cellular heterogeneity landscape. Genome Biol. 18, 220 (2017).

  62. Yoshihara, K. et al. Inferring tumour purity and stromal and immune cell admixture from expression data. Nat. Commun. 4, 2612 (2013).

  63. Hoadley, K. A. et al. Cell-of-origin patterns dominate the molecular classification of 10,000 tumors from 33 types of cancer. Cell 173, 291–304.e6 (2018).

  64. Berger, A. C. et al. A comprehensive pan-cancer molecular study of gynecologic and breast cancers. Cancer Cell 33, 690–705.e9 (2018).

  65. Bhandari, V. et al. Molecular landmarks of tumor hypoxia across cancer types. Nat. Genet. 51, 308–318 (2019).

  66. Buffa, F. M., Harris, A. L., West, C. M. & Miller, C. J. Large meta-analysis of multiple cancers reveals a common, compact and highly prognostic hypoxia metagene. Br. J. Cancer 102, 428–435 (2010).

  67. Winter, S. C. et al. Relation of a hypoxia metagene derived from head and neck cancer to prognosis of multiple cancers. Cancer Res. 67, 3441–3449 (2007).

  68. Ragnum, H. B. et al. The tumour hypoxia marker pimonidazole reflects a transcriptional programme associated with aggressive prostate cancer. Br. J. Cancer 112, 382–390 (2015).

  69. Liberzon, A. et al. The Molecular Signatures Database (MSigDB) hallmark gene set collection. Cell Syst. 1, 417–425 (2015).

  70. Subramanian, A. et al. Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles. Proc. Natl Acad. Sci. USA 102, 15545–15550 (2005).

  71. Hao, Y. et al. Integrated analysis of multimodal single-cell data. Cell 184, 3573–3587.e29 (2021).

  72. DeNardo, D. G. et al. Leukocyte complexity predicts breast cancer survival and functionally regulates response to chemotherapy. Cancer Discov. 1, 54–67 (2011).

  73. Mahmoud, S. M. A. et al. Tumor-infiltrating CD8+ lymphocytes predict clinical outcome in breast cancer. J. Clin. Oncol. 29, 1949–1955 (2011).

  74. Oh, H. & Ghosh, S. NF-κB: roles and regulation in different CD4(+) T-cell subsets. Immunol. Rev. 252, 41–51 (2013).

  75. Olkhanud, P. B. et al. Tumor-evoked regulatory B cells promote breast cancer metastasis by converting resting CD4+ T cells to T-regulatory cells. Cancer Res. 71, 3505–3515 (2011).

  76. Varn, F. S., Mullins, D. W., Arias-Pulido, H., Fiering, S. & Cheng, C. Adaptive immunity programmes in breast cancer. Immunology 150, 25–34 (2017).

  77. Thorsson, V. et al. The immune landscape of cancer. Immunity 48, 812–830.e14 (2018).

  78. Chang, H. Y. et al. Gene expression signature of fibroblast serum response predicts human cancer progression: similarities between tumors and wounds. PLoS Biol. 2, e7 (2004).

  79. Saltz, J. et al. Spatial organization and molecular correlation of tumor-infiltrating lymphocytes using deep learning on pathology images. Cell Rep. 23, 181–193.e7 (2018).

  80. Costa, A. et al. Fibroblast heterogeneity and immunosuppressive environment in human breast cancer. Cancer Cell 33, 463–479.e10 (2018).

  81. Li, B. et al. Cell-type deconvolution analysis identifies cancer-associated myofibroblast component as a poor prognostic factor in multiple cancer types. Oncogene 40, 4686–4694 (2021).

  82. Mhaidly, R. & Mechta-Grigoriou, F. Fibroblast heterogeneity in tumor micro-environment: role in immunosuppression and new therapies. Semin. Immunol. 48, 101417 (2020).

  83. Asif, P. J., Longobardi, C., Hahne, M. & Medema, J. P. The role of cancer-associated fibroblasts in cancer invasion and metastasis. Cancers 13, 4720 (2021).

  84. Kim, I., Choi, S., Yoo, S., Lee, M. & Kim, I.-S. Cancer-associated fibroblasts in the hypoxic tumor microenvironment. Cancers 14, 3321 (2022).

  85. Ebbing, E. A. et al. Stromal-derived interleukin 6 drives epithelial-to-mesenchymal transition and therapy resistance in esophageal adenocarcinoma. Proc. Natl Acad. Sci. USA 116, 2237–2242 (2019).

  86. Yu, Y. et al. Cancer-associated fibroblasts induce epithelial-mesenchymal transition of breast cancer cells through paracrine TGF-β signalling. Br. J. Cancer 110, 724–732 (2014).

  87. Mariotto, A. et al. Expected monetary impact of Oncotype DX score-concordant systemic breast cancer therapy based on the TAILORx trial. J. Natl Cancer Inst. 112, 154–160 (2020).

  88. Davis, B. A. et al. Racial and ethnic disparities in Oncotype DX test receipt in a statewide population-based study. J. Natl Compr. Canc. Netw. 15, 346–354 (2017).

  89. Losk, K. et al. Factors associated with delays in chemotherapy initiation among patients with breast cancer at a comprehensive cancer center. J. Natl Compr. Canc. Netw. 14, 1519–1526 (2016).

  90. Yousif, M. et al. Artificial intelligence applied to breast pathology. Virchows Arch. 480, 191–209 (2022).

  91. Abubakar, M. et al. Tumor-associated stromal cellular density as a predictor of recurrence and mortality in breast cancer: results from ethnically diverse study populations. Cancer Epidemiol. Biomark. Prev. 30, 1397–1407 (2021).

  92. Li, H. et al. Collagen fiber orientation disorder from H&E images is prognostic for early stage breast cancer: clinical trial validation. NPJ Breast Cancer 7, 104 (2021).

  93. Chen, Y. et al. Computational pathology improves risk stratification of a multi-gene assay for early stage ER+ breast cancer. NPJ Breast Cancer 9, 40 (2023).

  94. Diao, J. A. et al. Human-interpretable image features derived from densely mapped cancer pathology slides predict diverse molecular phenotypes. Nat. Commun. 12, 1613 (2021).

  95. Bejnordi, B. E. et al. Deep learning-based assessment of tumor-associated stroma for diagnosing breast cancer in histopathology images. In Proc. IEEE Int. Symp. Biomed. Imaging 929–932 (2017).

  96. Lu, M. Y. et al. AI-based pathology predicts origins for cancers of unknown primary. Nature 594, 106–110 (2021).

  97. Ehteshami Bejnordi, B. et al. Diagnostic assessment of deep learning algorithms for detection of lymph node metastases in women with breast cancer. JAMA 318, 2199–2210 (2017).

  98. Ghassemi, M., Oakden-Rayner, L. & Beam, A. L. The false hope of current approaches to explainable artificial intelligence in health care. Lancet Digit. Health 3, e745–e750 (2021).

  99. Bilal, M. et al. Development and validation of a weakly supervised deep learning framework to predict the status of molecular pathways and key mutations in colorectal cancer from routine histology images: a retrospective study. Lancet Digit. Health 3, e763–e772 (2021).

  100. Mercan, C. et al. Deep learning for fully-automated nuclear pleomorphism scoring in breast cancer. NPJ Breast Cancer 8, 120 (2022).

  101. Beck, A. H. et al. Systematic analysis of breast cancer morphology uncovers stromal features associated with survival. Sci. Transl. Med. 3, 108ra113 (2011).

  102. Karagiannis, G. S. et al. Cancer-associated fibroblasts drive the progression of metastasis through both paracrine and mechanical pressure on cancer tissue. Mol. Cancer Res. 10, 1403–1418 (2012).

  103. Yuan, Y. et al. Quantitative image analysis of cellular heterogeneity in breast tumors complements genomic profiling. Sci. Transl. Med. 4, 157ra143 (2012).

  104. He, L. et al. Association between levels of tumor-infiltrating lymphocytes in different subtypes of primary breast tumors and prognostic outcomes: a meta-analysis. BMC Womens Health 20, 194 (2020).

  105. Denkert, C. et al. Tumour-infiltrating lymphocytes and prognosis in different subtypes of breast cancer: a pooled analysis of 3771 patients treated with neoadjuvant therapy. Lancet Oncol. 19, 40–50 (2018).

  106. AbdulJabbar, K. et al. Geospatial immune variability illuminates differential evolution of lung adenocarcinoma. Nat. Med. 26, 1054–1062 (2020).

  107. Huang, Z. et al. Artificial intelligence reveals features associated with breast cancer neoadjuvant chemotherapy responses from multi-stain histopathologic images. NPJ Precis. Oncol. 7, 14 (2023).

  108. Amgad, M. et al. Report on computational assessment of tumor infiltrating lymphocytes from the International Immuno-Oncology Biomarker Working Group. NPJ Breast Cancer 6, 16 (2020).

  109. Ping, Z. et al. A microscopic landscape of the invasive breast cancer genome. Sci. Rep. 6, 27545 (2016).

  110. Thennavan, A. et al. Molecular analysis of TCGA breast cancer histologic types. Cell Genom. 1, 100067 (2021).

  111. Garfinkel, L. Selection, follow-up, and analysis in the American Cancer Society prospective studies. Natl Cancer Inst. Monogr. 67, 49–52 (1985).

  112. Stellman, S. D. & Garfinkel, L. Smoking habits and tar levels in a new American Cancer Society prospective study of 1.2 million men and women. J. Natl Cancer Inst. 76, 1057–1063 (1986).

  113. Howard, F. M. et al. The impact of site-specific digital histology signatures on deep learning model accuracy and bias. Nat. Commun. 12, 4423 (2021).

  114. Ronneberger, O., Fischer, P. & Brox, T. U-Net: convolutional networks for biomedical image segmentation. In Proc. Medical Image Computing and Computer-Assisted Intervention – MICCAI 2015 (eds Navab, N. et al.) 234–241 (Springer, 2015).

  115. van Rijthoven, M., Balkenhol, M., Siliņa, K., van der Laak, J. & Ciompi, F. HookNet: multi-resolution convolutional neural networks for semantic segmentation in histopathology whole-slide images. Med. Image Anal. 68, 101890 (2021).

  116. Steyerberg, E. W. & Harrell, F. E. Prediction models need appropriate internal, internal-external, and external validation. J. Clin. Epidemiol. 69, 245–247 (2016).

  117. Marcolini, A. et al. histolab: a Python library for reproducible digital pathology preprocessing with automated testing. SoftwareX 20, 101237 (2022).

  118. Achanta, R. et al. SLIC superpixels compared to state-of-the-art superpixel methods. IEEE Trans. Pattern Anal. Mach. Intell. 34, 2274–2282 (2012).

  119. Macenko, M. et al. A method for normalizing histology slides for quantitative analysis. In 2009 IEEE Int. Symposium on Biomedical Imaging: From Nano to Macro 1107–1110 (IEEE, 2009); https://doi.org/10.1109/ISBI.2009.5193250

  120. Ripley, B. D. The second-order analysis of stationary point processes. J. Appl. Probab. 13, 255–266 (1976).

  121. Amgad, M., Itoh, A. & Tsui, M. M. K. Extending Ripley’s K-function to quantify aggregation in 2-D grayscale images. PLoS ONE 10, e0144404 (2015).

  122. Lester, S. C. et al. Protocol for the examination of specimens from patients with invasive carcinoma of the breast. Arch. Pathol. Lab. Med. 133, 1515–1538 (2009).

  123. Friedman, J., Hastie, T. & Tibshirani, R. Regularization paths for generalized linear models via coordinate descent. J. Stat. Softw. 33, 1–22 (2010).

  124. Campbell, H. & Dean, C. B. The consequences of proportional hazards based model selection. Stat. Med. 33, 1042–1056 (2014).

  125. Stensrud, M. J. & Hernán, M. A. Why test for proportional hazards? JAMA 323, 1401–1402 (2020).

  126. Fang, Z., Liu, X. & Peltz, G. GSEApy: a comprehensive package for performing gene set enrichment analysis in Python. Bioinformatics 39, btac757 (2023).

  127. Chen, E. Y. et al. Enrichr: interactive and collaborative HTML5 gene list enrichment analysis tool. BMC Bioinform. 14, 128 (2013).

Acknowledgements

We express sincere appreciation to all CPS-II and CPS-3 participants and to each member of the study and biospecimen management group. We would like to acknowledge the contributions to this study from central cancer registries supported through the Centers for Disease Control and Prevention’s National Program of Cancer Registries and cancer registries supported by the National Cancer Institute’s Surveillance, Epidemiology, and End Results Program. We thank the National Cancer Institute for access to NCI’s data collected by the Prostate, Lung, Colorectal and Ovarian (PLCO) Cancer Screening Trial. We are grateful to the annotation team for the Breast Cancer Semantic Segmentation and NuCLS datasets. We would also like to acknowledge F.M. Howard and A.T. Pearson (University of Chicago) for providing us with the research-use Oncotype DX and MammaPrint scores for TCGA. Figures 1–4 and 6, and multiple supplementary figures, were created in part using BioRender.com. This work was supported by the US National Institutes of Health grants U01CA220401 and U24CA19436201. The ACS funds the creation, maintenance, and updating of the CPS-II and CPS-3 cohorts.

Author information

Contributions

M.A. and L.A.D.C. conceived of the research idea. M.A. carried out data analysis, model development, and model validation. M.A., M.A.T.E., and L.A.D.C. wrote the paper, and J.M.H., C.B., K.P.S., J.A.G., M.M.G., and L.R.T. edited the paper. J.M.H. and S.P. performed data curation for the Cancer Prevention Studies cohorts. K.P.S. provided expertise on breast cancer pathology. C.B., M.M.G., and L.R.T. provided expertise on breast cancer epidemiology and population science and assisted with the interpretation of results. D.A.G. provided assistance with computing and data visualization. L.R.T. and L.A.D.C. jointly supervised the work.

Corresponding author

Correspondence to Lee A. D. Cooper.

Ethics declarations

Competing interests

L.A.D.C. has invention disclosures registered at the Northwestern Office of Innovation and New Ventures, consults for Tempus, and advises Veracyte and Targeted Bioscience. D.A.G. holds stock options in Histowiz LLC and is a cofounder and stockholder of Switchboard, MD. The other authors declare no competing interests.

Peer review

Peer review information

Nature Medicine thanks Po-Hsuan Cameron Chen and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Primary Handling Editor: Lorenzo Righetto, in collaboration with the Nature Medicine team.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary Information

Supplementary Tables 1–4, 6–23, and 29, and Figs. 1–51.

Reporting Summary

Supplementary Data

Excel file containing Supplementary Tables 5 and 24–28.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Amgad, M., Hodge, J.M., Elsebaie, M.A.T. et al. A population-level digital histologic biomarker for enhanced prognosis of invasive breast cancer. Nat Med 30, 85–97 (2024). https://doi.org/10.1038/s41591-023-02643-7

  • DOI: https://doi.org/10.1038/s41591-023-02643-7
