Abstract
Breast cancer is a heterogeneous disease with variable survival outcomes. Pathologists grade the microscopic appearance of breast tissue using the Nottingham criteria, which are qualitative and do not account for noncancerous elements within the tumor microenvironment. Here we present the Histomic Prognostic Signature (HiPS), a comprehensive, interpretable scoring of the survival risk incurred by breast tumor microenvironment morphology. HiPS uses deep learning to accurately map cellular and tissue structures to measure epithelial, stromal, immune, and spatial interaction features. It was developed using a population-level cohort from the Cancer Prevention Study-II and validated using data from three independent cohorts, including the Prostate, Lung, Colorectal, and Ovarian Cancer trial, Cancer Prevention Study-3, and The Cancer Genome Atlas. HiPS consistently outperformed pathologists in predicting survival outcomes, independent of tumor–node–metastasis stage and pertinent variables. This was largely driven by stromal and immune features. In conclusion, HiPS is a robustly validated biomarker to support pathologists and improve patient prognosis.
Data availability
Supplementary Table 27 contains our calculated histomic feature values, HiPS scores and subscores, and related data for the TCGA cohort. We provide this to facilitate reproducibility and to act as a resource for the scientific community. TCGA clinical data and WSIs are publicly available at gdc.cancer.gov. The Breast Cancer Semantic Segmentation dataset is available at github.com/PathologyDataScience/BCSS, and the NuCLS dataset is available at sites.google.com/view/nucls. These datasets were combined to produce the PanopTILs dataset, available at sites.google.com/view/panoptils. Requests for ACS data from the CPS-II or CPS-3 studies should be submitted to maddison.hall@cancer.org. Requests for PLCO data should be submitted at cdas.cancer.gov/learn/plco. Breast cancer genomic subtypes, hypoxia scores, fraction genome altered, aneuploidy scores, and mRNA expression profiles were obtained from the Genomic Data Commons Pancancer Atlas: gdc.cancer.gov/about-data/publications/pancanatlas. Immune subtypes and related pathway activations, as well as angiogenesis and lymphangiogenesis scores, were obtained from the PanImmune dataset: gdc.cancer.gov/about-data/publications/panimmune. CAF subtype abundance data for TCGA were obtained from a previous study (ref. 81). xCell cell type abundance data for TCGA were obtained from a previous study (ref. 61).
Code availability
The code is publicly available at github.com/PathologyDataScience/HiPS. Processing of histology images was performed using HistomicsTK (v.1.2.10, github.com/DigitalSlideArchive/HistomicsTK), histolab (v.0.6.0, github.com/histolab/histolab), and scikit-image (v.0.18.0, scikit-image.org). Analysis of clinical outcomes data was performed using Lifelines (v.0.27.8, github.com/CamDavidsonPilon/lifelines). Enrichment analysis with RNA profiles was performed using GSEAPy (v.1.0.6, gseapy.rtfd.io). Additional Python libraries used for database management, graphical plotting, scientific calculations, and other tasks include numpy v.1.19.4, pandas v.1.1.5, SQLAlchemy v.1.3.21, scipy v.1.5.4, scikit-learn v.0.23.2, imageio v.2.9.0, pillow v.8.0.1, matplotlib v.3.3.3, seaborn v.0.11.0, torch v.1.7.1, torchvision v.0.8.2, and pyvips v.2.1.15.
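As a rough illustration of how the versions listed above could be pinned and checked when attempting to reproduce the analysis, the following Python sketch collects them into a requirements-style list and compares them against the local environment. It is not taken from the HiPS repository. The package names used as dictionary keys are assumed PyPI distribution names (for example, `histomicstk` and `gseapy`) and may need adjusting, and the helper names `requirement_lines` and `check_environment` are hypothetical.

```python
from importlib import metadata

# Versions as listed in the Code availability statement.
# Keys are ASSUMED PyPI distribution names; adjust if a package is published differently.
PINNED = {
    "histomicstk": "1.2.10",
    "histolab": "0.6.0",
    "scikit-image": "0.18.0",
    "lifelines": "0.27.8",
    "gseapy": "1.0.6",
    "numpy": "1.19.4",
    "pandas": "1.1.5",
    "SQLAlchemy": "1.3.21",
    "scipy": "1.5.4",
    "scikit-learn": "0.23.2",
    "imageio": "2.9.0",
    "pillow": "8.0.1",
    "matplotlib": "3.3.3",
    "seaborn": "0.11.0",
    "torch": "1.7.1",
    "torchvision": "0.8.2",
    "pyvips": "2.1.15",
}


def requirement_lines(pins):
    """Render pins as 'name==version' lines, sorted case-insensitively by name."""
    ordered = sorted(pins.items(), key=lambda item: item[0].lower())
    return [f"{name}=={version}" for name, version in ordered]


def check_environment(pins):
    """Return {name: (expected, installed_or_None)} for every package that differs."""
    mismatches = {}
    for name, expected in pins.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None  # package is not installed at all
        if installed != expected:
            mismatches[name] = (expected, installed)
    return mismatches


if __name__ == "__main__":
    print("\n".join(requirement_lines(PINNED)))
    for name, (expected, installed) in check_environment(PINNED).items():
        print(f"mismatch: {name} expected {expected}, found {installed}")
```

Running the script prints the pinned list followed by any packages whose installed version differs from the listed one. Exact pins this old may not install on recent Python versions, so a dedicated environment is likely needed.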
References
Global Cancer Facts & Figures 4th Edition (American Cancer Society, 2018).
Siegel, R. L., Miller, K. D., Wagle, N. S. & Jemal, A. Cancer statistics, 2023. CA Cancer J. Clin. 73, 17–48 (2023).
American Joint Committee on Cancer. AJCC Cancer Staging Manual 2017 (Springer International Publishing, 2017).
Coughlin, S. S. Social determinants of breast cancer risk, stage, and survival. Breast Cancer Res. Treat. 177, 537–548 (2019).
Li, X. et al. Validation of the newly proposed American Joint Committee on Cancer (AJCC) breast cancer prognostic staging group and proposing a new staging system using the National Cancer Database. Breast Cancer Res. Treat. 171, 303–313 (2018).
Scarff, R. W. & Handley, R. S. Prognosis in carcinoma of the breast. Lancet 232, 582–583 (1938).
Black, M. M., Opler, S. R. & Speer, F. D. Survival in breast cancer cases in relation to the structure of the primary tumor and regional lymph nodes. Surg. Gynecol. Obstet. 100, 543–551 (1955).
Bloom, H. J. & Richardson, W. W. Histological grading and prognosis in breast cancer; a study of 1409 cases of which 359 have been followed for 15 years. Br. J. Cancer 11, 359–377 (1957).
Elston, E. W. & Ellis, I. O. Method for grading breast cancer. J. Clin. Pathol. 46, 189–190 (1993).
Elston, C. W. & Ellis, I. O. Pathological prognostic factors in breast cancer. I. The value of histological grade in breast cancer: experience from a large study with long-term follow-up. Histopathology 19, 403–410 (1991).
Hanahan, D. & Weinberg, R. A. Hallmarks of cancer: the next generation. Cell 144, 646–674 (2011).
Cardenas, M. A., Prokhnevska, N. & Kissick, H. T. Organized immune cell interactions within tumors sustain a productive T-cell response. Int. Immunol. 33, 27–37 (2021).
Sahai, E. et al. A framework for advancing our understanding of cancer-associated fibroblasts. Nat. Rev. Cancer 20, 174–186 (2020).
Liu, T., Zhou, L., Li, D., Andl, T. & Zhang, Y. Cancer-associated fibroblasts build and secure the tumor microenvironment. Front. Cell Dev. Biol. 7, 60 (2019).
Savas, P. et al. Clinical relevance of host immunity in breast cancer: from TILs to the clinic. Nat. Rev. Clin. Oncol. 13, 228–241 (2016).
Ha, S. Y., Yeo, S.-Y., Xuan, Y. & Kim, S.-H. The prognostic significance of cancer-associated fibroblasts in esophageal squamous cell carcinoma. PLoS ONE 9, e99955 (2014).
Conklin, M. W. et al. Aligned collagen is a prognostic signature for survival in human breast carcinoma. Am. J. Pathol. 178, 1221–1232 (2011).
Provenzano, P. P. et al. Collagen reorganization at the tumor–stromal interface facilitates local invasion. BMC Med. 4, 38 (2006).
Shekhar, M. P., Werdell, J., Santner, S. J., Pauley, R. J. & Tait, L. Breast stroma plays a dominant regulatory role in breast epithelial growth and differentiation: implications for tumor development and progression. Cancer Res. 61, 1320–1326 (2001).
Couture, H. D. et al. Image analysis with deep learning to predict breast cancer grade, ER status, histologic subtype, and intrinsic subtype. NPJ Breast Cancer 4, 30 (2018).
Rawat, R. R. et al. Deep learned tissue “fingerprints” classify breast cancers by ER/PR/Her2 status from H&E images. Sci. Rep. 10, 7275 (2020).
Gamble, P. et al. Determining breast cancer biomarker status and associated morphological features using deep learning. Commun. Med. 1, 14 (2021).
Bychkov, D. et al. Outcome and biomarker supervised deep learning for survival prediction in two multicenter breast cancer series. J. Pathol. Inform. 13, 9 (2022).
Calle, E. E. et al. The American Cancer Society Cancer Prevention Study II Nutrition Cohort: rationale, study design, and baseline characteristics. Cancer 94, 2490–2501 (2002).
Cancer Genome Atlas Network. Comprehensive molecular portraits of human breast tumours. Nature 490, 61–70 (2012).
Zhu, C. S. et al. The Prostate, Lung, Colorectal and Ovarian Cancer (PLCO) screening trial pathology tissue resource. Cancer Epidemiol. Biomark. Prev. 25, 1635–1642 (2016).
Patel, A. V. et al. The American Cancer Society’s Cancer Prevention Study 3 (CPS-3): recruitment, study design, and baseline characteristics. Cancer 123, 2014–2024 (2017).
Haralick, R. M., Shanmugam, K. & Dinstein, I. Textural features for image classification. IEEE Trans. Syst. Man Cybern. SMC-3, 610–621 (1973).
Doyle, S., Agner, S., Madabhushi, A., Feldman, M. & Tomaszewski, J. Automated grading of breast cancer histopathology using spectral clustering with textural and architectural image features. In Proc. 2008 5th IEEE Int. Symposium on Biomedical Imaging: From Nano to Macro 496–499 (IEEE, 2008).
Gurcan, M. N. et al. Histopathological image analysis: a review. IEEE Rev. Biomed. Eng. 2, 147–171 (2009).
Litjens, G. et al. A survey on deep learning in medical image analysis. Med. Image Anal. 42, 60–88 (2017).
Liu, Y., Han, D., Parwani, A. V. & Li, Z. Applications of artificial intelligence in breast pathology. Arch. Pathol. Lab. Med. https://doi.org/10.5858/arpa.2022-0457-RA (2023).
Abels, E. et al. Computational pathology definitions, best practices, and recommendations for regulatory guidance: a white paper from the Digital Pathology Association. J. Pathol. 249, 286–294 (2019).
Campanella, G. et al. Clinical-grade computational pathology using weakly supervised deep learning on whole slide images. Nat. Med. 25, 1301–1309 (2019).
Lu, M. Y. et al. Data-efficient and weakly supervised computational pathology on whole-slide images. Nat. Biomed. Eng. 5, 555–570 (2021).
Mobadersany, P. et al. Predicting cancer outcomes from histology and genomics using convolutional networks. Proc. Natl Acad. Sci. USA 115, E2970–E2979 (2018).
Bychkov, D. et al. Deep learning based tissue analysis predicts outcome in colorectal cancer. Sci. Rep. 8, 3395 (2018).
Chen, R. J. et al. Pathomic fusion: an integrated framework for fusing histopathology and genomic features for cancer diagnosis and prognosis. IEEE Trans. Med. Imaging 41, 757–770 (2022).
Duanmu, H. et al. A spatial attention guided deep learning system for prediction of pathological complete response using breast cancer histopathology images. Bioinformatics 38, 4605–4612 (2022).
Selvaraju, R. R. et al. Grad-CAM: visual explanations from deep networks via gradient-based localization. Int. J. Comput. Vis. 128, 336–359 (2020).
Ribeiro, M. T. et al. “Why should I trust you?”: explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining 1135–1144 (ACM, 2016).
Amgad, M. et al. Explainable nucleus classification using Decision Tree Approximation of Learned Embeddings. Bioinformatics 38, 513–519 (2022).
Rudin, C. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nat. Mach. Intell. 1, 206–215 (2019).
Leavitt, M. L. & Morcos, A. Towards falsifiable interpretability research. Preprint at arxiv.org/abs/2010.12016 (2020).
Koh, P. W. et al. Concept bottleneck models. In Proc. 37th Int. Conf. on Machine Learning (eds Daumé III, H. & Singh, A.) Vol. 119, 5338–5348 (PMLR, 2020).
Kirillov, A., He, K., Girshick, R., Rother, C. & Dollar, P. Panoptic segmentation. In Proc. IEEE/CVF Conf. on Computer Vision and Pattern Recognition (CVPR) (2019).
Amgad, M., Salgado, R. & Cooper, L. A. D. A panoptic segmentation approach for tumor-infiltrating lymphocyte assessment: development of the MuTILs model and PanopTILs dataset. Preprint at medRxiv https://doi.org/10.1101/2022.01.08.22268814 (2023).
Amgad, M., Salgado, R. & Cooper, L. A. D. MuTILs: a multiresolution deep-learning model for interpretable scoring of tumor-infiltrating lymphocytes in breast carcinomas using clinical guidelines. Preprint at medRxiv https://doi.org/10.1101/2022.01.08.22268814 (2022).
Amgad, M. et al. Structured crowdsourcing enables convolutional segmentation of histology images. Bioinformatics 35, 3461–3467 (2019).
Amgad, M. et al. NuCLS: a scalable crowdsourcing approach and dataset for nucleus classification and segmentation in breast cancer. Gigascience 11, giac037 (2022).
Gutman, D. A. et al. The Digital Slide Archive: a software platform for management, integration, and analysis of histology for cancer research. Cancer Res. 77, e75–e78 (2017).
Schmid, P. et al. Pembrolizumab plus chemotherapy as neoadjuvant treatment of high-risk, early-stage triple-negative breast cancer: results from the phase 1b open-label, multicohort KEYNOTE-173 study. Ann. Oncol. 31, 569–581 (2020).
Liu, J. et al. An integrated TCGA pan-cancer clinical data resource to drive high-quality survival outcome analytics. Cell 173, 400–416.e11 (2018).
Wang, X. et al. Characteristics of The Cancer Genome Atlas cases relative to U.S. general population cancer cases. Br. J. Cancer 119, 885–892 (2018).
Kalinsky, K. et al. 21-gene assay to inform chemotherapy benefit in node-positive breast cancer. N. Engl. J. Med. 385, 2336–2347 (2021).
Paik, S. et al. A multigene assay to predict recurrence of tamoxifen-treated, node-negative breast cancer. N. Engl. J. Med. 351, 2817–2826 (2004).
van’t Veer, L. J. et al. Gene expression profiling predicts clinical outcome of breast cancer. Nature 415, 530–536 (2002).
van de Vijver, M. J. et al. A gene-expression signature as a predictor of survival in breast cancer. N. Engl. J. Med. 347, 1999–2009 (2002).
Howard, F. M. et al. Integration of clinical features and deep learning on pathology for the prediction of breast cancer recurrence assays and risk of recurrence. NPJ Breast Cancer 9, 25 (2023).
Lehmann, B. D. et al. Multi-omics analysis identifies therapeutic vulnerabilities in triple-negative breast cancer subtypes. Nat. Commun. 12, 6276 (2021).
Aran, D., Hu, Z. & Butte, A. J. xCell: digitally portraying the tissue cellular heterogeneity landscape. Genome Biol. 18, 220 (2017).
Yoshihara, K. et al. Inferring tumour purity and stromal and immune cell admixture from expression data. Nat. Commun. 4, 2612 (2013).
Hoadley, K. A. et al. Cell-of-origin patterns dominate the molecular classification of 10,000 tumors from 33 types of cancer. Cell 173, 291–304.e6 (2018).
Berger, A. C. et al. A comprehensive pan-cancer molecular study of gynecologic and breast cancers. Cancer Cell 33, 690–705.e9 (2018).
Bhandari, V. et al. Molecular landmarks of tumor hypoxia across cancer types. Nat. Genet. 51, 308–318 (2019).
Buffa, F. M., Harris, A. L., West, C. M. & Miller, C. J. Large meta-analysis of multiple cancers reveals a common, compact and highly prognostic hypoxia metagene. Br. J. Cancer 102, 428–435 (2010).
Winter, S. C. et al. Relation of a hypoxia metagene derived from head and neck cancer to prognosis of multiple cancers. Cancer Res. 67, 3441–3449 (2007).
Ragnum, H. B. et al. The tumour hypoxia marker pimonidazole reflects a transcriptional programme associated with aggressive prostate cancer. Br. J. Cancer 112, 382–390 (2015).
Liberzon, A. et al. The Molecular Signatures Database (MSigDB) hallmark gene set collection. Cell Syst. 1, 417–425 (2015).
Subramanian, A. et al. Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles. Proc. Natl Acad. Sci. USA 102, 15545–15550 (2005).
Hao, Y. et al. Integrated analysis of multimodal single-cell data. Cell 184, 3573–3587.e29 (2021).
DeNardo, D. G. et al. Leukocyte complexity predicts breast cancer survival and functionally regulates response to chemotherapy. Cancer Discov. 1, 54–67 (2011).
Mahmoud, S. M. A. et al. Tumor-infiltrating CD8+ lymphocytes predict clinical outcome in breast cancer. J. Clin. Oncol. 29, 1949–1955 (2011).
Oh, H. & Ghosh, S. NF-κB: roles and regulation in different CD4(+) T-cell subsets. Immunol. Rev. 252, 41–51 (2013).
Olkhanud, P. B. et al. Tumor-evoked regulatory B cells promote breast cancer metastasis by converting resting CD4+ T cells to T-regulatory cells. Cancer Res. 71, 3505–3515 (2011).
Varn, F. S., Mullins, D. W., Arias-Pulido, H., Fiering, S. & Cheng, C. Adaptive immunity programmes in breast cancer. Immunology 150, 25–34 (2017).
Thorsson, V. et al. The immune landscape of cancer. Immunity 48, 812–830.e14 (2018).
Chang, H. Y. et al. Gene expression signature of fibroblast serum response predicts human cancer progression: similarities between tumors and wounds. PLoS Biol. 2, e7 (2004).
Saltz, J. et al. Spatial organization and molecular correlation of tumor-infiltrating lymphocytes using deep learning on pathology images. Cell Rep. 23, 181–193.e7 (2018).
Costa, A. et al. Fibroblast heterogeneity and immunosuppressive environment in human breast cancer. Cancer Cell 33, 463–479.e10 (2018).
Li, B. et al. Cell-type deconvolution analysis identifies cancer-associated myofibroblast component as a poor prognostic factor in multiple cancer types. Oncogene 40, 4686–4694 (2021).
Mhaidly, R. & Mechta-Grigoriou, F. Fibroblast heterogeneity in tumor micro-environment: role in immunosuppression and new therapies. Semin. Immunol. 48, 101417 (2020).
Asif, P. J., Longobardi, C., Hahne, M. & Medema, J. P. The role of cancer-associated fibroblasts in cancer invasion and metastasis. Cancers 13, 4720 (2021).
Kim, I., Choi, S., Yoo, S., Lee, M. & Kim, I.-S. Cancer-associated fibroblasts in the hypoxic tumor microenvironment. Cancers 14, 3321 (2022).
Ebbing, E. A. et al. Stromal-derived interleukin 6 drives epithelial-to-mesenchymal transition and therapy resistance in esophageal adenocarcinoma. Proc. Natl Acad. Sci. USA 116, 2237–2242 (2019).
Yu, Y. et al. Cancer-associated fibroblasts induce epithelial-mesenchymal transition of breast cancer cells through paracrine TGF-β signalling. Br. J. Cancer 110, 724–732 (2014).
Mariotto, A. et al. Expected monetary impact of Oncotype DX score-concordant systemic breast cancer therapy based on the TAILORx trial. J. Natl Cancer Inst. 112, 154–160 (2020).
Davis, B. A. et al. Racial and ethnic disparities in Oncotype DX test receipt in a statewide population-based study. J. Natl Compr. Canc. Netw. 15, 346–354 (2017).
Losk, K. et al. Factors associated with delays in chemotherapy initiation among patients with breast cancer at a comprehensive cancer center. J. Natl Compr. Canc. Netw. 14, 1519–1526 (2016).
Yousif, M. et al. Artificial intelligence applied to breast pathology. Virchows Arch. 480, 191–209 (2022).
Abubakar, M. et al. Tumor-associated stromal cellular density as a predictor of recurrence and mortality in breast cancer: results from ethnically diverse study populations. Cancer Epidemiol. Biomark. Prev. 30, 1397–1407 (2021).
Li, H. et al. Collagen fiber orientation disorder from H&E images is prognostic for early stage breast cancer: clinical trial validation. NPJ Breast Cancer 7, 104 (2021).
Chen, Y. et al. Computational pathology improves risk stratification of a multi-gene assay for early stage ER+ breast cancer. NPJ Breast Cancer 9, 40 (2023).
Diao, J. A. et al. Human-interpretable image features derived from densely mapped cancer pathology slides predict diverse molecular phenotypes. Nat. Commun. 12, 1613 (2021).
Bejnordi, B. E. et al. Deep learning-based assessment of tumor-associated stroma for diagnosing breast cancer in histopathology images. In Proc. IEEE Int. Symp. Biomed. Imaging 929–932 (2017).
Lu, M. Y. et al. AI-based pathology predicts origins for cancers of unknown primary. Nature 594, 106–110 (2021).
Ehteshami Bejnordi, B. et al. Diagnostic assessment of deep learning algorithms for detection of lymph node metastases in women with breast cancer. JAMA 318, 2199–2210 (2017).
Ghassemi, M., Oakden-Rayner, L. & Beam, A. L. The false hope of current approaches to explainable artificial intelligence in health care. Lancet Digit. Health 3, e745–e750 (2021).
Bilal, M. et al. Development and validation of a weakly supervised deep learning framework to predict the status of molecular pathways and key mutations in colorectal cancer from routine histology images: a retrospective study. Lancet Digit. Health 3, e763–e772 (2021).
Mercan, C. et al. Deep learning for fully-automated nuclear pleomorphism scoring in breast cancer. NPJ Breast Cancer 8, 120 (2022).
Beck, A. H. et al. Systematic analysis of breast cancer morphology uncovers stromal features associated with survival. Sci. Transl. Med. 3, 108ra113 (2011).
Karagiannis, G. S. et al. Cancer-associated fibroblasts drive the progression of metastasis through both paracrine and mechanical pressure on cancer tissue. Mol. Cancer Res. 10, 1403–1418 (2012).
Yuan, Y. et al. Quantitative image analysis of cellular heterogeneity in breast tumors complements genomic profiling. Sci. Transl. Med. 4, 157ra143 (2012).
He, L. et al. Association between levels of tumor-infiltrating lymphocytes in different subtypes of primary breast tumors and prognostic outcomes: a meta-analysis. BMC Womens Health 20, 194 (2020).
Denkert, C. et al. Tumour-infiltrating lymphocytes and prognosis in different subtypes of breast cancer: a pooled analysis of 3771 patients treated with neoadjuvant therapy. Lancet Oncol. 19, 40–50 (2018).
AbdulJabbar, K. et al. Geospatial immune variability illuminates differential evolution of lung adenocarcinoma. Nat. Med. 26, 1054–1062 (2020).
Huang, Z. et al. Artificial intelligence reveals features associated with breast cancer neoadjuvant chemotherapy responses from multi-stain histopathologic images. NPJ Precis. Oncol. 7, 14 (2023).
Amgad, M. et al. Report on computational assessment of tumor infiltrating lymphocytes from the International Immuno-Oncology Biomarker Working Group. NPJ Breast Cancer 6, 16 (2020).
Ping, Z. et al. A microscopic landscape of the invasive breast cancer genome. Sci. Rep. 6, 27545 (2016).
Thennavan, A. et al. Molecular analysis of TCGA breast cancer histologic types. Cell Genom. 1, 100067 (2021).
Garfinkel, L. Selection, follow-up, and analysis in the American Cancer Society prospective studies. Natl Cancer Inst. Monogr. 67, 49–52 (1985).
Stellman, S. D. & Garfinkel, L. Smoking habits and tar levels in a new American Cancer Society prospective study of 1.2 million men and women. J. Natl Cancer Inst. 76, 1057–1063 (1986).
Howard, F. M. et al. The impact of site-specific digital histology signatures on deep learning model accuracy and bias. Nat. Commun. 12, 4423 (2021).
Ronneberger, O., Fischer, P. & Brox, T. U-Net: convolutional networks for biomedical image segmentation. In Proc. Medical Image Computing and Computer-Assisted Intervention – MICCAI 2015 (eds Navab, N. et al.) 234–241 (Springer, 2015).
van Rijthoven, M., Balkenhol, M., Siliņa, K., van der Laak, J. & Ciompi, F. HookNet: multi-resolution convolutional neural networks for semantic segmentation in histopathology whole-slide images. Med. Image Anal. 68, 101890 (2021).
Steyerberg, E. W. & Harrell, F. E. Prediction models need appropriate internal, internal-external, and external validation. J. Clin. Epidemiol. 69, 245–247 (2016).
Marcolini, A. et al. histolab: a Python library for reproducible digital pathology preprocessing with automated testing. SoftwareX 20, 101237 (2022).
Achanta, R. et al. SLIC superpixels compared to state-of-the-art superpixel methods. IEEE Trans. Pattern Anal. Mach. Intell. 34, 2274–2282 (2012).
Macenko, M. et al. A method for normalizing histology slides for quantitative analysis. In 2009 IEEE Int. Symposium on Biomedical Imaging: from Nano to Macro 1107–1110 (IEEE, 2009); https://doi.org/10.1109/ISBI.2009.5193250
Ripley, B. D. The second-order analysis of stationary point processes. J. Appl. Probab. 13, 255–266 (1976).
Amgad, M., Itoh, A. & Tsui, M. M. K. Extending Ripley’s K-function to quantify aggregation in 2-D grayscale images. PLoS ONE 10, e0144404 (2015).
Lester, S. C. et al. Protocol for the examination of specimens from patients with invasive carcinoma of the breast. Arch. Pathol. Lab. Med. 133, 1515–1538 (2009).
Friedman, J., Hastie, T. & Tibshirani, R. Regularization paths for generalized linear models via coordinate descent. J. Stat. Softw. 33, 1–22 (2010).
Campbell, H. & Dean, C. B. The consequences of proportional hazards based model selection. Stat. Med. 33, 1042–1056 (2014).
Stensrud, M. J. & Hernán, M. A. Why test for proportional hazards? JAMA 323, 1401–1402 (2020).
Fang, Z., Liu, X. & Peltz, G. GSEApy: a comprehensive package for performing gene set enrichment analysis in Python. Bioinformatics 39, btac757 (2023).
Chen, E. Y. et al. Enrichr: interactive and collaborative HTML5 gene list enrichment analysis tool. BMC Bioinform. 14, 128 (2013).
Acknowledgements
We express sincere appreciation to all CPS-II and CPS-3 participants and to each member of the study and biospecimen management group. We would like to acknowledge the contributions to this study from central cancer registries supported through the Centers for Disease Control and Prevention’s National Program of Cancer Registries and cancer registries supported by the National Cancer Institute’s Surveillance Epidemiology and End Results Program. We thank the National Cancer Institute for access to NCI’s data collected by the Prostate, Lung, Colorectal and Ovarian (PLCO) Cancer Screening Trial. We are grateful to the annotation team for the Breast Cancer Semantic Segmentation and NuCLS datasets. We would also like to acknowledge F.M. Howard and A.T. Pearson (University of Chicago) for providing us with the research-use Oncotype DX and MammaPrint scores for TCGA. Figures 1–4 and 6, and multiple supplementary figures, were created in part using BioRender.com. This work was supported by the US National Institutes of Health grants U01CA220401 and U24CA19436201. The ACS funds the creation, maintenance, and updating of the CPS-II and CPS-3 cohorts.
Author information
Contributions
M.A. and L.A.D.C. conceived of the research idea. M.A. carried out data analysis, model development, and model validation. M.A., M.A.T.E., and L.A.D.C. wrote the paper, and J.M.H., C.B., K.P.S., J.A.G., M.M.G., and L.R.T. edited the paper. J.M.H. and S.P. performed data curation for the Cancer Prevention Studies cohorts. K.P.S. provided expertise on breast cancer pathology. C.B., M.M.G., and L.R.T. provided expertise on breast cancer epidemiology and population science and assisted with the interpretation of results. D.A.G. provided assistance with computing and data visualization. L.R.T. and L.A.D.C. jointly supervised the work.
Ethics declarations
Competing interests
L.A.D.C. has invention disclosures registered at the Northwestern Office of Innovation and New Ventures, consults for Tempus, and advises Veracyte and Targeted Bioscience. D.A.G. holds stock options in Histowiz LLC and is a cofounder and stockholder of Switchboard, MD. The other authors declare no competing interests.
Peer review
Peer review information
Nature Medicine thanks Po-Hsuan Cameron Chen and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Primary Handling Editor: Lorenzo Righetto, in collaboration with the Nature Medicine team.
Additional information
Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary information
Supplementary Information
Supplementary Tables 1–4, 6–23, and 29, and Figs. 1–51.
Supplementary Data
Excel file containing Supplementary Tables 5 and 24–28.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Amgad, M., Hodge, J.M., Elsebaie, M.A.T. et al. A population-level digital histologic biomarker for enhanced prognosis of invasive breast cancer. Nat Med 30, 85–97 (2024). https://doi.org/10.1038/s41591-023-02643-7
DOI: https://doi.org/10.1038/s41591-023-02643-7