Abstract

The rates and routes of lethal systemic spread in breast cancer are poorly understood owing to a lack of molecularly characterized patient cohorts with long-term, detailed follow-up data. Long-term follow-up is especially important for those with oestrogen-receptor (ER)-positive breast cancers, which can recur up to two decades after initial diagnosis1,2,3,4,5,6. It is therefore essential to identify patients who have a high risk of late relapse7,8,9. Here we present a statistical framework that models distinct disease stages (locoregional recurrence, distant recurrence, breast-cancer-related death and death from other causes) and competing risks of mortality from breast cancer, while yielding individual risk-of-recurrence predictions. We apply this model to 3,240 patients with breast cancer, including 1,980 for whom molecular data are available, and delineate spatiotemporal patterns of relapse across different categories of molecular information (namely immunohistochemical subtypes; PAM50 subtypes, which are based on gene-expression patterns10,11; and integrative or IntClust subtypes, which are based on patterns of genomic copy-number alterations and gene expression12,13). We identify four late-recurring integrative subtypes, comprising about one quarter (26%) of tumours that are both positive for ER and negative for human epidermal growth factor receptor 2, each with characteristic tumour-driving alterations in genomic copy number and a high risk of recurrence (mean 47–62%) up to 20 years after diagnosis. We also define a subgroup of triple-negative breast cancers in which cancer rarely recurs after five years, and a separate subgroup in which patients remain at risk. Use of the integrative subtypes improves the prediction of late, distant relapse beyond what is possible with clinical covariates (nodal status, tumour size, tumour grade and immunohistochemical subtype). These findings highlight opportunities for improved patient stratification and biomarker-driven clinical trials.
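As a rough illustration of what a model with distinct disease stages and competing causes of death looks like, the sketch below uses a discrete-time Markov chain over the five states named in the abstract. This is a deliberate simplification and not the authors' implementation: the yearly time step, the function names and every transition probability are hypothetical values chosen only so that each row sums to one.

```python
# Toy multistate model. The states follow the abstract; all numbers are invented
# for illustration and do not come from the study.
STATES = ["disease_free", "locoregional", "distant", "cancer_death", "other_death"]

# Hypothetical one-year transition probabilities; each row sums to 1.
P = [
    [0.93, 0.02, 0.03, 0.00, 0.02],  # disease-free after surgery
    [0.00, 0.80, 0.12, 0.05, 0.03],  # locoregional recurrence
    [0.00, 0.00, 0.68, 0.30, 0.02],  # distant recurrence
    [0.00, 0.00, 0.00, 1.00, 0.00],  # breast-cancer-related death (absorbing)
    [0.00, 0.00, 0.00, 0.00, 1.00],  # death from other causes (absorbing)
]


def step(dist, matrix):
    """Advance a state-occupancy distribution by one time step."""
    n = len(matrix)
    return [sum(dist[i] * matrix[i][j] for i in range(n)) for j in range(n)]


def occupancy(start, matrix, years):
    """State-occupancy probabilities `years` steps after starting in state `start`."""
    dist = [0.0] * len(matrix)
    dist[start] = 1.0
    for _ in range(years):
        dist = step(dist, matrix)
    return dist


after_20 = occupancy(0, P, 20)
for name, prob in zip(STATES, after_20):
    print(f"{name:>13}: {prob:.3f}")
```

Because the two death states are both absorbing, probability that flows into death from other causes can never reach breast-cancer death, which is the sense in which the causes compete.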

Code availability

All code and scripts are available for academic use at https://github.com/cclab-brca/brcarepred.

Data availability

The genomic copy number, gene-expression and molecular-subtype information has been described previously12 and is available at the European Genome-Phenome Archive at https://www.ebi.ac.uk/ega/studies/EGAS00000000083. Clinical data are available in Supplementary Tables 5–8. The breast-cancer-recurrence predictor is available as a web application for academic use at https://caldaslab.cruk.cam.ac.uk/brcarepred.

Additional information

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

References

  1. Blows, F. M. et al. Subtyping of breast cancer by immunohistochemistry to investigate a relationship between subtype and short and long term survival: a collaborative analysis of data for 10,159 cases from 12 studies. PLoS Med. 7, e1000279 (2010).

  2. Davies, C. et al. Long-term effects of continuing adjuvant tamoxifen to 10 years versus stopping at 5 years after diagnosis of oestrogen receptor-positive breast cancer: ATLAS, a randomised trial. Lancet 381, 805–816 (2013).

  3. Sestak, I. et al. Factors predicting late recurrence for estrogen receptor-positive breast cancer. J. Natl Cancer Inst. 105, 1504–1511 (2013).

  4. Sgroi, D. C. et al. Prediction of late distant recurrence in patients with oestrogen-receptor-positive breast cancer: a prospective comparison of the breast-cancer index (BCI) assay, 21-gene recurrence score, and IHC4 in the TransATAC study population. Lancet Oncol. 14, 1067–1076 (2013).

  5. Pan, H. et al. 20-year risks of breast-cancer recurrence after stopping endocrine therapy at 5 years. N. Engl. J. Med. 377, 1836–1846 (2017).

  6. Dowsett, M. et al. Integration of clinical variables for the prediction of late distant recurrence in patients with estrogen receptor-positive breast cancer treated with 5 years of endocrine therapy: CTS5. J. Clin. Oncol. 36, 1941–1948 (2018).

  7. Harris, L. N. et al. Use of biomarkers to guide decisions on adjuvant systemic therapy for women with early-stage invasive breast cancer: American Society of Clinical Oncology clinical practice guideline. J. Clin. Oncol. 34, 1134–1150 (2016).

  8. Sledge, G. W. et al. Past, present, and future challenges in breast cancer treatment. J. Clin. Oncol. 32, 1979–1986 (2014).

  9. Richman, J. & Dowsett, M. Beyond 5 years: enduring risk of recurrence in oestrogen receptor-positive breast cancer. Nat. Rev. Clin. Oncol. 1, https://doi.org/10.1038/s41571-018-0145-5 (2018).

  10. Perou, C. M. et al. Molecular portraits of human breast tumours. Nature 406, 747–752 (2000).

  11. Parker, J. S. et al. Supervised risk predictor of breast cancer based on intrinsic subtypes. J. Clin. Oncol. 27, 1160–1167 (2009).

  12. Curtis, C. et al. The genomic and transcriptomic architecture of 2,000 breast tumours reveals novel subgroups. Nature 486, 346–352 (2012).

  13. Ali, H. R. et al. Genome-driven integrated classification of breast cancer validated in over 7,500 samples. Genome Biol. 15, 431 (2014).

  14. Putter, H., van der Hage, J., de Bock, G. H., Elgalta, R. & van de Velde, C. J. H. Estimation and prediction in a multi-state model for breast cancer. Biom. J. 48, 366–380 (2006).

  15. Fisher, B. et al. Significance of ipsilateral breast tumour recurrence after lumpectomy. Lancet 338, 327–331 (1991).

  16. Insa, A. et al. Prognostic factors predicting survival from first recurrence in patients with metastatic breast cancer: analysis of 439 patients. Breast Cancer Res. Treat. 56, 67–78 (1999).

  17. Putter, H., Fiocco, M. & Geskus, R. B. Tutorial in biostatistics: competing risks and multi-state models. Stat. Med. 26, 2389–2430 (2007).

  18. Wishart, G. C. et al. PREDICT: a new UK prognostic model that predicts survival following surgery for invasive breast cancer. Breast Cancer Res. 12, R1 (2010); erratum 12, 401 (2010).

  19. Michaelson, J. S. et al. Improved web-based calculators for predicting breast carcinoma outcomes. Breast Cancer Res. Treat. 128, 827–835 (2011).

  20. Ormandy, C. J., Musgrove, E. A., Hui, R., Daly, R. J. & Sutherland, R. L. Cyclin D1, EMS1 and 11q13 amplification in breast cancer. Breast Cancer Res. Treat. 78, 323–335 (2003).

  21. Sanchez-Garcia, F. et al. Integration of genomic data enables selective discovery of breast cancer drivers. Cell 159, 1461–1475 (2014).

  22. Shrestha, Y. et al. PAK1 is a breast cancer oncogene that coordinately activates MAPK and MET signaling. Oncogene 31, 3397–3408 (2012).

  23. Holland, D. G. et al. ZNF703 is a common luminal B breast cancer oncogene that differentially regulates luminal and basal progenitors in human mammary epithelium. EMBO Mol. Med. 3, 167–180 (2011).

  24. Reis-Filho, J. S. et al. FGFR1 emerges as a potential therapeutic target for lobular breast carcinomas. Clin. Cancer Res. 12, 6652–6662 (2006).

  25. Liu, H. et al. Pharmacologic targeting of S6K1 in PTEN-deficient neoplasia. Cell Reports 18, 2088–2095 (2017).

  26. Delmore, J. E. et al. BET bromodomain inhibition as a therapeutic strategy to target c-Myc. Cell 146, 904–917 (2011).

  27. Pearson, A. et al. High-level clonal FGFR amplification and response to FGFR inhibition in a translational clinical trial. Cancer Discov. 6, 838–851 (2016).

  28. Wapnir, I. L. et al. A randomized clinical trial of adjuvant chemotherapy for radically resected locoregional relapse of breast cancer: IBCSG 27-02, BIG 1-02, and NSABP B-37. Clin. Breast Cancer 8, 287–292 (2008).

  29. Clark, G. M., Sledge, G. W. Jr, Osborne, C. K. & McGuire, W. L. Survival from first recurrence: relative importance of prognostic factors in 1,015 breast cancer patients. J. Clin. Oncol. 5, 55–61 (1987).

  30. Kennecke, H. et al. Metastatic behavior of breast cancer subtypes. J. Clin. Oncol. 28, 3271–3277 (2010).

  31. Fix, E. & Neyman, J. A simple stochastic model of recovery, relapse, death and loss of patients. Hum. Biol. 23, 205–241 (1951).

  32. Broët, P. et al. Analyzing prognostic factors in breast cancer using a multistate model. Breast Cancer Res. Treat. 54, 83–89 (1999).

  33. Meier-Hirmer, C. & Schumacher, M. Multi-state model for studying an intermediate event using time-dependent covariates: application to breast cancer. BMC Med. Res. Methodol. 13, 80 (2013).

  34. Therneau, T. M. & Grambsch, P. M. Modeling Survival Data: Extending the Cox Model (Springer, New York, 2000).

  35. de Wreede, L. C., Fiocco, M. & Putter, H. mstate: an R package for the analysis of competing risks and multi-state models. J. Stat. Software 38, 1–30 (2011).

  36. Klein, J. P., Keiding, N. & Copelan, E. A. Plotting summary predictions in multistate survival models: probabilities of relapse and death in remission for bone marrow transplantation patients. Stat. Med. 12, 2315–2332 (1993).

  37. Aalen, O., Borgan, O. & Gjessing, H. Survival and Event History Analysis—A Process Point of View (Springer, New York, 2008).

  38. Fiocco, M., Putter, H. & van Houwelingen, H. C. Reduced-rank proportional hazards regression and simulation-based prediction for multi-state models. Stat. Med. 27, 4340–4358 (2008).

  39. Hothorn, T., Bretz, F. & Westfall, P. Simultaneous inference in general parametric models. Biom. J. 50, 346–363 (2008).

  40. Dunnett, C. W. A multiple comparison procedure for comparing several treatments with a control. J. Am. Stat. Assoc. 50, 1096–1121 (1955).

  41. Prentice, R. L., Williams, B. J. & Peterson, A. V. On the regression analysis of multivariate failure time data. Biometrika 68, 373–379 (1981).

  42. Harrell, F. E. J. Regression Modeling Strategies (Springer, 2001).

  43. Li, Y. et al. Amplification of LAPTM4B and YWHAZ contributes to chemotherapy resistance and recurrence of breast cancer. Nat. Med. 16, 214–218 (2010).

  44. Clarke, C. et al. Correlating transcriptional networks to breast cancer survival: a large-scale coexpression analysis. Carcinogenesis 34, 2300–2308 (2013).

  45. Loi, S. et al. Predicting prognosis using molecular profiling in estrogen receptor-positive breast cancer treated with tamoxifen. BMC Genomics 9, 239 (2008).

  46. Nagalla, S. et al. Interactions between immunity, proliferation and molecular subtype in breast cancer prognosis. Genome Biol. 14, R34 (2013).

  47. Schmidt, M. et al. The humoral immune system has a key prognostic impact in node-negative breast cancer. Cancer Res. 68, 5405–5413 (2008).

  48. Desmedt, C. et al. Strong time dependence of the 76-gene prognostic signature for node-negative breast cancer patients in the TRANSBIG multicenter independent validation series. Clin. Cancer Res. 13, 3207–3214 (2007).

  49. Miller, L. D. et al. An expression signature for p53 status in human breast cancer predicts mutation status, transcriptional effects, and patient survival. Proc. Natl Acad. Sci. USA 102, 13550–13555 (2005); correction 102, 17882 (2005).

  50. Gautier, L., Cope, L., Bolstad, B. M. & Irizarry, R. A. affy—analysis of Affymetrix GeneChip data at the probe level. Bioinformatics 20, 307–315 (2004).

  51. Leek, J. T., Johnson, W. E., Parker, H. S., Jaffe, A. E. & Storey, J. D. The sva package for removing batch effects and other unwanted variation in high-throughput experiments. Bioinformatics 28, 882–883 (2012).

  52. Gendoo, D. M. A. et al. Genefu: an R/Bioconductor package for computation of gene expression-based signatures in breast cancer. Bioinformatics 32, 1097–1099 (2016).

  53. Schröder, M. S., Culhane, A. C., Quackenbush, J. & Haibe-Kains, B. survcomp: an R/Bioconductor package for performance assessment and comparison of survival models. Bioinformatics 27, 3206–3208 (2011).

  54. R Core Team. R: A Language and Environment for Statistical Computing. http://www.r-project.org/ (2015).

Acknowledgements

We thank the women who participated in this study and the UK Cancer Registry. O.M.R. was supported by a Cancer Research UK (CRUK) travel grant (SWAH/047) to visit C. Curtis’ laboratory. C.R. is supported by award MTM2015-71217-R. C. Caldas is supported by ECMC, NIHR, the Mark Foundation for Cancer Research and Cancer Research UK Cambridge Centre (C9685/A25177). C. Curtis is supported by the National Institutes of Health through the NIH Director’s Pioneer Award (DP1-CA238296), the American Association for Cancer Research and the Breast Cancer Research Foundation. This study is dedicated to J.M.W. and J.N.W.

Reviewer information

Nature thanks Jeff Gerold, Martin A. Nowak, Peter Van Loo and the other anonymous reviewer(s) for their contribution to the peer review of this work.

Author information

Author notes

  1. These authors contributed equally: Stephen-John Sammut, Jose A. Seoane.

Affiliations

  1. Cancer Research UK Cambridge Institute and Department of Oncology, Li Ka Shing Centre, University of Cambridge, Cambridge, UK

    Oscar M. Rueda, Stephen-John Sammut, Suet-Feung Chin, Maurizio Callari, Rajbir Batra, Bernard Pereira, Alejandra Bruna, H. Raza Ali, Bin Liu, Paul D. Pharoah & Carlos Caldas

  2. Department of Medicine, Division of Oncology, Stanford University School of Medicine, Stanford, CA, USA

    Jose A. Seoane, Jennifer L. Caswell-Jin & Christina Curtis

  3. Department of Genetics, Stanford University School of Medicine, Stanford, CA, USA

    Jose A. Seoane & Christina Curtis

  4. Stanford Cancer Institute, Stanford University School of Medicine, Stanford, CA, USA

    Jose A. Seoane & Christina Curtis

  5. Cambridge Breast Unit, Addenbrooke’s Hospital, Cambridge University Hospital NHS Foundation Trust, Cambridge, UK

    Elena Provenzano, Paul D. Pharoah & Carlos Caldas

  6. NIHR Cambridge Biomedical Research Centre and Cambridge Experimental Cancer Medicine Centre, Cambridge University Hospital NHS Foundation Trust, Cambridge, UK

    Elena Provenzano, Paul D. Pharoah & Carlos Caldas

  7. Research Institute in Oncology and Hematology, Winnipeg, Manitoba, Canada

    Michelle Parisien & Leigh Murphy

  8. NIHR Comprehensive Biomedical Research Centre at Guy’s and St Thomas’ NHS Foundation Trust and Research Oncology, Cancer Division, King’s College London, London, UK

    Cheryl Gillett & Arnie Purushotham

  9. Department of Molecular Oncology, British Columbia Cancer Research Centre, Vancouver, British Columbia, Canada

    Steven McKinney & Samuel Aparicio

  10. Division of Cancer and Stem Cells, School of Medicine, University of Nottingham and Nottingham University Hospital NHS Trust, Nottingham, UK

    Andrew R. Green & Ian O. Ellis

  11. Strangeways Research Laboratory, University of Cambridge, Cambridge, UK

    Paul D. Pharoah

  12. Departamento de Estadística e Investigación Operativa, Universidad de Valladolid, Valladolid, Spain

    Cristina Rueda

Contributions

O.M.R., C. Caldas and C. Curtis conceived the study. O.M.R. performed statistical analyses and implemented the model. J.A.S. compiled the validation cohort and performed statistical analyses. S.-J.S. led the annotation of clinical samples, with input from S.-F.C., M.C., R.B., B.P., A.B., H.R.A., E.P., B.L., M.P., C.G., S.M., A.R.G., L.M., A.P., I.O.E., S.A. and C. Caldas. A.R.G., L.M., A.P., I.O.E., S.A. and C. Caldas provided data. P.D.P. and C.R. provided statistical advice. C. Caldas and S.A. are METABRIC principal investigators. O.M.R., J.A.S., J.L.C.-J., C. Caldas and C. Curtis interpreted the results. O.M.R., J.L.C.-J., C. Caldas and C. Curtis wrote the manuscript, which was approved by all authors. C. Caldas and C. Curtis supervised the study.

Competing interests

S.A. is founder and shareholder of Contextual Genomic and a scientific advisor to Sangamo Biosciences and Takeda Pharmaceuticals. C. Caldas is a scientific advisor to AstraZeneca-iMed and has received research funding from AstraZeneca, Servier and Genentech/Roche. C. Curtis is a scientific advisory board member and shareholder of GRAIL and consultant for GRAIL and Genentech. A patent application has been filed on aspects of the described work, entitled ‘Methods of treatment based upon molecular characterization of breast cancer’ (C. Curtis, C. Caldas, J.A.S. and O.M.R.).

Corresponding authors

Correspondence to Carlos Caldas or Christina Curtis.

Extended data figures and tables

  1. Extended Data Fig. 1 Description of the cohorts used in this study.

    a, Description of the METABRIC discovery cohort, clinical characteristics and flow chart of sample inclusion for analysis. b, Description of the validation cohort, clinical characteristics and flow chart of sample inclusion for analysis. DRFS, distant-relapse-free survival; DSS, disease-specific survival; OS, overall survival; RFS, relapse-free survival. The cohorts are as follows: GSE19615 (DFHCC cohort43), GSE42568 (Dublin cohort44), GSE9195 (Guyt2 cohort45), GSE45255 (IRB/JNR/NUH cohort46), GSE11121 (Maintz cohort47), GSE6532 (TAM cohort45), GSE7390 (Transbig cohort48) and GSE3494 (Upp cohort49). NA, not available.

  2. Extended Data Fig. 2 Effect of censoring nonmalignant deaths on the estimation of disease-specific survival, and prognostic value of clinical covariates at different disease states.

    a, Cumulative incidence computed as 1 − Kaplan–Meier (KM) estimator, using only disease-specific death as an end point and censoring other types of death. b, Cumulative incidence computed using a competing-risk model that takes into account different causes of death. The bias of the 1 − Kaplan–Meier estimator is visible. c, Distribution of age at the time of diagnosis for ER-negative and ER-positive patients. The number of patients in each group is indicated in all panels. This analysis was done with the full dataset. Box plots were computed using the median of the observations (centre line). The first and third quartiles are shown as boxes, and the whiskers extend to the ±1.58 interquartile range divided by the square root of the sample size. Outliers are shown as dots. d, log hazard ratios calculated using the multistate model stratified by ER status (n = 3,147) for different covariates, namely grade, lymph-node (LN) status, tumour size (size), time from surgery and time from local relapse (LR). log hazard ratios are shown for different states, including post-surgery (PS; hazard ratio of progressing to relapse or DSD), locoregional recurrence (LR; hazard ratio of progressing to distant relapse or DSD) and distant recurrence (DR; hazard ratio of cancer-specific death). 95% confidence intervals are shown. This analysis was done with the full dataset.

  3. Extended Data Fig. 3 Model calibration and validation in an external dataset.

    a, Internal validation of the global predictions of the models on all transitions using bootstrap (n = 200). Discriminant measures of predictive ability are shown on the x axis, as described in the Methods section ‘Model validation and calibration’. The y axis shows the optimism, that is, the difference between the training predictive ability and the test predictive ability of the discriminant measures (see Methods). b, Internal calibration of the global predictions of the models on all transitions using bootstrap (n = 200). The distribution of the mean absolute error between observed and predicted is plotted. c, External calibration of DSD risk and nonmalignant death risk using PREDICT 2.1 (n = 1,841). The distribution of the mean absolute error between the predictions of PREDICT and our model based on ER status only is plotted. a–c, Box plots were computed using the median of the observations (centre line). The first and third quartiles are shown as boxes, and the whiskers extend to the ±1.58 interquartile range divided by the square root of the sample size (see Methods). d, Scatter plot of the predictions of DSD risk computed by PREDICT and our model based on the IntClust subtypes only at ten years (n = 1,841; see Methods). The Pearson correlation is shown. e, Concordance index (C-index) of prediction of risk of distant relapse (DRFS), disease-specific death (disease-specific survival, DSS), death (overall survival, OS) and relapse (RFS) in the 178 withheld METABRIC samples and in a metacohort composed of eight published studies among ER+/HER2− patients in the high-risk IntClust subtypes, where results are shown for individual cohorts and the combined metacohort (see Methods and Supplementary Information). Error bars correspond to 95% confidence intervals for the C-index. The number of patients in each group is indicated on the right.

  4. Extended Data Fig. 4 Different subtypes have distinct probabilities of recurrence.

    a, Average probability of experiencing a distant relapse (defined as the probability of having a distant relapse at any point followed by any other transition) or cancer-related death for the high-risk ER+ IntClust (IC) subtypes (IC1 n = 134, IC6 n = 81, IC9 n = 134, IC2 n = 69) relative to IC3 (n = 269), the ER+ subgroup with the best prognosis. This analysis was restricted to ER+/HER2− cases, which represent the vast majority for each of these subtypes. Error bars represent 95% confidence intervals around the mean. b, As for a, but showing the average probability of experiencing distant recurrence or cancer-related death after a local recurrence (IC1 n = 21, IC6 n = 10, IC9 n = 21, IC2 n = 13, IC3 n = 30). c, Average probability of recurrence (distant relapse or cancer-specific death) after locoregional relapse for all patients in each of the 11 IntClust subtypes. d, Median time until an additional relapse (distant recurrence or cancer-specific death) after local recurrence for all patients in each of the 11 IntClust subtypes (n = 270). This has been computed using a Kaplan–Meier approach with competing risks of progression and nonmalignant death. Error bars represent 95% confidence intervals around the median time. Asterisks denote situations in which the median time cannot be computed because fewer than 50% of the patients relapsed. This analysis was done with the molecular dataset. e, Average probability of cancer-related death after distant recurrence for all patients by subtype. f, As for d, except that the median time until cancer-specific death after distant recurrence is shown (n = 596). g, Mean probabilities of relapse after surgery and after five and ten disease-free years (see Methods and Supplementary Table 4) for the patients in each of the four IHC subtypes. Error bars represent 95% confidence intervals. The number of patients in each group is indicated. h–k, As for c–f, but for the IHC subtypes (same sample sizes). l, As for g, but for the PAM50 subtypes. The number of patients in each group is indicated. m–p, As for h–k, but for the PAM50 subtypes (with the same sample sizes, except for p where n = 593).

  5. Extended Data Fig. 5 The ER−/HER2− integrative subtypes exhibit distinct risks of relapse.

    The probabilities of distant relapse or cancer-related death among ER−/HER2− patients who were disease-free at five years after diagnosis reveal marked differences in the risk of relapse for TNBC IntClust subtype IC4ER− versus the IC10 (basal-like enriched) subtype. Here the base clinical model with IHC subtypes is compared with the base clinical model plus IntClust subtype information. Error bars represent 95% confidence intervals. The number of patients in each group is indicated.

  6. Extended Data Fig. 6 Subtype-specific risks of relapse after locoregional relapse.

    Transition probabilities from locoregional recurrence to other states for individual average patients, stratified on the basis of ER, IHC, PAM50 or IntClust subtype. 95% confidence bands were computed using bootstrap. This analysis was done with the full dataset for the comparisons between ER+ and ER−, and the molecular dataset for the remainder.

  7. Extended Data Fig. 7 Associations between probabilities of distant relapse ten years after locoregional relapse with clinico-pathological and molecular features of the primary tumour.

    For each patient that had a locoregional recurrence, the ten-year probability of having a distant relapse or cancer-related death is plotted against different variables. A loess fit is overlaid to highlight the relationship between the probability and tumour size or time of relapse. Box plots were computed using the median of the observations (centre line). The first and third quartiles are shown as boxes, and the whiskers extend to the ±1.58 interquartile range divided by the square root of the sample size. Outliers are shown as dots. This analysis was done with the molecular dataset and the model was stratified by IntClust subtype (n = 257).

  8. Extended Data Fig. 8 Subtype-specific risks of cancer-related death after a distant relapse.

    Transition probabilities from distant relapse to other states for individual average patients stratified on the basis of ER, IHC, PAM50 or IntClust subtype. 95% confidence bands were computed using bootstrap. This analysis was done with the full dataset for the comparisons between ER+ and ER−, and the molecular dataset for the remainder.

  9. Extended Data Fig. 9 Distribution of the number of relapses by molecular subtype.

    a, Times of distant recurrence for ER− and ER+ patients (n = 605). Each dot represents a distant recurrence, coded by colour for different sites. b, Distribution of the number of distant relapses for different subtypes (n = 609), based on ER status (ER+ n = 422, ER− n = 187), IHC ER/HER2 status (ER+/HER2− n = 263, ER−/HER2− n = 82, ER+/HER2+ n = 36, ER−/HER2+ n = 41), PAM50 subtype (normal n = 33, luminal A n = 101, luminal B n = 138, basal n = 79, HER2 n = 69) and IntClust subtype (IC1 n = 40, IC2 n = 25, IC3 n = 32, IC4ER+ n = 46, IC4ER− n = 16, IC5 n = 72, IC6 n = 23, IC7 n = 24, IC8 n = 54, IC9 n = 38, IC10 n = 52). ER status was imputed on the basis of expression in four samples. These analyses were done with the recurrent-events cohort.

  10. Extended Data Fig. 10 Site-specific patterns of relapse in the IHC, PAM50 and IntClust subtypes.

    a, Left, percentages of patients with metastases at a given site in the IHC subtypes (bar plots, total numbers also indicated). Upright triangles indicate significant positive differences in that group with respect to the overall mean and inverted triangles indicate significant negative differences in that group with respect to the overall mean using simultaneous testing of all sites (see Methods). Location of metastatic sites is not anatomically accurate. Right, cumulative incidence functions (as 1 − Kaplan–Meier estimates) for each site of metastasis in the IHC subtypes. The same patient can have multiple sites of metastasis. b, As for a, but for the PAM50 subtypes. c, As for a, but for the IntClust subtypes. These analyses were done with the recurrent-events cohort. Female silhouettes are from the public-domain human body diagrams at https://commons.wikimedia.org/wiki/Human_body_diagrams.
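The contrast drawn in Extended Data Fig. 2a,b, between 1 − Kaplan–Meier with other deaths censored and a competing-risk cumulative incidence, can be reproduced on a small example. The sketch below is not taken from the authors' code: the ten (time, event) records are invented, the event coding (0 censored, 1 cancer death, 2 other-cause death) is an assumption, and the competing-risk estimate uses the standard Aalen–Johansen form.

```python
# Toy follow-up records: (time in years, event), invented for illustration.
# Event codes (assumed): 0 = censored, 1 = cancer death, 2 = death from other causes.
data = [(1, 1), (2, 2), (3, 1), (4, 0), (5, 2),
        (6, 1), (7, 2), (8, 0), (9, 1), (10, 2)]


def one_minus_km(records, cause):
    """1 - Kaplan-Meier for `cause`, treating every other event as censoring."""
    surv = 1.0
    at_risk = len(records)
    curve = []
    for t in sorted({time for time, _ in records}):
        deaths = sum(1 for time, e in records if time == t and e == cause)
        surv *= 1 - deaths / at_risk
        curve.append((t, 1 - surv))
        at_risk -= sum(1 for time, _ in records if time == t)
    return curve


def aalen_johansen(records, cause):
    """Cumulative incidence of `cause` allowing for competing events."""
    event_free = 1.0  # probability of no event of any cause just before t
    cif = 0.0
    at_risk = len(records)
    curve = []
    for t in sorted({time for time, _ in records}):
        d_cause = sum(1 for time, e in records if time == t and e == cause)
        d_any = sum(1 for time, e in records if time == t and e != 0)
        cif += event_free * d_cause / at_risk
        event_free *= 1 - d_any / at_risk
        curve.append((t, cif))
        at_risk -= sum(1 for time, _ in records if time == t)
    return curve


naive = one_minus_km(data, cause=1)
competing = aalen_johansen(data, cause=1)
for (t, p_naive), (_, p_cr) in zip(naive, competing):
    print(f"year {t:2d}: 1-KM = {p_naive:.3f}, competing-risk = {p_cr:.3f}")
```

On these toy records the 1 − Kaplan–Meier curve sits above the competing-risk curve once other-cause deaths occur, because censoring those patients treats them as if they could still die of cancer later.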

Supplementary information

  1. Supplementary Information

    Supplementary Methods.

  2. Reporting Summary

  3. Supplementary Table 1

    Summary of clinico-pathological features of the cohort according to ER status (based on the full dataset) and for the IHC, PAM50 and IntClust subtypes (based on the molecular dataset).

  4. Supplementary Table 2

    Number of transitions between each state in the multistate model according to ER status (based on the full dataset) and for the IHC, PAM50 and IntClust subtypes (based on the molecular dataset).

  5. Supplementary Table 3

    Proportion of cases classified into each IntClust subtype mapping onto the IHC and PAM50 subtypes within the molecular dataset.

  6. Supplementary Table 4

    Transition probabilities and standard errors for each of the breast cancer subgroups. a, Predictions for each subgroup were computed taking the average and the standard deviation of the probabilities of all patients in each group. Standard deviations represent variability within each subtype. The probabilities of any transition ending up in a relapse group and all transitions visiting that state of the multistate model are included for patients stratified by ER status (based on the full dataset) and for the IHC, PAM50, and IntClust subtypes (based on the molecular dataset). b, Predictions for an average individual from each subgroup. These probabilities are computed by selecting an average individual and predicting the trajectory between each state of the multistate model in the corresponding dataset for the distinct subtypes. The probabilities for staying in relapse are omitted for clarity and can be computed as one minus the sum of moving to the rest of the states. Standard errors represent uncertainty in the individual predictions.

  7. Supplementary Table 5

    Clinical information for the full dataset.

  8. Supplementary Table 6

    Clinical information for the molecular dataset.

  9. Supplementary Table 7

    Clinical information for the recurrent-events dataset.

  10. Supplementary Table 8

    Description of clinical variables provided in Supplementary Tables 5–7 for the full, molecular and recurrent-events datasets.
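The two calculations described for Supplementary Table 4 can be sketched in a few lines: the omitted probability of staying in a relapse state as one minus the sum of the listed transitions (part b), and a subgroup summary as the mean and standard deviation of per-patient probabilities (part a). All numbers and names below are hypothetical, and the use of the sample standard deviation is an assumption, since the text does not say which estimator was used.

```python
from statistics import mean, stdev

# Hypothetical probabilities of leaving the distant-relapse state
# (illustrative values only, not taken from Supplementary Table 4).
moves_from_distant_relapse = {"cancer_death": 0.71, "other_death": 0.06}


def stay_probability(moves):
    """Probability of remaining in the state: one minus the sum of all moves out."""
    total = sum(moves.values())
    if total < 0.0 or total > 1.0 + 1e-12:
        raise ValueError("transition probabilities must sum to a value in [0, 1]")
    return max(0.0, 1.0 - total)


stay = stay_probability(moves_from_distant_relapse)

# Part a style summary: per-patient probabilities within one subgroup (invented),
# reported as mean and standard deviation (within-subtype variability).
patient_probs = [0.42, 0.55, 0.61, 0.38, 0.49]
group_mean = mean(patient_probs)
group_sd = stdev(patient_probs)  # sample SD; an assumption

print(f"stay in distant relapse: {stay:.2f}")
print(f"subgroup mean = {group_mean:.2f}, SD = {group_sd:.2f}")
```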

About this article

DOI

https://doi.org/10.1038/s41586-019-1007-8
