TP53 status has generally been observed to be an independent prognostic factor among breast cancer cases1,2,3,4,5,6. However, recent studies suggest the prognostic effect is subtype-dependent, with conflicting reports regarding its prognostic performance7,8,9,10. It is important to understand the prognostic value of TP53 status within ER subtypes, given that TP53 and ER pathways play essential roles in breast cancer, and due to recent evidence of crosstalk between their signaling pathways11,12,13,14.

Inconsistent results across previous studies may have been due in part to technical differences. Most studies on TP53 and survival among breast cancer patients have classified TP53 status using either DNA sequencing or immunohistochemistry (IHC) to detect nuclear overexpression of TP53 protein as a surrogate marker of mutation status. IHC methods may misclassify some mutant tumors as wildtype, and both methods may miss some tumors with functional defects in the TP53 pathway4,5,6,7,8,9,10,11,12,13,14,15,16. In contrast, RNA methods detect patterns of loss or activity downstream in the TP53 signaling pathway. As such, RNA-based TP53 classification methods may reduce misclassification of functional status and clarify associations with survival outcomes. It is also important to address the role of TP53 in diverse populations and across ER subtypes.

We have sought to address these gaps by evaluating the prognostic value of a validated, RNA-based signature of TP53 functional status (overall and within ER subtypes). Because Black women have higher rates of TP53-mutant tumors15,16,17 and may have different mutation types17, we used data from the Carolina Breast Cancer Study, which oversampled Black and younger women. We compared the prognostic effects of TP53 in this diverse population to those from another large, mostly European dataset.


The eligible population included 3213 and 1343 breast cancer cases in CBCS and METABRIC, respectively (Table 1, Supplementary Fig. 1). The number of events for each outcome in the two populations is provided in Supplementary Fig. 1. Because the populations differ substantially in the distribution of ER status (33% and 23% ER negative in CBCS and METABRIC, respectively), Table 1 is stratified by ER to facilitate comparisons. Compared to METABRIC, both ER-positive and ER-negative cases in CBCS were younger at diagnosis, had lower-grade tumors, and had a lower proportion of node-positive tumors. As the METABRIC population is predominantly non-Black, the most comparable population is the non-Black subgroup in CBCS. The differences in clinical characteristics between the studies became more pronounced when comparing METABRIC to the non-Black population in CBCS.

Table 1 Patient and clinical characteristics, stratified by estrogen receptor (ER) status.

Breast cancer-specific survival patterns varied across TP53 subtypes. Kaplan–Meier plots (Figs. 1, 2) and multivariable models (Tables 2 and 3, Fig. 3) showed that TP53 mutant/mutant-like tumors (by RNA-, IHC-, and DNA-based methods) were associated with worse BCSS compared to wildtype/wildtype-like tumors. The strongest associations were observed for RNA-based TP53 mutant-like status (HR [95% CI] of 7.21 [3.76–13.82] in CBCS and 3.96 [2.73–5.76] in METABRIC), and associations were notably attenuated for IHC-based TP53 mutant-like status (1.51 [1.04–2.21] and 2.24 [1.35–3.70], respectively). The hazard ratio for TP53 mutant-like status decreased over time, particularly for the RNA-based classification. For example, in CBCS the HR of 7.21 (3.76–13.82) reflects the survival effect of TP53 mutant-like status compared to wildtype-like at one year of follow-up, which decreased over time (T = 0.81 [0.74–0.87]).

Fig. 1: Kaplan–Meier survival curves for breast cancer-specific survival by tumor subtype, overall and stratified by ER status, among node negative breast cancer cases in CBCS.

p values correspond to the log-rank test. The shaded regions correspond to the 95% confidence interval. BCSS = breast cancer-specific survival, CBCS = Carolina Breast Cancer Study, ER = estrogen receptor, IHC = immunohistochemistry, PAM50 = Prediction Analysis of Microarray 50.

Fig. 2: Kaplan-Meier survival curves for breast cancer-specific survival by tumor subtype, overall and stratified by ER status, among node negative breast cancer cases in METABRIC.

p values correspond to the log-rank test. The shaded regions correspond to the 95% confidence interval. BCSS = breast cancer-specific survival, ER = estrogen receptor, IHC = immunohistochemistry, METABRIC = Molecular Taxonomy of Breast Cancer International Consortium, PAM50 = Prediction Analysis of Microarray 50.

Table 2 Hazard ratio (95% confidence interval) for the association between tumor subtype and breast cancer-specific survival among breast cancer cases in CBCS Phases 1–2, overall and stratified by estrogen receptor (ER) status.
Table 3 Hazard ratio (95% confidence interval) for the association between tumor subtype and breast cancer-specific survival among breast cancer cases in METABRIC, overall and stratified by estrogen receptor (ER) status.
Fig. 3: Association between tumor subtype and breast cancer-specific survival among breast cancer cases in CBCS and METABRIC, overall and stratified by estrogen receptor (ER) status.

The error bars correspond to the 95% confidence intervals. CBCS = Carolina Breast Cancer Study, ER = estrogen receptor, IHC = immunohistochemistry, METABRIC = Molecular Taxonomy of Breast Cancer International Consortium, PAM50 = Prediction Analysis of Microarray 50.

As 60% of TP53 mutant-like tumors were Basal-like in CBCS, it was of interest to also evaluate Basal-like vs. non-Basal-like subtypes to see whether the survival associations mirrored those for TP53. The Kaplan Meier plots and multivariable models showed that these markers have similar effects. For example, in CBCS the HR (95% CI) for Basal-like vs. non-Basal-like status was 3.37 (1.99–5.71). In multivariable models for both populations, the overall associations were recapitulated when restricting to ER-positive cases. When restricting to ER-negative cases, there were no statistically significant associations between tumor subtypes and BCSS, except in CBCS where the magnitude of association between RNA-based TP53 status and survival was similar among ER-positive and -negative cases (4.66 [1.79–12.15] and 5.38 [1.84–15.78], respectively). Sensitivity analyses restricting CBCS to non-Black cases resulted in no change among ER-positive cases and an increased magnitude among ER-negative cases.

TP53 status was also associated with overall survival, regardless of classification method. Kaplan Meier plots (Supplementary Figs. 2 and 3) only showed statistically significant associations with OS when using DNA-based TP53 classification (as well as RNA-based TP53 in METABRIC). When adjusting for other clinical and tumor characteristics (Supplementary Tables 2 and 3, Supplementary Fig. 4), however, statistically significant associations were observed between all subtype classifications and OS. In CBCS, the strongest associations were observed when using RNA-based TP53 classification, with a similar magnitude among ER-positive and -negative cases. In METABRIC, survival associations were only observed among ER-positive cases.

The association between TP53 status and recurrence-free survival varied by ER status. In both populations, Kaplan Meier plots (Supplementary Figs. 5 and 6) and multivariable models (Supplementary Tables 4 and 5, Supplementary Fig. 7) demonstrated that RNA-based TP53 mutant-like status was associated with worse RFS, but the effect was only observed among ER-positive cases. In CBCS, the association was stronger when using RNA-based TP53 status (6.21 [3.27–11.80]) than when using IHC-based TP53 status (2.16 [1.24–3.78]). In METABRIC, IHC-based TP53 was not associated with RFS.

RNA-based TP53 status provided more prognostic information than the other markers of interest (DNA- and IHC-based TP53, and Basal-like status) in both populations (Supplementary Table 6). Among ER positives, only RNA- and DNA-based TP53 status provided significant prognostic value, with RNA-based TP53 being the greatest contributor (Δχ2 [p value] = 10.5 [0.005] and 24.7 [<0.001] in CBCS and METABRIC, respectively). Among ER negatives, RNA-based TP53 was the only prognostic marker in CBCS (12.5 [0.002]), and Basal-like status the only prognostic marker in METABRIC (7.5 [0.023]).
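As a worked check of how a Δχ2 value maps to a p value, the likelihood ratio statistic can be referred to a chi-square distribution. The sketch below assumes 2 degrees of freedom (for example, a TP53 main term plus its time-varying term); the degrees of freedom are not stated in this section, so this is an assumption, and the function name `lrt_p_value` is hypothetical. Under that assumption, the CBCS values reported above (10.5 and 12.5) give p values close to the reported 0.005 and 0.002.

```python
import math


def lrt_p_value(delta_chi2: float, df: int = 2) -> float:
    """Upper-tail chi-square probability for a likelihood ratio statistic.

    Uses the closed form that holds for even degrees of freedom:
    P(X > x) = exp(-x/2) * sum_{k=0}^{df/2 - 1} (x/2)^k / k!
    """
    if delta_chi2 < 0:
        raise ValueError("delta_chi2 must be non-negative")
    if df <= 0 or df % 2 != 0:
        raise ValueError("this sketch only handles positive even df")
    half = delta_chi2 / 2.0
    series = sum(half ** k / math.factorial(k) for k in range(df // 2))
    return math.exp(-half) * series


if __name__ == "__main__":
    # Delta chi-square values reported for RNA-based TP53 in CBCS
    # (ER positive, ER negative). df = 2 is an assumption, not from the paper.
    for stat in (10.5, 12.5):
        print(f"delta chi2 = {stat}: p = {lrt_p_value(stat):.3f}")
```

For odd degrees of freedom the closed form does not apply, and a statistics library's chi-square survival function would be the natural replacement.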

It is of interest to understand whether the effects of TP53 status differ between Black and non-Black cases; however, the sample size in CBCS allowed only exploratory analysis of these associations. Among ER-positive cases there were no interactions between RNA- or DNA-based TP53 status and race (p = 0.96 and 0.78, respectively), but an interaction was observed by IHC-based TP53 status (p = 0.03). Specifically, the association between mutant-like status and poorer BCSS was more pronounced for non-Black cases compared to Black cases. Among ER-negative cases there were suggestions of interactions between RNA- and DNA-based TP53 and race (p = 0.18 and 0.12, respectively), with the association between TP53 mutant/mutant-like status and poorer BCSS being more pronounced for non-Black cases compared to Black cases. No interaction, however, was observed when using IHC-based TP53 status (p = 0.52).


RNA-based TP53 functional score had stronger prognostic value than other technical methods in a population-based cohort including Black and non-Black women in North Carolina. The survival effect of TP53 mutant-like status was most consistent among ER-positive cases, but also showed significant effects among ER-negative cases in CBCS (where ER negatives were prevalent at 33%). Given the proportion of cases who had both TP53 mutant-like and Basal-like phenotypes, it was important to also evaluate the survival effects of Basal-like vs. non-Basal-like subtype. The BCSS associations for Basal-like and TP53 were similar, but more high-risk cases were captured with the TP53 status classification. TP53 is an important prognostic marker with potential clinical value and may be useful among ER-negative patients for whom prognostic markers are otherwise lacking.

Prior studies have evaluated the survival effects of IHC- and DNA-based TP53 status among breast cancer patients, with near consensus that TP53 mutant cases have poorer survival compared to wildtype (Table 4)1,2,3,4,6,7,18,19. Very few studies, however, have assessed survival differences by ER status. Among those that have, TP53 mutant cases were generally associated with worse outcomes among ER-positive cases7,9,10,20, in line with our findings. However, results among ER-negatives have been more mixed, with some reporting TP53 mutant cases having better survival9, but most finding no effect7,10,21. It may seem paradoxical that the more aggressive tumors were sometimes found to have better outcomes, but several mechanisms have been proposed, largely indicating enhanced chemosensitivity in ER-negative/TP53 mutant tumors. In the present study we found a strong association between TP53 mutant-like status and poorer BCSS among ER negatives, which may demonstrate the importance of functional TP53 status over other classification methods. Additionally, the present findings come from a population-based study, unlike most previous studies.

Table 4 Previously published associations between TP53 status and overall survival, by estrogen receptor (ER) status and TP53 classification method.

Sampling differences between METABRIC and CBCS may explain differences in results among ER-negative cases. In METABRIC, the sample size of ER-negative cases was relatively small (n = 303) and among these, almost all (92%) were classified as TP53 mutant-like by the RNA signature. In contrast, CBCS had a larger sample of ER-negative cases (n = 1067), a smaller proportion of which were TP53 mutant-like (86%). CBCS ER-negative cases were also lower grade and more frequently node negative. Given that the METABRIC samples were sourced from tumor banks, it is plausible that this study oversampled more aggressive tumors, reducing variation of TP53 phenotypes. It is also possible that the more diverse CBCS population led to a different distribution of TP53 mutations (i.e., different types of mutations). Ethnically diverse population-based studies incorporating multigene signatures are important for understanding the diversity of ER-negative cases. When population characteristics become a key consideration in interpreting differences across studies, it suggests that either selection bias or relevant variables that vary across populations have not been addressed. However, the current study does show that stratification by ER status is critical and should be included in future studies of TP53-based prognostication.

A strength of this analysis was the racially diverse population, which included more younger women and a larger proportion of ER-negative cases. Previous studies of TP53 and prognosis have included populations that are exclusively, or nearly exclusively, of European descent. Another strength was availability of data on TP53 status using three different classification methods. Perhaps the most important limitation was that we did not model treatment differences, precluding the assessment of the predictive value of TP53. A lesser limitation was our choice to use the full dataset for each classification method, which limits comparability across methods; however, sensitivity analysis in METABRIC among those with complete data for all three classification methods (n = 752) produced effect estimates that were unchanged or slightly stronger than those reported in the main analysis. Due to overlap of the TP53 mutant-like and Basal-like phenotypes, we evaluated survival effects of Basal-like vs. non-Basal-like, but we did not evaluate all possible comparisons (e.g., Basal-like versus each of the other individual PAM50 intrinsic subtypes) because even within relatively large data sets, sample sizes did not allow for further stratification. Lastly, this RNA-based TP53 signature has been widely used and validated for research purposes and is operationalized using cohort normalization. However, a single sample predictor has not yet been developed, so the signature cannot be applied to a single sample or small cohort without making important assumptions. If this signature continues to demonstrate clinical value, development of a single sample predictor is warranted.

The science of prognostication and prediction has generally been led by applications for ER-positive cases and has relied on factors that reflect tumor growth (e.g., proliferation scores). A marker such as TP53, which represents underlying tumor biology and may define molecular vulnerabilities to chemotherapeutics22,23, could address an unmet need. Particularly as immunotherapies become widely utilized, markers that identify tumors likely to benefit will be important. Homologous recombination deficiency status has been proposed as one possible approach24,25, but TP53 status may also merit consideration. RNA-based TP53 may be particularly valuable because of its interpretability as a pathway-level change and because it can be conveniently paired with other RNA-based assays. Further consideration of multigene TP53 scores in clinical care could be particularly important for ER-negative cases, for whom fewer predictive biomarkers are currently available.


Study populations

The Carolina Breast Cancer Study (CBCS) is a population-based study that enrolled participants in three phases between 1993 and 2013. Study details have been described previously26. Briefly, incident invasive breast cancers among women 20–74 years of age were identified using rapid case ascertainment. Black women and those younger than 50 years of age were oversampled. Clinical characteristics at diagnosis were assessed by collecting medical records and formalin-fixed paraffin-embedded (FFPE) tumor samples at study enrollment. All CBCS study procedures were approved by the University of North Carolina School of Medicine Institutional Review Board and participants provided written informed consent.

We compared the results from CBCS to those from the Molecular Taxonomy of Breast Cancer International Consortium (METABRIC), which includes fresh-frozen primary breast tumors collected from five tumor banks across the UK and Canada between 1977 and 2005. Clinical and genomic data were downloaded from cBioPortal. About 93% of subjects were of European descent and the population ranges in age at diagnosis from 22 to 96 years. With an age distribution that skews older (median = 61 years), METABRIC includes a large proportion of ER-positive cases (77%).

Eligible cases were those diagnosed at stage I–III, with available data on TP53 status (Supplementary Fig. 1). In METABRIC, only cases with data on tumor characteristics (stage, grade, size, and node status) were included.

Breast tumor markers


In CBCS, ER status was abstracted from clinical records for Phases 1–2. When missing, ER status was determined by the UNC central laboratory. For Phase 3, ER status for all cases was determined by the central laboratory. Concordance between central laboratory and clinical record was 93%27. Methods for tissue processing and IHC analysis of tumor markers have been described previously17,27,28,29. ER positivity and IHC-based TP53 mutant-like status were each defined using a 10% positivity threshold. We selected the 10% cutoff for ER because at the time of enrollment for Phases 1–2, it was not yet the clinical standard to classify ER borderline tumors (1% to <10% positivity) as ER positive. Additionally, a 10% cutoff for ER positivity has been shown to have a stronger association with molecular phenotypes (e.g., intrinsic subtypes)27. Tumor stage and size were abstracted from the medical records. Tumor grade was defined by centralized pathology review.
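The effect of the cutoff choice on borderline tumors can be illustrated with a minimal sketch. The function name `classify_er` is hypothetical, and treating a value exactly at the cutoff as positive is an assumption inferred from the description of borderline tumors as 1% to <10% positivity.

```python
def classify_er(percent_positive: float, cutoff: float = 10.0) -> str:
    """Return 'positive' or 'negative' for a percent of ER-stained tumor cells.

    Assumption: values at or above the cutoff count as positive.
    """
    if not 0.0 <= percent_positive <= 100.0:
        raise ValueError("percent_positive must be between 0 and 100")
    return "positive" if percent_positive >= cutoff else "negative"


if __name__ == "__main__":
    borderline = 5.0  # hypothetical borderline tumor (1% to <10% staining)
    print("10% cutoff:", classify_er(borderline))
    print("1% cutoff:", classify_er(borderline, cutoff=1.0))
```

Under the 10% cutoff the borderline tumor is grouped with ER-negative cases, whereas a 1% cutoff would group it with ER-positive cases.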

RNA expression in CBCS has been quantified using NanoString assays on at least one FFPE tumor sample per patient, with random replication to assess reproducibility27,30,31. A previously validated RNA signature that aggregates expression information on TP53-dependent genes was used to classify TP53 functional status (mutant-like or wildtype-like) based on a similarity-to-centroid approach (Supplementary Table 1)32. A research version of the PAM50 predictor was used to classify tumors into intrinsic subtypes30,33, which were then dichotomized as basal-like or non-basal-like (i.e., luminal A, luminal B, HER2-enriched, or normal-like).
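A similarity-to-centroid classifier can be sketched as follows. This is an illustration only: the four-gene centroids are made-up toy values (the actual signature genes are listed in Supplementary Table 1), Pearson correlation is assumed as the similarity measure, and the names `pearson`, `classify_by_centroid`, and `CENTROIDS` are hypothetical.

```python
import math
from typing import Mapping, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation between two equal-length expression vectors."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("vectors must have equal length of at least 2")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant vector")
    return sxy / math.sqrt(sxx * syy)


def classify_by_centroid(sample: Sequence[float],
                         centroids: Mapping[str, Sequence[float]]) -> str:
    """Assign the label of the centroid most correlated with the sample."""
    return max(centroids, key=lambda label: pearson(sample, centroids[label]))


# Toy centroids over four hypothetical genes (not the published signature).
CENTROIDS = {
    "mutant-like": [1.0, 0.8, -0.9, -1.1],
    "wildtype-like": [-1.0, -0.7, 0.9, 1.2],
}

if __name__ == "__main__":
    sample = [0.9, 1.1, -0.5, -1.3]  # normalized expression, toy values
    print(classify_by_centroid(sample, CENTROIDS))
```

Because the sample values are assumed to be normalized against the cohort before comparison, a classifier of this kind depends on cohort-level normalization rather than on a single sample alone.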

For cases in CBCS phase 1, two complementary DNA-based methods were employed for detecting TP53 mutations using FFPE tumor samples. First, single strand conformational polymorphism (SSCP) analysis was used as a screening procedure to detect mutations in exons 4–8 of the TP53 gene, with subsequent manual radiolabeled sequencing of SSCP positives34. The Roche p53 Amplichip research test was also used to detect single base pair substitutions and single base pair deletions in exons 2–11, as well as splice sites (2 base pairs before and after each exon), in the TP53 gene35. All assays were carried out by the UNC central laboratory.


In METABRIC, ER status, as well as other tumor characteristics (tumor grade, stage, and size), was obtained from the medical records. RNA and DNA were extracted for transcriptional and genomic profiling on the Illumina Human v3 microarray and Affymetrix SNP 6.0 platforms, respectively36. Tumors were classified for TP53 functional status (mutant-like/wildtype-like) using the RNA-based TP53 signature32 and for PAM50 intrinsic subtype (basal-like/non-basal-like) using a research version of the PAM50 predictor30,33.

Outcome assessment

The follow-up period for both studies was defined as the number of years between diagnosis and either breast cancer death (for breast cancer-specific survival (BCSS)) or death due to any cause (for overall survival (OS)). For CBCS Phases 1–2, vital status and date of death were determined by linking with the National Death Index (NDI) in 2020. Breast cancer deaths were defined using the International Classification of Diseases breast cancer codes 174.9 (ICD-9) or C50.9 (ICD-10) as derived from death certificates. For METABRIC, vital status and time to death were obtained from the medical records.

Recurrence-free survival (RFS) was defined as time in years from diagnosis to first subsequent recurrent breast cancer (either local, regional, or distant). In CBCS Phase 3, recurrence date was abstracted from medical records after a patient reported a recurrence during follow-up telephone interviews (occurring at regular intervals). In METABRIC, recurrences and time to recurrences were obtained from the medical records.

All subjects who did not experience the outcome of interest were administratively censored at their date of last contact or the last linkage date to the NDI (for CBCS).
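The outcome definitions above can be sketched as a small function that returns follow-up time and an event indicator. The function name `survival_record` and its arguments are hypothetical. One assumption is labeled in the code: for BCSS, deaths from other causes are treated as censored at the date of death, a common convention that the text does not state explicitly.

```python
from datetime import date
from typing import Optional, Tuple

# ICD-9 and ICD-10 breast cancer codes named in the text.
BREAST_CANCER_CODES = {"174.9", "C50.9"}
DAYS_PER_YEAR = 365.25


def survival_record(diagnosis: date,
                    death: Optional[date],
                    cause_code: Optional[str],
                    last_followup: date,
                    outcome: str) -> Tuple[float, int]:
    """Return (years of follow-up, event indicator) for 'BCSS' or 'OS'."""
    if outcome not in ("BCSS", "OS"):
        raise ValueError("outcome must be 'BCSS' or 'OS'")
    died = death is not None
    if outcome == "OS":
        event = died
    else:
        event = died and cause_code in BREAST_CANCER_CODES
    # Assumption: a death from another cause ends follow-up at the death
    # date (censored for BCSS); otherwise censor at last contact/linkage.
    end = death if died else last_followup
    years = (end - diagnosis).days / DAYS_PER_YEAR
    return years, int(event)


if __name__ == "__main__":
    dx = date(2000, 1, 1)
    print(survival_record(dx, date(2005, 1, 1), "C50.9", date(2020, 1, 1), "BCSS"))
    print(survival_record(dx, None, None, date(2020, 1, 1), "OS"))
```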

Statistical analyses

Kaplan-Meier plots were generated to compare survival patterns between TP53 subtypes defined using different classification methods (RNA signature, DNA sequencing, and IHC). Because of the overlap in TP53 mutant status and Basal-like intrinsic subtype, we also evaluated survival patterns by PAM50 intrinsic subtype (Basal-like/non-Basal-like) to determine whether the effects mirrored those for TP53. Survival patterns were assessed overall and within ER subtypes. Differences between the curves were evaluated using log-rank tests. Kaplan-Meier plots were restricted to node-negative cases, while multivariable models included cases regardless of node status and used node status as an adjustment factor.
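For readers less familiar with the Kaplan-Meier (product-limit) estimator, a minimal sketch on toy data is given below. The function name `kaplan_meier` and the input values are hypothetical, and the log-rank test and confidence bands used in the figures are not implemented here.

```python
from typing import List, Sequence, Tuple


def kaplan_meier(times: Sequence[float],
                 events: Sequence[int]) -> List[Tuple[float, float]]:
    """Return [(event time, survival probability)] by the product-limit method.

    events[i] is 1 if the outcome occurred at times[i], 0 if censored.
    """
    if len(times) != len(events):
        raise ValueError("times and events must have the same length")
    survival = 1.0
    curve = []
    # Survival only changes at times where at least one event occurred.
    for t in sorted({t for t, e in zip(times, events) if e}):
        at_risk = sum(1 for u in times if u >= t)
        deaths = sum(1 for u, e in zip(times, events) if u == t and e)
        survival *= 1.0 - deaths / at_risk
        curve.append((t, survival))
    return curve


if __name__ == "__main__":
    # Toy follow-up times in years; the second subject is censored.
    for t, s in kaplan_meier([1, 2, 3, 4], [1, 0, 1, 1]):
        print(f"t = {t}: S(t) = {s:.3f}")
```

Censored subjects contribute to the number at risk until their censoring time but never trigger a drop in the curve, which is why the estimate steps down only at observed events.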

The prognostic value of the TP53 subtypes was evaluated using Cox proportional hazards models to compute hazard ratios (HRs) and 95% confidence intervals (CIs), overall and stratified by ER status, analyzing each TP53 classification method (RNA-, IHC-, and DNA-based) separately. Again, we estimated survival effects for PAM50 intrinsic subtype (Basal-like vs. non-Basal-like) to assess whether they mirrored those for TP53. Minimally adjusted models accounted for age at diagnosis (as well as race and study phase in CBCS). Fully adjusted models additionally accounted for tumor stage, grade, size, and node status. Since tumor grade was missing for about 26% of cases in CBCS, covariates with missing values were addressed using the multiple imputation plus outcome approach37. TP53 status and PAM50 subtype were modeled with addition of a time-varying term (T) due to the observed violation of the proportionality assumption of the Cox model. The direction and magnitude of the change in HR over time is indicated by the exponentiated coefficient of this term (i.e., T < 1 indicates a decreasing hazard ratio and T > 1 indicates an increasing hazard ratio). We estimated the prognostic value of each TP53 classification method as the change in likelihood ratio chi square (Δχ2) following a likelihood ratio test that compares the full prognostic model to a model after removing each TP53 classification schema. All statistical tests were two-sided and p value < 0.05 was used as the cut point for statistical significance. Statistical analyses were conducted in R software version 4.0.2 (R Foundation for Statistical Computing).
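To illustrate how a time-varying term changes the HR over follow-up, the sketch below treats T as a multiplicative factor applied to the HR reported at a reference time. The functional form is not specified in this section, so both a log-time and a linear-time version are offered as assumptions; the function name `hazard_ratio_at` and the `scale` argument are hypothetical.

```python
import math


def hazard_ratio_at(hr_ref: float, t_factor: float, years: float,
                    ref_years: float = 1.0, scale: str = "log") -> float:
    """HR at `years`, given the HR at `ref_years` and a time factor T.

    scale="log":    HR(t) = hr_ref * T ** (ln t - ln ref_years)
    scale="linear": HR(t) = hr_ref * T ** (t - ref_years)
    Both forms are assumptions made for illustration.
    """
    if years <= 0 or ref_years <= 0:
        raise ValueError("times must be positive")
    if scale == "log":
        elapsed = math.log(years) - math.log(ref_years)
    elif scale == "linear":
        elapsed = years - ref_years
    else:
        raise ValueError("scale must be 'log' or 'linear'")
    return hr_ref * t_factor ** elapsed


if __name__ == "__main__":
    # CBCS values from the Results: HR 7.21 at one year, T = 0.81.
    for years in (1.0, 3.0, 5.0, 10.0):
        hr = hazard_ratio_at(7.21, 0.81, years)
        print(f"{years:>4} years: HR = {hr:.2f}")
```

With T below 1 the HR shrinks toward the null as follow-up lengthens under either form; the two forms differ only in how quickly that happens.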

Reporting summary

Further information on research design is available in the Nature Research Reporting Summary linked to this article.