Introduction

Triple-negative breast cancer (TNBC) is the most aggressive subtype of breast tumour owing to the limited therapeutic options with targeted drugs, and is biologically characterised by the absence of expression of oestrogen receptor (ER), progesterone receptor (PR), and HER2 receptor.1 According to its molecular characteristics, TNBC can be classified into six different subtypes.2

Race/ethnicity is a factor related to TNBC incidence, as these tumours are more frequent in African-Americans (21%) than in Caucasians (9–15%).3 TNBC incidence has been described in several Latin American countries, with a frequency of 21.3% in Peru, 23.1% in Mexico, 24.6% in Venezuela and 27% in Brazil.4–7 Considering the breast cancer incidence in Peru (28/100,000 women), about 850 of the ≈4,000 cases diagnosed each year would correspond to TNBC.4,8

Pathological complete response (pCR) to neoadjuvant chemotherapy (NAC) is the best predictor of distant recurrence-free survival (DRFS) and overall survival; however, 3 years after surgery, the risk of recurrence and death is similar to that of other breast cancer subtypes, indicating that TNBC is composed of a mix of tumours with different prognoses.9 This observation could be explained by the molecular heterogeneity of TNBC, which comprises subtypes with different clinical outcomes: the basal-like 1 subtype achieves the highest pCR rate (52%), whereas the luminal androgen receptor and basal-like 2 subtypes achieve lower response rates (10% and 0%, respectively).10

Commercially available tests based on gene expression levels currently exist, such as the 21-genes recurrence score (Oncotype DX) and the 70-genes signature (MammaPrint). Despite their clinical utility, these tests have certain limitations. The 21-genes score (16 cancer-related and 5 control genes) yields a recurrence score between 0 and 100, where the risk of recurrence increases with the score, and estimates the likelihood of benefit from adjuvant chemotherapy. This test is recommended by NCCN guidelines only in patients diagnosed with pT1–pT3 and pN0 or pN1mi ER+/HER2− breast tumours.11–13 The 70-genes signature, in turn, calculates a score to assign a risk group (high or low risk) and estimates the probability of recurrence within 10 years of diagnosis.14 The 70-genes score (although not recommended by NCCN guidelines) has FDA approval for luminal patients with stage I or II disease, with 3 lymph nodes involved and invasive tumours <5 cm. Although these molecular platforms have shown clinical utility, they are not useful in TNBC, and new markers and predictors are needed to improve risk stratification and therapeutic strategies.

The aim of our study was to evaluate genes related to TNBC aggressiveness in order to develop a gene signature associated with prognosis, in terms of DRFS, in TNBC resistant to NAC.

Results

Selection of genes related to DRFS

Workflow is shown in Figure 1. Overall, median DRFS in the discovery set was 22.3 months. Univariate Cox regression of 449 genes related to aggressiveness signatures identified 7 genes statistically related to DRFS (P<0.05): CCL5, CYBB, DDIT4, GTPBP4, KRT6B, PALMD and POLR1C.

Figure 1. Overview of research design and workflow.

Three-genes prognostic score

Stepwise multivariate survival analysis of the seven genes from the previous step, performed with the Cox proportional hazards model, selected three genes as independent prognostic factors: CCL5 (chemokine (C-C motif) ligand 5), DDIT4 (DNA-damage-inducible transcript 4) and POLR1C (polymerase (RNA) I polypeptide C, 30 kDa), with P-values of 0.002, 0.005 and 0.004, respectively. CCL5 had a protective effect (hazard ratio (HR)<1), whereas DDIT4 and POLR1C were associated with a worse outcome (HR>1; Table 1).

Table 1 Genes selected as independent prognostic factors in the stepwise multivariate survival analysis performed by the Cox proportional hazards model in the discovery set

The three-genes prognostic score is the sum of the normalised expression level of each gene multiplied by its respective regression coefficient, as described in the following formula:

$$\text{Three-genes prognostic score} = \left(-0.393 \times \mathrm{CCL5} + 0.443 \times \mathrm{DDIT4} + 0.490 \times \mathrm{POLR1C}\right)$$

The median score in the discovery set was 0.1494. Using this median as the cutoff, it was possible to establish two risk groups.
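The score calculation and the median split can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' code: the function names and the toy expression values are hypothetical, the CCL5 coefficient is entered as negative on the assumption that it matches the protective effect (HR<1) reported for CCL5, and patients whose score equals the median are placed in the low-risk group, a tie rule the text does not specify.

```python
from statistics import median

# Regression coefficients from the score formula. The negative sign for CCL5
# is an assumption made here to match its reported protective effect (HR<1).
COEFFICIENTS = {"CCL5": -0.393, "DDIT4": 0.443, "POLR1C": 0.490}


def three_gene_score(expression):
    """Weighted sum of normalised expression values for one patient."""
    return sum(COEFFICIENTS[gene] * expression[gene] for gene in COEFFICIENTS)


def assign_risk_groups(scores):
    """Split patients at the cohort median; scores above it are high risk.

    Ties with the median go to the low-risk group (an assumption).
    """
    cutoff = median(scores)
    return cutoff, ["high" if s > cutoff else "low" for s in scores]


# Hypothetical normalised (log2, median-centred) expression values.
patients = [
    {"CCL5": 1.0, "DDIT4": -0.5, "POLR1C": -0.2},
    {"CCL5": -0.8, "DDIT4": 0.6, "POLR1C": 0.4},
    {"CCL5": 0.1, "DDIT4": 0.0, "POLR1C": 0.1},
    {"CCL5": -0.3, "DDIT4": 0.9, "POLR1C": 0.7},
]
scores = [three_gene_score(p) for p in patients]
cutoff, groups = assign_risk_groups(scores)
```

Because the cutoff is recomputed from each cohort's own scores, the same function would be applied separately to every data set rather than reusing the discovery-set median.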

Individual value of CCL5, DDIT4 and POLR1C in TNBC

We evaluated CCL5, DDIT4 and POLR1C in the Kaplan–Meier Plotter online platform (kmplot.org) to analyse their influence on the recurrence-free survival (RFS) of TNBC patients.15 When patients were split into two groups according to the median expression, CCL5 and DDIT4 were associated with RFS (P=0.0012 and P=0.00034, respectively). POLR1C was not statistically associated with RFS (P=0.28); however, when systemically untreated patients were removed from the cohort, high expression of POLR1C was associated with a poor prognosis (P=0.0059; Figure 2).

Figure 2. Individual impact of the CCL5, DDIT4 and POLR1C genes on recurrence-free survival (RFS) in triple-negative breast cancer patients (using the median expression as cutoff) in the database of the online platform kmplot.org. (a) High expression of CCL5 was associated with good prognosis (P=0.0012). (b) High expression of DDIT4 was associated with poor prognosis (P=0.00034). (c) Expression of POLR1C was not significantly associated with RFS in all patients (P=0.28); however, (d) when systemically untreated patients were excluded, a significant association was observed (P=0.0059).

Prognostic value of the three-genes signature

When the score was evaluated as a continuous variable in the CoxPH analysis, an HR of 2.72 per unit change was estimated (P<0.001; 95% CI: 1.72–4.28). The risk groups established by the median cutoff showed statistically significant differences in DRFS, with a median survival of 39.6 months for the low-risk group and 15.5 months for the high-risk group (P<0.001; Figure 3a).
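To make the per-unit hazard ratio concrete: in a Cox model, the HR per unit equals the exponential of the regression coefficient, so a score difference of Δ between two patients scales the hazard by HR^Δ. The sketch below illustrates this with the reported point estimate of 2.72. The function names are hypothetical, and the calculation assumes that the proportional hazards and log-linearity assumptions hold; it ignores the uncertainty expressed by the confidence interval.

```python
import math


def log_hazard_coefficient(hr_per_unit):
    """Cox regression coefficient implied by a per-unit hazard ratio."""
    return math.log(hr_per_unit)


def relative_hazard(score_difference, hr_per_unit=2.72):
    """Hazard ratio between two patients whose scores differ by the given amount."""
    return hr_per_unit ** score_difference


beta = log_hazard_coefficient(2.72)   # close to 1.0
half_unit = relative_hazard(0.5)      # square root of 2.72
two_units = relative_hazard(2.0)      # 2.72 squared
```

A coefficient close to 1 is what one would expect when the score itself is built as a linear combination of Cox regression coefficients.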

Figure 3. Three-genes score dichotomised at the median as a prognostic factor in terms of DRFS and MFS. (a) Survival curves of 82 TNBC patients in the discovery set (P<0.001); (b) 113 patients in the validation set (P=0.002); (c) 107 TNBC patients regardless of their response to neoadjuvant chemotherapy (P=0.001); and (d) 47 patients with ER(−) and HER2 non-amplified breast tumours (P=0.041).

Validation cohort

Validation set (GSE25066)

The median DRFS in this group of patients was 35.4 months. The median score was 0.0024, and the score identified two subgroups with different outcomes: the median DRFS was not reached in the low-risk group and was 22 months in the high-risk group (P=0.002; Figure 3b).

Analysis of other data sets

Patients with TNBC regardless of their response to NAC (GSE58812)

The end point evaluated in this data set was metastasis-free survival (MFS). The median score was −0.1009. The median MFS was not reached, and the 4-year MFS rates were 83.5% vs 57.3% for the low-risk vs high-risk groups according to the three-genes signature (P=0.001; Figure 3c).

Patients with oestrogen receptor-negative tumours (GSE16446)

The median DRFS was 61.1 months. The median score was 0.0137, and the score classified patients into two subgroups with different prognoses: the median DRFS was not reached in the low-risk group and was 47.1 months in the high-risk group (P=0.041; Figure 3d).

Multivariate analysis

In the discovery set, in addition to the three-genes prognostic signature, age and lymph node status were statistically related to DRFS in the univariate analysis. In the multivariate analysis, only >3 involved nodes (HR=3.98, 95% CI: 1.73–9.12) and a poor three-genes prognostic signature (HR=2.03, 95% CI: 1.02–4.05) were independent factors associated with shorter DRFS (Table 2). In the validation set, in addition to the three-genes prognostic signature, the T-stage, nodal stage and clinical AJCC stage were statistically related to DRFS. In the multivariate analysis, T3–4 stage (HR=1.878; 95% CI: 1.054–3.346), nodal stage 2–3 (HR=2.78, 95% CI: 1.271–6.079) and the high-score group (HR=2.358, 95% CI: 1.359–4.091) were independent poor prognostic factors (Table 3).

Table 2 Univariate and multivariate analysis of patients' and tumour characteristics related to DRFS in the discovery set
Table 3 Univariate and multivariate analysis of patients' and tumour characteristics related to DRFS in the validation set 1 (GSE25066)

Discussion

Triple-negative breast cancer is a classification obtained by exclusion criteria rather than a well-defined entity like other breast cancer subtypes.16 pCR after NAC is the main indicator of good prognosis. In Peru, a study by Neciosup et al.17 reported a pCR rate of 9%, lower than the rates reported in other series for TNBC (≈22%, and up to 36% with the addition of bevacizumab in the neoadjuvant setting).10,17,18

In the last decade, there has been growing interest in unveiling the molecular biology of TNBC, leading to the identification of six TNBC subtypes with distinct biology and prognosis. In this context, our research group identified loss of DUSP4 expression as a mechanism of resistance to NAC and reported that 90% of TNBC have an actionable mutation, making them candidates for treatment with targeted drugs.19,20

In our study, we found three genes independently related to outcome in several TNBC data sets. In addition, these genes can be combined in a linear score. Unfortunately, certain clinical data, including time-to-event data, are unavailable in public data sets, which is an important limitation to their use. The lack of important information in public data sets despite journal requirements has been addressed previously.21 To overcome this issue, for our validation cohort we selected patients with residual disease from data sets profiled with a different platform (Affymetrix microarrays) and with samples taken prior to NAC. Although gene expression patterns are expected to change after NAC, our three-genes signature was able to identify groups with different prognoses when evaluating samples profiled before NAC. On the other hand, in large data sets of TNBC patients (evaluated in the online platform kmplot.org), the expression levels of CCL5, DDIT4 and POLR1C in samples profiled after treatment showed a significant association with RFS (Figure 2).

The CCL5 gene is located on 17q and encodes a member of a superfamily of secreted proteins involved in immunoregulatory and inflammatory processes. TCGA (The Cancer Genome Atlas) data show that 4% of breast tumours have CCL5 dysregulation (2% in basal tumours), mainly related to genetic downregulation, which has previously been associated with breast cancer progression.22 CCL5 expressed by tumours recruits tumour-infiltrating lymphocytes (TILs). TILs have an important role in the outcome of TNBC: >20% of TILs is associated with a better prognosis, whereas a lower proportion of TILs (<10%) is associated with genetic or transcriptomic alterations in the Ras/MAPK pathway.23 On the other hand, CCL5 was shown to be a mechanism of tumour escape in a mouse model of colorectal cancer, recruiting T-regulatory cells and enhancing their cytotoxic effects against CD8+ T cells.24

The DDIT4 gene (DNA-damage-inducible transcript 4) encodes a protein related to adverse environmental conditions whose action is the inhibition of mTOR.25 Despite this biological function, in our analysis DDIT4 was related to tumour aggressiveness, with an HR of 1.56 (P=0.005) per unit change (Table 1). A recent report by Puissant et al.26 described that the product of DDIT4 could be cleaved by caspase 3, modifying its cellular function and exerting anti-proliferative activities. TCGA data show dysregulation of this gene, mainly overexpression, in 4% of breast tumours (12% of basal-subtype tumours).

The POLR1C gene encodes a 30 kDa subunit of RNA polymerases I and III, which are responsible for RNA synthesis.27 Mutation of this gene causes Treacher Collins syndrome, which affects the development of bones and facial tissues.28 The POLR1C gene is recurrently amplified and overexpressed in gastric cancer.29 TCGA data indicate that 11% of breast cancer cases have dysregulation of this gene (overexpression and downregulation). This gene is dysregulated in 21% of basal tumours, where changes include overexpression. The molecular mechanisms of its involvement in tumour aggressiveness remain unclear.

Although this cohort of TNBC patients was evaluated previously and other genes were found to be significantly related to outcome, such as a signature of MEK pathway activation and DUSP4 loss (corroborated in in vivo and in vitro experiments), the analysis performed in this work found other genes not previously reported.19,30 It is important to explore biomarkers under different strategies, because weak biological signals or interactions between markers could be ignored in some bioinformatic or statistical analyses, for example, when multiple testing procedures are used.31–33

Owing to the heterogeneity of TNBC, some cases of this phenotype could correspond to luminal A, luminal B or HER2-enriched subtypes. In TNBC, the PAM50 test has particular utility. PAM50 evaluates 50 genes required for determining the intrinsic molecular subtypes of breast cancer.34 Because immunohistochemistry has limited ability to detect protein expression, PAM50 can identify the molecular subtypes within the TNBC phenotype more accurately, providing additional information for deciding on a better therapeutic strategy.35,36

Our prognostic signature, unlike commercially available platforms, is based on the expression of only three genes. This finding could lead to the development of an inexpensive molecular test based on reverse transcription PCR (RT-PCR) that is suitable for routine clinical use. The different median values of the score obtained in each data set could be explained by differences in prognostic factors between the data sets and/or differences between microarray platforms. On the other hand, our three-genes prognostic score produced robust results and could be evaluated in both biopsies and residual tumours after NAC.

In conclusion, our analysis of 449 genes related to aggressive molecular signatures identified three genes (CCL5, DDIT4 and POLR1C) that were independent prognostic factors and whose combination resulted in a predictor able to identify TNBC patients with different outcomes. These data encourage the prospective clinical validation of these genes using real-time PCR. The products of the CCL5, DDIT4 and POLR1C genes could be used to develop new therapeutic strategies in TNBC and basal breast cancer.

Materials and methods

Discovery set

We evaluated 114 patients with TNBC who had residual tumour after NAC (diagnosed and treated at the Instituto Nacional de Enfermedades Neoplásicas, Lima, Peru). Cases with insufficient tissue or cellularity (n=5) or HER2 amplification (n=7) were excluded. In total, 82 cases were evaluable for analysis. Formalin-fixed paraffin-embedded residual tumour from surgical specimens was serially cut into 3- to 5-μm-thick sections, after which cellularity evaluation and nucleic acid extraction were performed. RNA was extracted and purified using the RNeasy FFPE kit (Qiagen GmbH, Hilden, Germany).

Genes selected for evaluation

Overall, 449 transcripts were selected for evaluation based on their inclusion in published gene expression signatures: the PAM50 genes, signatures associated with DUSP4 loss, with MEK activation or with the enrichment of TGFβ-inducible genes after NAC, and signatures based on their association with the post-NAC Ki-67 score.19,34,37,38

NanoString nCounter analysis

The NanoString nCounter technique captures and counts individual messenger RNA transcripts using unique pairs of capture and reporter probes, with a colour code generated by ordered fluorescent segments specific for each transcript. As enzymatic reactions are not used, there is no bias or decrease in sensitivity compared with other techniques, such as microarrays.39 RNA extracted from formalin-fixed paraffin-embedded residual tumour was run on the nCounter Analysis System (NanoString Technologies, Seattle, WA, USA), according to the manufacturer’s protocol. Code sets were synthesised targeting 499 genes.

Validation set

GSE25066

We selected 113 TNBC patients (determined by immunohistochemistry) with residual disease after NAC. Gene profiling was done with the U133A Affymetrix microarray platform (Affymetrix U133A chip, Santa Clara, CA, USA) in biopsies taken before NAC. The list of microarray probes evaluated in this data set is shown in Supplementary Table S1.

Analysis of other data sets

GSE58812

This data set is composed of 107 TNBC patients regardless of their response to neoadjuvant treatment (these data were not included in the data set). Gene profiling was done with the U133A Affymetrix microarray platform in biopsies taken before NAC. The list of microarray probes evaluated in this data set is shown in Supplementary Table S1.

GSE16446

All patients in this data set were ER negative. We selected 47 patients with HER2 non-amplified tumours. Gene profiling with the U133 Plus 2.0 Affymetrix array was done in biopsies taken before NAC. The list of microarray probes evaluated in this data set is shown in Supplementary Table S2.

Gene expression data preprocessing

In the discovery set, expression levels were normalised with spike controls, log2 transformed and median centred. In the data sets GSE25066, GSE58812 and GSE16446, probes for the same gene were collapsed to the highest value, and the expression data were then log2 transformed and median centred. Normalised data values are shown in Supplementary Tables S3–S6.
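The microarray preprocessing steps (probe collapsing, log2 transformation and median centring) can be sketched as follows. This is an illustrative sketch rather than the pipeline actually used: the probe identifiers and intensities are made up, and two details that the text leaves open are treated as assumptions here, namely that probes are collapsed by taking the highest value per sample and that median centring is applied per gene across samples.

```python
import math
from statistics import median


def collapse_probes(probe_values, probe_to_gene):
    """Keep, per gene and sample, the highest value among that gene's probes."""
    genes = {}
    for probe, values in probe_values.items():
        gene = probe_to_gene[probe]
        if gene in genes:
            genes[gene] = [max(a, b) for a, b in zip(genes[gene], values)]
        else:
            genes[gene] = list(values)
    return genes


def log2_median_centre(gene_values):
    """Log2-transform each gene, then subtract that gene's median across samples."""
    centred = {}
    for gene, values in gene_values.items():
        logged = [math.log2(v) for v in values]  # assumes strictly positive values
        centre = median(logged)
        centred[gene] = [x - centre for x in logged]
    return centred


# Hypothetical raw intensities for three samples; probe IDs are made up.
probe_values = {
    "probe_a": [8.0, 16.0, 64.0],
    "probe_b": [4.0, 32.0, 32.0],
    "probe_c": [2.0, 4.0, 8.0],
}
probe_to_gene = {"probe_a": "CCL5", "probe_b": "CCL5", "probe_c": "DDIT4"}

collapsed = collapse_probes(probe_values, probe_to_gene)
normalised = log2_median_centre(collapsed)
```

After centring, each gene has a median of zero within its cohort, which is one reason the median score can differ between data sets.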

Prognostic signature

The expression of 449 genes was evaluated to select genes significantly associated with DRFS (P<0.05) using Cox regression models. Genes significantly associated with DRFS were further tested using Cox proportional hazards regression with the stepwise method of selection, identifying those genes that were independent prognostic factors. We used a linear combination of the normalised values of gene expression levels multiplied by a weighting value for each gene (regression coefficients) to calculate a risk score for each patient. The proportional hazards assumption over time for the final Cox model for the dichotomised risk score was tested graphically using log–log survival functions, and the assumption was confirmed to hold in the discovery and validation sets (Supplementary Figure S1).

Survival analysis

DRFS was estimated by the Kaplan–Meier method. After calculating the risk score, its median (specific to each group of patients) was estimated and used as a cutoff to establish two risk subgroups. Survival curves were compared using the log-rank test. A P value <0.05 was considered statistically significant.
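For readers less familiar with these methods, the following pure-Python sketch shows a Kaplan–Meier estimator and a two-group log-rank test. It is a didactic sketch, not the software used in this study: the function names and the follow-up times are hypothetical, events are coded as 1 (recurrence) and 0 (censored), and the P value is taken from the chi-square distribution with one degree of freedom.

```python
import math


def kaplan_meier(times, events):
    """Return [(time, survival)] at each distinct event time."""
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(set(times)):
        deaths = sum(1 for ti, e in zip(times, events) if ti == t and e == 1)
        leaving = sum(1 for ti in times if ti == t)  # events plus censored at t
        if deaths:
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= leaving
    return curve


def log_rank_test(times_a, events_a, times_b, events_b):
    """Two-group log-rank test; returns (chi-square statistic, P value)."""
    all_times = list(times_a) + list(times_b)
    all_events = list(events_a) + list(events_b)
    event_times = sorted({t for t, e in zip(all_times, all_events) if e == 1})
    observed_minus_expected = 0.0
    variance = 0.0
    for t in event_times:
        n_a = sum(1 for ti in times_a if ti >= t)  # at risk in group A
        n_b = sum(1 for ti in times_b if ti >= t)  # at risk in group B
        d_a = sum(1 for ti, e in zip(times_a, events_a) if ti == t and e == 1)
        d_b = sum(1 for ti, e in zip(times_b, events_b) if ti == t and e == 1)
        n, d = n_a + n_b, d_a + d_b
        observed_minus_expected += d_a - d * n_a / n
        if n > 1:
            variance += d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)
    if variance == 0:
        return 0.0, 1.0
    chi_square = observed_minus_expected ** 2 / variance
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi_square / 2.0))
    return chi_square, p_value


# Hypothetical follow-up times in months for two risk groups.
high_times, high_events = [5, 8, 12, 15], [1, 1, 1, 1]
low_times, low_events = [20, 30, 40, 50], [1, 0, 0, 0]

km_high = kaplan_meier(high_times, high_events)
chi_square, p_value = log_rank_test(high_times, high_events, low_times, low_events)
```

In practice, an established survival analysis package would normally be preferred over a hand-written implementation; the sketch only makes the underlying calculation explicit.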

Ethical considerations

This study involves a reanalysis of gene expression and clinical data obtained in a previous study that was approved by the IRB of the Instituto Nacional de Enfermedades Neoplásicas (INEN 10–018).