Background

Breast cancer (BC) is a heterogeneous disease with variations in morphological features, molecular profiles, and therapy responses [1]. Triple negative breast cancer (TNBC), defined by the absence of expression of Estrogen Receptor, Progesterone Receptor, and Human Epidermal Growth Factor Receptor 2, comprises 15–30% of BC and presents considerable challenges for clinical management owing to the lack of targeted therapies [2, 3]. Moreover, TNBC often has an unfavorable prognosis, with an increased probability of early metastasis, disease recurrence, and shorter overall survival [4, 5]. Although TNBC generally displays aggressive behavior, patient outcomes can vary considerably. Around 23% of early-diagnosed TNBC patients remain disease free for more than 5 years, while death within 5 years of diagnosis is inevitable for almost all metastatic TNBC patients [6,7,8]. Therefore, the complexity, molecular variability, and unpredictability of TNBC behavior warrant further investigation [9]. The biological heterogeneity of TNBCs has provided an impetus to develop tools for prognostic stratification; however, results have been inconsistent owing to small patient cohorts, gene expression datasets obtained from different platforms, and the use of microarray versus quantitative reverse transcriptase polymerase chain reaction, which also makes head-to-head comparison challenging [10, 11].

Various multigene prognostic tests are available for estrogen receptor-positive tumors for patient risk stratification and to guide therapy choice, whereas in estrogen receptor-negative tumors, and specifically TNBC tumors with a higher proliferation rate, these multigene signatures provide no clinical value [12]. Lehmann et al. used gene expression profiles to classify TNBCs into six molecular subtypes: Basal-like 1 and 2, Mesenchymal, Mesenchymal Stem-like, Immunomodulatory, and Luminal Androgen Receptor [13]. Burstein et al. proposed an alternative gene expression classification that categorizes TNBC tumors into four molecular subtypes: Luminal Androgen Receptor, Mesenchymal, Basal-like immune suppressed, and Basal-like immune activated [14]. However, distant metastasis-free survival (DMFS) analysis showed poor prognosis for TNBCs regardless of their molecular subtype [15]. Therefore, there is an urgent unmet need for clinically validated prognostic markers that can predict outcomes for TNBC patients [15].

Unbiased omics technologies, including Next Generation Sequencing (NGS), are expected to lead a paradigm shift for precision medicine from a pathological microscopy-based diagnosis to gene signature-based diagnosis, prognosis, and treatment approaches [16]. NGS enables transcriptomic profiling of TNBC and identification of genomic alterations such as copy number changes, insertions, deletions, and mutations; consequently, studies exploring inter-tumor heterogeneity in different types of tumors are now possible [17, 18].

For successful NGS analysis, clinical samples must be maintained in conditions that allow DNA and RNA preservation and subsequent extraction. At present, most clinical samples are processed and archived as formalin-fixed, paraffin-embedded (FFPE) tissue samples, in which the DNA and RNA necessary for NGS analysis are often fragmented [19]. However, FFPE tissue samples, if processed and stored properly, have been shown to preserve sufficient DNA and RNA for extraction and NGS analysis [20]. The present study utilizes NGS transcriptomic analysis of a large cohort of TNBC FFPE tissue samples and aims to identify a molecular prognostic signature predicting risk for poor outcomes in TNBC.

Methods

Nottingham TNBC cohort

A retrospective, well-characterized series of primary invasive TNBC samples (n = 333), obtained from patients who presented to Nottingham City Hospital, UK, between 1987 and 2006, was included in this study. Clinicopathological data, including patient age at diagnosis, tumor size, tumor grade, nodal stage, lymphovascular invasion (LVI), and Nottingham Prognostic Index, were collected from patients’ medical records. The mean patient age was 48 years (range 27–69), and tumor diameter at the time of presentation ranged from 0.25 to 8.00 cm (interquartile range 1.5–2.8 cm), with a mean tumor size of 2.2 cm. Patients received a combination of treatment options including surgery, radiation, and chemotherapy according to standard protocols [21]. Outcome data, including BC-specific survival (BCSS) and DMFS, were available and prospectively maintained. BCSS was defined as the time (in months) from the primary surgical treatment to the time of death from BC, while DMFS was defined as the duration (in months) from the time of primary surgery to the first occurrence of distant metastasis. Estrogen Receptor, Progesterone Receptor, and Human Epidermal Growth Factor Receptor 2 status of primary tumors was determined at the time of primary diagnosis from full-face sections of resected tumors according to published guidelines [22] (see Supplementary (A) for full details).

Transcriptomic analysis

RNA sequencing was performed on representative FFPE tissue from an in-house TNBC cohort (n = 112), which had also been assessed histopathologically for tumor burden (see Supplementary (A) for full details). An Artificial Neural Network (ANN) database mining approach was used to build a classifier from the RNA-sequence matrices and to identify genes associated with disease outcomes (DMFS and BCSS). In the ANN, the learning rate and momentum were set at 0.1 and 0.5, respectively [23]. Each tumor sample had 39,684 corresponding genes. The input codes were “0” if patients showed neither evidence of metastasis (DMFS) nor death from BC (BCSS) within 5 years, and “1” if metastasis or death due to BC was evident in the first 5 years after diagnosis. Although BCSS is the ultimate endpoint of cancer outcome, DMFS was chosen as an end point based on the high likelihood of TNBC patients being diagnosed with distant metastases within 5 years of diagnosis [8]. Prior to ANN testing, a Monte Carlo cross-validation procedure was applied to avoid data over-fitting and false discovery. This approach has been reported to outperform the commonly used leave-one-out cross-validation [24]. The input data were randomly divided into three subsets: 60% for training, 20% for validation to monitor model performance during the training process, and 20% for blind testing of the original model [25]. Gene identification by the forward stepwise approach using ANN was performed as described previously [26]. Based upon the distribution of performance of the aforementioned model, ANN generated two panels of genes, representing the top 1% of the RNA sequence matrices, that significantly predicted DMFS and BCSS, respectively. Genes common to both the DMFS and BCSS panels were identified using the Venny 2.0 online tool [27]. Receiver operating characteristics curves were generated to assess the predictive value of the differentially expressed gene panel, presenting the sensitivity and specificity of the tested model (Supplementary (B) Fig. 1).
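The 60/20/20 Monte Carlo cross-validation and the binary outcome coding described above can be sketched in a few lines of Python. This is an illustrative sketch only, not the authors' pipeline: the function names, the random seed, and the number of repeats (50) are assumptions, since the text does not state how many random resamplings were used.

```python
import random


def encode_outcome(event_within_5_years):
    """Code the outcome as in the text: 1 = metastasis or BC death within 5 years, else 0."""
    return 1 if event_within_5_years else 0


def monte_carlo_splits(n_samples, n_repeats, seed=0,
                       train_frac=0.6, valid_frac=0.2):
    """Yield (train, valid, test) index lists from repeated random 60/20/20 splits.

    n_repeats and seed are hypothetical choices made for this sketch.
    """
    rng = random.Random(seed)
    n_train = int(round(n_samples * train_frac))
    n_valid = int(round(n_samples * valid_frac))
    for _ in range(n_repeats):
        idx = list(range(n_samples))
        rng.shuffle(idx)  # a fresh random partition on every repeat
        train = idx[:n_train]
        valid = idx[n_train:n_train + n_valid]
        test = idx[n_train + n_valid:]  # remaining ~20% held out for blind testing
        yield train, valid, test


# 112 samples, as in the transcriptomic cohort; 50 repeats is an assumption.
splits = list(monte_carlo_splits(n_samples=112, n_repeats=50))
```

Unlike leave-one-out cross-validation, each repeat here draws an independent random partition, so the spread of test-set performance across repeats gives a distribution rather than a single estimate.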

Pathway analysis

The publicly available web-based gene set analysis tool WebGestalt (http://www.webgestalt.org/option.php) was used to identify differentially regulated canonical pathways using over-representation enrichment analysis. The pathway analysis was based on the top 200 ranked genes predicting DMFS and BCSS. The reference gene list was set to “genome_protein_coding”. For each significant category, the ratio of the observed to the expected number of genes was recorded as the enrichment ratio (R) score, using the Panther pathway database [28].
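To make the enrichment ratio concrete, the sketch below computes the observed/expected ratio for one pathway, together with a one-sided hypergeometric p-value of the kind commonly used in over-representation analysis. It is not WebGestalt's code; the function names and all counts in the usage example (pathway size, reference size, overlap) are hypothetical, and the expected count is taken here as list size × category size / reference size.

```python
from math import comb


def enrichment_ratio(n_overlap, list_size, category_size, reference_size):
    """Observed / expected number of list genes falling in a pathway category."""
    expected = list_size * category_size / reference_size
    return n_overlap / expected


def hypergeom_pvalue(n_overlap, list_size, category_size, reference_size):
    """One-sided P(X >= n_overlap) when list_size genes are drawn at random."""
    upper = min(list_size, category_size)
    total = comb(reference_size, list_size)
    tail = sum(
        comb(category_size, k) * comb(reference_size - category_size, list_size - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total


# Hypothetical numbers: a 200-gene list, a 100-gene pathway, and a
# 20,000-gene reference set, with 5 list genes found in the pathway.
ratio = enrichment_ratio(5, 200, 100, 20000)
p_value = hypergeom_pvalue(5, 200, 100, 20000)
```

With these made-up counts one gene would be expected by chance, so an overlap of five gives an enrichment ratio of 5.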

Prognostic gene signature score

In compliance with the Reporting Recommendations for Tumor Marker Prognostic Studies (REMARK) criteria, the associations between DMFS or BCSS and the expression of genes in our 21-gene panel (the genes common to both the DMFS and BCSS prediction panels identified by ANN) were evaluated both individually and after adjusting for standard prognostic variables [29, 30]. Thus, DMFS and BCSS probabilities were computed individually for each gene in the panel using the Kaplan–Meier method. In addition, for the genes that were statistically significant in the univariate Kaplan–Meier analysis for both DMFS and BCSS, multivariate Cox regression analysis was used to estimate the effect size [i.e., hazard ratio (HR) with 95% confidence interval (CI)]; these models included the genes and the standard prognostic variables, regardless of the statistical significance of the standard prognostic variables in univariate analysis. The genes that showed an independent, significant prognostic impact in multivariate Cox regression analysis were further examined in a combined multivariate Cox regression analysis to identify a signature with the minimum number of genes showing the most significant association with DMFS and BCSS.
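For readers less familiar with the Kaplan–Meier method referred to above, a minimal product-limit estimator is sketched here. The study itself used SPSS; this function, its name, and the follow-up data in the example are hypothetical and serve only to show how a survival probability is accumulated over event times while censored patients leave the risk set without counting as events.

```python
def kaplan_meier(times, events):
    """Return [(time, survival probability)] at each distinct event time.

    times: follow-up in months; events: 1 = event observed, 0 = censored.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        n_events = 0
        n_removed = 0
        # Group all patients sharing the same follow-up time.
        while i < len(data) and data[i][0] == t:
            n_events += data[i][1]
            n_removed += 1
            i += 1
        if n_events > 0:
            survival *= 1 - n_events / n_at_risk
            curve.append((t, survival))
        n_at_risk -= n_removed  # events and censored cases both leave the risk set
    return curve


# Hypothetical follow-up data for six patients (months, event indicator).
curve = kaplan_meier([6, 12, 12, 24, 36, 60], [1, 1, 0, 1, 0, 0])
```

In practice one such curve is computed per expression group (high versus low), and the groups are compared with a log-rank test, which is not shown here.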

External validation of transcriptomic data

For independent validation of the results, the prognostic value of the two-gene signature predictors of DMFS and BCSS was evaluated using the Breast Cancer Gene-Expression Miner v4.0 (Bc-GenExMiner) database, which includes RNA-sequence expression data from 4713 BC patients, including 254 TNBC patients [31]. These genes were also interrogated through the Genotype 2 outcome tool (http://www.g-2-o.com), a web-based server utilizing NGS and gene chip data of 6697 BC patients, including 612 TNBC patients with outcome data. Computed receiver operating characteristics values were used to generate the transcriptomic fingerprint for mutational status from The Cancer Genome Atlas RNA-sequence and NGS mutation data. The average expression of significant genes was designated as a metagene for a given genotype. By employing gene chip data, associations between the expression of the metagene and patient outcomes were computed by multivariate Cox regression and Kaplan–Meier survival analysis [32].

Immunohistochemistry (IHC)

Assessment of the protein expression of the identified two-gene prognostic signature was performed using rabbit anti-SPDYC (NBP1-80832, lot # R36476, Novus Biologicals, UK) and rabbit anti-ACSM4 (PA5-62082, lot # R59771, Thermo Fisher, UK) antibodies on tissue microarrays prepared for the IHC cohort (see Supplementary (A) for full details).

Statistical analysis

IBM SPSS 24.0 (Chicago, IL, USA) software was used for statistical analysis. For dichotomization of mRNA and protein expression levels of the different genes, the X-tile bioinformatics software, version 3.6.1 (Yale University, USA), was utilized with DMFS as an endpoint. Cox proportional hazard models were used for multivariate analysis, with patient age, tumor grade, nodal stage, tumor size, and LVI status included as covariates to adjust for their potential confounding influence on associations between the tested genes and the outcomes of interest. Spearman’s Rho test was used to evaluate correlations between continuous variables of the transcriptomic and protein expression data, whereas the chi-square test was performed to analyze relationships between categorical variables. A p value of <0.05 was deemed significant (see Supplementary (A) for full details).

Results

Gene selection

To build a classifier panel for outcome prediction in TNBC, ANN analysis of the RNA-sequence matrices data of the transcriptomic cohort was performed and genes were ranked based on relationships between their expression and clinical outcomes in terms of DMFS and BCSS. The top ranked genes predicting DMFS (DMFS genes panel) and those predicting BCSS (BCSS genes panel) were investigated to determine the most statistically enriched pathways (Supplementary (A) Table 2 & Supplementary (C) for full details).

Using the Venny tool, we identified a total of 21 genes that were common to both the DMFS and BCSS ANN panels. The 21-gene panel predicted patients’ DMFS and BCSS with 92% sensitivity and 94% specificity (Supplementary (B) Fig. 2). The probability of finding a gene by random chance in the top 200 was 0.03, whereas the probability of randomly finding the 21 genes collectively was 6.2 × 10^−33 (Supplementary (B) Fig. 3).

Univariate Kaplan–Meier survival analysis showed that elevated expression of some genes was significantly associated with shorter DMFS and BCSS, whereas elevated expressions of other genes showed statistically significant association with longer DMFS and BCSS (Supplementary (A) Table 3 & Supplementary (B) Fig. 4a–d). Multivariate Cox regression analysis models incorporating patient’s age, tumor grade, nodal stage, tumor size, and LVI status revealed that eight of the 21 genes were independent predictors of DMFS and BCSS, (Supplementary (A) Table 4 A–D).

Prognostic two-gene signature

The prognostic gene signature was identified by statistically distilling the eight genes in a multivariate Cox regression analysis to find a signature with the minimum number of genes showing the most significant association with BCSS and DMFS. The analysis revealed two genes, ACSM4 and SPDYC, that most significantly and independently predicted both DMFS and BCSS (ACSM4: DMFS, p = 0.015, 95% CI = 1.21–6.13, HR = 2.72; BCSS, p = 0.004, 95% CI = 1.44–6.83, HR = 3.14; SPDYC: DMFS, p = 0.012, 95% CI = 1.23–5.45, HR = 2.59; BCSS, p = 0.016, 95% CI = 1.18–5.09, HR = 2.45) (Supplementary (A) Table 5). There was no linear association between the mRNA expression of ACSM4 and SPDYC. To investigate the prognostic value of the two-gene signature, a linear prognostic score was generated using the sum of the product of normalized expression levels of these two genes and their respective regression coefficients, as follows:

Prognostic two-gene signature score = (ACSM4 normalized expression × ACSM4 expression β-value) + (SPDYC normalized expression × SPDYC expression β-value) (Table 1).
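The score defined above is a weighted sum, which the following sketch makes explicit. The β-values used by the authors are those in Table 1 and are not reproduced in the text; purely as stand-ins, this example takes β = ln(HR) from the DMFS hazard ratios quoted in this section (2.72 for ACSM4 and 2.59 for SPDYC). The function names, the expression values, and the cut-off are likewise hypothetical; in the study the cut-off was chosen with X-tile.

```python
import math

# Stand-in coefficients (assumption): beta = ln(HR) from the DMFS hazard
# ratios quoted in the text. The actual beta-values are those in Table 1.
BETA = {"ACSM4": math.log(2.72), "SPDYC": math.log(2.59)}


def two_gene_score(acsm4_expr, spdyc_expr, beta=BETA):
    """Linear prognostic score: sum of normalized expression x coefficient."""
    return acsm4_expr * beta["ACSM4"] + spdyc_expr * beta["SPDYC"]


def risk_group(score, cutoff):
    """Dichotomize a score; the cut-off is supplied by the caller."""
    return "high" if score > cutoff else "low"


# Hypothetical normalized expression values and a hypothetical cut-off.
score = two_gene_score(acsm4_expr=1.0, spdyc_expr=1.0)
group = risk_group(score, cutoff=1.5)
```

Because the score is linear in both genes, a patient can reach the high-risk group through strong expression of either gene alone or moderate expression of both.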

Table 1 Multivariate Cox regression analysis to generate a prognostic score for the two gene signature predicting Breast Cancer Specific Survival (BCSS) and Distant Metastasis-Free Survival (DMFS) (Transcriptomic, n = 112).

When patients were dichotomized using the X-tile cut-off generator, those with a higher mRNA expression score for the prognostic two-gene signature had worse outcomes, in terms of shorter DMFS and BCSS, than those with a lower score (Fig. 1). Cox regression analysis confirmed that the prognostic two-gene signature has significant prognostic value for predicting shorter DMFS and BCSS, independent of patient age, tumor grade, nodal stage, tumor size, and LVI status (Table 2).

Fig. 1: Univariate Kaplan–Meier survival analysis of the prognostic two gene signature predicting Breast Cancer Specific Survival (BCSS) and Distant Metastasis-Free Survival (DMFS) (Transcriptomic cohort, n = 112).

Univariate Kaplan–Meier survival analyses to test associations between the prognostic two-gene signature at the transcriptomic level and clinical outcomes (significant P values are bolded; HR, hazard ratio).

Table 2 Multivariate Cox regression analysis for the two gene signature predicting Breast Cancer Specific Survival (BCSS) and Distant Metastasis-Free Survival (DMFS) (Transcriptomic Cohort, n = 112).

External validation of genomic findings

Using the Bc-GenExMiner tool to analyze publicly available RNA-sequencing data, we observed that higher expression of SPDYC was significantly associated with worse prognosis in the whole (unselected) cohort of BC (n = 4308, p < 0.0001) [31]. Validation of gene expression in the restricted TNBC cohort (n = 254) revealed a similar trend of poor prognosis (p = 0.006) [31]. Moreover, integration of our proposed prognostic two-gene signature into the public domain Genotype 2 outcome tool, using the median expression of each gene in the whole (unselected) cohort of BC (n = 4029), indicated that higher expression of ACSM4 and of SPDYC was associated with worse prognosis (both p < 0.001). More importantly in the context of this study, the two-gene signature (ACSM4 and SPDYC) was significantly associated with poorer outcome when examined in the TNBC subtype cohort alone (n = 612, p < 0.001) [32] (Fig. 2).

Fig. 2: Univariate Kaplan–Meier survival analysis of our proposed combinatorial two-gene signature predicting overall survival (public domain datasets).

To validate our findings, we utilized the Breast Cancer Gene-Expression Miner v4.0 (bc-GenExMiner v4.0) datasets, which include 5861 breast cancer patients, and the Genotype 2 outcome public portal, a genome-wide approach to link genotype to clinical outcome by utilizing next generation sequencing and gene chip data of 6697 breast cancer patients. a In the Breast Cancer Gene-Expression Miner data portal, high SPDYC mRNA expression confers a poor prognosis in the whole (i.e., unselected) cohort of breast cancer patients (n = 4308, p value < 0.0001). b In the Breast Cancer Gene-Expression Miner data portal, high SPDYC mRNA expression confers a poor prognosis in triple negative breast cancer patients (n = 254, p value = 0.006). c In the Genotype 2 outcome public portal, high ACSM4 mRNA expression confers a poor prognosis in the whole (i.e., unselected) cohort of breast cancer patients (n = 4029, p value < 0.0001). d In the Genotype 2 outcome public portal, high SPDYC mRNA expression confers a poor prognosis in the whole (i.e., unselected) cohort of breast cancer patients (n = 4029, p value < 0.0001). e In the Genotype 2 outcome public portal, high SPDYC and ACSM4 mRNA expression confers a poor prognosis in triple negative breast cancer patients (n = 612, p value < 0.0001). ** The data portal used to obtain the Kaplan–Meier plot integrates the somatic mutations in the gene, computes the combined transcriptional fingerprint of the mutation(s) using receiver operating characteristics analysis of breast cancer RNA-seq data, and uses the top up and down metagenes to estimate patient survival using Cox regression analysis on gene chip data. An important element is estimation of the transcriptional signature for each somatic mutation, which is carried out by receiver operating characteristics analysis on the mutation and RNA-seq data.

Immunohistochemistry of the prognostic two-gene signature

The morphological assessment of the tissue samples revealed cytoplasmic expression for both proteins; ACSM4 (H-score range 5–295) and SPDYC (H-score range 5–290) (Supplementary (B) Fig. 5).

Univariate survival analysis revealed that higher expression of ACSM4 and SPDYC was significantly associated with patients’ poor outcomes (DMFS; p < 0.001, BCSS; p = 0.009 for ACSM4) and (DMFS and BCSS, both p = 0.004 for SPDYC) (Fig. 3), which is concordant with the findings obtained from transcriptomic data.

Fig. 3: Univariate Kaplan–Meier survival analysis for ACSM4 and SPDYC protein expression for association with Breast Cancer Specific Survival (BCSS) and Distant Metastasis-Free Survival (DMFS) (IHC Cohort, n = 333).

Univariate Kaplan–Meier survival analyses to test associations between ACSM4 and SPDYC protein expression and clinical outcomes (significant P values are bolded; HR, hazard ratio).

Multivariate Cox regression analysis showed that SPDYC protein expression was an independent prognostic factor regardless of patient age, tumor grade, nodal stage, tumor size, and LVI status for DMFS (p = 0.015, 95% CI = 1.17–4.74, HR = 2.365) and BCSS (p = 0.015, 95% CI = 1.18–4.78, HR = 2.377). Likewise, multivariate Cox regression analysis showed that ACSM4 protein expression was a significant independent prognostic factor for DMFS (p = 0.002, 95% CI = 1.35–3.89, HR = 2.267), but not in BCSS (p = 0.057, 95% CI = 0.98–2.93, HR = 1.698) (Table 3a, b).

Table 3 (A): Multivariate Cox regression analysis for Individual Potential proteins associated with Breast Cancer Specific Survival (BCSS) and Distant Metastasis-Free Survival (DMFS) (IHC Cohort, n = 333). (B): Multivariate Cox regression analysis for Individual Potential proteins associated with Breast Cancer Specific Survival (BCSS) and Distant Metastasis-Free Survival (DMFS) (IHC Cohort, n = 333).

In a combined multivariate Cox regression analysis, SPDYC protein expression was an independent prognostic factor that predicted shorter DMFS and BCSS (DMFS: p = 0.03, 95% CI = 1.07–5.86, HR = 2.50: BCSS: p = 0.03, 95% CI = 1.08–5.96 HR = 2.54), regardless of patient age, tumor grade, nodal stage, tumor size, and LVI status. ACSM4 protein expression also was observed to be an independent prognostic factor, associated with shorter DMFS (p = 0.003, 95% CI = 1.01–3.20, HR = 1.83), regardless of patient age, tumor grade, nodal stage, tumor size, and LVI status, but not with BCSS (p = 0.27, 95% CI = 0.76–2.56, HR = 1.40) (Table 4). Correspondingly, we observed a significant positive linear association between ACSM4 and SPDYC protein expression (r = 0.29, p < 0.001), signifying that these proteins might be synergistically driving TNBC disease progression (Fig. 4). Furthermore, using only cases that were informative for both biomarkers, a linear prognostic score was generated using Cox proportional hazard analysis to test whether dual expression of SPDYC and ACSM4 proteins was associated with worse outcome. The equation generated used the sum of the product of the quantitative H-score and their respective regression coefficient as follows:

Table 4 Combined multivariate Cox regression analysis for potential proteins associated with Breast Cancer Specific Survival (BCSS) and Distant Metastasis-Free Survival (DMFS) (IHC Cohort, n = 333).
Fig. 4: Correlation between protein expression levels of SPDYC and ACSM4 (IHC Cohort, n = 333).

Violin plots demonstrating a positive correlation between protein expressions of SPDYC and ACSM4 (Correlation Coefficient, r = 0.29, p = 0.00001).

Protein expression prognostic score = (ACSM4 H-score × ACSM4 H-score β-value) + (SPDYC H-score × SPDYC H-score β-value) (Table 5).

Table 5 Multivariate Cox regression analysis to build prognostic index for the two gene signature predicting Breast Cancer Specific Survival (BCSS) and Distant Metastasis-Free Survival (DMFS) (IHC Cohort, n = 333).

This protein expression prognostic score was then dichotomized using X-tile software to determine the optimal cut-off for classifying patients into high- and low-risk groups, using DMFS as an end point. In the 257 investigated cases, the scores ranged from 15.43 to 365.05, with high protein expression risk scores (score >170) observed in 159/257 (62%) cases.

When testing the association between the prognostic score and outcome, univariate analysis demonstrated that cases with higher protein expression score had a significantly shorter DMFS (p = 0.02) but not BCSS (p = 0.06) (Fig. 5). Multivariate Cox regression analysis model demonstrated that protein expression prognostic score was an independent prognostic factor for DMFS (p = 0.03, 95% CI = 1.04–3.32, HR = 1.83) independent of patient age, tumor grade, nodal stage, tumor size, and LVI status, but not for BCSS (p = 0.07, 95% CI = 0.94–2.96, HR = 1.83) (Table 6).

Fig. 5: Univariate Kaplan–Meier survival analysis of the protein expression of the two-gene signature score predicting Breast Cancer Specific Survival (BCSS) and Distant Metastasis-Free Survival (DMFS) (IHC Cohort, n = 333).

Univariate Kaplan–Meier survival analyses to test associations between protein expression of the two-gene prognostic signature and clinical outcome (significant P values are bolded; HR, hazard ratio).

Table 6 Multivariate Cox regression analysis for the protein expression prognostic score predicting Breast Cancer Specific Survival (BCSS) and Distant Metastasis-Free Survival (DMFS) (IHC Cohort, n = 333).

Finally, when we stratified our cohort based on chemotherapy treatment, the 10-year DMFS of patients who were not offered chemotherapy (n = 83) and showed low expression of ACSM4 was 84% compared with 44% of those with high expression and the difference was statistically significant (p = 0.005). However, those with low expression of SPDYC had 83% 10-year DMFS compared with 70% in those with high expression but the difference was not statistically significant (p = 0.209). Similarly, with the prognostic two-gene signature, the 10-year DMFS of patients with low expression was 84% compared with 69% of those with high expression (p = 0.309).

Testing the performance of the prognostic two-gene signature at the transcriptomic and protein levels

The prognostic signature at the mRNA level captured 58% sensitivity, 69% specificity, 54% positive predictive value, 72% negative predictive value, and 64% accuracy in dichotomizing distant metastasis outcome of TNBC patients. In comparison, the prognostic signature at the protein level showed 73% sensitivity, 42% specificity, 30% positive predictive value, 82% negative predictive value, and 50% accuracy in dichotomizing distant metastasis outcome of TNBC patients (Supplementary (A) Table 6).
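The five performance measures quoted above all derive from a 2 × 2 table of predicted risk group against observed distant metastasis. The sketch below shows the arithmetic; the function name is an assumption, and the counts are hypothetical, chosen only for round numbers, since the study's own counts are in Supplementary (A) Table 6 and are not reproduced here.

```python
def classification_metrics(tp, fp, fn, tn):
    """Standard measures from a 2x2 table of prediction versus outcome.

    tp: high-risk patients who developed distant metastasis
    fp: high-risk patients who did not
    fn: low-risk patients who developed distant metastasis
    tn: low-risk patients who did not
    """
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
    }


# Hypothetical counts, not the study's data.
metrics = classification_metrics(tp=40, fp=20, fn=10, tn=30)
```

Sensitivity and specificity depend only on the classifier, whereas the predictive values also depend on how common metastasis is in the cohort, which is one reason the mRNA and protein cohorts can differ in PPV and NPV even for similar cut-off behavior.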

Discussion

Molecular classification of BC provides opportunities for enhanced personalized therapy [33]. In TNBC, conventional prognostic factors such as age, tumor size, tumor grade, and lymph node status have limited risk-predictive influence as these tumors are mostly of higher grade with increased chances of recurrence and metastasis [1]. Therefore, deciphering genomic profiles of TNBC using advanced techniques is an unmet need. Moreover, the utilization of ANN to mine the transcriptomic profile of TNBC in order to identify genes associated with clinical outcome is a promising approach to stratify patients for risk prediction [34].

In the current study, a discovery phase and two validation phases were implemented. The in-house transcriptomic TNBC cohort was used for ANN analysis in the discovery phase, whereas the protein expression data and publicly available external transcriptomic BC data were used for the validation phases. More importantly, regardless of the statistical differences in the distribution of clinicopathological parameters between the transcriptomic and IHC cohorts, our gene signature showed a statistical association with outcome at both the transcriptomic and protein expression levels. Our study supports the utility of applying ANN to integrate distinct clinical and molecular data to find novel prognostic biomarkers associated with poor outcome in TNBC.

Our study employed ANN for the analysis of our transcriptomic cohort to discover novel prognostic genes associated with outcome in TNBC. ANN is a powerful tool for the analysis of complex data, overcoming high background noise and thus identifying the influence of many interacting factors [35]. ANN analysis, unlike conventional statistical approaches such as hierarchical clustering, linear regression, and principal component analysis, is not limited by linear functionality; thus, identification of biological relationships between biomarkers and clinical outcomes is improved [24]. Furthermore, ANN can produce more accurate models than the conventional statistical techniques used in medical diagnostic and prognostic approaches [36]. Therefore, it is highly suitable for the identification of potential key genes driving TNBC outcomes. ANN modeling uses a supervised learning approach: a multi-layer perceptron architecture with a sigmoid transfer function, in which weights are updated by a back-propagation algorithm [37].
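As an illustration of the architecture just described, the sketch below trains a very small multi-layer perceptron with sigmoid units by back-propagation, using the learning rate (0.1) and momentum (0.5) reported in the Methods. It is not the authors' implementation: the class name, the network size (two inputs, two hidden units), the cross-entropy loss, the weight initialization, and the toy data are all assumptions made for the example.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


class TinyMLP:
    """One hidden layer of sigmoid units, trained by back-propagation with momentum."""

    def __init__(self, n_in, n_hidden, lr=0.1, momentum=0.5, seed=0):
        rng = random.Random(seed)
        # Each row: input weights of one hidden unit, with a trailing bias weight.
        self.w_hidden = [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)]
                         for _ in range(n_hidden)]
        self.w_out = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)]
        # Previous weight changes, needed for the momentum term.
        self.v_hidden = [[0.0] * (n_in + 1) for _ in range(n_hidden)]
        self.v_out = [0.0] * (n_hidden + 1)
        self.lr = lr
        self.momentum = momentum

    def forward(self, x):
        xb = list(x) + [1.0]
        hidden = [sigmoid(sum(w * xi for w, xi in zip(row, xb)))
                  for row in self.w_hidden]
        hb = hidden + [1.0]
        out = sigmoid(sum(w * h for w, h in zip(self.w_out, hb)))
        return xb, hb, out

    def train_step(self, x, y):
        xb, hb, out = self.forward(x)
        # Error signal at the output (cross-entropy loss with a sigmoid output).
        delta_out = out - y
        # Propagate the error back through the output weights (before updating them).
        delta_hidden = [delta_out * self.w_out[j] * hb[j] * (1.0 - hb[j])
                        for j in range(len(self.w_hidden))]
        for j in range(len(self.w_out)):
            self.v_out[j] = (self.momentum * self.v_out[j]
                             - self.lr * delta_out * hb[j])
            self.w_out[j] += self.v_out[j]
        for j, row in enumerate(self.w_hidden):
            for i in range(len(row)):
                self.v_hidden[j][i] = (self.momentum * self.v_hidden[j][i]
                                       - self.lr * delta_hidden[j] * xb[i])
                row[i] += self.v_hidden[j][i]

    def loss(self, samples):
        total = 0.0
        for x, y in samples:
            p = self.forward(x)[2]
            total -= y * math.log(p) + (1 - y) * math.log(1 - p)
        return total / len(samples)


# Toy data (hypothetical): the label simply follows the first input feature.
data = [((0, 0), 0), ((0, 1), 0), ((1, 0), 1), ((1, 1), 1)]
net = TinyMLP(n_in=2, n_hidden=2)
loss_before = net.loss(data)
for _ in range(500):
    for x, y in data:
        net.train_step(x, y)
loss_after = net.loss(data)
```

The momentum term reuses a fraction of the previous weight change, which smooths the updates; with real expression data the inputs would be gene expression values and the label the 5-year outcome code.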

In this study, ANN analysis identified the top ranked genes predicting DMFS and BCSS. We then employed a web-based tool to identify the signaling pathways significantly enriched in the top ranked gene panels. For instance, TNBC patients frequently harbor higher expression of the epidermal growth factor receptor (EGFR); however, studies have failed to establish significant benefit from EGFR-targeted therapies or tyrosine kinase inhibitors, suggesting the need to therapeutically target other pathways in these tumors [38, 39]. Moreover, the significance and over-activation of pathways such as the P38 MAPK, PDGF, and RAS pathways in BC metastatic sites, and their association with DMFS and BCSS in TNBC, have been previously documented [40,41,42]. In addition, the 21-gene panel generated by ANN analysis, which was strongly associated with both DMFS and BCSS in TNBC, included several novel and potentially targetable biomarkers of TNBC outcome. For instance, higher expression of DOCK10 (dedicator of cytokinesis 10, also known as ZIZ3) [43] has been previously identified as an indicator of poor prognosis in TNBC patients and as a predictor of distant metastasis [44]. In our transcriptomic cohort, DOCK10 emerged as a significant prognostic marker of BCSS and DMFS; however, it was not significantly prognostic in multivariate Cox regression analysis. We also found that high expression of BICC1, an RNA-binding protein and negative regulator of the WNT signaling pathway with potential involvement in regulating gene expression during embryonic development [45], was associated with DMFS but not with BCSS; thus, it was not included in the final signature.

In our study, we distilled the initial 21-gene panel down to eight genes that, when tested individually for their prognostic value, were significantly associated with both DMFS and BCSS in univariate and multivariate analysis after adjusting for potentially confounding variables. These genes are implicated in pro-oncogenic pathways in BC. PPL (also known as periplakin) is a component of desmosomes and of the cornified envelope in keratinocytes, and it associates with intermediate filaments. PPL can act in the PKB/AKT-mediated signaling pathway [46]. In TNBC, silencing PPL decreased cell migration and invasion [47]. SPDYC is a member of the speedy/Ringo cyclin-dependent kinase (CDK) family with known functions in cell cycle transitions and progression [48]. SPDYC plays an important role in activating both CDK1 and CDK2 expression [49]. High CDK2 expression has previously been associated with shorter survival in metastatic melanoma cases and with endocrine resistance in HER2-positive SKBR3 BC cell lines [50, 51]. Furthermore, downregulation of CDK1 has been found to increase synthetic lethality of TNBC cell lines when accompanied by high c-Myc expression [52]. However, the role of SPDYC in BC is still undefined [48]. ACSM4 encodes a protein with known functions in the conjugation of carboxylic acids and in fatty acid beta oxidation. Interestingly, upregulation of metabolic pathways has been found to interact with the cellular transcriptomics and proteomics of both CD4 and CD8 T cells in HIV disease [53]. Although ACSM4 has been shown to have a role in AIDS progression, there are no reports of its role in BC [54, 55]. We have previously reported a strong correlation between tumor infiltrating lymphocytes and TNBC outcome [56]. However, our current analysis did not identify known inflammation and immune response related genes associated with outcome in the TNBC 21-gene panel. Future studies should therefore seek to identify novel mechanisms contributing to aberrant inflammatory and immune response pathways involved in tumor infiltrating lymphocytes in TNBC. Furthermore, genes such as AC020931.1, DCTN1-AS1, RP11-29H23.5, PAXBP1-AS1, and RPS10P18 require further investigation to decipher their role and function in BC progression.

The original hypothesis underpinning this study was that a gene expression signature would more accurately predict both DMFS and BCSS in TNBC than a single gene. Multivariate Cox regression analysis enabled us to further filter the set of eight genes to a prognostic two-gene signature (ACSM4 and SPDYC) showing strong association with both DMFS and BCSS. We tested whether immunohistochemical assessment of the protein expression of the ACSM4 and SPDYC genes could be used to predict patient outcomes. Our study confirmed that protein expression had independent prognostic significance in TNBCs and showed strong statistical association with worse outcomes (i.e., shorter DMFS and BCSS). These genes, when combined in a linear score, successfully stratified TNBC patients into high- and low-risk subgroups; the former group, which is at a higher risk of developing distant metastasis, could benefit from greater vigilance and more aggressive treatment regimens. We have validated our ANN investigation and RNA-sequencing results by studying protein expression, which showed that a prognostic score derived from the immunohistochemical evaluation of the two biomarkers could significantly predict distant metastasis and thus support personalized prognostic evaluation and guide treatment choices to improve disease outcomes.

In this study, the two-gene signature at the mRNA level yielded 58% sensitivity and 64% accuracy in dichotomizing the distant metastasis outcome of TNBC patients. By contrast, at the protein level, the signature demonstrated 73% sensitivity and 50% accuracy. These results are promising for predicting the risk of distant metastasis in TNBC patients, which is all the more important because TNBC patients presently rely solely on chemotherapy. Moreover, patients deemed at high risk of distant metastasis may benefit from this stratification through improved treatment decisions.
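For readers less familiar with these metrics, sensitivity and accuracy for a dichotomized outcome can be computed from confusion-matrix counts as in the minimal sketch below. The counts are hypothetical and are not the data of this study; the function names are illustrative.

```python
def sensitivity(tp, fn):
    """Fraction of patients with distant metastasis who were classified high-risk."""
    return tp / (tp + fn)


def accuracy(tp, tn, fp, fn):
    """Fraction of all patients whose risk group matched the observed outcome."""
    return (tp + tn) / (tp + tn + fp + fn)


# Hypothetical counts: tp/fn are patients with metastasis classified
# high/low risk; fp/tn are patients without metastasis classified high/low risk.
tp, fn, fp, tn = 8, 2, 4, 6

sens = sensitivity(tp, fn)
acc = accuracy(tp, tn, fp, fn)
```

Because accuracy also depends on how patients without metastasis are classified, a signature can show higher sensitivity and lower accuracy at the same time.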

Furthermore, our proposed signature is based on only two genes (ACSM4 and SPDYC), unlike commercially available prognostic assays, including those designed for ER-positive tumors [57]. Our prognostic gene signature may be amenable to the development of affordable molecular tests based on quantitative reverse transcriptase polymerase chain reaction, as the sensitivity, specificity, and accuracy of the two-gene signature proved to be much stronger at the mRNA level. The signature might be suitable for use in routine clinical practice because it has prognostic value in dichotomizing TNBC patients and may provide important information for treatment decisions.

The mainstay of TNBC treatment is cytotoxic chemotherapy [58]. However, chemotherapy decisions for metastatic TNBC patients are based on a combination of factors related to the disease and the patient (i.e., tumor burden, patient age, co-morbidities, prior treatments received in the adjuvant setting, and patient preference) [59]. In this study, among patients who were not offered chemotherapy, survival differed significantly according to ACSM4 expression, with worse outcome in patients with high expression. Despite this interesting finding, the 10-year DMFS of patients with low expression (84%) may not justify a recommendation to omit chemotherapy in those patients. A clinical trial with a sufficiently large number of TNBC patients may be warranted to determine whether those with low ACSM4 expression can avoid chemotherapy without a worse outcome.

Challenges in applying the NGS technique to decipher the molecular characteristics of TNBC tumors include access to the technology and the integrity of tumor samples needed to guarantee sufficient tumor RNA extraction [60]. Variation in sample quality and preparation may negatively influence the outputs of NGS analysis and must therefore be carefully controlled. In addition, NGS analysis must consider intrinsic tumor heterogeneity between patients. Samples used in this study were processed according to a strictly standardized procedure implemented at Nottingham University Hospitals, with immediate sample fixation following surgery and standard protocols optimized to preserve tissue architecture, subcellular details, and, importantly, the integrity of biologic materials including proteins, DNA, and RNA. Nonetheless, our retrospective study was limited to a single center using an in-house transcriptomic and protein expression cohort. The public domain data used in this study support the finding that high expression of both ACSM4 and SPDYC confers poor prognosis for BC patients, especially those diagnosed with the TNBC molecular subtype. Even so, further external validation is strongly recommended.

Conclusion

Personalized medicine seeks to stratify BC patients to ensure optimal treatment and thus improved patient outcomes. Our study has identified a two-gene signature that stratifies TNBC patients into high- and low-risk groups for developing distant metastasis, which can potentially guide clinical decision-making. The robust methods used herein to identify our prognostic gene signature, followed by validation of the findings at the protein expression level, suggest that this promising two-gene signature provides avenues for further in vitro functional investigation and for new drug development for TNBC patients, who are in dire need of effective therapeutic options.