Introduction

Estrogen and progesterone receptors (ER and PR) and HER2 are standard biomarkers used in clinical practice to aid the histopathological classification of breast cancer and management decisions. Hormone receptor- and HER2-positive tumors benefit from tamoxifen and anti-HER2 therapies, respectively. On the other hand, there are currently no targeted drug therapies for management of triple-negative breast cancer (TNBC), which lacks the expression of hormone receptors and HER2. TNBCs are more sensitive to chemotherapy than hormone receptor-positive tumors, because they are generally more proliferative, and pathological complete responses after chemotherapy are more likely in TNBC than in non-TNBC.1, 2 Paradoxically, TNBC is associated with poorer survival than non-TNBC, owing to more frequent relapse in TNBC patients with residual disease.1, 2 Only 31% of TNBC patients experience pathological complete responses after chemotherapy,3 emphasizing the need for targeted therapies.

Transcriptome profiling has been used to dissect the heterogeneity of breast cancer into five intrinsic ‘PAM50’ subtypes: Luminal A, Luminal B, Basal-like, HER2 and normal-like subtypes that relate to clinical outcomes.4, 5, 6, 7, 8 Several gene signatures have been developed to predict outcome or response to treatment, including MammaPrint,9 OncotypeDx10, 11 and Theros.12, 13, 14, 15 These commercial signatures rely on models that select genes on the basis of clinical phenotypes such as tumor response or survival time. Notwithstanding their clinical utility, these models fail to identify core biological mechanisms for the phenotypes of interest. Recently, an approach based on biological function-driven gene coexpression signatures, ‘attractor metagenes’, has been applied to the prediction of survival in multiple cancers including breast cancer.16, 17 Three attractor metagenes, chromosomal instability (CIN), mesenchymal transition and lymphocyte-specific immune recruitment, were highly predictive of breast cancer survival.17 To some extent, this approach may help clarify some previously published signatures. For example, proliferation and cell cycle signatures have been previously reported to associate with tumor grade and prognosis.15, 18 The attractor metagene approach suggests that these signatures are essentially CIN attractors enriched with genes that function at the kinetochore–microtubule interface.16

In this study, we initially performed multiple class comparisons using the Oncomine database,19 aiming to identify genes that were commonly deregulated in subgroups exemplifying aggressive clinical behavior: TNBC compared with non-TNBC and normal breast, and tumors associated with distant metastasis and/or death compared with their respective counterparts. This analysis revealed a list of 206 recurrently deregulated genes that were enriched for CIN and ER metagenes. We derived an aggressiveness score based on the ratio of the CIN metagene to the ER metagene, and found that this score identified aggressive tumors in several other data sets regardless of the molecular subtype and clinico-pathological indicators. The aggressiveness score outperformed MammaPrint,9 OncotypeDx,10, 11 proliferation per cell cycle16, 20 and CIN20 signatures in multivariate Cox proportional hazards comparisons. Next, we found that depletion of proteins involved in kinetochore binding or chromosome segregation (TTK, TPX2, NDC80 and PBK), particularly TTK, significantly reduced the survival of TNBC cell lines in vitro, suggesting therapeutic potential. TTK inhibition with a small-molecule inhibitor also reduced the survival of TNBC cell lines. We found that both TTK mRNA and protein levels were associated with aggressive tumor phenotypes. Mitosis-independent expression of TTK protein was prognostic in TNBC and other aggressive breast cancer subgroups, suggesting that protection of CIN/aneuploidy drives aggressiveness and treatment resistance. Finally, we show that the combination of TTK inhibition with chemotherapy was effective in vitro in the treatment of cells that overexpress TTK, thus providing a therapeutic option for the protected CIN phenotype.

Results

Meta-analysis of gene expression profiles in TNBC

We performed a meta-analysis of published gene expression data, irrespective of platform, using the Oncomine database19 (version 4.5). We compared the expression profiles of 492 TNBC cases vs 1382 non-TNBC cases in eight data sets and found 1600 overexpressed and 1580 underexpressed genes in the TNBC cases (cutoff median P-value across the eight data sets <1 × 10⁻⁵ from a Student’s t-test, Supplementary Figure 1). We also compared the expression profiles of primary breast cancers from 512 patients who developed metastases vs 732 patients who did not develop metastases at 5 years (seven data sets in total) to identify 500 overexpressed and 480 underexpressed genes in the metastasis cases (cutoff median P-value across the seven data sets <0.05 from a Student’s t-test, Supplementary Figure 1). Finally, we compared the expression profiles of 232 primary breast tumors from patients who died within 5 years with those of 879 patients who survived, in seven data sets, and found 500 overexpressed and 500 underexpressed genes in the poor survivors (cutoff median P-value across the seven data sets <0.05 from a Student’s t-test, Supplementary Figure 1). The union of these analyses (genes deregulated in TNBC and in tumors that metastasized or resulted in death within 5 years) generated a gene list of 305 overexpressed and 341 underexpressed genes (Supplementary Figures 2A and B). These analyses did not consider deregulation in comparison with normal breast tissue. To identify cancer-related genes, we used the METABRIC (Molecular Taxonomy of Breast Cancer International Consortium) data set21 as a validation data set. Of the 305 overexpressed and 341 underexpressed genes identified in the meta-analysis, 117 overexpressed and 89 underexpressed genes (206 genes) were deregulated in TNBC (250 cases) vs adjacent normal tissue (144 samples) at a 1.5-fold-change cutoff (Supplementary Figures 2C and D).
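The median P-value cutoff described above can be illustrated with a short sketch. It is a simplification under stated assumptions: the gene names and P-values are hypothetical, the function `select_genes` is not part of Oncomine, and Oncomine's own rank-based procedure is not reproduced here.

```python
from statistics import median

def select_genes(p_values_by_gene, cutoff):
    """Return the genes whose median P-value across data sets is below the cutoff."""
    selected = []
    for gene, p_values in p_values_by_gene.items():
        if median(p_values) < cutoff:
            selected.append(gene)
    return sorted(selected)

# Hypothetical per-data-set P-values for two genes in eight data sets
# (illustrative numbers only, not taken from the study).
example = {
    "GENE_A": [1e-8, 3e-7, 2e-6, 5e-9, 1e-4, 2e-7, 4e-6, 9e-8],
    "GENE_B": [0.2, 1e-6, 0.03, 0.5, 1e-7, 0.01, 0.4, 0.08],
}

# Cutoff used for the TNBC vs non-TNBC comparison in the text.
tnbc_hits = select_genes(example, cutoff=1e-5)
```

Using the median makes the selection robust to a single data set with an unusually weak or strong P-value, which is why GENE_B is rejected here despite two very small values.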

Clinico-pathological features of the aggressiveness gene list

We compared the 206 genes from the above analysis, which we called the ‘aggressiveness gene list’ (Supplementary Table 1), with the recently described metagene attractors16, 17 and found that 45 of the overexpressed genes were in the CIN metagene, whereas 19 of the underexpressed genes were in the ER metagene (Supplementary Figure 3). The expression of the aggressiveness gene list was visualized in the METABRIC data set, stratified according to the histological subtypes by the GENIUS classification.22 As shown in Figure 1a, ER−/HER2− (TNBC) tumors, in comparison with adjacent normal breast tissue, showed the highest upregulation of CIN genes (red in the heat map) and downregulation of ER signaling genes (green in the heat map). Tumors of other subtypes showed a range of deregulation of these genes. To quantify these trends, we calculated the ‘aggressiveness score’ as the ratio of the CIN metagene (average expression of the CIN genes) to the ER metagene (average expression of the ER genes). The aggressiveness score was highest for ER−/HER2− (TNBC) tumors, followed by HER2+ and then ER+ tumors (box plot in Figure 1a). We also analyzed the aggressiveness score in the five intrinsic breast cancer subtypes predefined by the PAM50 classification8 and the ten integrative clustering subtypes defined by combined clustering of gene expression and copy number data21 (Supplementary Figure 4). The aggressiveness score was highest in the basal-like and the integrative clustering 10 subtypes, which are enriched for TNBC and have poor prognosis.
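As a minimal sketch of the score definition above, the ratio can be computed per tumor as follows. The expression values are hypothetical, the short gene lists stand in for the full CIN and ER metagenes, and the sketch assumes positive expression values (for example log2 intensities); how the ratio was handled for z-scored data is not stated in the text.

```python
from statistics import mean

def metagene(expression, genes):
    """Average expression of the listed genes in one tumor."""
    return mean(expression[g] for g in genes)

def aggressiveness_score(expression, cin_genes, er_genes):
    """Ratio of the CIN metagene to the ER metagene for one tumor."""
    er = metagene(expression, er_genes)
    if er <= 0:
        # Assumption of this sketch: the ratio is only meaningful for a positive denominator.
        raise ValueError("ER metagene must be positive")
    return metagene(expression, cin_genes) / er

# Hypothetical expression values for one tumor (not data from the study).
tumor = {"MELK": 9.0, "TTK": 11.0, "MAPT": 4.0, "MYB": 6.0}
score = aggressiveness_score(tumor, cin_genes=["MELK", "TTK"], er_genes=["MAPT", "MYB"])
```

In this example the CIN metagene is 10.0 and the ER metagene is 5.0, so the score is 2.0; a tumor with high CIN and low ER expression scores high.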

Figure 1

Correlation of breast cancer subtypes and the aggressiveness gene list. The METABRIC data set was visualized according to the expression of the 206 genes (Supplementary Table 1) in the aggressiveness gene list. The aggressiveness score for each tumor was calculated as the ratio of the CIN metagene (average value for CIN genes expression) to the ER metagene (average value for ER genes expression). (a) The expression of the aggressiveness gene list according to the GENIUS histological classification. The box plot shows the aggressiveness score of the histological subtypes. (b) The overall survival of patients in the METABRIC data set was analyzed according to the aggressiveness score (upper row: by quartiles; lower row: by median) in all patients, non-TNBC patients and in patients with ER+ Grade 2 tumors. The hazard ratio (HR), CI and P-value for comparisons of upper quartile vs lower quartiles (upper row) and at the dichotomy across the median (high vs low) are shown (log-rank test, GraphPad Prism). The number of patients (n) in each group is shown in brackets. The expression of the aggressiveness score according to PAM50 and intClust subtypes and survival curves for ER+ grade 3 and PAM50 subtypes according to the aggressiveness score are in Supplementary Figures 4 and 5.

Interestingly, tumors of various subtypes scored higher than the median aggressiveness score (line in box plots in Figure 1 and Supplementary Figure 4). We therefore examined the overall survival of patients in the METABRIC data set, stratified by quartiles of the aggressiveness score and also dichotomized at its median. Tumors with a high aggressiveness score had worse survival than those with a low aggressiveness score. Patients with non-TNBC tumors with a high aggressiveness score had poor survival similar to that of TNBC patients (Figure 1b). Among ER+ tumors, we found that a high aggressiveness score predicted poor survival in both Grade 2 (Figure 1b) and Grade 3 (Supplementary Figure 4) tumors. Tumors with a high aggressiveness score showed poor survival regardless of the PAM50 intrinsic breast cancer subtype (Supplementary Figure 4). The PAM50 classifier was prognostic only in tumors with a low aggressiveness score (Supplementary Figure 5).
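The two stratifications used above (split at the median, and split into quartiles) can be sketched as follows. The scores are hypothetical, and the quartile boundaries follow the default method of Python's `statistics.quantiles`, which may differ slightly from the software used in the study.

```python
from statistics import median, quantiles

def dichotomize(scores):
    """Label each score 'high' if it is above the cohort median, otherwise 'low'."""
    cut = median(scores)
    return ["high" if s > cut else "low" for s in scores]

def quartile_groups(scores):
    """Assign each score to quartile 1 (lowest) through 4 (highest)."""
    q1, q2, q3 = quantiles(scores, n=4)
    # Each comparison contributes 1 when the score exceeds that cut point.
    return [1 + (s > q1) + (s > q2) + (s > q3) for s in scores]

# Hypothetical aggressiveness scores for eight tumors.
example_scores = [0.5, 0.8, 1.1, 1.4, 2.0, 2.6, 3.1, 3.9]
by_median = dichotomize(example_scores)
by_quartile = quartile_groups(example_scores)
```

Survival curves would then be drawn per group; the quartile split compares the extremes of the score, whereas the median split keeps the groups as large as possible.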

One network of direct interactions in the aggressiveness gene list associates with patient survival

We performed network analysis on the aggressiveness gene list by using the Ingenuity Pathway Analysis and found a network with direct interactions between 97 of the 206 deregulated genes (Figure 2a). To find the minimal gene set that represents the aggressiveness genes and this network, the 97 genes in this network were analyzed for their correlation with the CIN or ER metagenes and with overall survival in the METABRIC data set (Supplementary Table 2). We selected genes according to the following criteria: (1) highest correlation with the metagenes (Pearson’s correlation coefficient >0.7); (2) association with overall survival (Cox proportional hazards model, P<0.001); and (3) more than two-fold deregulation with least standard deviation of expression between high and low aggressiveness score tumors. These analyses identified two genes from the ER metagene (MAPT and MYB) and six genes from the CIN metagene (MELK, MCM10, CENPA, EXO1, TTK and KIF2C). These eight genes were maintained in a directly connected network (Figure 2b). The classification of tumors (high vs low across the median) from these eight genes, again representing the ratio of CIN and ER metagenes, predicted the classification from the 206 genes with 95% sensitivity and 97% specificity by prediction analysis of microarrays (PAM) (data not shown). Importantly, a high score from these eight genes identified poor survival in all patients, non-TNBC patients and ER+ Grade 2 patients (Figure 2c).
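The agreement between the 8-gene and 206-gene classifications is reported above as sensitivity and specificity. A minimal sketch of that evaluation step is given below; it assumes the 206-gene labels are treated as the reference, uses hypothetical labels, and does not reproduce the PAM classifier itself.

```python
def sensitivity_specificity(reference, predicted, positive="high"):
    """Compare predicted labels with reference labels and return (sensitivity, specificity)."""
    tp = fn = tn = fp = 0
    for ref, pred in zip(reference, predicted):
        if ref == positive:
            if pred == positive:
                tp += 1  # reference high, predicted high
            else:
                fn += 1  # reference high, predicted low
        else:
            if pred == positive:
                fp += 1  # reference low, predicted high
            else:
                tn += 1  # reference low, predicted low
    return tp / (tp + fn), tn / (tn + fp)

# Hypothetical labels: reference from the 206-gene score, prediction from the 8-gene score.
labels_206 = ["high", "high", "high", "high", "low", "low", "low", "low"]
labels_8 = ["high", "high", "high", "low", "low", "low", "low", "low"]
sens, spec = sensitivity_specificity(labels_206, labels_8)
```

Here one high-score tumor is missed by the reduced gene set, so sensitivity is 0.75 while specificity is 1.0.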

Figure 2

Network analysis of the aggressiveness gene list. (a) Ingenuity Pathway Analysis was performed using direct interactions on the 206 genes in the aggressiveness gene list (red is overexpressed and green is underexpressed). One network of high direct interactions was identified. (b) The genes in the network in A were investigated for their correlation with the aggressiveness score and overall survival (Supplementary Table 2), and eight genes (MAPT, MYB, MELK, MCM10, CENPA, EXO1, TTK and KIF2C) with the highest correlation were still connected in a direct interaction network. (c) The overall survival of patients in the METABRIC data set was analyzed according to the score from the eight genes in B (upper row: by quartiles; lower row: by median) in all patients, non-TNBC patients and in patients with ER+ Grade 2 tumors.

Next, we explored the 8-gene score for prognosis in several molecular and histological settings in the METABRIC data set. The survival of patients with tumors with wild-type TP53 was stratified by the 8-gene score (Figure 3a). Patients with mutant TP53, whose tumors mainly had a high score, showed worse survival than those with wild-type TP53, suggesting that TP53 mutation is an independent prognostic factor. Patients with tumors with low or high expression of the proliferation marker Ki67 were stratified by the 8-gene score, suggesting that the 8-gene score is independent of proliferation (Figure 3a). We also found that the 8-gene score stratified the survival of patients from all stages of disease (Stage I to Stage III, Figure 3a). We focused on ER+ tumors and found that, as in the case of ER+ Grade 2 tumors (Figure 2c), the 8-gene score stratified the survival of patients with ER+ Grade 3 tumors (Figure 3b). Importantly, the 8-gene score identified ER+LN− and ER+LN+ patients who had poor survival similar to ER−LN− and ER−LN+ patients, respectively (Figure 3b). A high 8-gene score identified poor survival of patients with tumors of all PAM50 subtypes, and the prognostication by PAM50 classification was only evident in low 8-gene score tumors (Supplementary Figure 5).

Figure 3

Survival of patients stratified by the 8-gene score in the METABRIC data set. The overall survival of patients in the METABRIC data set was analyzed according to the 8-gene score in selected settings in all patients (a) or in ER-positive patients only (b). (a) TP53 mutation was compared in high vs low 8-gene score (split by the median). The expression of the proliferation marker Ki67 was divided by dichotomy across the median, and patients in each of these groups were then stratified according to their 8-gene score (split by quartiles). Disease stages (Stage I—Stage III) were stratified by the median 8-gene score. (b) ER+ Grade 3, ER+ lymph node-negative (LN−) and ER+ LN+ tumors were stratified by the quartiles.

The 8-gene aggressiveness score in multivariate survival analysis

To exclude the possibility that the aggressiveness score (calculated using the 206 genes or the 8 genes) was redundant, we performed multivariate Cox proportional hazards model analysis in the METABRIC data set (with Illumina platform) in comparison with conventional clinical variables and current gene signatures. As detailed in Table 1, the aggressiveness scores were significantly associated with patient survival when compared with conventional variables and outperformed MammaPrint,9 OncotypeDx,10, 11 proliferation per cell cycle16, 20 and CIN20 signatures. Moreover, our aggressiveness scores outperformed the CIN4 classifier,23 which was recently developed from the CIN signature.

Table 1 Univariate and multivariate survival analysis of the aggressiveness score in the METABRIC data set

We performed validation of the six CIN and two ER genes in univariate survival association using the online tool Kaplan–Meier plotter24 (Supplementary Results and Supplementary Tables 3 and 4). More importantly, we performed multivariate survival analysis of the 8-gene score in four data sets (with Affymetrix platform from the Gene Expression Omnibus (GEO); GSE2990, GSE3494, GSE2034 and GSE25066). Again, the score was significantly associated with survival in a multivariate Cox proportional hazards model in every data set tested (Figure 4). Altogether, we found that in multiple data sets that used different platforms the 8-gene score identified patients with poor survival independently of other clinico-pathological indicators and outperformed current signatures.

Figure 4

The 8-gene score associates with the survival of breast cancer patients. Four published data sets were used to validate the 8-gene score as a predictor of survival. The 8-gene score was calculated for tumors in each of the data sets, and the survival of patients was stratified according to the median 8-gene score; (a) GSE2990 (ref. 15), (b) GSE3494 (ref. 64), (c) GSE2034 (ref. 65) and (d) GSE25066 (ref. 51). The hazard ratio (HR), CI and P-value for comparisons of high vs low 8-gene score are shown in the Kaplan–Meier survival curves (log-rank test, GraphPad Prism). The number of patients (n) is shown in brackets. The table in each panel shows multivariate survival analysis using the Cox proportional hazard model including all available conventional indicators.

Therapeutic targets in the aggressiveness gene list

The overexpressed genes in the CIN metagene are involved in, or regulate, mitosis, spindle assembly and checkpoint, kinetochore attachment, chromosome segregation and mitotic exit. Thus, it is not surprising that several of the overexpressed genes are targets for molecular inhibitors, such as CDK1 (refs 25, 26) and AURKA/AURKB,27 and have been trialed preclinically and clinically.28 We therefore performed small interfering RNA (siRNA) depletion against 25 genes of the CIN metagene in three TNBC cell lines: MDA-MB-231, SUM159PT and Hs578T. We found that knockdown of four genes (TTK, TPX2, NDC80 and PBK) consistently reduced the survival of these cells (Figure 5a and Supplementary Table 2). Knockdown of TTK produced the greatest reduction in survival, and as TTK was in the 8-gene score we selected it for further studies. We found that TTK protein was higher in TNBC cell lines compared with the near-normal MCF10A cell line and luminal/HER2 cell lines (Figure 5b). Next, we used the specific TTK inhibitor (TTKi) AZ3146 against a panel of breast cancer cell lines and found that TNBC cell lines were more sensitive to the TTKi (Figure 5c).

Figure 5

Therapeutic targets in the aggressiveness gene list. (a) The TNBC cell lines, MDA-MB-231, SUM159PT and Hs578T were treated with control siRNA (Scrambled, Sc CTRL) or siRNA targeting the specified genes, and the survival of these cells was compared on day 6. Data show the average from the three cell lines where each cell line was treated in triplicate. *P<0.05 and ***P<0.001 from one-way ANOVA analysis performed using GraphPad Prism. Data for individual cell lines are shown in Supplementary Table 2. (b) A panel of breast cancer cell lines was used to prepare lysates for immunoblotting of TTK. Tubulin was used as the loading control. (c) Dose response curves for the treatment of breast cancer cell lines in the absence or presence of escalating doses of the TTK inhibitor (TTKi) AZ3146. The survival of cells was measured using the CellTitre MTS/MTA assay carried out 6 days after treatment. Percentage survival (n=3 per dose) was calculated as the percentage of the signal from treated cells to that from control cells. (d) The concentration of TTKi required to reduce the survival of the cells by 50% (IC50) was calculated using GraphPad Prism from the dose response curves in C for each cell line.
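The percentage survival calculation in this legend, and a rough IC50 estimate, can be sketched as follows. The IC50 here is obtained by linear interpolation on log10(dose), a simpler technique swapped in for illustration; it is not the curve fit performed in GraphPad Prism. The doses and signals are hypothetical.

```python
import math

def percent_survival(treated_signal, control_signal):
    """Signal from treated cells as a percentage of the signal from control cells."""
    return 100.0 * treated_signal / control_signal

def ic50_by_interpolation(doses, survival):
    """Estimate the dose giving 50% survival by interpolating on log10(dose).

    doses must be positive and ascending; survival is in percent.
    Returns None if the curve never crosses 50%.
    """
    points = list(zip(doses, survival))
    for (d_lo, s_lo), (d_hi, s_hi) in zip(points, points[1:]):
        if s_lo >= 50.0 >= s_hi and s_lo != s_hi:
            frac = (s_lo - 50.0) / (s_lo - s_hi)
            log_ic50 = math.log10(d_lo) + frac * (math.log10(d_hi) - math.log10(d_lo))
            return 10 ** log_ic50
    return None

# Hypothetical dose response: doses in micromolar, survival in percent.
example_doses = [0.1, 1.0, 10.0]
example_survival = [90.0, 70.0, 30.0]
ic50 = ic50_by_interpolation(example_doses, example_survival)
```

In this example the 50% point lies halfway between 1 and 10 on the log scale, so the estimate is about 3.16 in the same units as the doses. A lower IC50 indicates a more sensitive cell line.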

TTK expression in aggressive tumors and potential for combination therapy

To further study the potential of TTK as a therapeutic target, we investigated TTK expression at the mRNA and protein levels in breast cancer patients. We analyzed the correlation of TTK mRNA expression, dichotomized at the median, with clinico-pathological indicators in the METABRIC data set of 2000 patients (Table 2). High TTK mRNA expression was associated with younger age at tumor diagnosis, larger tumor size, higher tumor grade, higher Ki67 expression, TP53 mutations, an ER/PR-negative tumor phenotype, HER2 positivity and TNBC. On the basis of PAM50 subtyping, high TTK mRNA was associated with luminal B, HER2-enriched and basal-like tumors.

Table 2 Correlation of TTK mRNA level and clinico-pathological indicators in the METABRIC data set

We also analyzed TTK expression in a cohort of breast cancer patients (406 patients) by IHC. TTK and its activity are detected at all stages of the cell cycle; however, TTK is upregulated during mitosis.29 Thus, we defined high TTK levels (score of 3) by TTK staining in non-mitotic cells, in order to exclude the bias of elevated TTK level during mitosis. Similar to TTK mRNA, high TTK protein level (Table 3) was associated with high tumor grade, high Ki67 expression and TNBC status (particularly basal TNBC). Moreover, in agreement with the associations of TTK mRNA with the PAM50 intrinsic subtypes, high TTK protein was observed in HER2-positive and proliferative ER+/HER2− tumors (most related to luminal B) but low TTK protein was observed in nonproliferative ER+/HER2− tumors (most related to luminal A). In addition to these associations with aggressive phenotypes, we also found that high TTK protein was significantly associated with aggressive histological features including ductal histology, pushing tumor border, lymph node involvement, nuclear pleomorphism, lymphocytic infiltration and higher mitotic scores (Table 3). Altogether, similar to the high aggressiveness score from the 206 genes or 8 genes, high levels of TTK mRNA and protein span breast cancer subtypes, marking aggressive behavior.

Table 3 Associations between TTK protein expression and clinico-pathological indicators

We examined the association of TTK protein level with patient survival and found that breast tumors with high TTK staining (category 3) had worse survival than other staining groups at 5 years (Figures 6a and b) and 10 and 20 years (Supplementary Figure 6). Importantly, high TTK staining (category 3) was not restricted to a particular histological subgroup or to tumors with high mitotic index (Figure 6c). Next, we focused on prognostication of aggressive subgroups (Grade 3, lymph node-positive, TNBC, HER2 or high Ki67) and found that high TTK protein level identified exceptionally aggressive tumors that lead to poor survival of less than 2 years (Figure 7a). Finally, to exploit our finding that TTK, as a part of the aggressiveness score, was associated with aggressive breast tumors and that TTK inhibition was effective in TNBC cell lines that overexpress this protein (Figure 5), we investigated the therapeutic potential of combining TTK inhibition with chemotherapy. We found that TTKi synergized with docetaxel at very low doses (sublethal doses) in the treatment of TNBC cell lines that overexpress TTK in comparison with cell lines that do not (Figure 7b), and that this combination induced apoptotic cell death (Figure 7c).

Figure 6

TTK protein expression associates with breast cancer survival. The overall survival of patients in a large cohort of breast cancer patients (n=409) was stratified according to TTK staining by immunohistochemistry (scores 0–3). Kaplan–Meier survival curves are shown for all patients with (a) four TTK staining categories (0–3) and (b) two categories (0–2 vs 3). Log-rank test and P-value were used for survival curves. (c) The distribution of high TTK staining (category 3) across histological subgroups and mitotic indices. Data show the mitotic index (median+range) measured as the number of mitotic cells in 10 high-power fields (hpf). The number of tumors with high TTK staining relative to the total number of tumors in the cohort is shown on the right. High TTK expression was distributed across subtypes and did not associate with mitotic index.

Figure 7

TTK associates with aggressive subtypes and is a therapeutic target. (a) Kaplan–Meier survival curves are shown for Grade 3 tumors, lymph node-positive patients (LN+) and LN+ patients with grade 3 tumors. Log-rank Test and P-value were used for these survival curves. For patients with TNBC and HER2, survival was statistically significant using the Gehan–Breslow–Wilcoxon test (P-values marked by two asterisks), which gives more weight to deaths at early time points. The poorer survival of patients with high Ki67 tumors and high TTK staining was a trend, but it did not reach significance. Survival curves and statistical analyses were performed using GraphPad Prism. (b) TNBC and non-TNBC cell lines were treated for 6 days with the specified concentrations of docetaxel (doc) alone, TTK inhibitor (TTKi) alone or the combinations. The survival of cells was measured using the MTS/MTA assay, as described in Methods. ***P<0.001 comparing the combination with single agents and with non-TNBC cell lines from two-way ANOVA in GraphPad Prism. (c) MDA-MB-231 cells were treated with docetaxel or TTKi alone or in combination and collected at 96 h to perform apoptosis assays by flow cytometry. Early apoptotic cells were defined as annexin V+/7-AAD−. **P<0.01 and ***P<0.001 comparing treatments using one-way ANOVA in GraphPad Prism.

Discussion

Our meta-analysis of gene expression in the Oncomine database identified a list of 206 genes enriched with two core biological functions/metagenes: CIN and ER signaling. We calculated the aggressiveness score, the ratio of CIN to ER metagenes, which was associated with the overall survival of breast cancer patients. A core of eight genes (six CIN genes and two ER signaling genes) was representative and recapitulated the correlations with outcome from the 206 genes. The ratio of the six CIN genes to the two ER signaling genes, the 8-gene score, was associated with survival in several breast cancer data sets. Our aggressiveness scores outperformed conventional variables and published signatures in multivariate survival analysis. Particularly in ER+ tumors, some cases have survival as poor as that of the aggressive HER2+ and TNBC subtypes. Our data suggest that the interplay of cancer-related biological functions, namely CIN and ER signaling, is a better predictor of phenotypes than single genes or single functions. This notion is in line with recent studies showing that the interaction of biologically driven predictors provides better prognosis.16, 17, 30 Recently, all ER− tumors were described as having a high level of the CIN metagene; however, it was not clear that ER+ tumors could be described as low CIN tumors.16 In our study, we clarify that ER+ disease contains a considerable fraction of tumors that have a high level of CIN genes and that the relationship between CIN and ER genes is a powerful predictor of survival in these patients.

The fidelity of chromosome segregation is ensured by the proper attachment of the microtubules from the mitotic spindle to the kinetochores of chromosomes in a tightly regulated process, and CIN refers to the missegregation of whole chromosomes, thus producing aneuploidy.31 Using aneuploidy as a surrogate marker for CIN, Carter et al.20 developed a gene signature and found that this ‘CIN signature’ predicts clinical outcome in multiple cancers. More recently, a minimal gene set that captures the CIN signature, CIN4 (AURKA, FOXM1, TOP2A and TPX2), was described as the first clinically applicable quantitative PCR-derived measure of tumor aneuploidy from formalin-fixed, paraffin-embedded tissue. As Grade 2 tumors have heterogeneous characteristics in terms of clinical outcome, the significance of the CIN4 classifier is the stratification of Grade 2 tumors into good and poor prognosis groups.23 Our aggressiveness scores were prognostic in all tumor grades and disease stages (stages I–III and lymph node-negative and positive) and outperformed the CIN signature and the CIN4 classifier in multivariate survival analysis in the METABRIC data set. Strikingly, but in agreement with previous studies,32, 33 the prognostication using the CIN metagene and our aggressiveness scores from gene expression levels was restricted to ER+ disease and was not seen in the TNBC or HER2 subtypes. This may be explained by the fact that ER− tumors have a high level of the CIN metagene as per our results and those published previously.16 However, our results with TTK protein level clearly demonstrate that TNBC, HER2, high-grade, lymph node-positive and proliferative tumors contain subgroups with high TTK levels exclusive of mitotic cells and have poorer survival than those with low TTK expression or TTK expression in mitotic cells. We propose that there are two types of high expression of CIN genes that may not be clearly differentiated by mRNA expression studies. One form of elevated CIN genes relates to high levels of mitosis and proliferation, whereas the second form, which we measured by immunohistochemistry exclusive of mitotic cells, is driven by another aggressive phenotype: protection of aneuploidy and genomic instability. The recent study of the CIN4 classifier lends support to our proposition. In that study, using flow cytometry to measure aneuploidy by DNA content, the authors found that a substantial proportion of tumors with high CIN4 scores had a normal DNA ploidy and that a significant proportion of aneuploid cases had a low CIN4 score.23

Chromosome missegregation and aneuploidy enhance genetic recombination and defective DNA damage repair34 to drive a ‘mutator phenotype’ required for oncogenesis.35 Genomic instability caused by deregulated mitotic spindle assembly checkpoint and aneuploidy has been termed ‘non-oncogene addiction’.36, 37 It is tempting to suggest that CIN and aneuploidy are exploited by breast cancer stem cells, which are high in TNBCs38 owing to the link between cancer stem cells, aneuploidy and therapy resistance.39, 40 This is supported by studies that implicate several genes involved in the spindle assembly checkpoint and chromosome segregation in tumor initiation, progression and cancer stem cells, e.g., AURKA in ovarian cancer,41 MELK/FOXM1 in glioblastoma,42, 43 MELK44 and MAD245 in breast cancer and SKP2 in several cancers.46 The role of CIN genes in protecting aneuploidy could provide an insight into the paradox that TNBCs show a better response to chemotherapy owing to the higher level of proliferation, yet these tumors have poorer outcome. We propose that resistance in TNBC could be attributed to the ability of aneuploid cells to adapt and drive recurrence. At least in vivo, chemotherapy has been shown to induce the proliferation of quiescent aneuploid cells as a mechanism for therapy resistance.39 We envisage that the high level of the CIN metagene in TNBC, particularly genes involved in chromosome segregation, is protective of this state. Indeed, one study found that a high level of TTK is protective of aneuploidy in breast cancer cells, and its silencing reduces the tumorigenicity of breast cancer cell lines in vivo.47 Our results from the patient cohort demonstrate that high TTK protein expression exclusive of mitosis was indeed prognostic in aggressive tumors and support the concept that protection of aneuploidy and genomic instability is an aggressive phenotype that drives poor outcome.

Our results with the TTK molecular inhibitor, in agreement with published studies using siRNA depletion,47, 48 support the idea of targeting chromosomal segregation in tumors with a high CIN phenotype as a therapeutic strategy. We also suggest that while TTK is high in TNBC, as previously described,47, 48 a considerable proportion of non-TNBC tumors that display aggressive features also show an elevated level of CIN genes, and would benefit from such targeted therapies. To our knowledge, the combination of sublethal doses of taxanes with TTK inhibition has not been investigated so far in breast cancer, but it has been investigated in other cancers.34 Our results reveal that TTK inhibition indeed sensitizes breast cancer cells with high TTK to docetaxel.

In conclusion, our study emphasizes that classification of breast cancer on the basis of biological phenotypes facilitates the understanding of the drivers of oncogenic phenotypes and therapeutic potentials. Importantly, our studies demonstrate that immunohistochemical assessment of CIN genes, exemplified by TTK here, provides better characterization and understanding of the contribution of CIN to tumor aggressiveness and prognosis.

Materials and methods

Meta-analysis of global gene expression in TNBC

We performed a meta-analysis of global gene expression data in the Oncomine database19 (Compendia Bioscience, Ann Arbor, MI, USA) using a primary filter for breast cancer (130 data sets), a sample filter to use clinical specimens and data set filters to use mRNA data sets with more than 151 patients (22 data sets). Patients of all ages, genders, disease stages or treatments were included. Three additional filters were applied to perform three independent differential analyses: (1) triple negative (TNBC cases vs non-TNBC cases, eight data sets49, 50, 51, 52, 53, 54, 55); (2) metastatic event analysis at 5 years (metastatic events vs no metastatic events, seven data sets51, 53, 56, 57, 58, 59, 60); and (3) survival at 5 years (patients who died vs patients who survived, seven data sets52, 53, 55, 57, 60, 61, 62). Deregulated genes were selected on the basis of the median P-value of the median gene rank in overexpression or underexpression patterns across the data sets (Supplementary Figure 1). The union of these three deregulated gene lists resulted in a gene list of deregulated genes in aggressive breast cancers (Supplementary Figure 2). The METABRIC data set21 was used as the validation set for further analysis. The normalized z-score expression data of the METABRIC data set were extracted from Oncomine and imported into BRB-ArrayTools63 (V4.2, Biometric Research Branch, NCI, Bethesda, MD, USA) with built-in R Bioconductor packages. Survival curves for the METABRIC data set were constructed using GraphPad Prism v6.0 (GraphPad Software, San Diego, CA, USA), and the log-rank (Mantel–Cox) test was used for statistical comparisons of survival curves.
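For readers who wish to reproduce the survival curve comparisons outside GraphPad Prism, the following is a minimal sketch of a two-group log-rank (Mantel–Cox) test. It is not the implementation used in this study; the function name `logrank_test` and the input layout (parallel lists of follow-up times and event indicators for each group) are assumptions made for illustration.

```python
import math


def logrank_test(times_a, events_a, times_b, events_b):
    """Two-group log-rank (Mantel-Cox) test.

    times_*  : follow-up times for each patient in the group
    events_* : 1 if the event (e.g. death) was observed, 0 if censored
    Returns (chi2, p_value) with one degree of freedom.
    Illustrative sketch only; names and layout are hypothetical.
    """
    data = [(t, e, 0) for t, e in zip(times_a, events_a)]
    data += [(t, e, 1) for t, e in zip(times_b, events_b)]

    # Distinct times at which at least one event occurred.
    event_times = sorted({t for t, e, _ in data if e})

    observed_a = expected_a = variance = 0.0
    for t in event_times:
        # Patients still at risk just before time t, per group.
        n_a = sum(1 for ti, _, g in data if ti >= t and g == 0)
        n_b = sum(1 for ti, _, g in data if ti >= t and g == 1)
        # Events occurring exactly at time t.
        d_a = sum(1 for ti, e, g in data if ti == t and e and g == 0)
        d_b = sum(1 for ti, e, g in data if ti == t and e and g == 1)
        n, d = n_a + n_b, d_a + d_b

        observed_a += d_a
        expected_a += d * n_a / n
        if n > 1:
            # Hypergeometric variance of the event count in group A.
            variance += d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)

    if variance == 0:
        return 0.0, 1.0
    chi2 = (observed_a - expected_a) ** 2 / variance
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value
```

The statistic compares the observed number of events in one group with the number expected if both groups shared the same hazard, summed over all event times. Dedicated statistical software should be preferred for reported analyses.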

Ingenuity Pathway Analysis and derivation of the eight-gene list

Pathway analysis was performed using Ingenuity Pathway Analysis (IPA; Ingenuity Systems, Redwood City, CA, USA). For pathway analysis in IPA, we used only direct relationships. After pathway analysis, we set out to identify the minimum gene list that recapitulates the 206-gene aggressiveness list. We used the METABRIC data set to perform statistical filtering in the BRB-ArrayTools software to derive the minimum gene list as follows: (1) the correlation of each gene in the CIN metagene and the ER metagene to the metagene itself was determined by quantitative trait analysis using the Pearson’s correlation coefficient (univariate P-value threshold of 0.001); (2) the association of each gene with overall survival was assessed using a univariate Cox proportional hazards model (univariate test P-value <0.001); and (3) the fold change of gene expression between high aggressiveness score tumors and low aggressiveness score tumors was calculated for each gene. We selected genes with Pearson’s correlation coefficient >0.7 to the metagenes, strongest survival association and more than two-fold deregulation between high and low aggressiveness score tumors. The METABRIC data set and four publicly available data sets were used to validate the 8-gene score. The four data sets (GSE25066,51 GSE3494,64 GSE2990,15 and GSE2034,65) were analyzed as described previously.66
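The correlation and fold-change filters described above can be sketched as follows. This is an illustrative outline rather than the BRB-ArrayTools procedure used in this study. It assumes that expression values are on a log2 scale, that the metagene is summarized as the mean expression of its member genes, and that tumors have already been split into high and low aggressiveness score groups. The Cox proportional hazards filter of step (2) is omitted, and all function and variable names are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson's correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant profile carries no correlation signal
    return sxy / math.sqrt(sxx * syy)


def select_genes(expr, metagene_genes, high_idx, low_idx,
                 min_r=0.7, min_fold=2.0):
    """Keep genes that track the metagene and differ between score groups.

    expr           : dict mapping gene -> list of log2 expression per tumor
                     (assumed scale; hypothetical layout)
    metagene_genes : genes that make up the metagene
    high_idx       : tumor indices with a high aggressiveness score
    low_idx        : tumor indices with a low aggressiveness score
    """
    n_tumors = len(expr[metagene_genes[0]])
    # Assumption: the metagene is the per-tumor mean of its member genes.
    metagene = [
        sum(expr[g][i] for g in metagene_genes) / len(metagene_genes)
        for i in range(n_tumors)
    ]

    selected = []
    for gene in metagene_genes:
        r = pearson(expr[gene], metagene)
        mean_high = sum(expr[gene][i] for i in high_idx) / len(high_idx)
        mean_low = sum(expr[gene][i] for i in low_idx) / len(low_idx)
        # On a log2 scale, a difference of means maps to a fold change.
        fold = 2 ** abs(mean_high - mean_low)
        if r > min_r and fold > min_fold:
            selected.append(gene)
    return selected
```

In practice, the survival filter would be applied alongside these two criteria before ranking the surviving candidates by the strength of their survival association.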

Cell culture and drug treatments

Breast cancer cell lines were obtained from ATCC (Manassas, VA, USA) and cultured as per ATCC instructions. All cell lines were regularly tested for mycoplasma and authenticated using short tandem repeat profiling. For the siRNA screen, siRNA solutions (Shanghai Gene Pharma, Shanghai, China) were used to transfect cells (MDA-MB-231, SUM159PT and Hs578T) with 10 nM of respective siRNA using Lipofectamine RNAiMAX (Life Technologies, Carlsbad, CA, USA). For drug treatments, docetaxel and the TTK inhibitor AZ3146 were purchased from Selleck Chemicals LLC (Houston, TX, USA) and diluted in dimethylsulfoxide. Six days after siRNA knockdown or after drug treatments, the survival of cells in comparison with control was determined using the CellTiter 96 Assay, as per the manufacturer’s instructions (Promega Corporation, Fitchburg, WI, USA). For immunoblotting, standard protocols were used and membranes were probed with antibodies against TTK (anti-MPS1 mouse monoclonal antibody [N1], ab11108; Abcam, Cambridge, UK) and γ-tubulin (Sigma-Aldrich, Sydney, NSW, Australia), and then developed using chemiluminescence reagent plus (Millipore, Billerica, MA, USA). Flow cytometry to quantify apoptosis was performed using Annexin V-Alexa488 and 7-AAD (Life Technologies), as per the manufacturer’s instructions, using the BD FACSCanto II flow cytometer (BD Biosciences, San Jose, CA, USA).

Breast cancer tissue microarrays, immunohistochemical and survival analysis

The Brisbane Breast Bank collected fresh breast tumor samples from consenting patients; the study was approved by the local ethics committees. Tissue microarrays were constructed from duplicate cores of formalin-fixed, paraffin-embedded breast tumor samples from patients undergoing resection at the Royal Brisbane and Women’s Hospital between 1987 and 1994. For biomarker analysis, whole tumor sections or tissue microarrays (depending on the marker) were stained with antibodies against ER, PR, Ki67, HER2, CK5/6, CK14, EGFR and TTK (Supplementary Table 5), and scored by trained pathologists. The Vectastain Universal ABC kit (Vector Laboratories, Burlingame, CA, USA) was used for signal detection according to the manufacturer’s instructions. Stained sections were scanned at high resolution (ScanScope Aperio, Leica Microsystems, Wetzlar, Germany), and then images were segmented into individual cores for analysis using the Spectrum software (Aperio, Wetzlar, Germany). Survival and other clinical data were collected from the Queensland Cancer Registry and original diagnostic pathology reports. In addition, we performed an internal histopathological review (SRL) of representative H&E-stained tumor sections from each case. For the analysis of HER2 amplification, tissue microarrays were analyzed using HER2 CISH. Criteria for assigning prognostic subgroups in this study are summarized in Supplementary Figure 7.

Other statistical analysis

Statistical analyses were performed using GraphPad Prism v6.0. The types of tests used are stated in the figure legends. Univariate and multivariate Cox proportional hazards regression analyses were performed using MedCalc for Windows, version 12.7 (MedCalc Software, Ostend, Belgium).