The CINSARC signature predicts the clinical outcome in patients with Luminal B breast cancer

CINSARC, a multigene expression signature originally developed in sarcomas, was shown to have prognostic impact in various cancers. We tested the prognostic value for disease-free survival (DFS) of CINSARC in a series of 6035 early-stage invasive primary breast cancers. CINSARC had independent prognostic value in the Luminal B subtype and not in the other subtypes. In Luminal B patients receiving adjuvant endocrine therapy but no chemotherapy, CINSARC identified patients with different 5-year DFS (90% [95%CI 86–95] in low-risk vs. 79% [95%CI 75–84] in high-risk, p = 1.04E−02). Luminal B CINSARC high-risk tumors were predicted to be less sensitive to endocrine therapy and CDK4/6 inhibitors, but more vulnerable to homologous recombination targeting and immunotherapy. We concluded that CINSARC adds prognostic information to that of clinicopathological features in Luminal B breast cancers, which might improve patients’ stratification and better orient adjuvant treatment. Moreover, it identifies potential therapeutic avenues in this aggressive molecular subtype.


INTRODUCTION
During the last decades, significant progress has been achieved in early breast cancer management, most notably through the routine use of post-operative systemic treatment including adjuvant cytotoxic chemotherapy and endocrine therapy 1,2. Yet, the benefits conferred by these treatments are not uniformly distributed across the various molecular subtypes of disease described from gene expression profiling 3. Thus, only estrogen receptor (ER)-positive breast cancers benefit from endocrine therapy, whereas cytotoxic chemotherapy, without and with anti-HER2 agents, has maximum efficacy in triple-negative and HER2-positive subtypes, respectively. In ER-positive/HER2-negative breast cancer, the so-called luminal-like breast cancer, only a minor subset of patients, with either a large tumor burden or a highly proliferative and aggressive biology, derive an actual benefit from chemotherapy. Accordingly, various prognostic signatures have been established and made commercially available to help identify these patients, and are now increasingly used in the clinic 4,5. These signatures distinguish patients with low, intermediate, and high risk of unfavorable outcome, the latter being recommended for adjuvant chemotherapy. Nevertheless, in those patients with molecularly defined high-risk disease, the level of therapeutic discrepancy remains significant, because some patients receiving adjuvant chemotherapy will relapse and die, while a relatively high number of those high-risk patients could still achieve cure with endocrine therapy alone. Thus, alternative or additional molecular predictors are needed in this population.
The CINSARC (Complexity INdex in SARComas) signature was originally elaborated as a predictor of clinical outcome in soft tissue sarcomas with complex genetics and was subsequently demonstrated to have prognostic impact in different tumor types, including breast cancer 6,7. CINSARC classifies tumor samples as high-risk or low-risk of relapse. It includes genes implicated in mitosis and maintenance of chromosome integrity, the deregulation of which may result in elevated genomic instability. Moreover, aberrant expression of CINSARC proteins was also suggested to favor higher migration and invasion abilities 8,9. All of these features are associated with increased tumor aggressiveness and may explain the potential of this signature to prognosticate the recurrence of cancer across multiple malignancies.
Regarding the prognostic value of CINSARC in breast cancer, it is necessary to examine how it compares with the classical clinicopathological prognostic features in multivariate analysis. Thus, to further examine the potential prognostic value of CINSARC in breast cancer, we examined a set of 6035 early-stage, invasive primary breast cancers with publicly available gene expression and clinicopathological annotations including survival. We found that CINSARC had independent prognostic value in the Luminal B subtype but not in the other subtypes, notably in patients treated with adjuvant endocrine therapy without chemotherapy, thus identifying a subset of luminal B breast cancer in which therapeutic de-escalation might be possible. In addition, we identified in CINSARC high-risk patients an enrichment in gene signatures associated with response to PARP inhibitors and immunotherapy, thus providing potential clues to treat these poor-prognosis patients.

RESULTS
The prognostic value of CINSARC in breast cancer is not independent
We analyzed our database of 8930 patients with early breast cancer, including 6035 treated with primary surgery and with available DFS data (Supplementary Table 1). With a median follow-up of 77 months (range 1-382), 1759 experienced a DFS event, while 4276 remained disease-free for a 5-year DFS of 75% (95%CI, 74-76) (Fig. 1a). Applying CINSARC to this population identified 2945 CINSARC high-risk (49%) and 3090 CINSARC low-risk (51%) breast cancer patients (Fig. 1b), with significantly different 5-year DFS (67% vs. 83%, respectively; p < 2E−16, log-rank test). In univariate analysis (Table 1) [...] (Table 3). Other clinicopathological features independently associated with shorter DFS included pathological lymph node involvement, tumor size, and type. Such prognostic complementarity between the clinicopathological variables and CINSARC was tested using the likelihood ratio (LR) test: CINSARC added prognostic information to that provided by the combination of clinicopathological variables (ΔLR-χ² = 6.53, p = 1.06E−02). Because a major aim of using a prognostic signature in Luminal breast cancer is therapeutic de-escalation, we assessed the prognostic value of CINSARC in the 554 Luminal B patients treated with adjuvant endocrine therapy only, without adjuvant chemotherapy. As shown in Fig. 2b, CINSARC identified 222 low-risk Luminal B patients with 90% 5-year DFS (95%CI 86-95), significantly better than the CINSARC high-risk patients (79% 5-year DFS [95%CI, 75-84]; p = 1.04E−02, log-rank test). Of note, in this population also, CINSARC had independent prognostic value in multivariate analysis (HR = 1.62 [95%CI 1.11-2.37], p = 1.16E−02, Wald test), together with pathological lymph node involvement and tumor size (Table 3), and added independent prognostic information to these clinicopathological features (ΔLR-χ² = 6.71, p = 9.58E−03).
We built a prognostic clinicogenomic model based on these three variables in a randomly defined learning set of 247 samples and tested its prognostic value in the validation set of 247 remaining samples: as shown in Fig. 3, the model was robust and identified an even lower risk subgroup with 5-year DFS of 93% (95%CI [89-97]).
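As an arithmetic check of the likelihood ratio results reported above, the following minimal Python sketch converts a ΔLR-χ² statistic into a p-value. It assumes one degree of freedom (a single added variable, CINSARC), which the text does not state explicitly; the function name is illustrative.

```python
import math

def chi2_sf_1df(x: float) -> float:
    """Upper-tail probability of a chi-square variable with 1 degree of freedom."""
    if x < 0:
        raise ValueError("chi-square statistic must be non-negative")
    # With 1 df, X = Z^2 for a standard normal Z, so
    # P(X > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(x / 2.0))

# ΔLR-χ² values reported in the text (1 df is an assumption of this sketch)
p_all_luminal_b = chi2_sf_1df(6.53)    # all Luminal B patients
p_endocrine_only = chi2_sf_1df(6.71)   # endocrine therapy only, no chemotherapy
print(f"{p_all_luminal_b:.2e}  {p_endocrine_only:.2e}")
```

Under that assumption, both values agree with the reported p-values (1.06E−02 and 9.58E−03) to about two significant figures.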
We had previously shown the prognostic complementarity and independence for DFS of commercial prognostic proliferation-based signatures (70-gene, Recurrence Score, ROR-P) and the ICR immune signature 10. Thus, we tested such independence between CINSARC and immune signatures including Palmer's metagenes (B-cells, T-cells, and CD8 T-cells) 11, Rooney's cytolytic activity score 12, and the three signatures predictive for response to immune therapy (ICR, TIS, and TLS). Multivariate analysis (Table 4) showed that in each case, CINSARC remained significant as well as each immune signature, suggesting independent prognostic value.

CINSARC classes and therapeutic vulnerability in the Luminal B subtype
Beyond prognostication, multigene signatures may also help identify therapeutic targets that might improve survival of patients with high risk of recurrence. Thus, in the whole population of 2028 Luminal B patients of our database, we wondered whether the two CINSARC classes displayed different probabilities of response to specific systemic therapies routinely used or under development in breast cancer (Table 5).
Regarding chemotherapy, 94 CINSARC high-risk and 56 CINSARC low-risk cases were informative about achievement or not of a pathological complete response (pCR) after anthracycline/taxane-based neoadjuvant chemotherapy. High-risk patients had a numerically higher, but not statistically significant, pCR rate (20%) as compared to low-risk patients (12%, p = 0.270, Fisher exact test). Similar percentages were observed when considering the probability of pCR as defined using an expression signature of pathological response to neoadjuvant chemotherapy in breast cancer 13: 25% of high-risk patients were predicted with pCR versus 17% of low-risk patients, and the difference was significant (p = 1.43E−04). By contrast, CINSARC high-risk patients were associated with a lower probability of sensitivity to hormone therapy (88%) according to the E2F4-activation signature 14, as compared to CINSARC low-risk patients (57%; p = 3.73E−57). Altogether, these results suggested that CINSARC high-risk patients might be more sensitive to chemotherapy and less sensitive to hormone therapy than low-risk patients.
We also examined the potential vulnerabilities of CINSARC classes to targeted therapies, using predictive gene signatures. We observed higher RBsig 15 and E2F regulon 16

Biological correlates of CINSARC classes in the Luminal B subtype
To further elucidate the biological differences between the two CINSARC classes and identify potential therapeutic targets, we compared their whole-exome mutational, whole-genome copy number and transcriptional, and proteomic (RPPA) profiles by using the TCGA dataset, which included 297 Luminal B cases. No genomic alteration was differentially mutated, deleted or amplified between the two classes (Supplementary Tables 4-6). A total of 510 genes were differentially expressed between the two classes (Supplementary Fig. 1, Supplementary Table 7). The robustness of this gene list was confirmed in the METABRIC independent validation set, and ontology analysis revealed a large preponderance of mitotic processes, including mitotic spindle assembly and chromosomal segregation, and DNA repair among the genes upregulated in the high-risk class (Supplementary Table 8). Proteomic analysis using RPPA results identified 16 proteins with differential expression between the two CINSARC classes (Table 6, Supplementary Table 9), including proteins involved in the cell cycle (cyclin B1, p27 kip1, cyclin E2), cell proliferation (FOXM1 and its 14-3-3_zeta regulator 22, ASNS 23), DNA repair (KU80, RAD50, ERCC5, MSH6), AKT/mTOR pathway (4E-BP1, p70S6K), and epigenetic regulation (GCN5L2).

DISCUSSION
By examining the prognostic value of the CINSARC signature in a large population of early breast cancers, we found that CINSARC was independently associated with survival outcome in the Luminal B subtype. In this subtype, CINSARC also identified potential vulnerabilities to specific therapeutics, including innovative classes of compounds that have been recently approved in HER2-negative breast cancer, as well as biological features that could be exploited as future therapeutic targets. These results may [...] 6,24, and is currently being tested prospectively to guide treatment in these tumors. CINSARC was also demonstrated to have prognostic value in various other tumor types and was proposed as a universal prognostic biomarker 7. Based on a multivariate analysis involving several thousand clinically and biologically annotated breast cancers, our results demonstrate that CINSARC is not independently associated with survival in this disease as a whole and that its prognostic importance depends on the molecular subtype. Thus, it is likely that in ERBB2-positive and basal-like breast cancers, drivers other than CINSARC genes predominantly lead the metastatic process, while in Luminal A breast cancers estrogen receptor signaling plays a major role. In Luminal B, the main biological processes that are captured by CINSARC, such as mitosis and chromosomal instability, may be of particular interest to predict clinical outcomes. Moreover, multivariate analyses showed that this prognostic value was also independent of that of immune signatures, suggesting that mitosis and chromosomal instability on the one hand, and immune response on the other, provide complementary prognostic information. Of course, because of the limitations inherent to retrospective studies (and associated biases), further validation in larger, prospective studies is warranted.
Second, while Luminal B breast cancer is thought to be an aggressive subtype and thus is almost always a candidate for adjuvant chemotherapy, CINSARC also identified a population of patients with favorable outcome while receiving only adjuvant endocrine treatment without any adjuvant chemotherapy. In addition, combining CINSARC with clinical features, such as tumor size and lymph node status, identified a low-risk class of patients with a 93% probability of being disease-free at 5 years. All current prognostic signatures in breast cancer aim to separate low-risk patients, in whom adjuvant chemotherapy may be safely spared and endocrine therapy alone may guarantee a high level of cure, from high-risk patients, in whom endocrine treatment is not enough and adjuvant chemotherapy should be added. Yet, in the latter subgroup, 60-70% of patients would still be cured by endocrine treatment alone, which represents a high level of residual therapeutic inadequacy. Thus, CINSARC could be helpful in detecting those patients with "low-risk" Luminal B subtype in whom the benefit of adjuvant chemotherapy remains questionable and might be replaced by alternative, less toxic approaches.
Third, CINSARC also revealed potential therapeutic vulnerabilities in Luminal B breast cancers that may impact the future management of this hard-to-treat subtype. We found that CINSARC high-risk tumors were predicted to be more sensitive to chemotherapy but more resistant to endocrine therapy. Importantly, these high-risk tumors were associated with RB1 inactivation, indicating a higher probability of resistance to CDK4/6 inhibitors 25, a therapeutic class improving survival in ER/PR-positive/HER2-negative advanced breast cancers and currently under investigation in the adjuvant setting 26–32. Therefore, a low-risk CINSARC signature could identify Luminal B breast cancers with both relatively favorable outcome and relative resistance to chemotherapy, but with sensitivity to endocrine therapy and CDK4/6 inhibitors, making this combination an attractive alternative to evaluate in this population. Moreover, in accordance with its tight biological relationship with chromosomal instability and rearrangements, we found that the CINSARC signature predicted higher sensitivity to both DNA repair- and immune-targeting therapeutics. Thus, high-risk CINSARC tumors were found to display more frequently a high HRD score (in nearly 20% of patients). Although PARP inhibitors were only approved in HER2-negative advanced breast cancer with germline BRCA1/2 mutation (gBRCAm) 33,34, including half of patients displaying ER-positive tumors, clinical trials are now underway to evaluate these compounds in other genetic contexts. Thus, CINSARC might contribute to better identify these tumors displaying gBRCA wild-type but HRD features that may also prove to be sensitive to PARP inhibitors and other DNA repair targeting therapeutics. High-risk patients were also predicted to be more sensitive to immunotherapy.
Immune checkpoint inhibitors were essentially developed in triple-negative breast cancer, with promising results in both advanced and early settings 35,36, and recent data indicate that they might also be active in ER-positive breast cancer 37. Thus, CINSARC could be useful to identify those Luminal B patients who could be candidates for PD1/PD-L1 targeting agents.
Finally, our study also described biological features associated with CINSARC in Luminal B breast cancer and thus suggests new therapeutic avenues in the field. As expected, genes and proteins associated with the high-risk signature were involved in mitotic processes, chromosomal segregation, cell cycle and proliferation, as well as DNA repair. Interestingly, cyclin E2 protein was found to be up-regulated in high-risk tumors. Both cyclin E2 and cyclin E1 are able to complex with CDK2 through the G1-to-S phases, allowing RB1 phosphorylation and thus cell cycle progression, and both were shown to promote resistance to endocrine treatment 38 and CDK4/6 inhibitors 13,39. Importantly, high cyclin E2 expression may predict activity of CDK2-targeted approaches that are in development, either as specific CDK2 inhibitors or pan-CDK inhibitors that include CDK2 in their spectrum of activity 40. Other potentially actionable proteins upregulated in CINSARC high-risk tumors include 4E-BP1 and p70S6K, which are downstream effectors of mTOR and AKT pathways, respectively. While the mTOR inhibitor everolimus has been registered in endocrine treatment-resistant advanced breast cancer and is under investigation in high-risk early breast cancer 41, several AKT inhibitors are currently being evaluated in advanced breast cancer, including endocrine treatment-resistant luminal disease 42. Ultimately, CINSARC high-risk tumors may represent a favorable subpopulation in which to investigate those compounds in the early setting. Of note, histone acetyl transferase GCN5L2, which was shown to regulate the TGFβ signaling pathway and induce expression of epithelial-mesenchymal transition 43, was also upregulated in high-risk tumors and may indicate a potential for epigenetic treatment in this subtype.
In conclusion, we found that CINSARC, a multigene signature initially developed in sarcomas, has an independent prognostic value in breast cancer restricted to the Luminal B subtype. CINSARC may not only identify a subgroup of tumors with relatively favorable outcome, which may not require adjuvant chemotherapy, but also suggests clues to better select patients with a higher probability of benefit from therapeutics under investigation in early breast cancer, such as cell cycle inhibitors, DNA repair targeting agents, immune checkpoint inhibitors, AKT/mTOR inhibitors, and epigenetic regulating agents.

METHODS
Breast cancer samples and molecular profiling
We analyzed our breast cancer gene expression database 10 pooled from 36 public datasets (Supplementary Table 10). Our study is based upon public data from published studies in which ethics approval and informed consent to participate were already obtained by authors. This study was approved by our institutional review board (Comité d'Orientation Stratégique, COS). Gene expression profiles had been generated using DNA microarrays and RNA-Seq, and collected from the National Center for Biotechnology Information (NCBI)/Genbank GEO and ArrayExpress databases, and authors' website. The final pooled data set contained 8930 nonredundant, non-metastatic, non-inflammatory, primary, invasive breast cancers. Before analysis, data were processed as previously described 10. Briefly, the pre-analytic processing first included normalization of each data set separately, and was done by Robust Multi-Array (RMA) with the oligo R package (version 1.46.0) for Affymetrix data and by quantile normalization with the limma R package (version 3.38.3) for other microarray platforms. When multiple probes mapped to the same GeneID, we retained the one with the highest variance in each data set. We log2-transformed the already normalized TCGA RNAseq data. We also collected DNA and proteomic processed data from TCGA (whole-exome sequencing (WES), array-CGH and HRD score, and RPPA) and METABRIC (targeted-NGS, array-CGH).
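The probe-selection rule described above (one probe per GeneID, chosen by highest variance within a data set) can be sketched as follows. This is an illustrative Python sketch, not the authors' R code; the data structures, the function name and the toy identifiers are assumptions, and population variance is used for simplicity.

```python
from statistics import pvariance

def collapse_probes(expr, probe_to_gene):
    """Keep, for each GeneID, the probe with the highest variance across samples.

    expr: dict probe_id -> list of expression values (one per sample)
    probe_to_gene: dict probe_id -> GeneID (hypothetical annotation mapping)
    Returns dict GeneID -> (probe_id, values).
    """
    best = {}
    for probe, values in expr.items():
        gene = probe_to_gene.get(probe)
        if gene is None:
            continue  # unannotated probe: skipped in this sketch
        var = pvariance(values)
        if gene not in best or var > best[gene][0]:
            best[gene] = (var, probe, values)
    return {gene: (probe, values) for gene, (_, probe, values) in best.items()}

# Toy data: two probes map to GENE1, one to GENE2 (identifiers are made up)
expr = {
    "probe_a": [5.0, 5.1, 4.9, 5.0],
    "probe_b": [3.0, 6.0, 4.0, 7.0],
    "probe_c": [8.0, 8.5, 7.5, 8.0],
}
probe_to_gene = {"probe_a": "GENE1", "probe_b": "GENE1", "probe_c": "GENE2"}
collapsed = collapse_probes(expr, probe_to_gene)
```

In the toy example, `probe_b` is retained for `GENE1` because it varies more across samples than `probe_a`.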

Analysis of molecular profiles
To avoid biases related to trans-institutional immunohistochemical analyses and thanks to the bimodal distribution of respective mRNA expression levels, the ER, progesterone receptor (PR), and HER2 statuses (negative/positive) were defined on transcriptional data of ESR1, PGR, and HER2 respectively, as previously described 44. In addition to the CINSARC signature 6, we applied to each dataset separately several multigene signatures: PAM50 5, which defines the Luminal A, Luminal B, ERBB2-enriched, Basal, and Normal subtypes; immune signatures including Palmer's B-cell, T-cell, and CD8+ T-cell signatures 11 and Rooney's cytolytic activity score 12; a 107-gene signature predictive for pathological response to anthracycline-based neoadjuvant chemotherapy in breast cancer 13; the E2F4-activation signature predictive for response to hormone therapy in breast cancer 14; the Rbsig 15 and E2F regulon 16 signatures predictive for resistance to CDK4/6 inhibitors on breast cancer pre-clinical models 14 and clinical samples of the PALOMA-3 trial 16; and immune signatures predictive for response to immune checkpoint inhibitors: the ICR (Immune Constant of Rejection) 19 and TIS (T cell-inflamed signature) 20 signatures and a TLS (tertiary lymphoid structures) signature 21.
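Several of the signatures listed above assign each sample to a class by nearest-centroid classification. The Python sketch below illustrates the general idea only: it assumes Pearson correlation as the similarity measure and uses made-up centroid values, whereas the actual genes, centroids and distance method are those described in each original study.

```python
import math
from statistics import mean

def pearson(x, y):
    """Pearson correlation between two equal-length profiles."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("constant profile: correlation undefined")
    return sxy / math.sqrt(sxx * syy)

def nearest_centroid(sample, centroids):
    """Return the label of the centroid most correlated with the sample profile."""
    return max(centroids, key=lambda label: pearson(sample, centroids[label]))

# Toy centroids over four signature genes (illustrative values, not CINSARC's)
centroids = {
    "high-risk": [1.2, 0.8, 1.5, 0.9],
    "low-risk": [-1.0, -0.7, -1.3, -0.8],
}
sample = [0.9, 0.5, 1.4, 0.6]
label = nearest_centroid(sample, centroids)
```

Here the toy sample correlates positively with the "high-risk" centroid and negatively with the "low-risk" one, so it is assigned to the high-risk class.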
We also compared the molecular profiles of CINSARC high-risk versus low-risk Luminal B samples by applying supervised analyses to TCGA and METABRIC data sets at different levels: WES mutational, copy number alterations (CNA), and RPPA data using logistic regression with significance thresholds of p ≤ 0.05 and q ≤ 0.10, and transcriptional data using a moderated t-test with significance thresholds of fold-change |FC| > 1.5, p ≤ 0.05 and q ≤ 0.10. The latter used the TCGA set as learning set and the METABRIC set as independent validation set. Ontology analysis of the resulting gene list was based on the GO biological processes of the Database for Annotation, Visualization and Integrated Discovery (DAVID; david.abcc.ncifcrf.gov/).
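The three transcriptional thresholds above (|FC| > 1.5, p ≤ 0.05, q ≤ 0.10) can be applied as in the Python sketch below. Two assumptions are made for illustration: the fold-change is treated as a ratio, so that |FC| > 1.5 is read as FC > 1.5 or FC < 1/1.5, and the q-values are replaced by Benjamini-Hochberg adjusted p-values, a simpler stand-in that is not the same estimator as the Storey q-value used in the study.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values (stand-in for Storey q-values)."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

def select_genes(genes, fold_changes, pvalues, fc_min=1.5, p_max=0.05, q_max=0.10):
    """Return genes passing the fold-change, p-value and q-value thresholds."""
    qvalues = bh_adjust(pvalues)
    selected = []
    for gene, fc, p, q in zip(genes, fold_changes, pvalues, qvalues):
        strong_change = fc > fc_min or fc < 1.0 / fc_min
        if strong_change and p <= p_max and q <= q_max:
            selected.append(gene)
    return selected

# Toy input (made-up gene labels and statistics)
genes = ["G1", "G2", "G3", "G4"]
fold_changes = [2.0, 1.2, 0.5, 1.8]
pvalues = [0.001, 0.002, 0.03, 0.20]
selected = select_genes(genes, fold_changes, pvalues)
```

In the toy input, `G2` fails the fold-change threshold and `G4` fails the p-value threshold, so only `G1` and `G3` are retained.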

Statistical analysis
Correlations between tumor classes and clinicopathological variables were analyzed using the one-way analysis of variance (ANOVA) or the Fisher's exact test when appropriate. Disease-free survival (DFS) was calculated from the date of diagnosis until the date of disease recurrence or death from any cause. Follow-up was measured from the date of diagnosis to the date of last news for event-free patients. Survivals were calculated using the Kaplan-Meier method and curves were compared with the log-rank test. Uni- and multivariate prognostic analyses were done using Cox regression analysis (Wald test). The variables submitted to univariate analyses included patients' age at diagnosis (≤50 years vs >50), pathological type (lobular vs ductal vs other), pathological axillary lymph node status (pN: negative vs positive), pathological tumor size (pT1 vs pT2 vs pT3), pathological grade (1 vs 2 vs 3), PAM50-derived molecular subtypes (Luminal A vs Luminal B vs Normal vs Basal vs ERBB2-enriched), delivery of adjuvant chemotherapy (CT), delivery of adjuvant hormone therapy (HT), and CINSARC-based classifications. The likelihood ratio (LR) tests were used to assess the prognostic information provided beyond that of a clinical model, assuming a χ² distribution. Changes in the LR values (ΔLR-χ²) measured quantitatively the relative amount of information of one model compared with another. All statistical tests were two-sided at the 5% level of significance. In the case of multiple testing, the p-values were replaced by the corrected q-values. Statistical analysis was done using the survival package (version 2.30) in the R software (version 2.9.1; http://www.cran.r-project.org/). We followed the reporting REcommendations for tumor MARKer prognostic studies (REMARK criteria) 45.
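To make the survival estimate concrete, the Kaplan-Meier method mentioned above can be sketched in a few lines of Python. This is a didactic sketch on made-up follow-up data, not the R survival package code used in the study; function names are illustrative, and confidence intervals and the log-rank test are not included.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier survival estimate.

    times: follow-up times (e.g. months from diagnosis)
    events: 1 if a DFS event (recurrence or death) occurred, 0 if censored
    Returns a list of (time, survival probability) at each distinct event time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        n_events = 0
        n_removed = 0
        # Group all subjects sharing the same follow-up time t.
        while i < len(data) and data[i][0] == t:
            n_events += data[i][1]
            n_removed += 1
            i += 1
        if n_events > 0:
            survival *= 1.0 - n_events / at_risk
            curve.append((t, survival))
        at_risk -= n_removed
    return curve

def survival_at(curve, t):
    """Step-function lookup: estimated survival probability at time t."""
    s = 1.0
    for time, surv in curve:
        if time > t:
            break
        s = surv
    return s

# Toy cohort of eight patients (made-up data), times in months
times = [12, 24, 24, 40, 61, 70, 80, 90]
events = [1, 0, 1, 0, 1, 0, 0, 0]
curve = kaplan_meier(times, events)
dfs_5y = survival_at(curve, 60)  # 5-year DFS estimate
```

Censored patients leave the risk set without lowering the curve, which is why the estimate differs from a simple proportion of event-free patients.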

Reporting summary
Further information on research design is available in the Nature Research Reporting Summary linked to this article.

DATA AVAILABILITY
The data generated and analyzed during this study are described in the following data record: https://doi.org/10.6084/m9.figshare.14350871 46. All data sets of primary breast cancer were downloaded from the Gene Expression Omnibus (GEO, https://www.ncbi.nlm.nih.gov/geo/), ArrayExpress (https://www.ebi.ac.uk/arrayexpress/), Genomic Data Commons (GDC, https://portal.gdc.cancer.gov/) and cBioPortal (https://www.cbioportal.org/) databases. All accession IDs are provided in Supplementary Table 10 (Table S10 revised.xlsx), which is included with the data record. The data underlying the figures and tables are contained in the files 'Goncalves_supporting_data.xlsx' and 'Table S8.xlsx', which are included with the data record. A detailed list of the data underlying each figure and table is also available in the file 'Goncalves_2021_underlying_data_list.xlsx', which is included with the data record.

CODE AVAILABILITY
Normalization of public data sets was done by Robust Multi-Array (RMA) with the oligo R package (version 1.46.0) for Affymetrix data and by quantile normalization with the limma R package (version 3.38.3) for other microarray platforms. Supervised analysis was done using a moderated t-test with the empirical Bayes statistic included in the limma R package (version 3.38.3). For multiple-testing correction, the False Discovery Rate (FDR) was assessed using the qvalue R package (version 2.14.1) (Storey et al., Annals of Statistics, 2003). Several multigene signatures were applied to each dataset separately: the CINSARC 6, PAM50 5, and 107-gene predictive 13 signatures, which were based on nearest-centroid classification using the genes, data and distance method described in each respective study. Also applied were the Rbsig 15, E2F regulon 16, ICR 19, TIS 20, TLS 21, Palmer's immune modules (B-cells, T-cells, and CD8 T-cells) 11, and Rooney's cytolytic activity score 12 signatures, which were based on a Z-score metagene using the gene list described in each respective study. Statistical analysis was done with the stats R package (version 3.5.2) and the survival R package (version 3.1-12) for survival analysis.
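For readers unfamiliar with the Z-score metagene mentioned above, the Python sketch below shows one common way to compute it: each signature gene is standardized across the samples of a data set, and the standardized values are averaged per sample. The exact standardization (population versus sample standard deviation, handling of missing genes) is not specified in the text, so those choices, the function name and the toy gene labels are assumptions.

```python
from statistics import mean, pstdev

def zscore_metagene(expr, signature_genes):
    """Average of per-gene z-scores across the signature genes, per sample.

    expr: dict gene -> list of expression values (same sample order for all genes)
    signature_genes: gene identifiers; genes absent from expr are ignored
    """
    z_rows = []
    for gene in signature_genes:
        values = expr.get(gene)
        if values is None:
            continue  # gene not measured on this platform
        mu, sd = mean(values), pstdev(values)
        if sd == 0:
            continue  # constant gene carries no information
        z_rows.append([(v - mu) / sd for v in values])
    if not z_rows:
        raise ValueError("no usable signature genes in the data set")
    n_samples = len(z_rows[0])
    return [mean(row[j] for row in z_rows) for j in range(n_samples)]

# Toy data set with three samples (made-up gene labels and values)
expr = {
    "GENE_A": [1.0, 2.0, 3.0],
    "GENE_B": [2.0, 4.0, 6.0],
    "GENE_C": [5.0, 5.0, 5.0],
}
scores = zscore_metagene(expr, ["GENE_A", "GENE_B", "GENE_X"])
```

Because each gene is standardized within the data set, the score is relative to that data set, which is consistent with applying the signatures to each data set separately.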