Evaluation of multiple transcriptomic gene risk signatures in male breast cancer

Male breast cancer (BCa) is a rare disease accounting for less than 1% of all breast cancers and 1% of all cancers in males. Its clinical management is largely extrapolated from female BCa. Several multigene assays are increasingly used to guide clinical treatment decisions in female BCa; however, there are limited data on the utility of these tests in male BCa. Here we present the gene expression results of 381 M0, ER+ve, HER2-ve male BCa patients enrolled in Part 1 (retrospective analysis) of the International Male Breast Cancer Program. Using a custom NanoString™ panel comprising the genes from the commercial risk tests Prosigna®, OncotypeDX®, and MammaPrint®, we generated risk scores and intrinsic subtyping data to recapitulate the commercial tests as described by us previously. We also examined the prognostic value of other risk scores such as the Genomic Grade Index (GGI), IHC4-mRNA and our prognostic 95-gene signature. In this sample set of male BCa, we demonstrated prognostic utility on univariate analysis. Across all signatures, patients whose samples were identified as low-risk experienced better outcomes than those classed as intermediate-risk, with those classed as high-risk experiencing the poorest outcomes. As seen with female BCa, the concordance between tests was poor, with C-index values ranging from 40.3% to 78.2% and Kappa values ranging from 0.17 to 0.58. To our knowledge, this is the largest study of male breast cancers assayed to generate risk scores of the current commercial and academic risk tests, and it demonstrates clinical utility comparable to that seen in female BCa.


INTRODUCTION
Male breast cancer (BCa) represents ~1% of all newly diagnosed cancers in men 1 and ~1% of all breast cancers 2. Research into this rare disease has been limited, with treatment largely extrapolated from knowledge about female BCa 3. Surgical management is usually modified radical mastectomy, with a minority of patients being offered breast conserving treatment 4. Local and systemic treatment is largely informed by treatment indications and regimens used in female breast cancer. However, for adjuvant endocrine treatment, the use of aromatase inhibitors (AIs) alone is not recommended, with tamoxifen for at least 5 years indicated for ER/PgR positive tumors 3,5. Where AIs are indicated, for example in metastatic male BCa, pituitary blockade with an LHRH agonist or orchiectomy is recommended 6.
Genetic counselling is recommended for all men with BCa, regardless of family history, due to strong links between male BCa and BRCA2 mutations, seen in 10% of men with BCa 3,7-9. Transcriptomic multiparametric assays are now integrated into clinical management guidelines for early female BCa 10, both as prognostic tools and to identify patients for adjuvant chemotherapy 11,12. Most guidelines refer exclusively to female BCas with respect to the use of these multiparametric assays. Data relating to these tests in male BCa are from retrospective series, most with small numbers of cases and limited to evaluation of prognosis for single tests 13-16; we are not aware of any analyses which provide comparative data on multiple signatures with respect to patient outcome. We developed a method to compare signatures using a combined quantitative mRNA array covering key molecular signatures 17, which have been trained against the results of the same signatures measured by the original methodology 18. We describe here an analysis of male BCa samples from the EORTC cohort 19 using these "trained" signatures to compare the result of each test and to determine the association between test result and prognosis in the context of a multi-institutional male BCa cohort.

RESULTS
mRNA profiling
Of the 1483 patients included in the parental study, 699 (47.1%) met the eligibility criteria of the research project; the main reasons for exclusion were missing tissue, event status, or event dates (see Supplementary Fig. 1). As previously reported 19, no evidence for a selection bias due to missing data has been identified. Of these, 389 samples had sufficient material for extraction and 381 samples yielded sufficient RNA.
All 381 samples assayed were successfully analyzed using the custom NanoString gene expression panel and passed quality control (Supplementary Fig. 1, Supplementary Table 1). Table 1 details the distribution of risk classification across tests, which differs markedly from one gene signature to another. The proportion of high risk patients ranged from 15.7% to 63.5%, and the proportion of low risk patients ranged from 9.4% to 53.5% (Table 1).

Survival analysis
Seventy-four patients experienced a locoregional recurrence or distant progression qualifying as an event for the time to relapse (TTR) endpoint, of whom 55 (74.3%) had a distant progression as the first event. Sixty-one patients experienced a distant recurrence qualifying as an event for the time to distant relapse (TTDR) endpoint, and 38 patients died after a distant recurrence (breast cancer-specific survival, BCSS, endpoint). Seventy-four patients died in the absence of a distant progression, and these deaths were considered as competing risks.
With respect to outcome, when competing risks of deaths not preceded by distant recurrence were accounted for, the cumulative incidence of specific BCa-related events was consistently lower in patients classified as low risk. The 5-year cumulative incidence of locoregional or distant recurrence in low risk patients ranged from 3 [2][3][4][5]. On the contrary, in multivariate-adjusted analyses for TTR and TTDR, the effect of gene signatures was no longer significant (Table 3).

DISCUSSION
The use of molecular prognostic assays in female BCa is now well established, but evidence relating to their performance in male BCa patients is sparse. In this study we show, using computational methods to recapitulate multiple BCa prognostic signatures, evidence for the prognostic impact of multiple gene signatures. However, we also show evidence of discordance between signatures, applied to the same case, similar to that seen in female BCa 20 . This study highlights the potential utility of molecular prognostic signatures in male breast cancer but suggests that more research is needed if we are to fully understand the potential value of different approaches to assessing prognosis and directing treatment, using molecular tools, in men with breast cancer.
Critical to our study is the close correlation between the computationally derived "signature-trained" scores and true assay results, as shown in our recent paper 18. For ROR-PT, the correlation coefficient between "trained" and true assay results was 0.93, and 90% of cases fell within the same risk category (low, intermediate, or high; see ref. 18). Similarly, for "Oncotype DX-trained" results the correlation coefficient between true and "trained" results was 0.87, with 75% of results falling in the same risk category (see ref. 18) and only 1% of cases disagreeing by more than one risk category. For MammaPrint-trained results, which were calculated only as categorical high versus low risk groups, over 90% of cases were classified in the same risk group by "trained" and true results 18.
Using a common analysis platform and computational methods to recapitulate prognostic scores, we found that all molecular signatures tested (Oncotype DX-trained 21,22, Prosigna-ROR-PT 23,24, MammaPrint 25-27, Genomic Grade Index 28, mRNA-based IHC4 (IHC4-mRNA) 29, and our novel 95-gene signature 17) were able to segregate male BCas into high and low prognostic risk groupings. All signatures were associated with significant differences in 5-year survival for time to recurrence, time to distant relapse and breast cancer-specific survival between low and high risk groups in univariate analyses (Figs. 1-2, Supplementary Figs. 2-5). However, due to the relatively small number of breast cancer-specific events, we were unable to demonstrate a statistically significant prognostic impact for the majority of signatures in multivariate analysis when adjusting for key clinico-pathological covariates (age, grade, nodal status and tumor size) and treatment variables (adjuvant chemotherapy, radiotherapy, endocrine treatment).
There have been few reports on the utility of prognostic signatures, developed using female BCas, when applied to male BCas, and these have largely focused on the utility of Oncotype DX 13,15. Previous studies showed an 81% 5-year BCa-specific survival for men with recurrence scores >31, slightly lower than the 86.3% 5-year BCSS observed for men with recurrence scores >25 in the current study, but commensurate with the different thresholds used 13,15. In the study by Massarweh et al. 15, 27.8% of men exhibited RS > 25 compared with 12.4% with scores >31. Given the modest number of events in both studies, we believe our results are broadly comparable to those reported by Massarweh et al. 15. Results from a similar study by Wang et al. 14 show a higher all-cause mortality rate in all risk groups, but that study is limited by its failure to exclude competing causes of death, which accounted for almost 50% of events in our current study. This high percentage may be due to the fact that male BCa patients are older and have more co-morbidities than their female counterparts. We are unaware of studies reporting patient outcome in male BCa when stratified by tests other than Oncotype DX, making it more challenging to draw comparisons between studies using these molecular assays. With respect to the 50-gene signature driving molecular subtypes (Prosigna/PAM50), a study by Sanchez-Munoz et al. 16 profiled 67 invasive male BCas using the NanoString panel, identifying 60% of cases as Luminal B, 30% as Luminal A and 10% as HER2-enriched, which is consistent with our findings 18. However, we are not aware of any studies of male BCas reporting Prosigna risk scores.
As with prior comparisons in female BCa 20, we demonstrate poor agreement between risk signatures in male BCa, with kappa values ranging from 0.17 to 0.58 (Table 2). This modest agreement reinforces observations from larger cohorts of female BCas that different molecular risk scores based on limited mRNA panels may not capture all features related to risk in this population. This conclusion is supported by multiple analyses showing the added value of combining multiple risk signatures in female BCa 30 and by our own recent data highlighting the modest AUCs associated with different molecular signatures 17 with respect to predicting outcome. Despite the different methodologies used, AUCs of time-dependent ROC curves at 5 years for male breast cancer cases fall within the same range as the AUCs reported in female patients 17. This provides no indication that different cut points for risk would apply to male rather than female breast cancer; however, given the small sample size of the present study, it is premature to exclude this possibility entirely. All signatures assessed would appear appropriate for use in male breast cancer patients.
There are several key limitations to our current research project. Firstly, we have used computational methods to calculate the relevant risk signatures rather than the original assays as used in the clinical setting. This limitation is offset in part by the use of a training and validation approach to benchmark results for Oncotype DX, Prosigna, and MammaPrint against true assay results 18, but remains a limitation for other tests. Secondly, the analyses were conducted in a retrospective dataset in which not all data were systematically collected in all patients; in particular, the cause of death was not reported for a substantial number of patients, leading to a substantial proportion of competing risks. As a result, we have not presented overall survival (all causes) data since this would be confounded by the lack of data as to cause of death in many patients. Although this study represents one of the largest cohorts of male breast cancers analyzed to date, the sample size, and in particular the number of breast cancer related events, limits the statistical power of this analysis. In particular, we were not able to compare the impact of multiple tests performed in sequence due to a lack of statistical power, nor were we able to assess the potential impact of tests on chemoprediction. Notwithstanding these limitations, we are able to show the ability of a number of existing multiparametric tests (including MammaPrint, Oncotype DX, Prosigna ROR-PT, Genomic Grade Index and a novel 95-gene signature) to provide useful prognostic information in male breast cancer. These data provide evidence to support the utility of multiple prognostic assays in the context of male breast cancer. Further research to identify the optimal prognostic approach to male breast cancer, perhaps including genomic features such as mutations and copy number alterations, is warranted in addition to investigating the role of intratumoural heterogeneity.

Patients and samples
The retrospective cohort study of the EORTC/TBCRC/BIG/NABCG International Male Breast Cancer Program enrolled male patients with histologically proven BCa, diagnosed between 1990 and 2010, across multiple participating institutions 19. Ethics approval was provided by the University of Toronto (#30035). A waiver of consent was approved since patient contact was not feasible due to death or loss to follow-up and the research involved no risk to patients, whose identity was coded and confidentially protected. Patients with all disease stages (early, locally advanced, and metastatic) were included, irrespective of the treatment received.
Availability of a good-quality formalin-fixed paraffin-embedded (FFPE) tissue sample was mandatory for enrollment. Biological material was handled and analyzed centrally according to published guidelines for adoption across BCa clinical trials, conducted by BIG and NABCG, in 2008 31. Patients in this research project were selected from the retrospective cohort study based on the following exclusion criteria: ineligibility for the analysis of the parental retrospective cohort; metastatic (M1/MX) disease; ER-ve status per central pathology or local pathology (if central pathology not available); HER2+ve or unknown status based on central pathology; and insufficient information for assessment of recurrence free survival. In addition, samples with insufficient RNA or which failed the quality control criteria were excluded. All institutions participating in the retrospective cohort study obtained ethical approval from their institutions including consent waivers.

RNA extraction and expression profiling using NanoString
Profiling of all samples was performed using mRNA extracted and analyzed using the NanoString codeset as described previously 17 at the Ontario Institute for Cancer Research (OICR).

Derivation of signature-trained risk stratification scores from candidate assays
Based on our study comparing two different approaches to the generation of simulated risk scores 18 we selected a training and validation approach based on results obtained from the OPTIMA prelim study 20 to best fit risk stratification scores generated for this study to those derived from the relevant commercial assay. For all tests we used the suffix "-trained" to discriminate the computationally derived assay scores from the commercially derived scores, e.g., Oncotype DX-trained vs.  18 . We modified the original cut point for "high risk" for the Oncotype DX test in line with reported results from the TAILORx trial 11,32 and our previously reported results from OPTIMA prelim 20. For "Prosigna", results refer throughout to the ROR-PT risk score in clinical use, which includes tumor pathological size. For the Genomic Grade Index (GGI), the suffix "-like" refers to recapitulation of the risk score as previously described though not trained against a benchmark dataset. The IHC4-mRNA signature is similarly modelled to estimate risk by the transcriptomic expression of ER, PgR, Ki67, and HER2, originally based on the immunohistochemical signature described by Cuzick et al. 33. The 95-gene signature has been previously described by our group 18.

Statistical analyses
Results from the expression profiling using NanoString were provided to EORTC to perform the statistical analysis of clinical data, long term outcomes, and local and central pathology data. Descriptive statistical analysis was performed for patient characteristics, disease characteristics, and treatment(s) administered. Risk classifications (low, intermediate where applicable, high) as defined by the different gene signatures were cross-tabulated to assess concordance and agreement of classification across signatures. Concordance indices and kappa agreement coefficients and their corresponding 95% confidence intervals (CIs) were estimated. When cross-tabulating gene signatures with different numbers of categories (i.e., three categories such as low, intermediate, high versus two categories such as high, low), the intermediate category was combined with the low category and Cohen's simple kappa was estimated, while for ternary versus ternary comparisons the weighted kappa was used.
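The agreement calculation described above can be illustrated with a minimal Python sketch. It is not the SAS code used for the study: the function names `collapse_intermediate` and `cohen_kappa` are hypothetical, confidence intervals are omitted, and linear disagreement weights are an assumption, since the weighting scheme for the weighted kappa is not stated here.

```python
def collapse_intermediate(labels):
    """Merge 'intermediate' into 'low' for ternary-versus-binary comparisons."""
    return ["low" if x == "intermediate" else x for x in labels]


def cohen_kappa(a, b, categories, weighted=False):
    """Cohen's kappa for two risk classifications of the same patients.

    categories must be listed in order of increasing risk.
    weighted=False gives the simple kappa; weighted=True uses linear
    disagreement weights (an assumption, not confirmed by the source).
    """
    k = len(categories)
    index = {c: i for i, c in enumerate(categories)}
    n = len(a)
    # Observed joint proportions of (signature A, signature B) classes.
    obs = [[0.0] * k for _ in range(k)]
    for x, y in zip(a, b):
        obs[index[x]][index[y]] += 1.0 / n
    row = [sum(obs[i]) for i in range(k)]
    col = [sum(obs[i][j] for i in range(k)) for j in range(k)]

    def weight(i, j):
        # Disagreement weight: 0 on the diagonal, larger further away.
        if weighted:
            return abs(i - j) / (k - 1)
        return 0.0 if i == j else 1.0

    d_obs = sum(weight(i, j) * obs[i][j] for i in range(k) for j in range(k))
    d_exp = sum(weight(i, j) * row[i] * col[j] for i in range(k) for j in range(k))
    if d_exp == 0.0:
        raise ValueError("kappa is undefined when all cases fall in one category")
    return 1.0 - d_obs / d_exp


# Toy data, not study data: a three-level test compared with a two-level test.
ternary = ["low", "intermediate", "high", "high"]
binary = ["low", "high", "high", "high"]
kappa = cohen_kappa(collapse_intermediate(ternary), binary, ["low", "high"])
```

Expressing kappa as one minus the ratio of observed to chance-expected disagreement lets the same function cover both the simple and the weighted case.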
The prognostic value of the gene signatures was assessed for the following endpoints: time to distant relapse (TTDR), defined as the time until the first distant progression; time to relapse (TTR), defined as the time until the first loco-regional recurrence or distant progression; and breast cancer-specific survival (BCSS), defined as the time until breast cancer related death, counting only deaths preceded by a distant relapse. For these endpoints, deaths in the absence of distant relapse were considered as competing risks. The endpoints were calculated from the time of first diagnosis of BCa. Patients without an event for the above endpoints were censored at the last date known alive.
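As an illustration of these endpoint definitions, the following Python sketch codes a single patient for each endpoint. It is one reading of the definitions above and not the study's code: the function `code_endpoint`, its argument names, and the status codes 0/1/2 are all hypothetical, and times are assumed to be measured in years from first diagnosis, with `None` meaning the event did not occur.

```python
CENSORED, EVENT, COMPETING = 0, 1, 2


def code_endpoint(endpoint, locoregional, distant, death, last_alive):
    """Return (time, status) for one patient and one endpoint.

    locoregional, distant, death: time of each event, or None if absent.
    last_alive: last date known alive, used only for censoring.
    """
    if endpoint == "TTR":
        # First loco-regional recurrence or distant progression.
        candidates = [t for t in (locoregional, distant) if t is not None]
        event_time = min(candidates) if candidates else None
    elif endpoint == "TTDR":
        # First distant progression.
        event_time = distant
    elif endpoint == "BCSS":
        # Death counts only when preceded by a distant relapse.
        event_time = death if (death is not None and distant is not None) else None
    else:
        raise ValueError("unknown endpoint: " + str(endpoint))

    if event_time is not None:
        return event_time, EVENT
    if death is not None and distant is None:
        # Death in the absence of distant relapse is a competing risk.
        return death, COMPETING
    return last_alive, CENSORED


# Toy patient: local recurrence at 2 years, death at 4 years, no distant relapse.
ttr = code_endpoint("TTR", 2.0, None, 4.0, None)
ttdr = code_endpoint("TTDR", 2.0, None, 4.0, None)
```

In this toy case the same patient contributes an event to TTR but a competing risk to TTDR, which shows why the three endpoints can have different event counts.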
The event rates at 5 years and corresponding 95% confidence intervals were estimated by the cumulative incidence method. Cumulative incidence functions between the risk groups were compared based on the Gray test at a significance level of 0.05. Fine and Gray models were used to estimate the univariate and adjusted hazard ratios (HRs) and their corresponding 95% CIs. The multivariate models were adjusted for known prognostic clinico-pathological variables (age, grade, nodal status and tumor size) and treatment variables (adjuvant chemotherapy, radiotherapy, endocrine treatment), and the multivariate p-value was estimated with the use of a Wald test. Due to the low number of events for BCSS, only univariate analyses were conducted for this endpoint. The proportional hazard assumption was checked graphically using a plot of the log cumulative hazard. The analyses were not adjusted for multiple testing.
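To make the cumulative incidence method concrete, the sketch below implements the standard nonparametric estimator of the cumulative incidence function under competing risks in plain Python. It is illustrative only: the function name and the 0/1/2 status coding are assumptions, and confidence intervals, the Gray test and the Fine and Gray regression models are not implemented.

```python
def cumulative_incidence(times, statuses, horizon, cause=1):
    """Estimate the cumulative incidence of `cause` by time `horizon`.

    statuses: 0 = censored, `cause` = event of interest,
              any other non-zero value = competing event.
    """
    data = sorted(zip(times, statuses))
    n_at_risk = len(data)
    event_free = 1.0  # probability of no event of any type just before t
    cif = 0.0
    i = 0
    while i < len(data) and data[i][0] <= horizon:
        t = data[i][0]
        d_cause = d_any = n_here = 0
        # Group all patients sharing the same time.
        while i < len(data) and data[i][0] == t:
            status = data[i][1]
            n_here += 1
            if status != 0:
                d_any += 1
                if status == cause:
                    d_cause += 1
            i += 1
        # Increment: event-free probability times cause-specific hazard at t.
        cif += event_free * d_cause / n_at_risk
        event_free *= 1.0 - d_any / n_at_risk
        n_at_risk -= n_here
    return cif


# Toy data, not study data: times in years, 1 = relapse, 2 = competing death.
toy_times = [1.0, 2.0, 3.0, 4.0]
toy_status = [1, 2, 0, 1]
cif_5y = cumulative_incidence(toy_times, toy_status, horizon=5.0)
```

Unlike one minus a Kaplan-Meier estimate that censors competing deaths, this estimator does not treat patients who died without distant relapse as still at risk of the event of interest.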
The ability of the gene signatures to predict clinical outcome at 5 years was assessed by time-dependent receiver operating characteristic (ROC) curves and the corresponding area under the curve (AUC). The underlying method of ROC curves has been extended to the setting of censored observations and presence of competing risks 34. Time-dependent ROC curves at 5 years were plotted and the corresponding AUCs estimated for each endpoint (time to relapse, time to distant relapse, breast cancer-specific survival) and for each gene signature with the exception of MammaPrint. As described previously 18, when training the algorithm for MammaPrint only dichotomized risk categories were available, preventing any AUC analysis with this signature. Cases were patients who experienced the event of interest in the first 5 years of follow-up, while controls were defined as patients who were either event-free at 5 years or experienced a competing event in the first 5 years of follow-up.
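The case and control definitions above can be sketched in Python as follows. This is a deliberately simplified illustration and not the method used in the study: the function names are hypothetical, and patients censored before 5 years are simply dropped, whereas the time-dependent ROC method cited above accounts for censoring rather than discarding those patients.

```python
def label_at_horizon(time, status, horizon=5.0):
    """Classify one patient as 'case', 'control', or None (undetermined).

    status: 0 = censored, 1 = event of interest, 2 = competing event.
    """
    if time <= horizon:
        if status == 1:
            return "case"
        if status == 2:
            return "control"
        return None  # censored before the horizon: status at 5 years unknown
    return "control"  # still event-free at the horizon


def naive_auc(scores, times, statuses, horizon=5.0):
    """Probability that a case has a higher risk score than a control.

    Simplification: undetermined patients are excluded, so this is not
    the censoring-adjusted estimator used for the published AUCs.
    """
    labels = [label_at_horizon(t, s, horizon) for t, s in zip(times, statuses)]
    cases = [x for x, lab in zip(scores, labels) if lab == "case"]
    controls = [x for x, lab in zip(scores, labels) if lab == "control"]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for c in cases:
        for k in controls:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(cases) * len(controls))


# Toy data, not study data.
toy_scores = [0.9, 0.8, 0.2, 0.1, 0.5]
toy_times = [1.0, 6.0, 7.0, 2.0, 3.0]
toy_status = [1, 1, 0, 2, 0]
toy_auc = naive_auc(toy_scores, toy_times, toy_status)
```

Note that a patient whose event occurs after 5 years counts as a control at the 5-year horizon, and a patient with a competing death before 5 years also counts as a control, matching the definitions in the text.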
Analyses were performed with SAS software, version 9.4 (SAS Institute) and the time-dependent ROC curves were plotted in R, version 4.0.0, with the timeROC package.