Clinical implications of intrinsic molecular subtypes of breast cancer for sentinel node status

Axillary lymph node status is an important prognostic factor for breast cancer patients and sentinel lymph node biopsy (SLNB) is a less invasive surgical proxy. We examined if consecutively derived molecular subtypes from primary breast cancers provide additional predictive value for SLNB status. 1556 patients with a breast cancer > 10 mm underwent primary surgical procedure including SLNB and tumor specimens were assigned with a transcriptomics-based molecular subtype. 1020 patients had a negative sentinel node (SN) and 536 a positive. A significant association between tumor size and SN status (p < 0.0001) was found across all samples, but no association between size and SN status (p = 0.14) was found for BasL tumors. A BasL subtype was a predictor of an SN-negative status (p = 0.001, OR 0.58, 95% CI 0.38;0.90) and among the BasL, postmenopausal status was a predictor for SN-negative status (p = 0.01). Overall survival was significantly lower (p = 0.02) in patients with BasL tumors and a positive SN. Interestingly, we identified a significant correlation between hormone receptor activity and SN status within the BasL subtype. Taken together, molecular subtypes and hormone receptor activity of breast cancers add predictive value for SLNB status.

Axillary lymph node (ALN) status is a major prognostic factor in breast cancer and is accurately predicted by sentinel lymph node biopsy (SLNB) 1,2 . Compared to an axillary lymph node dissection (ALND), SLNB is associated with fewer late effects, but nevertheless has undesirable morbidity 3 . Thus, following more than a decade of omitting ALND if SLNB is negative, the search has been intensified for non-invasive prognosticators that might replace SLNB 4 . A predictive value of SLNB/ALN status has previously been explored for age at time of diagnosis, tumor size, multifocal tumor, lymphovascular invasion, tumor grade, and HER2-, progesterone-, and ER-receptor statuses [5][6][7] . These studies were generally aimed at developing algorithms that may predict ALN status and ultimately allow specific patient subgroups to completely avoid axillary surgery. The study by Reyal et al., which used surrogate markers (ER-and HER2-receptor statuses) for molecular subtype assignment of a large series of > 2500 early-stage breast cancer patients treated with conservative surgery including an SLNB, found both ER-positivity and the interaction covariate between ER-and HER2-statuses to be significant predictors of a positive sentinel node (SN) 8 . The study also developed a nomogram model for predicting ALN status; the algorithm was validated in an independent breast cancer cohort showing similar accuracy in predicting positive SLNB 9 . A recent study proposed a new algorithm to predict ALN status based on a selection of regression variables and confirmed age at time of diagnosis, tumor size, lymphovascular invasion, and molecular subtypes defined by surrogate markers as predictive factors 10 .
Intrinsic molecular subtypes have emerged as promising predictors of breast cancer recurrence; each subtype is a distinct biological entity and associated with specific prognostic and therapeutic features 11,12 . The pivotal studies proposed five original subclasses: Luminal A, luminal B, normal-like, HER2-enriched, and basal-like. Four of the subclasses can be distinguished by a 50-gene molecular classifier (PAM50) which has also been developed as a commercial FDA approved platform 13 . A more recent taxonomy by Guedj  www.nature.com/scientificreports/ molecular signatures by applying integrative transcriptomics analysis, leading to six stable molecular subtypes. Four of these, termed LumA, LumB, LumC, and NormL, are all ER-, progesterone (PR)-, and androgen-receptor (AR)-positive, while the fifth subtype, mApo, is AR-positive but ER-and PR-negative. The last subtype, BasL, is characterized by its lack of receptor expression (ER-, PR-, and AR-negative). The BasL subtype often comprises tumors that are so-called triple-negative, but these terms are not interchangeable 14 . In the six-subtype scheme, the ERBB2-amplified tumors are split between the mApo and the highly proliferative LumC subgroup 15,16 . Additionally, while the subtype names in the original five-subtype and the six-subtype schemes are similar, the underlying definition of each subtype is not identical (i.e. Luminal A of the five-subtype scheme is not identical to LumA of the six-subtype scheme).
To enable molecular insights, transcriptomics analysis has in recent years become a part of the standard of care for breast cancer patients in several clinical settings. In our regional hospital, all primary breast cancers are assigned with a molecular subtype from the scheme by Guedj et al. (named: CITBCMST) as part of our standard of care diagnostic pipeline 17 . This set-up provides a platform to test whether consecutive assignments of molecular subtypes may have additional predictive value for SLNB status.
One of the earliest studies assessing the predictive value of surrogate molecular subtypes showed that basallike tumors were more likely to be ALN-negative 18 . Supportingly, a larger study based on > 4000 early stage breast cancers and surrogate molecular subtypes found the triple-negative subtype as a predictor of negative ALN status 19 . Conversely, triple-positive tumors have been shown to be predictive for a positive ALN status 20 . Later studies have confirmed the triple-negative subtype to be a predictor for negative ALN status 21,22 . However, a study by Lu et al. did not report any improvements in the predictive models for ALN status by applying the gene expression profiles from 129 tumor specimens 23 . Nonetheless, intrinsic subtypes should be recognized by expression signatures consisting of a minimum of 50 transcripts 13,24,25 . Thus, we hypothesized that transcriptomics-derived molecular subtypes based on consecutive primary breast cancer samples would be an optimized setting to predict SLNB status and subsequently permit specific patient subgroups to avoid axillary surgery. www.nature.com/scientificreports/ For a general overview of the 1556 samples, a principal component analysis (PCA) was generated to illustrate the assignments of subtypes ( Fig. 2A) and the distribution of SN status (Fig. 2B). The figure clearly depicts the difference of the BasL subtype compared to the luminal subtypes.

Results
When considering the two subtypes with ERBB2-amplified tumors (mApo and LumC) collectively, 36% of tumors were SN-positive. The same fraction of SN-positivity was observed in the combined set of ERBB2-normal luminal tumors (NormL, LumA and LumB). In contrast, merely 21% of the tumors with a BasL subtype had a positive SN status.
In terms of systemic treatment of the included patients, we observed that for the NormL and LumA subtypes, almost twice as many patients within the SN-positive fraction were treated with both endocrine-and chemotherapy (41 and 33%, respectively) relative to the SN-negative patients (19 and 18%, respectively), Table 1.
Subtypes, clinical characteristics, and SN status. Considering the entire cohort, neither the patients' menopausal status nor malignancy grade were associated to SN metastases (p = 0.11 and p = 0.12, respectively), whereas a clear association between tumor size and SN status (p < 0.0001) was found (Table 2). However, a tumor size of > 20 mm was only significantly associated to positive SN status among the samples assigned with luminal subtypes (NormL, LumA, LumB and LumC), and no significant association was identified for the samples assigned with mApo and BasL subtypes (data not shown). Since the BasL subtype was markedly different in the distribution of SN status compared to the other subtypes, heterogeneity between BasL vs. all other subtypes was investigated for association between SN status and menopausal status, malignancy grade and tumor size ( Table 3). No statistically significant interaction was found, but we observed a trend (p = 0.054) for menopausal status, with a marked difference in the proportion of SN-positive patients among postmenopausal patients.
Estrogen receptor activity and SN status. Subsequently, we sought to investigate if the level of ERprotein was associated with SN status in the BasL assigned patients and found that IHC-based ER status and SN status were not correlated, p = 0.36 (data not shown). Conversely, a comparison of the microarray-based ESR1 expression vs. SN status did show a significantly higher ESR1 level in the SN-positive patients (p = 0.01). This result lead us to conduct a GSEA using gene sets related to ER and PR expression which are upregulated in ER + vs. ER-human breast cancer samples (see Materials and Methods for gene signatures). We found a significant (false discovery rate-adjusted p-value < 0.05) upregulation for all five tested gene sets when comparing SN-positive BasL samples to SN-negative BasL samples (Fig. 3). Based on these results, we conclude that there is a correlation between hormone receptor activity and SN status within the BasL subtype.

Survival analysis.
Finally, we assessed the overall survival and disease-free survival of the patient cohort according to the assigned molecular subtypes and found a distributions of molecular subtypes in agreement with the overall consensus; LumA and NormL showed a favorable prognosis, both for OS and DFS, followed by LumB and LumC, and finally mApo and BasL showed the worst prognosis ( Fig. 4, panels A and B).
Subsequently, we examined the OS and DFS in the group of patients with BasL tumors according to SN status and found a significantly (p = 0.02) lower OS of patients with a positive SN vs. a negative SN (Fig. 5A, B). Evaluating the OS and DFS in the group of patients with LumA tumors according to SN status showed no significant difference between patients with a positive SN vs. a negative SN (Fig. 5C, D).

Discussion
Our study confirmed a solid association between tumor size and SN status. However, this was only the case among the samples assigned with luminal subtypes (NormL, LumA, LumB and LumC). These findings are largely in agreement with the studies relying solely on surrogate markers 8,22 . In contrast, and to the best of our knowledge, it has not previously been shown that prediction of SN status is independent of tumor size in the BasL and mApo subtypes, when determined by multi-gene transcript-based signatures. Thus, applying a taxonomy that identifies a broader range of distinct subtypes may enable a more precise prediction of SN status.
The LumC and mApo subtypes are characterized by including the ERBB2-amplified tumors among other distinct molecular features like high proliferation (LumC) and overexpression of androgen-receptor (mApo), thus we speculated that an association to positive SN status was plausible. However, our results showed no evidence of subtypes including ERBB2-amplified tumors having a greater risk of a positive SN. This is not entirely in agreement with the previous study by Reyal et al. who found HER2-positive status in combination with ERpositive status, to be a significant predictor of a positive SN 8 . The discrepancies could be due to the differences in cohorts, since Reyal et al. included all patients with ALN metastasis, as well as the comparison of two different entities; single marker sample annotations as opposed to annotations by intrinsic subtypes derived from transcriptome-based signatures.
Interestingly, although it is well-established that BasL subtype has the poorest prognosis, BasL tumors in our cohort were not associated with SN-positive status. This agrees with reports where triple-negative receptor profiles have functioned as surrogate marker for BasL subtype. However, since this lack of association of BasL subtypes with positive SN status is rather peculiar, we looked deeper into the clinical behavior of patients with a BasL subtype and found a significant association if we segregated the BasL cohort into pre-and postmenopausal www.nature.com/scientificreports/    26 highlighted the differences among tumors assigned with a BasL subtype by comparing ER +/HER2-and ER-tumors and showed that ER +/HER2-BasL patients had a more similar clinical outcome to ER +/HER2-Luminal patients than to the ER-BasL. This is in line with the essence of our GSEA-results, which indicate that BasL-samples with an enrichment of estrogen-pathway signatures are overrepresented among the SN-positive patients. In other words, these samples are perhaps more like the luminal ER-positive subtypes in behavior, also regarding SN status. It has recently been shown that negative predictive factors for OS in BRCA1/2 positive carriers included lymph nodes metastases 27 . We were not able to confirm these results since our cohort of BRCA1/2 positive carriers merely comprises 29 patients, so statistical analysis is connected to great uncertainties. A limitation of our study is that the number BRCA1/2 positive patients is to sparse to draw conclusions on possible association with SN status. We found a clear association between OS and SN status in patients with a BasL tumor; SN-positive patients had a worse prognosis. Low risk of lymphatic spread in patients with triple-negative breast cancer has been shown before 2,4 . Several authors have suggested that axillary node status is losing its significance as a prognostic marker with the increasing use of molecular subtypes, especially in patients with triple-negative breast cancer, and axillary staging might not be mandatory 28 . Irrespective of node status the vast majority of patients with a BasL breast cancer will be recommended (neo)adjuvant chemotherapy and may not obtain a beneficial impact from postmastectomy radiotherapy if node positive 29 . However, the results of SUPREMO must be awaited before this can be further clarified 30 . Despite the low risk of lymphatic spread in patients with BasL subtypes, lymph node metastases in these patients may still, according to our results, be of prognostic significance.
The major limitation of our study is the time of follow-up which is only four years. Particularly in patients with ER-positive breast cancer, recurrence can occur up to twenty years after time of primary diagnosis 31 . Thus, evaluating OS and DFS for patients with ER-positive cancers in our study has limited significance. Hence, at this early stage of the follow-up period, we found no prognostic association of patients with a LumA subtype and a positive SN biopsy, neither in OS-nor DFS-analysis.
In the present study we sought to test whether consecutive assignments of molecular subtypes may have additional prognostic value as predictive markers for positive SN. The design of our study is unique as it relies solely on transcriptomics-based molecular subtypes generated from consecutive breast cancer biopsy samples collected and analyzed in a prospective real-life clinical setting during a period of five years. Overall, the results derived from the present study are based on a reasonable number of patients (> 1500) and can be translated into any breast cancer clinic.
Although the molecular profiles of breast cancers are heterogenous, recent decades of investigations have proven how well multi-gene profiles cluster into separate intrinsic molecular subtypes. Each individual subtype has a distinct molecular profile that provides insights to a targeted treatment strategy and each subtype is associated with clinical characteristics and specific prognosis. However, axillary status and staging of the patient is a major prognostic factor in breast cancer management and the application of SN status guides the level of adjuvant treatment and radiotherapy for the bigger part of primary breast cancer patients. www.nature.com/scientificreports/ In conclusion, we have shown that molecular subtypes are associated with SN status and a BasL subtype is a significant predictor of a SN-negative status. Hence, when assessing SN status, we can conclude that the BasL subtype is indeed a different entity compared to luminal subtypes. However, we have also shown that BasL samples with an enrichment of estrogen-pathway signatures are overrepresented among the SN-positive patients and thus show multiple overlapping clinical features with the luminal ER-positive subtypes.

Materials and methods
Patients and tumor samples. The cohort includes 3002 consecutive registrations from female primary breast cancer patients, clinically and diagnostically assessed at Rigshospitalet, Copenhagen University Hospital during the period from November 2014 until September 2019. Tumor specimens were subjected to molecular subtyping as a part of routine diagnostic work-up. This register-based study was conducted with approval of the Danish Data Protection Agency (jr. no.: 2012-58-0004) and Danish Breast Cancer Group (jr. no.: DBCG-2015-14). The study did not include any contact with patients nor use of biological material, and thus ethical approval, including the need to obtain informed consent, was explicitly waived by the Ethical Committee of the Capital Region of Denmark. None of the authors could access identifying patient information when analyzing the data. Fresh tumor specimens, extracted during surgery, were inspected by pathologists, and tumor biopsies of around 100 mg were stored in RNALater (Thermo Fisher Scientific, Waltham, MA, USA). RNA was isolated using the AllPrep DNA/RNA purification kit (Qiagen, Hilden, Germany). The integrity of the RNA was measured using the Agilent RNA 6000 Nano Kit on an Agilent 2100 Bioanalyzer (Agilent Technologies, Inc., Santa Clara, CA, USA). RNA was reverse transcribed and used for cRNA synthesis, labeling and hybridization with GeneChipVR Human Genome U133 Plus 2.0 Array (Affymetrix, Santa Clara, CA, USA) according to the manufacturer's protocol. In short, arrays were washed and stained with phycoerytrin conjugated streptavidin using the Affymetrix Fluidics Station 450 and scanned in the Affymetrix GeneArray 3000 7G scanner to generate fluorescent images. Probe intensity files (.CEL files) were generated in the GeneChip Command Console Software (AGCC; Affymetrix, Santa Clara, CA, USA).
Sentinel node status. Preoperative axillary sonography was performed in all patients to identify lymph node metastases. In case of suspicious lymph nodes by sonography, fine needle aspiration cytology (FNAC) Table 3. Logistic regressions for sentinel node status. Multivariate regressions were used to analyze subtype (BasL or Non-BasL) together with each of menopausal state, tumor size, and tumor grade, respectively, and the odds ratio (OR) and 95% confidence interval (CI) within each subtype are reported. Wald tests were used to derive p values within each of the individual subtypes (P within ) as well as for the interaction term (P interaction ). Because only a limited number of non-BasL patients were screened for BRCA1/2 carrier status, univariate logistic regression was performed for these features within the BasL subtype. Row-wise percentages in parentheses. www.nature.com/scientificreports/ was performed. If malignant tumor cells were found by FNAC, an immediate ALN dissection was offered, and these patients were excluded. In the remaining patients SLNB was performed. Prior to surgery, 99mTc labeled NanoColl was injected subareolarly, and the blue dye, Patent Blue, was injected at the tumor site or the subareolar region. All radioactive and/or stained lymph nodes as well as lymph nodes suspicious by palpation were removed as sentinel nodes for histopathological examination, including multisectioning and immunohistochemical (IHC) staining. Metastases were staged according to 8th edition of the American Joint Committee on Cancer staging manual where macrometastases are defined as metastases > 2 mm, micrometastases as metastases > 0.2 mm to ≤ 2.00 mm, while isolated tumor cells (ITC) are defined as deposits of cells ≤ 0.2 mm or ≤ 200 cells. Patients were considered SN-positive if macro-or micrometastases were found in the SN, and SN-negative if isolated tumor cells or no metastases were found, but only patients with macrometastases were offered ALND.

Microarray analysis.
The probe level data (.CEL files) were transformed into expression measures using R version 3.2.5 32 . For each sample, the raw intensity .CEL file was preprocessed together with 30 existing breast cancer samples from Rigshospitalet by quantile normalization, and probe summaries were extracted via robust multi-array average (RMA) using the affy package 33 . Subsequently, ComBat 34 from the sva package 35 was applied for batch correction of 12 of the reference samples and the sample of interest together with the CITBCMST 16 www.nature.com/scientificreports/ core set (n = 355), as presented in Rossing et al. 17 . Sample origin was used as batch and initially predicted CITB-CMST subtypes (determined using the citbcmst R-package) acted as covariates 16 .
For each sample, a subtype was subsequently assigned using the CITBCMST tool 16 , which assigns samples to one of six subtypes-BasL, mApo, LumA, LumB, LumC, or NormL-using a distance-to-centroid approach relying on expression of 375 probe sets. In the standard CITBCMST tool, a confidence score for each subtype is also provided. If a sample is close to a single centroid, it is labeled as "core", whereas if it is close to multiple or no centroids, it will be labeled as "mixed" or "outlier", respectively. In this study, all outlier samples (n = 20) were removed from further analysis since they classified as normal tissue due to contamination of normal breast tissue cells 36 . All mixed samples were labeled based on the single closest centroid, leaving only six subtypes in the downstream data analysis. Visualizations were generated using the ggplot2 (https ://ggplo t2.tidyv erse.org) and cowplot (https ://CRAN.R-proje ct.org/packa ge=cowpl ot) packages.
Estrogen receptor protein immunohistopathological analysis. Analyses for ER were performed by immunohistochemistry (IHC) using tissue micro array technique (TMA), with two cores of 2 mm from the invasive front of each tumor, as previously reported 37 . Staining for ER (SP1, diluted 1:25) from Ventana Medical Systems was carried out according to the manufacturer's instructions. Scoring of ER protein was semi-quantitative with a positive cutoff point of ≥ 1% for ER-positive tumors.
Blood sample and germline mutation screening. Genomic DNA was isolated using the ReliaPrep Large Volume HT gDNA Isolation Kit (Promega, Madison, WI, USA) and a Tecan Freedom EVO HSM2.0 Workstation according to the manufacturer's instructions. Mutation screening was done using the breast cancer-predisposing gene-panel as previously described 38 . Sequencing was performed on a MiSeq (Illumina, San Diego, CA, USA) to an average depth of at least 100. Sequencing data were analyzed using Sequence Pilot (JSI Medical Systems, Ettenheim, Germany), where variants are called if the non-reference base frequency was above 25%. Variants are numbered according to the following GenBank accession numbers: NM_007294 (BRCA1) and NM_000059 (BRCA2) using the guidelines from the Human Genome Variation Society (www.hgvs.org/mutno men). All class 3-5 variants were verified by Sanger sequencing on an ABI 3730 DNA Analyzer using DNA purified from a second blood sample.  Statistical analysis. Information on age at diagnosis, menopausal status, tumor size, grade of malignancy, ER-status, SN status, adjuvant treatment, clinical follow-up, and vital status was obtained from the clinical database of DBCG. BRCA1 and BRCA2 class 4-5 variants were considered as "positive" status, and patients with at least one positive status were labeled "BRCA1/2 positive". Associations between SN status and other characteristics were analyzed by χ 2 tests. Univariate logistic regression analysis was applied to assess odds ratios and corresponding 95% confidence intervals for SN-positive status over SN-negative and using the Wald test. For variables with an unknown category, patients with unknown values were excluded from the tests. Multivariate logistic regression analysis was applied to assess heterogeneity according to BasL vs. Non-BasL for each of menopausal status, tumor size, and grade, respectively. Each interaction was tested in a separate model, including subtype and the parameter of interest. Follow-up time was quantified in terms of a Kaplan-Meier estimate of potential follow-up. OS and DFS was analyzed unadjusted by the Kaplan-Meier method, and groups were compared using the log-rank test. All p values are two-tailed, and level of significance set to 0.05. Statistical analyses were performed using the R software 32 .
BasL-specific analysis. All .CEL files from BasL patients (n = 170) were loaded into R 3.6.1 32 using jus-tRMA from affy 33 . For processing Entrez gene-level expression, the BrainArray 39 Custom CDF v. 24 was used, and otherwise default parameter settings were used. Entrez Gene IDs were subsequently translated to HGNC symbols using the BrainArray annotation from hgu133plus2hsentrezg.db. After filtering out data with missing symbols, expression data for 20,418 genes remained. Differential expression analysis was carried out for all genes using limma 40 , and a label-permuting Gene Set Enrichment Analysis (GSEA) was carried out using fgseaLabel from the fgsea package 41 with 100,000 permutations and default settings. The labels for both tests were the per-sample SN status. For this, we used the ER-and PR-sets from Gatza et al. 42 as well as the following three MSigDB 43 gene sets: "DOANE_BREAST_CANCER_ESR1_UP" 44 , "VANTVEER_BREAST_CANCER_ESR1_ UP" 45 , and "YANG_BREAST_CANCER_ESR1_UP" 46 . Barcode enrichment plots for the GSEA analysis was generated using a modified version of the script: https ://githu b.com/Peepe rLab/Rtool box/blob/maste r/R/Replo tGSEA .R. The modified version is available on GitHub: https ://githu b.com/cblig aard/Rtool s. Tests for simple comparisons regarding receptor status or expression level and SN status and menopausal status were carried out using Pearson's Chi-squared and two-tailed Wilcoxon rank-sum tests in R 32 .
Data availability. Expression profiles are available in the online data repository Gene Expression Omnibus (GEO).

Method statement.
All methods were carried out in accordance with relevant guidelines and regulations.