Molecular differences between younger versus older ER-positive and HER2-negative breast cancers

Qing, Tao; Karn, Thomas; Rozenblit, Mariya; Foldi, Julia; Marczyk, Michal; Shan, Naing Lin; Blenman, Kim; Holtrich, Uwe; Kalinsky, Kevin; Meric-Bernstam, Funda; Pusztai, Lajos

doi:10.1038/s41523-022-00492-0

Download PDF

Article
Open access
Published: 07 November 2022

Molecular differences between younger versus older ER-positive and HER2-negative breast cancers

npj Breast Cancer volume 8, Article number: 119 (2022) Cite this article

4 Citations
20 Altmetric
Metrics details

Subjects

Abstract

The RxPONDER and TAILORx trials demonstrated benefit from adjuvant chemotherapy in patients age ≤ 50 with node-positive breast cancer and Recurrence Score (RS) 0–26, and in node-negative disease with RS 16–25, respectively, but no benefit in older women with the same clinical features. We analyzed transcriptomic and genomic data of ER+/HER2− breast cancers with in silico RS < 26 from TCGA (n = 530), two microarray cohorts (A: n = 865; B: n = 609), the METABRIC (n = 867), and the SCAN-B (n = 1636) datasets. There was no difference in proliferation-related gene expression between age groups. Older patients had higher mutation burden and more frequent ESR1 copy number gain, but lower frequency of GATA3 mutations. Younger patients had higher rate of ESR1 copy number loss. In all datasets, younger patients had significantly lower mRNA expression of ESR1 and ER-associated genes, and higher expression of immune-related genes. The ER- and immune-related gene signatures showed negative correlation and defined three subpopulations in younger women: immune-high/ER-low, immune-intermediate/ER-intermediate, and immune-low/ER-intermediate. We hypothesize that in immune-high cancers, the cytotoxic effect of chemotherapy may drive the benefit, whereas in immune-low/ER-intermediate cancers chemotherapy induced ovarian suppression may play important role.

Trastuzumab deruxtecan in HER2-positive advanced gastric cancer: exploratory biomarker analysis of the randomized, phase 2 DESTINY-Gastric01 trial

Article Open access 14 May 2024

PERCEPTION predicts patient response and resistance to treatment using single-cell transcriptomics of their tumors

Article 18 April 2024

Tumor biomarkers for diagnosis, prognosis and targeted therapy

Article Open access 20 May 2024

Introduction

Most breast cancers are diagnosed in women older than 50¹. Age is not only a risk factor for cancer, but it also interacts with adjuvant chemotherapy benefit in hormone receptor positive/human epidermal growth factor receptor-2 negative (HR+/HER2−) breast cancers². Three randomized trials demonstrated greater chemotherapy benefit in younger compared to older women³. The TAILORx trial showed improved invasive disease-free survival (IDFS) with chemotherapy in addition to adjuvant endocrine therapy in patients younger than 50 with lymph-node negative breast cancer and OncotypeDx 21-gene Recurrence Scores (RS) between 16 and 25, no benefit was seen in women older than 50⁴. The RxPONDER trial randomized patients with 1–3 positive lymph nodes and RS 0–25 to either adjuvant endocrine therapy or endocrine therapy plus chemotherapy⁵. It also demonstrated improved IDFS with chemotherapy in premenopausal patients, or in patients 50 or younger, but no benefit was seen in older women⁵. In the MINDACT trial, a subset of HR+/HER2− patients with high clinical risk and low genomic risk (by the MammaPrint assay) were randomly assigned to receive adjuvant chemotherapy or not⁶. An exploratory analysis showed improved distant metastasis-free survival (DMFS) with chemotherapy compared to endocrine therapy alone in women younger than 50, but not in women older than 50⁶. In all three trials, the most frequently used endocrine therapy for premenopausal women was tamoxifen.

It is unclear what explains the interaction between age and adjuvant chemotherapy benefit. Age is difficult to separate from its association with menopausal status. The mean age of onset of menopause is 51 years in Western countries and by age 55 approximately 85% of women have undergone menopause^7,8. Adjuvant chemotherapy in pre-menopausal women can induce menopause in an age-dependent manner^9,10. The NSABP B-47 clinical trial showed that chemotherapy induced amenorrhea in pre-menopausal women is common but it is often discordant with hormone level measurements. In this study, 85% of patients were amenorrhoeic at 12 months after starting adjuvant chemotherapy but only 28 and 22% had postmenopausal estradiol levels at 12 and 24 months¹¹. The SOFT and TEXT trials demonstrated that in premenopausal HR+ patients ovarian suppression plus an aromatase inhibitor is more effective than tamoxifen alone to improve recurrence-free survival^12,13. Chemotherapy-induced menopause can therefore contribute to adjuvant chemotherapy benefit. However, younger patients also have more chemotherapy sensitive cancers. A pooled analysis of 9000 patients enrolled in neoadjuvant chemotherapy trials showed that the pathologic complete response (pCR) rate is significantly higher in the younger HR+/HER2− patients¹⁴.

In the past 20 years, three types of molecular features emerged that predict endocrine and chemotherapy sensitivities in early stage-breast HR+/HER2− cancer; (i) expression of estrogen receptor (ER) regulated genes is a measure of endocrine sensitivity and is associated with better prognosis¹⁵, (ii) proliferation, and (iii) immune infiltration related markers are independently associated with greater chemotherapy sensitivity in neoadjuvant chemotherapy trials^16,17,18.

The goal of the current analysis was to compare differences in estrogen receptor (ER)-, proliferation-, and immune-related gene expressions, and somatic mutation patterns and mutation burden between younger (≤50 years of age) and older (≥55 years) patients with HR+/HER2− breast cancer that could explain the chemotherapy benefit in younger women. These age cohorts were selected because the ≤50 group is highly enriched in pre-menopausal women and represents the group where all the chemotherapy benefit accrues, whereas the ≥55 group is almost entirely composed of post-menopausal women⁸. We further restricted our analysis to the subset of patients who were in the lower 80% range of in silico RS distribution to mimic the RxPONDER and TAILORX populations that excluded women with RS > 25.

Results

Patient characteristics

Patient and tumor characteristics, including molecular subtype distribution, and available treatment information are presented in Table 1. The median ages of the younger and older patients ranged between 45–46 and 66–69 years across the datasets.

Table 1 Demographics and characteristics of ER+/HER2− cases in five cohorts.

Full size table

Differences in ER signaling, cell proliferation, and immune infiltration

ESR1 mRNA expression was significantly lower in younger women in all cohorts (P < 0.001; Fig. 1a, c, e, Supplementary Fig. 1). Lower mRNA expression in bulk RNA analysis could be due to either fewer ER-positive cancer cells, that could be reflected by lower ER percent positivity by immunohistochemistry (IHC), or to lower ER mRNA expression within ER-positive cells. To distinguish between these two possibilities, we plotted age distribution in ten IHC percent positivity brackets from 1 to 10% to >90% in increments of 10 in the TCGA data where this information was available (n = 338). We observed no statistically significant correlation between age and increasing ER IHC percent positivity (τ = 0.036, P < 0.19, Supplementary Fig. 2a). Overall, ESR1 mRNA expression increased as IHC percent positivity increased (τ = 0.27, P < 0.0001), reaching a plateau after > 40% (Supplementary Fig. 2b). ESR1 mRNA expression showed positive association with age at diagnosis (Spearman coefficient = 0.41, P < 0.0001) (Supplementary Fig. 2c). A regression model of ESR1 mRNA expression using age and IHC positivity showed contribution of both parameters but a larger effect size of age (standardized beta 0.365) than percentage of IHC positivity (standardized beta 0.215). This suggests that the overall lower ESR1 mRNA expression in younger patients is primarily driven by lower ESR1 mRNA levels in ER positive cancer cells.

**Fig. 1: Expression of estrogen receptor (*ESR1*) and proliferation marker (*MKI67*) in older and younger ER+/HER2− breast cancer patients.**

Next, we assessed the expression of four gene signatures that are positively associated with endocrine therapy sensitivity including a 4-gene ERS¹⁹, a 7-gene ERS-Lum¹⁹, a 106-gene ERS-Pos signature¹⁵, and a 59-gene ERS-Neg signature¹⁵ which is negatively associated with ER expression and endocrine sensitivity¹⁵. Both in the TCGA and in the Metabric cohort, the ERS, ERS-Lum, and ERS-Pos signatures were all significantly lower (FDR < 0.03) while the ERS-Neg signature was higher (FDR < 0.001) in younger patients (Table 2). Similarly, in both microarray cohorts, and in the SCAN-B-cohort, the ERS-Pos signature was lower and the ERS Neg signature was higher in the younger age group (FDR < 0.002; Table 2). The two smaller signatures, ERS and ERS-Lum, showed nominally lower expressions in younger patients in cohort-A without reaching statistical significance. In cohort-B, ERS showed lower expression in young patients whereas ERS-Lum was similar between age groups (Table 2). Overall, these results indicate not only downregulation of ESR1 mRNA expression but also lower ER-associated gene expression in ER positive cancers of younger compared to older patients.

Table 2 Estrogen receptor and immune and proliferation-related signatures in younger and older breast cancer patients.

Full size table

mRNA expression of the MKI67 gene, that codes for the Ki67 proliferation marker, was similar between age groups in TCGA and microarray cohort-A, but was slightly but statistically significantly higher in the younger patients in microarray cohort-B (Fig. 1b, d, f and Supplementary Fig. 1). The expression of a 12-gene mitotic kinase gene signature (MKS), that has been associated with worse prognosis in HR positive breast cancers and higher sensitivity to neoadjuvant chemotherapy¹⁴, did not differ statistically significantly between the age groups in all cohorts (Table 2). However, the most highly proliferative tumors with the highest 20% of in silico RS were not included in this analysis by design.

Next, we assessed 4 different immune cell signatures²⁰ and a tumor inflammation signature²¹ that were previously shown to predict response to chemotherapy and immune checkpoint inhibitor therapy (Table 2). In the TCGA, B-cell, T-cell, Mast-cell, and TIS signatures were significantly higher, the dendritic signature only showed nominally increased expression (FDR = 0.22). In the microarray Cohort-A, B cells and mast cells were significantly higher, the T cell and TIS signatures showed a trend for higher expression. In Cohort-B, T cells, B cells, TIS, and dendritic cells signatures were significantly higher in younger patients (Table 2). We also evaluated these gene signatures in the METABRIC and SCAN-B data sets and found similar associations (Table 2). We also performed an immune cell composition analysis in the TCGA data using the ConsensusTME method²². Consistent with the gene signature results, younger patients had higher levels B cells, Cytotoxic cells, Endothelial, Fibroblasts, Plasma cells, CD4 T cell, CD8 T cells, and T regulatory cell markers (Supplementary Fig. 3).

Next, we assessed correlation between the ESR1, MKI67 expression, and the 10 gene signatures in Table 2. The MKI67 expression and MKS signature, and ESR1 expression and the ERS-Pos gene signature were each highly correlated. The correlation between ESR1 and the other ER-related gene signatures was less strong. Among the immune signatures, the T cell, B cell, and TIS signatures showed the highest co-expression. The ER-related and immune signatures showed moderate negative correlation in all 3 data sets (Pearson correlation coefficients −0.24, −0.31, −0.25) suggesting independent predictive functions (Supplementary Fig. 4). The distributions of the B cell and ERS-Pos signatures in the TCGA cohort are shown on Fig. 1g, h and illustrate that in the age ≤50 group, three patient populations are intermixed including those with immune-intermediate/ER-intermediate (largest subset), immune-low/ER-intermediate, and immune-high/ER-low (smallest subset) cancers, while in the older age group the immune-low/ER-high cancers are predominant.

Differentially expressed genes and pathways between age groups

In the TCGA, we identified 713 up- and 77 downregulated genes in younger patients (Fig. 2a and Supplementary Table 1). In microarray cohorts A and B, we found 122 and 95 upregulated and 15 and 14 downregulated genes, respectively (Fig. 2b, c, Supplementary Tables 2 & 3, and Supplementary Fig. 5). Thirty-one upregulated genes in younger patients were shared in all three analyses (Fig. 2d, e). Twenty-five and 11 of the 31 overlapped DEGs were also upregulated in young patients in SCAN-B and METABRIC cohort, respectively (Supplementary Table 4). ESR1 and CRABP2 were down-regulated in both SCAN-B and METABRIC cohorts (Supplementary Table 4). In gene set enrichment analysis, 22 biological pathways showed differential expression by age in TCGA; 7 were immune and inflammation related, the others represented estrogen, K-ras, and hedgehog signaling, epithelial mesenchymal transition, angiogenesis, and apical junction/apical surface pathways (Supplementary Table 5).

**Fig. 2: Differentially expressed genes between younger versus older patients.**

Comparison of somatic mutations and copy number variations (CNV) in younger versus older patients in TCGA

The somatic mutation burden was significantly higher in older patients (P < 0.0001; Fig. 3a), consistent with age-related accumulation of mutations²³. At gene level, 13 genes had mutation frequencies ≥ 5% and only GATA3 showed a significantly higher mutation frequency in younger patients (26% versus 12%, P < 0.0001; Fig. 3b). In multivariate logistic regression analysis, luminal B tumors were associated with the enrichment of GATA3 mutations (P = 0.011, odds ratio = 2.18), younger patients also had higher rate of GATA3 mutations (P < 0.0001, odds ratio = 3.15). These results are consistent with an earlier report that showed GATA3 mutation enrichment in luminal B cancers from young women²⁴.

**Fig. 3: Somatic mutation profiles between younger versus older TCGA patients.**

We also compared the CNV gain and loss of 705 Catalog Of Somatic Mutations In Cancer (COSMIC) genes²⁵. We identified high rate of CNV gain of ESR1, LATS1, ARID1B, SGK1, and MYB genes (odds ratio > 8.5, FDR < 0.05) in old patients (Supplementary Table 6). Young patients have a higher rate of CNV loss of ESR1 gene (odds ratio = 0.45, FDR = 0.03, Supplementary Table 6). In addition to ESR1, we identified 19 and 29 genes have higher rate of CNV loss in young and old patients, respectively (Supplementary Table 6).

Discussion

In independent data sets including n = 4507 ER+/HER2− breast cancers, we found that cancers in patients 50 or younger have lower expression of ESR1 and ER-related genes and higher expression of immune related genes. Increasing ER expression with older age has been described in earlier studies that analyzed all breast cancer subtypes together²⁶. A significant linear relationship between increasing age and ESR1 mRNA expression was also seen in luminal-A and -B breast cancers²⁷. The biological reasons behind this phenomenon are unclear. In normal breast epithelium in premenopausal women, the ER expression fluctuates during the menstrual cycle, and ER expression is highest during the follicular phase^28,29. Based on this observation, one would expect higher average ER expression in premenopausal women, however, we found the opposite. We hypothesize that ER expression in breast epithelial cells, and in cancers that arise from them, may increase as estrogen levels decrease with aging due to a feedback loop. Indeed, several studies showed increased ER expression in normal breast epithelium with increasing age^30,31.

The clinical relevance of lower ESR1 and ER related gene expression in cancers of younger women is uncertain. However, ER-associated genes are components of all clinically validated multi-gene prognostic signatures³², and higher expression levels are associated with better prognosis with adjuvant endocrine therapy³³. Higher ER-associated gene expression is also associated with longer PFS and OS in metastatic breast cancer treated with endocrine therapy³⁴. These results suggest that lower ESR1 and ER-related gene expression in younger women may indicate lower endocrine sensitivity. Intensifying endocrine therapy could maximize benefit, which is consistent with clinical trial results that demonstrated ovarian suppression plus tamoxifen, or exemestane, is more effective than tamoxifen alone to improve recurrence-free survival in premenopausal women.

The higher immune gene expression in younger HR+/HER2− breast cancer patients compared to older patients has not previously been reported. The cause of the higher immune infiltration is unknown. Somatic mutation burden that could increase neoantigen load was lower in younger patients. The gene expression data suggests an important role for CXCL13 that was the most highly and consistently overexpressed chemokine in cancers from younger women. CXCL13 is secreted by dendritic and endothelial cells, and is a powerful B cell attractant, that can also activate helper T cells³⁵. High expression of CXCL13 is predictive of better survival in HR+/HER2− breast cancer patients treated with adjuvant chemotherapy³⁶, and is associated with higher pathologic complete response rate after neoadjuvant chemotherapy in HR+ breast cancers¹⁷. These observations suggest that HR+/HER2− breast cancer in younger patients may have higher chemotherapy sensitivity due to greater immune infiltration in the tumor microenvironment than cancers in older women, even if proliferation related predictive markers are similar. When we examined immune and ER related gene expression distributions jointly, we found 3 distinct sub-populations among younger women; (i) immune-high/ER-low, (ii) immune-intermediate/ER-intermediate, and (iii) immune-low/ER-intermediate cancers. The impact of adjuvant chemotherapy is likely different in these different subgroups. We hypothesize that in immune-high/intermediate and ER-low/intermediate cancers the cytotoxic effect drives the benefit, whereas in immune-low/ER-intermediate cancers chemotherapy-induced ovarian suppression plays a more important role. These observations add to the already existing literature that described general molecular differences between breast cancers in younger and older women including elevated integrin/laminin and EGFR and TGFβ signaling and numerous age-associated genes^37,38,39. To increase our ability to identify differences between pre- and post-menopausal ER+ breast cancers our analysis focused on cancers from woman < 50 and >55 years of age groups and excluded the perimenopausal age group 50 to 55. We further restricted our analysis by excluding cases with the highest 20% of in silico RS. This is an important feature of our analysis that has impacted the findings, unlike all previous studies that find higher prevalence of luminal B cancers in younger women, our comparison cohorts were balanced for luminal A and B subtypes. This indicates that the higher chemotherapy benefit is not due to higher proportion of Luminal B cancers among premenopausal women with Recurrence Score <26. Finally, our purpose was to examine differences, if they exist, in carefully selected clinically validated biologic features that predict for chemotherapy and endocrine therapy sensitivity so that we could generate a hypothesis of why younger patients benefit more from chemotherapy.

This study has limitations. We were unable to assess the interaction between adjuvant treatments, molecular features and survival in the young women due to lack of patient specific treatment information in our datasets and lack of randomization. However, we describe a testable hypothesis that could be examined in future clinical trials prospectively, or retrospectively, when gene expression data becomes available from samples of the TAILORx or RxPONDER trials. We describe biological features that are highly reproducible across independent datasets and across different mRNA quantification platforms which implies that these robust gene expression features could be captured by standardized assays in the future.

Overall, our analysis suggests that both the cytotoxic and endocrine effects of adjuvant chemotherapy could contribute to the overall survival benefit seen in younger patients but the relative contributions of these effects may vary by the immune cell composition and ER expression of these cancers.

Methods

TCGA breast cancer cohort

mRNA expression, somatic mutation, and clinical data of 1085 primary breast cancer patients were obtained from TCGA (https://gdc.cancer.gov/about-data/publications/pancanatlas). The RNAseq expression matrix of Fragments per Kilobase of transcript per Million mapped reads (FPKM) was upper quantile normalized and subsequently log2 transformed. Percent ER positivity assessed by routine clinical immunohistochemistry (IHC) was available for 1037 cases⁴⁰. We excluded the ER-negative (n = 238) and HER2 amplified (n = 100) cases, and cases without ER information (n = 48). We assigned HER2 status based on HER2 mRNA expression that follows a bimodal expression pattern⁴¹. We used the Bayesian information criterion to find the number of components in the Gaussian mixture model and used GaMRed (http://cellab.polsl.pl/index.php/software?id=28)⁴² to select the optimal threshold value (normalized FPKM equal to 15.17) to define HER2 gene overexpression. To mimick the TAILORx and RxPONDER populations we also excluded case with the top 20% in silico calculated RS score (n = 74). For final analysis, we grouped ER+/HER2− cancers (n = 530) into ≤ 50 (n = 159) or ≥ 55 years of age (n = 371) at diagnosis (Supplementary Fig. 6).

Microarray cohorts

From publicly available Affymetrix microarray datasets we identified 2007 unique, previously untreated breast cancer samples that were (i) annotated with age, (ii) had raw MAS5 data deposited, and (iii) were ER+/HER2−⁴³ (Supplementary Fig. 6). We assembled 27 Affymetrix U133A datasets from GEO (https://www.ncbi.nlm.nih.gov/geo/) and ArrayExpress (https://www.ebi.ac.uk/arrayexpress/) (E-TABM-158, GSE11121, GSE12276, GSE16391, GSE17907, GSE18864, GSE19615, GSE20194, GSE2034, GSE2109, GSE21653, GSE22035, GSE22513, GSE2603, GSE26971, GSE2990, GSE3494, GSE4611, GSE46184, GSE4922, GSE5327, GSE6532, GSE6532, GSE6596, GSE7390, GSE9195, MDA133) with no overlap to the RNA-Seq sample cohort from TCGA. We included only datasets with MAS5 data available (i.e., Individual sample level normalized expression data) without cohort-based normalization steps (e.g., RMA). A total of 3292 unique samples were annotated with age and had raw MAS5 data deposited. From these, we selected 2007 ER+/HER2− samples based on gene expression data as previously described⁴³ (Supplementary Fig. 6). Supplementary Table 7 lists details for each sample including clinical information and a link to the corresponding expression data.

For the most accurate identification of differentially expressed genes, we aimed to assemble the most homogenous combined dataset with respect to technical bias and platform heterogeneity. To accomplish this, we used our previously described pipeline⁴⁴ and designated this dataset as “Cohort A”. We calculated a technical comparability metric “C” which is the sum of squared normalized differences between dataset means and global means for all genes and considered datasets highly comparable if normalized C < 0.05. This resulted in 13 data sets including n = 1170 samples assigned into Cohort-A. For a second independent validation, we also combined all remaining datasets into Cohort B including n = 837 samples that correspond to data with grater technical heterogeneity (Supplementary Fig. 6).

From each cohort, we then excluded cases in the top 20% of highest in silico Recurrence score values to mimic a clinical cohort similar to that of TAILORx that included only patients with RS < 26. This resulted in n = 936 cases in Cohort A and n = 669 cases in Cohort B. For final analysis, we grouped ER+/HER2− cancers into ≤ 50 (n = 281 in cohort-A, n = 162 in cohort-B) versus ≥ 55 (n = 584 in cohort-A, n = 447 in cohort-B) years of age (Supplementary Fig. 6).

METABRIC datasets

Normalized tumor mRNA expression data and the clinical metadata of 1908 breast cancer patients⁴⁵ were download from www.cbioportal.org. We excluded 723 ER-negative or HER2 amplified cases, 61 cases without ER or HER2 status, and 240 cases with the top 20% RS score. For final analysis, we grouped ER+/HER2− cancers (n = 867) into ≤ 50 (n = 157) or ≥ 55 years of age (n = 710) at diagnosis (Supplementary Fig. 6).

SCAN-B datasets

Normalized tumor mRNA expression data and the clinical metadata of 2969 breast cancer patients were downloaded from the Gene Expression Omnibus (GEO) database (GSE96058)⁴⁶ (Supplementary Fig. 6). ER status assessed by immunohistochemistry was available for 2,783 patients, and HER2 status reported by situ hybridization was available for 2868 patients. We excluded the ER-negative (n = 224) and HER2 amplified (n = 378) cases, cases without ER (n = 199) or HER2 (101) status, and cases with top 20% RS score (n = 409). For final analysis, we grouped ER+/HER2− cancers (n = 1636) into ≤ 50 (n = 305) or ≥ 55 years of age (n = 1331) at diagnosis (Supplementary Fig. 6).

Calculation of in silico recurrence score

We calculated an in silico recurrence score for each sample using the oncotypedx function of the genefu R library⁴⁷. These scores approximate the clinical OncotypeDX RS but are not equivalent due to different dynamic ranges of the measurements. In clinical studies, 15–20% of cases submitted for OncotypeDx testing have RS > 25^48,49. In the screening phase of TAILORx, 17% of patients had RS > 25. To approximate this distribution, we excluded patients with the top 20% of the highest continuous in silico recurrence scores.

Molecular subtyping

Molecular subtype assignments of TCGA samples were obtained from Peng et al.⁵⁰. To assign molecular subtypes to samples from the microarray cohorts we used the R package AIMS under R version 3.3.0⁵¹.

Gene-expression signatures

To assess ER and Ki67 expression in the microarray data, we used the ESR1 probe set 205225_at, and the average of four MKI67 probe sets as previously described⁴³. Ten mRNA expression signatures were obtained from literature including four estrogen-related signatures (e.g., ERS, ERS Luminal¹⁹, ERS Pos Symmans¹⁵, and ERS Neg Symmans¹⁵), four immune cell signatures (e.g., T Cell, B Cell, Mast Cell, Dendritic Cell²⁰, and Tumor inflammation signature [TIS]²¹), and one proliferation signature (Mitosis Kinase Score, MKS¹⁹) (Supplementary Table 8). For each signature, we calculated the average normalized expression of the member genes and transformed to z-score across all cases in each cohort.

Immune-cell composition analysis

Immune cell composition was estimated using the ConsensusTME²² method that estimates the contribution of 18 immune cell types to the tissue microenvironment. We used normalized TCGA mRNA expression data as input and select ssGSEA method for immune cell signature analysis with the ConsensusTME R package²².

Differentially expressed genes

To identify differentially expressed genes (DEGs) in TCGA RNAseq data (representing 20,282 human genes), we calculated fold change and t-test p-value for each gene between younger and older cases. DEGs were defined as fold change ≥ 1.50 (i.e., upregulated) or ≤ 0.67 (i.e., downregulated) with Benjamini Hochberg corrected false discovery rate (FDR) < 0.05. To identify DEGs from Affymetrix microarray data, we applied the limma R package⁵². To avoid batch effects, we included the original Affymetrix source dataset as covariate. Identical fold change filters were used as for TCGA data.

Gene set enrichment analysis

Log2 transformed fold changes of all 20,282 genes of TCGA samples were used as gene rank values to perform gene set enrichment analysis using the fgsea⁵³ package in R using the hallmark gene set (n = 50) of the Molecular Signatures Database (MSigDB)⁵⁴.

Somatic mutation analysis

Somatic mutations which were available for 427 older and 183 younger TCGA breast cancer cases were obtained from the Multi-Center Mutation Calling in Multiple Cancers (MC3) dataset⁵⁵. Somatic mutation burden was calculated as the total number of somatic mutations across all genes in each cancer. For comparison of gene level somatic mutation frequencies between age groups we only considered the nonsynonymous mutations, including missense, non-sense, frameshifting, in-frame shifting, or splice-site altering single-nucleotide changes or indels and statistical significance was assessed with Fisher’s exact test. A multivariate logistic regression model was used to evaluate the association of Luminal B subtype and age group with the mutation status of GATA3:

$${\it{GATA3}}\;{{{\mathrm{status}}}}\sim {{{\mathrm{Age}}}}\;{{{\mathrm{group}}}} + {{{\mathrm{Luminal}}}}\;{{{\mathrm{B}}}}\;{{{\mathrm{status}}}} + {{{\mathrm{Age}}}}\;{{{\mathrm{group}}}}\, \ast \,{{{\mathrm{Luminal}}}}\;{{{\mathrm{B}}}}\;{{{\mathrm{status}}}}$$

Association of ER status and age at diagnosis

We estimated the statistical significance of the trend of the ER IHC percentage categories with ESR1 mRNA expression and age at diagnosis using Jonckheere Terpstra (JT) trend analysis⁵⁶. P-values were calculated using the “JonckheereTerpstraTest” function of “DescTools” R package⁵⁷. Kendall’s tau (τ) coefficient was estimated to measure the increasing (positive value) or decreasing (negative value) trend for each trend analysis. We estimated the correlation between ESR1 mRNA expression and age of diagnosis using Spearman’s rank correlation analysis.

Copy number variation analysis

We obtained gene-level somatic CNV data of TCGA patients from the PanCanAtlas Aneuploidy study (https://gdc.cancer.gov/about-data/publications/pancanatlas)⁵⁸. The CNVs of 25,128 genes of 513 ER+/HER2− patients were available. We focus on the 703 genes that overlapped with the COSMIC cancer gene list. The gene-level events indicate that the copy number gain/loss effect an entire chromosome arm or a specific genomic region that encodes gene. CNV was assessed with Affymetrix SNP 6.0 arrays⁵⁸ and gene-level CNV values were generated by GISTIC⁵⁹. A GISTIC call of +1 or +2 was considered a gain and −1 or −2 was considered a loss, and 0 as wild-type for association analysis in our study. The association of CNV gain or loss with the age group was assessed with Fisher’s exact test. Odds ratio larger than one were consider as CNVs enriched in old patients, and less than one means enriched in young patients.

Statistical analysis

The Chi-squared test was used to compare categorical variables of patient characteristics. Wilcoxon rank-sum test was used to compare the expression signatures, and somatic mutation burden. P-values were adjusted for multiple comparisons using Benjamini–Hochberg method. A regression model of ESR1 mRNA using age, ER IHC percentage categories, and their interaction was used to assess the contribution of both parameters. All analyses were performed in R version 3.6.1⁵¹.

Reporting summary

Further information on research design is available in the Nature Research Reporting Summary linked to this article.

Data availability

All the data that support the funding in this study are public available and Web links of those datasets are available in the Methods section, additional information can be provided by the authors upon reasonable request.

Code availability

All codes for data cleaning and analysis are available at GitHub https://github.com/tao-qing/npjYoungVsOld.

References

DeSantis, C. E. et al. Breast cancer statistics, 2019. CA Cancer J. Clin. 69, 438–451 (2019).
Article PubMed Google Scholar
Burstein, H. J. et al. Adjuvant endocrine therapy for women with hormone receptor-positive breast cancer: American Society of Clinical Oncology Clinical Practice Guideline update on ovarian suppression. J. Clin. Oncol. 34, 1689–1701 (2016).
Article CAS PubMed Google Scholar
Piccart, M. J. et al. Gene expression signatures for tailoring adjuvant chemotherapy of luminal breast cancer: stronger evidence, greater trust. Ann. Oncol. https://doi.org/10.1016/j.annonc.2021.05.804 (2021).
Sparano, J. A. et al. Adjuvant chemotherapy guided by a 21-gene expression assay in breast cancer. N. Engl. J. Med. 379, 111–121 (2018).
Article CAS PubMed PubMed Central Google Scholar
Kevin Kalinsky, W. E. B. et al. San Antonio Breast Cancer Symposium (San Antonio, 2020).
Piccart, M. et al. 70-gene signature as an aid for treatment decisions in early breast cancer: updated results of the phase 3 randomised MINDACT trial with an exploratory analysis by age. Lancet Oncol. 22, 476–488 (2021).
Article CAS PubMed Google Scholar
Broekmans, F. J., Knauff, E. A., te Velde, E. R., Macklon, N. S. & Fauser, B. C. Female reproductive ageing: Current knowledge and future trends. Trends Endocrinol. Metab. 18, 58–65 (2007).
Article CAS PubMed Google Scholar
Krailo, M. D. & Pike, M. C. Estimation of the distribution of age at natural menopause from prevalence data. Am. J. Epidemiol. 117, 356–361 (1983).
Article CAS PubMed Google Scholar
Vriens, I. J. et al. The correlation of age with chemotherapy-induced ovarian function failure in breast cancer patients. Oncotarget 8, 11372–11379 (2017).
Article PubMed PubMed Central Google Scholar
Furlanetto, J. et al. Chemotherapy-induced ovarian failure in young women with early breast cancer: Prospective analysis of four randomised neoadjuvant/adjuvant breast cancer trials. Eur. J. Cancer 152, 193–203 (2021).
Article CAS PubMed Google Scholar
Ganz, P. A. et al. NRG Oncology/NSABP B-47 menstrual history study: Impact of adjuvant chemotherapy with and without trastuzumab. NPJ Breast Cancer 7, 55 (2021).
Article CAS PubMed PubMed Central Google Scholar
Francis, P. A. et al. Adjuvant ovarian suppression in premenopausal breast cancer. N. Engl. J. Med. 372, 436–446 (2015).
Article PubMed Google Scholar
Regan, M. M. et al. Absolute benefit of adjuvant endocrine therapies for premenopausal women with hormone receptor-positive, human epidermal growth factor receptor 2-negative early breast cancer: TEXT and SOFT trials. J. Clin. Oncol. 34, 2221–2231 (2016).
Article CAS PubMed PubMed Central Google Scholar
Loibl, S. et al. Outcome after neoadjuvant chemotherapy in young breast cancer patients: a pooled analysis of individual patient data from eight prospectively randomized controlled trials. Breast Cancer Res. Treat. 152, 377–387 (2015).
Article CAS PubMed Google Scholar
Symmans, W. F. et al. Genomic index of sensitivity to endocrine therapy for breast cancer. J. Clin. Oncol. 28, 4111–4119 (2010).
Article PubMed PubMed Central Google Scholar
Hatzis, C. et al. A genomic predictor of response and survival following taxane-anthracycline chemotherapy for invasive breast cancer. JAMA 305, 1873–1881 (2011).
Article CAS PubMed PubMed Central Google Scholar
Denkert, C. et al. Tumor-associated lymphocytes as an independent predictor of response to neoadjuvant chemotherapy in breast cancer. J. Clin. Oncol. 28, 105–113 (2010).
Article CAS PubMed Google Scholar
Denkert, C. et al. Tumour-infiltrating lymphocytes and prognosis in different subtypes of breast cancer: a pooled analysis of 3771 patients treated with neoadjuvant therapy. Lancet Oncol. 19, 40–50 (2018).
Article PubMed Google Scholar
Bianchini, G. et al. Proliferation and estrogen signaling can distinguish patients at risk for early versus late relapse among estrogen receptor positive breast cancers. Breast Cancer Res. 15, R86 (2013).
Article PubMed PubMed Central Google Scholar
Danaher, P. et al. Gene expression markers of Tumor Infiltrating Leukocytes. J. Immunother. Cancer 5, 18 (2017).
Article PubMed PubMed Central Google Scholar
Ayers, M. et al. IFN-gamma-related mRNA profile predicts clinical response to PD-1 blockade. J. Clin. Invest. 127, 2930–2940 (2017).
Article PubMed PubMed Central Google Scholar
Jimenez-Sanchez, A., Cast, O. & Miller, M. L. Comprehensive benchmarking and integration of tumor microenvironment cell estimation methods. Cancer Res. 79, 6238–6246 (2019).
Article CAS PubMed Google Scholar
Qing, T. et al. Germline variant burden in cancer genes correlates with age at diagnosis and somatic mutation burden. Nat. Commun. 11, 2438 (2020).
Article CAS PubMed PubMed Central Google Scholar
Griffith, O. L. et al. The prognostic effects of somatic mutations in ER-positive breast cancer. Nat. Commun. 9, 3476 (2018).
Article PubMed PubMed Central Google Scholar
Tate, J. G. et al. COSMIC: The catalogue of somatic mutations in cancer. Nucleic Acids Res. 47, D941–D947 (2019).
Article CAS PubMed Google Scholar
Clark, G. M., Osborne, C. K. & McGuire, W. L. Correlations between estrogen receptor, progesterone receptor, and patient characteristics in human breast cancer. J. Clin. Oncol. 2, 1102–1109 (1984).
Article CAS PubMed Google Scholar
Liedtke, C. et al. The prognostic impact of age in different molecular subtypes of breast cancer. Breast Cancer Res. Treat. 152, 667–673 (2015).
Article CAS PubMed Google Scholar
Haynes, B. P. et al. Differences in expression of proliferation-associated genes and RANKL across the menstrual cycle in estrogen receptor-positive primary breast cancer. Breast Cancer Res. Treat. 148, 327–335 (2014).
Article CAS PubMed Google Scholar
Haynes, B. P. et al. Expression of key oestrogen-regulated genes differs substantially across the menstrual cycle in oestrogen receptor-positive primary breast cancer. Breast Cancer Res. Treat. 138, 157–165 (2013).
Article CAS PubMed Google Scholar
Gulbahce, H. E., Blair, C. K., Sweeney, C. & Salama, M. E. Quantification of estrogen receptor expression in normal breast tissue in postmenopausal women with breast cancer and association with tumor subtypes. Appl Immunohistochem. Mol. Morphol. 25, 548–552 (2017).
Article CAS PubMed Google Scholar
Khan, S. A., Rogers, M. A., Khurana, K. K., Meguid, M. M. & Numann, P. J. Estrogen receptor expression in benign breast epithelium and breast cancer risk. J. Natl Cancer Inst. 90, 37–42 (1998).
Article CAS PubMed Google Scholar
Sotiriou, C. & Pusztai, L. Gene-expression signatures in breast cancer. N. Engl. J. Med. 360, 790–800 (2009).
Article CAS PubMed Google Scholar
Iwamoto, T. et al. Gene pathways associated with prognosis and chemotherapy sensitivity in molecular subtypes of breast cancer. J. Natl Cancer Inst. 103, 264–272 (2011).
Article CAS PubMed Google Scholar
Sinn, B. V. et al. SETER/PR: A robust 18-gene predictor for sensitivity to endocrine therapy for metastatic breast cancer. NPJ Breast Cancer 5, 16 (2019).
Article PubMed PubMed Central Google Scholar
Kazanietz, M. G., Durando, M. & Cooke, M. CXCL13 and its receptor CXCR5 in cancer: Inflammation, immune response, and beyond. Front. Endocrinol. 10, 471 (2019).
Article Google Scholar
Razis, E. et al. The role of CXCL13 and CXCL9 in early breast cancer. Clin. Breast Cancer 20, e36–e53 (2020).
Article CAS PubMed Google Scholar
Liao, S. et al. The molecular landscape of premenopausal breast cancer. Breast Cancer Res. 17, 104 (2015).
Article PubMed PubMed Central Google Scholar
Rueda, O. M. et al. Dynamics of breast-cancer relapse reveal late-recurring ER-positive genomic subgroups. Nature 567, 399–404 (2019).
Article CAS PubMed PubMed Central Google Scholar
Osako, T. et al. Age-correlated protein and transcript expression in breast cancer and normal breast tissues is dominated by host endocrine effects. Nat. Cancer 1, 518–532 (2020).
Article CAS PubMed Google Scholar
Cancer Genome Atlas, N. Comprehensive molecular portraits of human breast tumours. Nature 490, 61–70 (2012).
Article Google Scholar
Wang, J., Wen, S., Symmans, W. F., Pusztai, L. & Coombes, K. R. The bimodality index: A criterion for discovering and ranking bimodal signatures from cancer gene expression profiling data. Cancer Inf. 7, 199–216 (2009).
Google Scholar
Marczyk, M., Jaksik, R., Polanski, A. & Polanska, J. Adaptive filtering of microarray gene expression data based on Gaussian mixture decomposition. BMC Bioinform. 14, 101 (2013).
Article Google Scholar
Karn, T. et al. Data-driven derivation of cutoffs from a pool of 3,030 Affymetrix arrays to stratify distinct clinical types of breast cancer. Breast Cancer Res. Treat. 120, 567–579 (2010).
Article PubMed Google Scholar
Karn, T. et al. Control of dataset bias in combined Affymetrix cohorts of triple negative breast cancer. Genom. Data 2, 354–356 (2014).
Article PubMed PubMed Central Google Scholar
Curtis, C. et al. The genomic and transcriptomic architecture of 2000 breast tumours reveals novel subgroups. Nature 486, 346–352 (2012).
Article CAS PubMed PubMed Central Google Scholar
Brueffer, C. et al. Clinical value of RNA sequencing-based classifiers for prediction of the five conventional breast cancer biomarkers: A report from the population-based multicenter Sweden cancerome analysis network-breast initiative. JCO Precis. Oncol. https://doi.org/10.1200/PO.17.00135 (2018).
Gendoo, D. M. et al. Genefu: An R/Bioconductor package for computation of gene expression-based signatures in breast cancer. Bioinformatics 32, 1097–1099 (2016).
Article CAS PubMed Google Scholar
Choi, I. S. et al. The 21-gene recurrence score assay and prediction of chemotherapy benefit: A propensity score-matched analysis of the SEER database. Cancers https://doi.org/10.3390/cancers12071829 (2020).
Stemmer, S. M. et al. Ten-year clinical outcomes in N0 ER+ breast cancer patients with recurrence score-guided therapy. NPJ Breast Cancer 5, 41 (2019).
Article PubMed PubMed Central Google Scholar
Peng, X. et al. Molecular characterization and clinical relevance of metabolic expression subtypes in human cancers. Cell Rep. 23, 255–269 e254 (2018).
Article CAS PubMed PubMed Central Google Scholar
Paquet, E. R. & Hallett, M. T. Absolute assignment of breast cancer intrinsic molecular subtype. J. Natl Cancer Inst. 107, 357 (2015).
Article PubMed Google Scholar
Smyth, G. K. Linear models and empirical bayes methods for assessing differential expression in microarray experiments. Stat. Appl Genet. Mol. Biol. 3, 1–25 (2004).
Article Google Scholar
Gennady Korotkevich, V. S. & Sergushichev, A. Fast gene set enrichment analysis. Preprint at bioRxiv https://doi.org/10.1101/060012 (2019).
Liberzon, A. et al. The Molecular Signatures Database (MSigDB) hallmark gene set collection. Cell Syst. 1, 417–425 (2015).
Article CAS PubMed PubMed Central Google Scholar
Ellrott, K. et al. Scalable open science approach for mutation calling of tumor exomes using multiple genomic pipelines. Cell Syst. 6, 271–281 e277 (2018).
Article CAS PubMed PubMed Central Google Scholar
Jonckheere, A. R. A distribution-free k-sample test against ordered alternatives. Biometrika https://doi.org/10.2307/2333011 (1954).
Signorell, A. DescTools: Tools for Descriptive Statistics and Exploratory Data Analysis. https://andrisignorell.github.io (2020).
Taylor, A. M. et al. Genomic and functional approaches to understanding cancer aneuploidy. Cancer Cell 33, 676–689 e673 (2018).
Article CAS PubMed PubMed Central Google Scholar
Mermel, C. H. et al. GISTIC2.0 facilitates sensitive and confident localization of the targets of focal somatic copy-number alteration in human cancers. Genome Biol. 12, R41 (2011).
Article PubMed PubMed Central Google Scholar

Download references

Acknowledgements

This work was supported by grants from the H.W. & J. Hector-Stiftung, Mannheim, Germany (M82) to Thomas Karn and Uwe Holtrich, and from the Susan Komen Foundation Leadership Award (SAC160076) and Breast Cancer Research Foundation Investigator Award (BCRF-21-133) to Lajos Pusztai.

Author information

These authors contributed equally: Tao Qing, Thomas Karn.

Authors and Affiliations

Breast Medical Oncology, School of Medicine, Yale University, New Haven, CT, USA
Tao Qing, Mariya Rozenblit, Julia Foldi, Michal Marczyk, Naing Lin Shan, Kim Blenman & Lajos Pusztai
Department of Gynecology and Obstetrics, Goethe-University Frankfurt, Frankfurt, Germany
Thomas Karn & Uwe Holtrich
Department of Data Science and Engineering, Silesian University of Technology, Gliwice, Poland
Michal Marczyk
Department of Hematology and Medical Oncology, Winship Cancer Institute, Emory University, Atlanta, GA, USA
Kevin Kalinsky
Department of Investigational Cancer Therapeutics, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
Funda Meric-Bernstam

Authors

Tao Qing
View author publications
You can also search for this author in PubMed Google Scholar
Thomas Karn
View author publications
You can also search for this author in PubMed Google Scholar
Mariya Rozenblit
View author publications
You can also search for this author in PubMed Google Scholar
Julia Foldi
View author publications
You can also search for this author in PubMed Google Scholar
Michal Marczyk
View author publications
You can also search for this author in PubMed Google Scholar
Naing Lin Shan
View author publications
You can also search for this author in PubMed Google Scholar
Kim Blenman
View author publications
You can also search for this author in PubMed Google Scholar
Uwe Holtrich
View author publications
You can also search for this author in PubMed Google Scholar
Kevin Kalinsky
View author publications
You can also search for this author in PubMed Google Scholar
Funda Meric-Bernstam
View author publications
You can also search for this author in PubMed Google Scholar
Lajos Pusztai
View author publications
You can also search for this author in PubMed Google Scholar

Contributions

Concept and design: L.P.; data curation, analysis, and interpretation: T.Q., T.K., L.P., and U.H.; drafting of the manuscript: L.P., T.Q., and T.K.; critical revision of the manuscript for important: K.K., M.R., J.F., N.L.S., K.B., and F.M.B.; statistical analysis: T.Q., T.K., and M.M.; obtained funding: L.P., T.K., and U.H. T.Q. and T.K. contributed equally.

Corresponding author

Correspondence to Lajos Pusztai.

Ethics declarations

Competing interests

L.P. has received consulting fees and honoraria from Pfizer, Astra Zeneca, Merck, Novartis, Bristol-Myers Squibb, Genentech, Eisai, Pieris, Immunomedics, Seattle Genetics, Clovis, Syndax, H3Bio, and Daiichi. The remaining authors declare no competing interests.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary Material

Reporting Summary

Supplemental Tables

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.

Reprints and permissions

About this article

Cite this article

Qing, T., Karn, T., Rozenblit, M. et al. Molecular differences between younger versus older ER-positive and HER2-negative breast cancers. npj Breast Cancer 8, 119 (2022). https://doi.org/10.1038/s41523-022-00492-0

Download citation

Received: 26 March 2022
Accepted: 26 October 2022
Published: 07 November 2022
DOI: https://doi.org/10.1038/s41523-022-00492-0

This article is cited by

Real-world use of multigene signatures in early breast cancer: differences to clinical trials
- Luca Licata
- Rita De Sanctis
- Giampaolo Bianchini
Breast Cancer Research and Treatment (2024)