GATA3 somatic mutations are associated with clinicopathological features and expression profile in TCGA breast cancer patients

The effects of somatic mutations and gene expression profiles on prognosis are well documented in cancer research. This study was conducted to evaluate the association of GATA3 somatic mutations with tumor features, survival, and expression profiles in breast cancer. Clinicopathological information was compared between TCGA-BRCA patients with GATA3-mutant and non-mutant tumors, both in the entire cohort and in the ER-positive subgroup. Cox regression was used to evaluate the association of GATA3 mutation status with overall survival time. Differential gene expression, functional annotation, and protein–protein interaction analyses were performed using edgeR, Metascape, DAVID, STRING and CytoNCA. GATA3-mutant and non-mutant samples had significantly different clinicopathological features (p < 0.05). While GATA3 mutation status was not associated with overall survival in the entire cohort (padj = 0.52), GATA3-wild-type ER-positive cases had a better prognosis than mutant ones (padj = 0.04). GATA3 expression was higher in tumors than in normal tissues. Several pathways differed between mutant and non-mutant groups (p < 0.05). Interleukin-6 (IL6) was the highest-scoring gene in both comparisons (normal vs. mutant and normal vs. non-mutant) in the entire cohort and in the ER-positive subgroup, suggesting an association of IL6 with breast tumorigenesis. These findings suggest that GATA3 mutations may be associated with several tumor characteristics and influence the pattern of gene expression. GATA3 mutation status, however, appears to be a prognostic factor only in ER-positive patients.


Scientific Reports | (2021) 11:1679 | https://doi.org/10.1038/s41598-020-80680-9 | www.nature.com/scientificreports/

prognostic factor 11, several researchers have reported that it was associated with better survival in breast cancer patients 12,13. Breast tumors expressing low levels of GATA3 have also been reported to correlate with larger tumor size 14. Previous reports suggest that related pathways may explain the association of this gene with some clinical features of breast cancer 15,16. In light of these findings, GATA3 is considered an important gene in breast development and cancer 17. However, the influence of GATA3 somatic mutations on breast tumor characteristics, patient survival outcomes, and tumor gene expression profiles is poorly understood.
In this study, we evaluated the genomic alterations of GATA3 in breast tumors using the data collected by TCGA 18, and analyzed the associations of GATA3 somatic mutations with tumor features, patient survival, and tumor gene expression profiles to highlight the clinical importance of this gene in breast cancer.

Results
GATA3 somatic mutation status and association with clinicopathological features. In the TCGA-BRCA cohort, tumors of 975 of 1085 female patients were evaluated for somatic mutations. Among these patients, a total of 103 different GATA3 mutations were identified in 138 patients (14.15%). Insertions were the most common mutation type (50.5%), followed by deletions (29.1%) and substitutions (20.4%). Most mutations (74.7%) resulted in frame-shifts, and the Variant Effect Predictor (VEP) 19 predicted a high or moderate impact for 96.3% of all mutations. The most frequent mutation was X309, a two-base-pair (CA) deletion/splice site mutation (chr10:g.8069470delCA, annotated as GATA3 X309_splice in the GDC portal). This mutation was detected in tumors from 21 patients (15.22% of patients with GATA3 mutations). Eleven additional recurrent mutations were each identified in 2 to 8 patients, while the remaining mutations were each detected in a single patient.
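The frame-shift classification above rests on a simple rule: an insertion or deletion whose net length change is not a multiple of three shifts the reading frame. The sketch below illustrates only that rule. It assumes MAF-style alleles in which "-" denotes an empty allele, and the helper name `classify_variant` is hypothetical. It ignores splice-site and transcript context (the X309 deletion is annotated as a splice-site mutation), which annotation tools such as VEP take into account.

```python
def classify_variant(ref, alt):
    """Return (variant_type, shifts_frame) for a coding variant.

    Hypothetical helper: assumes MAF-style alleles where "-" marks an empty
    allele, and ignores splice-site and transcript context.
    """
    ref_seq = "" if ref == "-" else ref
    alt_seq = "" if alt == "-" else alt
    if not ref_seq and alt_seq:
        kind = "insertion"
    elif ref_seq and not alt_seq:
        kind = "deletion"
    else:
        kind = "substitution"
    # A net length change that is not a multiple of 3 shifts the reading frame.
    length_change = abs(len(alt_seq) - len(ref_seq))
    return kind, length_change % 3 != 0


if __name__ == "__main__":
    print(classify_variant("CA", "-"))   # ('deletion', True): two-base deletion
    print(classify_variant("-", "GCA"))  # ('insertion', False): in-frame
    print(classify_variant("C", "T"))    # ('substitution', False)
```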
The mean age at diagnosis was 45.66 ± 13.65 and 58.77 ± 12.97 years in patients with and without GATA3 mutations, respectively (p = 0.001; Table 1). We also compared GATA3 mutation status across age categories. The proportion of patients with GATA3-mutated tumors was higher among patients diagnosed under 40 years of age than among those diagnosed later [20 of 89 patients under 40 years old (22.5%) versus 118 of 885 patients above 40 years old (13.3%); p = 0.02]. In addition to age at diagnosis, menopausal status differed significantly between patients with and without GATA3 mutations (p = 0.00004; Table 1). Other clinicopathological characteristics associated with GATA3 mutation status in this cohort (Table 1) were as follows. Pathologic tumor size differed significantly between patients with GATA3-mutant and wild-type tumors (p = 0.01). Histological tumor type also differed significantly. There was a strong relationship between GATA3 mutation and ER/PR status: almost none of the tumors with GATA3 mutations were ER-negative (Table 1). Additionally, in multivariable logistic regression analysis, age at diagnosis, tumor size (pT), PR status, and histological tumor type were independently associated with GATA3 mutation status (Table 2).
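The age comparison above can be recomputed from the reported counts as a 2x2 table. The sketch below is a minimal standard-library illustration, not the authors' code: it assumes Pearson's chi-squared test without a continuity correction (the Methods name Pearson's test but do not mention a correction) and reports the crude odds ratio with a Wald (Woolf) 95% interval, which is what a univariate logistic regression on a single binary predictor returns. The function names are hypothetical.

```python
import math


def pearson_chi2_2x2(a, b, c, d):
    """Pearson chi-squared test (no continuity correction) for [[a, b], [c, d]]."""
    n = a + b + c + d
    observed = ((a, b), (c, d))
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed[i][j] - expected) ** 2 / expected
    # With 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    return chi2, math.erfc(math.sqrt(chi2 / 2))


def odds_ratio_wald(a, b, c, d, z=1.959964):
    """Cross-product odds ratio with a Wald (Woolf) 95% confidence interval."""
    log_or = math.log((a * d) / (b * c))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return math.exp(log_or), math.exp(log_or - z * se), math.exp(log_or + z * se)


# Rows: diagnosed under 40 / the remaining patients; columns: GATA3-mutant / wild-type.
# Counts from the text: 20 of 89 and 118 of 885 patients had GATA3-mutant tumors.
table = (20, 89 - 20, 118, 885 - 118)
chi2, p = pearson_chi2_2x2(*table)
or_, lo, hi = odds_ratio_wald(*table)
print(f"chi2 = {chi2:.2f}, p = {p:.3f}")
print(f"OR = {or_:.2f} (95% CI {lo:.2f}-{hi:.2f})")
```

With these counts the sketch gives chi2 ≈ 5.55 and p ≈ 0.018, close to the reported p = 0.02, and a crude OR of about 1.88 (95% CI about 1.10 to 3.21). A Yates-corrected test would give a somewhat larger p value.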
We repeated these analyses in the ER-positive subgroup (Tables 1, 2), and the results were broadly similar to those of the entire cohort. One notable finding in the ER-positive subgroup was that Asian patients were more frequent among GATA3-mutant cases than among non-mutant cases (Table 1).
GATA3 somatic mutations and prognosis. The median overall survival was 10.80 ± 0.7 years (11.69 ± 3.63 years in patients without GATA3 mutations and 10.61 ± 2.19 years in patients with GATA3 mutations; p = 0.73), so median survival time did not differ significantly between the two groups. The ER-positive subgroup showed a similar result (Table 3).
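Median survival times and two-group comparisons of this kind are typically obtained from Kaplan-Meier estimates and a log-rank test, both named in the Methods. The sketch below is a minimal standard-library implementation of each, not the authors' code; the follow-up data in the usage lines are invented for illustration and are not TCGA values.

```python
import math


def _event_times(times, events):
    """Sorted distinct times at which at least one death occurred."""
    return sorted({t for t, e in zip(times, events) if e})


def km_median(times, events):
    """Kaplan-Meier median: first event time at which S(t) drops to 0.5 or below."""
    surv = 1.0
    for t in _event_times(times, events):
        at_risk = sum(1 for u in times if u >= t)
        deaths = sum(1 for u, e in zip(times, events) if u == t and e)
        surv *= (at_risk - deaths) / at_risk
        if surv <= 0.5:
            return t
    return None  # median not reached during follow-up


def logrank(times, events, groups):
    """Two-group log-rank test (groups coded 0/1); returns (chi2, p) with 1 df."""
    o_minus_e = 0.0
    variance = 0.0
    for t in _event_times(times, events):
        n = sum(1 for u in times if u >= t)
        n1 = sum(1 for u, g in zip(times, groups) if u >= t and g == 1)
        d = sum(1 for u, e in zip(times, events) if u == t and e)
        d1 = sum(1 for u, e, g in zip(times, events, groups) if u == t and e and g == 1)
        o_minus_e += d1 - d * n1 / n  # observed minus expected deaths in group 1
        if n > 1:
            variance += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    if variance == 0:
        return 0.0, 1.0
    chi2 = o_minus_e ** 2 / variance
    return chi2, math.erfc(math.sqrt(chi2 / 2))


# Invented follow-up data in years (not TCGA values); 1 = death, 0 = censored.
times = [1, 2, 3, 4, 5, 2, 4, 6, 7, 9]
events = [1, 0, 1, 1, 0, 1, 0, 1, 1, 0]
groups = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
print(km_median(times[:5], events[:5]))  # 4
print(logrank(times, events, groups))
```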
Univariate Cox proportional hazards analysis identified age at diagnosis, menopausal status, lymph node ratio, history of neoadjuvant therapy, and adjuvant radiation therapy as associated with survival time. Several tumor characteristics, including margin status, pathologic tumor size (pT), lymph node status (pN), and stage, were also associated with overall survival (Table S2).
In a multivariable Cox regression model adjusted for prognostic factors, GATA3 somatic mutation status was not an independent prognostic factor in the entire cohort (p adj = 0.52), whereas wild-type tumors were associated with a better prognosis in the ER-positive subgroup (p adj = 0.04) (Table 3). In the entire cohort, age (p adj = 0.0001), stage (p adj = 7.461E−10) and radiation therapy (p = 0.003) were significantly and independently associated with overall survival time. In the ER-positive cases, age (p adj = 2.411E−8) and stage (p adj = 0.026) were independently associated with overall survival time.
Gene expression analysis. According to the TCGA expression data, GATA3 expression was higher than in normal tissues both in GATA3-mutant tumors (log FC = 2.78, p = 4.38E−34 in all patients; log FC = 2.66, p = 2.07E−57 in the ER-positive subgroup) and in non-mutant tumors (log FC = 1.76, p = 2.11E−21 in all patients; log FC = 1.96, p = 3.24E−46 in the ER-positive subgroup). Mutant tumors also expressed GATA3 at a higher level than non-mutant tumors in the entire cohort (log FC = 1.02; p = 1.15E−12), but this difference was not detected in the ER-positive subgroup.
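The log FC values above are easier to interpret as fold changes. edgeR reports logFC on the log2 scale; assuming the values here follow that convention, a log FC of L corresponds to a 2^L-fold difference. A minimal sketch of the back-transformation using the reported GATA3 values:

```python
# Reported GATA3 log fold changes from the text (assumed log2, edgeR's convention).
reported_log_fc = {
    "mutant vs normal (all patients)": 2.78,
    "non-mutant vs normal (all patients)": 1.76,
    "mutant vs non-mutant (all patients)": 1.02,
    "mutant vs normal (ER-positive)": 2.66,
    "non-mutant vs normal (ER-positive)": 1.96,
}

# Back-transform: a log2 fold change of L corresponds to a 2**L-fold difference.
fold_change = {name: 2 ** lfc for name, lfc in reported_log_fc.items()}

for name, fc in fold_change.items():
    print(f"{name}: about {fc:.1f}-fold")
```

For example, log FC = 2.78 corresponds to roughly 6.9-fold higher GATA3 expression in mutant tumors than in normal tissue, and log FC = 1.02 to roughly a doubling between mutant and non-mutant tumors.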
A total of 4816 differentially expressed genes (DEGs) were observed between GATA3-mutant tumors and normal tissues (2476 up-regulated and 2340 down-regulated genes), and 4308 DEGs between GATA3-non-mutant tumors and normal tissues (2593 up-regulated and 1715 down-regulated genes). Finally, 907 DEGs were found between non-mutant and mutant tumors (169 up-regulated and 738 down-regulated genes) at an FDR < 0.05 and |log fold change (log FC)| > 1. In the ER-positive subgroup, 4522 (2143 up-regulated and 2379 down-regulated), 4066 (2055 up-regulated and 2011 down-regulated) and 480 genes (103 up-regulated and 377 down-regulated) were found in the mutant versus normal, non-mutant versus normal, and non-mutant versus mutant comparisons, respectively. Volcano plots are shown in Fig. 1.
The most up- and down-regulated DEGs in the three comparisons are listed in Table 4. MYH2 and CKM (in mutant versus normal and non-mutant versus normal) and SMR3B (in mutant versus non-mutant) were the top down-regulated genes. The top up-regulated genes were MUC2, S100A7A and ALDOB in mutant versus normal, non-mutant versus normal, and mutant versus non-mutant tumors, respectively.

Table 1. Results of univariate logistic regression analysis examining the association between GATA3 mutation status and clinical features. AJCC American Joint Committee on Cancer, CI confidence interval, ER estrogen receptor, IDC invasive ductal carcinoma, IHC immunohistochemistry, ILC invasive lobular carcinoma, ISH in situ hybridization, OR odds ratio, PR progesterone receptor, TNBC triple negative breast cancer.
Categories | Wild n (%) | Mutant n (%) | p value (a) | OR (95% CI) | Wild n (%) | Mutant n (%) | p value (a) | OR (95% CI)
(a) Significant p values are shown in bold. (b) According to ISH/IHC results. (c) The association between receptor status and GATA3 mutation status cannot be estimated because all GATA3-mutant tumors are also ER- and/or PR-positive. (d) The Other category includes rare tumor types (e.g. metaplastic and medullary tumors).

A Venn diagram shows the genes common to, and specific to, each comparison. As Fig. 2 shows, 389 genes in all patients and 236 genes in the ER-positive subgroup are common to the three comparisons; these genes might be involved in breast carcinogenesis and also be influenced by GATA3 mutations.
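Counting the genes shared by all three comparisons, or specific to one of them, reduces to set intersections and differences. A minimal sketch with hypothetical placeholder gene IDs (not the study's DEG lists):

```python
# Hypothetical DEG sets (placeholder gene IDs) for the three comparisons.
mutant_vs_normal = {"G1", "G2", "G3", "G4", "G5"}
nonmutant_vs_normal = {"G1", "G2", "G3", "G6"}
mutant_vs_nonmutant = {"G1", "G2", "G7", "G8"}

# Genes shared by all three comparisons (the centre of the Venn diagram).
common = mutant_vs_normal & nonmutant_vs_normal & mutant_vs_nonmutant

# Genes found only in the mutant-vs-non-mutant comparison.
only_mutation_related = mutant_vs_nonmutant - mutant_vs_normal - nonmutant_vs_normal

print(sorted(common))                 # ['G1', 'G2']
print(sorted(only_mutation_related))  # ['G7', 'G8']
```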
Functional annotation analysis of differentially expressed genes. To gain insight into the functions of the DEGs between normal and tumor (mutant and non-mutant) samples, gene set enrichment analysis was performed using the Metascape and DAVID functional enrichment tools. Of the 36 pathways that DAVID found to differ significantly between GATA3-mutant and normal samples (p ≤ 0.05), 7 had previously been reported as among the most important pathways in breast cancer [20][21][22]. Evaluation of […] Table S3.

Table 3. Results of the univariate and multivariable Cox regression analysis for GATA3 mutation status. CI confidence interval, HR hazard ratio.

PPI network and module analysis.
To better understand the biological relationships between breast cancer-related genes, genes sharing the same breast cancer-related GO term were examined in the STRING database. In all patients, 116 and 95 genes (proteins), and in the ER-positive subgroup, 142 and 191 genes, matched the database and were used to construct the PPI networks between GATA3-mutant tumors and normal tissues (Fig. 3) and between GATA3-non-mutant tumors and normal tissues (Fig. 4), respectively. Nodes with the highest topology scores, calculated by three centrality methods, were considered hub nodes. Interleukin 6 (IL6) had the highest score under all three centrality methods in both the normal versus mutant and normal versus non-mutant comparisons, in all patients as well as in the ER-positive subgroup. PPI analysis did not find any prominent network when the mutant and non-mutant groups were compared, which may be due to the limited number of identified gene sets.

Discussion
Cancer, as a multifactorial disease with complex pathological features, is influenced by genetic factors, among which somatic mutations are some of the most important and best known. Advances in technology and growing knowledge of mutation characteristics have confirmed the role of somatic mutations in tumor development and cancer progression. In this study, we focused on a gene with known roles in breast cancer, GATA3 8,16, using the large-scale data obtained by the TCGA project 18. In this cohort, the frequency of somatic mutations in GATA3 was 14.15%. As previously reported, GATA3 is one of three genes somatically mutated in more than 10% of breast cancer patients 23. Analysis of clinical factors in relation to the GATA3 somatic mutations reported in the TCGA-BRCA project revealed that GATA3 mutations were associated with several clinical features and pathological subtypes of breast cancer. Differential gene expression analysis also identified distinct expression patterns in normal samples and in GATA3-mutant and non-mutant tumor tissues, in the entire cohort as well as in the ER-positive cases. Furthermore, three pathways differed significantly between GATA3-mutant and non-mutant tumors.
Our results suggest that patients with GATA3-mutant tumors were significantly younger than those without GATA3 mutations. A previous report indicated that younger luminal B cases had GATA3 mutations more frequently than older patients 24, and this finding has also been validated in metastatic breast cancer patients 27. Since younger ER-positive patients have a poorer prognosis 28, the higher rate of GATA3 mutations in this group may be clinically important.
Our results also suggest an association between GATA3 mutation status and tumor size in the TCGA dataset. Mutational load has previously been reported to correlate with tumor size in breast cancer patients 29, so a higher rate of GATA3 mutation might be expected in larger tumors. Furthermore, rare tumor types (mixed histology, mucinous carcinoma and medullary carcinoma) were more frequent among GATA3-mutant tumors (Table 1). In contrast, after adjustment, and also in the ER-positive subgroup, mutation status differed significantly between ILC and IDC but not between the rare types and IDC (Table 2). These results may be affected by the small number of rare tumors compared with ductal carcinomas of the breast, but they may still highlight the impact of mutations on different features of breast tumors (Table 2). In addition, ER-positive tumors harbored almost all GATA3 somatic mutations detected in the cohort. This finding confirms previous reports of an association of GATA3 with ER-positive status and luminal differentiation, which may reflect its role in response to chemotherapy 30. A study has also shown that GATA3 up-regulates and stabilizes ER mRNA transcription 31, whereas GATA3 expression is down-regulated by progestin-induced PR activation 32. These observations may explain the association of GATA3 mutations with luminal, hormone receptor-positive breast cancer.
We studied two aspects of GATA3: differences in its expression between mutant, non-mutant and normal tissues, and the impact of its mutations on tumor properties. In agreement with previous studies, our data support higher GATA3 expression in tumor tissues than in normal samples 16,33, and GATA3 somatic mutation status was not an independent prognostic factor for survival in the entire cohort 11,34. However, non-mutant tumors were associated with better survival in ER-positive patients. METABRIC data indicated a prognostic value of the GATA3 X308_Splice mutation, as mutant samples had better survival than wild-type ones both in all patients and in ER-positive patients 35. On the other hand, among samples with high GATA3 expression, mutant patients had longer survival than wild-type patients, and mutations in the second GATA3 zinc finger (ZnFn2) were associated with shorter survival than other mutations 36. Another study has reported that the significant association of GATA3 mutations with hormone receptor-positive status may reflect a better prognosis of the disease 17. Together, these divergent findings suggest that mutation type and other related factors should be considered jointly when assessing the association of GATA3 somatic mutations with overall survival. Differences in the number of mutant samples and in study settings may also contribute to this variation, which makes a straightforward conclusion more difficult. Regarding the higher expression in tumor samples (GATA3-mutant and non-mutant) than in normal ones, a meta-analysis confirmed the relation between GATA3 overexpression and favorable phenotypes, including ER-positive status 14. On the other hand, a cell line study indicated that the active GATA3 transcription factor promotes proliferative phenotypes and the growth of ER-positive breast cancer cell lines 37.
In addition to affecting expression level, somatic mutations may alter the binding site, influence the expression of downstream genes, and change the transcriptional network 36,38. Furthermore, a higher rate of GATA3 mutations in ER-positive patients has been observed to lead to resistance to endocrine therapy 27. Taken together, these findings suggest that the diverse activities of the GATA3 protein, which affect luminal breast epithelial cells through different pathways, may neutralize the impact of this gene on disease prognosis. MYH2, which was down-regulated in both GATA3-mutant and non-mutant samples, encodes an actin-based motor protein involved in skeletal muscle contraction. According to the Human Protein Atlas 18, MYH2 protein has not been detected in breast tissue, although low amounts of its RNA have been observed 39. Since GATA3-mutant and non-mutant tumors did not differ in MYH2 expression, its lower expression in tumor samples may result from the tumor environment. Similar to MYH2, CKM (the muscle type of creatine kinase) is down-regulated in tumor (GATA3-mutant and non-mutant) tissues. Expression of this gene at the mRNA level has previously been shown in breast samples 39, and decreased serum CK levels have been reported in breast cancer patients 40. Moreover, SMR3B (submaxillary gland androgen-regulated protein 3B) was differentially expressed between GATA3-mutant and non-mutant tumor tissues, with lower expression in mutant samples. SMR3B has previously been predicted to carry a GATA3 transcription factor binding-site motif 41, and it is more highly expressed in triple-negative breast cancer patients with poor prognosis than in low-risk patients 42. Because GATA3 regulates gene expression, this may explain the lower SMR3B expression in tumors carrying GATA3 mutations.
CSN1S1 (casein alpha S1) is another top down-regulated gene in GATA3-mutant compared with non-mutant samples in the ER-positive subgroup. Its RNA has been detected in breast tissue, but the protein has only been detected in lactating breast. Because its protein expression differs significantly between benign prostatic hyperplasia and normal or tumor prostate tissues, CSN1S1 has been proposed as a potential biomarker for early identification of benign prostatic hyperplasia 43. Moreover, CSN1S1 has been identified as a tumor suppressor that controls breast tumor growth and metastasis 44. In our data, GATA3-mutant tumors were larger, which may be related to the down-regulation of CSN1S1.
We found MUC2 over-expression in GATA3-mutant tumors compared with normal samples. MUC2 is up-regulated in mucinous carcinomas 45 and is more highly expressed in invasive breast tumors than in adjacent normal tissues. Significantly higher serum MUC2 levels have also been found in breast cancer patients than in healthy individuals 46. Furthermore, MUC2 protein is associated with shorter disease-free survival 47, and a cell line with limited MUC2 expression showed a decreased proliferation rate and a better response to chemotherapy through efficiently induced apoptosis 48. These findings support a potential role for MUC2 expression as a prognostic marker in breast cancer, although the relationship between GATA3 and MUC2 remains to be evaluated. Another up-regulated gene, S100A15 (S100A7A), encodes a calcium-binding protein and was more highly expressed in non-mutant tumors than in normal tissues. Although elevated S100A15 transcripts have been reported in ER/PR-negative breast cancers 49, an association of this gene with breast cancer prognosis has not been confirmed 50. In the ER-positive subgroup, CST5 (cystatin D) was the top up-regulated gene in non-mutant tumors compared with normal tissues. This gene is down-regulated in colon cancer 51, and its induction by calcitriol can inhibit the growth of breast cancer cells 52. The comparison between mutant and non-mutant tumors showed that aldolase B (ALDOB), a glycolytic enzyme, was up-regulated in GATA3-mutant samples, whereas tumors did not show differential expression of ALDOB compared with normal tissues. Previous studies have reported decreased ALDOB levels in several cancers 53,54. The higher expression of ALDOB in GATA3-mutant breast tumors may therefore result from shared regulatory pathways, which needs to be confirmed by functional and gene-gene interaction analyses.
Furthermore, according to the Venn diagram, 75 genes in the entire cohort and 46 in the ER-positive subgroup were differentially expressed between GATA3-mutant and non-mutant tumors, which may indicate the impact of GATA3 on the expression profile of tumor cells.
Among the differentially enriched pathways previously associated with breast cancer, the protein digestion and absorption pathway differed in all comparisons 55. Other pathways differed specifically between mutant and non-mutant tumors. The Wnt/β-catenin signaling pathway modulates mammary gland morphogenesis and cell properties 20 and mediates increased GATA3 expression 21. Consistent with our finding, a previous study identified WNT/β-catenin signaling as an enriched gene set in GATA3 X308_Splice-mutant breast tumors 35. Cell adhesion molecules (CAMs) were another gene set that differed between tumor groups. This pathway has a recognized role in breast cancer carcinogenesis and metastasis, so the genes involved may serve as diagnostic, prognostic and therapeutic targets 22,56. In addition, cell culture analysis has identified a regulatory role of GATA3 in the expression of adhesion molecules 57, so mutation-induced changes in GATA3 may alter the expression of these genes. These findings may reflect interactions between GATA3 and genes in the WNT and cell adhesion molecule pathways in the pathogenesis of breast cancer. Furthermore, we found that the systemic lupus erythematosus (SLE) pathway differed between ER-positive breast tumors and normal tissues. SLE has been shown to be influenced by estrogen-estrogen receptor-mediated signaling through the modulation of cytokine production 58. There are also reports of a lower rate of hormone-dependent cancers in SLE patients, although they may have a higher incidence of triple-negative breast cancer than the general population 59. The main finding of the protein-protein interaction analysis was that IL6 is an important hub node in the comparison between tumor and normal samples.
In line with our results indicating the contribution of this gene to different pathways, IL6 overexpression has previously been described in breast cancer 60, and many cellular functions, including oncogenesis, are influenced by IL6 61. These findings suggest a crucial role of IL6 in the pathogenesis of breast cancer and the potential value of targeting this gene in treatment.

Conclusion
In conclusion, our results suggest that GATA3 mutation status is associated with a number of clinicopathological features, and with overall survival time only in ER-positive breast cancer. Our results also indicate a possible common biological process involving GATA3 mutations and ER/PR status, which needs to be confirmed by functional analyses. GATA3 mutations may influence the expression profile of tumor cells by altering the expression and activity of the GATA3 gene. These findings should also be confirmed using gene-gene interaction analyses and homogeneous samples.

Methods
Patients and data files. The study population consisted of female breast cancer patients in the TCGA-BRCA cohort. Information on GATA3 mutations in the tumors was retrieved from https://portal.gdc.cancer.gov and was available for 975 of the patients. Tumor mRNA expression data (level 3 data, including raw count data) were extracted from the Illuminahiseq_rnaseqV2-exon_quantification (MD5) data file at https://gdac.broadinstitute.org/. These data were available for 771 tumor (671 non-mutant and 100 mutant) and 99 normal tissues. Demographic and clinical data were obtained from the file provided by the Legacy Archive of the GDC portal at https://portal.gdc.cancer.gov/legacy-archive/files/735bc5ff-86d1-421a-8693-6e6f92055563. The study population was categorized according to standard protocols [62][63][64][65][66][67][68]. All analyses were also replicated in ER-positive samples, including 482 non-mutant and 92 mutant tumor tissues.
Computational analysis of expression profile. edgeR (http://bioconductor.org/packages/release/bioc/html/edgeR.html) is a Bioconductor software package for examining differential expression of replicated count data. It uses an over-dispersed Poisson model and empirical Bayes methods to account for both biological and technical variability and to moderate the degree of over-dispersion across transcripts 69. edgeR was used to identify DEGs between normal tissues and tumors (GATA3-mutant and non-mutant), as well as between mutant and non-mutant tumors. Genes were considered differentially expressed at a false discovery rate (FDR) < 0.05 and |log fold change (FC)| > 1.
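edgeR's `topTags` adjusts p values with the Benjamini-Hochberg method by default. Assuming that is what "FDR" refers to here, the thresholding step can be sketched as below. This does not reimplement edgeR's negative binomial model: it takes per-gene p values and logFC as given, and the gene IDs and values are hypothetical. The absolute value is applied to logFC so that down-regulated genes (negative logFC) are kept.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p values (FDR) in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down, keeping the adjusted values monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def select_degs(genes, log_fc, pvalues, fdr_cutoff=0.05, lfc_cutoff=1.0):
    """Keep genes with FDR < fdr_cutoff and |logFC| > lfc_cutoff; split by direction."""
    fdr = benjamini_hochberg(pvalues)
    up, down = [], []
    for gene, lfc, q in zip(genes, log_fc, fdr):
        if q < fdr_cutoff and abs(lfc) > lfc_cutoff:
            (up if lfc > 0 else down).append(gene)
    return up, down


genes = ["GENE_A", "GENE_B", "GENE_C", "GENE_D"]  # hypothetical gene IDs
log_fc = [2.1, -1.5, 0.4, 3.0]
pvalues = [0.001, 0.004, 0.0001, 0.2]
up, down = select_degs(genes, log_fc, pvalues)
print(up, down)  # ['GENE_A'] ['GENE_B']
```

GENE_C passes the FDR cutoff but not the fold-change cutoff, and GENE_D has a large fold change but fails the FDR cutoff.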

Functional annotation of differentially expressed genes (DEGs). The proteins encoded by DEGs were analyzed and annotated using Metascape, "A Gene Annotation and Analysis Resource" that can be used to analyze multi-platform OMICs data (http://metascape.org/gp/index.html), and DAVID, the "Database for Annotation, Visualization and Integrated Discovery" (https://david.ncifcrf.gov/) [70][71][72], to perform gene set enrichment analysis of Gene Ontology (GO) terms and pathways. DAVID pathway output is based on KEGG (Kyoto Encyclopedia of Genes and Genomes). Only terms with a modified Fisher exact p value (EASE score) ≤ 0.05 were considered significant. Metascape is a web-based portal useful for functional annotation of genes 73.
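DAVID's modified Fisher exact p value (the EASE score) is a one-sided Fisher exact test in which one gene is removed from the list genes that fall in the term, which penalizes terms supported by very few genes. The sketch below follows the 2x2 layout of DAVID's documented example as we understand it (list genes in and outside the term versus background genes in and outside the term); treat that layout, the function names, and the illustrative numbers as assumptions rather than DAVID's code.

```python
from math import comb


def fisher_one_sided(a, b, c, d):
    """One-sided Fisher exact p for enrichment: P(X >= a) in the table [[a, b], [c, d]]."""
    row1, col1, n = a + b, a + c, a + b + c + d
    total = comb(n, row1)
    # Hypergeometric upper tail; comb() returns 0 when k > n, so impossible cells vanish.
    tail = sum(comb(col1, x) * comb(n - col1, row1 - x) for x in range(a, min(row1, col1) + 1))
    return tail / total


def ease_score(list_hits, list_size, pop_hits, pop_size):
    """EASE-style score: Fisher exact test after removing one list hit (needs list_hits >= 1)."""
    a = list_hits - 1           # list genes in the term, minus one
    b = list_size - list_hits   # list genes outside the term
    c = pop_hits                # background genes in the term
    d = pop_size - pop_hits     # background genes outside the term
    return fisher_one_sided(a, b, c, d)


# Illustrative numbers: 3 of 300 list genes and 40 of 30000 background genes in a term.
print(fisher_one_sided(3, 297, 40, 29960))
print(ease_score(3, 300, 40, 30000))
```

The EASE score is always at least as large as the unmodified Fisher p value, so it is the more conservative of the two.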
Protein-protein interaction (PPI) network. DEGs (corrected p values ≤ 0.05) were imported into the search tool of STRING (v10.0, http://string-db.org/) to retrieve interacting genes/proteins, with Homo sapiens selected as the organism. STRING identifies a network of close interactions among a set of genes based on both experimental and predicted protein interactions. Three centrality measures (degree, betweenness, and closeness centrality) were used to calculate node topology scores in the PPI network with CytoNCA 74.
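The three centrality measures can be sketched for an unweighted, undirected network as below: degree is the number of neighbours, closeness is (reachable nodes - 1) divided by the sum of shortest-path distances, and betweenness counts shortest paths passing through a node (Brandes' algorithm). These are common textbook definitions; CytoNCA's exact normalization may differ, and the toy network and node names are hypothetical, not the study's PPI data.

```python
from collections import deque


def bfs_distances(adj, source):
    """Shortest-path (hop) distances from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for w in adj[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return dist


def degree_centrality(adj):
    return {v: len(nbrs) for v, nbrs in adj.items()}


def closeness_centrality(adj):
    """(reachable nodes - 1) / sum of distances; 0.0 for isolated nodes."""
    scores = {}
    for v in adj:
        dist = bfs_distances(adj, v)
        total = sum(dist.values())
        scores[v] = (len(dist) - 1) / total if total else 0.0
    return scores


def betweenness_centrality(adj):
    """Brandes' algorithm for unweighted, undirected graphs (unnormalized)."""
    bc = dict.fromkeys(adj, 0.0)
    for s in adj:
        stack, preds = [], {v: [] for v in adj}
        sigma = dict.fromkeys(adj, 0)   # number of shortest paths from s
        sigma[s] = 1
        dist = dict.fromkeys(adj, -1)
        dist[s] = 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = dict.fromkeys(adj, 0.0)
        while stack:                    # accumulate dependencies, farthest nodes first
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # Each unordered pair was counted once from each endpoint.
    return {v: score / 2 for v, score in bc.items()}


# Hypothetical toy network: HUB interacts with four proteins, and P1-P2 also interact.
edges = [("HUB", "P1"), ("HUB", "P2"), ("HUB", "P3"), ("HUB", "P4"), ("P1", "P2")]
adj = {}
for u, v in edges:
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)

print(degree_centrality(adj)["HUB"])       # 4
print(closeness_centrality(adj)["HUB"])    # 1.0
print(betweenness_centrality(adj)["HUB"])  # 5.0
```

In this toy network HUB ranks first under all three measures, which is the pattern the Results describe for a hub node.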
Statistical analysis. The demographic and clinical/molecular data examined in the statistical analyses are shown in the supplementary information file, Table S1. Variables were compared between the two groups (mutant vs. non-mutant) using Pearson's chi-squared test for categorical variables and the independent samples t test for continuous variables. Univariate logistic regression analysis was used to examine the associations of GATA3 somatic mutation status with different variables, and odds ratios (ORs) with 95% confidence intervals (CIs) are presented. Multivariable logistic regression analysis was used to identify variables independently predictive of GATA3 mutation status. For this purpose, covariates with p values ≤ 0.05 in the univariate analysis were entered into a multivariable model, excluding rare variables (ER status and hormone receptor status). In addition, because menopausal status and age at diagnosis were highly associated, menopausal status (which had more missing data than age at diagnosis) was excluded from the multivariable model. Overall survival (OS) time was defined as the time from diagnosis to death or last contact. Associations between variables and OS were examined using Kaplan-Meier plots with the log-rank test and Cox proportional hazards regression. Results of the univariate Cox regression analysis were used to select the variables entered into the multivariable Cox regression models: covariates with p values < 0.05 in the univariate analysis were entered into a backward stepwise selection procedure based on the likelihood ratio (Backward-LR), excluding rare variables, such as metastasis status (pM) and history of neoadjuvant therapy, and highly correlated variables. The highly correlated pairs were menopausal status (excluded) and age at diagnosis, tumor size (pT) (excluded) and stage, and lymph node ratio (excluded) and lymph node status (pN).
As a result, age, stage, and radiation therapy status were selected for the analysis of the entire cohort. The association of GATA3 mutation status with OS was then examined in a multivariable Cox model adjusted for these clinical factors. The same process was used for the OS analysis of the ER-positive subgroup: after excluding rare variables, such as metastasis status and history of neoadjuvant therapy, and highly correlated variables including menopausal status, tumor size and lymph node ratio, the variables including age, stage, and lymph node status were selected