Abstract
The effects of somatic mutations and gene expression profiles on prognosis are well documented in cancer research. This study was conducted to evaluate the association of GATA3 somatic mutations with tumor features, survival, and expression profiles in breast cancer. Clinicopathological information was compared between TCGA-BRCA patients with GATA3-mutant and non-mutant tumors, both in all patients and in the ER-positive subgroup. Cox regression was used to evaluate the association of GATA3 mutation status with overall survival time. Differential gene expression, functional annotation, and protein–protein interaction analyses were performed using edgeR, Metascape, DAVID, STRING and CytoNCA. GATA3-mutant and non-mutant samples had significantly different clinicopathological features (p < 0.05). While GATA3 mutation status was not associated with overall survival in the entire cohort (padj = 0.52), GATA3 wild-type ER-positive cases had a better prognosis than mutant ones (padj = 0.04). GATA3 expression was higher in tumors than in normal tissues. Several pathways differed between the mutant and non-mutant groups (p < 0.05). Interleukin-6 (IL6) was the highest-scoring gene in both comparisons (normal vs. mutant and normal vs. non-mutant groups) in the entire patient cohort and in the ER-positive subgroup, suggesting an association of IL6 with breast tumorigenesis. These findings suggest that GATA3 mutations can be associated with several tumor characteristics and can influence the pattern of gene expression. However, GATA3 mutation status appears to be a prognostic factor only in ER-positive patients.
Introduction
Breast cancer, the most common type of cancer in women worldwide, is a heterogeneous disease with different pathological and molecular features and subtypes1. The disease is caused by both environmental and genetic factors2. In this regard, numerous genetic risk factors have been identified for tumor development and progression3. Except for genes with highly penetrant hereditary mutations, such as BRCA1 and BRCA24, the genetic basis of breast cancer and the effects of genetic variations on malignant transformation remain complex and require further investigation. Several studies have demonstrated that somatic mutations in oncogenes and tumor suppressor genes are major drivers of different types of breast tumors and correlate with clinicopathological characteristics of the disease, response to therapy, or prognosis5,6,7. GATA binding protein 3 (GATA3) is one of the important genes involved in breast cancer development8.
GATA3 encodes a transcription factor that belongs to the GATA family. GATA family members have two conserved zinc-finger DNA-binding domains. This transcription factor binds to the promoters of target genes through the consensus (A/T)GATA(A/G) motif9. Previous studies have demonstrated that the GATA3 protein has crucial roles in cell development and differentiation in different types of cells, including mammary tissue10. Therefore, variations in its expression can affect downstream pathways and result in changes in cellular characteristics; for example, higher GATA3 expression has been identified in hormone receptor-positive breast cancer patients11. While some data indicate that the GATA3 expression level is not an independent prognostic factor11, several researchers have reported that it is associated with better survival in breast cancer patients12,13. Low GATA3 expression in breast tumors has also been reported to correlate with larger tumor size14. Published reports suggest that GATA3-related pathways may underlie the association of this gene with some clinical features of breast cancer15,16. In light of these findings, GATA3 has been considered an important gene in breast development and cancer17. However, the role of GATA3 somatic mutations in the development of breast tumor characteristics and in patient survival outcomes, and their impact on tumor gene expression profiles, are poorly understood.
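The consensus motif mentioned above can be made concrete with a short pattern search. The following is a minimal sketch, not part of the study's analysis: it scans both strands of a DNA string for (A/T)GATA(A/G). The function names and the toy sequence are hypothetical and chosen only for illustration.

```python
import re

# Consensus GATA motif from the text: (A/T)GATA(A/G).
# The lookahead lets overlapping sites be reported as well.
GATA_MOTIF = re.compile(r"(?=([AT]GATA[AG]))")

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_gata_sites(seq: str):
    """List (strand, start, site) for every motif match.

    `start` is 0-based and refers to the strand that was scanned
    (the reverse-complemented string for the "-" strand).
    """
    seq = seq.upper()
    hits = []
    for strand, scanned in (("+", seq), ("-", reverse_complement(seq))):
        for match in GATA_MOTIF.finditer(scanned):
            hits.append((strand, match.start(), match.group(1)))
    return hits


# Hypothetical toy sequence, not a real promoter.
toy_promoter = "CCAGATAAGGTTTATCTCC"
sites = find_gata_sites(toy_promoter)
for strand, start, site in sites:
    print(strand, start, site)
```

A match on the "-" strand corresponds to the reverse-complemented motif (for example TTATCT) on the sequence as written.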
In this study, we evaluated the genomic alterations of GATA3 in breast tumors, using the data collected by TCGA18, and analyzed the associations of GATA3 somatic mutations with tumor features, patient survival, and tumor gene expression profiles to highlight the clinical importance of this gene in breast cancer.
Results
GATA3 somatic mutation status and association with clinicopathological features
In the TCGA-BRCA cohort, tumors of 975 of the 1085 female patients were evaluated for somatic mutations. Among these patients, a total of 103 different GATA3 mutations were identified in 138 patients (14.15%). Insertions were the most common mutation type (50.5%), followed by deletions (29.1%) and substitutions (20.4%). A large proportion of the mutations (74.7%) resulted in frameshifts, and the Variant Effect Predictor (VEP)19 predicted 96.3% of all mutations to have a high or moderate impact. The most frequent mutation was X309, a two-base-pair (CA) deletion/splice site mutation (chr10:g.8069470delCA, annotated as GATA3 X309_splice in the GDC portal). This mutation was detected in tumors from 21 patients (15.22% of patients with GATA3 mutations). There were 11 additional recurrent mutations, each identified in more than one patient (n = 2–8), while the remaining mutations were each detected in only one patient.
The mean age at diagnosis was 45.66 ± 13.65 and 58.77 ± 12.97 years in patients with and without GATA3 mutations, respectively (p = 0.001; Table 1). We also compared GATA3 mutation status across age categories. The proportion of patients with GATA3-mutated tumors was higher among patients diagnosed under 40 years of age than among those diagnosed after 40 years of age [20 of 89 patients under 40 years old (22.5%) and 118 of 885 patients above 40 years old (13.3%), respectively; p = 0.02]. In addition to age at diagnosis, menopausal status differed significantly between patients with and without GATA3 mutations (p = 0.00004; Table 1). Other clinicopathological characteristics were also associated with GATA3 mutation status in this patient cohort (Table 1). Pathologic tumor size differed significantly between patients with GATA3-mutant tumors and patients with wild-type GATA3 tumors (p = 0.01). A significant difference was also seen for tumor histological type. There was a strong relationship between GATA3 mutation and ER/PR status; almost none of the tumors with GATA3 mutations were ER-negative (Table 1). Additionally, in the multivariable logistic regression analysis, age at diagnosis, tumor size (pT), PR status, and histological tumor type were independently associated with GATA3 mutation status in breast cancer (Table 2).
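As an illustration of the age-category comparison, the counts reported above (20 of 89 versus 118 of 885) can be placed in a 2 × 2 table and tested with Pearson's Chi-squared test. This is a minimal sketch using only the Python standard library. It assumes no continuity correction, which the text does not specify, and the function name is hypothetical.

```python
import math


def chi2_2x2(a: int, b: int, c: int, d: int):
    """Pearson chi-squared test for the 2x2 table [[a, b], [c, d]].

    Assumption: no continuity correction is applied.
    Returns (statistic, p value) for 1 degree of freedom.
    """
    n = a + b + c + d
    stat = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # For 1 degree of freedom the chi-squared survival function
    # reduces to erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


# Counts taken from the paragraph above.
mutant_under40, total_under40 = 20, 89
mutant_over40, total_over40 = 118, 885

stat, p_value = chi2_2x2(
    mutant_under40, total_under40 - mutant_under40,
    mutant_over40, total_over40 - mutant_over40,
)
print(f"chi2 = {stat:.2f}, p = {p_value:.3f}")
```

Under these assumptions the p value rounds to the reported p = 0.02.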
We repeated these analyses in the ER-positive subgroup (Tables 1, 2). Overall, the results of this subgroup analysis were similar to those of the entire patient cohort. An interesting finding in the ER-positive subgroup was that GATA3-mutant cases were more frequent than non-mutant cases among Asian patients (Table 1).
GATA3 somatic mutations and prognosis
The median overall survival was 10.80 ± 0.7 years (11.69 ± 3.63 years in patients without GATA3 mutations and 10.61 ± 2.19 years in patients with GATA3 mutations; p = 0.73). There was no significant difference between the two groups in median survival time. The finding was similar in the ER-positive subgroup (Table 3).
Univariate Cox proportional hazard analysis indicated age at diagnosis, menopause status, lymph node ratio, history of neoadjuvant therapy and adjuvant radiation therapy to be associated with survival times in the patients. Also, several tumor characteristics including margin status, pathologic tumor size (pT), lymph node (pN), and stage were associated with overall survival (Table S2).
While the multivariable Cox regression model adjusting for prognostic factors showed that GATA3 somatic mutation status was not an independent prognostic factor for all patients (padj = 0.52), wild-type samples had a better prognosis in the ER-positive subgroup (padj = 0.04) (Table 3). In the entire patient cohort, age (padj = 0.0001), stage (padj = 7.461E−10) and radiation therapy (p = 0.003) were significantly and independently associated with overall survival time. In the ER-positive cases, age (padj = 2.411E−8) and stage (padj = 0.026) were independent factors associated with overall survival time.
Gene expression analysis
According to the TCGA expression data, GATA3 expression level was higher in GATA3-mutant (log FC = 2.78, p = 4.38E−34 in all patients and log FC = 2.66, p = 2.07E−57 in ER-positive subgroup) and non-mutant (log FC = 1.76, p = 2.11E−21 in all patients and log FC = 1.96, p = 3.24E−46 in ER-positive subgroup) tumors than normal tissues. While mutant tumors had a higher level than non-mutants (log FC = 1.02; p = 1.15E−12), this was not detected in the analysis of the ER-positive breast cancer patients.
A total of 4816 differentially expressed genes (DEGs) were observed between the GATA3-mutant and normal tissues (2476 up-regulated and 2340 down-regulated genes). Additionally, there were a total of 4308 DEGs between the GATA3-non-mutant and normal tissues (2593 up-regulated and 1715 down-regulated genes). Finally, 907 DEGs were found between the non-mutant and mutant tumors: 169 genes were up-regulated and 738 genes were down-regulated at an FDR < 0.05 and an absolute log fold change (|log FC|) > 1. In the ER-positive subgroup, 4522 (2143 up-regulated and 2379 down-regulated genes), 4066 (2055 up-regulated and 2011 down-regulated genes) and 480 genes (103 up-regulated and 377 down-regulated genes) were found in the comparisons of mutant versus normal, non-mutant versus normal and non-mutant versus mutant tumors, respectively. Volcano plots are shown in Fig. 1.
Figure 1. Volcano plots of differentially expressed genes (DEGs) in the comparisons between normal tissues and tumors (GATA3-mutant and non-mutant). Log2-fold change for (A) mutant and normal; (B) non-mutant and normal; (C) non-mutant and mutant; (D) mutant and normal in ER-positive patients; (E) non-mutant and normal in ER-positive patients; (F) non-mutant and mutant in ER-positive patients. Green dots represent significant DEGs (FDR < 0.05 and |log FC| > 1).
The most up and down-regulated DEGs in three categories of comparison are listed in Table 4. MYH2 and CKM in mutant versus normal and non-mutant versus normal and SMR3B in mutant versus non-mutant were the top down-regulated genes. The top up-regulated genes were MUC2, S100A7A and ALDOB in mutant versus normal, non-mutant versus normal and mutant versus non-mutant, respectively. The ER-positive subgroup analysis showed MUC2, CST5 and ALDOB as the top up-regulated genes and MYH2 as the top down-regulated gene between mutant versus normal, non-mutant versus normal, and CSN1S1 as the top down-regulated gene between mutants versus non-mutant samples.
The Venn diagram shows the common and specific genes in each group. As can be seen in Fig. 2, 389 and 236 genes are common to the three groups in all patients and in ER-positive patients, respectively; these genes might be involved in breast carcinogenesis and might also be influenced by GATA3 mutations.
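The regions of a three-set Venn diagram such as the one described above reduce to set intersections and differences. The sketch below is illustrative only: the function name, the keys and the placeholder gene labels are hypothetical and do not correspond to the study's DEG lists.

```python
def venn_regions(mut_vs_normal, nonmut_vs_normal, nonmut_vs_mut):
    """Split three DEG lists into common and comparison-specific sets."""
    a, b, c = set(mut_vs_normal), set(nonmut_vs_normal), set(nonmut_vs_mut)
    return {
        "common_to_all": a & b & c,
        "only_mut_vs_normal": a - b - c,
        "only_nonmut_vs_normal": b - a - c,
        "only_nonmut_vs_mut": c - a - b,
    }


# Placeholder labels, not real results.
toy_a = ["G1", "G2", "G3", "G4"]
toy_b = ["G2", "G3", "G5"]
toy_c = ["G3", "G4", "G6"]

regions = venn_regions(toy_a, toy_b, toy_c)
for name, genes in regions.items():
    print(name, sorted(genes))
```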
Functional annotation analysis of differentially expressed genes
To gain insight into the functions of the DEGs between normal and tumor (mutant and non-mutant) samples, gene set enrichment analysis was performed using the Metascape and DAVID functional enrichment tools. According to the DAVID output, 36 pathways were significantly different between GATA3-mutant and normal samples (p ≤ 0.05), 7 of which had previously been reported among the most important pathways related to breast cancer20,21,22. Comparison of non-mutant tumors with normal tissue samples indicated 37 significantly different pathways. In addition, 3 pathways (protein digestion and absorption; Wnt signalling; and cell adhesion molecules) were significantly different between mutant and non-mutant tumor tissues. Analysis of ER-positive patients indicated 37 and 36 significantly different pathways when normal samples were compared with mutant and non-mutant tumors, respectively. Furthermore, the pancreatic secretion pathway differed between mutant and non-mutant tumors. These results are shown in the supplementary information file, Table S3.
PPI network of module analysis
To gain a better understanding of the biological relationships between breast cancer-related genes, the genes that share the same GO term related to breast cancer were examined in the STRING database. Results indicated that 116 and 95 genes (proteins) for all patients and 142 and 191 for ER-positive subgroup matched the database and were used to construct the PPI network between GATA3 mutant tumor and normal tissues (Fig. 3) and between GATA3 non-mutant tumor and normal tissues, respectively (Fig. 4).
Figure 3. PPI network of breast cancer differentially expressed genes (DEGs) between normal and GATA3-mutant samples in (A) the entire patient cohort and (B) the ER-positive subgroup. Node size is proportional to the degree value: a bigger node has a larger degree. Node color reflects gene expression: up-regulated genes are shown in red and down-regulated genes in blue.
Figure 4. PPI network of breast cancer differentially expressed genes (DEGs) between normal and GATA3-non-mutant samples in (A) the entire patient cohort and (B) the ER-positive subgroup. Node size is proportional to the degree value: a bigger node has a larger degree. Node color reflects gene expression: up-regulated genes are shown in red and down-regulated genes in blue.
The nodes with the highest topology scores, as calculated by three centrality methods, were considered hub nodes. Interleukin 6 (IL6) had the highest scores in all three centrality methods in both comparisons (normal versus mutant and normal versus non-mutant) in all patients as well as in the ER-positive subgroup. In the normal versus mutant comparison, FN1, IGF1, FGF2 and LEP in all patients, and LEP and FN1 in the ER-positive subgroup, could be considered hub nodes. In the normal versus non-mutant comparison, IGF1, FGF2, FN1 and SPP1 in all patients, and FOS, FGFR, LEP and CDK1 in the ER-positive subgroup, could be considered hub nodes. PPI analysis did not find any prominent network when the mutant and non-mutant groups were compared, which may be due to the limited number of identified gene sets.
Discussion
Cancer, as a multifactorial disease with complex pathological features, is influenced by genetic factors, and somatic mutations are among the best-known of these. The role of somatic mutations in tumor development and progression has been confirmed through advances in technology and increasing knowledge about mutation characteristics. In this study, we focused on the analysis of a gene with known roles in breast cancer, GATA38,16, using the large-scale data obtained by the TCGA project18. In this cohort, the frequency of somatic mutations in GATA3 was 14.15%. As previously reported, GATA3 is one of the three genes with somatic mutations in more than 10% of all breast cancer patients23. The analysis of clinical factors in relation to the GATA3 somatic mutations reported in the TCGA-BRCA project revealed that GATA3 mutations were associated with several clinical features and pathological subtypes of breast cancer. Differential gene expression analysis also identified different patterns of expression among normal samples, GATA3-mutant tumors and non-mutant tumors in the entire cohort as well as in the ER-positive cases. Furthermore, our results showed that three pathways were significantly different between GATA3-mutant and non-mutant tumors.
Our results suggest that patients with GATA3-mutant tumors were significantly younger than patients without GATA3 mutations. A previous report indicated that younger luminal B cases had GATA3 mutations more frequently than older patients24. This finding has also been validated in metastatic breast cancer patients27. Since younger ER-positive patients have a poorer prognosis28, a higher rate of GATA3 mutations in this group may have clinical importance.
Our results suggest an association between GATA3 mutations and tumor size in the TCGA dataset. It has been previously reported that mutational load is correlated with tumor size in breast cancer patients29. Therefore, a higher rate of GATA3 mutation would be expected in larger tumors. Furthermore, a higher rate of rare tumor types (Mixed Histology, Mucinous Carcinoma and Medullary Carcinoma) was observed in association with GATA3 mutations (Table 1). Conversely, after adjustment, and also in the ER-positive group, we found a significant difference in mutation status between ILC and IDC, but not in the rare types of breast cancer (Table 2). These results may be affected by the small number of rare-type tumors in comparison with ductal carcinoma of the breast. Nevertheless, they may highlight the impact of mutations on different features of breast tumors (Table 2). In addition, our analysis showed that ER-positive tumors harbored almost all GATA3 somatic mutations detected in the patient cohort. This finding confirms previous reports showing an association of GATA3 with ER-positive status and luminal differentiation, which may reflect its role in response to chemotherapy30. A study has also shown that GATA3 up-regulates and stabilizes ER mRNA transcription31. In contrast, GATA3 expression is down-regulated by progestin-induced PR activation32. This may explain the association of GATA3 mutations with the luminal, hormone receptor-positive type of breast cancer.
We studied two aspects of GATA3: the difference in its expression between mutant, non-mutant and normal tissues, and the impact of its mutations on tumor properties. In agreement with previous studies, our data support a higher level of GATA3 expression in tumor tissues than in normal samples16,33 and the lack of importance of GATA3 somatic mutations as an independent factor in patient survival11,34. However, among ER-positive patients, non-mutant samples showed better survival than mutant ones. METABRIC data indicated a prognostic value of the GATA3 X308_Splice mutation, as the mutant samples had better survival than wild-type ones both in all patients and in ER-positive patients35. On the other hand, among samples with high GATA3 expression, patients with mutant tumors had longer survival than those with wild-type tumors, and mutations in the second GATA3 zinc-finger (ZnFn2) were associated with shorter survival time than other mutations36. Another study has also reported that the significant association of GATA3 mutations with hormone receptor-positive status may reflect a better prognosis of the disease17. Together, these differing findings suggest that mutation type and other related factors should be considered jointly when assessing the association of GATA3 somatic mutations with overall survival. Other factors, including the number of mutant samples and the study settings, may also contribute to this variation. We acknowledge that such variation makes it more difficult to reach a straightforward conclusion. Regarding the higher level of expression in tumor samples (GATA3-mutant and non-mutant) than in normal ones, a meta-analysis confirmed the relationship between GATA3 overexpression and favorable phenotypes, including ER-positive status14. On the other hand, a cell line study indicated that active GATA3 transcription factors cause proliferative phenotypes and promote the growth of ER-positive breast cancer cell lines37.
In addition to their impact on expression level, somatic mutations may affect the binding site, alter the expression of downstream genes and thereby change the transcriptional network36,38. Furthermore, it has been observed that the higher rate of GATA3 mutations in ER-positive patients may lead to resistance to endocrine therapy27. Taken together, these findings indicate that the GATA3 protein has diverse activities that affect luminal breast epithelial cells via different pathways, which may neutralize the impact of this gene on the prognosis of the disease.
MYH2, a down-regulated gene in GATA3-mutant and non-mutant samples, encodes an actin-based motor protein involved in skeletal muscle contraction. According to the Human Protein Atlas18, MYH2 protein was not detected in breast tissues; however, a low amount of RNA has been observed39. Since GATA3-mutant and non-mutant tumor samples did not differ in MYH2 expression, its lower expression in tumor samples may result from the tumor environment. Similar to MYH2, CKM (the muscle type of CK) is down-regulated in tumor (GATA3-mutant and non-mutant) tissues. Expression of this gene at the mRNA level has previously been shown in breast samples39. Furthermore, a decreased level of serum CK has been reported in breast cancer patients40. Moreover, the SMR3B gene (submaxillary gland androgen-regulated protein 3B) was differentially expressed between GATA3-mutant and non-mutant tumor tissues, with mutant samples showing a lower level of expression. It has previously been predicted that SMR3B has a GATA3 transcription factor binding site motif41, and SMR3B is expressed at higher levels in triple-negative breast cancer patients with poor prognosis than in low-risk patients42. Because the GATA3 protein regulates gene expression, the lower level of SMR3B expression in tumors carrying GATA3 mutations may be explained by this binding site. CSN1S1 (Casein Alpha S1) is another top down-regulated gene in GATA3-mutant samples compared with non-mutants in the ER-positive subgroup. Its RNA expression has been identified in breast tissue; however, the protein has only been detected in lactating breast. Because its protein expression differs significantly in benign prostate hyperplasia compared with normal and tumor prostate tissues, CSN1S1 has been reported as a potential biomarker for early identification of benign prostate hyperplasia patients43. Moreover, CSN1S1 has been identified as a tumor suppressor that controls breast tumor growth and metastasis44.
According to our findings, GATA3-mutant tumors were larger, which may be due to the down-regulation of CSN1S1.
We found MUC2 over-expression in GATA3-mutant tumors compared with normal samples. MUC2 is up-regulated in mucinous carcinomas45 and has higher expression in invasive breast tumors than in adjacent normal tissues. A significantly higher level of serum MUC2 has also been found in breast cancer patients compared with healthy people46. Furthermore, as a prognostic factor, MUC2 protein is associated with shorter disease-free survival47. Evaluation of a cell line with limited expression of MUC2 indicated a decreased rate of proliferation and a better response to chemotherapy through efficiently induced apoptosis48. These findings support a potentially prominent role of MUC2 expression as a prognostic marker in breast cancer. However, the relationship between GATA3 and MUC2 remains to be evaluated. Another up-regulated gene, S100A15 (also known as S100A7A), encodes a calcium-binding protein and had higher expression in non-mutant tumors than in normal tissues. While there is evidence of elevated S100A15 transcripts in ER/PR-negative breast cancers49, the association of this gene with breast cancer prognosis has not been confirmed50. In the ER-positive subgroup, CST5 (Cystatin D) was the top up-regulated gene between non-mutant tumors and normal tissues. This gene is down-regulated in colon cancer51, and its induction by calcitriol can also prevent the growth of breast cancer cells52. The mutant versus non-mutant comparison showed Aldolase B (ALDOB), a glycolytic enzyme, to be up-regulated in GATA3-mutant samples. However, tumor samples did not show differential expression of ALDOB in comparison with normal ones. Previous studies indicated a decreased level of ALDOB in several cancers53,54. Therefore, the higher expression of ALDOB in GATA3-mutant breast tumors may be caused by shared regulatory pathways, which needs to be confirmed by functional and gene–gene interaction analyses.
Furthermore, according to the Venn diagram, 75 genes in the entire patient group and 46 genes in the ER-positive subgroup were differentially expressed between GATA3-mutant and non-mutant tumors, which may indicate the impact of GATA3 on the expression profile of the tumor cells.
Among the differentially enriched pathways previously associated with breast cancer, the protein digestion and absorption pathway was different in all comparisons55. Other pathways were specifically different between mutant and non-mutant tumors. The Wnt/β-catenin signaling pathway modulates mammary gland morphogenesis and cell properties20 and mediates the increase of GATA3 expression21. Consistent with our finding, a previous study indicated WNT/β-catenin signaling as an enriched gene set in GATA3 X308_Splice mutant breast tumors35. Cell adhesion molecules (CAMs) was another gene set that differed between tumor samples. The role of this pathway has been recognized in the carcinogenesis and metastasis of breast cancer. Therefore, the genes involved may serve as diagnostic and prognostic markers and as therapeutic targets22,56. In addition, a regulatory role of GATA3 in the expression of adhesion molecules has been identified in cell culture analysis57. Hence, the expression of these genes may vary with the GATA3 status induced by mutations. These findings may reflect interactions between GATA3 and genes involved in the WNT and cell adhesion molecule pathways in the pathogenesis of breast cancer. Furthermore, we found that the systemic lupus erythematosus (SLE) pathway is differentially expressed between ER-positive breast tumors and normal tissues. It has been shown that SLE is influenced by estrogen and estrogen receptor-mediated signaling through the modulation of cytokine production58. There are also reports indicating a lower rate of hormone-dependent cancers in SLE patients, although they may have a higher incidence of triple-negative breast cancer than the general population59. As the main finding of the protein–protein interaction analysis, IL6 was identified as an important hub node in the comparison between tumor and normal samples.
In line with our results indicating the contribution of this gene in different pathways, IL6 overexpression has been previously described in breast cancer60. Many cellular functions including oncogenesis are influenced by IL661. These findings suggest the crucial role of IL6 in the pathogenesis of breast cancer and the importance of targeting this gene in the treatment of the disease.
Conclusion
In conclusion, our results suggest that GATA3 mutation status is associated with a number of clinicopathological features, and with overall survival time only in ER-positive breast cancer. Our results also indicate a possible common biological process involving GATA3 mutations and ER/PR status, which needs to be confirmed by functional analyses. GATA3 mutations may influence the expression profile of tumor cells through their impact on the expression and activity of the GATA3 gene. These findings should also be confirmed using gene–gene interaction analyses and homogeneous samples.
Methods
Patients and data files
The study population consisted of female breast cancer patients in the TCGA-BRCA cohort. Information on GATA3 mutations in the tumors was retrieved from https://portal.gdc.cancer.gov. This information was available for 975 of the patients. The tumor mRNA expression data (level 3 data, including raw count data) were extracted from the Illuminahiseq_rnaseqV2-exon_quantification (MD5) data file at https://gdac.broadinstitute.org/. These data were available for 771 tumor tissues (671 non-mutant and 100 mutant) and 99 normal tissues of the patients. Demographic and clinical data were obtained from the file provided by the Legacy Archive of the GDC portal at https://portal.gdc.cancer.gov/legacy-archive/files/735bc5ff-86d1-421a-8693-6e6f92055563. The study population was categorized according to standard protocols62,63,64,65,66,67,68. All analyses were also replicated in ER-positive samples, which included 482 non-mutant and 92 mutant tumor tissues.
Computational analysis of expression profile
The edgeR program (http://bioconductor.org/packages/release/bioc/html/edgeR.html) is a Bioconductor software package for examining the differential expression of replicated count data. It uses an over-dispersed Poisson model and empirical Bayes methods to account for both biological and technical variability and to moderate the degree of over-dispersion across transcripts69. This program was used to determine the DEGs between the normal tissues and the tumors (GATA3-mutant and non-mutant). Affected genes were determined based on a false discovery rate (FDR) < 0.05 and an absolute log fold change (|log FC|) > 1.
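The thresholding step can be illustrated outside of edgeR. The sketch below is not the edgeR implementation: it assumes that the FDR is obtained with the Benjamini-Hochberg adjustment (the text does not name the adjustment method) and then applies the FDR < 0.05 and |log FC| > 1 cut-offs. The function names and the toy values are hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p values, returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down so monotonicity can be enforced.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def call_degs(genes, log_fc, pvalues, fdr_cut=0.05, lfc_cut=1.0):
    """Label genes passing FDR < fdr_cut and |log FC| > lfc_cut."""
    fdr = benjamini_hochberg(pvalues)
    calls = {}
    for gene, lfc, q in zip(genes, log_fc, fdr):
        if q < fdr_cut and abs(lfc) > lfc_cut:
            calls[gene] = "up" if lfc > 0 else "down"
    return calls


# Toy values for illustration only, not study data.
toy_genes = ["g1", "g2", "g3", "g4", "g5"]
toy_log_fc = [2.5, -1.7, 1.8, -3.0, 0.2]
toy_p = [0.001, 0.008, 0.039, 0.041, 0.6]

toy_fdr = benjamini_hochberg(toy_p)
degs = call_degs(toy_genes, toy_log_fc, toy_p)
print(degs)
```

In this toy input, g3 and g4 have raw p values below 0.05 but are excluded after adjustment, which is why the filter is applied to the FDR and not to the raw p value.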
Functional annotation of differentially expressed genes (DEGs)
The proteins encoded by the DEGs were analyzed and annotated using Metascape (http://metascape.org/gp/index.html), a web-based gene annotation and analysis resource for multi-platform OMICs data73, and DAVID, the Database for Annotation, Visualization and Integrated Discovery (https://david.ncifcrf.gov/)70,71,72, to test for enrichment of gene sets, Gene Ontology (GO) terms and pathways. According to the database, the DAVID pathway output is based on KEGG (Kyoto Encyclopedia of Genes and Genomes). Only terms with a modified Fisher exact p value ≤ 0.05 were considered significant.
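The modified Fisher exact p value can be sketched as follows. This is a reading of how the DAVID documentation describes its EASE score (a one-sided Fisher exact test computed after removing one gene from the list hits), not DAVID's own code. The function names are hypothetical, and the counts are the illustrative ones used in the DAVID documentation as best recalled, not values from this study.

```python
from math import comb


def hypergeom_upper_tail(k: int, pop_total: int, pop_hits: int, list_total: int) -> float:
    """P(X >= k) when list_total genes are drawn from pop_total genes,
    of which pop_hits belong to the pathway (one-sided Fisher exact test)."""
    k = max(k, 0)
    upper = min(pop_hits, list_total)
    numerator = sum(
        comb(pop_hits, i) * comb(pop_total - pop_hits, list_total - i)
        for i in range(k, upper + 1)
    )
    return numerator / comb(pop_total, list_total)


def ease_score(list_hits: int, list_total: int, pop_hits: int, pop_total: int) -> float:
    """Assumed EASE definition: Fisher exact test with one hit removed."""
    return hypergeom_upper_tail(list_hits - 1, pop_total, pop_hits, list_total)


# Illustrative counts: 3 of 300 listed genes fall in a pathway that
# contains 40 of 30000 background genes.
fisher_p = hypergeom_upper_tail(3, 30000, 40, 300)
ease_p = ease_score(3, 300, 40, 30000)
print(f"Fisher p = {fisher_p:.4f}, EASE = {ease_p:.4f}")
```

Removing one hit makes the test more conservative, so a term supported by only a few genes is less likely to pass the p ≤ 0.05 threshold.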
Protein–protein interaction (PPI) network
DEGs (corrected p values ≤ 0.05) were imported into the search tool of STRING (v10.0, http://string-db.org/) to retrieve interacting genes/proteins, with Homo sapiens selected as the organism. STRING identifies a network of close interactions among a set of genes based on experimental as well as predicted protein interactions. Three methods (degree centrality, betweenness centrality, and closeness centrality) were used to calculate the topology scores of the nodes in the PPI network using CytoNCA74.
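The three topology scores can be illustrated on a small undirected, unweighted graph. The sketch below is not CytoNCA: it uses plain breadth-first search for closeness and Brandes' algorithm for betweenness, and CytoNCA may normalize scores differently. The edge list and node labels are hypothetical.

```python
from collections import deque


def build_adjacency(edges):
    """Undirected adjacency sets from a list of (u, v) pairs."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def degree_centrality(adj):
    """Number of neighbours of each node."""
    return {node: len(nbrs) for node, nbrs in adj.items()}


def closeness_centrality(adj):
    """(reachable nodes) / (sum of shortest-path distances) per node."""
    scores = {}
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total = sum(dist.values())
        scores[source] = (len(dist) - 1) / total if total else 0.0
    return scores


def betweenness_centrality(adj):
    """Unnormalized betweenness via Brandes' algorithm."""
    scores = dict.fromkeys(adj, 0.0)
    for source in adj:
        stack, dist = [], {source: 0}
        preds = {node: [] for node in adj}
        sigma = dict.fromkeys(adj, 0)  # number of shortest paths from source
        sigma[source] = 1
        queue = deque([source])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = dict.fromkeys(adj, 0.0)
        while stack:
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != source:
                scores[w] += delta[w]
    # Each unordered pair was counted from both endpoints.
    return {node: value / 2 for node, value in scores.items()}


# Hypothetical toy network: one hub with three neighbours and a tail.
toy_edges = [("HUB", "A"), ("HUB", "B"), ("HUB", "C"), ("C", "D")]
adj = build_adjacency(toy_edges)
degree = degree_centrality(adj)
closeness = closeness_centrality(adj)
betweenness = betweenness_centrality(adj)
print(max(degree, key=degree.get))
```

A node that ranks first under all three scores, as the node labelled HUB does in this toy network, is the kind of node treated as a hub in the analysis described above.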
Statistical analysis
Demographic and clinical/molecular data that were examined during statistical analyses are shown in the supplementary information file, Table S1. Variables were compared between the two groups (mutant vs. non-mutant) using Pearson's Chi-squared test for categorical variables and the independent-samples t test for continuous variables. Univariate logistic regression analysis was used to examine the associations of GATA3 somatic mutation status with different variables, and the odds ratios (OR) and 95% confidence intervals (CIs) were presented. Multivariable logistic regression analysis was used to identify the variables that were independently predictive of GATA3 mutation status. For this purpose, covariates with p values ≤ 0.05 in the univariate analysis were entered into a multivariable model, excluding the rare variables (ER status and hormone receptor status). In addition, menopausal status and age at diagnosis were highly associated; thus, menopausal status (which had more missing data than age at diagnosis) was excluded from the multivariable model.
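For a single binary covariate, the odds ratio from a univariate logistic regression equals the cross-product ratio of the corresponding 2 × 2 table, which allows a small worked example. The sketch below assumes a Wald-type 95% CI on the log odds scale and uses the age-category counts reported in the Results (20 of 89 patients under 40 years and 118 of 885 patients above 40 years with GATA3 mutations). It is an illustration only and does not reproduce any value from Tables 1 or 2; the function name is hypothetical.

```python
import math


def odds_ratio_wald_ci(a: int, b: int, c: int, d: int, z: float = 1.959964):
    """Odds ratio and Wald CI for the 2x2 table [[a, b], [c, d]].

    a, b: mutant / non-mutant counts in the exposed group
    c, d: mutant / non-mutant counts in the reference group
    z:    normal quantile (1.959964 gives a 95% interval)
    """
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Under 40 years: 20 mutant, 69 non-mutant.
# Above 40 years: 118 mutant, 767 non-mutant.
or_age, ci_low, ci_high = odds_ratio_wald_ci(20, 69, 118, 767)
print(f"OR = {or_age:.2f} (95% CI {ci_low:.2f} to {ci_high:.2f})")
```

A confidence interval that excludes 1 corresponds to a significant association at the 0.05 level in the univariate setting.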
Overall survival (OS) time was defined as the time from diagnosis until death or last contact. Associations between variables and OS were examined using Kaplan–Meier plots, the log-rank test and Cox proportional hazards regression. The results of the univariate Cox regression analysis were used to select the variables to be entered into the multivariable Cox regression models. For this purpose, covariates with p values less than 0.05 in the univariate analysis were entered into a covariate selection method (Backward-LR), excluding rare variables, such as metastasis status (pM) and history of neoadjuvant therapy, and highly correlated variables. Among the highly correlated pairs, menopausal status was excluded in favor of age at diagnosis, tumor size (pT) in favor of stage, and lymph node ratio in favor of lymph node status (pN). As a result, age, stage, and radiation therapy status were selected for the analysis of the entire cohort. The association of GATA3 mutation status with OS was then examined in a multivariable Cox model after adjusting for these clinical factors. The OS analysis for the ER-positive subgroup followed the same process. After excluding rare variables, such as metastasis status and history of neoadjuvant therapy, and highly correlated variables, including menopausal status, tumor size and lymph node ratio, the variables age, stage, and lymph node status were selected for assessing the association of GATA3 mutations with OS in a multivariable Cox model. The hazard ratios (HR) and 95% CIs were calculated by the Cox models.
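The Kaplan–Meier estimate that underlies the survival plots and the median survival time can be sketched in a few lines. The following minimal example is not the SPSS procedure used in the study. The function names and the toy follow-up times are hypothetical, with an event flag of 1 for death and 0 for censoring at last contact.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier estimate as a list of (time, survival) pairs,
    with one entry per distinct time at which a death occurred."""
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0
        # Group all subjects sharing the same follow-up time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= leaving
    return curve


def median_survival(curve):
    """First time at which estimated survival is 0.5 or lower, else None."""
    for t, s in curve:
        if s <= 0.5:
            return t
    return None


# Toy follow-up in years; 1 = death, 0 = censored. Not study data.
toy_times = [2, 3, 3, 5, 8]
toy_events = [1, 1, 0, 1, 0]

curve = kaplan_meier(toy_times, toy_events)
median = median_survival(curve)
print(curve, median)
```

Censored patients contribute to the number at risk until their last contact but do not lower the curve, which is why the estimate differs from a simple proportion of deaths.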
A p value < 0.05 was considered significant. All statistical analyses were performed using SPSS 16.0 (IBM, USA).
Ethical approval
This article does not contain any studies with human participants performed by any of the authors.
Data availability
The data belong to the TCGA Research Network and are available at https://www.cancer.gov/tcga.
References
Ferlay, J. et al. Cancer incidence and mortality worldwide: sources, methods and major patterns in GLOBOCAN 2012. Int. J. Cancer 136, E359-386. https://doi.org/10.1002/ijc.29210 (2015).
Rudolph, A., Chang-Claude, J. & Schmidt, M. K. Gene–environment interaction and risk of breast cancer. Br. J. Cancer 114, 125–133. https://doi.org/10.1038/bjc.2015.439 (2016).
Encinas, G. et al. Somatic mutations in breast and serous ovarian cancer young patients: a systematic review and meta-analysis. Rev. Assoc. Med. Bras. 61, 474–483. https://doi.org/10.1590/1806-9282.61.05.474 (2015).
Roy, R., Chun, J. & Powell, S. N. BRCA1 and BRCA2: different roles in a common pathway of genome protection. Nat. Rev. Cancer 12, 68–78. https://doi.org/10.1038/nrc3181 (2012).
Nik-Zainal, S. et al. Landscape of somatic mutations in 560 breast cancer whole-genome sequences. Nature 534, 47–54. https://doi.org/10.1038/nature17676 (2016).
Pereira, B. et al. Erratum: The somatic mutation profiles of 2,433 breast cancers refine their genomic and transcriptomic landscapes. Nat. Commun. 7, 11908. https://doi.org/10.1038/ncomms11908 (2016).
Usary, J. et al. Mutation of GATA3 in human breast tumors. Oncogene 23, 7669–7678. https://doi.org/10.1038/sj.onc.1207966 (2004).
Takaku, M., Grimm, S. A. & Wade, P. A. GATA3 in breast cancer: tumor suppressor or oncogene?. Gene Expr. 16, 163–168. https://doi.org/10.3727/105221615x14399878166113 (2015).
Chou, J., Provot, S. & Werb, Z. GATA3 in development and cancer differentiation: cells GATA have it!. J. Cell. Physiol. 222, 42–49. https://doi.org/10.1002/jcp.21943 (2010).
Miettinen, M. et al. GATA3: a multispecific but potentially useful marker in surgical pathology: a systematic analysis of 2500 epithelial and nonepithelial tumors. Am. J. Surg. Pathol. 38, 13–22. https://doi.org/10.1097/PAS.0b013e3182a0218f (2014).
Voduc, D., Cheang, M. & Nielsen, T. GATA-3 expression in breast cancer has a strong association with estrogen receptor but lacks independent prognostic value. Cancer Epidemiol. Biomark. Prevent. 17, 365–373. https://doi.org/10.1158/1055-9965.epi-06-1090 (2008).
Cakir, A. et al. GATA3 expression and its relationship with clinicopathological parameters in invasive breast carcinomas. Pathol. Res. Pract. 213, 227–234. https://doi.org/10.1016/j.prp.2016.12.010 (2017).
Gonzalez, R. S. et al. GATA-3 expression in male and female breast cancers: comparison of clinicopathologic parameters and prognostic relevance. Hum. Pathol. 44, 1065–1070. https://doi.org/10.1016/j.humpath.2012.09.010 (2013).
Guo, Y. et al. Prognostic and clinicopathological value of GATA binding protein 3 in breast cancer: a systematic review and meta-analysis. PLoS ONE 12, e0174843. https://doi.org/10.1371/journal.pone.0174843 (2017).
Albergaria, A. et al. Expression of FOXA1 and GATA-3 in breast cancer: the prognostic significance in hormone receptor-negative tumours. Breast Cancer Res. 11, R40. https://doi.org/10.1186/bcr2327 (2009).
Yoon, N. K. et al. Higher levels of GATA3 predict better survival in women with breast cancer. Hum. Pathol. 41, 1794–1801. https://doi.org/10.1016/j.humpath.2010.06.010 (2010).
Gustin, J. P. et al. GATA3 frameshift mutation promotes tumor growth in human luminal breast cancer cells and induces transcriptional changes seen in primary GATA3 mutant breast cancers. Oncotarget 8, 103415–103427. https://doi.org/10.18632/oncotarget.21910 (2017).
Weinstein, J. N. et al. The Cancer Genome Atlas Pan-Cancer analysis project. Nat. Genet. 45, 1113–1120. https://doi.org/10.1038/ng.2764 (2013).
McLaren, W. et al. The ensembl variant effect predictor. Genome Biol. 17, 122. https://doi.org/10.1186/s13059-016-0974-4 (2016).
Rangel, M. C. et al. Developmental signaling pathways regulating mammary stem cells and contributing to the etiology of triple-negative breast cancer. Breast Cancer Res. Treat. 156, 211–226. https://doi.org/10.1007/s10549-016-3746-7 (2016).
Wang, L. & Di, L.-J. Wnt/β-catenin mediates AICAR effect to increase GATA3 expression and inhibit adipogenesis. J. Biol. Chem. 290, 19458–19468. https://doi.org/10.1074/jbc.M115.641332 (2015).
Rossetti, C. et al. Adhesion molecules in breast carcinoma: a challenge to the pathologist. Rev. Assoc. Méd. Bras. 61, 81–85 (2015).
Koboldt, D. C. et al. Comprehensive molecular portraits of human breast tumours. Nature 490, 61–70. https://doi.org/10.1038/nature11412 (2012).
Griffith, O. L. et al. The prognostic effects of somatic mutations in ER-positive breast cancer. Nat. Commun. 9, 3476. https://doi.org/10.1038/s41467-018-05914-x (2018).
Wang, M.-X., Ren, J.-T., Tang, L.-Y. & Ren, Z.-F. Molecular features in young vs elderly breast cancer patients and the impacts on survival disparities by age at diagnosis. Cancer Med. 7, 3269–3277. https://doi.org/10.1002/cam4.1544 (2018).
Jiang, Y.-Z., Yu, K.-D., Zuo, W.-J., Peng, W.-T. & Shao, Z.-M. GATA3 mutations define a unique subtype of luminal-like breast cancer with improved survival. Cancer 120, 1329–1337. https://doi.org/10.1002/cncr.28566 (2014).
Azim, H. A. Jr., Nguyen, B., Brohée, S., Zoppoli, G. & Sotiriou, C. Genomic aberrations in young and elderly breast cancer patients. BMC Med. 13, 266. https://doi.org/10.1186/s12916-015-0504-3 (2015).
Cancello, G. et al. Prognosis and adjuvant treatment effects in selected breast cancer subtypes of very young women (<35 years) with operable breast cancer. Ann. Oncol. 21, 1974–1981. https://doi.org/10.1093/annonc/mdq072 (2010).
Budczies, J. et al. Classical pathology and mutational load of breast cancer—integration of two worlds. J. Pathol. Clin. Res. 1, 225–238. https://doi.org/10.1002/cjp2.25 (2015).
Tominaga, N. et al. Clinicopathological analysis of GATA3-positive breast cancers with special reference to response to neoadjuvant chemotherapy. Ann. Oncol. 23, 3051–3057. https://doi.org/10.1093/annonc/mds120 (2012).
Hostetter, C., Licata, L. & Keen, J. A role for GATA-3 in control of estrogen receptor alpha expression. Can. Res. 69, 3050. https://doi.org/10.1158/0008-5472.sabcs-3050 (2009).
Izzo, F. et al. Progesterone receptor activation downregulates GATA3 by transcriptional repression and increased protein turnover promoting breast tumor growth. Breast Cancer Res. 16, 491. https://doi.org/10.1186/s13058-014-0491-x (2014).
Liu, H., Wilkerson, M. L., Lin, F. & Shi, J. Immunohistochemical evaluation of GATA3 expression in tumors and normal tissues: a useful immunomarker for breast and urothelial carcinomas. Am. J. Clin. Pathol. 138, 57–64. https://doi.org/10.1309/ajcp5uafmsa9zqbz (2012).
McCleskey, B. C. et al. GATA3 expression in advanced breast cancer: prognostic value and organ-specific relapse. Am. J. Clin. Pathol. 144, 756–763. https://doi.org/10.1309/ajcp5mmr1fjvvtpk (2015).
Hruschka, N. et al. The GATA3 X308_Splice breast cancer mutation is a hormone context-dependent oncogenic driver. bioRxiv https://doi.org/10.1101/664367 (2019).
Takaku, M. et al. GATA3 zinc finger 2 mutations reprogram the breast cancer transcriptional network. Nat. Commun. 9, 1059. https://doi.org/10.1038/s41467-018-03478-4 (2018).
Emmanuel, N. et al. Mutant GATA3 actively promotes the growth of normal and malignant mammary cells. Anticancer Res. 38, 4435–4441. https://doi.org/10.21873/anticanres.12745 (2018).
Mair, B. & Konopka, T. Gain- and loss-of-function mutations in the breast cancer gene GATA3 result in differential drug sensitivity. PLoS Genet. 12, e1006279. https://doi.org/10.1371/journal.pgen.1006279 (2016).
Uhlen, M. et al. A human protein atlas for normal and cancer tissues based on antibody proteomics. Mol. Cell. Proteomics 4, 1920–1932. https://doi.org/10.1074/mcp.M500279-MCP200 (2005).
Pan, H. et al. Low serum creatine kinase levels in breast cancer patients: a case-control study. PLoS ONE 8, e62112. https://doi.org/10.1371/journal.pone.0062112 (2013).
Rouillard, A. D. et al. The harmonizome: a collection of processed datasets gathered to serve and mine knowledge about genes and proteins. Database 7, 8. https://doi.org/10.1093/database/baw100 (2016).
Lv, X. et al. Identification of potential key genes and pathways predicting pathogenesis and prognosis for triple-negative breast cancer. Cancer Cell Int. 19, 172. https://doi.org/10.1186/s12935-019-0884-0 (2019).
Xu, K., Ling, M. T., Wang, X. & Wong, Y. C. Evidence of a novel biomarker, αs1-Casein, a milk protein, in benign prostate hyperplasia. Prostate Cancer Prostatic Dis. 9, 293–297. https://doi.org/10.1038/sj.pcan.4500872 (2006).
Bonuccelli, G. et al. The milk protein α-casein functions as a tumor suppressor via activation of STAT1 signaling, effectively preventing breast cancer tumor growth and metastasis. Cell Cycle 11, 3972–3982. https://doi.org/10.4161/cc.22227 (2012).
Rakha, E. A. et al. Expression of mucins (MUC1, MUC2, MUC3, MUC4, MUC5AC and MUC6) and their prognostic significance in human breast cancer. Mod. Pathol. 18, 1295–1304. https://doi.org/10.1038/modpathol.3800445 (2005).
Bademler, S. et al. Clinical significance of serum membrane-bound mucin-2 levels in breast cancer. Biomolecules 9, 40. https://doi.org/10.3390/biom9020040 (2019).
Walsh, M. D., McGuckin, M. A., Devine, P. L., Hohn, B. G. & Wright, R. G. Expression of MUC2 epithelial mucin in breast carcinoma. J. Clin. Pathol. 46, 922–925. https://doi.org/10.1136/jcp.46.10.922 (1993).
Astashchanka, A., Shroka, T. M. & Jacobsen, B. M. Mucin 2 (MUC2) modulates the aggressiveness of breast cancer. Breast Cancer Res. Treat. 173, 289–299. https://doi.org/10.1007/s10549-018-4989-2 (2019).
Wolf, R. et al. Highly homologous hS100A15 and hS100A7 proteins are distinctly expressed in normal breast tissue and breast cancer. Cancer Lett. 277, 101–107. https://doi.org/10.1016/j.canlet.2008.11.032 (2009).
Cancemi, P. et al. A multiomics analysis of S100 protein family in breast cancer. Oncotarget 9, 29064–29081. https://doi.org/10.18632/oncotarget.25561 (2018).
Alvarez-Díaz, S. et al. Cystatin D is a candidate tumor suppressor gene induced by vitamin D in human colon cancer cells. J. Clin. Investig. 119, 2343–2358. https://doi.org/10.1172/jci37205 (2009).
University of Medicine and Dentistry of New Jersey (UMDNJ). Vitamin D Found To Stimulate A Protein That Inhibits The Growth Of Breast Cancer Cells. https://www.sciencedaily.com/releases/2009/02/090204172437.htm. Accessed 21 Feb 2018.
Asaka, M. et al. Alteration of aldolase isozymes in serum and tissues of patients with cancer and other diseases. J. Clin. Lab. Anal. 8, 144–148. https://doi.org/10.1002/jcla.1860080306 (1994).
He, J. et al. Downregulation of ALDOB is associated with poor prognosis of patients with gastric cancer. Onco Targets Ther. 9, 6099–6109. https://doi.org/10.2147/OTT.S110203 (2016).
Akkiprik, M. et al. Identification of differentially expressed IGFBP5-related genes in breast cancer tumor tissues using cDNA microarray experiments. Genes (Basel) 6, 1201–1214. https://doi.org/10.3390/genes6041201 (2015).
Saadatmand, S. et al. Expression of cell adhesion molecules and prognosis in breast cancer. Br. J. Surg. 100, 252–260. https://doi.org/10.1002/bjs.8980 (2013).
Kim, K. S., Kim, J., Oh, N., Kim, M. Y. & Park, K. S. ELK3-GATA3 axis modulates MDA-MB-231 metastasis by regulating cell-cell adhesion-related genes. Biochem. Biophys. Res. Commun. 498, 509–515. https://doi.org/10.1016/j.bbrc.2018.03.011 (2018).
Kassi, E. & Moutsatsou, P. Estrogen receptor signaling and its relationship to cytokines in systemic lupus erythematosus. J. Biomed. Biotechnol. 2010, 317452. https://doi.org/10.1155/2010/317452 (2010).
Chan, K. et al. Breast cancer in systemic lupus erythematosus (SLE): receptor status and treatment. Lupus 27, 120–123. https://doi.org/10.1177/0961203317713146 (2018).
Kozlowski, L., Zakrzewska, I., Tokajuk, P. & Wojtukiewicz, M. Z. Concentration of interleukin-6 (IL-6), interleukin-8 (IL-8) and interleukin-10 (IL-10) in blood serum of breast cancer patients. Rocz. Akad. Med. Bialymst. 1995(48), 82–84 (2003).
Dethlefsen, C., Hojfeldt, G. & Hojman, P. The role of intratumoral and systemic IL-6 in breast cancer. Breast Cancer Res. Treat. 138, 657–664. https://doi.org/10.1007/s10549-013-2488-z (2013).
National Institutes of Health. Racial and Ethnic Categories and Definitions for NIH Diversity Programs and for Other Reporting Purposes. https://grants.nih.gov/grants/guide/notice-files/not-od-15-089.html. Accessed 21 Feb 2018 (2015).
Breast Cancer Rates by Race and Ethnicity. Center for Disease Control and Prevention. https://www.cdc.gov/cancer/breast/statistics/race.htm. Accessed 21 Feb 2018.
Houssami, N., Macaskill, P., Marinovich, M. L. & Morrow, M. The association of surgical margins and local recurrence in women with early-stage invasive breast cancer treated with breast-conserving therapy: a meta-analysis. Ann. Surg. Oncol. 21, 717–730. https://doi.org/10.1245/s10434-014-3480-5 (2014).
Lakhani, S. et al. WHO Classification of Tumours of the Breast 4th edn. (IARC Press, Lyon, 2012).
Reeves, G. K., Pirie, K., Green, J., Bull, D. & Beral, V. Reproductive factors and specific histological types of breast cancer: prospective study and meta-analysis. Br. J. Cancer 100, 538–544. https://doi.org/10.1038/sj.bjc.6604853 (2009).
Tseng, L. A. et al. The association of menopausal status with physical function: the Study of Women’s Health Across the Nation (SWAN): menopausal status and physical function. Menopause (New York, N.Y.) 19, 1186–1192. https://doi.org/10.1097/gme.0b013e3182565740 (2012).
Vinh-Hung, V. et al. Lymph node ratio as an alternative to pN staging in node-positive breast cancer. J. Clin. Oncol. 27, 1062–1068. https://doi.org/10.1200/jco.2008.18.6965 (2009).
Robinson, M. D., McCarthy, D. J. & Smyth, G. K. edgeR: a Bioconductor package for differential expression analysis of digital gene expression data. Bioinformatics (Oxford, England) 26, 139–140. https://doi.org/10.1093/bioinformatics/btp616 (2010).
Huang, D. W., Sherman, B. T. & Lempicki, R. A. Systematic and integrative analysis of large gene lists using DAVID bioinformatics resources. Nat. Protoc. 4, 44–57. https://doi.org/10.1038/nprot.2008.211 (2009).
da Huang, W., Sherman, B. T. & Lempicki, R. A. Bioinformatics enrichment tools: paths toward the comprehensive functional analysis of large gene lists. Nucleic Acids Res. 37, 1–13. https://doi.org/10.1093/nar/gkn923 (2009).
Dennis, G. et al. DAVID: database for annotation, visualization, and integrated discovery. Genome Biol. 4, R60. https://doi.org/10.1186/gb-2003-4-9-r60 (2003).
Zhou, Y. et al. Metascape provides a biologist-oriented resource for the analysis of systems-level datasets. Nat. Commun. 10, 1523. https://doi.org/10.1038/s41467-019-09234-6 (2019).
Tang, Y., Li, M., Wang, J., Pan, Y. & Wu, F. X. CytoNCA: a cytoscape plugin for centrality analysis and evaluation of protein interaction networks. Biosystems 127, 67–72. https://doi.org/10.1016/j.biosystems.2014.11.005 (2015).
Acknowledgments
This work was mainly conducted during a sabbatical period at the Discipline of Genetics, Faculty of Medicine, Memorial University, St. John's, Canada (FA), with financial support from the Mashhad University of Medical Sciences, Mashhad, Iran. The authors gratefully acknowledge the use of data generated by the TCGA Research Network (https://www.cancer.gov/tcga).
Funding
This work was financially supported by Mashhad University of Medical Sciences under grant 931185.
Author information
Contributions
F. A. and S. S. contributed to the design of the work. F. A., S. S., A. S. and A. P. contributed to data analysis and interpretation. F. A. and S. S. contributed to drafting and editing the article. A. P. contributed to critical revision of the article and approving the final version of this paper. All authors also participated in the finalization of the manuscript and approved the final draft.
Ethics declarations
Competing interests
The authors declare no competing interests.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Afzaljavan, F., Sadr, A.S., Savas, S. et al. GATA3 somatic mutations are associated with clinicopathological features and expression profile in TCGA breast cancer patients. Sci Rep 11, 1679 (2021). https://doi.org/10.1038/s41598-020-80680-9
DOI: https://doi.org/10.1038/s41598-020-80680-9