Introduction

Breast cancer is the second most commonly diagnosed cancer worldwide, accounting for over 2 million new cases in 2018 [1]. African women have a lower lifetime incidence of breast cancer than women from North America or Europe; however, age-standardized mortality is poorer, particularly for women from Western Africa [1]. This likely reflects a combination of environmental and systemic factors, as well as underlying differences in tumor biology. Women from Sub-Saharan Africa tend to have earlier breast cancer presentation [2, 3]. Histologically, invasive breast carcinomas from this population are frequently basal-like and are negative for estrogen receptor (ER), progesterone receptor (PR), and HER2/neu overexpression and amplification (i.e., triple-negative) [4]. Similarly, African-American (AA) women, who are thought to share African ancestry, have significantly poorer outcomes than white patients after accounting for age, stage, and socioeconomic status, and are more likely to be diagnosed at a younger age and to have ER-negative disease [5,6,7,8]. The molecular alterations underlying the aggressive breast cancer phenotypes of women of African ancestry are not fully understood.

Basal-like breast cancers are a heterogeneous group of tumors comprising 15–20% of all breast cancers. They are more common in premenopausal, and in African and AA women [9]. We previously reported on the incidence of triple-negative breast cancers (TNBC) in Ghanaian women (53.2%), which more closely resembled AA women (30%) than white American women (15.5%), suggesting that African ancestry may correlate with a higher likelihood of TNBC [10]. Similar results have been found in other cohorts of women from Senegal and Nigeria [6]. The higher proportion of TNBC in invasive breast tumors from African and AA patients is associated with ALDH1 and AR expression, suggesting novel subtypes of TNBC in these populations [11], but the genetic landscapes need investigation.

Enhancer of Zeste Homolog 2 (EZH2), an epigenetic regulator responsible for transcriptional repression, is overexpressed in multiple tumor types [12,13,14,15,16,17]. In breast cancer, overexpression of EZH2 is significantly associated with negative ER and PR status and distant metastasis [12, 17, 18]. We previously showed that EZH2 is significantly associated with high-grade and basal-like tumors in a cohort of 169 breast tissues from Ghanaian women [16]. We detected EZH2 expression in the cytoplasm of 16% of Ghanaian invasive carcinomas where it was significantly associated with TNBC status [16]. While cytoplasmic EZH2 has been shown to increase breast cancer invasion and metastasis, the relevance of cytoplasmic EZH2 to the aggressive behavior of breast cancer in African women has not been studied.

Despite advances in our understanding of the genomic landscape of breast and other cancers, the genetic alterations in invasive carcinomas from African women are far from understood. In this study, we define a set of somatic copy number alterations (CNAs) in a cohort of histologically well-characterized Ghanaian invasive breast carcinomas, and investigate associations with patterns of EZH2 subcellular localization.

Materials and methods

Human tissue samples and TMA development

Tumor samples were collected from 100 women with breast cancer who underwent surgery at Komfo Anokye Teaching Hospital (KATH) in Kumasi, Ghana, between 2006 and 2011. Clinicopathological features of these tumors, including ER, PR, HER2, and EZH2 immunostaining, were previously reported [16]. Formalin-fixed, paraffin-embedded tissues were stained with H&E and reviewed independently by the study pathologists (MR and CK), and tumors were arrayed in a high-density tissue microarray (TMA) with triplicate samples (n = 255 TMA samples) following established protocols [12]. We selected 20 cases for next-generation sequencing (NGS) analyses based on the following criteria: (i) sufficient tissue on the block, (ii) representative samples with EZH2 expression in the nucleus, cytoplasm, or no expression, and (iii) representative samples with different histology, tumor grade (2–3), hormonal receptor and HER2/neu status.

Targeted NGS

Targeted NGS of breast tumor tissue was performed with IRB approval. For each of the 20 breast tissue specimens, four to ten 10-µm formalin-fixed, paraffin-embedded sections were cut from a single block and macrodissected with a scalpel under guidance of an H&E-stained slide to enrich for tumor content. We isolated DNA using the Qiagen Allprep FFPE DNA/RNA kit (Qiagen, Valencia, CA). DNA was quantified using the Qubit 2.0 fluorometer (Life Technologies, Foster City, CA).

Targeted, multiplexed PCR-based NGS was performed on each tumor. For samples with >40 ng of DNA, we used the Ion Ampliseq Comprehensive Cancer Panel (CCP), which targets 1,688,650 bases from 15,992 amplicons representing 409 cancer genes (Thermo Fisher Scientific). For samples with <40 ng of DNA, we employed the Oncomine Comprehensive Panel (OCP) (Thermo Fisher Scientific), which is compatible with 20 ng of formalin-fixed paraffin-embedded isolated DNA and benchtop Ion Torrent sequencers [19]. Barcoded libraries were generated from 40 ng of DNA per sample using the CCP, or 20 ng of DNA per sample using the OCP, and the Ion Ampliseq Library Kit 2.0 (Life Technologies, Foster City, CA) according to the manufacturer’s instructions with barcode incorporation. Sequencing of template libraries was then performed on an Ion Proton sequencer with Ion PI chips (Thermo Fisher Scientific) using the Ion PI Hi-Q Sequencing 200 Kit v3 (Life Technologies, Foster City, CA) according to the manufacturer’s instructions. Data analysis was performed as described previously using Torrent Suite (version 5.0.4) and previously validated in-house bioinformatics pipelines [19, 20].

Copy number analysis

Amplicon-level read counts were determined using the coverage analysis plugin. Briefly, normalized GC content corrected read counts per amplicon for each sample were divided by those from a composite normal male DNA sample (composed of multiple formalin-fixed, paraffin-embedded and frozen tissues, individual and pooled samples) to generate amplicon-level copy number ratios, and weighted gene-level copy number estimates were determined as described previously [20]. Genes with a log2 copy number ratio estimate of <−1 or >0.80 were considered to have high-level loss (deletion) or gain, respectively.
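The thresholding step described above can be illustrated with a minimal Python sketch. It computes amplicon-level log2 copy number ratios against a normal reference and classifies a gene-level estimate as high-level gain or loss. The function names, the toy read counts, and the use of a simple (optionally weighted) mean to summarize amplicons per gene are illustrative assumptions; the actual weighting scheme is the one described in ref. [20] and is not reproduced here.

```python
import math

# Thresholds from the text: log2 ratio < -1 is high-level loss, > 0.80 is high-level gain.
LOSS_THRESHOLD = -1.0
GAIN_THRESHOLD = 0.80


def amplicon_log2_ratios(tumor_counts, normal_counts):
    """Return log2(tumor / normal) per amplicon from normalized read counts."""
    return [math.log2(t / n) for t, n in zip(tumor_counts, normal_counts)]


def gene_level_estimate(log2_ratios, weights=None):
    """Summarize amplicon ratios as a weighted mean (assumed stand-in for the method in [20])."""
    if weights is None:
        weights = [1.0] * len(log2_ratios)
    return sum(r * w for r, w in zip(log2_ratios, weights)) / sum(weights)


def call_cna(gene_log2_ratio):
    """Classify a gene-level log2 copy number ratio using the thresholds above."""
    if gene_log2_ratio < LOSS_THRESHOLD:
        return "loss"
    if gene_log2_ratio > GAIN_THRESHOLD:
        return "gain"
    return "neutral"


# Hypothetical normalized read counts for three amplicons of one gene.
tumor = [400.0, 380.0, 420.0]
normal = [200.0, 190.0, 210.0]
ratios = amplicon_log2_ratios(tumor, normal)
print(call_cna(gene_level_estimate(ratios)))
```

Because the two thresholds are asymmetric, a gene must fall below half the reference signal to be called a loss, whereas a gain is called at roughly 1.74 times the reference signal (2^0.80).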

Network and bioinformatics analyses

For enrichment analyses, we performed gene ontology (GO) overrepresentation tests (GO annotations: biological process, molecular function, cellular component, and protein domain) in PANTHER (v14.1), and used the STRING database (v11.0) to confirm the enrichment results with topological features from protein/gene interaction networks. A p value of < 0.05 was considered significant. Correlation between EZH2 and the 17 genes with CNAs was analyzed using the Pearson correlation coefficient (r).
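For reference, the Pearson correlation coefficient can be computed as in the short Python sketch below. The function name is hypothetical and the input vectors are made-up placeholder values, not study data.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length numeric sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Two hypothetical per-sample measurements, for illustration only.
series_a = [1.0, 2.0, 3.0, 4.0]
series_b = [0.2, 0.1, 0.9, 1.1]
print(round(pearson_r(series_a, series_b), 2))
```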

Immunohistochemistry for RECQL4 and SDHC

Immunohistochemistry was performed on the TMA containing 100 invasive carcinomas from Ghanaian women. Four-micron-thick sections of the TMA were prepared for staining with H&E and for immunohistochemistry using a standard biotin–avidin complex technique with anti-RECQL4 (Abcam, ab188125, rabbit polyclonal, 1:50) and anti-SDHC (Abcam, ab155999, rabbit monoclonal, 1:2500) antibodies. Nuclear expression of RECQL4 and cytoplasmic expression of SDHC were interpreted independently by two pathologists (MR and CK). For each case the staining was graded as negative (score = 1, no staining); weak (score = 2, <25% of cells staining, any intensity); moderate (score = 3, 25–75% of cells staining, any intensity); and strong (score = 4, >75% of cells staining, any intensity) following previous studies [12, 21]. High RECQL4 or SDHC expression was defined as scores 3 and 4, and low expression was defined as scores 1 and 2.
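The scoring scheme above can be written as a small lookup, sketched here in Python. The function names are hypothetical, and the input is assumed to be the pathologist's estimate of the percentage of cells staining at any intensity.

```python
def ihc_score(percent_positive):
    """Map the percentage of staining cells (any intensity) to the 1-4 score in the text."""
    if not 0 <= percent_positive <= 100:
        raise ValueError("percent_positive must be between 0 and 100")
    if percent_positive == 0:
        return 1  # negative: no staining
    if percent_positive < 25:
        return 2  # weak: <25% of cells
    if percent_positive <= 75:
        return 3  # moderate: 25-75% of cells
    return 4      # strong: >75% of cells


def expression_level(score):
    """Dichotomize: scores 3 and 4 are high expression, scores 1 and 2 are low."""
    return "high" if score >= 3 else "low"


# Example: a core with an estimated 40% of tumor cells staining.
print(expression_level(ihc_score(40)))
```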

Results

NGS identifies frequent CNAs in invasive breast carcinomas from Ghanaian women

Targeted NGS was performed on 20 formalin-fixed, paraffin-embedded breast tissue samples from Ghana identified with sufficient tissue; 9 were excluded due to poor DNA quality. The final cohort consisted of 11 samples comprising 10 invasive carcinomas and 1 fibroadenoma control. Representative micrographs are shown in Fig. 1a–f.

Fig. 1: Representative images of Ghanaian invasive carcinomas subjected to next-generation sequencing (NGS).

Low- (a, c, e) and high-power (b, d, f) hematoxylin and eosin-stained (H&E) sections from three invasive carcinomas. Sample BR13 (a, b) is an invasive ductal carcinoma of intermediate grade with mucinous differentiation, Sample BR19 (c, d) is an invasive ductal carcinoma of grade 3, and Sample BR22 (e, f) is a metaplastic carcinoma with squamous differentiation and histological grade 3.

All patients were female, ranging from 38 to 64 years old (mean, 48 years). Of the invasive carcinomas, seven were invasive ductal carcinomas of intermediate or high histological grade, and three were metaplastic carcinomas with squamous differentiation. EZH2 expression was high in the nucleus or cytoplasm (three and six cases, respectively), and was negative in one tumor. Eight invasive carcinomas were TNBC, including the metaplastic carcinomas; one was HER2/neu positive; and one was luminal B (positive for ER, PR, and HER2/neu). The fibroadenoma sample was negative for expression of EZH2. Details on the clinical and histologic characteristics of this cohort are listed in Table 1.

Table 1 Clinical and pathological characteristics of the samples subjected to NGS.

Using CCP and OCP NGS platforms according to the DNA concentration of each tumor, we successfully analyzed CNAs in ten invasive carcinomas and one fibroadenoma tissue sample. The quality of the Ghanaian samples was not adequate for accurate sequencing studies. Copy number analysis of NGS data yielded a total of 17 high-level CNAs. Prioritized high-level CNAs for each case are shown in an integrative heatmap (Fig. 2).

Fig. 2: Heatmap of copy number variations in 11 Ghanaian tumors.

Shown are 17 high-level copy number alterations identified and color coded as indicated in the figure, using CCP and OCP NGS platforms. In all, 90% of the invasive carcinomas show CNAs, largely gains, with the most frequent being RECQL4 and SDHC. Note that CNAs in PDGFRA and DEK are identified in tumors with nuclear EZH2, while CNAs in BCL2L1, AKT3, SMARCA4, CCND1, FGF3, and ABL1 are detected in tumors with cytoplasmic EZH2.

Copy number analysis of NGS data demonstrated recurrent CNAs, most frequently gains of chromosomes 1q (SDHC, AKT3), 8q (RECQL4), and X (TFE3, G6PD) and loss of chromosome 9q (ABL1). Nine (90%) invasive carcinomas demonstrated CNAs of at least one of these genes: SDHC, RECQL4, TFE3, BCL11A, BCL2L1, PDGFRA, DEK, SMUG1, AKT3, SMARCA4, VHL, KLF6, CCNE1, G6PD, FGF3, and CCND1. One case showed loss of ABL1. Notably, we identified gains in RECQL4, PDGFRA, AKT3, and SMARCA4 in the fibroadenoma, suggesting that these alterations may occur in early neoplastic lesions. Together, these data identify CNAs in invasive carcinomas from Ghanaian women with biological and translational implications.

Bioinformatics reveals a predicted interaction network among genes with CNAs and EZH2 expression

Out of 17 genes, 9 (53%) with CNAs were detected in invasive carcinomas with nuclear or cytoplasmic EZH2 expression. Gains in PDGFRA oncogene and loss of DEK tumor suppressor were identified in invasive carcinomas with only nuclear EZH2, and gains in BCL2L1, AKT3, SMARCA4, and CCND1 as well as gains and losses of FGF3, and loss of ABL1 were detected in invasive carcinomas with only cytoplasmic EZH2 expression (Fig. 2).

We next tested the hypotheses that the genes harboring frequent recurring CNAs may be functionally related, and that they may associate with EZH2 expression. Enrichment analysis in GO annotation and STRING databases revealed a significantly enriched predicted interaction network (p value = 5.76E−07) among 12 of the 17 genes with CNAs (70.6%), as well as a correlation between these genes and EZH2 (Pearson correlation coefficient r = 0.4–0.75) (Fig. 3a). Analyses using GO annotation for the top 10 altered pathways (REACTOME) showed significant representation of mitotic G1–G1/S phases (ABL1, AKT3, CCND1, and CCNE1), transcription pathway (ABL1, AKT3, CCND1, CCNE1, DEK, G6PD, and SMARCA4), PI3K/AKT signaling in cancer (AKT3, FGF3, and PDGFRA), and RMTs methylate histone arginines (CCND1 and SMARCA4) (Fig. 3b and Table 2). Collectively, these data suggest a relationship between the 17 genes with CNAs and EZH2 expression and localization, which warrants further investigation. Our results highlight major oncogenic pathways deregulated in Ghanaian invasive carcinomas that can be explored for therapeutic application.
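To make the idea of pathway over-representation concrete, the Python sketch below computes a one-sided hypergeometric p value for the overlap between a gene list and a pathway. This is one common way such tests are formulated and is shown as an assumption for illustration only; the exact statistic and multiple-testing correction applied by PANTHER and STRING are not specified in this study. The background size and pathway size are hypothetical; only the list size (17 genes) and the overlap of four genes come from the text.

```python
from math import comb


def hypergeom_enrichment_p(background, pathway_size, list_size, overlap):
    """P(X >= overlap) when list_size genes are drawn from a background
    containing pathway_size pathway genes (one-sided hypergeometric test)."""
    max_overlap = min(pathway_size, list_size)
    total = comb(background, list_size)
    tail = sum(
        comb(pathway_size, k) * comb(background - pathway_size, list_size - k)
        for k in range(overlap, max_overlap + 1)
    )
    return tail / total


# Hypothetical background of 20,000 genes and a pathway of 150 genes;
# 17 genes with CNAs, 4 of which fall in the pathway.
p_value = hypergeom_enrichment_p(background=20000, pathway_size=150, list_size=17, overlap=4)
print(f"{p_value:.2e}")
```

In practice, p values from many pathways would also need correction for multiple testing before being compared with the 0.05 significance level.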

Fig. 3: Predicted interaction network for the 17 genes with recurrent copy number alterations (CNAs) and with EZH2 in Ghanaian invasive carcinomas.

a Network graph shows enrichment (p value = 5.76E−07) of topological features of the 17 genes with CNAs. Enrichment analysis in STRING database (v11.0) predicts interactions between EZH2 and 12 of the 17 genes with CNAs (r = 0.4–0.75). b Enrichment analyses with biological functions of top ten altered biological pathways (REACTOME). Bars represent −log (p value).

Table 2 Top 10 altered pathways associated with the 17 genes with CNAs in Ghanaian invasive carcinomas obtained using enrichment analysis in GO annotation (REACTOME).

RECQL4 and SDHC show frequent copy number gains and protein overexpression in Ghanaian invasive carcinomas

Genes demonstrating the highest frequency of copy number gains in our cohort were SDHC (1q23.3) and RECQL4 (8q24.3), neither of which has been previously considered in the context of breast carcinoma in African patients. SDHC, which encodes Succinate Dehydrogenase Complex Subunit C, was amplified in 6 of 10 (60%) invasive carcinomas, and RECQL4, which encodes a DNA helicase that functions in homologous recombination-mediated double-strand break repair, was amplified in 5 of 10 (50%) carcinoma cases sequenced (Figs. 2 and 4).

Fig. 4: Copy number variation in Ghanaian breast tumors.

Copy number profiles for three Ghanaian invasive carcinomas with high-level CNAs highlighting gains at SDHC (1q23.3) and RECQL4 (8q24.3). Log2 copy number ratios per amplicon are plotted, with each individual amplicon represented by a single dot.

To explore whether CNAs in these genes are associated with protein overexpression, we next tested RECQL4 and SDHC protein expression in 100 tissue samples of Ghanaian invasive carcinomas arranged in TMAs. Of the 86 tumors with sufficient tissue to evaluate in the TMA, positive nuclear RECQL4 staining was detected in 53/86 (61.6%) and positive SDHC cytoplasmic expression in 48/86 (56%) tumors (Fig. 5a-c). Further supporting our data, analysis of the TCGA breast tissue data sets using UALCAN showed that RECQL4 and SDHC mRNA expression is significantly upregulated in invasive carcinomas compared with normal breast tissue samples (Supplementary Fig. 1A). When analyzed according to patient race, RECQL4 mRNA expression is significantly higher in invasive carcinomas in AA women compared to Caucasians and Asians, and to normal tissues, while SDHC transcript was significantly higher in Caucasians and Asians compared to AAs (Supplementary Fig. 1B). Analysis of TCGA breast cancer database suggests that high mRNA levels of RECQL4 and SDHC may be associated with worse overall survival compared to low mRNA levels (Supplementary Fig. 1C).

Fig. 5: RECQL4 and SDHC proteins are overexpressed in a substantial number of Ghanaian invasive carcinomas.

Representative images of invasive carcinomas with negative and positive nuclear RECQL4 (a) and cytoplasmic SDHC (b) expression, from the TMA containing 86 invasive carcinomas from Ghana. c Quantification of the percentage of invasive carcinomas negative and positive for each protein.

Discussion

While African women have a lower lifetime incidence of breast cancer, age-standardized mortality is poorer [1]. AA women, like women from Ghana, have a higher incidence of TNBC and basal-like tumors than white American women, suggesting that African ancestry may be associated with differences in tumor biology [10]. Our laboratory reported that invasive carcinomas from Ghana have a significantly higher frequency of high histological grade, triple-negative status, and EZH2 overexpression [16]. However, the genetic alterations that underlie the aggressive biological behavior of invasive carcinomas in African women have not been fully elucidated, in part due to the limited availability of tissue samples from these patients. Here, we used NGS to discover high-frequency CNAs in invasive carcinomas from Ghana, which may influence their biological behavior and advance our understanding of these tumors.

Our studies revealed the presence of recurrent CNAs in 17 genes in 90% of the invasive carcinomas from Ghana studied, including gains in SDHC, RECQL4, TFE3, BCL11A, BCL2L1, PDGFRA, AKT3, SMARCA4, VHL, KLF6, CCNE1, G6PD, and CCND1, gains and losses in DEK, SMUG1, and FGF3, and loss of ABL1. Functionally, these genes belong to critical signaling pathways in breast tumorigenesis, including PI3K-Akt, transcriptional, and cell cycle regulatory pathways, which have not been previously considered in breast cancer in African patients. These data pave the way to investigating the contribution of these signaling pathways to the aggressive clinical behavior of breast cancer in this population, and to designing studies that test the potential utility of inhibiting these pathways to halt breast cancer progression.

Emerging data suggest that in addition to transcriptional repression, EZH2 exerts oncogenic functions in a subset of TNBCs by interacting with cytoplasmic proteins that regulate cell migration and invasion [18]. Our lab has reported that EZH2 is overexpressed in the nucleus in 42% and in the cytoplasm in 16% of Ghanaian invasive carcinomas, where it is associated significantly with TNBC status [16]. We have recently shown that activation of p38 mitogen-activated kinase leads to EZH2 phosphorylation at threonine 367 with resulting accumulation of EZH2 protein in the cytoplasm and binding with vinculin and other cytoskeletal proteins [18]. Of the 17 genes with CNAs identified in this study, 6 (35%) are associated with cytoplasmic EZH2 expression (gains of BCL2L1, AKT3, SMARCA4, and CCND1 as well as gains and losses of FGF3, and loss of ABL1) and 2 (12%) with nuclear EZH2 expression (gains in PDGFRA oncogene and loss of DEK tumor suppressor). Network analyses demonstrated a robust predicted interaction between EZH2 and these genes, especially strong with DEK, SMARCA4, CCNE1, ABL1, and CCND1. These data are intriguing as there are several reports of EZH2-mediated regulation of these genes. For example, EZH2 has been reported to directly repress DEK in fibroblasts [22], and EZH2 and PDGFRA were found to be inversely associated in Merkel cell carcinomas [23]. In addition, EZH2 inhibition was found to selectively kill SMARCA4-deficient cells in small cell carcinoma of the ovary [24], and EZH2 was shown to regulate cell cycle progression and the levels of cyclins in various cancer types, including in our studies showing that EZH2 regulates CCND1 [25,26,27,28].

Our study shows that the most frequently altered genes in Ghanaian breast cancer patients are RECQL4 and SDHC, which harbored copy number gains in 50% and 60% of the invasive carcinomas sequenced, respectively. These findings were validated in a histopathologically well-characterized cohort of invasive carcinomas treated at the KATH in Kumasi, Ghana, where RECQL4 and SDHC proteins were overexpressed in 61.6% and 56%, respectively, of 86 tumors with sufficient tissue for evaluation.

RECQL4 is mapped to 8q24, a gene desert flanked by MYC and PVT1. Amplification of this region is frequently associated with susceptibility to multiple tumor types, including breast and prostate [29, 30]. The mechanism of this susceptibility has been thought to occur through the amplification of MYC [31] or ncRNA PVT1 [32]. In our study, we observed MYC amplification in 2/11 samples compared to RECQL4 amplification in 6/11 samples, suggesting that in this cohort, gains of RECQL4 may occur independently of MYC. Several in vitro loss-of-function studies have demonstrated that RECQL4 contributes to breast cancer proliferation and chemoprotection [33, 34]. Succinate dehydrogenase protein C (SDHC) is part of a family of metabolic enzymes that function in the citric acid cycle and electron transport chain. Germline inactivating mutations in SDHC result in accumulation of succinate and are associated with paragangliomas, renal cell carcinomas, and gastrointestinal stromal tumors [35]. While RECQL4 and SDHC have been studied in the context of breast cancer, they have not been previously considered in African populations.

The current study evidences the difficulties encountered in studying invasive breast tumors from western Sub-Saharan populations. While we began our study with 20 FFPE blocks of Ghanaian breast tumors, 9 were excluded due to low-quality DNA and 3 additional samples had low DNA content requiring the use of a smaller sequencing panel. Despite these limitations, our team has generated a cohort of breast tissue samples from Ghana with adequate material for molecular studies. We have identified novel high-level CNAs in 17 genes with functions in tumorigenic pathways and associations with the oncoprotein EZH2. We validated the frequent overexpression of RECQL4 and SDHC in tumors from this patient population. Collectively, these data provide the basis for further sequencing and clinical studies to better understand the pathobiology of, and offer therapeutic opportunities for, breast cancer in African and AA women.