Introduction

With five highly conserved miRNAs (i.e. miR-141, miR-200a, miR-200b, miR-200c and miR-429), the miR-200 family is one of the most frequent groups of miRNAs whose expression is altered in cancer. Two gene clusters located at different chromosomes code for miR-200a/miR-200b/miR-429 (at chr1p36) and for miR-200c/miR-141 (at chr12p13). miR-200b, miR-200c and miR-429 share an almost identical seed sequence “AAUACUG”, while the seed of miR-200a and miR-141 differentiates from the other members for only one nucleotide “AACACUG”1. The expression regulation of the miR-200 family was associated with the i) suppression of EMT and tumor metastases through the miR-200/ZEB1-2 axis2, ii) inhibition of cancer stem cell self-renewal and differentiation3, and iii) reversal of chemoresistance4. Despite comprehensive literature that largely described the molecular function of miR-200 family, the precise role of these miRNAs in cancer has not yet completely understood, with some reports suggesting more prevalent oncosuppressive roles while other studies some possible oncogenic functions. Similar inconsistencies are also evident in few studies evaluating miR-200 family expression in tissues and their potential role as prognostic biomarkers5. For example, low miR-200b/c expression in breast cancer was correlated with death3,4,5, whereas high miR-200a expression was correlated with the development of distant metastases6. By comparing miRNA expression in normal breast tissue, in situ carcinoma, non-metastatic and metastatic breast cancers, Sánchez-Cid et al. also found that miR-200a/b were increased in metastatic tumors as compared to non-metastatic cancer7.

These contrasting results led us to perform an extensive expression analysis of the entire miR-200 family in two large cohorts of breast cancer patients, the first collected in our hospital (H-CSS cohort, N = 283) and the second from The Cancer Genome Atlas Breast Invasive Carcinoma (TCGA-BRCA cohort, N = 451), in order to clarify the extent of miR-200 family deregulation in breast cancer, and its specific association with clinicopathological parameters.

Results

Patients and treatment

Table 1 summarizes the clinicopathological information obtained from the review of medical records and descriptive statistics for the 287 enrolled cases (H-CSS cohort). Metastases at diagnosis were present in 16 cases; among non-metastatic patients (N = 271), 55 experienced disease progression (Incidence Rate, IR of 3.5 events per 100 person-years), and 30 of them died for the disease (IR of 1.7 events per 100 person-years). The median time to disease progression was 69.8 months (IQR: 33.8–107.5), whereas the overall follow-up time was 75.1 months (IQR: 40.6–109.7). Hormone receptor positive breast tumors were defined as cases expressing estrogen (ER) or progesterone (PgR) receptors in ≥ 1% of neoplastic cells as indicated by international guidelines8, and HER2 status assessment was carried out according to standard recommendations9. Cases were staged according to the World Health Organization staging system version 7th10. The surrogate molecular classification was performed as described by Pasculli et al.11. Overall, 104 cases (38%) were classified as Luminal A, 96 cases (35%) as Luminal B; 34 cases (12%) were HER2-amplified, and 38 cases (14%) were Triple Negative (Table 1). Fifteen cases were not classified because the HER2 and or ki67 status was not reported in the medical records. All patients received breast-conserving surgery or total mastectomy, plus sentinel node biopsy or complete axillary dissection. Post-surgery treatments were performed according to the following guidelines: San Gallen, NCCN and ASCO. Recurrence was defined as evidence of loco-regional and/or distant disease over 4 months from diagnosis and after curative-intent surgical treatment.

Table 1 Clinicopathological characteristics of the H-CSS patient cohort (N = 287).

Selection of TCGA-BReast invasive CAncer (TCGA-BRCA) cohorts

We selected a cohort of 1053 women with breast cancer not treated with neoadjuvant therapy from the TCGA data portal (https://portal.gdc.cancer.gov/). All tumors had available expression profile for all the five miRNAs of the miR-200 family. The log2 read counts were used for miRNA expression analysis. The TCGA-BRCA cohort was harmonized with the H-CSS cohort by using a two-step approach (Supplemental Fig. 2 and Supplemental Table 1): (i) the TCGA-BRCA cohort was limited to those histotypes (NST and ILC) and stages (I-IIa/b, IIIa/c and IV) represented in H-CSS (N = 822); (ii) we performed a random disproportionate sampling to align the distribution of histotypes and stages between the two cohorts; weights were overall based on H-CSS distribution, with the exception that we reduced the weights for late stages not to deplete the final cohort’s size extensively. We ultimately selected 451 patients for all subsequent analyses. Characteristics of these cohorts are reported in Supplemental Table 1.

Analysis of variance, run through a general linear model, was performed to evaluate the association of miRNAs expression and clinicopathological features, considering both main effects and interaction terms. Cox regression model was implemented to estimate hazard ratios for overall survival, defined as the time from the date of tumor resection until death from any cause. We also downloaded miRNA expression data for 104 available normal breast tissues from TCGA-BRCA data portal. Forty-eight of these normal samples were matched to 48 tumor counterparts among the cohort of 451 women considered.

miR-200 family expression in tumor samples as compared with normal tissues

Following evaluation of RNA quality, 283 out of the 287 samples from the H-CSS cohort showing an RNA Integrity Number (RIN) > 7.0 were suitable for the analysis (Supplemental Fig. 1). Thus, the expression profile of the entire miR-200 family could be performed in 283 breast cancers, and in 13 normal breast tissues (NBTs) obtained from reductive mammoplasty. As shown in Supplemental Table 2A and Fig. 1, all miRNAs were significantly overexpressed in tumors (p < 0.001) when compared to NBTs. Furthermore, almost all miR-200 family members (except for miR-200b-3p) were overexpressed in tumors as compared to normal tissue adjacent to tumor (Margin) (Supplemental Table 2B). Accordingly, in the TCGA-BRCA cohort, we confirmed the overexpression of miR-200 family in breast tumors vs. normal breast tissues (p < 0.0001) (Supplemental Table 3A and Fig. 2) and in matched tumor-normal pairs (N = 48; Supplemental Table 3B).

Figure 1
figure 1

miR-200 family members expression in H-CSS cohort in tumor tissues (T), normal breast tissues distant from tumor (M) and normal breast tissue from reductive mammoplasty (NBT). Plots were performed using the R Foundation for Statistical Computing (version 3.6, packages: ggplot2, gridExtra).

Figure 2
figure 2

miR-200 family members expression in TCGA-BRCA cohort in tumor tissues (T), and normal breast tissues distant from tumor (N). Plots were performed using the R Foundation for Statistical Computing (version 3.6, packages: ggplot2, gridExtra).

Association among miR-200 family expression and tumor clinicopathological features

Next, we analyzed the correlation of miR-200 family expression with the clinicopathological characteristics in both the H-CSS (Table 2) and TCGA-BRCA cohorts (Supplemental Table 4A). In the H-CSS cohort (Table 2), the analysis across breast cancer (BC) molecular subtypes (Luminal A, Luminal B, HER2-amplified, and basal/Triple-negative (TNEG); see “Methods”) showed a lower expression of miR-141-3p (p = 0.0306) and miR-200a-3p (p = 0.0381) in those tumors associated to more aggressive subtypes (e.g. LUMB, HER2-amplified and TNEG) Table 2). In particular, miR-141-3p was less expressed in HER2-amplified and TNEG tumors, while miR-200a-3p was less expressed in HER2-amplified, Luminal B and TNEG subtypes (Fig. 3). Consistently, in the TCGA-BRCA cohort, we found that miR-141-3p was downregulated in the Normal-like subgroup (p = 0.0210) while miR-200a-3p was downregulated in HER2-amplified and Luminal B tumors (p = 0.0200). In addition, miR-200b-3p was found down regulated in the HER2-amplified subgroup (p = 0.0060) (Supplemental Table 4A; Fig. 4).

Table 2 Association between miRNAs and clinicopathological variables.
Figure 3
figure 3

miR-200 family members expression within the surrogate molecular classification subgroups identified in the H-CSS cohort: HER2 amplified tumors (HER2), Luminal A (LUMA), Luminal B (LUMB), Triple Negative (TNEG). Plots were performed using the R Foundation for Statistical Computing (version 3.6, packages: ggplot2, gridExtra).

Figure 4
figure 4

miR-200 family members differential expression within the intrinsic molecular classification subgroups in the TCGA-BRCA cohort: HER2 enriched tumors (HER2), Luminal A (LUMA), Luminal B (LUMB), Basal like, and NORMAL LIKE tumors. Plots were performed using the R Foundation for Statistical Computing (version 3.6, packages: ggplot2, gridExtra).

In the H-CSS cohort, miR-141-3p, miR-200a-3p and miR-429 expression was increased in advanced stage disease (stage IV) (p = 0.037, p = 0.0011 and p = 0.0078 respectively) (Table 2). In the TCGA cohort, miR-200a-3p (p = 0.0128), miR-200b-3p (p = 0.0009) and miR-200c-3p (p = 0.0013) were increased in invasive lobular carcinoma (Supplemental Table 4A). To evaluate whether these differences may affect the association with molecular subtypes, we performed a multivariable analysis adjusting for stage, histotype and molecular subtype. Overall, our results indicate that miR-200a-3p and miR-141-3p remain significantly associated with the molecular subtypes in breast cancer after the adjustments in the H-CSS cohort (Table 3), whereas an association with miR-141-3p and miR-200c-3p was found in the TCGA cohort (Supplemental Table 4B).

Table 3 Univariate and multivariable analysis of the association between miRNAs and clinicopathological characteristics in the TCGA cohort.
Table 4 Prognostic accuracy of each outcome-specific weighted miRNA score at median and maximum time horizons.

Evaluation of miR-200 family prognostic value in breast cancer cases

The association with time-to-event outcomes (i.e. CSS, DFS, and MFS) was evaluated in the H-CSS cohort without metastases at diagnosis and with complete information about survival outcomes (Supplemental Fig. 1). As shown in Supplemental Tables 5 and 6, tumor dimension, stage, hormone receptor status, HER2-amplification, and surrogate molecular classification were associated with DFS, MFS, and CSS, while high Ki67 was associated with DFS and MFS only. No statistically significant associations were found with any miRNA of miR-200 family in the overall population. Next, we investigated the prognostic role of the miR-200 family in the TCGA-BRCA cohort by considering the subgroup of patients without metastases at diagnosis (cohort C1, N = 435; Supplemental Fig. 2). We scored 56 deaths and a median follow-up for surviving women of 2.4 years (Q1 = 1.2; Q3 = 4.6 years). Although this cohort showed a shorter follow-up and limited number of events, we were able to confirm the prognostic role of stage, estrogen and progesterone receptor status, HER2 status, and molecular subtypes (Supplemental Table 7A). In line with the results obtained in the H-CSS, we did not observe a statistically significant association of any of the miR-200 family members with overall survival (OS) in multivariate analysis (Supplemental Table 7B). These figures were confirmed in the larger cohort of N = 806 subjects without metastases (cohort B1, Supplemental Fig. 2), scoring 101 events, and with a follow-up length (median = 2.4 years; Q1 = 1.2; Q3 = 4.7) comparable to the smaller C1 cohort (Supplemental Table 7A, B).

Table 5 Estimated regression coefficients used to compute multiple weighted miRNA scores.

Last, we evaluated whether the combined expression of miR-200 family members was able to predict survival outcomes. As shown in Table 4, when all miRNAs were jointly considered for the building of the weighted scores, only a slight predictive accuracy on H-CSS outcome at 12 years was found (survival c-statistic: 0.646; 95%CI 0.538–0.754). Regression coefficients (weights) used to calculate the scores were reported in Table 5.

Discussion

In the attempt to elucidate the extent of miR-200 family deregulation in breast cancer and, hence, its potentiality as clinically significant biomarker, we profiled a large series of breast cancer cases with a long follow-up (H-CSS cohort) and the TCGA-BRCA cohorts. First, we found in both H-CSS and TCGA-BRCA cohorts that the global miR-200 family expression is increased in tumors as compared with normal breast tissue or margin. Since miR-200 family is mainly expressed in epithelial cells, these results are most likely due to the enrichment in fibrous connective adipose tissue typical of the normal breast. The enrichment in normal breast epithelial component might also explain some inconsistencies among literature data. Consistently with our data, Amorim et al.12 found an increased expression of miR-200b-3p and miR-141-3p in tumors as compared with margin, whereas other studies reported reduced expression of miR-200b-3p3 and miR-200c-3p5 in tumors as compared with margins.

In both H-CSS and TCGA-BRCA cohorts, we found a differential expression of miR-200 family members within the molecular subgroups identified by the surrogate molecular classification (H-CSS cohort) and intrinsic molecular subtypes (TCGA-BRCA cohort). In particular, lower expression of miR-141-3p/miR-200a-3p was associated with HER2-amplified, Luminal B, and Triple Negative (H-CSS cohort) or Normal Like (TCGA-BRCA cohort) breast cancer subtypes. This is consistent with reports describing that miR-200 family loss of expression unleashes ZEB1 expression13, which in turn induces epithelial-to-mesenchymal transition (EMT), which is an important step forward in the initial phase of the metastatic spreading from the primary tumor.

Functionally speaking, the association between miR-200 family and metastatic processes have been widely investigated in different tumor types, including breast cancer and, once again, conflicting results have been reported. Indeed, the ectopic expression of miR-200a and miR-200b was shown to inhibit EMT features in undifferentiated, non-tumorigenic breast cancer cells, and impair proliferation, migration, and invasion in triple negative breast cancer14. Accordingly, miR-200c/141 cluster deletion affects breast cancer stem cell heterogeneity by promoting the generation of EMT-like stem cells, which resulted in increased tumor metastasis3. miR-200 family members were also found to support Epidermal Growth Factor (EGF)-driven invasion, with the miR-200bc/429 cluster showing stronger effects than the miR-200a/141 cluster1,15. Moreover, miR-200a suppressed cell proliferation in breast cancer by targeting mitochondrial transcription factor A16, and impaired EMT-like transformation, thus migration, by regulating SIRT1 in breast epithelial cells17. Nevertheless, while these studies likely suggest a tumor suppressor role for miR-200 family members, others indicate that higher expression of miR-200 family members might induce rather than prevent metastases formation. For instance, the forced expression of miR-200a/miR-200b in MCF10 mammary cells induced an enhanced epithelial program, aldehyde dehydrogenase (ALDH) activity, mammosphere growth and ability to form branched tubuloalveolar structures while promoting orthotopic tumor growth and lung colonization in vivo, suggesting that miR-200 family members may promote traits of highly proliferative breast luminal progenitor cells7. Likewise, miR-200c/141 cluster overexpression induced by SerpinB2 was shown to foster breast cancer cell metastasis18. Furthermore, miR-200a overexpression was found to enhance malignant transformation of immortalized human mammary epithelial cells19, to protect tumor cells from apoptosis, and promote metastases4 and chemoresistance20. Altogether, these discrepancies lead to hypothesize that the biological functions of miR-200 family members may depend on the cellular context, tumor molecular subtype, and stage of tumor progression21.

In our study, the association between miR-200 family expression and patients’ outcome was evaluated in terms of DFS, MFS, and CSS in the H-CSS cohort including 283 non-metastatic breast cancer cases with a median follow-up of 75 months. In the TCGA-BRCA cohort, only overall survival data were available instead. Our analyses did confirm the prognostic role of lymph node status, estrogen and progesterone receptors status, HER2 status, and molecular subtypes in both H-CSS and TCGA-BRCA cohorts. However, we did not observe any statistically significant association of the miR-200 family members with patients’ outcome in multivariable analyses. Indeed, in the H-CSS cohort, the combined expression of miR-200 family members only showed a slight predictive accuracy on CSS outcome at 12 years (Table 4).

To date, only a minority of studies22 have performed the expression analysis of miR-200 family members in breast cancer tissues, and evaluated its association with patients’ outcomes (Fig. 5, and Supplemental Table S8). Among those, only one study reported an hazard ratio of 0.231 (95%CI 0.094–0.564) in univariable analysis for miR-200c in a patient cohort including only luminal tumors12. Other three studies23,24,25 evaluated the association between miR-200 family members expression in plasma samples and patient’s outcome (Supplemental Table S8). In particular, Medhavan et al.24 found an association between increased expression of miR-200a, miR-200b and miR-200c and higher risk of overall mortality in univariable analyses (Fig. 5).

Figure 5
figure 5

Forest plot of hazard ratios (HR) for studies on plasma samples from breast cancer patients. Plots were performed using the R Foundation for Statistical Computing (version 3.6, packages: ggplot2, gridExtra).

Conclusion

To the best of our knowledge, this is the first study evaluating the expression of all miR-200 family members in breast cancer tissues in order to identify potential combination biomarkers of clinical relevance. Our results suggest a differential expression of miR-200 family in breast cancer as compared to normal breast, and within the breast cancer molecular subgroups identified by either surrogate classification (H-CSS cohort) or intrinsic molecular classification (TCGA-BRCA cohort). Nevertheless, the correlation analyses with breast cancer patients’ prognosis exclusively found a weak predictive accuracy of the combined expression of miR-200 family on CSS outcome at 12 years in the H-CSS cohort. Although these results seem not to encourage the use of miR-200 family members as combination biomarkers in breast cancer, we cannot rule out that such a role might be held within a single breast cancer subgroup. Indeed, in the H-CSS cohort the number of cases and event outcome is not sufficient for subgroup analyses, whereas only partial information about overall survival and no data on progression are available within the TCGA-BRCA cohort. Thus, this possibility needs to be further investigated in studies specifically designed to evaluate miR-200 family expression in each of the breast cancer subtypes.

Materials and methods

Study design, setting and eligibility criteria

This study is part of the project TRANSCAN Joint Transnational Call (JTC) 2013-BREMIR initiated in 2015 at the Fondazione IRCCS Casa Sollievo della Sofferenza (H-CSS), aimed to identify novel biomarkers predicting disease progression and metastases development in breast cancer patients. In this study, we evaluated the miR-200 family expression in a retrospective consecutively collected cohort of 287 breast cancer cases (H-CSS cohort) with a median age of 60 years (Supplemental Table 1).

We conducted the study according to the REporting of tumor MARKer Studies (REMARK) guidelines8,26, and a prospectively written research (TRANSCAN-BREMIR) plan. Breast cancer tissues were collected between January 2006 and December 2014 at the Breast-Unit, Fondazione IRCCS Casa Sollievo della Sofferenza.

Following pathological evaluation, tissue samples were snap-frozen in liquid nitrogen and stored at − 80 °C. For legal reasons, only women older than 18 years of age with tumors greater than 1.0 cm in diameter were included in the study. For each sample, a 5 μm hematoxylin/eosin stained section was visually inspected by light microscopy to select tumor areas with at least 70% viable cancer cells rather than normal specimens, obtained from reductive mammoplasty, to check for the absence of tumor cells among normal epithelial. The study methodologies using these samples were carried out following the international of Helsinki Declaration 7th revision (2013, EU Directive 2004/23/EC) and the Italian (D. Lgs. 30/06/2003, n. 196) regulations for research on human subjects. All experimental procedures of this study were approved by the Ethical Committee of the Fondazione IRCCS Casa Sollievo della Sofferenza (Prot N 140/CE). A written informed consent was obtained from all patients following the experimental protocol approved by the Ethical Committee.

RNA isolation and RT-qPCR analysis

RNA was isolated from H-CSS tissue samples by Trizol reagent (Invitrogen) according to the manufacturer’s instructions. Total RNA concentration was determined by the absorbance measurement at 260 nm and 280 nm using the NanoDropTM 1000 spectrophotometer (Thermo Fisher Scientific). The RNA quality and integrity were analyzed through 2100 Expert Analyzer (Agilent Technology), and only RNAs with RIN (RNA Integrity Number) ≥ 7.0 were considered acceptable. Then, 10 ng of total RNA was reverse transcribed to single stranded cDNA by using TaqMan MicroRNA Reverse Transcription Kit (Thermo Fisher Scientific) and 5 × specific stem-loop RT primers for both individual miR-200 family members and the endogenous control, according to the manufacturer’s instructions. RT positive and negative controls were included in each batch of reactions. To assess miR-200 family expression levels in the H-CSS cohort, we applied a relative quantification method with a standard curve27. The expression levels of each miR-200 family member were assessed by using TaqMan MicroRNA Assays that were as follows: hsa-miR-200a-3p, assay ID: 000502; hsa-miR-200b-3p, assay ID: 002251; hsa-miR-200c-3p, assay ID: 002300; hsa-miR-141-3p, assay ID: 000463; hsa-miR-429, assay ID: 001024, and normalized to RNU48 endogenous control, assay ID: 001006 (Thermo Fisher Scientific).

Each qPCR run was performed by using 0.5 μl of TaqMan microRNA (20X), 5 μl of TaqMan Universal PCR Master Mix II, No AmpErase UNG, and 1 μl of cDNA. The PCR conditions were as follows: at 95 °C for 10 min, followed by 40 cycles (95 °C for 15 s, 60 °C for 1 min). All samples were run in triplicates. Each plate included positive and negative controls of reverse transcription and multiple water blanks. qPCR reactions were performed on ABI PRISM 7900HT Sequence Detection System and the SDS 2.4 software (Thermo Fisher Scientific) was used for post-run analyses. For each miR-200 family member and RNU48 control, standard curves were constructed by plotting the threshold cycle (Ct) values against log10 of the copy number, and fitting by linear least square regression. For each sample, miR-200 family member expression was determined as the ratio of any single miR-200 family member’s copy number to the RNU48 copy number. Then, it was multiplied by 1000 for more straightforward tabulation (i.e. miRNA target/RNU48) × 1000)27.

Statistical analysis

Patients’ clinicopathological characteristics were reported as median along with interquartile range (IQR, i.e. first-third quartiles) or frequencies and percentages for continuous and categorical variables, respectively. Normal distribution assumption of miRNA expression was evaluated by Q-Q plots and Shapiro-Wilks test, and a log-normal distribution for all miR-200 family members was detected. The two-sample t test (or ANOVA model as appropriate) was used to assess comparisons of log-transformed miRNA expression among patient groups. Pearson correlation coefficient was estimated to assess the correlation between natural log of miRNA expression and continuous variables. Time-to-event analyses were performed by univariable and multivariable proportional hazards Cox regression models and risks were reported as Hazard Ratios (HR) along with their 95% Confidence Interval (95%CI).

The individual overall follow-up time was defined as the time between the enrollment date (i.e. at the time of snap-frozen fresh tissue collection) and the occurrence of the death due to cancer (Cancer Specific Survival, CSS), whereas the individual time to tumor progression or distant metastasis was defined as the time between the enrollment date and the occurrence of the first disease progression (Disease Free Survival, DFS), or the first distant metastasis (Metastasis Free Survival, MFS). For patients who did not experience any event as above, their individual follow-up time was defined as the time between the enrollment date and the end of the observational period (i.e. last available examination).

Furthermore, annual mortality and disease progression rates were defined as the number of events divided by the number of person-years × 100. When each miRNA expression was considered as the main covariate into a univariable Cox model, HRs were reported with respect to patients groups defined by miRNA’s median value (i.e. above vs. below the median). Moreover, multivariable Cox models were also performed with the inclusion of lymph node and surrogate molecular classification as further covariates. A weighted miRNA score was computed for each survival outcome at issue by the assessment of a multivariable Cox model, which included all miRNAs (natural log of expression) of the miR-200 family as main covariates. Weighted scores were calculated as the linear combination of the regression coefficients by the value of each miRNA (natural log of expression). The prognostic accuracy of each miRNA score was assessed at 7 (i.e. the median time horizon) and at 12 years (i.e. the maximum time horizon) by survival C-statistic, along with its 95% CIs derived following 1000 perturbation-resampling28. A two-sided p value < 0.05 was considered for statistical significance. All statistical analyses were performed using SAS Release 9.4 (SAS Institute, Cary, NC, USA). Plots were performed using the R Foundation for Statistical Computing (version 3.6, packages: ggplot2, gridExtra).

Ethics approval and consent to participate

The study methodologies using human samples were carried out following the international of Helsinki Declaration 7th revision (2013, EU Directive 2004/23/EC) and the Italian (D. Lgs. 30/06/2003, n. 196) regulations for research on human subjects. All experimental procedures of this study were approved by the Ethical Committee of the IRCCS Casa Sollievo della Sofferenza (Prot N 140/CE). The informed consent was obtained from all patients following the experimental protocol approved by the Ethical Committee.