Combined analysis of miR-200 family and its significance for breast cancer

While the molecular functions of miR-200 family have been deeply investigated, a role for these miRNAs as breast cancer biomarkers remains largely unexplored. In the attempt to clarify this, we profiled the miR-200 family members expression in a large cohort of breast cancer cases with a long follow-up (H-CSS cohort) and in TCGA-BRCA cohort. Overall, miR-200 family was found upregulated in breast tumors with respect to normal breast tissues while downregulated in more aggressive breast cancer molecular subtypes (i.e. Luminal B, HER2 and triple negative), consistently with their function as repressors of the epithelial-to-mesenchymal transition (EMT). In particular miR-141-3p was found differentially expressed in breast cancer molecular subtypes in both H-CSS and TCGA-BRCA cohorts, and the combined analysis of all miR-200 family members demonstrated a slight predictive accuracy on H-CSS cancer specific survival at 12 years (survival c-statistic: 0.646; 95%CI 0.538–0.754).

These contrasting results led us to perform an extensive expression analysis of the entire miR-200 family in two large cohorts of breast cancer patients, the first collected in our hospital (H-CSS cohort, N = 283) and the second from The Cancer Genome Atlas Breast Invasive Carcinoma (TCGA-BRCA cohort, N = 451), in order to clarify the extent of miR-200 family deregulation in breast cancer, and its specific association with clinicopathological parameters.

Results
Patients and treatment. Table 1 summarizes the clinicopathological information obtained from the review of medical records and descriptive statistics for the 287 enrolled cases (H-CSS cohort). Metastases at diagnosis were present in 16 cases; among non-metastatic patients (N = 271), 55 experienced disease progression (Incidence Rate, IR of 3.5 events per 100 person-years), and 30 of them died for the disease (IR of 1.7 events per 100 person-years). The median time to disease progression was 69.8 months (IQR: 33.8-107.5), whereas the overall follow-up time was 75.1 months (IQR: 40.6-109.7). Hormone receptor positive breast www.nature.com/scientificreports/ tumors were defined as cases expressing estrogen (ER) or progesterone (PgR) receptors in ≥ 1% of neoplastic cells as indicated by international guidelines 8 , and HER2 status assessment was carried out according to standard recommendations 9 . Cases were staged according to the World Health Organization staging system version 7th 10 . The surrogate molecular classification was performed as described by Pasculli et al. 11 . Overall, 104 cases (38%) were classified as Luminal A, 96 cases (35%) as Luminal B; 34 cases (12%) were HER2-amplified, and 38 cases (14%) were Triple Negative (Table 1). Fifteen cases were not classified because the HER2 and or ki67 status was not reported in the medical records. All patients received breast-conserving surgery or total mastectomy, plus sentinel node biopsy or complete axillary dissection. Post-surgery treatments were performed according to the following guidelines: San Gallen, NCCN and ASCO. Recurrence was defined as evidence of loco-regional and/ or distant disease over 4 months from diagnosis and after curative-intent surgical treatment.
Selection of TCGA-BReast invasive CAncer (TCGA-BRCA) cohorts. We selected a cohort of 1053 women with breast cancer not treated with neoadjuvant therapy from the TCGA data portal (https ://porta l.gdc. cance r.gov/). All tumors had available expression profile for all the five miRNAs of the miR-200 family. The log2 read counts were used for miRNA expression analysis. The TCGA-BRCA cohort was harmonized with the H-CSS cohort by using a two-step approach (Supplemental Fig. 2 and Supplemental Table 1): (i) the TCGA-BRCA cohort was limited to those histotypes (NST and ILC) and stages (I-IIa/b, IIIa/c and IV) represented in H-CSS (N = 822); (ii) we performed a random disproportionate sampling to align the distribution of histotypes and stages between the two cohorts; weights were overall based on H-CSS distribution, with the exception that we reduced the weights for late stages not to deplete the final cohort's size extensively. We ultimately selected 451 patients for all subsequent analyses. Characteristics of these cohorts are reported in Supplemental Table 1.
Analysis of variance, run through a general linear model, was performed to evaluate the association of miRNAs expression and clinicopathological features, considering both main effects and interaction terms. Cox regression model was implemented to estimate hazard ratios for overall survival, defined as the time from the date of tumor resection until death from any cause. We also downloaded miRNA expression data for 104 available normal breast tissues from TCGA-BRCA data portal. Forty-eight of these normal samples were matched to 48 tumor counterparts among the cohort of 451 women considered.

miR-200 family expression in tumor samples as compared with normal tissues. Following eval-
uation of RNA quality, 283 out of the 287 samples from the H-CSS cohort showing an RNA Integrity Number (RIN) > 7.0 were suitable for the analysis (Supplemental Fig. 1). Thus, the expression profile of the entire miR-200 family could be performed in 283 breast cancers, and in 13 normal breast tissues (NBTs) obtained from reductive mammoplasty. As shown in Supplemental Table 2A and Fig. 1, all miRNAs were significantly overexpressed in tumors (p < 0.001) when compared to NBTs. Furthermore, almost all miR-200 family members (except for miR-200b-3p) were overexpressed in tumors as compared to normal tissue adjacent to tumor (Margin) (Supplemental Table 2B). Accordingly, in the TCGA-BRCA cohort, we confirmed the overexpression of miR-200 family in breast tumors vs. normal breast tissues (p < 0.0001) (Supplemental Table 3A and Fig. 2) and in matched tumor-normal pairs (N = 48; Supplemental Table 3B).
In the H-CSS cohort, miR-141-3p, miR-200a-3p and miR-429 expression was increased in advanced stage disease (stage IV) (p = 0.037, p = 0.0011 and p = 0.0078 respectively) ( Table 2). In the TCGA cohort, miR-200a-3p (p = 0.0128), miR-200b-3p (p = 0.0009) and miR-200c-3p (p = 0.0013) were increased in invasive lobular carcinoma (Supplemental Table 4A). To evaluate whether these differences may affect the association with molecular subtypes, we performed a multivariable analysis adjusting for stage, histotype and molecular subtype. Overall, our results indicate that miR-200a-3p and miR-141-3p remain significantly associated with the molecular subtypes in breast cancer after the adjustments in the H-CSS cohort (Table 3), whereas an association with miR-141-3p and miR-200c-3p was found in the TCGA cohort (Supplemental Table 4B). www.nature.com/scientificreports/ showed a shorter follow-up and limited number of events, we were able to confirm the prognostic role of stage, estrogen and progesterone receptor status, HER2 status, and molecular subtypes (Supplemental Table 7A). In line with the results obtained in the H-CSS, we did not observe a statistically significant association of any of the miR-200 family members with overall survival (OS) in multivariate analysis (Supplemental Table 7B). These figures were confirmed in the larger cohort of N = 806 subjects without metastases (cohort B1, Supplemental Fig. 2), scoring 101 events, and with a follow-up length (median = 2.4 years; Q1 = 1.2; Q3 = 4.7) comparable to the smaller C1 cohort (Supplemental Table 7A, B). Last, we evaluated whether the combined expression of miR-200 family members was able to predict survival outcomes. As shown in Table 4, when all miRNAs were jointly considered for the building of the weighted scores, only a slight predictive accuracy on H-CSS outcome at 12 years was found (survival c-statistic: 0.646; 95%CI 0.538-0.754). Regression coefficients (weights) used to calculate the scores were reported in Table 5.

Discussion
In the attempt to elucidate the extent of miR-200 family deregulation in breast cancer and, hence, its potentiality as clinically significant biomarker, we profiled a large series of breast cancer cases with a long follow-up (H-CSS cohort) and the TCGA-BRCA cohorts. First, we found in both H-CSS and TCGA-BRCA cohorts that the global miR-200 family expression is increased in tumors as compared with normal breast tissue or margin. Since miR-200 family is mainly expressed in epithelial cells, these results are most likely due to the enrichment in fibrous connective adipose tissue typical of the normal breast. The enrichment in normal breast epithelial component might also explain some inconsistencies among literature data. Consistently with our data, Amorim et al. 12 found In both H-CSS and TCGA-BRCA cohorts, we found a differential expression of miR-200 family members within the molecular subgroups identified by the surrogate molecular classification (H-CSS cohort) and intrinsic molecular subtypes (TCGA-BRCA cohort). In particular, lower expression of miR-141-3p/miR-200a-3p was associated with HER2-amplified, Luminal B, and Triple Negative (H-CSS cohort) or Normal Like (TCGA-BRCA cohort) breast cancer subtypes. This is consistent with reports describing that miR-200 family loss of expression unleashes ZEB1 expression 13 , which in turn induces epithelial-to-mesenchymal transition (EMT), which is an important step forward in the initial phase of the metastatic spreading from the primary tumor.
Functionally speaking, the association between miR-200 family and metastatic processes have been widely investigated in different tumor types, including breast cancer and, once again, conflicting results have been reported. Indeed, the ectopic expression of miR-200a and miR-200b was shown to inhibit EMT features in undifferentiated, non-tumorigenic breast cancer cells, and impair proliferation, migration, and invasion in triple negative breast cancer 14 . Accordingly, miR-200c/141 cluster deletion affects breast cancer stem cell heterogeneity by promoting the generation of EMT-like stem cells, which resulted in increased tumor metastasis 3 . miR-200 family members were also found to support Epidermal Growth Factor (EGF)-driven invasion, with the miR-200bc/429 cluster showing stronger effects than the miR-200a/141 cluster 1,15 . Moreover, miR-200a suppressed cell proliferation in breast cancer by targeting mitochondrial transcription factor A 16 , and impaired EMT-like transformation, thus migration, by regulating SIRT1 in breast epithelial cells 17 . Nevertheless, while these studies likely suggest a tumor suppressor role for miR-200 family members, others indicate that higher expression  www.nature.com/scientificreports/ of miR-200 family members might induce rather than prevent metastases formation. For instance, the forced expression of miR-200a/miR-200b in MCF10 mammary cells induced an enhanced epithelial program, aldehyde dehydrogenase (ALDH) activity, mammosphere growth and ability to form branched tubuloalveolar structures while promoting orthotopic tumor growth and lung colonization in vivo, suggesting that miR-200 family members may promote traits of highly proliferative breast luminal progenitor cells 7 . Likewise, miR-200c/141 cluster overexpression induced by SerpinB2 was shown to foster breast cancer cell metastasis 18 . Furthermore, miR-200a overexpression was found to enhance malignant transformation of immortalized human mammary epithelial cells 19 , to protect tumor cells from apoptosis, and promote metastases 4 and chemoresistance 20 . Altogether, these discrepancies lead to hypothesize that the biological functions of miR-200 family members may depend on the cellular context, tumor molecular subtype, and stage of tumor progression 21 . In our study, the association between miR-200 family expression and patients' outcome was evaluated in terms of DFS, MFS, and CSS in the H-CSS cohort including 283 non-metastatic breast cancer cases with a median follow-up of 75 months. In the TCGA-BRCA cohort, only overall survival data were available instead. Our analyses did confirm the prognostic role of lymph node status, estrogen and progesterone receptors status, HER2 status, and molecular subtypes in both H-CSS and TCGA-BRCA cohorts. However, we did not observe any statistically significant association of the miR-200 family members with patients' outcome in multivariable analyses. Indeed, in the H-CSS cohort, the combined expression of miR-200 family members only showed a slight predictive accuracy on CSS outcome at 12 years (Table 4).
To date, only a minority of studies 22 have performed the expression analysis of miR-200 family members in breast cancer tissues, and evaluated its association with patients' outcomes (Fig. 5, and Supplemental Table S8). Among those, only one study reported an hazard ratio of 0.231 (95%CI 0.094-0.564) in univariable analysis for miR-200c in a patient cohort including only luminal tumors 12 . Other three studies [23][24][25] evaluated the association between miR-200 family members expression in plasma samples and patient's outcome (Supplemental Table S8). www.nature.com/scientificreports/ In particular, Medhavan et al. 24 found an association between increased expression of miR-200a, miR-200b and miR-200c and higher risk of overall mortality in univariable analyses (Fig. 5).

Conclusion
To the best of our knowledge, this is the first study evaluating the expression of all miR-200 family members in breast cancer tissues in order to identify potential combination biomarkers of clinical relevance. Our results suggest a differential expression of miR-200 family in breast cancer as compared to normal breast, and within the breast cancer molecular subgroups identified by either surrogate classification (H-CSS cohort) or intrinsic molecular classification (TCGA-BRCA cohort). Nevertheless, the correlation analyses with breast cancer patients' prognosis exclusively found a weak predictive accuracy of the combined expression of miR-200 family on CSS outcome at 12 years in the H-CSS cohort. Although these results seem not to encourage the use of miR-200 family members as combination biomarkers in breast cancer, we cannot rule out that such a role might be held within a single breast cancer subgroup. Indeed, in the H-CSS cohort the number of cases and event outcome is not sufficient for subgroup analyses, whereas only partial information about overall survival and no data on   www.nature.com/scientificreports/ progression are available within the TCGA-BRCA cohort. Thus, this possibility needs to be further investigated in studies specifically designed to evaluate miR-200 family expression in each of the breast cancer subtypes.

Materials and methods
Study design, setting and eligibility criteria. This study is part of the project TRANSCAN Joint Transnational Call (JTC) 2013-BREMIR initiated in 2015 at the Fondazione IRCCS Casa Sollievo della Sofferenza (H-CSS), aimed to identify novel biomarkers predicting disease progression and metastases development in breast cancer patients. In this study, we evaluated the miR-200 family expression in a retrospective consecutively collected cohort of 287 breast cancer cases (H-CSS cohort) with a median age of 60 years (Supplemental Table 1). We conducted the study according to the REporting of tumor MARKer Studies (REMARK) guidelines 8,26 , and a prospectively written research (TRANSCAN-BREMIR) plan. Breast cancer tissues were collected between January 2006 and December 2014 at the Breast-Unit, Fondazione IRCCS Casa Sollievo della Sofferenza.
Following pathological evaluation, tissue samples were snap-frozen in liquid nitrogen and stored at − 80 °C. For legal reasons, only women older than 18 years of age with tumors greater than 1.0 cm in diameter were included in the study. For each sample, a 5 μm hematoxylin/eosin stained section was visually inspected by light microscopy to select tumor areas with at least 70% viable cancer cells rather than normal specimens, obtained from reductive mammoplasty, to check for the absence of tumor cells among normal epithelial. The study methodologies using these samples were carried out following the international of Helsinki Declaration 7th revision RNA isolation and RT-qPCR analysis. RNA was isolated from H-CSS tissue samples by Trizol reagent (Invitrogen) according to the manufacturer's instructions. Total RNA concentration was determined by the absorbance measurement at 260 nm and 280 nm using the NanoDropTM 1000 spectrophotometer (Thermo Fisher Scientific). The RNA quality and integrity were analyzed through 2100 Expert Analyzer (Agilent Technology), and only RNAs with RIN (RNA Integrity Number) ≥ 7.0 were considered acceptable. Then, 10 ng of total RNA was reverse transcribed to single stranded cDNA by using TaqMan MicroRNA Reverse Transcription Kit Table 4. Prognostic accuracy of each outcome-specific weighted miRNA score at median and maximum time horizons. *95% confidence interval after 1000 perturbation-resamplings of the data.

Statistical analysis.
Patients' clinicopathological characteristics were reported as median along with interquartile range (IQR, i.e. first-third quartiles) or frequencies and percentages for continuous and categorical variables, respectively. Normal distribution assumption of miRNA expression was evaluated by Q-Q plots and Shapiro-Wilks test, and a log-normal distribution for all miR-200 family members was detected. The two-sample t test (or ANOVA model as appropriate) was used to assess comparisons of log-transformed miRNA expression among patient groups. Pearson correlation coefficient was estimated to assess the correlation between natural log of miRNA expression and continuous variables. Time-to-event analyses were performed by univariable and multivariable proportional hazards Cox regression models and risks were reported as Hazard Ratios (HR) along with their 95% Confidence Interval (95%CI). The individual overall follow-up time was defined as the time between the enrollment date (i.e. at the time of snap-frozen fresh tissue collection) and the occurrence of the death due to cancer (Cancer Specific Survival, CSS), whereas the individual time to tumor progression or distant metastasis was defined as the time between the enrollment date and the occurrence of the first disease progression (Disease Free Survival, DFS), or the first distant metastasis (Metastasis Free Survival, MFS). For patients who did not experience any event as above, their individual follow-up time was defined as the time between the enrollment date and the end of the observational period (i.e. last available examination).
Furthermore, annual mortality and disease progression rates were defined as the number of events divided by the number of person-years × 100. When each miRNA expression was considered as the main covariate into a univariable Cox model, HRs were reported with respect to patients groups defined by miRNA's median value (i.e. above vs. below the median). Moreover, multivariable Cox models were also performed with the inclusion of lymph node and surrogate molecular classification as further covariates. A weighted miRNA score was computed for each survival outcome at issue by the assessment of a multivariable Cox model, which included all miRNAs (natural log of expression) of the miR-200 family as main covariates. Weighted scores were calculated as the linear combination of the regression coefficients by the value of each miRNA (natural log of expression). The prognostic accuracy of each miRNA score was assessed at 7 (i.e. the median time horizon) and at 12 years (i.e. the maximum time horizon) by survival C-statistic, along with its 95% CIs derived following 1000 perturbationresampling 28 . A two-sided p value < 0.05 was considered for statistical significance. All statistical analyses were performed using SAS Release 9.4 (SAS Institute, Cary, NC, USA). Plots were performed using the R Foundation for Statistical Computing (version 3.6, packages: ggplot2, gridExtra).
Ethics approval and consent to participate. The study methodologies using human samples were car-

Data availability
The datasets analysed during the current study are available from the corresponding author on reasonable request.