Introduction

Breast cancer (BC) is the most common cancer type in females worldwide1. The molecular mechanisms involved in BC pathogenesis have been thoroughly studied, leading to BC classification into three major subtypes: Luminal which is positive for estrogen (ER+) and progesterone receptors (PR+), human epidermal growth factor receptor 2 (HER2+), and basal-like tumors, which lacks hormone receptor and HER2 expression, called triple-negative breast cancer (TNBC)2,3,4. While this classification has led to better stratification of BC, this disease is still associated with high mortality rate underscoring a need to develop novel molecular signature for better stratification and for prediction of disease outcome.

Although a number of gene-expression signatures have been reported as diagnostic and prognostic biomarkers in BC, the role of non-coding RNAs in this domain is just beginning to unfold. Genome wide transcriptome studies have revealed the existence of large number of long non-coding RNAs (≥200 nucleotides)5,6. Current GENCODE database (version 30) revealed the existence of approximately 16,193 lncRNAs and 14,706 pseudogenes in the human genome.

LncRNAs are involved in regulating various biological processes, including tumor-suppressor and oncogenic pathways and may serve as prognostic markers in BC. A number of oncogenic (H19, SRA, LSINCT5, Zfas1, lncRNA-Smad7, LOC554202, HOTAIR, SOX2OT and FAL1) and tumor suppressor (GAS5 and XIST) lncRNAs have been identified in BC; however their regulation and the mechanisms of action for the majority of lncRNAs remains to be unraveled6,7.

In this study, we characterized the lncRNA expression portrait from 837 patients with invasive BC and 105 normal breast tissues from the cancer genome atlas (TCGA) breast cancer dataset.

Our comparative analysis identified eighteen upregulated and forty-six downregulated lncRNAs in breast cancer compared to normal tissue. Interestingly, we identified eleven lncRNAs: LINC01614, LINC01235, lnc-LRR1–1, lnc-ODF3B-2, AC015712.5, lnc-LAMB3–1, lnc-SPP2–3, lnc-MAP9–2, MIR205HG, lnc-MAP2K6–5 and FGF14-AS2 to be associated with BC patient outcome. In particular, our data provided novel insight implicating LINC01614 as unfavorable prognostic marker in BC, its association with the HR+/HER2+ BC molecular subtype and its regulation by TGFβ and FAK signaling.

Materials and methods

Data analyses and bioinformatics

Long noncoding RNA (lncRNA) expression from 837 invasive breast carcinoma and 105 normal subjects were retrieved from The Atlas of Noncoding RNAs in Cancer (TANRIC; http://bioinformatics.mdanderson.org/main/TANRIC:Overview) database. Expression data were subsequently imported into Altanalyze v.2.1.0 software as described before8. Hierarchical clustering was performed using cosine for columns and cosine for rows while principal component analysis was performed to assess the relatedness of samples. Gene expression for the same cohort was retrieved from the cBioPortal for Cancer Genomics (https://www.cbioportal.org/) database as we described before9.

RNA-Seq data analysis

Raw RNA sequencing data were retrieved from sequence read archive (SRA) database under accession no. SRP062132. Data were retrieved using the SRA toolkit version 2.9.2 as previously described10. Pair end reads were aligned to the hg19 human reference genome in CLC Genomics Workbench-12 (QIAGEN, Germany). The abundance of the expression of transcripts was measured as the score of TPM (Transcript Per Kilobase Million) mapped reads in CLC Genomics Workbench 12. Expression of LINC01614 in each molecular subtype was plotted using Graphpad Prism 6.0 software (Graphpad ® Software, San Diego, CA, USA)

Gene set enrichment and modeling of gene interactions networks

Upregulated genes in the LINC01614high BC group were imported into the Ingenuity Pathways Analysis (IPA) software (Ingenuity Systems; www.ingenuity.com/) and were subjected to functional annotations and regulatory network analysis using upstream regulator analysis (URA), downstream effects analysis (DEA), mechanistic networks (MN) and causal network analysis (CNA) prediction algorithms. IPA uses precise algorithm to predict functional regulatory networks from gene expression data and provides a significance score for each network according to the fit of the network to the set of focus genes in the database. The p value is the negative log of P and represents the possibility that focus genes in the network being found together by chance11.

Statistical and survival analysis

Kaplan-Meir survival analysis and plotting were conducted using IBM SPSS statistics version 24 software. For survival analysis, patients were grouped into high or low based on LINC01614 log2 gene expression. The log-rank test was used to compare the outcome between expression groups. Statistical analyses to compare specific gene expression and graphing were performed using Graphpad Prism 6.0 software. Unpaired two-tailed t-test and p value of <0.05 was considered significant as we described before12.

Cell culture, recombinant TGFβ treatment, and small molecule inhibition

Human breast cancer cell lines (BT474, T47D, MDAMB453, ZR751, MCF7, HCC70, HS578T, MDAMB468, BT549 and MDAMB231) were cultured in Dulbecco’s modified Eagle’s medium/RPMI 1640 supplemented with D-glucose 4500 mg/l, 4 mM L-glutamine and 110 mg/l sodium pyruvate, 10% fetal bovine serum and 1x penicillin–streptomycin (Pen-Strep) (all purchased from Gibco-Invitrogen, Waltham, MA, USA). The triple positive BC cell line (BT474) was treated with rhTGFβ (10 ng/ml, Peprotec, London, UK), TGFβ inhibitor (SB-431542; 10 μM, Selleckchem Inc., Houston, TX, USA), FAK inhibitor (PF-573228; 5 μM, Selleckchem Inc., Houston, TX, USA) and combination of rhTGFβ and TGFβ inhibitor. Pharmacological inhibition of TGF-β and FAK pathways were conducted as we previously described9,13. Briefly, 0.2 × 106 cells/well were cultured in 6 well plates (duplicate) and incubated for 48 hours and subsequently the expression of LINC01614 was measured using qRT-PCR. Assays were carried out with appropriate DMSO control.

LncRNA validation using qRT-PCR

Tumor tissue (TT) specimens from eight BC tissue and adjacent normal tissue (NT) were obtained from treatment-naive BC patients prior to surgery with a proper written informed consent. The study was approved by Qatar Biomedical Research Institute, Doha, Qatar (Protocol no. 2017–006). Total RNA was extracted from eight primary BC tissue, adjacent normal tissue, and from a panel of breast cancer cell lines using Norgon RNA/DNA/Protein Purification Plus Kit (Norgen Biotek Corp, Ontario, Canada) as per the manufacturer’s instructions. Expression level of LINC01614 was validated using SYBR Green-based quantitative reverse transcriptase-polymerase chain reaction (qRT-PCR). The total RNA (500 ng) was reverse transcribed into complementary DNA (cDNA) using a High Capacity cDNA Reverse Transcript Kit (catalogue No. 4368814; ABI) according to the manufacturer’s protocol. Relative levels of lncRNA was determined using the cDNA as template in real-time PCR analysis using the Applied Biosystems QuantStudio 6/7 Flex Real-time PCR system. Primer sequences used in the current study were: LINC01614 F: 5′-AACCAAGAGCGAAGCCAAGA-3′; LINC01614 R: 5′-GCTTGGACACAGACCCTAGC-3′; GAPDH F: 5′-CTGGTAAAGTGGATATTGTTGCCAT-3′; and GAPDH R: 5′-TGGAATCATATTGGAACATGTAAACC-3′. The relative expression level was calculated using –ΔΔCT, GAPDH was used as an endogenous control.

Results

Expression profiling of lncRNAs from the TCGA BC dataset compared to normal breast tissue

Expression data for 12727 lncRNAs from 837 patients with invasive BC and 105 normal breast tissue were retrieved from TANRIC database and were subjected to differential expression analysis, which identified 18 upregulated and 46 downregulated lncRNAs (≤2≥, FDR p ≤ 0.05; Table 1). Hierarchical clustering revealed three major clusters, where breast cancer samples clustered at both sides while normals clustered in the middle (Fig. 1a). Principle component analysis (PCA) also revealed clear separation of normal from breast cancer based on lncRNA expression (Fig. 1b). Interestingly, we observed majority of patients in the further right cluster (76%; Fig. 1a) to be of the TNBC molecular subtype (Fig. 1c).

Table 1 Differentially expressed lncRNAs in breast cancer vs normal
Fig. 1: lncRNA transcriptional portrait in breast cancer compared to normal tissue.
figure 1

a Hierarchical clustering of breast cancer (n = 837) and normal (n = 105) breast tissue based on differentially expressed lncRNA transcripts. Each column represents one sample and each row represents a transcript. Expression level of each transcript (log2) in a single sample is depicted according to the color scale. b Principal component analysis (PCA) for the lncRNA transcriptome of breast cancer and normal breast tissue. c Venn diagram depicting the overlap between tumors samples from tumor cluster 2 (T-cluster 2) on the further right of the heatmap in panel (a) and the indicated breast cancer molecular subtype

Identification of lncRNAs associated with overall survival (OS) and disease-free survival (DFS) in the TCGA BC dataset

The differentially expressed lncRNAs from the profiling study were subsequently subjected to Kaplan-Meir survival analysis. Our data revealed LINC01614 and LINC01235 to predict worse DFS of BC patients (Fig. 2b and d; log-rank test = 0.07 and 0.08, respectively). On the other hand, lnc-LRR1–1, lnc-ODF3B-2, AC015712.5, lnc-LAMB3–1, lnc-SPP2–3, and lnc-MAP9–2 were associated with better DFS (Fig. 2a, c, e–h). Interestingly, the expression of LINC01235 correlated with worse DFS and worse OS (Figs. 2d and 3b), while the expression of MIR205HG, lnc-MAP2K6–5, FGF14-AS2, lnc-SPP2–3 correlated with better OS (Fig. 3a, c–e). Taken together, our data revealed LINC01614 as the only upregulated lncRNAs in BC and associated with worse DFS in the TCGA dataset.

Fig. 2: Disease-free survival (DFS) of breast cancer patients based on lncRNA expression.
figure 2

Kaplan-Meir DFS analysis for lnc-LRR1–1 (a), LINC01614 (b), lnc-ODF3B-2 (c), LINC01235 (d), AC015712.5 (e), lnc-LAMB3–1 (f), lnc-SPP2–3 (g), and lnc-MAP9–2 (h) in the TCGA BC cohort. Significance was calculated using the log-rank test. p values are indicated on each plot

Fig. 3: Overall survival (OS) of breast cancer patients based on lncRNA expression.
figure 3

Kaplan-Meir OS analysis for MIR205HG (a), LINC01235 (b), lnc-MAP2K6–5 (c), FGF14-AS2 (d), and lnc-SPP2–3 (e) in the TCGA BC cohort. Significance was calculated using the log-rank test. p values are indicated on each plot

LINC01614 expression correlates with HER2+HR+ invasive breast cancer molecular subtype

LINC01614 was the most highly expressed lncRNA (5.0 FC, p (adj) = 3.7 × 10–79) in breast cancer compared to normal tissue. We subsequently validated the expression of LINC01614 in a cohort of breast cancer patients, which revealed elevated expression of LINC01614 in BC compared to adjacent normal tissue (5.9 FC, p = 0.0007, Fig. 4a). Similarly, LINC01614 expression was detected in a panel of BC cell lines, where highest expression was observed in the BT474 triple positive BC cell line (Fig. 4b). We subsequently sought to determine if lncRNA expression can discriminate breast cancer with various molecular subtypes. To that end, the 837 BC samples were divided into HER2+HR+, HER2+HR, HER2HR+, and TNBC and were subjected to the marker finder algorithm in Altanalyze v.2.1.0 compared to 115 normal breast tissue. Clustering analysis revealed distinct molecular subtype for the TNBC and normal breast tissue, while less clear separation was observed among the three other molecular subtypes (HER2+HR+, HER2+HR, HER2HR+, Figs. 4c, d). The lncRNA profile distinctive for each molecular subtype is shown in Table 2. Interestingly, the expression of LINC01614 was highest in the HR+HER2+, while lowest expression was observed in TNBC molecular subtype (Fig. 4e–i). The expression of LINC01614 was subsequently validated in another cohort (SRP062132), which concordantly revealed highest expression in the luminal B/HER2+ molecular subtype (Fig. 4j). There was no difference in the expression of LINC01614 in relation to tumor stage (Fig. 4k).

Fig. 4: Correlation between the expression of LINC01614 and BC molecule subtype.
figure 4

a Expression of LINC01614 in eight BC patients (red box) and adjacent normal tissue (blue box) using qRT-PCR. Data are presented as mean ± S.E., n = 8 for each group. b Expression of LINC01614 in a panel of BC cell lines. Data are presented as normalized delta CT expression value. (c) Heat map clustering of BC (n = 837) and normal (n = 105) breast tissue based on molecular subtypes of BC (HER2 + HR+, HER2+HR−, HER2-HR+, TNBC, and normal). Each row represents expression level of the indicated lncRNA (log2). Expression level of transcriptomes in a single sample is depicted according to the color scale (blue to yellow). (d) Principal component analysis (PCA) for the indicated BC molecular subtypes and normal breast tissue. Expression of LINC01614 in relation to estrogen receptor (ER, e), progesterone receptor (PR, f), and HER2 (g) status. h Expression of LINC01614 in TNBC vs non-TNBC BC patients. i Expression of LINC01614 in the indicated molecular subtype subgroups. j Expression of LINC01614 in the SRP062132 dataset in relation to the indicated molecular subtype. k Expression of LINC01614 in relation to the tumor stages

Table 2 lncRNAs enriched according to breast cancer molecular subtype

LINC01614 elevated expression is associates with enhanced BC tumorigenenic molecular profile

To gain more insight into plausible role for LINC01614 in BC pathology, the 837 BC patients were divided into LINC01614high and LINC01614low according to the median LINC01614 expression. The upregulated genes in the LINC01614high group were subsequently subjected to Ingenuity Pathway Analysis (IPA) and downstream effect analysis (DEA). Affected functional categories are illustrated as heat tree map, which clusters functionally associated categories together, therefore depicting a high-level outlook of enriched functional categories. Data presented in Figs. 5a, b revealed a number of enriched functional categories including those involved in cell movement and invasion, while functional categories associated with cell death were under presented. Illustration of the cellular movement functional category is shown in Fig. 5c.

Fig. 5: Enrichment in tumor cell migration and invasion functional categories in LINC01614high BC patients.
figure 5

a Disease and function heat map depicting enrichment in the indicated functional and disease categories in the upregulated transcripts in LINC01614high vs LINC01614low BC patients based on IPA analysis. b Bar-graph depicting the most significantly affected factional categories in LINC01614high vs LINC01614low BC patients. c Heat map-illustrating enrichment in cellular movement functional category

Mechanistic network analysis predicts activation of TGFβ1 and ECM pathways in LINC01614high BC tissue

Upstream regulator analysis (URA) on the upregulated genes in LINC01614high revealed significant enrichment for a number of networks including TGFβ1, lipopolysaccharide, TGFβ, SP1, bleomycin, SMAD3, WNT3A, EDN1, dihydrotestosterone, and AGT (Fig. 6a). Highest enrichment was for the TGFβ1 network (Z score = 5.6; Fig. 6a). Mechanistic network analysis predicted TGFβ1 to directly activate the SMAD2, NFKB1A and SP1 through TGFβ (direct activation) and TNF (inconsistent sate), and to inhibit MYC through FGF2 (direct activation) and inhibit SMAD7 through TGFβ with higher confidence level (Fig. 6b). Concordantly, LINC01614 expression demonstrated significant positive correlation with various members of the TGFβ signaling pathways (COL10A1 (R2 = 0.7), SPOCK1 (R2 = 0.5156), ZEB1 (R2 = 0.3372), TGFBI (R2 = 0.2978), TGFB1 (R2 = 0.1985), ACTA2 (R2 = 0.1833), and TAGLN (R2 = 0.1909)) in the TCGA BC cohort (Fig. 6c). Moreover, we observed several collagen family members to be upregulated in LINC01614high BC, suggesting enhanced extracellular matrix (ECM) formation. Illustration of the ECM network in LINC01614high BC mapped by IPA is shown in Fig. 6d. The color shade intensity of the node correlates with the expression level of the indicated genes. Therefore, our molecular and network analyses revealed strong correlation between LINC01614 expression, TGFβ and ECM signaling. Mechanistically, recombinant TGFβ1 induced LINC01614 expression, while pharmacological inhibition of TGFβ signaling (using SB-431542) and FAK (using PF-573228) inhibited LINC01614 expression in BC cells (Fig. 6e), thus implication TGFβ and FAK signaling in regulating LINC01614 expression in BC cells.

Fig. 6: Mechanistic network analysis predicts predominant activation of the TGFB pathway in LINC01614high BC tumors.
figure 6

a Pie chart illustrating the top activated mechanistic networks in LINC01614high BC tumors based on IPA analyses. Segment size corresponds to the activation Z score. b Illustration of the TGFβ1 signaling network. c Correlation between the expression of LINC01614 and the expression several members of the TGFβ family in BC tumors. d Extra cellular matrix functional enrichment in LINC01614high BC tumors. Color intensity indicates their activation state. Effect of recombinant TGFβ (10 ng/ml), SB-431542 (ββ inhibitor, 10 µM), and PF-573228 (FAK inhibitor, 5 µM) on LINC01614 expression measured by qRT-PCR. Data are presented as mean ± S.E. from two independent experiments, n = 6

Discussion

In recent years, lncRNAs have emerged as key players in regulating cellular functions, differentiation and disease progression, including cancer, through epigenetics, chromatin remodeling, transcriptional and post-transcriptional regulation5,14.

While the number of annotated lncRNAs in the human genome has increase dramatically, functional characterization of lncRNAs and their utilization as disease biomarkers is begging to unfold. In current study, we analyzed the lncRNA transcriptome from the TCGA breast cancer dataset and performed thorough survival and bioinformatics analyses which revealed eighteen upregulated and forty-six downregulated lncRNAs in BC compared to normal breast tissue. Additionally, our data identified different lncRNA signatures associated with various BC molecular subtypes (HER2+HR+, HER2+HR, HER2HR+, and TNBC) as well as those specific to normal breast tissue. Interestingly, our data revealed a distinct lncRNA cluster for the TNBC tumors, while such segregation was less evident among the other molecular subtypes (HER2+HR+, HER2+HR, HER2HR+). This could be attributed to the tissue of origin for various BC molecular subtypes, where TNBC originates from ductal cells, while the HR+ and HER2+ originate from luminal cells15.

Our analyses identified eleven lncRNAs (LINC01614, LINC01235, lnc-LRR1–1, lnc-ODF3B-2, AC015712.5, lnc-LAMB3–1, lnc-SPP2–3, lnc-MAP9–2, MIR205HG, lnc-MAP2K6–5 and FGF14-AS2) whose expression correlated with patient outcome. Among the identified lncRNAs, LINC01614 and LINC01235 correlated with worse DFS, while LINC01235 correlated with worse OS. Interestingly, LINC01235 was downregulated in in BC compared to normal tissue, while at the same time it predicted worse DFS and OS. It is plausible that due to the large heterogeneity of BC cases included the TCGA BC cohort, the expression pattern for LINC01235 did not correlate with survival data. Additionally, a previous study reported LINC01235 (also called FLJ41200; ENSG00000270547.1) as cancer-related genes that mapped telomeric and centromeric to CD274 (PDL-1) at 9p23 in small-cell lung carcinoma16, suggesting possible link between LINC01235 and immune regulation in cancer.

Interestingly, our data revealed over-expression of LINC01614 in BC compared to normal tissue and its elevated expression correlated with worse DFS. More in-depth analysis revealed LINC01614 to be highly expressed in ER+ (log2 exp = 2.1), in PR+ (log2 exp = 2.2) and HER2+ (log2 exp = 2.623), while TNBC exhibited lowest expression (log2 exp = 1.1). Those data were further validated in a second cohort where highest expression was observed in the luminal B/HER2+ molecular subtype while lowest expression was observed in the HER2-TNBC molecular subtype. The expression of LINC01614 did not correlate with BC disease stage, suggesting alteration in LINC01614 expression as an early feature during BC development and progression. Concordant with our data, LINC01614 expression has been linked to lung adenocarcinoma17 and the LINC01614-contaitng signature predicted OS and DFS in patients with esophageal squamous cell carcinoma18. Recently, LINC01614 has also been reported as one of the lncRNA associated survival of ER + BC patients19.

To gain more insight into plausible molecular mechanisms of LINC01614 expression and function, we dicatomized the TCGA BC cohort into LINC01614high and LINC01614low and subsequently retrieved and identified mRNA transcripts upregulated in the LINC01614high group, which revealed 187 upregulated transcripts. Interestingly, IPA analysis on the upregulated gene list suggested strong correlation between LINC01614 expression and enriched functional categories associated with tumor cell movement and invasion. Nonetheless, LINC01614high expression was most significantly associated with TGFβ signaling, suggesting possible induction of LINC01614 by TGFβ signaling. Additionally, LINC01614high tumors exhibited high expression of several collagens, suggesting possible association between LINC01614 expression and enhanced ECM formation. It is noteworthy that ECM itself could be regulated by TGFB signaling20. Mechanistic investigation validated induction of LINC01614 by TGFβ, while it’s expression was inhibited by small molecule inhibitor of TGFβ and FAK, suggesting its regulation by TGFβ and FAK signaling.

Our data also revealed elevated expression of LINC01614 in HER2+ BC tumors. Interestingly, we observed significant correlation between LINC01614 expression and HER2 mutation status in BC (supplementary figure 1). HER2+ (erbB2) represent 25 to 30 % of breast cancer patients and is elevated expression has been associated with more aggressive BC phenotype and shorter DFS and OS21,22. Additionally, activation of HER2 has been linked to Epithelial-to-mesenchymal transition (EMT), hence endowing cancer cells with a more aggressive and invasive phenotype23,24. Interestingly, HER2 and TGFβ signaling cooperated in the induction of cellular processes associated with tumorigenic development in immortalized mammary epithelial cell line25. Additionally, overexpression of HER2 activated the TGFβ/SMAD signaling pathway and induced SNAIL, SLUG and ZEB-1 expression and subsequent acquisition of mesenchymal phenotype24. These published reports are consistent with our current data linking LINC01614 to TGFβ signaling and HER2 + molecular subtype.

Conclusions

Our data revealed the lncRNA transcriptional landscape in breast cancer and identified the lcnRNA signatures associated with each molecular subtype. Specifically, our data provide novel insight implicating LINC01614 as unfavorable prognostic marker in BC, and its association with the HR+/HER2+ BC molecular subtype and its regulation by TGFβ and FAK signaling.