Long non-coding RNA (lncRNA) transcriptional landscape in breast cancer identifies LINC01614 as non-favorable prognostic biomarker regulated by TGFβ and focal adhesion kinase (FAK) signaling

Long non-coding RNAs (lncRNAs) represent a class of epigenetic regulators implicated in a number of physiological and pathological conditions. Herein, we characterized the lncRNA expression portrait from 837 patients with invasive breast cancer and 105 normals from the cancer genome atlas (TCGA), which revealed eighteen upregulated and forty-six downregulated lncRNAs. Clustering analysis revealed distinct lncRNA profile for the triple negative breast cancer (TNBC) and normal breast tissue, while less separation was observed among the HER2+HR+, HER2+HR−, HER2−HR+ molecular subtypes. LINC01614, and LINC01235 correlated with worse disease-free survival (DFS), while the expression of lnc-LRR1–1, lnc-ODF3B-2, AC015712.5, lnc-LAMB3–1, lnc-SPP2–3, and lnc-MAP9–2 correlated with better DFS. The expression of LINC01235 correlated with worse overall survival (OS), while the expression of MIR205HG, lnc-MAP2K6–5, FGF14-AS2, lnc-SPP2–3 correlated with better OS. Highest expression of LINC01614 was observed in progesterone receptor (PR)+, Estrogen receptor (PR)+, and HER2+ tumors, while lowest expression was in TNBC. Concordantly, LINC01614 was highly expressed in the luminalB/HER2+ subtype from the SRP062132 dataset. Elevated expression of LINC01614 was subsequently validated in primary breast cancer tissue and breast cancer cell lines. Bioinformatics and pathway analyses on LINC01614high vs. LINC01614low BC tissue revealed TGFβ1 and ECM as the most activated networks in LINC01614high tumors. Concordantly, strong correlation between the expression of LINC01614 and COL10A1 (R2 = 0.6929), SPOCK1 (R2 = 0.5156), ZEB1 (R2 = 0.3372), TGFBI (R2 = 0.2978), TGFB1 (R2 = 0.1985), ACTA2 (R2 = 0.1833), and TAGLN (R2 = 0.1909) was observed. Mechanistically, exogenous TGFB1 induced LINC01614 expression in the BT474 triple positive BC model, while small-molecule inhibition of transforming growth factor β (TGFβ, SB-431542) or focal adhesion kinase (FAK, PF-573228) abrogated LINC01614 expression. Our data revealed the lncRNA transcription landscape in breast cancer and its molecular subtypes. Our data provide novel insight implicating LINC01614 as unfavorable prognostic marker in BC, its association with the HR+/HER2+ BC molecular subtype and its regulation by TGFβ and FAK signaling.


Introduction
Breast cancer (BC) is the most common cancer type in females worldwide 1 . The molecular mechanisms involved in BC pathogenesis have been thoroughly studied, leading to BC classification into three major subtypes: Luminal which is positive for estrogen (ER+) and progesterone LncRNAs are involved in regulating various biological processes, including tumor-suppressor and oncogenic pathways and may serve as prognostic markers in BC. A number of oncogenic (H19, SRA, LSINCT5, Zfas1, lncRNA-Smad7, LOC554202, HOTAIR, SOX2OT and FAL1) and tumor suppressor (GAS5 and XIST) lncRNAs have been identified in BC; however their regulation and the mechanisms of action for the majority of lncRNAs remains to be unraveled 6,7 .

Data analyses and bioinformatics
Long noncoding RNA (lncRNA) expression from 837 invasive breast carcinoma and 105 normal subjects were retrieved from The Atlas of Noncoding RNAs in Cancer (TANRIC; http://bioinformatics.mdanderson.org/main/ TANRIC:Overview) database. Expression data were subsequently imported into Altanalyze v.2.1.0 software as described before 8 . Hierarchical clustering was performed using cosine for columns and cosine for rows while principal component analysis was performed to assess the relatedness of samples. Gene expression for the same cohort was retrieved from the cBioPortal for Cancer Genomics (https://www.cbioportal.org/) database as we described before 9 .

RNA-Seq data analysis
Raw RNA sequencing data were retrieved from sequence read archive (SRA) database under accession no. SRP062132. Data were retrieved using the SRA toolkit version 2.9.2 as previously described 10 . Pair end reads were aligned to the hg19 human reference genome in CLC Genomics Workbench-12 (QIAGEN, Germany). The abundance of the expression of transcripts was measured as the score of TPM (Transcript Per Kilobase Million) mapped reads in CLC Genomics Workbench 12. Expression of LINC01614 in each molecular subtype was plotted using Graphpad Prism 6.0 software (Graphpad ® Software, San Diego, CA, USA)

Gene set enrichment and modeling of gene interactions networks
Upregulated genes in the LINC01614 high BC group were imported into the Ingenuity Pathways Analysis (IPA) software (Ingenuity Systems; www.ingenuity.com/) and were subjected to functional annotations and regulatory network analysis using upstream regulator analysis (URA), downstream effects analysis (DEA), mechanistic networks (MN) and causal network analysis (CNA) prediction algorithms. IPA uses precise algorithm to predict functional regulatory networks from gene expression data and provides a significance score for each network according to the fit of the network to the set of focus genes in the database. The p value is the negative log of P and represents the possibility that focus genes in the network being found together by chance 11 .

Statistical and survival analysis
Kaplan-Meir survival analysis and plotting were conducted using IBM SPSS statistics version 24 software. For survival analysis, patients were grouped into high or low based on LINC01614 log2 gene expression. The log-rank test was used to compare the outcome between expression groups. Statistical analyses to compare specific gene expression and graphing were performed using Graphpad Prism 6.0 software. Unpaired two-tailed t-test and p value of <0.05 was considered significant as we described before 12 .

LncRNA validation using qRT-PCR
Tumor tissue (TT) specimens from eight BC tissue and adjacent normal tissue (NT) were obtained from treatment-naive BC patients prior to surgery with a proper written informed consent. The study was approved by Qatar Biomedical Research Institute, Doha, Qatar (Protocol no. 2017-006). Total RNA was extracted from eight primary BC tissue, adjacent normal tissue, and from a panel of breast cancer cell lines using Norgon RNA/ DNA/Protein Purification Plus Kit (Norgen Biotek Corp, Ontario, Canada) as per the manufacturer's instructions. Expression level of LINC01614 was validated using SYBR Green-based quantitative reverse transcriptasepolymerase chain reaction (qRT-PCR). The total RNA (500 ng) was reverse transcribed into complementary DNA (cDNA) using a High Capacity cDNA Reverse Transcript Kit (catalogue No. 4368814; ABI) according to the manufacturer's protocol. Relative levels of lncRNA was determined using the cDNA as template in real-time PCR analysis using the Applied Biosystems QuantStudio 6/7 Flex Real-time PCR system. Primer sequences used in the current study were: LINC01614 F: 5′-AACCAAGA GCGAAGCCAAGA-3′; LINC01614 R: 5′-GCTTGGA-CACAGACCCTAGC-3′; GAPDH F: 5′-CTGGTAA AGTGGATATTGTTGCCAT-3′; and GAPDH R: 5′-TGGAATCATATTGGAACATGTAAACC-3′. The relative expression level was calculated using -ΔΔCT, GAPDH was used as an endogenous control.

Results
Expression profiling of lncRNAs from the TCGA BC dataset compared to normal breast tissue Expression data for 12727 lncRNAs from 837 patients with invasive BC and 105 normal breast tissue were retrieved from TANRIC database and were subjected to differential expression analysis, which identified 18 upregulated and 46 downregulated lncRNAs (≤2≥, FDR p ≤ 0.05; Table 1). Hierarchical clustering revealed three major clusters, where breast cancer samples clustered at both sides while normals clustered in the middle (Fig. 1a). Principle component analysis (PCA) also revealed clear separation of normal from breast cancer based on lncRNA expression (Fig. 1b). Interestingly, we observed majority of patients in the further right cluster (76%; Fig. 1a) to be of the TNBC molecular subtype (Fig. 1c).
LINC01614 expression correlates with HER2 + HR + invasive breast cancer molecular subtype LINC01614 was the most highly expressed lncRNA (5.0 FC, p (adj) = 3.7 × 10 -79 ) in breast cancer compared to normal tissue. We subsequently validated the expression of LINC01614 in a cohort of breast cancer patients, which revealed elevated expression of LINC01614 in BC compared to adjacent normal tissue (5.9 FC, p = 0.0007, Fig. 4a). Similarly, LINC01614 expression was detected in a panel of BC cell lines, where highest expression was observed in the BT474 triple positive BC cell line (Fig. 4b). We subsequently sought to determine if lncRNA expression can discriminate breast cancer with various molecular subtypes. To that end, the 837 BC samples were divided into HER2 + HR + , HER2 + HR − , HER2 − HR + , and TNBC and were subjected to the marker finder algorithm in Altanalyze v.   tissue. Clustering analysis revealed distinct molecular subtype for the TNBC and normal breast tissue, while less clear separation was observed among the three other molecular subtypes (HER2 + HR + , HER2 + HR − , HER2 − HR + , Figs. 4c, d). The lncRNA profile distinctive for each molecular subtype is shown in Table 2. Interestingly, the expression of LINC01614 was highest in the HR + HER2 + , while lowest expression was observed in TNBC molecular subtype (Fig. 4e-i). The expression of LINC01614 was subsequently validated in another cohort (SRP062132), which concordantly revealed highest expression in the luminal B/HER2 + molecular subtype (Fig. 4j). There was no difference in the expression of LINC01614 in relation to tumor stage (Fig. 4k).

LINC01614 elevated expression is associates with enhanced BC tumorigenenic molecular profile
To gain more insight into plausible role for LINC01614 in BC pathology, the 837 BC patients were divided into LINC01614 high and LINC01614 low according to the median LINC01614 expression. The upregulated genes in the LINC01614 high group were subsequently subjected to Ingenuity Pathway Analysis (IPA) and downstream effect analysis (DEA). Affected functional categories are illustrated as heat tree map, which clusters functionally associated categories together, therefore depicting a highlevel outlook of enriched functional categories. Data presented in Figs. 5a, b revealed a number of enriched functional categories including those involved in cell movement and invasion, while functional categories associated with cell death were under presented. Illustration of the cellular movement functional category is shown in Fig. 5c.

Discussion
In recent years, lncRNAs have emerged as key players in regulating cellular functions, differentiation and disease progression, including cancer, through epigenetics, chromatin remodeling, transcriptional and posttranscriptional regulation 5,14 .
While the number of annotated lncRNAs in the human genome has increase dramatically, functional characterization of lncRNAs and their utilization as disease biomarkers is begging to unfold. In current study, we analyzed the lncRNA transcriptome from the TCGA breast cancer dataset and performed thorough survival and bioinformatics analyses which revealed eighteen upregulated and forty-six downregulated lncRNAs in BC compared to normal breast tissue. Additionally, our data identified different lncRNA signatures associated with various BC molecular subtypes (HER2 + HR + , HER2 + HR − , HER2 − HR + , and TNBC) as well as those specific to normal breast tissue. Interestingly, our data revealed a distinct lncRNA cluster for the TNBC tumors, while such segregation was less evident among the other molecular subtypes (HER2 + HR + , Our analyses identified eleven lncRNAs (LINC01614, LINC01235, lnc-LRR1-1, lnc-ODF3B-2, AC015712.5, lnc-LAMB3-1, lnc-SPP2-3, lnc-MAP9-2, MIR205HG, lnc-MAP2K6-5 and FGF14-AS2) whose expression correlated with patient outcome. Among the identified lncRNAs, LINC01614 and LINC01235 correlated with worse DFS, while LINC01235 correlated with worse OS. Interestingly, LINC01235 was downregulated in in BC compared to normal tissue, while at the same time it predicted worse DFS and OS. It is plausible that due to the large heterogeneity of BC cases included the TCGA BC cohort, the expression pattern for LINC01235 did not correlate with survival data. Additionally, a previous study reported LINC01235 (also called FLJ41200; ENSG00000270547.1) as cancer-related genes that mapped telomeric and centromeric to CD274 (PDL-1) at 9p23 in small-cell lung carcinoma 16 , suggesting possible link between LINC01235 and immune regulation in cancer.
Interestingly, our data revealed over-expression of LINC01614 in BC compared to normal tissue and its elevated expression correlated with worse DFS. More indepth analysis revealed LINC01614 to be highly expressed in ER + (log2 exp = 2.1), in PR + (log2 exp = 2.2) and HER2 + (log2 exp = 2.623), while TNBC exhibited lowest expression (log2 exp = 1.1). Those data were further validated in a second cohort where highest expression was observed in the luminal B/HER2 + molecular subtype while lowest expression was observed in the HER2-TNBC molecular subtype. The expression of LINC01614 did not correlate with BC disease stage, suggesting alteration in LINC01614 expression as an early feature during BC development and progression. Concordant with our data, LINC01614 expression has been linked to lung adenocarcinoma 17 and the LINC01614-contaitng signature predicted OS and DFS in patients with esophageal squamous cell carcinoma 18 . Recently, LINC01614 has also been reported as one of the lncRNA associated survival of ER + BC patients 19 .
To gain more insight into plausible molecular mechanisms of LINC01614 expression and function, we dicatomized the TCGA BC cohort into LINC01614 high and LINC01614 low and subsequently retrieved and identified mRNA transcripts upregulated in the LINC01614 high group, which revealed 187 upregulated transcripts. Interestingly, IPA analysis on the upregulated gene list suggested strong correlation between LINC01614 expression and enriched functional categories associated with tumor cell movement and invasion. Nonetheless, LINC01614 high expression was most significantly associated with TGFβ signaling, suggesting possible induction of LINC01614 by TGFβ signaling. Additionally, LINC01614 high tumors exhibited high expression of several collagens, suggesting possible association between LINC01614 expression and enhanced ECM formation. It is noteworthy that ECM itself could be regulated by TGFB signaling 20 . Mechanistic investigation validated induction of LINC01614 by TGFβ, while it's expression was inhibited by small molecule inhibitor of TGFβ and FAK, suggesting its regulation by TGFβ and FAK signaling.
Our data also revealed elevated expression of LINC01614 in HER2 + BC tumors. Interestingly, we observed significant correlation between LINC01614 expression and HER2 mutation status in BC (supplementary figure 1). HER2 + (erbB2) represent 25 to 30 % of breast cancer patients and is elevated expression has been associated with more aggressive BC phenotype and shorter DFS and OS 21,22 . Additionally, activation of HER2 has been linked to Epithelial-to-mesenchymal transition (EMT), hence endowing cancer cells with a more aggressive and invasive phenotype 23,24 . Interestingly, HER2 and TGFβ signaling cooperated in the induction of cellular processes associated with tumorigenic development in immortalized mammary epithelial cell line 25 . Additionally, overexpression of HER2 activated the TGFβ/SMAD signaling pathway and induced SNAIL, SLUG and ZEB-1 expression and subsequent acquisition of mesenchymal phenotype 24 . These published reports are consistent with our current data linking LINC01614 to TGFβ signaling and HER2 + molecular subtype.

Conclusions
Our data revealed the lncRNA transcriptional landscape in breast cancer and identified the lcnRNA signatures (see figure on previous page) Fig. 6 Mechanistic network analysis predicts predominant activation of the TGFB pathway in LINC01614 high BC tumors. a Pie chart illustrating the top activated mechanistic networks in LINC01614 high BC tumors based on IPA analyses. Segment size corresponds to the activation Z score. b Illustration of the TGFβ1 signaling network. c Correlation between the expression of LINC01614 and the expression several members of the TGFβ family in BC tumors. d Extra cellular matrix functional enrichment in LINC01614 high BC tumors. Color intensity indicates their activation state. Effect of recombinant TGFβ (10 ng/ml), SB-431542 (ββ inhibitor, 10 µM), and PF-573228 (FAK inhibitor, 5 µM) on LINC01614 expression measured by qRT-PCR. Data are presented as mean ± S.E. from two independent experiments, n = 6 associated with each molecular subtype. Specifically, our data provide novel insight implicating LINC01614 as unfavorable prognostic marker in BC, and its association with the HR + /HER2 + BC molecular subtype and its regulation by TGFβ and FAK signaling.