lncRNA profile study reveals the mRNAs and lncRNAs associated with docetaxel resistance in breast cancer cells

Resistance to adjuvant systemic treatment, including taxanes (docetaxel and paclitaxel) is a major clinical problem for breast cancer patients. lncRNAs (long non-coding RNAs) are non-coding transcripts, which have recently emerged as important players in a variety of biological processes, including cancer development and chemotherapy resistance. However, the contribution of lncRNAs to docetaxel resistance in breast cancer and the relationship between lncRNAs and taxane-resistance genes are still unclear. Here, we performed comprehensive RNA sequencing and analyses on two docetaxel-resistant breast cancer cell lines (MCF7-RES and MDA-RES) and their docetaxel-sensitive parental cell lines. We identified protein coding genes and pathways that may contribute to docetaxel resistance. More importantly, we identified lncRNAs that were consistently up-regulated or down-regulated in both the MCF7-RES and MDA-RES cells. The co-expression network and location analyses pinpointed four overexpressed lncRNAs located within or near the ABCB1 (ATP-binding cassette subfamily B member 1) locus, which might up-regulate the expression of ABCB1. We also identified the lncRNA EPB41L4A-AS2 (EPB41L4A Antisense RNA 2) as a potential biomarker for docetaxel sensitivity. These findings have improved our understanding of the mechanisms underlying docetaxel resistance in breast cancer and have provided potential biomarkers to predict the response to docetaxel in breast cancer patients.

Breast cancer is the most frequently diagnosed cancer among females and the leading cause of cancer death among women worldwide 1 . Multiple treatment modalities are used in breast cancer, such as surgery, radiation therapy, chemotherapy and target therapy. Taxanes, including docetaxel and paclitaxel, are among the most commonly used chemotherapeutic agents to treat breast cancer 2 . However, patients receiving a taxane treatment may become resistant to taxanes, thus resulting in recurrence and metastatic disease.
To reduce patient suffering from disease recurrence caused by taxane resistance, researchers focus on the molecular mechanisms and predictive biomarkers of taxane resistance at multiple levels. Several genes are associated with taxane resistance. The differential expression of the ABCB1 gene is one of the most studied putative biomarkers in taxane-resistant cancers 3,4 . Pgp (permeability glycoprotein), encoded by the ABCB1 gene, has been reported to act as an ATP-dependent efflux pump and reduce taxane concentration by expelling the drug 5 . Other genes, such as ABCG2 (ATP binding cassette subfamily G member 2) 6 ABCB4 (ATP binding cassette subfamily B member 4) 7 are also involved in the process of taxane resistance. However, the underlying mechanism of taxane resistance in breast cancer is still not fully elucidated, and the regulators of the taxane-resistant genes remain unknown. As a result, there are presently no predictive biomarkers for taxanes in clinical use.
Long non-coding RNAs (lncRNAs) are defined as RNAs longer than 200 nucleotides, with little potential in protein coding. Many lncRNAs are described to influence mRNA generation and expression 8  Transcript Antisense RNA) is reported to contribute to cisplatin resistance in human lung adenocarcinoma cells via down-regulating p21 WAF1/CIP1 expression 9 . Another study found that the lncRNA MRUL (multidrug resistance related and up-regulated lncRNA) promotes ABCB1 expression in multidrug-resistant gastric cancer 10 . More recently, data have shown that two lncRNAs, ROR (regulator of reprogramming) and CCAT1 (colon cancer-associated transcript-1), regulate docetaxel resistance in lung adenocarcinoma 11,12 . Furthermore, from a transcriptome microarray study, the lncRNAs HIF1A-AS2 (HIF1A Antisense RNA 2) and AK124454 were shown to promote cell proliferation and invasion in TNBC (triple-negative breast cancers) cells and contribute to paclitaxel resistance 13 . However, the contribution of lncRNAs to docetaxel resistance in breast cancer is still unclear.
In this study, we carried out whole transcriptome sequencing in two cell lines, MCF-7 and MDA-MB-231, and in their docetaxel-resistant sublines, MCF7-RES and MDA-RES. We identified significantly differentially expressed (SDE) mRNAs and lncRNAs between the parental and resistant sublines, and we also uncovered the potential relationship between the SDE mRNAs and lncRNAs. Compared with previous studies, we discovered several novel genes in addition to ABCB1 which might contribute to the taxane-resistant phenotype of breast cancers. More important, we identified a group of lncRNAs that might potentially regulate taxane sensitivity by controlling the expression of chemotherapy-resistant genes.

Results
Sequencing results and quality control. By performing Illumina-based RNA-Seq sequencing, a total of 1,825,984,984 raw reads were produced from the 12 RNA samples (3 independent samples from each cell line (Table 1)). After quality control, 1,750,124,272 clean reads (157.5 Gb) were obtained.
The proportion of clean reads among the raw reads of the 12 samples ranged from 94.32% to 96.95%. The proportion of clean reads with a Phred quality value greater than 30 among the samples ranged from 88.38% to 92.96%. The GC (guanine and cytosine nucleotides) content of the clean reads of the 12 samples ranged from 47.26% to 49.89%. In total, 82.94% to 91.33% of the clean reads were aligned against the human reference genome ( Table 1).
Overview of mRNA expression and the significantly differentially expressed mRNAs between the parental and docetaxel-resistant cell groups. A total of 13,714 mRNAs were detected. More than 97.81% of the mRNAs was detected simultaneously among 3 passages of the same cell line (Sup Fig. 1A). The FPKM (fragments per kilobase million) values of the 13,714 mRNAs from 12 samples were centralized and normalized for the PCA (Principal component analysis). The results show that PC1 (Principal component 1) accounted for 52.28% of the total variance and could separate the MDA cells from the MCF7 cells; PC2 (Principal component 2) accounted for 13.91% of the total variance and separated the parental and docetaxel-resistant MDA cells. However, parental and docetaxel-resistant MCF7 cells clustered together (Sup Fig. 1B).
Differentially expressed mRNAs between the parental and docetaxel-resistant cell groups were calculated by DESeq in the two cell lines. The criteria of |Log 2 FC| > 1 (Log 2 (fold change)), FDR < 0.1 (false discovery rate) was used to select the SDE mRNAs. We identified 1630 and 957 SDE mRNAs in the MDA-RES and MCF7-RES groups by comparing them to their parental groups, respectively, as shown in the volcano figure (Fig. 1A).
In the MDA-RES cells, 782 mRNAs were significantly up-regulated, and 848 mRNAs were significantly down-regulated. In the MCF7-RES cells, 627 mRNAs were significantly up-regulated, and 330 mRNAs were significantly down-regulated.  Number of  raw reads  155,368,324 150,086,362 154,603,830 151,890,702 146,565,380 150,531,888 154,601,446 150,960,982 155,560,554 150,067,038 154,602,  The most studied MDR (multidrug resistance) related gene, ABCB1 was dramatically up-regulated in both the MDA-RES and MCF7-RES cells (Fig. 1A), which was consistent with a previous report 4 . The top ten significantly up-or down-regulated genes in the MDA-RES or MCF7-RES cells are shown in Sup Table 1.
By analyzing the list of significantly up-regulated and down-regulated mRNAs in the MDA-RES and MCF7-RES cells, we found 80 mRNAs that were consistently up-regulated and 44 mRNAs that were consistently down-regulated in both of the cells (Fig. 1B, Sup Table 2).
Using these 124 consistent SDE mRNAs, the 12 samples were clustered into two groups (MDA-MB-231 and MCF-7) (Fig. 1C), which indicated that the mRNA diversity between the MCF-7 cells and MDA-MB-231 cells was greater than that between the docetaxel-resistant and parental cells.
GO (Gene ontology) and KEGG (Kyoto Encyclopedia of Genes and Genomes) pathway analyses of the significantly differentially expressed genes. A KEGG pathway analysis was performed in each cell line. In the MCF7-RES cells, the three most enriched KEGG pathways associated with the SDE mRNAs were the metabolic pathways, axon guidance, and neuroactive ligand-receptor interaction. However, other pathways, such as the PI3K-Akt signaling pathway, Wnt signaling pathway, ECM-receptor interaction and ABC transporters, were also significantly enriched (Sup Fig. 2).
In the MDA-RES cells, the PI3K-Akt signaling pathway, cytokine-cytokine receptor interaction, and pathways in cancer were the top three pathways, followed by the metabolic pathway, AGE-RAGE signaling pathway, and the cell adhesion molecules (CAMs) pathway. Furthermore, ABC transporters, the MAPK signaling pathway and ECM-receptor interaction were also significantly enriched (Sup Fig. 3).
We also performed the KEGG pathway analysis on the 124 consistent SDE mRNAs shared by the MCF7 and MDA cells. As shown in Fig. 1D, 14 signaling pathways were significantly enriched (Q value < 0.05). The most significantly enriched pathway was the cGMP-PKG signaling pathway. However, the ABC transporters, PI3K-Akt signaling pathway, AGE-RAGE signaling pathway in diabetic complications, and HIF-1 signaling pathway were also significantly enriched.
A GO analysis for the consistent SDE mRNAs was also performed to identify the function of the coding transcripts. With regard to biological processes, the most enriched GO term associated with the consistent SDE mRNAs was cellular process. With regard to the cellular component, the most enriched GO terms were cell and cell part. Moreover, the most enriched GO terms associated with the consistent SDE mRNAs for molecular function were binding and protein binding (Fig. 1D).
Overview of lncRNA expression and the significantly differentially expressed lncRNAs between the parental and docetaxel-resistant cell groups. NONCODE 2016 was used to annotate the lncR-NAs. The lncRNAs were classified into 4 subgroups as antisense, exonic, linc (long intergenic non-coding), and sense no exonic lncRNAs, according to their location and their relationship with nearby genes. The classification of the lncRNAs in the NONCODE database was provided online (http://www.noncode.org/datadownload/ NONCODE2016_human_cc.tgz). We found that lincRNAs and antisense lncRNAs were the two most abundant subtypes in all the cell lines ( Fig. 2A).
A total of 39,363 lncRNAs were identified in the 12 RNA samples. More than 81.55% of the lncRNAs were detected simultaneously among three independent passages of the same cell line (Sup Fig. 3A). The differentially expressed lncRNAs between the parental and docetaxel-resistant cell groups were calculated by DESeq. We iden-  Table 3).
By performing complete-linkage clustering with these consistent SDE lncRNAs, the 12 samples were clustered into two groups (parental and resistant) perfectly (Fig. 2D).

lncRNA-mRNA co-expression network construction and docetaxel-resistant module detection.
It is well documented that lncRNAs function as regulators of target mRNAs 8,14 . Additionally, lncRNAs and their target mRNAs usually present similar expression patterns [15][16][17] . To predict the target mRNAs of the lncRNAs, we used the WGCNA (weighted gene co-expression network analysis) 18 R software package to detect similar expression patterns between the SDE lncRNAs and mRNAs.
All the SDE genes (72 lncRNAs and 124 mRNAs) were subjected to the WGCNA analysis. We identified 4 groups of co-expressed genes, termed 'modules' (Fig. 3A). Each module was labeled with a unique color underneath the cluster tree 19 .
Inside a given module, the expression profiles of the transcripts can be summarized by the 'module eigengene' , which is the weighted average of the module gene expression profiles 19 . To identify the module associate with docetaxel-resistant status, we regressed each of the 4 module eigengenes on docetaxel-resistant status. We found that the Turquoise and Gray modules were most significantly associated with docetaxel-resistant status (Fig. 3B). Of these 2 associated modules, the Turquoise was positively correlated with docetaxel resistance, meaning that the genes in this module are predominantly up-regulated in docetaxel-resistant cells. In contrast, the Gray module was negatively correlated with a docetaxel-resistant status, meaning that most genes in this module were down-regulated in the docetaxel-resistant cells. To identify the genes that are central and highly connected in the docetaxel-resistance-associated modules, we selected Hub genes with |GS| > 0.4 (gene significance) and |MM| > 0.8 (module membership) in the two modules and visualized these genes by Cytoscape (Fig. 3C).
In the Turquoise module, we identified 59 hub genes, including 40 protein coding genes and 19 lncRNAs (Sup Table 4). Several well-known chemotherapy-resistance-associated genes including ABCB1 were found in the protein coding genes list. We found that 39 genes, including 22 protein coding and 17 non-coding, were linked to ABCB1 (Table 2). Among the 22 coding genes linked to ABCB1, we found that many of the genes are implicated in drug or chemotherapy resistance, such as ABCB4, ADAM22 (disintegrin and metalloproteinase domain-containing protein 22), GPNMB (glycoprotein nmb), COL21A1 (collagen type XXI alpha 1 chain), and ITGB4 (integrin subunit beta 4) ( Table 2, marked with asterisks). Strikingly, we found a list of lncRNAs closely linked to ABCB1 in the Turquoise module ( Table 2), indicating that these lncRNAs may function as regulators of ABCB1.
In the Gray module, we identified 29 hub genes, including 22 protein coding genes and 7 lncRNAs (Sup Table 5). We found that some of the protein coding genes are reported to be associated with chemotherapy sensitivity. The over-expression of TSPYL5 (testis-specific Y-encoded-like protein 5) and TFAP2A (transcription factor AP-2 alpha) are reported to increase the cell sensitivity to chemotherapy in different cancers [20][21][22][23] . Moreover, in the Gray module, we also identified several lncRNAs that were consistently down-regulated in the docetaxel-resistant MCF7-RES and MDA-RES cells. Notably, an antisense lncRNA, EPB41L4A-AS2 (NONHSAG041250.2), which is reported to inhibit tumor proliferation and is associated with favorable prognoses in breast cancer 24 , was identified as a hub gene in this module.
Location and functional analysis of the important lncRNAs and mRNAs in the resistance-associated modules. To uncover the relationship of the lncRNAs and the most studied MDR-related gene, ABCB1, we analyzed the genomic locations of the lncRNAs that were linked with ABCB1 in the Turquoise module. Our results showed that several lncRNAs were localized at the regions overlapping or near the ABCB1 gene locus. The locations of lncRNAs NONHSAG048134.2, NONHSAG048135.2 and NONHSAG096479.1 overlapped with ABCB1. NONHSAG048135.2 was transcribed in the same direction and was localized to the exons of the  ABCB1 gene, indicating that it was an exonic lncRNA, whereas NONHSAG048134.2 and NONHSAG096479.1 were located in the antisense strand to the ABCB1 gene locus, indicating that these two lncRNAs were antisense lncRNAs for the gene ABCB1 (Fig. 4A). Another lncRNA, NONHSAG048143.2, was located 480 kb upstream of ABCB1 (Fig. 4A). The correlation analysis of the expression data of these four lncRNAs and ABCB1 mRNA revealed a strong positive correlation between the expression of these lncRNAs and the ABCB1 gene (Fig. 4B), suggesting that there was regulatory relationship between these lncRNAs and the ABCB1 gene.
We downloaded the ChIP-seq (chromatin immunoprecipitation sequencing) datasets for MCF-7 and MDA-MB-231 breast cancer cells from the GEO (Gene expression omnibus) and ENCODE (encyclopedia of DNA elements) databases. By calculating the H3K4me1/H3K4me3 ratios at each TSS (transcriptional start site) of the lncRNAs near ABCB1, we found that the lncRNA NONHSAG048143.2, upstream of ABCB1, was associated with an enhancer-like chromatin state (H3K4me1/H3K4me3 ratio > 1.2) in both the MCF-7 and MDA-MB-231 cells (Fig. 4C). This indicated that NONHSAG048143.2 might function as an eRNA (enhancer lncRNA) 25 in the regulation of the ABCB1 gene.
The other lncRNA, EPB41L4A-AS2, which had been reported to be a potential tumor suppressor and prognostic biomarker was located head to head with the EPB41L4A gene in the antisense orientation (Fig. 5A). The correlation analysis on the expression data of this lncRNA and EPB41L4A and ABCB1 mRNAs revealed strong positive correlation between the expression of EPB41L4A-AS2 and EPB41L4A (Pearson correlation test, To validate the relationship between EPB41L4A-AS2 expression and chemotherapy resistance, we downloaded the primary microarray data GSE21997, which profile the RNA expression in primary breast cancers before neoadjuvant chemotherapy (doxorubicin and docetaxel) 26 , from the GEO repository database. And the residual breast cancer burden (RCB) index was adopted to classified the breast cancer patients as previous report 27 . Patients of RCB-III class were considered to be therapeutic resistant. By analyzing the microarray data, we found that the expression of EPB41L4A-AS2 was significantly lower (Student's t-test, P = 0.042, Fig. 5D) in the resistant group than the non-resistant group.

Discussion
Taxanes are a class of anti-microtubule agents that have been demonstrated to be more efficient than anthracycline-based regimens and are listed as the first-line regimens for breast cancer 28,29 . However, patients receiving taxane treatment usually develop resistance to taxanes, thus resulting in recurrence and metastasis 30 .
To uncover the mechanisms underlying taxane resistance and find novel potential predictive biomarkers for clinical use, many studies have been carried out in both clinical cancer samples and resistant cell model systems 4,31-38 . Through this research, a number of genes were identified to be associated with taxane resistance, and the differential expression of the ABCB1 gene was extensively investigated and identified as one of the most credible biomarkers in chemotherapy-resistant cancers 3,4,33,34,36,37 . However, these previous studies were mainly based on microarray hybridization technologies, and therefore, limited concordant differentially expressed genes are shared between the different studies 3,4,33,34,36,37 , and the common denominator associated with taxane resistance remains largely unknown. Moreover, the underlying mechanism of taxane resistance in breast cancer and the regulators of taxane-resistant genes still needs to be uncovered. lncRNAs regulate gene expression at the transcriptional and epigenetic levels 39 . Recent studies have shown that lncRNAs are also implicated in tumor chemotherapy resistance [9][10][11][12][13] . However, the contribution of lncRNAs to docetaxel resistance in breast cancer is still unclear.
Here, we used RNA-Seq technology (Illumina platform), which does not rely on pre-designed complement probes and shows a higher sensitivity in the detection of genes with very low expression and a higher accuracy in the detection of extremely abundant genes compared with microarray technology 40 , to gain a more comprehensive and global profile of mRNAs and lncRNAs in docetaxel-resistant breast cancer cells (MCF7-RES and MDA-RES).
We detected an average of 26,329 lncRNAs and 13,010 mRNAs per sample. Importantly, the lncRNAs and mRNAs showed a high consistency between different passages from the same cell subline (Sup Figs 1A, 4A), which was in concordance with the reports on the high reproductivity of Illumina RNA-Seq 40 and demonstrated the reliability of these data. We identified 1630 and 957 SDE coding genes in the MDA-RES and MCF7-RES groups compared to their parental cells, respectively. In comparison to the previous microarray analysis research in the same cell lines 4 , we identified more consistently up-regulated or down-regulated genes shared by the MCF7-RES and MDA-RES cells (124 vs 16), which are the potential common denominators associated with taxane resistance. The KEGG pathway analyses were performed on the SDE genes in each cell line, which identified the common pathways shared by the two cell lines and found that the most enriched pathways were different between the two cell lines (Sup Figs 2,  3). The 124 consistent SDE genes were also significantly enriched in 14 signaling pathways (Q-value < 0.05). The most significantly enriched pathway was the cGMP-PKG signaling pathway. However, the ABC transporters, the PI3K-Akt signaling pathway, and the AGE-RAGE signaling pathway in diabetic complications signaling pathway were also significantly enriched. Cyclic GMP (cGMP) is the intracellular second messenger that mediates the action of nitric oxide (NO) and natriuretic peptides (NPs), regulating a broad array of physiological processes. The role of this pathway in chemotherapy is unclear. However, it is implicated in the regulation of apoptosis in cancer cells 41 and has been reported to mediate the inhibition of Pgp efflux function by YC-1 42 . The PI3K-Akt signaling pathway was the most significantly enriched pathway in the MDA-MB-231 cells (Sup Fig. 3) and was also significantly enriched in the MCF-7 cells. Although it has not been reported in the previous taxane-resistance studies in breast cancer 3,4,33,34,36,37 , we found that it was linked to docetaxel resistance in prostate cancer 43,44 . These results indicated that in addition to the classical pathways, such as the ABC transporters and ECM-receptor interaction pathways, other signaling pathways may also play important roles in taxane resistance in breast cancer.
In addition to the mRNA analysis, the more novel and important work here was the profiling of the lncR-NAs in the parental and docetaxel-resistant breast cancer cells. We identified 72 consistently SDE lncRNAs (50 up-regulated and 22 down-regulated) in both of the docetaxel-resistant MCF-7 and MDA-MB-231 cell lines. By performing complete-linkage clustering with these consistently SDE lncRNAs, the 12 samples (3 samples for each of the 4 cell lines) perfectly clustered into two groups: the parental and resistant groups (Fig. 2D). A comparison of the clustering results with a similar number of SDE mRNAs (Fig. 1C) suggested that lncRNAs might be more specific biomarkers for distinguishing docetaxel-resistant cells from the non-resistant cells than mRNAs. Furthermore, this highlighted the potential application of lncRNAs as predictive biomarkers and for the prediction of response to chemotherapy.
lncRNAs have been described to exert their biological functions by regulating the generation and expression of target mRNAs 8 . Co-expression network analyses have been widely used to predict the target mRNAs of lncR-NAs by detecting their similar expression patterns 15,16 , and thus, the WGCNA R software package was used in our study to decipher the relationship between the SDE mRNAs and the SDE lncRNAs. By using a hierarchical clustering method, we identified 4 groups of co-expressed lncRNAs and mRNAs, termed 'modules' . Among these, the modules "Turquoise" and "Gray" were most significantly associated with a docetaxel-resistant status.
In the Turquoise module, which was positively correlated with docetaxel resistance, ABCB1 was identified as one of the central and highly connected genes. We found that 39 genes, including 22 coding and 17 non-coding, were linked to ABCB1 in this module. In the 22 coding genes linked to ABCB1, we found that many of the genes were implicated in drug or chemotherapy resistance. For instance, ABCB4, another ATP-binding cassette gene, which was reported to be involved in taxane resistance 7 , also clustered in this module. ADAM22 was also closely linked to ABCB1. This gene has been reported to function as an estrogen receptor-independent mediator of endocrine-resistant breast cancer 45 . However, it has not previously been linked to docetaxel resistance. Similarly, the gene GPNMB was also linked to ABCB1 in the Turquoise module and was up-regulated in all the docetaxel-resistant cells. GPNMB has been shown to induce aggressive cellular phenotypes in breast cancer and has been identified as a potential therapeutic target for patients with basal-like breast cancer (BLBC) and TNBC 46,47 . The concordant expression of ABCB1 and other potential drug-resistant genes in docetaxel-resistant breast cancer cells indicates that besides ABCB1, there are also other genes that may play an important role in the generation of docetaxel resistance, and these genes may contribute to docetaxel resistance in a cooperative way with ABCB1. Pathway enrichment with the genes in this module highlighted the important roles of the PI3K-Akt and ABC transporter signaling pathways.
Moreover, a striking finding here is that many lncRNAs were also co-expressed with ABCB1 and other potential drug-resistant genes in this module. We identified a list of lncRNAs closely linked to ABCB1 in the Turquoise module ( Table 2), indicating that some of these lncRNAs may function as regulators of ABCB1 By analyzing the genomic location of the lncRNAs linked with ABCB1, we found that several lncRNAs were located at the regions overlapping or near the ABCB1 gene locus. The lncRNA NONHSAG048135.2 was identified as an exonic lncRNA of ABCB1, whereas NONHSAG048134.2 (termed ABCB1-AS1) and NONHSAG096479.1 (termed ABCB1-AS2) were identified as antisense lncRNAs for ABCB1 Another lncRNA, NONHSAG048143.2, was identified as an intergenic lincRNA located 480 kb upstream of ABCB1.
lncRNAs are implicated in a wide-spectrum of biological processes at different molecular levels, including epigenetic modifications, transcription, and post-transcriptional processing 48 , and the functions of lncRNAs appears to always to be associated with the location of the lncRNA and the nearby mRNA 49,50 . Antisense lncRNAs have been shown to bind to the complementary pre-mRNAs, regulate RNA editing and splicing and, finally, affect the expression levels of mRNAs and proteins 51,52 . For instance, lncRNA ZEB2 NAT (zinc finger e-box binding homeobox 2 natural antisense transcript) was reported to bind to the ZEB2 pre-mRNA and obstruct the splicing of the Zeb2 5'-UTR, thus up-regulating the expression of the Zeb2 protein 52 . Antisense lncRNAs have also been shown to increase the stability of mRNAs by forming a lncRNA-mRNA duplex and, thereby, up-regulate gene expression 53,54 . Although, the exact functions of the two novel antisense lncRNAs ABCB1-AS1 and ABCB1-AS2 are still unclear, and further research needs to be performed, our correlation analysis revealed a strong positive correlation between the expressions of these antisense lncRNAs and ABCB1 mRNA (Fig. 4B) and raised the possibility that these lncRNAs may up-regulate the expression of the ABCB1 gene via RNA splicing or stabilization mechanisms. In addition to the lncRNAs located within the locus of the mRNAs, accumulating evidence suggests that lncRNAs located in the regulatory regions are also critical to the regulation of gene expression 39,55 . Many lncRNA transcripts from enhancer regions have been shown to facilitate the transcription of protein-coding genes 56 . For instance, CCAT1-L, which is transcribed from a super-enhancer region 500 kb upstream of the MYC (Proto-Oncogene C-Myc) gene, promotes chromatin interactions between MYC and its upstream regulatory elements and, thereby, up-regulates MYC transcription 57 . Recently, a study of lncRNAs in doxorubicin-and vincristine-resistant gastric cancer cells also identified an lncRNA, MRUL, which is located 400 kb downstream of ABCB1 and promotes the expression of ABCB1 10 . To our knowledge, this is the only lncRNA reported to promote ABCB1 expression in chemotherapy-resistant cancer cells. Since this lncRNA has not been recorded in the NONCODE 2016 database, we analyzed the differential expression of the transcripts of MRUL between the parental MCF-7 and MDA-MB-231 cells and their docetaxel-resistant sublines. We found that MRUL was dramatically up-regulated in the docetaxel-resistant MCF-7 subline compared to the parental MCF-7 cells. However, this up-regulation was not apparent in the docetaxel-resistant MDA-MB-231 subline (Fig. 4D). This finding suggested that MRUL may also contribute to docetaxel resistance in ER (estrogen receptor) positive breast cancer but maybe not in the TNBC type.
Most intriguingly, in our study, we also identified the lncRNA NONHSAG048143.2 located 480 kb upstream of ABCB1. The correlation analysis revealed a strong positive correlation between the expression of this lncRNA and ABCB1 mRNA (Fig. 4B), suggesting that this lncRNA may be another enhancer lncRNA for ABCB1. The histone H3K4me1/H3K4me3 ratio has been used to characterize enhancer elements across the genome 25,58,59 . Therefore, we downloaded the ChIP-seq datasets of the MCF-7 and MDA-MB-231 breast cancer cells to calculate the H3K4me1/H3K4me3 ratios at the TSS of the lncRNA NONHSAG048143.2. Our result revealed that lncRNA NONHSAG048143.2 was located in a region presenting an enhancer-like chromatin state (H3K4me1/H3K4me3 ratio > 1.2) in both the MCF-7 and MDA-MB-231 cells (Fig. 4C). This result strongly suggested that this lncRNA might function as an enhancer lncRNA in the up-regulation of the ABCB1 gene in docetaxel-resistant breast cancer cells.
In the Gray module, which was negatively correlated with the docetaxel-resistant status, we found that the gene TSPYL5 was dramatically and consistently down-regulated in the docetaxel-resistant MCF-7 and MDA-MB-231 cells. This gene has been involved in the modulation of cell growth and the cellular response to gamma radiation probably via the regulation of the Akt signaling pathway 60 . The overexpression of TSPYL5 has also been linked to chemotherapy sensitivity in both lung and prostate cancer cells 20,61 . The dramatic down-regulation of TSPYL5 in the docetaxel-resistant breast cancer cells may be involved in the loss of docetaxel sensitivity. Interestingly, we found that the lncRNA EPB41L4A-AS2 was expressed in all the parental breast cancer cell lines but was absent in all the docetaxel-resistant descendants (Fig. 5C). According to a recent study, this lncRNA may inhibit tumor cell proliferation and is associated with favorable prognoses in breast cancer 24 . However, the role of EPB41L4A-AS2 in chemotherapy-resistant cancer has not been deciphered yet. Here, we report, for the first time, the depletion of EPB41L4A-AS2 in docetaxel-resistant breast cancer cells. Our correlation analysis also showed that the decreased level of EPB41L4A-AS2 in the docetaxel-resistant breast cancer cells was significantly associated with an increased level of ABCB1 mRNA (Fig. 5B). These results strongly indicate that the depletion of EPB41L4A-AS2 contributes to the docetaxel-resistant phenotype in breast cancer cells. In support of this notion, our analysis in clinical breast cancers tissues (GSE21997) before neoadjuvant chemotherapy (doxorubicin or docetaxel) also showed that the expression of EPB41L4A-AS2 was significantly lower in the patients resistant to neo-adjuvant chemotherapy than in non-resistant patients (Fig. 5D).
In conclusion, we performed a comprehensive Illumina-based RNA sequencing and analysis on two docetaxel-resistant breast cancer cell lines and their parental breast cancer cell lines. In addition to the most studied docetaxel-resistant gene, ABCB1, we also identified other SDE protein-coding genes and pathways, which might contribute to the generation of docetaxel resistance. These findings improved our understanding of the mechanism underlying docetaxel resistance in breast cancer and suggest that a number of lncRNAs are promising candidates as predictive biomarkers for the docetaxel response. To our knowledge, this is the first study to report on a global profile of mRNAs and lncRNAs in docetaxel-resistant breast cancer. We identified groups of lncRNAs, which were consistently up-regulated or down-regulated in both the docetaxel-resistant MCF-7 and MDA-MB-231 cell lines and constructed a co-expression network to decipher the potential regulatory relationship between the SDE lncRNAs and mRNAs. Our results highlighted four overexpressed lncRNAs that overlapped or were near the ABCB1 locus, which might up-regulate the expression of ABCB1 via different mechanisms. We also identified lncRNA EPB41L4A-AS2 as a potential biomarker for docetaxel sensitivity in clinical breast cancer samples. Although further basic biological and clinical research still needs to be performed to explore the exact biological function of these newly identified lncRNAs, our study provided potential biomarkers to predict the response to docetaxel in breast cancer patients and might help improve the strategy for the treatment of advanced breast cancer. Total RNA purification and rRNA depletion transcriptome Sequencing. For the Illumina-based RNA-Seq, the total RNA from 3 passages of MCF-7, MCF7-RES, MDA-MB-231 and MDA-RES, was purified with RNAiso Plus Kit (TAKARA, Japan). After the RNA purification and DNase I digestion, rRNAs were removed from the total RNA with RiboMinus Eukaryote Kit (Qiagen, USA). The remained RNAs were fragmented using Ambion Fragmentation Solution (Thermo Fisher Scientific, USA). Then the mRNA fragments were used to synthesize cDNAs, followed by end repairing and adenine connection. Sequencing adaptors were ligated the fragments. Suitable fragments were selected according to the agarose gel electrophoresis results for PCR amplification. The StepOnePlus System (Applied Biosystems, Thermo Fisher Scientific, USA) and the Agilent 2100 Bioanaylzer (Agilent Technologies, USA) were used to quantify and qualify the sample libraries in the QC steps. Finally, all the libraries were sequenced by the HiSeq. 2000 sequencer (Illumina, USA).

Methods
lncRNA and mRNA expression profiling. Firstly, adaptor sequences were removed from the raw reads.
And low quality reads, which contained more than 50% of low quality bases (base quality < 10) and 5% of undefined nucleotides [N] on the read, were also removed. To avoid the reads mapped to the remaining rRNAs disturbing the next analysis, we aligned the reads to the human ribosomal RNA sequences (rRNA) and filtered out the perfect mapping reads. After the above two-step filtering, the remaining reads were considered clean reads and were used in next expression profile. In the mRNA analysis, the clean reads were mapped to the human reference genome (hg19) by TopHat2 62 . Then, the genes annotated in RefSeq were quantified by Cufflinks tools 63 . For lncRNA, the clean reads were mapped to the noncoding RNA transcriptome (NONCODE 2016 64 ) by STAR 65 and quantified by RSEM 66 . The parameters used in the analysis tools referred to the ENCODE RNA-seq data processing pipelines (https://www.encodeproject.org/pipelines/). Finally, we obtained all the mRNA and lncRNA expression tables represented in both the FPKM and read-count format.
Identification of significantly differentially expressed lncRNAs and mRNAs. The differentially expressed lncRNAs and mRNAs between the parental and docetaxel-resistant cells groups were calculated by DESeq. 67 in the two cell lines, respectively. To ensure that the denominator was not zero when performing the division operation, we added 1 to every element in the read-count matrix. The P-value was calculated based on a negative binomial distribution model, and adjusted by Benjamini-Hochberg mothed. The criteria of |Log2FC| > 1, FDR < 0.1 was used to select the significantly differentially expressed lncRNAs and mRNAs.
GO and KEGG pathway enrichment. Functional annotation enrichment analyses for GO and the KEGG pathway were performed using the KOBAS server 68 . P-value is calculated by hypergeometric test. The Q-value is the P-value corrected by Benjamini-Hochberg mothed. The GO terms and pathways with a Q value less than 0.05 were considered significantly enriched.
Co-expression network analysis. The WGCNA R package 18 was used to construct the co-expression network between the significantly differentially expressed lncRNAs and mRNAs. By hierarchical clustering of the expression data, several modules, which are clusters of highly interconnected genes, were identified in the co-expression network. Then, we related the drug-resistant feature to the module eigengene to identify the drug-resistant modules (Pearson correlation test, P-value < 0.05). In the drug-resistant modules, hub genes were selected by |MM| > 0.8 and |GS| > 0.4, and they were highly connected with other genes. The connections among the hub genes were visualized as an unsigned network by Cytoscape 69 .