Gene set enrichment analysis provides insight into novel signalling pathways in breast cancer stem cells

Background: Tumour-initiating cells (TICs) or cancer stem cells can exist as a small population in malignant tissues. The signalling pathways activated in TICs that contribute to tumourigenesis are not fully understood. Methods: Several breast cancer cell lines were sorted with CD24 and CD44, known markers for enrichment of breast cancer TICs. Tumourigenesis was analysed using sorted cells and total RNA was subjected to gene expression profiling and gene set enrichment analysis (GSEA). Results: We showed that several breast cancer cell lines have a small population of CD24−/low/CD44+ cells in which TICs may be enriched, and confirmed the properties of TICs in a xenograft model. GSEA revealed that CD24−/low/CD44+ cell populations are enriched for genes involved in transforming growth factor-β, tumour necrosis factor, and interferon response pathways. Moreover, we found the presence of nuclear factor-κB (NF-κB) activity in CD24−/low/CD44+ cells, which was previously unrecognised. In addition, NF-κB inhibitor dehydroxymethylepoxyquinomicin (DHMEQ) prevented tumourigenesis of CD24−/low/CD44+ cells in vivo. Conclusion: Our findings suggest that signalling pathways identified using GSEA help to identify molecular targets and biomarkers for TIC-like cells.

Accumulating evidence suggests that tumour-initiating cells (TICs) or cancer stem cells, which make up only a small proportion of heterogeneous tumour cells, possess a greater ability to maintain tumour formation than other tumour cell types. It has been proposed that TICs have characteristics in common with normal stem cells from tumour-prone tissue (Ailles and Weissman, 2007). For instance, TICs can self-renew and simultaneously produce differentiated daughter cells that proliferate strongly until they reach their final differentiated state. Apparent differences also exist between TICs and normal stem cells. Normal stem cells are maintained under tight homoeostatic regulation and are passively protected in the surrounding microenvironment or stem cell niche in adult tissues, whereas TICs may actively contribute to tumour formation. Although the concept of TICs greatly influences cancer biology and evokes a reconsideration of cancer treatment, the molecular mechanisms by which TICs contribute to tumourigenesis remain obscure.
In human breast cancers, a population characterised by the expression of cell-surface markers, CD24−/low/CD44high, was reported to be highly enriched in TICs, compared with populations of CD24high/CD44high cells (Al-Hajj et al, 2003; Mani et al, 2008). Two gene-expression profiling studies, comparing CD24−/low/CD44+ cell populations with other populations in primary breast cancer cells or in normal tissue, reported distinct signatures derived from the CD24−/low/CD44+ cell population that seemed to predict poorer prognosis (Liu et al, 2007; Shipitsin et al, 2007). One study showed that transforming growth factor (TGF)-β pathways seem to be activated in these cells (Shipitsin et al, 2007). It was subsequently reported that TGF-β induced the epithelial-mesenchymal transition (EMT) in mammary glands and stem-like cells in both normal mammary epithelial cells and breast cancer cells (Mani et al, 2008). Because TGF-β signalling can have positive or negative effects on tumourigenesis, additional signalling may still be needed to stimulate tumourigenesis.
Nuclear factor-κB (NF-κB) is a transcription factor complex, typically a heterodimer formed from p50, p52, p65 (RelA), RelB, and c-Rel subunits. It is usually inactive and bound to IκB, an inhibitory protein, in the cytoplasm. Upon stimulation with signals such as tumour necrosis factor (TNF) or interferon (IFN), IκB is first phosphorylated, then ubiquitinated, and finally degraded. Released NF-κB translocates to the nucleus and binds to the κB sequence, where it promotes the transcription of various genes, including those encoding inflammatory cytokines. Nuclear factor-κB is involved in inflammation, angiogenesis, inhibition of apoptosis, and tumourigenesis (Karin et al, 2002; Huber et al, 2004; Tabruyn and Griffioen, 2008).
Gene set enrichment analysis (GSEA) is a recently developed analytical method of gene-expression profiling. The results are easier to interpret biologically, and the method is more accurate and robust than individual gene analysis methods, such as fold change analysis of expression levels (Subramanian et al, 2005).
In this study, we used GSEA to comprehensively analyse signalling pathways in TICs using breast cancer cell lines. As a result, we identified multiple signalling pathways potentially activated in TIC-like cells, including both known and unknown pathways. We found activity of NF-κB, which was previously unrecognised, in TIC-like cells. Therefore, it is possible that the signalling pathways identified using GSEA help to identify novel candidate molecular targets that have important roles in tumourigenesis in human breast cancer TICs.

FACS
Cells were sorted or analysed on a FACS VantageSE (BD Biosciences, Bedford, MA, USA) after staining with CD24-FITC or CD44-PE antibody (BD Pharmingen, San Jose, CA, USA), and dead cells were eliminated using propidium iodide (Sigma, Saint Louis, MO, USA). Data were analysed with FlowJo 7.2.2 (Tree Star Inc., Ashland, OR, USA).

Construction of lentiviral vectors
A third-generation self-inactivating lentiviral transfer vector plasmid with a gene encoding firefly luciferase or d2Venus (Nagai et al, 2002) (provided by Dr A Miyawaki, RIKEN, Wako, Japan) under the control of the elongation factor 1-α (EF1α) promoter was produced using constructs provided by Dr H Miyoshi (RIKEN, Tsukuba, Japan) by standard molecular biological techniques (Miyoshi et al, 1999). These vectors also contained the central polypurine tract and the woodchuck hepatitis virus post-transcriptional regulatory element. Viral supernatant was produced by transient transfection of 293T cells with packaging plasmids (pMDLg/p.RRE) and HIV rev expression plasmids (pRSV-rev) and was then pseudotyped with the vesicular stomatitis virus G protein (pMD.G), as previously described (Bai et al, 2003). High-titre viral stocks were prepared by ultracentrifugation. The functional titres of these vectors (HIV-EF1α-Luciferase, HIV-EF1α-d2Venus) were determined by infection of HeLa cells using a real-time PCR method (DNA titre; Sastry et al, 2002). All multiplicity of infection (MOI) values used in our experiments were calculated from DNA titres.

Transduction of cells with lentiviral vectors
HCC1954 and MCF7 cells were pelleted and incubated with viral supernatant at an MOI of 10 in a 1.5-ml Eppendorf tube. After incubation for 2 h at 37°C in 5% CO2, cells were cultured until they were used in in vivo experiments. Because HIV-EF1α-d2Venus was used for confirmation of transduction efficiency, cells were transduced with HIV-EF1α-Luciferase and HIV-EF1α-d2Venus simultaneously in separate tubes. After more than three passages, the cells were used for FACS analysis or in the xenograft model.
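The MOI relates the number of transducing units added to the number of target cells. As a minimal sketch of that arithmetic, the following estimates the volume of viral stock needed for a given MOI. The function name, the cell number, and the titre are hypothetical illustration values; the text reports only the MOI of 10.

```python
def viral_volume_ul(moi: float, n_cells: float, titre_tu_per_ml: float) -> float:
    """Volume of viral stock (microlitres) that delivers moi * n_cells transducing units."""
    if titre_tu_per_ml <= 0:
        raise ValueError("titre must be positive")
    transducing_units = moi * n_cells          # total units required
    volume_ml = transducing_units / titre_tu_per_ml
    return volume_ml * 1000.0                  # convert ml to microlitres


# Hypothetical inputs (not from the study): 1e5 cells, stock at 1e8 TU/ml.
volume = viral_volume_ul(moi=10, n_cells=1e5, titre_tu_per_ml=1e8)
print(f"Viral stock needed: {volume:.1f} microlitres")
```

Because the MOI values in this study were calculated from DNA titres, the titre passed to such a calculation would be the real-time PCR titre rather than an expression-based one.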

Xenografts
Six-week-old female NOD/SCID mice were anaesthetised with isoflurane (Abbott Japan, Tokyo, Japan), and then 0.72 mg, 60-day-release β-estradiol (E2) pellets (Innovative Research of America, Sarasota, FL, USA) were implanted s.c. on the back of the neck (only in the case of MCF7 implantation). A total of 1 × 10² to 3 × 10⁴ sorted cells were suspended in 1 : 1 volumes of phosphate-buffered saline (PBS(−))/Matrigel (BD Biosciences) to produce 100 μl of mixture and were then injected into the mammary fat pads. Dehydroxymethylepoxyquinomicin (DHMEQ) was suspended in 0.5% chloromethyl cellulose and administered by i.p. injections of 100 μl containing 12 mg kg−1 thrice a week. Control groups were injected with the same volume of vehicle. All treatments started on day 2 after tumour cell implantation.
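To make the DHMEQ dosing concrete, the sketch below converts the 12 mg kg−1 dose delivered in a 100 μl i.p. injection into the amount per injection and the suspension concentration. The 20 g body weight and the function names are assumptions chosen for illustration; body weights are not reported in the text.

```python
def dose_per_injection_mg(body_weight_g: float, dose_mg_per_kg: float = 12.0) -> float:
    """Mass of compound (mg) per injection for one animal."""
    return dose_mg_per_kg * body_weight_g / 1000.0


def suspension_conc_mg_per_ml(body_weight_g: float,
                              injection_volume_ul: float = 100.0,
                              dose_mg_per_kg: float = 12.0) -> float:
    """Concentration (mg/ml) needed so that one injection volume carries the dose."""
    dose_mg = dose_per_injection_mg(body_weight_g, dose_mg_per_kg)
    return dose_mg / (injection_volume_ul / 1000.0)


# Assumed 20 g mouse (hypothetical value).
print(f"Dose per injection: {dose_per_injection_mg(20.0):.2f} mg")
print(f"Suspension concentration: {suspension_conc_mg_per_ml(20.0):.1f} mg/ml")
```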
Mice were handled according to the guidelines of the Institute of Medical Science, University of Tokyo. The experiments were approved by the committee for animal research at the institution.

In vivo imaging
Mice under anaesthesia were injected i.p. with 150 mg kg−1 of luciferin (Promega, Madison, WI, USA) in PBS(−), and images were recorded by the IVIS Imaging System (Xenogen, Hopkinton, MA, USA) 5 min after the injection. The bioluminescence images were quantified by Living Image software (Xenogen). Observations by IVIS were continued once a week, immediately after the injection, for up to 4 weeks. In DHMEQ treatment experiments, tumour growth was monitored by luciferase activity twice a week, for up to 32 days.

Histology analysis
Tumours from xenograft cells were fixed in 10% neutral buffered formalin, embedded in paraffin, and then stained with haematoxylin-eosin (HE).

Microarray analysis
For microarray analysis, 1% of the entire population of the HCC1954, MCF7, or HCC70 cell line, belonging to CD24−/CD44+, was purified on the basis of the lowest expression levels of CD24. In addition, 10% of the entire cell population of each cell line, belonging to CD24+/CD44+, was purified as the control population (CD24+). There was no significant difference in tumourigenicity, whether we considered 1 or 10% of the entire CD24−/low/CD44+ population as the TIC population. Microarray analyses were performed as previously described (Morikawa et al, 2007). Briefly, total RNA was isolated from samples using TRIzol (Invitrogen, Carlsbad, CA, USA) according to the manufacturer's instructions. Six samples were analysed on an Affymetrix high-density oligonucleotide array, Human Genome U133 Plus 2.0. Output images were processed by the MAS5 algorithm and globally scaled to a target intensity of 100. To identify gene-signature-based differences between CD24−/low/CD44+ and CD24+/CD44+ populations, we performed GSEA (Subramanian et al, 2005). All probe sets were pre-ranked using the ratio of the geometric mean of each group's expression values; thereafter, the ordered probe set list was used as the GSEA input. The GSEA parameters were as follows: the number of permutations was 2500, and the permutation type was set to gene set to avoid the potential problem of a small sample size.
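The pre-ranking and gene-set permutation steps can be illustrated with a simplified sketch. It ranks probe sets by the log2 ratio of geometric means, computes a weighted running-sum enrichment score, and estimates a nominal P-value by drawing random gene sets of the same size. This is not the GSEA software itself (normalised enrichment scores and FDR correction are omitted), and the probe names, expression values, and function names are all hypothetical.

```python
import math
import random


def geometric_mean(values):
    """Geometric mean of strictly positive expression values."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


def rank_probes(tic, control):
    """Rank probe sets by log2(geometric mean TIC / geometric mean control), descending."""
    scores = {p: math.log2(geometric_mean(tic[p]) / geometric_mean(control[p]))
              for p in tic}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


def enrichment_score(ranked, gene_set, weight=1.0):
    """Running-sum statistic; returns the largest deviation from zero."""
    hit_total = sum(abs(s) ** weight for p, s in ranked if p in gene_set)
    n_miss = sum(1 for p, _ in ranked if p not in gene_set)
    if hit_total == 0 or n_miss == 0:
        return 0.0
    running, best = 0.0, 0.0
    for p, s in ranked:
        if p in gene_set:
            running += abs(s) ** weight / hit_total   # step up on a hit
        else:
            running -= 1.0 / n_miss                   # step down on a miss
        if abs(running) > abs(best):
            best = running
    return best


def gene_set_permutation_p(ranked, gene_set, n_perm=2500, seed=0):
    """Nominal P-value from random gene sets of the same size (gene-set permutation)."""
    rng = random.Random(seed)
    probes = [p for p, _ in ranked]
    size = len(gene_set & set(probes))
    observed = enrichment_score(ranked, gene_set)
    null = [enrichment_score(ranked, set(rng.sample(probes, size)))
            for _ in range(n_perm)]
    same_sign = [es for es in null if (es >= 0) == (observed >= 0)]
    if not same_sign:
        return observed, 1.0
    extreme = sum(1 for es in same_sign if abs(es) >= abs(observed))
    return observed, extreme / len(same_sign)


# Hypothetical scaled signals for six probe sets across three cell lines.
tic = {"A": [400, 300, 500], "B": [200, 220, 210], "C": [100, 100, 100],
       "D": [90, 110, 100], "E": [50, 60, 40], "F": [20, 30, 25]}
control = {p: [100, 100, 100] for p in tic}

ranked = rank_probes(tic, control)
es, p_value = gene_set_permutation_p(ranked, {"A", "B"}, n_perm=2500, seed=0)
print(f"ES = {es:.2f}, nominal P = {p_value:.3f}")
```

Permuting gene sets rather than sample labels is the relevant design choice here: with only three cell lines per group, there are too few distinct label permutations to build a useful null distribution.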

Quantification of NF-κB activity
Nuclear extracts were prepared with a Nuclear Extract Kit according to the manufacturer's instructions (Active Motif, Carlsbad, CA, USA). Briefly, CD24−/low/CD44+ and CD24+/CD44+ populations were sorted from the bulk of HCC1954 or MCF7 cells, and nuclear extracts were isolated from them. NF-κB p65 activity was measured with a TransAM NF-κB p65 Transcription Factor Assay kit (Active Motif) according to the manufacturer's instructions. Four sets of nuclear extracts from each population were prepared, and 20 μg of each extract was used for the p65 NF-κB activity assay.

TIC populations in breast cancer cell lines
To examine whether CD24−/low/CD44+ cell populations exist in various types of breast cancer cell lines, we analysed the expression of these surface markers in eight breast cancer cell lines by FACS analysis (Figure 1). We found that each cell line had various expression levels of CD24 and CD44. HCC1954, MCF7, HCC70, and MDA-MB-231 cells had relatively high percentages of CD44+ cell populations, whereas BT-474, AU-565, SK-BR-3, and T47-D cells showed low CD44 expression levels. HCC1954, MCF7, and HCC70 cells had a small population of CD24−/low/CD44+ cells. This situation might be similar to that in early-stage breast cancer tissues in which the TIC population is assumed to be small. To determine the hierarchical organisation of breast cancer cell lines, we analysed the tumourigenic potential of the CD24−/low/CD44+ and CD24+/CD44+ cell populations of HCC1954 and MCF7 cell lines.

Tumourigenicity of CD24−/low/CD44+ cell populations in breast cancer cell lines
The in vivo tumourigenicity assay is the gold standard for identifying TICs (Clarke et al, 2006). Tumourigenicity of TICs has been examined by using NOD/SCID mice and measuring palpable tumours. To improve the quality of the quantitative results, we used in vivo bioluminescence imaging (IVIS) to measure tumour growth. We first transduced cells with a lentiviral vector encoding luciferase or d2Venus (an improved version of yellow fluorescent protein) cDNA. We measured transduction efficiency by expression levels of d2Venus using FACS. As shown in Supplementary Figure 1, high transduction efficiency was obtained in each cell line: 92.60 and 99.29% for HCC1954 and MCF7 cells, respectively. Next, we transduced these cells with a lentiviral vector expressing luciferase. Because we used similar MOI levels for transduction of the lentiviral vectors expressing luciferase and d2Venus, we expected similar levels of luciferase expression in each cell line. The resulting cells were designated HCC1954-Luc or MCF7-Luc. Cells in CD24−/low/CD44+ populations were considered to be enriched for TICs, and CD24+/CD44+ populations were used as controls. We compared the expression levels of luciferase in both cell populations and confirmed that there were no significant differences (Supplementary Figure 2).
Cells were implanted into mammary fat pads of NOD/SCID mice and tumour growth was measured by quantifying luciferase activity using the IVIS Imaging System. A total of 10 000 cells from each population of HCC1954-Luc and MCF7-Luc were implanted (Figure 2A and C). After 4 weeks, the analysis of luciferase activity indicated that cells in the CD24−/low/CD44+ populations of HCC1954-Luc and MCF7-Luc generated significantly larger tumours than the control populations (P<0.05). Moreover, when we transplanted 1 × 10² HCC1954-Luc cells of each population, we found that tumours were generated only by the CD24−/low/CD44+ population (n = 6) (Figure 2B).
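The legend of Figure 2 states that group differences were assessed with Student's t-test (n = 6 per group). A minimal sketch of the pooled-variance two-sample statistic is given below. The luciferase readings are invented for illustration and are not the study's data, and the critical value 2.228 is the standard two-sided 5% threshold for 10 degrees of freedom.

```python
import math
from statistics import mean, variance


def student_t(a, b):
    """Two-sample Student's t statistic with pooled variance; returns (t, degrees of freedom)."""
    na, nb = len(a), len(b)
    df = na + nb - 2
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / df
    t = (mean(a) - mean(b)) / math.sqrt(pooled_var * (1.0 / na + 1.0 / nb))
    return t, df


# Hypothetical luciferase signals (arbitrary units), six mice per group.
tic_like = [8.1e6, 9.4e6, 7.7e6, 1.1e7, 8.8e6, 9.9e6]
controls = [3.2e6, 4.1e6, 2.8e6, 3.9e6, 4.4e6, 3.5e6]

t_stat, df = student_t(tic_like, controls)
T_CRIT_DF10 = 2.228  # two-sided alpha = 0.05 for df = 10
print(f"t = {t_stat:.2f}, df = {df}, significant at 0.05: {abs(t_stat) > T_CRIT_DF10}")
```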

These results indicate that CD24−/low/CD44+ populations in breast cancer cell lines have higher tumourigenicity than control populations. It is therefore likely that CD24−/low/CD44+ cells in breast cancer cell lines behave in a manner similar to TICs.
We also examined the histology of tumours derived from HCC1954-Luc cells from both populations and from an unsorted population when 1 × 10⁴ cells of each population were implanted. HE staining revealed that tumours derived from CD24−/low/CD44+ and unsorted cells showed a similar histology, namely, exclusively invasive patterns with a variety of morphologies and the stromal component (Supplementary Figure 3). In contrast, tumours derived from control cells showed both invasive and differentiated patterns associated with a smaller stromal component than CD24−/low/CD44+ or unsorted cells.

Gene-expression profiling and GSEA for the identification of pathways and key effectors in CD24−/low/CD44+ cells
To identify expressed genes that were highly enriched in CD24−/low/CD44+ and control cells, we performed DNA microarray analysis using HCC1954, MCF7, and HCC70 cell lines that have small populations of CD24−/CD44+ cells. To select cell populations strictly, we used only CD24−/CD44+ cell populations that accounted for approximately 1% of the entire population. As control, we used CD24+/CD44+ cell populations that comprised approximately 10% of the entire population. Microarray data were ranked by the expression ratio between the geometric mean of the CD24−/low/CD44+ : CD24+/CD44+ populations from the three cell lines (Supplementary Table). We then applied GSEA (Subramanian et al, 2005). Our results showed that gene sets involving TGF-β pathways and oncogenic Ras pathways were upregulated in CD24−/low/CD44+ populations (Figure 3). Moreover, we found that both TNF and IFN response pathways were enriched in CD24−/low/CD44+ populations.
Figure 2 Luciferase activities of CD24−/low/CD44+ cells in NOD/SCID mice. HCC1954 and MCF7 cells expressing luciferase were sorted by FACS. Ten per cent of the entire population, belonging to CD24−/low/CD44+, was selected as the TIC population (CD24−). Ten per cent of the whole population, belonging to CD24+/CD44+, was selected as the control (CD24+). We transplanted both populations of 1 × 10⁴ HCC1954 cells (A), 1 × 10² HCC1954 cells (B), or 1 × 10⁴ MCF7 cells (C) in mammary fat pads of NOD/SCID mice. Luciferase activities were captured by IVIS after 4 weeks (upper panels). Luciferase activities in implanted sites were quantified (n = 6) (lower graphs). Results are represented as mean ± s.d. *P<0.05 (Student's t-test).
With regard to individual genes, gene-ontology-based classification revealed that genes involved in 'stemness', cell proliferation/maintenance, cell adhesion, cell motility, invasion, angiogenesis, growth factor/cytokine, immune response/suppression, and metabolism were highly represented in CD24−/low/CD44+ cells compared with control cell populations. All these genes may contribute to oncogenesis. For example, from the GSEA results, we found Notch2, a 'stemness'-related gene, in the TGF-β pathway; LAMA3, a cell invasion- or adhesion-related gene, in the oncogenic Ras pathway; and KLF5, EPAS1, and VEGF, angiogenesis-related genes, in the oncogenic Ras pathway (Figure 3, in red). Conversely, GSEA revealed that genes highly expressed in control populations correlated with several cell-cycle-associated gene sets, which have large numbers of cell proliferation/maintenance-related genes.
Figure 3 Gene set enrichment analysis. DNA microarray analyses were performed to compare TIC and control populations of HCC70, HCC1954, and MCF7. One per cent of the whole population of each cell line, belonging to CD24−/CD44+, purified on the basis of the lowest expression levels of CD24, was selected as the TIC population. Ten per cent of the whole population, belonging to CD24+/CD44+, was purified for the control. Microarray data were ranked using the geometric mean of the expression ratios between the TIC and control populations from the three cell lines, and GSEA was then applied. GSEA-extracted representative pathways containing genes enriched in the TIC or control populations are shown. In the original GSEA data sets, the oncogenic Ras pathway is depicted as RAS_ONCOGENIC_SIGNATURE, the TGF-β pathway is depicted as TGFBETA_EARLY_UP, the IFN response is depicted as IFN_ANY_UP, and the TNF response pathway is depicted as SANA_TNFA_ENDOTHELIAL_UP.
One of the important effector molecules common to both TNF and IFN response pathways is NF-κB. We quantified NF-κB activities in nuclear extracts of CD24−/low/CD44+ and control populations that were sorted by FACS analysis. We found that the activity of NF-κB was significantly higher in CD24−/low/CD44+ than in CD24+/CD44+ populations (Figure 4).
The TNF or IFN response pathway is involved in the expression of many inflammatory cytokines/chemokines. Vascular endothelial growth factor A, interleukin 8, and chemokine (C-C motif) ligand 5 are among the inflammatory cytokines/chemokines associated with stroma-like activities (Shono et al, 1996; Moriuchi et al, 1997; Yoshida et al, 1997; Cho et al, 2007). Among the highly ranked genes, we also noticed Toll-like receptor 1, another upstream activator of NF-κB, and stromal cell-derived factor 2-like 1, which is reported to be upregulated through EMT, an important biological output of the TGF-β pathway (Massagué, 2008; Sarrio et al, 2008; Rakoff-Nahoum and Medzhitov, 2009). We measured the expression levels of these genes by quantitative RT-PCR and confirmed that they were expressed at significantly higher levels in CD24−/low/CD44+ populations compared with control cells (Supplementary Figure 4).

Decreased tumourigenesis in CD24−/low/CD44+ populations after treatment with DHMEQ, a specific inhibitor for NF-κB
We next examined the role of NF-κB activity in tumourigenesis using a mouse model. TICs are believed to be important at the beginning of tumourigenesis or in its recurrence; therefore, we analysed tumourigenesis at early time points from relatively small numbers of TIC-like cells. We transplanted 10⁴ cells of CD24−/low/CD44+ populations into NOD/SCID mice, as shown in Figure 2, and treated them with DHMEQ, a specific inhibitor for NF-κB (Umezawa, 2006; Supplementary Results; Supplementary Figures 5 and 6). To analyse the effects occurring during the course of tumourigenesis, we began inhibitor treatment 2 days after transplantation. We monitored tumour formation by in vivo imaging and found that the luciferase activities of the tumours derived from CD24−/low/CD44+ cell populations treated with DHMEQ were significantly decreased compared with those of untreated cell-derived tumours (Figure 5). This finding suggests that NF-κB functions as a key effector of tumourigenesis derived from TIC-like cells.

DISCUSSION
In this study, we established a mouse model that may recapitulate part of the tumourigenic process in TICs, using breast cancer cell lines. We showed that cells derived from CD24−/low/CD44+ populations resulted in tumours larger than those of CD24+/CD44+ control populations. Importantly, when as few as 100 cells were implanted, only CD24−/low/CD44+ populations gave rise to tumours (Figure 2B). This is an important criterion for TICs (Clarke et al, 2006). Therefore, the CD24−/low/CD44+ populations in cell lines may be enriched with TIC-like cells. Our results revealed heterogeneity in cell populations divided into TIC-like cells and other cells. Consistent with our data, it has been recently reported that other cell lines also have TIC-like cell populations (Fillmore and Kuperwasser, 2008). Therefore, it is reasonable to assume that several breast cancer cell lines are heterogeneous and that they have distinct cell populations: TIC-like cells and other cells, with both cell types preserving the characteristics of TICs and other cells in primary cancer tissues to some extent.
Moreover, we labelled cells with a luciferase reporter gene and established a monitoring system of tumourigenesis in the fat pads of NOD/SCID mice by sensitive and quantitative in vivo imaging that can detect as few as 100 cells. Because TICs are thought to be important in tumourigenesis during transition from the premalignant stage to the malignant stage or during recurrence with a few tumour cells, this model should be useful for monitoring tumourigenesis at these stages. Indeed, we were able to validate candidate targets in TICs using inhibitors for NF-κB in our model (Figure 5).
Gene-expression profiling combined with GSEA showed that several signalling pathways and genes are involved in CD24−/low/CD44+ TIC-like cells compared with CD24+/CD44+ control populations. Gene-ontology-based classification revealed that a variety of genes may represent malignant characteristics of CD24−/low/CD44+ TIC-like cells. There is an overlap with those genes found in a previous report using cells from CD24−/low/CD44+ and CD24+/CD44+ populations derived from normal breast and primary breast cancer tissues. Some genes involved in the TGF-β pathway were enriched in CD24−/low/CD44+ populations, consistent with the previous report (Shipitsin et al, 2007). Importantly, we found genes associated with oncogenic Ras pathways, as well as with TNF and IFN response pathways, as novel gene sets in CD24−/low/CD44+ populations. It is likely that genes involved in the oncogenic Ras pathway, the TNF response pathway, and the IFN response pathway are specifically represented in TICs but not in normal stem cells. The TGF-β pathway inhibits tumourigenesis when it is the only pathway activated in cells (Massagué, 2008). However, in the malignant state, the TGF-β pathway cooperates with other pathways to facilitate tumourigenesis. It has been reported that the oncogenic Ras pathway, inflammatory responses, and activation of NF-κB are such cooperative pathways (Huber et al, 2004).
Figure 5 Effects on tumourigenesis after treatment with NF-κB inhibitor. Tumour growth of CD24−/low/CD44+ populations of HCC1954 cells treated with NF-κB inhibitor DHMEQ was measured by luciferase activity (n = 8). Averages of luciferase activity are indicated by lines. *P<0.05.
These findings suggest that TICs are more malignant than other cells in cancer. We also cannot rule out the possibility that cell lines may have additional characteristics that differ from those of primary cells. For example, cell-cycle-related gene sets found to be enriched in control cells may reflect in vitro culture adaptations (Figure 3). The enhanced cell-cycling activity might allow control cells to grow in vivo after implantation (Figure 2), whereas control cells derived from primary tissues rarely generate tumours in vivo.
We showed that activity of NF-κB is higher in TIC-like cells than in control cells. Moreover, DHMEQ, a highly specific inhibitor for NF-κB, suppressed tumourigenesis in the TIC-like cells in our mouse model. This was assessed following treatment that began soon after transplantation. Thus, NF-κB could be a promising target for treatment of early stages of breast cancer and for the prevention of recurrence.
Taken together, our findings raise an intriguing possibility: TICs behave in a manner similar to cancer-associated fibroblasts (CAFs) and can actively generate and maintain the cancer stem cell niche, in which NF-κB functions as the main effector that can induce many secretory proteins, including cytokines and chemokines. Future studies should focus on the extensive evaluation of our model by using clinical samples of breast cancer.