Transcriptomic profiling of human breast and melanoma cells selected by migration through narrow constraints

The metastatic spread of cancer cells is a step-wise process that starts with dissociation from primary tumours and local invasion of adjacent tissues. The ability to invade local tissues is the product of several processes, including degradation of extracellular matrices (ECM) and movement of tumour cells through physically-restricting gaps. To identify properties contributing to tumour cells squeezing through narrow gaps, invasive MDA-MB-231 human breast cancer and MDA-MB-435 human melanoma cells were subjected to three successive rounds of selection using cell culture inserts with highly constraining 3 μm pores. For comparison purposes, flow cytometry was also employed to enrich for small diameter MDA-MB-231 cells. RNA-Sequencing (RNA-seq) using the Illumina NextSeq 500 platform was undertaken to characterize how gene expression differed between parental, invasive pore selected or small diameter cells. Gene expression results obtained by RNA-seq were validated by comparing with RT-qPCR. Transcriptomic data generated could be used to determine how alterations that enable cell passage through narrow spaces contribute to local invasion and metastasis.


Background & Summary
The metastatic spread of cancer cells from primary tumours to distant sites is the most serious and deadliest aspect of the disease, with estimates of up to 90% of cancer related deaths being directly associated with metastasis 1,2 . In addition, it has become apparent that the same processes that contribute to cancer metastasis also facilitate primary tumour growth 3,4 . As a result, characterizing the properties of metastatic cells that enable their dissemination may identify actionable targets for chemotherapy that could reduce cancer spread and possibly tumour growth and progression.
The metastatic ability of cancer cells results from several changes in their normal behaviours 5 . The strength of cell-cell adhesions is often lessened, allowing individual or groups of cells to separate and move away from the primary tumour 6 . The movement away from the tumour into adjacent tissue may be promoted by changes in migratory behaviour, increased extracellular matrix (ECM) degrading activity, and the ability to squeeze between cells and ECM protein fibres 7 . In the next stage, locally invasive cells may spread further by moving through surfaces surrounding body cavities, via lymphatic or blood vessels, or through canalicular spaces. Ultimately, tumour cells may move to a secondary site, the location of which may be influenced by variables including the route taken, intrinsic properties of the target tissue, and accessibility to factors, such as tumour cell-generated exosomes, that condition a pre-metastatic niche. Common to several stages of the metastatic process is the ability of tumour cells to squeeze through narrow spaces. As a result, it can be predicted that changes in the deformability of tumour cells that enabled their movement through physically constraining conditions in their three dimensional environment would likely promote cancer spread 8 .
To select for cells that were better able to move through narrow gaps, MDA-MB-231 D3H2LN human breast cancer cells expressing firefly luciferase (Luc) 9 (abbreviated MDA-MB-231) and MDA-MB-435 human melanoma cells (which were mistakenly used in the past as a breast cancer model until its cancer type was corrected) 10 were subjected to three rounds of enrichment using tissue culture inserts with 3 μm pores (Fig. 1a). By plating cells on microporous membranes in serum-free medium in the inserts, and then transferring the inserts to tissue cultures dishes containing medium supplemented with 10% fetal bovine serum (FBS), a chemotactic gradient was created to attract cells to move through the restricting narrow pores. Several independent 'Selected' populations of cells were established from both MDA-MB-231 and MDA-MB-435 cells in this manner. Given the possibility that selection through narrow pores would enrich for small cells, several independent 'FlowSorted' populations of small diameter MDA-MB-231 cells were also selected by three consecutive rounds of flow cytometry (Fig. 1b).
Parental, Selected and FlowSorted populations of MDA-MB-231 and MDA-MB-435 cells were fixed and stained with phalloidin to reveal filamentous actin (F-actin) structures and cell morphology. Selected and FlowSorted MDA-MB-231 cells, as well as Selected MDA-MB-435 cells, were notably smaller than their originating parental cells (Fig. 1c). In addition, Selected cells from both tumour cell lines were marked by fewer cytoplasmic and cortical distinct F-actin fibres, more actin-rich protrusive regions (indicated by white arrows), and more irregular circumferential membranes compared to their respective parents (Fig. 1c). When invasion into three dimensional fibroblast-conditioned dense collagen matrices was examined, Selected MDA-MB-231cells were markedly more invasive than Parental or FlowSorted cells (Fig. 1d). In addition, pore-selection changed relatively non-invasive Parental MDA-MB-435 cells into highly invasive Selected cells (Fig. 1d).
To determine how gene expression differs between Parental, Selected and FlowSorted populations of MDA-MB-231 and MDA-MB-435, total RNA was extracted and enriched for polyA+ mRNAs, and then subjected to RNA-Sequencing ( Fig. 2 and Table 1). The study has been described at the NCBI Bioproject (PRJNA327913), with descriptions of the MDA-MB-231 cells (SAMN07311741) and MDA-MB-435 cells (SAMN07311743). Primary data are available at the NCBI Sequence Read Archive (Table 2 (available online only) and Data Citation 1). The transcriptomic data generated by this study may reveal important modifiers of the physical properties that enable tumour cells to move through narrow spaces, as well as regulators of cell size, that contribute to the metastatic spread of cancer.

Cell line selection
Independent MDA-MB-231 or MDA-MB-435 pore-selected (Selected) cell populations were established by seeding 1 × 10 6 cells in 10 ml serum-free medium on 3 μm pore membranes in 7.5 cm cell culture inserts (Corning, Fisher Scientific, Loughborough, UK). Inserts were placed in 10 cm dishes containing 10 ml serum-containing medium, and left for four days in standard tissue culture conditions to allow cells  to migrate through the pores. The inserts were then removed, media was changed and plates were placed back in the incubator to expand the selected cell population. The selection process was repeated twice more as described above.
Independent small diameter MDA-MB-231 flow cytometry sorted (FlowSorted) populations were obtained by gating on cells with low forward scatter and side scatter parameters as indicated in the red region in Fig. 1b using a FACSAria Fusion sorter (BD Biosciences, Oxford UK). FlowSorted cells were then grown using standard tissue culture conditions to expand the isolated sorted cell populations, followed by two additional rounds of sorting as described above.
Cells were stained and imaged as described in ref. 11. For filamentous actin imaging, 0.8 × 10 5 cells were seeded on 13 mm autoclaved cover slips and grown overnight. Cells were washed with phosphate buffered saline (PBS), then fixed with 4% paraformaldehyde/PBS solution, permeabilized with 0.5% (v/v) Triton X-100/PBS, blocked with 1% (w/v) bovine serine albumin (Sigma-Aldrich, Gillingham, UK)/PBS, and then incubated with Alexa Fluor488 phalloidin (Sigma-Aldrich, Gillingham, UK) (1:1,000 dilution) for 1 h at room temperature. Cover slips were washed twice with PBS and inverted on 7 μl of ProLong Diamond Antifade Mountant with DAPI (Thermo Fisher Scientific, Renfrew UK) on a glass slide. Cells were imaged on a Zeiss 880 confocal microscope (Cambridge UK) using a 63X oil objective.
Collagen matrix invasion assays were performed as previously described 11 . Primary human fibroblasts were mixed with rat tail collagen1 and placed in a cell culture incubator for a week to allow conditioning of the collagen. To remove fibroblasts from the collagen matrices, the disks were incubated with 5 μg ml −1 Puromycin for at least 24 h and then washed twice with medium. 2 × 10 5 cells were seeded on top of the disks and allowed to settle and grow over 2 days. Afterward, the collagen matrices were mounted onto grids to generate an air/liquid interface. After 8 days, the collagen disks were fixed in 4% paraformaldehyde overnight and processed using standard histological methods. H&E-stained sections were scanned and analyzed using Digital Slide Server (SlidePath, Leica, Milton Keynes UK) software.

RNA isolation
1 × 10 6 cells were seeded into 6-well plates and allowed to settle and grow overnight. Cells were harvested with Trypsin and total RNA was extracted using the RNAeasy kit (Qiagen, Manchester UK) according to manufacturer's instructions. RNA was quantified using the Nanodrop spectrophotometer (Nanodrop, Thermo Fisher Scientific, Renfrew UK). The Agilent RNA ScreenTape assay and the Agilent 2,200 TapeStation system (both Agilent, Stockport UK) were used to determine the RNA integrity number equivalent (RINe; Table 3).

RNA-sequencing
Total RNA was used to generate an oligo dT-enriched library with the Illumina TruSeq RNA Library Preparation kit v2.0 (Illumina, Cambridge UK). Quality and quantity of the DNA library was assessed using the Agilent 2,100 Bioanalyzer (Agilent, Stockport UK) and the Qubit (Thermo Fisher Scientific, Renfrew UK), respectively. The library was run on the Illumina NextSeq 500 platform using the High Output 75 cycles kit (2 × 36 cycles, paired-end reads, single index) (both Illumina, Cambridge UK).

RNA-sequence analysis
RNA-Sequence analysis and alignment was carried out as described in reference 12 . Quality control checks of raw RNA-Seq data files were done with fastqc v0.10.1 (http://www.bioinformatics.babraham.ac.uk/ projects/fastqc/) and fastq_screen v0.4.2 (http://www.bioinformatics.babraham.ac.uk/projects/fas-tq_screen/). RNA-Seq reads were aligned to the human genome build GRCh38 with TopHat2.0.13 13 and genome annotation using GRCh38.82.gtf. BAM files were further processed with HTseq0.6.1p1 (http://www.huber.embl.de/users/anders/HTSeq/doc/count.html). Differential analysis of count data was performed by the DESeq2 package (DESeq2) 14 . Regularized log transformation was used to transform the DESeq2 data for principal component analysis.    Table 4. Reaction mixtures were distributed into MicroAmp Fast Optical 96-well plates and 1.5 μl of cDNA sample or standard added to each well. The plate was covered with optically transparent sealing film and run on an Applied 7,500 Fast Real-Time PCR system. A melting curve was performed to validate the presence of single PCR product. Data was analysed on Applied Biosystem 7,500 Software 2.0.5 and the expression level of genes of interests were calculated using ΔCt method and normalized to GAPDH.

Data Records
A project overview has been submitted as the BioProject reference PRJNA327913, with descriptions of the BioSample MDA-MB-231 D3H2LN Luc cells (reference SAMN07311741) and of the BioSample MDA-MB-435 cells (reference SAMN07311743). Unprocessed RNA-Sequencing reads have been deposited as fastq files at the National Center for Biotechnology Information (NCBI) Sequence Reads Archive (SRA) with the reference SRP111915 (Data Citation 1). The fastq files correspond to four independent biological replicates (BiolRep1-4) for the MDA-MB-435 melanoma Parental, Selected1 or Selected2 cell populations, or to four independent biological replicates (BiolRep1-4) for the MDA-MB-231 Parental and three independent biological replicates (BiolRep1-3) for the Selected2, Selected3, FlowSorted1 or FlowSorted2 cell populations, as indicated in Table 2 (available online only). For each sequencing reaction run on the Illumina NextSeq 500 instrument, four technical replicates were produced (TechRep1-4). Forward (R1) and reverse (R2) reads have been combined, with SRA accession numbers for the combined sequencing results also indicated in Table 2 (available online only). Please also see the associated Metadata Record.

Technical Validation
The quality of extracted RNA (RNA integrity number equivalent; RINe) of all samples was determined using the Agilent RNA ScreenTape assay and the Agilent 2,200 TapeStation system (Table 3). Following RNA-seq, correlation coefficients were calculated for all pairwise comparisons of biological replicates (Table 5)   Selected2 and Selected3 populations ( Fig. 3a; orange and brown symbols) formed a separate cluster. Similarly, three of the Parental MDA-MB-435 biological replicates ( Fig. 3b; grey symbols) clustered together and were separate from the four biological replicates of Selected1 and Selected2 populations ( Fig. 3b; orange and brown symbols). These results are consistent with the pore-selection procedure having enriched for independent cell populations with transcriptomic profiles distinct from the originating parental populations. Further technical validation of the RNA-Sequencing results was provided by quantitative reverse transcription PCR (RT-qPCR) analyses of differences in gene expression between MDA-MB-231 Parental versus Selected (Fig. 3c) or MDA-MB-231 Selected versus FlowSorted cells (Fig. 3d) identified by RNA-Seq. Genes were selected on the basis of fold-change, statistical significance and number of sequence reads. The relatively higher expression of the genes encoding leukocyte immunoglobulin-like receptor subfamily B member 1 (LILRB1), tumour necrosis factor ligand superfamily member 15 (TNFSF15), ectonucleotide pyrophosphatase/phosphodiesterase 1 (ENPP1), protein tyrosine phosphatase receptor type U (PTPRU), heparin-binding epidermal growth factor (HBEGF), as well as the relatively lower expression of matrix remodelling associated 8 (MXRA8), in Selected versus Parental (Fig. 3c) as well as in Selected versus FlowSorted cells (Fig. 3d) were comparable in both RNA-Seq and RT-qPCR assays. In both cases, the fold-changes determined by either method fell on a single fitted straight line with R 2 >0.90 and P o0.05 (Fig. 3c,d). Although the agreement between RNA-Seq and RT-qPCR was good for this  limited set of genes, it is formally possible that the analytical methods employed might have underestimated gene expression levels 15 .