High-throughput transcriptomic and proteomic profiling of mesenchymal-amoeboid transition in 3D collagen

The plasticity of cancer cell invasion represents substantial hindrance for effective anti-metastatic therapy. To better understand the cancer cells’ plasticity, we performed complex transcriptomic and proteomic profiling of HT1080 fibrosarcoma cells undergoing mesenchymal-amoeboid transition (MAT). As amoeboid migratory phenotype can fully manifest only in 3D conditions, all experiments were performed with 3D collagen-based cultures. Two previously described approaches to induce MAT were used: doxycycline-inducible constitutively active RhoA expression and dasatinib treatment. RNA sequencing was performed with ribo-depleted total RNA. Protein samples were analysed with tandem mass tag (TMT)-based mass spectrometry. The data provide unprecedented insight into transcriptome and proteome changes accompanying MAT in true 3D conditions. Measurement(s) gene-expression profile endpoint • protein expression profiling • Proteome • transcriptome Technology Type(s) RNA sequencing • MSn spectrum • mass spectrometry Factor Type(s) doxycycline-inducible expression of EGFP-RhoA G14V gene • dasatinib treatment Sample Characteristic - Organism Homo sapiens Measurement(s) gene-expression profile endpoint • protein expression profiling • Proteome • transcriptome Technology Type(s) RNA sequencing • MSn spectrum • mass spectrometry Factor Type(s) doxycycline-inducible expression of EGFP-RhoA G14V gene • dasatinib treatment Sample Characteristic - Organism Homo sapiens Machine-accessible metadata file describing the reported data: https://doi.org/10.6084/m9.figshare.12084927

still unavailable 15 . Despite the large effort to reveal signalling underlying invasive behaviour of cells, understanding of cancer cell invasion plasticity is still insufficient, mainly due to the scarcity of results obtained from more in vivo-like 3D cell culture conditions. To date, there are only three published works reporting gene expression profiling of amoeboid cells [16][17][18] . While these data provided the first insight into the transcriptome of amoeboid cells, they were not obtained from three-dimensional (3D) cultures, an essential requirement to get the most relevant results.
To gain more insight into molecular level adaptation of cancer cells to the amoeboid state, we performed large scale transcriptomic and proteomic profiling of HT1080 fibrosarcoma cells after MAT in 3D cell culture ( Fig. 1). In order to discern treatment-specific effects, we used two experimental treatments that are sufficiently effective in inducing MAT and compatible with cell viability in 3D collagen gels. The first treatment was doxycycline-inducible constitutively active RhoA (icaRhoA) gene expression; RhoA-ROCK pathway is known to play a key role in amoeboid migration 13,19 and constitutively active RhoA expression has been shown to induce amoeboid morphology in glioblastoma cells and effective MAT in HT1080 3,20 . The second, very different treatment was that with dasatinib, a Src kinase inhibitor, that has been previously also shown to induce MAT 21,22 . The cells were kept for 48 hours in 3D collagen without or with the MAT-inducing treatment and then the whole samples including the collagen and any extracellular material were homogenized and further processed for RNA sequencing or mass spectrometry analysis. Total RNA was depleted of rRNA, converted into a stranded cDNA library and sequenced with Illumina HiSeq sequencer. Protein lysates were trypsin-digested, TMT-labelled, fractionated and analysed on Thermo Orbitrap Fusion mass spectrometer.
Overall, our work provides data for an unprecedented comparison of parallel mesenchymal and amoeboid transcriptomes and proteomes obtained from 3D conditions to reveal new details of the transition regulation, the impact of the transition on biological properties of the cells in terms of gene expression and protein abundance, and possibly discover potential new therapeutic opportunities.

cells, DNA constructs and transfection.
HT1080 cells were maintained in DMEM (4.5 g/l glucose, pyruvate) supplemented with 10% fetal bovine serum and 50 μg/ml gentamicin (all from Sigma) at 37 °C in humidified atmosphere with 5% CO 2 . The cultures were regularly tested for mycoplasma contamination. 3D cell culture experiments were performed with rat tail collagen (SERVA) at concentration 1 mg/ml and DMEM supplemented with 1% fetal bovine serum, 15 mM HEPES and 50 μg/ml gentamicin. Stably transfected cells were prepared by lentiviral transduction using the second-generation packaging system (pLVX constructs, Tet-On Advanced Gene expression system, Clontech). The DNA transfections were performed with polyethylenimine (Polysciences, Inc.). All populations of stably transfected cells were further enriched for the respective encoded fluorescence with a cell sorter. Cells bearing inducible constructs were transiently induced with doxycycline before the sorting. In all experiments, cells stably transfected with the doxycycline-inducible EGFP-RhoA G14V fusion construct were treated with 250 ng/ml doxycycline (Sigma). In dasatinib experiments, HT1080 cells stably expressing LifeAct-mCherry exogene 23 (further referenced just as HT1080) were treated with 1 μM dasatinib (LC Laboratories) in DMSO or equivalent volume of DMSO only (0.1% final concentration). All plasmids used in the study were constructed in the lab with standard molecular cloning procedures; details as well as the plasmids themselves are available upon request.
Time-lapse microscopy in 3D collagen matrix. To record cell invasion in 3D collagen, control and doxycycline-and dasatinib-treated cells with or without protease inhibitor GM6001 (10 µM) were imaged every 5 minutes using JuLI FL, an in-incubator microscope (NanoEnTek Inc.). For digital holographic microscopy, cells were embedded in collagen as stated above and left to adjust to 3D conditions for at least 12 hours. Images were acquired automatically in 80-90 second intervals. Cells were imaged using QPHASE (TESCAN Brno, s.r.o.) a multimodal holographic microscope based on CCHM technology as described previously 3 . cell morphology in 3D collagen matrix. 100,000 cells were seeded in 250 μl of collagen matrix in a 48-well plate. Hoffman modulation contrast microscopy images were taken 48 hours later from approx. 20 planes along z-axis. Cell morphology was assessed by measuring the ratio of the maximum length and maximum width manually using Fiji software 24 . At least 300 cells were analysed for each condition.
RNA extraction and sequencing. One million HT1080 cells were cultured in a 500 μl 3D collagen gel for 48 hours in a 24-well plate. Gels from two wells were transferred into one 2 ml tube and homogenized with Tissue Tearor (BioSpec Products) in 600 μl of RNA extraction solution (60% v/v water-saturated phenol, 3.25 M guanidine thiocyanate, 400 mM sodium acetate buffer pH 4.0, 0.4% w/v N-lauroylsarcosine, 160 mM 2-mercaptoethanol) plus 100 μl of 6.1 M sodium chloride. 200 μl of chlorophorm was added to approx. 1 ml of the lysate and the mixture was vortexed vigorously for 10 seconds. After 30-minute centrifugation at 18,000 g at 4 °C, the polar upper phase was transferred to a new tube, volume adjusted to 800 μl with RNase-free water, RNA was precipitated with 600 μl of isopropanol and recovered by centrifugation at 18,000 g for 30 min at 4 °C. The RNA pellet was washed three times with 600 μl of 75% ethanol and air-dried. Next, the RNA was treated with DNAse I to remove any possible genomic DNA contamination. To this end, the pellet was directly dissolved in 100 μl of solution containing 4 units of DNAse I (Thermo Fisher Scientific) in manufacturer-provided reaction buffer and incubated at 37 °C for 30 min. After that, the RNA was re-purified with RNeasy Protect Mini Kit  Table 1.
High-throughput proteomic profiling. One million HT1080 cells were cultured in a 500 μl 3D collagen gel for 48 hours in a 24-well plate. Gels from two wells were transferred into one 2 ml tube, mixed with equal volume of 2x TEAB buffer (200 mM triethylammonium bicarbonate pH 8.5, 4% sodium deoxycholate, both from Sigma) and homogenized with Tissue Tearor (BioSpec Products). After 15-minute centrifugation at 18,000 g at 4 °C, supernatant was transferred into a new tube and kept frozen at −80 °C before further processing. Protein samples were trypsin-digested, TMT-labeled, fractionated and analysed on Orbitrap Fusion mass spectrometer (Thermo Fisher Scientific) as described previously [28][29][30] . Specifically, sample volume containing 100 μg of proteins was precipitated by four volumes of cold acetone (overnight at −20 °C). After centrifugation the pellets were washed with 80% acetone and let dry. Samples were resuspended in 100 mM TEAB (Triethylammonium bicarbonate, Thermo #90114) reduced with 5 mM TCEP (Tris(2-carboxyethyl)phosphine hydrochloride, Sigma #4706) for 30 min at 60 °C and alkylated with 10 mM MMTS (S-Methyl methanethiosulfonate, Sigma #64306) for 10 min at room temperature. Proteins were digested with trypsin (trypsin:protein ratio 1:50) overnight at 37 °C. TMT label was added according to manufacturer protocol. After 60 min, the labelling reaction was stopped by addition of hydroxylamine. All samples were pooled and vacuum dried. Samples were desalted on Michrom C18 Opti Trap Macro (Optimize Technologies 10-04818-TN). Peptides were fractionated as follows. 100 μl volume of a sample containing approx. 250 μg of peptides was injected onto a C18 column (kinetex 1.7 μm, EVOC18, 150 × 2.1 mm) and separated with linear gradient from 0% A (20 mM ammonium formate, 2% acetonitrile pH 10) to 50% B (20 mM ammonium formate, 80% acetonitrile pH 10) in 32 min with flow rate 300 μl/min 32 fractions were collected and pooled into 8 fractions. Resulting fractions were dried and resuspended in 20 μl of 1% TFA. Nano reversed phase column (EASY-Spray column, 50 cm × 75 μm ID, PepMap C18, 2 μm particles, 100 Å pore size) was used for LC/MS analysis. Mobile phase buffer A was composed of water, 2% acetonitrile and 0.1% formic acid. Mobile phase buffer B was composed of 80% acetonitrile in water and 0.1% formic acid. Samples were loaded onto the trap column (AcclaimPepMap300, C18, 5 μm, 300 Å wide pore, 300 μm × 5 mm) at a flow rate of 15 μl/min. Loading buffer was composed of water, 2% acetonitrile and 0.1% trifluoroacetic acid. Peptides were eluted with gradient of B from 2% to 60% over 240 min at a flow rate of 300 nl/min. Eluting peptide cations were www.nature.com/scientificdata www.nature.com/scientificdata/ converted to gas-phase ions by electrospray ionization and analyzed on a Thermo Orbitrap Fusion (Q-OT-qIT, Thermo) mass spectrometer. Spectra were acquired with 4 seconds duty cycle. Full MS spectra were acquired in orbitrap within mass range 350-1,400 m/z with resolution 120,000 at 200 m/z and maximum injection time 50 ms. Most intense precursors were isolated by quadrupole with 1.6 m/z isolation window and fragmented using CID with collision energy set to 30%. Fragment ions were detected in ion trap with scan range mode set to normal and scan rate set to rapid with maximum injection time 50 ms. Fragmented precursors were excluded from fragmentation for 60 seconds. For quantification information of a TMT label 10 most intense fragments were isolated (simultaneous precursor selection) and fragmented in HCD on 65% energy, maximum accumulation time 140 ms, and fragments were measured in orbitrap on 60 K resolution. Raw data were processed in Proteome Discoverer 2.1. TMT reporter ions ratios were used for estimation of relative amount of each protein. Searches were done with the Human Uniprot reference database and a common contaminant database. Modification were set: peptide N terminus, lysine (unimod #737) and cystein (unimod #39) as static, and methionine oxidation (unimod #1384) and protein N-terminus acetylation (unimod #1) as variable. Raw data are available from the PRIDE database 31 under identifier PXD010425 32 . The samples and related files are summarized in Table 2. immunoblotting. One million cells were cultured in a 500 μl 3D collagen gel for 48 hours in a 24-well plate.
Gels from two wells were transferred to tubes containing 500 μl of 2x SDS lysis buffer (2% SDS, 20% glycerol, 120 mM Tris, pH = 6.8) and homogenized using Tissue Tearor (BioSpec Products). After 10-minute centrifugation (18,000 rcf, 10 °C) 900 ul of the solution was transferred to a fresh tube and protein concentration in the lysate was determined using the DCTM Protein Assay (Bio-Rad Laboratories). The lysates in each series were adjusted to the same protein concentration with 1x SDS lysis buffer. DTT (final concentration 50 mM) and bromophenol blue (final concentration 30 μM) were added, and the samples were incubated at 95 °C for 10 min. Samples were separated on 10% or 12% SDS-polyacrylamide gels and transferred onto nitrocellulose membrane. Non-specific binding was blocked by incubation of the membranes for 60 min at room temperature in Tris-buffered saline (TBS) containing 4% BSA or 5% non-fat dry milk. The membranes were incubated with a primary antibody in 4 °C overnight, washed three times in Tris-buffered saline with Tween-20 (TBST) and incubated for 75 min with HRP-conjugated secondary antibody at room temperature. Membranes were washed with TBST two times, with TBS one time and developed using Amersham TM Imager 600 (GE Healthcare) and  www.nature.com/scientificdata www.nature.com/scientificdata/ SuperSignal TM Femto Maximum Sensitivity Substrate (Thermo Fisher Scientific) or Western Bright TMECL (Advansta) HRP substrates. For probing the total protein level after a phosphoprotein detection and for loading control (GAPDH) detection, membranes were stripped in stripping buffer (200 mM NaOH) at 42 °C for 10 min. The primary antibodies used were as follows: GAPDH (Thermo Fisher Scientific MA5-15738), Caspase-3 (Cell Signaling Technology #9662), Fra-1 (Developmental Studies Hybridoma Bank PCRP-FOSL1-1E3), C/EBPβ (Abcam Ab32358), RhoA (Cell Signaling Technology #2117), GDF15 (Thermo Fisher Scientific PA5-17065) and MX1 (Cell Signaling Technology #37849). Quantification of band signals was performed using Multi Gauge software (Fujifilm, Tokyo, Japan). Band intensities of specific proteins were normalized to the GAPDH protein signal. Log-transformed signal values from at least three experiments were analyzed with paired Student's t-test. The results were presented as geometric means of fold change values with respect to control samples and the uncertainty was expressed as geometric standard errors. activated Rhoa detection. One million cells were cultured in a 500 μl 3D collagen gel for 48 hours in a 24-well plate. Gels from two wells were transferred to tubes containing 500 μl of 2x Triton-X100 lysis buffer (2% Triton X-100, 100 mM Tris, 300 mM NaCl, pH = 7.1, protease inhibitors) and homogenized using Tissue Tearor (BioSpec Products) on ice. After 10-minute centrifugation (18,000 g, 10 °C) 800 μl of the solution was transferred to a fresh tube, protein concentration in the lysate was determined using the DCTM Protein Assay (Bio-Rad Laboratories) and adjusted to the same value in each series with 1x Triton X-100 lysis buffer. 50 μl of the lysate was transferred to a fresh tube (total lysate control) and the rest was incubated with rhotekin-bound GST-beads at 4 °C for 45 min. Beads were separated by brief centrifugation and washed two times with 1x Triton X-100 lysis buffer. Finally, beads were resuspended in 1x Laemmli sample buffer (0.35 M Tris-HCl, pH = 6.8, 10% SDS, 40% glycerol, 0.012% bromophenol blue) with DTT (50 mM) and incubated at 95 °C for 10 minutes. Samples were further processed according to the immunoblotting protocol described above. www.nature.com/scientificdata www.nature.com/scientificdata/ statistical analysis. To estimate differential gene expression from RNA sequencing data a workflow based on the STAR aligner and DESeq2 R package was used as described previously 18 . Mass spectrometry data pre-processed with Proteome Discoverer 2.1 were imported into R environment, imputed and normalized with the MSnbase package 33 . Differences in protein levels were estimated with moderated t-test statistics using the limma package 34 .  Fig. 3 Quantitative analysis of cell migration. HT1080-icaRhoA cells and HT1080 cells were embedded in rattail collagen (1 mg/ml) with or without GM6001. 12 hours later, migration of cells of the mesenchymal phenotype (HT1080-icaRhoA CTRL, HT1080 DMSO) and amoeboid phenotype (HT1080-icaRhoA DOX, HT1080 DAS) was monitored by wide-field microscopy for 15 hours. Track plots were generated using Chemotaxis Tool in ImageJ. Differences in the mean length of the tracks were analysed with two-tailed Student's t-test, α = 0.05, n = 9-13. Error bars -standard deviation. Representative results of three independent experiments. www.nature.com/scientificdata www.nature.com/scientificdata/

Data Records
Time-lapse movies documenting migratory phenotypes of the cells, uncropped immunoblot images, other supporting data referenced in the text, and complete results of differential transcript and protein level analyses were deposited in the Figshare 35 repository. Adapter-trimmed RNA sequencing data have been deposited in the ArrayExpress database at EMBL-EBI 27 . Raw proteomic data have been deposited in the PRIDE database via the ProteomeXchange Consortium 32 . Besides the described data, the PRIDE record contains also a series of datasets obtained from HT1080-icaRhoA cells cultured on 2D collagen (three pairs of doxycycline-untreated/treated cells).

Technical Validation
Induction of icaRhoA in otherwise mesenchymally migrating HT1080 fibrosarcoma induced effective MAT, i.e. cell rounding, membrane blebbing and amoeboid migration resistant to extracellular protease inhibitor (GM6001) in 3D collagen in the vast majority of the cell population (Figs. 2a-c, 3 and additional files available from Figshare 35 : time-lapse Movies 1 and 2). Treatment of HT1080 cells with 1 µM dasatinib produced very similar effects (Figs. 2a-c, 3 and time-lapse Movies 1 and 3 plus the files "Src activity in MAT-induced cells" all available from Figshare 35 ). We also tested the effect of other Src inhibitors and found that they were much weaker MAT   www.nature.com/scientificdata www.nature.com/scientificdata/ inducers than dasatinib (see the file "Src inhibitors comparison" available from Figshare 35 ). The different potential of the Src inhibitors to induce MAT could be attributed to their different effect on Src structure and localization 36 . The induction of the EGFP-RhoA G14V fusion protein in HT1080-icaRhoA cells was verified with immunoblotting (Fig. 2d). Activation of RhoA -an expected feature of the amoeboid phenotype -in dasatinib-treated cells was confirmed with GST-Rhotekin 1 Rho binding domain (RBD)-based pulldown assay (Fig. 2d, see also the file "RhoA activity in MAT-induced cells" available from Figshare 35 ). To verify that the observed membrane blebbing is not due to initiation of apoptosis, we detected Caspase-3 in protein lysates from the cells in 3D collagen. While we easily detected the non-cleaved pro-form in all samples, only a faint signal of the active, cleaved form of Caspase-3 could be detected after a very long blot exposure in all samples with no significant differences (Fig. 2e).
RNA sequencing of control and MAT-induced samples (three pairs of independent biological replicates for each treatment) yielded 31-46 million paired-end reads per sample. Raw reads trimmed of adapter sequences were quality-checked with FastQC software 37 . FastQC's plots of Phred scores (the standard measure of position reading confidence) by position (Fig. 4a,b) showed typical high-quality profiles with decreasing quality towards the ends of reads. Note that the inducible caRhoA series samples were sequenced with 2 × 100 cycles while the dasatinib treatment series samples were sequenced later with improved version of the sequencing chemistry enabling longer read length (2 × 125 cycles). The mapping metrics of STAR aligner showed an average paired read length of 191 bases for inducible caRhoA samples and 235 bases for dasatinib treatment series samples. In average, the aligner uniquely mapped 90.7% of the fragments to human genome, 5.3% fragments were multi-mapped, and 3.7% fragments were excluded as too short. The complete mapping metrics is listed in Table 3.
Mass spectrometry sample quality was checked with the aid of Preview ™ software, version 2.2.9 from Protein Metrics Inc. The effectiveness of tryptic digestions was verified before TMT labelling. Next, we checked for completeness of the labelling procedure. All checks were done as short 30-minute gradient with roughly 250 ng peptide injections. Missing cleavage was under 10% of all peptides and approx. 95% of all N termini and 100% of all lysines were successfully labelled with TMT tags. We identified 5,687 proteins in the icaRhoA series samples and 6,477 proteins in the dasatinib series samples. Among them there were 101 and 105 bovine proteins, respectively, coming from the fetal bovine serum used in the cell culture. For further analysis, we excluded these bovine proteins and all proteins with more than one missing intensity value in the sample series leading to 4,637 and 5,880 proteins, respectively usable for differential protein abundance analysis.
Principle component analysis of normalized gene expression and protein abundance profiles showed that the treatments, but also the biological replication are sources of significant variation (Fig. 4c). The paired character of the data thus has to be reflected in a linear model design formula in order to obtain the most accurate results from the statistical analyses of differential gene expression or differential protein abundance.
Differential gene expression analysis with DESeq2 38 identified 894 out of 12,030 genes (7.4%) significantly changed (with adjusted P-value < 0.1) after induction of caRhoA. Dasatinib treatment changed the expression of 996 out of 12,637 (8.1%) genes. Differential protein abundance with limma R package 34 revealed 410 out of 4,637 (8.8%) proteins significantly changed with adjusted P-value < 0.2 after induction of caRhoA. Dasatinib treatment changed the abundance of 674 out of 5,880 with adjusted P-value < 0.05 (11.5%) proteins. Note that we used a stricter adjusted P-value cut-off on the dasatinib data as the biological replicates were more uniform providing smaller P-values overall (with the opposite for the icaRhoA data). The differential gene and protein expression analysis results are summarized with volcano plots in Fig. 5a. Complete results are available from Figshare 34 as "differential_expression_analyses.xlsx" spreadsheet file. The file also presents lists of genes and proteins concordantly affected by both MAT-inducing treatments in separate sheets. Joint hierarchical clustering of affected genes across both treatment sample series showed a high similarity in the overall impact of MAT on gene expression (Fig. 5b).
To validate the results of the high-throughput analyses, we performed immunoblotting experiments with independent biological samples (different from those used for the high-throughput analyses) to measure differences in protein abundance of four proteins -C/EBPβ, GDF15, MX1 and Fra-1 that were selected from those identified with both transcriptomic and proteomic profiling to be concordantly affected by both experimental  Table 3. RNA sequencing raw data and mapping metrics.
www.nature.com/scientificdata www.nature.com/scientificdata/ treatments (see "Targets selected for verification" on Figshare 35 for the detected fold changes and corresponding adjusted P-values). The blots confirmed reproducible differences in all four proteins in agreement with the data (Fig. 5c).   Horizontal line marks statistical test P-value 0.05. (b) Unsupervised hierarchical clustering of genes that are differentially expressed after MAT in both treatment groups (P < 0.05). The yellow colour scale represents a higher expression level, and the purple colour scale represents a lower expression level. (c) Immunoblotting validation of differential protein abundance analysis results. GAPDH was used as a loading control in all experiments. Representative results of three independent experiments. Numbers next to blots indicate average fold change, standard error (both geometric) and p-value of paired t-test. (2020) 7:160 | https://doi.org/10.1038/s41597-020-0499-2 www.nature.com/scientificdata www.nature.com/scientificdata/ Taken together, we present a valuable transcriptomic and proteomic dataset characterizing fibrosarcoma cell response to induced migration mode switching. The dataset provides a useful resource for better understanding of the mechanisms of cancer metastatic spreading. The experimental model presented here has some limitations. Both treatments used in this work to induce MAT have their specific MAT-unrelated effects. The phenotypes induced by the treatments might be not identical. The mechanisms underlying the transition in HT1080 cells might be, at least in some details, different from those described in other cell types, particularly in melanoma cells. However, we believe that only continuing accumulation of data obtained from different cell contexts and with different experimental approaches will bring a better and more general understanding of the amoeboid migratory phenotype and its importance for cancer metastasis.

code availability
The scripts used in data processing are available from Figshare 35 (see "Code used in data processing").