Uncovering the signaling landscape controlling breast cancer cell migration identifies splicing factor PRPF4B as a metastasis driver

Metastasis is the major cause of death in cancer patients and migration of cancer cells from the primary tumor to distant sites is the prerequisite of metastasis formation. Here we applied an imaging-based RNAi phenotypic cell migration screen using two highly migratory basal breast cancer cell lines (Hs578T and MDA-MB-231) to provide a repository for signaling determinants that functionally drive cancer cell migration. We screened ~4,200 individual target genes covering most cell signaling components and discovered 133 and 113 migratory modulators of Hs578T and MDA-MB-231, respectively, of which 43 genes were common denominators of cell migration. Interaction networks of candidate migratory modulators were in common with networks of different clinical breast cancer prognostic and metastasis classifiers. The splicing factors PRPF4B and BUD31 and the transcription factor BPTF were amplified in human primary breast tumors and the expression was associated with metastasis-free survival. Depletion of PRPF4B, BUD31 and BPTF caused primarily down-regulation of genes involved in focal adhesion and ECM-interaction pathways. PRPF4B was essential for triple negative breast cancer cell migration and critical for breast cancer metastasis formation in vivo, making PRPF4B a candidate for further drug development. Our systematic phenotypic screen provides an important repository of candidate metastasis drug targets.


Introduction
Breast cancer is the most prevalent cancer in women. The triple-negative breast cancer (TNBC) subtype which lacks expression of the estrogen, progesterone and HER2 accounts for ~15% of breast cancer. TNBC is the most aggressive BC subtype with an overall 5 year relapse of 40% due to primary and secondary metastatic spread 1,2 . Transcriptomics has classified four different molecular subtypes of TNBC genes 3 while proteomics studies revealed markers of disease progression 4 . More recently, genome-wide sequencing of large numbers of human breast cancers have defined the somatically acquired genetic variation in breast cancer including TNBC [5][6][7] as well as the genetic evolutionary programs associated with local and distant metastatic spread 8,9 . Although the roles of individual genes in breast cancer metastasis [10][11][12][13][14] have been described, and few pooled shRNA screens have identified novel regulators of cancer metastasis 15,16 a systematic analysis of the functional consequences of genetic aberrations in TNBC for progression towards metastatic disease is still lacking.
The most critical cell biological hallmark of TNBC metastasis is the increased plasticity of TNBC cells [17][18][19] . Many TNBC cell lines have adapted a mesenchymal phenotype and demonstrate a high migratory behavior in association with an increased metastatic spread 12,[20][21][22][23][24][25] . Cell migration is involved in various steps of the metastatic cascade, including local invasion, intravasation, extravasation, and colonization of secondary sites 26 . Understanding the fundamentals of TNBC cell migration is critical for our comprehension of the development of metastatic disease. The control of cell migration is complex and involves components of the cell adhesion machinery as well as modulators of the actin cytoskeleton that coordinate cellular motility behavior 11,27 . The functionality of these machineries is controlled by signaling pathways and associated transcriptional programs that coordinate the expression of these cell adhesion and cytoskeletal modulators as well as their post-translational modification and activity [28][29][30] .
While the signaling pathways that control TNBC proliferation have been uncovered using genome-wide cell survival screens 31,32 the role of individual cell signaling components that define TNBC motility behavior is less clear.
Here we systematically unraveled the global cell signaling landscape that functionally control TNBC motility behavior through a phenotypic imaging-based RNAi-screen to identify genes involved in the regulation of different migratory phenotypes. We discovered novel genes including several transcriptional modulators, e.g. PRPF4B, BPTF and BUD31, that define TNBC migratory programs and metastasis formation, which are associated with poor clinical outcome of breast cancer and share signaling networks underlying prognostic gene signatures for primary breast cancer.

A systematic high throughput signaling RNAi screen for TNBC cell migration
To unravel the critical signaling components that drive TNBC cell migration, we selected two of the most highly motile TNBC cell lines Hs578T and MDA-MB-231 for microscopy-based RNAi screening using the PhagoKinetic Track (PKT) assay (Fig. 1A, Suppl. Fig. 1, Supplementary movies 1 and 2; and described in detail previously 33 ). Briefly, cells were seeded in fibronectin-coated 96-well plates containing a thin layer of discrete latex beads that are phagocytosed after cell attachment and migration, leaving behind migratory tracks. Imageanalysis of the track characteristics including net area, major and minor axis, axial ratio and roughness allowed quantitative assessment of cell migration behavior. To unravel the signaling landscape that regulates mesenchymal tumor cell migration, we focused our screening effort on the complete set of cell signaling components, covering all kinases, phosphatases, (de)ubiquitinases, transcription factors, G-protein coupled receptors, epigenetic regulators and cell adhesion-related molecules (4198 individual target genes in total, SMARTpool siRNAs (pool of 4 single siRNAs per target)). Quantitative output data was normalized (robust Z-score) to mock transfected control cells. High and low Z-scores of individual parameters already showed the effect of siRNA knockdown on cell motility, i.e. low net area or low axial ratio suggests inhibition of cell migration whereas high axial ratio and high major axis indicated enhanced motility (Fig. 1C/D). Even though the quantification provided eight parameters, all the different migratory phenotypes observed within the data were not fully represented by single parameters. Therefore, migratory phenotypes were manually classified, and primary hits were identified by setting thresholds on Z-score for the most dominant parameters of each phenotype (described in the methods section). Candidate knockdown with decreased cell migration accompanied with an increase in cell size did not always result in a decreased net area, but were in this way selected using the round track phenotype. The migratory phenotypes were visualized by principal component analysis (PCA)-based clustering (Fig. 1E/F). For all phenotypes together, we defined in total 2807 hits in total: 1501 primary hits for Hs578T and 1306 for MDA-MB-231. Cytoskeletal genes, which are known to be important in cell migration and might provide regulatory feedback loops, were not enriched in these primary hit lists (Suppl. Fig. 2). Importantly, there was no correlation between proliferation (number of tracks) and any of the phenotypic parameters, suggesting that hit selection was mainly based on effects on migration (Suppl. Fig. 3). However, we cannot exclude that effects on cell proliferation might be contributing to the inhibition of cell migration for some of the candidates, in particular for those candidate genes that show a decrease in track number. We identified 129 overlapping hits showing similar effects on cell migration upon knockdown in both cell lines, suggesting these were bona-fide cell line-independent drivers of tumor cell migration.
Hence we selected these overlapping hits for validation by single siRNA sequences.
Additionally, to obtain a larger coverage of genes regulating cell migration that would uncover a more cell type specific migratory behavior, we also selected the top 153 hits in each cell line for validation. Only genes that have been defined as druggable were validated, resulting in validation of 451 unique targets (129 overlapping hits, 153 Hs578T unique hits and 153 MDA-MB-231 unique hits) (Suppl. Fig. 1 and Suppl. Table 1).
To validate the primary hits, we repeated the PKT screen assays with both SMARTpool and four single siRNA sequences ( Fig. 2A and Suppl. Fig. 1). In total, 217 (77%) hits were validated in the Hs578T and 160 (57%) in the MDA-MB-231 (significant effects for at least 2 singles and SMARTpool, for Hs578T see Fig. 2B; for MDA-MB-231 see Suppl. Fig. 4; all validated genes are in Suppl. Table 2; reproducibility is shown in Suppl. Fig. 5 and Supp. Fig. 6). Mock (no siRNA addition) and siDNM2 were used as negative and positive control, respectively. The majority of validated hits was found in the phenotypic classes of reduced cell migration (Fig.   2C) and there was an overlap of 65 validated candidate genes that showed inhibition of cell migration in both cell lines (Fig. 2D). This relatively low overlap of candidates cannot be attributed to cell line specific mutations, copy number alterations or differences in expression levels of the candidates (Suppl. Fig. 7). Annotation of protein classes for each set of validated hits (Hs578T, MDA-MB-231, and overlap) showed that most of the hits were transcription factors ( Figure 2E i) also after correction for library size ( Figure 2E ii), suggesting that transcriptional regulated gene networks are critical drivers of TNBC cell migration behavior and consequently metastasis formation.

Transcriptional determinants are critical drivers of BC migratory phenotypes.
To further confirm the migratory phenotype detected in the static PKT assay, we evaluated the effect of all validated hits on cell migration using a live microscopy cell migration assay with GFP-expressing Hs578T and MDA-MB-231 cells (Suppl. Table 3 -transcriptional regulators such as RUNX1, MTF1, PAX7, ZNF141, SOX14, MXD1,   ZNF446, TARDBP, TBX5, BPTF, TCF12, TCERG1, ZDHHC13, BRF1, some of which are directly involved with splicing (BUD31 and PRPF4B) or histone modification (HDAC2 and HDAC10). For cell line specific validated hits, we filtered candidate genes for which the expression was associated with clinical breast cancer metastasis-free survival (MFS) in a patient dataset (the Public-344 cohort, GSE5237 and GSE2034, Suppl. Table 5). Many of the hits associated with poor outcome inhibited cell migration in both Hs578T and MDA-MB-231 ( Fig. 3A and 3B, see Suppl. Table 3 and 4 for all candidate genes), combined with the overlap candidates resulting in 43 genes that were common denominators of cell migration. Single cell migratory trajectories were plotted for genes affecting cell migration in both cell lines (Fig. 3C).
Taken together, our tertiary screen established high confidence in 61% and 71% of our candidate genes (Hs578T and MDA-MB-231 respectively). Furthermore, we confirmed the general role of our candidates in random cancer cell migration by validation of several main candidates by inducible CRISPR-Cas9 knockout in a live cell migration assay (Supp. Fig. 8), siRNA knockdown followed by live cell migration assays in 2 additional TNBC cell lines (Suppl. Fig. 9) and siRNA knockdown followed by a traditional scratch assay (Supp. Fig. 10). All of our tested candidates were also affecting FBS-directed cell migration in MDA-MB-231, but not per se in Hs578T (Suppl. Fig. 10). This suggests that effects on random cell migration cannot always be directly extrapolated to directed cell migration, especially in the Hs578T cell line.

Functional drivers of tumor cell migration partner in networks predictive for BC progression.
To better understand the regulatory networks driving BC cell migration, we used the larger lists of our PKT validated candidate genes (217 for Hs578T and 160 for MDA-MB-231) to inform on protein-protein interaction (PPI) networks that are involved in Hs578T and MDA-MB-231 cell migration. KEGG pathway analysis was performed on the first-order networks of our candidate genes and revealed that similar pathways were affecting cell migration in both cell lines, despite that the networks were constructed from different candidate genes ( Fig. 4A and 4B, Suppl. Table 6). We identified cancer-related pathways such as 'pathways in cancer' and 'focal adhesion' but also immune related pathways such as 'osteoclast differentiation' and 'chemokine signaling'. Interestingly, KEGG pathway analysis-based correction of the annotation of the PKT validated candidates not only confirmed the potent role of 'Transcriptional misregulation in cancer', but also immune-related and splicing pathways in cancer cell migration (Suppl. Fig. 11A). To further investigate the connection of our candidate genes to cell migration and invasion, we correlated our signaling networks of three established gene signatures associated with metastatic behavior and cell migration: the Human Invasion Signature (HIS) 34 Table 6). Given the high degree of overlap between these three gene signature-based networks and lists of candidate genes, we constructed a single zero-order network based on the combination of candidate genes affecting cell migration in Hs578T and MDA-MB-231 (65 genes, Fig. 4C). This revealed a subnetwork linking 8 transcriptional regulators of which most already have been related to cancer progression, including HDAC2, BPTF, BRF1, TAF11, TCF12 and FOS [37][38][39] , but also a prominent role for SMADs that are normally driven by TGF-β 40,41 . However, TGF-β treatment showed limited effects on TNBC cell migration (Suppl. Fig. 12), suggesting that migration is not dependent on TGF-β and SMADs were crucial for TNBC cell migration. Next we systematically investigated the effect of knockdown of the 217 PKT validated hits for morphological changes in the highly polarized Hs578T cell line by actin cytoskeleton staining, confocal imaging and quantitative single cell analysis (Suppl. Table 7). Hierarchical clustering grouped our PKT validated hits in nine different clusters (Fig. 4D, see Suppl. Fig. 13 for all gene names). Both clusters 2 and 9 contained control knockdown samples but also many genes that affected Hs578T cell migration, suggesting that a decrease in migration does not necessarily coincide with an overall change in cell morphology. Not surprisingly, inhibition of cell migration was associated with a wide variety of cellular morphologies: genes in cluster 1 and 3 decreased overall cell area, while genes in cluster 4, 5 and 6 increased overall cell area.
Regarding the parameter cell spikes (reflecting the number of cell protrusions), we observed the same variation: some genes reduced cell protrusion formation (cluster 4, 5 and 8), while others enhanced protrusion formation (cluster 7). In vivo, loss of cell adhesion and increased motility are both prerequisites for metastasis formation. However, in vitro, our candidates could inhibit cell migration via different mechanisms: 1) increased cell-cell adhesions, 2) decreased cell-matrix interactions and 3) decreased actin turnover that can also result in multiple cell phenotypes. For example, down-regulation of cell-matrix interactions can decrease the cell area, while decreased actin turnover might more affect the focal adhesions and spikes. In this way, our combined data suggests that cell shape and cellular motility are affected independently and indicates different genetic programs that define BC cell migration behavior.

Modulators of cell migration are associated with BC metastasis-free survival
To further relate our candidate genes to breast cancer progression and metastasis formation in patients, we compared our genes to three prognostic signatures for breast cancer metastasis [42][43][44] . Despite the minimal overlap of genes, these prognostic gene signatures have many related pathways in common 42 Table 8). These three gene expression signatures are strongly predictive of a short time to metastasis, implying but not yet statistically proven that our candidate genes are part of biologically functional regulatory networks and pathways critical in early onset of breast cancer metastasis.
Moreover, we investigated the percentage of mutations, amplifications and deletions (together % altered) of the 43 candidate genes that affected migration in both cell lines (see Fig. 3A and 3B) in 29 cancer types using publicly available data from The Cancer Genome Atlas (TCGA) (Fig. 5B). We identified clusters of candidate genes highly altered in multiple cancer types, amongst which breast cancer. This alteration rate was not related to tumor type aggressiveness (Supp. Fig. 14). For the main factors including BPTF, BUD31, CACNG1, RUNX1, GRK1, PTPRS, PRPF4B, and PBX1, most of these genetic alterations in breast cancer are dominated by amplifications (Fig. 5C), suggesting that enhanced expression levels of these candidates might be involved in breast cancer initiation or progression. Moreover, 11 of these candidates showed higher amplification rates (not significant) in the aggressive primary tumors that already metastasized at diagnosis (Suppl. Fig. 15) and 14 candidates amongst which BUD31 and PRPF4B demonstrated a significant higher amplification rate in the TNBC compared to the ER positive subtype (Suppl. Fig. 16). Consequently, we also evaluated the association of the gene expression of these 43 candidate genes with MFS in the Public-344 cohort (Fig. 5D, Suppl. Table 9). Interestingly, high expression levels of both splicing factors PRPF4B and BUD31 are associated with earlier metastasis formation in triplenegative and ER-positive tumors respectively (Fig. 5D), but not in the other subtypes (Supp. Fig. 17). Non-core splicing factor PRPF4B is a serine/threonine-protein kinase regulating splicing by phosphorylation of other spliceosomal components 45 . BUD31 is a core splicing factor essential for spliceosome assembly, catalytic activity and associates with multiple spliceosome sub-complexes and has shown to be a MYC target in MYC-driven cancer cells 46 .
We also identified the transcription factor BPTF, known for its role in chromatin remodeling and mammary stem cell renewal and differentiation 47,48 , that is highly amplified in many cancer types and significantly positively correlated to MFS in breast cancer patients irrespective of the subtype (Fig. 5D, Suppl. Fig. 17). We further focused on splicing factors PRPF4B, BUD31 and transcription factor BPTF, since these were newly identified modulators of cell migration associated with BC MFS and/or highly amplified in BC.
Notably, expression levels of other validated screen candidates available in our RNA sequencing dataset were not specifically affected by knockdown of PRPF4B, BUD31 or BPTF, indicating that these genes uniquely modulate transcriptional programs that affect cell migration (Suppl. Fig. 20). Knockdown of BUD31 had the broadest effect on gene expression and caused down-regulation of 1119 genes in Hs578T and 929 in MDA-MB-231, with ~50% affected genes overlapping between the two cell lines (Suppl. Fig. 18B-D). PRPFB4 and BPTF knockdown most significantly affected the transcriptome of the Hs578T cells, primarily resulting in down-regulation of gene expression. There was limited overlap in the DEGs between PRPFB4, BUD31 and BPTF (Suppl. Fig. 18E). Since PRPF4B and BUD31 are both splicing factors, we also investigated the effects of knockdown of these candidates on alternative splicing patterns (Suppl. Fig. 21). As expected, depletion of BUD31, a core spliceosomal protein, increased the intron retention compared to controls (Suppl. Fig. 21A and 7C) 46 . As might be expected from a non-core splicing factor, siPRPF4B only increased a small number of introns retained (Suppl. Fig. 21B and 21C). All tested intron retention events were validated with RT-PCR (Suppl. Fig. 22), indicating that the computational pipeline we used is reliable.
We were unable to detect any changes in exon inclusion or 3' or 5' alterative splice site usage after knockdown of either PRPF4B or BUD31. Since siBUD31-induced intron retention was related to reduced expression of the specific gene of interest, we focused on the differentially expressed genes for further analysis. Although the general relation between intron retention and decreased gene expression was previously confirmed [49][50][51] , future studies have to validate these direct causal relationships in response to PRPF4B and BUD31 knockdown. To identify pathways affected by siPRPFB4, siBUD31 and siBPTF we performed KEGG pathway overrepresentation analysis using the significantly down-regulated genes for all hits in both cell lines separately using ConsensusPathDB 52 . Although the overlap in DEGs between different cell lines and knockdown conditions was rather limited, the ECM-receptor interaction was overrepresented in all conditions (Fig. 6A, Suppl. Fig. 23A). Moreover, knockdown of PRPF4B, BUD31 and BPTF resulted in down-regulation of the focal adhesion pathway in both cell lines, except for BPTF in Hs578T. Inhibition of these pathways likely directly contributes to the observed decrease in cellular migration. We also observed candidate specific responses such as immune signaling for PRPF4B (Fig. 6A), cell adhesion for BPTF and metabolic and PI3K related pathways for BUD31 (Suppl. Fig. 23A). Next to down-regulated ECM-receptor interactions, deregulated TNF signaling was validated for all three knockdowns (Suppl. Fig.   24). Gene set enrichment analysis (GSEA) 53 Fig. 26) demonstrated the involvement of many different pathway components of which some were overlapping between PRPFB4, BUD31 and BPTF ( Fig. 6C and 6D). Many of these components were integrins, for which a prominent role has already been described in various steps in the metastatic cascade such as migration and invasion, anchorage-independent growth and colonization 54 . Moreover, increased integrin adhesions can activate actin contraction 55 , regulate E-cadherin internalization and cell adhesion 56 and are related to EMT and cancer cell dissemination 57 . A similar down-regulation was observed at the protein level for several key components in both cell lines (Fig. 6E, Suppl. Fig. 27C). The effects on differential expression of cell matrix adhesion components such as integrins and focal adhesion kinase was also reflected in the different organization of focal adhesions and the Factin network for both PRPFB4, BUD31 and BPTF ( Fig. 6F and Suppl. Figure 27A-B). In summary, both splicing factors PRPF4B and BUD31 as well as the transcription factor BPTF modulate the expression of various focal adhesion-associated proteins and ECM-interaction signaling components in association with distinct cytoskeletal reorganization and decreased BC cell migration.

PRPF4B is essential for breast cancer metastasis formation in vivo
Finally, we investigated whether we could reproduce our in vitro findings in an in vivo mouse model for BC progression. Using our previously established orthotopic xenograft model, we predicted a decrease in BC metastasis formation upon splicing factor PRPF4B depletion. We selected PRPF4B because its depletion strongly inhibited both random and directed cell migration in both Hs578T and MDA-MB-231 cells (see Fig. 3C and Suppl. Fig. 10) and, moreover, a role of PRPF4B in promoting TNBC metastasis formation has so far not been demonstrated. We established stable PRPF4B knockdown in the metastatic MDA-MB-417.5 cell line that expresses both GFP and luciferase 12,13,35 and contains similar basal PRPF4B levels as its parental MDA-MB-231 cell line (Supp. Fig. 28A Interestingly, PRPF4B was higher expressed in the borders of the primary tumor (Supp. Fig.   30), supporting its potential role in invasion and metastasis formation. Bioluminescence imaging demonstrated that lung metastatic spread was less abundant in the PRPF4B knockdown group compared to control group (Suppl. Fig. 29C). Both bioluminescent imaging of the lungs ex vivo and counting of macro-metastases in the ink injected right lung revealed a significant decrease in metastasis formation in mice engrafted with shPRPF4B cells (Fig. 7F and 7G), which was also confirmed by a decreased lung weight (Suppl. Fig. 29D). Ex vivo bioluminescence imaging of the liver, spleen, heart, kidney, uterus and axillar lymph node also showed a decreased metastatic burden by shPRPF4B cells (Fig. 7E) confirmed by a decreased liver and spleen weight (Suppl. Fig. 29E and 29F). Altogether, this demonstrates that PRPF4B knockdown impairs general metastasis formation without showing organspecificity.

Discussion
TNBC is the most aggressive form of breast cancer with 40% of patients dying from metastatic disease. New insights into TNBC migration is highly needed for the identification of potential new drug targets to modulate TNBC dissemination and possibly revert metastatic disease. In the present study, we applied a multi-parametric, high-content, imaging-based RNAi-screen covering ~4200 cell signaling components to unravel regulatory networks in TNBC cell migration and discovered 133 and 113 migratory modulators in the Hs578T and MDA-MB-231 cell lines, respectively. Splicing factors PRPF4B and BUD31 and transcription factor BPTF were critical for TNBC cell migration, associated with metastasis-free survival and affect genes involved in focal adhesion and ECM-interaction pathways. Moreover, PRPF4B knockdown reduced TNBC metastasis formation in vivo, making it an important target for future drug development.
Since our screening effort focused on a broad set of cell signaling components in two TNBC cell lines (Hs578T and MDA-MB-231), we were able to cover a high number of genes and networks in different migration modes. Indeed, the PKT assay allowed us to quantitatively assess different migration phenotypes, as the track morphology reveals the effect on migration, persistence and membrane activity. PCA analysis of the PKT data showed that three migratory phenotypes (small, small round, and round tracks) grouped closely together, resembling different types of inhibition of cell migration. Enhanced cell migration proved to be difficult to validate, probably due to the increased migratory phenotype in Hs578T and MDA-MB-231 reaching a physiological ceiling. Additional screening with TNBC cell lines with lower migratory potential would provide a platform to discover the spectrum of genes that may act as suppressors of TNBC cell migration and metastasis formation. The third live microscopy validation resulted in a conclusive selection of validated candidate tumor cell migration modulators: 133 and 113 genes in Hs578T and MDA-MB-231, respectively. The majority of these candidate genes displayed inhibition of cell migration and are most interesting for translation to cancer metastasis. Importantly, candidate migratory regulators, including SRPK1 and TRPM7, have previously been shown to impair cell migration and metastasis formation 12,14 , supporting the robustness of our candidate drug target discovery strategy.
Our work provides a comprehensive resource detailing the role of individual signaling genes in cell migration. Previously, a cell migration screen in H1299 (non-small cell lung carcinoma) identified 30 candidate migration modulating genes 12 . Surprisingly, there was little overlap with our validated genes, with the exception of SRPK1. Similarly, little overlap in hits was found with a wound-healing screen in MCF10A cells 58 . These differences are likely due to the coverage and size of the screening libraries with the current screen covering ~4200 genes compared to ~1400 genes in the previously published data. Moreover, TNBC cell migration might be driven by different genetic program than non-small cell lung carcinoma and MCF10A cells. Also, the MCF10A screen focused on collective cell migration in epithelial cells, which is distinct from single cell mesenchymal migration in our two TNBC cell lines. Since ECM is a major component of the local tumor microenvironment, all our migration assays were performed on fibronectin coated plates. This coating significantly increased the migratory behavior of our cells (Supp. Fig. 31), which could also result in different candidates compared to the previous reported MCF10A screen. Moreover, none of the previously identified hostregulators of metastases in mouse 59 were validated in our screen. This might be due to the small overlap (only 7 of 23 regulators were in the library), but also suggests that candidates from this study are more involved in the first steps in the metastatic cascade such as escaping the primary tumor and intravasation rather than later steps such as extravasation, colonization and metastatic outgrowth.
While our two individual TNBC cell lines showed some overlap in genes that define cell migration, clearly also many distinct molecular determinants were defined for each cell line.
This suggest that also the genetic differences between cancer cell lines lead to different molecular networks that can modulate the migratory behavior of tumor cells. For ultimate therapeutic strategies to modulate cancer dissemination the identification of common denominators of migratory and metastatic behavior is essential. Our work contributes to the definition of some of these common players.
The importance of our candidate TNBC cell migration modulators was supported by comparative bioinformatics-based network analysis. Our comparative network analyses demonstrated that the cell migration PPI networks in Hs578T and MDA-MB-231 cells are profoundly similar to PPI networks derived from cell migration (Lung Metastasis Signature) 35 , cell invasion (Human Invasion Signature) 34 , as well as several BC prognostic signatures [42][43][44] .
Moreover, we identified candidates that are highly amplified and/or mutated in many cancer types as well as candidates specifically related to breast cancer metastasis formation. The transcription factor BPTF was one of the major hubs in the interaction network of candidate genes, which is also highly amplified in primary tumors of BC patients and its expression levels are negatively related to MFS. Next generation sequencing revealed that knockdown of PRPF4B, BUD31 and BPTF all resulted in decreased expression of genes involved in focal adhesions and ECM interactions, thus directly driving the observed more rounded and less polarized phenotype in conjunction with decreased cell migration.
Our list of candidate genes that regulate TNBC cell migration expectedly also contributes to cancer metastasis. We selected PRPF4B to assess whether our in vitro screening efforts can In conclusion, in the present study we used imaging-based phenotypic screening to identify candidate metastatic genes for TNBC that have translational relevance. Understanding the gene networks that are controlled by the various candidate genes provides further insights in the biological programs that define BC cell migration behaviour and lead to important novel drug targets to combat cancer metastasis.

Transient siRNA-mediated gene knockdown
Human siRNA libraries were purchased in siGENOME format from Dharmacon (Dharmacon,

Phagokinetic track (PKT) assay
PKT assays were performed as described before 33,72 . For each transfection, duplicate bead plates were generated (technical replicates); transfection of each siRNA library was also performed in duplicate (independent biological replicate). Procedures for transfection, medium refreshment and PKT assay were optimized for laboratory automation by a liquid-handling robot (BioMek FX, Beckman Coulter). were excluded during image analysis. Quantitative output of PhagoTracker was further analyzed using KNIME. Wells with <10 accepted tracks were excluded. Next, data was normalized to mock to obtain a robust Z-score for each treatment and each parameter. After normalization, an average Z-score of the 4 replicates was calculated. Knockdowns with <3 images were removed, as well as knockdowns with <60 accepted tracks for Hs578T and <150 accepted tracks for MDA-MB-231. Phenotypic classes were determined based on Z scores of the track parameters: small (net area Z score < -4), small round (net area <8000, axial ratio < 1.7, net area Z score < -1, axial ratio Z-score < -3), big round (net area > 8000, axial ratio < 1.7, net area Z-score > 1, axial ratio Z-score < -4), long rough (axial ratio > 2.4, major axis > 200, roughness > 5, axial ratio Z-score > 1, major axis Z-score > 1) and long smooth (axial ratio > 2.1, major axis > 180, roughness < 5).

PKT imaging and analysis
All PKT screening data will be available in the Image Data Resource database upon publication and is currently available upon request.

Live single cell migration assay and analysis
Hs578T-GFP and MDA-MB-231-GFP cells were transfected with siRNAs as described above and after 65 h, knockdown cell suspensions were seeded in fibronectin-coated black 96-well glass plates (SensoPlate, Greiner Bio-One, Frickenhausen, Germany). As controls, siRNA targeting the GTPase dynamin 2 (DNM2) was used for reduced cell migration and mock was used as transfection control. For the TGF-β and EGF stimulation experiments, cells were seeded in fibronectin-coated black 96-well glass plates (SensoPlate, Greiner Bio-One, Frickenhausen, Germany). Cells were starved in serum-deprived RPMI for 2 hours after which cells were stimulated with 10 ng/ml TGF-β (Immunotools) or 100 ng/ml EGF (Peprotech).
Imaging was directly started after treatment. Live microscopy was performed on a Nikon Eclipse Ti microscope using a 20x objective (0.75 NA, 1.00 WD), 2x2 binning and 2x2 montageTwo positions per well were selected and GFP images were acquired every 12 min for a total imaging period of 12 h using NIS software (Nikon, Amsterdam, The Netherlands).
Image analysis was performed using CellProfiler (Broad Institute) 73 . For live cell migration, images were segmented using an in-house developed watershed masked clustering algorithm 74 , after which cells were tracked based on overlap between frames. Tracking data was organized and analyzed using in-house developed R-scripts to obtain single cell migration data. Only data originating from cells that were tracked for a minimum of 2 h was used. Two negative control wells with low and high cell densities, comparable to the knockdown populations, were selected for statistical comparison, and knockdowns were required to be statistically significant compared to both controls.

Imaging-based phenotypic screen
Hs578T cells were fixed and permeabilized in 1% formaldehyde and 0.1% Triton X-100 in PBS and blocked in 0.5% bovine serum albumin (BSA, A6003, Sigma Aldrich) in PBS. Cells were stained with Rhodamine Phalloidin (R415, Molecular Probes) and imaged using an Nikon Eclipse TE2000-E inverted confocal microscope (Nikon Instruments, Amsterdam, The Netherlands) using a 20x Plan Apo objective, 408 and 561 nm lasers.2x digital zoom, 2x2 stitching images were captured at 4 positions per well. Nuclei and actin cell body were detected by CellProfiler (Broad Institute) and parameters were measured using the measure object size and shape module in CellProfiler. Images with more than 150 cells were filtered out. Using KNIME, nuclei without a clear cell body were rejected and single cell data was normalized to the median of 2 mock control wells per plate. For the heatmap, all features were mock normalized and clustering was performed on complete linkage and Euclidean distance. DEGs were selected by effect size (log2 fold change bigger than 1 or smaller than -1) and adjusted p-value (smaller than 0.05) and used for over-representation analysis for KEGG pathways using ConsensusPathDB 52 .
For the intron retention analysis, RNA-seq reads were mapped to the current human genome (GRCh38) using Hisat 2 76 . Differential intron retention analysis was carried out in R using DexSeq package 77,78 . In DexSeq the difference of intron inclusion were determined based the counts from the intron and the counts from the two adjacent exons. The sizes of the exons were limited to 100nt immediately adjacent to the intron to reduce artifacts deriving from alternative promoters, alternative splice sites and alternative poly-adenylation sites.
Deferentially retained introns were selected by effect size (relative change in inclusion more than 0.1) and statistical significance of the change (adjusted p-value less than 0.001).
Alternative exon inclusion was analyzed using the rMATS package version 3.0.8 79 .
Differentially spliced exons were selected by effect size (relative change in inclusion more than 0.1) and statistical significance of the change (adjusted p-value less than 0.001).
RNA sequencing data is available in Sequence Read Archive with accession number SRP127785.

Network analysis
Protein annotation of the primary hits was retrieved from QIAGEN's Ingenuity Pathway Analysis (IPA, QIAGEN Redwood City, USA). Protein-protein interaction (PPI) networks were generated separately for all signatures using NetworkAnalyst (www.networkanalyst.ca) 80 .
Candidate genes were used as seed proteins to construct first-order, minimum interaction and zero-order networks based on the InnateDB Interactome. KEGG pathway analysis was performed on the first-order PPI networks. The connection between multiple PPI networks was visualized by a Chord diagram using NetworkAnalyst. imaging was used to follow metastasis formation over time. Mice were sacrificed 50 or 51 days after surgery and metastasis formation of all organs was assessed by bioluminescent imaging followed by weighing the lungs, liver and spleen. The right lung was injected with ink in order to count the number of lung macrometastases. Animals that passed away due to unknown reasons before the end of the study were removed from the analysis. Animals were randomly distributed over the different groups. The experiment was performed without blinding.

Breast cancer patient gene expression profiles
Gene expression data of a cohort of 344 lymph node-negative BC patients (221 estrogen receptor-positive (ER-positive) and 123 estrogen receptor-negative (ER-negative)), who had not received any adjuvant systemic treatment, was used and is available from the Gene Expression Omnibus (accession no. GSE5327 and GSE2034). Clinical characteristics, treatment details and analysis were previously described 42,43,[81][82][83] . Stata (StataCorp) was used to perform Cox proportional hazards regression analysis, with gene expression values as continuous variable and MFS as end point.

Statistical Analysis
Sample sizes were based on previously published similar experiments. When not indicated, all experiments were performed in biological triplicates. Normality of migration measurements and in vivo data was tested using Kolmogorov-Smirnov's test, d'Agostino and Pearson's test and Shapiro-Wilk's test using GraphPad Prism 6.0 (GraphPad Software, San Diego, CA). A data set was considered normal if found as normal by all three tests. Data sets following a normal distribution were compared with Student's t-test (two-tailed, equal variances) or one-way ANOVA (for comparison of more than 2 groups) using GraphPad Prism 6.0. Data sets that did not follow a normal distribution were compared using Mann-Whitney's test or a non-parametric ANOVA (Kruskal-Wallis with Dunn's multiple comparisons post-test) using GraphPad Prism 6.0. Results were considered to be significant if p-value < 0.05.

Study Approval
The study involving human BC patients was approved by the Medical Ethical Committee of the

Data availability
All datasets used to generate the results presented in this study are publicly available. RNA sequencing data is available in Sequence Read Archive with accession number SRP127785.
All PKT screening data will be available in the Image Data Resource database upon publication and is currently available upon request. All other remaining data are available within the article or supplemental data or is available upon request.      Clustering was performed based on Euclidean distance and complete linkage.    Hs578T Net Area Z-score