STAT3 phosphorylation at Ser727 and Tyr705 differentially regulates the EMT–MET switch and cancer metastasis

Epithelial–mesenchymal transition (EMT)/mesenchymal–epithelial transition (MET) processes are proposed to be a driving force of cancer metastasis. By studying metastasis in bone marrow-derived mesenchymal stem cell (BM-MSC)-driven lung cancer models, microarray time-series data analysis by systems biology approaches revealed BM-MSC-induced signaling triggers early dissemination of CD133+/CD83+ cancer stem cells (CSCs) from primary sites shortly after STAT3 activation but promotes proliferation towards secondary sites. The switch from migration to proliferation was regulated by BM-MSC-secreted LIF and activated LIFR/p-ERK/pS727-STAT3 signaling to promote early disseminated cancer cells MET and premetastatic niche formation. Then, tumor-tropic BM-MSCs circulated to primary sites and triggered CD151+/CD38+ cells acquiring EMT-associated CSC properties through IL6R/pY705-STAT3 signaling to promote tumor initiation and were also attracted by and migrated towards the premetastatic niche. In summary, STAT3 phosphorylation at tyrosine 705 and serine 727 differentially regulates the EMT–MET switch within the distinct molecular subtypes of CSCs to complete the metastatic process.


Supplementary Figure 4. Flow chart describing the construction of the MSCstreated and Control intracellular PPI network
The figure shows the flowchart of the proposed approaches. One-way ANOVA was applied to select significant protein into protein pools. PPIs of Homo sapiens data were obtained from BioGRID database, and then we obtained the putative intracellular MSCs and control PPIs by protein pools and BioGRID database. Moreover, the candidate PPI network was constructed by using a dynamic PPI model. By Akaike information criterion (AIC) and Student's t-test, MSCs and control candidate PPI networks were refined by system order selection and by determining significance of the protein interactions. Finally, comparing sub-networks extracted from the construction of the MSCs-treated and Control networks via CSCs markers, we investigate the influence of MSCs-secreted factors on each of CSCs markers.

Supplementary Figure 5. The in vivo bone homing assay
STAT3 mutants were overexpressed in LM/STAT3-knockout and HM20/STAT3knockout cells prior to co-culture with MSCs in suspension. For the bone homing assay, 2 × 10 5 (1 × 10 5 LM and 1 × 10 5 HM20 cells) or 3 × 10 5 (the above cells co-cultured with 1 × 10 5 MSCs) cells were cultured under sphere-forming conditions for 7 days before injection (a volume of exactly 100 µL) into the left cardiac ventricle of 6-weekold mice. Mice were sacrificed on day 5 after intracardiac inoculation. The hind limbs were flushed, and cells were isolated and cultured. SCDCs were counted under a light microscope after crystal violet staining.

Isolation and characterization of human BM-MSCs
Fresh human BM was obtained from healthy volunteers during surgery. This study was approved by the institutional review board (IRB) of National Taiwan University Hospital Hsin-Chu Branch. Density gradient centrifugation was performed to isolate mononuclear cells from human BM. Mononuclear cells were isolated from BM aspirates by density gradient centrifugation (Ficoll 1.077 g/mL) and plated in noncoated 75-to 175-cm 2 polystyrene culture flasks (Corning Costar) at a density of 160,000 cells/cm 2 in complete culture medium: Mesencult (Stem Cell Technologies) supplemented with 10% foetal calf serum (FCS; MSC Stimulatory Supplements, Stem Cell Technologies), 2 mmol/L L-glutamine, and 50 μg/mL gentamicin (Life Technologies). Cultures were maintained at 37°C in a humidified atmosphere containing 5% CO2. After 48 hours of adhesion, non-adherent cells were removed, and the culture medium was replaced twice weekly. BM-MSCs were harvested using trypsin (Sigma-Aldrich) after reaching ≥80% confluence and continuously propagated at 4,000 cells/cm 2 until reaching a senescent phase or passage (P). The senescent phase was defined as a decrease in the proliferative capacity ultimately leading to cell cycle arrest. BM-MSCs in the senescent phase were closely monitored for an additional 8 to 12 weeks before culture was interrupted to search for cells that had escaped senescence and recommenced proliferation. Post-senescence clones were isolated by a limiting dilution method: to obtain single-cell-derived clones, BM-MSCs were seeded at 1 cell/well in a 96-well culture plate (Corning Costar) and cultured as described above.

Human samples and IHC analysis
Sectioned human lung cancer specimens were obtained from GenDiscovery Biotechnology, Inc. All staining procedures were performed using a Super Sensitive IHC Detection Systems kit (BioGenex). Counterstaining was performed with haematoxylin. A semi-quantitative method for calculating positive signals was used.
Signals were counted in six fields per sample under a light microscope at 400× magnification. The results were manually evaluated by two independent observers to determine both the percentage of positive cells and the staining intensity, as previously described [1][2][3][4]. The observers were blinded to the stage of each sample. The

Functional fractionation of cancer cells by invasion assays
Through four serial passages (p4), spheres derived from human lung cancer A549 cells were transferred back to adhesive tissue culture plates, after which they migrated back onto the plates and re-formed a monolayer with morphological heterogeneity [and were then collected as LM cells]. To establish an EMT/metastasis cell model, LM cells were seeded on Matrigel-coated Boyden chamber membranes. After a 24-hour incubation period, the cells that had invaded the Matrigel were collected as high-motility1 (HM1) cells, signifying one passage through the basement membrane matrix. Subsequently, these cells were re-cultured and passed 19 more times through the invasion-selection procedure (Fig 1c). The cells harvested from each subsequent round of selection were designated HM2 to HM20 cells [1].

ChIP
ChIP assays were performed as previously described [1][2][3][4][5]. Cells were fixed with 1% formaldehyde for 20 minutes at room temperature and harvested in ChIP lysis buffer (50 mM Tris (pH 8.0), 85 mM KCl, 1 mM DTT, 1 mM PMSF, 0.5% NP-40). Genomic DNA in the lysate was sonicated using a Bioruptor (Diagenode) for 15 cycles on a high power setting (30 s on, 30 s off) at 4°C. Cell debris was removed by centrifugation at 15,000 rpm for 30 minutes at 4°C. Lysate supernatants were brought up to a volume of G agarose (ThermoFisher, Cat#20399) with rotation for 1 hour at 4°C. Precleared lysates were centrifuged at 14,000 rpm for 10 minutes, and lysate supernatants were transferred to new tubes for subsequent immunoprecipitation. A 50 L volume of lysate was reserved as the whole-cell extract input control. Anti-GATA3 and STAT3 were used for immunoprecipitation. Antibodies were recovered using Protein G agarose (ThermoFisher). Immunoprecipitated DNA and input DNA were analysed by PCR using primers spanning the proximal (nucleotide positions -126/+164) promoter region of PROM1, the (-179/+139) promoter region of ABCG1, the (-276/-95) promoter region of CDH1 or the (-32/+89) promoter region of CDH1. Following 30 cycles of amplification, PCR products were separated on a 1.5% agarose gel and analysed by ethidium bromide staining.

Luciferase reporter assay
The luciferase reporter plasmids were purchased from Addgene (Cat#12456 and Cat#12457). The luciferase reporter plasmids and their respective controls were introduced into cells. Cell lysates were harvested 24 hours later and analysed following the standard protocol of the dual luciferase reporter assay (Promega, Cat#RE1960).
Luciferase activities were determined and are plotted as the fold change with overexpression or knockdown relative to the controls.

Sphere-forming culture
Spheres were generated as previously described [1][2][3][4]. Briefly, cells (1,000 cells/mL) were grown in suspension culture in ultra-low attachment plates (Corning) and serumfree RPMI medium (ATCC) supplemented with B27 (Invitrogen), 20 ng/mL EGF and 10 ng/mL bFGF (BD Biosciences). Spheres with a diameter of >30 μm were then counted. For serial passaging, spheres were harvested and dissociated to single cells with trypsin, and 100 dissociated cells were then re-plated in an ultra-low attachment 96-well plate and cultured for 12 days. Spheres were then re-counted. The individual spheres were found to be derived from single cells [6].

Microarray data collection and analysis
We obtained quantified time-course gene expression profiles for lung cancer cells

Animal experiments
Female CB17 severe combined immunodeficiency (SCID) mice (6 weeks old) were used, and all experimental protocols were approved by the Animal Studies Committee of National Tsing Hua University.
The bone homing assay 2 × 10 5 (1 × 10 5 LM and 1 × 10 5 HM20 cells) or 3 × 10 5 (the above cells co-cultured with 1 × 10 5 BM-MSCs) cells were cultured under sphere-forming conditions for 7 days before injection (in a volume of exactly 100 µL) into the left cardiac ventricle of the 6- week-old mice. Mice were sacrificed on day 5 after intracardiac inoculation. The hind limbs were flushed, and cells were isolated and cultured [7].

An experimental model of bone metastasis by human lung cancer cells
For subcutaneously tumor growth, mice (n = 1) were injected subcutaneously with 1 x

Isolation of SCDCs from long bones
Metastatic cells were isolated from the BM of the lower limbs. Long bones were excised and cleaned of all soft tissues. Marrow cells were released by "flushing," in which 5 to 10 mL of -MEM containing 2 × penicillin/streptomycin was introduced via a 27-gauge needle into the distal epiphysis through the BM compartment. Cell clumps were disaggregated by passing medium containing cells through a 27-gauge needle syringe.
Cells were plated in 10-cm dishes and expanded for 5 days in medium containing 0.6 mg/mL G-418. This procedure was conducted separately for each femur and tibia from 5 mice per group. SCDCs were counted under a light microscope after crystal violet staining [7].

Data source
Collection of MSC conditioned medium human bone marrow-mesenchymal stem cells

Target protein pool determination
In our proposed approach, two types of data are needed for the construction of MSCs and Control PPI networks: (i) the protein-protein interactions (PPI) data of Homo sapiens, and (ii) two kinds of time-series microarray profiles. The PPI data of Home sapiens were obtained from Biological General Repository for Interaction Datasets (BioGRID) database (http://thebiogrid.org/) [9]. For both time-series microarray data, one way of analysis of variance (ANOVA) was applied to select proteins having significant variation with the null hypothesis being that all protein profiles are the same within the eight time-points. For our proposed approach, we rejected the null hypothesis when the p-value, adjusted by Bonferroni, is less than 0.05. Proteins thus selected are included in our protein pools for both sets of microarray data. After the protein pools were determined, two putative PPI networks were constructed by connecting the proteins in our protein pools using all available protein interactions from BioGRID. Student's t-test. Following are descriptions of these proposed approaches. For the k-th target protein in the intracellular PPI network, the dynamic PPI model can be described as the following:

Construction of MSCs-treated and Control PPI network
where represents the protein activity level for k-th target protein at time t, represents the degradation for k-th target protein, denotes the protein activity level for i-th protein interacting with k-th target protein, and denotes the interaction ability between the k-th target protein and i-th protein interacting with k-th target protein. We denote that is the number of protein-protein interaction that interacts with the k-th target protein via BioGRID database. The basal level of k-th target protein is denoted by , and the stochastic noise attributed to model uncertainty and other deviant factors is denoted by t .
In order to determine the interaction ability, degradation, and basal level, it is necessary to exploit the time-series microarray data and PPI information. To identify these parameters in this model, we replace the all protein activity levels with the gene expression profiles. The dynamic PPI model in equation [1] can be rewritten as the following: where represents the regression vector and represents the interaction parameter vector which is necessary to be determined for the target k-th protein. In order to avoid overfitting in the procedure for estimated parameters, we exploited the cubic spline method to interpolate extra time-points within the eight time-points (L, total number of time-points, amounts five times the number of elements in interaction parameter vectors). After the interpolation, the equation [2] for different time points can further be rewritten as follows: We defined that x x t ⋯ x t , ψ ψ t ⋯ ψ t and n n t ⋯ n t . Hence, equation [3] can be rewritten in the following form: [4] For the interaction parameter vector, we assumed that the basal level should be equal or greater than zero for each target protein. For [4], the constrained least-squares minimization algorithm could be exploited to identify the interaction parameter vector [10]. Therefore, the least-squares minimization equation and the assumption can be shown in the following form: where A 0 0 ⋯ 1 . After the interaction parameter vectors for all proteins in the putative PPI networks were identified, the parameter vectors would further be estimated and be used to select significant interactions through Akaike's Information Criterion (AIC) so as to removing the false positive PPIs. AIC method can be shown as the following: AIC( ) = l 2 [6] where is the estimated residual error, represents the number of elements in interaction parameter vectors, and L represents the total number of interpolated time-points samples used to estimate the interaction parameter vectors.
The AIC value is comprised of the estimated residual error and model complexity. The AIC value increases as the estimated residual error increases. In other words, the AIC value decreases as the number of parameters decreases. Therefore, the AIC provides a way to achieve a good tradeoff between estimated residual error and model complexity.
We obtain the number of interaction parameters when the AIC value is minimal for the target k-th protein. After the parameter vector is estimated through the AIC method, the student's t-test is further used to determine whether the interaction parameters are significant. The null hypothesis of the student's test is that each of interaction parameters is zero ( 0), and then we rejected the null hypothesis when the p-value, adjusted by Bonferroni, is equal to or less than 0.05. The flow chart of our method is summarized in Supplementary Fig. 4.