Background & Summary

The epithelial-mesenchymal transition (EMT) equips epithelial cells with migratory, survival, and plasticity properties upon loss of epithelial hallmark characteristics. Together with its reverse process, the mesenchymal-epithelial transition, EMT contributes to cancer metastasis, provides resistance to cell death and chemotherapy, confers stemness properties to cancer cells, and interferes with immunotherapy1,2,3. EMT inhibition and elimination of EMT-undergoing cells are therefore investigated as approaches for cancer therapy4. However, detecting cancer cells undergoing EMT is challenging due to the intrinsic heterogeneity of cancer cells and the phenotypic diversity of EMT programs4.

A hallmark characteristic of epithelial cells is adhesion to neighboring cells and to the basement membrane1. To prevent anchorage-independent growth, epithelial cells normally undergo anoikis upon neighbor or matrix detachment5. During EMT, normal adhesion complexes, e.g., involving E-Cadherin, epithelial cell adhesion molecule (EpCAM), and laminin receptor integrin α6β1 (CD49f/CD29), are dissolved and resistance to anoikis is established6,7. Concomitant cytoskeletal rearrangements break down the epithelial apico-basal orientation and induce a motile front-back polarity, which often includes a replacement of cytokeratins with Vimentin8. EMT can further confer stemness properties to epithelial cells9,10. Numerous signaling pathways can trigger EMT, including TGFβ1, Notch, Hedgehog, WNT, and hypoxia, and activate downstream transcriptional drivers such as Snail family zinc finger transcription factors (TF), Twist family BHLH TFs, zinc finger E-box binding homeobox TFs, and homeobox TF PRRX111. Regulation of EMT occurs by integration of epigenetic, transcriptional, post-transcriptional, and protein stability controls11,12. Together, this shows that the phenotypes of EMT-undergoing cells are shaped by complex molecular circuitries.

EMT is increasingly viewed more as a phenotypic continuum with intermediate states and less as a shift between two discrete states, and the concepts of ‘partial EMT’ and ‘hybrid EMT’ phenotypes have been introduced4,13. A systems biology approach used gene expression profiles of four non-small cell lung cancer cell lines to detect three intermediate states termed ‘pre-EMT’, ‘metastable EMT’, and ‘epigenetically-fixed14’. Transcriptomics of cell lines and clinical samples of cancer was used to rank the resulting spectrum of EMT states, showing that only some were linked to poor survival15. However, identification of EMT-undergoing cells in metastatic cancer tissue is still often based on co-expression of a few epithelial and mesenchymal markers16,17. This can be misleading as several of the ‘mesenchymal’ markers, e.g., Vimentin, can also be expressed by non-malignant epithelial cells18. It remains an ongoing debate which markers and combination of markers are sufficient to distinguish EMT from other processes in vitro and in vivo4,19. In particular, there remains the need for a comprehensive analysis of EMT phenotypes at the protein level.

To address this need, we applied multiplex single-cell mass cytometry20 to four non-cancerous human mammary epithelial cell lines that serve as widely-used models of EMT. EMT was induced in the HMLE and MCF10A cell lines by prolonged exposure to TGFβ19,21 and in the HMLE-Twist-ER (HTER) and HMLE-Snail-ER (HSER) cell lines by treatment with 4-hydroxytamoxifen (4OHT)9. In the HTER and HSER cell lines, 4OHT treatment allows the induction of gene expression by murine Twist1 fused to a modified estrogen receptor (ER) or SNAIL1-ER fusion protein, respectively9. To design our mass cytometry antibody panel, we conducted a flow cytometry surface protein screen in parallel with a transcriptome analysis at multiple time points of induced EMT. We observed alterations in the surface proteome of EMT-undergoing cells over time and detected distinct gene expression profiles of hybrid epithelial-mesenchymal states compared with epithelial and mesenchymal states. From these analyses, we extracted candidate markers for multiplex mass cytometry, which revealed complex phenotypic transitions in all four EMT models and little phenotypic overlap of EMT states between the cell lines. The data presented here can aid in characterizing the complexity and dynamics of EMT in these widely used in vitro models.



A table listing the material used in this study can be found on Mendeley Data (Mendeley Table 1)22.

Cell lines

All human breast cancer cell lines were obtained from the American Type Culture Collection (ATCC) and were grown according to ATCC recommendations. The MCF10A human mammary epithelial cell line was obtained from ATCC (CRL-10317) and cultured in DMEM F12 Ham medium (Sigma Aldrich) supplemented with 10 µg/ml human insulin (Sigma Aldrich), 20 ng/ml epidermal growth factor (EGF, Peprotech), 500 ng/ml hydrocortisone (Sigma Aldrich), 5% horse serum (Gibco), 100 ng/ml cholera toxin (Sigma Aldrich), and PenStrep (Gibco)23. We validated the MCF10A cell line by short tandem repeat (STR) profiling using the ATCC kit (#135-XV). The HMLE, HMLE-Twist-ER (HTER), and HMLE-Snail-ER (HSER) cell lines were a gift from the laboratory of Dr. Robert A. Weinberg at the Massachusetts Institute of Technology and were cultured in a 1:1 mixture of DMEM F12 Ham medium (Sigma Aldrich) supplemented with 10 µg/ml human insulin (Sigma Aldrich), 10 ng/ml EGF (Peprotech), 500 ng/ml hydrocortisone (Sigma Aldrich), and PenStrep (Gibco) with the mammary epithelial growth medium (MEGMTM) BulletKitTM (Lonza)10. For the HTER and HSER cell lines, the growth medium was supplemented with 1 µg/ml Blasticidin S (InvivoGen). All of the cell lines were authenticated upon receipt by comparing them to the originally reported morphological and growth characteristics. They were not tested for mycoplasma. For the HMLE, HTER, and HSER cell lines, growth and morphology as well as protein expression profiles of e.g., cytokeratins, E-Cadherin, Vimentin, CD24, CD44, matched previous reports. None of the cell lines used in this project are among misidentified cell lines listed by the International Cell Line Authentication Committee.

EMT time courses and cell harvesting

EMT was induced in the MCF10A cell line by prolonged stimulation with 5 ng/ml TGFβ1 (Cell Signaling Technology) for eight days24. For this, 0.8 million cells were seeded per 10 cm cell culture dish (Nunc) and incubated at 37 °C and 5% CO2 according to ATCC recommendations. TGFβ1 treatment and vehicle treatment using Dulbecco’s phosphate buffer saline (PBS, Sigma Aldrich) started 24 hours after seeding and was applied daily together with a growth medium exchange.

EMT was induced in the HMLE cell line by prolonged stimulation with 4 ng/ml TGFβ1 (Cell Signaling Technology) for 14 days9. For this, 0.5 million cells were seeded per 10 cm cell culture dish (Nunc) and incubated at 37 °C and 5% CO2. TGFβ1 treatment and vehicle treatment using PBS started 24 hours after seeding and was applied daily. The growth medium was exchanged every other day.

EMT was induced in the HTER and HSER cell lines by prolonged stimulation with 4 ng/ml 4-hydroxytamoxifen (4OHT; Sigma Aldrich) for 14 days9. For this, 0.5 million cells were seeded per 10 cm cell culture dish (Nunc) and incubated at 37 °C and 5% CO2. 4OHT treatment and vehicle treatment using methanol (Thommen Furler) started 24 hours after seeding and was applied daily. The growth medium was exchanged every other day.

To avoid over-confluence and senescence during the time course of HMLEs, HTERs, and HSERs, the cells were split and re-seeded on day four and eight. For this, the cells were washed once with pre-warmed PBS, incubated for 5 min at 37 °C with 4 ml pre-warmed TrypLE 1X Express (Gibco), quenched with pre-warmed growth medium, pelleted at 350 × g for 5 min at room temperature, resuspended in pre-warmed growth medium, and re-seeded using 0.5 million cells per 10 cm cell culture dish.

For harvesting, the cells were washed once with pre-warmed PBS, incubated for 5 min at 37 °C with pre-warmed TrypLE 1X Express (Gibco), fixed for 10 min at room temperature with 1.6% paraformaldehyde (PFA, Electron Microscopy Sciences), scraped off the dish using a cell scraper (Sarstedt AG), and quenched using 4 °C growth medium. The cells were pelleted at 600 × g for 4 min at 4 °C, resuspended in 4 °C PBS at a concentration of about 0.5 million cells per ml and frozen at −80 °C. For mass cytometry analysis, 5-Iodo-2′-deoxyuridine (IdU) at 10 μM was added to the medium 20 min before cell harvesting25.

Mass-tag cellular barcoding

To minimize inter-sample staining variation, we applied mass-tag barcoding to fixed cells26. A barcoding scheme composed of unique combinations of four out of nine barcoding metals was used for this study; metals included palladium (105Pd, 106Pd, 108Pd, 110Pd, Fluidigm) conjugated to bromoacetamidobenzyl-EDTA (Dojindo) as well as indium (113In and 115In, Fluidigm), yttrium, rhodium, and bismuth (89Y, 103Rh, 209Bi, Sigma Aldrich) conjugated to maleimido-mono-amide-DOTA (Macrocyclics). The concentrations were adjusted to 20 nM (209Bi), 100 nM (105–110Pd, 115In, 89Y), 200 nM (113In), or 2 µM (103Rh). Cells were randomly distributed across a 96-well plate and about 0.3 million cells per well were barcoded using a transient partial permeabilization protocol. Cells were washed once with 0.03% saponin in PBS (Sigma Aldrich) prior to incubation in 200 µl barcoding reagent for 30 min at room temperature. Cells were then washed four times with cell staining medium (CSM, PBS with 0.3% saponin, 0.5% bovine serum albumin (BSA, Sigma Aldrich) supplemented with 2 mM EDTA (Stemcell Technologies) and pooled for antibody staining.

Fluorescence cellular barcoding and flow cytometry surface protein screen

To apply the flow cytometry surface protein screen to multiple samples simultaneously, we performed fluorescence barcoding of fixed cells. For this, 18 million cells were washed once with CSM prior to incubation in 3 ml barcoding reagent for 20 min at 4 °C in the dark. As barcoding reagents Alexa Fluor-700-NHS-Ester (AF700, Molecular Probes) and Pacific Orange-NHS-Ester (PO, Molecular Probes) dissolved in dimethyl sulfoxide (DMSO) at 200 µg/ml were used. Single stains or a combination of AF700 and PO were performed in CSM at a final concentration of 0.1 µg/ml or 1 µg/ml and 0.4 µg/ml or 2 µg/ml, respectively. Cells were washed twice with CSM before pooling and staining with E-Cadherin-AF647 (clone 67A4, Biolegend) and EpCAM-FITC (clone 9C4, Biolegend) or CD44-FITC (clone IM7, Biolegend) for 20 min at 4 °C in the dark. Cells were washed once with CSM and filtered through a 40 µm cell strainer. About 0.3 million cells in 37.5 µl CSM were loaded in each well of a 96-well plate of the Human Cell Surface Marker Screening (phycoerythrin [PE]) Kit (Biolegend). Each well contained 12.5 µl of diluted PE-conjugated antibody in CSM. The cells were incubated for 30 min at 4 °C in the dark, according to manufacturer’s instructions. The cells were then washed twice with CSM, fixed with 1.6% PFA in PBS for 10 min at room temperature in the dark and washed twice with CSM again, prior to flow cytometry analysis using the LSRFortessa Cell Analyzer (BD Biosciences).

FACS sorting and RNA sequencing

For live cell FACS sorting, cells were washed once with pre-warmed PBS, incubated for 5 min at 37 °C with 4 ml pre-warmed TrypLE 1X Express (Gibco), pipetted off the cell culture dish, and collected in 4 °C PBS. Cells were pelleted at 350 × g for 5 min at 4 °C, re-suspended in 4 °C PBS with 1% BSA, and stained with E-Cadherin-AF647 (clone 67A4, 5 µg/ 100 µl, Biolegend) and CD44-PE (clone IM7, 1.25 µg/ 100 µl, Biolegend) for 20 min at 4 °C in the dark. Cells were washed once using PBS with 1% BSA and kept on ice until FACS sorting using the FACSAria III (BD Biosciences). For RNA isolation, cells were pelleted at 350 × g for 5 min at 4 °C and lysed in 350 µl RLT buffer of the RNeasy Mini Kit (Qiagen). RNA was isolated according to the manufacturer’s instructions. Briefly, RNA was collected on the RNeasy spin column, washed with 70% ethanol (Merck), and DNA was removed by incubation with DNAse I (Qiagen). RNA was collected in 30–50 µl diethylpyrocarbonate (DEPC, Sigma Aldrich)-containing water and stored at −80 °C. DEPC water was prepared by dissolving 1 ml DEPC in 1 L ddH2O prior to autoclaving. The RNA quality was assessed using a NanoDrop (Thermo Scientific) and Bioanalyzer (Agilent). RNA sequencing was performed using the HiSeq. 2500 System (Illumina) in SR 50 mode (50 base reads) after poly (A) enrichment and stranded library preparation.

Antibodies and antibody labeling

All antibodies and corresponding clone, provider, and metal or fluorescence tag are listed in Mendeley Table 1 and Mendeley Table 17 on Mendeley Data22. Target specificity of the antibodies was confirmed by the provider and in our laboratory. Antibodies were obtained in carrier/ protein-free buffer or were purified using the Magne Protein A or G Beads (Promega) according to manufacturer’s instructions. Metal-labeled antibodies were prepared using the Maxpar X8 Multimetal Labeling Kit (Fluidigm) according to manufacturer’s instructions. After conjugation, the protein concentration was determined using a NanoDrop (Thermo Scientific), and the metal-labeled antibodies were diluted in Antibody stabilizer PBS (Candor Bioscience) to a concentration of 200 or 300 µg/ml for long-term storage at 4 °C. Optimal concentrations for antibodies were determined by titration, and antibodies were managed using the cloud-based platform AirLab as previously described27.

Antibody staining and cell volume quantification for mass cytometry

Antibody staining was performed on pooled samples after mass-tag cellular barcoding. The pooled samples were washed once with CSM. Cells were stained with the EMT antibody panel (Mendeley Table 17 on Mendeley Data22) and incubated for 45 min at 4 °C followed by three washes with CSM. For mass-based cell detection, cells were stained with 500 µM nucleic acid intercalator iridium (191Ir and 193Ir, Fluidigm) in PBS with 1.6% PFA (Electron Microscopy Sciences) for 1 h at room temperature or overnight at 4 °C. Cells were washed once with CSM and once with 0.03% saponin in PBS. For cell volume quantification, cells were stained with 12.5 µg/ml Bis(2,2′-bipyridine)-4′-methyl-4-carboxybipyridine-ruthenium-N-succidimyl ester-bis(hexafluorophos-phate) (96Ru, 98–102Ru, 104Ru, Sigma Aldrich) in 0.1 M sodium hydrogen carbonate (Sigma Aldrich) for 10 min at room temperature as previously described23. Cells were then washed twice with CSM, twice with 0.03% saponin in PBS, and twice with ddH2O. For mass cytometry acquisition, cells were diluted to 0.5 million cells/ml in ddH2O containing 10% EQTM Four Element Calibration Beads (Fluidigm) and filtered through a 40 µm filter cap FACS tube. Samples were placed on ice and introduced into the Helios upgraded CyTOF2 (Fluidigm) using the Super Sampler (Victorian Airship) introduction system; data were collected as .fcs files.

For the mass cytometry experiment including fifteen breast cancer cell lines, cells were stained with the following modifications: Purified Galectin-3 (clone Gal397) was applied at 1 µg/ml for 15 min at 4 °C, the cells were washed with CSM, stained with anti-mouse IgG (polyclonal)-148Nd for 15 min at 4 °C, washed, and then the EMT antibody panel was applied as above, but using Ki-67 (clone B56) in channel 198Pt. We observed a strong background signal in the channel 175Lu (position of Keratin 7) even in Keratin 7-negative cell lines such as MDA-MB-231 and PBMCs and thus drew a gate to exclude this background signal from downstream analyses.

Mass cytometry data preprocessing

Mass cytometry data were concatenated using the.fcs File Concatenation Tool (Cytobank, Inc.), normalized using the MATLAB version of the Normalizer tool28, and debarcoded using the CATALYST R/Bioconductor package29. The.fcs files were uploaded to the Cytobank server (Cytobank, Inc.) for manual gating on populations of interest. The resulting population was exported as.fcs files and loaded into R v4.1.0 (R Development Core Team, 2015) for downstream analysis.

Flow cytometry surface marker screen data processing

Flow cytometry data were compensated on the LSRFortessa Cell Analyzer (BD Biosciences) using single-stained samples. The.fcs files were uploaded to the Cytobank server (Cytobank, Inc.) for manual debarcoding and gating on populations of interest. The mean signal intensity per well and population of interest was exported as an excel sheet. The mean signal intensity of the ‘Blank’ wells of the screen and the signal intensity of the respective ‘Isotype control’ well were subtracted. From the resulting intensity values, log2-transformed fold changes were calculated.

Dimensionality reduction analyses

For dimensionality reduction visualizations using the UMAP algorithm30, signal intensities (dual counts) per channel were arcsinh-transformed with a cofactor of 5 (counts_transf = asinh(x/5)) and z scores were calculated. We used the R UMAP implementation package uwot ( and 1,000 cells per condition and replicate. All markers except IdU, Cyclin B1, Ki-67, and cleaved CASP3/PARP1 were used.

Clustering analyses and heatmap

For PhenoGraph31 clustering of mass cytometry data, the R RPhenograph package ( was used. PhenoGraph clustering was performed per cell line, using 1,000 cells per condition and replicate and k = 30. All markers except IdU, Cyclin B1, Ki-67, and cleaved CASP3/PARP1 were used. For the heatmap we performed hierarchical clustering on the z scores of the shown markers, using Euclidean distance and ward.D linkage. The z scores were calculated on the arcsinh-transformed data per marker.

RNA sequencing data analysis

The RNA sequencing data was processed using an analysis setup derived from the ARMOR workflow32. Quality control of the raw FASTQ files was performed using FastQC v0.11.8 (Andrews S, Babraham Bioinformatics, Transcript abundances were estimated using Salmon v1.2.033, using a transcriptome index based on Gencode release 3434, including the full genome as decoy sequences35 and setting the k-mer length to 23. For comparison, the reads were also aligned to the genome (GRCh38.p13) using STAR v2.7.3a36. Transcript abundances from Salmon were imported into R v4.0.2 and aggregated on the gene level using the tximeta Bioconductor package, v1.6.237. The quasi-likelihood framework of edgeR, v3.30.038,39 was used to perform differential gene expression analysis, accounting for differences in the average length of expressed transcripts between samples40. In each comparison, edgeR was used to test the null hypothesis that the true absolute log2-fold change between the compared groups was less than 1. edgeR was also used to perform exploratory analysis and generate a low-dimensional representation of the samples using multidimensional scaling (MDS). The analysis scripts were run via Snakemake41, and all the code is available on GitHub (

Data Records

A detailed list of all materials used in this study can be found as Mendeley Table 1 on Mendeley Data22 ( RNA sequencing data have been deposited in the ArrayExpress database at EMBL-EBI with accession number E-MTAB-936542. Tables showing the results of the differential gene expression analyses and a table reporting the RNA quality and RNA sequencing mapping metrics have been deposited as Mendeley Tables 2–13 on Mendeley Data22. The code used for RNA sequencing data analysis can be found on GitHub ( Flow cytometry surface protein screen data as.fcs files and the corresponding data analyses referenced in the text as Mendeley Tables 14–16 have been deposited on Mendeley Data22. Furthermore, the Biolegend data sheet corresponding to the flow cytometry screen has been deposited22. Mass cytometry.fcs files of cells after debarcoding (‘DebarcodedCellsGate’) and of live cells (‘LiveCellsGate’) have been deposited on Mendeley Data22 together with a table containing.fcs file annotations (‘FCS_File_Information’) and a table corresponding to the antibody panel used (Mendeley Table 17).

Technical Validation

Optimizing the time courses for in vitro induction of EMT

We induced EMT in four non-cancerous human mammary epithelial cell lines by prolonged ectopic stimulation with TGFβ1 or 4OHT over several days (Fig. 1a; Methods); all four systems are widely used models of EMT9,16,21. We initially carried out a basic characterization of these models and optimized each induction time course to yield the maximum percentage of cells with mesenchymal (M) phenotype, characterized by loss of E-Cadherin and concomitant gain of expression of Vimentin4. We excluded apoptotic cells from the analysis (Fig. 1b).

Fig. 1
figure 1

Induction of EMT in human mammary epithelial cell lines. (a) Experimental workflow. (b) Gating to select live cells. (c) E-Cadherin and Vimentin expression in HMLEs. Gating to select populations with E1-, E2-, EM-, or M-phenotype. (d) Percentages of HMLEs per gate and time point as in (c). (e) Phase contrast images of HMLEs. (f) E-Cadherin and Vimentin expression in MCF10As. (g) Percentage of MCF10As cells per gate and time point as in (f). (h) Phase contrast images of MCF10As. (i) E-Cadherin and Vimentin expression in HTERs. (j) Percentage of HTERs per gate and time point as in (i). (k) Phase contrast images of HTERs. (l) E-Cadherin and Vimentin expression in HSERs. (m) Percentage of HSERs per gate and time point as in (l). (n) Phase contrast images of HSERs. (o) E-Cadherin and Vimentin expression in HMLEs. (p) Percentage of HMLEs per gate and time point as in (o). (q) Phase contrast images of HMLEs. Scale bar = 10 µm. E1 = epithelial 1, E2 = epithelial 2, EM = hybrid epithelial-mesenchymal, M = mesenchymal.

On day 12 of prolonged exposure to TGFβ1, the HMLE cell line yielded 25% of cells with an M-phenotype, 33% of cells with a hybrid epithelial-mesenchymal (EM) phenotype with increased Vimentin expression but no downregulation of E-Cadherin, 28% of an E-CadherinhighVimentinlow phenotype (E1), and 14% of an E-CadherinlowVimentinlow phenotype (E2) (Fig. 1c,d). In comparison, on day twelve, 2% of control HMLEs exhibited an M-phenotype, 5% an EM-phenotype, 84% an E1-phenotype, and 9% an E2-phenotype (Fig. 1c,d). Control HMLEs with EM- or E2-phenotype were most abundant during sparse growth conditions, such as after splitting (Fig. 1d, Methods), indicating a regulation of E-Cadherin and Vimentin levels by growth density16,43. As previously reported, treatment with TGFβ1 induced spindle-like morphological changes44 and resulted in lower cell density compared with control45 (Fig. 1e).

In the MCF10A cell line, induction of EMT by TGFβ1 treatment occurred in a different time frame. The percentage of cells with an M-phenotype increased from 54% on day two to 70% on day eight, the percentage of EM cells (28%) and E1 cells (2%) remained stable across the time course, and the percentage of E2 cells dropped from 10% to 2% (Fig. 1f,g). In control, cells with M-phenotype were at 26% on day 2 and 10% on day 8, cells with EM phenotype more than doubled from 25% to 64%, the percentage of E1 cells stayed stable at 22%, and the E2 cells decreased from 29% to 1% over the time course (Fig. 1f,g). As reported, TGFβ1-treated MCF10A cells acquired spindle-like morphologies while control cells retained their cobblestone shape (Fig. 1h)16. Together, these data show that under sparse growth conditions on day 2, MCF10A cells exhibit mesenchymal-like phenotypes even without TGFβ1 treatment, reflecting the basal-like character of the cell line16. An increase in cell density over time is accompanied by upregulation of E-Cadherin and therefore loss of the M-phenotype in control, while stimulation with TGFβ1 inhibits an E-Cadherin upregulation and induces an upregulation of Vimentin. In TGFβ1-treated cells, a decrease in the percentage of cells with M-phenotype on day eight compared with day six, suggests that cell density may inhibit further EMT46.

In the HTER and HSER cell lines, EMT was induced by prolonged treatment with 4OHT (Methods). We detected the highest percentage (14%) of 4OHT-treated HTER cells with M-phenotype on day ten, at which point 26% of cells exhibited an EM-phenotype (Fig. 1i,j,k). The percentage of 4OHT-treated HSER cells with M-phenotype peaked at 12% on day eight and 28% of cells exhibited an EM-phenotype at this time point (Fig. 1l,m). For both cell lines, treatment with 4OHT induced spindle-like morphologies and was accompanied by reduced cell density compared with control (Fig. 1k,n), as previously reported9. We then assessed possible effects of the 4OHT treatment on HMLEs in the absence of the Twist1-ER or SNAIL1-ER fusion proteins. As expected, treatment with 4OHT did not induce EMT or morphological changes in HMLEs (Fig. 1o,p,q). In treated and control, the percentage of cells with M-phenotype was below 1% and cells with EM-phenotype at 11% at all time points, indicating a basal-like character of the cell line9. The majority of treated and control HMLEs maintained an E1-phenotype throughout the time course (Fig. 1o).

In conclusion, we could induce EMT in four in vitro human cell line models of this process. We observed phenotypic variability, including both full and partial EMT phenotypes, in response to 1–2 weeks of prolonged stimulation with TGFβ1 or 4OHT. Each model followed a unique EMT timeline and showed varying extents of transition to the mesenchymal phenotype.

Transcriptomic profiling of cells undergoing EMT

We next used RNA sequencing to identify markers that distinguish EMT-undergoing cells from control and markers that distinguish cells with EM-phenotype from cells with E- or M-phenotype. From the resulting markers, candidates were selected to inform a mass cytometry antibody panel design. For RNA sequencing, EMT-undergoing HTER cells on day eight and day twelve were sorted by fluorescence-activated cell sorting (FACS) into three populations: E-CadherinhighCD44low (E1-phenotype), E-CadherinintCD44int (EM-phenotype), and E-CadherinlowCD44high (M-phenotype) (Fig. 2a, Methods). CD44 served as a surrogate M-phenotype marker for intracellular Vimentin to avoid cell permeabilization and RNA loss9. As control, day-matched untreated HTER cells with E1-phenotype were used (Fig. 2a). As a second type of control to monitor possible effects of 4OHT independent of EMT, we included 4OHT-treated and untreated HMLE cells. We included two to four pairs of independent biological replicates per condition and collected high quality RNA for all samples (Mendeley Table 2, Methods).

Fig. 2
figure 2

Transcriptomic profiling of EMT-undergoing mammary epithelial cells. (a) Gating to select populations of interest of HTERs for RNA sequencing. (b) Number of RNA sequencing reads assigned to genes per sample. (c) Average base quality (upper panel) and GC content (lower panel) for all samples. (d) Multidimensional scaling plot showing the first two dimensions. (eg) Volcano plots showing the indicated differential gene expression analyses. Highlighted in red are genes with an adjusted p-value below 0.05. logFC = log2 fold change, E = epithelial, EM = hybrid epithelial-mesenchymal, M = mesenchymal.

RNA sequencing yielded above 20 million reads per sample assigned to genes, except one sample with 19 million reads (Fig. 2b, Mendeley Table 2). Mean Phred scores ranged between 35 and 36, indicating high base call accuracy, and GC content distribution across samples did not indicate any noticeable contamination (Fig. 2c, Mendeley Table 2). For all samples, more than 82% of the reads could be uniquely aligned to the human reference genome using STAR36. Mapping to the transcriptome index using Salmon33 showed that more than 86% of fragments were assigned to a transcript, with little variation across samples.

We next assessed the similarity of samples based on global gene expression levels using multidimensional scaling38,39 (Methods). This showed that the respective pairs of biological replicates were similar (Fig. 2d). Control HTER cells were similar to day-matched 4OHT-treated and control HMLE cells, indicating few effects of 4OHT on transcription independent of EMT. This analysis further revealed that 4OHT-treated HTER cells with E-, EM-, and M-phenotype were all separate from their respective day-matched control (Fig. 2d). Differential gene expression analysis showed that more genes were significantly differentially expressed between HTER cells with M-phenotype or EM-phenotype and control than between E-phenotype and control on day eight (Fig. 2e, Mendeley Tables 3–5). Among differentially expressed genes between M-phenotype and control, we found upregulation of canonical markers of EMT, such as the transcription factors ZEB1, ZEB2, FOXC2, and PRRX1, as well as downregulation of typical epithelial markers such as EPCAM1 (Mendeley Table 3). We then asked, which genes were significantly differentially expressed between HTER cells with EM-phenotype and cells with E- or M-phenotype on day eight and found three genes (HHIP, FBN1, HHIP-AS1) and one gene (KIAA1755), respectively (Fig. 2f, Mendeley Tables 6 and 7). When comparing HTER cells on day twelve, more genes were significantly differentially expressed between cells with M-phenotype and control than between E-phenotype and control (Fig. 2g, Mendeley Tables 8 and 9).

In conclusion, 4OHT-treated HTER cells with M-phenotype or EM-phenotype deviated transcriptionally more from control than cells with E-phenotype. Also, 4OHT-treated cells with E-phenotype are transcriptionally distinct from control cells with E-phenotype.

Surface protein expression screen during EMT

We then carried out a flow cytometry surface protein screen to identify further markers that distinguish EMT-undergoing cells from control and to design a mass cytometry antibody panel. Treated and control samples of the HTER, HMLE, and MCF10A cell lines were fixed at multiple time points, fluorescently barcoded, and co-stained with a combination of surface epithelial markers, E-Cadherin and/or EpCAM, and a surface mesenchymal marker, CD44. The resulting flow cytometry data were compensated, debarcoded and gated for cell populations of interest (Fig. 3a–c, Methods). We detected expected surface protein abundance differences between cell populations, such as elevated levels of CD51 in EMT-undergoing cells compared with control47, confirming the quality of the screening results (Fig. 3d). We identified multiple surface proteins that were more than two-fold differentially expressed between treated (TGFβ1-treated or 4OHT-treated) and control samples (Tables 13, Mendeley Tables 14–16). Several of these were regulated in all three cell lines (CD51, CD83, CD266) or in two cell lines (e.g., CD90, CD146, CD166, EGFR, N-Cadherin, Notch 3, and Podoplanin) and most were regulated in the same direction (up or down) relative to control (Fig. 3e). Based on these flow cytometry screen results and the RNA sequencing analysis, we assembled a panel of candidate targets to assess phenotypic heterogeneity during EMT using a multiplex mass cytometry workflow (Fig. 3f, Mendeley Table 17).

Fig. 3
figure 3

Flow cytometry surface protein profiling of EMT-undergoing mammary epithelial cells. (a-c) Control and treated (TGFβ1-treated or 4OHT-treated) cells from the indicated cell lines used for the flow cytometry screen. (d) Histogram overlays comparing CD51 levels between treated and control cells. (e) Proteins that were more than two-fold regulated between treated cells and control in more than one cell line. The arrow direction indicates whether proteins were up- or down-regulated relative to control, and color indicates the corresponding cell line. (f) Antibody panel used for mass cytometry analysis.

Table 1 Flow cytometry screen results for HMLE cells showing log2 fold changes selected for at least two-fold differences.
Table 2 Flow cytometry screen results for MCF10A cells showing log2 fold changes selected for at least two-fold differences.
Table 3 Flow cytometry screen results for HTER cells showing log2 fold changes selected for at least two-fold differences.

Mass cytometric profiling of EMT phenotypes

Mass cytometry is uniquely suited to assess phenotypic heterogeneity during EMT due to its ability to measure about 40 targets on the single-cell level20,48. To ensure high data quality, all antibodies against the candidate targets were titrated using samples that represent epithelial phenotypes (HMLE and MCF10A control cells), mesenchymal phenotypes (fibroblasts, TGFβ1-treated HMLE and MCF10A cells), and non-epithelial, non-mesenchymal phenotypes (peripheral blood mononuclear cells) (Fig. 4a). We then selected EMT-undergoing and control samples at four to six time points for each of the HMLE, HTER, HSER, and MCF10A cell lines, totaling 92 samples (Table 4). The single-cell suspensions were fixed and mass-tag barcoded26 to allow pooling and simultaneous antibody staining of the samples (Methods). We used antibodies against cleaved CASPASE-3 (cl. CASP3) and cleaved poly(ADP-ribose)-polymerase 1 (cl. PARP1) to exclude apoptotic cells, yielding more than 1 million live cells for downstream analysis (Fig. 4b). Comparing three biological replicates of the MCF10A or the HMLE cell lines using the dimensionality reduction algorithm Uniform Manifold Approximation and Projection (UMAP)30 showed a strong similarity of the triplicates for each cell line (Fig. 4c). For MCF10A, the UMAP showed good discrimination of treated and control samples, including differences in E-Cadherin and Vimentin levels (Fig. 4d; Methods). The separation of day 2 control MCF10A cells from other control MCF10A cells is likely caused by the very low expression of epithelial markers and strong expression of the proliferation marker Ki-67 in day 2 cells in contrast to later time points and may reflect growth at lower confluence. Sparse growth conditions have previously been associated with more basal/mesenchymal-like phenotypes in the MCF10A cell line16. In the HMLE cell line, TGFβ1-treated and control samples were less separable (Fig. 4e). In the HTER and HSER cell lines, we observed a separation of 4OHT-treated cells with E-CadherinlowVimentinhigh phenotype from their respective control on the UMAP (Fig. 4f). In contrast and as expected, 4OHT-treated HMLE cells were indistinguishable from control and displayed only low levels of Vimentin, indicating the absence of an EMT (Fig. 4g). Together, our multiplex mass cytometry data shows that EMT is associated with strong phenotypic changes in all four cell lines.

Fig. 4
figure 4

Multiplex mass cytometry profiling of EMT phenotypes. (a) Histogram overlays showing the antibody panel performance. (b) Gating to select live cells. (c) UMAPs showing TGFβ1-treated and control MCF10A and HMLE cells colored by biological replicates. (dg) UMAPs showing the indicated cell lines colored either by day and treatment (left) or by marker expression using z score levels (right). For all UMAPs, 1,000 cells per condition and replicate were used. (h) Heatmap showing marker expression (columns) using z score levels for each cluster or cell line (rows). PhenoGraph clustering was applied per EMT cell line. Z scores were calculated on arcsinh-transformed data and per marker. Hierarchical clustering was performed using Euclidean distance and ward.D2 linkage. The blue rectangle highlights that cluster HSER_2 is phenotypically similar to Hs578T cells and fibroblasts. The bar plot on the right shows the frequency per cluster of cells of the respective day and treatment.

Table 4 Types of samples used for mass cytometry analysis in Fig. 4a–g.

We next wanted to assess the phenotypic diversity of EMT-undergoing cells in more detail and in the context of other cell types and cell lines, specifically fibroblasts (i.e., mesenchymal cells), fifteen breast cancer cell lines spanning luminal epithelial and basal/mesenchymal-like epithelial phenotypes, and peripheral blood mononuclear cells (PBMC; i.e., neither epithelial nor mesenchymal cells). For this, we repeated the mass cytometry analysis for 35 markers and using a subset of time points for the HMLE, MCF10A, HTER, and HSER cell lines. We included four luminal (MCF-7, T47D, ZR-75-1, MDA-MD-134 VI), one HER2-positive (SKBR-3), eight basal Vimentin-positive (MDA-MB-436, MDA-MB-231, HCC38, HCC1395, BT459, CAL-51, HDQ-P1, and Hs578T), and two basal Vimentin-negative breast cancer cell lines (DU4475, HCC1806), fibroblasts, and PBMCs (Table 5). We then applied the algorithm PhenoGraph31 to each EMT model individually, which grouped treated and control cells into nine to eleven phenotypically diverse clusters per cell line based on expression of all 35 markers (Fig. 4h, Methods). The majority of clusters contained mostly treated or untreated cells, indicating a treatment-based separation, while other clusters contained cells of both conditions. We observed this separation for all eleven clusters for the MCF10A cell line, six of eleven clusters for HMLE, eight of eleven clusters for HTER, and two of nine clusters for HSER (Fig. 4h). In MCF10A cells, we observed upregulation of CD44, Podoplanin, CD146, and CD51 upon EMT induction compared with control, and concomitant downregulation of E-Cadherin and K5. In the HMLE, HTER, and HSER cell lines, Vimentin, CD44, CD90, CD51, and CD10 were upregulated in EMT-undergoing cells compared with control (Fig. 4h). Several clusters containing EMT-undergoing cells were phenotypically similar to different basal breast cancer cell lines, as determined by hierarchical clustering (Fig. 4h, Methods). For example, cluster HSER_2 contained 4OHT-treated cells from days 5 and 10 and shared high levels of CD44, CD90, and CD146 and low levels of E-Cadherin, EpCAM, and cytokeratins with the basal Hs578T cell line and with fibroblasts (Fig. 4h, blue rectangle). In another example, the clusters MCF10A_1–6 contained TGFβ1-treated cells from days 4 and 8 or day 2 control cells and shared high levels of Vimentin, CD44, N-Cadherin, and Galectin-3 and low levels of EpCAM and E-Cadherin with the MDA-MB-231, BT549, HCC1395, and MDA-MB-436 basal cell lines. Low levels of epithelial markers in day 2 control cells likely reflects growth at low confluence16. In contrast, all luminal breast cancer cell lines clustered separately from the EMT transition phenotype clusters (Fig. 4h).

Table 5 Types of samples used for mass cytometry analysis in Fig. 4h.

In conclusion, we assembled an antibody panel for multiplex mass cytometry characterization of EMT and discovered a vast phenotypic diversity of EMT states among four widely used human in vitro models of this process. Several of these EMT states displayed phenotypic similarities with basal breast cancer cell lines and fibroblasts, suggesting that EMT in normal mammary epithelial cells can induce phenotypes observed among aggressive breast cancer cell lines.

Usage Notes

We provide here a comprehensive characterization of EMT transition phenotypes in four human mammary epithelial cell lines. We characterize transcriptomes and multidimensional protein-level single-cell phenotypes of these cell lines during EMT. We place these transition phenotypes in the context of the multidimensional phenotypes of fifteen luminal or basal breast cancer cell lines, fibroblasts, and PBMCs. It has previously been shown that EMT in the here used models is associated with increased mammosphere formation9, or induction of invasion and migration49. A detailed functional assessment of the different molecular phenotypes of EMT-undergoing cells presented here is not part of this study and may be of interest.