On the correlation between material-induced cell shape and phenotypical response of human mesenchymal stem cells

Learning rules by which cell shape impacts cell function would enable control of cell physiology and fate in medical applications, particularly, on the interface of cells and material of the implants. We defined the phenotypic response of human bone marrow-derived mesenchymal stem cells (hMSCs) to 2176 randomly generated surface topographies by probing basic functions such as migration, proliferation, protein synthesis, apoptosis, and differentiation using quantitative image analysis. Clustering the surfaces into 28 archetypical cell shapes, we found a very strict correlation between cell shape and physiological response and selected seven cell shapes to describe the molecular mechanism leading to phenotypic diversity. Transcriptomics analysis revealed a tight link between cell shape, molecular signatures, and phenotype. For instance, proliferation is strongly reduced in cells with limited spreading, resulting in down-regulation of genes involved in the G2/M cycle and subsequent quiescence, whereas cells with large filopodia are related to activation of early response genes and inhibition of the osteogenic process. In this paper we were aiming to identify a universal set of genes that regulate the material-induced phenotypical response of human mesenchymal stem cells. This will allow designing implants that can actively regulate cellular, molecular signalling through cell shape. Here we are proposing an approach to tackle this question.

An important source of cell shape signaling is the actin cytoskeleton which changes upon shape change, leading to changes in transcriptional activity of transcription factors such as MRTF, SRF, YAP and EGR [16][17][18] . However, nature has more mechanosensitive proteins in store. Stretch-activated ion channels open upon stretching of the plasma membrane resulting in the influx of ions which trigger several signalling cascades 19 , for example Zhang et al. previously provided evidence that potassium ion channel can be a key player to control MSC proliferation 20 . Many proteins contain mechanosensitive protein domains, e.g. the focal adhesion protein talin 21 , and some proteins contain a so-called BAR domain which is able to sense the curvature of a membrane and transduce that into a signal 22 . The generation of cell signalling events under the control of shape can be clustered under the term mechanotransduction 23 , and an open question in this field is how many different mechanobiological signals can be picked up by how many molecular circuits. The analogy is that of the canonical signalling pathways, where a dozen well-described families of diffusible molecules (hormones, metabolites, etc.) bind to and activate their cognate families of receptors and initiate signalling cascades.
We aim to identify how many mechanobiological pathways exist and how they are activated by cell shape parameters. How specific are the mechanobiological pathways and can we correlate shape features to gene expression and changes in phenotype? To answer these questions, we 24 and others 25 have used surface topography to control cell shape and direct cell function, because shape mediated control can be applied to functionalize the surface of medical devices. In this study, we have generated a library of topography-induced cell shapes and were able to correlate shape induced cell signalling to several different phenotypes, including molecular and morphological. We used bone marrow-derived mesenchymal stem cells as a conventional cell model system. These cells are multipotent, their progeny has very distinct differences in cell shape and there is ample evidence for shape-directed differentiation.

Results
Cell shape diversity can be captured by feature-based clustering of a library of cell shapes. To systematically explore the relation between cell shape and phenotypic response, we exposed human bone marrow-derived mesenchymal stem cells (MSC) to a library of 2176 unique topographies (TopoChip) and observed a plethora of different cell shapes 26 which were distinct from the MSCs cultured on the flat substrate ( Fig. 1a,b). For example, we observed cells with an elongated narrow cell body without any protrusions which resemble the shape of tenocytes (Fig. 1b, bottom-right cell) or had the cuboidal shape typical for bone lining cells; still, others appeared neuronal. Thus, topographies can induce cell shapes, some of which reminisce different mesenchymal cells in vivo.
To capture shape diversity, we generated a morphological fingerprint for each cell on each topography using 11 shape descriptors including Area, Perimeter, Eccentricity and Form Factor 27 (Supplementary Tables 1, 2). We pruned the cell database to remove outliers and imaging artefacts and focused on topographies that consistently induce the same cell shape (Supplementary Materials and Methods). The resulting selection of 851 cell shapes was plotted by Principal Components Analysis (PCA, Fig. 1c), which reduces data dimensionality and allows identifying cells with similar shapes. We observed a continuum of cell shapes with some densely and sparsely occupied areas, demonstrating that some shape features are more abundant in our shape collection than others. This was confirmed by hierarchical clustering, where the tree structure displays four distinct branches, which are further divided until it reaches single cell shapes (Fig. 1d). For further phenotypic experiments, we sampled the medoid surfaces from 28 clusters, which together span the whole range of shape feature range, and reproduced them on 12 mm polystyrene disks. Visual inspection of the MSCs again shows a large diversity of cell shape, which is further highlighted by their cell shape fingerprints (Fig. 1e). The 28 surfaces are thus representing cell shape diversity induced by the TopoChip library ( Fig. 1f) and further permits well plate compatible biological assays.
Shape features of mesenchymal stem cells correlate to their phenotypic response. Cell shape is related to cell function [28][29][30] , so we probed the relation between cell shape features and a series of basic cell phenotypes, i.e. proliferation, apoptosis, protein biosynthesis, migration and differentiation, in MSCs exposed to the 28 groups of topographically-patterned surfaces and to flat control surfaces. We found that topographies induced profound differences in all phenotypic assays, demonstrating that basic cell biological processes are influenced by surface topographical cues. For instance, we observed a big difference in the rate of proliferation, ranging from 12 to 59% of EdU-positive cells, between the low and high scoring surfaces (Fig. 2a). Differentiation was also affected, with, e.g. a six-fold difference in lipid production under adipogenic conditions between the lowest and the highest performing surface (Fig. 2b). None of the surfaces exclusively affected one biological property, yet a comparison of the magnitude of the phenotypic response showed that each surface induces a distinct phenotypic fingerprint (Fig. 2c, Supplementary Fig. 1). This is further illustrated in Fig. 2d, in which we clustered the surfaces based on phenotypic response. Flat surfaces formed a distinct cluster, demonstrating that cells on topographies share part of their phenotypic response which is different from the behaviour of cells on a flat surface (Fig. 2d). A cluster was formed by surfaces 2063 and 1130, located between the flat surface and the other topographies. Interestingly, the topographies on these surfaces are composed of sparsely located pillars, which resemble the flat surface, inducing relatively similar cell shape and similar phenotypic scores.
We assume that the phenotypic fingerprint is the result of the specific combination of multiple cellular signals induced by the topographies, some of which are directly related to cell shape, but others may not. For instance, it is conceivable that the direct effect of topography on nuclear shape influences gene expression and thus phenotype 31 , whereas we did not include nuclear shape features in our analysis. To assess the hypothesis that phenotypic responses are directly related to cell shape, we constructed computational models using the Lasso algorithm in which cell shape features were correlated to each specific phenotype. The most accurate computational model was able to predict alkaline phosphatase (ALP) expression, a marker for osteogenic differentiation, with high accuracy (goodness of fit of 0.82, Fig. 2e). The shape parameters Form-Factor and Minor Axis length positively correlate with ALP expression, whereas Extent showed a negative correlation (Fig. 2f). Thus, a typical ALP positive cell has an elongated, irregular shape with many protrusions, which is consistent with previous work 26 (Fig. 3a).
The second most accurate computational model predicted protein biosynthesis with goodness of fit 0.68, in which Area shows a positive correlation, whereas Euler Number (the number of the "holes" in the object, which can appear when the cell is forming a protrusion around a pillar), negatively correlates to protein synthesis (Supplementary Fig. 2). We were also able to produce predictive models for cell proliferation (goodness of fit 0.63) and adipogenesis (0.37) but were unable to construct a model to predict apoptosis and migration (Fig. 2g). This suggests that parameters other than explored shape descriptors are responsible for differences in those readouts.
To summarise, surfaces selected based on the diverse cell shapes also induced diverse cell response in functional assays. Investigated cell shape descriptors could partially model the phenotypic variance. Different shape features account for different phenotypes, this suggests that shape affects the phenotype through various molecular mechanisms.
Topographies induce distinct but overlapping gene expression profiles in MSCs. To assess the landscape of genes under the control of surface topography, we exposed MSCs to seven topographies that span the phenotypic space and to flat control surfaces and assessed the transcriptome after 24 h. We observed between 46 to 258 differentially expressed genes with a fold-change > 1.5 as compared to flat (Fig. 3b). Each topography induced a unique fingerprint at the gene expression level (Fig. 3c), but with various degrees of overlap. Differential gene expression data was used to identify molecular pathways which were activated. Lists of Differentially Expressed Genes (DEGs) per topography were queried in the Connectivity Map (CMap) 32 , a library of gene www.nature.com/scientificreports/ expression signatures induced by chemical compounds or genetic interference (perturbants), and connectivity scores between the topography-induced DEGs and Connectivity Map DEGs were retrieved. We focused on perturbant classes (PCL), which are groups of signatures in which members belong to the same gene family or are targeted by the same compound (Fig. 4a). Some PCLs were shared by most topographies such as perturbants that induce Cell cycle inhibition gain of function or known inhibitors such as JAK inhibitor and IKK inhibitors, demonstrating that some signaling events are always induced when an MSC encounters a topography. In agreement www.nature.com/scientificreports/ with this, proliferation is impaired on all topographies relative to flat. Other PCLs were present in only one or a few topographical gene lists such as LIM class homeoboxes GOF or glycogan synthase kinase inhibitor, suggesting that some topographies induce distinctive signaling events.
To investigate how the DEGs collaborate to elicit a phenotypic response, we selected all DEGs absolute value with fold change > 1.9 from all seven topographies and created a gene network (Fig. 4b). Genes, represented by nodes, are connected by edges based on binary protein interactions described in public resources (such as KEGG, Reactome, and WikiPathways) using ConsensusPathDB software 33 . 51 out of 91 genes were placed in this topography-induced gene network, suggesting an extensive overlap in functionality. Based on Gene Ontology and manual literature research, we noticed three main clusters of cellular processes, i.e. related to actin, p53 and NF-ƙB signaling, respectively. Further literature research revealed that many DEGs in our network are specifically expressed during the transition from the G2 to M phase of the cell cycle (e.g. Aurora Kinase and CDC20) and are repressed in cells grown on topographies. We noticed a significant overlap in topographical DEGs and those associated with the p53/DREAM complex, which regulates quiescence ( Supplementary Fig. 3). We previously reported on topography-induced quiescence as a typical topography-induced phenotype 34 . Whether actin, NF-kB and proliferation are in the same signalling network or represent separate signalling pathways induced by different cell shapes remains to be investigated.

Shape-dependent phenotypic responses show a strong association with specific genes.
To determine the relation between gene expression, cell shape and phenotype, we calculated pairwise Spearman correlations between cell shape descriptors, gene expression, and phenotype score levels on all topographies. Many genes correlate to shape features such as Solidity and Perimeter (Fig. 5a), whereas far fewer genes correlate to Compactness or EulerNumber. We also noted that relatively few genes correlated with protein synthesis (Fig. 5b), ALP expression was positively correlated with many genes, and Adipogenesis negatively correlated with gene expression. We were particularly interested in situations where the Gene-Shape-Phenotype triad of correlations was robust and consistent (Fig. 6a). For example, Spearman correlation between VCAM-1 gene expression and hMSC proliferation was − 0.75 (Fig. 6b), and at the same time, Spearman correlation between cell Area and proliferation was − 0.71 (Fig. 6d), as the result the Spearman correlation between Cells Area and www.nature.com/scientificreports/ VCAM-1 was 0.71 ( Fig. 6f). This implies that topographies that promote large cell areas are associated with a high level of proliferation and the expression of the VCAM-1 gene. Another clear example is the strong correlation between expression of the transforming growth factor receptor 2 (TGFBR2), ALP protein expression and Extent with Spearman correlation values, correspondingly, − 0.96, 0.67, − 0.75 (Fig. 6c,e,g). Interestingly, a strong association between ALP protein levels and TGFBR2 are reported in the literature 35 . Similar to what we found here, in a previous study we discovered a strong association between the amount of protrusions and ALP protein level 26 .
Browsing the list of 59 unique genes with an absolute correlation value above 0.5 contains 124 combinations and contains many interesting correlations that connect cell shape with phenotype and potentially causal gene (available as Supplementary Information). To further corroborate this, we uploaded the Gene-Shape Correlation (a) Gene expression in hMSCs on seven different topographies was compared to hMSCs on a flat surface and differentially expressed genes were compared in the Connectivity Map, a gene expression database with more than 1 million profiles. All processes that have absolute score value above 99 at least for one condition are depicted, with 0 low and 100 as high similarity. Biological processes have been ranked based on the number of topographies that can affect the process (specificity). (b) Gene expression in hMSCs on seven different topographies was compared to hMSCs on a flat surface, and 437 differentially expressed genes with an adjusted p-value of less than 0.05 were detected. The list was uploaded in ConsensusPathDB, which retrieves correlations between genes and proteins from a number of databases. Circular nodes in the graph represent the genes; rectangles are genes retrieved from a transcription factor database, thus transcriptional control. Edges represent reported relationships. The yellow and grey shading represent two clusters of genes with published involvement in actin related-processes and p53 respectively, based on literature study. Each circular node is a pie chart indicating in which topographies the gene is differentially expressed. . The image was prepared in the Inkscape 0.92.4, the network was drawn in the CytoScape v.3.5.1 66 .

Figure 5.
Correlation of gene expression to shape features or phenotype. Spearman calculated for all different samples a combined in 1 plot in which the distribution of outcomes can be observed. Violin plot representing the distribution of genes' Spearman correlation to either a shape feature (a) or a phenotype (b). Each gene was plotted against the respective feature and correlation presented as a Spearman score. Solidity has many high and low scoring correlation values, meaning that Solidity correlates strongly with gene expression. Most Spearman score for Euler number is close to zero, meaning that few genes' expression strongly correlate to this feature.
Scientific Reports | (2020) 10:18988 | https://doi.org/10.1038/s41598-020-76019-z www.nature.com/scientificreports/ list into Cmap and retrieved the PCLs (Fig. 7a,b). Surprisingly no PCL signatures above absolute value 0.99 were associated with Area. In contrast, Minor Axis length has the largest number of associated signatures. Moreover, we have observed that PCLs HIF activator, CDK inhibitor and Cell Cycle Inhibition GOF were the most common signatures with the highest score. Interestingly, ATPase inhibitor was exclusively linked to shape parameter Euler Number, which was the most important for the prediction of the protein biosynthesis. At the same time, Euler Number has a very strong signature of the PCL Protein Synthesis Inhibitor. This example demonstrates that we can independently connect gene signalling and outcome in the phenotypic assay via cell shape features.

Comparison of different databases reveals a list of universal shape related genes. To investi-
gate the broader relevance of our list of cell shape-correlated genes, we compared its overlap with two other data sets. In one study, whole transcriptome gene expression and related cell shape changes were induced by chemical compounds 36 . In a second study, cell shapes were under the control of adhesive islands, and gene expression was assessed 37 . Overlap of all topographically-induced genes, 437 in total, and the two other gene sets yielded a list of only 12 genes (Fig. 8a) and all the genes showed a strong Pearson correlation cell shape features (Fig. 8b). Of the 12 genes found to be shape-predictable in all three studies, Minor Axis Length and Compactness correlated to eleven of them; Extent correlated to seven of them. As expected, Cell Orientation did not correlate to gene expression (Fig. 8c). The above results demonstrate that filtering genes based on correlation to cell shape descriptors is a powerful method to find associations between gene expression, cell shape, and phenotype and that genes on the list of 275 genes can be considered as candidate genes directly influenced by cell shape. Indeed, of the twelve genes, seven have previously been directly linked to changes in cell shape: • BIRC5 (Yap transcriptional target) 38    have been associated with proliferation. CDC 20 linked to both cell shape (Rho Signaling Protein) 47 , and cells proliferation 48 .

Discussion and conclusion
The molecular mechanisms connecting cell shape to basic cell functions and phenotype maintenance are important and yet remain largely unknown. Using high content imaging, transcriptomics and machine learning we were able to identify strong relationships between cell shape, molecular signalling and cellular phenotype.
In order to correlate the datasets (imaging, phenotypical assays and transcriptomics), all experiments were performed with cells from one donor and at one passage number because both parameters are known to affect the quantitative response of MSCs, as we have described in earlier work 49,50 . It is, therefore, likely that the ranking of the features described in this manuscript will be different when investigated in cells from different donors. Validation of specific parts of the data set is underway and was already achieved with the TGBFR2-EGR1 transcriptional response, which was observed in a different human MSC cell line and even in cells of different origin such as skin and muscle 51 . We also want to emphasize that differences in the transcriptomics response to external stimuli are obvious, as we have described earlier in a cohort of 62 donors, but we also observed a strikingly strong quantitative similarity between donors 52 .
Some signal transduction pathways are particularly strongly correlated to topography-induced changes in cell shape. Changes in cell cycle-related signalling, and in the JAK, IKK and HIF pathways were observed in cells grown on most topographies. Other molecular signalling signatures were more specific, such as SRC, ATPase inhibition and Glycogen synthesis kinase which were affected in MSCs grown on specific topographies. This confirms earlier findings by Nassiri and McCall who identified signatures of DNA damage and proliferation signalling changes in cell shape 36   Cell biological processes related to cell shape features. (a) Spearman correlation was calculated between gene expression and cell shape features. Genes with absolute Spearman correlation above 0.5 per condition (either per phenotype or per cell features) were used as input in the Connectivity Map, a gene expression database with more than 1 million profiles. All processes that have absolute score value above 99 at least for one condition are depicted, with 0 low and 100 as a high similarity. Biological processes have been ranked based on the number of conditions that can affect the process (specificity). (b) Number of genes that were used for the analysis per shape feature.
Scientific Reports | (2020) 10:18988 | https://doi.org/10.1038/s41598-020-76019-z www.nature.com/scientificreports/ A classical mechano-transduction pathway which responds to changes in surface topography 54 but also material stiffness 55 and extracellular matrix remodeling links integrins to proteins in focal adhesions such as talin and vinculin, which subsequently affects Rho-ROCK signaling, actin remodeling and thus leading to activation of the YAP/TAZ transcription factor and downstream genes 56 . In a recently published manuscript 51 , we show that surface topographies from our TopoChip library are able to stimulate TGFbeta signaling, leading to the transcriptional activation of the scleraxis gene. Under these circumstances, TGFbeta2 signaling can be mitigated by blocking Rho/ROCK and actin mediated signaling. On the other hand, small molecules that activate PKC, which is also associated with actin remodeling, can mimic the synergistic effect of topographies on TGFbeta2 induced signaling. In the current manuscript, we note that PCK is one of the small molecules whose gene expression profiles correlate to that of the cell shape features Extend, Solidity and Minor Axis. Additional evidence for the role of the integrin/Rho-ROCK-actin pathway is seen in Fig. 4b, where we see a distinct actin-associated network of genes with genes such as FOS, EGR1 and ACTG1, all of which are SRF target genes. SRF and MRTF-A are transcription factors whose activity can be directly regulated by actin. These observations suggest that at least part of the transcriptional response is related to signaling that relates to the integrin/Rho-ROCK/actin axis. Striking, we do not see a YAP signaling signature either in GO analysis or the CMap data.
Moreover morphological change of MSC was tightly associated with phenotypic functions, particularly in immunomodulation. Morphologically changes of MSCs are related to cell skeletal re-arrangement, that is tightly involved in cytokine stimulus i.e. INF-r stimuli 57 , TNF-a stimuli 58 , Rap1/NFkb signaling 59 , the formation of tunnelling nanotubes (TNT), and mitochondrial transfer 60 .
Further support for the relevance of the genes and pathways discovered here in shape-related cell signalling comes from the overlap of 12 genes between our data set and those of two independent shape-related transcriptomics studies, and the known role of these genes. Thus, our systematic approach allowed us to confirm previous findings and to identify many novel unreported signatures. The data we have produced and publicly shared can be regarded as the first compendium of shape-gene expression and will likely lead to many new discoveries. The correlations can also be used to guide bio-instructive surfaces. For instance, more genes are affected by cell shape features such as Solidity and Extent then, for example, Compactness and some shape features are strongly Figure 8. Genes related to shape are enriched in shape-based transcriptomics data sets. (a) Venn diagram representing the overlap between genes differentially expressed on different adhesive islands, genes related to chemically induced shape changes and the 437 shape-based genes differentially expressed on the seven topographies with a fold change above 1.5. (b) Filtering of the shape-specific genes based on the Spearman correlation score between the gene and at least on of the cell shape parameters. The red line and Y-axis at the left represents a number of selected shape-related genes with specified Spearman correlation threshold value (X-axis). Y-axis on the right represents the total number of genes that have Spearman correlation value above the specified threshold (X-axis). (c) Heatmap that represents the correlation value between shape specific genes and shape parameters. All Spearman correlations with an absolute value below 0.4 are depicted for clarity. In this manuscript, we obtained proof of principle for the correlation between image-based cell features and cell signalling and 11 basic cell shape features. There is likely an opportunity for mechanistic information within cell organelle shape and associated correlations with phenotype. Although a thorough investigation of this was beyond the scope of this work, we did discover a strong negative correlation between nuclear area and the expression of histone H3, suggesting a link between nuclear shape and epigenetic regulation (data not shown). The approach can be taken much further and feed into the creation of large-scale computational models which describe biomaterial surface design, cell shape and cell phenotype. This approach is referred to as Quantitative Structure-Activity Relationships (QSAR) and is widely used to model the biological activity of pharmaceutical compounds 61 . It allows predicting the biological effect of untested compounds and thus narrowing down the search space.
To improve our shape-based models, we need more data which can be obtained by, e.g. upscaling the number of topographies from 28 in this manuscript to thousands, more extensive phenotypic characterisation of the cells for instance by using Cell Painting protocols (Bray et al. Nat Protocols) or other stains that display relevant biological processes and more extensive transcriptomics analysis on different cell types. As a start, data obtained in our project is freely available via our cBIT 62 repository and can already be used to design a biomaterial that inhibits cell proliferation, for example, to reduce growth of the fibrosis tissue on the interface of the tissueimplant. The gene signatures related to cell shape and their underlying topographies can be used to browse the Connectivity Map database for genetic signatures that resemble the effect of small molecule or RNAi-induced gene signatures 32 . This brings new opportunities to research the molecular pathways underlying shape-induced phenotypes. To accelerate the field of biomaterial engineering, this database should link information not only of topography design but also other material properties such as stiffness, ligand density, and any other relevant biomaterial properties. Based on systematic banking and mining of biomaterials, cell phenotype and transcriptomics data, we foresee a universal tool that will help understand and engineer the interface between cells and biomaterials. The work presented in this study outlines a method to search the large design space that links material properties to cell phenotype.

Materials and methods
Cell culture. In this study we used human mesenchymal stem cells (hMSCs) from donor (d016), undergoing a total hip replacement. hMSCs were isolated from bone marrow after obtaining written informed consent from the patient. Ethical approval for using the bone marrow sample was obtained from the ethical advisory board of the Medisch Spectrum Twente, Enschede. All methods were carried out in accordance with local and relevant guidelines and regulations 52 . Cells were cultured in basic medium (αMEM medium (Gibco, 22-571-038) supplemented with 10% fetal bovine serum (FBS) (Sigma-Aldrich batch number 013M3396) at 37 °C in a humid atmosphere with 5% CO 2 , unless stated differently. In all experiments, we used hMSCs at passage number four. Before cells seeding, surfaces were sterilized with 70% ethanol (Boom, 84010059.5000) and wetted in the basic medium during at least 2 h. We have previously systematically characterized in vitro lineage differentiation capacity, gene expression signature and in vivo capacity for ectopic bone formation of this donor (16) cells 52 . Table 1 summarize duration of the functional experiments, which are explained below.
Cell proliferation. hMSCSc (d016) were starved for 24 h in αMEM (Gibco, 22-571-038) medium without FBS to synchronize their cell cycles. Then cells were trypsinized with trypsin-EDTA (0.05%) (Fisher Scientific, 25300054), trypsin was neutralized with the basic medium. Cells were seeded at a density of 10,000 cells/cm 2 in triplicates and cultured in the presence of 1% FBS (Sigma-Aldrich batch number 013M3396). Cells were allowed to adhere overnight and afterwards, cells were cultured for 48 h with 10 μM EdU component (Click-iT Plus EdU Alexa Fluor 594 Imaging Kit, Thermo Fischer Scientific) in the media. Next, incorporated EdU was imaged as described in the kit manual (Click-iT Plus EdU Alexa Fluo 594 Imaging Kit, Thermo Fischer Scientific).

Protein synthesis.
To assess global protein synthesis on the topographies we looked at global protein expression by virtue of l-methionine analogue incorporation. We seeded hMSCs (d016) in triplicates at density 20,000 cells/cm 2 on topographies in 96 well plates in basic medium and let them adhere for 24 h. Later, the medium was replaced with DMEM high glucose, no l-glutamine, no l-methionine, no l-cystine, (Thermo Fischer Scientific, 21013024) supplemented with 580 mg/l l-glutamine (Thermo Fisher Scientific, 25,030,081), and 63 mg/l l-Cystine dihydrochloride (Sigma-Aldrich, C6727-25G). Prior cells fixation we added l-homopropargylglycine, l-methionine analogue (Thermo Fischer Scientific) for 1.5 h. We further stained incorporated l-homopropargylglycine as described in the kit manual (Click-iT HPG Alexa Fluor Protein Synthesis Assay Kits, Thermo Fischer Scientific).

Apoptosis. hMSCs (d016) were stained with dye for live cells, CellTracker Green CMFDA (Thermo Fischer
Scientific, 11570166). During the next step, cells were seeded on topographies for 24 h in a basic medium at density 10,000 cells/cm 2 in 96 well plates. Next, we added a staurosporine (Abcam, ab120056), a protein kinase inhibitor, in the concentration of 0 www.nature.com/scientificreports/ Transcriptional profiling. hMSCs (d016) were seeded on selected topographies for 24 h in a basic medium at a density of 15,000 cells/cm 2 in 24 well plates in triplicate. Total RNA was isolated using the Nucleospin RNA isolation kit (Macherey-Nagel). Next, 100 ng of RNA was used to synthesize cRNA, following the instructions of the Illumina TotalPrep RNA amplification kit, and both RNA and cRNA quality was verified on a Bioanalyzer 2100 (Agilent Technologies). Illumina HT-12 v4 expression Beadchips were used for microarray transcriptomics analysis. Briefly, 750 ng of cRNA was hybridized on the array overnight followed by washing and blocking. Next, by addition of streptavidin Cy-3, a fluorescent signal was developed and arrays were scanned on an Illumina Beadarray reader. Raw intensity values were corrected for background signal in BeadStudio (Illumina). Further data processing and statistical testing were performed using the online portal ArrayAnalysis (https ://array analy sis.org/) 64 . Probe-level raw intensity values were quantile normalized and transformed using variance stabilization (VSN). To reduce the number of false positives, a detection threshold of 0.01 was used. A linear modelling approach with empirical Bayesian methods, as implemented in the R Limma package 65 , was applied to look for differential expression of the resulting probe-level expression values. A false discovery rate (FDR) correction of the p-values was applied using the Benjamini and Hochberg method. Genes were considered differentially expressed at a corrected p-value < 0.05.
Pathway over-representation analysis was performed using the ConsensusPathDB (CPDB) tool (https ://cpdb. molge n.mpg.de/), which provides a comprehensive pathway analysis covering most public resources of gene and protein interactions 32 . As input for the analysis, the set of differentially expressed genes mentioned above was used. To improve the statistical evaluation of the pathways, a background list containing all measured genes was also used in the calculations. Pathways with an FDR-corrected p-value < 0.05 were considered significant.
A protein network analysis was carried out in two steps. CPDB contains an induced network module, which uses the protein-protein interactions described in a large number of public resources, to build a network based on a list of input genes or proteins. As a first step, a network was generated using the same list of differentially expressed genes as used above in the pathway analysis, using a z-score threshold of 20. Only binary protein interactions of low, medium and high confidence were selected. Furthermore, the addition of intermediate genes to the network was allowed in order to improve inter-gene connectivity. In the second step, in order to better understand how this CPDB network may be regulated, the network was imported into CytoScape v.3.5.1 66 and the plugin CyTargetLinker was used to extend the network by adding transcription factors from the transcription factor target database TFe (Transcription Factor encyclopedia). TFe is a small-scale manual literature curation project containing 1531 human transcription factor-target interactions.

Selection of surfaces and clustering.
To be able to find cell shapes that were repeatedly induced by topography and remove any imaging artefacts or biological variation we performed the following filtering steps. Among other outlier detection tools we used, so called, 1.5 interquartile range (IQR) rule. It works as following: calculate 1st and 3rd quantile of the data (value of the first 25% and 75% of ordered data, correspondingly), difference between 3rd and 1st quantiles is IQR, all the data points that lie outside of the following range: 1st quantile − 1.5 × IQR, 3rd quantile + 1.5 × IQR, considered being outlines. First, we removed replicas based on cell density. We performed this on a per-surface basis, to avoid removal of the surfaces with a unique response. Replicas with too low or to high cell number were discarded using the 1.5 IQR rule. Afterwards, we removed cells that were outliers in terms of area and perimeter. Outliers were removed sequentially, first based on area and then based on perimeter values using the 1.5 IQR rule. We further employed Moutlier package in R 67 to remove outliers based on cell shapes. This approach allows finding outliers by using information from multiple features simultaneously. We did this by looking at the distribution of cell shapes in a 5-dimensional shape space and then removing outliers using the Mahalanobis-based outlier detection. These 5 shape features were selected from 11 features that have correlation index less than 0.7 between each other. Finally, for each surface, we retained only those replicates that were of good quality, those that were reproducible. The reproducibility was measured using the correlation of cell shapes between replicas. We excluded replicas that have correlation index less than 0.5, in comparison to all the replicas. Next, we summarized cell shape features per surface by taking median features of all the cells. Afterwards, we removed redundant surfaces, those that were highly correlated (Pearson correlation index above 0.95) to each other based on cell shape features. To reduce the dimensionality of the data further, we performed principal components analysis. We have chosen first seven principal components that captured the most variations of the data. We further used hierarchical clustering analysis with Ward linkage to find clusters of cell shapes. A distance matrix was calculated with Euclidean distances.
Image and data analysis. Open-source software Cell Profiler 2.1 (CP) was used for image analysis 27 . In order to perform automated image analysis in CP, a robust pipeline able to recognize different cell features was built.
Training the model. We employed Lasso regression model from Glmnet package in R 68 to find cell shape features that can predict the response of the phenotypic assays. The model was trained with Leave One Out Cross validation approach. We assessed the performance of the model by reporting the Pearson correlation index (r) between predicted and observed values on the training data. To show the goodness of fit we also reported R 2 . The hyperparameter lambda was optimized separately for every model. Cells on flat control surfaces were excluded because of their distinct response, in comparison to the rest of topographies, which would bias the model. www.nature.com/scientificreports/ focused on perturbant classes (PCL), which are groups of signatures in which members belong to the same gene family or are targeted by the same compound. To find cell shape signatures we have uploaded genes that have absolute correlation value above 0.5. Where genes with positive correlation values were loaded as "upregulated genes" and genes with negative correlation values were loaded as "downregulated genes. The results were further exported to csv file and visualized in R.
TopoChip and enlarged surfaces fabrication. The complete design of the TopoChip is described in detail elsewhere 24 . The TopoChip and enlarged surfaces were fabricated as described previously 69 . Briefly, a chromium mask for photolithography was constructed that contains the design for either the TopoChip or the topographies on expanded surfaces. Subsequently, through standard lithography and deep reactive etching, inverse structures of the topographies were generated on a silicon wafer. This mould was then utilized to make a positive mould in poly(dimethylsiloxane) (PDMS). Subsequently, a second negative mould in OrmoStamp hybrid polymer (micro resist technology Gmbh) was generated from the PDMS mould. This mould serves as the template for hot embossing the topographies in 190 µm thick PS films by applying 140 °C and 10 bar for 5 min. Before cell culture, PS films containing the topographical imprints were O2-plasma treated to enhance cell attachment.
Statistical analyses and data visualization. Statistical analysis was performed in R ver. 3.2.5 70 , graphs were generated in R package ggplot2 71 and cowplot 72 . To determine a fold difference in the phenotypical assays, we have used original units as was measured by image analysis. To quantify the rank of the surfaces were sorted them based on the measured value.