Predicting phenotype using morphological cell responses to nanotopography

Cells respond in complex ways to topographies, making it challenging to identify a direct relationship between surface topography and cell response. A key problem is the lack of informative representations of topographical parameters that translate directly into biological properties. Here, we present a platform to relate the effects of nanotopography on morphology to function. This platform utilizes the ‘morphome’, a multivariate dataset containing single cell measures of focal adhesions, the cytoskeleton, and chromatin. We demonstrate that nanotopography-induced changes in cell phenotype are uniquely encoded by the morphome. The morphome was used to create a Bayesian linear regression model that robustly predicted changes in bone, cartilage, muscle and fibrous tissue gene expression induced by nanotopography. Furthermore, the morphome effectively predicted nanotopography-induced phenotype within a complex co-culture microenvironment. Thus, the morphome enables the cell function-oriented exploration of new topographies, with potential applications in the development of novel surface-patterned biomaterials for tissue implants.

A quantitative relationship exists between a material's physicochemical structure and its biological activity. Rational drug design has long relied on molecule solubility, ionisation and lipophilicity to predict activity 11 . Protein engineering has similarly modelled protein-peptide interactions from protein structure 12 . Cell metabolic activity correlates with synthetic polymer composition, glass transition temperature, and water contact angle 13 . Meanwhile, bacterial attachment can be predicted from descriptors of secondary ionic hydrocarbon chains 14 . In contrast to active biomolecules, the mechanotransductive effects of topography on cell response do not intuitively relate to topography length scale, isotropy, geometry, and polarity. This limits the discovery of functional topography to the screening of libraries for hits using a single, representative cell type [15][16][17][18] . Among its limitations (which include cost, inefficiency and the sampling of a small topography space), this screening approach disregards the cell specificity of response to nanotopogrpahy. Thus, it is vitally important to develop a systematic method to capture cell phenotypes induced by topography.
Here, we demonstrate a platform that encompasses cell phenotype and morphological response induced by nanotopography using image-based cellular features. We obtained measurements of focal adhesions, actin cytoskeleton and chromatin, referred to here as the 'morphome', from single cells on nanotopographies. We observed a unique morphological signature based on cell identity and nanotopography in the morphome. We also identified changes in the transcription of myogenic, osteogenic, chondrogenic and fibroblastic genes across all combinations of cells and nanotopographies. We used a Bayesian linear regression model to directly relate the morphological and phenotypic changes induced by nanotopography. Using the morphome as predictors and without prior knowledge about the nanotopography, the model robustly predicted gene expression induced by nanotopography. A new morphome dataset from a co-culture of osteoblastic and fibroblastic cells verified that the model positively predicts bone tissue formation induced by nanotopography, even within a complex microenvironment. The lack of topography-specific parameters in the model and the presence of the mechanosensing machinery in all cell types highlights the broad applicability of this platform to many biomaterial and cell systems, thereby supporting the functionoriented exploration and rational design of new topographies.

Measuring morphological cell responses to topography: the morphome
We used nanopit topographies consisting of 120 nm diameter, 100 nm depth and with a 300 nm centre-to-centre distance in a square array (SQ) 7 , hexagonal array (HEX) [25][26][27] , and arranged with centre-to-centre distance offset from 300 nm by 50 nm in both x and y directions (NSQ array) 7,8 . An unpatterned ('FLAT') surface was used as a control ( Figure 1A).
We employed cells of the musculoskeletal system due to the diverse responses of muscle, bone, cartilage and fibrous cell types to nanotopographies [27][28][29] . Mouse myoblasts, osteoblasts, chondrocytes, fibroblasts, and pre-osteoblast 30  Quantitative polymerase chain reaction (qPCR) was then used to assess changes in lineage marker expression induced by nanotopography by day 7. Machine learning was then applied on the morphome: (i) hierarchical clustering was used to uncover unique patterns of morphological features that distinguish cell type-specific responses on nanotopographies, and nanotopography-specific morphological changes induced within the cell; (ii) Bayesian linear regression was used to predict myogenic, osteogenic, chondrogenic and fibrogenic gene expression induced by nanotopography using the morphome as predictors ( Figure 1D). , actin (red), focal adhesion kinase (FAK, yellow) and phosphorylated FAK (pFAK, green) were obtained from images of single cells to create the 'morphome'. (C) At day 7, lineage-specific gene expression induced by nanotopography was measured using population-based qPCR. (D) Machine learning for data-driven exploration and model building using the morphome. (i) Hierarchical clustering uncovered unique patterns that delineate cell type and nanotopography within the morphome. (ii) Bayesian linear regression created a predictive model that related the morphome to gene expression and phenotype.

The morphome depicts unique cell signatures induced by nanotopographies
Patterns of nanotopography-induced morphological changes were visible from the morphome (Figure 2A). Immediately apparent were large blocks of actin, FAK and pFAK measurements with similar values within a cell type on a specific nanotopography. These features correspond to increasingly complex measures of texture, granularity and radial intensity distribution for chromatin, actin, FAK and pFAK (Table S1). Frequency of pixel gray levels measures texture and thus of homogeneity, with high texture values indicating coarseness. Granularity measures an object's coarseness, with higher values indicating heterogeneity of pixel intensities and coarser texture. The Zernike coefficient measures the spatial arrangement of intensity as it resembles the increasingly complex Zernike polynomials ( Figure 2A). Interestingly, higher Zernike polynomials resemble the punctate shape and spatial distribution of focal adhesions 32 .

Nanotopography induces cell type-specific gene expression changes
Gene expression was used to determine the effect of nanotopographies on cell phenotype ( Figure S5). We discuss here the changes induced by nanotopography   Figure S3. All qPCR measurements are given as mean ± standard deviation from 2 independent experiments (n=6). Significance levels obtained from one-way ANOVA with Tukey's post-hoc test, denoted by * (p<0.05), ** (p<0.01), *** (p<0.001), and **** (p<0.0001). of pre-myoblasts cultured on SQ reflects the need for FAK phosphorylation and for its preferential localization at stress fiber edges, which is necessary for myotube differentiation 36,37 . In contrast, myoblasts on SQ showed a particularly high average value for chromatin granularity and nuclear morphometry (cluster 4), and near-zero values for radial distribution of actin and of focal adhesions (cluster 1). High chromatin granularity observed for both pre-myoblasts and myoblasts on SQ denotes chromatin heterogeneity and condensation and transcriptional activity, which is reportedly higher prior to myotube formation 38,39 .
Pre-osteoblasts on SQ and NSQ had high average values for pFAK radial distribution, intensity, granularity and texture, and high average values for granularity of chromatin and actin (clusters 1-5). However, pre-osteoblasts on SQ had higher order pFAK and FAK radial distribution than on NSQ, which induced the highest expression of osteogenic markers. The morphome of pre-osteoblasts grown on NSQ featured radially variable actin that resemble bone cells, which have high contractility and actin stress fibers 40 .
For osteoblasts, the differences between the SQ and NSQ morphome were more prominent: NSQ induced lower average values of focal adhesion granularity, chromatin texture and nuclear morphometry (cluster 1), and higher average values for focal adhesion radial distribution (clusters 4-7) compared to SQ ( Figure 3D). The osteoblast morphome on NSQ indicates that focal adhesions localize at regular intervals along the periphery, which is associated with osteogenesis 41 . Furthermore, changes in nuclear morphometry attributed to spreading after growth on stiff surfaces is also associated with osteogenic differentiation 42 . Overall, the morphome reflected cell-type specific responses to nanotopography.
The morphome also exhibited patterns from different cell types (e.g. pre-osteoblasts vs osteoblasts) despite similarity in origin (Table S2). A multi-class logistic regression classifier was able to accurately distinguish 6 different cell types using the morphome ( Figure S7). The radial arrangement of actin and focal adhesions were a critical distinction of the musculoskeletal cell types induced by nanotopography, while the arrangement of actin fibers into stress fibers or into cortical, circular bundles provided information on various cell states.
We also clustered the morphome based on cell type ( Figure (Table S3). The morphome enabled discrimination between 4 nanotopographies with 68% overall accuracy (87% classification rate for HEX) using multi-class logistic regression ( Figure S10).

The morphome predicts nanotopography induced gene expression changes
The Spearman rank correlation revealed that varying degrees of correlation exist between morphome features and gene expression ( Figure S11). We hypothesized that the features of the morphomes would sufficiently encompass cell phenotypes induced by nanotopography thus allowing prediction. We utilized Bayesian linear regression to predict gene expression using the morphome features as predictors

The morphome encompasses cell response to topography within a complex environment
We demonstrate the application of the linear regression model by predicting the outcome of pre-osteoblasts and fibroblasts co-cultured on nanotopographies. A new morphome were obtained from all cells on the entire nanotopography ( Figure S13).
This co-culture morphome was then used in the linear regression model to predict gene expression ( Figure S14).
For visualization, the sum of predicted osteogenic (RUNX2, SP7, BGLAP and SPP1) and fibrotic (TGFB1I1, COL3 and ELN) genes was plotted against the spatial coordinates of the pre-osteoblasts and fibroblasts. Osteogenic gene expression was highest on NSQ, which also induced concentrated areas of highly expressing cells ( Figure 5A). These areas might represent hotspots of osteogenic paracrine signaling induced by the NSQ nanotopography 9 . In contrast, osteogenic gene expression was low and homogenous on the FLAT, SQ and HEX topographies.
Fibrotic gene expression showed more spatial variability across the nanotopographies but was also maximized on the NSQ nanotopography, and largely overlaps with the spatial pattern of osteogenic gene expression ( Figure 5B). This is attributable to the synergistic interaction of osteoblasts and fibroblasts on osteogenic differentiation and mineralization 51

Discussion
In this study, we present a system that robustly encodes nanotopography The co-culture experiment also demonstrated that the morphome dataset encompasses, at high resolution, structural, functional and spatial information.
Structural information is crucial for determining how cells look, a highly valuable feature that can also be used to study drug-and disease-induced cell perturbation.
The functional information -represented by changes in the expression of 14 different genes -can be observed at single-cell level, reflecting spatial resolution that is invaluable in high-throughput formats. When developed as an automated platform that executes topography production, cell seeding and staining, image acquisition, morphome extraction and gene expression prediction, this approach could increase screening efficiency and make it more cell function-oriented.

Polycarbonate surfaces with nanotopography
Surfaces patterned with 120 nm diameter and 100 nm depth nanopits were fabricated on polycarbonate using injection molding 60 . The following nanotopographies were used: surfaces without nanopits (FLAT); nanopits in a square array with 300 nm center-to-center spacing (SQ); nanopits in a square array with approximately 300 nm center-to-center spacing distorted by 50 nm in both x and y directions (NSQ); nanopits in a hexagonal array with 300 nm center-to-center spacing (HEX). Samples were cleaned in 70% ethanol and dried before treating with O2 plasma at 120 W for 1.5 mins. Samples were sterilized using UV light in biological safety cabinet for at least 20 mins before cell seeding.

Cell culture
Mouse fibroblast cell line NIH3T3 (ATCC) was cultured in reduced sodium

Machine learning and multivariate analysis
The morphome initially consisted of a total of 1050 measurements obtained from single cells. Morphome measurements from single-cell populations were combined and were scaled by subtracting the mean and normalizing by the standard deviation of the entire data set. Morphome data from co-culture studies were similarly scaled and normalized using the mean and standard deviation from the initial dataset consisting of homogeneous cell populations. Prior to machine learning, features with zero variance within each batch (e.g. Zernike Phase measurements) were removed from the data set. A Pearson correlation method at significance level 90% was used to remove features with correlation higher than 0.9 without significantly reducing total data variance ( Figure S16) using the KMDA package 64 . After preprocessing, 624 morphome features were used further in the study. To determine the features that were significantly varied either across nanotopography or cell type, a one-way ANOVA was performed (KNIME v3.3.0). The intersection of nanotopography-specific and cell type-specific features were obtained. Hierarchical clustering was performed using Spearman correlation as a distance metric and an average linkage method for cluster linkage using gplots package 65 . Cluster membership and stability was obtained from silhouette analysis using the cluster package 66 . Dendrogram correlation was performed using corrplot package 67 .

Bayesian linear log-Normal regression
Only morphome features with an absolute Spearman correlation coefficient ≥ 0.7 against all examined expression markers were used in the linear regression model And that $ is a linear function of # parametrized by 5: $ ' = 7 + 5 9 + 5 : # : + 5 ( # ( + ⋯ + 5 < # < All model parameters 5 were assumed a priori to come from a normal distribution, parametrized by mean and standard deviation: All models were confirmed to converge to the equilibrium distribution by confirming potential scale reduction statistic split-@ A ≥ 1, effective sample size was smaller than total sample size and low autocorrelation. Predicted gene expression was performed by using the test set or the morphome obtained from the co-culture study as input to the linear model. Predicted values were averaged across 50 draws from the posterior distribution. To determine the predictive power of the morphome, a specific combination of cell type, topography and replicate were iteratively omitted and the remaining dataset was used to refit new models. Thus, 576 additional models were created to test 24 different cell type combinations across 12 genes and 2 replicates. The predictive quality of the models was assessed by predicting the expression of all 14 genes from the omitted cell type, topography and replicate dataset. We report the mean absolute error (MAE) of qPCR prediction for each cell type and topography combination averaged across 2 replicates. MAE was calculated as the average across all absolute differences between predicted and actual gene expression.

Statistics, visualization and software
Statistical analysis and machine learning were performed using statistical software R (v3.4.3) and its graphical interface RStudio (v1.0) 70 . Scatterplots, boxplots and histograms were generated using ggplot2 in R 71 . Pearson correlation, Spearman rank correlation and Bayesian linear regression were performed in R (v3.4).
Interpolation of x and y coordinates and sum of predicted osteogenic or fibrotic gene expression was performed using bivariate interpolation of a regularly gridded dataset using akima package 72 . Afterwards, the contour plots were created using the fitted.contour function in the base package of R, with the nuclear centroid position used as spatial coordinates of the cell. Barcharts and one-way ANOVA analysis of qPCR values were obtained using GraphPad Prism (v7.0a).

Disclosure of competing financial interests
The authors have no competing financial interests to disclose.