Predicting gene expression using morphological cell responses to nanotopography

Cells respond in complex ways to their environment, making it challenging to predict a direct relationship between the two. A key problem is the lack of informative representations of parameters that translate directly into biological function. Here we present a platform to relate the effects of cell morphology to gene expression induced by nanotopography. This platform utilizes the ‘morphome’, a multivariate dataset of cell morphology parameters. We create a Bayesian linear regression model that uses the morphome to robustly predict changes in bone, cartilage, muscle and fibrous gene expression induced by nanotopography. Furthermore, through this model we effectively predict nanotopography-induced gene expression in a complex co-culture microenvironment. The information from the morphome uncovers previously unknown effects of nanotopography on cell–cell interaction and osteogenic gene expression at the single-cell level. The predictive relationship between morphology and gene expression arising from cell–material interaction shows promise for exploring new topographies.


Statistics
For all statistical analyses, confirm that the following items are present in the figure legend, table legend, main text, or Methods section.
- The exact sample size (n) for each experimental group/condition, given as a discrete number and unit of measurement
- A statement on whether measurements were taken from distinct samples or whether the same sample was measured repeatedly
- The statistical test(s) used AND whether they are one- or two-sided (only common tests should be described solely by name; describe more complex techniques in the Methods section)
- A description of all covariates tested
- A description of any assumptions or corrections, such as tests of normality and adjustment for multiple comparisons
- A full description of the statistical parameters including central tendency (e.g. means) or other basic estimates (e.g. regression coefficient) AND variation (e.g. standard deviation) or associated estimates of uncertainty (e.g. confidence intervals)
- For null hypothesis testing, the test statistic (e.g. F, t, r) with confidence intervals, effect sizes, degrees of freedom and P value noted (give P values as exact values whenever suitable)
- For Bayesian analysis, information on the choice of priors and Markov chain Monte Carlo settings
- For hierarchical and complex designs, identification of the appropriate level for tests and full reporting of outcomes
- Estimates of effect sizes (e.g. Cohen's d, Pearson's r), indicating how they were calculated
Our web collection on statistics for biologists contains articles on many of the points above.

Software and code
Policy information about availability of computer code
Data collection: Nuclei and cells were segmented from fluorescence images, and cell characteristics were measured, using CellProfiler v2.4.0.
For manuscripts utilizing custom algorithms or software that are central to the research but not yet described in published literature, software must be made available to editors/reviewers. We strongly encourage code deposition in a community repository (e.g. GitHub). See the Nature Research guidelines for submitting code & software for further information.

Data
Policy information about availability of data
All manuscripts must include a data availability statement. This statement should provide the following information, where applicable:
- Accession codes, unique identifiers, or web links for publicly available datasets
- A list of figures that have associated raw data
- A description of any restrictions on data availability
Raw data (e.g. gene expression data, morphome data), the R workspace data containing all Bayesian linear regression models, and the associated code supporting the findings of this study are available from Zenodo under the identifier 10.5281/zenodo.3608197.
nature research | reporting summary

October 2018
Field-specific reporting
Please select the one below that is the best fit for your research. If you are not sure, read the appropriate sections before making your selection.

- Life sciences
- Behavioural & social sciences
- Ecological, evolutionary & environmental sciences
For a reference copy of the document with all sections, see nature.com/documents/nr-reporting-summary-flat.pdf

Life sciences study design
All studies must disclose on these points even when the disclosure is negative.

Sample size
No sample size calculation was performed. Two independent experiments were conducted. Each experiment contained at least 75 distinct cells measured without repetition (morphome dataset) or at least 2 repeated measurements of the same sample as technical replicates (gene expression), which introduced heterogeneity into the dataset.

Data exclusions
No data were excluded from the analyses.

Replication
Technical and biological replication was performed for both the qPCR data and the image-based data. Technical replicates were used to ensure the precision of the qPCR measurements. Biological replication of both the qPCR and the image analysis data across 2 independent experiments showed some, but not complete, agreement between replicates, reflecting the heterogeneity in biological response captured in the single-cell and population measurements. This heterogeneity in both datasets was taken into account in the Bayesian linear regression, in which single-cell measurements were regressed against the gene expression values of each independent biological experiment. That is, measures of central tendency (e.g. the mean) across independent biological experiments were not used in the machine learning.
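
The regression design described above can be sketched as follows. Each single-cell morphome measurement is paired with the gene expression value from its own biological experiment rather than with an across-experiment mean. This is a minimal, hedged illustration in Python/NumPy, not the authors' R implementation. It uses closed-form conjugate Bayesian linear regression with an assumed zero-mean Gaussian prior on the weights (precision `alpha`) and an assumed known noise precision (`beta`). The priors and any Markov chain Monte Carlo settings actually used in the study are not stated in this section. The helper names (`expand_labels`, `design_matrix`, `fit_blr`, `predict`), the "flat"/"nano" labels and all numbers in the demo are hypothetical and synthetic.

```python
import numpy as np


def expand_labels(cell_features, cell_groups, group_expression):
    """Pair each cell's morphome vector with the expression value measured for
    its own (experiment, condition) group; nothing is averaged across experiments."""
    X = np.asarray(cell_features, dtype=float)
    y = np.array([group_expression[g] for g in cell_groups], dtype=float)
    return X, y


def design_matrix(X):
    """Prepend an intercept column of ones to the feature matrix."""
    return np.hstack([np.ones((X.shape[0], 1)), X])


def fit_blr(X, y, alpha=1.0, beta=1.0):
    """Conjugate Bayesian linear regression under ASSUMED settings:
    prior w ~ N(0, alpha^-1 I) and Gaussian noise with known precision beta.
    Returns the posterior mean m and posterior covariance S of the weights."""
    Phi = design_matrix(X)
    precision = alpha * np.eye(Phi.shape[1]) + beta * Phi.T @ Phi
    S = np.linalg.inv(precision)
    m = beta * S @ Phi.T @ y
    return m, S


def predict(X_new, m, S, beta=1.0):
    """Posterior predictive mean and variance for new cells."""
    Phi = design_matrix(X_new)
    mean = Phi @ m
    # Per-cell quadratic form phi_i^T S phi_i plus the assumed noise variance 1/beta.
    var = 1.0 / beta + np.einsum("ij,jk,ik->i", Phi, S, Phi)
    return mean, var


# Synthetic demo: 2 hypothetical experiments x 2 hypothetical substrates x 75 cells.
rng = np.random.default_rng(0)
experiments = ["exp1", "exp2"]
conditions = ["flat", "nano"]
# Hypothetical relative expression per (experiment, condition); not real data.
group_expression = {("exp1", "flat"): 1.0, ("exp1", "nano"): 2.1,
                    ("exp2", "flat"): 0.9, ("exp2", "nano"): 1.8}
cell_groups = [(e, c) for e in experiments for c in conditions for _ in range(75)]
is_nano = np.array([c == "nano" for _, c in cell_groups])
n_cells = len(cell_groups)
# Two synthetic morphome features: one shifted by substrate, one pure noise.
features = np.column_stack([is_nano + 0.1 * rng.standard_normal(n_cells),
                            rng.standard_normal(n_cells)])

X, y = expand_labels(features, cell_groups, group_expression)
m, S = fit_blr(X, y, alpha=1.0, beta=1.0)
mean, var = predict(X, m, S, beta=1.0)
```

Because each cell carries the label of its own (experiment, condition) group, variation between the two biological experiments stays in the training data instead of being averaged away. The predictive variance returned by `predict` combines the posterior uncertainty in the weights with the assumed noise variance `1/beta`.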

Randomization
The dataset was separated into training and testing sets for the machine learning algorithm in a completely randomized manner.
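
A completely randomized training/testing split can be sketched by shuffling sample indices with a seeded random number generator. The 30% test fraction, the seed, the sample count of 300 and the function name `random_split` are illustrative assumptions. The split proportion used in the study is not given in this section.

```python
import numpy as np


def random_split(n_samples, test_fraction=0.3, seed=None):
    """Return (train_idx, test_idx) drawn from a uniformly random permutation.
    test_fraction and seed are hypothetical defaults, not the study's settings."""
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be in (0, 1)")
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_samples)  # every ordering equally likely
    n_test = int(round(test_fraction * n_samples))
    return order[n_test:], order[:n_test]


train_idx, test_idx = random_split(300, test_fraction=0.3, seed=42)
```

Seeding the generator makes the split reproducible while keeping the assignment of samples to the two sets random.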

Blinding
Blinding was not carried out. Supervised machine learning, as used in this study, requires that the model be trained on a known (labelled) dataset to create a predictive regression model.

Reporting for specific materials, systems and methods
We require information from authors about some types of materials, experimental systems and methods used in many studies. Here, indicate whether each material, system or method listed is relevant to your study. If you are not sure if a list item applies to your research, read the appropriate section before selecting a response.

Validation
The pFAK antibody was validated by testing positive against the immunizing peptide and negative against the equivalent non-phosphorylated peptide. It has also been validated for use in immunofluorescence, according to the manufacturer's website. The pFAK antibody has been cited in 19 studies, including https://www.ncbi.nlm.nih.gov/pubmed/29901439 and https://www.ncbi.nlm.nih.gov/pubmed/29763414
The FAK antibody was verified by immunoprecipitation-mass spectrometry with FAK and has been validated for use in immunofluorescence, according to the manufacturer's website. The FAK antibody has been used in 1 study: https://www.ncbi.nlm.nih.gov/pubmed/22546345
The YAP antibody has been validated for use in immunoprecipitation using YAP wild-type and YAP knockout samples. The antibody has also been cited in 272 publications: https://www.cellsignal.co.uk/products/primary-antibodies/yap-antibody/4912
The TAZ antibody has been verified for immunofluorescence, according to the manufacturer's website. The TAZ antibody has been used in 3 studies: https://www.bdbiosciences.com/eu/applications/research/stem-cell-research/mesenchymal-stem-cell-markers-bonemarrow/human/positive-markers/purified-mouse-anti-taz-m2-616/p/560235
Representative images of cells stained using pFAK and FAK antibodies (see Supplementary Figure 1) or YAP and TAZ antibodies (see Supplementary Figure 4) are included as supplementary figures.