## Introduction

Mechanical forces between cells and their embedding matrix are essential for a variety of biological processes, ranging from migration of cells – including immune cells and cancer cells – to tissue maintenance and organ development, see1,2,3,4,5,6,7 for only a few of the many review articles on this topic. Many of the relevant processes occur on a micrometer, or sub-micrometer lengthscale, for instance in nascent cell adhesion sites, filopodia, and bacterial adhesion. To understand these processes’ mechanics and their biological control, reliable and accurate methods for measurement of cellular forces are required.

Traction force microscopy (TFM) is a versatile and perturbation-free method yielding a spatial image of the stress exerted by cells on relatively soft elastic gel substrates. This method has its origins in pioneering work by Harris et al.8, who employed flexible silicone substrates to investigate the mechanical forces that cells generate. Today, TFM has become a method that is routinely used in laboratories studying cell biology and soft matter physics around the world9,10,11,12,13,14. TFM requires three distinct procedures, illustrated in Fig. 1(a): (1) Cells are plated on an elastic substrate containing fiducial markers allowing to quantify gel deformation visually, for instance fluorescent beads or quantum dots15,16. The deformations caused by adherent cells are recorded by taking images of the gel before and after removing the cell. (2) A discrete gel displacement field u is calculated by tracking the markers. The most common techniques for tracking are particle tracking velocimetry (PTV) and particle image velocimetry (PIV). (3) Finally, the traction force field f is calculated from the displacement field u by making use of a mechanical model of the elastic substrate. A variety of methods exist for this purpose, including finite element methods17,18,19,20,21, boundary element methods15,22,23 and methods operating in Fourier space11,21,23,24,25. Usually, calculation of traction from displacement requires either filtering or regularization approaches to reduce the effect of noise. The TFM methodology is limited by two common serious issues that introduce systematic errors. First, the resolution of the measured traction is usually not high enough to resolve processes at micrometer-sized cellular structures. Secondly, the most commonly used TFM algorithms require the user to choose a filter or a regularization parameter, which introduces a considerable degree of subjectivity regarding smoothness and magnitude of the resulting traction. In this article, we suggest methods for improving the state-of-the art with respect to these issues.

In the standard TFM approach it is assumed that the substrate is a homogeneous, isotropic, and linear elastic half-space. The mechanical model relating a continuous displacement field Ui(x) to the traction force field Fj(x′) on a two-dimensional (x = (x1, x2)) surface of the gel is expressed as the integral equation26

$${U}_{i}({\bf{x}})={\int }_{{\rm{\Omega }}}\,\sum _{j=1}^{2}\,{G}_{ij}({\bf{x}}-{\bf{x}}{\boldsymbol{^{\prime} }}){F}_{j}({\bf{x}}{\boldsymbol{^{\prime} }})\,{{\rm{d}}}^{2}{\bf{x}}{\boldsymbol{^{\prime} }},$$
(1)

where Ω denotes the whole surface of the substrate. The integrand contains a Green’s function Gij(x) = (1 + ν)/(πE)[(1 − ν)δij/r + νxixj/r3], and E and ν represent Young’s modulus and Poisson’s ratio, respectively. We also write r2 = |x|2 and δij is the Kronecker delta function. Calculation of the traction Fj requires inversion of Eq. (1). A very popular and practical approach is to solve Eq. (1) in Fourier space23,24,27. With this approach, the inversion is often directly feasible if noise in the displacement data has been filtered prior to calculation of the traction. Optimal filtering, however, requires input of a prior-defined filter function that imposes a smoothness constraint on the calculated traction. Moreover, spatial clustering of traction into sparse regions is not conserved when switching from real space to Fourier space. To take advantage of the sparsity of traction patterns for better reconstruction, one can solve Eq. (1) in real space. Here, the integral in Eq. (1) can be converted into a matrix product by discretizing the traction field Fj and interpolating it as a piecewise linear, continuous function using pyramidal shape functions h(x)23,28. We write the two-dimensional, discrete displacement field as a 2m × 1 vector u, where m is the number of discretization nodes. The discrete traction field f is a 2n × 1 vector, where n is the number of nodes at which traction is prescribed. Then, Eq. (1) becomes

$${\bf{u}}={\bf{M}}{\bf{f}}+{\bf{s}},$$
(2)

where we also explicitly included the linear acquisition noise s that is present in the experimental data. The matrix M represents the coefficients of a discretized integration. It can be calculated with different techniques. The required convolution of the Green’s function with the shape functions h(x) can be done by numerical integration23. However, the computational effort can be significantly reduced by using the convolution theorem in Fourier space on regular grids28. In the Supplementary S1, we show how this method for fast calculation of M can be extended to irregularly spaced measurements with the help of the shift theorem. Regardless of the way how M is calculated, the condition number of M is typically very large. This means that even the smallest noise s leads to very large errors if a direct inversion of Eq. (2) is attempted. A further characteristic of TFM is that the experimental data is always undersampled, which introduces a large degree of uncertainty. The challenge in traction force microscopy is to turn Eq. (2) into a well-posed problem, while providing a traction field f with correct magnitude at high spatial resolution. The standard approaches employed in TFM are L1 or L2 regularization that penalize norms of the solution to render traction calculation a well-defined numerical problem.

Linear problems involving large, ill-conditioned matrices occur commonly throughout engineering and physics. A number of existing methods have not yet been employed in the context of TFM. For example, the elastic net regularization. Literally, the elastic net regularization behaves like a stretchable fishing net that retains “all the big fish” while removing the small background signal29. An alternative are proximal gradient methods which usually operate in wavelet space and employ adaptive or non-adaptive thresholding of high spatial frequencies30,31. Proximal gradient methods are for instance used for reconstructing lost parts of an image32,33,34, for analysis of MRI data35, and for analysis of genomic data36,37. All regularization methods have the weakness that they require selection of one or more constant regularization parameters to allow discrimination of noise and signal.

Bayesian statistics provides one solution to this problem. From the Bayesian point of view, regularization parameters can be seen as random variables that are picked for every experimental sample from a prior distribution. Thus, one infers the most probable value of the regularization parameters in a conceptually similar way as one infers the solution38. Such methods do not require the choice of a regularization parameter and are therefore potentially less prone to subjectiveness than the classical regularization. The conceptual framework of employing hierarchical priors for data regularization has been exploited for a large number of different applications, for instance in astrophysics39,40,41, machine learning42, mechanical structure monitoring43, face recognition44, and radar imagery45. Moreover, Bayesian methods have been used to infer stresses between cells in a monolayer46. Therefore, it is to be expected that Bayesian analysis can be of great use for traction force reconstruction.

In this work, we systematically compare a range of different approaches for solving Eq. (2). Altogether, we study the performance of seven methods, illustrated as a schematic diagram in Fig. 1(b). First, we test various regularization methods. Among these are the classical TFM methods, L1- and L2 regularization and previously untested methods from computer vision, namely Elastic Net (EN) regularization, Proximal Gradient Lasso (PGL), and Proximal Gradient Elastic Net (PGEN). We find that the new EN regularization scheme has a substantially improved accuracy as compared to previous approaches but requires considerable extra computational cost. Secondly, we seek to establish Bayesian models that can automatically perform an optimal regularization of the data. Initial tests indicate that different freely available Bayesian hierarchical models are of little use for TFM since the large number of hidden variables, even when used with sparsity priors, does not enforce sufficient data faithfulness. Instead, we find that the simplest-possible Bayesian models with global priors yield robust results that can be interpreted as optimal L2 regularization. We study two variants of this algorithm: Bayesian L2 regularization (BL2), where the magnitude of the noise in the displacement data must be measured separately, and Advanced Bayesian L2 regularization (ABL2) which requires no extra input. We test the Bayesian methods using artificial data and real experimental data. Our results suggest that BL2 is not only very robust, but also superior to classical L2 regularization when measurement noise is large. Most importantly, BL2 automatically determines the degree of regularization, which removes subjectiveness from the result. This advance is particularly relevant for in-detail comparison of cells in different conditions, where the varying signal-to-noise ratio previously made an unambiguous comparison challenging.

## Methods

### Regularization

A common heuristic approach to solve Eq. (2) is regularization of the solution through additional constraints. Here, not only the residual of (u − Mf) is minimized in a least-squares sense, but the magnitude of the solution is also penalized through its p-norm denoted by $$\parallel {\bf{f}}{\parallel }_{p}$$. The trade-off between minimization of the residual and minimization of the solution norm is determined by fixed regularization parameters, λ1 and λ2, leading to a minimization problem of the type

$$\hat{{\bf{f}}}=\mathop{{\rm{argmin}}}\limits_{{\bf{f}}}\,[\parallel {\bf{M}}{\bf{f}}-{\bf{u}}{\parallel }_{2}^{2}+{\lambda }_{1}\parallel {{\bf{R}}}_{1}{\bf{f}}{\parallel }_{1}+{\lambda }_{2}\parallel {{\bf{R}}}_{2}{\bf{f}}{\parallel }_{2}^{2}\mathrm{].}$$
(3)

The two norms are explicitly written as $$\parallel {\bf{x}}{\parallel }_{1}={\sum }_{k}\,|{x}_{k}|$$ and $$\parallel {\bf{x}}{\parallel }_{2}^{2}={\sum }_{k}\,{x}_{k}^{2}$$. R1 and R2 are functions that are to be defined, e.g., as the unit matrix I. Of the large number of existing regularization strategies, we will focus on the following:

• L2 regularization, employing an L2-norm with λ2 > 0 and λ1 = 0 to penalize traction magnitude through R2 = I is currently the most common technique used for TFM23,47,48. L2 regularization is also known as ridge regression or Tikhonov regularization49 and this method efficiently produces a continuous and smooth reconstructed traction field. This approach conveys a high level of robustness for the inversion problem in real space and also in Fourier space. See supplementary information for our implementation in real space50.

• L1 regularization, also called Lasso51, is realized through setting λ2 = 0, λ1 > 0, and R1 = I. With L1 regularization, small values of the reconstructed signal are efficiently set to zero. L1 regularization is therefore frequently used in the the field of compressive sensing (CS)52, where the underlying assumption is that the signal can be represented in a sparse form where all but a few components of the signal vanish. Recently, the technique has found use for TFM28,53,54,55 and it is appropriate for traction fields containing few, sparsely located traction hotspots. In this case, it has been found that L1 regularization improves the ability to distinguish different traction hotspots. We employ the popular convex optimization solver CVX56,57, for our implementation of the L1 regularization, see Supplementary Information. Alternatively, an iterative reweighted least squares algorithm can be employed58,59,60, see Supplementary Information and Fig. S6.

• The elastic net regularization combines L1- and L2 regularization. Here, λ1 > 0 and λ2 > 0 and Eq. (3) is modified by setting f → f/(1 + λ2)29. This regularization scheme is known to have a better accuracy compared to L1 and L2 regularization if the coefficient matrix has many correlated entries. EN regularization is well established for a wide variety of applications, most notably the analysis of genetic data36,61,62,63, but has to date not been used for TFM. We employ the convex optimization solver CVX for our implementation, see supplementary information56,57.

• Proximal gradient methods are an alternative approach to the optimization problems arising from the non-differentiable target functions in L1 regularization (PGL) and the EN regularization (PGEN)30,64. Here, the penalty terms are chosen to be wavelet-transforms written as $${{\bf{R}}}_{1}{\bf{f}}=2\,{\sum }_{l}\,\langle {\bf{f}},{{\rm{\Psi }}}_{l}\rangle$$ and $${{\bf{R}}}_{2}{\bf{f}}={\sum }_{l}\,\langle {\bf{f}},{{\rm{\Psi }}}_{l}\rangle$$, where the Ψl constitute an orthonormal Wavelet basis65,66. The optimization problem is solved through iterative soft thresholding, where the regularization parameters control the threshold below which the wavelet-coefficients are set to zero. Proximal gradient methods are widely applied for image inpainting, which is the process of reconstructing lost or deteriorated parts of images32,34,67,68,69. Therefore, these methods may be useful for TFM where traction images are reconstructed from undersampled displacement data. Details regarding our implementation is given in the supplementary information.

These schemes have in common that they require the choice of one or two regularization parameters. Selecting the optimal regularization parameters is often a non-trivial problem. For L2- and L1-regularization, one can use the so-called L-curve criterion50 to find regularization parameters that provide a tradeoff between minimization of residual from the inverse problem and the regularization penalty21,23,28,47,48,70. Usually, the regularization parameter is assumed to be located at the inflection point of a curve described by the norm of the residual versus the norm of the solution in double-logarithmic axes. However, the L-curve criterion is of limited use for real data, since the inflection point does not always exist. Alternatively, multiple inflection points can appear, and the points are hard to localize precisely on the employed logarithmic scales. Moreover, the L-curve criterion does not behave consistently in the asymptotic limit of large system sizes or when the data is strongly corrupted by noise71,72. Hence, in practice, regularization parameters are often chosen by visual inspection of the resulting traction field. This procedure lacks objectivity and significantly biases any conclusions drawn from later analysis of the traction forces. Note that this problem is not specific to regularization, but the issue of distinguishing between noise and “real” signal appears generally with any type of method if the data is processed in any way to reduce noise.

### Bayesian approaches for traction reconstruction

The discrete inverse problem u = Mf + s is of two-fold statistical nature. First, the cellular traction varies from measurement to measurement. Thus, f is a sample drawn from a distribution of possible traction values which we denote by p(f|α) with an undetermined parameter α. The function p(f|α) describes any prior knowledge about the distribution of traction. For reasons that will become clear below, we will assume that the prior distribution for the 2n × 1 vector f is a Gaussian

$$p({\bf{f}}|\alpha )=\frac{\exp [\,-\,\alpha {E}_{{\rm{f}}}({\bf{f}})]}{{Z}_{{\rm{f}}}},$$
(4)

where Zf = (2π/α)n and Ef = fTf/2. Second, the acquisition noise s also presents a source of randomness. Typically, s is assumed to be drawn from a zero-mean Gaussian with unknown variance 1/β38,39,41,73. In the language of Bayesian statistics, the probability distribution p(u|f, β) is called the likelihood function and determines the probability to to measure a particular vector u given a traction vector f. Since the noise is Gaussian, the likelihood function is

$$p({\bf{u}}|{\bf{f}},\beta )=\frac{\exp [\,-\,\beta {E}_{{\rm{u}}}({\bf{u}}|{\bf{f}})]}{{Z}_{{\rm{u}}}},$$
(5)

where Eu(u|f) = (Mf − u)T(Mf − u)/2 and Zu = (2π/β)m. m is, as above, the number of two-dimensional displacements. The likelihood function p(u|f, β) describes a situation that is exactly the reverse of the experimental situation, where we are looking for the probability of having f given measurements u. This situation is described by the posterior distribution p(f|u) and can be related to the likelihood via Bayes’ rule

$$P({\bf{f}}|{\bf{u}},\alpha ,\beta )=\frac{p({\bf{u}}|{\bf{f}},\beta )p({\bf{f}}|\alpha )}{p({\bf{u}}|\alpha ,\beta )}=\frac{\exp [\,-\,\beta {E}_{{\rm{u}}}({\bf{u}}|{\bf{f}})]}{{Z}_{{\rm{u}}}}\frac{p({\bf{f}}|\alpha )}{p({\bf{u}}|\alpha ,\beta )}.$$
(6)

Here, the marginal likelihood p(u|α, β) is the overall probability of finding the displacements u when the traction distributions are integrated out. Thus, p(u|α, β) is also called evidence for the model with {α, β, u}.

Assuming that α and β are known constants, one can maximize the posterior probability P(f|u, α, β) with respect to f. The resulting solution then satisfies $${{\bf{f}}}_{{\rm{MP}}}={{\rm{argmin}}}_{{\bf{f}}}[\beta \parallel {\bf{M}}{\bf{f}}-{\bf{u}}{\parallel }_{2}^{2}+\alpha \parallel {\bf{f}}{\parallel }_{2}^{2}]$$, which is exactly the formula employed for L2 regularization, Eq. (3), if the parameters α and β are related to the L2 regularization parameter as λ = α/β74. Thus, our choice of a Gaussian prior is justified if we intend to perform an L2 regularization. Other popular choices for replacing Eq. (5) as prior are the Laplace distribution $$p({\bf{f}}|\theta )=(\theta /\mathrm{2)}\,\exp \,(\,-\,\theta \mathrm{/2}\parallel {\bf{f}}{\parallel }_{1})$$75, and a product of a Gaussian and a Laplace distribution76. Using these priors, we would have found the formulas corresponding to L1 regularization and EN regularization, respectively. Thus, regularization is equivalent to maximizing the posterior probability of a measurement assuming fixed, known parameters of the prior distributions.

However, α and β can also be treated as variables whose values can be determined by maximizing their probability p(α, β|u) = p(u|α, β)p(α, β)/p(u) ~ p(u|α, β), where we assume a uniform prior p(α, β) and we can omit the marginal probability p(u) since it plays no role for the optimization. To calculate the evidence p(u|α, β), we need to integrate out f in the posterior given in Eq. (6). Due to the Gaussian probabilities, the integration can be done analytically. For convenience, we expand the integrand around the most probable value fMP. On defining $$K({\bf{f}})\equiv \alpha {E}_{{\rm{f}}}({\bf{f}})+\beta {E}_{{\rm{u}}}({\bf{f}})$$ and its Hessian, $${\bf{A}}\equiv \nabla \nabla {\bf{K}}({\bf{f}})$$, we expand as K(f) ≈ K(fMP) + (f − fMP)TA(f − fMP)/2. Thus, the evidence becomes

$$\begin{array}{rcl}p({\bf{u}}|\alpha ,\beta ) & = & {\int }_{{\bf{f}}}\,p({\bf{u}}|{\bf{f}},\beta )p({\bf{f}}|\alpha ){{\rm{d}}}^{2n}{\bf{f}}\\ & = & \tfrac{1}{{Z}_{{\rm{u}}}{Z}_{{\rm{f}}}}\,{\int }_{{\bf{f}}}\,\exp [\,-\,K({\bf{f}})]\,{{\rm{d}}}^{2n}{\bf{f}}\\ & = & \tfrac{{\mathrm{(2}\pi )}^{n}{({\rm{\det }}{\bf{A}})}^{-\mathrm{1/2}}}{{Z}_{{\rm{u}}}{Z}_{{\rm{f}}}}\exp [\,-\,{\bf{K}}({{\bf{f}}}_{{\rm{MP}}})],\end{array}$$
(7)

which is an exact result. The logarithm of the evidence is

$$\begin{array}{rcl}\mathrm{log}\,p({\bf{u}}|\alpha ,\beta ) & = & -\,\alpha {E}_{{\rm{f}}}({{\bf{f}}}_{{\rm{MP}}})-\beta {E}_{{\rm{u}}}({{\bf{f}}}_{{\rm{MP}}})-\frac{1}{2}\,\mathrm{log}({\rm{\det }}\,{\bf{A}})\\ & & +\,n\,\mathrm{log}\,\alpha +m\,\mathrm{log}\,\beta -m\,\mathrm{log}\,\mathrm{(2}\pi \mathrm{).}\end{array}$$
(8)

The right hand side of this equation is a typical example for target functions employed in Bayesian analysis, for example in the context of data fitting38. For TFM, numerical calculation of log(detA) requires some care. We employ here a Cholesky decomposition of the positive matrix A = LLT yielding $$\mathrm{log}({\rm{\det }}\,{\bf{A}})=\,\mathrm{log}({\rm{\det }}({\bf{L}}{{\bf{L}}}^{{\bf{T}}}))\,=$$ $$2\,\mathrm{log}\,{{\rm{\Pi }}}_{i}{L}_{ii}=2{{\rm{\Sigma }}}_{i}\,\mathrm{log}({L}_{ii})$$.

The logarithmic evidence, Eq. (8), assumes a maximum at those parameters $$\hat{\alpha }$$ and $$\hat{\beta }$$ that are most likely associated with the measurement u. The implicit equations resulting from maximizing Eq. (8) read38 $$2\hat{\alpha }{E}_{{\rm{f}}}^{{\rm{MP}}}=2n-\hat{\alpha }{\rm{Tr}}{{\bf{A}}}^{-1}$$ and $$2\hat{\beta }{E}_{{\rm{u}}}=2m-2n+\hat{\alpha }{\rm{Tr}}{{\bf{A}}}^{-1}$$. Since fMP and A depend on α and β, these equations need to be solved numerically. Once $$\hat{\alpha }$$ and $$\hat{\beta }$$ are determined, the optimal L2 regularization parameter follows as $$\hat{\lambda }=\hat{\alpha }/\hat{\beta }$$.

We employ two approaches for determining the numerical values of α and β. In the first approach, called Bayesian L2 regularization (BL2), we estimate the inverse noise variance β directly from the data calculating the variance of the measured displacements in spatial regions that are very far away from any cell. Thus, in BL2 only α is determined through maximization of Eq. (8). In the second approach, termed advanced Bayesian L2 regularization (ABL2), we solve directly for α and β, which requires an increased computational effort. For both approaches, it is imperative to standardize the data to adjust its spread in different dimensions. For a displacement vector u of length 2m, we first subtract the mean $$\tilde{{\bf{u}}}={\bf{u}}-{{\bf{1}}}_{2m}\bar{u}$$ with $$\bar{u}={\sum }_{i=1}^{2m}\,{u}_{i}/\mathrm{(2}m)$$. Next, we calculate the mean and standard deviation for all columns of the matrix M as $${\bar{M}}_{j}=1/(2m)\,{\sum }_{i=1}^{2m}\,{M}_{ij}$$ and $${\omega }_{j}={({\sum }_{i=1}^{2m}{({M}_{ij}-{\bar{M}}_{j})}^{2}/\mathrm{(2}m-\mathrm{1)})}^{\mathrm{1/2}}$$. Thus, we can define a problem matrix where each column is normalized by its spread $${\tilde{M}}_{ij}=({M}_{ij}-{\bar{M}}_{j})/{\omega }_{j}$$. The standardized problem therefore reads $${\tilde{u}}_{i}={\tilde{M}}_{ij}\,{\tilde{f}}_{j}$$, which yields $${f}_{i}={\tilde{f}}_{i}/{\omega }_{i}$$.

### Generation of artificial test data

To quantitatively compare the performance of different reconstruction methods, we require artificial data with exactly known traction force and displacements. The process of generating this data is shown in Fig. 2(a,i–iv) and involves prescribing traction force magnitude and direction in distributed circular areas, analytical calculation of the resulting displacements23, sampling displacements at discrete positions and addition of noise. The Supplementary Information S5 provides further details on the involved analytical calculations. Throughout the article, artificial test data is generated for gel substrates with a Young’s modulus of E = 10 kPa and a Poisson’s ratio of ν = 0.3. The size of the image plane is arbitrary, but fixed to 25 μm × 25 μm and involves 9 or 15 circular traction spots. For these fixed geometries we vary the traction magnitude, density of displacements, and the noise level.

### Evaluation metrics for assessing the quality of traction reconstruction

To evaluate the quality of the reconstructed traction, we introduce four different error measures comparing reconstructed traction and known original traction. For this purpose, traction at every grid node is written as a two-dimensional vector t = {tx, ty}. Real traction and reconstructed traction are discriminated by superscripts as treal and trecon. The error measures are calculated by discriminating traction inside and outside of Ni circular traction patches in a test sample.

• The Deviation of Traction Magnitude at Adhesions (DTMA)23 is defined as

$${\rm{DTMA}}=\frac{1}{{N}_{i}}\,\sum _{i}\,\frac{{{\rm{mean}}}_{j}\,(\parallel {{\bf{t}}}_{j,i}^{{\rm{recon}}}{\parallel }_{2}-\parallel {{\bf{t}}}_{j,i}^{{\rm{real}}}{\parallel }_{2})}{{{\rm{mean}}}_{j}\,(\parallel {{\bf{t}}}_{j,i}^{{\rm{real}}}{\parallel }_{2})},$$
(9)

where Ni is the number of circular traction patches and the index i runs over all patches. The index j runs over all traction vectors in one patch. A DTMA of 0 represents a perfect average traction recovery and a negative or positive value implies underestimation or overestimation, respectively.

• The Deviation of Traction Magnitude in the Background (DTMB) is the normalized difference between the reconstructed and real traction magnitude outside the circular patches

$${\rm{DTMB}}=\frac{{{\rm{mean}}}_{k}\,(\parallel {{\bf{t}}}_{k}^{{\rm{recon}}}{\parallel }_{2}-\parallel {{\bf{t}}}_{k}^{{\rm{real}}}{\parallel }_{2})}{\frac{1}{{N}_{i}}\,\sum _{i}\,{{\rm{mean}}}_{j}\,(\parallel {{\bf{t}}}_{j,i}^{{\rm{real}}}{\parallel }_{2})},$$
(10)

where the index k runs over all traction vectors outside the patches. A DTMB with a magnitude much smaller than unity implies low background noise in the reconstructed traction.

• The Signal to Noise Ratio (SNR) for TFM

$${\rm{SNR}}=\frac{\frac{1}{{N}_{i}}\,\sum _{i}\,{{\rm{mean}}}_{j}\,(\parallel {{\bf{t}}}_{j,i}^{{\rm{recon}}}{\parallel }_{2})}{{{\rm{std}}}_{k}({{\bf{t}}}_{k}^{{\rm{recon}}})}\mathrm{.}$$
(11)

measures the detectability of a real signal within a noisy background77. As before, the index k runs over all traction vectors outside the patches while j is the index of each traction vector in the patch i. The value of the SNR runs from 0 to infinity where a SNR that is much larger than unity indicates a good separation between traction and noise.

• The Deviation of the traction Maximum at Adhesions (DMA) measures how peak-values of the traction over- or underestimate the true value. The quantity is defined as

$${\rm{DMA}}=\frac{1}{{N}_{A}}\,\sum _{i}\,\frac{[{{\rm{\max }}}_{j}\,(\parallel {{\bf{t}}}_{j,i}^{{\rm{recon}}}{\parallel }_{2})-{{\rm{\max }}}_{j}\,(\parallel {{\bf{t}}}_{j,i}^{{\rm{real}}}{\parallel }_{2})]}{{{\rm{\max }}}_{j}\,(\parallel {{\bf{t}}}_{j,i}^{{\rm{real}}}{\parallel }_{2})},$$
(12)

where the maxima of traction magnitude are calculated for each traction patch separately through index j. This error measure is particularly important since traction maxima are easy to extract from real experimental data. A DMA of 0 means that the local traction maxima in the reconstruction and in the original data are equal. Positive or negative values of the DMA indicate that the maximum of traction is overestimated or underestimated.

### Experimental procedures

Primary murine podocytes were isolated and maintained by following previously published protocols78. In brief, mGFP positive podocytes were isolated from mTom/mGFP*Nphs2Cre reporter mice and subsequent FACS based purification resulted in a primary podocyte culture of highest purity79. Mice were housed in a SPF facility with free access to chow and water, according to the NIH guide for the care and use of Laboratory animals as well as the German law for the welfare of animals (kept at 12 hour day/night cycle). All animal experiments were approved by local authorities (Regierungspraesidium Freiburg, Germany - G11/51). TFM substrates were prepared according to previously established protocols74. For all experiments, gels were prepared from 12% acrylamide and 0.15% Bis-acrylamide and contained fluorescent beads. Elastic properties of this gel are characterized by a Young’s modulus of ~49 kPa and we assumed a Poisson’s modulus of 0.47. The gel surface was covered with the crosslinker Sulfo-SANPAH and Fibronectin solution was added. Crosslinking was enhanced by exposure of the gels to UV light for 5 minutes. After letting the reaction proceed for 12 hours, the gels were thoroughly washed with water and PBS. Subsequently, primary podocytes were seeded and cultivated for 12–16 hours. Then, coverslips were placed in flow chambers and images of beads and cells were recorded on an inverted confocal microscope with a 63x objective. After recording images of a cell and of the beads below it, the cell was completely removed from the substrate by using a micromanipulator (Eppendorf). Subsequently, images of the fluorescent beads in the relaxed gel were recorded, yielding an unstressed reference state. Substrate deformations between the stressed state with cells and the unstressed reference state were quantified using correlation-based tracking of the beads.

Embryonic rat heart muscle cells were obtained from pregnant rats (Wistar, Charles River, Sulzfeld) at 18–19 days of gestation, details are published elsewhere80. Before decapitation, the pregnant rat was anesthetized with CO2. Afterwards the rat embryos were taken and decapitated immediately (Animal testing license number 84-02.04.2015.A173, Landesamt fuer Natur, Umwelt und Verbraucherschutz Nordrhein-Westfalen, Duesseldorf, Germany). Cover slides were coated with approximately 70 μm thick silicone elastomer layer produced from a commercial two-component formulation (Sylgard 184, Dow Corning; mixing ratio 50:1 base to crosslinker by weight; cured overnight at 60 °C). These substrates contained fluorescent beads in their uppermost layer (FluoSpheres Crimson carboxylate-modified beads; Invitrogen) and were coated with fibronectin before cell seeding. Details on sample preparation and cell culture are published elsewhere80. Calibration of stiffness81 yielded a Young’s modulus of 15 kPa and a Poisson’s ratio of 0.5. Live cell microscopy on spontaneously beating cardiac myocytes was performed and positions of fluorescent beads were determined by cross-correlation80,82.

## Results

### Manual selection of optimal regularization parameters is challenging

The optimal regularization parameters λ1/2 in Eq. (3) are usually unknown. Classical methods for their choice are the L-curve criterion71,83 or the generalized cross validation (GCV) for L2 regularization50,84. However, these two methods hardly ever produce the same parameter values and results can differ substantially in the presence of noise, see Supporting Fig. S1. To illustrate the strong effect of regularization on traction reconstruction, we focus on artificial test data where the underlying traction pattern is known. Figure 2(a) illustrates the generation of artificial traction fields consisting of circular patches each exerting 100 Pa. Figure 2(c) demonstrates how variation of the regularization parameters affects the error of traction reconstruction with different methods. Note that the errors exhibit minima for intermediate values of the regularization parameters. For the methods shown in panels i, iv, v of Fig. 2(c) (L2 regularization, PGL, PGEN), minima occur in the positive error of the background traction DTMB. In contrast, L1 regularization shown in panel ii of Fig. 2(c) exhibits a maximum in the DTMA, indicating that the average traction magnitude is estimated reasonably accurately here.

The occurrence of clear minima in the error measures suggests that the corresponding regularization parameter values produce a faithful traction reconstruction. Indeed, employing the values corresponding to the error minima yields traction fields that visually compare well with the original data, see Fig. 2(a,d). Note that for L1 regularization, the reconstruction clearly overestimates the maximum traction locally. As shown in the Supplementary Fig. S2(b), the overestimation of the maximum quantified by the DMA can only be reduced through ~10 fold reduction of λ1, which however leads to strong background traction and suppression of real traction, see also Fig. S6. While the minima of the error measures in Fig. 2(c) allow to determine a “best” regularization for test data, the resulting regularization parameter values deviate from those suggested by the L-curve criterion, see green lines in Fig. 2(c). Moreover, the L-curves for these samples are complex and exhibit multiple turning points, illustrating the difficulty in choosing the right regularization parameter in experiments, see Supporting Fig. S5.

### The elastic net outperforms other regularization methods for traction reconstruction

To facilitate quantitative comparison of different reconstruction methods, we employ artificial data consisting of 15 circular traction spots with traction magnitude between 0 Pa and 250 Pa, see Fig. 3(a). Gaussian noise with a standard deviation given in percent of the maximal absolute value of the of true displacements is added. The spots have a diameter of 2 μm and the mesh constant for traction reconstruction is 0.5 μm.

Results from different regularization approaches are shown in Fig. 3(a). The figure illustrates that L2 regularization can yield realistic estimates for the absolute magnitude of traction on the spots but produces a strong traction background which may render identification of traction sites difficult. The opposite deficiencies occur for results from L1 regularization. Here, the background is nicely suppressed, which can allow excellent resolution of very small traction spots. However, the peak tractions are significantly overestimated, which can not be mitigated by increasing the regularization parameter, see the Supplementary Fig. S6. Note that the quality of L1 regularization can be improved by using an Iterative Reweighted Least Squares algorithm and the solution from the L2 regularization as an initial guess, see Fig. S6 in the Supplemental Material. The best results are obtained with the EN regularization which combines the advantages of L1- and L2-regularization. Here, we obtain a clean background combined with acceptable accuracy in the absolute traction magnitude on the circular patches. The results from the proximal gradient methods PGL and PGEN qualitatively have a smooth appearance with a level of background traction that is between those of L2 and L1. Figure 3(b) quantifies the described differences between the regularization methods through the error in traction magnitude on the traction spots (DTMA), the error in traction magnitude in the background (DTMB), signal to noise ratio (SNR), and error in maximum traction on the spots (DMA). The Supplementary Fig. S9 contains additional plots of these quantities. We find that the reconstruction quality of traction and background improves with increasing number of displacement measurements m. Furthermore, EN regularization outperforms other regularization methods with regard to reconstruction accuracy of undersampled data (m/n < 1). However, the advantage of EN regularization comes at a significantly increased computation time and memory requirement as shown in Table 1.

### Bayesian variants of the L2 regularization allow parameter-free traction reconstruction

We next consider the performance of the two Bayesian methods, BL2 and ABL2, that allow automatic choice of the optimal L2 regularization parameter, as schematically shown in Fig. 4(a). Both methods select the optimal regularization parameter by maximizing the logarithmic evidence, Eq. (8). As illustrated in Fig. 4(a), the regularization parameter is here deduced from the parameters β and α, characterizing the distributions of measurement noise and traction respectively. We first employ the same test data as used for Fig. 3, containing 5% Gaussian noise in the displacements with β = 400 Pix−2.

With BL2, the log evidence exhibits a clear maximum in a one-dimensional space as seen in Fig. 4(c). Figure 4(d) shows the reconstructed traction employing the optimal parameter $${\hat{\lambda }}_{2}=76.75\,{{\rm{Pa}}}^{2}{/\mathrm{Pix}}^{2}$$. Visual comparison of the color-coded traction magnitude in Fig. 4(b,d) clearly shows that the reconstructed traction has the correct range.

For ABL2, the evidence is a function of β and α as seen in Fig. 4(e). Numerical localization of the maximum yields $$\hat{\alpha }=3.06e4\,\,{{\rm{Pa}}}^{-2}$$ and $$\hat{\beta }=394\,\,{{\rm{Pix}}}^{-2}$$, which is very close to the known input value of β = 400 Pix−2. The optimal regularization parameter in this case is thus $${\hat{\lambda }}_{2}=\hat{\alpha }/\hat{\beta }=77.66\,{{\rm{Pa}}}^{2}{/\mathrm{Pix}}^{2}$$, which agrees well with the estimate from BL2 (76.75 Pa2/Pix2). The resulting traction map is is shown in Fig. 4(f) and is very similar to the traction map resulting from BL2 in Fig. 4(d). Thus, BL2 and ABL2 yield consistent parameter estimates that produce traction reconstruction of good accuracy. See Supplementary Fig. S9 for a comparison of the Bayesian methods with non-Bayesian approaches.

As with other regularization approaches, quality of reconstruction strongly depends on the present noise. When the magnitude of the noise is comparable to the magnitude of the displacements caused by the traction ($${\sigma }_{{\bf{n}}}/{\sigma }_{\bar{{\bf{u}}}}\approx 1$$), little information can be recovered. For instance, the circular spots with weak traction labeled 1 and 2 in Fig. 3(a) are almost impossible to detect in the presence of 5% noise but can be reconstructed in the noise-free case, see the Supplementary Fig. S7. To quantitatively assess the fidelity of reconstruction with small traction forces, we employ a constant 5% but scale the tractions to mean values of (12 Pa, 16 Pa, 60 Pa, and 120 Pa). The resulting relative strength of noise and displacements is quantified through the ratio of standard deviations $${\sigma }_{{\bf{n}}}/{\sigma }_{\bar{{\bf{u}}}}$$, which is plotted against our reconstruction quality measures in Fig. 4(g). For comparison, results from manual selection of the regularization parameter using the L-curve criterion are also given. The reconstruction qualities of BL2, ABL2 and L2 are similar when $${\sigma }_{{\bf{n}}}/{\sigma }_{\bar{{\bf{u}}}}\ll 1$$ (high traction). However, BL2 and ABL2 have an improved signal to ratio SNR compared with the L-Curve approach when $${\sigma }_{{\bf{n}}}/{\sigma }_{\bar{{\bf{u}}}}$$ approaches unity, see (iii). This is due to the difficulties with the L-curve criterion at high noise. The logarithmic evidence function exhibits in all cases a clear maximum that enables robust and reliable choice of optimal parameters with BL2 and ABL2. In general, the results from BL2 are however more reliable since the optimization involves here only one parameter. Overall, the tests with artificial data show that these Bayesian methods containing few additional parameters to be determined from the data can resolve the ambiguity associated with manual choice of the regularization parameters over a wide range of signal strengths $${\sigma }_{{\bf{n}}}/{\sigma }_{\bar{{\bf{u}}}} < 1$$.

BL2 and ABL2 are based on the simplest structure of a Bayesian model with only one, global prior distribution. One may hypothesize that more complex hierarchies of priors yield an improved traction estimate. For instance, it is possible to prescribe a position-dependent prior for sparse traction patterns through hierarchical Bayesian networks. Such methods require more advanced techniques for sampling of the probability distributions and optimization, such as variational techniques or Markov-chain Monte Carlo methods. We have tested three such algorithms that were originally developed for purposes other than TFM75,85,86,87. Results are shown in the supplementary information. However, the tested algorithms all produce highly overestimated, localized traction patterns that sensitively depend on noise. Such errors are likely due to the many free parameters of the models that do not favor a faithful data reconstruction in spite of the sparsity constraints. Thus, our tests suggests that these hierarchical network models are not suited for the inverse problem associated with TFM.

### Test of methods with experimental data

To compare the performance of the different methods for real cells, we employ primary mouse podocytes studied with a standard TFM setup. The deformation field resulting from cellular traction is shown in Fig. 5(a). Using this displacement data, we find that the variance of the noise is ~0.01 pix2 = 103.4 nm2 in regions that are far away from the cell. The maximum displacement is 0.52 μm. Figure 5(b–h) show reconstruction results using all methods discussed above. As for artificial data, we find here that the EN regularization results in a very clear background. The traction magnitudes and shapes of the traction spots are similar to those resulting from regularization with the L2 method. For L1 regularization, traction localizes in sparse regions and has a significantly higher value than for other methods. Proximal gradient methods produce smooth traction profiles as expected from the use of the soft wavelet thresholding. The magnitude of traction measured with PGL and PGEN is close to results of EN and L2.

Next, we considered the performance of the Bayesian methods. The logarithmic evidence, Eq. (8), calculated with BL2 and ABL2 reveals pronounced maxima, allowing to robustly choose the optimal parameters for the experimental data. See also Supplementary Fig. S11. The resulting values for $${\hat{\lambda }}_{2}$$ are 30.4 Pa2/Pix2 and 24.4 Pa2/Pix2 for BL2 and ABL2, respectively, and thus agree reasonably well with each other. For BL2, only a single maximum of the evidence was found in all our tests. For ABL2, we found that further extrema may occur at the boundary of the region of scanned parameters. Figure 5(e,h) show the traction fields calculated with BL2 and ABL2. These traction fields are visually very similar to the one obtained with standard L2 regularization. However, the L-curve criterion provides a much more uncertain estimate of a regularization parameter due to the difficulty of localizing it on a logarithmic scale, see Supplementary Fig. S12. Note that the regularization parameters obtained from the L-curve criterion can not be directly compared to the parameters resulting from the Bayesian methods due to standardization employed for the latter. Overall, the suggested Bayesian models can eliminate ambiguity in TFM by automatically providing a consistent parameter choice.

### Bayesian regularization enables consistent analysis of traction time sequences

TFM is frequently employed to study dynamical aspects of cell mechanics. Examples include cell migration, cell division, or cytoskeletal reorganization in response to extracellular stimuli. Such processes are usually accompanied with a change in the traction distribution. As a result, the optimal regularization parameter varies among different images in a time sequence of microscopy data. Additionally, the regularization parameter can also change if the degree of noise varies over time, which can be caused for example by stage drift or photo bleaching. In such cases, it is very challenging to perform a consistent, frame-by frame analysis to determine the degree of regularization with conventional methods. Thus, one fixed parameter is commonly employed for the whole time sequence whereby precision and accuracy of traction reconstruction are sacrificed.

To test whether our Bayesian methods can be useful in this situation, we employ TFM data with a spontaneously beating cardiac myocyte, see Fig. 6. Due to the large size of the cell, we focus on a region of interest shown in Fig. 6(a). The analyzed time sequence corresponds to one cell contraction. Snapshots from frames 1, 4, and 6 are shown for illustration. Figure 6(b) shows the overall norm of reconstructed traction where λ2 is either chosen according to the L-curve criterion, held at an intermediate, constant value, or automatically determined in BL2. The overall traction magnitudes are similar in frames 2–6 where traction is high. Differences occur, however, in the low-traction regime, where BL2 systematically yields lower values of traction. We expect that the results from BL2 are more trustworthy in this regime since the L-curve criterion yields highly ambiguous values for the regularization parameters, see Supplementary Fig. S13. Figure 6(c) shows the overall norm of the gel displacement and the optimal regularization parameter estimated with BL2. The noise variance is small, ~0.00003 pix2, in regions that are far away from the cell (pixel size 0.2 μm). As expected, λBL2 is inversely correlated with the displacement magnitude.

Figure 6e shows snapshots of the resulting traction fields that illustrate again that BL2 produces slightly different results for low traction, most apparent in Fig. 6(e)i,I and (f),i. Note that the traction field resulting from classical L2 regularization in Fig. 6(e),i,I shows a noise background outside of the cell that is almost comparable to the real cellular traction. In contrast, BL2 suppresses this background at the price of an apparently reduced spatial resolution as seen in Fig. 6(f),i. However, this provides an objective distinction between real signal and noise, which is what is to be expected from a faithful data reconstruction.

## Discussion

During the last decade, traction force microscopy has become one of the most popular techniques for studying mechanobiology on the cellular level. The technique has found broad use for studying sub-cellular structures70,88,89,90,91, single cell dynamics92,93,94,95, and collective cellular behavior96,97,98,99,100,101 and there are far too many applications to be reviewed here appropriately.

Many, if not all, TFM methods critically rely on some form of noise reduction. Usually, traction is calculated from substrate displacement through the solution of a linear problem involving elastic Green’s functions. Here, the effects of noise are not a technical issue relating to the data precision, but connected directly to the structure of the linear problem where even the slightest numerical noise can be amplified to an extent that the true solution is entirely lost. The most immediate approach to deal with noise is to filter the displacement field prior to traction reconstruction. Filtering becomes possible if the solution is calculated in Fourier space because the convolution theorem simplifies the matrix inversion24. However, filtering the input data generally reduces the spatial resolution and optimal resolution can only be gained if the filter is adapted for each sample. In certain cases, moreover, data filtering is not sufficient to guarantee stability of the solution, for example, if the three-dimensional position of displacements is included.

A popular alternative strategy for enforcing well-behaved solutions is regularization. With regularization, Fourier-space inversion becomes more robust21,23. However, regularization is also used for real-space approaches and has been used in conjunction with finite element methods or boundary element methods. Solving the linear problem in real space is generally more demanding, but has the advantage that the spatial sparsity of traction patterns is conserved. For TFM, two regularization methods have to date been used, namely L2 regularization23,74,77 and L1 regularization28,54,55. These methods each have one regularization parameter that is chosen manually based on heuristics, which introduces a considerable degree of subjectivity in the resulting traction.

In this work, we systematically compare the classical L1- and L2 regularization to three other methods that have, to our knowledge, not yet been employed for TFM. These three regularization methods are the Elastic Net (EN), Proximal Gradient Lasso (PGL) and Proximal Gradient Elastic Net (PGEN). Our tests with artificial data clearly demonstrate that EN regularization outperforms other regularization methods with regard to the reconstruction quality. Here, accurate traction reconstruction is due to a simultaneous suppression of background noise and penalization of large traction magnitude. In contrast, the proximal gradient methods PGL and PGEN are effective at producing a smoothed traction field, due to the local removal of high-frequency spatial variations. These results obtained with artificial data agree qualitatively with results from tests with experimental data. Here too, L1 and L2 regularization yield overestimated or underestimated traction on adhesion sites. EN again yields a clear background without producing excessively sharp traction peaks at adhesions, see Fig. 5. PGL and PGEN yield smooth traction fields and rounded adhesion site contours. While our work presents a comprehensive overview of regularization variants in TFM, it does not cover all variants and solution procedures. For example, an Iterative Reweighted Least Squares algorithm for L1 regularization outperforms in our experience the L1 regularization method studied here, see Supplementary Material. Also, it has been suggested to use an L1 norm for both the residual and regularization term55, and various other iterative regularization procedures can be tested for TFM in the future.

Next, we ask if Bayesian methods can eliminate the necessity of a manual choice of regularization parameters. Here, the corresponding parameter values are inferred by maximizing their evidence given a fixed class of chosen probability distributions. Using the simplest approach, our prior assumption on the traction forces is that they are drawn from one global Gaussian distribution with an unknown variance 1/α. The posterior distribution determining the probability of a particular traction field given a measured displacement field is then determined by the parameter 1/α and a further parameter 1/β, quantifying the variance of the measurement noise. For fixed values of α and β, maximization of the posterior distribution corresponds exactly to L2 regularization with λ2 given by α/β. However, the values of α and β can also be determined through maximizing their probability conditioned on a given measurement and the chosen probability distributions. Here, this is equivalent to maximizing the evidence for u, given any two parameter values. We refer to the simultaneous determination of both parameters from the evidence as advanced Bayesian L2 regularization (ABL2). In an even simpler approach, we estimate the noise strength directly from the displacement data, leaving only one parameter α to be determined by maximization of the evidence; which we call Bayesian L2 regularization (BL2). These methods represent an automatic optimization of the L2 regularization. Thus the resulting traction field has all the qualitative features of L2 regularization, including the suppression of exceedingly high traction values and a visible background traction. For all our tests, we found that BL2 was a robust method yielding reasonable estimates for traction and regularization parameters. Due to the difficulty in choosing the correct regularization parameter manually, BL2 has substantial advantages over the classical L2 procedure, in particular if the traction is so small that the resulting displacements are comparable to the noise $${\sigma }_{{\bf{n}}}/{\sigma }_{\bar{{\bf{u}}}}\approx 1$$.

We mention that we have also tested more elaborate hierarchical Bayesian network algorithms that were originally designed for other purposes than for use with TFM. These include a variational approach termed “Bayesian compressive sensing using Laplace priors” (BCSL)75, and Markov chain Monte Carlo methods, for instance the “Bayesian Lasso”87. In our experience, however, none of these methods could compete with the much simpler Bayesian L2 regularization when applied to TFM problems, see Figs S8S10 and S14 in the Supplementary Information.

The advantage of employing Bayesian traction reconstruction is most apparent when cells in different conditions are to be compared. To perform a correct comparison of situations with different traction, different substrate rigidities, etc., it is technically necessary to adapt the regularization parameter for each case. However, the difficulty of finding the corresponding parameters usually makes this impossible, which introduces significant quantitative errors. Bayesian methods present a possible solution to this problem. We have shown here that BL2 produces a regularization that adapts smoothly and robustly to changes of cellular traction. Thus, we expect that this method can be of wide use for quantitative studies of cell physiology.