## Introduction

Nonequilibrium dynamics is an essential physical feature of biological and active matter systems1,2,3. By harvesting a fuel—in the form of solar energy, a redox potential, or a metabolic sugar—the molecular dynamics in these systems differs profoundly from the equilibrium case. Some of the fuel’s free energy is utilized to perform work or is stored in an alternative form, but the remainder is dissipated into the environment, often in the form of heat1,4. The energetic loss can alternatively be cast as an increase in entropy of the environment, and the entropy production is associated with broken time-reversal symmetry in the system’s dynamics5,6,7. This connection has been leveraged to experimentally classify particular biophysical processes as thermal or active8,9 based on the existence of probability currents10,11. There is great interest in going beyond this binary classification—thermal versus active—to experimentally quantify how active, or how nonequilibrium, a process is12,13,14. Such a quantification could, for example, provide insight into how efficiently molecular motors are able to work together to drive large-scale motions15,16,17,18,19.

One way to quantify this nonequilibrium activity is to measure the dissipation rate, or how much free energy is lost per unit time. In a biophysical setting, a direct local calorimetric measurement is challenging, but signatures of the dissipation are encoded in stochastic fluctuations of the system20, even far from equilibrium21,22,23,24,25,26,27,28,29. We set out to develop and explore strategies for inferring the dissipation rate from these experimentally accessible nonequilibrium fluctuations. In a system of interacting driven colloids, where all degrees of freedom are tracked, Lander et al. have indirectly measured dissipation from fluctuations27. However, it should also be possible to bound dissipation on the basis of nonequilibrium fluctuations in a subset of the relevant degrees of freedom. As a tangible example of our motivation, consider the experiment of Battle et al., which tracked cilia shape fluctuations to determine that the cilia dynamics were driven out of equilibrium9. With suitable analysis of those shape fluctuations, might one determine, or at least constrain, the free-energy cost of sustaining the cilia's motion?

Though our ultimate motivations pertain to these complex systems, here we present an exhaustive analytical and numerical study of a tractable model30. Using a model consisting of beads coupled by linear springs, we demonstrate how the statistical properties of trajectories provide information about the dissipation rate. The bead-spring model furthermore allows us to address several practical considerations that will be important for future experimental applications of the inference techniques: how much data is required, what role coarse graining plays, and what can be done about the curse of dimensionality. We show that fluctuations in nonequilibrium currents can provide a route to bound the dissipation rate, even in high-dimensional dynamical systems operating outside the linear-response regime. We anticipate that many of these insights will support the analysis of data from experimentally accessible biological and active matter systems.

## Results

As one of the simplest possible nonequilibrium models, we consider two coupled beads, each allowed to fluctuate in one dimension. The beads are connected to each other and to the boundary walls by linear springs with stiffness k (see Fig. 1). We imagine that the beads are embedded in two different viscous fluids, one hot with temperature $$T_{\mathrm{h}}$$ and the other cold with temperature $$T_{\mathrm{c}}$$. These fluids exert a friction γ on each bead, absorbing energy from the beads' motion. In the absence of coupling between the beads, the average rate at which each thermal bath injects energy exactly balances the rate at which it absorbs energy through frictional drag. When the beads are coupled, however, there is a net steady-state rate of heat flow $$\dot Q_{{\mathrm{ss}}}$$ from the hot reservoir into the system and out to the cold reservoir. The hot reservoir's entropy changes with rate $$\dot S_{\mathrm{h}} = - \dot Q_{{\mathrm{ss}}}/T_{\mathrm{h}}$$ while the cold reservoir's entropy increases with rate $$\dot S_{\mathrm{c}} = \dot Q_{{\mathrm{ss}}}/T_{\mathrm{c}}$$. In total, the steady-state entropy production rate can therefore be written as

$$\dot S_{{\mathrm{ss}}} = \dot S_{\mathrm{h}} + \dot S_{\mathrm{c}} = \dot Q_{{\mathrm{ss}}}\left( {T_{\mathrm{c}}^{ - 1} - T_{\mathrm{h}}^{ - 1}} \right).$$
(1)

This equation expresses the entropy production rate as the product of a flux $$\dot Q_{{\mathrm{ss}}}$$ and the conjugate thermodynamic driving force $$(T_{\mathrm{c}}^{ - 1} - T_{\mathrm{h}}^{ - 1})$$. Typically, the driving force is tuned in the lab and the flux is measured as the response.

Suppose, however, that it is not simple to measure the heat flux. Rather, we imagine directly observing the bead positions as a function of time. Those measurements are sufficient to extract the entropy production rate, but to do so we must go beyond the thermodynamics and explicitly consider the system's dynamics, an approach known as stochastic thermodynamics1,31,32. The starting point is to mathematically describe the bead-spring dynamics with a coupled overdamped Langevin equation $$\dot{\mathbf{x}} = A\mathbf{x} + F\boldsymbol{\xi}$$, where $$\mathbf{x} = (x_1, x_2)^T$$ is the vector consisting of each bead's displacement from its equilibrium position, $$\boldsymbol{\xi} = (\xi_1, \xi_2)^T$$ is a vector of independent Gaussian white noises, and

$$A = \left( {\begin{array}{*{20}{c}} { - 2k/\gamma } & {k/\gamma } \\ {k/\gamma } & { - 2k/\gamma } \end{array}} \right), \ F = \left( {\begin{array}{*{20}{c}} {\sqrt {2k_{\mathrm{B}}T_{\mathrm{h}}/\gamma } } & 0 \\ 0 & {\sqrt {2k_{\mathrm{B}}T_{\mathrm{c}}/\gamma } } \end{array}} \right).$$
(2)

The matrix A captures deterministic forces acting on the beads due to the springs, while F describes the random forces imparted by the medium. The strength of these random forces depends on the temperature and the Boltzmann constant kB, consistent with the fluctuation-dissipation theorem33.

It is useful to cast the Langevin equation as a corresponding Fokker-Planck equation for the probability of observing the system in configuration x at time t, ρ(x, t):

$$\frac{{\partial \rho ({\mathbf{x}},t)}}{{\partial t}} = - \nabla \cdot (A{\mathbf{x}}\rho ({\mathbf{x}},t) - D\nabla \rho ({\mathbf{x}},t)) \equiv - \nabla \cdot {\mathbf{j}}({\mathbf{x}},t),$$
(3)

with $$D = FF^T/2$$. Though we are modeling a two-particle system, it can be helpful to think of the entire system as being a single point diffusing through x space with diffusion tensor D and with deterministic force $$\gamma A\mathbf{x}$$. The second equality in Eq. (3) defines the probability current j(x, t). These probability currents (and their fluctuations) will play a central role in our strategies for inferring the rate of entropy production.

Due to its analytic and experimental tractability, this bead-spring system and related variants have been extensively studied as models for nonequilibrium dynamics34,35,36,37,38,39. In particular, the steady-state properties are well known. Consider the covariance $$C_{ij}(t) = \langle x_i(t)x_j(t)\rangle$$ of the bead displacements at time t for a system prepared at x(0) = 0. Averaging over realizations of the Gaussian noise gives

$$C(t) = {\int_{0}^t} d s\;e^{A(t - s)}FF^Te^{A^T(t - s)}.$$
(4)

The steady-state density and current are expressed simply as

$$\begin{array}{*{20}{l}} {\rho _{{\mathrm{ss}}}({\mathbf{x}})} \hfill & = \hfill & {(2\pi \sqrt {{\mathrm{det}}{\cal{C}}} )^{ - 1}e^{ - \frac{1}{2}{\mathbf{x}}^T{\cal{C}}^{ - 1}{\mathbf{x}}}} \hfill \\ {{\mathrm{j}}_{{\mathrm{ss}}}({\mathbf{x}})} \hfill & = \hfill & {(A{\mathbf{x}} + D{\cal{C}}^{ - 1}{\mathbf{x}})\rho _{{\mathrm{ss}}}({\mathbf{x}})} \hfill \end{array}$$
(5)

in terms of the long-time limit of the correlation matrix

$${\cal{C}} \equiv \mathop {{\lim }}\limits_{t \to \infty } C(t) = \frac{{k_{\mathrm{B}}}}{{12k}}\left( {\begin{array}{*{20}{c}} {7T_{\mathrm{h}} + T_{\mathrm{c}}} & {2(T_{\mathrm{c}} + T_{\mathrm{h}})} \\ {2(T_{\mathrm{c}} + T_{\mathrm{h}})} & {T_{\mathrm{h}} + 7T_{\mathrm{c}}} \end{array}} \right).$$
(6)

The steady-state current jss(x) is a vector field that specifies the probability current at configuration x. Associated with this current is a local conjugate thermodynamic force $${\mathbf{F}}({\mathbf{x}}) = k_{\mathrm{B}}{\mathbf{j}}_{{\mathrm{ss}}}^T({\mathbf{x}})D^{ - 1}/\rho _{{\mathrm{ss}}}({\mathbf{x}})$$40,41. The product of the microscopic current and force is the local entropy production rate at configuration x: $$\dot \sigma _{{\mathrm{ss}}}({\mathbf{x}}) = {\mathbf{F}}({\mathbf{x}}) \cdot {\mathbf{j}}_{{\mathrm{ss}}}({\mathbf{x}})$$. Upon integrating over all configurations, we obtain the total entropy production rate

$$\begin{array}{*{20}{l}} {\dot S_{{\mathrm{ss}}}} \hfill & = \hfill & {{\int} d {\mathbf{x}}\,\dot \sigma _{{\mathrm{ss}}}({\mathbf{x}}) = k_{\mathrm{B}}{\mathrm{Tr}}\left\{ {AD^{ - 1}A{\cal{C}} - {\cal{C}}^{ - 1}D} \right\}} \hfill \\ {} \hfill & = \hfill & {k_{\mathrm{B}}\frac{{k(T_{\mathrm{h}} - T_{\mathrm{c}})^2}}{{4\gamma T_{\mathrm{h}}T_{\mathrm{c}}}}.} \hfill \end{array}$$
(7)

Comparing with Eq. (1), we see that the rate of net heat flow is $$\dot Q_{{\mathrm{ss}}} = k_{\mathrm{B}}k(T_{\mathrm{h}} - T_{\mathrm{c}})/(4\gamma)$$. Our ability to compute the heat flow analytically derives from the linear coupling between beads, yet we are ultimately interested in experimental scenarios in which linear coupling cannot be assumed. In those more complicated systems, there is no simple analytical expression for the local entropy production rate, but we could still estimate $$\dot \sigma _{{\mathrm{ss}}}$$ by sampling trajectories from the steady state, either in a computer or in the lab. We now consider strategies for this estimation by sampling the bead-spring dynamics and comparing with the analytical expression, Eq. (7).
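As a numerical cross-check of Eqs. (6) and (7), the minimal sketch below (assuming NumPy and SciPy, in units with $$k_{\mathrm{B}} = 1$$) obtains the long-time covariance from the Lyapunov equation $$A{\cal{C}} + {\cal{C}}A^T + FF^T = 0$$, which follows from setting dC/dt = 0 in Eq. (4), and then evaluates the trace formula of Eq. (7). The helper names and parameter values are illustrative choices, not part of the original analysis.

```python
import numpy as np
from scipy.linalg import solve_continuous_lyapunov


def two_bead_matrices(k, gamma, T_h, T_c, k_B=1.0):
    """Drift matrix A and noise matrix F of Eq. (2)."""
    A = (k / gamma) * np.array([[-2.0, 1.0], [1.0, -2.0]])
    F = np.diag(np.sqrt(2.0 * k_B * np.array([T_h, T_c]) / gamma))
    return A, F


def steady_state_covariance(A, F):
    """Solve A C + C A^T = -F F^T, the long-time limit of Eq. (4)."""
    return solve_continuous_lyapunov(A, -F @ F.T)


def entropy_production_rate(A, F, k_B=1.0):
    """Trace formula of Eq. (7); A.T reduces to A for the symmetric drift used here."""
    D = F @ F.T / 2.0
    C = steady_state_covariance(A, F)
    D_inv = np.linalg.inv(D)
    return k_B * np.trace(A.T @ D_inv @ A @ C - np.linalg.inv(C) @ D)


k, gamma, T_h, T_c = 1.5, 0.7, 3.0, 1.0
A, F = two_bead_matrices(k, gamma, T_h, T_c)
C = steady_state_covariance(A, F)
C_eq6 = np.array([[7 * T_h + T_c, 2 * (T_c + T_h)],
                  [2 * (T_c + T_h), T_h + 7 * T_c]]) / (12 * k)
S_ss = entropy_production_rate(A, F)
S_eq7 = k * (T_h - T_c) ** 2 / (4 * gamma * T_h * T_c)
Q_ss = S_ss / (1 / T_c - 1 / T_h)   # heat flow recovered via Eq. (1)
print(np.allclose(C, C_eq6), S_ss, S_eq7, Q_ss)
```

Solving the Lyapunov equation avoids evaluating the integral in Eq. (4) by quadrature, and the same two functions apply unchanged to any linear drift A and diagonal noise F.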

### Estimating the steady state from sampled trajectories

We first seek estimates of jss(x) and ρss(x) from a long trajectory x(t) of bead positions over an observation time τobs. We estimate the steady-state density by the empirical density, the fraction of time the trajectory spends in state x:

$$\rho ({\mathbf{x}}) = \frac{1}{{\tau _{{\mathrm{obs}}}}}{\int_{0}^{\tau _{{\mathrm{obs}}}}} \delta \left( {{\mathbf{x}}\left( t \right) - {\mathbf{x}}} \right)dt,$$
(8)

where δ is a Dirac delta function. The empirical density is an unbiased estimate of the steady-state density: for a trajectory started in the steady state, its average over realizations equals ρss(x) for any τobs, and the fluctuating density ρ(x) tends to ρss(x) in the long-time limit. Similarly, an unbiased estimate for the steady-state currents is the empirical current

$${\mathbf{j}}({\mathbf{x}}) = \frac{1}{{\tau _{{\mathrm{obs}}}}}{\int_0^{\tau _{{\mathrm{obs}}}}} \delta \left( {{\mathbf{x}}\left( t \right) - {\mathbf{x}}} \right)\circ d{\mathbf{x}}(t).$$
(9)

This Stratonovich integral can be colloquially read as the time-average of all displacement vectors that were observed when the system occupied configuration x. In practice, experiments typically record the configuration x at discrete-time intervals Δt such that the trajectory is given by the timeseries $$\{ {\mathbf{x}}_{\Delta t},{\mathbf{x}}_{2\Delta t}, \ldots \}$$. Consequently, we work with estimates of the density and currents42:

$$\hat \rho ({\mathbf{x}}) = \frac{{\Delta t}}{{\tau _{{\mathrm{obs}}}}}\mathop {\sum}\limits_{i = 1}^{\tau _{{\mathrm{obs}}}/\Delta t} K \left( {{\mathbf{x}}_{i\Delta t},{\mathbf{x}}} \right)$$
(10)
$$\hat {\mathbf{{\jmath}}}({\mathbf{x}}) = \frac{{\hat \rho ({\mathbf{x}})}}{{2\Delta t}}\frac{{\mathop {\sum}\limits_{i = 2}^{\tau _{{\mathrm{obs}}}/\Delta t - 1} L \left( {{\mathbf{x}}_{i\Delta t},{\mathbf{x}}} \right)\left[ {{\mathbf{x}}_{\left( {i + 1} \right)\Delta t} - {\mathbf{x}}_{\left( {i - 1} \right)\Delta t}} \right]}}{{\mathop {\sum}\limits_{i = 2}^{\tau _{{\mathrm{obs}}}/\Delta t - 1} L ({\mathbf{x}}_{i\Delta t},{\mathbf{x}})}},$$
(11)

where K and L are kernel functions43. The kernel functions make it natural to spatially coarse grain the data, a necessity because experiments have a limited resolution and because most microscopic configurations will never be sampled by a finite-length trajectory. The function K(xiΔt, x) controls how observing the ith data point at position xiΔt impacts the estimate of $$\hat \rho$$ at a nearby position x. Similarly, L controls how currents are estimated in the neighborhood of the observed data points. Specific choices for K and L are discussed in the Methods section. Using $$\hat \rho$$ and $$\hat {\mathbf{ \jmath}}$$ we can now construct direct estimates of the entropy production rate.
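Eqs. (10) and (11) translate directly into array code. The sketch below assumes isotropic Gaussian kernels with a hand-picked bandwidth for both K and L; the choices actually used are described in Methods and may differ, and the function names are hypothetical. As a sanity check it is applied to a deterministic counterclockwise circular trajectory rather than to bead-spring data, so the estimated current at (1, 0) should point along the positive x2 direction.

```python
import numpy as np


def gaussian_kernel(samples, points, bandwidth):
    """Isotropic Gaussian kernel; entry [m, i] couples evaluation point m to sample i."""
    dim = samples.shape[1]
    sq_dist = ((points[:, None, :] - samples[None, :, :]) ** 2).sum(axis=-1)
    norm = (2.0 * np.pi * bandwidth**2) ** (dim / 2)
    return np.exp(-sq_dist / (2.0 * bandwidth**2)) / norm


def density_estimate(traj, dt, points, bandwidth):
    """Eq. (10): each recorded configuration contributes dt / tau_obs of kernel mass."""
    tau_obs = len(traj) * dt
    return dt / tau_obs * gaussian_kernel(traj, points, bandwidth).sum(axis=1)


def current_estimate(traj, dt, points, bandwidth):
    """Eq. (11): kernel-weighted average of centred displacements, scaled by rho_hat."""
    rho = density_estimate(traj, dt, points, bandwidth)
    L = gaussian_kernel(traj[1:-1], points, bandwidth)   # interior points only
    disp = traj[2:] - traj[:-2]                          # x_{i+1} - x_{i-1}
    mean_disp = (L @ disp) / L.sum(axis=1, keepdims=True)
    return rho[:, None] * mean_disp / (2.0 * dt)


# Usage on a deterministic circular test trajectory (not bead-spring data).
dt = 0.1
t = np.arange(1000) * dt
traj = np.column_stack([np.cos(t), np.sin(t)])
g = np.linspace(-2.0, 2.0, 41)                           # grid spacing 0.1
grid = np.column_stack([a.ravel() for a in np.meshgrid(g, g)])
rho_hat = density_estimate(traj, dt, grid, bandwidth=0.2)
j_hat = current_estimate(traj, dt, np.array([[1.0, 0.0]]), bandwidth=0.2)
print(rho_hat.sum() * 0.1**2, j_hat)
```

Evaluating every kernel pair costs memory proportional to the number of grid points times the number of samples; for long trajectories one would chunk the evaluation or restrict it to nearby samples.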

### Direct strategies for entropy production inference

In computing Eq. (7), we integrated the local entropy production rate F(x) jss(x) over all configurations x. When jss(x) and F(x) are not known, it is natural to replace them by the estimators $$\hat {\mathbf{ \jmath}}({\mathbf{x}})$$ and $$\hat {\mathbf{F}}({\mathbf{x}}) \equiv k_{\mathrm{B}}\hat {\mathbf{ \jmath}}^T({\mathbf{x}})D^{ - 1}/\hat \rho ({\mathbf{x}})$$. Though $$\hat {\mathbf{F}}$$ is constructed from the unbiased estimators $$\hat {\mathbf{ \jmath}}$$ and $$\hat \rho$$, $$\hat {\mathbf{F}}$$ is only asymptotically unbiased, necessitating sufficiently long trajectories for the bias to become negligible. Utilizing $$\hat {\mathbf{F}}$$, we approximate $$\dot S_{{\mathrm{ss}}}$$ by either a spatial or a temporal average:

$$\widehat {\dot S}_{{\mathrm{ss}}}^{{\mathrm{spat}}} \equiv {\int} d {\mathbf{x}}\,\hat {\mathbf{F}}({\mathbf{x}}) \cdot \hat {\mathbf{ \jmath}}({\mathbf{x}})$$
(12)
$$\begin{array}{*{20}{l}} {\widehat {\dot S}_{{\mathrm{ss}}}^{{\mathrm{temp}}}} \hfill & \equiv \hfill & {\frac{1}{{\tau _{{\mathrm{obs}}}}}{\int_0^{\tau _{{\mathrm{obs}}}}} \hat {\mathbf{F}}({\mathbf{x}}(t)) \cdot \circ d{\mathbf{x}}(t)} \hfill \\ {} \hfill & = \hfill & {\frac{1}{{\tau _{{\mathrm{obs}}}}}\mathop {\sum}\limits_{i = 2}^{\tau _{{\mathrm{obs}}}/\Delta t} \hat {\mathbf{F}}\left( {\frac{{{\mathbf{x}}_{i{\mathrm{\Delta }}t} + {\mathbf{x}}_{(i - 1){\mathrm{\Delta }}t}}}{2}} \right) \cdot \left[ {{\mathbf{x}}_{i{\mathrm{\Delta }}t} - {\mathbf{x}}_{(i - 1){\mathrm{\Delta }}t}} \right].} \hfill \end{array}$$
(13)

The performance of these estimators is assessed using data sampled from numerical simulations of the Langevin equation, described further in Methods. As illustrated in Fig. 2, the estimators are biased for any finite trajectory length, but they converge to the analytical result, Eq. (7), with sufficiently long sampling times.
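The sketch below illustrates the temporal estimator of Eq. (13) on a simulated two-bead trajectory. To separate the sampling error of the time average from the error in estimating the force, it substitutes the analytic force F(x) for $$\hat {\mathbf{F}}$$; this substitution, the parameters (k = γ = kB = 1, Th = 10, Tc = 1), and the timestep Δt = 0.005 (larger than the value quoted in Methods, to keep the runtime short) are assumptions for illustration. The trajectory follows the stochastic Euler scheme described in Methods, starting from a configuration drawn from the steady-state Gaussian of Eq. (5).

```python
import numpy as np
from scipy.linalg import solve_continuous_lyapunov


def simulate_euler(A, F, x0, dt, n_steps, rng):
    """Stochastic Euler: x_{i+1} = x_i + A x_i dt + F eta_i with eta_i ~ N(0, dt I)."""
    traj = np.empty((n_steps + 1, len(x0)))
    traj[0] = x0
    step = np.eye(len(x0)) + A * dt
    noise = rng.normal(scale=np.sqrt(dt), size=(n_steps, len(x0))) @ F.T
    for i in range(n_steps):
        traj[i + 1] = step @ traj[i] + noise[i]
    return traj


def temporal_estimate(traj, dt, force):
    """Eq. (13): midpoint (Stratonovich) sum of force . displacement, divided by tau_obs."""
    mid = 0.5 * (traj[1:] + traj[:-1])
    disp = np.diff(traj, axis=0)
    tau_obs = (len(traj) - 1) * dt
    return np.sum(force(mid) * disp) / tau_obs


# Two-bead model with k = gamma = k_B = 1 (illustrative parameters).
T_h, T_c, dt = 10.0, 1.0, 0.005
A = np.array([[-2.0, 1.0], [1.0, -2.0]])
F = np.diag(np.sqrt(2.0 * np.array([T_h, T_c])))
D = F @ F.T / 2.0
C = solve_continuous_lyapunov(A, -F @ F.T)
G = (A + D @ np.linalg.inv(C)).T @ np.linalg.inv(D)   # analytic force: F(x) = x^T G


def exact_force(X):
    """Analytic force evaluated row-wise; stands in for the kernel estimate F_hat."""
    return X @ G


rng = np.random.default_rng(0)
x0 = rng.multivariate_normal(np.zeros(2), C)          # start in the steady state
traj = simulate_euler(A, F, x0, dt, n_steps=400_000, rng=rng)
S_temp = temporal_estimate(traj, dt, exact_force)
S_exact = (T_h - T_c) ** 2 / (4.0 * T_h * T_c)         # Eq. (7)
print(S_temp, S_exact)
```

Replacing `exact_force` with a kernel-based $$\hat {\mathbf{F}}$$ built from the estimators of Eqs. (10) and (11) would turn this check into the estimator studied in the text.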

At first glance $$\widehat {\dot S}_{{\mathrm{ss}}}^{{\mathrm{spat}}}$$ and $$\widehat {\dot S}_{{\mathrm{ss}}}^{{\mathrm{temp}}}$$ may appear equivalent due to ergodicity. Indeed, with an infinite amount of sampling, both schemes must yield the same result, $$\dot S_{{\mathrm{ss}}}$$, but the temporal estimator converges significantly faster with finite sampling. Plots of the estimated local dissipation rate (Fig. 2 inset) hint at the reason $$\widehat {\dot S}_{{\mathrm{ss}}}^{{\mathrm{spat}}}$$ converges more slowly: $$\dot \sigma _{{\mathrm{ss}}}({\mathbf{x}})$$ must be accurately estimated by $$\hat {\dot \sigma }_{{\mathrm{ss}}}({\mathbf{x}}) = \hat {\mathbf{F}}({\mathbf{x}}) \cdot \hat {\mathbf{ \jmath}}({\mathbf{x}})$$ throughout the entire configuration space. The integral in Eq. (12) equally weights $$\hat {\dot \sigma }_{{\mathrm{ss}}}({\mathbf{x}})$$ at all x, even those points which have been infrequently (or never) visited by the stochastic trajectory. Our x has dimension two, but we will also consider higher-dimensional configuration spaces, for example by coupling more than two beads in a linear chain. If that configuration space has dimension greater than three or four, it becomes impractical to estimate $$\dot \sigma _{{\mathrm{ss}}}$$ across the entire space. Furthermore, estimating Eq. (12) for high-dimensional x confronts the classic problem of performing numerical quadrature on a high-dimensional grid, where it is well-known that Monte Carlo integration becomes a superior method.

The temporal integral can be thought of as a convenient way to implement such a Monte Carlo integration, with sampled x's coming from the configurations of the stochastic trajectory. Notably, Eq. (13) is computed from estimates of the thermodynamic force near the sampled configurations $${\mathbf{x}}_{i\Delta t}$$, precisely where the finite trajectory has been most reliably sampled. In contrast, Eq. (12) requires spurious extrapolation of the kernel density estimates ($$\hat \rho$$ and $$\hat{\mathbf{ \jmath}}$$) to points that are far from any sampled configurations. The advantage of the temporal estimator over the spatial one becomes even more pronounced as dimensionality increases. Nevertheless, even $$\widehat {\dot S}_{{\mathrm{ss}}}^{{\mathrm{temp}}}$$ becomes harder to estimate as the dimensionality of x grows. Getting accurate estimates of F around the $${\mathbf{x}}_{i\Delta t}$$ requires observing several trajectory segments that have passed through that part of configuration space in each direction. But when the dimensionality is large, recurrence to the same configuration-space neighborhood takes a long time. Consequently, we turn to a complementary method that can be informative even when x is too high-dimensional to estimate F accurately.

### Indirect strategy for entropy production inference

Thus far our estimators have been based on detailed microscopic information, but as the dimensionality of x increases, estimating the microscopic steady-state properties requires exponentially more data. To combat this curse of dimensionality, it is standard to project high-dimensional dynamics onto a few preferred degrees of freedom9,44,45,46. For example, the projected coordinates could be two principal components from a principal component analysis. Such projected dynamics have been used to detect broken detailed balance9; however, these reduced dynamics overlook hidden dissipation from the discarded degrees of freedom.

An alternative strategy that retains contributions from all degrees of freedom is provided by recent theoretical results relating entropy production and current fluctuations in general nonequilibrium steady-state dynamics28,29,47,48,49,50,51,52. To this end, we introduce a single projected macroscopic current, constructed as a linear combination of the microscopic currents:

$$j_{\mathbf{d}} = {\int} d {\mathbf{x}}\,{\mathbf{d}}({\mathbf{x}}) \cdot {\mathbf{j}}({\mathbf{x}}),$$
(14)

where d(x) is a vector field that weights how much a microscopic current at x contributes to the macroscopic current jd. Any physically measurable current—electrons flowing through a wire, heat passing from one bead to the other, or the production of a chemical species in a reaction network—can be cast as such a linear superposition of microscopic currents. Figure 3 illustrates one particular example by applying the weighting field d(x) = F(x) to project microscopic currents onto the single macroscopic current jF. Each step of the trajectory is weighted by the value of d associated with the observed transition, and this weighted average, accumulated as a function of time, is the fluctuating macroscopic current (fluctuating because it depends on the particular stochastic trajectory). Each trajectory observed for a time τobs yields a measurement jd of the fluctuating current, and many such trajectories give a distribution P(jd) characterized by mean 〈jd〉 and variance Var(jd). The thermodynamic uncertainty relation (TUR)28,29,48,49,50 then constrains the entropy production rate in terms of the dynamical fluctuations of this macroscopic current:

$$\dot S_{{\mathrm{ss}}} \ge \frac{{2k_{\mathrm{B}}\left\langle {j_{\mathbf{d}}} \right\rangle ^2}}{{\tau _{{\mathrm{obs}}}{\mathrm{Var}}(j_{\mathbf{d}})}} \equiv \dot S_{{\mathrm{TUR}}}^{({\mathbf{d}})}.$$
(15)

Note that we have used Var(jd) to denote the variance of the macroscopic empirical current distribution, whereas some prior work29,48 used this notation for the way the variance scales with observation time. The two notations differ by the factor of τobs in the denominator of the right-hand side of Eq. (15).

Unlike the field of microscopic currents, j(x), the macroscopic current jd is a single scalar quantity, allowing estimates of its cumulants—particularly the mean $$\widehat {\left\langle {j_{\mathbf{d}}} \right\rangle }$$ and variance $$\widehat {{\mathrm{Var}}(j_{\mathbf{d}})}$$—to be extracted from a modest amount of experimental data. Indeed, measurements of kinesin fluctuations have recently been used to infer constraints on the efficiency of these molecular motors18,53. Importantly, the TUR is valid for any choice of d, granting freedom to consider fluctuations of arbitrary macroscopic currents, some of which will yield tighter bounds than others. In a later section, we use Monte Carlo sampling to seek a choice for d which yields the tightest possible bound, but first we consider an important physically motivated choice, d = F. In this case, the macroscopic current jF is the fluctuating entropy production rate (cf. Eqs. (7) and (14)), so $$\left\langle {j_{\mathbf{F}}} \right\rangle = \dot S_{{\mathrm{ss}}}$$. With access to F, we can thus compute the entropy production rate by simply taking the mean of the generalized current (the average slope in Fig. 3), or we could use the fluctuations from repeated realizations of jF to get a bound on $$\dot S_{{\mathrm{ss}}}$$ via Eq. (15).
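A minimal sketch of the TUR estimate from repeated realizations of the current with the analytic choice d = F: many independent steady-state trajectories of length τobs are simulated in parallel, each step contributes d evaluated at the step midpoint times the displacement (as in Fig. 3), and the sample mean and variance of the resulting jF enter Eq. (15). The parameter values, the vectorized Euler integrator, and the restriction to weighting fields linear in x (written d(x) = xᵀW for a matrix W) are illustrative assumptions. The sample mean of jF should approach $$\dot S_{{\mathrm{ss}}}$$ from Eq. (7), while the TUR estimate should fall below it.

```python
import numpy as np
from scipy.linalg import solve_continuous_lyapunov

# Two-bead model with k = gamma = 1 (illustrative parameters).
k_B, T_h, T_c = 1.0, 10.0, 1.0
A = np.array([[-2.0, 1.0], [1.0, -2.0]])
F = np.diag(np.sqrt(2.0 * k_B * np.array([T_h, T_c])))
D = F @ F.T / 2.0
C = solve_continuous_lyapunov(A, -F @ F.T)
# Analytic thermodynamic force F(x) = k_B j_ss^T D^{-1} / rho_ss = x^T G (see Eq. (5)).
G = k_B * ((A + D @ np.linalg.inv(C)).T @ np.linalg.inv(D))


def sample_currents(W, n_traj, tau_obs, dt, rng):
    """One time-averaged current j_d per steady-state trajectory, for d(x) = x^T W."""
    n_steps = int(round(tau_obs / dt))
    x = rng.multivariate_normal(np.zeros(2), C, size=n_traj)   # steady-state start
    step = np.eye(2) + A * dt                                  # Euler drift update
    j = np.zeros(n_traj)
    for _ in range(n_steps):
        x_new = x @ step.T + rng.normal(scale=np.sqrt(dt), size=x.shape) @ F.T
        mid = 0.5 * (x + x_new)
        j += np.einsum("ni,ij,nj->n", mid, W, x_new - x)       # d(midpoint) . dx
        x = x_new
    return j / tau_obs


rng = np.random.default_rng(1)
tau_obs = 10.0
j_F = sample_currents(G, n_traj=4000, tau_obs=tau_obs, dt=0.005, rng=rng)
S_tur = 2.0 * k_B * j_F.mean() ** 2 / (tau_obs * j_F.var(ddof=1))   # Eq. (15)
S_ss = k_B * (T_h - T_c) ** 2 / (4.0 * T_h * T_c)                     # Eq. (7)
print(j_F.mean(), S_tur, S_ss)
```

Because `sample_currents` accepts any matrix W, the same routine scores other linear weighting fields, each of which yields a valid (if possibly looser) bound.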

It perhaps seems foolish to settle for a bound if one could compute the actual entropy production rate, but in practice one would not typically have access to F. More likely, it would only be possible to estimate F from data as $$\hat {\mathbf{F}}$$. With sufficient data, $$\hat {\mathbf{F}}$$ converges to F such that a temporal estimate of the entropy production rate would eventually become accurate, but this convergence is slow in high dimensions. Alternatively, by choosing $${\mathbf{d}} = \hat {\mathbf{F}}$$, we obtain a TUR lower bound estimate

$$\widehat {\dot S}_{{\mathrm{TUR}}}^{(\hat {\mathbf{F}})} = \frac{{2k_{\mathrm{B}}\widehat {\left\langle {j_{\hat {\mathbf{F}}}} \right\rangle }^2}}{{\tau _{{\mathrm{obs}}}\widehat {{\mathrm{Var}}(j_{\hat {\mathbf{F}}})}}}.$$
(16)

A key advantage of this estimate is that it is less sensitive to whether $$\hat {\mathbf{F}}$$ has converged than either $$\widehat {\dot S}_{{\mathrm{ss}}}^{{\mathrm{spat}}}$$ or $$\widehat {\dot S}_{{\mathrm{ss}}}^{{\mathrm{temp}}}$$. When $$\hat {\mathbf{F}}$$ is noisily estimated due to limited data or high dimensionality, the TUR estimate can nevertheless provide an accessible route to constraining the entropy production rate from experimental data.

### Convergence of the entropy production rate estimates

To assess the costs and benefits of the various estimation schemes, we numerically sampled trajectories for the two-bead model of Fig. 1 and for a variant with five beads coupled along a one-dimensional chain with spring constant k, the five beads being embedded in thermal baths whose temperatures ramp linearly from Tc to Th. Equation (7) gives the entropy production rate for the two-bead model as a function of the bath temperatures. An analogous expression is derived in Supplementary Note 1 for the model with five beads, and both expressions are plotted with a solid red line in Fig. 4. The temporal and spatial estimators both converge to these analytical expressions in the long-trajectory limit, while the TUR estimate tends to the lower bound $$\dot S_{{\mathrm{TUR}}}^{({\mathbf{F}})}$$. We performed a series of calculations to assess (1) how close this lower bound is to the true dissipation rate and (2) how long a trajectory is needed for all three estimates to converge.
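The five-bead expression from Supplementary Note 1 is not reproduced here, but the Lyapunov and trace-formula route used for Eq. (7) extends numerically to an N-bead chain. The sketch assumes, by analogy with Fig. 1, that both end beads are tethered to walls, that all springs share the stiffness k, and that bath temperatures are evenly spaced from Tc to Th; `chain_matrices` is a hypothetical helper. With N = 2 it should reproduce Eq. (7), and with equal temperatures it should return zero.

```python
import numpy as np
from scipy.linalg import solve_continuous_lyapunov


def chain_matrices(n_beads, k, gamma, T_c, T_h, k_B=1.0):
    """Hypothetical N-bead chain: nearest-neighbour springs k, end beads tethered to walls,
    bath temperatures evenly spaced from T_c to T_h along the chain."""
    stiffness = k * (2.0 * np.eye(n_beads)
                     - np.eye(n_beads, k=1) - np.eye(n_beads, k=-1))
    A = -stiffness / gamma
    T = np.linspace(T_c, T_h, n_beads)
    F = np.diag(np.sqrt(2.0 * k_B * T / gamma))
    return A, F


def entropy_production_rate(A, F, k_B=1.0):
    """Trace formula of Eq. (7), with the covariance from A C + C A^T = -F F^T."""
    D = F @ F.T / 2.0
    C = solve_continuous_lyapunov(A, -F @ F.T)
    return k_B * np.trace(A.T @ np.linalg.inv(D) @ A @ C - np.linalg.inv(C) @ D)


S_two = entropy_production_rate(*chain_matrices(2, k=1.0, gamma=1.0, T_c=1.0, T_h=10.0))
S_five = entropy_production_rate(*chain_matrices(5, k=1.0, gamma=1.0, T_c=1.0, T_h=10.0))
S_equilibrium = entropy_production_rate(*chain_matrices(5, k=1.0, gamma=1.0, T_c=2.0, T_h=2.0))
print(S_two, S_five, S_equilibrium)
```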

We discuss the convergence results first, plotted as insets in Fig. 4. Using a trajectory of length τobs, $$\hat {\mathbf{F}}$$ was estimated, and this estimated thermodynamic force field was used to plot how quickly $$\widehat {\dot S}_{{\mathrm{ss}}}^{{\mathrm{spat}}}$$ and $$\widehat {\dot S}_{{\mathrm{ss}}}^{{\mathrm{temp}}}$$ converged to their expected value of $$\dot S_{{\mathrm{ss}}}$$. To compare convergence of the TUR bound on an equal footing, we recognize that the τobs → ∞ limit of a long trajectory with perfect sampling will not yield $$\dot S_{{\mathrm{ss}}}$$ but rather the bound $$\dot S_{{\mathrm{TUR}}}^{({\mathbf{F}})}$$. In all three cases we scale the estimate by its appropriate infinite-sampling limit and observe how quickly this ratio decays to one. The superiority of the temporal estimator over the spatial one is clear in the two-bead model, and the inadequacy of the spatial estimator is so stark in the higher-dimensional five-bead model that it was prohibitive to compute. The TUR estimator performance is comparable to the temporal average estimator when F can be estimated well (low dimensionality and large thermodynamic driving). In the more challenging situation that the phase space is high dimensional and the statistical irreversibility is more subtle, the TUR estimator appears to offer some advantage. It converges with roughly an order of magnitude fewer samples than are required for $$\widehat {\dot S}_{{\mathrm{ss}}}^{{\mathrm{temp}}}$$ (see bottom right inset of Fig. 4b).

To understand how well one can estimate the entropy production rate from current fluctuations, we must also address how close the TUR lower bound is to $$\dot S_{{\mathrm{ss}}}$$. The dashed black line of Fig. 4 shows that the TUR lower bound equals the actual entropy production rate in the near-equilibrium limit Tc → Th. Far from equilibrium, the TUR lower bound remains within the same order of magnitude as the entropy production rate, with the deviation growing with the temperature difference. Comparing the dashed black lines for the two-bead and five-bead models, we see that this deviation between $$\dot S_{{\mathrm{TUR}}}^{({\mathbf{F}})}$$ and $$\dot S_{{\mathrm{ss}}}$$ decreases as more beads are added. Hence the TUR bound more closely approximates the actual entropy production rate with increasing dimensionality and decreasing thermodynamic force, precisely the conditions under which the TUR estimate converges more rapidly.

### Optimizing the macroscopic current

Thus far we have focused on measuring the statistics of a particular macroscopic empirical current, the fluctuating entropy production, constructed by choosing d = F. This choice was a natural starting point since the fluctuations are known to saturate Eq. (15) in the equilibrium limit Tc → Th29. However, when working with timeseries data we had to replace F by the estimate $$\hat {\mathbf{F}}$$, and this estimated thermodynamic force is error-prone in high dimensions. In the previous section we saw that the TUR estimator is sufficiently robust that a tight bound for $$\dot S_{{\mathrm{ss}}}$$ may be inferred even when $$\hat {\mathbf{F}}$$ has not fully converged to F. This robustness derives from the validity of Eq. (15) for all possible choices of d. The generality of the TUR can be further leveraged by optimizing over d:

$$\dot S_{{\mathrm{ss}}} \ge \frac{{2k_{\mathrm{B}}}}{{\tau _{{\mathrm{obs}}}}}\mathop {{{\mathrm{max}}}}\limits_{{\mathbf{d}}({\mathbf{x}})} \frac{{\left\langle {j_{\mathbf{d}}} \right\rangle ^2}}{{{\mathrm{Var}}(j_{\mathbf{d}})}}.$$
(17)

We are not aware of methods to explicitly compute the optimal choice of d, but a vector field d*(x) which outperforms F(x) can be found readily by Monte Carlo (MC) sampling with a preference for macroscopic currents with a large TUR ratio $$\left\langle {j_{\mathbf{d}}} \right\rangle ^2/{\mathrm{Var}}(j_{\mathbf{d}})$$.

Each step of the MC algorithm requires 〈jd〉 and Var(jd), which could be estimated with trajectory sampling, as illustrated in Fig. 3a, c. In fact, one could collect a single long trajectory, from an experiment or from simulation, and then sample d* based on the mean and variance estimates $$\widehat {\left\langle {j_{{\mathbf{d}}^ \ast }} \right\rangle }$$ and $$\widehat {{\mathrm{Var}}(j_{{\mathbf{d}}^ \ast })}$$ for that fixed trajectory. Such a scheme is enticing, but we warn that the procedure is susceptible to over-optimization of the TUR ratio, since maximizing the estimated ratio $$\widehat {\left\langle {j_{{\mathbf{d}}^ \ast }} \right\rangle }^2/\widehat {{\mathrm{Var}}(j_{{\mathbf{d}}^ \ast })}$$ is not the same as maximizing the true ratio $$\left\langle {j_{{\mathbf{d}}^ \ast }} \right\rangle ^2/{\mathrm{Var}}(j_{{\mathbf{d}}^ \ast })$$. The former can yield a large value simply because the trajectory happens to return anomalous estimates for the mean and variance of the generalized current. The latter ratio does not depend on any one trajectory but is instead averaged over all trajectories. Avoiding over-optimization requires appropriate cross-validation. For example, d* could be selected based on one sampled trajectory, and the dissipation bound then inferred from an independently sampled trajectory.
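The cross-validation idea can be sketched for a deliberately restricted family of weighting fields. The sketch assumes d(x) = Wᵀx with a 2 × 2 matrix W (the TUR ratio is then invariant under rescaling W, so the search runs over unit-norm W), a random-walk Metropolis acceptance rule with an arbitrary inverse temperature β, and illustrative simulation parameters; none of these choices comes from the original study, which instead computed means and variances exactly on a lattice. Here d* is selected on one ensemble of trajectories and the bound is reported from an independent ensemble.

```python
import numpy as np
from scipy.linalg import solve_continuous_lyapunov

# Two-bead model with k = gamma = k_B = 1 (illustrative parameters).
T_h, T_c = 10.0, 1.0
A = np.array([[-2.0, 1.0], [1.0, -2.0]])
F = np.diag(np.sqrt(2.0 * np.array([T_h, T_c])))
C = solve_continuous_lyapunov(A, -F @ F.T)   # steady-state covariance


def current_statistics(n_traj, tau_obs, dt, rng):
    """Per-trajectory matrices M_ab = (1/tau_obs) * sum_t mid_a * dx_b, flattened to length 4.

    For a linear weighting field d(x) = W^T x, the current is j_d = sum_ab W_ab M_ab,
    so every candidate W can be scored without re-simulating.
    """
    x = rng.multivariate_normal(np.zeros(2), C, size=n_traj)
    M = np.zeros((n_traj, 2, 2))
    step = np.eye(2) + A * dt
    for _ in range(int(round(tau_obs / dt))):
        x_new = x @ step.T + rng.normal(scale=np.sqrt(dt), size=x.shape) @ F.T
        M += 0.5 * (x + x_new)[:, :, None] * (x_new - x)[:, None, :]
        x = x_new
    return M.reshape(n_traj, 4) / tau_obs


def tur_bound(stats, w, tau_obs):
    """Eq. (15) evaluated with the sample mean and variance of j_d over trajectories."""
    j = stats @ w
    return 2.0 * j.mean() ** 2 / (tau_obs * j.var(ddof=1))


def metropolis_search(stats, tau_obs, n_steps, beta, step_size, rng):
    """Random-walk Metropolis over unit-norm W, favouring a large TUR ratio."""
    w = rng.normal(size=4)
    w /= np.linalg.norm(w)
    score = tur_bound(stats, w, tau_obs)
    best_w, best_score = w, score
    for _ in range(n_steps):
        trial = w + step_size * rng.normal(size=4)
        trial /= np.linalg.norm(trial)   # the ratio does not change if W is rescaled
        trial_score = tur_bound(stats, trial, tau_obs)
        if np.log(rng.random()) < beta * (trial_score - score):
            w, score = trial, trial_score
            if score > best_score:
                best_w, best_score = w, score
    return best_w, best_score


rng = np.random.default_rng(2)
tau_obs, dt = 10.0, 0.01
train = current_statistics(2000, tau_obs, dt, rng)   # used only to select d*
test = current_statistics(2000, tau_obs, dt, rng)    # independent data for the bound
w_star, train_score = metropolis_search(train, tau_obs, n_steps=3000,
                                        beta=50.0, step_size=0.1, rng=rng)
S_tur_validated = tur_bound(test, w_star, tau_obs)
S_ss = (T_h - T_c) ** 2 / (4.0 * T_h * T_c)          # Eq. (7)
print(train_score, S_tur_validated, S_ss)
```

Storing the per-trajectory matrices keeps each MC move cheap, since any linear d is scored by one matrix-vector product; comparing `train_score` with `S_tur_validated` gives a rough indication of how much the training estimate has been over-optimized.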

Rather than implementing such a cross-validation scheme, we avoided the over-optimization problem for this model system by putting the dynamics on a grid to compute the means and variances exactly. As described in Methods, we construct a continuous-time Markov jump process on a square lattice with grid spacings h1 and h2 such that, as the spacings go to zero, the jump process converges to the same Fokker-Planck description, Eq. (3), as the continuous-space Langevin dynamics48. The vector field d(x) is also discretized as a set of weights $$d_{{\mathbf{x}} + {\mathbf{h}},{\mathbf{x}}}$$ associated with the transition from x to the neighboring microstate at x + h (see Fig. 3b, d). In place of trajectory sampling, the mean and variance can be extracted from a standard computation of the current's scaled cumulant generating function as the maximum eigenvalue of a tilted rate matrix54,55,56.
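The tilted-matrix computation can be illustrated on a small example. The sketch below assumes a hypothetical three-state ring (clockwise rate 2, counterclockwise rate 1) in place of the lattice-discretized bead-spring model, with the convention that W[y, x] is the rate of the jump x → y and d[y, x] is its weight. The scaled cumulant generating function λ(s) is the largest eigenvalue of the tilted matrix; its first derivative at s = 0 gives the mean current and its second derivative gives the long-time limit of τobs Var(jd), both taken here by central finite differences. Because every state of this ring has the same outgoing rates, net jumps are a difference of Poisson processes, so the mean is 1, the variance rate is 3, and the entropy production rate is ln 2, which provides a check.

```python
import numpy as np


def tilted_generator(W, d, s):
    """Tilt off-diagonal rates W[y, x] (x -> y) by exp(s * d[y, x]); escape rates stay put."""
    Ws = W * np.exp(s * d)
    np.fill_diagonal(Ws, np.diag(W))
    return Ws


def scgf(W, d, s):
    """Scaled cumulant generating function: largest real eigenvalue of the tilted matrix."""
    return np.max(np.linalg.eigvals(tilted_generator(W, d, s)).real)


def current_cumulants(W, d, h=1e-3):
    """Mean and long-time variance rate of j_d from finite differences of the SCGF at s = 0."""
    lam_p, lam_0, lam_m = (scgf(W, d, s) for s in (h, 0.0, -h))
    return (lam_p - lam_m) / (2.0 * h), (lam_p - 2.0 * lam_0 + lam_m) / h**2


def entropy_production(W):
    """Steady-state entropy production rate (k_B = 1) of a Markov jump process."""
    evals, evecs = np.linalg.eig(W)
    pi = np.real(evecs[:, np.argmin(np.abs(evals))])
    pi /= pi.sum()                       # stationary distribution, W @ pi = 0
    S = 0.0
    for x in range(len(pi)):
        for y in range(len(pi)):
            if x != y and W[y, x] > 0:
                S += pi[x] * W[y, x] * np.log(pi[x] * W[y, x] / (pi[y] * W[x, y]))
    return S


# Hypothetical three-state ring: clockwise rate 2 (weight +1), counterclockwise rate 1 (weight -1).
n = 3
W = np.zeros((n, n))
d = np.zeros((n, n))
for x in range(n):
    W[(x + 1) % n, x], d[(x + 1) % n, x] = 2.0, 1.0
    W[(x - 1) % n, x], d[(x - 1) % n, x] = 1.0, -1.0
np.fill_diagonal(W, -W.sum(axis=0))

mean, var_rate = current_cumulants(W, d)
S_tur = 2.0 * mean**2 / var_rate         # Eq. (15) with tau_obs * Var(j_d) -> var_rate
S_ss = entropy_production(W)
print(mean, var_rate, S_tur, S_ss)
```

For larger lattices one would typically replace the dense eigensolver with a sparse one and obtain the derivatives from perturbation theory rather than finite differences, but the structure of the calculation is the same.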

The MC sampling returns an ensemble of nearly optimal choices for d* such that $$\dot S_{{\mathrm{ss}}} \ge \dot S_{{\mathrm{TUR}}}^{({\mathbf{d}}^ \ast )} \ge \dot S_{{\mathrm{TUR}}}^{({\mathbf{F}})}$$. Each d* from the ensemble yields a similar TUR ratio, but the near-optimal vector fields are qualitatively distinct (see Fig. 5). We lack a physical understanding of the differences between the various near-optimal choices d* and the thermodynamic force field F. Even without a clear physical interpretation, we have a straightforward numerical procedure for extracting as tight an entropy production bound as can be obtained from macroscopic current fluctuations.

## Discussion

Physical systems in contact with multiple thermodynamic reservoirs support nonequilibrium dynamics that manifest as probability currents in phase space. Detection of these currents has been used in a biophysical context to differentiate between dissipative and equilibrium processes. In this paper, we have explored how the currents can be further utilized to infer the rate of entropy production. Using a solvable toy model, we demonstrated three inference strategies: one based on a spatial average, one based on a temporal average, and one based on fluctuations in the currents.

Regardless of strategy, the entropy production inference becomes more challenging and requires more data as the thermodynamic drive decreases. This challenge results from the fact that weakly driven systems produce trajectories which look very similar when played forward or backward in time. The weaker the drive, the more data it requires to confidently detect the statistical irreversibility.

It is in this weak driving limit that we see the most stark difference between the performance of the three studied estimators. As we move to higher-dimensional but weakly driven systems, it requires too much data to detect the statistical irreversibility at every point in phase space, so performing spatial averages is out of the question. The temporal average can still be taken, but for a fixed amount of data, estimates of F become systematically more error-prone with increased dimensionality. In that limit we find it useful to measure not just the average current, but also the variance. By leveraging the TUR we circumvent the need to accurately estimate F and achieve more rapid convergence.

The TUR-inspired estimator is not without pitfalls. Most prominently, it only returns a bound on the entropy production rate, and it is difficult to predict how tight this bound will be. That tightness, characterized by $$\eta \equiv \dot S_{{\mathrm{TUR}}}^{({\mathbf{F}})}/\dot S_{{\mathrm{ss}}}$$, does not, for example, depend solely on the strength of the thermodynamic drive. In Supplementary Note 2 and Supplementary Figure 2, we make this point by separately tuning the various spring constants to show how η depends on properties of the system in addition to the ratio of reservoir temperatures. Though our modestly sized toy systems (no more than five coupled beads) always produce η of order unity, there is little reason to believe that the TUR bound will remain a good proxy for the entropy production rate in the limit of a high-dimensional system in which only a few degrees of freedom are visible. Future experiments are needed to determine whether these inference strategies can be usefully applied to the complex biophysical dynamics that motivated our study.

## Methods

### Numerically generating the bead-spring dynamics

We simulate the bead-spring dynamics in two complementary ways: as discrete-time trajectories in continuous space and as continuous-time trajectories in discrete space. The results presented in Figs. 2 and 4 stem from continuous-space calculations. Trajectories are generated by numerically integrating the overdamped Langevin equation with the stochastic Euler integrator with timestep Δt according to $${\mathbf{x}}_{(i + 1){\mathrm{\Delta }}t} = {\mathbf{x}}_{i{\mathrm{\Delta }}t} + A{\mathbf{x}}_{i{\mathrm{\Delta }}t}{\mathrm{\Delta }}t + F{\boldsymbol{\eta }}$$, where η is a vector of independent Gaussian random numbers with zero mean and variance Δt, drawn anew at each timestep. Setting k = γ = 1, we integrate the equation of motion with timestep Δt = 0.001. The initial condition x0 is effectively drawn from the steady state: we start the clock only after integrating the dynamics for a long time from a random initial configuration. In addition to the discrete-time simulations, continuous-time jump trajectories were simulated in discrete space with a rate

$${\Bbb W}_{{\mathbf{x}} + {\mathbf{h}},{\mathbf{x}}} = \left[ {\left( {A{\mathbf{x}}/2} \right) + {\mathbf{h}}^TD} \right] \cdot {\mathbf{h}}/{\mathbf{h}}^T{\mathbf{h}}$$
(18)

for transitioning from a lattice site at position x to a neighboring site at position $${\mathbf{x}} + {\mathbf{h}}$$48. This discrete-space trajectory was generated by first discretizing the phase space on a 200 × 200 grid, with x1 ranging from −50 to 50 and x2 ranging from −20 to 20, as shown in Fig. 3a. The Markov jump process is simulated using the Gillespie algorithm57.
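The two simulation schemes can be illustrated with a minimal NumPy sketch. It contains a stochastic Euler (Euler–Maruyama) step for dx = Ax dt + F dW and a generic Gillespie loop for a lattice jump process. The function names are hypothetical, and so are the drift matrix, noise amplitudes and biased-walk rates used in the checks. The lattice rates of Eq. (18) would be supplied through `rate_fn`. Clipping negative rates to zero is an assumption of this sketch, not something stated in the text.

```python
import numpy as np


def euler_maruyama(A, F, x0, dt, n_steps, rng):
    """Integrate dx = A x dt + F dW with the stochastic Euler scheme.

    Each noise vector eta has independent N(0, dt) entries, matching
    x_{(i+1)dt} = x_{i dt} + A x_{i dt} dt + F eta.
    """
    x = np.empty((n_steps + 1, len(x0)))
    x[0] = x0
    for i in range(n_steps):
        eta = rng.normal(0.0, np.sqrt(dt), size=len(x0))
        x[i + 1] = x[i] + (A @ x[i]) * dt + F @ eta
    return x


def gillespie(x0, moves, rate_fn, t_max, rng):
    """Simulate a continuous-time Markov jump process on a lattice.

    moves: array of lattice displacements h (one per row).
    rate_fn(x, h): rate W_{x+h,x}; negative values are clipped to zero here.
    Returns the jump times and the sites occupied after each jump.
    """
    t, x = 0.0, np.asarray(x0, dtype=float)
    times, sites = [t], [x.copy()]
    while True:
        rates = np.array([max(rate_fn(x, h), 0.0) for h in moves])
        total = rates.sum()
        if total == 0.0:
            break  # absorbing state: no jumps possible
        t += rng.exponential(1.0 / total)  # waiting time ~ Exp(total rate)
        if t > t_max:
            break
        k = rng.choice(len(moves), p=rates / total)  # jump chosen by its rate
        x = x + moves[k]
        times.append(t)
        sites.append(x.copy())
    return np.array(times), np.array(sites)
```

Discarding an initial stretch of the Euler trajectory mimics drawing x0 from the steady state, as described above.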

### Estimating density and current

To form histogrammed estimates, we bin the data into a 100 × 100 grid with x1 ranging between ±50 and x2 ranging between ±20. The kernel functions can then be written as $$K({\mathbf{x}}_{i{\mathrm{\Delta }}t},{\mathbf{x}}) = L({\mathbf{x}}_{i{\mathrm{\Delta }}t},{\mathbf{x}}) = \mathop {\sum}\nolimits_{m,n} \chi _{mn}({\mathbf{x}})\chi _{mn}({\mathbf{x}}_{i{\mathrm{\Delta }}t})$$, where χmn is the indicator function that takes the value 1 if its argument lies in the bin with row and column indices m and n, and 0 otherwise. Alternatively, a continuous estimate of the density and current can be constructed using smooth non-negative functions for K and L, each of which integrates to one. For our kernel density estimates, we place a Gaussian at each data point by choosing $$K({\mathbf{x}}_{i{\mathrm{\Delta }}t},{\mathbf{x}}) \propto {\mathrm{exp}}\left[ { - \frac{1}{2}({\mathbf{x}} - {\mathbf{x}}_{i{\mathrm{\Delta }}t})^T\Sigma ^{ - 1}({\mathbf{x}} - {\mathbf{x}}_{i{\mathrm{\Delta }}t})} \right]$$. The breadth of the Gaussian along the ith direction, $$b_i$$, known as the bandwidth, sets the diagonal matrix Σ via $$\Sigma _{ii} = b_i^2$$. The estimation of currents proceeds similarly, using kernel regression with the Epanechnikov kernel58

$$L({\mathbf{x}}_{i{\mathrm{\Delta }}t},{\mathbf{x}}) \propto \left\{ {\begin{array}{*{20}{l}} {\mathop {\prod}\limits_{j = 1}^d \left( {1 - \frac{{(x_{i{\mathrm{\Delta }}t;j} - x_j)^2}}{{b_j^2}}} \right),} \hfill & {|x_{i{\mathrm{\Delta }}t;j} - x_j| < b_j\ {\mathrm{for\ all}}\ j,} \hfill \\ {0,} \hfill & {{\mathrm{otherwise,}}} \hfill \end{array}} \right.$$
(19)

where d is the spatial dimension and xiΔt;j is the jth component of the configuration xiΔt at discrete time i. The bandwidth for both the Gaussian and Epanechnikov kernels is chosen using the rule of thumb suggested by Bowman and Azzalini58, specifically

$${\mathbf{b}} = \left( {\frac{4}{{N(d + 2)}}} \right)^{1/(d + 4)}\frac{{\tilde {\boldsymbol{\sigma }}}}{{0.6745}}.$$
(20)

Here N denotes the number of data points, and $$\tilde {\boldsymbol{\sigma }}$$ is the median absolute deviation estimator computed by $$\tilde {\boldsymbol{\sigma }} = \sqrt {{\mathrm{median}}\{ |v - {\mathrm{median}}(v)|\} {\mathrm{median}}\{ |{\mathbf{x}} - {\mathrm{median}}({\mathbf{x}})|\} }$$, where v is the magnitude of the velocities. The bandwidth goes to zero with increasing data length, so the kernel estimator should be asymptotically unbiased. In that limit of infinite data, the differences between histogram and kernel density estimates are insignificant. When data are limited, we find the fastest convergence by using kernel density estimates with a multivariate Gaussian for K and the Epanechnikov kernel for L.
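The following is a minimal NumPy sketch of the kernel estimators described above. It assumes a trajectory array `x` of shape (N, d) sampled every `dt`. The bandwidth follows Eq. (20), with σ̃ computed as written in the text. Two further choices are assumptions: velocity magnitudes are approximated by finite differences, and the current estimate takes the form ĵ(y) = τ⁻¹ Σi L(x̄i, y)(xi+1 − xi), with x̄i the midpoint of each step, as one plausible version of the kernel-regression estimate. All function names are hypothetical.

```python
import numpy as np


def _mad(a):
    """Median absolute deviation along axis 0."""
    return np.median(np.abs(a - np.median(a, axis=0)), axis=0)


def bandwidth(x, dt):
    """Rule-of-thumb bandwidth of Eq. (20), one value per coordinate."""
    n, d = x.shape
    v = np.linalg.norm(np.diff(x, axis=0), axis=1) / dt  # speed |dx/dt|
    sigma = np.sqrt(_mad(v) * _mad(x))                   # shape (d,)
    return (4.0 / (n * (d + 2))) ** (1.0 / (d + 4)) * sigma / 0.6745


def gaussian_density(x, grid, b):
    """rho(y) = (1/N) sum_i K(x_i, y) with a normalized diagonal Gaussian K."""
    z = (grid[:, None, :] - x[None, :, :]) / b           # (G, N, d)
    norm = np.prod(np.sqrt(2.0 * np.pi) * b)
    return np.exp(-0.5 * np.sum(z**2, axis=2)).mean(axis=1) / norm


def epanechnikov(u, b):
    """Normalized product Epanechnikov kernel; u has shape (..., d)."""
    s = u / b
    w = np.where(np.abs(s) < 1.0, 0.75 * (1.0 - s**2) / b, 0.0)
    return np.prod(w, axis=-1)


def current(x, grid, b, dt):
    """Kernel-smoothed current on grid points (assumed estimator form)."""
    mid = 0.5 * (x[1:] + x[:-1])                         # step midpoints
    dx = np.diff(x, axis=0)
    L = epanechnikov(grid[:, None, :] - mid[None, :, :], b)  # (G, N-1)
    tau = dt * (len(x) - 1)
    return L @ dx / tau                                  # (G, d)
```

Both kernels are normalized so that each integrates to one, as required above.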

To optimally handle limited data, the bandwidth is typically chosen to minimize the mean squared error (MSE) of the estimated function59,60,61:

$${\mathrm{MSE}}_{\dot{S}_{{\mathrm{ss}}}} = \left\langle {\left( {\widehat {\dot S}_{{\mathrm{ss}}} - \dot S_{{\mathrm{ss}}}} \right)^2} \right\rangle \ \,{\mathrm{and}}\, \ {\mathrm{MSE}}_{{\mathrm{TUR}}} = \left\langle {\left( {\widehat {\dot S}_{{\mathrm{TUR}}} - \dot S_{{\mathrm{TUR}}}} \right)^2} \right\rangle ,$$
(21)

where the expectation value is taken over realizations of trajectories. The MSE is naturally a function of the bandwidth, since the value of each estimator depends on b. Supplementary Figure 1 shows this bandwidth dependence of the MSE for the temporal estimator and the TUR lower bound in the five-bead model with τobs = 1200 and Tc/Th = 0.1. Notice that the TUR lower bound tends to be less sensitive to the choice of bandwidth.
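The following sketch illustrates how the MSE of Eq. (21) trades bias against variance as the bandwidth changes. It uses a deliberately simple stand-in where the exact answer is known: a one-dimensional Gaussian kernel density estimate evaluated at y = 0 for samples from a standard normal distribution. It is not the entropy-production estimator of the text, and the function names and sample sizes are hypothetical.

```python
import numpy as np


def kde_at_zero(samples, b):
    """1D Gaussian kernel density estimate evaluated at y = 0."""
    return np.mean(np.exp(-0.5 * (samples / b) ** 2)) / (np.sqrt(2 * np.pi) * b)


def mse_vs_bandwidth(bandwidths, n_samples=200, n_realizations=500, seed=0):
    """MSE of the estimate, averaged over independent realizations of the data."""
    rng = np.random.default_rng(seed)
    truth = 1.0 / np.sqrt(2 * np.pi)  # exact N(0, 1) density at y = 0
    data = rng.normal(size=(n_realizations, n_samples))
    return np.array([np.mean([(kde_at_zero(s, b) - truth) ** 2 for s in data])
                     for b in bandwidths])
```

Very small bandwidths give a high-variance estimate and very large ones a strongly biased estimate, so the MSE is smallest at an intermediate bandwidth.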

### Estimation of the TUR lower bound

To estimate the current’s mean and variance, $$\widehat {\left\langle {j_{\mathbf{d}}} \right\rangle }$$ and $$\widehat {{\mathrm{Var}}(j_{\mathbf{d}})}$$, from a single realization of length τobs, we first divide the trajectory into τobs/Δτ subtrajectories of length Δτ. For the continuous-time Markov jump process, as shown in Fig. 3b, the vector field d(x) is discretized as a set of weights dx+h,x associated with the edges of the lattice, and the trajectory is a series of lattice sites occupied over time. The accumulated current, as illustrated in Fig. 3d, is computed as the sum of weights along subtrajectory k: $$J_{\mathbf{d}}^{(k)} = \mathop {\sum}\nolimits_i {d_{{\mathbf{x}}_{i + 1},{\mathbf{x}}_i}} .$$ For the continuous-space Langevin dynamics, the accumulated current for subtrajectory k is given by the midpoint (Stratonovich) sum $$J_{\mathbf{d}}^{(k)} = \mathop {\sum}\nolimits_i {\mathbf{d}} \left( {\frac{{{\mathbf{x}}_{i{\mathrm{\Delta }}t} + {\mathbf{x}}_{(i - 1){\mathrm{\Delta }}t}}}{2}} \right) \cdot \left( {{\mathbf{x}}_{i{\mathrm{\Delta }}t} - {\mathbf{x}}_{(i - 1){\mathrm{\Delta }}t}} \right)$$. This accumulated current is divided by the subtrajectory length to obtain the fluctuating macroscopic current for subtrajectory k: $$j_{\mathbf{d}}^{(k)} = J_{\mathbf{d}}^{(k)}/\Delta \tau$$. The sample mean and variance of $$\left\{ j_{\mathbf{d}}^{(1)},j_{\mathbf{d}}^{(2)},\ldots \right\}$$ give $$\widehat {\left\langle {j_{\mathbf{d}}} \right\rangle }$$ and $$\widehat {{\mathrm{Var}}(j_{\mathbf{d}})}$$, respectively.
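The subtrajectory procedure for the Langevin case can be sketched in a few lines of NumPy. The sketch assumes a trajectory array `x` of shape (N, d) with sampling interval `dt` and a hypothetical callable `d_field` that evaluates d at an array of points. For the final step it assumes the finite-time TUR in the form $$\dot S_{{\mathrm{TUR}}} = 2\widehat {\langle j_{\mathbf{d}}\rangle }^2/(\Delta \tau \,\widehat {{\mathrm{Var}}(j_{\mathbf{d}})})$$ with kB = 1. That form is an assumption here, not a quotation of the main text.

```python
import numpy as np


def tur_bound(x, d_field, dt, dtau):
    """Mean and variance of j_d over subtrajectories of length dtau, plus the TUR bound."""
    steps = int(round(dtau / dt))
    n_sub = (len(x) - 1) // steps  # tau_obs / dtau subtrajectories
    if n_sub < 2:
        raise ValueError("need at least two subtrajectories")
    j = np.empty(n_sub)
    for k in range(n_sub):
        seg = x[k * steps:(k + 1) * steps + 1]
        mid = 0.5 * (seg[1:] + seg[:-1])  # midpoint (Stratonovich) rule
        J = np.sum(np.einsum("id,id->i", d_field(mid), np.diff(seg, axis=0)))
        j[k] = J / dtau                   # fluctuating macroscopic current
    mean, var = j.mean(), j.var(ddof=1)
    return mean, var, 2.0 * mean**2 / (dtau * var)
```

For one-dimensional drift-diffusion with d ≡ 1, this bound is saturated (it equals v²/D), which makes it a convenient sanity check.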

### Computing the mean and variance by tilting

It is useful to conceptualize 〈jd〉 and Var(jd) in terms of sampled trajectories, but finite trajectory sampling will result in statistical errors. We may alternatively compute the mean and variance as the first two derivatives of the scaled cumulant generating function $$\phi (\lambda ) = {\mathrm{lim}}_{\tau _{{\mathrm{obs}}} \to \infty }\frac{1}{{\tau _{{\mathrm{obs}}}}}{\mathrm{ln}}\left\langle {e^{\lambda j_{\mathbf{d}}\tau _{{\mathrm{obs}}}}} \right\rangle$$, evaluated at λ = 0. The expectation value averages over all trajectories of length τobs, and in the long-time limit, ϕ(λ) coincides with the maximum eigenvalue of the tilted operator with matrix elements $${\Bbb W}(\lambda )_{{\mathbf{x}} + {\mathbf{h}},{\mathbf{x}}} = {\Bbb W}_{{\mathbf{x}} + {\mathbf{h}},{\mathbf{x}}}e^{\lambda d_{{\mathbf{x}} + {\mathbf{h}},{\mathbf{x}}}}$$54,55,56. By discretizing space, we computed ϕ(λ) around λ = 0 as the maximal eigenvalue of the tilted operator. Using numerical derivatives, we estimate

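$$\left\langle {j_{\mathbf{d}}} \right\rangle = \phi ' (0) \approx \frac{{\phi (\delta \lambda ) - \phi ( - \delta \lambda )}}{{2\delta \lambda }}$$
(22)
$${\mathrm{Var}}(j_{\mathbf{d}}) = \phi '' (0) \approx \frac{{\phi (\delta \lambda ) - 2\phi (0) + \phi ( - \delta \lambda )}}{{\delta \lambda ^2}} = \frac{{\phi (\delta \lambda ) + \phi ( - \delta \lambda )}}{{\delta \lambda ^2}}$$
(23)

with δλ = 0.00001. The last equality uses ϕ(0) = 0, which holds because the untilted operator conserves probability, so its maximum eigenvalue vanishes.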
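The tilting procedure of Eqs. (22) and (23) can be checked on a small lattice with a dense eigenvalue solver. In this sketch, `W[y, x]` holds the rate for the jump x → y (columns sum to zero) and `d[y, x]` holds the corresponding current weight. Only off-diagonal elements are tilted. The bound $$2\phi '(0)^2/\phi ''(0)$$ is an assumed long-time form of the TUR with kB = 1. For large lattices, a sparse Arnoldi solver such as SciPy's `scipy.sparse.linalg.eigs` with `which="LR"` could stand in for Mathematica's Arnoldi implementation; that substitution is a suggestion, not the authors' setup.

```python
import numpy as np


def scgf(W, d, lam):
    """phi(lambda): largest real eigenvalue of the tilted generator."""
    Wl = W * np.exp(lam * d)
    np.fill_diagonal(Wl, np.diag(W))  # escape rates on the diagonal stay untilted
    return np.max(np.linalg.eigvals(Wl).real)


def mean_var_tur(W, d, dl=1e-5):
    """Finite-difference mean and variance (Eqs. 22-23) and the assumed TUR bound."""
    p, m = scgf(W, d, dl), scgf(W, d, -dl)
    mean = (p - m) / (2 * dl)
    var = (p - 2 * scgf(W, d, 0.0) + m) / dl**2
    return mean, var, 2 * mean**2 / var
```

For a uniform ring with forward rate k+ and backward rate k−, ϕ(λ) = k+(e^λ − 1) + k−(e^−λ − 1). The sketch should therefore return ⟨j⟩ ≈ k+ − k− and Var ≈ k+ + k−, and the bound should not exceed the ring's entropy production rate (k+ − k−) ln(k+/k−).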

### MC optimization

We seek a vector field d(x) such that the TUR bound is as large as possible. To identify such a choice of d, we first decompose it into a basis of M = 100 Gaussians:

$${\mathbf{d}}({\mathbf{x}}) = \mathop {\sum}\limits_{i = 1}^M {w^{(i)}} {\mathrm{exp}}\left[ { - ({\mathbf{x}} - {\mathbf{x}}^{(i)})^TB^{ - 1}({\mathbf{x}} - {\mathbf{x}}^{(i)})} \right].$$
(24)

The ith Gaussian, centered at position x(i), carries a weight w(i). The centers of the first 50 Gaussians are sampled uniformly, with x1 ranging from −50 to 50 and x2 from −20 to 20. The breadth of the Gaussians along the jth direction, Bjj, is set to 10% of the length of the interval from which the uniform samples are drawn. Only the weights of these 50 Gaussians are allowed to vary freely. The remaining 50 Gaussians are paired with the first 50 to impose the antisymmetry d(x) = −d(−x). In practice, this constraint is achieved by placing, for each Gaussian centered at x(i), a second Gaussian at −x(i) with the opposite weight.
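The antisymmetric Gaussian basis of Eq. (24) can be sketched as follows. Two choices are assumptions made for illustration. First, each Gaussian carries a vector-valued weight, so that d is a vector field in the plane. Second, the diagonal entries of B are taken literally as 10% of the lengths of the sampling intervals. The function names are hypothetical.

```python
import numpy as np


def make_centers(m_free, lo, hi, rng):
    """m_free centers drawn uniformly in the box [lo, hi], followed by their mirror images."""
    c = rng.uniform(lo, hi, size=(m_free, len(lo)))
    return np.concatenate([c, -c])


def vector_field(x, centers, weights, B_diag):
    """d(x) = sum_i w_i exp[-(x - x_i)^T B^{-1} (x - x_i)] for points x of shape (n, dim).

    weights has shape (m_free, dim); the mirrored Gaussians receive -weights,
    which enforces d(-x) = -d(x).
    """
    w = np.concatenate([weights, -weights])
    diff = x[:, None, :] - centers[None, :, :]      # (n, M, dim)
    g = np.exp(-np.sum(diff**2 / B_diag, axis=2))   # diagonal B
    return g @ w                                    # (n, dim)
```

Only the first `m_free` weights are free parameters; the mirrored half is generated automatically.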

With this regularization, we replace the optimization of d with a sampling problem. We sample the first 50 weights w in proportion to $${\mathrm{exp}}(\beta \dot S_{{\mathrm{TUR}}}^{({\mathbf{d}})})$$, where β is an effective inverse temperature and $$\dot S_{{\mathrm{TUR}}}^{({\mathbf{d}})}$$ depends on the weights through d. Choosing β = 5000 strongly biases the sampling toward weights that give a near-optimal value of the TUR bound. After initializing the weights with uniform random numbers from [−1, 1], we proposed Monte Carlo moves w → w′ by perturbing each wi by a random number drawn uniformly from [−0.5, 0.5]. The d′ corresponding to these new weights was computed according to Eq. (24), and the TUR bound for the proposed macroscopic current was computed from numerical derivatives of the tilted operator $${\Bbb W}(\lambda )$$ around λ = 0, as described above. The maximum-eigenvalue calculations used Mathematica’s implementation of the Arnoldi method with sparse matrices. Each proposed move to w′ was accepted with the Metropolis criterion $${\mathrm{min}}[1,{\mathrm{exp}}( - \beta (\dot S_{{\mathrm{TUR}}}^{({\mathbf{d}})} - \dot S_{{\mathrm{TUR}}}^{({\mathbf{d}}' )}))]$$.
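The Metropolis sampling step can be sketched as follows, assuming a hypothetical callable `objective(w)` that stands in for $$\dot S_{{\mathrm{TUR}}}^{({\mathbf{d}})}$$. In the text, that quantity is evaluated on the vector field built from the weights, via the tilted-operator eigenvalue. The sampler targets exp(β · objective) and tracks the best weights seen. With β as large as 5000 it behaves almost like a stochastic hill climber.

```python
import numpy as np


def metropolis(objective, w0, beta, step, n_steps, rng):
    """Sample weights w with probability proportional to exp(beta * objective(w)).

    Each proposal perturbs every weight by a uniform number in [-step, step].
    Returns the best weights seen, their objective value, and the objective
    value of the current state after every step.
    """
    w = np.array(w0, dtype=float)
    s = objective(w)
    best_w, best_s = w.copy(), s
    trace = []
    for _ in range(n_steps):
        w_new = w + rng.uniform(-step, step, size=w.shape)
        s_new = objective(w_new)
        # Metropolis acceptance: min[1, exp(-beta * (s - s_new))]
        if s_new >= s or rng.uniform() < np.exp(-beta * (s - s_new)):
            w, s = w_new, s_new
            if s > best_s:
                best_w, best_s = w.copy(), s
        trace.append(s)
    return best_w, best_s, np.array(trace)
```

The text uses β = 5000 with uniform perturbations of half-width 0.5 (random start) or 0.05 (perturbation about F). The returned trace is what one would inspect for the plateau.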

In addition to starting from a random choice of d, we performed MC sampling about the thermodynamic force by expressing d as

$${\mathbf{d}}({\mathbf{x}}) = {\mathbf{F}}({\mathbf{x}}) +\mathop {\sum}\limits_{i = 1}^M {w^{(i)}} {\mathrm{exp}}\left[ { - ({\mathbf{x}} - {\mathbf{x}}^{(i)})^TB^{ - 1}({\mathbf{x}} - {\mathbf{x}}^{(i)})} \right].$$
(25)

Again, we have 100 Gaussians, half of them placed uniformly throughout the space and the rest positioned to make the perturbation antisymmetric. We stochastically update the weights by adding a uniform random number drawn from [−0.05, 0.05] and accept the update with the same Metropolis factor as before. The resulting TUR lower bound tends toward higher values until it hits a plateau (Fig. 5, blue line). For each temperature ratio in Fig. 4a, the MC sampling was run for 500 steps. By then the TUR bound had reached a plateau, and further optimization was either impossible or at least significantly more challenging.