Thermodynamic fluctuation theorems govern human sensorimotor learning

Hack, P.; Lindig-Leon, C.; Gottwald, S.; Braun, D. A.

doi:10.1038/s41598-023-27736-8

Download PDF

Article
Open access
Published: 17 January 2023

Thermodynamic fluctuation theorems govern human sensorimotor learning

P. Hack¹,
C. Lindig-Leon¹,
S. Gottwald¹ &
…
D. A. Braun¹

Scientific Reports volume 13, Article number: 869 (2023) Cite this article

856 Accesses
2 Altmetric
Metrics details

Subjects

Abstract

The application of thermodynamic reasoning in the study of learning systems has a long tradition. Recently, new tools relating perfect thermodynamic adaptation to the adaptation process have been developed. These results, known as fluctuation theorems, have been tested experimentally in several physical scenarios and, moreover, they have been shown to be valid under broad mathematical conditions. Hence, although not experimentally challenged yet, they are presumed to apply to learning systems as well. Here we address this challenge by testing the applicability of fluctuation theorems in learning systems, more specifically, in human sensorimotor learning. In particular, we relate adaptive movement trajectories in a changing visuomotor rotation task to fully adapted steady-state behavior of individual participants. We find that human adaptive behavior in our task is generally consistent with fluctuation theorem predictions and discuss the merits and limitations of the approach.

Contextual inference underlies the learning of sensorimotor repertoires

Article 24 November 2021

Statistical determinants of visuomotor adaptation along different dimensions during naturalistic 3D reaches

Article Open access 17 June 2022

Perception and control: individual difference in the sense of agency is associated with learnability in sensorimotor adaptation

Article Open access 15 October 2021

Introduction

The study of learning systems with concepts borrowed from statistical mechanics and thermodynamics has a long history reaching back to Maxwell’s demon and the ensuing debate on the relation between physics and information³⁴. Over the last 20 years, the informational view of thermodynamics has experienced great developments, which has allowed to broaden its scope form equilibrium to non-equilibrium phenomena^{10, 22}. Of particular importance are the so-called fluctuation theorems^{7, 20, 42}, which relate equilibrium quantities to non-equilibrium trajectories allowing, thus, to approximate equilibrium quantities via experimental realizations of non-equilibrium processes^{32, 53}. Among the fluctuation theorems, two results stand out, Jarzynski’s equality^{4, 19, 21} and Crooks’ fluctuation theorem^{6, 8}, as they aim to bridge the apparent chasm between reversible microscopic laws and irreversible macroscopic phenomena²⁹.

The advances in non-equilibrium thermodynamics have recently also led to new theoretical insights into simple learning systems^{12, 13, 16, 31, 35, 46}. Abstractly, thermodynamic quantities like energy, entropy or free energy can be thought to define order relations between states^{14, 25}, which makes them applicable to a wide range of problems. In the economic sciences, for example, such order relations are typically used to define a decision-maker’s preferences over states³⁰. Accordingly, a decision-maker or a learning system can be thought to maximize a utility function, analogous to a physical system that aims to minimize an energy function. Moreover, in the presence of uncertainty in stochastic choice, such decision-makers can be thought to operate under entropy constraints reflecting the decision-maker’s precision^{31, 34}, resulting in soft-maximizing the corresponding utility function instead of perfectly maximizing it. This is formally equivalent to following a Boltzmann distribution with energy given by the utility. Therefore, in this picture, the physical concept of work corresponds to utility changes caused by the environment, whereas the physical concept of heat corresponds to utility gains due to internal adaptation⁴⁶. Like a thermodynamic system is driven by work, such learning systems are driven by changes in the utility landscape (e.g. changes in an error signal). By exposing learning systems to varying environmental conditions, it has been hypothesized that adaptive behavior can be studied in terms of fluctuation theorems^{12, 16}, which are not necessarily tied to physical processes but are broadly applicable to stochastic processes satisfying certain constraints¹⁸.

Fluctuation theorems are usually deployed in statistical mechanics; particularly, the study of nonequilibrium steady states in thermodynamics. In this setting, one normally assumes a probabilistic description of an ensemble of many particles, i.e., the kinds of systems usually considered in statistical thermodynamics. However, as described in^{41, 42}, exactly the same principles and fluctuation theorems also apply to the path of a single particle, leading to stochastic thermodynamics. This suggests that fluctuation theorems may not only be applicable to the statistics of ensembles of many learners, but also when describing the trajectory of a single participant during a learning process.

Although fluctuation theorems have been empirically observed in numerous experiments in the physical sciences^{1, 5, 11, 28, 37, 44}, there have been no reported experimental results relating fluctuation theorems to adaptive behavior in humans or other living beings. Here, we test Jarzynski’s equality and Crooks’ fluctuation theorem experimentally in a human sensorimotor adaptation task. In this context, the fluctuation theorem establishes a linear relationship between the externally imposed utility changes driving the learning process (which are directly related to non-predicted information and energy dissipation⁴⁶) and the log-probability ratio between forward and backward adaptation trajectories, when exposing participants to the sequence of environments either in the forward or reverse order. Accordingly, such learners can be quantitatively characterized by a hysteresis effect that can also be observed in simple physical systems.

Results

In a visuomotor adaptation task, human participants controlled a cursor on a screen towards a single stationary target by moving a mechanical manipulandum that was obscured from their vision under an overlaid screen—see Fig. 1A. Crucially, in each trial n, the position of the cursor could be rotated with angle $\theta _n$ relative to the actual hand position so that participants had to adapt when moving the cursor from the start position to the target. To measure participants’ adaptive state, we recorded their movement position at the time of crossing a certain distance from the start position, so that their response could be characterized by an angle $x_n$. The deviation between participants’ response $x_n$ and the required movement incurs a sensorimotor loss $E_n$²⁴ in trial n, that can be quantified as an exponential quadratic error

$$\begin{aligned} E_n(x)= 1 - e^{- (x-(\theta _n+b))^2}, \end{aligned}$$

(1)

that depends on the actual rotation angle $\theta _n$ set in trial n. The parameter b is a participant-specific parameter allowing for bias due to posture, biomechanics, the mechanics of the manipulandum, or other influences—see Fig. 1D. The loss (1) is taken to be the energy (or negative utility) of a participant’s stochastic response $X_n = x_n$. For a bounded rational decision-maker^{26, 27, 31, 39} that optimizes this loss under uncertainty, the optimal pointing behavior after a suitably long adaptation time is described by a Boltzmann equilibrium distribution $p_n^{eq}$ of the form

$$\begin{aligned} p_{n}^{eq}(x_n) = \exp \big (-\beta ( E_n(x_n) - F_n )\big ), \end{aligned}$$

(2)

for all $x_n\in A_n$, where the sensorimotor error $E_n(x_n)$ plays the role of an energy, the free energy term $F_n = \frac{1}{\beta } \log \int _{A_n} \exp \left( -\beta E_n(x_n) \right) dx_n$ is caused by the normalization, and $A_n$ is the support of the equilibrium distribution $p_{n}^{eq}$, which will vary for each participant, as we explain in Sect. A.3.3. See Fig. 1C for a representation of (2). Moreover, the softness-parameter $\beta$, also known as inverse temperature or precision, controls the trade-off between entropy maximization and energy minimization, essentially interpolating between a purely stochastic choice ($\beta = 0$) and a purely rational choice ($\beta \rightarrow \infty$) minimizing the energy perfectly.

The task consisted of a sequence of target reaching trials, where the rotation angle $\theta _n$ changed from one trial n to the next trial $n+1$ according to a given up-down protocol—see Fig. 1B—, so that participants’ responses over trials could be represented by a trajectory $\varvec{x}=(x_0,x_1,\ldots ,x_N)$. When the environment is changing over trials, we can distinguish cumulative error changes $\Delta E_{ext}(\varvec{x}) {:=}\sum _{n=0}^{N-1} (E_{n+1}(x_n)-E_{n}(x_n))$ that are induced externally by changes in the environmental parameter $\theta _n$, from cumulative error changes $\Delta E_{int}(\varvec{x}) {:=}\sum _{n=1}^{N} (E_{n}(x_n)-E_{n}(x_{n-1}))$ due to internal adaptation when subjects change their response from $x_{n-1}$ to $x_{n}$. Crucially, it is exactly the externally induced changes in error, $\Delta E_{ext}(\varvec{x})$, analogous to the physical concept of work, that drive the adaptation process: if $\Delta E_{ext}(\varvec{x})$ is large, the system is more surprised and has to adapt more. In the following, we thus refer to $\Delta E_{ext}(\varvec{x})$ as driving error or driving signal. When applying Crooks’ fluctuation theorem for general adaptive systems¹⁸ to the above setting, we obtain the linear relation

$$\begin{aligned} \Delta E_{ext}(\varvec{x}) - \Delta F = \frac{1}{\beta }\log \left( \frac{\rho ^F(\varvec{x})}{\rho ^B(\varvec{x}^R)} \right) , \end{aligned}$$

(3)

where $\varvec{x}^R = (x_N,\ldots ,x_1)$ is the reverse trajectory, $\Delta F$ denotes the free energy difference $F_N-F_0$ and the distributions $\rho ^F(\cdot )$ and $\rho ^B(\cdot )$ denote the probability of observing a certain trajectory when the learner faces a series of environments in some specific order or the order is reversed, respectively. This form of Crooks’ theorem allows for an intuitive interpretation, in that any difference in probability of a trajectory and its reverse signifying a hysteresis can be directly related to an excess loss that is irretrievably generated because of imperfect adaptation. Unfortunately, Equation (3) is hard to determine from data, as it would require to estimate probability distributions over paths. However, there is an equivalent form of Crooks’ theorem that groups all trajectories according to their associated value of $\Delta E_{ext}(\varvec{x})$ with corresponding distributions $\rho ^F$ and $\rho ^B$ over these values, such that

$$\begin{aligned} \Delta E_{ext}(\varvec{x}) - \Delta F = \frac{1}{\beta }\log \left( \frac{\rho ^F(\Delta E_{ext}(\varvec{x}))}{\rho ^B(-\Delta E_{ext}(\varvec{x}))} \right) . \end{aligned}$$

(4)

The distribution $\rho ^F(\cdot )$ can be interpreted as the probability that the learner experiences a certain overall surprise when being exposed sequentially to a series of environments and $\rho ^B(\cdot )$ is the analogous concept when the order in which the environments are presented is reversed. In equation (4), these densities are evaluated at the actual driving errors $\Delta E_{ext}(\varvec{x})$ and $-\Delta E_{ext}(\varvec{x})$, respectively, for a particular adaptive trajectory $\varvec{x}$.

A direct consequence of (4) is Jarzynski’s equality⁶, which states that

$$\begin{aligned} \big \langle e^{-\beta \Delta E_{ext}(\varvec{X})}\big \rangle = e^{-\beta \Delta F}, \end{aligned}$$

(5)

where $\langle ~ \cdot ~ \rangle {:=}{\mathbb {E}}[~\cdot ~]$ denotes the expectation operator, considering $\varvec{X} = (X_n)_{n=0}^N$ a Markov chain with transition densities $\Pi _n$ that have $p^{eq}_n$ as stationary distributions, that is, for each n, $p^{eq}_n$ is the stationary distribution for $X_n$. In our experiment, $\varvec{X}$ represents participants’ responses that are repeated over multiple repetitions of the forward-backward protocol. In the following, we will test the relationships (4) and (5) experimentally with $\Delta F = 0$ as our human learners start and end in the same environmental state (i.e. $F_N=F_0$). Note that, in our particular setting where there is no overall change in the free energy $(\Delta F=0)$, Equation (5) suggests that the expected value $\big \langle e^{-\beta \Delta E_{ext}(\varvec{X})}\big \rangle$ equals $e^{-\beta 0}=1$ irrespective of the value taken by $\beta$. This provides a quantitative prediction that we will evaluate empirically below.

In our experiment the task is divided into 20 cycles of 66 trials each, following the protocol (9) illustrated in Fig. 1B. We refer to trials 1 to 25 of each cycle as a realization of the forward process and trials 34 to 58 as a realization of the backward process. Notice the backward process consists of the same angles as the forward process, that is, the same utility functions, but in reversed order. Thus, we record for each participant 20 values for $\Delta E_{ext}(\varvec{x})$ in both the forward and backward processes that we use to estimate participants’ probability densities of the forward and backward processes, $\rho ^F$ and $\rho ^B$, respectively, using kernel density estimation. As the amount of data is limited to test the linear relation in (4), we will use simulation results in the following to compare against participants’ behavior.

When simulating an artificial decision-maker based on a stochastic optimization scheme with Markovian dynamics, for example a Metropolis-Hasting algorithm with target distribution $p_n^{eq}\propto \exp (-\beta E_n)$, it is clear that we can recover the linear relationship (4), provided that sufficient samples are collected¹⁸—see, for example, a simulation with 1000 cycles in Fig. 2A, where we can see a good adjustment between the theoretical prediction (in black) and the linear regression of the observed data (in red). As a result, (5) also holds in this scenario. The more critical question is what happens when only few samples are available. To this end, we use the stochastic optimization algorithm to simulate the protocol of our experiment, that is, 20 cycles, and indicate confidence intervals using 1000 bootstraps. It can be seen in Fig. 2B that the theoretical prediction is consistent with the $99\%$ confidence interval in the region where $|\Delta E_{\text {ext}}| \le 4$ (which is the region where our experimental data lies). Using the same bootstrapped data, we obtain several estimates of $\langle e^{- \Delta E_{ext}(\varvec{X})}\rangle$ (the mean of $e^{- \Delta E_{ext}(\varvec{X})}$ for the observed values of $\Delta E_{ext}(\varvec{X})$ at each bootstrap) which we use to calculate a confidence interval for it. This results in the $99\%$ confidence interval for $\langle e^{-\Delta E_{ext}(\varvec{X})}\rangle$ being (0.48, 1.64), which is consistent with the theoretical prediction $\langle e^{-\Delta E_{ext}(\varvec{X})}\rangle = 1$ for $\Delta F = 0$ according to Equation (5). Accordingly, we will expect a similar behavior for our experimental data. Note we take, for simplicity, $b=0$, $\beta =1$ and, for all n, $A_n=[-90,90]$ in these simulations (see Methods).

Participants’ average adaptive responses can be seen in Fig. 3 compared to the experimentally imposed true parameter values (the trial-by-trial responses can be seen in Fig. 6). The green and red lines distinguish the forward and backward trajectories, respectively, so that, from the contrast between the two curves, hysteresis becomes apparent, as common in simple physical systems²² and as reported previously in similar experiments for sensorimotor adaptation⁵⁰. Participants that achieve at least $50\%$ adaptation are shaded by a green background color and are our participants of interest. The three participants that fail to achieve this minimum adaptation level are marked by a red shade. Instead of excluding these participants entirely from the analysis, we keep them in to show the contrast to the well-adapted participants and to highlight that the results reported for the well-adapted participants do not hold trivially for any participant producing inconsistent behavior.

Figure 4 shows participants’ data compared to the theoretical prediction from (4) and the 99 % confidence interval after 1000 bootstraps as in the case of the simulations in Fig. 2B. There, we see that our data follow the trend of the theoretical prediction and lie within or close to the confidence interval bounds of the prediction in broad regions for several participants. This is not a trivial result, as can be easily seen, when randomizing the temporal order of the trajectory points or when replacing the utility function with another one that does not fit the setup. Figure 5A,B show this, for example, for an inverted Mexican hat ((10) with $\sigma =4$) that assigns low utility to the target region, and for resamples of the trajectory points in a random order, respectively. Both results are clearly incompatible with the theoretical prediction.

When conducting an additional robustness analysis in Fig. 8, we found that, under the proposed utility function, participants’ behavior is compatible with Crooks’ fluctuation theorem for a broad neighbourhood of parameter settings, but breaks down when choosing implausible parameters. Regarding Jarzynski’s equality (5), the confidence intervals for the majority of participants are consistent with the theoretical prediction when using the bootstrapped values to calculate $\langle e^{-\beta \Delta E_{ext}(\varvec{X})}\rangle$ (cf. Table 1). In contrast, when following the same procedure for both the inverted Mexican hat and the randomized procedure, we obtain consistency for a considerably smaller number of participants. In particular, for the inverted Mexican hat, we obtain consistency for only two participants. Moreover, these participants are $S_8$ and $S_9$, which belong to the group that did not reach at least $50\%$ adaptation (indicated by the red background area in the figures). For the randomized procedure, the expected number of participants that show consistency is also close to two, although the specific participants which are consistent vary with the realization of the randomized procedure. More specifically, after 1000 runs of the randomized procedure, the mean number of consistent participants we observed was 2.33.

Table 1 Experimental results for Jarzynski’s equality. We include the confidence intervals for the left hand side of (5), which we obtain after bootstrapping the observed values of $\Delta E_{ext}(\varvec{x})$ for the forward process 1000 times and estimating $\langle e^{-\beta \Delta E_{ext}(\varvec{X})}\rangle$ by its mean for each set of bootstrapped data. In our experiment we have $\Delta F=0$ in the right hand side of (5), resulting in a theoretical prediction of $\langle e^{-\beta \Delta E_{ext}(\varvec{X})}\rangle =1.0$. Note, that for most subjects the value of 1.0 lies inside the confidence interval, which does not hold when assuming unsuitable loss functions, as discussed at the end of the Results. Participants that achieve at least $50\%$ adaptation (c.f. Fig. 3) are shaded by a green background color .

Full size table

Discussion

In our experiment we have investigated the hypothesis that human sensorimotor adaptation may be participant to the thermodynamic fluctuation theorems first reported by Crooks⁷ and Jarzynski²⁰. In particular, we tested whether changes in sensorimotor error induced externally by an experimental protocol are linearly related to the log-ratio of the probabilities of behavioral trajectories under a given forward and time-reversed backward protocol of a sequence of visuomotor rotations. We found that participants’ data, in all cases where participants showed an appropriate adaptive response, was consistent with this prediction or close to its confidence interval bounds, as expected from our simulations with finite sample size. Moreover, we found that the exponentiated error averaged over the path probabilities was statistically compatible with unity for these participants, in line with Jarzynski’s theorem.

Together these results not only extend the experimental evidence of Boltzmann-like relationships between the probabilities of behavior and the corresponding order-inducing functions—such as energy, utility, or sensorimotor error—from the equilibrium to the non-equilibrium domain, but also from simple physical systems to more complex learning systems when studying adaptation in changing environments, deepening, thus, the parallelism between thermodynamics in physics and decision-making systems³¹.

When testing for the validity of thermodynamic relations, one of the most critical issues is the choice of the energy function, that is, in our case, the error cost function. In physical systems, the energy function is usually hypothesized following from simple models involving point masses, springs, rigid bodies, etc., and generally requires knowledge of the degrees of freedom of the system under consideration. Here we have used an exponential quadratic error as a utility function, as it has been suggested previously that human pointing behavior can be best captured by loss functions that approximately follow a negative parabola for small errors and then level off for large errors²⁴. In the absence of very large errors, many studies in the literature on sensorimotor learning have only used the quadratic loss term^{48, 52}. Quadratic errors have also been advocated in the context of the central limit theorem and in terms of prediction errors in the context of predictive coding^{36, 45,46,47}. Thus, our assumptions regarding the loss function are compatible with the literature at large. Crucially, the reported results fail when assuming non-sensical cost functions, like the Mexican hat.

Experimental tests of both Jarzynski’s equality (5) and Crooks fluctuation theorem (4) have been previously reported in classical physics^{5, 11, 28, 37, 49} and also, in the case of Jarzynski’s equality, in quantum physics^{1, 44}. Importantly, these results have been successfully tested in several contexts: unfolding and refolding processes involving RNA^{5, 28}, electronic transitions between electrodes manipulating a charge parameter³⁷, rotation of a macroscopic object inside a fluid surrounded by magnets where the current of a wire attached to the macroscopic object is manipulated¹¹, and a trapped ion^{1, 44}. Despite differences in physical realization, protocols, and energy functions (and thus work functions), all the above experiments follow the same basic design behind the approach presented here. This supports the claim that fluctuation theorems do not necessarily rely on involved physical assumptions but are simple mathematical properties of certain stochastic processes¹⁸, although originally they were derived in the context of non-equilibrium thermodynamics^{6, 19}.

Mathematically, Crooks theorem (4) holds for any Markov process (i), whose initial distribution is in equilibrium (ii), and whose transition probabilities satisfy detailed balance with respect to the corresponding equilibrium distributions (iii)¹⁸. Our experimental test of Equation (4) can be seen, thus, as a test for the hypothesis that human sensorimotor adaptation processes satisfy conditions (i), (ii), and (iii). Condition (i) requires adaptation to be Markovian, which is in line with most error-driven models of sensorimotor adaptation⁴³ that assume some internal state update of the form $x_{t+1}=f(x_t, e)$ with adaptive state x and error e. While such models have proven fruitful for simple adaptation tasks like ours, they also have clear limitations, for example when it comes to meta-learning processes that have been reported in more complex learning scenarios^{2, 17}. Condition (ii) is supported by our data in the second and last rows of Fig. 7, where it can be seen that participants’ behavior at the beginning of each cycle is at least approximately consistent with the equilibrium behavior recorded prior to the start of the experiment. Condition (iii) requires that the adaptive process converges to the equilibrium distribution (2) dictated by the environment and that the behaviour remains statistically unchanged when staying in that environment. Moreover, it requires that the equilibrium behavior at each energy level is time-reversible, that means, once adaptation has ceased the trial-by-trial behavior would have the same statistics when played forward or backward in a video recording. Note, however, that does not imply time-reversibility over the entire adaptation trajectory, but is only required locally for each transition step. In our sensorimotor setting, this would mean that after a suitably long adaptation time with perfect adaptation there would ultimately be no hysteresis, and accordingly it would be impossible to tell where the learner has come from. If we regard, for example, Metropolis-Hastings as a plausible model of adaptation, as some kind of stochastic optimization scheme, detailed balance and time reversibility would be fulfilled^{16, 38}. What kind of model describes human adaptive behavior best, and whether such a model is compatible with detailed balance is ultimately an open question. In our experiment at least, the condition seems to be fulfilled well enough to stay within the confidence intervals associated with the predictions made by Crooks’ theorem.

While Jarzynski’s equality (5) directly follows from Crooks theorem, weaker assumptions are sufficient to derive it^{18, 19}. In particular, condition (iii) regarding detailed balance is not necessary, as it is only required that the behavioral distribution does not change anymore once the equilibrium distribution is reached. Thus, Equation (5) can be used as a test for the weaker hypothesis that human sensorimotor adaptation satisfies conditions (i), (ii) and stationarity after convergence. While Jarzynski’s equality only requires samples from the forward process, Crooks theorem also tests the relation between the forward and the backward processes. In particular, Crooks theorem decouples the information processing with respect to any particular environment from the biases introduced by the adaptation history, that is, it assumes the transition probabilities for any given environment are independent of the history. In other words, the conditional probabilities have no memory and, thus, all memory effects are explained in terms of the state of the learning system prior to making some decision. Hence, the observed difference in behaviour after having adapted to the same environment, the hysteresis, is solely explained in terms of the information processing history before encountering the environment. Such hysteresis effects are not only common in simple physical systems like magnets or elastic bands, but have also been reported for sensorimotor tasks^{23, 40, 50}. The hysteresis effects we report in Fig. 3 are in line with a system obeying Crooks theorem and can be replicated using Markov Chain Monte Carlo simulations of adaptation¹⁶.

Our study is part of a number of recent studies that have tried to harness equilibrium and non-equilibrium thermodynamics to gain new theoretical insights into simple learning systems^{12, 13, 31, 35, 46}. For example, the information that can be acquired by learning in simple forward neural networks has been shown to be bounded by thermodynamic costs given by the entropy change in the weights and the heat dissipated into the environment⁴². More generally, when interpreting a system’s response to a stochastic driving signal in terms of computation, the amount of non-predictive information contained in the state about past environmental fluctuations is directly related to the amount of thermodynamic dissipation⁴⁶. This suggests that thermodynamic fundamentals, like the second law, can be carried over to learning systems. Consider, for example, a Bayesian learner where the utility is given by the log-likelihood model and where the data are presented either in one chunk for a single update, or consecutively in little batches with many little updates. Rather than having one big surprise, in the latter case the cumulative surprise is much smaller as prior expectations can be continuously adapted, up to a point where the cumulative surprise reaches a lower bound given by the log-likelihood of the data, which corresponds to the free energy difference before and after learning¹⁶. Fluctuation theorems have recently also been attributed a fundamental role in the context of the Free Energy Principle, with relations to information geometry and decision-theoretic concepts like risk, ambiguity, expected information gain and expected value^{9, 33}. Due to the central role of the concept of variational free energy in inference processes¹⁵, this raises the interesting question in how far our results may generalise to any belief-updating process, including for example perceptual inference and perceptual hysteresis. Finally, it has even been suggested that the dissipation of absorbed work as it is studied in a generalized Crooks theorem may underlie a general thermodynamic mechanism for self-organization and adaptation in living matter¹², raising the question of whether such a general principle of adaptive dissipation could also govern biological learning processes³⁵.

Data Availability

The datasets generated and analysed during the current study are available from the corresponding author on reasonable request.

References

Shuoming, A. et al. Experimental test of the quantum Jarzynski equality with a trapped-ion system. Nat. Phys. 11(2), 193–199 (2015).
Article Google Scholar
Braun, D. A., Mehring, C. & Wolpert, D. M. Structure learning in action. Behav. Brain Res. 206(2), 157–165 (2010).
Article Google Scholar
Chib, S. & Greenberg, E. Understanding the metropolis-hastings algorithm. Am. Stat. 49(4), 327–335 (1995).
Google Scholar
EGD Cohen and David Mauzerall. A note on the Jarzynski equality. J. Stat. Mech: Theory Exp. 2004(07), P07006 (2004).
MATH Google Scholar
Delphine, C. et al. Verification of the crooks fluctuation theorem and recovery of RNA folding free energies. Nature 437(7056), 231–234 (2005).
Article Google Scholar
Gavin, E. C. Nonequilibrium measurements of free energy differences for microscopically reversible Markovian systems. J. Stat. Phys. 90(5), 1481–1487 (1998).
ADS MATH Google Scholar
Gavin, E. C. Entropy production fluctuation theorem and the nonequilibrium work relation for free energy differences. Phys. Rev. E 60(3), 2721 (1999).
Article ADS Google Scholar
Gavin, E. C. Path-ensemble averages in systems driven far from equilibrium. Phys. Rev. E 61(3), 2361 (2000).
Article Google Scholar
Da Costa, L., Friston, K., Heins, C. & Grigorios, A. P. Bayesian mechanics for stationary processes. Proc. R. Soc. A 477(2256), 20210518 (2021).
Article ADS Google Scholar
De Groot, S.R. & Mazur, P. Non-equilibrium Thermodynamics (Courier Corporation, 2013).
Douarche, F., Ciliberto, S., Petrosyan, A. & Rabbiosi, I. An experimental test of the Jarzynski equality in a mechanical experiment. EPL (Europhys. Lett.) 70(5), 593 (2005).
Article ADS CAS Google Scholar
Jeremy, L. E. Dissipative adaptation in driven self-assembly. Nat. Nanotechnol. 10(11), 919–923 (2015).
Article Google Scholar
Goldt, S. & Seifert, U. Stochastic thermodynamics of learning. Phys. Rev. Lett. 118(1), 010601 (2017).
Article ADS Google Scholar
Gottwald, S. & Braun, D. A. Bounded rational decision-making from elementary computations that reduce uncertainty. Entropy 21(4), 1 (2019).
Article Google Scholar
Gottwald, S. & Braun, D. A. The two kinds of free energy and the Bayesian revolution. PLoS Comput. Biol. 16(12), 1–32 (2020).
Article Google Scholar
Jordi, G.-M., Matthias, K. & Daniel, A. B. Non-equilibrium relations for bounded rational decision-making in changing environments. Entropy 20(1), 1 (2018).
Google Scholar
Griffiths, T. L. et al. Doing more with less: Meta-reasoning and meta-learning in humans and machines. Curr. Opin. Behav. Sci. 29, 24–30 (2019).
Article Google Scholar
Pedro, H., Sebastian, G., & Braun, D.A. Jarzyski’s equality and crooks’ fluctuation theorem for general markov chains. arXiv preprint arXiv:2202.05576 (2022).
Jarzynski, C. Equilibrium free-energy differences from nonequilibrium measurements: A master-equation approach. Phys. Rev. E 56(5), 5018 (1997).
Article ADS CAS Google Scholar
Jarzynski, C. Hamiltonian derivation of a detailed fluctuation theorem. J. Stat. Phys. 98(1), 77–102 (2000).
Article MATH Google Scholar
Jarzynski, C. Nonequilibrium work theorem for a system strongly coupled to a thermal environment. J. Stat. Mech: Theory Exp. 2004(09), P09005 (2004).
Article MATH Google Scholar
Jarzynski, C. Equalities and inequalities: Irreversibility and the second law of thermodynamics at the nanoscale. Annu. Rev. Condens. Matter Phys. 2(1), 329–351 (2011).
Article ADS CAS Google Scholar
Kelso, J. J., Buchanan, J. A. S. & Murata, T. Multifunctionality and switching in the coordination dynamics of reaching and grasping. Hum. Movement Sci. Curr. Opin. Behav. Sci. 13, 63–94 (1994).
Article Google Scholar
Konrad, P. K. & Daniel, M. W. The loss function of sensorimotor learning. Proc. Natl. Acad. Sci. 101(26), 9839–9842 (2004).
Article ADS Google Scholar
Lieb, E. H. & Yngvason, J. The physics and mathematics of the second law of thermodynamics. Phys. Rep. 310(1), 1–96 (1999).
Article ADS MATH Google Scholar
Lindig-León, C., Gottwald, S. & Braun, D. A. Analyzing abstraction and hierarchical decision-making in absolute identification by information-theoretic bounded rationality. Front. Neurosci. 13, 1 (2019).
Article Google Scholar
Cecilia, L.-L., Gerrit, S. & Braun, D. A. Bounded rational response equilibria in human sensorimotor interactions. Proc. R. Soc. B 288(1962), 20212094 (2021).
Article Google Scholar
Jan, L., Sophie, D., Steven, B. S., Ignacio, T. & Carlos, B. Equilibrium information from nonequilibrium measurements in an experimental test of Jarzynski’s equality. Science 296(5574), 1832–1835 (2002).
Article Google Scholar
Joseph, L. Ueber den zustand des wärmegleichgewichtes eines system von körpern. Akademie der Wissenschaften, Wien. Mathematisch-Naturwissenschaftliche Klasse, Sitzungsberichte 73, 128–135 (1876).
Google Scholar
Mas-Colell, A., Whinston, M. & Green, J. Microeconomic theory (Oxford University Press, 1995).
Pedro, A. O. & Daniel, A. B. Thermodynamics as a theory of decision-making with information-processing costs. Proc. R. Soc. A: Math. Phys. Eng. Sci. 469(2153), 20120683 (2013).
Article MATH Google Scholar
Park, S., Khalili-Araghi, F., Tajkhorshid, E. & Schulten, K. Free energy calculation from steered molecular dynamics simulations using Jarzynski’s equality. J. Chem. Phys. 119(6), 3559–3566 (2003).
Article ADS CAS Google Scholar
Parr, T., Da Costa, L. & Friston, K. Markov blankets, information geometry and stochastic thermodynamics. Phil. Trans. R. Soc. A 378(2164), 20190159 (2020).
Article ADS MATH Google Scholar
Parrondo, J. M. R., Horowitz, J. M. & Sagawa, T. Thermodynamics of information. Nat. Phys. 11(2), 131–139 (2015).
Article CAS Google Scholar
Nikolay, P., Robert, A. M. & Jeremy, L. E. Statistical physics of adaptation. Phys. Rev. X 6(2), 021036 (2016).
Google Scholar
Rajesh, P. N. R. & Dana, H. B. Predictive coding in the visual cortex: a functional interpretation of some extra-classical receptive-field effects. Nat. Neurosci. 2(1), 79–87 (1999).
Article Google Scholar
Saira, O.-P. et al. Test of the Jarzynski and crooks fluctuation relations in an electronic system. Phys. Rev. Lett. 109(18), 180601 (2012).
Article ADS Google Scholar
Sanborn, A. N. & Chater, N. Bayesian brains without probabilities. Trends Cogn. Sci. 20(12), 883–893 (2016).
Article Google Scholar
Schach, S., Gottwald, S. & Braun, D. A. Quantifying motor task performance by bounded rational decision theory. Front. Neurosci. 12, 1 (2018).
Article Google Scholar
Schütz, C., Weigelt, M., Odekerken, D., Klein-Soetebier, T. & Schack, T. Motor control strategies in a continuous task space. Mot. Control 15(3), 321–341 (2011).
Article Google Scholar
Seifert, U. Entropy production along a stochastic trajectory and an integral fluctuation theorem. Phys. Rev. Lett. 95(4), 040602 (2005).
Article ADS Google Scholar
Seifert, U. Stochastic thermodynamics, fluctuation theorems and molecular machines. Rep. Prog. Phys. 75(12), 126001 (2012).
Article ADS Google Scholar
Shadmehr, R. & Mussa-Ivaldi, S. Biological Learning and Control: How the Brain Builds Representations, Predicts Events, and Makes Decisions (MIT Press, 2012).
Andrew, S. et al. Verification of the quantum nonequilibrium work relation in the presence of decoherence. New J. Phys. 20(1), 013008 (2018).
Article Google Scholar
Still, S. Information-theoretic approach to interactive learning. EPL (Europhys. Lett.) 85(2), 28005 (2009).
Still, S., Sivak, D.A., Bell, A.J., Crooks, G.E. Thermodynamics of prediction. Phys. Rev. Lett. 109(12), 120604 (2012).
Todorov Emanuel. General duality between optimal control and estimation. In 2008 47th IEEE Conference on Decision and Control, pp. 4286–4292. IEEE, (2008).
Emanuel, T. & Michael, I. J. Optimal feedback control as a theory of motor coordination. Nat. Neurosci. 5(11), 1226–1235 (2002).
Article Google Scholar
Toyabe, S., Sagawa, T., Ueda, M., Muneyuki, E. & Sano, M. Experimental demonstration of information-to-energy conversion and validation of the generalized Jarzynski equality. Nat. Phys. 6(12), 988–992 (2010).
Article CAS Google Scholar
Edward, J. A. T., Daniel, A. B. & Daniel, M. W. Facilitation of learning induced by both random and gradual visuomotor task variation. J. Neurophysiol. 107(4), 1111–1122 (2012).
Article Google Scholar
Weglarczyk Stanislaw. Kernel density estimation and its application. In ITM Web of Conferences, 23. EDP Sciences (2018).
Daniel, M. W., Zoubin, G. & Michael, I. J. An internal model for sensorimotor integration. Science 269(5232), 1880–1882 (1995).
Article Google Scholar
Marty Ytreberg, F. & M Zuckerman, D. Efficient use of nonequilibrium measurement to estimate free energy differences for molecular systems. J. Comput. Chem. 25(14), 1749–1759 (2004).
Article Google Scholar

Download references

Funding

Open Access funding enabled and organized by Projekt DEAL.

Author information

Authors and Affiliations

Institute of Neural Information Processing, Ulm University, Ulm, 89081, Germany
P. Hack, C. Lindig-Leon, S. Gottwald & D. A. Braun

Authors

P. Hack
View author publications
You can also search for this author in PubMed Google Scholar
C. Lindig-Leon
View author publications
You can also search for this author in PubMed Google Scholar
S. Gottwald
View author publications
You can also search for this author in PubMed Google Scholar
D. A. Braun
View author publications
You can also search for this author in PubMed Google Scholar

Contributions

The authors confirm contribution to the paper as follows: study conception and design: P.H., C.L.-L., D.B.; data collection: P.H., C.L.-L.; analysis and interpretation of results: P.H., D.B.; draft manuscript preparation: P.H., C.L.-L.; S.G.; D.B.. All authors reviewed the results and approved the final version of the manuscript.

Corresponding author

Correspondence to P. Hack.

Ethics declarations

Competing interests

The authors declare no competing interests.

Ethics declaration

The studies involving human participants were reviewed and approved by the Ethics committee of Ulm University. The participants provided their written informed consent to participate in this study.

Additional information

Publisher's note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

A Appendix: Methods

A.1 Theoretical methods

The derivation of (4) and (5) in the context of general Markov chains can be found in¹⁸. A similar proof of (5) under stronger assumptions was derived in⁶ and a different one using the same assumptions was given in¹⁹. Regarding (4), a similar proof can be found in⁶. Note, however, that the usual definition of work in thermodynamics is slightly different for the forward and backward process, based on the physical definition of time reversal and the associated symmetry for the work values. In our case, we define the driving signal that is analogous to the work concept in the same way, for both forward and backward process. In this case, for Equation (4) to hold, we need to assume that $E_1=E_0$ both in the forward and backward process¹⁸. Fortunately, this is true for our protocol, since we begin both forward and backward protocol with some washout trials without perturbation. It should also be pointed out that, in order for the elements involved in Jarzynski’s and Crooks’ derivations to be well-defined, the equilibrium probability density associated to each step in the Markov chain ought to be non-zero at both the starting and ending point of that step¹⁸. This will play a relevant role in the choice of the support $A_n$ for the equilibrium distributions $p_n^{eq}$ in Sect. A.3.3.

A.2 Simulation methods

In this section, we explain in detail how we simulated (4) and (5).

A.2.1 Metropolis–Hastings algorithm

We use³ as reference for this section. However, for simplicity, we skip over several technical details and may oversimplify some notions.

The Metropolis-Hastings algorithm is a procedure which allows to obtain samples x from a distribution p that is proportional to some function f, that is, $p(x) = \frac{1}{Z} f(x)$. There are three concepts relevant to this algorithm: U, q and $\alpha$. They are defined as follows

U(A) stands for the uniform distribution over some set $A \subseteq {\mathbb {R}}$.
$q(\cdot ,\cdot )$ is called the candidate generating density. The role of q in the algorithm is to generate a new point y given a previous point x, with y being sampled from the distribution $q(x,\cdot )$. In our case we define the density function in y with $\int _{-90}^{90} q(x,y) dy = 1$, as we assume that movements will be towards the target ($0^\circ$ direction) under a maximally induced error of $20^\circ$. Accordingly, we can expect that practically all responses will be covered by choosing a support of $\pm 90^\circ$.
$\alpha (\cdot ,\cdot )$ is defined as follows:
$$\begin{aligned} \alpha (x,y)&= \text {min} \Big \{ \frac{f(y)q(y,x)}{f(x)q(x,y)},1\Big \} \text { if } f(x)q(x,y)>0,\\&= 1 \qquad \text { otherwise} \end{aligned}$$
and is included in the algorithm as a filter on the samples proposed by q, so that some of these samples will be accepted and some will be rejected, to make the samples appear to be sampled from p.

We can now introduce the Metropolis-Hastings algorithm. The algorithm is initialized at an arbitrary value $x_0$ and then repeats the following steps for $i = 1,2,\ldots ,M$:

(i)
Generate y from $q(x_{i-1},\cdot )$ and u from U(0, 1).
(ii)
If $u \le \alpha (x_{i-1},y)$, then $x_i=y$.
(iii)
Otherwise, $x_i=x_{i-1}$.

Finally, the algorithm returns the values $(x_1,\ldots ,x_M)$.

Note that the density of transitions from x to y is therefore given by

$$\begin{aligned} p_M(x,y)=q(x,y) \alpha (x,y) \quad \text { if } x \ne y , \end{aligned}$$

which satisfies detailed balance with respect to $p\propto f$³. Thus, p is the stationary distribution of the resulting Markov process, and so the $x_i$ can be regarded samples from p after the chain has passed a transient stage after which the effect of the initialization is negligible. Notice, in our implementation, described below, we only require the burn-in phase for the initial energy in order to make sure that the process starts in the corresponding stationary state. However, since we are interested in the adaptation process during a changing energy signal, we only use the first sample ($M=1$) for the remaining steps, conditioned on the sample from the previous step.

A.2.2 Implementation

Given a set of equilibrium distributions $(p_0,\ldots ,p_N)$, we use the Metropolis-Hastings algorithm on their proportional functions $(f_0,\ldots ,f_N)$ to generate two paths: the forward path where we apply the algorithm once at step i ($M=1$ in Sect. A.2.1, as explained above), with $p=p_i$, and the backward path where we do the same but with the distributions in the reverse order. In particular, we consider

$$\begin{aligned} f_n(x) = e^{-E_n(x)} \end{aligned}$$

(6)

for $n=0,\ldots ,24$ for the forward process, where, for $n=1,\ldots ,24$, we take

$$\begin{aligned} E_n(x)=-e^{-(x-\theta _n)^2} \end{aligned}$$

(7)

with $\theta _n$ given by (9), and for $n=0$ we consider

$$\begin{aligned} E_0(x)= {\left\{ \begin{array}{ll} -(x+2) &{}\text {if } x<-2, \\ -e^{-x^2} &{}\text {if } -2\le x \le 2, \\ x-2 &{}\text {if } 2<x. \end{array}\right. } \end{aligned}$$

(8)

We will refer to the application of the algorithm following the sequence in (6) with $M=1$ for each $n=1,\ldots ,25$ as a cycle. Note $E_0$ in (8) differs from $E_n$ in (7) for $n=1,\ldots ,24$. While we would like to take $E_0$ as in (7) with $\theta _0=0$, since one of our hypothesis is the simulations sample the first point in each cycle from

$$\begin{aligned} p_0(x) \propto e^{e^{-x^2}}, \end{aligned}$$

the values of $p_0$ for $x \not \in [-2,2]$ are quite indistinguishable once we fix a certain precision. As a result, the algorithm does not converge to $p_0$ in the long run. To avoid this difficulty, we simply modify the function outside $[-2,2]$ such that points there become distinguishable. This results in the algorithm converging to a distribution close to $p_0$. Note this modification only applies to the generation of the initial samples, hence, we use (7) to calculate $\Delta E_{ext}(\varvec{x})$.

The candidate generating density we use for the nth step with $n=1,\ldots ,24$ is a normal distribution with mean equal to the $(n-1)$th sample and standard deviation equal to the mean of the distances between subsequent points in the observed data, which turns out to be around 5. Using the values generated by the algorithm during a cycle, we calculate $\Delta E_{ext}(\varvec{x})$ for the forward process via the utilities in (7), and, after generating several of them, we apply kernel density estimation (see Sect. A.3.4) to estimate $\rho ^F$ in (4). We proceed analogously to estimate $\rho ^B$ and, finally, use the obtained values of $\Delta E_{ext}(\varvec{x})$ for the forward process together with the estimates of $\rho ^F$ and $\rho ^B$ to test (4). This test is done differently for the simulation with the large number of sample and that with a small number of them. For the larger one, we simply use the least squares method as the estimate of (4) (cf. Fig. 2A). For the smaller one, however, we produce 1000 bootstraps from the produced values of $\Delta E_{ext}(\varvec{x})$ and find a confidence interval for (4) from the curves we obtain from the pair ($\rho ^F$, $\rho ^B$) for each bootstrap (cf. Fig. 2B).

A.3 Experimental methods

In this section, we explain the specifics of how we tested experimentally both (4) and (5).

A.3.1 Participants

Ten participants $P_1,\ldots ,P_{10}$, five females and five males, participated in this study. Three of the authors were among the participants ($P_1$, $P_2$ and $P_3$). All other participants provided written informed consent for participation and were remunerated with 10 Euros per hour. The participants were undergraduate and graduate students. The procedures were approved by the Ethics committee of Ulm University. All methods were performed in accordance with the relevant guidelines and regulations.

A.3.2 Setup

The experiment was run on a vBOT. Each participant performed the task using the handle of the right arm of the vBOT, which was manipulated with the dominant hand. The participants had no direct view of the handle but of a screen where its position, altered according to a protocol we describe in the following, was represented by a cursor.

A.3.3 Experimental design

Participants were asked to reach the center of a yellow rounded target on the screen with the center of their cursor. To begin each trial, the participants were asked to place the cursor inside a rounded initial position whose center was 15 cm away from the target’s center along the same vertical. Once the cursor crossed the horizontal containing the center of the target, the target became green if participants successfully situated the center of the cursor inside the target and red otherwise. Once the target changed its color, participants were asked to return the cursor to the initial position to begin the following trial. While both the target and the initial position were at the same place each trial, the cursor did not represent the movement of the handle veridically each trial. In particular, after 100 trials where the cursor position and the handle coincided, there were 1420 trials divided in 20 cycles of 66 trials where the cursor position was determined by rotating the vector going from the center of the initial position to the handle’s position. The rotation angle $\theta _n$ for each n in any cycle $n=0,\ldots ,65$ was

$$\begin{aligned} {\left\{ \begin{array}{ll} \theta _n = \alpha (n) &{}\text {if } n=0,\ldots ,24\\ \theta _n=0 &{}\text {if } n=25,\ldots ,32\\ \theta _n = \alpha (57-n) &{}\text {if } n=33,\ldots ,57\\ \theta _n=0 &{}\text {if } n=58,\ldots ,65\\ \end{array}\right. } \end{aligned}$$

(9)

where all angles are in degrees and

$$\begin{aligned} \begin{aligned} \alpha =&(0,5,10,15,20,20,20,20,20,15,10,5,0,-5,-10,\\&-15,-20,-20,-20,-20,-20,-15,-10,-5,0). \end{aligned} \end{aligned}$$

For each $n=0,\ldots ,65$, we extract $\theta _n'$, the angle between the vertical segment joining the center of the initial position and the center of the target and the segment joining the center of the initial position and the handle in the first recorded point which is more than 12 cm apart from the center of the initial position. One can find the recorded angles $(x_0,\ldots ,x_{65})$ for both the forward and backward processes in Fig. 6. For participant $P_j$, with $1 \le j \le 10$, we take $p_{n,j}=p^{eq}_n$ as the equilibrium distribution for the n-th trial, where $b_j$ represents the bias introduced by the machine for participant $P_j$. We determine the bias as the mean of the initial 100 trials (where the cursor veridically represents the handle). The value $c_j$ represents the maximum deviation for participant $P_j$ among the distances $|x_n-(\theta _n+b_j)|$ and $|x_{n-1}-(\theta _n+b_j)|$, which we use to fix the support of the equilibrium distribution for $P_j$ as $A_n = [\theta _n+b_j-c_j,\theta _n+b_j+c_j]$. The parameter $\beta _j$ represents the spread around the bias, which we pick once the bias and the support of the equilibrium distributions are fixed by requiring these distributions to maximize the likelihood of the observed values for the first 100 trials. We observe the best spread parameters are between $\beta _j=0.25$ and $\beta _j=5$ for all participants. In order to choose the most suitable one for each participant, we consider the values between 0.25 and 5 that result from sequentially adding 0.25 to the lowest value and pick as $\beta _j$ the one that maximizes the likelihood on the observed angles in the 100 initial trials— see Fig. 7 for a comparison between the observed angles and the equilibrium distribution. We discuss, in Sect. A.3.5, how the choice of the parameters $b_j$ and $\beta _j$ affect the results. Note the choice of $c_j$ does not directly affect how we measure the accuracy of the predictions, but is key in the maximum likelihood estimation of $\beta _j$.

Using the angles recorded during a cycle, we calculate $\Delta E_{ext}(\varvec{x})$ via $p_{n,j}$ for both the forward and backward processes, and, using the 20 values per participant, we estimate $\rho ^F$ and $\rho ^B$ in (4) through kernel density estimation (see Sect. A.3.4). Finally, we bootstrap the obtained values of $\Delta E_{ext}(\varvec{x})$ for the forward and backward process to obtain several estimates of $\rho ^F$ and $\rho ^B$. Each of these pairs is used to produce a curve that estimates (4). The mean of these curves for each participant is what we compare to (4) in Fig. 4. The same values of $\rho ^F$ are used to test (5) (cf. Table 1).

Table 2 Mean distance between the theoretical prediction in (4) and the mean curve we obtain from bootstrapping the observed angles (see Sect. A.3.3) for several pairs of parameters $(b,\beta )$. In particular, we consider the combinations having $b=-10,-3,-1,0,1,3,10$ and $\beta =0.01,0.1,1,3,4,10,100$.

Full size table

A.3.4 Kernel density estimation

In order to determine the probability distributions $\rho ^F$ and $\rho ^B$ in (4), we use kernel density estimation⁵¹. Kernel density estimation consists of choosing a function K, the kernel, and a positive number $h>0$, the bandwidth, and approximating p by distributions of the form

$$\begin{aligned} \frac{1}{nh} \sum _{i=1}^n K\left( \frac{x-x_i}{h}\right) . \end{aligned}$$

We consider here K to be a standard normal distribution. Notice we simply estimate p as a sum of standard normal distributions around each observed point $x_i$, for $i=1,\ldots ,n$, and decide how much each $x_i$ influences other points in ${\mathbb {R}}$ via h. We fix $h=0.7$ throughout this work.

Table 3 Mean distance between the theoretical prediction in (4) and the mean curve we obtain from bootstrapping the observed angles (see Sect. A.3.3) for several sensorimotor errors that are obtained as convex combinations of the exponential quadratic error (1) and the Mexican hat (10). In particular, we consider sensorimotor errors of the form $\lambda f + (1-\lambda )g$, where $\lambda =0,0.25,0.5,0.75,1$, f is the exponential quadratic error with $b=0$ and $\beta =4$ (which are close to the values fitted for the participants) and g is the Mexican hat with $\sigma =4$. As expected, the mean distance diminishes as the weight of the exponential quadratic error increases.

Full size table

A.3.5 Robustness analysis

In this section, we measure model robustness using two approaches: (i) using the exponential quadractic error (1) and varying the parameters we fitted, i.e. b and $\beta$, and (ii) fixing a pair of parameters that are close to the optimal ones for each participant and taking convex combinations of the exponential quadractic error and the Mexican hat as sensorimotor errors.

As pointed out in Sect. A.3.3, we fix the parameters in (1) and (4), via the initial 100 trials (where no perturbation is applied). To assess model robustness, we consider the effect of assuming the same model with different parameters. We consider, in particular, all pairs $(b,\beta )$ with $b \in \{-10,-3,-1,0,1,3,10\}$ and $\beta \in \{0.01,0.1,1,3,4,10,100\}$, since they cover a wide scope of the possible behaviour of (4) using the model in (1). For the robustness analysis we fit the data of all participants with the same parameter sets. In Fig. 9, we show the histogram of the driving signals $\Delta E_{ext}(\textbf{x})$ for different pairs of parameters $(b,\beta )$. Then, we follow the bootstrapping procedure from Sect. A.3.3 using the different values of b and calculate the mean distance between the mean of the curves we obtain from the bootstraps and the theoretical prediction (4) with the different values of $\beta$. In particular, we consider the mean horizontal distance between the prediction and the mean curve at the points between $\Delta E_{ext}(\varvec{x})=-4$ and $\Delta E_{ext}(\varvec{x})=4$ (that is, the range of values of $\Delta E_{ext}(\varvec{x})$ we present in Fig. 4) with steps of 0.1. We denote the obtained mean distance as $d_{b,\beta }$.

To assess how well the parameters fit the data, we have to consider the plausibility of the data being generated by our model using the different parameter settings $(b,\beta )$. Accordingly, it is not enough to simply look at $d_{b,\beta }$ as a goodness-of-fit measure. This is the case, as the underlying assumption in our model is that the data comes from a Markov chain where the equilibrium distributions at each step are given by the Boltzmann distribution (2) with parameters $(b,\beta )$. In this situation, we expect participants to lag behind the utility they are adapting to most of the time, and hence, by definition, we expect the driving signal to be biased towards positive values. We can discard any parameter settings where this is not the case. Accordingly, we can disregard all pairs that have $b=10,-10$ —see Fig. 9. The values of $d_{b,\beta }$ for all pairs $(b,\beta )$ we considered can be found in Table 2 (see Fig. 8 for a graphical comparison). As we can see there, the best parameters have $\beta =1$, $-3 \le b \le 3$, and mean distances which are both close to each other and significantly better than the rest. This was expected, since the hypothesis that the data observed at the plateaus follows (2) for these parameters is not completely implausible (cf. Fig. 7). The values $b \in \{-1,0,1\}$ and $\beta \in \{3,4\}$, which are the closest to the fitted parameters, also have a small mean distance (although larger than the best cases). In contrast, $d_{b,\beta }$ becomes significantly larger for the parameters that are clearly unlikely, that is, those that present a huge concentration of the probability around some point, i.e. the ones where the value of $\beta$ is large. In contrast, whenever the values of $\beta$ become small, the equilibrium distributions become all closer to a uniform and, although the mean distance does worsen when compared to the best cases, its values do not increase much.

To assess robustness with an obviously non-fitting utility, we consider an inverted Mexican hat as utility function, that is, we substitute (1) by

$$\begin{aligned} E'_n (x)= \frac{2}{\sqrt{3 \sigma } \pi ^{\frac{1}{4}}} \bigg (1-\Big (\frac{x-\theta _n}{\sigma }\Big )^2\bigg ) \exp \bigg ( \frac{(x-\theta _n)^2}{2 \sigma ^2} \bigg ), \end{aligned}$$

(10)

where we take $\sigma =4$. In this scenario, the bootstrapped data does not reflect the trend of the theoretical prediction (cf. Fig. 5). Moreover, as illustrated in Fig. 10, the model presents an unexpected bias towards negative values of the driving signal. Hence, as discussed above, the likelihood of the data coming from such a Markov chain is small and we can disregard this model. Furthermore, when following the same robustness analysis we performed on the pairs $(b,\beta )$ using the convex combinations $\lambda f + (1-\lambda ) g$ as sensorimotor loss, where $\lambda =0,0.25,0.5,0.75,1$, f is the exponential quadratic error with $b=0$ and $\beta =4$ (which are close to the values fitted for the participants) and g is the Mexican hat with $\sigma =4$, we obtain that the mean distance decreases as $\lambda$ increases, as one can see in Table 3.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

Reprints and permissions

About this article

Cite this article

Hack, P., Lindig-Leon, C., Gottwald, S. et al. Thermodynamic fluctuation theorems govern human sensorimotor learning. Sci Rep 13, 869 (2023). https://doi.org/10.1038/s41598-023-27736-8

Download citation

Received: 07 September 2022
Accepted: 06 January 2023
Published: 17 January 2023
DOI: https://doi.org/10.1038/s41598-023-27736-8

Comments

By submitting a comment you agree to abide by our Terms and Community Guidelines. If you find something abusive or that does not comply with our terms or guidelines please flag it as inappropriate.