Introduction

The study of learning systems with concepts borrowed from statistical mechanics and thermodynamics has a long history reaching back to Maxwell’s demon and the ensuing debate on the relation between physics and information34. Over the last 20 years, the informational view of thermodynamics has experienced great developments, which has allowed to broaden its scope form equilibrium to non-equilibrium phenomena10, 22. Of particular importance are the so-called fluctuation theorems7, 20, 42, which relate equilibrium quantities to non-equilibrium trajectories allowing, thus, to approximate equilibrium quantities via experimental realizations of non-equilibrium processes32, 53. Among the fluctuation theorems, two results stand out, Jarzynski’s equality4, 19, 21 and Crooks’ fluctuation theorem6, 8, as they aim to bridge the apparent chasm between reversible microscopic laws and irreversible macroscopic phenomena29.

The advances in non-equilibrium thermodynamics have recently also led to new theoretical insights into simple learning systems12, 13, 16, 31, 35, 46. Abstractly, thermodynamic quantities like energy, entropy or free energy can be thought to define order relations between states14, 25, which makes them applicable to a wide range of problems. In the economic sciences, for example, such order relations are typically used to define a decision-maker’s preferences over states30. Accordingly, a decision-maker or a learning system can be thought to maximize a utility function, analogous to a physical system that aims to minimize an energy function. Moreover, in the presence of uncertainty in stochastic choice, such decision-makers can be thought to operate under entropy constraints reflecting the decision-maker’s precision31, 34, resulting in soft-maximizing the corresponding utility function instead of perfectly maximizing it. This is formally equivalent to following a Boltzmann distribution with energy given by the utility. Therefore, in this picture, the physical concept of work corresponds to utility changes caused by the environment, whereas the physical concept of heat corresponds to utility gains due to internal adaptation46. Like a thermodynamic system is driven by work, such learning systems are driven by changes in the utility landscape (e.g. changes in an error signal). By exposing learning systems to varying environmental conditions, it has been hypothesized that adaptive behavior can be studied in terms of fluctuation theorems12, 16, which are not necessarily tied to physical processes but are broadly applicable to stochastic processes satisfying certain constraints18.

Fluctuation theorems are usually deployed in statistical mechanics; particularly, the study of nonequilibrium steady states in thermodynamics. In this setting, one normally assumes a probabilistic description of an ensemble of many particles, i.e., the kinds of systems usually considered in statistical thermodynamics. However, as described in41, 42, exactly the same principles and fluctuation theorems also apply to the path of a single particle, leading to stochastic thermodynamics. This suggests that fluctuation theorems may not only be applicable to the statistics of ensembles of many learners, but also when describing the trajectory of a single participant during a learning process.

Although fluctuation theorems have been empirically observed in numerous experiments in the physical sciences1, 5, 11, 28, 37, 44, there have been no reported experimental results relating fluctuation theorems to adaptive behavior in humans or other living beings. Here, we test Jarzynski’s equality and Crooks’ fluctuation theorem experimentally in a human sensorimotor adaptation task. In this context, the fluctuation theorem establishes a linear relationship between the externally imposed utility changes driving the learning process (which are directly related to non-predicted information and energy dissipation46) and the log-probability ratio between forward and backward adaptation trajectories, when exposing participants to the sequence of environments either in the forward or reverse order. Accordingly, such learners can be quantitatively characterized by a hysteresis effect that can also be observed in simple physical systems.

Results

In a visuomotor adaptation task, human participants controlled a cursor on a screen towards a single stationary target by moving a mechanical manipulandum that was obscured from their vision under an overlaid screen—see Fig. 1A. Crucially, in each trial n, the position of the cursor could be rotated with angle \(\theta _n\) relative to the actual hand position so that participants had to adapt when moving the cursor from the start position to the target. To measure participants’ adaptive state, we recorded their movement position at the time of crossing a certain distance from the start position, so that their response could be characterized by an angle \(x_n\). The deviation between participants’ response \(x_n\) and the required movement incurs a sensorimotor loss \(E_n\)24 in trial n, that can be quantified as an exponential quadratic error

$$\begin{aligned} E_n(x)= 1 - e^{- (x-(\theta _n+b))^2}, \end{aligned}$$
(1)

that depends on the actual rotation angle \(\theta _n\) set in trial n. The parameter b is a participant-specific parameter allowing for bias due to posture, biomechanics, the mechanics of the manipulandum, or other influences—see Fig. 1D. The loss (1) is taken to be the energy (or negative utility) of a participant’s stochastic response \(X_n = x_n\). For a bounded rational decision-maker26, 27, 31, 39 that optimizes this loss under uncertainty, the optimal pointing behavior after a suitably long adaptation time is described by a Boltzmann equilibrium distribution \(p_n^{eq}\) of the form

$$\begin{aligned} p_{n}^{eq}(x_n) = \exp \big (-\beta ( E_n(x_n) - F_n )\big ), \end{aligned}$$
(2)

for all \(x_n\in A_n\), where the sensorimotor error \(E_n(x_n)\) plays the role of an energy, the free energy term \(F_n = \frac{1}{\beta } \log \int _{A_n} \exp \left( -\beta E_n(x_n) \right) dx_n\) is caused by the normalization, and \(A_n\) is the support of the equilibrium distribution \(p_{n}^{eq}\), which will vary for each participant, as we explain in Sect. A.3.3. See Fig. 1C for a representation of (2). Moreover, the softness-parameter \(\beta\), also known as inverse temperature or precision, controls the trade-off between entropy maximization and energy minimization, essentially interpolating between a purely stochastic choice (\(\beta = 0\)) and a purely rational choice (\(\beta \rightarrow \infty\)) minimizing the energy perfectly.

Figure 1
figure 1

(A) Schematic representation of an experimental trial with deviation angle \(\theta\). The dotted line represents the participant’s hand movement and the continuous line represents the rotated movement observed on the screen. (B) Experimental protocol. The continuous line represents the deviation angles \(\theta\) imposed during one experimental cycle, where trials 1 to 25 constitute the forward process and trials 34 to 58 constitute the backward process. The dotted line represents the beginning of the next cycle. (C) Illustration of the equilibrium distributions (2) with \(b,\theta _n=0\) resulting from the exponential quadratic error (1) and, respectively, \(\beta =1,1.5,2\). The shaded area represents the target, which tolerates, at most, an error of \(2 ^\circ\). (D) Comparison between the equilibrium distributions that we fit using the initial 100 trials (before participants experience any perturbation) and participants’ performance in the washout plateaus between cycles (the sequence of trials with \(\theta = 0\) that separate forward and backward protocol), to check whether participants equilibrate between cycles, as required by the fluctuation theorem. Red shows the normalized error histogram for the in-between plateaus exemplarily for participant 7, green shows the histogram of the fitted equilibrium distribution for the initial block of 100 trials of the same participant. The comparison for all other participants can be found in Fig. 7.

The task consisted of a sequence of target reaching trials, where the rotation angle \(\theta _n\) changed from one trial n to the next trial \(n+1\) according to a given up-down protocol—see Fig. 1B—, so that participants’ responses over trials could be represented by a trajectory \(\varvec{x}=(x_0,x_1,\ldots ,x_N)\). When the environment is changing over trials, we can distinguish cumulative error changes \(\Delta E_{ext}(\varvec{x}) {:=}\sum _{n=0}^{N-1} (E_{n+1}(x_n)-E_{n}(x_n))\) that are induced externally by changes in the environmental parameter \(\theta _n\), from cumulative error changes \(\Delta E_{int}(\varvec{x}) {:=}\sum _{n=1}^{N} (E_{n}(x_n)-E_{n}(x_{n-1}))\) due to internal adaptation when subjects change their response from \(x_{n-1}\) to \(x_{n}\). Crucially, it is exactly the externally induced changes in error, \(\Delta E_{ext}(\varvec{x})\), analogous to the physical concept of work, that drive the adaptation process: if \(\Delta E_{ext}(\varvec{x})\) is large, the system is more surprised and has to adapt more. In the following, we thus refer to \(\Delta E_{ext}(\varvec{x})\) as driving error or driving signal. When applying Crooks’ fluctuation theorem for general adaptive systems18 to the above setting, we obtain the linear relation

$$\begin{aligned} \Delta E_{ext}(\varvec{x}) - \Delta F = \frac{1}{\beta }\log \left( \frac{\rho ^F(\varvec{x})}{\rho ^B(\varvec{x}^R)} \right) , \end{aligned}$$
(3)

where \(\varvec{x}^R = (x_N,\ldots ,x_1)\) is the reverse trajectory, \(\Delta F\) denotes the free energy difference \(F_N-F_0\) and the distributions \(\rho ^F(\cdot )\) and \(\rho ^B(\cdot )\) denote the probability of observing a certain trajectory when the learner faces a series of environments in some specific order or the order is reversed, respectively. This form of Crooks’ theorem allows for an intuitive interpretation, in that any difference in probability of a trajectory and its reverse signifying a hysteresis can be directly related to an excess loss that is irretrievably generated because of imperfect adaptation. Unfortunately, Equation (3) is hard to determine from data, as it would require to estimate probability distributions over paths. However, there is an equivalent form of Crooks’ theorem that groups all trajectories according to their associated value of \(\Delta E_{ext}(\varvec{x})\) with corresponding distributions \(\rho ^F\) and \(\rho ^B\) over these values, such that

$$\begin{aligned} \Delta E_{ext}(\varvec{x}) - \Delta F = \frac{1}{\beta }\log \left( \frac{\rho ^F(\Delta E_{ext}(\varvec{x}))}{\rho ^B(-\Delta E_{ext}(\varvec{x}))} \right) . \end{aligned}$$
(4)

The distribution \(\rho ^F(\cdot )\) can be interpreted as the probability that the learner experiences a certain overall surprise when being exposed sequentially to a series of environments and \(\rho ^B(\cdot )\) is the analogous concept when the order in which the environments are presented is reversed. In equation (4), these densities are evaluated at the actual driving errors \(\Delta E_{ext}(\varvec{x})\) and \(-\Delta E_{ext}(\varvec{x})\), respectively, for a particular adaptive trajectory \(\varvec{x}\).

A direct consequence of (4) is Jarzynski’s equality6, which states that

$$\begin{aligned} \big \langle e^{-\beta \Delta E_{ext}(\varvec{X})}\big \rangle = e^{-\beta \Delta F}, \end{aligned}$$
(5)

where \(\langle ~ \cdot ~ \rangle {:=}{\mathbb {E}}[~\cdot ~]\) denotes the expectation operator, considering \(\varvec{X} = (X_n)_{n=0}^N\) a Markov chain with transition densities \(\Pi _n\) that have \(p^{eq}_n\) as stationary distributions, that is, for each n, \(p^{eq}_n\) is the stationary distribution for \(X_n\). In our experiment, \(\varvec{X}\) represents participants’ responses that are repeated over multiple repetitions of the forward-backward protocol. In the following, we will test the relationships (4) and (5) experimentally with \(\Delta F = 0\) as our human learners start and end in the same environmental state (i.e. \(F_N=F_0\)). Note that, in our particular setting where there is no overall change in the free energy \((\Delta F=0)\), Equation (5) suggests that the expected value \(\big \langle e^{-\beta \Delta E_{ext}(\varvec{X})}\big \rangle\) equals \(e^{-\beta 0}=1\) irrespective of the value taken by \(\beta\). This provides a quantitative prediction that we will evaluate empirically below.

In our experiment the task is divided into 20 cycles of 66 trials each, following the protocol (9) illustrated in Fig. 1B. We refer to trials 1 to 25 of each cycle as a realization of the forward process and trials 34 to 58 as a realization of the backward process. Notice the backward process consists of the same angles as the forward process, that is, the same utility functions, but in reversed order. Thus, we record for each participant 20 values for \(\Delta E_{ext}(\varvec{x})\) in both the forward and backward processes that we use to estimate participants’ probability densities of the forward and backward processes, \(\rho ^F\) and \(\rho ^B\), respectively, using kernel density estimation. As the amount of data is limited to test the linear relation in (4), we will use simulation results in the following to compare against participants’ behavior.

Figure 2
figure 2

Simulation of Crooks’ fluctuation theorem. (A) Simulation with 1000 cycles. In black, the theoretical prediction; in red, the linear regression for the simulated data and, in green, the simulated points. Since the simulated data set adjusts pretty well to Crooks’ fluctuation theorem (4), Jarzynski’s equality (5) is fulfilled. (B) Simulation with 20 cycles and bootstrapping. The black line is the theoretical prediction (4) while the red line and shaded area are, respectively, the mean and the 99 % confidence interval of (4) after 1000 bootstraps of the driving error values obtained in a single run (which consists of 20 cycles).

When simulating an artificial decision-maker based on a stochastic optimization scheme with Markovian dynamics, for example a Metropolis-Hasting algorithm with target distribution \(p_n^{eq}\propto \exp (-\beta E_n)\), it is clear that we can recover the linear relationship (4), provided that sufficient samples are collected18—see, for example, a simulation with 1000 cycles in Fig. 2A, where we can see a good adjustment between the theoretical prediction (in black) and the linear regression of the observed data (in red). As a result, (5) also holds in this scenario. The more critical question is what happens when only few samples are available. To this end, we use the stochastic optimization algorithm to simulate the protocol of our experiment, that is, 20 cycles, and indicate confidence intervals using 1000 bootstraps. It can be seen in Fig. 2B that the theoretical prediction is consistent with the \(99\%\) confidence interval in the region where \(|\Delta E_{\text {ext}}| \le 4\) (which is the region where our experimental data lies). Using the same bootstrapped data, we obtain several estimates of \(\langle e^{- \Delta E_{ext}(\varvec{X})}\rangle\) (the mean of \(e^{- \Delta E_{ext}(\varvec{X})}\) for the observed values of \(\Delta E_{ext}(\varvec{X})\) at each bootstrap) which we use to calculate a confidence interval for it. This results in the \(99\%\) confidence interval for \(\langle e^{-\Delta E_{ext}(\varvec{X})}\rangle\) being (0.48,  1.64), which is consistent with the theoretical prediction \(\langle e^{-\Delta E_{ext}(\varvec{X})}\rangle = 1\) for \(\Delta F = 0\) according to Equation (5). Accordingly, we will expect a similar behavior for our experimental data. Note we take, for simplicity, \(b=0\), \(\beta =1\) and, for all n, \(A_n=[-90,90]\) in these simulations (see Methods).

Figure 3
figure 3

Hysteresis effect. The filled triangles are the mean of the observed angles for every deviation in both the forward process, in green, and the backward process, in red. The black line is the forward protocol. Note that we have mirrored the triangles for the backward process to make them coincide with those in the forward process that are exposed to the same true angle. Participants that achieve at least \(50\%\) adaptation are shaded by a green background color. Hysteresis can be observed between trials 1 and 5, 9 and 17 and 21 and 25. Notice, as expected, the forward means are below the backward in the first region, above in the second and below again in the third.

Participants’ average adaptive responses can be seen in Fig. 3 compared to the experimentally imposed true parameter values (the trial-by-trial responses can be seen in Fig. 6). The green and red lines distinguish the forward and backward trajectories, respectively, so that, from the contrast between the two curves, hysteresis becomes apparent, as common in simple physical systems22 and as reported previously in similar experiments for sensorimotor adaptation50. Participants that achieve at least \(50\%\) adaptation are shaded by a green background color and are our participants of interest. The three participants that fail to achieve this minimum adaptation level are marked by a red shade. Instead of excluding these participants entirely from the analysis, we keep them in to show the contrast to the well-adapted participants and to highlight that the results reported for the well-adapted participants do not hold trivially for any participant producing inconsistent behavior.

Figure 4 shows participants’ data compared to the theoretical prediction from (4) and the 99 % confidence interval after 1000 bootstraps as in the case of the simulations in Fig. 2B. There, we see that our data follow the trend of the theoretical prediction and lie within or close to the confidence interval bounds of the prediction in broad regions for several participants. This is not a trivial result, as can be easily seen, when randomizing the temporal order of the trajectory points or when replacing the utility function with another one that does not fit the setup. Figure 5A,B show this, for example, for an inverted Mexican hat ((10) with \(\sigma =4\)) that assigns low utility to the target region, and for resamples of the trajectory points in a random order, respectively. Both results are clearly incompatible with the theoretical prediction.

When conducting an additional robustness analysis in Fig. 8, we found that, under the proposed utility function, participants’ behavior is compatible with Crooks’ fluctuation theorem for a broad neighbourhood of parameter settings, but breaks down when choosing implausible parameters. Regarding Jarzynski’s equality (5), the confidence intervals for the majority of participants are consistent with the theoretical prediction when using the bootstrapped values to calculate \(\langle e^{-\beta \Delta E_{ext}(\varvec{X})}\rangle\) (cf. Table 1). In contrast, when following the same procedure for both the inverted Mexican hat and the randomized procedure, we obtain consistency for a considerably smaller number of participants. In particular, for the inverted Mexican hat, we obtain consistency for only two participants. Moreover, these participants are \(S_8\) and \(S_9\), which belong to the group that did not reach at least \(50\%\) adaptation (indicated by the red background area in the figures). For the randomized procedure, the expected number of participants that show consistency is also close to two, although the specific participants which are consistent vary with the realization of the randomized procedure. More specifically, after 1000 runs of the randomized procedure, the mean number of consistent participants we observed was 2.33.

Table 1 Experimental results for Jarzynski’s equality. We include the confidence intervals for the left hand side of (5), which we obtain after bootstrapping the observed values of \(\Delta E_{ext}(\varvec{x})\) for the forward process 1000 times and estimating \(\langle e^{-\beta \Delta E_{ext}(\varvec{X})}\rangle\) by its mean for each set of bootstrapped data. In our experiment we have \(\Delta F=0\) in the right hand side of (5), resulting in a theoretical prediction of \(\langle e^{-\beta \Delta E_{ext}(\varvec{X})}\rangle =1.0\). Note, that for most subjects the value of 1.0 lies inside the confidence interval, which does not hold when assuming unsuitable loss functions, as discussed at the end of the Results. Participants that achieve at least \(50\%\) adaptation (c.f. Fig. 3) are shaded by a green background color .

Discussion

Figure 4
figure 4

Experimental results for Crooks’ fluctuation theorem when the sensorimotor loss behaves as an exponential quadratic error (1). The black line is the theoretical prediction of Crooks’ fluctuation theorem (4) while the curves stand for the mean path after 1000 bootstraps of the observed driving error values. Participants that achieve at least \(50\%\) adaptation (c.f. Fig. 3) are shaded by a green background color. The shaded areas inside the graphs are the 99% confidence intervals which result from bootstrapping. Note we fit the parameters for each participant according to Sect. A.3.3.

Figure 5
figure 5

Control results for Crooks’ fluctuation theorem in two scenarios: (A) the sensorimotor loss behaves like a Mexican hat function and (B) the sensorimotor loss behaves as an exponential quadratic error but we sample the observed angles randomly with repetition. The black line is the theoretical prediction of Crooks’ fluctuation theorem (4) while the curves stand for the mean path after 1000 bootstraps of the observed driving error values. The shaded areas inside the graphs are the 99% confidence intervals which result from bootstrapping. Note, for simplicity, we assume \(\beta =1\) for all participants when using the Mexican hat to demonstrate that the result in (A) does not trivially hold for any cost function. For (B), we fit the parameters for each participant according to Sect. A.3.3.

In our experiment we have investigated the hypothesis that human sensorimotor adaptation may be participant to the thermodynamic fluctuation theorems first reported by Crooks7 and Jarzynski20. In particular, we tested whether changes in sensorimotor error induced externally by an experimental protocol are linearly related to the log-ratio of the probabilities of behavioral trajectories under a given forward and time-reversed backward protocol of a sequence of visuomotor rotations. We found that participants’ data, in all cases where participants showed an appropriate adaptive response, was consistent with this prediction or close to its confidence interval bounds, as expected from our simulations with finite sample size. Moreover, we found that the exponentiated error averaged over the path probabilities was statistically compatible with unity for these participants, in line with Jarzynski’s theorem.

Together these results not only extend the experimental evidence of Boltzmann-like relationships between the probabilities of behavior and the corresponding order-inducing functions—such as energy, utility, or sensorimotor error—from the equilibrium to the non-equilibrium domain, but also from simple physical systems to more complex learning systems when studying adaptation in changing environments, deepening, thus, the parallelism between thermodynamics in physics and decision-making systems31.

When testing for the validity of thermodynamic relations, one of the most critical issues is the choice of the energy function, that is, in our case, the error cost function. In physical systems, the energy function is usually hypothesized following from simple models involving point masses, springs, rigid bodies, etc., and generally requires knowledge of the degrees of freedom of the system under consideration. Here we have used an exponential quadratic error as a utility function, as it has been suggested previously that human pointing behavior can be best captured by loss functions that approximately follow a negative parabola for small errors and then level off for large errors24. In the absence of very large errors, many studies in the literature on sensorimotor learning have only used the quadratic loss term48, 52. Quadratic errors have also been advocated in the context of the central limit theorem and in terms of prediction errors in the context of predictive coding36, 45,46,47. Thus, our assumptions regarding the loss function are compatible with the literature at large. Crucially, the reported results fail when assuming non-sensical cost functions, like the Mexican hat.

Experimental tests of both Jarzynski’s equality (5) and Crooks fluctuation theorem (4) have been previously reported in classical physics5, 11, 28, 37, 49 and also, in the case of Jarzynski’s equality, in quantum physics1, 44. Importantly, these results have been successfully tested in several contexts: unfolding and refolding processes involving RNA5, 28, electronic transitions between electrodes manipulating a charge parameter37, rotation of a macroscopic object inside a fluid surrounded by magnets where the current of a wire attached to the macroscopic object is manipulated11, and a trapped ion1, 44. Despite differences in physical realization, protocols, and energy functions (and thus work functions), all the above experiments follow the same basic design behind the approach presented here. This supports the claim that fluctuation theorems do not necessarily rely on involved physical assumptions but are simple mathematical properties of certain stochastic processes18, although originally they were derived in the context of non-equilibrium thermodynamics6, 19.

Mathematically, Crooks theorem (4) holds for any Markov process (i), whose initial distribution is in equilibrium (ii), and whose transition probabilities satisfy detailed balance with respect to the corresponding equilibrium distributions (iii)18. Our experimental test of Equation (4) can be seen, thus, as a test for the hypothesis that human sensorimotor adaptation processes satisfy conditions (i), (ii), and (iii). Condition (i) requires adaptation to be Markovian, which is in line with most error-driven models of sensorimotor adaptation43 that assume some internal state update of the form \(x_{t+1}=f(x_t, e)\) with adaptive state x and error e. While such models have proven fruitful for simple adaptation tasks like ours, they also have clear limitations, for example when it comes to meta-learning processes that have been reported in more complex learning scenarios2, 17. Condition (ii) is supported by our data in the second and last rows of Fig. 7, where it can be seen that participants’ behavior at the beginning of each cycle is at least approximately consistent with the equilibrium behavior recorded prior to the start of the experiment. Condition (iii) requires that the adaptive process converges to the equilibrium distribution (2) dictated by the environment and that the behaviour remains statistically unchanged when staying in that environment. Moreover, it requires that the equilibrium behavior at each energy level is time-reversible, that means, once adaptation has ceased the trial-by-trial behavior would have the same statistics when played forward or backward in a video recording. Note, however, that does not imply time-reversibility over the entire adaptation trajectory, but is only required locally for each transition step. In our sensorimotor setting, this would mean that after a suitably long adaptation time with perfect adaptation there would ultimately be no hysteresis, and accordingly it would be impossible to tell where the learner has come from. If we regard, for example, Metropolis-Hastings as a plausible model of adaptation, as some kind of stochastic optimization scheme, detailed balance and time reversibility would be fulfilled16, 38. What kind of model describes human adaptive behavior best, and whether such a model is compatible with detailed balance is ultimately an open question. In our experiment at least, the condition seems to be fulfilled well enough to stay within the confidence intervals associated with the predictions made by Crooks’ theorem.

While Jarzynski’s equality (5) directly follows from Crooks theorem, weaker assumptions are sufficient to derive it18, 19. In particular, condition (iii) regarding detailed balance is not necessary, as it is only required that the behavioral distribution does not change anymore once the equilibrium distribution is reached. Thus, Equation (5) can be used as a test for the weaker hypothesis that human sensorimotor adaptation satisfies conditions (i), (ii) and stationarity after convergence. While Jarzynski’s equality only requires samples from the forward process, Crooks theorem also tests the relation between the forward and the backward processes. In particular, Crooks theorem decouples the information processing with respect to any particular environment from the biases introduced by the adaptation history, that is, it assumes the transition probabilities for any given environment are independent of the history. In other words, the conditional probabilities have no memory and, thus, all memory effects are explained in terms of the state of the learning system prior to making some decision. Hence, the observed difference in behaviour after having adapted to the same environment, the hysteresis, is solely explained in terms of the information processing history before encountering the environment. Such hysteresis effects are not only common in simple physical systems like magnets or elastic bands, but have also been reported for sensorimotor tasks23, 40, 50. The hysteresis effects we report in Fig. 3 are in line with a system obeying Crooks theorem and can be replicated using Markov Chain Monte Carlo simulations of adaptation16.

Our study is part of a number of recent studies that have tried to harness equilibrium and non-equilibrium thermodynamics to gain new theoretical insights into simple learning systems12, 13, 31, 35, 46. For example, the information that can be acquired by learning in simple forward neural networks has been shown to be bounded by thermodynamic costs given by the entropy change in the weights and the heat dissipated into the environment42. More generally, when interpreting a system’s response to a stochastic driving signal in terms of computation, the amount of non-predictive information contained in the state about past environmental fluctuations is directly related to the amount of thermodynamic dissipation46. This suggests that thermodynamic fundamentals, like the second law, can be carried over to learning systems. Consider, for example, a Bayesian learner where the utility is given by the log-likelihood model and where the data are presented either in one chunk for a single update, or consecutively in little batches with many little updates. Rather than having one big surprise, in the latter case the cumulative surprise is much smaller as prior expectations can be continuously adapted, up to a point where the cumulative surprise reaches a lower bound given by the log-likelihood of the data, which corresponds to the free energy difference before and after learning16. Fluctuation theorems have recently also been attributed a fundamental role in the context of the Free Energy Principle, with relations to information geometry and decision-theoretic concepts like risk, ambiguity, expected information gain and expected value9, 33. Due to the central role of the concept of variational free energy in inference processes15, this raises the interesting question in how far our results may generalise to any belief-updating process, including for example perceptual inference and perceptual hysteresis. Finally, it has even been suggested that the dissipation of absorbed work as it is studied in a generalized Crooks theorem may underlie a general thermodynamic mechanism for self-organization and adaptation in living matter12, raising the question of whether such a general principle of adaptive dissipation could also govern biological learning processes35.