## Abstract

A free-energy principle has been proposed recently that accounts for action, perception and learning. This Review looks at some key brain theories in the biological (for example, neural Darwinism) and physical (for example, information theory and optimal control theory) sciences from the free-energy perspective. Crucially, one key theme runs through each of these theories — optimization. Furthermore, if we look closely at what is optimized, the same quantity keeps emerging, namely value (expected reward, expected utility) or its complement, surprise (prediction error, expected cost). This is the quantity that is optimized under the free-energy principle, which suggests that several global brain theories might be unified within a free-energy framework.

## Key points

Adaptive agents must occupy a limited repertoire of states and therefore minimize the long-term average of surprise associated with sensory exchanges with the world. Minimizing surprise enables them to resist a natural tendency to disorder.

Surprise rests on predictions about sensations, which depend on an internal generative model of the world. Although surprise cannot be measured directly, a free-energy bound on surprise can be, suggesting that agents minimize free energy by changing their predictions (perception) or by changing the predicted sensory inputs (action).
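The bound described above can be checked numerically. The following is a minimal sketch, assuming a hypothetical discrete generative model with two causes and two possible sensations; the names `prior`, `likelihood`, `surprise`, `posterior` and `free_energy`, and all the probabilities, are illustrative and not taken from the Review. It shows that free energy equals surprise plus the Kullback-Leibler divergence between the recognition density and the true posterior, so it is never smaller than surprise and meets it when the recognition density is the posterior.

```python
import math

# Hypothetical toy generative model (illustrative numbers only).
prior = {"cause_a": 0.7, "cause_b": 0.3}        # p(cause)
likelihood = {                                   # p(sensation | cause)
    "cause_a": {"s0": 0.9, "s1": 0.1},
    "cause_b": {"s0": 0.2, "s1": 0.8},
}


def evidence(sensation):
    """Model evidence p(s), obtained by summing the joint density over causes."""
    return sum(prior[c] * likelihood[c][sensation] for c in prior)


def surprise(sensation):
    """Surprise (surprisal): the negative log evidence, -ln p(s)."""
    return -math.log(evidence(sensation))


def posterior(sensation):
    """Exact conditional density p(cause | s) from Bayes' rule."""
    z = evidence(sensation)
    return {c: prior[c] * likelihood[c][sensation] / z for c in prior}


def free_energy(q, sensation):
    """Variational free energy F = E_q[ln q(cause) - ln p(s, cause)]."""
    total = 0.0
    for c, qc in q.items():
        if qc > 0:  # terms with q = 0 contribute nothing
            joint = prior[c] * likelihood[c][sensation]
            total += qc * (math.log(qc) - math.log(joint))
    return total


sensation = "s1"
flat_q = {"cause_a": 0.5, "cause_b": 0.5}        # a poor recognition density
print("surprise:", surprise(sensation))
print("free energy (flat q):", free_energy(flat_q, sensation))
print("free energy (posterior q):", free_energy(posterior(sensation), sensation))
```

In this toy setting, improving the recognition density lowers free energy towards surprise, which corresponds to the perceptual route described above. The other route, changing the sensation itself through action, is not modelled here.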

Perception optimizes predictions by minimizing free energy with respect to synaptic activity (perceptual inference), efficacy (learning and memory) and gain (attention and salience). This furnishes Bayes-optimal (probabilistic) representations of what caused sensations (providing a link to the Bayesian brain hypothesis).

Bayes-optimal perception is mathematically equivalent to predictive coding and maximizing the mutual information between sensations and the representations of their causes. This is a probabilistic generalization of the principle of efficient coding (the infomax principle) or the minimum-redundancy principle.
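The link between free-energy minimization and predictive coding can be illustrated with a deliberately small example. This is a sketch under strong assumptions that are not part of the Review: a single scalar cause, a Gaussian prior and Gaussian sensory noise with fixed variances, and an identity mapping from cause to sensation. The function name `perceive` and its parameters are hypothetical. Under these assumptions free energy is, up to a constant, a sum of squared prediction errors weighted by their precisions (inverse variances), and gradient descent on it settles on the precision-weighted average of the sensation and the prior expectation.

```python
def perceive(sensation, prior_mean, var_sensory, var_prior, steps=200, lr=0.05):
    """Gradient descent on a Gaussian free energy with respect to the estimate mu.

    F(mu) = 0.5 * (sensation - mu)**2 / var_sensory
          + 0.5 * (mu - prior_mean)**2 / var_prior   (+ constant)
    """
    mu = prior_mean  # start from the prior expectation
    for _ in range(steps):
        # Precision-weighted prediction errors at the sensory and prior levels.
        err_sensory = (sensation - mu) / var_sensory
        err_prior = (mu - prior_mean) / var_prior
        # dF/dmu = -err_sensory + err_prior, so descending F means:
        mu += lr * (err_sensory - err_prior)
    return mu


# Equal precisions: the estimate settles midway between prior and sensation.
print(perceive(sensation=2.0, prior_mean=0.0, var_sensory=1.0, var_prior=1.0))
# A more precise sensation (smaller variance) pulls the estimate towards it.
print(perceive(sensation=2.0, prior_mean=0.0, var_sensory=0.5, var_prior=2.0))
```

Changing the variances plays the part of gain in this sketch: the more precise source of information has the larger influence on the final estimate.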

Learning under the free-energy principle can be formulated in terms of optimizing the connection strengths in hierarchical models of the sensorium. This rests on associative plasticity to encode causal regularities and appeals to the same synaptic mechanisms as those underlying cell assembly formation.

Action under the free-energy principle reduces to suppressing sensory prediction errors that depend on predicted (expected or desired) movement trajectories. This provides a simple account of motor control, in which action is enslaved by perceptual (proprioceptive) predictions.
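A toy illustration of action suppressing proprioceptive prediction error is given below. It is a sketch only, assuming a one-dimensional limb whose position is sensed without noise, a predicted trajectory that ramps linearly to a target and then holds, and an action that moves the limb by a fixed fraction of the current error. The function `reach` and its parameters are hypothetical and do not come from the Review.

```python
def reach(target, steps=100, gain=0.5):
    """Move a 1-D limb so that sensed position matches a predicted trajectory."""
    position = 0.0          # actual limb position, sensed proprioceptively
    errors = []
    ramp_steps = steps / 2  # the prediction reaches the target halfway through
    for t in range(1, steps + 1):
        # Predicted (desired) position at time t: linear ramp, then hold.
        predicted = target * min(1.0, t / ramp_steps)
        # Proprioceptive prediction error: predicted minus sensed position.
        error = predicted - position
        # Action changes the limb position so as to reduce that error.
        position += gain * error
        errors.append(abs(error))
    return position, errors


final_position, errors = reach(target=1.0)
print("final position:", final_position)
print("last prediction error:", errors[-1])
```

Nothing in the loop computes a motor command from a cost function; the movement follows from the prediction alone, which is the sense in which action is driven by proprioceptive predictions.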

Perceptual predictions rest on prior expectations about the trajectory or movement through the agent's state space. These priors can be acquired (as empirical priors during hierarchical inference) or they can be innate (epigenetic) and therefore subject to selective pressure.

Predicted motion or state transitions realized by action correspond to policies in optimal control theory and reinforcement learning. In this context, value is inversely proportional to surprise (and implicitly free energy), and rewards correspond to innate priors that constrain policies.

## Acknowledgements

This work was funded by the Wellcome Trust. I would like to thank my colleagues at the Wellcome Trust Centre for Neuroimaging, the Institute of Cognitive Neuroscience and the Gatsby Computational Neuroscience Unit for collaborations and discussions.

## Author information

### Affiliations

The Wellcome Trust Centre for Neuroimaging, University College London, Queen Square, London, WC1N 3BG, UK.

- Karl Friston (k.friston@fil.ion.ucl.ac.uk)

### Competing interests

The author declares no competing financial interests.

## Supplementary information

## PDF files

- Supplementary information S1 (box): The entropy of sensory states and their causes
- Supplementary information S2 (box): Variational free energy
- Supplementary information S3 (box): The free-energy principle and infomax
- Supplementary information S4 (box): Value and surprise
- Supplementary information S5 (box): Policies and cost

## Glossary

- Free energy
An information theory measure that bounds or limits (by being greater than) the surprise on sampling some data, given a generative model.

- Homeostasis
The process whereby an open or closed system regulates its internal environment to maintain its states within bounds.

- Entropy
The average surprise of outcomes sampled from a probability distribution or density. A density with low entropy means that, on average, the outcome is relatively predictable. Entropy is therefore a measure of uncertainty.

- Surprise
(Surprisal or self information.) The negative log-probability of an outcome. An improbable outcome (for example, water flowing uphill) is therefore surprising.

- Fluctuation theorem
(A term from statistical mechanics.) Deals with the probability that the entropy of a system that is far from the thermodynamic equilibrium will increase or decrease over a given amount of time. It states that the probability of the entropy decreasing becomes exponentially smaller with time.

- Attractor
A set to which a dynamical system evolves after a long enough time. Points that get close to the attractor remain close, even under small perturbations.

- Kullback-Leibler divergence
(Or information divergence, information gain or relative entropy.) A non-negative, non-symmetric measure of the difference between two probability distributions.

- Recognition density
(Or 'approximating conditional density'.) An approximate probability distribution of the causes of data (for example, sensory input). It is the product of inference or inverting a generative model.

- Generative model
A probabilistic model (joint density) of the dependencies between causes and consequences (data), from which samples can be generated. It is usually specified in terms of the likelihood of data, given their causes (parameters of a model) and priors on the causes.

- Conditional density
(Or posterior density.) The probability distribution of causes or model parameters, given some data; that is, a probabilistic mapping from observed data to causes.

- Prior
The probability distribution or density of the causes of data that encodes beliefs about those causes before observing the data.

- Bayesian surprise
A measure of salience based on the Kullback-Leibler divergence between the recognition density (which encodes posterior beliefs) and the prior density. It measures the information that can be recognized in the data.

- Bayesian brain hypothesis
The idea that the brain uses internal probabilistic (generative) models to update posterior beliefs, using sensory information, in an (approximately) Bayes-optimal fashion.

- Analysis by synthesis
Any strategy (in speech coding) in which the parameters of a signal coder are evaluated by decoding (synthesizing) the signal and comparing it with the original input signal.

- Epistemological automata
Possibly the first theory for why top-down influences (mediated by backward connections in the brain) might be important in perception and cognition.

- Empirical prior
A prior induced by hierarchical models; empirical priors provide constraints on the recognition density in the usual way but depend on the data.

- Sufficient statistics
Quantities that are sufficient to parameterize a probability density (for example, mean and covariance of a Gaussian density).

- Laplace assumption
(Or Laplace approximation or method.) A saddle-point approximation of the integral of an exponential function, that uses a second-order Taylor expansion. When the function is a probability density, the implicit assumption is that the density is approximately Gaussian.

- Predictive coding
A tool used in signal processing for representing a signal using a linear predictive (generative) model. It is a powerful speech analysis technique and was first considered in vision to explain lateral interactions in the retina.

- Infomax
An optimization principle for neural networks (or functions) that map inputs to outputs. It says that the mapping should maximize the Shannon mutual information between the inputs and outputs, subject to constraints and/or noise processes.

- Stochastic
Governed by random effects.

- Biased competition
An attentional effect mediated by competitive interactions among neurons representing visual stimuli; these interactions can be biased in favour of behaviourally relevant stimuli by both spatial and non-spatial and both bottom-up and top-down processes.

- Reentrant signalling
Reciprocal message passing among neuronal groups.

- Reinforcement learning
An area of machine learning concerned with how an agent maximizes long-term reward. Reinforcement learning algorithms attempt to find a policy that maps states of the world to actions performed by the agent.

- Optimal control theory
An optimization method (based on the calculus of variations) for deriving an optimal control law in a dynamical system. A control problem includes a cost function that is a function of state and control variables.

- Bellman equation
(Or dynamic programming equation.) Named after Richard Bellman, it is a necessary condition for optimality associated with dynamic programming in optimal control theory.

- Optimal decision theory
(Or game theory.) An area of applied mathematics concerned with identifying the values, uncertainties and other constraints that determine an optimal decision.

- Gradient ascent
(Or method of steepest ascent.) A first-order optimization scheme that finds a maximum of a function by changing its arguments in proportion to the gradient of the function at the current value. In short, a hill-climbing scheme. The opposite scheme is a gradient descent.

- Principle of optimality
An optimal policy has the property that whatever the initial state and initial decision, the remaining decisions must constitute an optimal policy with regard to the state resulting from the first decision.

- Exploration–exploitation trade-off
Involves a balance between exploration (of uncharted territory) and exploitation (of current knowledge). In reinforcement learning, it has been studied mainly through the multi-armed bandit problem.

- Dynamical systems theory
An area of applied mathematics that describes the behaviour of complex (possibly chaotic) dynamical systems as described by differential or difference equations.

- Synergetics
Concerns the self-organization of patterns and structures in open systems far from thermodynamic equilibrium. It rests on the order parameter concept, which was generalized by Haken to the enslaving principle: that is, the dynamics of fast-relaxing (stable) modes are completely determined by the 'slow' dynamics of order parameters (the amplitudes of unstable modes).

- Autopoietic
Referring to the fundamental dialectic between structure and function.

- Helmholtzian
Refers to a device or scheme that uses a generative model to furnish a recognition density and learns hidden structures in data by optimizing the parameters of generative models.
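
Several of the information-theoretic entries above (surprise, entropy and the Kullback-Leibler divergence) can be made concrete for discrete distributions. The following is a minimal sketch; the function names and the example distributions are illustrative assumptions, and `kl_divergence` assumes the second distribution is non-zero wherever the first is.

```python
import math


def outcome_surprise(probability):
    """Surprise (self information) of an outcome: its negative log-probability."""
    return -math.log(probability)


def entropy(dist):
    """Entropy: the average surprise of outcomes sampled from dist."""
    return sum(p * outcome_surprise(p) for p in dist if p > 0)


def kl_divergence(q, p):
    """Kullback-Leibler divergence of q from p (non-negative, not symmetric)."""
    return sum(qi * math.log(qi / pi) for qi, pi in zip(q, p) if qi > 0)


uniform = [0.5, 0.5]
peaked = [0.9, 0.1]

# A peaked (predictable) distribution has lower entropy than a uniform one.
print("entropy uniform:", entropy(uniform))
print("entropy peaked:", entropy(peaked))

# Swapping the arguments changes the divergence.
print("KL(peaked || uniform):", kl_divergence(peaked, uniform))
print("KL(uniform || peaked):", kl_divergence(uniform, peaked))
```

If `q` is read as a recognition density and `p` as a prior over the same causes, `kl_divergence(q, p)` is the quantity that the Bayesian surprise entry describes.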

## About this article

### DOI

https://doi.org/10.1038/nrn2787
