Sensory perception relies on fitness-maximizing codes

Schaffner, Jonathan; Bao, Sherry Dongqi; Tobler, Philippe N.; Hare, Todd A.; Polania, Rafael

doi:10.1038/s41562-023-01584-y

Download PDF

Article
Open access
Published: 27 April 2023

Sensory perception relies on fitness-maximizing codes

Nature Human Behaviour volume 7, pages 1135–1151 (2023)Cite this article

8873 Accesses
7 Citations
100 Altmetric
Metrics details

Subjects

Abstract

Sensory information encoded by humans and other organisms is generally presumed to be as accurate as their biological limitations allow. However, perhaps counterintuitively, accurate sensory representations may not necessarily maximize the organism’s chances of survival. To test this hypothesis, we developed a unified normative framework for fitness-maximizing encoding by combining theoretical insights from neuroscience, computer science, and economics. Behavioural experiments in humans revealed that sensory encoding strategies are flexibly adapted to promote fitness maximization, a result confirmed by deep neural networks with information capacity constraints trained to solve the same task as humans. Moreover, human functional MRI data revealed that novel behavioural goals that rely on object perception induce efficient stimulus representations in early sensory structures. These results suggest that fitness-maximizing rules imposed by the environment are applied at early stages of sensory processing in humans and machines.

Machine learning reveals the control mechanics of an insect wing hinge

Article 17 April 2024

Johan M. Melis, Igor Siwanowicz & Michael H. Dickinson

Collective intelligence: A unifying concept for integrating biology across scales and substrates

Article Open access 28 March 2024

Patrick McMillen & Michael Levin

Perceptography unveils the causal contribution of inferior temporal cortex to visual perception

Article Open access 18 April 2024

Elia Shahbazi, Timothy Ma, … Arash Afraz

Main

One of the main goals of the neural and behavioural sciences is to understand what general principles explain the solutions evolution has selected to extract and process information from the environment to guide behaviour. Half a century ago, it was postulated that neural systems should represent the sensory world as accurately and efficiently as possible by exploiting information about the statistical regularities of the environment, an idea known as efficient coding^1,2.

Efficient coding in sensory perception is typically assumed to be based on an information maximization criterion—that is, the sensory world must be represented as accurately as possible. One may think that this criterion makes sense for early sensory systems, as this is precisely the role of a sensor: a good measurement instrument must reliably measure the environmental variable that it was built for. However, the information maximization criterion does not necessarily consider the behavioural goals of the organism^3,4,5,6,7.

Is it reasonable that our sensory systems invest their limited resources to represent the world as accurately as possible irrespective of the organism’s goals? This question has kept scientists and philosophers busy for centuries and led to heated debates across various fields and domains including neuroscience, psychology, economics and evolutionary biology⁸. Some views support the idea that organisms should represent objects as they exist in the world, as closely as biological limitations allow^9,10. Others posit that perceptual representations should be in general different from the actual physical world, and these representations should directly map onto the utility they offer to the agents^11,12,13.

In partial support of the latter idea, recent neurophysiological evidence shows that early sensory systems represent not only information about physical sensory inputs but also non-sensory information according to the requirements of a specific task and the behavioural relevance of the stimuli^{14,15,16,17,18}. This does not necessarily imply that sensory systems should give up representing the ‘veridical’ world, as it has been demonstrated that neural systems can develop computational strategies that allow representing multiple behaviourally relevant features alongside objective sensory information^19,20. However, this line of research provides no indication of the actual benefit of having such mixed neural representations at the earliest stages of sensory processing, or how this information could be used to efficiently guide behaviour, given that we are limited in our capacity to process information.

The study of how systems should trade off the maximization of some utility function relevant to the goals of the organism against information-processing constraints is part of a growing body of research inspired by the work of Shannon, who put forward the idea that when optimizing a distortion function that characterizes the cost of particular errors, not all such errors are equally important. This implies that the unreliability of signal transmission is not necessarily uniform across the space of possible messages that can be transmitted²¹. Similar concepts have been borrowed from the field of statistical mechanics, where information processing in capacity-limited systems can be modelled as the energy required to move away from default states in thermodynamic systems, which can be quantified by differences in free energy^22,23. These principled approaches have played a fundamental role in neural process theories of early sensory systems^{24,25,26,27,28} as well as higher cognitive functions^{29,30,31,32,33,34,35}.

Our work builds on these normative theoretical principles to determine how a (neural) system should allocate information-processing resources to maximize fitness in different situations. We focus on two of the most common problems studied in decision-making: accuracy maximization in perceptual discrimination tasks and reward maximization for situations in which a particular attribute is related to a given currency value. This allowed us to test the following hypothesis: given that noisy communication channels always lose information during transmission, the brain will adapt to the fitness-maximizing rules of a particular environment at the earliest stages of sensory processing. We demonstrate that early visual structures in humans and artificial agents with sensory information-processing bottlenecks follow fitness-maximizing encoding schemes.

Results

Neural codes in an insect’s retina

Before moving on to humans, we introduce an illustrative example, as anecdotal evidence, to motivate the theoretical framework applied in our experiments. Specifically, we studied the responses of retinal neurons in the blowfly—the large monopolar cells (LMCs)—which encode sensory information about visual contrast levels. These neural codes are considered the first demonstration of efficient coding in biological organisms³⁶.

Visual features such as shape, colour and texture are important sensory signals that insects use to discriminate between competing flowers and fruit species, with visual contrast playing a key role^37,38. In our example, we assume that blowflies use knowledge of the different levels of contrast displayed by flowers and fruits to select food sources that promise more beneficial nutrients (that is, reward). In other words, we assume that there is a monotonic association between contrast and reward that makes some contrast discrimination mistakes more costly than others. Please note that this corresponds to the standard and most studied class of economic problems, where choices with a particular attribute are monotonically related to a given currency value. The hypothesis that we test here is that a neural code that simply maximizes information accuracy in the LMCs would not maximize the fitness of the organism.

Concretely, we studied the following problem. Suppose that the distribution of contrasts encountered by the blowfly in its natural environment is given by f(s). We define the function that transforms the contrast stimulus input s to neural responses r in the blowfly retina as r = h(s). Then, what is the optimal neural response shape h(s) if, given biological limitations, such a function can only generate a limited set of neural responses? Under this formulation, the following problem can be studied³⁹: find the optimal neural response function under two evolutionary optimization criteria, (1) the probability of mistakes minimization criterion and (2) the expected reward loss minimization criterion.

To solve this problem, we assume that the organism must make choices between alternatives drawn from the stimulus distribution, f(s), which describes the relative availability of the different alternatives in its environment (for example, how often a blowfly encounters a particular flower). The goal is to select the alternative that promises more reward to the organism, as this should lead the organism to maximize its fitness^3,39. In the case of the blowfly LMCs, one may suppose that the blowfly must often make fine discriminations and that different contrast levels are monotonically related to different reward values.

On the one hand, if the goal of the organism is to minimize the number of erroneous responses (that is, maximize discrimination accuracy between two stimuli s₁ and s₂)

$$\mathop{\min }\limits_{h}\quad \iint \,f({s}_{1},{s}_{2})\,P({{{\rm{error}}}}| h({s}_{1}),h({s}_{2})){\mathrm{d}}{s}_{1}{\mathrm{d}}{s}_{2},$$

(1)

it can be shown that the optimal neural response h(s) matches the cumulative distribution function (CDF) of the stimulus distribution (Fig. 1b; see also equation (11) in the Methods). However, this accuracy maximization strategy does not provide a precise account of the distribution of neural responses in the blowfly (Fig. 1).

**Fig. 1: Blowfly LMC responses are better explained by fitness than information maximization coding schemes.**

On the other hand, if the goal of the organism is to minimize expected reward loss (that is, maximize the amount of reward received after many decisions; see Supplementary Note 1 for derivation)

$$\mathop{\min }\limits_{h}\quad \iint \,f({s}_{1},{s}_{2})\,P({{{\rm{error}}}}| h({s}_{1}),h({s}_{2}))| {s}_{1}-{s}_{2}| {\mathrm{d}}{s}_{1}{\mathrm{d}}{s}_{2},$$

(2)

the optimal fitness-maximizing neural response h(s) provides a nearly perfect account of the neural responses of the blowfly retina (Fig. 1b,c and Methods). We emphasize that the remarkable overlap between the fitness-maximizing predictions and the empirical neural responses presented in Fig. 1 are not the product of curve fitting; instead, these predictions emerge from the normative decision model, which has no degrees of freedom (Methods).

A common approach adopted in computer science and neuroscience to study the way in which a system penalizes estimation mistakes to optimize performance is via the L_p loss function defined as $| \hat{s}(r)-s{| }^{p}$, where $\hat{s}$ is the sensory estimation, s is the true signal and p determines how errors are penalized. A recent parameter estimation study asked what type of error penalty best explained the LMC data⁴⁰. However, in this study, the authors did not have explicit hypotheses for the potential evolutionary and behavioural meaning of different values for error penalties, and they instead relied on numerical estimations of p that best explained the data. We demonstrate that the error penalty that provides a nearly perfect fit to the LMC response function corresponds to the blowfly LMC encoding function that guarantees maximal reward expectation to the organism under our sensory-reward mapping assumptions (Fig. 1c and Methods).

While the predictions of reward-maximizing sensory codes and the data from blowfly retinal neurons (that is, the earliest level of encoding) show a striking similarity (Fig. 1), this result does not directly address all aspects of our hypothesis that neural codes in early sensory areas adapt to the organism’s behavioural goals. This is because we do not know the specific function linking contrast to fitness for the blowfly, and we cannot show that the code used in their retinas adapts between contexts because we have data from only one context. While we emphasize that this result should be treated at this stage as anecdotal, it inspired us to test the fitness-maximizing hypothesis more directly in humans (see below) and may inspire others to do the same in other animals.

Adaptive fitness-maximizing sensory codes in humans

To more directly test the hypothesis that sensory perception relies on adaptive fitness-maximizing codes, we implemented an experiment with more than one context in human sensory encoding. To date, it has been widely accepted that the default neural code for orientation perception in humans is information maximization (infomax) coding⁴¹, as it can be shown that this code will minimize the probability of mistakes in perceptual discrimination tasks (Supplementary Note 1). One reason that infomax coding may typically explain human perception well in regard to orientation is that, for humans, orientation information does not typically signify reward and is instead used for navigation purposes. The fitness-maximizing code for orientation perception may thus be equivalent to infomax under standard conditions. Our experiments deviate from these standard conditions to test whether sensory encoding strategies adapt in a manner predicted by the theory of fitness maximization.

We designed behavioural tasks in which, on any given trial, human participants had to choose which of two simultaneously presented orientation stimuli, s₁ or s₂, was more diagonal (that is, closer to a 45-degree angle; Fig. 2). In experiment 1, the participants were trained in two different contexts but were always tested with stimuli in the same retinotopic locations, while in experiment 2, the participants were trained in only one context or the other but were tested with stimuli in trained and untrained retinotopic locations. The key aspect of both experiments is that decisions were made in two different stimulus–reward association contexts. In one context, the participants were paid a fixed reward for correct discrimination of the more diagonal stimulus in each trial and received no reward for incorrect decisions (henceforth the accuracy context, K_acc). In the second context, the participants were rewarded depending on the stimulus s that they selected in each trial, and the amount of reward was linearly mapped to the degree of diagonality of the input stimulus (henceforth the reward context, K_rew). Crucially, the prior distribution of sensory signals f(s) was exactly the same in both contexts. Stimuli close to cardinal orientations were presented the most often to match the statistics of natural scenes that humans typically encounter⁴² (Fig. 3a). This experimental design allowed us to test the competing hypotheses that neural codes in early sensory areas (1) maximize accurate representations of the environment and are thus constant in both reward contexts, or (2) adapt between contexts to instantiate efficient coding strategies that maximize fitness. The location-specific training in experiment 2 allowed us to test whether any adaptation occurs in early sensory regions that maintain retinotopic mappings or only later in downstream circuits that generalize across locations.

**Fig. 2: Behavioural paradigms used with human participants.**

**Fig. 3: Inference model theoretical predictions.**

We employed a general method for defining efficient codes by investigating optimal allocation of limited neural resources⁴³. On the basis of this framework, sensory precision, measured as Fisher information J(s), should be proportional to the amount of resources available k and the prior distribution f(s) raised to a power q

$$J(s)\propto k\times f{\left(s\right)}^{q},$$

(3)

hence known as the power-law efficient code. We show that an advantage of employing this framework is that there is a direct link between the power-law efficient codes and the fitness maximization solutions for the contexts that we consider here (Methods and Supplementary Note 1). In brief, the connection of the power-law efficient codes with accuracy versus reward maximization objectives is the following: if the power-law parameter q is relatively low, it shifts some neural resources away from where f(s) is high and relocates them where it is low. The reason for this spreading of coding resources is that, even though observing a rare stimulus is unlikely (and thus the probability of error will be low), if there is an error, it could be a very costly mistake. Therefore, when stimuli are directly associated to rewards relative to situations in which all mistakes are equally costly, it pays to allocate more neural resources to the segments of the stimulus space where f(s) is low (Fig. 3e and Supplementary Fig. 1). This theoretical link allowed us to derive various qualitative predictions that we used to test whether humans indeed adopt fitness-maximizing, as opposed to information-maximizing, neural codes of sensory perception.

A first prediction of fitness maximization theory is related to sensory discrimination differences in the two reward association contexts considered here. If participants maximize fitness under limited resources, discrimination accuracy for diagonal (that is, oblique) relative to cardinal orientations should improve more in K_rew than in K_acc over the course of the decision task (Fig. 3e). In line with this prediction, we found an interaction between reward association context, orientation and task phase (early or late) on discrimination accuracy (β = 1.46 ± 0.65; P_MCMC < 0.001; Supplementary Tables 1 and 2). Note that this interaction is not driven by a simple increase in sensitivity in K_rew—that is, a general improvement across the whole orientation space (Fig. 3b). Despite the differences between contexts, fitness maximization theory predicts that for both optimization objectives, discriminability should be higher in regions of the stimulus distribution prior with higher density (that is, greater in cardinal than oblique orientations) because these stimuli occur more frequently in all contexts (see the thick blue and thick red lines in Fig. 3e). The data are consistent with this prediction as well (main effect of obliqueness s in K_acc: β = −4.50 ± 0.45; P_MCMC < 0.001; in K_rew: β = −5.50 ± 0.47; P_MCMC < 0.001; Fig. 4a,b). These results thus support our hypothesis that perceptual coding of sensory information uses a fitness-maximizing code.

**Fig. 4: Human performance in the binary decision task.**

To further substantiate the conclusion that changes in behaviour were driven by fitness-maximizing codes rather than experience-driven increases in sensitivity, we fit the encoding model to the choice data to estimate parameters q and k. In line with the fitness maximization predictions, we found that in context K_rew the value of k did not change (Δk = −0.0004 ± 0.0014; P_MCMC = 0.61), while q decreased between the first and last part of the decision experiment by Δq = −0.29 ± 0.15 (P_MCMC = 0.03). The final value of q was significantly smaller in K_rew than in K_acc (Δq = −0.38 ± 0.14; P_MCMC = 0.004; Fig. 4c). Taken together, our results clearly indicate that the empirically observed behavioural changes in K_rew versus K_acc were not caused by simple practice-related sensitivity enhancements or differences in monetary payoffs (Methods).

Fitness-maximizing adaptation at sensory estimation stages

We next sought to determine whether the form of efficient adaptation observed in the decision task takes place only in downstream decision circuits or whether it is already implemented at earlier processing stages—for instance, in the circuits that generate estimations of sensory stimuli. To answer this question, we had participants perform an edge orientation estimation task before and after the contextual decision-making task (Fig. 2 and Methods). After training in either context, there was a significant decrease in the estimation bias (P_MCMC = 0.02; Supplementary Fig. 2). A decrease in the estimation bias is predicted by either a fitness-maximizing code or increased sensitivity (Fig. 3c,f). However, similar to the decision task, experience-dependent changes in sensitivity versus a fitness-maximizing code make distinct predictions for the estimation task in terms of the estimation variance. Greater sensitivity would lead to lower estimation variability for all orientations (Fig. 3d). In contrast, the fitness maximization hypothesis predicts that after participants adapt to context K_rew in the decision task, estimation variability for more oblique orientations will decrease, while estimation variability for stimuli near cardinal orientations will be slightly higher (Fig. 3g). This is because the theory predicts a shift in coding resources from high-probability cardinal orientations to low-probability diagonal orientations. Crucially, fitness maximization predicts that after exposure to context K_acc, there will be no change in the relative estimation variability for cardinal versus diagonal orientations. In line with these predictions, we found a significant interaction (β = −0.16 ± 0.09; P_MCMC = 0.04) between the change in estimation variability for oblique relative to cardinal stimuli across contexts (Fig. 5a and Supplementary Table 3).

**Fig. 5: Estimation task performance.**

Fitness-maximizing codes are retinotopically specific

A key conclusion that we draw from our results is that fitness-maximizing adaptation in both the decision task and the estimation task appears to have a common origin that does not depend on comparisons between decoded stimuli in downstream decision circuits. To explicitly test whether fitness-maximizing neural codes are indeed present at the earliest stages of sensory processing, we modified the decision and estimation tasks to train and test behavioural performance in retinotopically specific locations. In the modified estimation task, the participants were presented with an orientation stimulus in one of four spatial locations (Fig. 2b). Crucially, the participants in these experiments were trained in only two of these locations during the decision task and completed the decision task in only one context (either K_rew or K_acc). If adaptation is retinotopically specific, then changes in estimation task performance should be specific to the retinotopic locations trained during the decision task. In line with the fitness-maximizing predictions, for those trained in K_rew, we found the pattern predicted by a fitness-maximizing code in the location-specific changes in estimation variability (location × time(after − before) × oblique: β = −0.12 ± 0.07; P_MCMC = 0.04; Fig. 5b and Supplementary Table 4). For those trained in K_acc, this interaction was not significant (Fig. 5c and Supplementary Table 5). A comparison across groups showed that the effect was greater in the K_rew than the K_acc group (context × location × time(after − before) × oblique: β = −0.17 ± 0.10; P_MCMC = 0.04). Together, these results confirm the retinotopic specificity of fitness-maximizing coding rules in humans.

Artificial neural networks with sensory processing bottlenecks use fitness-maximizing codes

The existence of fitness-maximizing codes in early sensory systems—where they will literally change the way an organism sees the world—must be for very good reasons. We conducted machine learning analyses with artificial neural networks (ANNs⁴⁴) to investigate whether agents with informational bottlenecks must recode their sensory representations to fitness-maximizing schemes to achieve the best performance in decision-making tasks. Alternatively, the downstream decision circuits (which are not involved in estimating orientation as such) may not care about the accuracy with which orientation can be estimated and may have enough flexibility to maximize fitness even if the encoding scheme at early sensory stages is fixed to an infomax strategy.

The precision of neural representations at different sensory processing stages can be studied using recently developed neural network techniques in machine learning. We constructed an ANN implementation to test how (that is, at what layer of processing) it incorporates the behavioural goals of the agent when encoding sensory stimuli. More specifically, our premise is that the way in which internal representations of retinal sensory information are formed and used in the nervous system can be studied with a variational information bottleneck (VIB)-like objective^45,46,47,48, where in general the goal is to minimize the following loss function:

$$\mathop{\min }\limits_{\phi ,\theta }\quad E[\,{{\mbox{reward loss}}}\,]+\beta \times I,$$

(4)

where ϕ and θ are the parameters of the encoder and downstream decision circuit, respectively. In our ANN, the VIB-like objective trades (an approximation of) the amount of ‘visual’ information I that the encoder can process with the expected reward loss, via the regularization parameter β. Note that the analytical solutions developed in our work ‘drop’ costs on I by assuming that the noise in the encoder is small compared with the dynamic range of the signal (that is, the small-noise approximation, which is commonly adopted in early sensory systems to study neural coding efficiency, often leading to satisfactory predictions⁴⁹). The reason for using the VIB-like objective in our ANNs is that it provides a parsimonious way to induce pressures in the encoder to disentangle information up to a certain bound in a systematic manner (Supplementary Fig. 3).

We implemented an ANN that solved the same task as in our human experiments (Fig. 6a; see Methods for the details). The ANN received two retinal images corresponding to screen locations where the two Gabor patches were presented in our task. Just as in the human experiment, the decision rule that the ANN had to learn was to indicate which of the two input stimuli (left or right) was more diagonal, while maximizing the reward received across many trials. We trained networks in two contexts, corresponding to the human experiments: K_acc and K_rew (Methods). After the information had been encoded, it was fed to downstream neural circuits that used the encoded information to solve the task at hand (in our case, select the Gabor patch that was more diagonal), while considering the goals of the agent within the environmental context (for example, maximize decision accuracy or maximize reward consumption; Methods).

**Fig. 6: ANN implementation and analyses.**

After the networks were trained in each context, we first investigated whether the network could disentangle the hidden structure in the retinal image statistics to solve the downstream task. By applying a t-distributed stochastic neighbour embedding (t-SNE) algorithm to the neural responses of the bottleneck structure, we found that the network indeed learned a useful representation of the scalar angular orientations from the retinal images (Fig. 6b). However, this t-SNE solution provided no direct insight into how the encoder allocated its limited resources (that is, using an infomax or fitness-maximizing code). We therefore analysed the amount of information contained in the encoder layer, quantified as Fisher information, as a function of the angular orientation. This analysis more directly illuminated the pressures shaping the learning of the latent representation in the encoder.

Mirroring the predictions of fitness-maximizing theory and the human behavioural results, we found that, in general, the amount of information in the encoder layer was larger for cardinal orientations than for diagonal orientations. The network thus allocated the limited processing resources in the bottleneck encoder to the more recurrent portions of the angular stimulus space. Crucially, we found that for ANNs trained in context K_acc relative to ANNs trained in context K_rew, the amount of information was larger for more cardinal angles but smaller for diagonal orientations (Fig. 6c). Moreover, we found that these results were insensitive to the information-processing costs imposed in the encoder (Supplementary Fig. 3). Interestingly, we found that the retinal layer in our ANN architecture in which we did not explicitly incorporate information-processing regularization (Fig. 6a) also revealed signatures of information-processing allocation similar to the ones encountered in the second retinotopic layer (although less pronounced; Supplementary Fig. 4). We also studied how the ANN allocates information-processing resources in the first and second retinotopic layers when informational bottleneck pressures are imposed at the decision-making layer. We found that the fitness-maximizing patterns were also present in this scenario in the second retinotopic layer (Supplementary Fig. 5). We found that the fitness-maximizing patterns in layer 1 were present for high levels of network performance (that is, generally low β in the decision layer; Supplementary Fig. 6). Thus, even when information-processing pressures are very small at early sensory stages (and relatively large in downstream decision layers), neural networks still try to develop fitness-maximizing codes at the early sensory stages to compensate for reward loss due to processing limitations in downstream circuits. However, these effects are more pronounced if information-processing constraints are present at early stages. This set of results indicates that the network learns to allocate its neural resources in a fitness-maximizing manner following the predictions of the algorithmic normative theory and the behaviour exhibited by human participants.

The first network analysis additionally addressed the following concern: as downstream circuits are not involved in estimating orientation as such, they may not care about the accuracy with which orientation can be estimated. The ANN architecture we implemented here addressed exactly this question because the objective function the networks sought to optimize did not explicitly incorporate “reconstruction error minimization” (as classically implemented in variational autoencoder architectures^47,48). Instead, in our ANN architecture, the network had to find encoding solutions that benefited downstream operations supporting decision behaviour. That is, all that mattered to the network was what information about the latent (angular) space was most relevant to solve the decision task at hand and maximize reward. Nevertheless, we found that ANNs learned to implement efficient coding schemes in their encoding layers that maximized reward in each context.

Having demonstrated the efficient, fitness-maximizing nature of the encoder in an ANN, we investigated whether an ANN could achieve solutions similar to the ones obtained in the fitness-maximizing context K_rew, if we forced the encoder layer to maximize information transmission (that is, to use an infomax code). To test this, we first trained an ANN to maximize decision accuracy and then froze all network weights up to the encoder, but we left the downstream network weights free to change. Theoretically, ANNs with sufficient complexity can interpolate any objective function. One could therefore hypothesize that even if the encoder is restricted to maximize information transmission, downstream circuits could still figure out solutions that maximize reward gain on the basis of the representations coming from an infomax encoder. However, a competing hypothesis comes from an information-theoretic point of view, which holds that, in sensory discrimination tasks that face a bottleneck due to limited resources (like the one we study), once information is lost or processed in a suboptimal manner in one step of a noisy transmission channel, it cannot be recovered, irrespective of how complex the downstream circuits are. In line with the predictions from information theory, we found that freezing the encoder layers after training them in context K_acc resulted in a significant reward loss when downstream layers were retrained in context K_rew (Fig. 6d). Once we unfroze the encoder layer (that is, allowed it to depart from the K_acc constraint and use fitness-maximizing codes), we found that reward loss was significantly lower than in the K_acc trained network (Bayesian paired t-test P_MCMC < 0.001; Fig. 6d, bottom), reaching levels matching the network trained from scratch in context K_rew (Bayesian paired t-test P_MCMC = 0.86; Fig. 6d, bottom). Crucially, just as in the human experiments, the degree of discrimination accuracy was calibrated to be identical in all cases, and thus our results do not depend on different levels of accuracy across the ANNs (Fig. 6d, top; Bayesian paired t-test for all pairwise comparisons P_MCMC > 0.51). Moreover, we found that this result held independent of the degree of complexity (that is, size) of the downstream network, indicating that downstream circuits cannot compensate for the lack of fitness-maximizing codes at the encoding stage.

The findings from our ANN analysis clarify our human behaviour results. Our ANN analyses reveal how a fixed set of physical sensory inputs with relevant but hidden environmental/contextual statistics that the agent can only experience and learn over time are represented in coding schemes to maximize fitness. Moreover, studying ANNs with a VIB is useful because it provides a reasonably realistic model of how encoding schemes are adapted to optimize a given objective function when the resources to process information are limited.

Generalizing to other ecologically valid reward functions

The solutions to the two decision-making objectives studied here belong to the same family of power-law efficient codes with a single parameter that determines the solution of the decision objective. However, we acknowledge that when the system must deal with more complex sensory–reward mappings, the same analytical solutions might not generally apply. This would have to be tested case by case. Nevertheless, it is possible to go beyond the analytical solutions and employ the same framework to find general strategies of resource allocation for arbitrary stimulus–reward mapping functions. We address this possibility next.

Non-monotonic stimulus–reward functions

As has been emphasized in previous work, non-monotonic payoff functions are common^10,12,50. For instance, suppose that a physical attribute is related to the degree of salinity of food. Too little or too much salt can have deleterious consequences on an organism and its fitness. In such scenarios, it has been suggested that perception should not be tuned directly to the stimulus–reward associations, as the organisms will be able to know only how good or bad the payoff is⁵⁰. However, knowing that the payoff is bad provides no information about why it is bad and hence no clue to the adaptive course of action for an organism⁵⁰.

How should the neural resources be allocated in such cases? What strategy should the agent follow? We consider the following three scenarios. Scenario 1 corresponds to the accuracy maximization task (K_acc). Scenario 2 corresponds to a reward-maximizing task where rewards are linearly (monotonically) mapped to the physical stimulus values (K_rew, Fig. 7a). Scenario 3 corresponds to a non-monotonic mapping where stimuli in the middle of the sensory space deliver the highest reward values (Fig. 7b). In all three scenarios, we assume a right-skewed distribution of sensory stimuli over the physical value space (Fig. 7a,b) for comparison with the orientation experiments we conducted in humans. For scenario 3, there is no known closed-form solution to find the optimal allocation of resources, but please note that the minimization objective remains the same as for scenario 2: minimize the reward given up for every erroneous decision.

**Fig. 7: Studying efficient allocation of neural resources with non-monotonic stimulus–reward mappings.**

Here we emphasize that a key assumption of our framework is that, on the basis of experience or understanding of task instructions, the agent clearly understands which stimuli deliver more reward. For scenarios 1 and 2, higher levels of the stimuli are preferred. In scenario 3, however, stimuli in the middle are preferred. We can examine the downstream decoding process to understand how resources at encoding should be allocated in each scenario. Recall that the decoding rule in our model is the same in all cases: the Bayesian mean squared error. What is a possible strategy for the cases where the stimulus–reward mappings are monotonic or non-monotonic unimodal? A relatively simple strategy that preserves the ‘veridicality’ of sensory information is one in which the agent employs a categorization threshold τ over the space of physical stimuli and decodes the values $\hat{s}$ relative to that threshold. A simple implementation is one where the agent computes a relative decoded value $\tilde{s}=-| \tau -\hat{s}({s}_{0})|$, an operation that could be flexibly implemented in downstream circuits. The choice rule is then choose s₁ if ${\tilde{s}}_{1} > {\tilde{s}}_{2}$; otherwise, choose s₂. Thus, in addition to finding the optimal resource allocation function, the threshold τ is another latent variable to solve the reward maximization problem.

Before solving the optimization problem numerically, we note that the predictions for τ are relatively intuitive. In scenarios 1 and 2, τ should be set to the maximum stimulus value in the physical space, and the optimal resource allocation solutions remain the same as derived in our manuscript. In scenario 3, with reward values peaking in the middle of the stimulus distribution, the threshold will probably be located at τ ≈ 0.5 in our example in the encoding low-noise limit (not precisely at 0.5 due to biases and variance of $\hat{s}$ and the related influence of the prior distribution of physical stimuli).

The numerical solutions of resource allocation for scenarios 1 and 2 resemble, as expected, the analytical solutions in which more resources are allocated to regions of the physical space with the highest physical prior density. The amount of information is larger for lower sensory values in scenario 1 but larger for higher sensory values in scenario 2. For scenario 3, the solution for the categorization threshold is τ ≈ 0.5, and the resource allocation solution may be surprising and perhaps at first counterintuitive (Fig. 7c). Taking a closer look at the problem, we see that the solution indeed makes sense. First, we observe that the allocation of resources has a general trend to decrease as the sensory stimulus gets larger, thus following the expected result given the shape of the prior distribution of sensory stimuli. Second, the resource allocation solution has a dip at around s = 0.5 ≈ τ. This may appear initially counterintuitive given that these are the regions where the reward is the highest. However, note that (1) randomly drawing choice sets from this non-monotonic prior distribution is more likely to generate choice sets that are close in value than in the monotonic reward or accuracy scenarios, and (2) choice sets s_1,2 with values close to s = 0.5 are more likely to generate ‘mistakes’ given the resource allocation in Fig. 7c, but there is often little reward loss because the value function is relatively flat and symmetric (for example, s₁ = 0.52 and s₂ = 0.48 deliver the same reward). It is thus not worth investing too many resources near s = 0.5 even if the reward promised at those locations is high, because the potential for reward loss is low (Fig. 7d). We emphasize that this example is just one alternative strategy, but one that generates interesting predictions that could be tested in future experiments.

Efficient resource allocation under reaction time costs

We used simulations to study the scenario in which agents are rewarded/penalized for short/long reaction times (RTs) in both the K_acc and K_rew contexts. The goal was to study whether and how resource allocation changed relative to the accuracy maximization task without RT costs. Examining this scenario requires assumptions about a process model that jointly generates decisions and RTs. For simplicity and illustration purposes, we assumed that decisions and RTs were generated by a simple drift-diffusion model (DDM) with a constant decision bound b, decision evidence z and diffusion noise σ that was independent of the choice set inputs, which can be thought of as a downstream decision noise (Methods).

In this scenario, the loss function for the K_acc context is given by

$${\iint \!\!\! }_{S}\,f({s}_{1},{s}_{2})(P({{{\rm{error}}}}| {s}_{1},{s}_{2})+\eta {{E}}[{\mathrm{RT}}| {s}_{1},{s}_{2}]),$$

(5)

and the loss function for the K_rew context is given by

$${\iint \!\!\! }_{S}\,f({s}_{1},{s}_{2})\,(P({{{\rm{error}}}}| {s}_{1},{s}_{2})| {s}_{1}-{s}_{2}| +\eta {{E}}[{\mathrm{RT}}| {s}_{1},{s}_{2}]),$$

(6)

where η is the cost per RT unit (for example, in seconds). Note that as η → 0, the optimal decision bound would be z → ∞. Thus, the goal was to find the optimal balance between resource allocation and bound z that minimizes the loss in equations (5) and (6) for a given RT cost η and prior distribution of sensory stimuli in the environment f(s).

The numerical solutions revealed that the resource allocation solutions in context K_acc differed from the RT-cost-free scenario and depended on RT costs (Fig. 8a). While the RT-cost solutions were similar to the RT-cost-free solution for relatively high values of η, the smaller the RT costs, the more the allocation of resources tended to flatten. As expected, we found that higher η resulted in lower b (Fig. 8b). Second, in context K_rew, the RT-cost solutions were remarkably similar to the RT-cost-free solution. However, contrary to the K_acc environment, the RT-cost solutions in K_rew appeared to get steeper as the RT cost decreased (at least in the range of RT costs studied here; Fig. 8c). Once again, in context K_rew, higher η resulted in lower b (Fig. 8d).

**Fig. 8: Efficient resource allocation solutions considering RT costs.**

We emphasize that the results presented here are based on a simple DDM with constant bounds. The resource allocation solutions may slightly differ for DDMs where the bounds are allowed to collapse or drift/diffusion parameters dynamically change over time^51,52,53. This will be an interesting aspect to investigate in future research. Irrespective of these considerations, we show how the general framework developed here generates a rich set of testable predictions that allow for falsification and further refinement of the theory.

Discussion

Our theoretical and empirical results provide evidence that early stages of sensory processing encode environmental stimuli to maximize fitness and not necessarily to maximize perceptual accuracy. We have shown this to be the case in humans and artificial agents with sensory processing bottlenecks. Our findings indicate that downstream circuits do not need to continuously compute reward distributions on the basis of stimulus–outcome associations because this information should be efficiently embedded in the neural codes of sensory perception. This notion is supported by recent studies showing that functional remapping of stimulus–reward contingencies in early sensory areas causally depends on top-down control signals from prefrontal structures^17,18,54. We argue that this gives the organism the advantages of preventing information loss and rapidly transmitting behaviourally relevant information encoded by early sensory systems to downstream circuits specialized in action, learning or decision-making.

Efficient sensory adaptation to behavioural goals can arise without long-lasting synaptic changes or rewiring. Specific fitness-maximizing codes may have a structural basis if the environment and behavioural goals are stable over long periods, as may be the case for retinal contrast coding in the blowfly. However, efficient filtering of sensory information can rapidly occur via mechanisms of top-down contextual modulation of sensory processing, which can be achieved via mechanisms such as top-down attentional normalization⁵⁵. In fact, it has been shown that adaptation to behaviourally relevant sensory statistics (such as edge orientations) can occur in the course of one hour in human participants⁵⁶. Our key argument is that irrespective of whether efficient coding occurs via structural, synaptic or online top-down contextual modulations, it must occur at early stages if it is to be relevant for goal-directed behaviour. Information theory predicts that inefficient coding in regard to behavioural goals will cause a loss of relevant information that cannot be recovered in noisy transmission channels such as the brain. Our experiments with ANNs provide direct empirical evidence for this prediction by showing that restricting the initial encoding scheme to one that maximizes information causes suboptimal performance in specific contexts. Overall, a key contribution of our work is that we provide a formal justification of why and how neural recoding should occur across contexts in capacity-constrained and noisy transmission systems to maximize reward and fitness.

We found additional supporting evidence for this hypothesis when re-analysing data from a recent human functional MRI (fMRI) study⁵⁷. Specifically, we investigated whether novel goal-directed actions that promote people’s ‘survival’ in hypothetical scenarios they had never before encountered triggered an efficient reorganization of perceptual codes in the human brain (Supplementary Note 2). Our analyses revealed that switching back and forth between survival goals that required participants to use the same items in very different ways led the brain to efficiently represent sensory information in a goal-specific manner. More specifically, novel behavioural goals that relied on object recognition caused changes in stimulus representations at early stages of sensory processing. Regions showing changes in stimulus representation codes included V1–V3 as well as downstream object detection areas such as the lateral occipital cortex (LOC) (Supplementary Note 2). We note that these results do not explicitly support the quantitative theory developed here but instead provide support for the general idea that a system should employ resources in its early sensory areas to represent abstract behavioural goals. In addition, these results do not imply that V1–V3 and LOC are discarding veridical feature information and instead represent only goal-oriented values. Although veridicality might be compromised (resources are finite), strategies might be implemented to ensure that it is not entirely suppressed (for example, disentangling via orthogonalization^19,20).

Our study has some limitations and also generates interesting predictions that should be addressed in future research. First, the analytical solutions are restricted to accuracy maximization in discrimination tasks and reward maximization in the standard and most studied economic problem where properties of a good or action scale linearly with value. We acknowledge that when the system must deal with more complex sensory–reward mappings, the analytical solution to the resource allocation problem may not exist in a tractable form. Nevertheless, we provided some hints as to how the system can adapt to non-monotonic solutions with the use of categorization thresholds. Second, we acknowledge that our theory does not explain the dynamics of adaptation but generates predictions once the system has adapted after learning from experience. It thus remains unclear what the normative algorithms of efficient adaptation might be and how these could be connected with a biologically plausible algorithm that applies to arbitrary stimulus–reward association contexts such that reward expectation is maximized. Third, for the problems of accuracy and reward maximization with linear sensory–reward mappings, our model predicts that in edge cases where the prior distribution is approximately flat, the optimal solutions are indistinguishable and the agent should allocate resources equally across the whole sensory space in both cases. Fourth, with regard to the previous point, an additional prediction appears worthy of future testing: if the prior density is low for low sensory values as well as for high sensory values, and there is a linear stimulus–reward mapping across the whole sensory space, our model predicts that, relative to the standard accuracy maximization task, sensitivity should also increase for low sensory values. Our model predicts that this effect should become more pronounced during a reward maximization task than during a standard discrimination task (a hint of this prediction can be found in Supplementary Fig. 1).

Beyond the obvious relevance for biological organisms, our results may have important implications in ongoing developments in artificial intelligence as well. Recent deep generative models show a remarkable ability to encode high-dimensional signals into latent factors under the objective of accurately predicting the local environment with specific encoding constraints. However, on the basis of our results, such an optimization objective will not necessarily match those present in biological organisms. Interestingly, a recent successful artificial intelligence model⁵⁸ proposed instead that representation formation should be driven by the need to predict the motivational value of experiences accurately. Our results validate this notion and imply that the development of artificial intelligence algorithms that aim to resemble neurobehavioural functions should go beyond the objective of maximizing only the accurate transmission of information and account for the motivational aspects of the environment that enable the organism (or the artificial agent) to maximize fitness.

Finally, although drawn from a different domain of behaviour, our results lend substantial support to economic theories positing that context-dependent utility functions should maximize expected reward rather than the expected accuracy of decisions guided by reward^3,39,59. The corroborating evidence presented in our work grounded on the principles of neural coding and decision behaviour should help advance the development and refinement of these theories within economics and related disciplines of evolutionary biology and social sciences^12,60,61.

Methods

Participants

The participants were recruited by the Center for Neuroeconomics at the University of Zurich, Switzerland. The participants were instructed about all aspects of the experiment and gave written informed consent. None of the participants suffered from any neurological or psychological disorder or took medication that interfered with participation in our study. The participants received fixed monetary compensation for their participation in the experiment, in addition to a variable monetary payoff that depended on task performance (see below). The experiments conformed to the Declaration of Helsinki, and the experimental protocol was approved by the Ethics Committee of the Canton of Zurich.

Participants who failed to follow the eye fixation instructions on more than 25% of trials were excluded from the data analysis (n = 12). We measured the performance of the participants in the training tasks and excluded participants who were unable to perform the task at the easiest difficulty level (n = 11). Additionally, we had to exclude three participants due to technical problems with the data collection. The final sample thus comprised n = 86 participants (n = 25 in experiment 1 and n = 61 in experiment 2 (30 in K_rew)).

Experimental design and stimuli

The stimuli were generated with MATLAB (version 9.7)⁶², using the Psychtoolbox and displayed on a screen that was one metre away from the participants. The angle of the head was kept stable with a chin rest. The height of the chin rest was adjusted to position the centre of the screen at the height of the eyes. As stimuli, we used oriented Gabor patches, presented on a grey background. Each patch was composed of a high-contrast three-cycles-per-degree sinusoidal grating convoluted with a circular Gaussian with width 0.41° and subtended 2.98° vertically and 2.98° horizontally. In experiment 1, all Gabor patches were presented so that the centres fell 5.7° to the left or right of the centre of the monitor and on the horizontal midline. In experiment 2, the Gabor centres fell 4.7° to the left and right of the vertical midline and 4.7° above or below the horizontal midline.

Eye tracking

Eye-tracking data were acquired using an ST Research Eyelink 1000 eye-tracking system. Gaze position was sampled at 500 Hz. Eye movements away from fixation were computed for the window corresponding to the stimulus presentation. For every saved position, the absolute distance to the fixation cross was computed. If the absolute distance exceeded 4° of visual angle, the trial was marked to include an eye movement. For most participants, the average number of trials with eye movements was less than 5%. Participants (n = 12) who made eye movements that exceeded 4° of visual angle on more than 25% of trials were excluded from all analyses.

Experiment 1

The participants performed the experiment in multiple sessions to allow for training within the two contexts on different days. The order of the accuracy (K_acc) and reward (K_rew) context training was counterbalanced across participants. In total, every participant completed 240 trials in the estimation task and 400 trials in the decision task.

Experiment 2

In experiment 2, each participant trained in only one stimulus–reward association context (either K_acc or K_rew). Training in the binary judgement decision task was performed either in the two upper locations or in the two lower locations. The participants were randomly allocated to one of the two training locations. In the estimation tasks before and after the training task, the trial locations were evenly distributed between all four possible locations. In total, every participant completed 400 trials in the estimation task and 360 trials in the decision task.

Orientation estimation task

Before the start of every trial, the participants had to fixate on a cross in the middle of the screen. At the beginning of the trial, an arrow appeared for 0.5 seconds to indicate on which side the stimulus would be shown. Afterwards, the stimulus appeared on the indicated side for 0.6 seconds. The orientation of the stimulus was determined randomly within (0–179°). During stimulus presentation, the participant had to continue fixating on the cross. After the stimulus disappeared, a Gabor patch appeared in the middle of the screen. By pressing and holding the left mouse button, the participant then rotated the new Gabor patch until its perceived orientation matched the orientation of the previously observed target stimulus. The participant could end the trial by pressing the space key. After five seconds, the trial ended automatically. The trials were separated by a random intertrial interval of 1.5–2 seconds. The estimation task took place before and after the decision task (see below and Fig. 2). To avoid the possibility that participants developed contextual strategies, they were not informed in advance that a second estimation task was going to take place after the decision task.

Decision task

The fixation cross turned black to indicate the start of a trial. After 0.5 seconds, two Gabor patch stimuli appeared. The orientation of one of the stimuli was drawn from the approximate distribution of edges in the real world⁴². The orientation of the second stimulus was adjusted by a participant-specific difficulty score to keep performance at approximately 75% accuracy for all participants. The median accuracy across participants in K_rew was 77 ± 2.9% and in K_acc was 77 ± 2.8%. Additionally, on the basis of (1) calibration to 75% accuracy, (2) the linear mapping between the degree of diagonality and reward (that is, from 1 Swiss franc (CHF) for 0° to CHF 46 for 45° in the diagonality space), and (3) pilot data, we adjusted the payoff of correct trials in K_acc to match the expected payoff in K_rew. We calculated that setting the payoff for each correct response in K_acc to 15 CHF would fulfil these conditions. Our experimental data were in line with these calculations: the median payoff in K_acc was 15.00 ± 0 CHF, and in K_rew it was 14.70 ± 0.62 CHF.

On average, the stimulus orientation followed a prior distribution f(s) described by equation (7) and shown in Fig. 3a:

$$f(s)=\frac{1}{1.85-\cos (4s)}.$$

(7)

The stimuli were displayed for 0.6 seconds. During stimulus presentation, the participants had to fixate on the cross in the middle of the screen. When the stimuli disappeared, the participants had 2.5 seconds to decide which stimulus was more oblique. Independent of the RT, the full 2.5 seconds had to be waited out. Afterwards, the two stimuli were shown again in their positions, and the result of the choice and the orientations of the stimuli were displayed for 3 seconds until the trial ended. The trials were separated by a 1.5-to-2-second intertrial interval.

Blowfly retinal LMC experiment

Here we provide a brief description of the data collected in Laughlin’s seminal work³⁶, which we re-analyse in this work. To derive the prior for the sensory stimulus of interest f(s), the researcher measured the distribution of contrasts that occur in woodland settings of the blowfly environment. In brief, photographs were taken in the natural habitat of the blowfly such as sclerophyll woodland and lakeside lands. Relative intensities were measured across these scenes using a detector that scanned horizontally, like the ommatidium of a turning fly. The scans were digitized at intervals of 0.07° and convolved with a Gaussian point spread function of half-width 1.4°, corresponding to the angular sensitivity of a fly photoreceptor. Contrast values were obtained by dividing each scan into intervals of 10, 25 or 50°. Within each interval, the mean intensity ($\bar{I}$) was found and subtracted from every data point to give the fluctuation about the mean (ΔI). This difference value was divided by the mean to give the contrast ($\Delta I/\bar{I}$).

These data were used to construct a histogram, which was later transformed to a CDF (Fig. 1a and Supplementary Fig. 1). Here we used this CDF to reconstruct the probability density function f(s) (Supplementary Fig. 1). Once the prior distribution was obtained, the fly was placed in front of a screen with a light-emitting diode (LED). At the beginning of each trial, the LED luminance was set to the screen luminance and then changed to a new luminance drawn from the prior distribution f(s) for 100 ms. The stimulus s was defined as the proportional change of the difference between the background and LED luminances. We emphasize that the CDF of the contrast statistic comes directly from the contrast measurement methodology described in the preceding paragraph and reported by Laughlin. We thus did not make the original calculations for the prior f(s), nor is it influenced by the fitness-maximizing sensory coding theory.

Fitness-maximizing neural codes

In this section, we provide a detailed description of the connection between the L_p reconstruction error, the efficient code that maximizes reward expectation and the power-law efficient codes briefly described in the main text.

Suppose that the stimulus distribution is given by s ~ f(s). The function that transforms the input s to neural responses r is given by r = h(s). While the mapping h(s) is deterministic, here we assume that errors in the neural response r follow a distribution P[r|h(s)]. We apply a general approach that considers optimality criteria accounting for how well stimulus s can be reconstructed ($\hat{s}$) from the neural representations r. Wang and colleagues introduced a general formulation of the efficient coding problem in terms of minimizing the error in such reconstructions $\hat{s(r)}$ according to the L_p norm as a function of the norm parameter p (ref. ⁶³). In brief, the reconstruction is assumed to be based on the maximum likelihood estimate of the decoder in the low-noise regime, where P[r|h(s)] is assumed to be Gaussian distributed.

The goal is to find the optimal mapping function h*(s) to achieve a minimal L_p reconstruction error for any given prior stimulus distribution f(s). More formally, the problem is defined as: find h*(s) such that

$$\min {\left\langle {\left\vert \hat{s}(r)-s\right\vert }^{p}\right\rangle }_{s,r}\quad \,{{\mbox{s.t.}}}\,\,0\le h(s)\le 1,$$

(8)

where, without loss of generality, we assume that the operation range of the neuron is bounded between 0 and 1. It is possible to show that the optimal mapping h*(s) is given by equation (9)⁶³:

$${h}^{* }(s)=\frac{\int\nolimits_{-\infty }^{s}f{\left(\tilde{s}\right)}^{1/(1+p)}{\mathrm{d}}\tilde{s}}{\int\nolimits_{-\infty }^{\infty }f{\left(\tilde{s}\right)}^{1/(1+p)}{\mathrm{d}}\tilde{s}}.$$

(9)

If we define

$$\gamma \equiv 1/(1+p),$$

(10)

we observe that the normalized power function of the stimulus distribution f in equation (9) is the escort distribution with parameter γ (ref. ⁶⁴). Note that under this framework, infomax coding is given by the norm parameter p → 0, and therefore γ = 1, thus leading to the result that h(s) is the CDF of the prior distribution.

Efficient L _p error-minimizing codes and behavioural goals

Economics has a long tradition of studying the following problem: for a given distribution f(s) in the environment, what is the optimal shape of the internal representation (that is, h(s), which in economics is known as the utility function) if such function can only take a large but limited set of n discrete subjective values (that is, the internal readings, r) that code for any given stimulus s (refs. ^3,39)? The utility function is thus restricted to a set of step functions with n jumps, each corresponding to a utility increment of size 1/n. In this case, discrimination errors originate from the fact that the organism cannot distinguish two alternatives located at the same step of the utility function. Under this formulation, the following variant of the problem was studied: find the optimal utility function (h*) under two evolutionary optimization criteria, (1) the probability of mistakes minimization criterion and (2) the expected reward loss minimization criterion.

To solve this problem, we assume that the organism repeatedly makes choices between two alternatives drawn from the stimulus distribution f(s), where we may suppose that stimuli are linearly mapped to a reward value. The organism is endowed with a utility function that assigns a level of reward to each possible stimulus s from f(s). The alternative that promises more utility to the organism is chosen³⁹.

If the goal of the organism is to minimize the number of erroneous responses (that is, maximize discrimination accuracy), the optimal utility function ${h}_{{{{\rm{accuracy}}}}}^{* }$ is given by

$${h}_{{{{\rm{accuracy}}}}}^{* }(s)=\int\nolimits_{-\infty }^{s}f(\tilde{s}){\mathrm{d}}\tilde{s}.$$

(11)

According to this solution, the power parameter of the escort distribution in equation (9) is given by γ = 1, which corresponds to the infomax strategy.

However, if the goal of the organism is to minimize the expected reward loss (that is, maximize the amount of reward received after many decisions) and stimuli are linearly mapped to reward value, the optimal utility function ${h}_{{{{\rm{reward}}}}}^{* }$ is given by

$${h}_{{{{\rm{reward}}}}}^{* }(s)=\frac{\int\nolimits_{-\infty }^{s}f{\left(\tilde{s}\right)}^{2/3}{\mathrm{d}}\tilde{s}}{\int\nolimits_{-\infty }^{\infty }f{\left(\tilde{s}\right)}^{2/3}{\mathrm{d}}\tilde{s}}.$$

(12)

According to this solution, the power parameter of the escort distribution in equation (9) is given by γ = 2/3, which corresponds to optimizing the L_p minimization problem with parameter p given by

$$\gamma =2/3=\frac{1}{1+p}\quad \Rightarrow \quad p=0.5.$$

(13)

We found that this normative fitness-maximizing solution is the error penalty that best describes the LMC data⁴⁰ (these results are reported in the main text and Fig. 1). Additionally, please note that the solutions provided in equations (11) and (12) are derived on the basis of maximizing the accurate choices and reward expectation, respectively, without any assumptions about maximizing information efficiency as a goal in itself.

Connection to power-law efficient codes

We employed a general method for defining efficient codes by investigating optimal allocation of Fisher information J given (1) a bound of the organism’s capacity c to process information, (2) the frequency of occurrence f(s) and (3) the organism’s goal (for example, maximize perceptual accuracy or expected reward) according to

$$\mathop{{\mathrm{arg}}\,{\mathrm{max}}}\limits_{J(s)}-\int\,{\mathrm{d}}s\,f(s)J{\left(s\right)}^{-\alpha }$$

(14)

subject to a capacity bound

$$C(s)=\int\,{\mathrm{d}}s\,J{\left(s\right)}^{\beta }\le c,$$

(15)

with parameters α defining the coding objective and β > 0 specifying the capacity constraint⁴³. The solution of this optimization problem reveals that Fisher information should be proportional to the prior distribution f(s) raised to a power q, which is therefore referred to as the power-law efficient code

$${J}_{{{{\rm{opt}}}}}(s)={c}^{1/\beta }{\left(\frac{f{\left(s\right)}^{\gamma }}{\int\,{\mathrm{d}}sf{\left(s\right)}^{\gamma }}\right)}^{1/\beta }\triangleq kf{\left(s\right)}^{q},$$

(16)

where q = 1/(β + α) and γ = β/(β + α). Note that power-law parameter q is multiply determined, and to make progress in identifying it, we need to make some further assumptions. Here we opted for setting β = 0.5, as previously proposed in the standard infomax framework⁴¹; however, our conclusions are not affected by the specific value of β. This means that α determines how Fisher information is allocated relative to the prior, influencing the values of both q and γ. It can be shown that the infomax coding rule implies γ = 1 and therefore an efficient power-law code q = 2, and the reward expectation rule implies γ = 2/3 and therefore an efficient power-law code q = 4/3 (Supplementary Note 1). The power-law efficient codes thus allow us to establish a connection between behavioural goals in the contexts studied in this work (K_acc and K_rew) and parameter γ, which incorporates the goals of the organism under the resource-constrained framework that we study here.

Optimal inference

When specifying an inference problem using such an encoding–decoding framework, a key aspect for generating predictions of decision behaviour is to obtain expressions of the expected value and variance of the noisy estimations $\hat{s}$ for a given value input s₀. However, we first need to specify the encoding and decoding rules. We adopted an encoding function P(r|s) associated with the power-law efficient code that is parameterized as Gaussian⁴³

$$\begin{array}{rcl}P(r| s)&=&{{{\mathcal{N}}}}\left(s,\frac{1}{kf{\left(s\right)}^{q}}\right)\\ &=&\sqrt{\frac{kf{\left(s\right)}^{q}}{2\uppi }}\exp \left(-\frac{kf{\left(s\right)}^{q}}{2}{\left(r-s\right)}^{2}\right),\end{array}$$

(17)

and therefore Fisher information is allocated using an s-dependent variance ${\sigma }^{2}=1/kf{\left(s\right)}^{q}$. While we are aware that in our study the stimulus space is circular, given that discriminability thresholds are relatively low for orientation discrimination tasks in humans, it is safe to assume that the likelihood function can be locally approximated as a Gaussian distribution.

At the decoding stage, the observer computes the posterior using Bayes’s rule:

$$P(s| r)=\frac{P(r| s)f(s)}{P(r)}.$$

(18)

Theoretical and empirical evidence suggests that for orientation estimation tasks, estimates are typically biased away from the prior. This suggests that humans employ an expected value estimator of the posterior, at least for the infomax case⁴¹.

The expected value of the estimator can be defined as the input stimulus s₀ plus some average bias b(s₀). Using analytical approximations under the high-signal-to-noise regime, it is possible to show that the bias for the posterior expected value estimator can be approximated by⁶⁵

$$b\left({s}_{0}\right)\approx \left(1-\frac{1}{q}\right)\frac{1}{k}{\left(\frac{1}{f{\left(s\right)}^{2}}\right)}_{{s}_{0}}^{{\prime} }.$$

(19)

In a previous study, using model simulations and exploring parsimonious functional forms, it was shown that the proportionality constant of the bias term can be approximated by⁴³

$$\frac{\log (q)}{k\sqrt{q}}.$$

(20)

The analytical solution and the simulation-based solution of the proportionality constant are approximately equivalent for a range of q values relevant to our work (for example, q ∈ [0.5, 2]); that is

$$\frac{\log (q)}{k\sqrt{q}}\approxeq \left(1-\frac{1}{q}\right)\frac{1}{k},$$

(21)

thus validating the results derived in the analytical approximations that we used in the current work. However, using either function does not affect the qualitative or quantitative results in our study.

Using this result, the expected value of the estimators is given by

$$E[\hat{s}| {s}_{0}]\approx {s}_{0}+\left(1-\frac{1}{q}\right)\frac{1}{k}{\left(\frac{1}{f{\left(s\right)}^{q}}\right)}_{{s}_{0}}^{{\prime} }.$$

(22)

As already defined in the description of the behavioural task, in this study, we used a parametric form of the prior that closely resembles the shape of the natural distribution of orientations in the environment⁴²

$$f(s)=\omega \times \frac{1}{a-\cos (4s)},$$

(23)

with a > 1 determining the elevation (steepness) of the prior, and ω a normalizing constant. Using this parameterization of the prior, we can obtain an explicit analytical approximation of the bias:

$$\begin{array}{rcl}b({s}_{0})&\approx &\left(1-\frac{1}{q}\right)\frac{1}{k}\frac{\partial }{\partial s}{\left({\left(\frac{\omega }{a-\cos (4s)}\right)}^{-q}\right)}_{{s}_{0}}\\ &\approx &\left(1-\frac{1}{q}\right)\frac{1}{k}{\left(\frac{4q\sin (4s){\left(\frac{\omega }{a-\cos (4s)}\right)}^{1-q}}{\omega }\right)}_{{s}_{0}}.\end{array}$$

(24)

We can also obtain an analytical approximation of the variance under the high-signal-to-noise regime using the Cramer–Rao bound formulation:

$$\begin{array}{rcl}\,{{\mbox{Var}}}\,[\hat{s}| {s}_{0}]&\propto &{\left(\frac{1}{J(s)}\right)}_{{s}_{0}}\\ &\approx &\frac{1}{k}{\left(\frac{1}{f{\left(s\right)}^{q}}\right)}_{{s}_{0}}\\ &\approx &\frac{1}{k}{\left(\frac{(a-\cos (4s))}{\omega }\right)}_{{s}_{0}}^{q}.\end{array}$$

(25)

We can thus use equations (24) and (25) to derive the predictions presented in Fig. 3.

Finally, assuming that the estimators are normally distributed using the expected value and variance derived above, the probability that an agent chooses an alternative with orientation value s₁ over a second alternative with orientation value s₂ (recall that in our experiment the decision rule (objective) of the participants is to choose the orientation perceived as closer to the diagonal orientation) is given by

$$P\left({\hat{s}}_{1} > {\hat{s}}_{2}| {s}_{1},{s}_{2}\right)=\varPhi \left(\frac{{{{\rm{E}}}}\left[{\hat{s}}_{1}| {s}_{1}\right]-{{{\rm{E}}}}\left[{\hat{s}}_{2}| {s}_{2}\right]}{\sqrt{{{{\rm{Var}}}}\left[{\hat{s}}_{1}| {s}_{1}\right]+{{{\rm{Var}}}}\left[{\hat{s}}_{2}| {s}_{2}\right]}}\right),$$

(26)

where Φ() is the CDF of the normal distribution. When fitting the choice data to the model, we accounted for potential side (left/right) biases β₀ and lapse rates λ in the decision task using

$$P\left({\hat{s}}_{1} > {\hat{s}}_{2}| {s}_{1},{s}_{2}\right)=\frac{\lambda }{2}+\varPhi \left(\frac{{{{\rm{E}}}}\left[{\hat{s}}_{1}| {s}_{1}\right]-{{{\rm{E}}}}\left[{\hat{s}}_{2}| {s}_{2}\right]}{\sqrt{{{{\rm{Var}}}}\left[{\hat{s}}_{1}| {s}_{1}\right]+{{{\rm{Var}}}}\left[{\hat{s}}_{2}| {s}_{2}\right]}}+{\beta }_{0}\right)(1-\lambda ).$$

(27)

Fitting the power-law efficient model to human data

To fit the power-law efficient coding model to the choice data from the decision task, we used a hierarchical Bayesian model. We fit the early (1–200) and late (>200) training trials in each reward context separately. Posterior inference of the parameters in the hierarchical models was performed via the Gibbs sampler using the Markov chain Monte Carlo technique implemented in JAGS⁶⁶, assuming flat priors for both the mean and the noise of the estimates. For each model, we drew a total of 20,000 burn-in samples and subsequently took 5,000 new samples from three independent chains. We applied a thinning of 5 to this final sample, thus resulting in a final set of 3,000 samples for each parameter. We conducted Gelman–Rubin tests for each parameter to assess convergence of the chains. All latent variables in our Bayesian models had $\hat{R} < 1.05$, which suggests that all three chains converged to a target posterior distribution. We checked convergence of the group-level parameter estimates via visual inspection.

Behavioural and statistical analyses

In the estimation task, the observers’ behavioural error on a given trial was computed as the difference between the reported orientation and the presented orientation. The direction of the error was defined as positive if the reported orientation was more oblique than the presented orientation, or negative if vice versa. If the error on any given trial was bigger than 25% of the maximum possible error (90 degrees), we discarded that trial. To make full use of the data, we pooled all participants from both experiments for the analysis of the impact of the reward training context. Comparisons between trained and untrained locations used only the data from the location-specific training in experiment 2.

We computed the average bias and variance in five bins of 9° before and after the training phases. Next, we computed the average change in the variance in each bin for each participant. We used the changes in variance within the most cardinal and most oblique bins to test for the predicted interactions between diagonality and training type (K_acc or K_rew) or location (trained or untrained) using Bayesian hierarchical linear regressions implemented with the brms package (version 2.13.5)⁶⁷ in the statistical computing software R (version 3.6.3)⁶⁸. For each model, we used four chains with 2,000 samples per chain after burn-in. The P_MCMC values reported for these regressions represent one minus the probability of the reported effect being greater (less) than zero given the posterior distributions of the fitted model parameters.

We also compared the performance of participants in the binary judgement decision task using Bayesian hierarchical regressions implemented with brms in R. In this task, the participants had to decide which of two stimuli were more diagonal (closer to 45 degrees). We compared the accuracy of these decisions as a function of diagonality, training phase (early or late) and training type (K_acc or K_rew). We used four chains with 1,000 samples per chain after burn-in for a total of 4,000 posterior samples for each regression parameter. The P_MCMC values were computed in the same fashion as described above for the estimation task.

ANNs

Suppose that we have a dataset of x samples from a distribution of images represented by the retina where each image indicates an angular orientation s with an angular prior distribution p(s). Note that a key feature of our analyses is that knowledge about this angular prior is not explicitly given to the neural network; this prior is embedded in the statistics of image occurrences over space and time. Also note that there might be different images x_s that can be mapped to the same angle s₀ (for example, a Gabor patch with identical angle but different angular phases). Each stimulus is encoded by a set of latent codes (or a latent neural distributional code) z with a prior distribution p(z). This prior distribution results in a posterior distribution p(z|x) after observing image x. The neural coding system should thus learn a good representation of the environment (the distribution of physical sensory inputs) that might also need to be optimized for a particular downstream task (for example, maximize the reward consumption resulting from decision y). More specifically, we propose a VIB-like objective function (equation (4) in main text). In our ANN, the VIB-like objective trades (an approximation of) the amount of ‘visual’ information I that the encoder can process with the expected reward loss, via the regularization parameter β. Higher values of β thus introduce extra pressures in the network to encode information about the input image that can yield the most significant improvement in the downstream objective function. The neural network received two retinal images corresponding to screen locations where the two Gabor patches were presented in our task. We note that when training the ANN, the parameters of the encoder ϕ are shared for both retinal locations where the stimuli x_1,2 are presented. The decision rule that the neural network has to learn is to indicate which of the two input stimuli (left or right) is more diagonal, while maximizing the reward received across many trials. Also like in the human experiments, we trained networks in two contexts: K_acc and K_rew.For all VIB-like objectives studied here, we define the regularized ‘information transmission’ I as

$$\begin{array}{rcl}I&\equiv &{{\mathbb{E}}}_{{{{\bf{X}}}}}\left[{D}_{{\mathrm{KL}}}\left({p}_{\phi }({{{{\bf{z}}}}}_{1}| {{{{\bf{x}}}}}_{1})\parallel p({{{{\bf{z}}}}}_{1})\right)\right.\\ &&\left.+{D}_{{\mathrm{KL}}}\left({p}_{\phi }({{{{\bf{z}}}}}_{2}| {{{{\bf{x}}}}}_{2})\parallel p({{{{\bf{z}}}}}_{2})\right)\right],\end{array}$$

(28)

where D_KL is the Kullback–Leibler divergence. In context K_acc, the reward loss in the VIB-like objective is defined as

$$\begin{array}{rcl}E[\,{{\mbox{reward loss}}}\,]&\equiv &{{\mathbb{E}}}_{{p}_{\phi }({{{\bf{z}}}}| {{{\bf{x}}}})}\left[y({{{{\bf{x}}}}}_{1},{{{{\bf{x}}}}}_{2})(1-{p}_{\theta }(y=1| {{{{\bf{z}}}}}_{1},{{{{\bf{z}}}}}_{2}))\right.\\ &&\left.+(1-y({{{{\bf{x}}}}}_{1},{{{{\bf{x}}}}}_{2})){p}_{\theta }(y=1| {{{{\bf{z}}}}}_{1},{{{{\bf{z}}}}}_{2})\right],\end{array}$$

(29)

with y = 1 when the correct response is given by stimulus input x₁, and y = 0 otherwise. p(y = 1|z₁, z₂) is the probability that the network chooses x₁ given the encoding vectors z_1,2.

In context K_rew, the reward loss in the VIB-like objective is defined as

$$\begin{array}{rcl}E[\,{{\mbox{reward loss}}}\,]&\equiv &{{\mathbb{E}}}_{{p}_{\phi }({{{\bf{z}}}}| {{{\bf{x}}}})}\left[| s({{{{\bf{x}}}}}_{1})-s({{{{\bf{x}}}}}_{2})| \right.\\ &&\times \left\{y({{{{\bf{x}}}}}_{1},{{{{\bf{x}}}}}_{2})(1-{p}_{\theta }(y=1| {{{{\bf{z}}}}}_{1},{{{{\bf{z}}}}}_{2}))\right.\\ &&+\left.\left.(1-y({{{{\bf{x}}}}}_{1},{{{{\bf{x}}}}}_{2})){p}_{\theta }(y=1| {{{{\bf{z}}}}}_{1},{{{{\bf{z}}}}}_{2})\right\}\right],\end{array}$$

(30)

which is identical to the reward loss in the K_acc VIB-like objective function, except that the probability of an erroneous ANN decision is weighted by the absolute value of the difference in the cardinality values s(x₁) and s(x₂). The ANNs trained with VIB-like objective functions thus penalize reward loss following the K_acc and K_rew objectives employed in the analytical solutions (see equations (1) and (2) in the main text).

All networks tested here used layers that are standard in the machine learning literature. Each retinal input network consisted of convolutional 4 × 4 kernels, with a stride of two. In the results presented in this work, we used four filters, but we found that our results are largely insensitive to the number of filters used. We also investigated a fully connected input layer with different sizes (50–200 neurons), which led to nearly identical results and conclusions. The stochastic encoder has the form

$$p(z| x)={{{\mathcal{N}}}}\left(z| {g}_{e}^{\mu }(x),{g}_{e}^{\varSigma }(x)\right),$$

(31)

where g_e is a fully connected layer that receives as input the output of the retinal layer, where g_e outputs the K-dimensional mean vectors μ of z as well as the K × K covariance matrix Σ. In the results presented here, we use K = 4, but our results are similar for a range of K values from 2 to 16. We used the reparameterization trick to write p(z|x)dz = p(ϵ)dϵ, where z = g(x, ϵ) is a deterministic function of x and the Gaussian random variable ϵ. The noise is thus independent of the parameters of the network, and it is possible to take gradients that optimize the objective function in equation (4). The downstream integration network consisted of a fully connected network that receives as input the values of the noisy encoder z for each retinal input. The size of this layer for the results presented here is 20, but the main conclusions of our analyses are insensitive to the size of this layer. Finally, the decision module was a single sigmoidal unit indicating the selection of the left or right stimulus. All hidden units used rectified-linear activations. The networks were trained with Adam optimization with a learning rate of 0.0001.

To compute the Fisher information of the encoder, we first generated 500 inputs for each orientation stimulus s in the cardinality space from 0° to 45° in steps of 0.5°. We computed the empirical expected value vector

$$\bar{{{{\bf{z}}}}}(s)={{\mathbb{E}}}_{{{{\bf{z}}}} \sim p({{{\bf{z}}}}| s)}[{{{\bf{z}}}}]={\mathbb{E}}[{{{\bf{z}}}}| s].$$

(32)

By rescaling the responses z_i(s) such that the noise has unit variance, without loss of generality, the Fisher information J can be expressed as

$$J(s)=\mathop{\sum }\limits_{i=1}^{n}{\bar{z}}_{i}^{{\prime} }{\left(s\right)}^{2}={\left\Vert \frac{\partial \bar{{{{\bf{z}}}}}(s)}{\partial s}\right\Vert }_{2}^{2}.$$

(33)

Resource allocation with RT costs

We used simulations to study the scenario in which agents are rewarded for short RTs in both the K_acc and K_rew contexts. Examining this scenario requires assumptions about a process model that jointly generates decisions and RTs. We assumed that decisions and RTs T are generated by a simple DDM with a constant decision bound b, decision evidence z and diffusion noise σ that is independent of the choice set inputs, which can be thought of as a downstream decision noise. In the DDM, the data generation process does not change if we set, for instance, σ to a constant. Here we set σ = 1. Following the notation in our work, we define the decision evidence z(s₁, s₂) for the choice set s_1,2

$$\begin{array}{rcl}z({s}_{1},{s}_{2})&=&\frac{| {{{\rm{E}}}}\left[{\hat{s}}_{1}| {s}_{1}\right]-{{{\rm{E}}}}\left[{\hat{s}}_{2}| {s}_{2}\right]| }{\sqrt{{{{\rm{Var}}}}\left[{\hat{s}}_{1}| {s}_{1}\right]+{{{\rm{Var}}}}\left[{\hat{s}}_{2}| {s}_{2}\right]}}\\ &=&\frac{| {{{\rm{E}}}}\left[{\hat{s}}_{1}| {s}_{1}\right]-{{{\rm{E}}}}\left[{\hat{s}}_{2}| {s}_{2}\right]| }{\sqrt{\frac{1}{J({s}_{1})}+\frac{1}{J({s}_{2})}}},\end{array}$$

(34)

where J(s) is Fisher information, which determines resource allocation. To find the optimal resource allocation, we define

$$J(s)\equiv k\times \tilde{f}(s),$$

(35)

with the property

$$\int\,\tilde{f}(s){\mathrm{d}}s=1,\quad \frac{{\mathrm{d}}\tilde{F}}{{\mathrm{d}}s} > 0,$$

(36)

where $\tilde{F}$ is defined as the CDF of $\tilde{f}.$ Here we set k sufficiently high such that the low-noise limit property holds, and we numerically find $\tilde{f}$ (ref. ⁶⁹).

In the standard DDM, the probability of an erroneous response is given by (for simplicity, we approximate the normal CDF of equation (26) with the logit function corresponding to the analytical solution of the DDM; this approximation does not change the qualitative conclusions of our results)

$$P(\,{{\mbox{error}}}\,| {s}_{1},{s}_{2})=\frac{1}{1+{e}^{2b\times z({s}_{1},{s}_{2})}},$$

(37)

and the expected RT is given by⁷⁰

$${{{\rm{E}}}}[RT| {s}_{1},{s}_{2}]=\frac{b}{z({s}_{1},{s}_{2})}\tanh (b\times z({s}_{1},{s}_{2})).$$

(38)

In this scenario, the loss function for the K_acc context is given by equation (5) in the main text, and the loss function for the K_rew context is given by equation (6) in the main text. Note that as η → 0, the optimal decision bound would be z → ∞. The goal is thus to find the optimal balance between resource allocation J(s) and optimal bound z that minimizes the loss functions for a given RT cost η and for the prior distribution of sensory stimuli in the environment.

Representational similarity analyses of human fMRI data

We conducted additional conjunction analyses on the whole-brain maps of representational similarity for identity and usefulness that were originally computed by Castegnetti and colleagues⁵⁷. We obtained the thresholded (FWE P < 0.05) whole-brain maps from Castegnetti and colleagues and computed conjunctions between the identity and usefulness contrasts, as well as usefulness and independently defined masks of the LOC and primary visual areas V1–V3 to create the figure in Supplementary Note 2. The LOC mask was obtained from the fMRI meta-analysis tool Neurosynth (neurosynth.org) with the keyword ‘Lateral Occipital Cortex’ and thresholded at the Neurosynth default of P < 0.01 (FDR-corrected). The V1–V3 masks were extracted from the Julich-Brain Cytoarchitectonic Atlas and thresholded at 50% probability. The LOC and V1–V3 masks were then conjoined with the cluster-corrected statistical map of usefulness representations. For the full details about the fMRI data analyses, see Supplementary Note 2.

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.

Data availability

The behavioural data are available at https://osf.io/an274/. The LOC mask was obtained from the fMRI meta-analysis tool Neurosynth (neurosynth.org). The V1–V3 masks were extracted from the Julich-Brain Cytoarchitectonic Atlas (julich-brain-atlas.de)

Code availability

The analysis code is available at https://osf.io/an274/.

References

Barlow, H. B. Possible principles underlying the transformations of sensory messages. In Sensory Communication (ed. Rosenblith, W. A.) 217–233 (MIT Press, 1961).
Attneave, F. Some informational aspects of visual perception. Psychol. Rev. 61, 183–193 (1954).
Article CAS PubMed Google Scholar
Robson, A. J. The biological basis of economic behavior. J. Econ. Lit. 39, 11–33 (2001).
Article Google Scholar
Salinas, E. How behavioral constraints may determine optimal sensory representations. PLoS Biol. 4, e387 (2006).
Article PubMed PubMed Central Google Scholar
Rustichini, A., Conen, K. E., Cai, X. & Padoa-Schioppa, C. Optimal coding and neuronal adaptation in economic decisions. Nat. Commun. 8, 1208 (2017).
Article PubMed PubMed Central Google Scholar
Młynarski, W. F. & Hermundstad, A. M. Adaptive coding for dynamic sensory inference. eLife 7, 189506 (2018).
Article Google Scholar
Heng, J. A., Woodford, M. & Polania, R. Efficient sampling and noisy decisions. eLife https://doi.org/10.7554/eLife.54962 (2020).
McKay, R. T. & Dennett, D. C. The evolution of misbelief. Behav. Brain Sci. 32, 493–510 (2009).
Article PubMed Google Scholar
Searle, J. R. Seeing Things as They Are: A Theory of Perception (Oxford Univ. Press, 2015).
Berke, M. D., Walter-Terrill, R., Jara-Ettinger, J. & Scholl, B. J. Flexible goals require that inflexible perceptual systems produce veridical representations: implications for realism as revealed by evolutionary simulations. Cogn. Sci. 46, e13195 (2022).
Article PubMed Google Scholar
Jackson, F. Perception: A Representative Theory (CUP Archive, 1977).
Hoffman, D. D., Singh, M. & Prakash, C. The interface theory of perception. Psychon. Bull. Rev. 22, 1480–1506 (2015).
Article PubMed Google Scholar
Prakash, C., Stephens, K. D., Hoffman, D. D., Singh, M. & Fields, C. Fitness beats truth in the evolution of perception. Acta Biotheor. 69, 319–341 (2021).
Article PubMed Google Scholar
Shuler, M. G. & Bear, M. F. Reward timing in the primary visual cortex. Science 311, 1606–1609 (2006).
Article CAS PubMed Google Scholar
Stănişor, L., van der Togt, C., Pennartz, C. M. A. & Roelfsema, P. R. A unified selection signal for attention and reward in primary visual cortex. Proc. Natl Acad. Sci. USA 110, 9136–9141 (2013).
Article PubMed PubMed Central Google Scholar
Poort, J. et al. Learning enhances sensory and multiple non-sensory representations in primary visual cortex. Neuron 86, 1478–1490 (2015).
Article CAS PubMed PubMed Central Google Scholar
Banerjee, A. et al. Value-guided remapping of sensory cortex by lateral orbitofrontal cortex. Nature 585, 245–250 (2020).
Article CAS PubMed Google Scholar
Norman, K. J. et al. Post-error recruitment of frontal sensory cortical projections promotes attention in mice. Neuron 109, 1202–1213.e5 (2021).
Article PubMed PubMed Central Google Scholar
Libby, A. & Buschman, T. J. Rotational dynamics reduce interference between sensory and memory representations. Nat. Neurosci. https://doi.org/10.1038/s41593-021-00821-9 (2021).
Avitan, L. & Stringer, C. Not so spontaneous: multi-dimensional representations of behaviors and context in sensory areas. Neuron 110, 3064–3075 (2022).
Article CAS PubMed Google Scholar
Shannon, C. E. Coding theorems for a discrete source with a fidelity criterion. IRE Nat. Conv. Rec. 4, 142–163 (1959).
Google Scholar
Friston, K. The free-energy principle: a unified brain theory? Nat. Rev. Neurosci. 11, 127–138 (2010).
Article CAS PubMed Google Scholar
Ortega, P. A. & Braun, D. A. Thermodynamics as a theory of decision-making with information-processing costs. Proc. R. Soc. A https://doi.org/10.1098/RSPA.2012.0683 (2013).
Smith, E. C. & Lewicki, M. S. Efficient auditory coding. Nature 439, 978–982 (2006).
Article CAS PubMed Google Scholar
Sharpee, T. O. et al. Adaptive filtering enhances information transmission in visual cortex. Nature 439, 936–942 (2006).
Article CAS PubMed PubMed Central Google Scholar
Fairhall, A. L., Lewen, G. D., Bialek, W. & De Ruyter van Steveninck, R. R. Efficiency and ambiguity in an adaptive neural code. Nature 412, 787–792 (2001).
Article CAS PubMed Google Scholar
Sims, C. R. Efficient coding explains the universal law of generalization in human perception. Science 360, 652–656 (2018).
Article CAS PubMed Google Scholar
Simoncelli, E. P. & Olshausen, B. A. Natural image statistics and neural representation. Annu. Rev. Neurosci. 24, 1193–1216 (2001).
Article CAS PubMed Google Scholar
Tobler, P. N., Fiorillo, C. D. & Schultz, W. Adaptive coding of reward value by dopamine neurons. Science 307, 1642–1645 (2005).
Article CAS PubMed Google Scholar
Polanía, R., Woodford, M. & Ruff, C. C. Efficient coding of subjective value. Nat. Neurosci. 22, 134–142 (2019).
Article PubMed Google Scholar
Zaslavsky, N., Kemp, C., Regier, T. & Tishby, N. Efficient compression in color naming and its evolution. Proc. Natl Acad. Sci. USA 115, 7937–7942 (2018).
Article CAS PubMed PubMed Central Google Scholar
Bhui, R. & Gershman, S. J. Decision by sampling implements efficient coding of psychoeconomic functions. Psychol. Rev. 125, 985–1001 (2018).
Article PubMed Google Scholar
Louie, K. & Glimcher, P. W. Efficient coding and the neural representation of value. Ann. N. Y. Acad. Sci. 1251, 13–32 (2012).
Article PubMed Google Scholar
Sims, C. R., Jacobs, R. A. & Knill, D. C. An ideal observer analysis of visual working memory. Psychol. Rev. 119, 807–830 (2012).
Article PubMed PubMed Central Google Scholar
van den Berg, R. & Ma, W. J. A resource-rational theory of set size effects in human visual working memory. eLife https://doi.org/10.7554/eLife.34963 (2018).
Laughlin, S. A simple coding procedure enhances a neuron’s information capacity. Z. Naturforsch. C 36, 910–912 (1981).
Article CAS PubMed Google Scholar
De Ibarra, N. H. & Giurfa, M. Discrimination of closed coloured shapes by honeybees requires only contrast to the long wavelength receptor type. Anim. Behav. 66, 903–910 (2003).
Article Google Scholar
Wehner, R. Handbook of Sensory Physiology (ed. Autrum, H.) 287–616 (Springer Berlin, 1981).
Netzer, N. Evolution of time preferences and attitudes toward risk. Am. Econ. Rev. 99, 937–955 (2009).
Article Google Scholar
Park, I. M. & Pillow, J. W. Bayesian efficient coding. Preprint at bioRxiv https://doi.org/10.1101/178418 (2017).
Wei, X.-X. & Stocker, A. A. A Bayesian observer model constrained by efficient coding can explain ‘anti-Bayesian’ percepts. Nat. Neurosci. 18, 1509–1517 (2015).
Article CAS PubMed Google Scholar
Girshick, A. R., Landy, M. S. & Simoncelli, E. P. Cardinal rules: visual orientation perception reflects knowledge of environmental statistics. Nat. Neurosci. 14, 926–932 (2011).
Article CAS PubMed PubMed Central Google Scholar
Morais, M. J. & Pillow, J. W. Power-law efficient neural codes provide general link between perceptual bias and discriminability. Adv. Neural Inf. Process. Syst. 2018, 5071–5080 (2018).
Google Scholar
Storrs, K.R. and Kriegeskorte, N. Deep learning for cognitive neuroscience. Preprint arXiv https://doi.org/10.48550/arXiv.1903.01458 (2019).
Alemi, A. A., Fischer, I., Dillon, J. V. & Murphy, K. Deep variational information bottleneck. In 5th International Conference on Learning Representations, ICLR 2017—Conference Track Proceedings (ICLR, 2016).
Tishby, N. & Zaslavsky, N. Deep learning and the information bottleneck principle. In 2015 IEEE Information Theory Workshop, ITW 2015 (IEEE, 2015); https://doi.org/10.1109/ITW.2015.7133169
Burgess, C. P. et al. Understanding disentangling in β-VAE. Psychol. Rev. 127, 891–917 (2018).
Bates, C. J. & Jacobs, R. A. Efficient data compression in perception and perceptual memory. Psychol. Rev. https://doi.org/10.1037/rev0000197 (2020).
Tkačik, G. & Bialek, W. Information processing in living systems. Annu. Rev. Condens. Matter Phys. https://doi.org/10.1146/annurev-conmatphys-031214-014803 (2016).
Anderson, B. L. Where does fitness fit in theories of perception? Psychon. Bull. Rev. 22, 1507–1511 (2015).
Article PubMed Google Scholar
Tajima, S., Drugowitsch, J. & Pouget, A. Optimal policy for value-based decision-making. Nat. Commun. 7, 12400 (2016).
Article CAS PubMed PubMed Central Google Scholar
Fudenberg, D., Strack, P. & Strzalecki, T. Speed, accuracy, and the optimal timing of choices. Am. Econ. Rev. 108, 3651–3684 (2018).
Article Google Scholar
Hébert, B. & Woodford, M. Rational inattention when decisions take time. J. Econ. Theory 208, 105612 (2023).
Article Google Scholar
Liu, Y., Xin, Y. & Xu, N.-l. A cortical circuit mechanism for structural knowledge-based flexible sensorimotor decision-making. Neuron https://doi.org/10.1016/j.neuron.2021.04.014 (2021).
Reynolds, J. H. & Heeger, D. J. The normalization model of attention. Neuron 61, 168–185 (2009).
Article CAS PubMed PubMed Central Google Scholar
Bates, C. J., Lerch, R. A., Sims, C. R. & Jacobs, R. A. Adaptive allocation of human visual working memory capacity during statistical and categorical learning. J. Vis. 19, 11 (2019).
Castegnetti, G., Zurita, M. & De Martino, B. How usefulness shapes neural representations during goal-directed behavior. Sci. Adv. https://doi.org/10.1126/SCIADV.ABD5363 (2021).
Mnih, V. et al. Human-level control through deep reinforcement learning. Nature 518, 529–533 (2015).
Article CAS PubMed Google Scholar
Juechems, K., Balaguer, J., Spitzer, B. & Summerfield, C. Optimal utility and probability functions for agents with finite computational precision. Proc. Natl Acad. Sci. USA 118, e2002232118 (2021).
Article CAS PubMed Google Scholar
Robson, A. J. & Samuelson, L. in Handbook of Social Economics Vol. 1 (eds Benhabib, J. et al.) 221–310 (Elsevier, 2011); https://doi.org/10.1016/B978-0-444-53187-2.00007-3
Jerison, H. Evolution of the Brain and Intelligence (Elsevier, 2012).
MATLAB v.9.7 (R2019b) (The MathWorks Inc., 2019).
Wang, Z., Stocker, A. A. & Lee, D. D. Efficient neural codes that minimize L_p reconstruction error. Neural Comput. 28, 2656–2686 (2016).
Article PubMed Google Scholar
Bercher, J. F. Source coding with escort distributions and Rényi entropy bounds. Phys. Lett. A 373, 3235–3238 (2009).
Article CAS Google Scholar
Prat-Carrabin, A. & Woodford, M. Bias and variance of the Bayesian-mean decoder. Adv. Neural Inf. Process. Syst. 34, 23793–23805 (2021).
Plummer, M. et al. JAGS: a program for analysis of Bayesian graphical models using Gibbs sampling. In Proc. 3rd International Workshop on Distributed Statistical Computing (DSC 2003) (eds. Hornik, K. et al.) 20–22 (DSC, 2003).
Bürkner, P.-C. brms: an R package for Bayesian multilevel models using Stan. J. Stat. Softw. 80, 1–28 (2017).
Article Google Scholar
R Core Team. R: A Language and Environment for Statistical Computing, version 3.6.3 (R Foundation for Statistical Computing, 2020).
Grujic, N., Brus, J., Burdakov, D. & Polania, R. Rational inattention in mice. Sci. Adv. 8, 8935 (2022).
Article Google Scholar
Bogacz, R., Brown, E., Moehlis, J., Holmes, P. & Cohen, J. D. The physics of optimal decision making: a formal analysis of models of performance in two-alternative forced-choice tasks. Psychol. Rev. 113, 700–765 (2006).
Article PubMed Google Scholar

Download references

Acknowledgements

We thank N. Netzer, M. Woodford and A. Stocker for providing helpful feedback on the manuscript text. We thank M. Zurita and B. De Martino for sharing their human fMRI data for re-analysis. J.S. and S.D.B. acknowledge support from Marlene-Porsche Foundation scholarships for their PhD studies. This work was supported by a European Research Council (ERC) starting grant (ENTRAINER) to R.P. This project has received funding from the ERC under the European Union’s Horizon 2020 research and innovation programme (grant agreement no. 758604). T.A.H. received support from the Swiss National Science Foundation (SNSF) (grant no. 32003B_166566). P.N.T. received support from the SNSF (grant nos 100019_176016 and 10001C_188878). The funders had no role in study design, data collection and analysis, decision to publish or preparation of the manuscript.

Funding

Open access funding provided by University of Zurich

Author information

These authors jointly supervised this work: Todd A. Hare, Rafael Polania.

Authors and Affiliations

Zurich Center for Neuroeconomics, Department of Economics, University of Zurich, Zurich, Switzerland
Jonathan Schaffner, Sherry Dongqi Bao, Philippe N. Tobler & Todd A. Hare
Neuroscience Center Zurich, Zurich, Switzerland
Jonathan Schaffner, Sherry Dongqi Bao, Philippe N. Tobler, Todd A. Hare & Rafael Polania
Decision Neuroscience Lab, Department of Health Sciences and Technology, ETH Zurich, Zurich, Switzerland
Rafael Polania

Authors

Jonathan Schaffner
View author publications
You can also search for this author in PubMed Google Scholar
Sherry Dongqi Bao
View author publications
You can also search for this author in PubMed Google Scholar
Philippe N. Tobler
View author publications
You can also search for this author in PubMed Google Scholar
Todd A. Hare
View author publications
You can also search for this author in PubMed Google Scholar
Rafael Polania
View author publications
You can also search for this author in PubMed Google Scholar

Contributions

P.N.T., T.A.H. and R.P. designed the study. J.S. and S.D.B. collected the data. J.S., S.D.B., T.A.H. and R.P. analysed the data. All authors interpreted the results and wrote the manuscript. P.N.T., T.A.H. and R.P. acquired the funding.

Corresponding authors

Correspondence to Todd A. Hare or Rafael Polania.

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review

Peer review information

Nature Human Behaviour thanks Katherine Storrs and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Peer reviewer reports are available.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary Information

Supplementary Figs. 1–6, Tables 1–5, and Notes 1 and 2.

Reporting Summary

Peer Review File

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.

Reprints and permissions

About this article

Cite this article

Schaffner, J., Bao, S.D., Tobler, P.N. et al. Sensory perception relies on fitness-maximizing codes. Nat Hum Behav 7, 1135–1151 (2023). https://doi.org/10.1038/s41562-023-01584-y

Download citation

Received: 09 November 2021
Accepted: 09 March 2023
Published: 27 April 2023
Issue Date: July 2023
DOI: https://doi.org/10.1038/s41562-023-01584-y

This article is cited by

Neural waves and computation in a neural net model I: Convolutional hierarchies
- Stephen Selesnick
Journal of Computational Neuroscience (2024)
Better models of human high-level visual cortex emerge from natural language supervision with a large and diverse dataset
- Aria Y. Wang
- Kendrick Kay
- Leila Wehbe
Nature Machine Intelligence (2023)