Abstract
Sensory information encoded by humans and other organisms is generally presumed to be as accurate as their biological limitations allow. However, perhaps counterintuitively, accurate sensory representations may not necessarily maximize the organism’s chances of survival. To test this hypothesis, we developed a unified normative framework for fitness-maximizing encoding by combining theoretical insights from neuroscience, computer science, and economics. Behavioural experiments in humans revealed that sensory encoding strategies are flexibly adapted to promote fitness maximization, a result confirmed by deep neural networks with information capacity constraints trained to solve the same task as humans. Moreover, human functional MRI data revealed that novel behavioural goals that rely on object perception induce efficient stimulus representations in early sensory structures. These results suggest that fitness-maximizing rules imposed by the environment are applied at early stages of sensory processing in humans and machines.
Main
One of the main goals of the neural and behavioural sciences is to understand what general principles explain the solutions evolution has selected to extract and process information from the environment to guide behaviour. Half a century ago, it was postulated that neural systems should represent the sensory world as accurately and efficiently as possible by exploiting information about the statistical regularities of the environment, an idea known as efficient coding^{1,2}.
Efficient coding in sensory perception is typically assumed to be based on an information maximization criterion—that is, the sensory world must be represented as accurately as possible. One may think that this criterion makes sense for early sensory systems, as this is precisely the role of a sensor: a good measurement instrument must reliably measure the environmental variable that it was built for. However, the information maximization criterion does not necessarily consider the behavioural goals of the organism^{3,4,5,6,7}.
Is it reasonable that our sensory systems invest their limited resources to represent the world as accurately as possible irrespective of the organism’s goals? This question has kept scientists and philosophers busy for centuries and led to heated debates across various fields and domains including neuroscience, psychology, economics and evolutionary biology^{8}. Some views support the idea that organisms should represent objects as they exist in the world, as closely as biological limitations allow^{9,10}. Others posit that perceptual representations should be in general different from the actual physical world, and these representations should directly map onto the utility they offer to the agents^{11,12,13}.
In partial support of the latter idea, recent neurophysiological evidence shows that early sensory systems represent not only information about physical sensory inputs but also non-sensory information according to the requirements of a specific task and the behavioural relevance of the stimuli^{14,15,16,17,18}. This does not necessarily imply that sensory systems should give up representing the ‘veridical’ world, as it has been demonstrated that neural systems can develop computational strategies that allow representing multiple behaviourally relevant features alongside objective sensory information^{19,20}. However, this line of research provides no indication of the actual benefit of having such mixed neural representations at the earliest stages of sensory processing, or how this information could be used to efficiently guide behaviour, given that we are limited in our capacity to process information.
The study of how systems should trade off the maximization of some utility function relevant to the goals of the organism against information-processing constraints is part of a growing body of research inspired by the work of Shannon, who put forward the idea that when optimizing a distortion function that characterizes the cost of particular errors, not all such errors are equally important. This implies that the unreliability of signal transmission is not necessarily uniform across the space of possible messages that can be transmitted^{21}. Similar concepts have been borrowed from the field of statistical mechanics, where information processing in capacity-limited systems can be modelled as the energy required to move away from default states in thermodynamic systems, which can be quantified by differences in free energy^{22,23}. These principled approaches have played a fundamental role in neural process theories of early sensory systems^{24,25,26,27,28} as well as higher cognitive functions^{29,30,31,32,33,34,35}.
Our work builds on these normative theoretical principles to determine how a (neural) system should allocate information-processing resources to maximize fitness in different situations. We focus on two of the most common problems studied in decision-making: accuracy maximization in perceptual discrimination tasks and reward maximization for situations in which a particular attribute is related to a given currency value. This allowed us to test the following hypothesis: given that noisy communication channels always lose information during transmission, the brain will adapt to the fitness-maximizing rules of a particular environment at the earliest stages of sensory processing. We demonstrate that early visual structures in humans and artificial agents with sensory information-processing bottlenecks follow fitness-maximizing encoding schemes.
Results
Neural codes in an insect’s retina
Before moving on to humans, we introduce an illustrative example, as anecdotal evidence, to motivate the theoretical framework applied in our experiments. Specifically, we studied the responses of retinal neurons in the blowfly—the large monopolar cells (LMCs)—which encode sensory information about visual contrast levels. These neural codes are considered the first demonstration of efficient coding in biological organisms^{36}.
Visual features such as shape, colour and texture are important sensory signals that insects use to discriminate between competing flowers and fruit species, with visual contrast playing a key role^{37,38}. In our example, we assume that blowflies use knowledge of the different levels of contrast displayed by flowers and fruits to select food sources that promise more beneficial nutrients (that is, reward). In other words, we assume that there is a monotonic association between contrast and reward that makes some contrast discrimination mistakes more costly than others. Please note that this corresponds to the standard and most studied class of economic problems, where choices with a particular attribute are monotonically related to a given currency value. The hypothesis that we test here is that a neural code that simply maximizes information accuracy in the LMCs would not maximize the fitness of the organism.
Concretely, we studied the following problem. Suppose that the distribution of contrasts encountered by the blowfly in its natural environment is given by f(s). We define the function that transforms the contrast stimulus input s to neural responses r in the blowfly retina as r = h(s). Then, what is the optimal neural response function h(s) if, given biological limitations, such a function can only generate a limited set of neural responses? Under this formulation, the following problem can be studied^{39}: find the optimal neural response function under two evolutionary optimization criteria, (1) minimization of the probability of mistakes and (2) minimization of the expected reward loss.
To solve this problem, we assume that the organism must make choices between alternatives drawn from the stimulus distribution, f(s), which describes the relative availability of the different alternatives in its environment (for example, how often a blowfly encounters a particular flower). The goal is to select the alternative that promises more reward to the organism, as this should lead the organism to maximize its fitness^{3,39}. In the case of the blowfly LMCs, one may suppose that the blowfly must often make fine discriminations and that different contrast levels are monotonically related to different reward values.
On the one hand, if the goal of the organism is to minimize the number of erroneous responses (that is, maximize discrimination accuracy between two stimuli s_{1} and s_{2}), it can be shown that the optimal neural response h(s) matches the cumulative distribution function (CDF) of the stimulus distribution (Fig. 1b; see also equation (11) in the Methods). However, this accuracy maximization strategy does not provide a precise account of the distribution of neural responses in the blowfly (Fig. 1).
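This accuracy-maximizing solution can be sketched numerically: discretize a contrast prior f(s) and take its running sum, so that equal response increments cover equal probability mass (histogram equalization). The Gaussian-shaped prior below is purely an assumption for illustration; the blowfly analysis uses the empirically measured contrast distribution.

```python
import math

# Illustrative (assumed) contrast prior f(s) on a discrete grid in [0, 1];
# the real analysis uses the blowfly's empirical contrast distribution.
n = 101
grid = [i / (n - 1) for i in range(n)]
f = [math.exp(-((s - 0.5) ** 2) / (2 * 0.15 ** 2)) for s in grid]
total = sum(f)
f = [p / total for p in f]                    # normalize to a probability mass

# Accuracy-maximizing response h(s): the CDF of f(s), so that every
# response increment is used equally often across encountered stimuli.
h, acc = [], 0.0
for p in f:
    acc += p
    h.append(acc)

print(round(h[-1], 6))                        # h rises monotonically to 1.0
```

The resulting h(s) is steepest where the prior is densest, which is exactly why frequently encountered contrasts are discriminated best under this criterion.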
On the other hand, if the goal of the organism is to minimize expected reward loss (that is, maximize the amount of reward received after many decisions; see Supplementary Note 1 for derivation), the optimal fitness-maximizing neural response h(s) provides a nearly perfect account of the neural responses of the blowfly retina (Fig. 1b,c and Methods). We emphasize that the remarkable overlap between the fitness-maximizing predictions and the empirical neural responses presented in Fig. 1 is not the product of curve fitting; instead, these predictions emerge from the normative decision model, which has no degrees of freedom (Methods).
A common approach adopted in computer science and neuroscience to study the way in which a system penalizes estimation mistakes to optimize performance is via the L_{p} loss function defined as \(|\hat{s}(r)-s|^{p}\), where \(\hat{s}\) is the sensory estimate, s is the true signal and p determines how errors are penalized. A recent parameter estimation study asked what type of error penalty best explained the LMC data^{40}. However, in this study, the authors did not have explicit hypotheses for the potential evolutionary and behavioural meaning of different values for error penalties, and they instead relied on numerical estimations of p that best explained the data. We demonstrate that the error penalty that provides a nearly perfect fit to the LMC response function corresponds to the blowfly LMC encoding function that guarantees maximal reward expectation to the organism under our sensory–reward mapping assumptions (Fig. 1c and Methods).
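To make the role of p concrete, the following sketch (our own illustration, not part of the original analysis) evaluates the L_{p} penalty for a small and a large estimation error. Raising p makes large errors disproportionately costly, whereas lowering p treats errors of different sizes more uniformly.

```python
def lp_loss(s_hat, s, p):
    """L_p loss |s_hat - s|**p for an estimate s_hat of the true signal s."""
    return abs(s_hat - s) ** p

# Penalty of a large error (0.4) relative to a small one (0.1):
for p in (0.5, 1.0, 2.0):
    ratio = lp_loss(0.4, 0.0, p) / lp_loss(0.1, 0.0, p)
    print(p, round(ratio, 3))   # ratio = 4**p: 2.0, 4.0, 16.0
```

The relative cost of the large error grows as 4^p, so the choice of p encodes which mistakes the system most wants to avoid.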
While the predictions of reward-maximizing sensory codes and the data from blowfly retinal neurons (that is, the earliest level of encoding) show a striking similarity (Fig. 1), this result does not directly address all aspects of our hypothesis that neural codes in early sensory areas adapt to the organism’s behavioural goals. This is because we do not know the specific function linking contrast to fitness for the blowfly, and we cannot show that the code used in their retinas adapts between contexts because we have data from only one context. While we emphasize that this result should be treated at this stage as anecdotal, it inspired us to test the fitness-maximizing hypothesis more directly in humans (see below) and may inspire others to do the same in other animals.
Adaptive fitness-maximizing sensory codes in humans
To more directly test the hypothesis that sensory perception relies on adaptive fitness-maximizing codes, we implemented an experiment with more than one context in human sensory encoding. To date, it has been widely accepted that the default neural code for orientation perception in humans is information maximization (infomax) coding^{41}, as it can be shown that this code minimizes the probability of mistakes in perceptual discrimination tasks (Supplementary Note 1). One reason that infomax coding may typically explain human orientation perception well is that, for humans, orientation information does not typically signify reward and is instead used for navigation purposes. The fitness-maximizing code for orientation perception may thus be equivalent to infomax under standard conditions. Our experiments deviate from these standard conditions to test whether sensory encoding strategies adapt in a manner predicted by the theory of fitness maximization.
We designed behavioural tasks in which, on any given trial, human participants had to choose which of two simultaneously presented orientation stimuli, s_{1} or s_{2}, was more diagonal (that is, closer to a 45-degree angle; Fig. 2). In experiment 1, the participants were trained in two different contexts but were always tested with stimuli in the same retinotopic locations, while in experiment 2, the participants were trained in only one context or the other but were tested with stimuli in trained and untrained retinotopic locations. The key aspect of both experiments is that decisions were made in two different stimulus–reward association contexts. In one context, the participants were paid a fixed reward for correct discrimination of the more diagonal stimulus in each trial and received no reward for incorrect decisions (henceforth the accuracy context, K_{acc}). In the second context, the participants were rewarded depending on the stimulus s that they selected in each trial, and the amount of reward was linearly mapped to the degree of diagonality of the input stimulus (henceforth the reward context, K_{rew}). Crucially, the prior distribution of sensory signals f(s) was exactly the same in both contexts. Stimuli close to cardinal orientations were presented the most often to match the statistics of natural scenes that humans typically encounter^{42} (Fig. 3a). This experimental design allowed us to test the competing hypotheses that neural codes in early sensory areas (1) maximize accurate representations of the environment and are thus constant in both reward contexts, or (2) adapt between contexts to instantiate efficient coding strategies that maximize fitness. The location-specific training in experiment 2 allowed us to test whether any adaptation occurs in early sensory regions that maintain retinotopic mappings or only later in downstream circuits that generalize across locations.
We employed a general method for defining efficient codes by investigating the optimal allocation of limited neural resources^{43}. On the basis of this framework, sensory precision, measured as Fisher information J(s), should be proportional to the amount of resources available k and the prior distribution f(s) raised to a power q,

\(J(s)\propto k\,f{(s)}^{q},\)

hence known as the power-law efficient code. We show that an advantage of employing this framework is that there is a direct link between the power-law efficient codes and the fitness maximization solutions for the contexts that we consider here (Methods and Supplementary Note 1). In brief, the connection of the power-law efficient codes with accuracy versus reward maximization objectives is the following: a relatively low power-law parameter q shifts some neural resources away from where f(s) is high and relocates them where it is low. The reason for this spreading of coding resources is that, even though observing a rare stimulus is unlikely (and thus the probability of error will be low), an error on such a stimulus could be a very costly mistake. Therefore, when stimuli are directly associated with rewards (relative to situations in which all mistakes are equally costly), it pays to allocate more neural resources to the segments of the stimulus space where f(s) is low (Fig. 3e and Supplementary Fig. 1). This theoretical link allowed us to derive various qualitative predictions that we used to test whether humans indeed adopt fitness-maximizing, as opposed to information-maximizing, neural codes of sensory perception.
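A small numerical sketch of this trade-off (our own illustration; the prior shape and q values are assumptions chosen only to mimic the qualitative setup): under a prior peaked at cardinal orientations, lowering the power-law exponent q flattens the allocation of Fisher information, shifting precision from cardinal towards oblique orientations.

```python
import math

# Assumed orientation prior peaked at the cardinals (0 and 90 degrees),
# loosely mimicking natural-scene statistics; the form is illustrative.
f = [1.0 + math.cos(math.radians(2 * t)) ** 2 for t in range(180)]
total = sum(f)
f = [p / total for p in f]

def fisher_allocation(prior, k, q):
    """Power-law efficient code: J(s) proportional to k * f(s)**q."""
    return [k * p ** q for p in prior]

J_acc = fisher_allocation(f, k=1.0, q=2.0)   # higher q: resources track the prior
J_rew = fisher_allocation(f, k=1.0, q=1.0)   # lower q: resources spread out

# Precision at a cardinal (0 deg) relative to an oblique (45 deg) orientation:
print(round(J_acc[0] / J_acc[45], 3), round(J_rew[0] / J_rew[45], 3))  # 4.0 2.0
```

Because total resources k are held fixed, the lower-q code buys its extra oblique precision by giving up some cardinal precision, which is the behavioural signature tested below.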
A first prediction of fitness maximization theory is related to sensory discrimination differences in the two reward association contexts considered here. If participants maximize fitness under limited resources, discrimination accuracy for diagonal (that is, oblique) relative to cardinal orientations should improve more in K_{rew} than in K_{acc} over the course of the decision task (Fig. 3e). In line with this prediction, we found an interaction between reward association context, orientation and task phase (early or late) on discrimination accuracy (β = 1.46 ± 0.65; P_{MCMC} < 0.001; Supplementary Tables 1 and 2). Note that this interaction is not driven by a simple increase in sensitivity in K_{rew}—that is, a general improvement across the whole orientation space (Fig. 3b). Despite the differences between contexts, fitness maximization theory predicts that for both optimization objectives, discriminability should be higher in regions of the stimulus distribution prior with higher density (that is, greater in cardinal than oblique orientations) because these stimuli occur more frequently in all contexts (see the thick blue and thick red lines in Fig. 3e). The data are consistent with this prediction as well (main effect of obliqueness s in K_{acc}: β = −4.50 ± 0.45; P_{MCMC} < 0.001; in K_{rew}: β = −5.50 ± 0.47; P_{MCMC} < 0.001; Fig. 4a,b). These results thus support our hypothesis that perceptual coding of sensory information uses a fitnessmaximizing code.
To further substantiate the conclusion that changes in behaviour were driven by fitness-maximizing codes rather than experience-driven increases in sensitivity, we fit the encoding model to the choice data to estimate parameters q and k. In line with the fitness maximization predictions, we found that in context K_{rew} the value of k did not change (Δk = −0.0004 ± 0.0014; P_{MCMC} = 0.61), while q decreased between the first and last part of the decision experiment by Δq = −0.29 ± 0.15 (P_{MCMC} = 0.03). The final value of q was significantly smaller in K_{rew} than in K_{acc} (Δq = −0.38 ± 0.14; P_{MCMC} = 0.004; Fig. 4c). Taken together, our results clearly indicate that the empirically observed behavioural changes in K_{rew} versus K_{acc} were not caused by simple practice-related sensitivity enhancements or differences in monetary payoffs (Methods).
Fitness-maximizing adaptation at sensory estimation stages
We next sought to determine whether the form of efficient adaptation observed in the decision task takes place only in downstream decision circuits or whether it is already implemented at earlier processing stages—for instance, in the circuits that generate estimations of sensory stimuli. To answer this question, we had participants perform an edge orientation estimation task before and after the contextual decision-making task (Fig. 2 and Methods). After training in either context, there was a significant decrease in the estimation bias (P_{MCMC} = 0.02; Supplementary Fig. 2). A decrease in the estimation bias is predicted by either a fitness-maximizing code or increased sensitivity (Fig. 3c,f). However, similar to the decision task, experience-dependent changes in sensitivity versus a fitness-maximizing code make distinct predictions for the estimation task in terms of the estimation variance. Greater sensitivity would lead to lower estimation variability for all orientations (Fig. 3d). In contrast, the fitness maximization hypothesis predicts that after participants adapt to context K_{rew} in the decision task, estimation variability for more oblique orientations will decrease, while estimation variability for stimuli near cardinal orientations will be slightly higher (Fig. 3g). This is because the theory predicts a shift in coding resources from high-probability cardinal orientations to low-probability diagonal orientations. Crucially, fitness maximization predicts that after exposure to context K_{acc}, there will be no change in the relative estimation variability for cardinal versus diagonal orientations. In line with these predictions, we found a significant interaction (β = −0.16 ± 0.09; P_{MCMC} = 0.04) between the change in estimation variability for oblique relative to cardinal stimuli across contexts (Fig. 5a and Supplementary Table 3).
Fitness-maximizing codes are retinotopically specific
A key conclusion that we draw from our results is that fitness-maximizing adaptation in both the decision task and the estimation task appears to have a common origin that does not depend on comparisons between decoded stimuli in downstream decision circuits. To explicitly test whether fitness-maximizing neural codes are indeed present at the earliest stages of sensory processing, we modified the decision and estimation tasks to train and test behavioural performance in retinotopically specific locations. In the modified estimation task, the participants were presented with an orientation stimulus in one of four spatial locations (Fig. 2b). Crucially, the participants in these experiments were trained in only two of these locations during the decision task and completed the decision task in only one context (either K_{rew} or K_{acc}). If adaptation is retinotopically specific, then changes in estimation task performance should be specific to the retinotopic locations trained during the decision task. In line with the fitness-maximizing predictions, for those trained in K_{rew}, we found the pattern predicted by a fitness-maximizing code in the location-specific changes in estimation variability (location × time(after − before) × oblique: β = −0.12 ± 0.07; P_{MCMC} = 0.04; Fig. 5b and Supplementary Table 4). For those trained in K_{acc}, this interaction was not significant (Fig. 5c and Supplementary Table 5). A comparison across groups showed that the effect was greater in the K_{rew} than the K_{acc} group (context × location × time(after − before) × oblique: β = −0.17 ± 0.10; P_{MCMC} = 0.04). Together, these results confirm the retinotopic specificity of fitness-maximizing coding rules in humans.
Artificial neural networks with sensory processing bottlenecks use fitness-maximizing codes
The existence of fitness-maximizing codes in early sensory systems—where they will literally change the way an organism sees the world—must be for very good reasons. We conducted machine learning analyses with artificial neural networks (ANNs^{44}) to investigate whether agents with informational bottlenecks must recode their sensory representations to fitness-maximizing schemes to achieve the best performance in decision-making tasks. Alternatively, the downstream decision circuits (which are not involved in estimating orientation as such) may not care about the accuracy with which orientation can be estimated and may have enough flexibility to maximize fitness even if the encoding scheme at early sensory stages is fixed to an infomax strategy.
The precision of neural representations at different sensory processing stages can be studied using recently developed neural network techniques in machine learning. We constructed an ANN implementation to test how (that is, at what layer of processing) it incorporates the behavioural goals of the agent when encoding sensory stimuli. More specifically, our premise is that the way in which internal representations of retinal sensory information are formed and used in the nervous system can be studied with a variational information bottleneck (VIB)-like objective^{45,46,47,48}, where in general the goal is to minimize a loss function of the form

\({\mathcal{L}}(\phi ,\theta )={\mathbb{E}}[\text{reward loss}]+\beta \,{I}_{\phi },\)

where ϕ and θ are the parameters of the encoder and downstream decision circuit, respectively. In our ANN, the VIB-like objective trades (an approximation of) the amount of ‘visual’ information I that the encoder can process against the expected reward loss, via the regularization parameter β. Note that the analytical solutions developed in our work ‘drop’ costs on I by assuming that the noise in the encoder is small compared with the dynamic range of the signal (that is, the small-noise approximation, which is commonly adopted in early sensory systems to study neural coding efficiency, often leading to satisfactory predictions^{49}). The reason for using the VIB-like objective in our ANNs is that it provides a parsimonious way to induce pressures in the encoder to disentangle information up to a certain bound in a systematic manner (Supplementary Fig. 3).
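As a structural sketch of such an objective (our own simplification, assuming a one-dimensional Gaussian encoder with a standard-normal prior so that the information term has the usual closed form; all numbers are arbitrary):

```python
import math

def kl_gaussian(mu, sigma):
    """KL( N(mu, sigma^2) || N(0, 1) ): the rate term that upper-bounds
    the information the encoder transmits about the stimulus."""
    return 0.5 * (mu ** 2 + sigma ** 2 - 1.0) - math.log(sigma)

def vib_like_loss(expected_reward_loss, mu, sigma, beta):
    """Distortion (expected reward loss) plus beta times the rate term."""
    return expected_reward_loss + beta * kl_gaussian(mu, sigma)

# An uninformative code (the prior itself) pays no information cost:
print(vib_like_loss(0.30, mu=0.0, sigma=1.0, beta=0.1))              # 0.3
# A sharper, shifted code lowers distortion but pays a rate penalty:
print(round(vib_like_loss(0.10, mu=1.0, sigma=0.5, beta=0.1), 4))    # 0.1818
```

The regularization weight beta sets the exchange rate between reward given up and information transmitted, which is exactly the knob varied across the supplementary analyses.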
We implemented an ANN that solved the same task as in our human experiments (Fig. 6a; see Methods for the details). The ANN received two retinal images corresponding to screen locations where the two Gabor patches were presented in our task. Just as in the human experiment, the decision rule that the ANN had to learn was to indicate which of the two input stimuli (left or right) was more diagonal, while maximizing the reward received across many trials. We trained networks in two contexts, corresponding to the human experiments: K_{acc} and K_{rew} (Methods). After the information had been encoded, it was fed to downstream neural circuits that used the encoded information to solve the task at hand (in our case, select the Gabor patch that was more diagonal), while considering the goals of the agent within the environmental context (for example, maximize decision accuracy or maximize reward consumption; Methods).
After the networks were trained in each context, we first investigated whether the network could disentangle the hidden structure in the retinal image statistics to solve the downstream task. By applying a t-distributed stochastic neighbour embedding (t-SNE) algorithm to the neural responses of the bottleneck structure, we found that the network indeed learned a useful representation of the scalar angular orientations from the retinal images (Fig. 6b). However, this t-SNE solution provided no direct insight into how the encoder allocated its limited resources (that is, using an infomax or fitness-maximizing code). We therefore analysed the amount of information contained in the encoder layer, quantified as Fisher information, as a function of the angular orientation. This analysis more directly illuminated the pressures shaping the learning of the latent representation in the encoder.
Mirroring the predictions of fitness-maximizing theory and the human behavioural results, we found that, in general, the amount of information in the encoder layer was larger for cardinal orientations than for diagonal orientations. The network thus allocated the limited processing resources in the bottleneck encoder to the more frequently encountered portions of the angular stimulus space. Crucially, we found that for ANNs trained in context K_{acc} relative to ANNs trained in context K_{rew}, the amount of information was larger for more cardinal angles but smaller for diagonal orientations (Fig. 6c). Moreover, we found that these results were insensitive to the information-processing costs imposed in the encoder (Supplementary Fig. 3). Interestingly, the retinal layer in our ANN architecture, in which we did not explicitly incorporate information-processing regularization (Fig. 6a), also revealed signatures of information-processing allocation similar to the ones encountered in the second retinotopic layer (although less pronounced; Supplementary Fig. 4). We also studied how the ANN allocates information-processing resources in the first and second retinotopic layers when informational bottleneck pressures are imposed at the decision-making layer. We found that the fitness-maximizing patterns were also present in this scenario in the second retinotopic layer (Supplementary Fig. 5), and that the fitness-maximizing patterns in layer 1 were present for high levels of network performance (that is, generally low β in the decision layer; Supplementary Fig. 6). Thus, even when information-processing pressures are very small at early sensory stages (and relatively large in downstream decision layers), neural networks still develop fitness-maximizing codes at the early sensory stages to compensate for reward loss due to processing limitations in downstream circuits.
However, these effects are more pronounced if information-processing constraints are present at early stages. This set of results indicates that the network learns to allocate its neural resources in a fitness-maximizing manner following the predictions of the algorithmic normative theory and the behaviour exhibited by human participants.
The first network analysis additionally addressed the following concern: as downstream circuits are not involved in estimating orientation as such, they may not care about the accuracy with which orientation can be estimated. The ANN architecture we implemented here addressed exactly this question because the objective function the networks sought to optimize did not explicitly incorporate “reconstruction error minimization” (as classically implemented in variational autoencoder architectures^{47,48}). Instead, in our ANN architecture, the network had to find encoding solutions that benefited downstream operations supporting decision behaviour. That is, all that mattered to the network was what information about the latent (angular) space was most relevant to solve the decision task at hand and maximize reward. Nevertheless, we found that ANNs learned to implement efficient coding schemes in their encoding layers that maximized reward in each context.
Having demonstrated the efficient, fitness-maximizing nature of the encoder in an ANN, we investigated whether an ANN could achieve solutions similar to the ones obtained in the fitness-maximizing context K_{rew} if we forced the encoder layer to maximize information transmission (that is, to use an infomax code). To test this, we first trained an ANN to maximize decision accuracy and then froze all network weights up to the encoder, while leaving the downstream network weights free to change. Theoretically, ANNs with sufficient complexity can interpolate any objective function. One could therefore hypothesize that even if the encoder is restricted to maximizing information transmission, downstream circuits could still find solutions that maximize reward gain on the basis of the representations coming from an infomax encoder. However, a competing hypothesis comes from an information-theoretic point of view, which holds that, in sensory discrimination tasks that face a bottleneck due to limited resources (like the one we study), once information is lost or processed in a suboptimal manner in one step of a noisy transmission channel, it cannot be recovered, irrespective of how complex the downstream circuits are. In line with the predictions from information theory, we found that freezing the encoder layers after training them in context K_{acc} resulted in a significant reward loss when downstream layers were retrained in context K_{rew} (Fig. 6d). Once we unfroze the encoder layer (that is, allowed it to depart from the K_{acc} constraint and use fitness-maximizing codes), we found that reward loss was significantly lower than in the K_{acc}-trained network (Bayesian paired t-test P_{MCMC} < 0.001; Fig. 6d, bottom), reaching levels matching the network trained from scratch in context K_{rew} (Bayesian paired t-test P_{MCMC} = 0.86; Fig. 6d, bottom).
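The information-theoretic argument here is the data-processing inequality: in a Markov chain X → Y → Z, no processing of Y can make Z more informative about X than Y itself is. A toy numerical check with binary symmetric channels (our own illustration, unrelated to the actual network architecture):

```python
import math

def mutual_info(joint):
    """I(X;Y) in bits from a joint pmf given as nested lists joint[x][y]."""
    px = [sum(row) for row in joint]
    py = [sum(col) for col in zip(*joint)]
    mi = 0.0
    for x, row in enumerate(joint):
        for y, p in enumerate(row):
            if p > 0:
                mi += p * math.log2(p / (px[x] * py[y]))
    return mi

def through_bsc(joint, eps):
    """Pass the second variable of a joint pmf through a binary symmetric
    channel that flips the bit with probability eps."""
    out = [[0.0, 0.0], [0.0, 0.0]]
    for x in range(2):
        for y in range(2):
            out[x][y] += joint[x][y] * (1 - eps)
            out[x][1 - y] += joint[x][y] * eps
    return out

p_xx = [[0.5, 0.0], [0.0, 0.5]]     # X paired with a perfect copy of itself
p_xy = through_bsc(p_xx, 0.10)      # encoder stage: X -> Y discards some bits
p_xz = through_bsc(p_xy, 0.20)      # downstream stage: Y -> Z discards more

print(round(mutual_info(p_xy), 3), round(mutual_info(p_xz), 3))
# I(X;Z) <= I(X;Y): information discarded early cannot be recovered later.
```

This is why a frozen infomax encoder caps the reward obtainable downstream: the downstream layers can only further process (and thus further degrade) what the encoder passes on.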
Crucially, just as in the human experiments, the degree of discrimination accuracy was calibrated to be identical in all cases, and thus our results do not depend on different levels of accuracy across the ANNs (Fig. 6d, top; Bayesian paired t-test for all pairwise comparisons P_{MCMC} > 0.51). Moreover, we found that this result held independent of the degree of complexity (that is, size) of the downstream network, indicating that downstream circuits cannot compensate for the lack of fitness-maximizing codes at the encoding stage.
The findings from our ANN analysis clarify our human behaviour results. Our ANN analyses reveal how a fixed set of physical sensory inputs, whose behaviourally relevant environmental and contextual statistics are hidden and can only be experienced and learned over time, comes to be represented in coding schemes that maximize fitness. Moreover, studying ANNs with a VIB is useful because it provides a reasonably realistic model of how encoding schemes are adapted to optimize a given objective function when the resources to process information are limited.
Generalizing to other ecologically valid reward functions
The solutions to the two decision-making objectives studied here belong to the same family of power-law efficient codes, with a single parameter that determines the solution for each decision objective. However, we acknowledge that when the system must deal with more complex sensory–reward mappings, the same analytical solutions might not generally apply; this would have to be tested case by case. Nevertheless, it is possible to go beyond the analytical solutions and employ the same framework to find general strategies of resource allocation for arbitrary stimulus–reward mapping functions. We address this possibility next.
Non-monotonic stimulus–reward functions
As has been emphasized in previous work, non-monotonic payoff functions are common^{10,12,50}. For instance, suppose that a physical attribute is related to the degree of salinity of food: too little or too much salt can have deleterious consequences for an organism and its fitness. In such scenarios, it has been suggested that perception should not be tuned directly to the stimulus–reward associations, as the organism would then be able to know only how good or bad the payoff is^{50}. Knowing that the payoff is bad, however, provides no information about why it is bad and hence no clue to the adaptive course of action for the organism^{50}.
How should neural resources be allocated in such cases? What strategy should the agent follow? We consider the following three scenarios. Scenario 1 corresponds to the accuracy maximization task (K_{acc}). Scenario 2 corresponds to a reward-maximizing task where rewards are linearly (monotonically) mapped to the physical stimulus values (K_{rew}, Fig. 7a). Scenario 3 corresponds to a non-monotonic mapping where stimuli in the middle of the sensory space deliver the highest reward values (Fig. 7b). In all three scenarios, we assume a right-skewed distribution of sensory stimuli over the physical value space (Fig. 7a,b) for comparison with the orientation experiments we conducted in humans. For scenario 3, there is no known closed-form solution for the optimal allocation of resources, but note that the minimization objective remains the same as for scenario 2: minimize the reward given up for every erroneous decision.
Here we emphasize that a key assumption of our framework is that, on the basis of experience or understanding of the task instructions, the agent clearly understands which stimuli deliver more reward. For scenarios 1 and 2, higher levels of the stimuli are preferred; in scenario 3, stimuli in the middle are preferred. We can examine the downstream decoding process to understand how resources at encoding should be allocated in each scenario. Recall that the decoding rule in our model is the same in all cases: minimization of the Bayesian mean squared error. What is a possible strategy for the cases where the stimulus–reward mappings are monotonic or non-monotonic unimodal? A relatively simple strategy that preserves the ‘veridicality’ of sensory information is one in which the agent employs a categorization threshold τ over the space of physical stimuli and decodes the values \(\hat{s}\) relative to that threshold. A simple implementation is one where the agent computes a relative decoded value \(\tilde{s}=-|\tau -\hat{s}({s}_{0})|\), an operation that could be flexibly implemented in downstream circuits. The choice rule is then: choose s_{1} if \({\tilde{s}}_{1} > {\tilde{s}}_{2}\); otherwise, choose s_{2}. Thus, in addition to finding the optimal resource allocation function, the threshold τ is another latent variable in the reward maximization problem.
Before solving the optimization problem numerically, we note that the predictions for τ are relatively intuitive. In scenarios 1 and 2, τ should be set to the maximum stimulus value in the physical space, and the optimal resource allocation solutions remain the same as derived in our manuscript. In scenario 3, with reward values peaking in the middle of the stimulus distribution, the threshold will probably be located at τ ≈ 0.5 in our example in the low-noise encoding limit (not precisely at 0.5, owing to the bias and variance of \(\hat{s}\) and the related influence of the prior distribution of physical stimuli).
The numerical solutions of resource allocation for scenarios 1 and 2 resemble, as expected, the analytical solutions in which more resources are allocated to regions of the physical space with the highest prior density. The amount of information is larger for lower sensory values in scenario 1 but larger for higher sensory values in scenario 2. For scenario 3, the solution for the categorization threshold is τ ≈ 0.5, and the resource allocation solution may be surprising and perhaps at first counterintuitive (Fig. 7c). On closer inspection, however, the solution makes sense. First, the allocation of resources tends to decrease as the sensory stimulus gets larger, following the expected result given the shape of the prior distribution of sensory stimuli. Second, the resource allocation solution has a dip at around s = 0.5 ≈ τ. This may appear counterintuitive given that these are the regions where the reward is highest. However, note that (1) randomly drawing choice sets in this non-monotonic reward environment is more likely to generate choice sets that are close in value than in the monotonic reward or accuracy scenarios, and (2) choice sets s_{1,2} with values close to s = 0.5 are more likely to generate ‘mistakes’ given the resource allocation in Fig. 7c, but these mistakes incur little reward loss because the value function is relatively flat and symmetric around its peak (for example, s_{1} = 0.52 and s_{2} = 0.48 deliver the same reward). It is thus not worth investing many resources near s = 0.5, even if the reward promised at those locations is high, because the potential for reward loss there is low (Fig. 7d). We emphasize that this example is just one alternative strategy, but one that generates interesting predictions that could be tested in future experiments.
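The threshold-based decoding strategy for scenario 3 can be sketched in a few lines. This is a minimal illustration under assumed simplifications — uniform choice sets, constant encoding noise and a triangular payoff peaking at 0.5 — whereas the simulations reported above use the right-skewed prior and optimized resource allocation.

```python
import numpy as np

rng = np.random.default_rng(0)
tau = 0.5                       # categorization threshold (scenario 3)
sigma = 0.05                    # hypothetical constant encoding noise
n = 100_000

# Choice sets drawn uniformly for illustration; reward peaks in the
# middle of the stimulus space (non-monotonic payoff).
s1, s2 = rng.uniform(0, 1, n), rng.uniform(0, 1, n)
reward = lambda s: 1.0 - np.abs(s - 0.5)

# Noisy internal estimates and threshold-relative decoded values.
s1_hat = s1 + rng.normal(0, sigma, n)
s2_hat = s2 + rng.normal(0, sigma, n)
rel1, rel2 = -np.abs(tau - s1_hat), -np.abs(tau - s2_hat)

# Rule: choose s1 whenever its threshold-relative value is larger,
# that is, whenever its estimate lies closer to tau.
choose_s1 = rel1 > rel2
earned = np.where(choose_s1, reward(s1), reward(s2))

# Baseline that ignores the threshold and picks the larger estimate,
# which is sensible for monotonic payoffs but not here.
naive = np.where(s1_hat > s2_hat, reward(s1), reward(s2))
assert earned.mean() > naive.mean()
```

The threshold rule harvests more reward than the monotonic baseline precisely because, under a unimodal payoff, proximity to τ rather than raw magnitude carries the value information.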
Efficient resource allocation under reaction time costs
We used simulations to study the scenario in which agents are rewarded/penalized for short/long reaction times (RTs) in both the K_{acc} and K_{rew} contexts. The goal was to study whether and how resource allocation changed relative to the accuracy maximization task without RT costs. Examining this scenario requires assumptions about a process model that jointly generates decisions and RTs. For simplicity and illustration purposes, we assumed that decisions and RTs were generated by a simple drift-diffusion model (DDM) with a constant decision bound b, decision evidence z and diffusion noise σ that was independent of the choice set inputs, which can be thought of as downstream decision noise (Methods).
In this scenario, the loss function for the K_{acc} context is given by
and the loss function for the K_{rew} context is given by
where η is the cost per RT unit (for example, in seconds). Note that as η → 0, the optimal decision bound would be b → ∞. Thus, the goal was to find the optimal balance between resource allocation and the bound b that minimizes the loss in equations (5) and (6) for a given RT cost η and prior distribution of sensory stimuli in the environment f(s).
The numerical solutions revealed that the resource allocation solutions in context K_{acc} differed from the RT-cost-free scenario and depended on the RT costs (Fig. 8a). While the RT-cost solutions were similar to the RT-cost-free solution for relatively high values of η, the smaller the RT costs, the more the allocation of resources tended to flatten. As expected, higher η resulted in a lower bound b (Fig. 8b). In context K_{rew}, by contrast, the RT-cost solutions were remarkably similar to the RT-cost-free solution, although they appeared to get steeper as the RT cost decreased (at least in the range of RT costs studied here; Fig. 8c). Once again, in context K_{rew}, higher η resulted in a lower bound b (Fig. 8d).
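A minimal sketch of the constant-bound DDM underlying these simulations is given below. The drift, bound, noise and η values are illustrative, and the loss is assumed to combine error rate and mean RT linearly, in the spirit of equations (5) and (6); it is not the fitted model.

```python
import numpy as np

def simulate_ddm(drift, bound, n, sigma=1.0, dt=1e-3, max_steps=20_000, rng=None):
    """Vectorized constant-bound drift-diffusion model.

    Returns (correct, rt): a trial is 'correct' when the bound whose
    sign matches the drift is hit first.
    """
    rng = rng or np.random.default_rng()
    z = np.zeros(n)                       # accumulated decision evidence
    rt = np.full(n, max_steps * dt)
    correct = np.zeros(n, dtype=bool)
    alive = np.ones(n, dtype=bool)
    for step in range(max_steps):
        z[alive] += drift * dt + sigma * np.sqrt(dt) * rng.normal(size=alive.sum())
        hit = alive & (np.abs(z) >= bound)
        rt[hit] = (step + 1) * dt
        correct[hit] = z[hit] >= bound
        alive &= ~hit
        if not alive.any():
            break
    return correct, rt

# Loss of the assumed form: error rate plus eta times mean RT.
eta = 0.1
rng = np.random.default_rng(1)
for bound in (0.5, 1.0, 2.0):
    c, rt = simulate_ddm(drift=1.0, bound=bound, n=2000, rng=rng)
    loss = (1.0 - c.mean()) + eta * rt.mean()
    # Raising the bound buys accuracy at the cost of time; the optimum
    # balances the two, and it moves to lower bounds as eta grows.
```

Sweeping the bound for several η values reproduces the qualitative trade-off described in the text: higher RT costs push the optimal bound downward.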
We emphasize that the results presented here are based on a simple DDM with constant bounds. The resource allocation solutions may slightly differ for DDMs where the bounds are allowed to collapse or drift/diffusion parameters dynamically change over time^{51,52,53}. This will be an interesting aspect to investigate in future research. Irrespective of these considerations, we show how the general framework developed here generates a rich set of testable predictions that allow for falsification and further refinement of the theory.
Discussion
Our theoretical and empirical results provide evidence that early stages of sensory processing encode environmental stimuli to maximize fitness and not necessarily to maximize perceptual accuracy. We have shown this to be the case in humans and artificial agents with sensory processing bottlenecks. Our findings indicate that downstream circuits do not need to continuously compute reward distributions on the basis of stimulus–outcome associations because this information should be efficiently embedded in the neural codes of sensory perception. This notion is supported by recent studies showing that functional remapping of stimulus–reward contingencies in early sensory areas causally depends on top-down control signals from prefrontal structures^{17,18,54}. We argue that this gives the organism the advantages of preventing information loss and rapidly transmitting behaviourally relevant information encoded by early sensory systems to downstream circuits specialized in action, learning or decision-making.
Efficient sensory adaptation to behavioural goals can arise without long-lasting synaptic changes or rewiring. Specific fitness-maximizing codes may have a structural basis if the environment and behavioural goals are stable over long periods, as may be the case for retinal contrast coding in the blowfly. However, efficient filtering of sensory information can rapidly occur via mechanisms of top-down contextual modulation of sensory processing, which can be achieved via mechanisms such as top-down attentional normalization^{55}. In fact, it has been shown that adaptation to behaviourally relevant sensory statistics (such as edge orientations) can occur in the course of one hour in human participants^{56}. Our key argument is that irrespective of whether efficient coding occurs via structural, synaptic or online top-down contextual modulations, it must occur at early stages if it is to be relevant for goal-directed behaviour. Information theory predicts that inefficient coding with regard to behavioural goals will cause a loss of relevant information that cannot be recovered in noisy transmission channels such as the brain. Our experiments with ANNs provide direct empirical evidence for this prediction by showing that restricting the initial encoding scheme to one that maximizes information causes suboptimal performance in specific contexts. Overall, a key contribution of our work is that we provide a formal justification of why and how neural recoding should occur across contexts in capacity-constrained and noisy transmission systems to maximize reward and fitness.
We found additional supporting evidence for this hypothesis when reanalysing data from a recent human functional MRI (fMRI) study^{57}. Specifically, we investigated whether novel goal-directed actions that promote people’s ‘survival’ in hypothetical scenarios they had never before encountered triggered an efficient reorganization of perceptual codes in the human brain (Supplementary Note 2). Our analyses revealed that switching back and forth between survival goals that required participants to use the same items in very different ways led the brain to efficiently represent sensory information in a goal-specific manner. More specifically, novel behavioural goals that relied on object recognition caused changes in stimulus representations at early stages of sensory processing. Regions showing changes in stimulus representation codes included V1–V3 as well as downstream object detection areas such as the lateral occipital cortex (LOC) (Supplementary Note 2). We note that these results do not explicitly support the quantitative theory developed here but instead provide support for the general idea that a system should employ resources in its early sensory areas to represent abstract behavioural goals. In addition, these results do not imply that V1–V3 and LOC are discarding veridical feature information and instead represent only goal-oriented values. Although veridicality might be compromised (resources are finite), strategies might be implemented to ensure that it is not entirely suppressed (for example, disentangling via orthogonalization^{19,20}).
Our study has some limitations and also generates interesting predictions that should be addressed in future research. First, the analytical solutions are restricted to accuracy maximization in discrimination tasks and to reward maximization in the standard and most studied economic problem, where the properties of a good or action scale linearly with value. We acknowledge that when the system must deal with more complex sensory–reward mappings, the analytical solution to the resource allocation problem may not exist in tractable form. Nevertheless, we provided some hints as to how the system can adapt to non-monotonic stimulus–reward mappings through the use of categorization thresholds. Second, our theory does not explain the dynamics of adaptation but generates predictions for once the system has adapted after learning from experience. It thus remains unclear what the normative algorithms of efficient adaptation might be and how they could be connected with a biologically plausible algorithm that applies to arbitrary stimulus–reward association contexts such that reward expectation is maximized. Third, for the problems of accuracy and reward maximization with linear sensory–reward mappings, our model predicts that in edge cases where the prior distribution is approximately flat, the optimal solutions are indistinguishable, and the agent should allocate resources equally across the whole sensory space in both cases. Fourth, in relation to the previous point, an additional prediction appears worthy of future testing: if the prior density is low for both low and high sensory values, and there is a linear stimulus–reward mapping across the whole sensory space, our model predicts that, relative to the standard accuracy maximization task, sensitivity should also increase for low sensory values.
Our model predicts that this effect should become more pronounced during a reward maximization task than during a standard discrimination task (a hint of this prediction can be found in Supplementary Fig. 1).
Beyond the obvious relevance for biological organisms, our results may have important implications for ongoing developments in artificial intelligence as well. Recent deep generative models show a remarkable ability to encode high-dimensional signals into latent factors under the objective of accurately predicting the local environment with specific encoding constraints. However, on the basis of our results, such an optimization objective will not necessarily match those present in biological organisms. Interestingly, a recent successful artificial intelligence model^{58} proposed instead that representation formation should be driven by the need to predict the motivational value of experiences accurately. Our results validate this notion and imply that the development of artificial intelligence algorithms that aim to resemble neurobehavioural functions should go beyond the objective of maximizing only the accurate transmission of information and account for the motivational aspects of the environment that enable the organism (or the artificial agent) to maximize fitness.
Finally, although drawn from a different domain of behaviour, our results lend substantial support to economic theories positing that contextdependent utility functions should maximize expected reward rather than the expected accuracy of decisions guided by reward^{3,39,59}. The corroborating evidence presented in our work grounded on the principles of neural coding and decision behaviour should help advance the development and refinement of these theories within economics and related disciplines of evolutionary biology and social sciences^{12,60,61}.
Methods
Participants
The participants were recruited by the Center for Neuroeconomics at the University of Zurich, Switzerland. The participants were instructed about all aspects of the experiment and gave written informed consent. None of the participants suffered from any neurological or psychological disorder or took medication that interfered with participation in our study. The participants received fixed monetary compensation for their participation in the experiment, in addition to a variable monetary payoff that depended on task performance (see below). The experiments conformed to the Declaration of Helsinki, and the experimental protocol was approved by the Ethics Committee of the Canton of Zurich.
Participants who failed to follow the eye fixation instructions on more than 25% of trials were excluded from the data analysis (n = 12). We measured the performance of the participants in the training tasks and excluded participants who were unable to perform the task at the easiest difficulty level (n = 11). Additionally, we had to exclude three participants due to technical problems with the data collection. The final sample thus comprised n = 86 participants (n = 25 in experiment 1 and n = 61 in experiment 2 (30 in K_{rew})).
Experimental design and stimuli
The stimuli were generated with MATLAB (version 9.7)^{62} using Psychtoolbox and displayed on a screen one metre away from the participants. The angle of the head was kept stable with a chin rest, whose height was adjusted to position the centre of the screen at eye level. As stimuli, we used oriented Gabor patches presented on a grey background. Each patch was composed of a high-contrast, three-cycles-per-degree sinusoidal grating convolved with a circular Gaussian of width 0.41° and subtended 2.98° vertically and 2.98° horizontally. In experiment 1, all Gabor patches were presented so that their centres fell 5.7° to the left or right of the centre of the monitor, on the horizontal midline. In experiment 2, the Gabor centres fell 4.7° to the left or right of the vertical midline and 4.7° above or below the horizontal midline.
Eye tracking
Eye-tracking data were acquired using an SR Research EyeLink 1000 system. Gaze position was sampled at 500 Hz. Eye movements away from fixation were computed for the window corresponding to the stimulus presentation. For every saved position, the absolute distance to the fixation cross was computed; if it exceeded 4° of visual angle, the trial was marked as including an eye movement. For most participants, the proportion of trials with eye movements was less than 5%. Participants (n = 12) who made eye movements exceeding 4° of visual angle on more than 25% of trials were excluded from all analyses.
Experiment 1
The participants performed the experiment in multiple sessions to allow for training within the two contexts on different days. The order of the accuracy (K_{acc}) and reward (K_{rew}) context training was counterbalanced across participants. In total, every participant completed 240 trials in the estimation task and 400 trials in the decision task.
Experiment 2
In experiment 2, each participant trained in only one stimulus–reward association context (either K_{acc} or K_{rew}). Training in the binary judgement decision task was performed either in the two upper locations or in the two lower locations. The participants were randomly allocated to one of the two training locations. In the estimation tasks before and after the training task, the trial locations were evenly distributed between all four possible locations. In total, every participant completed 400 trials in the estimation task and 360 trials in the decision task.
Orientation estimation task
Before the start of every trial, the participants had to fixate on a cross in the middle of the screen. At the beginning of the trial, an arrow appeared for 0.5 seconds to indicate on which side the stimulus would be shown. Afterwards, the stimulus appeared on the indicated side for 0.6 seconds, with its orientation determined randomly within the range 0–179°. During stimulus presentation, the participant had to keep fixating on the cross. After the stimulus disappeared, a Gabor patch appeared in the middle of the screen. By pressing and holding the left mouse button, the participant rotated this new Gabor patch until its perceived orientation matched that of the previously observed target stimulus. The participant could end the trial by pressing the space key; after five seconds, the trial ended automatically. The trials were separated by a random intertrial interval of 1.5–2 seconds. The estimation task took place before and after the decision task (see below and Fig. 2). To avoid the possibility that participants developed contextual strategies, they were not informed in advance that a second estimation task would take place after the decision task.
Decision task
The fixation cross turned black to indicate the start of a trial. After 0.5 seconds, two Gabor patch stimuli appeared. The orientation of one of the stimuli was drawn from the approximate distribution of edges in the real world^{42}. The orientation of the second stimulus was adjusted by a participant-specific difficulty score to keep performance at approximately 75% accuracy for all participants. The median accuracy across participants was 77 ± 2.9% in K_{rew} and 77 ± 2.8% in K_{acc}. Additionally, on the basis of (1) the calibration to 75% accuracy, (2) the linear mapping between the degree of diagonality and reward (that is, from 1 Swiss franc (CHF) for 0° to CHF 46 for 45° in the diagonality space) and (3) pilot data, we adjusted the payoff of correct trials in K_{acc} to match the expected payoff in K_{rew}. We calculated that setting the payoff for each correct response in K_{acc} to CHF 15 would fulfil these conditions. Our experimental data were in line with these calculations: the median payoff was 15.00 ± 0 CHF in K_{acc} and 14.70 ± 0.62 CHF in K_{rew}.
On average, the stimulus orientation followed a prior distribution f(s) described by equation (7) and shown in Fig. 3a:
The stimuli were displayed for 0.6 seconds, during which the participants had to fixate on the cross in the middle of the screen. When the stimuli disappeared, the participants had 2.5 seconds to decide which stimulus was more oblique. Independent of the RT, the full 2.5 seconds elapsed before the trial continued. Afterwards, the two stimuli were shown again in their positions, and the outcome of the choice and the orientations of the stimuli were displayed for 3 seconds until the trial ended. The trials were separated by a 1.5–2-second intertrial interval.
Blowfly retinal LMC experiment
Here we provide a brief description of the data collected in Laughlin’s seminal work^{36}, which we reanalyse in this work. To derive the prior for the sensory stimulus of interest f(s), the researcher measured the distribution of contrasts that occur in woodland settings of the blowfly environment. In brief, photographs were taken in the natural habitat of the blowfly, such as sclerophyll woodland and lakeside lands. Relative intensities were measured across these scenes using a detector that scanned horizontally, like the ommatidium of a turning fly. The scans were digitized at intervals of 0.07° and convolved with a Gaussian point spread function of half-width 1.4°, corresponding to the angular sensitivity of a fly photoreceptor. Contrast values were obtained by dividing each scan into intervals of 10, 25 or 50°. Within each interval, the mean intensity (\(\bar{I}\)) was found and subtracted from every data point to give the fluctuation about the mean (ΔI). This difference value was divided by the mean to give the contrast (\(\Delta I/\bar{I}\)).
These data were used to construct a histogram, which was later transformed into a CDF (Fig. 1a and Supplementary Fig. 1). Here we used this CDF to reconstruct the probability density function f(s) (Supplementary Fig. 1). Once the prior distribution was obtained, the fly was placed in front of a screen with a light-emitting diode (LED). At the beginning of each trial, the LED luminance was set to the screen luminance and then changed to a new luminance drawn from the prior distribution f(s) for 100 ms. The stimulus s was defined as the proportional change of the difference between the background and LED luminances. We emphasize that the CDF of the contrast statistic comes directly from the contrast measurement methodology described in the preceding paragraph and reported by Laughlin. We thus did not make the original calculations for the prior f(s), nor are they influenced by the fitness-maximizing sensory coding theory.
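The contrast statistic \(\Delta I/\bar{I}\) described above can be sketched as follows. The scan here is synthetic stand-in data (the real measurements come from digitized woodland scenes), and the optical blurring step with the Gaussian point spread function is omitted.

```python
import numpy as np

def interval_contrasts(scan, interval_len):
    """Split an intensity scan into intervals; within each, return the
    fluctuation about the interval mean divided by that mean (dI / I_bar)."""
    n = len(scan) // interval_len
    out = []
    for i in range(n):
        seg = scan[i * interval_len:(i + 1) * interval_len]
        i_bar = seg.mean()
        out.append((seg - i_bar) / i_bar)
    return np.concatenate(out)

# Synthetic stand-in for a horizontal luminance scan: log-normal, so all
# intensities are positive, as physical luminances must be.
rng = np.random.default_rng(0)
scan = np.exp(rng.normal(0.0, 0.3, size=6000))

contrasts = interval_contrasts(scan, interval_len=500)

# Empirical CDF of the contrast statistic, as used to build f(s).
x = np.sort(contrasts)
cdf = np.arange(1, len(x) + 1) / len(x)
assert np.all(np.diff(cdf) >= 0)
```

By construction the contrast within each interval averages to zero, so the empirical CDF is centred near zero, mirroring the shape of the published contrast distribution.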
Fitness-maximizing neural codes
In this section, we provide a detailed description of the connection between the L_{p} reconstruction error, the efficient code that maximizes reward expectation and the power-law efficient codes briefly described in the main text.
Suppose that the stimulus distribution is given by s ~ f(s). The function that transforms the input s into neural responses r is given by r = h(s). While the mapping h(s) is deterministic, we assume that errors in the neural response r follow a distribution P[r|h(s)]. We apply a general approach that considers optimality criteria accounting for how well the stimulus s can be reconstructed (\(\hat{s}\)) from the neural representations r. Wang and colleagues introduced a general formulation of the efficient coding problem in terms of minimizing the error in such reconstructions \(\hat{s}(r)\) according to the L_{p} norm, as a function of the norm parameter p (ref. ^{63}). In brief, the reconstruction is assumed to be based on the maximum likelihood estimate of the decoder in the low-noise regime, where P[r|h(s)] is assumed to be Gaussian distributed.
The goal is to find the optimal mapping function h*(s) to achieve a minimal L_{p} reconstruction error for any given prior stimulus distribution f(s). More formally, the problem is defined as: find h*(s) such that
where, without loss of generality, we assume that the operation range of the neuron is bounded between 0 and 1. It is possible to show that the optimal mapping h*(s) is given by equation (9)^{63}:
If we define
we observe that the normalized power function of the stimulus distribution f in equation (9) is the escort distribution with parameter γ (ref. ^{64}). Note that under this framework, infomax coding is given by the norm parameter p → 0, and therefore γ = 1, thus leading to the result that h(s) is the CDF of the prior distribution.
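Numerically, the optimal mapping of equation (9) is the CDF of the escort distribution, which can be sketched as follows; the linear prior below is purely illustrative, not the orientation prior from the experiments.

```python
import numpy as np

def optimal_mapping(f, s, gamma):
    """h*(s): CDF of the escort distribution f^gamma / int f^gamma,
    computed by trapezoidal integration and normalized to [0, 1]."""
    w = f ** gamma
    h = np.concatenate([[0.0], np.cumsum(0.5 * (w[1:] + w[:-1]) * np.diff(s))])
    return h / h[-1]          # bounded response range of the neuron

s = np.linspace(0, 1, 1001)
f = np.clip(2.0 * (1.0 - s), 1e-9, None)   # illustrative skewed prior density

h_infomax = optimal_mapping(f, s, gamma=1.0)   # p -> 0: CDF of the prior
h_reward = optimal_mapping(f, s, gamma=2/3)    # reward-maximizing code

# gamma = 1 reproduces the prior CDF, which is F(s) = 2s - s^2 here.
assert np.allclose(h_infomax, 2 * s - s**2, atol=1e-3)
```

Plotting the two mappings shows that γ = 2/3 devotes a relatively shallower slope (fewer resources) to the high-density region than the infomax code does, which is the qualitative signature of the reward-based solution.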
Efficient L_{p} error-minimizing codes and behavioural goals
Economics has a long tradition of studying the following problem: for a given distribution f(s) in the environment, what is the optimal shape of the internal representation (that is, h(s), which in economics is known as the utility function) if such function can only take a large but limited set of n discrete subjective values (that is, the internal readings, r) that code for any given stimulus s (refs. ^{3,39})? The utility function is thus restricted to a set of step functions with n jumps, each corresponding to a utility increment of size 1/n. In this case, discrimination errors originate from the fact that the organism cannot distinguish two alternatives located at the same step of the utility function. Under this formulation, the following variant of the problem was studied: find the optimal utility function (h*) under two evolutionary optimization criteria, (1) the probability of mistakes minimization criterion and (2) the expected reward loss minimization criterion.
To solve this problem, we assume that the organism repeatedly makes choices between two alternatives drawn from the stimulus distribution f(s), where we may suppose that stimuli are linearly mapped to a reward value. The organism is endowed with a utility function that assigns a level of reward to each possible stimulus s from f(s). The alternative that promises more utility to the organism is chosen^{39}.
If the goal of the organism is to minimize the number of erroneous responses (that is, maximize discrimination accuracy), the optimal utility function \({h}_{{{{\rm{accuracy}}}}}^{* }\) is given by
According to this solution, the power parameter of the escort distribution in equation (9) is given by γ = 1, which corresponds to the infomax strategy.
However, if the goal of the organism is to minimize the expected reward loss (that is, maximize the amount of reward received after many decisions) and stimuli are linearly mapped to reward value, the optimal utility function \({h}_{{{{\rm{reward}}}}}^{* }\) is given by
According to this solution, the power parameter of the escort distribution in equation (9) is given by γ = 2/3, which corresponds to optimizing the L_{p} minimization problem with parameter p given by
We found that this normative fitness-maximizing solution is the error penalty that best describes the LMC data^{40} (these results are reported in the main text and Fig. 1). Additionally, note that the solutions in equations (11) and (12) are derived on the basis of maximizing accurate choices and reward expectation, respectively, without any assumption that maximizing information efficiency is a goal in itself.
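The two criteria can be checked numerically, assuming an n-step utility function with jumps at quantiles of the escort distribution and a linear stimulus–reward mapping; the prior below is a hypothetical linear density, not the experimental one. Pairs of stimuli falling inside one utility step are ties, resolved at random, so half such pairs are mistakes.

```python
import numpy as np

s = np.linspace(0, 1, 20001)
ds = s[1] - s[0]
f = 2.0 * (1.0 - s) + 1e-9            # hypothetical skewed prior density
f /= f.sum() * ds                      # normalize numerically

def thresholds(gamma, n):
    """Indices of the n - 1 utility jumps: quantiles of the escort
    distribution f^gamma / int f^gamma."""
    w = f ** gamma
    cdf = np.cumsum(w) * ds
    cdf /= cdf[-1]
    return np.searchsorted(cdf, np.arange(1, n) / n)

def error_and_reward_loss(edge_idx):
    """Expected mistake probability and expected reward loss when ties
    (both stimuli in one step) are resolved by a fair coin flip."""
    edges = np.concatenate([[0], edge_idx, [len(s)]])
    p_err, r_loss = 0.0, 0.0
    for a, b in zip(edges[:-1], edges[1:]):
        seg_f = f[a:b]
        mass = seg_f.sum() * ds
        p_err += 0.5 * mass ** 2
        # E|s1 - s2| over the bin via the identity
        # int int |x - y| g(x) g(y) dx dy = 2 int G(x) (mass - G(x)) dx.
        G = np.cumsum(seg_f) * ds
        r_loss += 0.5 * 2.0 * np.sum(G * (mass - G)) * ds
    return p_err, r_loss

err_acc, loss_acc = error_and_reward_loss(thresholds(gamma=1.0, n=12))
err_rew, loss_rew = error_and_reward_loss(thresholds(gamma=2/3, n=12))
assert loss_rew < loss_acc    # gamma = 2/3 gives up less expected reward
assert err_acc < err_rew      # gamma = 1 makes fewer expected mistakes
```

The check recovers the dissociation in equations (11) and (12): equal-mass steps (γ = 1) minimize the mistake probability, whereas γ = 2/3 trades a few extra mistakes among close, low-stakes pairs for a lower expected reward loss.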
Connection to power-law efficient codes
We employed a general method for defining efficient codes by investigating the optimal allocation of Fisher information J given (1) a bound c on the organism’s capacity to process information, (2) the frequency of occurrence f(s) and (3) the organism’s goal (for example, maximizing perceptual accuracy or expected reward), according to
subject to a capacity bound
with parameters α defining the coding objective and β > 0 specifying the capacity constraint^{43}. The solution of this optimization problem reveals that Fisher information should be proportional to the prior distribution f(s) raised to a power q, which is therefore referred to as a power-law efficient code
where q = 1/(β + α) and γ = β/(β + α). Note that the power-law parameter q is multiply determined, and to identify it we need to make further assumptions. Here we opted to set β = 0.5, as previously proposed in the standard infomax framework^{41}; however, our conclusions are not affected by the specific value of β. This means that α determines how Fisher information is allocated relative to the prior, influencing the values of both q and γ. It can be shown that the infomax coding rule implies γ = 1 and therefore an efficient power-law code with q = 2, whereas the reward expectation rule implies γ = 2/3 and therefore q = 4/3 (Supplementary Note 1). The power-law efficient codes thus allow us to establish a connection between the behavioural goals in the contexts studied in this work (K_{acc} and K_{rew}) and the parameter γ, which incorporates the goals of the organism under the resource-constrained framework studied here.
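The relations between α, β, q and γ, together with one common form of the capacity bound (\(\int \sqrt{J(s)}\,\mathrm{d}s = c\), an assumption of this sketch rather than a quotation of the constraint above), can be checked numerically on an illustrative prior:

```python
import numpy as np

# Power-law relations q = 1/(beta + alpha) and gamma = beta/(beta + alpha),
# with beta = 0.5 as in the infomax framework.
beta = 0.5
for alpha, q_expect, gamma_expect in [(0.0, 2.0, 1.0), (0.25, 4/3, 2/3)]:
    q = 1.0 / (beta + alpha)
    gamma = beta / (beta + alpha)
    assert np.isclose(q, q_expect) and np.isclose(gamma, gamma_expect)

# Allocate Fisher information J(s) = k f(s)^q so that the assumed
# capacity bound int sqrt(J(s)) ds = c holds exactly.
s = np.linspace(0, 1, 2001)
ds = s[1] - s[0]
f = 2.0 * (1.0 - s) + 1e-9            # illustrative skewed prior
f /= f.sum() * ds

def fisher_allocation(q, c):
    k = (c / np.sum(f ** (q / 2)) / ds) ** 2 if False else (c / (np.sum(f ** (q / 2)) * ds)) ** 2
    return k * f ** q

J = fisher_allocation(q=4/3, c=10.0)
assert np.isclose(np.sum(np.sqrt(J)) * ds, 10.0)
```

With α = 0 the code reduces to the infomax allocation (q = 2, Fisher information proportional to f²); α = 0.25 yields the reward-based code q = 4/3 reported in the text.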
Optimal inference
When specifying an inference problem using such an encoding–decoding framework, a key aspect for generating predictions of decision behaviour is to obtain expressions for the expected value and variance of the noisy estimates \(\hat{s}\) for a given input value s_{0}. However, we first need to specify the encoding and decoding rules. We adopted an encoding function P(r|s) associated with the power-law efficient code that is parameterized as Gaussian^{43}
and therefore Fisher information is allocated using an s-dependent variance \(\sigma^{2}=1/(kf(s)^{q})\). Although the stimulus space in our study is circular, discrimination thresholds are relatively low for orientation discrimination tasks in humans, so it is safe to assume that the likelihood function can be locally approximated by a Gaussian distribution.
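As a sketch of this encoding rule, the following draws noisy measurements with variance 1/(k f(s)^q). The prior function, the value of k and the example numbers are illustrative assumptions, not the study's fitted parameters.

```python
import numpy as np

rng = np.random.default_rng(0)

def encode(s0, prior, k=50.0, q=2.0, n=1):
    """Noisy Gaussian measurements r ~ N(s0, sigma^2) with s-dependent
    variance sigma^2 = 1 / (k * f(s0)**q), so that Fisher information
    J(s0) = k * f(s0)**q follows the power-law efficient code."""
    sigma2 = 1.0 / (k * prior(s0) ** q)
    return rng.normal(s0, np.sqrt(sigma2), size=n)

# Illustrative uniform prior on a 90-degree space: f(s) = 1/90.
r = encode(30.0, prior=lambda s: 1.0 / 90.0, n=10000)
```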
At the decoding stage, the observer computes the posterior using Bayes’s rule:
Theoretical and empirical evidence suggests that in orientation estimation tasks, estimates are typically biased away from the peak of the prior. This suggests that humans employ the expected value of the posterior as their estimator, at least in the infomax case^{41}.
The expected value of the estimator can be written as the input stimulus s_{0} plus an average bias b(s_{0}). Using analytical approximations in the high signal-to-noise regime, it can be shown that the bias of the posterior expected-value estimator is approximated by^{65}
In a previous study, using model simulations and exploring parsimonious functional forms, it was shown that the proportionality constant of the bias term can be approximated by^{43}
The analytical solution and the simulation-based solution for the proportionality constant are approximately equivalent over the range of q values relevant to our work (for example, q ∈ [0.5, 2]); that is,
thus validating the analytical approximations used in the current work. In any case, using either function does not affect the qualitative or quantitative results of our study.
Using this result, the expected value of the estimators is given by
As already defined in the description of the behavioural task, in this study, we used a parametric form of the prior that closely resembles the shape of the natural distribution of orientations in the environment^{42}
with a > 1 determining the elevation (steepness) of the prior, and ω a normalizing constant. Using this parameterization of the prior, we can obtain an explicit analytical approximation of the bias:
We can also obtain an analytical approximation of the variance in the high signal-to-noise regime using the Cramér–Rao bound:
We can thus use equations (24) and (25) to derive the predictions presented in Fig. 3.
Finally, assuming that the estimates are normally distributed with the expected value and variance derived above, the probability that an agent chooses an alternative with orientation s_{1} over a second alternative with orientation s_{2} (recall that in our experiment the participants’ decision rule is to choose the orientation perceived as closer to the diagonal) is given by
where Φ(·) is the cumulative distribution function of the standard normal distribution. When fitting the model to the choice data, we accounted for potential side (left/right) biases β_{0} and lapse rates λ in the decision task using
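A minimal sketch of this choice rule, assuming (as one common parameterization) that the lapse mixes a uniform guess with the discriminative choice and that β_{0} shifts the decision variable; the exact functional form is given by the paper's equations, and the function names here are ours:

```python
from math import erf, sqrt

def norm_cdf(x):
    """CDF of the standard normal distribution."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def p_choose_first(d1, v1, d2, v2, beta0=0.0, lapse=0.0):
    """Probability of choosing stimulus 1 as closer to the diagonal.
    d1, d2: expected distances of the two estimates from 45 degrees;
    v1, v2: variances of the estimates; beta0 is a side bias and lapse a
    uniform lapse rate (illustrative parameterization)."""
    p = norm_cdf(beta0 + (d2 - d1) / sqrt(v1 + v2))
    return lapse / 2.0 + (1.0 - lapse) * p
```

With equal distances the choice is at chance, and a lapse pulls extreme probabilities back towards 0.5.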
Fitting the power-law efficient model to human data
To fit the power-law efficient coding model to the choice data from the decision task, we used a hierarchical Bayesian model. We fit the early (trials 1–200) and late (trials >200) training trials in each reward context separately. Posterior inference of the parameters in the hierarchical models was performed via Gibbs sampling using the Markov chain Monte Carlo implementation in JAGS^{66}, assuming flat priors for both the mean and the noise of the estimates. For each model, we drew a total of 20,000 burn-in samples and subsequently took 5,000 new samples from each of three independent chains. We applied a thinning factor of 5 to these samples, resulting in a final set of 3,000 samples per parameter. We conducted Gelman–Rubin tests for each parameter to assess convergence of the chains. All latent variables in our Bayesian models had \(\hat{R} < 1.05\), which suggests that all three chains converged to the target posterior distribution. We additionally checked convergence of the group-level parameter estimates via visual inspection.
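The Gelman–Rubin diagnostic used for this convergence check can be sketched as follows. This is the classic \(\hat{R}\) computed from chain means and variances, shown only for illustration; the study used the diagnostics accompanying JAGS.

```python
import numpy as np

def gelman_rubin(chains):
    """Potential scale reduction factor R-hat for an (m, n) array of m
    chains with n samples each. Values close to 1 (for example, < 1.05)
    indicate that the chains have mixed."""
    m, n = chains.shape
    chain_means = chains.mean(axis=1)
    between = n * chain_means.var(ddof=1)         # between-chain variance B
    within = chains.var(axis=1, ddof=1).mean()    # within-chain variance W
    var_hat = (n - 1) / n * within + between / n  # pooled variance estimate
    return np.sqrt(var_hat / within)

# Three well-mixed chains drawn from the same distribution: R-hat near 1.
rng = np.random.default_rng(1)
rhat = gelman_rubin(rng.standard_normal((3, 5000)))
```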
Behavioural and statistical analyses
In the estimation task, the behavioural error on a given trial was computed as the difference between the reported and the presented orientation. The error was defined as positive if the reported orientation was more oblique than the presented orientation, and negative otherwise. Trials with an absolute error larger than 25% of the maximum possible error (90 degrees) were discarded. To make full use of the data, we pooled participants from both experiments for the analysis of the impact of the reward training context. Comparisons between trained and untrained locations used only the data from the location-specific training in experiment 2.
We computed the average bias and variance in five bins of 9° before and after the training phases. Next, we computed the average change in the variance in each bin for each participant. We used the changes in variance within the most cardinal and most oblique bins to test for the predicted interactions between diagonality and training type (K_{acc} or K_{rew}) or location (trained or untrained) using Bayesian hierarchical linear regressions implemented with the brms package (version 2.13.5)^{67} in the statistical computing software R (version 3.6.3)^{68}. For each model, we used four chains with 2,000 samples per chain after burnin. The P_{MCMC} values reported for these regressions represent one minus the probability of the reported effect being greater (less) than zero given the posterior distributions of the fitted model parameters.
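The binning step above can be sketched as below. This is illustrative code assuming signed errors `err` and stimulus diagonality `stim` in the 0–45° cardinality space; the function name and the example data are ours, not the study's analysis code.

```python
import numpy as np

def bin_bias_variance(stim, err, n_bins=5, bin_width=9.0):
    """Average bias (mean signed error) and variance of the estimation
    error within consecutive 9-degree bins of stimulus diagonality."""
    edges = bin_width * np.arange(n_bins + 1)   # [0, 9, 18, 27, 36, 45]
    idx = np.digitize(stim, edges) - 1          # bin index per trial
    bias = np.array([err[idx == b].mean() for b in range(n_bins)])
    var = np.array([err[idx == b].var(ddof=1) for b in range(n_bins)])
    return bias, var

# Two synthetic trials per bin, with known per-bin bias and variance.
stim = np.array([4.0, 13.0, 22.0, 31.0, 40.0, 5.0, 14.0, 23.0, 32.0, 41.0])
err = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 3.0, 4.0, 5.0, 6.0, 7.0])
bias, var = bin_bias_variance(stim, err)
```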
We also compared the performance of participants in the binary judgement decision task using Bayesian hierarchical regressions implemented with brms in R. In this task, the participants had to decide which of two stimuli was more diagonal (closer to 45 degrees). We compared the accuracy of these decisions as a function of diagonality, training phase (early or late) and training type (K_{acc} or K_{rew}). We used four chains with 1,000 samples per chain after burn-in, for a total of 4,000 posterior samples for each regression parameter. The P_{MCMC} values were computed in the same fashion as described above for the estimation task.
ANNs
Suppose that we have a dataset of image samples x drawn from the distribution of images represented by the retina, where each image conveys an angular orientation s with an angular prior distribution p(s). Note that a key feature of our analyses is that knowledge about this angular prior is not explicitly given to the neural network; the prior is embedded in the statistics of image occurrences over space and time. Also note that different images x_{s} may map to the same angle s_{0} (for example, Gabor patches with identical angle but different phases). Each stimulus is encoded by a set of latent codes (or a latent neural distributional code) z with a prior distribution p(z), which yields a posterior distribution p(z|x) after observing image x. The neural coding system should thus learn a good representation of the environment (the distribution of physical sensory inputs) that might also need to be optimized for a particular downstream task (for example, maximizing the reward resulting from decision y). More specifically, we propose a VIB-like objective function (equation (4) in the main text). In our ANN, the VIB-like objective trades off (an approximation of) the amount of ‘visual’ information I that the encoder can process against the expected reward loss, via the regularization parameter β. Higher values of β thus pressure the network to encode only the information about the input image that yields the largest improvement in the downstream objective. The neural network received two retinal images corresponding to the screen locations where the two Gabor patches were presented in our task. Note that when training the ANN, the parameters of the encoder ϕ are shared across the two retinal locations where the stimuli x_{1,2} are presented.
The decision rule that the neural network has to learn is to indicate which of the two input stimuli (left or right) is more diagonal, while maximizing the reward received across many trials. As in the human experiments, we trained networks in two contexts: K_{acc} and K_{rew}. For all VIB-like objectives studied here, we define the regularized ‘information transmission’ I as
where D_{KL} is the Kullback–Leibler divergence. In context K_{acc}, the reward loss in the VIB-like objective is defined as
with y = 1 when the correct response is given by stimulus input x_{1}, and y = 0 otherwise; p(y = 1 | z_{1}, z_{2}) is the probability that the network chooses x_{1} given the encoding vectors z_{1,2}.
In context K_{rew}, the reward loss in the VIB-like objective is defined as
which is identical to the reward loss in the K_{acc} VIB-like objective, except that the probability of an erroneous ANN decision is weighted by the absolute value of the difference between the cardinality values s(x_{1}) and s(x_{2}). The ANNs trained with these VIB-like objective functions thus penalize reward loss following the K_{acc} and K_{rew} objectives employed in the analytical solutions (see equations (1) and (2) in the main text).
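The two reward-loss terms, together with the KL-based information term defined above, can be sketched per trial as follows. This is an illustrative NumPy rendering of the definitions, assuming a standard-normal latent prior and diagonal posterior covariance; it is not the network training code.

```python
import numpy as np

def info_term(mu, var):
    """KL( N(mu, diag(var)) || N(0, I) ): the regularized 'information
    transmission' I for one encoded stimulus, assuming a standard-normal
    latent prior p(z) and a diagonal posterior covariance."""
    return 0.5 * np.sum(var + mu**2 - 1.0 - np.log(var))

def reward_loss(p1, y, ds=None):
    """Cross-entropy reward loss. In context K_acc (ds=None) all errors are
    weighted equally; in K_rew the loss is weighted by |s(x1) - s(x2)|
    (passed as ds), so errors on clearly discriminable pairs lose more."""
    ce = -(y * np.log(p1) + (1 - y) * np.log(1.0 - p1))
    return ce if ds is None else abs(ds) * ce
```

A posterior matching the prior transmits zero information, and the K_rew loss scales linearly with the cardinality difference of the pair.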
All networks tested here used layers that are standard in the machine learning literature. Each retinal input network consisted of a convolutional layer with 4 × 4 kernels and a stride of two. In the results presented in this work, we used four filters, but we found that our results are largely insensitive to the number of filters used. We also investigated a fully connected input layer of different sizes (50–200 neurons), which led to nearly identical results and conclusions. The stochastic encoder has the form
where g_{e} is a fully connected layer that receives as input the output of the retinal layer and outputs the K-dimensional mean vector μ of z as well as the K × K covariance matrix Σ. In the results presented here, we use K = 4, but our results are similar for K values from 2 to 16. We used the reparameterization trick to write p(z|x)dz = p(ϵ)dϵ, where z = g(x, ϵ) is a deterministic function of x and the Gaussian random variable ϵ. The noise is thus independent of the parameters of the network, making it possible to take gradients that optimize the objective function in equation (4). The downstream integration network was a fully connected network that receives as input the values of the noisy encoder z for each retinal input. The size of this layer was 20, but the main conclusions of our analyses are insensitive to its size. Finally, the decision module was a single sigmoidal unit indicating the selection of the left or right stimulus. All hidden units used rectified linear activations. The networks were trained with Adam optimization with a learning rate of 0.0001.
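The reparameterization step can be sketched as below (a framework-agnostic NumPy illustration; in the actual network the factor L would be produced by the encoder and trained by gradient descent):

```python
import numpy as np

rng = np.random.default_rng(2)

def sample_z(mu, chol_cov):
    """Reparameterized sample z = mu + L @ eps, with eps ~ N(0, I) and L a
    Cholesky factor of the K x K covariance Sigma. Because randomness
    enters only through eps, gradients can flow through mu and L."""
    eps = rng.standard_normal(mu.shape[0])
    return mu + chol_cov @ eps
```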
To compute the Fisher information of the encoder, we first generated 500 inputs for each orientation stimulus s in the cardinality space from 0° to 45° in steps of 0.5°. We computed the empirical expected value vector
By rescaling the responses z_{i}(s) such that the noise has unit variance, without loss of generality, the Fisher information J can be expressed as
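A finite-difference sketch of this computation, assuming the empirical means and noise standard deviations have been collected into (S, K) arrays over the stimulus grid (the function name and example tuning curve are ours):

```python
import numpy as np

def fisher_information(mu, sigma, ds=0.5):
    """Fisher information of the encoder. mu and sigma are (S, K) arrays of
    mean responses and noise s.d. per stimulus value; rescaling each unit
    to unit noise variance gives J(s) = sum_i (d(mu_i/sigma_i)/ds)**2."""
    scaled = mu / sigma                     # responses with unit noise
    grad = np.gradient(scaled, ds, axis=0)  # tuning-curve slopes
    return np.sum(grad**2, axis=1)

# A linear 'unit' with slope 3 and unit noise carries J = 9 everywhere.
s = np.arange(0.0, 45.5, 0.5)
J = fisher_information((3.0 * s)[:, None], np.ones((s.size, 1)))
```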
Resource allocation with RT costs
We used simulations to study the scenario in which agents are rewarded for short RTs in both the K_{acc} and K_{rew} contexts. Examining this scenario requires assumptions about a process model that jointly generates decisions and RTs. We assumed that decisions and RTs T are generated by a simple drift-diffusion model (DDM) with a constant decision bound b, decision evidence z and diffusion noise σ that is independent of the choice-set inputs and can be thought of as downstream decision noise. In the DDM, the data-generating process does not change if σ is fixed to a constant; here we set σ = 1. Following the notation in our work, we define the decision evidence z(s_{1}, s_{2}) for the choice set s_{1,2} as
where J(s) is Fisher information, which determines resource allocation. To find the optimal resource allocation, we define
with the property
where \(\tilde{F}\) is defined as the CDF of \(\tilde{f}\). Here we set k sufficiently high such that the low-noise limit property holds, and we numerically find \(\tilde{f}\) (ref. ^{69}).
In the standard DDM, the probability of an erroneous response is given by (for simplicity, we approximate the normal CDF of equation (26) with the logit function corresponding to the analytical solution of the DDM; this approximation does not change the qualitative conclusions of our results)
and the expected RT is given by^{70}
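These two standard DDM quantities (with unit diffusion noise) follow the closed-form solutions of Bogacz and colleagues^{70}; a sketch:

```python
import numpy as np

def ddm_error_and_rt(z, b):
    """Standard DDM with drift (decision evidence) z, symmetric bound b and
    unit diffusion noise: logistic error probability and expected decision
    time. Raising the bound trades longer RTs for fewer errors."""
    p_error = 1.0 / (1.0 + np.exp(2.0 * z * b))
    mean_rt = (b / z) * np.tanh(z * b)
    return p_error, mean_rt
```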
In this scenario, the loss function for the K_{acc} context is given by equation (5) in the main text, and the loss function for the K_{rew} context is given by equation (6) in the main text. Note that as η → 0, the optimal decision bound diverges (b → ∞). The goal is thus to find the optimal balance between the resource allocation J(s) and the optimal bound b that minimizes the loss functions for a given RT cost η and for the prior distribution of sensory stimuli in the environment.
Representational similarity analyses of human fMRI data
We conducted additional conjunction analyses on the whole-brain maps of representational similarity for identity and usefulness that were originally computed by Castegnetti and colleagues^{57}. We obtained the thresholded (FWE P < 0.05) whole-brain maps from Castegnetti and colleagues and computed conjunctions between the identity and usefulness contrasts, as well as between usefulness and independently defined masks of the LOC and primary visual areas V1–V3, to create the figure in Supplementary Note 2. The LOC mask was obtained from the fMRI meta-analysis tool Neurosynth (neurosynth.org) with the keyword ‘Lateral Occipital Cortex’ and thresholded at the Neurosynth default of P < 0.01 (FDR-corrected). The V1–V3 masks were extracted from the Julich-Brain Cytoarchitectonic Atlas and thresholded at 50% probability. The LOC and V1–V3 masks were then conjoined with the cluster-corrected statistical map of usefulness representations. For full details of the fMRI data analyses, see Supplementary Note 2.
Reporting summary
Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.
Data availability
The behavioural data are available at https://osf.io/an274/. The LOC mask was obtained from the fMRI meta-analysis tool Neurosynth (neurosynth.org). The V1–V3 masks were extracted from the Julich-Brain Cytoarchitectonic Atlas (julich-brain-atlas.de).
Code availability
The analysis code is available at https://osf.io/an274/.
References
Barlow, H. B. Possible principles underlying the transformations of sensory messages. In Sensory Communication (ed. Rosenblith, W. A.) 217–233 (MIT Press, 1961).
Attneave, F. Some informational aspects of visual perception. Psychol. Rev. 61, 183–193 (1954).
Robson, A. J. The biological basis of economic behavior. J. Econ. Lit. 39, 11–33 (2001).
Salinas, E. How behavioral constraints may determine optimal sensory representations. PLoS Biol. 4, e387 (2006).
Rustichini, A., Conen, K. E., Cai, X. & PadoaSchioppa, C. Optimal coding and neuronal adaptation in economic decisions. Nat. Commun. 8, 1208 (2017).
Młynarski, W. F. & Hermundstad, A. M. Adaptive coding for dynamic sensory inference. eLife 7, 189506 (2018).
Heng, J. A., Woodford, M. & Polania, R. Efficient sampling and noisy decisions. eLife https://doi.org/10.7554/eLife.54962 (2020).
McKay, R. T. & Dennett, D. C. The evolution of misbelief. Behav. Brain Sci. 32, 493–510 (2009).
Searle, J. R. Seeing Things as They Are: A Theory of Perception (Oxford Univ. Press, 2015).
Berke, M. D., WalterTerrill, R., JaraEttinger, J. & Scholl, B. J. Flexible goals require that inflexible perceptual systems produce veridical representations: implications for realism as revealed by evolutionary simulations. Cogn. Sci. 46, e13195 (2022).
Jackson, F. Perception: A Representative Theory (CUP Archive, 1977).
Hoffman, D. D., Singh, M. & Prakash, C. The interface theory of perception. Psychon. Bull. Rev. 22, 1480–1506 (2015).
Prakash, C., Stephens, K. D., Hoffman, D. D., Singh, M. & Fields, C. Fitness beats truth in the evolution of perception. Acta Biotheor. 69, 319–341 (2021).
Shuler, M. G. & Bear, M. F. Reward timing in the primary visual cortex. Science 311, 1606–1609 (2006).
Stănişor, L., van der Togt, C., Pennartz, C. M. A. & Roelfsema, P. R. A unified selection signal for attention and reward in primary visual cortex. Proc. Natl Acad. Sci. USA 110, 9136–9141 (2013).
Poort, J. et al. Learning enhances sensory and multiple nonsensory representations in primary visual cortex. Neuron 86, 1478–1490 (2015).
Banerjee, A. et al. Valueguided remapping of sensory cortex by lateral orbitofrontal cortex. Nature 585, 245–250 (2020).
Norman, K. J. et al. Posterror recruitment of frontal sensory cortical projections promotes attention in mice. Neuron 109, 1202–1213.e5 (2021).
Libby, A. & Buschman, T. J. Rotational dynamics reduce interference between sensory and memory representations. Nat. Neurosci. https://doi.org/10.1038/s41593-021-00821-9 (2021).
Avitan, L. & Stringer, C. Not so spontaneous: multidimensional representations of behaviors and context in sensory areas. Neuron 110, 3064–3075 (2022).
Shannon, C. E. Coding theorems for a discrete source with a fidelity criterion. IRE Nat. Conv. Rec. 4, 142–163 (1959).
Friston, K. The free-energy principle: a unified brain theory? Nat. Rev. Neurosci. 11, 127–138 (2010).
Ortega, P. A. & Braun, D. A. Thermodynamics as a theory of decision-making with information-processing costs. Proc. R. Soc. A https://doi.org/10.1098/rspa.2012.0683 (2013).
Smith, E. C. & Lewicki, M. S. Efficient auditory coding. Nature 439, 978–982 (2006).
Sharpee, T. O. et al. Adaptive filtering enhances information transmission in visual cortex. Nature 439, 936–942 (2006).
Fairhall, A. L., Lewen, G. D., Bialek, W. & De Ruyter van Steveninck, R. R. Efficiency and ambiguity in an adaptive neural code. Nature 412, 787–792 (2001).
Sims, C. R. Efficient coding explains the universal law of generalization in human perception. Science 360, 652–656 (2018).
Simoncelli, E. P. & Olshausen, B. A. Natural image statistics and neural representation. Annu. Rev. Neurosci. 24, 1193–1216 (2001).
Tobler, P. N., Fiorillo, C. D. & Schultz, W. Adaptive coding of reward value by dopamine neurons. Science 307, 1642–1645 (2005).
Polanía, R., Woodford, M. & Ruff, C. C. Efficient coding of subjective value. Nat. Neurosci. 22, 134–142 (2019).
Zaslavsky, N., Kemp, C., Regier, T. & Tishby, N. Efficient compression in color naming and its evolution. Proc. Natl Acad. Sci. USA 115, 7937–7942 (2018).
Bhui, R. & Gershman, S. J. Decision by sampling implements efficient coding of psychoeconomic functions. Psychol. Rev. 125, 985–1001 (2018).
Louie, K. & Glimcher, P. W. Efficient coding and the neural representation of value. Ann. N. Y. Acad. Sci. 1251, 13–32 (2012).
Sims, C. R., Jacobs, R. A. & Knill, D. C. An ideal observer analysis of visual working memory. Psychol. Rev. 119, 807–830 (2012).
van den Berg, R. & Ma, W. J. A resource-rational theory of set size effects in human visual working memory. eLife https://doi.org/10.7554/eLife.34963 (2018).
Laughlin, S. A simple coding procedure enhances a neuron’s information capacity. Z. Naturforsch. C 36, 910–912 (1981).
De Ibarra, N. H. & Giurfa, M. Discrimination of closed coloured shapes by honeybees requires only contrast to the long wavelength receptor type. Anim. Behav. 66, 903–910 (2003).
Wehner, R. Handbook of Sensory Physiology (ed. Autrum, H.) 287–616 (Springer Berlin, 1981).
Netzer, N. Evolution of time preferences and attitudes toward risk. Am. Econ. Rev. 99, 937–955 (2009).
Park, I. M. & Pillow, J. W. Bayesian efficient coding. Preprint at bioRxiv https://doi.org/10.1101/178418 (2017).
Wei, X.-X. & Stocker, A. A. A Bayesian observer model constrained by efficient coding can explain ‘anti-Bayesian’ percepts. Nat. Neurosci. 18, 1509–1517 (2015).
Girshick, A. R., Landy, M. S. & Simoncelli, E. P. Cardinal rules: visual orientation perception reflects knowledge of environmental statistics. Nat. Neurosci. 14, 926–932 (2011).
Morais, M. J. & Pillow, J. W. Power-law efficient neural codes provide general link between perceptual bias and discriminability. Adv. Neural Inf. Process. Syst. 2018, 5071–5080 (2018).
Storrs, K. R. & Kriegeskorte, N. Deep learning for cognitive neuroscience. Preprint at arXiv https://doi.org/10.48550/arXiv.1903.01458 (2019).
Alemi, A. A., Fischer, I., Dillon, J. V. & Murphy, K. Deep variational information bottleneck. In 5th International Conference on Learning Representations, ICLR 2017—Conference Track Proceedings (ICLR, 2016).
Tishby, N. & Zaslavsky, N. Deep learning and the information bottleneck principle. In 2015 IEEE Information Theory Workshop, ITW 2015 (IEEE, 2015); https://doi.org/10.1109/ITW.2015.7133169
Burgess, C. P. et al. Understanding disentangling in β-VAE. Preprint at arXiv https://doi.org/10.48550/arXiv.1804.03599 (2018).
Bates, C. J. & Jacobs, R. A. Efficient data compression in perception and perceptual memory. Psychol. Rev. https://doi.org/10.1037/rev0000197 (2020).
Tkačik, G. & Bialek, W. Information processing in living systems. Annu. Rev. Condens. Matter Phys. https://doi.org/10.1146/annurev-conmatphys-031214-014803 (2016).
Anderson, B. L. Where does fitness fit in theories of perception? Psychon. Bull. Rev. 22, 1507–1511 (2015).
Tajima, S., Drugowitsch, J. & Pouget, A. Optimal policy for value-based decision-making. Nat. Commun. 7, 12400 (2016).
Fudenberg, D., Strack, P. & Strzalecki, T. Speed, accuracy, and the optimal timing of choices. Am. Econ. Rev. 108, 3651–3684 (2018).
Hébert, B. & Woodford, M. Rational inattention when decisions take time. J. Econ. Theory 208, 105612 (2023).
Liu, Y., Xin, Y. & Xu, N.-l. A cortical circuit mechanism for structural knowledge-based flexible sensorimotor decision-making. Neuron https://doi.org/10.1016/j.neuron.2021.04.014 (2021).
Reynolds, J. H. & Heeger, D. J. The normalization model of attention. Neuron 61, 168–185 (2009).
Bates, C. J., Lerch, R. A., Sims, C. R. & Jacobs, R. A. Adaptive allocation of human visual working memory capacity during statistical and categorical learning. J. Vis. 19, 11 (2019).
Castegnetti, G., Zurita, M. & De Martino, B. How usefulness shapes neural representations during goal-directed behavior. Sci. Adv. https://doi.org/10.1126/sciadv.abd5363 (2021).
Mnih, V. et al. Human-level control through deep reinforcement learning. Nature 518, 529–533 (2015).
Juechems, K., Balaguer, J., Spitzer, B. & Summerfield, C. Optimal utility and probability functions for agents with finite computational precision. Proc. Natl Acad. Sci. USA 118, e2002232118 (2021).
Robson, A. J. & Samuelson, L. in Handbook of Social Economics Vol. 1 (eds Benhabib, J. et al.) 221–310 (Elsevier, 2011); https://doi.org/10.1016/B978-0-444-53187-2.00007-3
Jerison, H. Evolution of the Brain and Intelligence (Elsevier, 2012).
MATLAB v.9.7 (R2019b) (The MathWorks Inc., 2019).
Wang, Z., Stocker, A. A. & Lee, D. D. Efficient neural codes that minimize L_{p} reconstruction error. Neural Comput. 28, 2656–2686 (2016).
Bercher, J. F. Source coding with escort distributions and Rényi entropy bounds. Phys. Lett. A 373, 3235–3238 (2009).
Prat-Carrabin, A. & Woodford, M. Bias and variance of the Bayesian-mean decoder. Adv. Neural Inf. Process. Syst. 34, 23793–23805 (2021).
Plummer, M. et al. JAGS: a program for analysis of Bayesian graphical models using Gibbs sampling. In Proc. 3rd International Workshop on Distributed Statistical Computing (DSC 2003) (eds. Hornik, K. et al.) 20–22 (DSC, 2003).
Bürkner, P.C. brms: an R package for Bayesian multilevel models using Stan. J. Stat. Softw. 80, 1–28 (2017).
R Core Team. R: A Language and Environment for Statistical Computing, version 3.6.3 (R Foundation for Statistical Computing, 2020).
Grujic, N., Brus, J., Burdakov, D. & Polania, R. Rational inattention in mice. Sci. Adv. 8, 8935 (2022).
Bogacz, R., Brown, E., Moehlis, J., Holmes, P. & Cohen, J. D. The physics of optimal decision making: a formal analysis of models of performance in two-alternative forced-choice tasks. Psychol. Rev. 113, 700–765 (2006).
Acknowledgements
We thank N. Netzer, M. Woodford and A. Stocker for providing helpful feedback on the manuscript text. We thank M. Zurita and B. De Martino for sharing their human fMRI data for reanalysis. J.S. and S.D.B. acknowledge support from Marlene-Porsche Foundation scholarships for their PhD studies. This work was supported by a European Research Council (ERC) starting grant (ENTRAINER) to R.P. This project has received funding from the ERC under the European Union’s Horizon 2020 research and innovation programme (grant agreement no. 758604). T.A.H. received support from the Swiss National Science Foundation (SNSF) (grant no. 32003B_166566). P.N.T. received support from the SNSF (grant nos 100019_176016 and 10001C_188878). The funders had no role in study design, data collection and analysis, decision to publish or preparation of the manuscript.
Funding
Open access funding provided by University of Zurich
Author information
Authors and Affiliations
Contributions
P.N.T., T.A.H. and R.P. designed the study. J.S. and S.D.B. collected the data. J.S., S.D.B., T.A.H. and R.P. analysed the data. All authors interpreted the results and wrote the manuscript. P.N.T., T.A.H. and R.P. acquired the funding.
Corresponding authors
Ethics declarations
Competing interests
The authors declare no competing interests.
Peer review
Peer review information
Nature Human Behaviour thanks Katherine Storrs and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Peer reviewer reports are available.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary information
Supplementary Information
Supplementary Figs. 1–6, Tables 1–5, and Notes 1 and 2.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Schaffner, J., Bao, S.D., Tobler, P.N. et al. Sensory perception relies on fitness-maximizing codes. Nat. Hum. Behav. 7, 1135–1151 (2023). https://doi.org/10.1038/s41562-023-01584-y
DOI: https://doi.org/10.1038/s41562-023-01584-y