Nature Neuroscience  Article
Decoding subjective decisions from orbitofrontal cortex
 Erin L Rich^{1}
 Jonathan D Wallis^{1,2}
 Journal name: Nature Neuroscience
 Volume: 19
 Pages: 973–980
 DOI: 10.1038/nn.4320
Abstract
When making a subjective choice, the brain must compute a value for each option and compare those values to make a decision. The orbitofrontal cortex (OFC) is critically involved in this process, but the neural mechanisms remain obscure, in part due to limitations in our ability to measure and control the internal deliberations that can alter the dynamics of the decision process. Here we tracked these dynamics by recovering temporally precise neural states from multidimensional data in OFC. During individual choices, OFC alternated between states associated with the value of two available options, with dynamics that predicted whether a subject would decide quickly or vacillate between the two alternatives. Ensembles of value-encoding neurons contributed to these states, with individual neurons shifting activity patterns as the network evaluated each option. Thus, the mechanism of subjective decision-making involves the dynamic activation of OFC states associated with each choice alternative.
Author information
Affiliations

1. Helen Wills Neuroscience Institute, University of California at Berkeley, Berkeley, California, USA.
 Erin L Rich & Jonathan D Wallis

2. Department of Psychology, University of California at Berkeley, Berkeley, California, USA.
 Jonathan D Wallis
Contributions
E.L.R. and J.D.W. designed the experiment and wrote the manuscript. E.L.R. collected and analyzed the data. J.D.W. supervised the project.
Competing financial interests
The authors declare no competing financial interests.
Supplementary information
Supplementary Figures
 Supplementary Figure 1: Recording locations
OFC recording targets were determined from previously obtained MR images and acute mapping of gray matter–white matter boundaries. Recording sites were bilateral in both subjects, and included areas 11 and 13. The middle image shows the center of the recording chambers in each subject (Subject M AP29, Subject N AP30, relative to the interaural line). The most anterior (top) and posterior (bottom) recording locations are also shown. Yellow lines indicate calculated electrode trajectories within each recording field.
 Supplementary Figure 2: Single-neuron encoding of reward information
(a) The percentage of neurons encoding chosen reward value and chosen reward type increased significantly during choice relative to fixation (χ^{2} test, * p ≤ 0.05). Unchosen value and type were encoded by <5% of neurons in both epochs, consistent with previous reports (Padoa-Schioppa, C. and Assad, J.A. (2006) Nature 441, 223–226; Hosokawa, T., et al. (2013) J Neurosci 33, 17385–17397). Bars show the percent of neurons whose firing rate significantly correlates with each regressor (p ≤ 0.01). (b) For finer temporal resolution, a multiple regression of the chosen and unchosen reward value was performed on bins of 100 ms stepped by 5 ms. Again, there was no evidence that the values of unchosen items were encoded by single neurons above chance levels. Bars show the percent of neurons whose firing rates significantly correlated with the chosen or unchosen regressor at each time in the trial. Chosen values were significantly encoded by OFC neurons, while unchosen values were not.
 Supplementary Figure 3: Decoding pictures from spikes or LFPs
(a) LDA classification of all 8 task pictures based on single unit spiking data or LFP. For 8 categories, chance decoding is 12.5%. Picture identity could be decoded from both spikes and LFPs above chance, with peak accuracy approximately 250 ms after stimulus onset, though spikes provided higher accuracy than LFPs. Each line shows mean accuracy ± SEM across 44 behavioral sessions for both subjects. (b) Confusion matrices with colored number blocks referring to ordinal picture value 1 to 4. There were two pictures of each value level, one predicting a primary reward and one predicting a secondary reward. Accurate classifications are along the main diagonal. Classifier confusions of pictures with the same value but different reward types are the off-diagonal shaded squares. Most misclassifications occurred between primary and secondary pictures of the same value level. In these cases, the decoder accurately identified the picture’s reward value but misidentified the reward type (primary or secondary). (c) Percent of trials correctly classified and those classified as the correct value but wrong reward type (primary or secondary) for each subject. Overall, the decoder based on spiking data provided more accurate picture classification. This was largely because the LFP decoder was just as likely to misidentify as correctly identify the reward type. Bars show the mean ± SEM across behavioral sessions for each subject. ** paired t-test p ≤ 0.005, *** p ≤ 0.001.
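The outcome categories in panel c (exact hit versus correct value but wrong reward type) can be tallied from decoder output roughly as follows. This is a minimal sketch, not the authors' analysis code: the `(value, reward_type)` label encoding, the function name `tally_outcomes`, and the four toy trials are assumptions made for illustration.

```python
from collections import Counter


def tally_outcomes(true_labels, predicted_labels):
    """Return the fraction of trials per decoding outcome.

    Each label is a hypothetical (value, reward_type) tuple, e.g. (3, "primary").
    Outcomes: "correct" (value and type match), "value_only" (value matches,
    type does not), "wrong_value" (value does not match).
    """
    if len(true_labels) != len(predicted_labels) or not true_labels:
        raise ValueError("label lists must be non-empty and of equal length")
    counts = Counter()
    for (true_value, true_type), (pred_value, pred_type) in zip(true_labels, predicted_labels):
        if pred_value == true_value and pred_type == true_type:
            counts["correct"] += 1
        elif pred_value == true_value:
            counts["value_only"] += 1
        else:
            counts["wrong_value"] += 1
    n = len(true_labels)
    return {key: counts[key] / n for key in ("correct", "value_only", "wrong_value")}


# Toy example with invented labels (not data from the study).
true = [(1, "primary"), (1, "secondary"), (4, "primary"), (2, "secondary")]
pred = [(1, "primary"), (1, "primary"), (4, "secondary"), (3, "secondary")]
outcomes = tally_outcomes(true, pred)
```

A decoder that captures value but not reward type would show a large "value_only" fraction relative to "correct", which is the pattern the legend reports for the LFP decoder.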
 Supplementary Figure 4: Decoding value from spikes and LFPs
Since both spiking and LFP decoders identified reward types unreliably, we focused on decoding only value information. (a) LDA classified 8 pictures into 4 value categories, based on single unit spiking, LFP or both. Peak accuracy was approximately 44% with the spike decoder and 35% with the LFP decoder (chance = 25%). Decoding based on single neurons was more accurate than LFPs, and adding LFPs to single units did not significantly improve accuracy. Each line shows mean accuracy ± SEM across 44 behavioral sessions for both subjects. (b) Scatter plots of peak accuracy from spike or LFP decoding as shown in a for each recording session. Decoding accuracy improves with more simultaneously recorded signals. (c) LDAs classified pictures into value categories as described in a, with decoding based on analytic amplitudes of individual LFP bands. Theta (4–8 Hz) and high gamma (70–200 Hz) provided the best value decoding in both subjects; however, a small amount of information was also contained in other frequency bands.
 Supplementary Figure 5: Comparison of categorical and continuous decoders
The value assigned to the single pictures is categorical in nature, but the concept of value likely exists on a continuous scale. Therefore, we determined whether a linear model that estimates value on a continuous scale would perform better than the LDA, which attempts to discriminate discrete categories. (a) Each picture value was associated with a different distribution of decoded values, and these distributions were overlapping. Histograms show continuous value estimates decoded with a linear model from single picture trials in both subjects (n = 22,845 trials). Each observation is a trial, and trials were grouped according to the value of the picture the subject was shown. Regressions of decoded value on actual value were highly significant for all sessions (minimum r^{2} = 0.16, maximum p = 1.32 × 10^{-20}; median r^{2} = 0.49, median p = 1.7 × 10^{-79}). (b) Confusion matrices from categorical and continuous decoders. The linear model performed similarly to the LDA (48% versus 44%, where chance = 25%), but tended to make different types of errors. The linear model had a greater tendency to confuse adjacent categories but distinguish non-adjacent categories. (c) Table of sensitivities and specificities for decoding each comparison with LDA or a linear model. The first column shows the value comparisons being tested. Red = better performing decoder. We directly assessed each model’s ability to delineate pairs (or triads) of adjacent categories by calculating the sensitivity and specificity of the categorization. Sensitivity is the hit rate (correctly identifying a given category when it is present) and specificity is the correct rejection rate (correctly identifying that a given category is not present). Overall, the linear model performed worse, with lower sensitivities and specificities than the LDA for adjacent categories.
Thus, the linear model’s assumption of no categories resulted in a poorer ability to resolve the categorical nature of the training data, and increased the bias toward confusing neighboring values. Given this, we used the categorical LDA to analyze choice trials.
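The sensitivity and specificity definitions given for panel c can be made concrete with a short sketch. It simplifies the legend's comparisons (pairs or triads of adjacent categories) to a one-category-versus-rest calculation; the function name and the confusion-matrix counts are invented for illustration and are not values from the study.

```python
def sensitivity_specificity(confusion, category):
    """Compute (sensitivity, specificity) for one category versus the rest.

    confusion[i][j] is the number of trials whose true category is i and
    whose decoded category is j.
    """
    n = len(confusion)
    true_pos = confusion[category][category]
    false_neg = sum(confusion[category][j] for j in range(n) if j != category)
    false_pos = sum(confusion[i][category] for i in range(n) if i != category)
    true_neg = sum(
        confusion[i][j]
        for i in range(n)
        for j in range(n)
        if i != category and j != category
    )
    sensitivity = true_pos / (true_pos + false_neg)  # hit rate
    specificity = true_neg / (true_neg + false_pos)  # correct rejection rate
    return sensitivity, specificity


# Hypothetical 4-category confusion matrix (rows: true value, columns: decoded value).
toy_confusion = [
    [40, 8, 2, 0],
    [10, 30, 9, 1],
    [2, 9, 30, 9],
    [0, 2, 8, 40],
]
sens, spec = sensitivity_specificity(toy_confusion, category=0)
```

In this toy matrix, errors cluster next to the diagonal, which is the adjacent-category confusion pattern the legend attributes mainly to the linear model.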
 Supplementary Figure 6: Detection of states in synthetic data sets
Three synthetic data sets were constructed from neural data collected on single picture trials. Each consisted of a neural features × time × trials matrix, and corresponded to a hypothetical model of how neural data might represent the two available options. (a) A schematic illustrating how each data set was created for one choice trial. Blue = value 1, Red = value 4. In the States data set, alternating hidden states were created based on two choice options, e.g. 1 and 4. The number of states was determined randomly and independently for every trial, but approximated the number of states per trial observed in the real data (mean 4.87, median 5 synthesized states per trial). These states served as the ground truth for this synthetic data set. For every time point, two trials were randomly drawn from the distribution of single picture trials corresponding to the hidden state at that time, and averaged. This was done repeatedly to create a time series with alternating states. The result was a series of samples drawn from one value distribution for a short period of time, followed by a series drawn from another distribution to create alternating states. In the Averages set, we modeled the situation where all neural features responded by encoding the average of the two option values, such that no hidden states were defined. For each time point, one trial was randomly drawn from the distribution of single picture trials corresponding to each choice option and averaged, as if each neural feature encoded the mean of the two options. In the Split data set, we modeled the situation where some features responded by encoding option A while others responded by encoding option B. Here, two trials were randomly drawn from the distribution of one of the two choice options and averaged. Which option was “encoded” varied by neural feature, creating a time series of consistent but mixed signals.
The three synthetic data sets were submitted to the same LDA as the real choice data. That is, the LDA was trained on the same training set and classified the synthetic data in the same sliding windows. (b) A representative single trial showing the results of submitting each data set to the LDA used on the real choice data. All three panels show the same trial, but qualitatively the three data sets produced starkly different results. Alternating states were recovered from the States data set, but only low probability classifications were recovered from the other two sets. (c) The posterior probabilities of the most likely category were averaged over time for each trial in which the options did not have the same values (n = 3782). The States set recovered the states on which the set was built. In most cases, the correct values had high posterior probabilities. On other trials, the correct values were the most probable but by a smaller margin, a result of the random selection of observations serving as input data. In contrast, the Averages and Split sets produced consistently lower probability classifications. Without thresholding the output, the majority of observations from the Averages and Split data sets were classified with probabilities < 0.5 (gray line). (d) For all data sets, states were defined by criteria in the main paper, and the number of transitions between chosen and unchosen states within each trial was calculated. The real data showed a clear distribution centered around 3.5 state transitions, as described in the main paper. The States data largely showed 5 transitions per trial, which was unsurprising, since the set had been generated to do precisely that. Some trials (23.5%) showed no transitions that met the criteria for inclusion. These were primarily trials from sessions with a smaller number of features and poorer classification accuracy in the training set (see Supplementary Fig. 4b), suggesting that this is related to weak value signaling in the input data.
We hypothesize that repeatedly sampling from the same distribution that includes some high-fidelity trials and some low-fidelity trials, and mixing them together in time, produces an overall lower confidence classification, so that some observations fell below the 0.5 cutoff and were thresholded out. For the Averages and Split data sets, a large proportion of trials had zero state transitions (43.5% and 60.4% respectively). This is in contrast to only 7.8% of trials in the real data. Furthermore, state transitions that did occur in the Averages and Split sets formed a wide, noisy distribution, in contrast to the tighter distribution in the real data. The mean and standard deviation of the number of state transitions per trial excluding trials with zero transitions was 3.8 ± 2.2 for the real data, 4.5 ± 1.2 for the States data, 7.1 ± 6.3 for the Averages data, and 6.3 ± 5.9 for the Split data. Pairwise tests for equality of variance found differences for all comparisons (F-tests, all F > 3.5, Bonferroni corrected p < 0.001), except for the comparison between Split and Averages (F_{1910,1340} = 1.12, Bonferroni corrected p = 0.12). Therefore, even when artifactual states were recovered from the data sets where no states were present, the distribution of transitions was flat, unlike real states or artificially created states. (e) Nonparametric comparison of state transitions per trial across the four data sets found more transitions in the real and States data than Averages or Split. A Kruskal-Wallis test was highly significant (χ^{2}(3) = 1238, p < 0.001), and post-hoc comparison of mean ranks from this test found no differences between real and States data (Tukey’s HSD p = 0.98), though both had more transitions than the Averages or Split data sets (Tukey’s HSD, p < 0.001). Each point is the mean rank ± SEM. ** p < 0.001.
Overall, these analyses support the notion that neural states do exist in our data and are not a result of noisy decoding by the LDA. We intentionally created data sets with no states but different types of mixed signals, and found that the result does not match the clear signals we recovered from the real data. In contrast, synthetic data with clear states built in bore closer resemblance to the real data.
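The construction of the three synthetic data sets in panel a can be sketched as follows. This is an illustrative simplification under stated assumptions, not the authors' code: each single picture trial is reduced to one feature vector (the time dimension within a trial is ignored), the Split set draws its two trials independently for each feature, and all function names and the noise-free toy pool are hypothetical.

```python
import random


def mean_vectors(a, b):
    """Element-wise average of two equal-length feature vectors."""
    return [(x + y) / 2 for x, y in zip(a, b)]


def make_states(pool, hidden_states, rng):
    """States set: at each time point, average two trials drawn from the
    single-picture pool of the hidden state active at that time."""
    return [mean_vectors(*rng.sample(pool[state], 2)) for state in hidden_states]


def make_averages(pool, option_a, option_b, n_time, rng):
    """Averages set: at each time point, average one trial from each option."""
    return [
        mean_vectors(rng.choice(pool[option_a]), rng.choice(pool[option_b]))
        for _ in range(n_time)
    ]


def make_split(pool, feature_options, n_time, rng):
    """Split set: each feature consistently reflects one option; its value is
    the average of two trials drawn from that option's pool."""
    series = []
    for _ in range(n_time):
        sample = []
        for feature, option in enumerate(feature_options):
            first, second = rng.sample(pool[option], 2)
            sample.append((first[feature] + second[feature]) / 2)
        series.append(sample)
    return series


# Noise-free toy pool: value -> list of 3-feature vectors (invented numbers).
pool = {1: [[1.0] * 3 for _ in range(5)], 4: [[4.0] * 3 for _ in range(5)]}
rng = random.Random(0)
hidden = [1, 1, 4, 4, 1]  # ground-truth alternating states
states_set = make_states(pool, hidden, rng)
averages_set = make_averages(pool, 1, 4, n_time=5, rng=rng)
split_set = make_split(pool, feature_options=[1, 4, 1], n_time=5, rng=rng)
```

With the noise-free pool, the States series steps between 1 and 4, the Averages series sits at 2.5 throughout, and the Split series is constant but mixed across features, which mirrors the three hypotheses being contrasted.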
 Supplementary Figure 7: Variance in population vectors corresponds to neural states
(a) An example session showing the distribution of time bins classified as chosen and unchosen options by LDA, projected into principal components (PC) space. Principal components analysis (PCA) was carried out on multidimensional population data without reference to LDA states, as described in Online Methods, and state designations are overlaid. The x- and y-axes are two PCs (A and B) and each data point is a time bin. Trials (n = 130) are separated by the options available on each trial. For example ‘1v1’ indicates a choice between two options of value 1. The + indicates the center of each value distribution across all trials in the session. States identifying each of the available options occupy a different region of PC space, and when two options are available the trajectory travels through each region. (b) Two-dimensional Gaussians fit to those points in a that were identified as each of the four value states by LDA. Color scales are normalized, and the + indicates the center of the distribution. (c) Five representative trials from the session shown in a and b with the entire trial trajectory shown by the gray line, and time bins classified as chosen and unchosen states shown by colored dots. To determine which choice-related factors account for the most variance within a PC, we compared three multiple regression models. The first model included the values of the chosen and unchosen pictures on each trial. This model quantified the variance across trials related to the value of the options on offer, without discriminating any states within a trial. The second model identified whether or not each time point was categorized as belonging to each of the four value states by LDA, with four binary regressors. This was done so that there was no assumption of relationship between the states. Instead, the model would find variability related to the categorical separation of one value from the rest.
The third model looked for variance in the PC attributed to the alternative option, which was not identified by the LDA at that time. That is, the same time points as the ‘Current States’ model were labeled by four binary regressors, according to the value of the other picture available on that trial. (d) The absolute value of the beta coefficients from three multiple regression models (option values / current state / alternate state), attempting to explain variance in each PC. Coefficients were consistently highest for the model based on the value states identified by LDA, and were also highest for lower PCs. These lower PCs account for the most variance in the population vectors themselves, indicating that the model fits prominent temporal features of the data. Each point is the mean ± SEM. (e) PCs were sorted by the percentage of their variance that was explained by each model. The best explained PC had approximately 16% of its variance accounted for by the current states model, and the best 5 each had >5% of variance explained by this model. The model based on the value of the alternate state explains essentially zero variance and is indistinguishable from the x-axis in this figure. Each point is the mean ± SEM. (f) The same data as shown in e, except that PCs are ordered according to how much variance they explain in the population vectors, illustrating that the dimensions most influenced by LDA state were approximately PCs 5 to 15. The lowest PCs (1–4) tended to capture some elements of ‘drift’ in neural signals across trials that were not associated with task variables. (g) Formal model comparisons were conducted by calculating the Akaike Information Criterion (AIC) from the deviance of each model, which corrects for different numbers of model parameters. The model with the lowest AIC has the best fit. In nearly all cases, particularly in the lower PCs, the model based on neural states identified by the LDA best fits the data.
Overall, the current states model (red) performs the best, especially for the lowest PCs. The alternate states model (gray) is again indistinguishable from the x-axis.
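The model comparison in panel g rests on the standard relationship AIC = deviance + 2k, where k is the number of fitted parameters. The sketch below only illustrates that arithmetic: the deviance values are invented, and the parameter counts (two value regressors plus an intercept for the first model, four binary regressors plus an intercept for the state models) are assumptions inferred from the model descriptions above.

```python
def aic_from_deviance(deviance, n_params):
    """Akaike Information Criterion from model deviance (-2 log-likelihood,
    up to a constant shared by the models being compared)."""
    return deviance + 2 * n_params


# (deviance, number of parameters) for one hypothetical PC; numbers are invented.
models = {
    "option values": (1520.0, 3),
    "current state": (1490.0, 5),
    "alternate state": (1531.0, 5),
}
aics = {name: aic_from_deviance(dev, k) for name, (dev, k) in models.items()}
best_model = min(aics, key=aics.get)  # lowest AIC indicates the best fit
```

The 2k penalty is what allows models with different numbers of regressors to be compared on the same footing.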
 Supplementary Figure 8: Unsupervised clustering of population vectors reveals neural states
In the main paper, we describe the dynamics of neural states as switching back and forth between two available options. A key element of this claim is that neural activity should transiently enter ‘state A’, move to a different state (‘B’), and then return to the original state. To assess this in the population vectors without using LDA, we focused on the five PCs with the highest r^{2} value for the “Current States” model above (Supplementary Fig. 7e). In these dimensions, the model explained >5% of the overall variance, while the rest were < 5%. (a) An example trial from the session shown in Supplementary Figure 7a–c. In this trial, multiple time bins (colored circles) were labeled by LDA as value 4 (red) or value 1 (blue). Based on the criteria described in the main paper, two value 1 states were identified (x and x’) and two value 4 states were identified (y and y’). The + shows the center of the 1 and 4 distributions across all trials from this session. (b) For all states identified by LDA, we calculated the Mahalanobis distance in 5 dimensions to each of 4 centroids, corresponding to the 4 option values. We used 50-fold cross-validation and ensured that the points being measured did not contribute to the computation of the distribution centers. Labeled states were closest to their respective centers, confirming that LDA and PCA extracted similar information from the population vectors contributing to each analysis. Plotted are the means ± SEM across sessions. (c) For every trial in which the same state was detected more than once, the same 4 Mahalanobis distances were calculated, and the first occurrence of the state was compared to the second (e.g. x versus x’). 10-fold cross-validation ensured that the trial being assessed did not contribute to the distributions it was compared to. Panels show states corresponding to a particular value. Plots are the mean ± SEM across trials.
The first and second occurrence of a state tended to fall in the same region of PC space, and there was no evidence that states change systematically over the course of a trial (two-way ANOVAs of occurrence × center. All F(occurrence)_{1,347} < 0.1, p > 0.7. All F(centers)_{3,347} > 4, p ≤ 0.006). (d) We also considered states that were interleaved within a trial, in the pattern of ABA (n = 11720, each trial could have multiple interleaved patterns). If states are discrete and consistent, the population vectors should look similar in non-contiguous A states, but different in interleaved B states. For each sequence, the Mahalanobis distances to the initial state, A, were calculated. There was no difference between A states, which were both very close to the A center, but the interleaved B state was significantly farther away (one-way ANOVA F_{2,35157} = 237, p = 6 × 10^{-103}. Tukey’s HSD post-hoc comparisons: A vs. A’ p = 0.94, A vs. B and A’ vs. B p < 0.0001). This is consistent with neural activity switching back and forth between states within a trial, as described in the main paper. The plot shows mean ± SEM. * p < 0.0001. (e) We also tested whether prominent features of population-level variance, extracted without reference to predefined states (i.e. in an unsupervised manner), map onto the LDA states. The 5 PCs that were best predicted by the current states model illustrated in Supplementary Figure 7 were separated with k-means into 5 clusters, under the assumption that there would be four value distributions and a poorly classified noise cluster. The clusters were renumbered for each session according to their highest probability value category to allow comparison across sessions. If the clusters mapped onto states, one value state from the LDA would be represented more than others in a given cluster. The median (across sessions) percent of time bins in each cluster labeled by the LDA as belonging to each state is shown in the first heat plot.
Cluster 1 clearly corresponded to value 1, clusters 3 and 4 were reasonably selective for values 2 and 3 respectively, and cluster 5 mostly isolated value 4 with some value 3. Further, the “confusions” of these assignments tended to occur with neighboring state values, indicating that, even if the boundaries vary by analysis or if different sessions would be better fit by a different number of clusters, both approaches extracted similar information from the data. The median percentages of time bins in each cluster that came from trials with each chosen and unchosen option value are shown in the second and third heat plots. Here, clusters were evenly distributed across the trial types, such that the proportion of values associated with a cluster simply reflected the overall frequency of those values across trials (the asymmetry reflects the fact that animals most frequently chose higher values and most frequently did not choose lower values).
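The Mahalanobis distance comparisons in panels b to d measure how far a held-out point lies from a distribution's center, scaled by that distribution's covariance. The following is a minimal pure-Python sketch of the calculation, shown in 2 dimensions rather than the 5 PCs used in the legend; the helper names and the reference points are assumptions made for illustration, and the test point is deliberately excluded from the reference set, in the spirit of the cross-validation described above.

```python
import math


def mean_and_cov(points):
    """Sample mean vector and unbiased sample covariance matrix."""
    n, d = len(points), len(points[0])
    mean = [sum(p[k] for p in points) / n for k in range(d)]
    cov = [
        [sum((p[i] - mean[i]) * (p[j] - mean[j]) for p in points) / (n - 1) for j in range(d)]
        for i in range(d)
    ]
    return mean, cov


def solve(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    d = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(d):
        pivot = max(range(col, d), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("covariance matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, d):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, d + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * d
    for r in range(d - 1, -1, -1):
        x[r] = (aug[r][d] - sum(aug[r][c] * x[c] for c in range(r + 1, d))) / aug[r][r]
    return x


def mahalanobis(point, mean, cov):
    """sqrt(diff^T cov^{-1} diff), computed without forming the inverse."""
    diff = [p - m for p, m in zip(point, mean)]
    return math.sqrt(sum(a * b for a, b in zip(diff, solve(cov, diff))))


# Invented reference points for one state; the held-out point is not among them.
reference = [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0), (2.0, 2.0)]
center, covariance = mean_and_cov(reference)
distance = mahalanobis((3.0, 1.0), center, covariance)
```

Comparing such distances for the first and second visits to a state (A versus A’) against the interleaved state (B) is the logic of panel d.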
 Supplementary Figure 9: Decoded representations predict within-condition choice times
The strength of chosen (red) and unchosen (blue) representations was averaged in 20 ms bins, stepped forward by 5 ms and used to predict choice times. (a) Each pair of picture values was analyzed separately with multiple regressions, and β coefficients are plotted. Time bins significant at p ≤ 0.05 (uncorrected) are highlighted. Most comparisons show a trend toward negative correlations between choice times and the strength of chosen representations and positive correlations between choice times and unchosen representations. (b) The six curves in a were aggregated to compute statistical significance. The plot shows the mean beta values (fill = ± SEM; n = 6). Significance was calculated by aggregating the mean squared error (MSE) from all trials. The value √(MSE/n) is the standard error of the mean beta coefficient, and was used to compute a t-statistic and its two-sided significance level. Shaded blocks in the plot show regions where the aggregate beta for chosen (red) or unchosen (blue) strength was significantly different from zero (p ≤ 0.05 for 6 consecutive time bins, to correct for multiple comparisons). For this analysis, trials with very short choice times (<170 ms) were excluded to reduce the positive skew in the distributions when power was reduced (n = 2795 trials).
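Two mechanical steps in this legend, the sliding-window averaging (20 ms bins stepped by 5 ms) and the requirement of 6 consecutive significant bins, can be sketched as below. This is an illustration under assumptions: the input is taken to be sampled at a fixed resolution given by the hypothetical `sample_ms` parameter, the function names are invented, and the toy signal and p-values are not data from the study.

```python
def sliding_window_means(samples, sample_ms=1, window_ms=20, step_ms=5):
    """Average a time series in overlapping windows.

    samples is a list of values spaced sample_ms apart; windows are
    window_ms wide and advance by step_ms.
    """
    window = window_ms // sample_ms
    step = step_ms // sample_ms
    return [
        sum(samples[start:start + window]) / window
        for start in range(0, len(samples) - window + 1, step)
    ]


def significant_runs(p_values, alpha=0.05, min_run=6):
    """Return (start, end) index pairs, end exclusive, for every run of at
    least min_run consecutive bins with p <= alpha."""
    runs, start = [], None
    for i, p in enumerate(p_values):
        if p <= alpha:
            if start is None:
                start = i
        else:
            if start is not None and i - start >= min_run:
                runs.append((start, i))
            start = None
    if start is not None and len(p_values) - start >= min_run:
        runs.append((start, len(p_values)))
    return runs


# Toy inputs: a 40 ms ramp sampled every 1 ms, and invented per-bin p-values.
window_means = sliding_window_means(list(range(40)))
toy_p = [0.2] + [0.01] * 6 + [0.3] + [0.04] * 3 + [0.5]
runs = significant_runs(toy_p)
```

In the toy p-values, the run of three significant bins is discarded while the run of six is kept, which is how a consecutive-bin criterion guards against isolated false positives.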
 Supplementary Figure 10: Quick and deliberative decisions, separately by subject
(a) The average representational strength of the chosen picture (red), the unchosen picture (blue) or one of the unavailable options (averaged across two unavailable options; green) is shown for quick and deliberative decisions as in Figure 3, but separated by subject (Subject M: n = 272 trials, Subject N: n = 166 trials). Both subjects show similar effects. When a quick decision is made, representations of the chosen item dominate, but when subjects deliberate, chosen and unchosen representations have similar strength. (b) Chosen and unchosen representations as in a, but aligned to the time of the choice. Chosen representations do not increase in strength at the time of the choice during deliberation trials. All plots show mean ± SEM.
PDF files
 Supplementary Text and Figures (2,245 KB)
Supplementary Figures 1–10