Decoding subjective decisions from orbitofrontal cortex

Journal name: Nature Neuroscience
Volume: 19
Pages: 973–980
Year published: 2016
DOI: 10.1038/nn.4320

Abstract

When making a subjective choice, the brain must compute a value for each option and compare those values to make a decision. The orbitofrontal cortex (OFC) is critically involved in this process, but the neural mechanisms remain obscure, in part due to limitations in our ability to measure and control the internal deliberations that can alter the dynamics of the decision process. Here we tracked these dynamics by recovering temporally precise neural states from multidimensional data in OFC. During individual choices, OFC alternated between states associated with the value of two available options, with dynamics that predicted whether a subject would decide quickly or vacillate between the two alternatives. Ensembles of value-encoding neurons contributed to these states, with individual neurons shifting activity patterns as the network evaluated each option. Thus, the mechanism of subjective decision-making involves the dynamic activation of OFC states associated with each choice alternative.

Figures

  1. Figure 1: Behavioral task and performance.

    (a) To begin a trial, subjects fixated a central point for 450 ms. On choice trials, two pictures at ±5° visual angle predicted different reward amounts. Subjects freely viewed both images and chose one by fixating it for 450 ms. After a choice, another cue appeared instructing a right or left joystick response, which, if executed correctly, resulted in the reward associated with the picture chosen at the beginning of the trial. Single-picture trials were identical to choice trials, except only one randomly selected picture was shown. Subjects had to fixate the picture for 450 ms and make the subsequently instructed joystick response to obtain reward. (b) Both subjects learned eight reward-predicting pictures well, choosing more valuable pictures on choice trials (left; regression of percent chosen per session on picture value. Subject M: n = 96 (24 sessions × 4 values), r2 = 0.96, P = 2 × 10−67; subject N: n = 80, r2 = 0.94, P = 3 × 10−50), and making faster joystick responses for higher value pictures on single-picture trials (right; RT, reaction time; regression of log(RT) on picture value: subject M: r2 = 0.34, P = 6 × 10−10; subject N: r2 = 0.25, P = 3 × 10−6). Error bars are ± s.e.m.
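
    As a rough illustration of these behavioral regressions, a minimal sketch follows, assuming per-session picture values aggregated into percent-chosen and reaction-time measures; the toy data and variable names are ours, not the paper's.

    ```python
    # Sketch of the Figure 1b regressions: percent chosen vs. picture value,
    # and log(RT) vs. picture value. All data below are synthetic stand-ins.
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)
    values = np.tile(np.arange(1, 5), 24)                       # 24 sessions x 4 values
    pct_chosen = 20 * values + rng.normal(0, 5, values.size)    # toy percent-chosen
    rt_ms = 400 - 15 * values + rng.normal(0, 20, values.size)  # toy reaction times

    res = stats.linregress(values, pct_chosen)                  # cf. subject M: r2 = 0.96
    print(f"percent chosen: r2 = {res.rvalue**2:.2f}, P = {res.pvalue:.2g}")

    res_rt = stats.linregress(values, np.log(rt_ms))            # cf. subject M: r2 = 0.34
    print(f"log(RT): r2 = {res_rt.rvalue**2:.2f}, P = {res_rt.pvalue:.2g}")
    ```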

  2. Figure 2: Decoded choice dynamics.

    (a) Posterior probabilities derived from LDAs for chosen (red), unchosen (blue) and unavailable options (gray, average of both unavailable options), shown for six typical trials. (b) Pictures on the right and left of the screen (top) or that were chosen (Ch) and not chosen (UnCh) (bottom) explained significant variance in decoder classifications. Colored lines show times with significant β coefficients from the multiple regression (P ≤ 0.001 to account for multiple comparisons). (c) Histograms of putative states decoded from choice trials, according to the number of consecutive time bins in which the same value was decoded (duration). All data are plotted three times, each with a different threshold (gray shades). Observations with posterior probabilities below the designated threshold were removed. The vertical line indicates a 4-bin duration, which was the cut-off used to define a stable state. (d) The number of stable states per trial, averaged for each session. Chosen states were more prevalent than unchosen states.
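
    The stable-state criterion can be made concrete with a short sketch. This is a minimal illustration, assuming a per-trial sequence of decoded value labels and posterior probabilities; the 0.5 threshold and 4-bin minimum follow the caption, but the function and data below are ours.

    ```python
    # Find "stable states": runs of >= min_len consecutive bins decoding the
    # same value, after removing bins whose posterior falls below threshold.
    import numpy as np

    def stable_states(decoded, posterior, threshold=0.5, min_len=4):
        """Return (value, start_bin, duration) for each qualifying run."""
        labels = np.where(posterior >= threshold, decoded, -1)  # -1 = sub-threshold
        states, start = [], 0
        for i in range(1, len(labels) + 1):
            if i == len(labels) or labels[i] != labels[start]:
                if labels[start] != -1 and i - start >= min_len:
                    states.append((int(labels[start]), start, i - start))
                start = i
        return states

    decoded = np.array([3, 3, 3, 3, 3, 1, 1, 1, 1, 3, 1, 3, 3, 3, 3])
    posterior = np.full(decoded.shape, 0.8)
    print(stable_states(decoded, posterior))  # three stable states: 3, 1, 3
    ```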

  3. Figure 3: Odds of decoding unchosen options.

    The odds of decoding the unchosen value ('target', red) were calculated as an odds ratio among all trials with the same chosen value and were compared to the odds of decoding a value between the chosen and unchosen option ('intermediate', gray) among the same trials. (a) For correct choices in which 3 was chosen, the odds of decoding 1 given that 1 was present were higher than the odds of decoding 2 given that 1 was present. (b) For correct choices in which 4 was chosen, the odds of decoding 2 given that 2 was present were higher than the odds of decoding 3 given that 2 was present. (c) For correct choices in which 4 was chosen, the odds of decoding 1 given that 1 was present were higher than the odds of decoding 2 given that 1 was present. (d) The odds of decoding 1 or 3, given that 1 was present, for correct choices in which 4 was chosen. Here decoder noise raises the odds of decoding 3; however, decoding 1 was still more likely. Shading, ±s.e.m. Odds ratios were calculated in 70-ms epochs, stepped by 15 ms. Red bar shows differences greater than shuffled trials at P ≤ 0.01 for at least six time bins (approximately 100 ms). This significance level was established by finding the threshold that reduced pre-stimulus false discoveries to ≤1%.
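
    As an illustration of this analysis, a hedged sketch follows, assuming per-trial decoded-label sequences evaluated in sliding 70-ms epochs; the decoded-label matrix, bin times and `present` mask are hypothetical stand-ins for the paper's decoder output.

    ```python
    # Odds ratio of decoding a target value on trials where it was present
    # vs. absent, computed in 70-ms epochs stepped by 15 ms (per the caption).
    import numpy as np

    def sliding_odds_ratio(bin_times, decoded, present, target,
                           t0=0.0, t1=1.0, win=0.070, step=0.015, eps=1e-9):
        """bin_times: (bins,) in s; decoded: (trials, bins) value labels;
        present: (trials,) bool, True if `target` was one of the options."""
        centers, ratios = [], []
        t = t0
        while t + win <= t1:
            in_win = (bin_times >= t) & (bin_times < t + win)
            hit = (decoded[:, in_win] == target).any(axis=1)  # per-trial event
            p1, p0 = hit[present].mean(), hit[~present].mean()
            odds_present = p1 / (1 - p1 + eps)
            odds_absent = (p0 + eps) / (1 - p0 + eps)
            ratios.append(odds_present / odds_absent)
            centers.append(t + win / 2)
            t += step
        return np.array(centers), np.array(ratios)
    ```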

  4. Figure 4: Decoded representations predict choice times.

    (a) Posterior probabilities (averaged in 20-ms bins stepped by 5 ms) predicted choice times. β coefficients from multiple regressions of choice times on two factors: probability of chosen and unchosen states (maximum variance inflation factor = 1.05). (b) β coefficients from multiple regressions of choice times on two factors: probability of the chosen and unavailable states (maximum variance inflation factor = 2.22). Choice times were log-transformed and probabilities z-scored so β coefficients could be compared. Orange, chosen P ≤ 0.005; red, chosen P ≤ 0.001; teal, unchosen P ≤ 0.005; blue, unchosen P ≤ 0.001; gray, not significant. (c,d) The percentage of choice time variance accounted for by each factor, quantified with coefficients of partial determination (CPD). Ch, chosen; UnCh, unchosen; NA, not available.
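
    The coefficient of partial determination has a compact closed form: for each factor, compare the residual sum of squares of the full model with a reduced model lacking that factor. The sketch below uses toy data; only the log transform, z-scoring and CPD definition come from the caption.

    ```python
    # CPD_i = (SSE_reduced - SSE_full) / SSE_reduced for factor i.
    import numpy as np

    def cpd(y, X, col):
        """CPD of design-matrix column `col` (X includes an intercept)."""
        def sse(design):
            beta, *_ = np.linalg.lstsq(design, y, rcond=None)
            resid = y - design @ beta
            return resid @ resid
        X_reduced = np.delete(X, col, axis=1)
        return (sse(X_reduced) - sse(X)) / sse(X_reduced)

    rng = np.random.default_rng(1)
    p_chosen, p_unchosen = rng.random(500), rng.random(500)
    log_ct = 0.2 - 0.5 * p_chosen + 0.3 * p_unchosen + rng.normal(0, 0.1, 500)

    z = lambda x: (x - x.mean()) / x.std()
    X = np.column_stack([np.ones(500), z(p_chosen), z(p_unchosen)])
    print("CPD chosen:", cpd(log_ct, X, 1), "CPD unchosen:", cpd(log_ct, X, 2))
    ```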

  5. Figure 5: Quick versus deliberative decisions.

    (a) Eye positions relative to the fixation point (black circle) and two choice options (red circles) are shown as points varying from black (start of trial) to red (option selection) for two example trials. In quick decisions, the subjects' eyes moved from the fixation point to one picture. In deliberative decisions, eyes fell within ±2.5° of the center of one picture at least twice before selecting an option. (b) Quick decisions increased with increasing difference in option values. The height of each bar is the overall ratio (log scale) of quick to deliberative decisions across all trials with a given value difference. (c,d) The average (±s.e.m.) probability that neural activity represented the chosen picture (red), the unchosen picture (blue) or an unavailable option (averaged across two unavailable options; green) for quick or deliberative decisions. (e,f) For quick decisions, there was a larger difference in the relative strength of the chosen and unchosen representation. At each time point, we performed a 3 × 2 ANOVA with factors of representation type (chosen, unchosen or unavailable) and decision type (quick or deliberative). The interaction term reached significance (P ≤ 0.05) in multiple bins after picture onset, indicating that neural representations varied by decision type (not shown). Tukey's HSD assessed pairwise contrasts. Red, P ≤ 0.005; orange, P ≤ 0.01; gray, P > 0.01.
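
    One way to implement the gaze criterion is to count separate entries of the eye position into a ±2.5° window around each option before selection. This sketch is our interpretation of the caption, not the paper's code; the input format is assumed.

    ```python
    # Classify a trial as deliberative if gaze enters the option windows
    # on two or more separate occasions before an option is selected.
    import numpy as np

    def count_window_entries(gaze_xy, target_xy, radius=2.5):
        """gaze_xy: (T, 2) deg; target_xy: (2,) deg. Counts outside->inside
        transitions into a +/-radius square window around the target."""
        inside = np.all(np.abs(gaze_xy - target_xy) <= radius, axis=1)
        return int(inside[0]) + int(np.sum(~inside[:-1] & inside[1:]))

    def is_deliberative(gaze_xy, left_xy, right_xy):
        entries = (count_window_entries(gaze_xy, left_xy)
                   + count_window_entries(gaze_xy, right_xy))
        return entries >= 2   # quick decisions make a single entry
    ```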

  6. Figure 6: Targets of gaze do not affect OFC representations.

    (a) Strength of chosen and unchosen representations during deliberation trials, aligned to the times the subject fixated one of the two choice options. Only the first and second fixations were included in this analysis, and only if they consisted of one saccade to each picture (n = 1,058 trials). (b) Strength of chosen and unchosen representations on trials with one saccade to the unchosen item, followed by a saccade to the chosen item (n = 550 trials). Plots are aligned to the first and second fixation in each sequence. Lines are mean posterior probabilities ± s.e.m. The chosen representation becomes progressively stronger than the unchosen representation, but this process is not affected by the fixations.

  7. Figure 7: Removing single neurons does not disrupt decoded states.

    (a) An example trial, with choice options of values 3 and 4. Colors indicate the value decoded at each point in time. The top row shows the values decoded from the full ensemble, and the rows beneath show the values decoded when each of eight neurons was held out. (b) For every neuron, a reduced ensemble was created by holding it out. Correlations were calculated between the time series of values decoded from the reduced ensemble and the corresponding full ensemble, and r2 values are shown. (c) The average effects of holding different numbers of neurons out from the full ensemble. Each point is the average r2 from one session, in which the same number of neurons were held out in different combinations. The black line is an exponential curve fit to the distribution. When 100% of neurons were removed, values were decoded from LFP data alone, and the minimum r2 observed was 0.049.
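
    The hold-one-out comparison in (b) reduces to re-running the decoder without each neuron and correlating the resulting label time series with the full-ensemble output. In this sketch, `decode_values` is a placeholder for the paper's LDA pipeline, which is not reproduced here.

    ```python
    # r^2 between values decoded from the full ensemble and from ensembles
    # with one neuron held out (cf. Figure 7b).
    import numpy as np

    def holdout_r2(features, decode_values):
        """features: (n_neurons, n_bins) matrix of neural features;
        decode_values: callable mapping a feature matrix -> (n_bins,) labels."""
        full = decode_values(features)
        r2 = []
        for i in range(features.shape[0]):
            reduced = np.delete(features, i, axis=0)   # hold neuron i out
            held_out = decode_values(reduced)
            r = np.corrcoef(full, held_out)[0, 1]
            r2.append(r ** 2)
        return np.array(r2)
    ```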

  8. Figure 8: Single neurons encode current states.

    Normalized firing rates of single neurons, re-aligned to the onsets of decoded states. (a) A neuron that encoded current states. In each subplot (average firing rates ± s.e.m.), neuron activity was aligned to states in which the right or left picture was decoded from the rest of the population (right or left states, respectively). During right states, this neuron encoded the value of pictures on the right (right value), but not left (left value; first two subplots). During left states, it encoded left values, but not right values. Bottom panels show β coefficients from a multiple regression of firing rate (averaged over 50 ms, stepped by 10 ms) on four factors corresponding to titles on the subplots (for example, right value during right states). Red, multiple regression test, P ≤ 0.01. (b) Percentage of neurons encoding current state value (i.e., right values during right states and left values during left states) (red) and those encoding the value of the alternate picture (i.e., right values during left states and left values during right states) (gray). Significance was defined by non-zero β coefficients in a multiple regression (P ≤ 0.01). Asterisks indicate that red and gray proportions differ (χ2 test, P ≤ 0.01). (c) Across all neurons, β coefficients in the first 100 ms after state onset for right values during right states (right | right) were correlated with left values during left states (left | left) (r2 = 0.55, P = 6 × 10−81; gray line, unity).
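
    The state-conditioned regression in (a) can be sketched as a sliding-window fit of firing rate on four interaction regressors; the window sizes follow the caption, but the array layout and factor construction below are assumptions.

    ```python
    # Regress firing rate on "right value | right state", "left value | right
    # state", "right value | left state" and "left value | left state".
    import numpy as np

    def sliding_betas(rates, right_val, left_val, right_state, left_state):
        """rates: (trials, bins); *_val: (trials,); *_state: (trials, bins) bool."""
        n_trials, n_bins = rates.shape
        betas = np.zeros((n_bins, 4))
        for b in range(n_bins):
            X = np.column_stack([
                np.ones(n_trials),
                right_val * right_state[:, b],   # right value during right states
                left_val * right_state[:, b],    # left value during right states
                right_val * left_state[:, b],    # right value during left states
                left_val * left_state[:, b],     # left value during left states
            ])
            beta, *_ = np.linalg.lstsq(X, rates[:, b], rcond=None)
            betas[b] = beta[1:]
        return betas
    ```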

  9. Supplementary Fig. 1: Recording locations

    OFC recording targets were determined from previously obtained MR images and acute mapping of gray matter–white matter boundaries. Recording sites were bilateral in both subjects, and included areas 11 and 13. The middle image shows the center of the recording chambers in each subject (Subject M AP29, Subject N AP30, relative to the inter-aural line). The most anterior (top) and posterior (bottom) recording locations are also shown. Yellow lines indicate calculated electrode trajectories within each recording field.

  10. Supplementary Fig. 2: Single-neuron encoding of reward information

    (a) The percentage of neurons encoding chosen reward value and chosen reward type significantly increased during choice relative to fixation (χ2 test, * p ≤ 0.05). Unchosen value and type were encoded by <5% of neurons in both epochs, consistent with previous reports (Padoa-Schioppa, C. & Assad, J.A. (2006) Nature 441, 223–226; Hosokawa, T. et al. (2013) J. Neurosci. 33, 17385–17397). Bars show the percent of neurons whose firing rates significantly correlated with each regressor (p ≤ 0.01). (b) For finer temporal resolution, a multiple regression of the chosen and unchosen reward value was performed on bins of 100 ms stepped by 5 ms. Again, there was no evidence that the values of unchosen items were encoded by single neurons above chance levels. Bars show the percent of neurons whose firing rates significantly correlated with the chosen or unchosen regressor at each time in the trial. Chosen values were significantly encoded by OFC neurons, while unchosen values were not.

  11. Supplementary Fig. 3: Decoding pictures from spikes or LFPs

    (a) LDA classification of all 8 task pictures based on single-unit spiking data or LFP. For 8 categories, chance decoding is 12.5%. Picture identity could be decoded from both spikes and LFPs above chance, with peak accuracy approximately 250 ms after stimulus onset, though spikes provided higher accuracy than LFPs. Each line shows mean accuracy ± SEM across 44 behavioral sessions for both subjects. (b) Confusion matrices with colored number blocks referring to ordinal picture values 1 to 4. There were two pictures of each value level, one predicting a primary reward and one predicting a secondary reward. Accurate classifications are along the main diagonal. Classifier confusions of pictures with the same value but different reward types are the off-diagonal shaded squares. Most misclassifications occurred between primary and secondary pictures of the same value level. In these cases, the decoder accurately identified the picture’s reward value but misidentified the reward type (primary or secondary). (c) Percent of trials correctly classified and those classified as the correct value but wrong reward type (primary or secondary) for each subject. Overall, the decoder based on spiking data provided more accurate picture classification. This was largely because the LFP decoder was just as likely to misidentify as correctly identify the reward type. Bars show the mean ± SEM across behavioral sessions for each subject. ** paired t-test p ≤ 0.005, *** p ≤ 0.001.
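
    For readers who want a concrete starting point, the classification step can be approximated with scikit-learn's LDA and cross-validation. This is a toy sketch with synthetic features, not the paper's pipeline; feature extraction from spikes or LFPs is assumed to have happened upstream.

    ```python
    # Cross-validated LDA classification of 8 picture labels (chance = 12.5%).
    import numpy as np
    from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
    from sklearn.model_selection import cross_val_score

    rng = np.random.default_rng(2)
    n_trials, n_features = 400, 30
    labels = rng.integers(0, 8, n_trials)               # 8 pictures
    class_means = rng.normal(size=(8, n_features))      # toy class structure
    X = rng.normal(size=(n_trials, n_features)) + 0.5 * class_means[labels]

    lda = LinearDiscriminantAnalysis()
    acc = cross_val_score(lda, X, labels, cv=10).mean()
    print(f"decoding accuracy = {acc:.2f} (chance = 0.125)")
    # lda.fit(X, labels).predict_proba(X) yields per-class posteriors,
    # analogous to the posterior probabilities used in the main figures.
    ```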

  12. Supplementary Fig. 4: Decoding value from spikes and LFPs

    Since both spiking and LFP decoders identified reward types unreliably, we focused on decoding only value information. (a) LDA classified 8 pictures into 4 value categories, based on single-unit spiking, LFP or both. Peak accuracy was approximately 44% with the spike decoder and 35% with the LFP decoder (chance = 25%). Decoding based on single neurons was more accurate than LFPs, and adding LFPs to single units did not significantly improve accuracy. Each line shows mean accuracy ± SEM across 44 behavioral sessions for both subjects. (b) Scatter plots of peak accuracy from spike or LFP decoding as shown in A for each recording session. Decoding accuracy improves with more simultaneously recorded signals. (c) LDAs classified pictures into value categories as described in A, with decoding based on analytic amplitudes of individual LFP bands. Theta (4–8 Hz) and high gamma (70–200 Hz) provided the best value decoding in both subjects; however, a small amount of information was also contained in other frequency bands.
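
    Band-limited analytic amplitudes of the kind used in (c) are commonly obtained by band-pass filtering and taking the Hilbert envelope. The sketch below assumes a Butterworth filter; the paper's exact filter design is not specified here.

    ```python
    # Analytic amplitude of an LFP in a frequency band (e.g., theta 4-8 Hz).
    import numpy as np
    from scipy.signal import butter, filtfilt, hilbert

    def band_amplitude(lfp, fs, low, high, order=3):
        """lfp: 1-D signal sampled at fs Hz; returns its envelope in [low, high] Hz."""
        b, a = butter(order, [low / (fs / 2), high / (fs / 2)], btype="band")
        return np.abs(hilbert(filtfilt(b, a, lfp)))

    fs = 1000.0
    t = np.arange(0, 2, 1 / fs)
    lfp = np.sin(2 * np.pi * 6 * t) + 0.2 * np.random.randn(t.size)  # toy theta signal
    theta_amp = band_amplitude(lfp, fs, 4, 8)
    high_gamma_amp = band_amplitude(lfp, fs, 70, 200)
    ```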

  13. Supplementary Fig. 5: Comparison of categorical and continuous decoders

    The value assigned to the single pictures is categorical in nature, but the concept of value likely exists on a continuous scale. Therefore, we determined whether a linear model that estimates value on a continuous scale would perform better than the LDA, which attempts to discriminate discrete categories. (a) Each picture value was associated with a different distribution of decoded values, and these distributions were overlapping. Histograms of continuous value estimates decoded with a linear model from single-picture trials in both subjects (n = 22845 trials). Each observation is a trial, and trials were grouped according to the value of the picture the subject was shown. Regressions of decoded value on actual value were highly significant for all sessions (minimum r2 = 0.16, maximum p = 1.32 × 10−20; median r2 = 0.49, median p = 1.7 × 10−79). (b) Confusion matrices from categorical and continuous decoders. The linear model performed similarly to the LDA (48% versus 44%, where chance = 25%), but tended to make different types of errors. The linear model had a greater tendency to confuse adjacent categories but distinguish non-adjacent categories. (c) Table of sensitivities and specificities for decoding each comparison with LDA or a linear model. The first column shows the value comparisons being tested. Red = better performing decoder. We directly assessed each model’s ability to delineate pairs (or triads) of adjacent categories by calculating the sensitivity and specificity of the categorization. Sensitivity is the hit rate (correctly identifying a given category when it is present) and specificity is the correct rejection rate (correctly identifying that a given category is not present). Overall, the linear model performed worse, with lower sensitivities and specificities than the LDA for adjacent categories. Thus, the linear model’s assumption of no categories resulted in a poorer ability to resolve the categorical nature of the training data, and increased the bias toward confusing neighboring values. Given this, we used the categorical LDA to analyze choice trials.
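
    Sensitivity and specificity per value category follow directly from a decoder confusion matrix. The matrix below is a toy example (rows = true categories, columns = decoded categories); only the definitions come from the caption.

    ```python
    # Sensitivity (hit rate) and specificity (correct rejection rate) per category.
    import numpy as np

    def sens_spec(conf, k):
        tp = conf[k, k]
        fn = conf[k].sum() - tp
        fp = conf[:, k].sum() - tp
        tn = conf.sum() - tp - fn - fp
        return tp / (tp + fn), tn / (tn + fp)

    conf = np.array([[40,  8,  1,  1],     # toy confusion matrix
                     [10, 35,  4,  1],
                     [ 2,  9, 33,  6],
                     [ 1,  2,  8, 39]])
    for k in range(4):
        sens, spec = sens_spec(conf, k)
        print(f"value {k + 1}: sensitivity = {sens:.2f}, specificity = {spec:.2f}")
    ```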

  14. Supplementary Fig. 6: Detection of states in synthetic data sets

    Three synthetic data sets were constructed from neural data collected on single-picture trials. Each comprised a neural features × time × trials matrix, and corresponded to a hypothetical model of how neural data might represent the two available options. (a) A schematic illustrating how each data set was created for one choice trial. Blue = value 1, Red = value 4. In the States data set, alternating hidden states were created based on the two choice options, e.g. 1 and 4. The number of states was determined randomly and independently for every trial, but approximated the number of states per trial observed in the real data (mean 4.87, median 5 synthesized states per trial). These states served as the ground truth for this synthetic data set. For every time point, two trials were randomly drawn from the distribution of single-picture trials corresponding to the hidden state at that time, and averaged. This was done repeatedly to create a time series with alternating states. The result was a series of samples drawn from one value distribution for a short period of time, followed by a series drawn from another distribution to create alternating states. In the Averages set, we modeled the situation where all neural features responded by encoding the average of the two option values, such that no hidden states were defined. For each time point, one trial was randomly drawn from the distribution of single-picture trials corresponding to each choice option, and these were averaged, as if each neural feature encoded the mean of the two options. In the Split data set, we modeled the situation where some features responded by encoding option A while others responded by encoding option B. Here, two trials were randomly drawn from the distribution of one of the two choice options and averaged. Which option was “encoded” varied by neural feature, creating a time series of consistent but mixed signals (a code sketch of these constructions follows the caption). The three synthetic data sets were submitted to the same LDA as the real choice data. That is, the LDA was trained on the same training set and classified the synthetic data in the same sliding windows. (b) A representative single trial showing the results of submitting each data set to the LDA used on the real choice data. All three panels show the same trial, but qualitatively the three data sets produced starkly different results. Alternating states were recovered from the States data set, but only low-probability classifications were recovered from the other two sets. (c) The posterior probabilities of the most likely category were averaged over time for each trial in which the options did not have the same values (n = 3782). The States set recovered the states on which the set was built. In most cases, the correct values had high posterior probabilities. On other trials, the correct values were the most probable but by a smaller margin, a result of the random selection of observations serving as input data. In contrast, the Averages and Split sets produced consistently lower probability classifications. Without thresholding the output, the majority of observations from the Averages and Split data sets were classified with probabilities < 0.5 (gray line). (d) For all data sets, states were defined by the criteria in the main paper, and the number of transitions between chosen and unchosen states within each trial was calculated. The real data showed a clear distribution centered around 3.5 state transitions, as described in the main paper.
    The States data largely showed 5 transitions per trial, which was unsurprising, since the set had been generated to do precisely that. Some trials (23.5%) showed no transitions that met the criteria for inclusion. These were primarily trials from sessions with a smaller number of features and poorer classification accuracy in the training set (see Supplementary Fig. 4b), suggesting that this is related to weak value signaling in the input data. We hypothesize that repeatedly sampling from the same distribution, which includes some high-fidelity trials and some low-fidelity trials, and mixing them together in time produces an overall lower-confidence classification, so that some observations fell below the 0.5 cut-off and were thresholded out. For the Averages and Split data sets, a large proportion of trials had zero state transitions (43.5% and 60.4%, respectively). This is in contrast to only 7.8% of trials in the real data. Furthermore, state transitions that did occur in the Averages and Split sets formed a wide, noisy distribution, in contrast to the tighter distribution in the real data. The mean and standard deviation of the number of state transitions per trial, excluding trials with zero transitions, were 3.8 ± 2.2 for the real data, 4.5 ± 1.2 for the States data, 7.1 ± 6.3 for the Averages data, and 6.3 ± 5.9 for the Split data. Pairwise tests for equality of variance found differences for all comparisons (F-tests, all F > 3.5, Bonferroni-corrected p < 0.001), except for the comparison between Split and Averages (F(1910,1340) = 1.12, Bonferroni-corrected p = 0.12). Therefore, even when artifactual states were recovered from the data sets where no states were present, the distribution of transitions was flat, unlike real states or artificially created states. (e) Non-parametric comparison of state transitions per trial across the four data sets found more transitions in the real and States data than in the Averages or Split data. A Kruskal-Wallis test was highly significant (χ2(3) = 1238, p < 0.001), and post-hoc comparison of mean ranks from this test found no differences between the real and States data (Tukey’s HSD, p = 0.98), though both had more transitions than the Averages or Split data sets (Tukey’s HSD, p < 0.001). Each point is the mean rank ± SEM. ** p < 0.001. Overall, these analyses support the notion that neural states do exist in our data and are not a result of noisy decoding by the LDA. We intentionally created data sets with no states but different types of mixed signals, and found that the results do not match the clear signals we recovered from the real data. In contrast, synthetic data with clear states built in bore closer resemblance to the real data.
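
    The following is a hedged sketch of how such synthetic trials could be assembled, assuming `pool[v]` holds single-picture-trial feature matrices (features × bins) for value v; the drawing-and-averaging scheme follows the caption, but the names and structure are ours.

    ```python
    # Build one synthetic "States" trial, plus single bins for the Averages
    # and Split models (cf. Supplementary Fig. 6a).
    import numpy as np

    def make_states_trial(pool, v1, v2, segment_bins, rng):
        """Alternate hidden states v1/v2; per bin, average two random draws
        from the current state's single-picture-trial distribution."""
        out, current, b = [], v1, 0
        for seg in segment_bins:                  # e.g. [15, 12, 18, 15]
            for _ in range(seg):
                i, j = rng.choice(len(pool[current]), size=2)
                out.append((pool[current][i][:, b] + pool[current][j][:, b]) / 2)
                b += 1
            current = v2 if current == v1 else v1
        return np.stack(out, axis=1)              # features x bins

    def make_averages_bin(pool, v1, v2, b, rng):
        """Averages model: every feature encodes the mean of the two options."""
        d1 = pool[v1][rng.integers(len(pool[v1]))][:, b]
        d2 = pool[v2][rng.integers(len(pool[v2]))][:, b]
        return (d1 + d2) / 2

    def make_split_bin(pool, v1, v2, feature_option, b, rng):
        """Split model: each feature consistently 'encodes' one option."""
        out = np.empty(len(feature_option))
        for v in (v1, v2):
            idx = feature_option == v
            i, j = rng.choice(len(pool[v]), size=2)
            out[idx] = (pool[v][i][idx, b] + pool[v][j][idx, b]) / 2
        return out

    rng = np.random.default_rng(4)
    pool = {1: [rng.normal(0, 1, (20, 60)) for _ in range(30)],   # toy trials
            4: [rng.normal(1, 1, (20, 60)) for _ in range(30)]}
    trial = make_states_trial(pool, 1, 4, [15, 12, 18, 15], rng)  # 20 x 60
    ```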

  15. Supplementary Fig. 7: Variance in population vectors corresponds to neural states

    (a) An example session showing the distribution of time bins classified as chosen and unchosen options by LDA, projected into principal component (PC) space. Principal components analysis (PCA) was carried out on multi-dimensional population data without reference to LDA states, as described in Online Methods, and state designations are overlaid. X- and y-axes are two PCs (A and B) and each data point is a time bin. Trials (n = 130) are separated by the options available on each trial. For example, ‘1v1’ indicates a choice between two options of value 1. The + indicates the center of each value distribution across all trials in the session. States identifying each of the available options occupy a different region of PC space, and when two options are available the trajectory travels through each region. (b) Two-dimensional Gaussians fit to those points in A that were identified as each of the four value states by LDA. Color scales are normalized, and the + indicates the center of the distribution. (c) Five representative trials from the session shown in A and B, with the entire trial trajectory shown by the gray line and time bins classified as chosen and unchosen states shown by colored dots. To determine which choice-related factors account for the most variance within a PC, we compared three multiple regression models. The first model included the values of the chosen and unchosen pictures on each trial. This model quantified the variance across trials related to the value of the options on offer, without discriminating any states within a trial. The second model identified whether or not each time point was categorized as belonging to each of the four value states by LDA, with four binary regressors. This was done so that there was no assumption of a relationship between the states. Instead, the model would find variability related to the categorical separation of one value from the rest. The third model looked for variance in the PC attributed to the alternative option, which was not identified by the LDA at that time. That is, the same time points as in the ‘Current States’ model were labeled by four binary regressors, according to the value of the other picture available on that trial. (d) The absolute values of the beta coefficients from the three multiple regression models (option values / current state / alternate state), each attempting to explain variance in each PC. Coefficients were consistently highest for the model based on the value states identified by LDA, and were also highest for lower PCs. These lower PCs account for the most variance in the population vectors themselves, indicating that the model fits prominent temporal features of the data. Each point is the mean ± SEM. (e) PCs were sorted by the percentage of their variance that was explained by each model. The best-explained PC had approximately 16% of its variance accounted for by the current states model, and the best 5 each had >5% of variance explained by this model. The model based on the value of the alternate state explains essentially zero variance and is indistinguishable from the x-axis in this figure. Each point is the mean ± SEM. (f) The same data as shown in e, except that PCs are ordered according to how much variance they explain in the population vectors, illustrating that the dimensions most influenced by LDA state were approximately PCs 5 to 15. The lowest PCs (1–4) tended to capture some elements of ‘drift’ in neural signals across trials that were not associated with task variables.
(g) Formal model comparisons were conducted by calculating the Akaike Information Criterion (AIC) from the deviance of each model, which corrects for different numbers of model parameters. The model with the lowest AIC has the best fit. In nearly all cases, particularly in the lower PCs, the model based on neural states identified by the LDA best fits the data. Overall, the current states model (red) performs the best, especially for the lowest PCs. The alternate states model (gray) is again indistinguishable from the x-axis.
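
    For Gaussian regression models, the AIC comparison in (g) reduces to n·log(SSE/n) + 2k up to an additive constant. This sketch uses toy data; the design matrices below stand in for the models described above.

    ```python
    # Compare models per PC by AIC; lower is better (cf. Supplementary Fig. 7g).
    import numpy as np

    def aic(y, X):
        beta, *_ = np.linalg.lstsq(X, y, rcond=None)
        resid = y - X @ beta
        n, k = X.shape
        return n * np.log(resid @ resid / n) + 2 * k

    rng = np.random.default_rng(3)
    n = 1000
    state = rng.integers(0, 4, n)                    # LDA state label per bin
    pc = state.astype(float) + rng.normal(0, 1, n)   # toy PC score tracking states

    X_states = np.eye(4)[state]                      # four binary state regressors
    X_null = np.ones((n, 1))                         # intercept-only baseline
    print("AIC states:", aic(pc, X_states), "AIC null:", aic(pc, X_null))
    ```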

  16. Supplementary Fig. 8: Unsupervised clustering of population vectors reveals neural states

    In the main paper, we describe the dynamics of neural states as switching back and forth between two available options. A key element of this claim is that neural activity should transiently enter ‘state A’, move to a different state (‘B’), and then return to the original state. To assess this in the population vectors without using LDA, we focused on the five PCs with the highest r2 values for the “Current States” model above (Supplementary Fig. 7e). In these dimensions, the model explained >5% of the overall variance, while the rest were <5%. (a) An example trial from the session shown in Supplementary Figure 7 a-c. In this trial, multiple time bins (colored circles) were labeled by LDA as value 4 (red) or value 1 (blue). Based on the criteria described in the main paper, two value 1 states were identified (x and x’) and two value 4 states were identified (y and y’). The + shows the center of the 1 and 4 distributions across all trials from this session. (b) For all states identified by LDA, we calculated the Mahalanobis distance in 5 dimensions to each of 4 centroids, corresponding to the 4 option values. We used 50-fold cross-validation and ensured that the points being measured did not contribute to the computation of the distribution centers. Labeled states were closest to their respective centers, confirming that LDA and PCA extracted similar information from the population vectors contributing to each analysis. Plotted are the means ± SEM across sessions. (c) For every trial in which the same state was detected more than once, the same 4 Mahalanobis distances were calculated, and the first occurrence of the state was compared to the second (e.g. x versus x’). 10-fold cross-validation ensured that the trial being assessed did not contribute to the distributions it was compared to. Panels show states corresponding to a particular value. Plots are the mean ± SEM across trials. The first and second occurrences of a state tended to fall in the same region of PC space, and there was no evidence that states change systematically over the course of a trial (two-way ANOVAs of occurrence × center: all F(1,347) for occurrence < 0.1, p > 0.7; all F(3,347) for centers > 4, p ≤ 0.006). (d) We also considered states that were interleaved within a trial, in the pattern A-B-A (n = 11720; each trial could have multiple interleaved patterns). If states are discrete and consistent, the population vectors should look similar in non-contiguous A states, but different in interleaved B states. For each sequence, the Mahalanobis distances to the initial state, A, were calculated. There was no difference between A states, which were both very close to the A center, but the interleaved B state was significantly farther away (one-way ANOVA, F(2,35157) = 237, p = 6 × 10−103; Tukey’s HSD post-hoc comparisons: A vs. A’ p = 0.94, A vs. B and A’ vs. B p < 0.0001). This is consistent with neural activity switching back and forth between states within a trial, as described in the main paper. The plot shows mean ± SEM. * p < 0.0001. (e) We also tested whether prominent features of population-level variance, extracted without reference to predefined states (i.e., in an unsupervised manner), map onto the LDA states. The 5 PCs that were best predicted by the current states model illustrated in Supplementary Figure 7 were separated with k-means into 5 clusters, under the assumption that there would be four value distributions and a poorly classified noise cluster.
    The clusters were renumbered for each session according to their highest-probability value category to allow comparison across sessions. If the clusters mapped onto states, one value state from the LDA would be represented more than others in a given cluster. The median (across sessions) percent of time bins in each cluster labeled by the LDA as belonging to each state is shown in the first heat plot. Cluster 1 clearly corresponded to value 1, clusters 3 and 4 were reasonably selective for values 2 and 3, respectively, and cluster 5 mostly isolated value 4 with some value 3. Further, the “confusions” of these assignments tended to occur with neighboring state values, indicating that, even if the boundaries vary by analysis or if different sessions would be better fit by a different number of clusters, both approaches extracted similar information from the data. The median percent of time bins in each cluster that came from trials with each chosen and unchosen option value are shown in the second and third heat plots. Here, clusters were evenly distributed across the trial types, such that the proportion of values associated with a cluster simply reflected the overall frequency of those values across trials (the asymmetry reflects the fact that animals most frequently chose higher values and most frequently did not choose lower values).
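
    Cross-validated Mahalanobis distances of the kind used in (b-d) can be sketched as follows; the pooled covariance and the exclusion of held-out points are our assumptions about implementation detail, not a reproduction of the paper's code.

    ```python
    # Distance from a held-out point to each value centroid in PC space.
    import numpy as np

    def mahalanobis(x, mean, cov_inv):
        d = x - mean
        return float(np.sqrt(d @ cov_inv @ d))

    def cv_distances(points, labels, x, n_values=4):
        """points: (n, d) PC coordinates excluding x (caller enforces the
        cross-validation split); labels: (n,) value states 0..n_values-1."""
        cov_inv = np.linalg.inv(np.cov(points, rowvar=False))  # pooled covariance
        return [mahalanobis(x, points[labels == v].mean(axis=0), cov_inv)
                for v in range(n_values)]
    ```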

  17. Supplementary Fig. 9: Decoded representations predict within-condition choice times

    The strength of chosen (red) and unchosen (blue) representations was averaged in 20-ms bins, stepped forward by 5 ms, and used to predict choice times. (a) Each pair of picture values was analyzed separately with multiple regressions, and β coefficients are plotted. Time bins significant at p ≤ 0.05 (uncorrected) are highlighted. Most comparisons show a trend toward negative correlations between choice times and the strength of chosen representations, and positive correlations between choice times and unchosen representations. (b) The six curves in A were aggregated to compute statistical significance. The plot shows the mean beta values (fill = ± SEM; n = 6). Significance was calculated by aggregating the mean squared error (MSE) from all trials. The value √(MSE/n) is the standard error of the mean beta coefficient, and was used to compute a t-statistic and its 2-sided significance level. Shaded blocks in the plot show regions where the aggregate beta for chosen (red) or unchosen (blue) strength was significantly different from zero (p ≤ 0.05 sustained for at least six consecutive time bins, to correct for multiple comparisons). For this analysis, trials with very short choice times (<170 ms) were excluded to reduce the positive skew in the distributions when power was reduced (n = 2795 trials).
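
    The aggregation step in (b) can be written compactly; the degrees of freedom and the pooling of MSEs below are our assumptions about unstated details.

    ```python
    # Two-sided t-test of the mean beta across conditions, using sqrt(MSE/n)
    # as the standard error of the mean coefficient (per the caption).
    import numpy as np
    from scipy import stats

    def aggregate_beta_test(betas, mses, n_trials):
        """betas: per-condition coefficients; mses: per-condition MSEs."""
        mean_beta = np.mean(betas)
        se = np.sqrt(np.mean(mses) / n_trials)       # sqrt(MSE/n)
        t = mean_beta / se
        p = 2 * stats.t.sf(abs(t), df=n_trials - 1)  # df is an assumption
        return mean_beta, t, p
    ```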

  18. Supplementary Fig. 10: Quick and deliberative decisions, separately by subject

    (a) The average representational strength of the chosen picture (red), the unchosen picture (blue) or one of the unavailable options (averaged across two unavailable options; green) is shown for quick and deliberative decisions as in Figure 5, but separated by subject (Subject M: n = 272 trials, Subject N: n = 166 trials). Both subjects show similar effects. When a quick decision is made, representations of the chosen item dominate, but when subjects deliberate, chosen and unchosen representations have similar strength. (b) Chosen and unchosen representations as in a, but aligned to the time of the choice. Chosen representations do not increase in strength at the time of the choice during deliberation trials. All plots show mean ± SEM.

References

  1. Padoa-Schioppa, C. Neurobiology of economic choice: a good-based model. Annu. Rev. Neurosci. 34, 333–359 (2011).
  2. Rangel, A., Camerer, C. & Montague, P.R. A framework for studying the neurobiology of value-based decision making. Nat. Rev. Neurosci. 9, 545–556 (2008).
  3. Rushworth, M.F., Mars, R.B. & Summerfield, C. General mechanisms for making decisions? Curr. Opin. Neurobiol. 19, 75–83 (2009).
  4. Krajbich, I., Armel, C. & Rangel, A. Visual fixations and the computation and comparison of value in simple choice. Nat. Neurosci. 13, 1292–1298 (2010).
  5. Dai, J. & Busemeyer, J.R. A probabilistic, dynamic, and attribute-wise model of intertemporal choice. J. Exp. Psychol. Gen. 143, 1489–1514 (2014).
  6. Wallis, J.D. Cross-species studies of orbitofrontal cortex and value-based decision-making. Nat. Neurosci. 15, 13–19 (2012).
  7. Rudebeck, P.H. & Murray, E.A. The orbitofrontal oracle: cortical mechanisms for the prediction and evaluation of specific behavioral outcomes. Neuron 84, 1143–1156 (2014).
  8. Jones, J.L. et al. Orbitofrontal cortex supports behavior and learning using inferred but not cached values. Science 338, 953–956 (2012).
  9. Fellows, L.K. Orbitofrontal contributions to value-based decision making: evidence from humans with frontal lobe damage. Ann. NY Acad. Sci. 1239, 51–58 (2011).
  10. Padoa-Schioppa, C. Neuronal origins of choice variability in economic decisions. Neuron 80, 1322–1336 (2013).
  11. Doya, K. Modulators of decision making. Nat. Neurosci. 11, 410–416 (2008).
  12. Sugrue, L.P., Corrado, G.S. & Newsome, W.T. Matching behavior and the representation of value in the parietal cortex. Science 304, 1782–1787 (2004).
  13. Lak, A. et al. Orbitofrontal cortex is required for optimal waiting based on decision confidence. Neuron 84, 190–201 (2014).
  14. Shimojo, S., Simion, C., Shimojo, E. & Scheier, C. Gaze bias both reflects and influences preference. Nat. Neurosci. 6, 1317–1322 (2003).
  15. Bouret, S. & Richmond, B.J. Ventromedial and orbital prefrontal neurons differentially encode internally and externally driven motivational values in monkeys. J. Neurosci. 30, 8591–8601 (2010).
  16. Churchland, M.M., Yu, B.M., Sahani, M. & Shenoy, K.V. Techniques for extracting single-trial activity patterns from large-scale neural recordings. Curr. Opin. Neurobiol. 17, 609–618 (2007).
  17. Grattan, L.E. & Glimcher, P.W. Absence of spatial tuning in the orbitofrontal cortex. PLoS One 9, e112750 (2014).
  18. Rich, E.L. & Wallis, J.D. Medial-lateral organization of the orbitofrontal cortex. J. Cogn. Neurosci. 26, 1347–1362 (2014).
  19. Wallis, J.D. & Miller, E.K. Neuronal activity in primate dorsolateral and orbital prefrontal cortex during performance of a reward preference task. Eur. J. Neurosci. 18, 2069–2081 (2003).
  20. Hosokawa, T., Kennerley, S.W., Sloan, J. & Wallis, J.D. Single-neuron mechanisms underlying cost-benefit analysis in frontal cortex. J. Neurosci. 33, 17385–17397 (2013).
  21. Padoa-Schioppa, C. & Assad, J.A. Neurons in the orbitofrontal cortex encode economic value. Nature 441, 223–226 (2006).
  22. Rustichini, A. & Padoa-Schioppa, C. A neuro-computational model of economic decisions. J. Neurophysiol. 114, 1382–1398 (2015).
  23. Hunt, L.T. et al. Mechanisms underlying cortical activity during value-guided choice. Nat. Neurosci. 15, 470–476 (2012).
  24. Jocham, G., Hunt, L.T., Near, J. & Behrens, T.E. A mechanism for value-guided choice based on the excitation-inhibition balance in prefrontal cortex. Nat. Neurosci. 15, 960–961 (2012).
  25. Wang, X.J. Probabilistic decision making by slow reverberation in cortical circuits. Neuron 36, 955–968 (2002).
  26. Usher, M. & McClelland, J.L. The time course of perceptual choice: the leaky, competing accumulator model. Psychol. Rev. 108, 550–592 (2001).
  27. Georgopoulos, A.P., Schwartz, A.B. & Kettner, R.E. Neuronal population coding of movement direction. Science 233, 1416–1419 (1986).
  28. Fukushima, M., Saunders, R.C., Leopold, D.A., Mishkin, M. & Averbeck, B.B. Differential coding of conspecific vocalizations in the ventral auditory cortical stream. J. Neurosci. 34, 4665–4676 (2014).
  29. Willett, F.R., Suminski, A.J., Fagg, A.H. & Hatsopoulos, N.G. Improving brain-machine interface performance by decoding intended future movements. J. Neural Eng. 10, 026011 (2013).
  30. Lee, A.K. & Wilson, M.A. Memory of sequential experience in the hippocampus during slow wave sleep. Neuron 36, 1183–1194 (2002).
  31. Tremblay, S., Pieper, F., Sachs, A. & Martinez-Trujillo, J. Attentional filtering of visual information by neuronal ensembles in the primate lateral prefrontal cortex. Neuron 85, 202–215 (2015).
  32. Johnson, A. & Redish, A.D. Neural ensembles in CA3 transiently encode paths forward of the animal at a decision point. J. Neurosci. 27, 12176–12189 (2007).
  33. Jezek, K., Henriksen, E.J., Treves, A., Moser, E.I. & Moser, M.B. Theta-paced flickering between place-cell maps in the hippocampus. Nature 478, 246–249 (2011).
  34. Seidemann, E., Meilijson, I., Abeles, M., Bergman, H. & Vaadia, E. Simultaneously recorded single units in the frontal cortex go through sequences of discrete and stable states in monkeys performing a delayed localization task. J. Neurosci. 16, 752–768 (1996).
  35. Cai, X. & Padoa-Schioppa, C. Contributions of orbitofrontal and lateral prefrontal cortices to economic choice and the good-to-action transformation. Neuron 81, 1140–1151 (2014).
  36. Hunt, L.T., Behrens, T.E., Hosokawa, T., Wallis, J.D. & Kennerley, S.W. Capturing the temporal evolution of choice across prefrontal cortex. Elife 4, e11945 (2015).
  37. McDannald, M.A. et al. Model-based learning and the contribution of the orbitofrontal cortex to the model-free world. Eur. J. Neurosci. 35, 991–996 (2012).
  38. Barron, H.C., Dolan, R.J. & Behrens, T.E. Online evaluation of novel choices by simultaneous representation of multiple memories. Nat. Neurosci. 16, 1492–1498 (2013).
  39. Takahashi, Y.K. et al. Neural estimates of imagined outcomes in the orbitofrontal cortex drive behavior and learning. Neuron 80, 507–518 (2013).
  40. Wilson, R.C., Takahashi, Y.K., Schoenbaum, G. & Niv, Y. Orbitofrontal cortex as a cognitive map of task space. Neuron 81, 267–279 (2014).
  41. Schoenbaum, G., Takahashi, Y., Liu, T.L. & McDannald, M.A. Does the orbitofrontal cortex signal value? Ann. NY Acad. Sci. 1239, 87–99 (2011).
  42. Luk, C.H. & Wallis, J.D. Choice coding in frontal cortex during stimulus-guided or action-guided decision-making. J. Neurosci. 33, 1864–1871 (2013).
  43. Strait, C.E. et al. Neuronal selectivity for spatial position of offers and choices in five reward regions. J. Neurophysiol. 115, 1098–1111 (2016).
  44. Lara, A.H. & Wallis, J.D. Executive control processes underlying multi-item working memory. Nat. Neurosci. 17, 876–883 (2014).
  45. Farovik, A. et al. Orbitofrontal cortex encodes memories within value-based schemas and represents contexts that guide memory retrieval. J. Neurosci. 35, 8333–8344 (2015).
  46. McKenzie, S. et al. Hippocampal representation of related and opposing memories develop within distinct, hierarchically organized neural schemas. Neuron 83, 202–215 (2014).
  47. Carmichael, S.T. & Price, J.L. Limbic connections of the orbital and medial prefrontal cortex in macaque monkeys. J. Comp. Neurol. 363, 615–641 (1995).
  48. Carmichael, S.T. & Price, J.L. Sensory and premotor connections of the orbital and medial prefrontal cortex of macaque monkeys. J. Comp. Neurol. 363, 642–664 (1995).
  49. Buesing, L., Bill, J., Nessler, B. & Maass, W. Neural dynamics as sampling: a model for stochastic computation in recurrent networks of spiking neurons. PLoS Comput. Biol. 7, e1002211 (2011).
  50. Habenschuss, S., Jonke, Z. & Maass, W. Stochastic computations in cortical microcircuit models. PLoS Comput. Biol. 9, e1003311 (2013).
  51. Asaad, W.F. & Eskandar, E.N. A flexible software tool for temporally-precise behavioral control in Matlab. J. Neurosci. Methods 174, 245–258 (2008).
  52. Lara, A.H., Kennerley, S.W. & Wallis, J.D. Encoding of gustatory working memory by orbitofrontal neurons. J. Neurosci. 29, 765–774 (2009).
  53. Boulay, C.B., Pieper, F., Leavitt, M., Martinez-Trujillo, J. & Sachs, A.J. Single-trial decoding of intended eye movement goals from lateral prefrontal cortex neural ensembles. J. Neurophysiol. 115, 486–499 (2016).
  54. Averbeck, B.B., Crowe, D.A., Chafee, M.V. & Georgopoulos, A.P. Neural activity in prefrontal cortex during copying geometrical shapes. II. Decoding shape segments from neural ensembles. Exp. Brain Res. 150, 142–153 (2003).
  55. Pesaran, B., Pezaris, J.S., Sahani, M., Mitra, P.P. & Andersen, R.A. Temporal structure in neuronal activity during working memory in macaque parietal cortex. Nat. Neurosci. 5, 805–811 (2002).
  56. Crowe, D.A. et al. Prefrontal neurons transmit signals to parietal neurons that reflect executive control of cognition. Nat. Neurosci. 16, 1484–1491 (2013).


Author information

Affiliations

  1. Helen Wills Neuroscience Institute, University of California at Berkeley, Berkeley, California, USA.

    • Erin L Rich &
    • Jonathan D Wallis
  2. Department of Psychology, University of California at Berkeley, Berkeley, California, USA.

    • Jonathan D Wallis

Contributions

E.L.R. and J.D.W. designed the experiment and wrote the manuscript. E.L.R. collected and analyzed the data. J.D.W. supervised the project.

Competing financial interests

The authors declare no competing financial interests.


Supplementary information

Supplementary Figures

  1. Supplementary Figure 1: Recording locations (372 KB)

    OFC recording targets were determined from previously obtained MR images, and acute mapping of gray matter - white matter boundaries. Recording sites were bilateral in both subjects, and included areas 11 and 13. The middle image shows the center of the recording chambers in each subject (Subject M AP29, Subject N AP30, relative to the inter-aural line). The most anterior (top) and posterior (bottom) recording locations are also shown. Yellow lines indicate calculated electrode trajectories within each recording field.

  2. Supplementary Figure 2: Single-neuron encoding of reward information (101 KB)

    (a) Chosen reward value and chosen reward type significantly increased during choice relative to fixation (χ2 test, * p ≤ 0.05). Unchosen value and type were encoded by <5% of neurons in both epochs, consistent with previous reports (Padoa-Schioppa, C. and Assad, J.A. (2006) Nature 441, 223-226; Hosokawa, T., et al. (2013) J Neurosci 33,17385-17397). Bars show the percent of neurons whose firing rate significantly correlates with each regressor (p ≤ 0.01). (b) For finer temporal resolution, a multiple regression of the chosen and unchosen reward value was performed on bins of 100 ms stepped by 5 ms. Again, there was no evidence that the values of unchosen items were encoded by single neurons above chance levels. Bars show the percent of neurons whose firing rates significantly correlated with the chosen or unchosen regressor at each time in the trial. Chosen values were significantly encoded by OFC neurons, while unchosen values were not.

  3. Supplementary Figure 3: Decoding pictures from spikes or LFPs (135 KB)

    (a) LDA classification of all 8 task pictures based on single unit spiking data or LFP. For 8 categories, chance decoding is 12.5%. Picture identity could be decoded from both spikes and LFPs above chance, with peak accuracy approximately 250 ms after stimulus onset, though spikes provided higher accuracy than LFPs. Each line shows mean accuracy ±SEM across 44 behavioral sessions for both subjects. Pictures were decoded from both spikes and LFPs above chance. (b) Confusion matrices with colored number blocks referring to ordinal picture value 1 to 4. There were two pictures of each value level, one predicting a primary reward and one predicting a secondary reward. Accurate classifications are along the main diagonal. Classifier confusions of pictures with the same value but different reward types are the off-diagonal shaded squares. Most misclassifications occurred between primary and secondary pictures of the same value level. In these cases, the decoder accurately identified the picture’s reward value but misidentified the reward type (primary or secondary). (c) Percent of trials correctly classified and those classified as the correct value but wrong reward type (primary or secondary) for each subject. Overall, the decoder based on spiking data provided more accurate picture classification. This was largely because the LFP decoder was just as likely to misidentify as correctly identify the reward type. Bars show the mean ± SEM across behavioral sessions for each subject. ** paired t-test p ≤ 0.005, *** p ≤ 0.001.

  4. Supplementary Figure 4: Decoding value from spikes and LFPs (238 KB)

    Since both spiking and LFP decoders identified reward types unreliably, we focused on decoding only value information. (a) LDA classified 8 pictures into 4 value categories, based on single unit spiking, LFP or both. Peak accuracy was approximately 44% with the spike decoder and 35% with the LFP decoder (chance = 25%). Decoding based on single neurons was more accurate than LFPs, and adding LFPs to single units did not significantly improve accuracy. Each line shows mean accuracy ±SEM across 44 behavioral sessions for both subjects. (b) Scatter plots of peak accuracy from spike or LFP decoding as shown in A for each recording session. Decoding accuracy improves with more simultaneously recorded signals. (c) LDAs classified pictures into value categories as described in A, with decoding based on analytic amplitudes of individual LFP bands. Theta (4-8 Hz) and high gamma (70-200 Hz) provided the best value decoding in both subjects; however a small amount of information was also contained in other frequency bands.

  5. Supplementary Figure 5: Comparison of categorical and continuous decoders (167 KB)

    The value assigned to the single pictures is categorical in nature, but the concept of value likely exists on a continuous scale. Therefore, we determined whether a linear model that estimates value on a continuous scale would perform better than the LDA, which attempts to discriminate discrete categories. (a) Each picture value was associated with a different distribution of decoded values, and these distributions were overlapping. Histograms of continuous value estimates decoded with a linear model from single picture trials in both subjects (n = 22845 trials). Each observation is a trial, and trials were grouped according to the value of the picture the subject was shown. Regressions of decoded value on actual value were highly significant for all sessions (minimum r2 = 0.16, maximum p = 1.32 x 10-20; median r2 = 0.49, median p = 1.7 × 10-79). (b) Confusion matrices from categorical and continuous decoders. The linear model performed similarly to the LDA (48% versus 44%, where chance = 25%), but tended to make different types of errors. The linear model had a greater tendency to confuse adjacent categories but distinguish non-adjacent categories. (c) Table of sensitivities and specificities for decoding each comparison with LDA or a linear model. The first column shows the value comparisons being tested. Red = better performing decoder. We directly assessed each model’s ability to delineate pairs (or triads) of adjacent categories by calculating the sensitivity and specificity of the categorization. Sensitivity is the hit rate (correctly identifying a given category when it is present) and specificity is the correct rejection rate (correctly identifying a given category is not present). Overall, the linear model performed worse, with lower sensitivities and specificities than the LDA for adjacent categories. Thus, the linear model’s assumption of no categories resulted in a poorer ability to resolve the categorical nature of the training data, and increases the bias toward confusing neighboring values. Given this, we used the categorical LDA to analyze choice trials.

  6. Supplementary Figure 6: Detection of states in synthetic data sets (222 KB)

    Three synthetic data sets were constructed from neural data collected on single picture trials. Each was comprised of a neural features x time x trials matrix, and corresponded to a hypothetical model of how neural data might represent the two available options. (a) A schematic illustrating how each data set was created for one choice trial. Blue = value 1, Red = value 4. In the States data set, alternating hidden states were created based on two choice options, e.g. 1 and 4. The number of states was determined randomly and independently for every trial, but approximated the number of states per trial observed in the real data (mean 4.87, median 5 synthesized states per trial). These states served as the ground truth for this synthetic data set. For every time point, two trials were randomly drawn from the distribution of single picture trials corresponding to the hidden state at that time, and averaged. This was done repeatedly to create a time series with alternating states. The result was a series of samples drawn from one value distribution for a short period of time, followed by a series drawn from another distribution to create alternating states. In the Averages set, we modeled the situation where all neural features responded by encoding the average of the two option values, such that no hidden states were defined. For each time point, one trial was randomly drawn from the distribution of single picture trials corresponding to the each choice option and averaged, as if each neural feature encoded the mean of the two options. In the Split data set, we modeled the situation where some features responded by encoding option A while others responded by encoding option B. Here, two trials were randomly drawn from the distribution of one of the two choice options and averaged. Which option was “encoded” varied by neural feature, creating a time series of consistent but mixed signals. The three synthetic data sets were submitted to the same LDA as the real choice data. That is, the LDA was trained on the same training set and classified the synthetic data in the same sliding windows. (b) A representative single trial showing the results of submitting each data set to the LDA used on the real choice data. All three panels show the same trial, but qualitatively the three data sets produced starkly different results. Alternating states were recovered from the States data set, but only low probability classifications were recovered from the other two sets. (c) The posterior probabilities of the most likely category were averaged over time for each trial in which the options did not have the same values (n = 3782). The States set recovered the states on which the set was built. In most cases, the correct values had high posterior probabilities. On other trials, the correct values were the most probable but by a smaller margin, a result of the random selection of observations serving as input data. In contrast, the Averages and Split sets produced consistently lower probability classifications. Without thresholding the output, the majority of observations from the Averages and Split data sets were classified with probabilities < 0.5 (gray line). (d) For all data sets, states were defined by criteria in the main paper, and the number of transitions between chosen and unchosen states within each trial was calculated. The real data showed a clear distribution centered around 3.5 state transitions, as described in the main paper. 
The States data largely showed 5 transitions per trial, which was unsurprising, since the set had been generated to do precisely that. Some trials (23.5%) showed no transitions that met the criteria for inclusion. These were primarily trials from sessions with a smaller number of features and poorer classification accuracy in the training set (see Supplementary Fig. 4B), suggesting that this is related to weak value signaling in the input data. We hypothesize that repeatedly sampling from the same distribution that includes some high-fidelity trials and some low-fidelity trials, and mixing them together in time produces an overall lower confidence classification, so that some observations fell below the 0.5 cut-off and were thresholded out. For the Averages and Split data sets, a large proportion of trials found zero state transitions (43.5% and 60.4% respectively). This is in contrast to only 7.8% of trials in the real data. Furthermore, state transitions that did occur in the Averages and Split sets formed a wide, noisy distribution, in contrast to the tighter distribution in the real data. The mean and standard deviation of the number of state transitions per trial excluding trials with zero transitions was 3.8 ± 2.2 for the real data, 4.5 ± 1.2 for the States data, 7.1 ± 6.3 for the Averages data, and 6.3 ± 5.9 for the Split data. Pairwise tests for equality of variance found differences for all comparisons (F-tests, all F > 3.5, Bonferroni corrected p < 0.001), except for the comparison between Split and Averages (F1910,1340 = 1.12, Bonferroni corrected p = 0.12). Therefore, even when artifactual states were recovered from the data sets where no states were present, the distribution of transitions was flat, unlike real states or artificially created states. (e) Non-parametric comparison of state transitions per trial across the four data sets found more transitions in the real and States data than Averages or Split. A Kruskal-Wallis test was highly significant (χ2(3) = 1238, p < 0.001), and post-hoc comparison of mean ranks from this test found no differences between real and States data (Tukey’s HSD p = 0.98), though both had more transitions than the Averages or Split data sets (Tukey’s HSD, p < 0.001). Each point is the mean rank ± SEM. ** p < 0.001. Overall, these analyses support the notion that neural states do exist in our data and are not a result of noisy decoding by the LDA. We intentionally created data sets with no states but different types of mixed signals, and found that the result does not match the clear signals we recovered from the real data. In contrast, synthetic data with clear states built in bore closer resemblance to the real data.

  7. Supplementary Figure 7: Variance in population vectors corresponds to neural states

    (a) An example session showing the distribution of time bins classified as chosen and unchosen options by LDA, projected into principal component (PC) space. Principal components analysis (PCA) was carried out on multi-dimensional population data without reference to LDA states, as described in the Online Methods, and state designations are overlaid. The x and y axes are two PCs (A and B) and each data point is a time bin. Trials (n = 130) are separated by the options available on each trial; for example, ‘1v1’ indicates a choice between two options of value 1. The + indicates the center of each value distribution across all trials in the session. States identifying each of the available options occupy different regions of PC space, and when two options are available the trajectory travels through each region. (b) Two-dimensional Gaussians fit to those points in a that were identified as each of the four value states by LDA. Color scales are normalized, and the + indicates the center of the distribution. (c) Five representative trials from the session shown in a and b, with the entire trial trajectory shown by the gray line and time bins classified as chosen and unchosen states shown by colored dots. To determine which choice-related factors account for the most variance within a PC, we compared three multiple regression models. The first model included the values of the chosen and unchosen pictures on each trial; it quantified the variance across trials related to the value of the options on offer, without discriminating any states within a trial. The second model indicated, with four binary regressors, whether each time point was categorized by LDA as belonging to each of the four value states. This made no assumption about the relationship between the states; instead, the model captured variability related to the categorical separation of one value from the rest. The third model looked for variance in the PC attributable to the alternative option, which was not identified by the LDA at that time; that is, the same time points as in the ‘Current States’ model were labeled by four binary regressors according to the value of the other picture available on that trial. (d) The absolute value of the beta coefficients from the three multiple regression models (option values / current state / alternate state), each attempting to explain variance in each PC. Coefficients were consistently highest for the model based on the value states identified by LDA, and were also highest for the lower PCs. These lower PCs account for the most variance in the population vectors themselves, indicating that the model fits prominent temporal features of the data. Each point is the mean ± SEM. (e) PCs were sorted by the percentage of their variance that was explained by each model. The best-explained PC had approximately 16% of its variance accounted for by the current states model, and the best five each had >5% of their variance explained by this model. The model based on the value of the alternate state explained essentially zero variance and is indistinguishable from the x-axis in this figure. Each point is the mean ± SEM. (f) The same data as in e, except that PCs are ordered according to how much variance they explain in the population vectors, illustrating that the dimensions most influenced by LDA state were approximately PCs 5 to 15. The lowest PCs (1–4) tended to capture elements of ‘drift’ in neural signals across trials that were not associated with task variables.
(g) Formal model comparisons were conducted by calculating the Akaike Information Criterion (AIC) from the deviance of each model, which corrects for different numbers of model parameters; the model with the lowest AIC has the best fit. In nearly all cases, and particularly for the lower PCs, the current states model (red), based on the neural states identified by the LDA, fit the data best. The alternate states model (gray) is again indistinguishable from the x-axis. (A code sketch of this model comparison follows this legend.)
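
    As a rough illustration of panels d and g, the sketch below fits one ordinary-least-squares model per design matrix and scores it with an AIC computed from the Gaussian deviance. The variable names (pc, X_option_values, X_current_state, X_alternate_state) are placeholders for one PC's time series and the three design matrices; none of this is the authors' code.

        import numpy as np

        def ols_aic(X, y):
            # Fit OLS and return the |beta| coefficients plus an AIC from the
            # residual sum of squares (Gaussian deviance, up to a constant).
            X1 = np.column_stack([np.ones(len(y)), X])   # add an intercept column
            beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
            rss = np.sum((y - X1 @ beta) ** 2)
            n, k = len(y), X1.shape[1]
            aic = n * np.log(rss / n) + 2 * k
            return np.abs(beta[1:]), aic

        # Hypothetical usage, one PC at a time:
        # betas_val, aic_val = ols_aic(X_option_values, pc)    # chosen & unchosen values
        # betas_cur, aic_cur = ols_aic(X_current_state, pc)    # 4 binary current-state regressors
        # betas_alt, aic_alt = ols_aic(X_alternate_state, pc)  # 4 binary alternate-state regressors
        # best = min([('values', aic_val), ('current', aic_cur), ('alternate', aic_alt)],
        #            key=lambda m: m[1])                       # lowest AIC wins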

  8. Supplementary Figure 8: Unsupervised clustering of population vectors reveals neural states

    In the main paper, we describe the dynamics of neural states as switching back and forth between two available options. A key element of this claim is that neural activity should transiently enter ‘state A’, move to a different state (‘B’), and then return to the original state. To assess this in the population vectors without using LDA, we focused on the five PCs with the highest r2 values for the “Current States” model above (Supplementary Fig. 7e). In these dimensions the model explained >5% of the overall variance, whereas in the rest it explained <5%. (a) An example trial from the session shown in Supplementary Figure 7a–c. In this trial, multiple time bins (colored circles) were labeled by LDA as value 4 (red) or value 1 (blue). Based on the criteria described in the main paper, two value 1 states were identified (x and x’) and two value 4 states were identified (y and y’). The + shows the centers of the value 1 and value 4 distributions across all trials from this session. (b) For all states identified by LDA, we calculated the Mahalanobis distance in five dimensions to each of four centroids, corresponding to the four option values (see the first code sketch following this legend). We used 50-fold cross-validation and ensured that the points being measured did not contribute to the computation of the distribution centers. Labeled states were closest to their respective centers, confirming that LDA and PCA extracted similar information from the population vectors contributing to each analysis. Plotted are the means ± SEM across sessions. (c) For every trial in which the same state was detected more than once, the same four Mahalanobis distances were calculated, and the first occurrence of the state was compared to the second (e.g., x versus x’). 10-fold cross-validation ensured that the trial being assessed did not contribute to the distributions it was compared to. Panels show states corresponding to a particular value. Plots are the mean ± SEM across trials. The first and second occurrences of a state tended to fall in the same region of PC space, and there was no evidence that states changed systematically over the course of a trial (2-way ANOVAs of occurrence × center: all F(occurrence)1,347 < 0.1, p > 0.7; all F(centers)3,347 > 4, p ≤ 0.006). (d) We also considered states that were interleaved within a trial, in the pattern A-B-A (n = 11720; each trial could contain multiple interleaved patterns). If states are discrete and consistent, the population vectors should look similar in non-contiguous A states but different in interleaved B states. For each sequence, the Mahalanobis distances to the initial state, A, were calculated. There was no difference between A states, which were both very close to the A center, but the interleaved B state was significantly farther away (one-way ANOVA, F2,35157 = 237, p = 6 × 10−103; Tukey’s HSD post-hoc comparisons: A vs. A’, p = 0.94; A vs. B and A’ vs. B, p < 0.0001). This is consistent with neural activity switching back and forth between states within a trial, as described in the main paper. The plot shows mean ± SEM. * p < 0.0001. (e) We also tested whether prominent features of population-level variance, extracted without reference to predefined states (i.e., in an unsupervised manner), map onto the LDA states. The five PCs best predicted by the current states model illustrated in Supplementary Figure 7 were separated with k-means into five clusters, under the assumption that there would be four value distributions and a poorly classified noise cluster (see the second code sketch following this legend).
The clusters were renumbered for each session according to their highest-probability value category to allow comparison across sessions. If the clusters mapped onto states, one value state from the LDA should be represented more than the others in a given cluster. The median (across sessions) percentage of time bins in each cluster labeled by the LDA as belonging to each state is shown in the first heat plot. Cluster 1 clearly corresponded to value 1; clusters 3 and 4 were reasonably selective for values 2 and 3, respectively; and cluster 5 mostly isolated value 4, with some value 3. Further, the “confusions” in these assignments tended to occur between neighboring state values, indicating that, even if the boundaries vary by analysis or if different sessions would be better fit by a different number of clusters, both approaches extracted similar information from the data. The median percentage of time bins in each cluster that came from trials with each chosen and unchosen option value is shown in the second and third heat plots. Here, clusters were evenly distributed across the trial types, such that the proportion of values associated with a cluster simply reflected the overall frequency of those values across trials (the asymmetry reflects the fact that animals most frequently chose higher values and most frequently did not choose lower values).
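
    First sketch: the cross-validated Mahalanobis comparison in panels b–d could look roughly as follows. The function and variable names are illustrative assumptions; held_out_points stands for the 5-D PC coordinates of all time bins labeled with the A value, excluding the trial being measured.

        import numpy as np

        def centroid_stats(pts):
            # Mean and (pseudo-)inverse covariance of one value distribution in 5-D PC space.
            mu = pts.mean(axis=0)
            cov_inv = np.linalg.pinv(np.cov(pts, rowvar=False))
            return mu, cov_inv

        def mahalanobis(x, mu, cov_inv):
            d = x - mu
            return float(np.sqrt(d @ cov_inv @ d))

        def aba_distances(seq_a, seq_b, seq_a2, held_out_points):
            # Distances of the A, B and A' states of one A-B-A sequence to the A centroid,
            # with the measured trial held out of the centroid estimate (cross-validation).
            mu, cov_inv = centroid_stats(held_out_points)
            return (mahalanobis(seq_a.mean(axis=0), mu, cov_inv),    # A  -> A center
                    mahalanobis(seq_b.mean(axis=0), mu, cov_inv),    # B  -> A center
                    mahalanobis(seq_a2.mean(axis=0), mu, cov_inv))   # A' -> A center

    Under the states account, the first and third distances should be small and similar, and the middle one larger, as in panel d.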
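
    Second sketch: the unsupervised check in panel e, using scikit-learn's KMeans. Here pc_scores (time bins × 5 PCs) and lda_labels (one LDA value label per time bin) are assumed inputs, and the exact preprocessing is an assumption.

        import numpy as np
        from sklearn.cluster import KMeans

        def cluster_state_overlap(pc_scores, lda_labels, n_clusters=5):
            # Fraction of each k-means cluster's time bins carrying each LDA value label.
            km = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit(pc_scores)
            values = np.unique(lda_labels)
            table = np.array([[np.mean(lda_labels[km.labels_ == c] == v) for v in values]
                              for c in range(n_clusters)])
            # Renumber clusters by their dominant value category so sessions are comparable.
            return table[table.argmax(axis=1).argsort()]

    A clusters-map-onto-states result would show one dominant value per row of the returned table, as in the first heat plot.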

  9. Supplementary Figure 9: Decoded representations predict within-condition choice times

    The strength of chosen (red) and unchosen (blue) representations was averaged in 20 ms bins, stepped forward by 5 ms, and used to predict choice times. (a) Each pair of picture values was analyzed separately with multiple regressions, and β coefficients are plotted. Time bins significant at p ≤ 0.05 (uncorrected) are highlighted. Most comparisons show a trend toward negative correlations between choice times and the strength of chosen representations, and positive correlations between choice times and the strength of unchosen representations. (b) The six curves in a were aggregated to compute statistical significance. The plot shows the mean beta values (fill = ± SEM; n = 6). Significance was calculated by aggregating the mean squared error (MSE) from all trials: the value √(MSE/n) is the standard error of the mean beta coefficient, and was used to compute a t-statistic and its two-sided significance level. Shaded blocks in the plot show regions where the aggregate beta for chosen (red) or unchosen (blue) strength was significantly different from zero (p ≤ 0.05 for 6 consecutive time bins, to correct for multiple comparisons). For this analysis, trials with very short choice times (<170 ms) were excluded, reducing the positive skew of the distributions at the cost of statistical power (n = 2795 trials). (A sketch of this aggregation follows the legend.)
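
    The aggregation in panel b amounts to testing the mean of the six betas against zero with a pooled standard error. A minimal sketch, assuming betas holds the six per-pair coefficients at one time bin and pooled_mse/n_trials come from pooling squared errors across all trials:

        import numpy as np
        from scipy import stats

        def aggregate_beta_test(betas, pooled_mse, n_trials):
            # sqrt(MSE/n) serves as the standard error of the mean beta coefficient.
            mean_beta = np.mean(betas)
            se = np.sqrt(pooled_mse / n_trials)
            t_stat = mean_beta / se
            p_two_sided = 2 * stats.t.sf(abs(t_stat), df=n_trials - 1)
            return t_stat, p_two_sided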

  10. Supplementary Figure 10: Quick and deliberative decisions, separately by subject

    (a) The average representational strength of the chosen picture (red), the unchosen picture (blue), or the unavailable options (averaged across the two unavailable options; green) is shown for quick and deliberative decisions as in Figure 3, but separated by subject (subject M: n = 272 trials; subject N: n = 166 trials). Both subjects show similar effects: when a quick decision is made, representations of the chosen item dominate, but when subjects deliberate, chosen and unchosen representations have similar strength. (b) Chosen and unchosen representations as in a, but aligned to the time of the choice. Chosen representations do not increase in strength at the time of the choice during deliberation trials. All plots show mean ± SEM.

PDF files

  1. Supplementary Text and Figures

    Supplementary Figures 1–10

  2. Supplementary Methods Checklist
