(a) An example session showing the distribution of time bins classified by LDA as chosen and unchosen options, projected into principal component (PC) space. Principal component analysis (PCA) was carried out on the multi-dimensional population data without reference to LDA states, as described in the online Methods, and state designations were then overlaid. The x- and y-axes are two PCs (A and B), and each data point is a time bin. Trials (n = 130) are separated by the options available on each trial. For example, ‘1v1’ indicates a choice between two options of value 1. The + indicates the center of each value distribution across all trials in the session. The states identifying each available option occupy different regions of PC space, and when two options are available, the trajectory travels through both regions. (b) Two-dimensional Gaussians fit to the points in panel a that LDA identified as each of the four value states. Color scales are normalized, and the + indicates the center of each distribution. (c) Five representative trials from the session shown in panels a and b, with the full trial trajectory shown as a gray line and time bins classified as chosen and unchosen states shown as colored dots. To determine which choice-related factors account for the most variance within a PC, we compared three multiple regression models. The first model included the values of the chosen and unchosen pictures on each trial. This model quantified the variance across trials related to the value of the options on offer, without distinguishing any states within a trial. The second model used four binary regressors indicating whether LDA classified each time point as belonging to each of the four value states. Binary regressors were used so that no relationship between the states was assumed. Instead, the model would find variability related to the categorical separation of one value from the rest.
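The projection and Gaussian fits described for panels a and b can be sketched as below. This is a minimal illustration on synthetic data, assuming a bins × neurons matrix of population activity and integer LDA labels (0 = no state identified, 1–4 = value states). The function names, the SVD-based PCA, and the label coding are assumptions for illustration, not the procedure from the online Methods. PCA sees only the activity matrix; the labels are used only afterwards, mirroring the statement that PCA was done without reference to LDA states.

```python
import numpy as np

def pca_project(X, n_components=2):
    """Project time bins (rows of X, shape bins x neurons) onto the top PCs."""
    Xc = X - X.mean(axis=0)                      # center each neuron
    _, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = Xc @ Vt[:n_components].T            # PC scores, one row per time bin
    explained = S**2 / np.sum(S**2)              # fraction of variance per PC
    return scores, explained[:n_components]

def fit_state_gaussians(scores, labels, states=(1, 2, 3, 4)):
    """Fit a 2-D Gaussian (mean, covariance) to the bins of each LDA state."""
    fits = {}
    for s in states:
        pts = scores[labels == s]
        fits[s] = (pts.mean(axis=0), np.cov(pts, rowvar=False))
    return fits

# Synthetic demo: 400 bins, 20 neurons, hypothetical labels (0 = no state)
rng = np.random.default_rng(0)
labels = rng.integers(0, 5, size=400)
centers = 3.0 * rng.normal(size=(5, 20))
X = centers[labels] + rng.normal(size=(400, 20))

scores, explained = pca_project(X)               # labels are not used here
fits = fit_state_gaussians(scores, labels)       # labels overlaid afterwards
for s, (mean, cov) in fits.items():
    print(s, np.round(mean, 2))                  # centers, like the '+' marks
```

Only the two displayed PCs are returned here; the regression analyses would instead use all PC scores.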
The third model looked for variance in the PC attributable to the alternative option, which LDA did not identify at that time. That is, the same time points used in the ‘Current States’ model were labeled with four binary regressors according to the value of the other picture available on that trial. (d) Absolute values of the beta coefficients from the three multiple regression models (option values / current states / alternate states) fit to each PC. Coefficients were consistently highest for the model based on the value states identified by LDA, and were also highest for lower PCs. These lower PCs account for the most variance in the population vectors themselves, indicating that the model captures prominent temporal features of the data. Each point is the mean ± SEM. (e) PCs sorted by the percentage of their variance explained by each model. The best-explained PC had approximately 16% of its variance accounted for by the current states model, and each of the five best-explained PCs had >5% of its variance explained by this model. The alternate states model explains essentially zero variance and is indistinguishable from the x-axis in this panel. Each point is the mean ± SEM. (f) The same data as in panel e, except that PCs are ordered by how much variance they explain in the population vectors, illustrating that the dimensions most influenced by LDA state were approximately PCs 5 to 15. The lowest PCs (1–4) tended to capture some ‘drift’ in neural signals across trials that was not associated with task variables. (g) Formal model comparisons used the Akaike Information Criterion (AIC), calculated from the deviance of each model; AIC corrects for differences in the number of model parameters. The model with the lowest AIC provides the best fit. In nearly all cases, particularly in the lower PCs, the model based on neural states identified by LDA fits the data best.
Overall, the current states model (red) performs the best, especially for the lowest PCs. The alternate states model (gray) is again indistinguishable from the x-axis.
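The three regression models and the AIC comparison can be sketched for a single PC as follows. This assumes ordinary least squares with an intercept and Gaussian errors, for which the deviance reduces to the residual sum of squares (RSS) and AIC can be written, up to a constant shared by all models fit to the same data, as n·ln(RSS/n) + 2k for k regression parameters. The function names, the intercept, and the label coding (0 = no state identified, 1–4 = value states) are hypothetical, and the data are synthetic.

```python
import numpy as np

def fit_ols(X, y):
    """OLS fit; returns betas, R^2, and Gaussian AIC (up to a shared constant)."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss = float(np.sum((y - X @ beta) ** 2))     # deviance under Gaussian errors
    tss = float(np.sum((y - y.mean()) ** 2))
    n, k = X.shape
    return beta, 1.0 - rss / tss, n * np.log(rss / n) + 2 * k

def state_regressors(state, values=(1, 2, 3, 4)):
    """Intercept plus one binary regressor per value state (no ordering assumed)."""
    return np.column_stack([np.ones(len(state))] + [(state == v).astype(float) for v in values])

# Synthetic time bins (hypothetical): values on offer and LDA labels, 0 = no state
rng = np.random.default_rng(1)
n = 600
chosen_val, unchosen_val = rng.integers(1, 5, n), rng.integers(1, 5, n)
current = rng.integers(0, 5, n)      # state LDA identified in each bin
alternate = rng.integers(0, 5, n)    # hypothetical labels for the other picture
pc = 2.0 * (current == 4) - 1.0 * (current == 1) + rng.normal(size=n)  # one PC's scores

designs = {
    "option values": np.column_stack([np.ones(n), chosen_val, unchosen_val]),
    "current states": state_regressors(current),
    "alternate states": state_regressors(alternate),
}
results = {}
for name, X in designs.items():
    beta, r2, aic = fit_ols(X, pc)
    results[name] = (np.abs(beta[1:]), 100.0 * r2, aic)  # |betas|, % variance, AIC
    print(f"{name:16s} %var={100 * r2:5.1f}  AIC={aic:8.1f}")
```

Repeating the loop over every PC would give per-PC |beta|, percentage-of-variance, and AIC curves of the kind summarized in panels d to g; because the simulated PC is driven only by the current state, that model should have the lowest AIC here.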