In the main paper, we describe the dynamics of neural states as switching back and forth between two available options. A key element of this claim is that neural activity should transiently enter ‘state A’, move to a different state (‘B’), and then return to the original state. To assess this in the population vectors without using LDA, we focused on the five PCs with the highest r² values for the “Current States” model above (Supplementary Fig. 7e). In these dimensions, the model explained >5% of the overall variance, whereas in the remaining PCs it explained <5%. (a) An example trial from the session shown in Supplementary Figure 7a-c. In this trial, multiple time bins (colored circles) were labeled by LDA as value 4 (red) or value 1 (blue). Based on the criteria described in the main paper, two value 1 states were identified (x and x’) and two value 4 states were identified (y and y’). Each + marks the center of the value 1 or value 4 distribution across all trials from this session. (b) For all states identified by LDA, we calculated the Mahalanobis distance in 5 dimensions to each of 4 centroids, corresponding to the 4 option values. We used 50-fold cross-validation and ensured that the points being measured did not contribute to the computation of the distribution centers. Labeled states were closest to their respective centers, confirming that LDA and PCA extracted similar information from the population vectors contributing to each analysis. Plotted are the means ± SEM across sessions. (c) For every trial in which the same state was detected more than once, the same 4 Mahalanobis distances were calculated, and the first occurrence of the state was compared to the second (e.g. x versus x’). 10-fold cross-validation ensured that the trial being assessed did not contribute to the distributions it was compared to. Each panel shows states corresponding to a particular value. Plots are the mean ± SEM across trials.
The first and second occurrences of a state tended to fall in the same region of PC space, and there was no evidence that states changed systematically over the course of a trial (two-way ANOVAs of occurrence × center: all F(1,347) for occurrence < 0.1, p > 0.7; all F(3,347) for center > 4, p ≤ 0.006). (d) We also considered states that were interleaved within a trial, in the pattern A-B-A (n = 11,720; each trial could contain multiple interleaved patterns). If states are discrete and consistent, the population vectors should look similar in non-contiguous A states, but different in interleaved B states. For each sequence, the Mahalanobis distances to the center of the initial state, A, were calculated. There was no difference between the two A states, which were both very close to the A center, but the interleaved B state was significantly farther away (one-way ANOVA, F(2,35157) = 237, p = 6 × 10⁻¹⁰³; Tukey’s HSD post-hoc comparisons: A vs. A’ p = 0.94, A vs. B and A’ vs. B p < 0.0001). This is consistent with neural activity switching back and forth between states within a trial, as described in the main paper. The plot shows mean ± SEM. * p < 0.0001. (e) We also tested whether prominent features of population-level variance, extracted without reference to predefined states (i.e. in an unsupervised manner), map onto the LDA states. The 5 PCs that were best predicted by the current states model illustrated in Supplementary Figure 7 were separated with k-means into 5 clusters, under the assumption that there would be four value distributions and a poorly classified noise cluster. The clusters were renumbered for each session according to their highest-probability value category to allow comparison across sessions. If the clusters mapped onto states, one value state from the LDA would be represented more than others in a given cluster. The median (across sessions) percentage of time bins in each cluster labeled by the LDA as belonging to each state is shown in the first heat plot.
Cluster 1 clearly corresponded to value 1, clusters 3 and 4 were reasonably selective for values 2 and 3 respectively, and cluster 5 mostly isolated value 4 with some value 3. Further, the “confusions” of these assignments tended to occur with neighboring state values, indicating that, even if the boundaries vary by analysis or if different sessions would be better fit by a different number of clusters, both approaches extracted similar information from the data. The median percentages of time bins in each cluster that came from trials with each chosen and unchosen option value are shown in the second and third heat plots. Here, clusters were evenly distributed across the trial types, such that the proportion of values associated with a cluster simply reflected the overall frequency of those values across trials (the asymmetry reflects the fact that animals most frequently chose higher values and most frequently did not choose lower values).
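
As an illustration of how the interleaved A-B-A sequences in panel (d) could be enumerated, the following is a minimal sketch and not the analysis code used for the figure. It assumes that each trial has already been reduced to an ordered list of detected states (here labeled by option value), and that every position where the first and third states match and the middle state differs counts as one pattern, so that a single trial can contribute several patterns. The function name `find_aba_patterns` and the example trial are hypothetical.

```python
from typing import List, Sequence, Tuple


def find_aba_patterns(states: Sequence[int]) -> List[Tuple[int, int, int]]:
    """Return index triplets (i, i+1, i+2) whose states follow A-B-A.

    `states` is assumed to be the ordered list of states detected in one
    trial (an assumption of this sketch, not taken from the paper).
    """
    triplets = []
    for i in range(len(states) - 2):
        first, middle, third = states[i], states[i + 1], states[i + 2]
        # A-B-A: the outer states match and the middle state differs.
        if first == third and middle != first:
            triplets.append((i, i + 1, i + 2))
    return triplets


# Hypothetical trial: value labels of the states detected, in order.
trial_states = [1, 4, 1, 4, 2]
print(find_aba_patterns(trial_states))  # [(0, 1, 2), (1, 2, 3)]
```

In this hypothetical trial, the sequences 1-4-1 and 4-1-4 overlap and are counted separately. For each returned triplet, the distances of all three states to the center of state A would then be compared, as described for panel (d).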