Three synthetic data sets were constructed from neural data collected on single picture trials. Each consisted of a neural features × time × trials matrix and corresponded to a hypothetical model of how neural data might represent the two available options. (a) A schematic illustrating how each data set was created for one choice trial. Blue = value 1, Red = value 4. In the States data set, alternating hidden states were created based on the two choice options, e.g. 1 and 4. The number of states was determined randomly and independently for every trial, but approximated the number of states per trial observed in the real data (mean 4.87, median 5 synthesized states per trial). These states served as the ground truth for this synthetic data set. For every time point, two trials were randomly drawn from the distribution of single picture trials corresponding to the hidden state at that time, and averaged. Repeating this across time points produced a series of samples drawn from one value distribution for a short period, followed by a series drawn from the other distribution, and so on, creating a time series with alternating states. In the Averages data set, we modeled the situation where all neural features encoded the average of the two option values, such that no hidden states were defined. For each time point, one trial was randomly drawn from the distribution of single picture trials corresponding to each choice option, and the two were averaged, as if each neural feature encoded the mean of the two options. In the Split data set, we modeled the situation where some features encoded option A while others encoded option B. Here, two trials were randomly drawn from the distribution of one of the two choice options and averaged. Which option was “encoded” varied by neural feature, creating a time series of consistent but mixed signals.
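The three constructions in (a) can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions, not the code used for the figure: the `pools` dictionary (option value mapped to a trials × features × time array of single picture trials), the helper names, the placement of state boundaries at random change points, and the 50/50 assignment of features to options in the Split set are all hypothetical. The sketch also assumes that "drawing a trial" at a time point means taking that trial's feature vector at the same time point.

```python
import numpy as np

def draw_mean(pool, t, rng, n_draws=2):
    """Average the feature vectors of n_draws random trials at time t.
    pool has shape (n_trials, n_features, n_time)."""
    idx = rng.integers(0, pool.shape[0], size=n_draws)
    return pool[idx, :, t].mean(axis=0)

def make_states_trial(pools, options, n_states, rng):
    """States set: sample from one option's pool at a time, alternating."""
    a, b = options
    n_time = pools[a].shape[2]
    # Assumption: random, distinct change points cut the trial into n_states segments.
    cuts = np.sort(rng.choice(np.arange(1, n_time), size=n_states - 1, replace=False))
    segment = np.searchsorted(cuts, np.arange(n_time), side="right")
    first = rng.integers(0, 2)  # which option starts the trial
    truth = np.where((segment + first) % 2 == 0, a, b)  # ground-truth hidden state
    cols = [draw_mean(pools[int(v)], t, rng) for t, v in enumerate(truth)]
    return np.stack(cols, axis=1), truth  # data is (n_features, n_time)

def make_averages_trial(pools, options, rng):
    """Averages set: one draw per option at each time point, then averaged."""
    a, b = options
    n_time = pools[a].shape[2]
    cols = [0.5 * (draw_mean(pools[a], t, rng, 1) + draw_mean(pools[b], t, rng, 1))
            for t in range(n_time)]
    return np.stack(cols, axis=1)

def make_split_trial(pools, options, rng):
    """Split set: each feature follows one fixed option for the whole trial."""
    a, b = options
    n_features, n_time = pools[a].shape[1:]
    encodes_a = rng.random(n_features) < 0.5  # assumption: 50/50 assignment
    data = np.empty((n_features, n_time))
    for t in range(n_time):
        from_a = draw_mean(pools[a], t, rng)
        from_b = draw_mean(pools[b], t, rng)
        data[:, t] = np.where(encodes_a, from_a, from_b)
    return data, encodes_a

# Toy pools with constant values so the structure of each set is easy to see.
rng = np.random.default_rng(0)
pools = {v: np.full((20, 6, 50), float(v)) for v in (1, 4)}
states, truth = make_states_trial(pools, (1, 4), n_states=5, rng=rng)
averages = make_averages_trial(pools, (1, 4), rng)
split, encodes_a = make_split_trial(pools, (1, 4), rng)
```

With these toy pools, every feature in `states` follows the alternating ground truth, `averages` sits at the midpoint of the two values throughout, and each row of `split` holds one option's value for the entire trial.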
The three synthetic data sets were submitted to the same LDA as the real choice data. That is, the LDA was trained on the same training set and classified the synthetic data in the same sliding windows. (b) A representative single trial showing the results of submitting each data set to the LDA used on the real choice data. All three panels show the same trial, but the three data sets produced qualitatively different results. Alternating states were recovered from the States data set, but only low probability classifications were recovered from the other two sets. (c) The posterior probabilities of the most likely category were averaged over time for each trial in which the options did not have the same values (n = 3782). The States set recovered the states on which it was built. In most cases, the correct values had high posterior probabilities. On other trials, the correct values were the most probable but by a smaller margin, a result of the random selection of observations serving as input data. In contrast, the Averages and Split sets produced consistently lower probability classifications. Without thresholding the output, the majority of observations from the Averages and Split data sets were classified with probabilities < 0.5 (gray line). (d) For all data sets, states were defined by the criteria in the main paper, and the number of transitions between chosen and unchosen states within each trial was calculated. The real data showed a clear distribution centered around 3.5 state transitions, as described in the main paper. The States data largely showed 5 transitions per trial, which was unsurprising, since the set had been generated to do precisely that. Some trials (23.5%) showed no transitions that met the criteria for inclusion. These were primarily trials from sessions with a smaller number of features and poorer classification accuracy in the training set (see Supplementary Fig. 4B), suggesting that this is related to weak value signaling in the input data. We hypothesize that repeatedly sampling from the same distribution, which includes some high-fidelity and some low-fidelity trials, and mixing the samples together in time produces an overall lower confidence classification, so that some observations fell below the 0.5 cut-off and were thresholded out. For the Averages and Split data sets, a large proportion of trials had zero state transitions (43.5% and 60.4%, respectively), in contrast to only 7.8% of trials in the real data. Furthermore, state transitions that did occur in the Averages and Split sets formed a wide, noisy distribution, in contrast to the tighter distribution in the real data. The mean ± standard deviation of the number of state transitions per trial, excluding trials with zero transitions, was 3.8 ± 2.2 for the real data, 4.5 ± 1.2 for the States data, 7.1 ± 6.3 for the Averages data, and 6.3 ± 5.9 for the Split data. Pairwise tests for equality of variance found differences for all comparisons (F-tests, all F > 3.5, Bonferroni corrected p < 0.001), except for the comparison between Split and Averages (F(1910, 1340) = 1.12, Bonferroni corrected p = 0.12). Therefore, even when artifactual states were recovered from the data sets where no states were present, the distribution of transitions was flat, unlike that of real states or artificially created states. (e) Non-parametric comparison of state transitions per trial across the four data sets found more transitions in the real and States data than in the Averages or Split data. A Kruskal-Wallis test was highly significant (χ²(3) = 1238, p < 0.001), and post-hoc comparison of mean ranks from this test found no difference between the real and States data (Tukey’s HSD, p = 0.98), though both had more transitions than the Averages or Split data sets (Tukey’s HSD, p < 0.001). Each point is the mean rank ± SEM. ** p < 0.001.
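The quantities compared in (d) and (e) can be illustrated with a short Python sketch. It rests on simplifying assumptions: the full state criteria are given in the main paper and are not reproduced here, so the sketch only applies the 0.5 posterior cut-off mentioned above and keeps windows decoded as the chosen or unchosen value. The function names are hypothetical, the two-sided variance-ratio F-test is one common formulation rather than necessarily the one used for the figure, and the post-hoc Tukey HSD comparison of mean ranks is omitted. The numbers in the demonstration are toy values, not results.

```python
import numpy as np
from scipy import stats

def count_transitions(labels, posteriors, chosen, unchosen, threshold=0.5):
    """Count switches between chosen and unchosen states within one trial.
    Simplification: windows are kept only if the winning posterior reaches
    the threshold and the decoded label is one of the two option values."""
    labels = np.asarray(labels)
    posteriors = np.asarray(posteriors)
    keep = (posteriors >= threshold) & np.isin(labels, [chosen, unchosen])
    kept = labels[keep]
    return int(np.count_nonzero(kept[1:] != kept[:-1]))

def variance_f_test(x, y, n_comparisons=1):
    """Two-sided F-test for equal variances with Bonferroni correction."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    f = np.var(x, ddof=1) / np.var(y, ddof=1)
    dfx, dfy = x.size - 1, y.size - 1
    p = 2 * min(stats.f.cdf(f, dfx, dfy), stats.f.sf(f, dfx, dfy))
    return f, (dfx, dfy), min(1.0, p * n_comparisons)

# Toy single trial: decoded value and winning posterior in each sliding window.
labels = [4, 4, 1, 1, 2, 4, 4, 1]
posteriors = [0.9, 0.8, 0.7, 0.3, 0.9, 0.6, 0.9, 0.8]
n_transitions = count_transitions(labels, posteriors, chosen=4, unchosen=1)  # 3

# Toy per-trial transition counts for two data sets (illustrative only).
tight = np.array([3, 4, 4, 5, 4, 3, 5, 4])
wide = np.array([1, 9, 2, 14, 3, 11, 1, 7])
f_stat, dfs, p_var = variance_f_test(wide, tight, n_comparisons=6)

# Kruskal-Wallis test across groups; with four data sets the statistic
# has 3 degrees of freedom, as in the reported test.
h_stat, p_kw = stats.kruskal([1, 2, 3, 4, 5], [11, 12, 13, 14, 15])
```

Following the caption, trials with zero transitions would be dropped before the variance comparison, and `n_comparisons=6` corresponds to the six pairwise comparisons among four data sets.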
Overall, these analyses support the notion that neural states do exist in our data and are not a result of noisy decoding by the LDA. We intentionally created data sets with no states but different types of mixed signals, and found that the results did not match the clear signals we recovered from the real data. In contrast, synthetic data with clear states built in bore a closer resemblance to the real data.