Introduction

The NMDA receptor (NMDAR) subserves memory mechanisms at several timescales, including sustained working memory delay activity1,2 and different temporal components of synaptic potentiation3,4,5. In addition, hypofunction of NMDARs is linked to psychiatric disease, in particular schizophrenia6, and it possibly contributes to abnormal working memory function in patients with schizophrenia7,8. Indeed, reduced prefrontal NMDAR density characterizes this disease9. Yet, the specific neural alterations by which NMDAR hypofunction could lead to memory deficits in schizophrenia are still under debate7,8. Here, we studied working memory function in healthy controls, patients with schizophrenia, and patients recovering from anti-NMDAR encephalitis (see “Methods” section and Supplementary Table 1). Anti-NMDAR encephalitis is characterized by an antibody-mediated reduction of NMDARs10, accompanied by initial psychosis and long-lasting memory deficits11,12. The prevalence of positive symptoms during the early stages of the disease frequently leads to misdiagnosis as a schizophrenia spectrum disorder13,14. We tested patients who had overcome the acute stage and progressed to a more stabilized period with some positive symptoms, but dominated by negative and cognitive symptoms comparable to those in stabilized schizophrenia patients15. Given the parallels in neurobiology, clinical aspects, and cognition between the two diseases, we expected working memory deficits in anti-NMDAR encephalitis to qualitatively resemble those in schizophrenia. This correspondence allows us to link alterations in working memory to the NMDAR in both patient groups.

We assessed memory alterations in a visuospatial delayed-response task (Fig. 1a) on two coexisting temporal scales: single-trial working memory precision as a proxy of active memory maintenance during short delays, and serial dependence of responses on previously memorized stimuli16,17 (serial biases, Fig. 1b) as a read-out of passive information maintenance across trials. Our results show reduced serial dependence but intact working memory precision in both patient populations. Neural correlates of this task have been identified in monkey prefrontal cortex18,19,20, inspiring computational models that capture key aspects of neural dynamics and behavior18,21,22. The biophysical detail of these models makes it possible to investigate how NMDAR hypofunction at different synaptic sites affects circuit dynamics and working memory. Candidate mechanisms are a disturbed balance between cortical excitation and inhibition (excitation/inhibition balance), as observed in schizophrenia and in studies using NMDAR antagonists (e.g., ketamine)2,6,23,24, and alterations in NMDAR-regulated short-term synaptic potentiation3,4,5,25. In the modeling section of this study, we systematically test the potential of these candidate mechanisms to explain our behavioral findings. We conclude that a reduction in short-term potentiation in a network model of working memory most parsimoniously reproduces the experimentally observed memory alterations in schizophrenia and anti-NMDAR encephalitis.

Fig. 1: Reduced working memory-dependent serial dependence in anti-NMDAR encephalitis and schizophrenia.
figure 1

a In each trial, subjects were to remember a stimulus that appeared for 0.25 s at a randomly chosen circular location with fixed distance from the center. Delay lengths varied randomly between trials (0, 1 or 3 s). Subjects made a mouse click to report the remembered location and started the next trial by moving the mouse back to the screen’s center during the inter-trial interval (ITI). b Serial dependence is measured as a systematic shift of responses towards previous target locations. Attractive effects depend on the distance θd between previous and current stimulus. c Precision for each subject and delay was inversely estimated as the circular s.d. of bias-corrected error distributions (“Methods”). For longer delays, participants’ responses were less precise (delay, F(2,147) = 76.87, p < 1e−16). There were no overall or delay-dependent group differences in precision (group, F(2,147) = 1.74, p = 0.18; group × delay, F(4,147) = 0.07, p = 0.99, all p-values from ANOVA). Error bars indicate ±s.e.m. d–f Serial dependence by group and delay length. Serial dependence is calculated as the ‘folded’ error \(\theta ^{e\prime }\) for different θd (dashed lines; “Methods”). Solid lines show linear model fits (“Methods”), omitting intercepts and negative values of θd. Shading, ±s.e.m. across pooled trials from n = 19 healthy controls (ctrl), n = 17 patients with schizophrenia (schz), and n = 16 patients with anti-NMDAR encephalitis (enc). g–i Individual (random coefficients; dots) and group estimates of serial bias strength (fixed effects; error bars indicate mean and bootstrapped 95% C.I. of the mean) by delay. g Serial dependence was repulsive in 0 s trials (DoG(θd), F(1,52) = 12.67, p = 0.0008), independently of group (group × DoG(θd), F(2,52) = 0.46, p = 0.63).
h For 1 s trials, group differences in serial dependence emerged (group × DoG(θd), F(2,48) = 6.52, p = 0.003) between ctrl and schz (t = 3.73, p = 7e−4, Cohen’s d = 1.28) and enc and schz (t = 2.73, p = 0.01, Cohen’s d = 0.98). i After 3 s delay, both patient groups showed reduced biases compared to ctrl (group × DoG(θd), F(2,50) = 15.35, p = 6e−5; ctrl vs enc, t = 4.14, p = 2e−4, Cohen’s d = 1.45; ctrl vs schz, t = 6.44, p = 2e−7, Cohen’s d = 2.21, and enc vs schz, t = 3.40, p = 0.002, Cohen’s d = 1.22). All t-tests, two-sided. In all panels, single data points show data from n = 19 healthy controls (ctrl), n = 17 patients with schizophrenia (schz), and n = 16 patients with anti-NMDAR encephalitis (enc).

Results

Unaltered working memory precision in both patient groups

First, we sought to identify alterations in single-trial working memory precision, as an indication of a possible dysfunction of activity-based memory maintenance. Meta-analyses report mainly negative findings for delay-dependent precision impairments in schizophrenia and ketamine studies7,26 (but see ref. 27). We calculated the circular standard deviation of bias-corrected response errors (“Methods”) as an inverse estimate of precision for each participant and delay. Correcting for biases as a systematic source of error allowed us to estimate memory precision independently of serial biases. For all groups, precision decreased equally with delay (Fig. 1c), indicating spared active working memory maintenance over short delays of up to 3 s in encephalitis and schizophrenia.
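The precision estimate can be sketched as follows, assuming errors in degrees and a per-trial bias prediction taken from an already fitted serial-dependence model; the function name `circular_sd_deg` is hypothetical. It uses the standard circular standard deviation, sqrt(−2 ln R), where R is the mean resultant length of the error angles.

```python
import numpy as np

def circular_sd_deg(errors_deg, bias_deg=None):
    """Circular s.d. (degrees) of response errors; larger means less precise.

    errors_deg: single-trial errors theta_e (response minus target), degrees.
    bias_deg:   optional per-trial systematic shift predicted by a fitted
                bias model (hypothetical input); subtracted before the
                spread is computed.
    """
    err = np.asarray(errors_deg, dtype=float)
    if bias_deg is not None:
        err = err - np.asarray(bias_deg, dtype=float)
    rad = np.deg2rad(err)
    # Mean resultant length of the error angles (1 = all errors identical).
    R = np.hypot(np.mean(np.cos(rad)), np.mean(np.sin(rad)))
    R = min(R, 1.0)  # guard against rounding slightly above 1
    return float(np.rad2deg(np.sqrt(-2.0 * np.log(R))))
```

Because the circular s.d. does not change when all errors are rotated by the same constant, the bias correction only matters for shifts that vary from trial to trial, such as serial biases that depend on θd.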

Patients’ memories are less biased towards previous memories

Next, we tested whether NMDAR-related memory alterations could be observed at intermediate timescales by measuring serial dependence. Serial dependence is defined as a systematic shift of responses towards previously remembered, uncorrelated stimuli16 (Fig. 1b), revealing that traces of recently processed stimuli persist in memory circuits and are integrated with new memories. Importantly, these attractive biases emerge over the trial’s memory delay, indicating a dependence on memory processes28,29. In conditions without memory requirements, only small repulsive biases are present, possibly generated during perceptual processing28,29,30. To assess NMDAR-related differences in serial dependence, we modeled single-trial errors θe as a linear mixed model of delay length, group, and a non-linear basis function of the distance θd between consecutive stimuli16,29 (derivative-of-Gaussian, DoG(θd), “Methods”, Eq. (1); Supplementary Fig. 1), and we assessed the significance of fixed effects through ANOVA tables (“Methods”).
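To illustrate how the DoG(θd) regressor enters such a model, the sketch below uses one common parameterization of the derivative-of-Gaussian, normalized to a peak of 1 so that its coefficient reads as the peak shift in degrees. The width `w = 0.03` deg⁻¹ is a hypothetical value, not the study's, and the fit is ordinary least squares on fixed effects only: a simplified stand-in for the study's linear mixed model (Eq. (1)), which also has subject-level random coefficients.

```python
import numpy as np

def dog(theta_d_deg, w=0.03):
    """Derivative-of-Gaussian basis of the previous-current distance (degrees).

    Normalized so its peak value is 1; w (1/deg) sets where the peak lies
    and is a hypothetical choice, not the value used in the study.
    """
    x = np.asarray(theta_d_deg, dtype=float)
    c = np.sqrt(2.0) / np.exp(-0.5)
    return x * w * c * np.exp(-(w * x) ** 2)

def fit_dog_by_delay(theta_e, theta_d, delay, delays=(0, 1, 3)):
    """OLS fit of errors on DoG(theta_d), with a separate amplitude per delay.

    Returns {delay: bias coefficient}; positive = attraction, negative = repulsion.
    Fixed effects only, as a simplified stand-in for a mixed model.
    """
    theta_e = np.asarray(theta_e, dtype=float)
    delay = np.asarray(delay)
    basis = dog(theta_d)
    # Intercept plus one column per delay: DoG(theta_d) x indicator(delay == d).
    cols = [np.ones_like(theta_e)] + [basis * (delay == d) for d in delays]
    X = np.column_stack(cols)
    coef, *_ = np.linalg.lstsq(X, theta_e, rcond=None)
    return dict(zip(delays, coef[1:]))
```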

Serial dependence explained only a small fraction of single-trial errors in working memory (conditional R2 = 0.03 for the linear model presented in Eq. (1)), reflecting its small magnitude compared to the typical extent of response inaccuracies (Fig. 1c), but it depended strongly on relevant task factors: In accordance with previous results28,29, we found a dependence of attractive bias strength on memory delay (delay × DoG(θd), F(2,58) = 13.89, p = 1e−5). Moreover, biases differed between groups of participants (group × DoG(θd), F(2,49) = 9.68, p = 0.0003), especially when comparing groups for different delay lengths (group × delay × DoG(θd), F(4,58) = 8.45, p = 2e−5). Figure 1d–f shows linear model fits and average bias curves for 0, 1, and 3 s delays (see Supplementary Figs. 2–4 for single-subject bias curves and fits). Groupwise linear models (Eq. (2)) allowed us to assess the delay dependence of biases within each population (delay × DoG(θd)): For healthy controls, initially repulsive biases became gradually more attractive with delay length (F(2,17) = 26.91, p = 6e−6; Supplementary Fig. 5). Encephalitis patients showed a qualitatively similar, but reduced pattern (F(2,23) = 5.06, p = 0.015). In contrast, no attractive bias emerged over delay in patients with schizophrenia (F(2,16) = 1.31, p = 0.30). Rather, a repulsive bias dominated all delay lengths in this group (DoG(θd), F(1,16) = 9.07, p = 0.008). Post-hoc tests and between-group comparisons are reported in Fig. 1g–i.
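The average bias curves in Fig. 1d–f use the ‘folded’ error θe′. A minimal sketch of that computation follows; the sign conventions (θd = previous minus current stimulus, θe = response minus current stimulus) and the nine 20° bins are assumptions chosen for illustration, and `folded_bias_curve` is a hypothetical name.

```python
import numpy as np

def folded_bias_curve(theta_e, theta_d, n_bins=9):
    """Mean 'folded' error per bin of |theta_d| over [0, 180] degrees.

    Assumes theta_d = previous minus current stimulus and theta_e = response
    minus current stimulus, both in degrees in (-180, 180]. Multiplying the
    error by sign(theta_d) makes positive values mean a shift toward the
    previous stimulus (attraction) from either side.
    """
    theta_e = np.asarray(theta_e, dtype=float)
    theta_d = np.asarray(theta_d, dtype=float)
    folded = theta_e * np.sign(theta_d)
    edges = np.linspace(0.0, 180.0, n_bins + 1)
    # Interior edges only, so every |theta_d| in [0, 180] maps to a valid bin.
    idx = np.digitize(np.abs(theta_d), edges[1:-1])
    centers = 0.5 * (edges[:-1] + edges[1:])
    means = np.full(n_bins, np.nan)
    for k in range(n_bins):
        in_bin = idx == k
        if in_bin.any():
            means[k] = folded[in_bin].mean()
    return centers, means
```

Folding pools trials with the previous stimulus on either side, which roughly doubles the trials per bin compared with plotting signed θd.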

Serial dependence is known to fade with increasing inter-trial intervals (ITI)29. We controlled for ITI length by including ITI × DoG(θd) as a covariate in our linear model (“Methods”, Eq. (4); Supplementary Fig. 6): For each additional second of ITI, serial bias decreased by 0.46 ± 0.12° (mean ± s.d.). However, group differences in serial dependence remained unchanged after including the covariate. The timescale of serial dependence was further defined by how many past trials influenced the current response. We observed a much weaker delay-dependent bias towards the penultimate trial, but there was no consistent evidence for group differences (Supplementary Fig. 7a–c).
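A minimal sketch of how an ITI × DoG(θd) covariate can enter such a model: the DoG regressor is multiplied by the centered ITI, so the interaction coefficient reads as the change in bias amplitude per additional second of ITI (the quantity reported above as a 0.46° decrease). This is OLS only, not the study's mixed model (Eq. (4)), and the DoG width is again a hypothetical choice.

```python
import numpy as np

def dog(x, w=0.03):
    """Unit-peak derivative-of-Gaussian; w (1/deg) is a hypothetical width."""
    x = np.asarray(x, dtype=float)
    return x * w * np.sqrt(2.0) / np.exp(-0.5) * np.exp(-(w * x) ** 2)

def fit_iti_modulation(theta_e, theta_d, iti_s):
    """Return (bias amplitude at the mean ITI, change in amplitude per second).

    Columns: intercept, DoG(theta_d), DoG(theta_d) x centered ITI.
    """
    iti_c = np.asarray(iti_s, dtype=float) - np.mean(iti_s)
    basis = dog(theta_d)
    X = np.column_stack([np.ones_like(basis), basis, basis * iti_c])
    coef, *_ = np.linalg.lstsq(X, np.asarray(theta_e, dtype=float), rcond=None)
    return float(coef[1]), float(coef[2])
```

Centering the ITI keeps the main DoG coefficient interpretable as the bias at an average ITI rather than at an ITI of zero, which never occurs in the task.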

Antipsychotic medication does not explain group differences

We also controlled for potential effects of antipsychotic medication in chlorpromazine equivalents (CPZ, “Methods”) in light of significant group differences in CPZ estimates (Supplementary Table 1), and an association of CPZ with individual serial bias strength within groups (Supplementary Fig. 8). When including CPZ as a covariate (“Methods”, Eq. (5)), delay-dependent biases still markedly differed between groups (Supplementary Fig. 8, caption). We designed two additional analyses to demonstrate the independence of group differences from the effect of antipsychotic medication: First, we showed that the difference in serial dependence persisted when we compared healthy controls to the unmedicated subset of encephalitis patients (n = 12 out of 16 encephalitis patients, Supplementary Fig. 9a–f). Second, we designed a conservative test of the group effect after removing all explanatory power of CPZ: We first fitted single-trial errors θe as a function of CPZ and its one- and two-way interactions with delay and DoG(θd) in all subjects. On average, CPZ in patients with schizophrenia (370.6 ± 462.4 mg day−1, mean ± s.d.) explained a reduction of 1.06° in biases in the 3 s delay condition, but only a reduction of 0.08° in encephalitis patients (with CPZ equivalents of 26.6 ± 52.7 mg day−1, mean ± s.d.). Residuals of the linear model, now free of linear and multiplicative effects of CPZ estimates, were fitted as a function of group, delay, DoG(θd), and their interactions. Supplementary Fig. 9g–l shows that group differences in memory-dependent biases remained marked (a reduction of 2.51° for schz, and 1.62° for enc in the 3 s delay condition) and highly significant even after conservatively controlling for CPZ.
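The two-stage logic can be sketched with ordinary least squares for a single delay condition. This is a simplified, hypothetical version, not the study's exact model: stage 1 here regresses errors on CPZ, DoG(θd) and CPZ × DoG(θd) (dropping the delay terms), and stage 2 fits the residuals with group, DoG(θd) and their interaction. Because the group regressors are not themselves residualized, variance shared between CPZ and group is credited to CPZ, which tends to shrink the stage-2 group effect toward zero; this is what makes the test conservative.

```python
import numpy as np

def residualize(y, nuisance):
    """Residuals of y after a least-squares fit on nuisance columns + intercept."""
    X = np.column_stack([np.ones(len(y)), nuisance])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return y - X @ beta

def group_bias_after_cpz(theta_e, basis, cpz, is_patient):
    """Patient-minus-control bias coefficient after removing CPZ effects.

    theta_e:    single-trial errors (degrees)
    basis:      DoG(theta_d) for each trial
    cpz:        chlorpromazine equivalents of the trial's subject (mg/day)
    is_patient: 1 for patient trials, 0 for control trials
    """
    theta_e, basis, cpz, g = (np.asarray(a, dtype=float)
                              for a in (theta_e, basis, cpz, is_patient))
    # Stage 1: CPZ main effect and CPZ x DoG, with DoG as a shared term.
    resid = residualize(theta_e, np.column_stack([cpz, basis, cpz * basis]))
    # Stage 2: group, DoG and group x DoG on the CPZ-free residuals.
    X = np.column_stack([np.ones_like(basis), g, basis, g * basis])
    coef, *_ = np.linalg.lstsq(X, resid, rcond=None)
    return float(coef[3])
```

In a simulation where CPZ has no true effect but differs between groups, the estimate from this sketch keeps the sign of the true group difference while its magnitude shrinks, which is the intended conservative behavior.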

Encephalitis patients’ biases increase with recovery

We did not find correlations between individuals’ bias estimates for 3 s delay trials and the severity of psychiatric symptoms for encephalitis or schizophrenia patients (Supplementary Fig. 8 and Supplementary Table 1). These between-subjects analyses were possibly underpowered, so we designed a within-subject longitudinal assessment for n = 14 encephalitis patients who returned for a follow-up session after 3–12 months (mean 8.5 months). As expected, clinical symptoms improved in these patients (Supplementary Table 2), and we found that serial dependence normalized with the patients’ recovery (Eq. (8); Supplementary Fig. 10). Interestingly, for this subsample of encephalitis patients, positive and general symptoms measured on the PANSS correlated with serial dependence in the follow-up session (PANSS pos, r = −0.70, C.I. = [−0.90, −0.26], p = 0.006; PANSS gen, r = −0.62, C.I. = [−0.87, −0.13], p = 0.02), but again not significantly in the baseline session (PANSS pos, r = −0.38, C.I. = [−0.76, 0.19], p = 0.19; PANSS gen, r = −0.02, C.I. = [−0.54, 0.52], p = 0.94), although the direction of the effect was congruent between the two sessions. Moreover, patients with a stronger longitudinal normalization of biases improved more on the scale of positive symptoms (PANSS pos) in the follow-up session, when compared to the baseline session, r = −0.54, C.I. = [−0.83, −0.02], p = 0.04 (Supplementary Fig. 10g; all correlations, Pearson’s r).
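The correlations above pair one value per patient (for example, a bias estimate and a PANSS score). A common way to attach a 95% C.I. to Pearson's r is the Fisher z-transform, sketched below; the source does not state which interval method was used, so this choice is an assumption, and the function names are hypothetical. For r = −0.70 with n = 14 it gives approximately [−0.90, −0.27].

```python
import numpy as np
from math import atanh, sqrt, tanh

Z_95 = 1.959963984540054  # two-sided 95% standard-normal quantile

def fisher_ci(r, n, z_crit=Z_95):
    """Approximate C.I. for Pearson's r via the Fisher z-transform."""
    z = atanh(r)
    se = 1.0 / sqrt(n - 3)
    return tanh(z - z_crit * se), tanh(z + z_crit * se)

def pearson_with_ci(x, y):
    """Pearson's r between two per-patient measures, with its Fisher 95% C.I."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    r = float(np.corrcoef(x, y)[0, 1])
    return r, fisher_ci(r, len(x))
```

With n = 14 the standard error on the z scale is 1/sqrt(11), about 0.30, which is why even a correlation of −0.70 comes with a wide interval.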

Together, our experimental results show no differences in single-trial memory maintenance, but a strong reduction of delay-dependent biases in anti-NMDAR encephalitis that lessens as patients recover, and a complete absence of attractive biases in patients with schizophrenia. These findings are not explained by ITI length, general response correlations between trials (Supplementary Fig. 7d–f), response biases with respect to cardinal directions (Supplementary Fig. 11), or medication (Supplementary Fig. 8). Our conclusion is thus that alterations at the neural circuit level, related to NMDAR hypofunction, reduce serial dependence gradually, up to the point of completely disrupting attraction to previous stimuli. A prevailing idea associates NMDAR hypofunction in schizophrenia primarily with synapses onto GABAergic interneurons23, while the role of NMDARs in working memory has been emphasized in synapses between pyramidal neurons1,2,21. Alternatively, NMDARs could be involved in mechanisms directly associated with the generation of serial biases, such as short-term plasticity18,22,31. To compare these mechanistic explanations, we simulated consecutive trials of a spatial working memory task in a spiking neural network model of the prefrontal cortex21 (Fig. 2a). The prefrontal cortex not only holds working memory contents in an activity-based code19,20, but also keeps long-lasting latent (possibly synaptic) memory traces that produce serial dependence18.

Fig. 2: Ring attractor network with synaptic STP shows serial dependence.
figure 2

Simulations of two consecutive working memory trials (current trial n, previous trial n − 1) in a spiking neural network model with bump-attractor dynamics (“Methods”). a Spike times (x-axis) of excitatory neurons, ordered on y-axis by preferred angular location. Colored bars in a, b mark previous and current stimulus onset times (olive) and previous response (red). The solid orange line shows the population vector decoded from firing rates (sliding windows of 250 ms). In trial n, the active memory representation got biased towards the memory representation in trial n − 1. b Firing rate (black) and potentiated weight trace wij for neurons at 0° (orange) averaged over 1,000 trials and 20 neurons centered around 0°. Spiking activity and synaptic strength increased during trial n − 1 delay and decreased after the response. At current stimulus onset, information about trial n − 1 remained only in the potentiated weight trace. To facilitate interpretation, we excluded trials for which any neuron participated in previous and current-trial delay activity (i.e., showed firing rates >10 spikes s−1 after stimulus onset in trial n). c, d Associativity and decay of modeled STP. The strength of each individual synapse is determined by wij (c, middle black trace), which is potentiated at each spike by an amount Δw that depends on the relative spike times tj and ti of pre- and postsynaptic neurons, respectively, and on the potentiation factor P that is chosen to represent different strengths of STP (different colored lines in (d); “Methods”, Eqs. (15) and (16)), and it is reduced by an amount relative to the synaptic strength at each presynaptic spike, resulting in activity-dependent decay (Eq. (17)).

NMDAR hypofunction in a prefrontal working memory circuit

We modeled a local prefrontal circuit, composed of neurons selective for the locations presented in the spatial working memory task. We used a network of excitatory and inhibitory neurons recurrently connected through AMPAR-, NMDAR- and GABAAR-mediated synaptic transmission in which persistent delay firing emerges from attractor dynamics (Fig. 2a, Supplementary Fig. 12; “Methods”). As proposed by previous studies18,22,31, we modeled serial dependence as an effect of short-term plasticity that builds up at delay-active recurrent excitatory synapses and maintains information during the ITI in a subthreshold stimulus representation not reflected in firing rate selectivity (Fig. 2b, “Methods”). We implemented an associative mechanism of short-term potentiation (STP) that is NMDAR-dependent and upregulates glutamatergic efficacy, consistent with a long-lasting increase in the probability of presynaptic neurotransmitter release3,4. As described in refs. 3,4, this efficacy increase undergoes activity-dependent decay (Fig. 2c). In our simulations, stimulus-specific potentiated synaptic traces persisted through the ITI and attracted the next trial’s memory representation progressively over the course of the delay22,31. To mimic memory-independent repulsive biases29,30, current stimulus inputs were slightly shifted away from previous stimulus values by a fixed value31 (“Methods”). This shift represents adaptation effects in sensory regions and is therefore not affected by local circuit alterations in prefrontal cortex.
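The qualitative properties of the modeled STP (associative potentiation gated by pre- and postsynaptic spike timing and scaled by P, plus a decay that happens only at presynaptic spikes) can be illustrated with a single-synapse, event-driven sketch. The update rule, the function name `stp_weight`, and the parameters `tau_pair`, `decay_frac`, `w0` and `w_max` are hypothetical and do not reproduce the study's Eqs. (15)–(17). Because decay here is driven by presynaptic activity rather than elapsed time, a potentiated trace survives a silent ITI, which is the property that lets it carry information into the next trial.

```python
import numpy as np

def stp_weight(pre_spikes, post_spikes, P=0.5, tau_pair=0.05,
               decay_frac=0.01, w0=1.0, w_max=2.0):
    """Final efficacy w of one synapse under a hypothetical STP rule.

    Potentiation (associative): at each postsynaptic spike t_i, each recent
    presynaptic spike t_j adds P * exp(-(t_i - t_j) / tau_pair) * (w_max - w);
    the (w_max - w) factor keeps w <= w_max as long as P <= 1.
    Decay (activity-dependent): at each presynaptic spike, w moves toward w0
    by decay_frac * (w - w0). Nothing happens between spikes.
    """
    events = sorted([(float(t), "pre") for t in pre_spikes] +
                    [(float(t), "post") for t in post_spikes])
    w = w0
    recent_pre = []
    for t, kind in events:
        if kind == "pre":
            w -= decay_frac * (w - w0)
            recent_pre.append(t)
        else:
            # Only pairings within a few time constants contribute.
            recent_pre = [tj for tj in recent_pre if t - tj < 5 * tau_pair]
            for tj in recent_pre:
                w += P * np.exp(-(t - tj) / tau_pair) * (w_max - w)
    return float(w)
```

Lowering P in this sketch lowers the final efficacy for the same spike trains, loosely mirroring how the reduced-STP hypothesis weakens the trace that produces serial dependence.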

We assessed the effects of NMDAR dysfunction on serial dependence at three potential synaptic sites: based on the reported NMDAR-dependence of STP3,4,5, NMDAR hypofunction would reduce the strength of STP at excitatory synapses and disrupt delay-dependent biases (hypothesis I: reduced STP). Also, we tested the explanatory potential of reduced NMDAR-mediated synaptic transmission. In particular, we tested cortical disinhibition27, caused by diminished NMDAR efficacy at inhibitory interneurons (hypothesis II: reduced gEI), and the hypofunction of NMDARs at recurrent excitatory synapses, leading to diminished delay activity2,32,33 (hypothesis III: reduced gEE). To assess each of these mechanisms, we independently varied STP strength, gEI and gEE, and we read out “behavioral responses” after 0, 1, and 3 s from population activity in our network simulations (“Methods”). Then, we fitted a linear model to measure bias strength in each condition (Eq. (18), Supplementary Fig. 13). We sought to identify which mechanisms could independently reproduce the patterns of reduced and absent biases observed in patients, and their dependence on working memory delay (Fig. 1).
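One standard way to read out a “behavioral response” from network activity is a population vector, as in the decoded trace of Fig. 2a: each neuron's preferred angle is weighted by its firing rate, and the angle of the summed vector is taken as the report. The sketch assumes rates have already been averaged over a window at the end of the delay; the function names are hypothetical.

```python
import numpy as np

def wrap_deg(angle_deg):
    """Wrap angles to [-180, 180) degrees."""
    return (np.asarray(angle_deg, dtype=float) + 180.0) % 360.0 - 180.0

def decode_population_vector(rates, preferred_deg):
    """Angle (degrees) of the rate-weighted sum of preferred-direction unit vectors."""
    theta = np.deg2rad(np.asarray(preferred_deg, dtype=float))
    z = np.sum(np.asarray(rates, dtype=float) * np.exp(1j * theta))
    return float(np.rad2deg(np.angle(z)))

def response_error(rates, preferred_deg, stimulus_deg):
    """Single-trial error theta_e: decoded report minus stimulus, wrapped."""
    decoded = decode_population_vector(rates, preferred_deg)
    return float(wrap_deg(decoded - stimulus_deg))
```

Wrapping the difference matters near ±180°: a bump decoded at 175° for a stimulus at −175° is a 10° error, not a 350° one.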

Reduced STP but not E-I imbalance disrupts memory biases

We found that both hypotheses I and III were qualitatively consistent with our experimental results: NMDAR hypofunction (whether reducing STP or gEE) reduced the strength of serial dependence (Fig. 3a, c, orange). In contrast, hypothesis II was discarded by our simulations: reducing gEI increased serial dependence (Fig. 3b, orange), contrary to our experimental results, and quickly led to network disinhibition, causing previous-trial delay activity to spontaneously reemerge in the ITI (Supplementary Fig. 14). For both reduced gEI and reduced gEE, the percentage of outlier responses (where errors \(\left| {\theta ^{\mathrm{e}}} \right|\) > 57.3°, i.e. 1 radian) quickly rose as the network lost the stability of one of its two states (spontaneous activity for reduced gEI, and persistent delay activity for reduced gEE, dashed vertical lines in Fig. 3b, c), as illustrated in Supplementary Fig. 14. Moreover, we noted that memory precision was slightly affected by all three manipulations (Fig. 3a–c), in contrast with our behavioral findings (Fig. 1c), but consistent with other studies using longer delays27. Delay length and task complexity could be important factors for detecting NMDAR-related differences in memory precision.
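The outlier criterion and the stability threshold used to delimit network regimes (|θe| > 1 radian, about 57.3°; more than 10% outlier trials per parameter value, Fig. 3) amount to a simple check on simulated errors. A minimal sketch with hypothetical function names:

```python
import numpy as np

OUTLIER_DEG = np.rad2deg(1.0)  # 1 radian, about 57.3 degrees

def outlier_fraction(errors_deg):
    """Fraction of trials whose absolute error exceeds 1 radian."""
    err = np.abs(np.asarray(errors_deg, dtype=float))
    return float(np.mean(err > OUTLIER_DEG))

def is_unstable_regime(errors_deg, max_fraction=0.10):
    """True if more than max_fraction of the simulated trials are outliers."""
    return outlier_fraction(errors_deg) > max_fraction
```

Tracking this fraction alongside bias estimates is useful because a few very large errors, for example when a memory bump collapses, can dominate a least-squares bias fit.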

Fig. 3: Altered STP simulates reduced serial dependence in spiking neural networks.
figure 3

a–c Serial dependence (orange, bias coefficients from linear model, “Methods”) and precision (black, circular s.d. of errors) as a function of model parameters in 3 s delay trials (20,000 trials per parameter value). Vertical dashed lines indicate transition to unstable network regimes for which more than 10% of trials were outliers (\(\left| {\theta ^{\mathrm{e}}} \right|\) > 57.3°, i.e., 1 radian). Shading, 95% C.I. for regression estimates of bias coefficients in simulated responses. a Serial dependence decreased gradually when decreasing STP (potentiation factor P), while the network remained stable for all simulated values of P. Precision changed slightly as a function of STP. b Cortical disinhibition via decreased gEI augmented serial bias while strongly affecting precision and stability, either due to instability of persistent activity (right, Supplementary Fig. 14b), or due to instability of spontaneous activity (left, Supplementary Fig. 14a). c Lowering recurrent cortical excitation (gEE) led to the opposite pattern, decreasing biases. d–f Delay dependence of biases for each group, as defined by parameter values in (a–c) (respectively colored triangles). Points depict mean bias strength (over 20,000 trials) for each parameter value. For comparison, error bars indicate 95% CI for bias strength obtained from n = 19 healthy controls (ctrl), n = 17 patients with schizophrenia (schz), and n = 16 patients with anti-NMDAR encephalitis (enc) (reordered from Fig. 1g–i). d Lowering STP strength reproduced the experimental data. In e, f, reduction of NMDAR conductances (gEI or gEE) did not reproduce group and delay dependencies of experimental biases. g–i Solid lines, simulated serial dependence by delay length for different values of P, indicated by colored triangles in (a) (20,000 trials per potentiation level P). Dashed lines with error bars, serial dependence in encephalitis, schizophrenia, and healthy controls.
Bias calculated as averaged ‘folded’ error \(\theta ^{{\mathrm{e}}\prime }\) for binned absolute previous-current distances θd. Shading, ±s.e.m. Compare to Supplementary Fig. 15 for a network with STP (and STP disruptions in patients) in both E–E and E–I connections.

In addition, we found that hypotheses I and III could be disambiguated based on biases produced by the different linear models in 0, 1, and 3 s delays (Fig. 3d–f). Even for the lowest value of gEE within the stable network regime (Fig. 3c), attractive biases increased with delay (Fig. 3f). While this manipulation can qualitatively reproduce decreased delay-dependent biases in the encephalitis group, it is incompatible with our results for patients with schizophrenia (Fig. 1), who do not develop attractive biases in memory trials. In contrast, reduced STP at recurrent excitatory synapses captured a pattern of equally strong repulsive biases for all delay lengths (Fig. 3d). Note that these findings also hold for a network with STP (and NMDAR-dependent reductions in STP) in inhibitory interneurons34 (Supplementary Fig. 15). Based on our simulations, we conclude that the disruption of STP, a mechanism operating on a longer timescale than activity-based memory maintenance, provides a plausible explanation for altered serial dependence as observed in schizophrenia and anti-NMDAR encephalitis.

Discussion

In this study, we assessed working memory alterations in two patient groups linked to NMDAR hypofunction, and hypothesized that their shared clinical and neurobiological features should be reflected in qualitatively similar behavioral patterns. In accordance with this reasoning, we found a drastic reduction of working memory serial dependence both in patients with anti-NMDAR encephalitis and schizophrenia, as compared to healthy controls. In contrast, we did not find memory maintenance deficits on timescales of a few seconds, suggesting that cognitive deficits in these patients8,12 might be partly explained by the disruption of long-lasting, inactive memory traces, and a lacking integration of past and current memories. Our modeling results show that simple alterations in cortical excitation (hypotheses II and III), as proposed by current theories of NMDAR hypofunction in schizophrenia6,24,27, cannot fully explain these behavioral findings. Instead, altered serial dependence is mechanistically accounted for by a disruption in slower dynamics, here specified as NMDAR-dependent associative STP (hypothesis I) that is triggered by sustained delay activity and influences memory representations in upcoming trials. Our results suggest that clinical reports of short-term memory alterations in schizophrenia and anti-NMDAR encephalitis could be understood in the light of reduced synaptic potentiation25. This is consistent with in vitro studies, which have demonstrated the dependence of STP on specific subunit components of the NMDAR3,4, and reduced STP in genetic mouse models of schizophrenia35. Importantly, our modeling is not incompatible with altered cortical excitatory or inhibitory tone as a result of hypofunctional NMDARs. Rather, it states the necessity of assuming alterations in a mechanism operating on longer timescales, such as STP. 
For instance, diminished STP alongside symmetric effects on both E-E and E-I synapses could maintain the excitation/inhibition balance and thus stable delay activity, while interrupting passive between-trial information maintenance.

Future studies should address the effects of pharmacological NMDAR blockade on serial dependence. These studies could unequivocally confirm the role of the NMDAR for trial-history effects in working memory, and at the same time allow more specific questions to be addressed: On the one hand, serial dependence effects under different NMDAR antagonists should vary according to how blocking specific NMDAR subunits modulates synaptic potentiation at different timescales3. Our results cannot address subunit specificity because anti-NMDAR encephalitis (and possibly schizophrenia9) is associated with hypofunction of the GluN1 subunit, which is contained in all NMDARs36,37. On the other hand, pharmacological studies in combination with neural recordings could reveal how trial-history representations are affected by the blockade of NMDARs18,38. In rodents, long-term pharmacological experiments during behavior could be complemented with in vitro studies to assess STP directly. Finally, pharmacological studies would clarify whether the alterations in serial dependence occur as a result of acute NMDAR hypofunction or whether they depend on compensatory changes in STP that arise after early, acute phases of cortical excitation/inhibition imbalance in these diseases (e.g., as a long-term adjustment of the probability of presynaptic neurotransmitter release).

We showed how working memory in the two investigated diseases is altered in a parallel way, and how these alterations are parsimoniously explained by manipulating a single, NMDAR-dependent synaptic variable in our model. However, substantial neurobiological heterogeneity must underlie the differences in epidemiology and longitudinal development of schizophrenia and autoimmune anti-NMDAR encephalitis39. Under this reasoning, we cannot exclude that distinct biological mechanisms in our two patient groups might lead to convergent patterns of working memory processing. For instance, our modeling shows that encephalitis patients’ biases could also be explained qualitatively by a reduced excitation-to-inhibition ratio in the memory circuit (Fig. 3f), consistent with task-related fMRI BOLD activity under ketamine33, and the effect of NMDAR antagonists on single-cell firing rates in monkey PFC2. In contrast, we could not confirm the findings of previous modeling work on schizophrenia, postulating that deficits in working memory precision and higher susceptibility to distractors40,41 or alterations in probabilistic reasoning42 could be explained by an increased excitation-to-inhibition ratio, leading to cortical disinhibition. This mechanistic alteration cannot replicate serial dependence deficits in schizophrenia in our model (Fig. 3b, e). Reduced short-term plasticity, in contrast, would predict reduced working memory precision after long memory delays (Fig. 3a, see also ref. 43), and higher susceptibility to distractors44 in line with reported behavior in schizophrenia41, which was previously proposed to reflect an excessive excitation-to-inhibition ratio. In addition, some inconsistencies with previous findings might be explained by the acuteness of the patients’ condition, with more acute or psychotic stages being connected with patterns of disinhibition, and less acute stages with residual alterations in synaptic plasticity, but not cortical excitation.
Alternatively, mechanisms not considered in our model could be at play. For instance, NMDAR dysfunction could negatively affect long-range connectivity45,46,47 between trial history-tracking areas38 and areas that hold current working memory contents (like prefrontal cortex), and in this way impede the integration of previous with current memories. Note, however, that recent combined experimental and theoretical work in primate and human prefrontal cortex shows how both past and current memories are jointly represented in prefrontal cortex, and how their interaction subserves serial dependence18.

Our findings advance the conceptual understanding of working memory alterations in schizophrenia and anti-NMDAR encephalitis, as they demonstrate a selective disruption of information carryover between trials, reflected by a reduction of serial dependence that is robustly found in neurotypical subjects17. We found several indicators of clinical relevance for our finding. First, as anti-NMDAR encephalitis patients recovered, their biases normalized in the direction of healthy controls (Supplementary Fig. 10a–c). Second, the amount of this normalization correlated across patients with their improvement on a scale that measures positive symptoms (Supplementary Fig. 10g), indicating a potential relation between psychotic symptoms and reductions in serial dependence. Third, both the alterations in serial dependence and the strength of positive symptoms were higher for patients with schizophrenia than for the anti-NMDAR encephalitis group. Still, studies with larger sample sizes are needed to confirm the relation of psychotic symptoms and reduced serial biases at the subject-level, which in our study did not reach significance for two out of three analyses in patients with schizophrenia and anti-NMDAR encephalitis (Supplementary Fig. 8 and “Results”).

Serial dependence could also reflect a clinically relevant dimension which is not or only mildly related to the assessed psychiatric scales. In this sense, it has been argued that serial dependence could facilitate information processing in temporally coherent real-world situations17. Alternatively, serial biases could be the mere by-product of long-lasting cellular or synaptic mechanisms that support memory stabilization during working memory delays48. Our study is in line with previous findings of reduced susceptibility to proactive interference in schizophrenia49,50. However, while proactive interference is mainly discussed in the context of cognitive control, the limited complexity of our task restricts possible interpretations of reduced between-trial interference and supports the role of reduced residual memory traces. Moreover, thanks to our task’s well-studied single-neuron correlates18,19,20 and biophysical models18,19,21 and the comparison with anti-NMDAR encephalitis patients, we provide a specific mechanistic model of synaptic deficits leading to reduced previous-trial interference in schizophrenia.

Interestingly, a reduction in serial dependence has recently been reported for patients with autism51, a disease also associated with NMDAR hypofunction52 and alterations in synaptic potentiation25. Further, as for autism, our findings of reduced serial dependence are compatible with normative accounts of information processing in schizophrenia. Classic theories and recent studies have reported an underweighting of past context, or in Bayesian terms, learned priors, and an overweighting of incoming perceptual information in patients with schizophrenia42,53,54 and NMDAR hypofunction55. Long-lived traces of past stimuli could serve as Bayesian priors to perception and memory, and a disruption of STP might be regarded as a biological implementation of a reduced usage of priors in schizophrenia and anti-NMDAR encephalitis.

Methods

Experimental sample

We included n = 16 patients with anti-NMDAR encephalitis (enc), n = 17 patients with schizophrenia or schizoaffective disorder (n = 12 and n = 5, respectively; schz), and n = 19 neurologically and psychiatrically healthy control participants (ctrl), all with normal or corrected-to-normal vision. Behavioral data from n = 14 healthy controls have been included in a previous study18. Psychiatric diagnoses (or their absence, for controls) were confirmed using the Structured Clinical Interview for DSM-IV (SCID-I)56. Patients diagnosed with anti-NMDAR encephalitis were recruited from different centers (n = 14 in Spain, n = 1 in Germany, and n = 1 in the United Kingdom) at hospital discharge and completed the experiment around 5.5 months after disease onset (median; interquartile range, i.q.r. = 3.7–7.2 months). All patients fulfilled clinical diagnostic criteria of anti-NMDAR encephalitis with confirmation of CSF IgG antibodies against the GluN1 subunit of the NMDAR57. All subjects were tested in our laboratory for antibodies against NMDAR in serum36, and all healthy controls and patients with schizophrenia were seronegative. Anti-NMDAR encephalitis is known to have a prolonged recovery process after the acute stage of the disease58, and patients in this prolonged recovery phase still suffer from cognitive deficits, as previously described in cohorts with long follow-up12. All patients were sufficiently recovered to participate in the testing procedure. Controls and patients with schizophrenia were recruited from the Barcelona area and from Hospital Clínic (Barcelona, Spain), respectively. Patients with schizophrenia were tested 35.0 months after diagnosis (median; i.q.r. = 16.0–69.5 months) and were clinically stable at the time of testing.
All participants (and, in the case of minors, their legal guardians) provided written informed consent and were monetarily compensated for their time and travel expenses, following procedures reviewed and approved by the Research Ethics Committee of Hospital Clínic. All subjects were assessed for psychiatric symptoms and functioning with a battery of standard tests including the Spanish versions of the Positive and Negative Syndrome Scale (PANSS)59, the Young Mania Rating Scale (YMRS)60, the Hamilton Depression Rating Scale (HAM-D)61, and the Global Assessment of Functioning Scale (GAF)62. Finally, the dose of antipsychotic medication at the time of testing was estimated as chlorpromazine equivalents (CPZ, mg day−1)63. For a demographic and clinical overview of the populations, please refer to Supplementary Table 1.

Experimental task protocol and behavioral testing

Participants completed two 1.5 h sessions performing a visuospatial working memory task described in Fig. 1a. In each session, participants were asked to complete 12 blocks of 48 trials. However, some participants did not complete all blocks (on average, participants completed 1114.1 ± 134.4 trials (mean ± s.d., ctrl), 1086.0 ± 189.9 trials (enc), and 1030.6 ± 192.8 trials (schz)).

For stimulus presentation, we used PsychoPy v3.1.5 on Python 2.7, running on a 17” HP ProBook laptop. Each trial began with the presentation of a central black fixation square (0.5 × 0.5 cm) on a gray background for 1.1 s. A single colored circle (stimulus; diameter 1.4 cm; 1 of 6 randomly chosen colors with equal luminance) was then presented for 0.25 s at one of 360 randomly chosen angular locations at a fixed radius of 4.5 cm from the center. The stimulus was followed by a randomly chosen delay of 0 s (16.67% of trials), 1 s (66.67% of trials), or 3 s (16.67% of trials), during which only the fixation dot remained visible (except in 0 s trials, where the stimulus remained visible until the participant started to move the cursor). When the fixation dot changed to the stimulus’ color (probe), participants were asked to respond by clicking the mouse at the remembered location (response). A white circle indicated the stimulus’ radial distance, so participants only had to remember the angular position. After the response, participants had to move the cursor back to the fixation dot to start a new trial (ITI). Participants were instructed to maintain fixation during the fixation period, stimulus presentation, and memory delay, and were free to move their eyes during the response and when returning the cursor to the fixation dot.

Error and serial dependence analysis

Response errors \(\theta _n^{\mathrm{e}}\) in trial n were measured as the angular distance between response and target. To exclude errors due to guessing or motor imprecision, we only analyzed responses within an angular distance of 1 radian and a radial distance of 2.25 cm from the stimulus. Further, we excluded trials in which the time of response initiation exceeded 3 s, and trials for which the time between the previous trial’s response probe and the current trial’s stimulus presentation exceeded 5 s. In total, 2.6 ± 4.2% (mean ± s.d., ctrl), 4.8 ± 6.9% (enc) and 7.5 ± 9.6% (schz) of trials per participant were rejected (but only 0.1 ± 0.2% (ctrl), 0.4 ± 0.5% (enc) and 0.6 ± 0.7% (schz) of trials were excluded due to angular response errors).

We then measured serial dependence as the error in the current trial as a function of the circular distance between the previous and the current trial’s target location. Figure 1c–e depicts ‘folded’ serial dependence: we multiplied trial-wise errors \(\theta _n^{\mathrm{e}}\) by the sign of the previous-current distance \(\theta _n^{\mathrm{d}}\), \(\theta _n^{{\mathrm{e}}\prime } = \theta _n^{\mathrm{e}} * {\mathrm{sign}}\left( {\theta _n^{\mathrm{d}}} \right)\), and then binned data based on absolute values \(|\theta _n^{\mathrm{d}}|\). Folded errors \(\theta _n^{{\mathrm{e}}\prime }\) were then averaged over \(|\theta _n^{\mathrm{d}}|\) in sliding windows of size π/3 in steps of π/30. Positive mean folded errors indicate attraction towards the previous stimulus, and negative mean folded errors indicate repulsion away from the previous location. In all figures including bias curves, s.e.m. values were calculated across pooled trials from all subjects for each group and delay. For visualization, all values were transformed from radians to angular degrees.
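The folding and sliding-window averaging can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' analysis code: the helper names `wrap` and `folded_bias_curve` are hypothetical, the distance is assumed to be defined as previous minus current location (so that positive folded errors mean attraction), and window centers are assumed to run from π/6 to 5π/6 so that every window lies inside [0, π].

```python
import numpy as np

def wrap(angle):
    """Wrap angles (radians) into [-pi, pi)."""
    return (np.asarray(angle, dtype=float) + np.pi) % (2 * np.pi) - np.pi

def folded_bias_curve(errors, prev_minus_curr, window=np.pi / 3, step=np.pi / 30):
    """Sliding-window mean of folded errors as a function of |previous-current distance|.

    Assumes prev_minus_curr = previous location - current location (radians).
    """
    errors = np.asarray(errors, dtype=float)
    dist = wrap(prev_minus_curr)
    folded = errors * np.sign(dist)             # theta_e' = theta_e * sign(theta_d)
    abs_dist = np.abs(dist)
    n_centers = int(round((np.pi - window) / step)) + 1
    centers = window / 2 + step * np.arange(n_centers)   # pi/6, ..., 5*pi/6
    means = np.full(n_centers, np.nan)
    for k, c in enumerate(centers):
        in_window = np.abs(abs_dist - c) <= window / 2
        if in_window.any():
            means[k] = folded[in_window].mean()
    return centers, means
```

For plotting in degrees, as in the figures, `np.degrees` can be applied to both returned arrays.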

Linear (mixed) models

We modeled signed errors \(\theta _{nm}^{\mathrm{e}}\) in trial n and subject m using linear mixed models that included the dummy-coded variables group (ctrl, enc or schz) and delay (0, 1, or 3 s), and a nonlinear function of previous-current stimulus distance \(\theta _{nm}^{\mathrm{d}}\), DoG(\(\theta _{nm}^{\mathrm{d}}\)), which has been used for modeling serial dependence16,29. DoG(\(\theta _{nm}^{\mathrm{d}}\)) is the normalized first derivative of a Gaussian with fixed location hyperparameter μ = 0. Its scale parameter σ was determined using cross-validation as explained below (see also Supplementary Fig. 1). Our main linear model is:

$$\theta _{nm}^e = \,\beta _0 + \beta _{1,g}{\mathrm{group}}_{nm} + \beta _{2,d}{\mathrm{delay}}_{nm} - \beta _3{\mathrm{DoG}}(\theta _{nm}^d)\\ + \beta _{4,g,d}{\mathrm{group}}_{nm}{\mathrm{delay}}_{nm} - \beta _{5,g}{\mathrm{group}}_{nm}{\mathrm{DoG}}(\theta _{nm}^d)\\ - \beta _{6,d}{\mathrm{delay}}_{nm}{\mathrm{DoG}}(\theta _{nm}^d) - \beta _{7,g,d}{\mathrm{group}}_{nm}{\mathrm{delay}}_{nm}{\mathrm{DoG}}(\theta _{nm}^d)\\ + \gamma _{0,m} - \gamma _{1,m}{\mathrm{DoG}}(\theta _{nm}^d) - \gamma _{2,m,d}{\mathrm{delay}}_{nm}{\mathrm{DoG}}(\theta _{nm}^d) + \varepsilon _{nm}$$
(1)

β coefficients estimate fixed, and γ coefficients random effects. Coefficient subscripts g and d denote that a separate coefficient was estimated for different values of dummy-coded variables group or delay, respectively, resulting in a total of 18 β coefficients for Eq. (1). Coefficient subscript m denotes that a separate coefficient was estimated for each subject. Bias strength for a certain condition can then be read out as the sum of coefficients of all terms containing DoG(\(\theta _{nm}^{\mathrm{d}}\)) and the dependence of bias strength on other variables is assessed by evaluating the significance of interaction terms containing DoG(\(\theta _{nm}^{\mathrm{d}}\)) and the relevant variable. To measure response precision, bias-corrected response errors were defined as linear model residuals εnm from Eq. (1). For each subject and delay, inverse response precision was then measured as the circular s.d. of εnm.
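Two details of this paragraph can be checked numerically: the count of 18 fixed-effect coefficients (with group and delay each dummy-coded into two columns) and the circular s.d. used as inverse precision. The sketch below assumes the standard definition of the circular s.d., \(\sqrt{-2\ln R}\) with R the mean resultant length (the same formula as SciPy's `circstd`); the helper name `circular_std` is hypothetical.

```python
import numpy as np

def circular_std(angles):
    """Circular standard deviation sqrt(-2 ln R) of angles in radians."""
    r = np.abs(np.mean(np.exp(1j * np.asarray(angles, dtype=float))))
    r = min(r, 1.0)  # guard against rounding slightly above 1
    return float(np.sqrt(-2.0 * np.log(r)))

# Bookkeeping for the fixed effects (beta) of Eq. (1):
# group and delay each have 3 levels, i.e. 2 dummy-coded columns each.
n_g, n_d = 2, 2
n_beta = (1             # beta_0
          + n_g         # beta_1,g
          + n_d         # beta_2,d
          + 1           # beta_3
          + n_g * n_d   # beta_4,g,d
          + n_g         # beta_5,g
          + n_d         # beta_6,d
          + n_g * n_d)  # beta_7,g,d
```

In practice `circular_std` would be applied to the residuals of each subject and delay separately.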

Group- (Eq. (2), Supplementary Fig. 5) and delay-wise (Eq. (3), Fig. 1g–i) linear models were defined as:

$$\theta _{nm}^e = \,\, \beta _0 + \beta _{1,d}{\mathrm{delay}}_{nm} - \beta _2{\mathrm{DoG}}(\theta _{nm}^d) - \beta _{3,d}{\mathrm{delay}}_{nm}{\mathrm{DoG}}(\theta _{nm}^d)\\ + \gamma _{0,m} - \gamma _{1,m}{\mathrm{DoG}}(\theta _{nm}^d) - \gamma _{2,m,d}{\mathrm{delay}}_{nm}{\mathrm{DoG}}(\theta _{nm}^d) + \varepsilon _{nm}$$
(2)
$$\theta _{nm}^e = \,\, \beta _0 + \beta _{1,g}{\mathrm{group}}_{nm} - \beta _2{\mathrm{DoG}}(\theta _{nm}^d) - \beta _{3,g}{\mathrm{group}}_{nm}{\mathrm{DoG}}(\theta _{nm}^d)\\ + \gamma _{0,m} - \gamma _{1,m}{\mathrm{DoG}}(\theta _{nm}^d) + \varepsilon _{nm}$$
(3)

The effects of the covariates ITI length (Eq. (4)) and CPZ equivalent (Eq. (5)) were assessed as:

$$\theta _{nm}^e = \ \beta _0 + \beta _{1,g}{\mathrm{group}}_{nm} + \beta _{2,d}{\mathrm{delay}}_{nm} - \beta _3{\mathrm{DoG}}(\theta _{nm}^d)\\ + \beta _{4,g,d}{\mathrm{group}}_{nm}{\mathrm{delay}}_{nm} - \beta _{5,g}{\mathrm{group}}_{nm}{\mathrm{DoG}}(\theta _{nm}^d)\\ - \beta _{6,d}{\mathrm{delay}}_{nm}{\mathrm{DoG}}(\theta _{nm}^d) - \beta _{7,g,d}{\mathrm{group}}_{nm}{\mathrm{delay}}_{nm}{\mathrm{DoG}}(\theta _{nm}^d)\\ - \beta _8{\mathrm{ITI}}_{nm}{\mathrm{DoG}}(\theta _{nm}^d) + \gamma _{0,m} - \gamma _{1,m}{\mathrm{DoG}}(\theta _{nm}^d)\\ - \gamma _{2,m,d}{\mathrm{delay}}_{nm}{\mathrm{DoG}}(\theta _{nm}^d) + \varepsilon _{nm}$$
(4)
$$\theta _{nm}^e = \ \beta _0 + \beta _{1,g}{\mathrm{group}}_{nm} + \beta _{2,d}{\mathrm{delay}}_{nm} - \beta _3{\mathrm{DoG}}(\theta _{nm}^d)\\ + \beta _{4,g,d}{\mathrm{group}}_{nm}{\mathrm{delay}}_{nm} - \beta _{5,g}{\mathrm{group}}_{nm}{\mathrm{DoG}}(\theta _{nm}^d)\\ - \beta _{6,d}{\mathrm{delay}}_{nm}{\mathrm{DoG}}(\theta _{nm}^d) - \beta _{7,g,d}{\mathrm{group}}_{nm}{\mathrm{delay}}_{nm}{\mathrm{DoG}}(\theta _{nm}^d)\\ - \beta _{8,d}{\mathrm{CPZ}}_{nm}{\mathrm{delay}}_{nm}{\mathrm{DoG}}(\theta _{nm}^d) + \gamma _{0,m} - \gamma _{1,m}{\mathrm{DoG}}(\theta _{nm}^d)\\ - \gamma _{2,m,d}{\mathrm{delay}}_{nm}{\mathrm{DoG}}(\theta _{nm}^d) + \varepsilon _{nm}$$
(5)

Further, a conservative estimate of group effects controlling for CPZ equivalents was obtained by first regressing trial-wise errors on CPZ-dependent effects, excluding random effects so as not to absorb variance related to the experimental group to which subjects belonged (note the dropped m subscripts):

$$\theta _n^e = \ \beta _0 + \beta _1{\mathrm{CPZ}}_n + \beta _{2,d}{\mathrm{CPZ}}_n{\mathrm{delay}}_n - \beta _3{\mathrm{CPZ}}_n{\mathrm{DoG}}(\theta _n^d)\\ - \beta _{4,d}{\mathrm{CPZ}}_n{\mathrm{delay}}_n{\mathrm{DoG}}(\theta _n^d) + \varepsilon _n$$
(6)

and subsequently modeling residuals εn as main and interaction effects of group, delay, and DoG(\(\theta _{nm}^d\)) as described in Eq. (1) (Supplementary Fig. 9g–l).

Biases towards stimuli in trial n − 2 were measured by including distances to the penultimate stimulus, \(\theta _{nm}^{{\mathrm{d}}\prime }\):

$$\theta _{nm}^e = \ \beta _0 + \beta _{1,g}{\mathrm{group}}_{nm} + \beta _{2,d}{\mathrm{delay}}_{nm} - \beta _3{\mathrm{DoG}}(\theta _{nm}^d)\\ + \beta _{4,g,d}{\mathrm{group}}_{nm}{\mathrm{delay}}_{nm} - \beta _{5,g}{\mathrm{group}}_{nm}{\mathrm{DoG}}(\theta _{nm}^d)\\ - \beta _{6,d}{\mathrm{delay}}_{nm}{\mathrm{DoG}}(\theta _{nm}^d) - \beta _{7,g,d}{\mathrm{group}}_{nm}{\mathrm{delay}}_{nm}{\mathrm{DoG}}(\theta _{nm}^d)\\ - \beta _8{\mathrm{DoG}}(\theta _{nm}^{d{\prime}}) - \beta _{9,g}{\mathrm{group}}_{nm}{\mathrm{DoG}}(\theta _{nm}^{d{\prime}}) - \beta _{10,d}{\mathrm{delay}}_{nm}{\mathrm{DoG}}(\theta _{nm}^{d{\prime}})\\ - \beta _{11,g,d}{\mathrm{group}}_{nm}{\mathrm{delay}}_{nm}{\mathrm{DoG}}(\theta _{nm}^{d{\prime}}) + \gamma _{0,m} - \gamma _{1,m}{\mathrm{DoG}}(\theta _{nm}^d)\\ - \gamma _{2,m,d}{\mathrm{delay}}_{nm}{\mathrm{DoG}}(\theta _{nm}^d) + \varepsilon _{nm}$$
(7)

Baseline and follow-up sessions in encephalitis patients and controls were compared by:

$$\theta _n^e = \ \beta _0 + \beta _1{\mathrm{session}}_n + \beta _{2,g}{\mathrm{group}}_n + \beta _{3,d}{\mathrm{delay}}_n\\ - \beta _4{\mathrm{DoG}}(\theta _n^d) + \beta _{5,g}{\mathrm{session}}_n{\mathrm{group}}_n + \beta _{6,d}{\mathrm{session}}_n{\mathrm{delay}}_n\\ + \beta _{7,g,d}{\mathrm{group}}_n{\mathrm{delay}}_n - \beta _8{\mathrm{session}}_n{\mathrm{DoG}}(\theta _n^d)\\ - \beta _{9,g}{\mathrm{group}}_n{\mathrm{DoG}}(\theta _n^d) - \beta _{10,d}{\mathrm{delay}}_n{\mathrm{DoG}}(\theta _n^d)\\ - \beta _{11,g}{\mathrm{session}}_n{\mathrm{group}}_n{\mathrm{DoG}}(\theta _n^d) - \beta _{12,d}{\mathrm{session}}_n{\mathrm{delay}}_n{\mathrm{DoG}}(\theta _n^d)\\ - \beta _{13,g,d}{\mathrm{group}}_n{\mathrm{delay}}_n{\mathrm{DoG}}(\theta _n^d) - \beta _{14,g,d}{\mathrm{session}}_n{\mathrm{group}}_n{\mathrm{delay}}_n{\mathrm{DoG}}(\theta _n^d) + \varepsilon _n$$
(8)

where \({\mathrm{session}}_n\) takes values 0 or 1 (baseline vs. follow-up). In this model, we did not include random effects because of the increased model complexity and resulting difficulties with model convergence. For the extended linear models in Eqs. (4), (5), (7), and (8), we compared nested models via Wald tests to determine the optimal model complexity. Data were analyzed in Python 3.7, using packages from R statistics (version 3.6.3) through the ‘rpy2’ interface64. All linear mixed models were fitted, compared, and statistically tested with the packages ‘lme4’65 and ‘lmerTest’66; the latter calculates ANOVA tables for the fixed effects of the linear mixed model by estimating degrees of freedom and F values using Satterthwaite’s method. For optimization, we used the ‘nlminb’ algorithm of the ‘optimx’ package67 with a convergence tolerance of 0.003 and checked the consistency of parameter estimates with other optimization algorithms (‘L-BFGS-B’, ‘bobyqa’). Note that the normality assumption for the residuals was not met (normality test, \(s^2 + k^2 = 4248.72\), p < 1e−16), although kurtosis (Fisher) = 3.37 and skewness = 0.12 deviated only slightly. Given the large number of trials (n = 52,394), this should not compromise statistical inference68. Moreover, all effects of relevant task variables are visualized in both a model-based and a model-free way to confirm their congruence.
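The reported statistic \(s^2 + k^2\) matches the D'Agostino–Pearson omnibus test, which SciPy exposes as `scipy.stats.normaltest`. A minimal sketch of such a residual check, assuming this is the test that was used (the helper name `residual_normality_report` is hypothetical):

```python
import numpy as np
from scipy import stats

def residual_normality_report(residuals):
    """D'Agostino-Pearson omnibus test (statistic s^2 + k^2) plus shape parameters."""
    residuals = np.asarray(residuals, dtype=float)
    k2, p = stats.normaltest(residuals)   # s, k: z-scores of skewness and kurtosis tests
    return {
        "s2_plus_k2": float(k2),
        "p_value": float(p),
        "kurtosis_fisher": float(stats.kurtosis(residuals, fisher=True)),  # 0 for a Gaussian
        "skewness": float(stats.skew(residuals)),
    }
```

With tens of thousands of trials, even modest departures from normality yield very small p values, which is why the shape parameters are worth reporting alongside the test.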

Basis function selection and hyperparameter cross-validation

To determine the hyperparameter σ used in Eqs. (1)–(8), we fitted errors \(\theta _n^{\mathrm{e}}\) in trial n as a linear model including factors group, delay, and DoG(\(\theta _n^{\mathrm{d}}\)) as described in Eq. (1), but excluding random effects:

$$\theta _n^e = \,\, \beta _0 + \beta _{1,g}{\mathrm{group}}_n + \beta _{2,d}{\mathrm{delay}}_n - \beta _3{\mathrm{DoG}}(\theta _n^d)\\ + \beta _{4,g,d}{\mathrm{group}}_n{\mathrm{delay}}_n - \beta _{5,g}{\mathrm{group}}_n{\mathrm{DoG}}(\theta _n^d)\\ - \beta _{6,d}{\mathrm{delay}}_n{\mathrm{DoG}}(\theta _n^d) - \beta _{7,g,d}{\mathrm{group}}_n{\mathrm{delay}}_n{\mathrm{DoG}}(\theta _n^d) + \varepsilon _n$$
(9)

while setting Gaussian hyperparameters μ = 0 and \(\sigma \in [0.2,1.8]\) (in radians). For each value of the scale parameter σ, we used a stratified cross-validation procedure, fitting the linear model to 67% of the trials from each subject and testing the prediction in the left-out 33% of trials. Performance for each σ was evaluated using the mean squared error (MSE) of predictions from 1000 cross-validation repetitions. σ was chosen so as to minimize the MSE obtained by the linear model, yielding σ = 0.8 (Supplementary Fig. 1).
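A simplified sketch of this cross-validation loop follows. It keeps only the intercept and the DoG term of Eq. (9) and fits by ordinary least squares; the helper names, the peak normalization of the DoG (maximum absolute value 1), and the use of fewer repetitions in the example are all assumptions for illustration.

```python
import numpy as np

def dog(x, sigma):
    # First derivative of a Gaussian (mu = 0), scaled so that max |dog| = 1
    # (peak normalization is an assumption).
    return -(x / sigma) * np.exp(0.5 - x**2 / (2 * sigma**2))

def cv_mse(errors, dist, subjects, sigma, n_reps=100, train_frac=0.67, seed=0):
    """Mean held-out MSE over repeated subject-stratified 67/33 splits."""
    rng = np.random.default_rng(seed)
    X = np.column_stack([np.ones_like(dist), -dog(dist, sigma)])  # "- beta DoG" convention
    mses = []
    for _ in range(n_reps):
        train = np.zeros(len(errors), dtype=bool)
        for s in np.unique(subjects):               # stratify the split by subject
            idx = np.flatnonzero(subjects == s)
            n_train = int(round(train_frac * len(idx)))
            train[rng.choice(idx, n_train, replace=False)] = True
        beta, *_ = np.linalg.lstsq(X[train], errors[train], rcond=None)
        mses.append(np.mean((errors[~train] - X[~train] @ beta) ** 2))
    return float(np.mean(mses))
```

Scanning a grid such as `np.arange(0.2, 1.81, 0.1)` and taking the σ with the smallest `cv_mse` mirrors the selection rule described above; because the same seed is reused for every σ, all candidates are compared on identical splits.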

To test whether a linear model with repulsive biases at high distances \(|\theta _n^{\mathrm{d}}|\) fitted our data more parsimoniously, we compared cross-validation MSE for linear models with first- and third-derivative-of-Gaussian basis functions (Supplementary Fig. 1). We repeated the hyperparameter fitting procedure described above for the third-derivative-of-Gaussian model using hyperparameters μ = 0 and \(\sigma \in [0.6,2.0]\) rad. As the first-derivative-of-Gaussian model produced smaller MSE in the cross-validation procedure, we discarded the third-derivative-of-Gaussian model. Thus, all linear model results reported in this manuscript correspond to the first-derivative-of-Gaussian model.
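The two candidate basis functions differ in shape: the first derivative of a Gaussian has a single lobe on each side of zero, whereas the third derivative changes sign again at \(|x| = \sqrt{3}\,\sigma\), which lets the model express repulsion at large distances. The sketch below writes both out explicitly; rescaling each to a maximum absolute value of 1 on [−π, π] is an assumption, and the fitted coefficient absorbs the opposite sign of the third derivative near zero.

```python
import numpy as np

def gaussian_derivative(x, sigma, order):
    """First (order=1) or third (order=3) derivative of a zero-mean Gaussian,
    rescaled so that its largest absolute value on [-pi, pi] is 1."""
    def raw(u):
        g = np.exp(-u**2 / (2 * sigma**2))
        if order == 1:
            return -(u / sigma**2) * g
        if order == 3:
            return (u / sigma**4) * (3 - u**2 / sigma**2) * g
        raise ValueError("order must be 1 or 3")
    grid = np.linspace(-np.pi, np.pi, 4001)
    return raw(np.asarray(x, dtype=float)) / np.max(np.abs(raw(grid)))
```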

Confidence intervals and effect sizes

We compared single-subject bias estimates between groups using post hoc t-tests. Effect sizes for these comparisons were estimated as Cohen’s d, defined as \(d = \frac{{\mu _1 - \mu _2}}{s}\) for independent samples, where s is the pooled standard deviation, \(s = \sqrt {\frac{{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}}{{n_1 + n_2 - 2}}}\), and as \(d = \frac{t}{{\sqrt n }}\) for related samples. For correlations of individual subjects’ biases with symptoms, we used Pearson correlation and calculated parametric 95% confidence intervals (‘CIr’ function from the ‘psychometric’69 package). Because samples were small and potentially non-normal, we confirmed significant results with bootstrap confidence intervals and p-values, which yielded consistent results in all but one correlation (Supplementary Fig. 10g): here, we obtained C.I. = [−0.83, −0.02] and p = 0.04 with parametric methods, but C.I. = [−0.85, 0.09] and p = 0.09 with non-parametric methods (all two-sided; note, however, that our directed hypothesis of an expected negative correlation supports a one-sided test with p = 0.04). Confidence intervals of the mean (Figs. 1 and 3, and Supplementary Figs. 5, 6 and 9) were calculated as 95% bootstrap confidence intervals.
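The effect-size definitions and a bootstrap confidence interval of the mean translate directly into code. In this sketch the function names are hypothetical, and the percentile bootstrap is an assumption, since the text does not state which bootstrap interval was computed.

```python
import numpy as np

def cohens_d_independent(x1, x2):
    """(mean1 - mean2) / pooled standard deviation."""
    x1, x2 = np.asarray(x1, dtype=float), np.asarray(x2, dtype=float)
    n1, n2 = len(x1), len(x2)
    s = np.sqrt(((n1 - 1) * x1.var(ddof=1) + (n2 - 1) * x2.var(ddof=1)) / (n1 + n2 - 2))
    return (x1.mean() - x2.mean()) / s

def cohens_d_paired(t_stat, n):
    """d = t / sqrt(n) for related samples."""
    return t_stat / np.sqrt(n)

def bootstrap_ci_mean(x, n_boot=10000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval of the mean (assumed variant)."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    means = rng.choice(x, size=(n_boot, len(x)), replace=True).mean(axis=1)
    lo, hi = np.quantile(means, [alpha / 2, 1 - alpha / 2])
    return float(lo), float(hi)
```

For example, `cohens_d_independent([1, 2, 3], [4, 5, 6])` gives −3, since the means differ by 3 and both sample variances equal 1.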

Neural network architecture and dynamics

We simulated consecutive pairs of trials in a spiking neural network model of prefrontal cortex implemented in Brian2 (ref. 70). \(N_E = 1024\) excitatory and \(N_I = 256\) inhibitory leaky integrate-and-fire neurons were connected all-to-all via synapses governed by NMDAR, AMPAR, and GABAAR dynamics, as described in ref. 21.

The dynamics of the membrane voltage of excitatory neurons \(V_i\left( {i = 1..N_{ E }} \right)\) were given by:

$$C_m\frac{{dV_i}}{{dt}} = \ - g_L(V_i - E_L) - g_{EE,A}\mathop {\sum}\limits_j^{N_E} {W_{ij}^{EE}} s_j^A(V_i - E_A)\\ - \frac{{g_{EE,N}}}{{1 + e^{ - aV_i}/3.57}}\mathop {\sum}\limits_j^{N_E} {W_{ij}^{EE}} s_j^N(V_i - E_N)\\ - g_{IE}\mathop {\sum}\limits_j^{N_I} {s_j^G} (V_i - E_G) - g_{{\mathrm{ext}},E}s_{{\mathrm{ext}}}(V_i - E_A) + I_i^s$$
(10)

with membrane capacitance \(C_m = 0.5\,{\mathrm{nF}}\), leak conductance \(g_L = 25\,{\mathrm{nS}}\), leak reversal potential \(E_L = - 70\,{\mathrm{mV}}\), AMPAR, GABAAR and NMDAR reversal potentials \(E_A = 0\,{\mathrm{mV}}\), \(E_G = - 70\,{\mathrm{mV}}\), \(E_N = 0\,{\mathrm{mV}}\), unitary conductances \(g_{{\mathrm{ext}},E} = 3.1\,{\mathrm{nS}}\), \(g_{IE} = 2.672\,{\mathrm{nS}}\), \(g_{EE,N} = 0.56\,{\mathrm{nS}}\), \(g_{EE,A} = 0.502\,{\mathrm{nS}}\), and the NMDAR magnesium block parameter \(a = 0.062\,{\mathrm{mV}}^{ - {\mathrm{1}}}\). In simulations of reduced NMDAR conductance, parameters \(g_{ {EE,N} }\) or respectively \(g_{ {EI,N} }\) were modulated as indicated in Fig. 3b, c, e, f and Supplementary Fig. 14.
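The voltage dependence of the NMDAR term in Eq. (10) comes entirely from the magnesium-block factor \(1/(1 + e^{-aV}/3.57)\). The sketch below evaluates that factor and the resulting NMDAR term with the parameters listed above; the function names are hypothetical, and `weighted_s` stands for the already-computed sum \(\sum_j W_{ij}^{EE} s_j^N\).

```python
import numpy as np

A_MG = 0.062    # mV^-1, magnesium block parameter a
G_EE_N = 0.56   # nS, baseline recurrent E->E NMDAR conductance

def mg_unblock(v_mv):
    """Fraction of NMDAR conductance not blocked by Mg2+: 1 / (1 + exp(-a V) / 3.57)."""
    return 1.0 / (1.0 + np.exp(-A_MG * np.asarray(v_mv, dtype=float)) / 3.57)

def nmda_term_pA(v_mv, weighted_s, g_nS=G_EE_N, e_n_mv=0.0):
    """g * B(V) * sum_j W_ij s_j^N * (V - E_N) from Eq. (10), in pA (nS * mV).
    This quantity enters C_m dV/dt with a minus sign."""
    return g_nS * mg_unblock(v_mv) * weighted_s * (v_mv - e_n_mv)
```

Near rest (−70 mV) fewer than 5% of NMDAR channels are unblocked, which is why NMDAR currents mainly matter in already-depolarized, persistently active neurons. Lowering `g_nS` scales the whole term proportionally, which is the manipulation used to simulate reduced NMDAR conductance.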

The membrane voltage of inhibitory neurons followed:

$$C_m\frac{{dV_i}}{{dt}} = \ - g_L(V_i - E_L) - g_{EI,A}\mathop {\sum}\limits_j^{N_E} {s_j^A} (V_i - E_A)\\ - \frac{{g_{EI,N}}}{{1 + e^{ - aV_i}/3.57}}\mathop {\sum}\limits_j^{N_E} {s_j^N} (V_i - E_N)\\ - g_{II}\mathop {\sum}\limits_j^{N_I} {s_j^G} (V_i - E_G) - g_{{\mathrm{ext}},I}s_{{\mathrm{ext}}}(V_i - E_A)$$
(11)

with \(C_m = 0.2\,{\mathrm{nF}}\), \(g_L = 20\,{\mathrm{nS}}\), \(g_{{\mathrm{ext}},I} = 2.38\,{\mathrm{nS}}\), \(g_{II} = 2.048\,{\mathrm{nS}}\), \(g_{EI,A} = 0.384\,{\mathrm{nS}}\) and \(g_{EI,N} = 0.424\,{\mathrm{nS}}\).

The kinetics of synaptic variables \(s_i^A(i = 1 \cdots N_E)\), \(s_i^G(i = 1 \cdots N_I)\), and sext were determined by

$$\frac{{ds_X}}{{dt}} = - \frac{{s_X}}{{\tau _X}} + w\mathop {\sum}\limits_i \delta (t - t_i)$$
(12)

with \(\tau _A = 2\,{\mathrm{ms}}\), \(\tau _G = 10\,{\mathrm{ms}}\), \(\tau _{{\mathrm{ext}}} = 2\,{\mathrm{ms}}\), and the summation running over all spike times ti so that at each spike time the synaptic variable increased by a step of magnitude w, which was generally set to 1 except for synapses undergoing synaptic potentiation (see below). For sext, spike times were generated as a Poisson spike train of rate 1800 spikes s−1 (simulating inputs from 1000 external Poisson neurons firing at 1.8 spikes s−1 each).
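Eq. (12) describes an exponentially decaying trace that jumps by w at each input spike. Because the superposition of 1000 independent Poisson trains at 1.8 spikes s−1 is itself a Poisson train at 1800 spikes s−1, the external drive can be generated as a single train. The forward-Euler sketch below is illustrative only (the function name and time step are assumptions, not the Brian2 implementation); its steady-state mean should approach rate × τ × w.

```python
import numpy as np

def simulate_synaptic_trace(rate_hz, tau_ms, duration_ms, dt_ms=0.1, w=1.0, seed=0):
    """Forward-Euler integration of ds/dt = -s/tau + w * sum_i delta(t - t_i),
    driven by a homogeneous Poisson spike train."""
    rng = np.random.default_rng(seed)
    n_steps = int(round(duration_ms / dt_ms))
    spikes = rng.poisson(rate_hz * dt_ms / 1000.0, n_steps)   # spike counts per time bin
    s = np.zeros(n_steps)
    for k in range(1, n_steps):
        s[k] = s[k - 1] * (1.0 - dt_ms / tau_ms) + w * spikes[k]
    return s
```

For s_ext (rate 1800 spikes s−1, τ = 2 ms, w = 1) the trace fluctuates around 1800 × 0.002 = 3.6.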

The slower and saturating NMDAR synaptic variables \(s_i^N(i = 1 \ldots N_E)\) followed the coupled equations:

$$\frac{{ds_i^N}}{{dt}} = - \frac{{s_i^N}}{{\tau _{N_s}}} + \alpha _Nx_i(1 - s_i^N)$$
(13)
$$\frac{{dx_i}}{{dt}} = - \frac{{x_i}}{{\tau _{N_x}}} + w\mathop {\sum}\limits_j \delta (t - t_j)$$
(14)

with \(\tau _{N_s} = 100\;{\mathrm{ms}}\), \(\tau _{N_x} = 2\;{\mathrm{ms}}\), and \(\alpha _N = 0.5\;{\mathrm{kHz}}\).
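The two coupled equations produce a fast rise (through x, with τ = 2 ms) followed by a slow decay (τ = 100 ms), and the factor \((1 - s_i^N)\) keeps \(s_i^N\) below 1 even under high-frequency input. A forward-Euler sketch with the stated parameters (α in kHz, so that α·x has units of ms−1; function name and step size assumed):

```python
import numpy as np

def simulate_nmda(spike_times_ms, duration_ms, dt_ms=0.05,
                  tau_s=100.0, tau_x=2.0, alpha_khz=0.5, w=1.0):
    """Forward-Euler integration of the coupled NMDAR kinetics, Eqs. (13)-(14)."""
    n = int(round(duration_ms / dt_ms))
    steps = np.round(np.asarray(spike_times_ms, dtype=float) / dt_ms).astype(int)
    kicks = np.bincount(steps[steps < n], minlength=n)   # presynaptic spikes per step
    s = np.zeros(n)
    x = np.zeros(n)
    for k in range(1, n):
        # Eq. (14): fast variable x decays with tau_x and jumps by w at each spike
        x[k] = x[k - 1] - dt_ms * x[k - 1] / tau_x + w * kicks[k]
        # Eq. (13): slow, saturating variable s driven by x
        s[k] = s[k - 1] + dt_ms * (-s[k - 1] / tau_s + alpha_khz * x[k - 1] * (1.0 - s[k - 1]))
    return s, x
```

Since α·τ_x = 1, a single presynaptic spike drives s roughly 1 − e−1 ≈ 0.63 of the way to saturation, and a 100 Hz train keeps it just below 1.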

The strength of recurrent excitatory synapses was modulated depending on the distance between the preferred locations of presynaptic and postsynaptic excitatory neurons: \(W_{ij}^{EE} = J(\theta _i - \theta _j)\), where J is a Gaussian function (centered at μ = 0 with σ = 14.4 degrees) plus a constant, tuned so that \(\mathop {\sum}\limits_j J (\theta _i - \theta _j) = N_E\) and J(0) = 1.63. As a result, connections between neurons with similar preferred locations were 1.63 times stronger than the average connection (see Supplementary Fig. 10 for the network scheme and weight profiles).
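The two constraints fix both free parameters of J. Writing \(J(\Delta) = c + A\,e^{-\Delta^2/2\sigma^2}\) and letting m be the mean of the Gaussian over all circular distances, J(0) = c + A = 1.63 and mean(J) = c + A m = 1 give A = 0.63/(1 − m) and c = 1.63 − A. The sketch below assumes evenly spaced preferred angles and circular (wrapped) distances; the function names are hypothetical.

```python
import numpy as np

def weight_profile(n_e=1024, j_peak=1.63, sigma_deg=14.4):
    """J as a function of circular distance to neuron 0, with J(0) = j_peak and mean 1."""
    prefs = np.arange(n_e) * 360.0 / n_e
    delta = (prefs - prefs[0] + 180.0) % 360.0 - 180.0   # wrapped distance in degrees
    g = np.exp(-delta**2 / (2 * sigma_deg**2))
    m = g.mean()
    amp = (j_peak - 1.0) / (1.0 - m)    # Gaussian amplitude A
    const = j_peak - amp                # additive constant c
    return delta, const + amp * g

def weight_matrix(n_e=1024, **kw):
    """Circulant matrix W_ij = J(theta_i - theta_j); every row sums to n_e."""
    _, j = weight_profile(n_e, **kw)
    idx = (np.arange(n_e)[:, None] - np.arange(n_e)[None, :]) % n_e
    return j[idx]
```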

STP rule in neural network simulations

For connections between excitatory neurons, the spike-triggered step in AMPAR and NMDAR synaptic variables w could vary individually for each specific connection: wij characterized the step at the synapse from neuron j onto neuron i. Upon synchronized pre- and postsynaptic spiking, wij was slightly enhanced by an amount Δw that depended on the relative spike times of neuron j and i (Fig. 2c) to simulate an increase in probability of glutamate release71:

$$w_{ij} \leftarrow w_{ij} + \Delta _w(t_j - t_i),\qquad w_{ij} \ge 1$$
(15)

The associative nature of this rule was determined by a potentiation function that required synchronization within a specific temporal window (Fig. 2d):

$$\Delta _w(t_j - t_i) = P\exp \left( { - |t_j - t_i|/\tau _\Delta } \right),$$
(16)

with potentiation factor P = 0.00022 and τΔ = 20 ms. Changes were sustained (did not decay with time), but synapses depotentiated based on presynaptic activity3: at each presynaptic spike

$$w_{ {ij} } = w_{ {ij} } - 0.04 \ast ( {w_{ {ij}} - 1} )$$
(17)
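For a single synapse, Eqs. (15)–(17) can be sketched as an event-driven loop. This is an illustration under stated assumptions, not the Brian2 code used for the simulations: every pre/post spike pair within the exponential window is assumed to contribute (all-to-all pairing, implemented with decaying spike traces), depotentiation is assumed to be applied before potentiation at a presynaptic spike, and the function name `run_stp` is hypothetical.

```python
import numpy as np

P = 0.00022        # potentiation factor
TAU_DELTA = 20.0   # ms, width of the pairing window
DEPOT = 0.04       # fraction of (w - 1) removed at each presynaptic spike

def run_stp(pre_times, post_times, w0=1.0):
    """Event-driven update of one synaptic step w_ij following Eqs. (15)-(17)."""
    events = sorted([(t, "pre") for t in pre_times] + [(t, "post") for t in post_times])
    w, pre_tr, post_tr, t_last = w0, 0.0, 0.0, None
    for t, kind in events:
        if t_last is not None:               # decay both traces since the last event
            decay = np.exp(-(t - t_last) / TAU_DELTA)
            pre_tr *= decay
            post_tr *= decay
        t_last = t
        if kind == "pre":
            w -= DEPOT * (w - 1.0)           # Eq. (17): presynaptic depotentiation
            w += P * post_tr                 # pairs with earlier postsynaptic spikes
            pre_tr += 1.0
        else:
            w += P * pre_tr                  # pairs with earlier presynaptic spikes
            post_tr += 1.0
        w = max(w, 1.0)                      # Eq. (15): w_ij never drops below 1
    return w
```

A single pre/post pair separated by 10 ms raises w by P·e−0.5 regardless of order, reflecting the symmetric window of Eq. (16).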

Trial structure in neural network simulations

We simulated 20,000 pairs of consecutive trials with independent randomized stimulus locations. Network inputs \(\theta _n^s\) in trial n with stimulus s were slightly transformed to mimic a repulsive baseline bias away from previous stimulus locations, resulting from sensory aftereffects produced in lower-level cortical areas29: \(\theta _n^{s\prime } = \theta _n^s + 1.25\,{\mathrm{DoG}}(\theta _n^{\mathrm{d}})\), where DoG(\(\theta _n^{\mathrm{d}}\)) is the first-derivative-of-Gaussian function with μ = 0 and σ = 0.8 radians, and \(\theta _n^{\mathrm{d}}\) is the distance between previous and current stimulus.

Simulations started with a stimulus presentation at 0° (trial n − 1) for 0.25 s. After the input was removed, a delay of 1 s followed. A negative input to the whole network for 0.25 s simulated the response and removed stimulus-associated neural activity. After an ITI of 3 s, a second stimulus (trial n) was delivered at a random location for 0.25 s. The second delay lasted 3 s. To obtain behavioral readouts from the network, we counted each neuron’s spikes in three time windows of 0.25 s: 0–0.25 s (0 s delay condition), 0.75–1 s (1 s delay), and 2.75–3 s (3 s delay) after stimulus offset. The behavioral response was determined as the angular direction of the population vector of spike counts.
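The readout can be sketched as follows: count spikes per neuron in each window after stimulus offset, then take the angle of \(\sum_i c_i e^{i\theta_i}\). The sketch assumes that only excitatory neurons with evenly spaced preferred angles enter the population vector; the helper names are hypothetical.

```python
import numpy as np

# Delay condition (s) -> readout window relative to stimulus offset (s)
WINDOWS_S = {0: (0.0, 0.25), 1: (0.75, 1.0), 3: (2.75, 3.0)}

def population_vector_readout(spike_counts, preferred_angles):
    """Angle (radians) of the population vector sum_i c_i * exp(1j * theta_i)."""
    z = np.sum(np.asarray(spike_counts) * np.exp(1j * np.asarray(preferred_angles)))
    return float(np.angle(z))

def decode_responses(spike_times_s, neuron_ids, n_neurons, stim_offset_s):
    """Population-vector response for each delay window."""
    prefs = 2 * np.pi * np.arange(n_neurons) / n_neurons
    t = np.asarray(spike_times_s, dtype=float) - stim_offset_s
    ids = np.asarray(neuron_ids, dtype=int)
    out = {}
    for delay, (start, stop) in WINDOWS_S.items():
        mask = (t >= start) & (t < stop)
        counts = np.bincount(ids[mask], minlength=n_neurons)
        out[delay] = population_vector_readout(counts, prefs)
    return out
```

An empty window yields a zero vector, whose angle is returned as 0, so a practical implementation would flag such trials separately.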

Neural network behavioral analysis

We first calculated the percentage of outlier responses and excluded outlier trials from the network’s population vector responses (response error >1 radian). Circular standard deviations and serial dependence were then calculated from the network’s population vector responses analogous to human error analyses. In Fig. 3a–f, bias strength was measured as the sum of bias term coefficients in the linear model

$$\theta _n^e = \beta _0 + \beta _{1,d}{\mathrm{delay}}_n - \beta _2{\mathrm{DoG}}(\theta _n^d) - \beta _{3,d}{\mathrm{delay}}_n{\mathrm{DoG}}(\theta _n^d) + \varepsilon _n$$
(18)

that fitted errors \(\theta _n^{\mathrm{e}}\) in trial n from each parameter manipulation (P, gEE, and gEI) separately as a function of delay and DoG(\(\theta _n^{\mathrm{d}}\)) with μ = 0 and σ = 0.6 radians.
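Reading out bias strength from Eq. (18) amounts to summing the DoG coefficient and the delay-specific interaction coefficient. A least-squares sketch, assuming the 0 s delay as the reference level and the same peak-normalized DoG as before (both assumptions; the function names are hypothetical):

```python
import numpy as np

def dog(x, sigma=0.6):
    # First derivative of a Gaussian, scaled so that max |dog| = 1 (assumed normalization).
    return -(x / sigma) * np.exp(0.5 - x**2 / (2 * sigma**2))

def bias_strength_by_delay(errors, dist, delays, sigma=0.6, levels=(0, 1, 3)):
    """Fit Eq. (18) by OLS and return beta_2 + beta_3,d for each delay level."""
    ref, others = levels[0], levels[1:]
    d = -dog(dist, sigma)                                   # "- beta DoG" convention
    cols = [np.ones_like(d)]                                # beta_0
    cols += [(delays == lv).astype(float) for lv in others] # beta_1,d
    cols += [d]                                             # beta_2
    cols += [(delays == lv) * d for lv in others]           # beta_3,d
    beta, *_ = np.linalg.lstsq(np.column_stack(cols), errors, rcond=None)
    b2 = beta[1 + len(others)]
    strength = {ref: float(b2)}
    for k, lv in enumerate(others):
        strength[lv] = float(b2 + beta[2 + len(others) + k])
    return strength
```

Fitting each parameter manipulation (P, gEE, gEI) separately, as described above, then yields one bias-strength value per delay and manipulation.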

Hyperparameter cross-validation for neural network responses

The value of hyperparameter σ was determined in a cross-validation procedure for the baseline condition with P = 0.00022, gEE = 0.56 nS, and gEI = 0.424 nS, for values \(\sigma \in [0.2,1.8]\) (in radians). For each value of σ, we fitted the linear model described in Eq. (18) to 67% of trials and tested the prediction in the left-out 33% of trials. Performance for each σ was evaluated using the mean squared error (MSE) of predictions from 1000 cross-validation repetitions. σ was chosen to minimize the MSE of the linear model, yielding σ = 0.6 radians (Supplementary Fig. 13).

Reporting summary

Further information on research design is available in the Nature Research Reporting Summary linked to this article.