A sensory memory to preserve visual representations across eye movements

Saccadic eye movements (saccades) disrupt the continuous flow of visual information, yet our perception of the visual world remains uninterrupted. Here we assess the representation of the visual scene across saccades from single-trial spike trains of extrastriate visual areas, using a combined electrophysiology and statistical modeling approach. Using a model-based decoder, we generate a high temporal resolution readout of visual information and identify the specific changes in neurons’ spatiotemporal sensitivity that underlie an integrated perisaccadic representation of visual space. Our results show that by maintaining a memory of the visual scene, extrastriate neurons produce an uninterrupted representation of the visual world. Extrastriate neurons exhibit a late response enhancement close to the time of saccade onset, which preserves the latest pre-saccadic information until the post-saccadic flow of retinal information resumes. These results show how our brain exploits available information to maintain a representation of the scene while visual inputs are disrupted.


1-1) Animal preparation and surgical procedures
For each animal, a head-post was implanted on the skull using dental acrylic and orthopedic titanium screws. The surgical procedures were performed under strict aseptic conditions while the animals were anesthetized with isoflurane. Recording chambers were mounted on the skull and fastened with screws and dental acrylic. The craniotomy was performed within the chamber, giving access to extrastriate visual areas including V4 and MT.

1-2) Data acquisition
The spike sorter program was used to perform a principal component analysis, and clusters of spikes with similar waveform properties were manually classified as belonging to a single neuron (single unit). The sorted spikes were then read into Matlab to verify the presence of a visually sensitive RF. From a population of 709 well-isolated neurons, 86 were discarded because they did not respond to any probe stimuli before and/or after the saccade; the remaining 623 were used for analyses. Visual stimuli were presented on a 24-inch ASUS VG248QE LED monitor with a resolution of 1920x1080 pixels with a [...]
In total, data were recorded from 332 MT and 291 V4 neurons during 108 recording sessions. Only one area (V4 or MT) was recorded in a given session. The saccade target location and saccade direction could vary from session to session. The various saccade target locations are shown in Supplementary figure 1e. In 23 sessions (in monkeys E and O), the monkey made saccades in two opposite horizontal directions, and the ST was randomly presented either right or left of the FP (at the same radius) across trials; these data were used to test the effect of saccade direction (Supplementary Fig. 9). In the remaining 85 sessions, the ST was presented at only one location each day. In total, saccades were made to the left for 439 of the 623 neurons, to the right for 146 neurons, and in other directions (195, 200, and 225 degrees) for the remaining 38 neurons. In the sessions where the monkey made saccades in two directions, the monkey performed more than twice as many trials as in a single-direction session, to provide enough trials for further analysis.
Prior to data collection, areas V4 and MT were identified based on anatomical landmarks and the pattern of neural responses to different stimuli (controlled with custom Matlab code using the MonkeyLogic toolbox 1).
Before each recording session, the boundaries of the neurons' RFs were roughly mapped based on audible neural responses to a moving or stationary white bar on a black background, presented at different directions and locations on the screen. For each session, the position of the grid on the screen and the distance between adjacent probes were adjusted to cover the estimated RFs and post-saccadic [...] if the displacement is greater than 6 dva and the speed of the displacement stays above 80 dva/s for 3 ms. Saccade onset was then defined as the first time at which the eye speed reached 80 dva/s.
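The saccade criteria stated above (displacement greater than 6 dva, speed at or above 80 dva/s for at least 3 ms, onset at the first threshold crossing) can be illustrated with a minimal Python sketch. This is not the authors' code: the function name `detect_saccade_onset`, the 1 kHz sampling rate, and the choice to measure displacement from the start to the end of the supra-threshold run are all assumptions made for illustration.

```python
import numpy as np

def detect_saccade_onset(x, y, fs=1000.0, speed_thresh=80.0,
                         min_duration_ms=3.0, min_displacement=6.0):
    """Return the sample index of saccade onset, or None if no saccade is found.

    x, y: eye position traces in dva. fs: sampling rate in Hz (assumed 1 kHz).
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Eye speed in dva/s from sample-to-sample position differences.
    speed = np.hypot(np.diff(x), np.diff(y)) * fs
    above = speed >= speed_thresh
    min_samples = int(np.ceil(min_duration_ms * fs / 1000.0))

    i, n = 0, len(above)
    while i < n:
        if not above[i]:
            i += 1
            continue
        # Find the end of this run of supra-threshold samples.
        j = i
        while j < n and above[j]:
            j += 1
        # speed[k] spans positions k..k+1, so the run covers positions i..j.
        displacement = np.hypot(x[j] - x[i], y[j] - y[i])
        if (j - i) >= min_samples and displacement > min_displacement:
            return i  # first sample at which the speed reaches threshold
        i = j
    return None
```

Small, fast eye movements that cross the speed threshold but cover less than 6 dva are skipped, so the function returns the onset of the first movement that satisfies all three criteria.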

1-3) Location of RF centers
The locations of RF centers for all recorded neurons are illustrated in Supplementary figure 1f. RF center calculations are based on the mean probe-aligned responses, averaged over a window of 25 ms around the time of maximum response for each neuron. The map is then interpolated (interleaving 1000 points between each probe location) and thresholded at 0.99 of the maximum response. The center of the RF is then calculated as the center of mass of the resulting spatial map. For the population of neurons, the average RF center was located 8.35±0.16 dva away from the initial FP (Supplementary Fig. 1f).
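The interpolate, threshold, and center-of-mass procedure described above can be sketched as follows. This is an illustrative sketch under stated assumptions, not the authors' implementation: the function name `rf_center` is hypothetical, the interpolation is assumed to be bilinear, the center of mass is assumed to be weighted by the supra-threshold response values, and the default of 20 interleaved points (rather than the 1000 used in the text) is chosen only to keep memory use small.

```python
import numpy as np

def rf_center(resp_map, xs, ys, n_between=20, frac=0.99):
    """Estimate the RF center (x, y) in dva from a probe-grid response map.

    resp_map[i, j] is the mean response to the probe at (xs[j], ys[i]).
    xs, ys must be increasing. Assumes the peak response is positive.
    """
    resp_map = np.asarray(resp_map, dtype=float)
    xs = np.asarray(xs, dtype=float)
    ys = np.asarray(ys, dtype=float)
    # Fine grid with `n_between` points interleaved between adjacent probes.
    fine_x = np.linspace(xs[0], xs[-1], (len(xs) - 1) * (n_between + 1) + 1)
    fine_y = np.linspace(ys[0], ys[-1], (len(ys) - 1) * (n_between + 1) + 1)
    # Separable linear (bilinear) interpolation: along x, then along y.
    along_x = np.apply_along_axis(lambda row: np.interp(fine_x, xs, row), 1, resp_map)
    fine = np.apply_along_axis(lambda col: np.interp(fine_y, ys, col), 0, along_x)
    # Keep only values at or above the given fraction of the peak response.
    weights = np.where(fine >= frac * fine.max(), fine, 0.0)
    total = weights.sum()
    # Response-weighted center of mass of the thresholded map.
    cx = (weights.sum(axis=0) * fine_x).sum() / total
    cy = (weights.sum(axis=1) * fine_y).sum() / total
    return float(cx), float(cy)
```

With a threshold this close to the maximum, the estimate is effectively the location of the interpolated peak, averaged over a small neighborhood around it.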

SOM 2) Encoding model framework and estimation
Using the sparse variable generalized linear model (S-model) encoding framework, we can capture a neuron's high-resolution spatiotemporal sensitivity from limited perisaccadic data. A key step is dimensionality reduction, which identifies the subset of STUs representing the stimulus-response relationship (see Methods) and thereby allows the S-model parameters to be fit to the sparse neuronal data available during saccades. We then fit the model to sparse spiking data at the level of single trials, using an optimization procedure in the point process maximum likelihood estimation framework. The resulting encoding framework allows us to characterize saccade-induced modulatory computations in a precise and computationally tractable manner, using time-varying kernels that represent the neuron's dynamic sensitivity across different delays and locations for any specific time relative to the saccade.

SOM 2-1) Model fitting
We model the spiking response as a conditionally inhomogeneous Poisson point process, with its CIF defined in Eq. 5 in the Methods. The probability of a spike train associated with this Poisson process is given by
$$P\left(\rho^{(l)} \mid s\right) = \prod_{t} \frac{\left(\Delta\,\lambda^{(l)}(t)\right)^{\rho^{(l)}(t)}}{\rho^{(l)}(t)!}\,\exp\left(-\Delta\,\lambda^{(l)}(t)\right),$$
where $s$ is the sequence of input stimuli, $\lambda^{(l)}(t)$ is the CIF on trial $l$, and $\rho^{(l)} = \{\rho^{(l)}(t)\}$ represents the sequence of binned spike counts with bins of size $\Delta$ ms on trial $l$. Here, the bin size was chosen equal to 1 ms, which ensures that at most one spike can fall in each time bin. The point process log-likelihood (LL) of the observed spike trains given the model is as follows,
$$LL(\theta) = \sum_{l}\sum_{t}\left[\rho^{(l)}(t)\,\log\left(\Delta\,\lambda^{(l)}(t)\right) - \Delta\,\lambda^{(l)}(t) - \log \rho^{(l)}(t)!\right], \tag{10}$$
where $\theta$ denotes the set of parameters used to describe the model kernels defined in Eq. 5 in the Methods. Note that the weights of the STUs in the stimulus kernels are among this set of parameters. The parameters in $\theta$ are estimated by maximizing the log-likelihood function in Eq. 10.
The sigmoidal nonlinearity in the S-model's CIF (Eq. 5 in the Methods) makes the LL function non-convex, so the optimization may not have a unique optimal solution (more details in 2, 3). In addition, given the number of data points (spiking events) relative to the number of model parameters to be estimated, this optimization may be subject to overfitting. To handle these non-convexity and overfitting challenges and to find a robust and interpretable solution among the possible solutions, we adopted several parameter selection, sparsity, and smoothness regularization strategies. These strategies included early stopping, representation of model kernels using smooth basis functions, data-informed initialization, an iterative, robust gradient-based search algorithm, and cross-validation, in addition to the strategies described in the dimensionality reduction section (see Methods) for the stimulus kernels specifically. For more details of these optimization strategies, refer to ref. 2.
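As an illustration of the objective being maximized, the following Python sketch evaluates the standard Poisson point process log-likelihood of binned spike counts given a conditional intensity. It covers only the likelihood evaluation; it does not implement the S-model CIF or the regularized fitting procedure. The function name `poisson_loglik`, the use of spikes/s for the rate, and the small constant guarding against log(0) are assumptions made for this sketch.

```python
import numpy as np
from math import lgamma

def poisson_loglik(spikes, rate_hz, bin_ms=1.0):
    """Point process log-likelihood of binned spike counts under a given CIF.

    spikes:  (n_trials, n_bins) spike counts per bin.
    rate_hz: array of the same shape, conditional intensity in spikes/s.
    bin_ms:  bin size in ms (1 ms in the text).
    """
    spikes = np.asarray(spikes, dtype=float)
    rate_hz = np.asarray(rate_hz, dtype=float)
    # Expected spike count per bin (intensity times bin width).
    expected = rate_hz * bin_ms / 1000.0
    eps = 1e-12  # assumed guard so that log(0) never occurs
    # log(count!) term; equals zero when each bin holds at most one spike.
    log_fact = np.vectorize(lgamma)(spikes + 1.0)
    return float(np.sum(spikes * np.log(expected + eps) - expected - log_fact))
```

For example, 1000 empty 1 ms bins under a constant 10 spikes/s intensity give a log-likelihood of about -10, which is the negative of the expected spike count.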

SOM 2-2) Model evaluation
The models' performance is evaluated on test data, which are used neither for training the model nor for the validation process. To estimate the instantaneous firing rates, the sequences of stimuli presented to the neuron are given to the model according to [...] Supplementary figure 4b shows that the EV between the model-predicted firing rate and the empirical firing rate (y-axis) matches that obtained between 1000 pairs of average firing rate sequences measured over randomly selected subsets of probe-aligned spike trains, used as a measure of the inherent variability in the neural data itself (x-axis). The average firing rate sequences were computed by binning the probe-aligned spike response using non-overlapping windows of 30 ms, smoothing the binned response with a Gaussian window of 5 ms (full width at half maximum), and normalizing to zero mean and unit standard deviation. The model-data EV and data-data EV are highly correlated, showing that the model-predicted response captured the stimulus-response relationship in the data across the population of neurons (data-data EV: 84.79±0.54, model-data EV: 79.33±0.78, Pearson correlation: 0.85, p<0.001).
Next, we generalized this firing-rate-level accuracy analysis by evaluating how well the model predicted the firing rate in response to experimental sequences of probe stimuli appearing at random locations during the fixation period (500 to 200 ms before the saccade). Supplementary figure 4c shows that the correlation coefficient (CC) between the model-predicted firing rate and the empirical firing rate in response to the repeated presentation of a sequence of probe stimuli falls within the level of the inherent trial-by-trial variability. The data-data correlation coefficient was measured between binned firing rates in response to the same 300 ms stimulus sequence; data were randomly split (60%-40%) 15 times and the mean is reported. The binning, smoothing, and normalizing of these data were the same as in the EV analysis. The normalized correlation in percent is calculated as the ratio between the model-data correlation (y-axis) and the data-data correlation (x-axis) and is shown as the diagonal histogram in Supplementary figure 4c. The data-data and model-data CC are positively correlated (data-data CC= 0. [...]
[...] presented versus trials where it was not. Similarly, for the discriminability, we evaluate the ability of the neuron's response at time $t$ to differentiate a probe from its surrounding probes in terms of the spike count of the neuron in a 10 ms window around time $t$ in the trials in which probes were presented at [...] in a 10 ms window around $t - \tau$.
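The normalized correlation used in the model evaluation above (the model-data CC expressed as a percentage of the data-data CC, with the data-data CC averaged over 15 random 60%-40% splits) can be sketched in Python as follows. This is a hedged illustration rather than the authors' code: the function names are hypothetical, the inputs are assumed to be already binned and smoothed, and the model-data CC is assumed to be computed against the mean over all trials.

```python
import numpy as np

def data_data_cc(trials, n_splits=15, frac=0.6, seed=0):
    """Mean Pearson CC between trial-averaged rates of random 60%-40% splits.

    trials: (n_trials, n_bins) binned firing rates for one repeated sequence.
    """
    trials = np.asarray(trials, dtype=float)
    rng = np.random.default_rng(seed)
    n = trials.shape[0]
    k = int(round(frac * n))
    ccs = []
    for _ in range(n_splits):
        perm = rng.permutation(n)
        a = trials[perm[:k]].mean(axis=0)  # average rate over the 60% subset
        b = trials[perm[k:]].mean(axis=0)  # average rate over the 40% subset
        ccs.append(np.corrcoef(a, b)[0, 1])
    return float(np.mean(ccs))

def normalized_cc(trials, model_rate, **split_kwargs):
    """Return (normalized CC in percent, model-data CC, data-data CC)."""
    trials = np.asarray(trials, dtype=float)
    dd = data_data_cc(trials, **split_kwargs)
    # Assumption: the model is compared against the mean over all trials.
    md = float(np.corrcoef(trials.mean(axis=0), model_rate)[0, 1])
    return 100.0 * md / dd, md, dd
```

A normalized value near 100% indicates that the model predicts the response about as well as one subset of trials predicts another, which is the sense in which the prediction falls within the trial-by-trial variability.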

SOM 5) Future field remapping
It has been shown that in several sensory areas of the cortex, neurons become responsive to their future field (FF, same as RF2) prior to a saccade (in LIP 5, FEF 6, V2, V3a 7, and V4 8). In area MT, predictive future field remapping of stable stimuli has not been [...]
[...] where 1 is the average firing rate in the late response window (70 to 110 ms after probe onset for the MT neurons and 65 to 130 ms for the V4 neurons) for the probes appearing in the perisaccadic period, and 0 is the average firing rate of the neurons to the probes presented at the same location on the screen during the first fixation, averaged over the same response window. For both the V4 and MT neuronal populations, the modulation index is significantly greater than zero (FMI_MT = 0.02±0.01, p=0.001; FMI_V4 = 0.06±0.01, p<0.001; Supplementary Fig. 7a).
It is important to rule out residual luminance of the FF probe after the eyes have landed as a cause of the response to the perisaccadic FF probes. This "phosphor persistence" was determined to be the basis for some previously reported remapping phenomena 12,13. In [...]
We also looked for a relationship between RF eccentricity (e.g., radial distance from [...]
[...] in which either sample probe 31 appears at a specific delay relative to t* (top three rows), or a random probe appears at the same delay relative to t* (bottom three rows). Bottom: Histograms show the firing rate distributions for the trials in which sample probe 31 appears at a specific delay relative to t* (red), or a random probe appears at the same delay (blue).
The AUC of an ROC between these two distributions for each time and delay value produces