Frontotemporal coordination predicts working memory performance and its local neural signatures

Neurons in some sensory areas reflect the content of working memory (WM) in their spiking activity. However, this spiking activity is seldom related to behavioral performance. We studied the responses of inferotemporal (IT) neurons, which exhibit object-selective activity, along with Frontal Eye Field (FEF) neurons, which exhibit spatially selective activity, during the delay period of an object WM task. Unlike the spiking activity and local field potentials (LFPs) within these areas, which were poor predictors of behavioral performance, the phase-locking of IT spikes and LFPs with the beta band of FEF LFPs robustly predicted successful WM maintenance. In addition, IT neurons exhibited greater object-selective persistent activity when their spikes were locked to the phase of FEF LFPs. These results reveal that the coordination between prefrontal and temporal cortex predicts the successful maintenance of visual information during WM.


Spike sorting isolation quality:
We cross-validated the spike sorting quality using an SVM classifier. Figure S2a shows the average classifier performance in categorizing the spike waveforms of simultaneously recorded neurons across all sessions. The overall high performance of the classifier indicates that the spikes of each sorted cluster are well isolated from those of every other cluster (ΔPerfIT = 98.80% ± 2.50, n = 301 pairs, p < 10^-50; ΔPerfFEF = 98.67% ± 4.018, n = 105 pairs, p < 10^-18; both compared to the chance level of 50%).
To further control for the possibility of oversorting, we verified results involving spiking activity using only a single randomly selected spiking unit from each recording session (Fig. S2b). For the difference in SPL for correct vs. wrong trials (Fig. 4a), we measured the distribution of the median difference in correct vs. wrong SPL for 1000 random subsamples of 79 spike-LFP pairs, each with a maximum of one unit per recording session. For >99.5% of the subsamples the SPL for correct trials was greater than for wrong trials, and the median of this distribution was significantly greater than zero (ΔSPLCr-Wr = 0.008 ± 0.000, p < 10^-10, n = 1000). For the modulation of object selectivity on High vs. Low PPL trials (Fig. 5e), we calculated the distribution of the median difference in modulation index (modulation of object discriminability for High vs. Low PPL trials) for In vs. Out, for 1000 subsamples (each with 40 units, a maximum of one unit per recording session, Fig. S2c). For 100% of the subsamples the modulation index was greater for In than for Out trials, and the median of this distribution was significantly greater than zero (ΔMIIn-Out = 0.090 ± 0.000, p < 10^-10, n = 1000). Thus, the significance of the SPL and MI results did not depend on having multiple spiking units from the same electrode included in the analysis. Additionally, we tested whether neurons recorded from the same electrode showed a correlation in their SPL and MI effects. We calculated the correlation between the effect reported in figure 4a (SPL, correct - wrong) for pairs of neurons recorded on the same electrode, and found no correlation (Fig. S2d; r = 0.033, p = 0.210). Similarly, for the MI effect from figure 5e (MI, In - Out), there was no correlation in effect size between neurons recorded on the same electrode (r = 0.1, p = 0.110).
We further compared the magnitude of these correlation values with the distribution of correlations for randomly selected pairs of neurons recorded on different electrodes (Fig. S2e-f); there was no significant difference between the correlation observed for same-electrode pairs and the median of the distribution of correlations for different-electrode pairs (correlation for same vs. different electrode pairs: SPL, p = 0.160; MI, p = 0.100). Altogether, these analyses confirm that sorting multiple spiking units from single electrode recordings did not artificially inflate the magnitude or significance of the reported effects.
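The one-unit-per-session control can be illustrated with a minimal sketch, assuming each unit is stored as a record holding a session label and a per-unit effect size. The field names `session` and `effect`, both function names, and the toy values are hypothetical and are not the authors' code or data. The sketch also draws exactly one unit from every session, whereas the analysis above used subsamples of a fixed size (79 pairs or 40 units) with at most one unit per session.

```python
import random
import statistics


def one_unit_per_session(units, rng):
    """Return one randomly chosen unit from each recording session."""
    by_session = {}
    for unit in units:
        by_session.setdefault(unit["session"], []).append(unit)
    return [rng.choice(group) for group in by_session.values()]


def subsample_medians(units, n_subsamples=1000, seed=0):
    """Median effect of each one-unit-per-session subsample."""
    rng = random.Random(seed)
    medians = []
    for _ in range(n_subsamples):
        sample = one_unit_per_session(units, rng)
        medians.append(statistics.median(u["effect"] for u in sample))
    return medians


# Toy data (hypothetical): two sessions contribute two units each.
units = [
    {"session": "s1", "effect": 0.010},
    {"session": "s1", "effect": 0.004},
    {"session": "s2", "effect": 0.007},
    {"session": "s3", "effect": -0.002},
    {"session": "s3", "effect": 0.012},
]
medians = subsample_medians(units, n_subsamples=1000)
# Fraction of subsamples whose median effect is above zero.
fraction_positive = sum(m > 0 for m in medians) / len(medians)
```

The distribution in `medians` plays the role of the histograms in Fig. S2b-c: if an effect were driven by several oversorted units from one session, it would shrink or vanish once each session contributes a single unit.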

Object selectivity of neural activity in IT:
The IT population did not exhibit a significant increase in spiking activity during the delay period compared to baseline for the preferred object (ΔNFR = -0.005 ± 0.007, p = 0.465, n = 235), but did show a significant decrease in spiking activity for the non-preferred object (ΔNFR = -0.052 ± 0.006, p < 10^-18, n = 235). During the target period, IT firing rates were elevated when the animal saccaded to the preferred object (Cr vs. Wr, ΔNFR = 0.015 ± 0.004, p < 10^-4, n = 232).

Controls for PPL analysis:
In order to control for differences in the number of trials, we repeated the PPL calculation using a trial matching procedure (Methods, Fig. S5a left), and found that in the beta band PPL was still greater for correct trials (ΔPPL = 0.494 ± 0.151, p = 0.001, n = 63).
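Trial matching matters because phase-locking estimates are biased upward when few trials are available, so a condition with fewer trials can show spuriously higher locking. The following is a minimal sketch of the idea, not the procedure from the Methods: it uses the generic phase-locking value (length of the mean unit vector of the phase differences) as a stand-in for PPL, and the function names, the number of repeats, and the toy phases are all assumptions.

```python
import cmath
import random


def phase_locking(phases_a, phases_b):
    """Length of the mean unit vector of the phase differences (0 to 1)."""
    vectors = [cmath.exp(1j * (a - b)) for a, b in zip(phases_a, phases_b)]
    return abs(sum(vectors) / len(vectors))


def trial_matched_locking(correct, wrong, n_repeats=200, seed=0):
    """Average locking per condition over random subsets of equal size.

    correct, wrong: lists of (phase_a, phase_b) tuples, one per trial.
    Each repeat draws min(len(correct), len(wrong)) trials per condition.
    """
    rng = random.Random(seed)
    n = min(len(correct), len(wrong))

    def mean_locking(trials):
        total = 0.0
        for _ in range(n_repeats):
            subset = rng.sample(trials, n)
            total += phase_locking([t[0] for t in subset],
                                   [t[1] for t in subset])
        return total / n_repeats

    return mean_locking(correct), mean_locking(wrong)


# Toy phases (hypothetical): a constant lag on correct trials,
# alternating in-phase and anti-phase on wrong trials.
correct_trials = [(0.3, 0.0)] * 10
wrong_trials = [(0.0, 0.0), (cmath.pi, 0.0)] * 2
locking_correct, locking_wrong = trial_matched_locking(correct_trials, wrong_trials)
```

Because both conditions are evaluated on the same number of trials in every repeat, any remaining difference between `locking_correct` and `locking_wrong` cannot be attributed to the sample-size bias.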
The PPL statistics and data presented in figure 2 and the main text used a shuffling procedure to remove any effect of within-area phase locking (see Methods). Without this shuffling, there was still significantly higher beta band PPL on correct vs. wrong trials (Fig. S5a, right). There was a correlation between the magnitude of FEF's spatial selectivity and the difference in FEF-IT beta band PPL between correct and wrong trials (r = 0.220, p < 10^-5, n = 136; Fig. S6); this correlation was significant in both monkeys (Table S1).
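The figure captions report these relationships as Kendall correlations. As an illustration of what that rank statistic measures, here is a minimal sketch of Kendall's tau-b computed by direct pair counting. Whether the reported values are tau-b or another variant is not stated here, so that choice is an assumption, as are the function name and the toy inputs.

```python
from math import sqrt


def kendall_tau_b(x, y):
    """Kendall's tau-b by counting concordant and discordant pairs.

    Pairs tied in both variables are ignored; pairs tied in only one
    variable enter the denominator. Raises ZeroDivisionError when
    either variable is constant.
    """
    concordant = discordant = tied_x_only = tied_y_only = 0
    n = len(x)
    for i in range(n):
        for j in range(i + 1, n):
            dx = x[i] - x[j]
            dy = y[i] - y[j]
            if dx == 0 and dy == 0:
                continue
            elif dx == 0:
                tied_x_only += 1
            elif dy == 0:
                tied_y_only += 1
            elif dx * dy > 0:
                concordant += 1
            else:
                discordant += 1
    untied = concordant + discordant
    denominator = sqrt((untied + tied_x_only) * (untied + tied_y_only))
    return (concordant - discordant) / denominator


# Toy example (hypothetical values): one swapped pair out of six.
tau = kendall_tau_b([1, 2, 3, 4], [1, 3, 2, 4])
```

A rank-based statistic of this kind depends only on the ordering of the values, which makes it less sensitive to outlying units than a Pearson correlation.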

LFP power-power correlations between areas:
In order to control for whether the PPL results reflect the correlation of activity between areas, we calculated power-power correlation between areas and their relationship to performance. There was no difference in LFP power correlations between FEF and IT for correct vs. wrong trials in any frequency band, and beta band power correlations did not show object or location selectivity (full statistics in Table S1).

Figure S1. Behavioral performance. For (a-d), data shown separately for M1 (blue) and M2 (green). a, Reaction time for correct vs. wrong trials across sessions. Each point represents the median reaction times for correct and wrong trials in one experimental session. Correct responses were slower than wrong responses (p < 10^-5, n = 86 sessions, Wilcoxon's signed-rank test, two-sided). M1 performed the task at a greater average eccentricity and had lower average performance than M2. b, Variability in saccade landing points for correct vs. wrong trials across sessions. Landing point variability was greater on wrong trials (p < 10^-5, n = 86 sessions, Wilcoxon's signed-rank test, two-sided). Each point represents the average of the standard deviations of saccade landing points for the two target positions in a single session. c, Performance as a function of sample eccentricity, averaged across all trials for each session. d, Estimated FEF RF position for all sessions. For the In condition, the sample was placed at the RF center. e, Eye position during the task, in the vertical (V) and horizontal (H) planes, for In (red) vs. Out (blue) and correct (brown) and wrong (cyan) trials. Data represented as mean ± SEM. In (a-e) 6 sessions were excluded because of a lack of eye data. f, Microsaccade rate over the timecourse of a trial for correct (brown) and wrong (cyan) trials, shown as mean ± SEM (n = 86 sessions).

Figure S2. Evaluation of the quality of collected data. a, SVM classifier performance on waveform clusters is high. Plot shows the average classifier performance (mean ± SEM) in categorizing the spike waveforms of simultaneously recorded neurons as a function of time across the recording session (spikes across each session were split into 50 bins) for FEF (pink, n = 105 waveform pairs) and IT (orange, n = 301 waveform pairs). Inset histograms show the distribution of average classifier performance across cluster pairs for each session, in FEF (left) and IT (right). b, Control for Fig. 4a with one unit per recording session. Histogram shows the distribution of the median difference in correct vs. wrong SPL for 1000 subsamples (each with 79 spike-LFP pairs, a maximum of one unit per recording session); the median of this distribution was significantly greater than zero (ΔSPLCr-Wr = 0.008 ± 0.000, p < 10^-10, n = 1000 subsamples, Wilcoxon's signed-rank test, two-sided). c, Control for Fig. 5e with one unit per recording session. Histogram shows the distribution of the median difference in modulation index (modulation of object discriminability for High vs. Low PPL trials) for In vs. Out, for 1000 subsamples (each with 40 units, a maximum of one unit per recording session); the median of this distribution was significantly greater than zero (ΔMIIn-Out = 0.090 ± 0.000, p < 10^-10, n = 1000 subsamples, Wilcoxon's signed-rank test, two-sided). d, Control for figure 4a: scatterplot shows the correlation between SPL effect size (correct - wrong) for pairs of neurons recorded from the same electrode (Kendall correlation, r = 0.033, p = 0.210, n = 234 pairs). e, Control for figure 4a: histogram shows the distribution of correlation values for 100 subsamples with n = 234 pairs each, as in (d), of randomly selected pairs of neurons from different electrodes. Red arrow shows the correlation value for same-electrode pairs, not significantly different from the median of the distribution (p = 0.160, n = 100 subsamples, Wilcoxon's signed-rank test, two-sided). f, Control for figure 5e: similar to (e) but for the MI effect (In - Out); there was no significant difference between the correlation coefficient for same-electrode pairs and different-electrode pairs (p = 0.100, n = 100 subsamples, Wilcoxon's signed-rank test, two-sided).

Figure S3. Object discriminability during the delay and visual periods was correlated. Object discriminability during the task, for M1 (blue) and M2 (green), was correlated between the visual and delay periods (Kendall correlation, r = 0.32, p < 10^-5, n = 253 IT units). Each point shows the ability of one IT unit to discriminate between two objects (measured with ROC).

Figure S4. Behavioral correlations of firing rate, selectivity, and beta LFP power in FEF and IT.
Scatter plots show firing rate (a-b, n = 170 FEF units, n = 232 IT units), selectivity index for object information (c, n = 170 FEF units) or location information (d, n = 232 IT units), and beta band LFP power (e-f, n = 92 FEF sites, n = 68 IT sites), for correct vs. wrong trials, for FEF (a,c,e) and IT (b,d,f). M1 plotted in blue and M2 in green. All p-values from two-sided Wilcoxon's signed-rank tests.

Figure S5. PPL trial matching, M1 vs. M2, object and location coding. a, Beta band PPL was greater for correct trials when trial numbers were matched, and without the shuffle correction. Heatmaps show the time-frequency map of PPL, normalized to baseline across the course of the DMS task (n = 63 LFP recordings), when the numbers of correct and wrong trials were matched (left) and without shuffle correction (right). Scatterplots illustrate the average beta PPL for the correct vs. wrong trials with trial matching (top) and without shuffle correction (bottom); p-values from two-sided Wilcoxon's signed-rank tests. b, The trend toward higher beta band PPL on correct trials was present in both M1 and M2. Heatmaps show the time-frequency map of PPL difference between correct and wrong trials, across the course of the DMS task for the In condition in M1 (left; n = 34 LFP recordings) and M2 (right; n = 29 LFP recordings). c, Inter-areal beta PPL encoded object identity during the delay period. Heatmap shows the time-frequency map of PPL, normalized to baseline across the course of the DMS task (n = 69 LFP pairs; correct trials, In condition) for Pref (left) and NPref (middle) trials, and the difference between them (right). Scatter plots in the upper right show the mean delay-period PPL in the beta and low-gamma frequency bands for NPref (x-axis) vs. Pref (y-axis); p-values from two-sided Wilcoxon's signed-rank tests. Bottom right: Timecourse of object selectivity for correct (brown) and wrong (cyan) trials, mean ± SEM. d, Inter-areal beta PPL encoded the location of the sample during the delay period. Heatmap shows the time-frequency map of PPL, normalized to baseline across the course of the DMS task (n = 69 LFP pairs; correct trials, Pref condition) for In (left), Out (middle), and the difference between them (right). Scatter plots in the upper right show the mean delay-period PPL in the beta and low-gamma frequency bands for Out (x-axis) vs. In (y-axis); p-values from two-sided Wilcoxon's signed-rank tests. Bottom right: Timecourse of location selectivity for correct (brown) and wrong (cyan) trials, mean ± SEM. All scatterplots show M1 in blue and M2 in green.

Figure S6. Relationship between the magnitude of FEF spatial selectivity and the change in FEF-IT PPL with performance. Scatter plot shows the magnitude of FEF spatial selectivity (area under the curve, ROC In vs. Out) for each FEF unit (x-axis) and the difference in PPL value at that site for correct vs. wrong trials (y-axis). There is a positive correlation between the strength of FEF spatial selectivity and the change in PPL for correct vs. wrong trials (Kendall correlation, r = 0.22, p < 10^-5, n = 136 FEF units). Scatterplots show M1 in blue and M2 in green, and lines show the fit for each monkey (blue, green) and combined data (black).

Correlation between IT unit discriminability and Δ High vs. for each measurement, reflecting the difference between the means; effect sizes >2 are considered a "huge effect" 5. Note that the majority of our effects, most importantly the key findings (Fig. 2a-c, 4a, 5e), show significance of p < 0.001 and so would still be significant at α = 0.05 after Bonferroni correction for up to n = 50 multiple comparisons (0.05/50 = 0.001). Green boxes indicate statistical significance.

Table S2. Statistics for relationship between within-area and inter-areal measures and object selectivity, location selectivity, and performance.
Neural measure is indicated by the label on the left (firing rate, LFP power or phase in a specific frequency band). Area (FEF, IT, or inter-areal) is listed at the top. Selectivity measure (performance, Cr vs. Wr; location, In vs. Out; object, Pref vs. NPref) is indicated in the second row above the relevant column. Performance measures were calculated based on the Pref condition for IT, on the In condition for FEF and on the Pref, In condition for the interactions. Each box indicates the magnitude of the difference between conditions (Δ, mean ± SEM), significance (p), and sample size (n) for the corresponding neural and selectivity measure. Statistical comparisons use a two-sided Wilcoxon's signed-rank test. Color indicates significant increases (green) and decreases (gray) for the Cr, In, or Pref condition.

Column headers (area, then selectivity measures):
IT: Performance, Object
FEF: Performance, Location
FEF-IT interaction: Performance, Object, Location