Disentangling Decision-related Feedforward and Feedback Signals in Human Cortex

During perceptual decisions, agents accumulate sensory evidence for a particular choice towards an action plan. It is commonly held that sensory cortical areas encode evidence, which is accumulated on the way to downstream cortical regions that encode the action plan. This exclusively feedforward framework contrasts with the existence of powerful feedback connections within sensory-motor cortical pathways. We used behavioral analysis and magnetoencephalography to disentangle decision-related feedforward and feedback signals in human cortex. We isolated choice-dependent feedback signals by comparing choice-predictive signals across cortical regions to the weighting of sensory evidence on choice. Feedback signals were prominent in the lowest processing stage (primary visual cortex), expressed in a narrow frequency band (around 10 Hz), and built up during decision formation, following the choice-predictive dynamics in downstream cortical regions. Our results challenge current, purely feedforward accounts of perceptual decision-making, and yield a computational interpretation of frequency-specific cortical oscillations.

bioRxiv preprint, doi: https://doi.org/10.1101/2020.02.01.929893. The copyright holder for this preprint (which was not peer-reviewed) is the author/funder. It is made available under a CC-BY 4.0 International license.


Introduction
A fundamental issue in neuroscience is to understand the mechanisms underlying decisions about the state of the sensory environment. Progress in computational theory, behavioral modeling, and neurophysiological analysis has converged on an influential framework for perceptual decisions: Sensory evidence supporting a particular choice is accumulated over time into an internal "decision variable"; in many contexts, this decision variable is reflected in the motor plan to report the corresponding choice [1][2][3][4] . Computational models based on these notions perform well in fitting behavioral choices, reaction times, and associated confidence of different species in a variety of behavioral task protocols [4][5][6] . Specifically, when the duration of available perceptual evidence is controlled by the environment, the sign of the decision variable at the end of the evidence sequence determines choice, and its magnitude 7,8 , combined with elapsed time 9 , determines the associated confidence.
This class of models affords an intuitive interpretation of neural signals that have been observed at different processing stages of the cerebral cortex. The sensory evidence relevant for a given choice task is encoded by sensory cortical neurons 10 . For example, neuronal populations in visual cortical area MT encode the direction of motion of dynamic random dot patterns 11,12 and neuronal populations in visual cortical area V4 encode the color of these patterns 12 . By contrast, neural correlates of accumulated sensory evidence have been identified in the build-up of motor preparatory activity in different regions of the rat 13 , monkey 2,3,9 , and human cerebral cortex [14][15][16][17] . These latter results are in line with the idea that sensory responses are fed into an integrator on their way to associative and (pre-) motor cortical circuits 2,18,19 .
The above, dominant neurobiological framework of perceptual decision-making entails a purely feedforward accumulation of sensory evidence across the cortical sensory-motor pathways. In this view, decision-related neural activity in sensory cortex solely reflects the feedforward impact of sensory evidence on choice 11,20 . By contrast, the cortical system, in which the transformations from sensory input to behavioral choice are realized, exhibits powerful anatomical feedback connectivity to sensory cortex from downstream regions in association and (pre-) motor cortex 21,22 . Indeed, comparisons of behavior and single-unit activity in monkeys suggest that part of the decision-related activity in visual cortex reflects feedback from downstream processing stages encoding the evolving decision variable 23-25 . [...] theoretical work 26,27 . Here, we developed an approach that meets both requirements. We combined a visual perceptual choice task with quantitative psychophysics and a novel, regionally-specific magnetoencephalography (MEG) decoding approach. This enabled us to concurrently track the dynamics of sensory evidence weighting on behavioral choice, evidence encoding in visual cortex, and decision build-up in downstream cortical regions. Extending recent advances in monkey physiology 12,[28][29][30] , we quantified sensory and decision-related signals across a large and comprehensive set of cortical regions. Our results suggest an important role of cortical feedback interactions in evidence accumulation and call for a revision of the dominant neurobiological framework of perceptual decision-making.

Results
Participants (N=15) performed a visual decision-making task, in which they compared the average contrast of a stream of 10 successive visual drifting grating stimuli (each 100 ms duration) whose contrast levels fluctuated ("test stimulus") with the contrast of a previously presented grating of 50% contrast ("reference stimulus", Figure 1A). After the offset of the test stimulus, participants provided a simultaneous perceptual choice and confidence judgment, by pressing one of four buttons: they responded with the left or right hand to report whether the average contrast of the ten samples in the test stimulus was stronger than the reference contrast (Figure 1A); and the index or middle finger (counter-balanced) to indicate high or low confidence in their choice, respectively. This task provided access to the temporal profiles of (i) the weighting of each piece of sensory evidence on behavioral choice, (ii) the neural encoding of sensory evidence and (iii) the neural encoding of the evolving decision variable. Jointly, these three signatures constrained the functional interpretation of decision-related neural activity. In what follows, we first isolate signatures of evidence and decision variable encoding across cortex, and then use them to dissect different components of decision-related activity in visual cortex.

Figure 1. A. [...] a reference stimulus with constant contrast across trials (Michelson contrast: 0.5; "reference"), followed by (1-1.5 s delay) ten consecutive samples, whose contrast fluctuated from sample to sample. The ten samples together were called "test" stimulus. The task was to indicate if the test contrast (averaged across samples) was stronger or weaker than the reference contrast (see main text for details). Feedback was given after a delay (0-1.5 s; 250 ms low vs. high tone). Contrast values were sampled from a normal distribution (mean contrast ± continuously adjusted to 75% accuracy, s.d. sampled randomly from [0.05, 0.1, 0.15]). Gratings depicted here have lower spatial frequency than in the experiment for visualization purposes. B. Psychophysical kernel quantifying impact of trial-by-trial contrast fluctuations on choice, as a function of sample position.
Further, contrast information at all sample positions had a significant leverage on choice (Figure S1D, compare curves for "test stronger" vs. "test weaker" choices) and confidence (compare "high confidence" vs. "low confidence" judgements for the same choice). We used psychophysical kernels 23,[33][34][35] to track the time course of the weighting of contrast information on choice. We quantified psychophysical kernels in terms of the area under the curve (AUC) of the receiver-operating characteristic relating a given contrast level to the participant's choice ( 36 ; see Methods for details). The AUC ranges between 0 and 1, whereby values larger than 0.5 indicate that sample contrasts stronger than the reference tended to be followed by "stronger than reference" choices. AUC-values for all sample positions were significantly above 0.5 (Figure 1B).
Critically, however, the impact of contrast samples on choice declined over time across the test stimulus interval ( Figure 1B; slope of psychophysical kernel: -0.009, t(14)=-5.3, p=0.0001). This is consistent with results from a range of other perceptual choice tasks in humans and monkeys [33][34][35]37 , and provides a reference for interpreting the dynamics of decision-related neural activity below.
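The kernel computation described above can be sketched in a few lines. This is a minimal illustration, not the authors' analysis code: the function names (`roc_auc`, `psychophysical_kernel`, `kernel_slope`) are hypothetical, the AUC is computed by direct pairwise comparison, and the slope is assumed to be an ordinary least-squares fit across sample positions (the exact procedure is given in the Methods, which are not reproduced here).

```python
import numpy as np


def roc_auc(scores, labels):
    """Area under the ROC curve: P(score on a positive trial > score on a negative trial).

    Ties count as 0.5. `labels` is truthy for "stronger than reference" choices.
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    pos, neg = scores[labels], scores[~labels]
    if pos.size == 0 or neg.size == 0:
        return float("nan")  # undefined if only one choice was ever made
    diff = pos[:, None] - neg[None, :]  # all positive/negative trial pairs
    return float(((diff > 0).sum() + 0.5 * (diff == 0).sum()) / diff.size)


def psychophysical_kernel(contrasts, choices):
    """AUC per sample position.

    contrasts: array (n_trials, n_samples); choices: array (n_trials,), 1 = "stronger".
    """
    contrasts = np.asarray(contrasts, dtype=float)
    return np.array([roc_auc(contrasts[:, k], choices)
                     for k in range(contrasts.shape[1])])


def kernel_slope(kernel):
    """Least-squares slope of the kernel across sample positions (assumed definition)."""
    kernel = np.asarray(kernel, dtype=float)
    return float(np.polyfit(np.arange(kernel.size), kernel, 1)[0])
```

Under these assumptions, a kernel whose AUC values exceed 0.5 at every position and whose slope is negative would correspond to the primacy pattern reported above; per-participant slopes could then be tested against zero across the group.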

Sensory encoding across the visual cortical hierarchy
To track sensory and decision-related activity across cortex, we estimated the dynamics of many cortical regions (Methods). To characterize physiological properties of regionally-specific dynamics, we focused on a set of regions implicated in the visual decision process, and defined based on fMRI: retinotopically organized visual cortical field maps 38 , as well as three regions exhibiting hand movement-specific lateralization 39 : anterior intraparietal sulcus (aIPS), the junction of the IPS and postcentral sulcus (IPS/PostCeS), and the hand-specific sub-region of primary motor cortex (henceforth called "M1-hand"; see Figures 2 and S2, and Methods). We then used frequency-based, multivariate decoding techniques to track sensory and decision-related activity in a total of 180 regions per hemisphere (Glasser et al., 2016), covering the entire cortex (see Figures 3, 4, S3, and Methods).

Figure 3. [...] C. Temporal correlation between V1 contrast encoding profiles and psychophysical kernels from Figure 1B. Each dot is one participant. Black lines, average correlation; asterisks (t-test, p < 0.05). D. Decoding precision (averaged convex hull across gray shaded area) for all ROIs: Single contrast samples (left), accumulated contrast (center) and their difference (right).
Gamma-band activity in visual cortex scales monotonically with stimulus contrast [40][41][42][43][44][45][46] . Indeed, we found that in all visual cortical field maps, but not the movement-selective regions aIPS, IPS/PostCeS, and M1-hand, gamma-band (40-60 Hz) power was elevated throughout test presentation (Figure 2A), and differentiated between trials with stronger and weaker mean contrast (Figure 2B). All areas, including the movement-selective ones, exhibited a sustained suppression of alpha-/beta-band (about 8-36 Hz) activity (Figure 2A), which did not differentiate between mean contrast conditions (Figure 2B). The visual cortical power responses also tracked the rapid sample-to-sample fluctuations in contrast throughout the trial (Figure 3). We trained pattern classifiers to decode sample contrast from local spectral patterns (power values from 1-145 Hz) and regressed decoded onto physical sample contrasts (Methods). Contrast decoding profiles in V1 for consecutive samples showed peaks about 190 ms after sample onset (Figure 3A). The convex hull curve of peak decoding values for individual contrast samples, a summary measure of decoding performance across samples, was significant throughout decision formation (Figure 3B, orange).
The precision of V1 contrast encoding also decreased monotonically from the first to the ninth sample ( Figure 3B left, group average slope across first nine samples : -0.005, t(14)=-5.4, p=0.0001).
Measurements from the final sample may have been affected by a stimulus offset response, but the slope was also significantly negative across all ten samples (p=0.0002). Thus, contrast encoding in V1 was progressively attenuated during decision formation, highly consistent with recent findings from monkey visual cortex ( 30 ; their Figure 2e). Such an attenuation should translate into a weakening of the impact of stimulus fluctuations on choice, thus potentially explaining (part of) the decay observed in psychophysical kernels ( Figure 1B). Indeed, the temporal profiles of this decay in V1 contrast responses exhibited significant similarity to the temporal profiles of the psychophysical kernels (group average temporal correlation: r=0.32, t(14)=2.6, p=0.02; Figure 3C), thus establishing the behavioral relevance of the V1 decoding profiles in Figure 3A, B.

Encoding of action plan and accumulated evidence in the cortical visuo-motor pathway
When choices are reported with left- or right-hand movements (as in our task), the hemispheric lateralization of activity in motor and parietal cortical regions encodes specific choices 14,39,47 . This activity is frequency-specific and builds up during decision formation 14,15,48 . [...] (using more fine-grained spatial information as well as signal phase and amplitude, Methods) to a set of frontal regions selected based on previous work (Figure S3B). The results supported our focus on the above regions.
Critically, activity in motor and parietal cortex (M1 and IPS/PostCeS, respectively) did not only track the evolving plan to act (choice-predictive activity), but also the temporal accumulation of decision-relevant sensory input. We regressed power values onto the average of all previous contrast samples at a specific latency after sample onset (Methods). The slope between decoded average contrast and actual average contrast quantified the precision of accumulated contrast read-out from spectral activity ("accumulated contrast", magenta lines in Figure 3B). The temporal profiles of accumulated evidence encoding showed a clear ramp-like signature, with a plateau around stimulus offset (Figure 3B). [...] fluctuations in cortical single-neuron activity, for a fixed stimulus category, and choice is referred to as "choice probability" 11,20,23,50 . One study found primacy in psychophysical kernels combined with an increase in choice probabilities from one sample to the next, in line with decision-related feedback 23 .
Yet, models with bi-directional signal flow between sensory and decision stages show that choice probabilities in sensory cortex will generally mix feedforward and feedback components 26,27 .

Figure 4. [...] of change of V1 kernels across test stimulus interval (slope across samples). Black bars, differences between raw and residual kernels (t-test, p<0.05); blue bars, slope deviations from 0 (t-test, p<0.05). E. Time-lagged correlation between M1-hand accumulated contrast profile (Figure 3B, right) and V1 (overall) alpha-band kernel. Blue bar, difference from 0 (t-test, p<0.05).

Untangling feedforward and feedback components of decision-related activity in visual cortex
We reasoned that the spectral profiles of the associated local field potential activity (our MEG source estimates) might help disentangle the feedforward and feedback components of decision-related signals. Visual cortical gamma-band activity has been linked to feedforward cortical information flow 22,40,[51][52][53][54][55] . The same work has implicated visual cortical alpha-band activity in feedback signaling.
Those studies focused on physiological properties (e.g. time-lagged correlations between cortical regions or laminar activity profiles within regions) without a link to specific behaviors. In our task, contrast information (test stronger vs. weaker category) was robustly present in gamma-band across the visual cortical hierarchy (Figure 2B). We hypothesized that choice-predictive fluctuations in visual cortical gamma-band activity (i.e., within each stimulus category: weaker/stronger than reference) would reflect feedforward signaling, and choice-predictive fluctuations in the alpha-band feedback signaling. We tested this prediction by computing choice-prediction kernels from fluctuations of estimated V1 activity (extracted at the latency of peak contrast encoding, t=192 ms after sample onset), separately for the two frequency bands (averaged across hemispheres). We here refer to "V1 kernels", rather than "choice probability", to highlight the difference from single-unit measures, as well as the analogy with the above-described psychophysical kernels (the only difference being the use of trial-to-trial fluctuations of either neural activity or sample contrast; see Methods). Fluctuations of V1 alpha-band power (8-12 Hz) also predicted choice, but with distinctly different dynamics from those in the gamma-band, with significant prediction only for later samples (Figure 4A, right, blue) and a time course that was negatively correlated with the psychophysical kernels (Figure 4B, right). Both characteristics of the V1 alpha-band kernels were in line with the hypothesized feedback source.
Indeed, removing contrast-driven fluctuations from alpha-power did not change the shape of V1 alpha-band kernels (Figure 4A, right, green), showing that the decision-related fluctuations in alpha-power resulted from an intrinsic source, rather than from the external stimulus. Similar results were found across the visual cortical hierarchy (Figure S4). In sum, choice-predictive fluctuations of alpha- and gamma-band activity differed in terms of both their sources and temporal profiles.
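The logic of comparing raw and residual V1 kernels can be illustrated as follows. This is a hedged sketch only: it assumes that "removing contrast-driven fluctuations" amounts to regressing out a linear effect of sample contrast at each sample position, and it omits the split by stimulus category described above. The names `remove_contrast` and `raw_and_residual_kernels` are hypothetical and do not come from the paper.

```python
import numpy as np


def roc_auc(scores, labels):
    """Area under the ROC curve by pairwise comparison (ties count as 0.5)."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    pos, neg = scores[labels], scores[~labels]
    if pos.size == 0 or neg.size == 0:
        return float("nan")
    diff = pos[:, None] - neg[None, :]
    return float(((diff > 0).sum() + 0.5 * (diff == 0).sum()) / diff.size)


def remove_contrast(power, contrast):
    """Residual power after regressing out a linear effect of sample contrast.

    Assumption: a linear model with intercept; the paper's exact procedure may differ.
    """
    power = np.asarray(power, dtype=float)
    contrast = np.asarray(contrast, dtype=float)
    design = np.column_stack([np.ones_like(contrast), contrast])
    beta, *_ = np.linalg.lstsq(design, power, rcond=None)
    return power - design @ beta


def raw_and_residual_kernels(power, contrasts, choices):
    """Choice-prediction AUC per sample position, before and after residualizing.

    power, contrasts: arrays (n_trials, n_samples); choices: array (n_trials,).
    """
    power = np.asarray(power, dtype=float)
    contrasts = np.asarray(contrasts, dtype=float)
    n_samples = power.shape[1]
    raw = np.array([roc_auc(power[:, k], choices) for k in range(n_samples)])
    resid = np.array([
        roc_auc(remove_contrast(power[:, k], contrasts[:, k]), choices)
        for k in range(n_samples)
    ])
    return raw, resid
```

In this framing, a kernel that survives residualization (as reported for the alpha-band) points to an intrinsic source, whereas a kernel that collapses toward 0.5 reflects stimulus-driven fluctuations.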
We focused on the alpha- and gamma-bands based on their basic physiological properties observed in previous work (see above), but this focus was corroborated by frequency-resolved analyses of the above effects across a broad frequency range. The reduction of choice-predictive gamma-band activity through removal of external contrast fluctuations occurred specifically in a narrow band (50-55 Hz, Figure 4C; p<0.05, FDR corrected). Likewise, the kernel slope (a measure of the increase/decrease across samples) showed a specific increase in V1 kernel magnitude over sample positions for a 10-15 Hz frequency band (alpha), complemented by a decrease in the 55-60 Hz band (gamma, Figure 4D, p<0.05, FDR corrected). Thus, the functional dissociation between different components of decision-related V1-activity was expressed in clearly delineated frequency bands.
While the up-ramping time course of V1 alpha-band kernels contrasted with the time course of the psychophysical and V1 gamma-band kernels (see above), it resembled the build-up of choice-predictive activity in IPS/PostCeS and M1-hand. Indeed, this similarity was expected from models of decision-related activity in visual cortex, in which the "top-down" component of choice-predictive activity is mediated by selective feedback of decision-related signals in downstream, "accumulator regions" to sensory cortex 26,27 . Yet, cortical feedback interactions also exhibit time delays longer than those of feedforward pathways, due to the involvement of slower synapses 56 and possibly polysynaptic pathways 57 . To directly test for the correlation between decision-related alpha-band activity in V1 and M1-hand, and to assess the directionality of this interaction, we computed the cross-correlations (in units of stimulus samples) between the V1 alpha-band kernels and the M1-hand decoding profile. As predicted by the feedback hypothesis, the mean cross-correlation across participants peaked at a lag of one sample (Figure 4E, 100 ms, t(14)=2.6, p=0.02), indicating that M1-hand leads V1 alpha-band activity by one sample (about 100 ms). This result does not imply direct (mono-synaptic) interactions between choice-predictive activity in M1-hand and V1: other areas (e.g., parietal cortex) might have relayed this information from M1 to V1. Even so, our results clearly show that decision-related V1-activity in the alpha-band reflected the feedback of decision-related build-up activity evident in multiple downstream brain regions.
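A lagged cross-correlation in units of stimulus samples, as used for the directionality test above, can be sketched as follows. The function names and the sign convention (positive lag means the first profile leads) are assumptions made for this illustration; with only ten sample positions, the number of points entering each correlation shrinks with the absolute lag.

```python
import numpy as np


def lagged_correlation(leader, follower, lag):
    """Pearson correlation between leader[t] and follower[t + lag].

    A positive lag asks whether `leader` precedes `follower` by `lag` samples.
    """
    leader = np.asarray(leader, dtype=float)
    follower = np.asarray(follower, dtype=float)
    n = leader.size
    if follower.size != n:
        raise ValueError("profiles must have the same number of samples")
    if lag >= 0:
        x, y = leader[: n - lag], follower[lag:]
    else:
        x, y = leader[-lag:], follower[: n + lag]
    if x.size < 3 or np.std(x) == 0 or np.std(y) == 0:
        return float("nan")  # too few points or a constant profile
    return float(np.corrcoef(x, y)[0, 1])


def cross_correlation(leader, follower, max_lag=3):
    """Lagged correlations for all lags in [-max_lag, max_lag], keyed by lag."""
    return {lag: lagged_correlation(leader, follower, lag)
            for lag in range(-max_lag, max_lag + 1)}
```

Applied per participant with the M1-hand decoding profile as `leader` and the V1 alpha-band kernel as `follower`, a group-level peak at lag 1 would correspond to the pattern reported in Figure 4E.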

Discussion
Perceptual decision-making is a domain of cognition with a far-advanced integration between theory and neurobiological experimentation, owing to convergence between computational theory, behavioral psychophysics, and neurophysiology 1,2 . Despite its wide popularity, the canonical framework in decision neuroscience has been challenged by recent work 30,58,59 . Here, we focused on the fact that the dominant framework of perceptual decision-making ignores feedback connectivity, a key organization [...]

The finding that V1 activity in the gamma-band tracked fine-grained fluctuations in the visual contrast adds to a substantial body of evidence showing close, monotonic contrast encoding in V1 local field potentials: increases in contrast lead to changes in both gamma-band power and gamma peak frequency 41,42,45,60 , both of which are also evident in human MEG data 41,44,61 . Yet, gamma-band power changes and peak frequency depend not on stimulus contrast alone, but are also influenced by stimulus position, size and spatial frequency 42 . We here kept stimulus position, size and spatial frequency constant, and our decoding approach used the full spectral and spatial pattern of contrast-related power changes, thereby exploiting both peak shifts and overall power changes for visual contrast decoding.
Our approach was agnostic about the nature of the neural code used for solving the task 62 . Indeed, the decision process likely reads out the spiking output of visual cortical activity, rather than gamma-band activity 42 . While V1 spiking activity also scales with contrast in a monotonic fashion and is generally closely coupled to local field potentials (and the resulting MEG activity) in the gamma-band, these two signals can also dissociate 60 . For example, power and peak frequency scale linearly with contrast across the full range 41 , whereas V1 spiking saturates at higher contrasts 63 . These dissociations will contribute to the V1 gamma-band kernels in Figure 4, causing an underestimation of the true feedforward link between sensory responses in V1 and behavioral choice. Indeed, internal fluctuations in a neural signal that mediates the impact of the external stimulus on choice should show some choice-predictive effect, which we did not find in the residual gamma-band kernels for any visual cortical area (see Figures 4 and S4). This absence of effect may be explained by the indirect nature of the link between gamma-band activity and choice (with gamma-band activity only being a proxy of the spiking output), the low signal-to-noise ratio of the gamma-band responses in MEG, or a combination of the two. Despite these limitations, our results highlight the utility of local field potential oscillations in future invasive investigations of decision-related neuronal activity, and provide a new benchmark for the advancement of hierarchical cortical circuit models of perceptual decision-making 26 . They also provide the first direct physiological support for the hypothesized co-existence of feedforward and feedback components in the decision-related spiking activity of visual cortical neurons 23,26,64,65 .
Our current work bridges between two fields of the literature that have been largely disconnected. The insight that feedforward and feedback components of decision-related activity fluctuations in early visual cortex were expressed in the gamma- vs. alpha-bands resonates well with the emerging view that feedforward and feedback information flow through the cortical hierarchy is mediated by layer-specific pathways communicating in these two spectral channels, respectively 22,40,51-55 . These pathways have been conceptualized in the context of attention 66 and predictive coding 52,67 , but not the accumulation of evidence towards a choice. Evidence accumulation, the essence of dynamic belief updating, may provide a powerful framework for advancing and testing theories of frequency-specific channels for inference.
Our insights into the decision process hinged on our ability to track the time course of the encoding of sensory evidence, accumulated evidence, and action plan from each of a large and comprehensive set of well-delineated functional cortical regions. To this end, we combined MEG source reconstruction with state-of-the-art anatomical atlases and a decoding approach. Hereby, multivariate pattern classification was based on the spectro-spatial patterns of local activity within single cortical regions 68 . Current fMRI approaches enable decoding of sensory or cognitive variables from fine-grained multi-voxel patterns in multiple cortical regions [69][70][71][72] , but they lack the necessary temporal resolution for tracking the dynamics of decision formation. Conversely, E/MEG decoding studies 73-75 provide the critical temporal resolution, but commonly use the whole sensor array as features for decoding, which precludes inferences about information flow between brain regions. Situated in between these two types of previous approaches, our current approach provides the opportunity to track information dynamics across the cortical hierarchy, akin to recent multi-areal recording work in animals 12,28,76,77 .
Critically, our work is the first to combine such a large-scale information tracking approach with a detailed behavioral analysis that constrains the functional interpretation of the signals in terms of the underlying information flow.
We found the strongest choice-predictive activity in anterior intraparietal (IPS/PostCeS) and (pre-) motor cortical regions. Complementary analyses indicated that choice information in these regions was primarily encoded in terms of large-scale spatial biases (right vs. left hemisphere) with respect to the ensuing action (left vs. right hand button press) used to report the perceptual choice (Figure S2 and decoding based on the lateralization of spectral power patterns, data not shown). We found no robust choice encoding in regions that did not exhibit such large-scale biases. When using a decoding approach that could exploit finer-grained spatial patterns of both signal phases and amplitudes, we obtained even higher choice-prediction values in M1 than with our common, spatially coarse-grained approach (compare M1 in panels A and B of Figure S3); but we did not find strong choice-predictive activity in more anterior regions of prefrontal cortex, in which previous studies of perceptual decision-making have identified robust choice-predictive signals in action-independent formats 12,71,78 . Those previous studies employed tasks in which the evolving perceptual decision could not be mapped onto a specific action plan during decision formation 12,71,78 . Our task, by contrast, allowed for such a mapping, analogous to a large body of work on saccadic choices in monkeys 2 . The locus and format (motor vs. non-motor) of cortical build-up activity during decisions strongly depend on such behavioral contexts 79 , and decision-related build-up activity may have exclusively been expressed in terms of motor preparatory activity in our task.
Nonetheless, it is possible that M1 hand area and IPS/PostCeS only reflected the outcome of an evidence accumulation process that took place elsewhere 18 , and that we failed to detect with our approach.
Our findings also add to, and extend, recent evidence from monkey physiology pertaining to the role of adaptation phenomena in decision-making 30 . Response attenuation in sensory cortex is a well-established phenomenon that occurs in various behavioral states (including anesthesia) and has been explained in terms of local cortical micro-circuit properties [80][81][82] . Yet, most variants of the canonical computational framework for perceptual decisions have ignored the existence of adaptation (but see 4,30 ). Our current results, in line with those of Yates et al 30 , show that the progressive attenuation of sensory evidence encoding can account for a continuous reduction in the impact of evidence on choice throughout decision formation. This so-called primacy has been observed for many perceptual choice tasks in monkeys 23,33 and humans 34,35,37 , and has commonly been explained in terms of bounded accumulation at the decision stage 33,35 . The attenuation of evidence encoding at the sensory stage observed here is in line with behavioral modeling in humans and rodents 4 as well as monkey physiology 30 . Indeed, we found that the attenuation profile largely explained the time course of psychophysical kernels for the majority of participants (Figure 3C). The realization that attenuation of sensory signals is sufficient to explain primacy (i.e., without any bound at the decision stage) should be taken into account when interpreting psychophysical kernels from perceptual decision-making tasks, and calls for directly monitoring the dynamics of sensory encoding, as we did here.
An important issue for future work is to pinpoint the function of decision-related feedback to early visual cortex during evidence accumulation. Feedback signals may be the source of confirmation biases 27 , a pervasive feature of human perceptual and higher-level judgment 83 . "Volatile" environments, in which the source of the sensory evidence can change unpredictably [84][85][86][87] , provide an opportunity to study the adaptive function of these biases and decision-related feedback signals.

Acknowledgments
We thank Genis Prat Ortega for ongoing discussion throughout this project, and Klaus Wimmer and Alan Stocker for thoughtful comments on the manuscript. This work has been funded by the Deutsche

Task and procedure
Participants completed five different recording sessions and each session consisted of five blocks à 500 trials, lasting approximately 60 minutes. The first session was a training session used for exposing participants to the task and to calibrate their performance to 75% correct. A trial consisted of the following sequence of events (Figure 1A). First, a reference stimulus (grating of fixed contrast at 0.5) was displayed for 400 ms. After a variable delay (uniform between 1 and 1.5 s) ten successive samples of variable contrasts (see below) were shown (each 100 ms); together these ten samples made up the test stimulus, the mean contrast of which participants should compare with the reference (i.e. forced choice: "stronger than reference" or "weaker than reference"). The offset of the last sample marked the beginning of the response period for participants. Participants reported their binary choice, and their confidence about the correctness of that choice ("high" vs. "low") simultaneously, by pressing one of four different buttons, whereby the two hands were always mapped to different choices (mapping counterbalanced between participants). The index and ring fingers of each hand were then used to report confidence (again mapping counterbalanced across participants). During MEG sessions, participants used two response pads, one for each hand. During the training sessions, participants used the same stimulus-response mapping, but pressed keys on a computer keyboard. After a participant's response and a subsequent variable delay between 0 and 1.5 s, auditory feedback was given (250 ms duration). A low tone indicated a wrong answer and a high tone indicated a correct answer.
The ten consecutive contrast samples were drawn from a normal distribution centered on a participant's 75% accuracy contrast level. This threshold was determined by running a QUEST staircase 89 continuously in the background. The standard deviation of the normal distribution was chosen randomly from [0.05, 0.1, 0.15] on each trial. After each set of 100 trials, participants could take a short self-timed break. After the third block, participants took a longer break lasting at least five minutes.
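The sampling scheme for a single trial can be sketched as follows. This is a minimal illustration and not the Psychtoolbox code used in the experiment: the function name `draw_contrast_samples` and the example threshold of 0.6 are hypothetical, and any clipping of contrast values to a valid range is left out because the text does not describe it.

```python
import random

SD_LEVELS = [0.05, 0.1, 0.15]  # standard deviations listed in the text
N_SAMPLES = 10                 # contrast samples per test stimulus


def draw_contrast_samples(threshold_contrast, rng):
    """Draw the ten contrast samples of one trial (hypothetical helper).

    threshold_contrast: mean of the sampling distribution, i.e. the
    participant's current threshold estimate from the staircase.
    rng: a random.Random instance, passed in for reproducibility.
    """
    sd = rng.choice(SD_LEVELS)  # one standard deviation per trial
    samples = [rng.gauss(threshold_contrast, sd) for _ in range(N_SAMPLES)]
    return sd, samples


rng = random.Random(0)
sd, samples = draw_contrast_samples(0.6, rng)  # 0.6 is an arbitrary example
mean_contrast = sum(samples) / len(samples)    # compared with the reference
```

The mean of the ten samples is the quantity participants were asked to compare with the reference contrast of 0.5.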

Stimuli
We used expanding or contracting circular gratings similar to the stimuli from Michalareas 40 . The intensity of a given pixel was given by computing a blending value for each pixel: where d(x,y) was the distance of pixel (x,y) to the center of the screen and r=3/4° determined the spatial frequency of the grating. Varying s over frames yielded expanding or contracting gratings, respectively. We varied s such that the grating moved with a speed of 4/3 °/s. On each trial, the grating either expanded or contracted, but never changed direction. To obtain a final color value we used a(x,y) to blend two grayscale colors that had the desired contrast: where c was the desired contrast. We furthermore set an inner annulus with a radius of 1.5° to uniform gray. The grating had a radius of 12.5°, but reached the vertical screen border after 11.3° from the center. The contrast of the reference grating was always set to 0.5, while the contrast of the test stimulus varied and changed every 100 ms, as controlled by the staircase procedure (see above).
Stimuli were generated using Psychtoolbox 3 for Matlab. They were back-projected on a transparent screen using a Sanyo PCL-XP51 projector with a resolution of 1920 x 1080 at 60 Hz. The luminance profile was linearized by measuring and correcting for the system's gamma curve. A doubling of contrast values therefore also produced a doubling of luminance differences. During the first training session, stimuli were presented on a VIEWPixx monitor with the same resolution and refresh rate (also linearized).

Data acquisition
We used a CTF MEG system with 275 axial gradiometer sensors and recorded at 1200 Hz. Recordings took place in a dimly lit magnetically shielded room. We concurrently collected eye-position data with an SR-Research EyeLink 1000 eye-tracker (1000 Hz). We continuously monitored head position by using three fiducial coils. After seating the participant in the MEG chair, we created and stored a template head position. At the beginning of each following session and after each block we guided participants back into this template position. We used Ag/AgCl electrodes to measure ECG and vertical and horizontal EOG.

Evaluation of choice and confidence dependence on contrast
We used logistic regression to evaluate whether participants exploited different contrast samples for their choices and confidence judgements. We fit the following logistic regression to predict choices from contrast values: where y_trl was the choice in a given trial and c_trl,i the contrast value of sample i in the same trial. We evaluated the accuracy of this fit with 5-fold cross-validation. All available trials from one subject were split into five folds, and we used each fold as the test set once and all remaining folds for weight estimation. We carried out cross-validation per subject and then averaged across folds and subjects. Since we titrated participants' accuracy to 75% correct, we also expected the accuracy of this logistic regression to be bounded close to 75% correct. We also evaluated whether confidence judgements were based on contrast. To this end, we fit a similar logistic regression, but this time predicted confidence judgements for each response separately. We again evaluated the accuracy of this logistic regression with 5-fold cross-validation.
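A minimal sketch of this kind of analysis on synthetic data is given below. It assumes a standard logistic model with an intercept and one weight per contrast sample, fitted by plain gradient descent; the exact model form, the solver, and all function names are assumptions for illustration and are not taken from the authors' code.

```python
import numpy as np


def fit_logistic(X, y, n_iter=2000, lr=0.5):
    """Fit logistic-regression weights by plain gradient descent."""
    Xb = np.column_stack([np.ones(len(X)), X])  # prepend intercept column
    w = np.zeros(Xb.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-Xb @ w))       # predicted P(choice = 1)
        w -= lr * Xb.T @ (p - y) / len(y)       # gradient of the mean log-loss
    return w


def predict(w, X):
    """Predict binary choices from fitted weights."""
    Xb = np.column_stack([np.ones(len(X)), X])
    return (Xb @ w > 0).astype(int)


def cross_validated_accuracy(X, y, n_folds=5, seed=0):
    """Each fold is the test set once; weights come from the other folds."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), n_folds)
    accuracies = []
    for k in range(n_folds):
        test = folds[k]
        train = np.concatenate([folds[j] for j in range(n_folds) if j != k])
        w = fit_logistic(X[train], y[train])
        accuracies.append(np.mean(predict(w, X[test]) == y[test]))
    return float(np.mean(accuracies))


# Synthetic trials: ten contrast samples each, expressed relative to the
# reference contrast. Choices follow the noisy mean contrast (an assumption).
rng = np.random.default_rng(1)
contrast = rng.normal(0.0, 0.1, size=(1000, 10))
internal_noise = rng.normal(0.0, 0.03, size=1000)
choice = (contrast.mean(axis=1) + internal_noise > 0).astype(int)
accuracy = cross_validated_accuracy(contrast, choice)
```

Because the simulated observer adds internal noise to the mean contrast, the cross-validated accuracy stays well below 100%, which mirrors the reasoning that prediction accuracy is bounded by the participants' own performance level.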

Definition of trial categories for the current task
The computation of psychophysical kernels described below required sorting trials into four categories defined by a unique combination of the physical stimulus category (i.e. mean contrast of test stimulus stronger or weaker than reference) and the participant's perceptual choice (test "stronger" or "weaker"). We defined these four categories based on signal detection theory 36 , as follows. Hits and misses: "stronger" and "weaker" choices, respectively, for trials in which the physical test stimulus was stronger than the reference; false alarms and correct rejects: "test stronger" and "test weaker" choices, respectively, for trials in which the physical test stimulus was weaker than the reference.
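The mapping from stimulus category and choice to these four labels can be written compactly. The function name and the boolean encoding below are illustrative choices, not part of the original analysis code.

```python
def trial_category(stimulus_stronger, chose_stronger):
    """Map stimulus category and choice to a signal-detection label.

    stimulus_stronger: True if the mean test contrast exceeded the reference.
    chose_stronger: True if the participant reported "stronger".
    """
    if stimulus_stronger:
        return "hit" if chose_stronger else "miss"
    return "false alarm" if chose_stronger else "correct reject"


# All four combinations of stimulus category and choice.
labels = {
    (stim, choice): trial_category(stim, choice)
    for stim in (True, False)
    for choice in (True, False)
}
```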

Computation of psychophysical kernels and neural-activity kernels
With the term "kernel", we refer to the time course of the correlation between trial-to-trial fluctuations in some quantity (stimulus sample contrast or measured neural activity) and the participant's behavioral choice. All kernels were computed after factoring out the physical stimulus category (i.e., mean test contrast stronger or weaker than reference).  Figure 1C.
Neural activity kernels were computed analogously, except that single-trial sample contrast values were substituted by single-trial neural activity values. Specifically, we used gamma- or alpha-band power values extracted from visual cortical field maps (see below) at t=192 ms after sample onset, the latency with peak contrast decoding precision. The so-computed kernels are referred to as "overall kernels" in Figure 4. A follow-up analysis aimed to further remove the effect of trial-to-trial fluctuations of the external contrast samples and isolate choice-predictive activity originating from intrinsic sources. Here, we used linear regression to predict the cortical power values (again at t=192 ms after sample onset) from the corresponding single-trial sample contrast values. We then subtracted this prediction from the power values and used these residuals in the computation of V1 kernels, yielding the so-called "residual kernels" in Figure 4.
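The residualization step can be illustrated with a short sketch on synthetic data. It assumes a simple linear fit with an intercept for one sample position and one frequency band; the function name `residual_power` and all numerical values are hypothetical.

```python
import numpy as np


def residual_power(power, contrast):
    """Remove the linear contribution of sample contrast from power values."""
    slope, intercept = np.polyfit(contrast, power, deg=1)  # power ~ contrast
    return power - (slope * contrast + intercept)


rng = np.random.default_rng(0)
contrast = rng.normal(0.5, 0.1, size=500)    # single-trial sample contrast
intrinsic = rng.normal(0.0, 1.0, size=500)   # stimulus-independent part
power = 20.0 * contrast + intrinsic          # synthetic single-trial power
residuals = residual_power(power, contrast)
```

By construction, the residuals are uncorrelated with the contrast samples, so any remaining covariation with choice cannot be explained by a linear effect of the external stimulus fluctuations.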

Preprocessing, spectral analysis, and source reconstruction of MEG data
We used beamforming to reconstruct the sources of activity observed at the MEG sensor level. First, we automatically labeled artifacts in raw MEG data by detecting blinks, muscle artifacts, sensor jumps and cars passing by in the vicinity of the building. Blinks were detected based on concurrently recorded eye-movement signals (SR-Research EyeLink 1000). Sensor jumps were detected by convolving each sensor with a filter designed to detect large sudden jumps and subsequently looking for outliers in the filter response. Muscle and environmental artifacts were detected by filtering each channel in the 100-140 Hz or <1 Hz range and detecting outliers that occurred simultaneously in many channels. To remove power line noise, we applied a notch filter. In a final step, we epoched the data, downsampled to 600 Hz, and discarded all epochs that contained artifacts.
We computed time-frequency representations (TFRs) of single-trial data by using a multi-taper method. For low frequencies (2-10 Hz in steps of 1 Hz), we used a window length of 0.25 s (frequency smoothing of 8 Hz). For high frequencies (10-150 Hz in steps of 5 Hz), we used a window length of 0.1 s (frequency smoothing of 20 Hz).
We used linearly constrained minimum variance (LCMV) beamforming to estimate activity time courses at the level of cortical sources 90 . First, we constructed individual three-layer head models from subject-specific MRI scans using FieldTrip 91 (functions ft_volumesegment and ft_prepare_mesh). Second, head models were aligned to the MEG data by a transformation matrix that aligned the average fiducial coil position in the MEG data and the corresponding locations in each head model. Transformation matrices were generated using MNE software 92 . We computed one transformation matrix per recording session. Third, we reconstructed cortical surfaces from individual MRIs using Freesurfer and aligned two different atlases to each surface 38,93 . In a fourth step, we used MNE 92 to compute LCMV filters for projecting data into source space. LCMV filters combined a forward model based on the head model and a source space constrained to the cortical sheet (4098 vertices per hemisphere, recursively subdivided octahedron) with a data covariance matrix estimated from the cleaned and epoched data. We computed one filter per vertex, based on the covariance matrix computed on the time points from stimulus onset until 1.35 s after stimulus onset across all trials. We chose the source orientation with maximum output source power at each cortical location. In a final step, we computed TFRs of the segmented MEG data (same method as described for the sensor-level TFR decomposition) and projected the complex time series into source space. In source space we computed TFR power at each vertex location and then averaged across all vertices within a ROI. We aligned the polarity of time series at neighboring vertices, because the beamformer output potentially included arbitrary sign flips for different vertices. We then converted complex Fourier coefficients in each vertex into power (taking the absolute value and squaring).
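For readers unfamiliar with LCMV beamforming, the core computation can be sketched for a single source with a fixed orientation. This is a textbook unit-gain formulation on synthetic data, not the MNE implementation used for the analyses; the regularization value, the sensor count, and the function name `lcmv_weights` are assumptions made for illustration.

```python
import numpy as np


def lcmv_weights(leadfield, data_cov, reg=0.05):
    """Unit-gain LCMV weights for one source with fixed orientation.

    leadfield: (n_sensors,) forward field of the source.
    data_cov: (n_sensors, n_sensors) sensor covariance matrix.
    reg: Tikhonov regularization as a fraction of the mean sensor variance
         (an assumed value, not taken from the paper).
    """
    n = data_cov.shape[0]
    cov = data_cov + reg * np.trace(data_cov) / n * np.eye(n)
    cov_inv_l = np.linalg.solve(cov, leadfield)  # C^-1 l
    return cov_inv_l / (leadfield @ cov_inv_l)   # C^-1 l / (l^T C^-1 l)


rng = np.random.default_rng(0)
n_sensors, n_times = 20, 5000
leadfield = rng.normal(size=n_sensors)
source = np.sin(2 * np.pi * 10 * np.arange(n_times) / 600.0)  # 10 Hz source
sensors = np.outer(leadfield, source) + rng.normal(size=(n_sensors, n_times))

weights = lcmv_weights(leadfield, np.cov(sensors))
estimate = weights @ sensors                     # reconstructed time course
```

The weights pass activity from the target source with unit gain while minimizing the variance contributed by everything else, which is why the data covariance matrix enters the filter.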
For several analyses we averaged power estimates across all vertices within a given ROI (see below), either collapsing across both hemispheres or within each hemisphere. We then converted the resulting power values into units of percent power change relative to the pre-stimulus baseline 14 . For this, we used a frequency- and ROI-specific, condition-independent pre-stimulus period (from -250 ms to 0 ms relative to test stimulus onset) as the baseline for all trials. When collapsing within hemispheres, for some analyses, we computed a lateralization score (contra- vs. ipsilateral to the hand used to report "test stimulus is stronger").
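The baseline normalization can be sketched as follows for the power values of one ROI. The array layout and the function name are assumptions; in particular, averaging the baseline over all trials to obtain one value per frequency is one reading of "condition-independent", and the released pymeg code should be consulted for the exact procedure.

```python
import numpy as np


def percent_change(power, times, baseline=(-0.25, 0.0)):
    """Convert power to percent change from the pre-stimulus baseline.

    power: (n_trials, n_freqs, n_times) power values for one ROI.
    times: (n_times,) in seconds relative to test stimulus onset.
    The baseline is averaged over trials and baseline time points, giving
    one value per frequency that is shared by all trials and conditions.
    """
    in_baseline = (times >= baseline[0]) & (times < baseline[1])
    base = power[:, :, in_baseline].mean(axis=(0, 2))  # (n_freqs,)
    base = base[np.newaxis, :, np.newaxis]
    return 100.0 * (power - base) / base


times = np.linspace(-0.25, 1.0, 6)   # -0.25, 0, 0.25, 0.5, 0.75, 1.0 s
power = np.full((4, 2, 6), 2.0)      # flat synthetic power
power[:, :, 2:] = 3.0                # power rises after stimulus onset
change = percent_change(power, times)
```

In this toy example power rises from 2 to 3 units after stimulus onset, which corresponds to a 50% increase relative to baseline.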
The code for the above-described analysis steps is available at www.github.com/Donnerlab/pymeg.

Regions of interest (ROIs)
We used two sets of ROIs. The first was confined to 18 cortical regions, all of which were defined by previous fMRI work and co-registered to individual structural MRIs: (i) retinotopically organized visual cortical field maps contained in the atlas from Wang et al. 38 and (ii) three regions exhibiting hand movement-specific lateralization of cortical activity: aIPS, IPS/PCeS and the hand sub-region of M1 39 . Following a scheme proposed by Wandell and colleagues 94 , we grouped retinotopic visual cortical regions with a shared foveal representation into clusters (see Table 1), thus increasing the spatial distance between ROI centers and minimizing the risk of signal leakage. The second, cortex-wide set of ROIs comprised all 180 clusters of regions covering the cerebral cortex, as defined in 93 . These were also co-registered to individual MRIs.

Choice-specific lateralization of spectral power
We computed TFRs of choice-specific power lateralization for each ROI (contralateral vs. ipsilateral to the hand movement used to report the "test stronger" choice). Lateralized activity was computed for each physical stimulus (i.e. mean test contrast) and choice condition separately before computing the final contrast that isolated choice-specific activity. For the latter, we contrasted error and correct responses for each physical stimulus (contrast) condition separately. We first averaged trials from each combination of stimulus and choice condition and subsequently computed the difference between hits and misses, and then the difference between false alarms and correct rejects, and finally averaged these two differences. In other words, we computed the difference between "test stronger" and "test weaker" choices, separately for each physical stimulus condition (mean contrast of test stimulus), and only averaged their results afterwards. This ensured that any activity differences due to the physical stimulus were factored out and that the result isolated differential activity that was specific to behavioral choice. When similarly computing choice-specific power changes on hemisphere-averaged power values, we found no significant effects in any ROI (data not shown). Statistical significance of lateralization values was assessed by cluster-based permutation test (threshold-free cluster enhancement, H=2, E=0.5, p<0.05).
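The logic of this contrast is easy to verify with a toy example. In the sketch below, each condition average is a small array standing in for a TFR of lateralized power; the additive stimulus and choice effects and all names are hypothetical.

```python
import numpy as np


def choice_specific(lateralization):
    """Average of the two within-stimulus choice differences."""
    stronger_stim = lateralization["hit"] - lateralization["miss"]
    weaker_stim = (
        lateralization["false alarm"] - lateralization["correct reject"]
    )
    return 0.5 * (stronger_stim + weaker_stim)


# Toy condition averages (frequencies x time points) with an additive
# stimulus effect and an additive effect of choosing "stronger".
stimulus_effect = np.full((3, 4), 5.0)
choice_effect = np.full((3, 4), 2.0)
condition_means = {
    "hit": stimulus_effect + choice_effect,  # stimulus stronger, chose stronger
    "miss": stimulus_effect,                 # stimulus stronger, chose weaker
    "false alarm": choice_effect,            # stimulus weaker, chose stronger
    "correct reject": np.zeros((3, 4)),      # stimulus weaker, chose weaker
}
result = choice_specific(condition_means)
```

Because each difference is taken within one physical stimulus condition, the stimulus effect cancels and only the choice effect remains.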

Stimulus-specific (test contrast-related) dynamics of spectral power
Stimulus-specific activity was computed similarly to choice-specific activity, but on TFRs of the hemisphere-averaged power values (not their lateralization). This was done because the full-field, centrally presented visual grating stimuli were expected to produce about equally strong sensory responses in the visual cortical field maps of both hemispheres. To factor out choice information in this analysis, conversely to the computation of choice-specific activity, we now contrasted the different physical stimulus conditions (i.e., mean test contrast) separately for each choice and then averaged the results. Statistical significance of power changes was assessed by cluster-based permutation test (TFCE, H=2, E=0.5, p<0.05).

Decoding of choices
We used multivariate pattern classification techniques 95 to decode choice information contained in the estimated activity patterns of individual ROIs (different sets, see above). Decoding was carried out separately for each time point throughout the trial in order to generate time courses of choice information. For each subject, ROI, and time point, we extracted single trial power values across all frequencies from the ROIs described above.
Two different sets of choice decoding analyses used different features for decoding: (i) the spectral pattern of power values from both hemispheres for each ROI (i.e. coarse-grained spatial patterns); this was applied to either the visuo-motor pathway set (Figure 2C) or the cortex-wide set (Figure S3A); and (ii) the spectral patterns of phase and power values from each individual vertex (i.e. fine-grained spatial patterns) for a subset of ROIs from the above two sets (Figure S3B). While approach (i) imposed that hemispheric lateralization would be used as a feature for decoding (based on de Gee et al. 39 ), approaches (ii) and (iii) allowed the classifiers to use lateralization only if useful for improving decoding performance. Due to the high computational demand of this analysis, we focused on 17 ROIs in dorsolateral prefrontal cortex and M1-hand.
Features were z-scored before decoding based on training data only, and the same transformation was applied to test data before evaluating decoding performance. We used linear support vector machines for decoding (C=1). Since choices were not equally distributed (<p_yes> = 0.57, <crit> = -0.23), we upsampled the minority class in the training data (but not in the test set) by randomly repeating elements until the frequency of choices was equal. We trained separate classifiers for trials where average contrast was below or above the reference contrast, effectively asking the classifier to separate correct answers from errors. For the time courses in Figure 2B we computed decoding performance (comparing "test stronger than reference" and "test weaker than reference" choices) across time and ROIs, separately for the two physical stimulus conditions (i.e., "test contrast stronger than reference" and "test contrast weaker than reference").
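The two preprocessing steps applied to the training data can be sketched as follows. The function names, the synthetic feature matrix, and the order in which the two steps are applied are assumptions; the text only states that both operate on training data and that the test set is not resampled.

```python
import numpy as np


def zscore_from_training(train, test):
    """Z-score both sets with mean and SD estimated on training data only."""
    mean = train.mean(axis=0)
    sd = train.std(axis=0)
    return (train - mean) / sd, (test - mean) / sd


def upsample_minority(features, labels, rng):
    """Repeat random minority-class trials until both classes are equal."""
    classes, counts = np.unique(labels, return_counts=True)
    minority = classes[np.argmin(counts)]
    deficit = counts.max() - counts.min()
    pool = np.flatnonzero(labels == minority)
    extra = rng.choice(pool, size=deficit, replace=True)
    keep = np.concatenate([np.arange(len(labels)), extra])
    return features[keep], labels[keep]


rng = np.random.default_rng(0)
labels = rng.permutation(np.array([1] * 57 + [0] * 43))  # 57% "stronger"
features = rng.normal(5.0, 2.0, size=(100, 3))           # synthetic power
train_idx, test_idx = np.arange(80), np.arange(80, 100)

x_train, x_test = zscore_from_training(features[train_idx], features[test_idx])
x_balanced, y_balanced = upsample_minority(x_train, labels[train_idx], rng)
```

Estimating the scaling parameters on the training folds alone prevents information about the test trials from leaking into the classifier, and leaving the test set unbalanced keeps the evaluation representative of the actual choice distribution.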
To evaluate classification performance, we used 10-fold cross-validation, computed ROC-AUC values for each classifier, and averaged across folds. In short, we split all available data per subject into ten folds, keeping the same percentage of choices in each fold. We then used nine folds to determine the parameters of the classifier and computed prediction scores for the held-out fold, which we used to compute ROC-AUC values. We used this cross-validation strategy for all decoding analyses. We used Bayesian inference to estimate uncertainty around average decoding performance (error bars in Figure 2B). We assumed that participants' AUC values (subscript s below) at time point t were samples from a T distribution and placed weakly informative priors on all parameters of this distribution: $\mu_t \sim \mathcal{N}(0.5, 1)$, … $\sim$ …$(0, 5)$, … We used the NUTS sampler and two MCMC chains with 3000 iterations. We checked convergence visually and by ensuring that the Gelman-Rubin statistic ($\hat{R}$) was below 1.05. Uncertainty estimates in Figure 2B display the 95% highest density interval for $\mu_t$.
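The fold assignment and the performance metric can be illustrated with a dependency-free sketch. Both functions are simplified stand-ins written for this example (the analyses themselves presumably relied on library implementations), and the round-robin assignment is just one way of keeping the choice proportions similar across folds.

```python
import random


def stratified_folds(labels, n_folds, rng):
    """Assign trials to folds so each fold keeps the overall choice ratio."""
    folds = [[] for _ in range(n_folds)]
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        for position, trial in enumerate(idx):
            folds[position % n_folds].append(trial)  # round-robin per class
    return folds


def roc_auc(scores, labels):
    """Probability that a random positive trial outscores a random negative."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


labels = [1] * 57 + [0] * 43                 # unequal choice frequencies
folds = stratified_folds(labels, 10, random.Random(0))
positives_per_fold = [sum(labels[i] for i in fold) for fold in folds]

perfect = roc_auc([0.9, 0.8, 0.3, 0.1], [1, 1, 0, 0])  # 1.0: fully separable
chance = roc_auc([0.9, 0.1, 0.8, 0.3], [1, 1, 0, 0])   # 0.5: no separation
```

An AUC of 0.5 corresponds to chance-level decoding, which is why the prior on the group-level mean is centered on 0.5.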
We also compared decoding values in IPS/PostCeS after accounting for linear relationships with activity in M1-hand. We repeated decoding in IPS/PostCeS by using single trial power values across all frequencies. But this time we first predicted IPS/PostCeS power values from their corresponding values in M1-hand using linear regression for each frequency separately. We subtracted this prediction from IPS/PostCeS power values and repeated the decoding procedure as before.

Decoding of contrast samples or accumulated contrast
We used a regression approach to assess whether source-reconstructed spectral activity tracked individual contrast samples. For each region of interest, time point, and subject, we extracted single-trial power values for all frequencies (within the range from 1-145 Hz) and all ROIs (averaged across hemispheres and vertices) and used these to predict the contrast of the current sample, or the accumulated contrast (i.e. running sum of contrast samples up to the current sample). Features were z-scored based on training data only, and the same transformation was applied to test data before evaluating prediction performance. We used ridge regression to predict current sample contrast or accumulated contrast values from power values across all frequencies. The analysis yielded the linear combination of frequency-specific power values that maximized the correlation between predictors and response variable (current sample contrast or accumulated contrast). The strength of L2 regularization applied by ridge regression was governed by a single parameter, which we optimized in a nested cross-validation (possible values were 0.1, 1, 10). We evaluated prediction performance by computing the slope between predicted contrast and actual contrast, such that a value of, for example, 0.1 indicated that one unit of contrast change yielded a predicted contrast change of 0.1 units (10%). Since contrast was bounded between 0 and 1, the useful range for this measure was also between 0 and 1, with 1 being the best possible score. To decode the running mean of all contrast samples (accumulated contrast), we averaged the true contrast of all preceding samples and the current sample, assuming perfect evidence accumulation for simplicity, and predicted this value from the same data. We used 10-fold cross-validation in all cases (see above for more details).
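Two ingredients of this analysis, the accumulated-contrast target and the slope-based performance measure, can be sketched without any external libraries. The function names and example values are hypothetical, and the ridge regression itself is left out for brevity.

```python
from itertools import accumulate


def running_mean(samples):
    """Mean of all contrast samples up to and including each position."""
    return [total / (i + 1) for i, total in enumerate(accumulate(samples))]


def slope(actual, predicted):
    """Least-squares slope of predicted values regressed on actual values."""
    n = len(actual)
    mean_a = sum(actual) / n
    mean_p = sum(predicted) / n
    cov = sum((a - mean_a) * (p - mean_p) for a, p in zip(actual, predicted))
    var = sum((a - mean_a) ** 2 for a in actual)
    return cov / var


# Accumulated contrast for the first three samples of a trial.
accumulated = running_mean([0.4, 0.6, 0.8])  # approximately 0.4, 0.5, 0.6

# A decoder that recovers only 10% of the true contrast variation.
actual = [0.2, 0.4, 0.6, 0.8]
predicted = [0.5 + 0.1 * (a - 0.5) for a in actual]
performance = slope(actual, predicted)       # approximately 0.1
```

A slope well below 1 is expected for regularized decoders applied to noisy data, because ridge regression shrinks predictions toward the mean of the training targets.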