Introduction

Acquiring a new motor skill entails an adaptive process of learning a mapping between motor commands and their sensory outcomes, which are conveyed by sensory feedback, including visual and proprioceptive feedback. Of the two, visual feedback is a critical component of visuomotor learning: it guides motor planning toward the goal of a motor task and encourages faster and more accurate performance1. In motor adaptation tasks, for instance, visual feedback can provide altered sensory outcomes from a perturbed environment, which are used to quantify performance error. Several studies have identified a cortico-cerebellar network as the neural substrate of the performance error, a learning signal in motor adaptation2,3,4.

Previous studies have established distinct neural substrates for different types of motor learning: a cortico-cerebellar network for motor adaptation and a cortico-striatal network for learning a new motor skill5,6,7,8,9,10. However, they employed relatively simple tasks, such as reaching tasks and motor sequence tasks, without considering the specific roles of visual feedback in motor learning. Visual feedback would play a critical role in the initial exploration stage of more complicated de novo skill acquisition, which often requires learning an arbitrary mapping between actions and their sensory outcomes. The amount of information provided by visual feedback could affect the extent of exploration, a hallmark of reinforcement learning. However, little is known about the role of visual feedback in de novo motor learning and its effects on the underlying neural substrates.

Here, we present an fMRI experiment in which participants learned a complicated continuous de novo motor skill introduced in our previous study11. Specifically, participants learned to control an on-screen cursor through finger movement while learning an arbitrary mapping between finger and cursor movements. We examined how the amount of information provided by visual feedback influences de novo motor skill learning. Participants could see online cursor movement in a continuous feedback condition but not in a binary one. In both conditions, the target turned red when it was hit by the cursor. Earlier studies similarly manipulated visual feedback to distinguish distinct types of learning9,10,12 and neural substrates involved in internally and externally driven movements13,14,15. However, these studies employed a simple motor control task without learning visuomotor mappings and focused on activity patterns in sensorimotor cortices. In our study, using a complicated continuous de novo motor task, we addressed how the extent of visual feedback modulated motor learning and associated corticostriatal activity. With continuous cursor feedback, participants would acquire internal models: hand-to-cursor forward mapping and cursor-to-hand inverse mapping16. However, given the binary feedback only, they would learn action values directly from trial and error17. Thus, we hypothesized that continuous feedback and binary feedback are respectively related to model-based and model-free reinforcement learning with distinct corticostriatal activity patterns17,18,19,20,21.

To identify the distinct corticostriatal activity patterns, we employed a GLM analysis with parametric regressors encoding time-varying learning performance provided by visual feedback11,22. Then, we assessed the selectivity of the striatal activity correlated with performance and further analyzed distinct activity patterns in the striatal subregions. Finally, we analyzed the dataset from a separate experiment to differentiate between striatal activity related to goal-directed motor control and that linked to random motor control. In this experiment, participants executed a simple motor control task without visual feedback, eliminating the learning aspect.

Materials and methods

Participants

The study included twenty-six healthy young volunteers who were right-handed according to the Edinburgh Handedness Inventory23 and had no history of neurological or psychiatric issues. All participants had normal or corrected-to-normal vision. Twenty-four participants (ten males, fourteen females; mean age = 24.9 \(\pm\) 4.7 years, age range = 18–35 years) completed all fMRI task sessions. Two participants who reported severe fatigue during the experiment were excluded from further analysis. Written informed consent was provided by all participants. The study protocol was approved by the Institutional Review Board of Sungkyunkwan University, Suwon, Republic of Korea (IRB No. 2018-05-003-032). All research methods were performed in accordance with the Declaration of Helsinki. Participants underwent two scanning sessions in a 3 T fMRI scanner, which took 1.5 h in total.

Localizer session

A separate experiment was conducted to identify the region related to finger movement. When the visual cue "Move" was displayed, participants equipped with an MR-compatible data glove (5DT Data Glove 14 Ultra) were directed to perform natural-speed movements with their right fingers, ceasing upon presentation of the "Stop" cue. Each "Move" or "Stop" phase persisted for 48 s, separated by 2 s intervals, and a total of six sets of "Move" and "Stop" conditions were executed, amounting to approximately 600 s in total. Then, we recalibrated a mapping matrix \(\mathbf{A}\) and an offset \({\mathbf{r}}_{0}\) using the finger-movement data from the last two “Move” blocks. Additionally, we ensured that all 25 grid cells on the \(5\times 5\) grid were reachable by finger movements (Fig. 1A).

Figure 1

Experiment design and behavioral performance of all participants. (A) In the localizer session, participants move or stop their right fingers freely in response to the message displayed on the screen. The first principal components were calculated from the random finger movement by PCA and used as a finger-to-cursor movement mapping in the main session. (B) In the main session, a target appeared as a gray cell in the same order for each block, and participants were instructed to reach a target by moving their fingers. A target turned red when reached by a cursor, which was visible in odd-numbered blocks (cursor-on condition) but not in the even-numbered blocks. Target locations were altered in the test condition where a cursor was always visible. (C) Block-by-block group performance. Participants improved performance across alternating "cursor-on" (blue) and "cursor-off" (green) learning blocks and continued in the test blocks (yellow). Error bars indicate SEM.

Main session

We designed a task-based fMRI experiment comprising six runs using the same equipment as in our previous study11. The mapping between hand postures and cursor positions was defined as follows:

$$\left[\begin{array}{c}x\\ y\end{array}\right]=\left[\begin{array}{ccccc}{a}_{x,1}& {a}_{x,2}& {a}_{x,3}& \cdots & {a}_{x,14}\\ {a}_{y,1}& {a}_{y, 2}& {a}_{y,3}& \cdots & {a}_{y,14}\end{array}\right]\times {\left[\begin{array}{ccccc}{s}_{1}& {s}_{2}& {s}_{3}& \cdots & {s}_{14}\end{array}\right]}^{T}+\left[\begin{array}{c}{x}_{0}\\ {y}_{0}\end{array}\right]$$

where \({s}_{k}\) (\(k=1,2,\dots ,14\)) indicates each of the 14 sensor inputs from the data glove, and \(x\) and \(y\) indicate the horizontal and vertical position of the cursor. In essence, participants had to learn the hand postures that move the cursor to the desired coordinates \((x, y)\) throughout the task. The above equation can be rewritten as \(\mathbf{r}=\mathbf{A}\mathbf{s}+{\mathbf{r}}_{0}\), where the mapping matrix \(\mathbf{A}\) and the offset \({\mathbf{r}}_{0}\) were determined from the localizer session. Specifically, the first and second rows of the mapping matrix were determined as the first two principal components of the covariance matrix calculated from the last three blocks of movement data ("Move" condition) of the localizer session, in which participants made random finger movements.
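The mapping \(\mathbf{r}=\mathbf{A}\mathbf{s}+{\mathbf{r}}_{0}\) can be illustrated with a minimal Python sketch. The function names and the synthetic data are hypothetical, and the choice of offset (centering the mean calibration posture on the origin) is an assumption made for illustration only, since the text does not state how \({\mathbf{r}}_{0}\) was computed.

```python
import numpy as np

def fit_mapping(move_data):
    """Estimate A (2 x 14) and r0 (2,) from calibration glove data.

    move_data: array of shape (n_samples, 14), one column per glove sensor.
    """
    mean_posture = move_data.mean(axis=0)
    cov = np.cov(move_data, rowvar=False)        # 14 x 14 covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)       # eigenvalues in ascending order
    top_two = np.argsort(eigvals)[::-1][:2]      # indices of the two largest
    A = eigvecs[:, top_two].T                    # rows = first two principal components
    # Assumption: the offset places the mean calibration posture at the origin.
    r0 = -A @ mean_posture
    return A, r0

def cursor_position(A, r0, s):
    """Apply r = A s + r0 to one 14-element sensor vector."""
    return A @ s + r0

# Synthetic stand-in for "Move" data (not real glove recordings).
rng = np.random.default_rng(0)
data = rng.normal(size=(500, 14)) * np.linspace(3.0, 0.5, 14)
A, r0 = fit_mapping(data)
xy = cursor_position(A, r0, data[0])
```

Because the rows of `A` are orthonormal eigenvectors, the two cursor axes capture uncorrelated directions of finger movement in the calibration data.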

The first part of the main session consisted of four runs, each comprising 144 trials from the current location to the target (twelve blocks of 12 trials). For 4 s during each trial, a gray grid cell with a yellow crosshair in its center appeared in one of the four target cells of a \(5\times 5\) grid, which was shown to participants (Fig. 1B). Overall, the duration of each run was 576 s (144 trials per run \(\times\) 4 s per trial) without breaks between trials. For counterbalancing, target cells were arranged in a triangular configuration for half of the participants and in an inverted triangular configuration for the other half in the first part of the main session. Given that the cell number is defined as \(k=5i+j-5\), where \(i\) is a row index, \(j\) is a column index, and \(i, j\in {\mathbb{N}}\), the target sequence in each block was ordered as cells 13-3-25-21-13-25-3-21-25-13-21-3 (triangle: sequence 1) or 13-23-5-1-13-5-23-1-5-13-1-23 (inverted triangle: sequence 2). This sequence was repeated for all twelve blocks during each run (144 trials) (Fig. 1B).
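As a small worked example of the cell numbering \(k=5i+j-5\), the following Python sketch converts between cell numbers and (row, column) indices and counts the distinct target-to-target paths in one repeated block. The helper names are hypothetical, and treating row 1 as the top row of the grid is an assumption.

```python
def row_col_to_cell(i, j, n_cols=5):
    """Cell number k = 5i + j - 5 for 1-based row i and column j."""
    return n_cols * i + j - n_cols

def cell_to_row_col(k, n_cols=5):
    """Invert the cell numbering to 1-based (row, column)."""
    return (k - 1) // n_cols + 1, (k - 1) % n_cols + 1

SEQUENCE_1 = [13, 3, 25, 21, 13, 25, 3, 21, 25, 13, 21, 3]   # triangle
SEQUENCE_2 = [13, 23, 5, 1, 13, 5, 23, 1, 5, 13, 1, 23]      # inverted triangle

def ordered_paths(sequence):
    """Distinct (start, target) pairs when the block repeats back to back."""
    starts = [sequence[-1]] + sequence[:-1]   # first trial starts at the previous block's last target
    return set(zip(starts, sequence))

targets_1 = sorted({cell_to_row_col(k) for k in SEQUENCE_1})
n_paths_1 = len(ordered_paths(SEQUENCE_1))
n_paths_2 = len(ordered_paths(SEQUENCE_2))
run_duration_s = 12 * len(SEQUENCE_1) * 4    # 12 blocks x 12 trials x 4 s
```

Under these assumptions, sequence 1 uses the center cell, the top-middle cell, and the two bottom corners, and each repeated block covers every ordered pair of the four targets.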

The cursor position was continuously represented by a white crosshair in each run's odd-numbered blocks (blue: On), while it was hidden in each run's even-numbered blocks (green: Off) (Fig. 1C). Meanwhile, when the cursor reached the target cell regardless of its visibility, the target cell changed to red (Fig. 1B). Once the target was hit, participants were instructed to maintain a static position to remain in roughly the same location. The proportion of time during which the target turned red was measured as a trial-by-trial success rate. Thus, the goal of the task was to place the cursor on the target grid as quickly and precisely as possible and keep it there. Moreover, participants were instructed to move the cursor between targets as straight as possible. In the second part of a main session consisting of two runs, a target sequence was altered to the other sequence (sequence 1 or 2) (Fig. 1B). The cursor position was consistently provided during these two runs (Test condition in Fig. 1C).

fMRI data acquisition

We collected fMRI data using a 3 T Siemens Magnetom Prisma scanner with a 64-channel head coil. Functional images were acquired using an echo planar imaging (EPI) sequence with the following parameters: 300 volumes (310 volumes for localizer fMRI); repetition time (TR) = 2,000 ms; echo time (TE) = 35.0 ms; flip angle (FA) = 90°; field of view (FOV) = 200 mm; matrix = 101 × 113 × 91 voxels; 72 axial slices; a slice thickness of 2.0 mm; and an in-plane resolution of 2.0 × 2.0 mm². For anatomical reference, a T1-weighted anatomical scan of the entire brain was performed using a magnetization-prepared rapid acquisition with gradient echo (MPRAGE) sequence with the following parameters: TR = 2,300 ms; TE = 2.28 ms; FA = 8°; FOV = 256 mm; matrix = 204 × 262 × 260 voxels; 192 axial slices; a slice thickness of 1.0 mm; and an in-plane resolution of 1.0 × 1.0 mm². Prior to the functional scans, two EPI images were acquired with opposite phase-encoding directions (posterior-to-anterior and anterior-to-posterior) for subsequent distortion correction.

Behavioral data analysis

MATLAB (version R2022a, MathWorks), Python (version 3.8.8), and Jupyter notebook (version 3.9.6) were utilized for all statistical analysis and data visualization. We used a two-sided paired t-test between different experimental conditions. A trial-by-trial success rate was computed as a proportion of the time the cursor was on the target, i.e., targets turned on in red. As each task block included all 12 possible paths between four target locations and the same target sequence was repeated, we calculated an averaged success rate in each block, as shown in Fig. 1C.
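The behavioral measures described above can be sketched in Python as follows. This is a minimal illustration with made-up numbers rather than the analysis code used in the study; the function names are hypothetical, and `scipy.stats.ttest_rel` (two-sided by default) stands in for the paired t-test.

```python
import numpy as np
from scipy import stats

def trial_success_rate(on_target):
    """Fraction of samples within one 4-s trial in which the target was red."""
    return float(np.mean(on_target))

def block_success_rates(trial_rates, trials_per_block=12):
    """Average trial-wise success rates within consecutive blocks of 12 trials."""
    rates = np.asarray(trial_rates, dtype=float)
    return rates.reshape(-1, trials_per_block).mean(axis=1)

def paired_comparison(cond_a, cond_b):
    """Two-sided paired t-test across participants."""
    result = stats.ttest_rel(cond_a, cond_b)
    return result.statistic, result.pvalue

# Illustrative values only (one entry per hypothetical participant).
example_rate = trial_success_rate([True, False, True, True])
example_blocks = block_success_rates(np.linspace(0.0, 1.0, 24))
t_value, p_value = paired_comparison(
    [0.50, 0.60, 0.70, 0.80, 0.90],
    [0.10, 0.25, 0.30, 0.45, 0.50],
)
```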

fMRI data analysis

AFNI (Analysis of Functional NeuroImages, Version 21.2.04, NIH; https://afni.nimh.nih.gov), MATLAB (version R2022a, MathWorks), Python (version 3.8.8), and Jupyter notebook (version 3.9.6) were utilized for all statistical analysis and data visualization.

Preprocessing

Anatomical and functional imaging data were preprocessed following a standard procedure guided by AFNI's afni_proc.py (Version 21.2.04). Initially, time series outliers from all voxels in the localizer and the functional images were attenuated by AFNI's 3dDespike function. Then, they were adjusted for slice-time acquisition (3dTshift with the quintic interpolation) and for six rigid body motions (3dvolreg). The images were spatially registered to the anatomical T1 volume in native space by AFNI's align_epi_anat.py script and translated non-linearly into the Montreal Neurological Institute (MNI) 152 template by auto_warp.py. Due to motion, an EPI volume extent mask resulting from the volume registration was applied to omit voxels without valid data at every TR. Next, all images were spatially smoothed using a Gaussian kernel with a \(4\times 4\times 4\)-mm full-width at half-maximum (3dmerge), and the time series were scaled to have a mean of 100 and a range between 0 and 200 (3dTstat and 3dcalc).

Whole-brain voxel-wise GLM analysis

To identify regions whose activity was modulated by participants' performance, i.e., "success" in reaching a target, we applied a parametric regressor (AFNI's 3dDeconvolve with the "stim_times_AM2" option) in a subject-level general linear model (GLM) analysis using six fMRI runs from the main task session22. We first calculated the time-varying success rate for each 4-s trial, defined as the proportion of time the cursor remained on the target grid cell, during which the cell was red. Then, we generated a pulse regressor whose amplitude was equal to the success rate at the offset of each trial and convolved it with a two-gamma function modeling a canonical hemodynamic response function (HRF). Then, we multiplied it with binary indicators coding for each feedback condition and generated two regressors of interest in the GLM analysis. Mathematically, a parametric regressor for each feedback condition \(r(t)\) was modeled as \(r(t)=\sum_{k=1}^{K}h(t-\tau )({a}_{k}-\overline{a })\) where \(K\) is the number of trials in the entire experiment, \(h(t)\) is a canonical hemodynamic response with a time delay \(\tau\), \({a}_{k}\) is the parameter (i.e., time duration of the red signal, performance) in the \(k\)th trial and \(\overline{a }\) is the average across the trials with the same condition. Therefore, there were two regressors for each of the "cursor-on" and "cursor-off" conditions, one for the parametric modulation convolved with \(h(t)\) and the other for the average activity level, which is a boxcar regressor encoding a 4-s trial during a block of 12 trials convolved with \(h(t)\) (Figure S1). Importantly, the boxcar regressors controlled for the confounding effect of different amounts of finger movement, which was larger for the "cursor-off" condition than for the "cursor-on" condition. That is, we can directly compare the coefficients of the parametric regressors between the two feedback conditions and interpret the result as the effect of visual feedback (refer to Adkins and Lee22 for a similar method).
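The construction of a mean-centered parametric regressor can be sketched in Python. This is an illustrative approximation, not AFNI's implementation: the double-gamma kernel uses commonly quoted shape parameters (6 and 16, undershoot ratio 1/6), which are assumptions here, and the function names, the fine time grid, and the example data are hypothetical.

```python
import math
import numpy as np

def double_gamma_hrf(t, peak_shape=6.0, under_shape=16.0, under_ratio=1.0 / 6.0):
    """Two-gamma HRF sampled at times t (seconds); parameters are assumed defaults."""
    t = np.asarray(t, dtype=float)
    tp = np.clip(t, 0.0, None)
    peak = tp ** (peak_shape - 1) * np.exp(-tp) / math.gamma(peak_shape)
    under = tp ** (under_shape - 1) * np.exp(-tp) / math.gamma(under_shape)
    return np.where(t >= 0, peak - under_ratio * under, 0.0)

def parametric_regressor(success, condition_mask, trial_dur=4.0, tr=2.0, dt=0.1):
    """Pulse at each trial offset with amplitude (a_k - mean), convolved with the HRF."""
    success = np.asarray(success, dtype=float)
    mask = np.asarray(condition_mask, dtype=bool)
    n_fine = int(round(success.size * trial_dur / dt))
    pulses = np.zeros(n_fine)
    centered = success[mask] - success[mask].mean()      # center within the condition
    offsets = (np.flatnonzero(mask) + 1) * trial_dur     # trial offsets in seconds
    idx = np.minimum(np.round(offsets / dt).astype(int), n_fine - 1)
    pulses[idx] = centered
    kernel = double_gamma_hrf(np.arange(0.0, 32.0, dt))
    fine = np.convolve(pulses, kernel)[:n_fine]
    return fine[:: int(round(tr / dt))]                  # resample to one value per TR

# Two blocks of 12 trials: the first "cursor-on", the second "cursor-off".
mask_on = np.repeat([True, False], 12)
reg_on = parametric_regressor(np.linspace(0.1, 0.9, 24), mask_on)
reg_flat = parametric_regressor(np.full(24, 0.5), mask_on)
```

Mean-centering within each condition means that a participant with constant performance contributes a flat (all-zero) parametric regressor, leaving the average response to the boxcar regressor.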
For the result shown in Fig. 3, we used a single parametric regressor encoding all the trials instead of using separate parametric regressors to demonstrate the overall sensitivity of the striatal activity. To localize regions related to finger movement, we first applied boxcar regressors encoding 48-s-long "Move" and "Stop" conditions for a subject-level GLM analysis. Then, we calculated a whole-brain contrast map between the beta maps for the conditions, \({\beta }_{move}>{\beta }_{stop}\). For each of the fMRI runs, in both GLM analyses, five regressors modeling up to quartic (i.e., fourth-order) polynomial trends of the fMRI data and six regressors estimating rigid-body head motion were incorporated as nuisance regressors. The analysis excluded the volumes related to excessive head motion (defined as a displacement larger than 0.4 mm). After the voxel-wise whole-brain GLM analyses were performed for each subject, the group-level t-test was performed using AFNI's 3dttest++ with the "-Clustsim" option, which is a conservative non-parametric Monte Carlo simulation method to determine cluster-corrected significance level. The voxel-wise threshold was set to \(p<0.001\), and the simulation determined 180 suprathreshold voxels as a cluster-wise corrected threshold \(p<0.01\) within the whole-brain group mask used in Table 1. The mask was defined as the intersection of all participants' whole-brain scans by using 3dMean with the "mask_inter" option.

Table 1 Significant clusters positively modulating performance.

ROI analysis

To analyze activity in the putamen and caudate nucleus, we defined ROIs using the intersection of the Harvard–Oxford subcortical structural atlas used in FSL (https://fsl.fmrib.ox.ac.uk/fsl/fslwiki/Atlases) and the Talairach Daemon database used in AFNI. The caudate nucleus was further subdivided into the “head” and “body” regions based on the AFNI's atlas. For the vmPFC ROI, we used the Talairach Daemon database.

As suggested by the literature24, the putamen ROIs were further divided into the anterior (\({\text{Y}}>0\)) and posterior (\({\text{Y}}<0\)) regions with a 1-voxel gap between them to reduce partial volume effects. We resampled both atlases to 2-mm-cubic voxels matched to the spatial resolution of acquired functional images. Finally, we defined the whole striatum combining the caudate nucleus and putamen from both atlases for ROC analysis.

For each ROI, the NiftiLabelsMasker class of the Nilearn Python module was used to extract the average \(\beta\) estimates from the parametric GLM analysis of the success rate. The extracted data were analyzed using a two-way repeated measures ANOVA with the region (anterior and posterior, left and right) for each scan session (main and localizer) as within-subject factors. Subsequently, effect sizes were estimated using partial \(\eta\)-squared values, and the Greenhouse–Geisser correction was applied.
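For readers who want to relate reported F statistics to effect sizes, partial \(\eta\)-squared can be recovered from an F value and its degrees of freedom through the standard identity \({\eta }_{p}^{2}=F\cdot d{f}_{effect}/(F\cdot d{f}_{effect}+d{f}_{error})\). The short Python sketch below applies it; the function name is hypothetical.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta-squared from an F statistic and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)

# Example: F(1, 23) = 26.0 gives 26 / (26 + 23), i.e. about 0.53.
example_eta = round(partial_eta_squared(26.0, 1, 23), 2)
```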

Striatum topography of task fMRI vs. anatomy

Using a parametric regressor, we assessed the spatial agreement between the anatomically defined striatum and the region identified as significant by the group-level GLM analysis. The voxel-wise \(\beta\) values for each participant were first converted to z-scores using 3dttest++ with the "-toz" option, after which a varying voxel-wise threshold was applied to the z-scores to create an ROC curve. Sensitivity is the proportion of voxels within the anatomically defined striatum whose z-scores exceeded the threshold (i.e., true positives), and specificity is the proportion of voxels outside the striatum whose z-scores were smaller than the threshold (i.e., true negatives). The area under the ROC curve (AUC), ranging from 0 to 1, indicates the performance of a classifier. The AUC of a perfect classifier is equal to 1, and that of a random classifier is equal to 0.5. Although these procedures varied slightly between individuals compared to anatomical volumes, the domain's boundaries were clearly delineated at the group level.
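The threshold sweep behind the ROC curve can be sketched in Python as below. The arrays are synthetic, the function names are hypothetical, and the trapezoidal integration with added (0, 0) and (1, 1) endpoints is an assumed way of computing the AUC; the threshold range of -8 to 8 in steps of 0.5 follows the values reported in the Results.

```python
import numpy as np

def roc_from_z(z_map, striatum_mask, thresholds):
    """Sensitivity and false-positive rate of 'z > threshold' for labeling striatal voxels."""
    z = np.asarray(z_map, dtype=float).ravel()
    inside = np.asarray(striatum_mask, dtype=bool).ravel()
    tpr, fpr = [], []
    for thr in thresholds:
        above = z > thr
        tpr.append(np.mean(above[inside]))     # sensitivity
        fpr.append(np.mean(above[~inside]))    # 1 - specificity
    return np.array(fpr), np.array(tpr)

def area_under_curve(fpr, tpr):
    """Trapezoidal AUC after adding the (0, 0) and (1, 1) endpoints."""
    x = np.concatenate(([0.0], fpr, [1.0]))
    y = np.concatenate(([0.0], tpr, [1.0]))
    order = np.lexsort((y, x))                 # sort by x, then by y
    x, y = x[order], y[order]
    return float(np.sum(np.diff(x) * (y[1:] + y[:-1]) / 2.0))

thresholds = np.arange(-8.0, 8.5, 0.5)

# Synthetic check: perfectly separated voxels give an AUC of 1.
mask = np.array([True] * 50 + [False] * 950)
z_perfect = np.where(mask, 5.0, -5.0)
auc_perfect = area_under_curve(*roc_from_z(z_perfect, mask, thresholds))

# Synthetic check: z-scores unrelated to the mask give an AUC near 0.5.
rng = np.random.default_rng(1)
z_random = rng.normal(size=20000)
mask_random = rng.random(20000) < 0.5
auc_random = area_under_curve(*roc_from_z(z_random, mask_random, thresholds))
```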

Consent to publication

Declaration of generative AI and AI-assisted technologies in the writing process: During the preparation of this work, the authors used ChatGPT4.0 and Grammarly to edit the manuscript with grammar checks. After using these tools, the authors reviewed and edited the content as needed and took full responsibility for the content of the publication.

Results

Behavioral data analysis

Twenty-four participants completed the experiment, which lasted about an hour. As described in Materials and Methods, they freely moved and stopped their right fingers following a message that appeared on a computer screen, "Move" and "Stop" (Fig. 1A). In the main session, they learned to reach an on-screen target that appeared in one of four corners with a cursor controlled by right fingers, the movement of which was recorded by a data glove (Fig. 1B). Participants performed the task with or without continuous online visual feedback of the cursor position while a target turned red when reached with a cursor in both feedback conditions. They gradually learned to reach a target across the alternating conditions (Fig. 1C).

The initial and final performances roughly matched those of our previous study with a similar experiment11. As expected, learning performance was much higher when the cursor movement was visible, suggesting that participants heavily relied on continuous visual feedback to perform the complicated motor task. A two-way repeated measures ANOVA found significant main effects of the learning stage (Early, blocks 1–24, i.e., total 24 blocks; Late, blocks 25–48, i.e., total 24 blocks; \(F\left(\mathrm{1,23}\right)=70.98\), \(p<{10}^{-4}\)) and the feedback condition (cursor on vs. off, \(F\left(\mathrm{1,23}\right)=227.1\), \(p<{10}^{-4}\)), as well as a significant interaction (\(F\left(\mathrm{1,23}\right)=101.8\), \(p<{10}^{-4}\)). Although the effect of the learning stage was much smaller in the binary feedback condition, participants improved their performance without online cursor feedback (Early, blocks 2, 4, 6, …, 24, total 12 blocks; Late, blocks 26, 28, 30, …, 48, total 12 blocks; \(T\left(23\right)=4.37\), \(p<0.001\)), substantiating the contribution of proprioceptive feedback to learning as well as the transfer of learning from the "cursor-on" condition. When the set of target locations was suddenly altered to the other one (Fig. 1B) in the test blocks with available online cursor feedback, the performance did not change (blocks 37, 39, 41, …, 47, total six blocks; blocks 49–60, i.e., total 12 blocks; in Fig. 1C, \(T\left(23\right)=0.77\), \(p=0.45\)). This result suggests that the participants learned to map between finger and cursor movements rather than simply memorizing the hand postures corresponding to targets11.

Corticostriatal responses modulated by learning performance

We hypothesized that the red color indicating "success," to which a target changes when virtually reached in both feedback conditions, would intrinsically motivate participants to improve their performance without external monetary reward11. Thus, to identify brain regions related to this goal-directed motor control, we first computed the task performance as the proportion of time the target was red in each 4-s trial. Then, it was convolved with a canonical hemodynamic response function to construct a parametric regressor used in a general linear model (GLM) analysis, which identified regions where activity was modulated by performance (Table 1, Figure S2). Due to the task instruction that participants should stop moving their fingers once a target was reached, the parametric regressor was negatively correlated with the extent of movement. Thus, we found various regions related to motor control, such as sensorimotor cortices, thalamus, parietal regions, and cerebellum. Here, we focused on the regions where activity was positively modulated by performance (Table 1).

We performed this analysis separately for each of the two feedback conditions and a test condition and then compared the results (Fig. 2). For both feedback conditions, we primarily found robust positive modulation in the bilateral striatum and early visual cortex (Fig. 2 and Table 1). The striatum is the only region where we identified clusters larger than 450 voxels at the highly stringent threshold of \(p<5.0\times {10}^{-5}\). This result motivated us to further analyze the GLM's sensitivity in delineating the anatomically defined striatum. For stepwise increasing z-thresholds [z(min) = -8, z(max) = 8, z(step) = 0.5], an ROC analysis found that the voxels in the striatum were highly selective for our GLM method compared to the rest of the brain regions, resulting in AUC (Area Under Curve) scores of 0.95 (cursor-on), 0.92 (cursor-off) (Fig. 2B). The more robust response in the striatum than in the visual cortices further supports the hypothesis that the visual feedback of the performance would convey intrinsic reward to participants. To demonstrate how brain activity relates to performance, we illustrated the correlation between fMRI signals in the striatal region and a parametric regressor that modulates performance (Figure S3).

Figure 2

Brain regions where activity is modulated by performance in three learning conditions, "cursor-on" (blue), "cursor-off" (green), and "cursor-on (test)" conditions. (A) The results of whole-brain voxel-wise GLM analysis with parametric modulation were converted to z-scores, and clusters revealed significant positive modulation (voxel-wise p < 0.005 for a visualization purpose). (B) The receiver operating characteristic (ROC) curves for the three conditions show highly selective activity within an anatomically defined striatum compared to the whole brain. (C) Comparison of activity in the corticostriatal ROIs (vmPFC, caudate nucleus, and putamen) among the three conditions.

While the overall striatal responses were similar for both feedback conditions, there were prominent differences between the conditions. First, activity in the middle temporal region (V5) was negatively modulated by the performance only when online cursor feedback was available. Since the performance is lower with larger cursor movement, the performance regressor was negatively related to activity in the V5 region, which is sensitive to the visual motion of the cursor. Accordingly, this activity was not found for the condition without online cursor feedback (cursor-off condition). In contrast, the activity in the anterior insular cortex, an important region of the salience network25,26, was found only for the "cursor-off" condition. In the "cursor-off" condition, hitting a target was more difficult, and thus, the visual feedback provided as a "red signal" could garner increased attention, potentially engaging the salience network. Likely for the same reason, activity in the insular cortex was negatively modulated during the test condition with reduced saliency, where the online cursor remained consistently accessible without interleaved "cursor-off" blocks. Alternatively, the pronounced insular activity in the "cursor-off" condition could be related to heightened internal attention to proprioception from hand postures while integrating it with external visual feedback27. The absence of insular activity in the "cursor-on" condition is also consistent with previous studies demonstrating that the insular activity was suppressed during processing of higher-order visual stimuli such as visual motion15,28,29,30.

Interestingly, there was a clear difference between the two feedback conditions in the ventromedial prefrontal cortex (vmPFC) (Fig. 2A). An ROI analysis revealed that the activity was significantly higher in the "cursor-on" condition than in the "cursor-off" condition (\(T\left(23\right)=3.04\), \(p<0.01\)), without significant activity for the "cursor-off" condition (\(T\left(23\right)=1.33\), \(p=0.20\)) (Fig. 2A,C ). In contrast, in the "cursor-off" condition, the caudate nucleus activity was more robust than in the "cursor-on" condition (\(T\left(23\right)=3.80\), \(p<0.001\)) (Fig. 2C). We also found a significant interaction of the feedback condition and subregions (caudate and putamen) (\(F\left(\mathrm{1,23}\right)=26.0\), \(p< {10}^{-4}\), \({\eta }_{p}^{2}=0.53\)). The caudate and putamen regions respectively exhibited more pronounced activity in the "cursor-off" (\(T\left(23\right)=2.52\), \(p=0.019\)) and in the "cursor-on" conditions (\(T\left(23\right)=3.71\), \(p=0.0015\)). However, there was no significant difference in the putamen between the feedback conditions (\(T\left(23\right)=0.16\), \(p=0.87\)). These dissociable corticostriatal responses would implicate distinct roles of the vmPFC and the striatum in learning action values from feedback18,19,20,31,32.

The corticostriatal responses for the test condition were similar to those for the "cursor-on" condition, with a highly selective response in the striatum (AUC score of 0.90; Fig. 2B) and negative modulation in the V5 region due to online cursor feedback (Fig. 2A). The result was unsurprising because the online cursor feedback was also available in the test condition. However, an ROI analysis revealed that the overall level of the striatal activity decreased compared to the other two feedback conditions only in the caudate nucleus (vs. cursor-on: \(T\left(23\right)=2.88\), \(p<0.01\); vs. cursor-off: \(T\left(23\right)=5.22\), \(p<{10}^{-4}\)), contributing to the lowered AUC score (Fig. 2B, C). The activity in the vmPFC for the test condition was comparable to the "cursor-on" condition (\(T\left(23\right)=0.017\), \(p>0.5\)) and higher than the "cursor-off" condition (\(T\left(23\right)=2.12\), \(p<0.05\)) (Fig. 2C), although the cluster in the vmPFC did not survive cluster-level correction (Table 1). The decreased activity in the caudate nucleus is potentially due to the learning effect, as shown in our previous study11. Although the target locations were altered in the test condition, participants learned the identical mapping throughout the session. The comparable activity in the vmPFC to the "cursor-on" condition is also consistent with the previous study (Fig. 2C). However, the extent of the vmPFC response is relatively limited for the test condition (Table 1).

Corticostriatal responses modulated by random finger movement

As we noted previously, the performance of our task is negatively related to the amount of finger movement because the goal of the task is to stay on the target for as long as possible. Thus, instead of controlling for motor control components in a GLM analysis, we analyzed the dataset from a separate experiment where participants randomly moved their fingers without any task-related visual feedback. Through this experiment, we aimed to confirm that the response patterns associated with performance (Fig. 2) were distinct from those connected to motor control. For this separate dataset, we employed a standard GLM analysis to contrast the "Move" and "Stop" conditions (see Materials and Methods for more details). This simple contrast roughly aligned with the GLM analysis using a parametric regressor for the amount of finger movement since participants moved their fingers constantly during "Move" and stopped during "Stop". The analysis identified well-known regions related to motor function. These include motor and somatosensory cortices (M1/S1), supplementary motor area (SMA), left thalamus, inferior parietal cortex, insula, left posterior putamen, and cerebellum lobules 6 and 8 (Table 1). We also confirmed the typical laterality of the motor system, that is, contralaterally dominant in M1/S1, thalamus, and putamen and ipsilaterally dominant in the cerebellum.

In contrast to the performance-modulated response during the goal-directed finger movement discussed earlier, the striatal activity during random finger movement was less pronounced, particularly due to the lack of activity in the caudate nucleus (Fig. 3A). Thus, the overall response to random finger movement was also higher in the putamen than in the caudate nucleus (left: \(T\left(23\right)=6.49\), \(p<{10}^{-5}\); right: \(T\left(23\right)=3.07\), \(p<0.01\)). A subsequent ROI analysis for putamen subregions (Fig. 3B) discovered a significant interaction of anteriority and laterality with distinct activity in the left posterior putamen only (anteriority, \(F\left(\mathrm{1,23}\right)=18.70\), \(p<0.001\), \({\eta }_{p}^{2}=0.45\); laterality, \(F\left(\mathrm{1,23}\right)=35.29\), \(p<1\times {10}^{-5}\), \({\eta }_{p}^{2}=0.61\); interaction, \(F\left(\mathrm{1,23}\right)=19.28\), \(p<0.001\), \({\eta }_{p}^{2}=0.46\)). The activity in the posterior putamen is contrasted with the performance-modulated activity, which is more robust in the anterior putamen (\(F\left(\mathrm{1,23}\right)=36.19\), \(p<{10}^{-5}\), \({\eta }_{p}^{2}=0.61\)) (Fig. 3C) regardless of laterality and feedback condition (combined "cursor-on" and "cursor-off" in Fig. 3C and separated in Figure S4), which is also the case for the anterior versus posterior caudate nucleus (\(F\left(\mathrm{1,23}\right)=11.65\), \(p<0.01\), \({\eta }_{p}^{2}=0.34\)). The result is consistent with previous studies reporting the role of contralateral posterior putamen in a simple motor execution33,34,35 and the positive performance-related activity in the anterior striatum during the early stage of learning11.
Similar to the putamen activity, the performance-modulated caudate activity was also more robust in the anterior region than in the posterior region (\(T(23)=3.27\), \(p<0.01\)) regardless of laterality (the full results of the striatal subregion ROI analysis are provided in Figure S4).
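As a consistency check on the effect sizes reported above, partial eta squared can be recovered from an F statistic and its degrees of freedom using the standard identity \(\eta_{p}^{2} = F \cdot df_{\text{effect}} / (F \cdot df_{\text{effect}} + df_{\text{error}})\). The sketch below applies this identity to the reported F values; the function name and dictionary labels are illustrative and not part of the original analysis.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared recovered from an F statistic and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# (F value, reported partial eta squared), all with df = (1, 23) as in the text.
reported = {
    "random movement, putamen: anteriority": (18.70, 0.45),
    "random movement, putamen: laterality": (35.29, 0.61),
    "random movement, putamen: interaction": (19.28, 0.46),
    "performance, putamen: anteriority": (36.19, 0.61),
    "performance, caudate: anteriority": (11.65, 0.34),
}

recomputed = {
    label: round(partial_eta_squared(f_value, 1, 23), 2)
    for label, (f_value, _) in reported.items()
}

for label, (f_value, eta_reported) in reported.items():
    print(f"{label}: F = {f_value:.2f}, recomputed = {recomputed[label]:.2f}, "
          f"reported = {eta_reported:.2f}")
```

For example, \(F(1,23)=18.70\) gives \(18.70/(18.70+23)\approx 0.45\), matching the reported value.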

Figure 3

(A) Comparison of voxel-wise striatal responses related to performance during goal-directed motor control (right) and those related to random motor control (left). (B) During random motor control, the left posterior putamen showed contralateral activation. (C) Significant activations of the bilateral anterior putamen were observed for the performance-modulated response during goal-directed motor control. Error bars indicate SEM. *p < 0.05; **p < 0.01; ***p < 0.001; ****p < 0.0001 (uncorrected p).

Discussion

We designed a visuomotor learning task with two interleaved types of visual feedback, continuous and binary, to understand the role of visual feedback in motor skill learning. Our GLM analysis with a parametric regressor encoding participants' performance, as conveyed by both types of visual feedback, revealed robust activity in the striatum with strikingly high sensitivity and specificity. The entire region of the anatomically defined striatum was highly responsive to visual feedback.

Random finger movement without visual feedback did not elicit a global response in the striatum but rather a local response in the contralateral (left) posterior putamen36,37, which is predominantly interconnected with the primary motor cortex38,39. This result is also consistent with previous studies reporting a gradual shift of striatal activation from the anterior to the posterior region7,11. In other words, the anterior region is initially involved in goal-directed movement with reward feedback, whereas the posterior region is associated with more habitual movement independent of reward feedback in the late stage of learning. Because practice in the scanner was limited to less than an hour, the response to the visual feedback was higher in the anterior striatum. In contrast, the response to random finger movement, which is not goal-directed but habitual in the absence of feedback, was more pronounced in the posterior striatum, specifically in the contralateral putamen. This observation, in conjunction with our present results from different cortical motor areas, supports the putamen's predominant role as a motor hub within the striatum, highlighting its contrasting relationship with the caudate nucleus40,41,42.

We found a double dissociation of corticostriatal activity between the two feedback conditions. In the presence of online cursor feedback, the vmPFC exhibited greater performance-related activity, while the caudate nucleus demonstrated reduced activity compared to the condition lacking online feedback. We speculate that the distinction is closely related to model-based versus model-free reinforcement learning17,43,44. Specifically, a forward model, which is a finger-to-cursor mapping, is learned only in the "cursor-on" condition, while it is retrieved in the "cursor-off" condition. In the reinforcement learning framework, the forward model specifies how an agent's state (i.e., hand posture and cursor position) transitions, that is, a state-transition rule. Our results confirmed that participants did not merely memorize hand postures for targets; instead, they learned the mapping between finger and cursor movements, supporting our hypothesis regarding model-based reinforcement learning.

Thus, the vmPFC activity in the "cursor-on" condition would be related to the state-action value predicted from the forward model in model-based reinforcement learning, as supported by previous studies using decision-making tasks18,19,20,21,31,32. However, when the online cursor feedback is unavailable, participants should rely more on trial and error than on a forward model that is uncertain without online feedback, and thus simply reinforce actions associated with larger rewards in a model-free manner. The activity in the striatum would contribute to model-free learning17,43.

Regarding the caudate nucleus, the lower predicted success in the "cursor-off" condition, and thus the larger reward prediction errors when hitting a target, would be related to its higher activity relative to the "cursor-on" condition. Indeed, previous research has shown that the caudate nucleus has a larger role in signaling reward prediction errors, similar to the ventral striatum. Conversely, the putamen is predominantly engaged in predicting rewards through learning stimulus-action-reward associations in a more certain condition, leading to more habitual behavior45,46. Thus, in the "cursor-off" condition with larger uncertainty, the caudate nucleus would be more sensitive to reward prediction errors. This interpretation aligns with the decreased caudate activity in the test condition, resulting from learning with reduced uncertainty. To test this idea, it is necessary to develop computational models to predict action values and associated reward prediction errors. However, creating such models for the complex motor tasks presented in this study poses a significant challenge16.
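The reasoning above can be made concrete with a deliberately simplified sketch. It is not a model of the continuous motor task used here: it assumes a single scalar expectation of success that is updated by a standard delta rule, with a binary outcome (1 for hitting the target). The initial expectations, the learning rate, and the function name are hypothetical values chosen only for illustration.

```python
def rpe_trace(outcomes, v0, alpha=0.2):
    """Delta-rule update: delta = r - V, then V <- V + alpha * delta.

    outcomes: sequence of rewards (1 = target hit, 0 = miss)
    v0:       initial predicted probability of success (hypothetical)
    alpha:    learning rate (hypothetical)
    Returns the list of reward prediction errors and the final expectation.
    """
    v = v0
    deltas = []
    for r in outcomes:
        delta = r - v          # reward prediction error on this trial
        deltas.append(delta)
        v += alpha * delta     # move the expectation toward the outcome
    return deltas, v


hits = [1, 1, 1, 1, 1]  # five consecutive target hits

# Lower expected success (as assumed for "cursor-off") versus higher
# expected success (as assumed for "cursor-on").
deltas_off, v_off = rpe_trace(hits, v0=0.3)
deltas_on, v_on = rpe_trace(hits, v0=0.7)

print("first-hit RPE, low expectation: ", round(deltas_off[0], 2))
print("first-hit RPE, high expectation:", round(deltas_on[0], 2))
print("RPE over repeated hits (low expectation):",
      [round(d, 3) for d in deltas_off])
```

Under these assumptions, the same successful outcome yields a larger prediction error when the expected success is lower, and the error shrinks over repeated hits as the expectation rises, which parallels the proposed account of reduced caudate activity with reduced uncertainty.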

The highly sensitive striatal response appears to be related to intrinsic reward, motivating people to learn complicated motor skills without a monetary incentive. The intrinsic reward for good performance is sufficient to elicit striatal activity47,48,49, while specific subregions of the striatum are dissociable depending on the nature of the reward, extrinsic versus intrinsic49. The extrinsic and intrinsic rewards have dissociable effects on motor learning. The former influences early rapid improvements in speed and accuracy, whereas the latter influences training-based enhancement50. However, the extrinsic monetary reward could undermine the intrinsic reward processing, lowering motivation or performance51,52. It would be fascinating to determine if the extrinsic reward inhibits or enhances the effect of intrinsic reward on motor learning, including long-term retention of motor memory.

There are several limitations in the current study, primarily due to the experimental design and the motor task. First, our main GLM analysis using a parametric regressor did not completely remove the effects of motor components, such as the kinematics of finger movements, on the corticostriatal activity, since participants were instructed to stop moving their fingers when a target was reached, as indicated by the red signal. Due to this instruction, the extent of finger movement was negatively correlated with performance, which we used as a parametric regressor. Consequently, it was difficult to completely remove the confounding motor control effects in the main analysis, although we used a separate boxcar regressor for each feedback condition in addition to parametric regressors (see Materials and Methods and Figure S1). Moreover, it is also hard to reject the alternative hypothesis that the red signal might have served as an instruction rather than as performance feedback. Thus, future studies should test another condition with online cursor feedback but without the red signal to fully understand the respective roles of the two components of the visual feedback, the cursor and the red signal. Despite these limitations, our main results of corticostriatal response patterns are more likely related to goal-directed learning based on visual feedback because we found distinct corticostriatal activity in a separate simple motor control task. This result is consistent with our previous study using a similar experimental design11.

Lastly, our findings suggest a crucial role of immediate performance feedback in eliciting striatal responses. If this association extends to dopamine release, it could potentially aid in restoring the compromised striatal dopamine system in patients with Parkinson's disease (PD). Furthermore, it would be even more advantageous for rehabilitation to utilize extrinsic reward or augmented feedback to the extent that it does not undermine the effect of performance feedback53. Previous fMRI studies in other cognitive domains have shown that extrinsic51 and intrinsic reward54 strengthen long-term memory via dopamine release. Designing visual feedback directly related to performance would be essential to improve the long-term retention of motor memory, maximizing the treatment effects for PD patients.