Introduction

Early neuroimaging studies on human language production and comprehension focused primarily on the classical language network in the left inferior frontal and superior temporal cortex, such as Wernicke's and Broca's areas1. However, growing evidence from patient and neuroimaging studies has shown that language processing activates a much more complex and widely distributed network2,3,4,5,6. An increasing number of studies have reported a semantic category-specificity effect, such that specific semantic categories such as objects, relations and actions might be represented in particular brain regions3,7,8,9,10,11. In particular, the processing of action verbs has been found to be associated with the frontal and motor cortex12,13,14,15,16,17,18.

Among the various semantic categories, the representation of action verbs in the brain is particularly interesting because it connects language processing to another debated theory in cognition, namely embodied cognition7,8. The embodied view of language processing proposes that the internal representation of the action, which is related to the motor system, plays a key role in language comprehension19. In line with this proposal, a rich body of literature has linked the motor and language systems and shown that motor simulation is an automatic and necessary component of meaning representation3,20. A crucial case is the somatotopic representation of action verbs21,22,23. For example, Hauk and colleagues found that reading action verbs denoting leg, arm and face actions activated the corresponding motor and premotor cortex areas in a somatotopic pattern21. Similar somatotopic patterns were also found when participants listened to action-related sentences24, read idioms with action-related words25, read literal or metaphoric action sentences26 and read idioms27,28,29,30,31. In addition, imagining actions and listening to action sounds demonstrated a body-part specific somatotopic representation32,33.

Moreover, previous EEG and MEG studies also confirmed the involvement of the motor cortex in verb processing30,34,35. For example, the N400 effect for verbs was lateralized to the left motor brain areas in the early stage, and time-frequency analysis revealed desynchronization in the mu- and beta-frequency bands that was localized to motor and premotor areas30. A TMS study also found enhanced M1 activity at the hand region for hand-related action verbs at 500 ms post-stimulus presentation36. Another TMS study revealed a task-dependent (i.e. in the motor imagery task only) facilitation effect on response times for hand action verbs12. This somatotopic pattern could even be induced by viewing the motion or motor imagery of different body parts37,38. Such an overlap of somatotopic patterns in the premotor and motor cortex between action verb processing and relevant actions thus provides strong evidence for the embodied view of language processing.

However, this somatotopic representation of action verbs has recently been challenged. Several studies did not find a somatotopic distribution of action verbs or sentences in the motor cortex39,40,41,42. For instance, Zubicaray et al.43 investigated the association between brain activity for three types of effector-related action words (hand, foot and mouth) in two tasks, reading and action imitation. Three exclusive effector-related ROIs were obtained from the overlapping activity for observation and execution of actions for each effector. The results did not show effector-specific activation in these ROIs but instead revealed more general action word processing within Broca's area. Aziz-Zadeh et al.42 examined the pattern of brain activation during reading of metaphorical sentences containing action words, but failed to reveal any significant motor activation for metaphorical sentences (however, see44). Another study by Postle et al.41 examined the representation of action verbs in cytoarchitectonically defined primary and premotor cortex but did not find a somatotopic pattern for different effectors either. Since the tasks were to execute and observe simple, intransitive movements, Postle et al.41 proposed that the lack of imageability may have resulted in the absence of effector-specific activation. This proposal was supported by a dissociation study, which showed that somatotopic representation can be found only when participants actively imagined performing the actions represented by the verbs, but not when they made lexical decisions about the verbs45. (However, note that contradictory results in the lexical-decision task were also found, e.g.46,47.) Another study further showed that action verbs at a subordinate level (e.g., to wipe) that were rated higher in imageability also elicited greater activation in the motor program areas than both basic level (e.g., to clean) and abstract level (e.g., to judge) verbs48. Therefore, these results may indicate that the somatotopic distribution is specific for motor imagery, not for the semantic processing of verbs.

One of the key points in this debate is whether the somatotopic representation of action verbs in the motor cortex reflects accessing word meaning or deliberating upon the mental imagery of the corresponding action49,50. The relationship between semantic processing of actions and mental imagery is still debated in the embodied language theory. Although some researchers suggested that the neural correlates of action semantics and mental imagery should be identical, or at least overlapping3,51, other researchers proposed that there are dissociable brain regions between them in the motor and premotor cortex45. So far contradictory evidence exists for both views7. For example, TMS studies found a task-dependent facilitation effect for motor imagery when stimulating the motor area12. In addition, lesion studies showed that damage in the precentral/postcentral regions impairs motor imagery selectively52. However, EEG or MEG studies also found that motor activation during processing of action words occurs very early27,28,29,30,31,53. For example, in an EEG study comparing verbs referring to actions performed with different body parts, significant topographical differences in brain activity elicited by verb types were found starting ~250 ms after word onset54. These results thus challenge the motor imagery view since these early EEG and MEG differences could hardly be attributed to motor imagery processing.

Logographic scripts, such as Chinese, provide a unique way of dissociating the roles of semantic and imageability processing in the somatotopic representation of action verbs. More than 80% of Chinese characters are compound characters that are formed by a phonetic radical indicating the pronunciation and a semantic radical indicating the meaning55,56. Studies have found that there are no fundamental differences between lexical processing of whole characters and sublexical processing of phonetic and semantic radicals in reading Chinese57,58. For example, using a semantic priming paradigm, Zhou and Marslen-Wilson found that Chinese readers automatically decomposed the embedded phonetic radicals from the whole complex characters and mapped them onto their own phonological and semantic representations. They thus proposed that the sublexical processing of Chinese phonetic radicals is similar to the lexical processing of whole Chinese characters57.

An interesting phenomenon of Chinese action verbs is that many of them include a semantic radical that indicates the body part performing the action. For example, the verb “hit” (打, da3) contains the radical “扌”, denoting that this action is associated with the effector “hand/arm” (手, shou3). Similarly, “run” (跑, pao3) contains the radical “⻊”, denoting the effector “foot/leg” (足, zu2), and “eat” (吃, chi1) contains the radical “口”, denoting the effector “mouth” (口, kou3) (Fig. 1). Such linguistic effector cues in Chinese action verbs are analogous to adding a prefix of arm-, leg- or mouth- to the corresponding action verbs in English, e.g., arm-grab, leg-run or mouth-eat. However, there are still slight differences between the effector cues of arm, leg and mouth: the arm radical “扌” is not an integral single word in Chinese and thus is unpronounceable, whereas both the leg radical “⻊” and the mouth radical “口” are variations of the nouns “foot/leg” (zu2) and “mouth” (kou3) and thus are pronounceable, with the same pronunciation as the effector nouns. To the best of our knowledge, no other modern languages and scripts provide such explicit linguistic cues to the action effectors in their action verbs, which is even more impressive given that the semantic radical in Chinese, which comes from early Chinese hieroglyphics, has been used for more than one thousand years since the Shang Oracle Bone scripts.

Figure 1

Materials and procedure.

(A) The Chinese effector nouns for arm, leg and mouth are shown in row I. Corresponding Chinese action verbs with and without effector cues (the semantic radicals are marked in gray) are shown in rows II and III, respectively. (B) Each stimulus was presented for 2500 ms in a passive reading task, jittered by an interval from 500 ms to 6500 ms.

One important characteristic of such linguistic effector cues in Chinese verbs is that they significantly increase the imageability of the verbs that possess them59. As a result, Chinese verbs have been found to have higher imageability than English verbs, whereas Chinese and English nouns do not differ in imageability60. Chinese action verbs thus provide a unique opportunity to investigate the roles of semantic and imageability processing in the somatotopic representation of action verbs. In addition, investigating the somatotopic representation of Chinese action verbs also helps answer the question of whether the somatotopic representation of action verbs is universal, that is, whether it exists consistently across different languages and scripts.

How could such semantic effector cues in Chinese verbs influence the somatotopic representation of action verbs? Unfortunately, although a large body of literature has compared language processing in English and Chinese with respect to orthographic processing61,62, semantic processing63,64,65,66,67,68 and phonological processing61,69,70, very few studies have investigated action verb processing in Chinese, and the results are inconsistent. Early neuroimaging studies on Chinese nouns and verbs have shown that Chinese nouns and verbs activate a wide range of overlapping brain areas, including the bilateral inferior frontal, occipital, the left middle and inferior temporal cortex regions17,64,71,72. However, the latest study showed that Chinese verbs elicit class-specific activation in the left lateral temporal and inferior frontal regions73. Other recent studies focusing on the semantic category-specificity effect of Chinese verbs revealed effects similar to those found for English verbs. For example, a study comparing Chinese tool-use action verbs and arm action verbs yielded stronger activation for the arm action verbs mainly in tone processing areas74. In another study using Chinese verbs denoting biological motion (e.g., walk) and mechanical motion (e.g., rotate) as materials, Lin and colleagues found that the posterior superior temporal sulcus showed preferences for biological-motion verbs, but the posterior middle temporal gyrus showed no sensitivity to mechanical motion verbs75. However, none of these studies has looked into the linguistic effector cues in Chinese verbs. Studies comparing Chinese and English nouns have found that semantic categorical cues in Chinese object nouns can facilitate categorization processing for Chinese speakers, as reflected by diminished N300 and N400 ERP components for the typicality effect76. Thus, we might expect a different pattern of somatotopic representation in Chinese effector-cued verbs than in English verbs.

In the current study, we aim to investigate the somatotopic representation of Chinese verbs. Specifically, we focus on the comparison between Chinese action verbs that contain linguistic effector cues and those that do not. We selected effector-cued and uncued verbs related to arm, leg and mouth and used functional magnetic resonance imaging (fMRI) to examine the activation during a passive reading task21. After controlling for other factors such as word frequency and association with the corresponding body parts21, the comparison between verbs with effector cues (and thus high imageability) and verbs without effector cues (and thus low imageability) can help us dissociate the roles of semantic and imageability processing in the somatotopic representation of action verbs. In addition, we could expect that Chinese verbs without effector cues will show a somatotopic representation pattern similar to that of verbs studied in alphabetic scripts, such as English77, which could help us examine the universal existence of somatotopic representation and the embodied theory of language processing.

Results

Behavioral results

We first performed a 2 (Effector cues: Cued vs. Uncued) × 3 (Word categories: Arm words vs. Leg words vs. Mouth words) × 3 (Body parts: Arm vs. Leg vs. Mouth) ANOVA on the association ratings. The results showed only a significant Word categories × Body parts interaction (F4,72 = 242.28, P < 0.001), which indicates that each Word category was associated with its corresponding Body part (Fig. 2A). No Effector cues × Word categories interaction was found, F4,72 = 1.097, P = 0.363. We also performed a 2 (Effector cues: Cued vs. Uncued) × 3 (Word categories: Arm vs. Leg vs. Mouth) ANOVA on the imageability ratings. The results showed only a main effect of Effector cues, such that the average imageability rating scores of verbs with effector cues (M = 5.991, SD = 0.90) were significantly higher than those of verbs without effector cues (M = 5.348, SD = 1.15), F1,18 = 25.727, P < 0.001 (Fig. 2B). These results are consistent with previous findings showing that semantic effector cues significantly increase the imageability of Chinese verbs60.

Figure 2

Behavioral ratings and overall activation for the passive reading and motor localizer tasks. (A) Mean association ratings for the six verb types showed that the arm-, leg- and mouth-related verbs were clearly distinct in meaning, with no significant difference between cued and uncued verbs. (B) Mean imageability ratings for the six verb types showed that cued verbs were rated higher than uncued verbs, P < 0.001. (C) Overall activation for finger, leg and tongue movements in the localizer task (column I) (voxelwise uncorrected P < 0.001, clusterwise corrected P < 0.05), as well as for the cued (column II) and uncued (column III) arm-, leg- and mouth-related verbs in the passive reading task (voxelwise uncorrected P < 0.001, clusterwise corrected P < 0.05). Uncued action verbs demonstrated a somatotopic pattern in the motor and premotor cortex similar to that of the movements, but cued verbs did not. Error bars indicate standard error of the mean. * P < 0.05 ** P < 0.01 *** P < 0.001.

fMRI general activation

The somatotopic patterns elicited by movements in the localizer task and by the six verb types are shown in Fig. 2C. The results revealed overlapping activation for all verbs in the language network, especially in the motor and premotor cortex of the precentral and postcentral gyri. Generally, uncued verbs activated more widely spread brain regions in the frontal, parietal and temporal regions (Fig. 2C). Interestingly, inspection of the activation list (Table 1) showed that uncued leg verbs evoked more activation than cued leg verbs in the bilateral superior frontal and superior parietal gyri, as well as the left inferior parietal, superior temporal, middle temporal and inferior temporal gyri, whereas uncued mouth verbs evoked more activation than cued mouth verbs in the bilateral inferior frontal gyrus and the left superior parietal, inferior parietal and superior temporal gyri. In contrast, the uncued and cued arm verbs elicited similar activity in most regions except the left inferior temporal gyrus and precuneus.

Table 1 Brain regions showing significant activation for the six types of action verbs (One sample t-test, voxelwise uncorrected P < 0.001, clusterwise corrected P < 0.05). MNI coordinates along with t-values are given for the maximally activated voxel in each local cluster

Effector specific ROI analysis

In order to explore the somatotopic pattern for different word categories, a 3 (ROIs: arm- vs. leg- vs. mouth-foci) × 3 (Word categories: arm vs. leg vs. mouth verbs) × 2 (Effector cues: cued vs. uncued) ANOVA was conducted for the left and right ROIs, respectively. The significant or marginally significant three-way interaction in the left (F4, 72 = 2.243, P = 0.073) and right ROIs (F4, 72 = 3.308, P = 0.015) indicated that the somatotopy effect was modulated by both Effector cues and Word categories. Then, four 3 ROIs × 3 Word categories ANOVAs for the uncued and cued verbs in both hemispheres confirmed significant or marginally significant ROIs × Word categories interactions for uncued verbs in the left hemisphere (F4, 72 = 2.927, P = 0.027) and for cued verbs in both hemispheres (left: F4, 72 = 1.992, P = 0.105; right: F4, 72 = 3.796, P = 0.007). The ROIs × Word categories interaction was also marginally significant for uncued verbs in the right hemisphere when arm verbs were excluded in a 3 ROIs × 2 Word categories ANOVA, F2, 36 = 2.538, P = 0.093.
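The F statistics above are reported without effect sizes. As a rough reading aid, partial eta squared can be recovered from an F value and its degrees of freedom. The sketch below assumes that partial eta squared is the effect-size measure of interest (it is not reported in the source), and the function name is illustrative only.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Recover partial eta squared from an F statistic.

    Uses the identity eta_p^2 = (F * df_effect) / (F * df_effect + df_error).
    """
    if df_effect <= 0 or df_error <= 0:
        raise ValueError("degrees of freedom must be positive")
    scaled_f = f_value * df_effect
    return scaled_f / (scaled_f + df_error)


# Three-way interactions reported above: F(4, 72) for left and right ROIs.
left = partial_eta_squared(2.243, 4, 72)
right = partial_eta_squared(3.308, 4, 72)
print(f"left ROIs: {left:.3f}, right ROIs: {right:.3f}")
```

These values are only a derived summary of the reported F tests; they do not replace the P values and say nothing about the direction of the interaction.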

For the uncued verbs, post hoc results indicated that leg verbs evoked stronger activation (% signal change; 0.154) than arm (0.094) and mouth verbs (0.012) in the left leg ROI. The uncued leg verbs also evoked stronger activation (0.117) than mouth verbs (0.036) in the right leg ROI. Similarly, mouth verbs showed stronger activation (0.376) than arm (−0.084) and leg verbs (0.167) in the left mouth ROI. The uncued mouth verbs also evoked stronger activation (0.332) than leg verbs (0.153) in the right mouth ROI. No similar effect on % signal change was observed for uncued arm verbs in the bilateral arm ROIs (left: arm = 0.094; leg = 0.154; mouth = 0.012; right: arm = 0.519; leg = 0.601; mouth = 0.561) (Fig. 3).

Figure 3

Mean parameter estimates in the bilateral motor ROIs identified in the motor localizer task for cued and uncued arm, leg and mouth words.

Somatotopic patterns were found only for the uncued leg and mouth verbs in the bilateral leg and mouth regions (marked in gray), such that uncued leg/mouth verbs elicited the highest activation in the corresponding motor regions. In contrast, inverse patterns were found for the cued leg and mouth verbs, such that cued leg/mouth verbs showed the lowest activation in the corresponding motor regions. These effects were not found for arm verbs in the arm region. † P < 0.1 * P < 0.05 ** P < 0.01 *** P < 0.001.

The somatotopic representations observed for uncued leg and mouth verbs were not found for the cued ones. On the contrary, the post hoc results indicated that cued leg verbs actually evoked a lower % signal change in the bilateral leg ROIs (left: −0.197; right: −0.282) than arm (left: 0.168; right: 0.112) and mouth verbs (left: 0.026; right: 0.027). Similarly, cued mouth verbs showed a lower % signal change in the bilateral mouth ROIs (left: −0.073; right: −0.256) than arm (left: 0.028; right: 0.046) and leg verbs (left: 0.043; right: 0.062). Again, such an effect was not observed in the arm ROIs (left: arm = 0.321, leg = 0.376, mouth = 0.356; right: arm = 0.484, leg = 0.501, mouth = 0.342) (Fig. 3).

This inverse somatotopic pattern between cued and uncued verbs was confirmed by significant or marginally significant 2 Effector cues × 3 Word categories interactions in the left leg (F2, 36 = 2.918, P = 0.067), left mouth (F2, 36 = 2.793, P = 0.075) and right leg (F2, 36 = 3.621, P = 0.037) ROIs. Post hoc analysis showed that uncued leg verbs elicited marginally stronger activity than cued leg verbs in the bilateral leg ROIs (left: P = 0.070; right: P = 0.055). Similarly, uncued mouth verbs showed significantly stronger activity than cued mouth verbs in the bilateral mouth ROIs (left: P = 0.021; right: P = 0.027). Such effects were not found for arm verbs in the arm ROIs (Ps > 0.26).

Overall, the effector specific ROI analysis showed the somatotopic pattern for uncued leg and mouth verbs, as demonstrated by the strongest activity in leg or mouth ROIs. In contrast, a reverse somatotopic pattern was found for cued leg and mouth verbs as reflected by the lowest activity in leg or mouth ROIs. Such effects were not found for arm verbs in arm ROIs.
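The ordering described above can be checked directly against the reported means. The following is a minimal sketch that uses only the left-hemisphere % signal change values quoted in the text; the labels "somatotopic", "inverse" and "none" and the function name are illustrative, and the check is purely descriptive (it compares means and performs no statistical test).

```python
# Mean % signal change in the left leg and mouth ROIs, copied from the text.
# Keys are (cue condition, ROI); inner keys are the verb categories.
REPORTED = {
    ("uncued", "leg"): {"arm": 0.094, "leg": 0.154, "mouth": 0.012},
    ("uncued", "mouth"): {"arm": -0.084, "leg": 0.167, "mouth": 0.376},
    ("cued", "leg"): {"arm": 0.168, "leg": -0.197, "mouth": 0.026},
    ("cued", "mouth"): {"arm": 0.028, "leg": 0.043, "mouth": -0.073},
}


def classify(roi, means):
    """Label the pattern by where the effector-congruent verb ranks in its ROI."""
    congruent = means[roi]
    others = [value for verb, value in means.items() if verb != roi]
    if congruent > max(others):
        return "somatotopic"  # congruent verb has the highest mean
    if congruent < min(others):
        return "inverse"      # congruent verb has the lowest mean
    return "none"


patterns = {key: classify(key[1], means) for key, means in REPORTED.items()}
for (cue, roi), label in patterns.items():
    print(f"{cue:6s} verbs, left {roi} ROI: {label}")
```

Ranking the means in this way reproduces the qualitative description only; whether each difference is reliable depends on the post hoc tests reported above.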

Strip ROI analysis

To further examine the semantic somatotopy in different verb types, we carried out another strip ROI analysis along the left motor and premotor strip (see data analysis). The 3-way ANOVA of 6 (Dorsality: 6 ROIs) × 2 (Effector cues: cued vs. uncued) × 3 (Word categories: arm vs. leg vs. mouth verbs) revealed significant three-way interactions in both the motor (F4,69, 180 = 3.451, P < 0.05) and premotor strip (F5, 83 = 4.554, P = 0.001), which were confirmed by significant or marginally significant interactions in two 6 Dorsality × 3 Word categories ANOVAs for cued verbs (motor strip: F3, 61 = 2.503, P = 0.061; premotor strip: F5, 93 = 4.192, P = 0.002). Although similar interactions were not significant for the uncued verbs (Ps > 0.13), this may again be due to the absence of a somatotopic effect in the arm verbs. In fact, two 6 Dorsality × 2 Word categories (leg vs. mouth) ANOVAs for uncued verbs still revealed a marginally significant Dorsality × Word categories interaction for the motor strip, F2, 44 = 2.352, P = 0.097. To further confirm the semantic somatotopy of cued and uncued verbs along the two strips, we divided the 6 ROIs into three groups: dorsal regions (ROI1-ROI2), middle regions (ROI3-ROI4) and lateral regions (ROI5-ROI6)25,78.
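The grouping of the six strip ROIs into dorsal, middle and lateral regions can be sketched as follows. This is an illustration under stated assumptions: the text says only that the ROIs were divided into three groups, so averaging the two ROIs within each group is an assumption, and the input values are hypothetical rather than data from the study.

```python
from statistics import mean

# ROI1-ROI2 = dorsal, ROI3-ROI4 = middle, ROI5-ROI6 = lateral (zero-based indices).
REGIONS = {"dorsal": (0, 1), "middle": (2, 3), "lateral": (4, 5)}


def group_strip(roi_values):
    """Collapse six strip ROI values (ordered ROI1..ROI6) into three regions.

    Assumption: each region is summarized by the mean of its two ROIs.
    """
    if len(roi_values) != 6:
        raise ValueError("expected exactly six ROI values, ordered ROI1..ROI6")
    return {
        name: mean(roi_values[i] for i in indices)
        for name, indices in REGIONS.items()
    }


# Hypothetical % signal change values for one verb type along one strip.
example = group_strip([0.8, 0.9, 1.2, 1.4, 0.7, 0.5])
print(example)
```

Collapsing to three regions reduces the Dorsality factor from six levels to three, which is the form used in the ANOVAs that follow.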

Firstly, the 2 (Effector cues: Cued vs. Uncued) × 3 (Dorsality: dorsal, middle and lateral) × 3 (Word categories: arm, leg and mouth) ANOVA yielded significant three-way interactions in both strips, Ps < 0.011. Then, four 3 (Dorsality: dorsal, middle and lateral) × 3 (Word categories: arm, leg and mouth) ANOVAs for cued and uncued verbs in the motor and premotor strips revealed four significant main effects of Dorsality, Ps < 0.001, such that the middle region elicited significantly higher activation than the dorsal and lateral regions, Ps < 0.01. The 3 Dorsality × 3 Word categories interactions were also significant or marginally significant for cued verbs (motor strip: F3, 46 = 2.771, P = 0.060; premotor strip: F4, 72 = 5.294, P = 0.001) but not for uncued verbs, Ps > 0.18. However, when we excluded arm verbs, the 3 Dorsality × 2 Word categories interactions were also marginally significant for the uncued verbs (motor strip: F2, 36 = 2.678, P = 0.082; premotor strip: F2, 36 = 2.294, P = 0.107). Post hoc analyses of these two sets of ANOVAs revealed a somatotopic pattern for the uncued verbs (in the 3 × 2 ANOVA) and an inverse somatotopic pattern for the cued verbs (in the 3 × 3 ANOVA) in both the motor and premotor strips. For the uncued verbs, leg verbs evoked higher activation (% signal change) in the dorsal region (motor strip: 0.870; premotor strip: 1.047) than mouth verbs (motor strip: 0.710; premotor strip: 0.945), whereas mouth verbs evoked higher activity in the lateral region (motor strip: 0.947; premotor strip: 0.833) than leg verbs (motor strip: 0.762; premotor strip: 0.464). For the cued verbs, leg verbs evoked lower activation (% signal change; 0.541) than arm (0.603) and mouth verbs (0.596) in the dorsal region, whereas mouth verbs evoked lower activity in the lateral region (motor strip: 0.599; premotor strip: 0.464) than leg (motor strip: 0.958; premotor strip: 1.096) and arm verbs (motor strip: 0.770; premotor strip: 0.725). These results further confirmed the somatotopic and inverse somatotopic patterns for uncued and cued verbs that we found in the effector ROI analyses.

Four 2 Effector cues × 2 Word categories ANOVAs for the two strips in the dorsal and lateral regions found significant interactions in the lateral region (motor strip: F1,18 = 8.523, P = 0.009; premotor strip: F1,18 = 11.885, P = 0.003), such that uncued mouth verbs showed stronger activity than cued mouth verbs (motor strip: P < 0.01; premotor strip: P < 0.05) (Fig. 4).

Figure 4

The positions (column I) and mean parameter estimates of the 12 a priori selected ROIs along the left motor (column II) and premotor (column III) strip for the six verb types.

Uncued verbs showed a somatotopic representation along both strips (marked in gray), such that uncued leg verbs showed the highest activation in the dorsal regions and uncued mouth verbs showed the highest activation in the ventral regions. In contrast, an inverse pattern was found for cued verbs, such that cued mouth verbs showed the lowest activation in the ventral regions. In addition, ROIs in the middle region showed stronger overall activation for all six verb types than those in the dorsal and ventral regions. Error bars indicate standard error of the mean. * P < 0.05 ** P < 0.01 *** P < 0.001.

The strip ROI analysis confirmed that the semantic somatotopy was present only for the uncued leg and mouth verbs, as reflected by the stronger activation for uncued leg verbs in the dorsal region and for uncued mouth verbs in the lateral region. In contrast, an inverse somatotopic pattern was present for cued leg and mouth verbs, such that cued leg and mouth verbs evoked reduced activity in the corresponding dorsal and lateral regions. In addition, the main effect of Dorsality indicated that all verbs elicited the strongest activation in the middle region of the motor and premotor strip.

Uncued vs. cued verbs contrasts

The Uncued vs. Cued contrast results (Fig. 5A) showed that uncued leg and mouth verbs evoked stronger activation in the corresponding dorsal and ventral motor regions than cued leg and mouth verbs. No significant activation was found in the same contrast for arm verbs. These results again indicated that uncued leg and mouth verbs generated stronger activation in the corresponding motor areas than cued verbs.

Figure 5

Activation and Mean parameter estimates for the direct contrast between uncued and cued verbs for arm, leg and mouth verbs.

(A) Brain regions that showed significant activation for the uncued vs. cued contrast in the motor and premotor regions (voxelwise uncorrected P < 0.001, clusterwise corrected P < 0.05 after Small Volume Correction) for leg and mouth verbs. Arm verbs showed no activation in this contrast. Leg verbs showed significant activation in the dorsal leg region, whereas mouth verbs showed significant activation in the ventral mouth region. (B) Mean parameter estimates for the bilateral peak voxels of the leg and mouth verbs showed somatotopic patterns only for uncued verbs (marked in gray). The cued leg and mouth verbs again showed inverse somatotopic patterns. * P < 0.05 ** P < 0.01 *** P < 0.001.

The ROI analyses of peak voxels in each cluster (see data analysis) were conducted with two 2 (ROIs: leg- vs. mouth-foci) × 3 (Word categories: arm vs. leg vs. mouth verbs) × 2 (Effector cues: cued vs. uncued) ANOVAs for the left and right ROIs. The results revealed significant or marginally significant 3-way interactions (left: F2, 36 = 2.892, P = 0.068; right: F2, 36 = 6.284, P = 0.005). We then performed four 2 ROIs × 3 Word categories ANOVAs for the uncued and cued verbs in the bilateral ROIs. The results revealed only one significant interaction, for uncued verbs in the right ROIs, F1, 26 = 6.673, P = 0.009, which again might be due to the influence of arm verbs. We then excluded arm verbs and conducted four 2 ROIs × 2 Word categories ANOVAs with cued and uncued verbs in the bilateral ROIs. Significant or marginally significant interactions were found for both uncued verbs (left: F1, 18 = 4.347, P = 0.052; right: F1, 18 = 7.698, P = 0.013) and cued verbs (right: F1, 18 = 4.054, P = 0.059). For uncued verbs, post hoc analysis revealed that leg verbs evoked stronger activation (% signal change) in the bilateral leg ROIs (left: 0.314; right: 0.060) than mouth verbs (left: 0.107; right: −0.208). Similarly, mouth verbs showed higher activation in the bilateral mouth ROIs (left: 0.379; right: 0.715) than leg verbs (left: 0.220; right: 0.404). The opposite pattern was again found for cued verbs, such that cued leg verbs evoked lower activation in the bilateral leg ROIs (left: 0.031; right: −0.308) than cued mouth verbs (left: 0.141; right: −0.301), and cued mouth verbs showed lower activation in the bilateral mouth ROIs (left: −0.116; right: 0.316) than cued leg verbs (left: −0.050; right: 0.515).

To directly compare cued and uncued verbs in specific ROIs, a 2 Effector cues × 3 Word categories ANOVA was conducted in each of the 4 ROIs. We found significant interactions in the bilateral leg ROIs (left: F2, 36 = 5.364, P = 0.009; right: F2, 36 = 3.442, P = 0.012) and in the right mouth ROI (F2, 36 = 4.327, P = 0.021). Post hoc analysis indicated that uncued leg verbs evoked stronger activity than cued leg verbs (left: P = 0.004; right: P = 0.003) and uncued mouth verbs evoked stronger activity than cued mouth verbs (left: P < 0.001; right: P < 0.001).

Consistent with the aforementioned effector specific ROI analysis, the Uncued vs. Cued ROI result showed both the somatotopic and inverse somatotopic patterns for uncued and cued verbs in corresponding motor regions with the exception of arm verbs (Fig. 5).

Discussion

The main aim of the present study was to examine whether Chinese action verbs with or without linguistic cues to the effectors (arm, leg and mouth) would elicit different activation in the corresponding motor and premotor cortex. The results showed that Chinese action verbs without effector cues elicited a somatotopic representation in the motor and premotor cortex similar to that reported for alphabetic scripts. However, such a somatotopic representation was not found for Chinese action verbs with effector cues, which actually elicited reduced activation in the corresponding motor and premotor areas, despite the fact that effector-cued verbs were rated higher in imageability than uncued verbs. These results are consistent across different analyses: ROI analyses with motor regions identified by the movement localizer (Fig. 3), predetermined strip ROI analyses (Fig. 4) and the direct comparison between uncued and cued verbs (Fig. 5). Our results support the universality of the somatotopic representation of action verbs in the motor system and provide direct evidence that purely linguistic properties can influence semantic processing in category-specific semantic circuits.

Consistent with previous imaging studies, all six verb types (2 Effector cues by 3 Word categories) activated not only the general language processing network in the left inferior frontal gyrus and middle frontal gyrus69,79,80, but also the superior frontal gyrus and the precentral gyrus (motor cortex) and postcentral gyrus (premotor cortex), even in a passive reading task (Fig. 2, Table 1)21,25. These results indicate that the processing of Chinese action verbs was also associated with the sensory-motor system, as suggested by the embodied account of action verb processing found in alphabetic scripts.

However, when we examined the somatotopic representation in the motor and premotor regions, Chinese verbs with effector cues showed remarkably different patterns from verbs without such cues. The uncued verbs elicited a somatotopic pattern in the motor and premotor cortex, especially in the left hemisphere, such that leg verbs elicited the largest activation in the dorsal leg region of the motor cortex, whereas mouth verbs elicited the largest activation in the ventral mouth region, as can be seen in Fig. 2C. In contrast, this somatotopic representation was reversed for the effector-cued verbs. As can be seen in Fig. 2C, cued leg verbs activated the ventral mouth region more, whereas cued mouth verbs activated the dorsal leg region more, although cued arm verbs activated the same region as uncued arm verbs. These results were confirmed by the ROI analysis of leg and mouth motor regions identified by the localizer task (Fig. 3), the predetermined strip analysis of the motor and premotor cortex (Fig. 4) and the direct comparison between cued and uncued verbs (Fig. 5): uncued verbs showed a somatotopic pattern in the motor and premotor cortex by eliciting the highest activation in the corresponding motor regions, whereas cued verbs showed an inverse somatotopic pattern by eliciting the lowest activation in the corresponding motor regions. Cued and uncued arm verbs, however, showed no difference in the arm region.

These effects are even more striking given that cued action verbs were actually rated higher in imageability than uncued verbs (Fig. 1). In previous studies of action verb representation, imageability has been a consistent predictor of the strength of activation in the motor and premotor cortex81,82,83. To the best of our knowledge, this is the first study showing that action verbs with higher imageability yield less activation in the motor and premotor cortex than those with lower imageability. Previous findings supporting the mental imagery interpretation have shown that verbs depicting a specific motor program (to wipe), which therefore have higher imageability, elicited more activation in the motor regions than verbs depicting a general motor program (to clear)48. In addition, somatotopic representation was found only when participants actively imagined performing the actions represented by the verbs, but not when they made lexical decisions about the verbs45. However, our results indicate that access to motor mental imagery is not necessarily associated with the somatotopic representation of action verbs in the motor cortex. Rather, it is the linguistic and semantic properties of the action verbs that are represented in the motor and premotor cortex.

These results are thus consistent with the view that the human motor and premotor cortex is a cortical basis of language comprehension3,7,77. In particular, they show for the first time that the purely linguistic properties of action verbs can influence their representation in the corresponding motor cortex. TMS studies have shown that stimulating the motor regions of different effectors can facilitate the processing of the corresponding action verbs. For example, participants' reaction times to leg-related verbs (e.g., kick) in a lexical decision task were significantly reduced when the leg area was stimulated with TMS84. Our results show the reverse direction of this association: providing linguistic effector information in the action verbs can facilitate the processing of action verbs in the corresponding motor regions, as reflected by the reduced activation in these regions. Together, these results provide further evidence for bidirectional cross-talk between motor activity and action verb processing46,47.

How could the linguistic effector cues in Chinese action verbs reduce the activation associated with semantic processing in the corresponding motor regions? Previous studies on Chinese object nouns with linguistic category cues that carry category membership information have shown that such cues can provide a “short cut” to categorization processing85. In a category verification task with typical and atypical items of a category, Chinese participants did not show the ERP typicality effect on the N300 and N400 components, because both typical and atypical items contained the same linguistic category cue in the Chinese object nouns. Such an effect was not found in English participants, even with English items whose object nouns also contain category cues (e.g., goldfish and catfish). In addition, Chinese object nouns with pronounceable morphological category cues were more effective than those with unpronounceable orthographic category cues in inducing this effect85.

These findings on Chinese object nouns thus match the current findings on Chinese action verbs very well: semantic radicals in Chinese characters can, to a certain extent, diminish the brain activation required for semantic processing. According to the category-specific semantic circuits hypothesis77,86, the meaning of spoken words is represented in two circuits: a general circuit in the perisylvian language cortex, especially the inferior frontal and superior temporal areas, and a semantic action-perception circuit in the corresponding sensory and motor cortex. Specifically, fine-grained action verb categories are represented by topographically specific semantic circuits in the motor system. The coupling of these two circuits provides learned, arbitrary links between the form of words and their meanings20. We propose that the semantic radicals in Chinese nouns and verbs, whether they cue category or effector, could serve as a semantic “short-cut” that to some extent facilitates access to the semantic action-perception circuits. Although such a short-cut might not allow one to bypass the action-perception circuits entirely, it could still speed the retrieval of word meaning. Such a semantic “short-cut” is a unique characteristic of logographic scripts such as Chinese, which has used linguistic cues to prime word meaning for thousands of years across a large number of semantic categories. Previous evidence has shown that the semantic radical facilitates identification of the target word57,58. In addition, such a “short-cut” is more efficient for semantic radicals that are pronounceable than for those that are not, because studies have shown that the sublexical processing of phonetic radicals that can be pronounced alone as independent characters activates both the semantic and the phonological information corresponding to the radicals87,88.

This requirement of phonological information in cueing word meaning for Chinese characters could explain why effector-cued arm verbs with unpronounceable arm radicals did not elicit the same pattern of reduced activation in the corresponding motor and premotor cortex as pronounceable effector-cued leg and mouth verbs. It also helps explain why uncued mouth and leg verbs elicited particularly strong and widespread activation in the motor and premotor cortex (Fig. 2), particularly in the arm regions, where uncued mouth and leg verbs elicited stronger activation than uncued arm verbs and thus interfered with the somatotopic pattern there (Fig. 3, Fig. 4). The presence and prevalence of pronounceable effector cues in leg and mouth verbs make uncued leg and mouth verbs relatively harder for Chinese participants to process than cued ones. Comprehending uncued leg and mouth verbs thus requires extra orthographic, phonological and semantic processing, especially in the middle frontal gyrus, a region that has been found to be particularly important for Chinese character processing, both in the conversion of graphic form (orthography) to syllable and in other operations concerning orthography-to-semantic mapping69,89,90,91,92. In contrast, such extra processing load is not required for uncued arm verbs, since the radicals of effector-cued arm verbs are unpronounceable and these verbs are therefore as hard to process as uncued arm verbs. As can be seen in Fig. 3, effector-cued and uncued arm verbs elicited similar activation in the arm ROIs, whereas uncued leg and mouth verbs elicited stronger activation than uncued arm verbs in the arm ROI, thus interfering with the somatotopic pattern of uncued arm verbs there.

In previous verb studies with alphabetic scripts, the ventral regions of the frontal cortex (inferior frontal gyrus or mouth region) usually showed the strongest overall activation for all word types21. In contrast, as can be seen in Fig. 3 and Fig. 4, it is the middle regions of the frontal cortex (middle frontal gyrus or motor region of the arm) that showed the strongest overall activation for all verb types in Chinese. Such overwhelming activation from Chinese character processing in the middle frontal regions could be another reason why we failed to identify the somatotopic representation of arm verbs in the arm region.

Another possible explanation is that the pronounceable effector cues inhibit the semantic representation of the corresponding verbs in the motor or premotor cortex. The pronounceable effector cue (e.g., mouth) may first activate the entire spectrum of movement patterns for that body part (e.g., in the mouth region), and the second morpheme of the verb (e.g., eat) may then select one motor schema (e.g., eat) from the preactivated spectrum. Considerable inhibition would then be needed to turn off the preactivated motor schemas, which would explain the local reduction of body-part-specific motor activation found for the pronounceable effector-cued verbs. However, because behavioral data (e.g., response times) are lacking, this interpretation cannot be confirmed from the present study alone. Additional studies are therefore needed before an unambiguous interpretation can be made.

Taken together, our results strongly support the view that the semantic processing of action verbs is grounded in the motor cortex, and that this grounding is universal across alphabetic and logographic scripts. Our results also provide evidence that linguistic properties can in turn influence semantic processing in the motor system. Providing the linguistic associations of “leg-jump” and “mouth-eat” in language and script, and strengthening them during years of learning and reading, could actually reduce the semantic processing required for these leg- and mouth-related verbs in the leg and mouth motor regions. These findings could shed new light on the development of potential therapies for semantic dementia or aphasia, and could deepen our understanding of how language, and experience more generally, shape the brain.

Methods

Participants

Twenty-one healthy, right-handed, native Chinese-speaking college students participated in the study and were paid for their participation. Two participants were removed from the analysis of the passive reading task because of poor performance or fatigue. We also conducted a motor localizer scan, and four participants were removed from its analysis because of severe head movement (>4 mm).

The fMRI data of nineteen participants entered the group analysis. The mean age of the participants was 22.32 years (SD = 1.73), with a range from 19 to 25 years. They had normal or corrected-to-normal vision and were screened to confirm that they had no history of neurological or psychiatric disorder. The recruitment of participants was approved by the Institutional Review Board of Beijing Normal University. Written informed consent was obtained from all participants.

Stimuli

A total of 120 Chinese single-character verbs were used as experimental stimuli in a 2 (Effector cues: Cued vs. Uncued) × 3 (Word categories: Arm vs. Leg vs. Mouth) design with six categories, each with 20 items (Fig. 1) (see the Supplementary information for a full list). To select suitable materials, a pilot rating study with 24 subjects was performed to assess the semantic association between the words and the effectors21. Subjects were asked to rate the words according to their action and visual associations and to make explicit whether the words referred to and reminded them of leg, arm and face movements that they could perform themselves. Item scores ranged from 1 (not relevant at all) to 7 (very relevant). The ratings showed the expected effector-specific association (Fig. 2A). The word frequency data were obtained from the Chinese National Corpus (www.cncorpus.org). A univariate ANOVA of 2 (Effector cues: Cued vs. Uncued) × 3 (Word categories: Arm vs. Leg vs. Mouth) on the word frequency data (per million) showed neither a significant main effect nor an interaction (Ps > 0.61) (Cued Arm: M = 31.81, SD = 40.55; Cued Leg: M = 14.36, SD = 22.81; Cued Mouth: M = 29.31, SD = 36.96; Uncued Arm: M = 28.28, SD = 32.08; Uncued Leg: M = 23.43, SD = 22.80; Uncued Mouth: M = 26.58, SD = 30.23). All the verb types were also matched for the number of strokes (Cued Arm: M = 8.9, SD = 3.04; Cued Leg: M = 14.55, SD = 2.66; Cued Mouth: M = 8.6, SD = 2.54; Uncued Arm: M = 9.75, SD = 2.93; Uncued Leg: M = 9.75, SD = 2.24; Uncued Mouth: M = 8.8, SD = 3.00). Twenty nouns with arbitrary semantic meaning were used as filler words to avoid focusing the participants' minds on action-related aspects of the stimuli21. Each word was presented three times in random order in the experiment, and the same item was never presented twice consecutively.
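The presentation constraint described above (each word shown three times, never twice in a row) can be illustrated with a minimal sketch. It is not the authors' randomization procedure: the function name `build_sequence`, the placeholder item labels and the rejection-sampling approach are all assumptions for illustration, and the sketch ignores the division into runs and the checkerboard baseline trials.

```python
import random

def build_sequence(items, repeats=3, seed=0, max_tries=1000):
    """Return a shuffled list with each item `repeats` times and no immediate repeats.

    Hypothetical helper: reshuffles the full pool until no two neighbouring
    trials show the same item (simple rejection sampling).
    """
    rng = random.Random(seed)
    pool = [item for item in items for _ in range(repeats)]
    for _ in range(max_tries):
        rng.shuffle(pool)
        # Accept the shuffle only if every neighbouring pair differs.
        if all(a != b for a, b in zip(pool, pool[1:])):
            return list(pool)
    raise RuntimeError("no valid sequence found; increase max_tries")

# Placeholder labels standing in for the 120 verbs and 20 filler nouns.
words = [f"verb_{i}" for i in range(120)] + [f"noun_{i}" for i in range(20)]
sequence = build_sequence(words)
```

With 140 words and three repetitions this yields 420 word trials, which matches the 360 verb trials plus 60 filler trials reported in the procedure.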

Experimental procedure

The main experiment contained 4 runs with a total of 480 trials (360 action verb trials, 60 filler noun trials and 60 baseline checkerboard trials). In each trial, a word or checkerboard that subtended approximately 3.6° of visual angle was presented for 2500 ms. The inter-trial interval was jittered at 500, 2000, 3500, 5000 and 6500 ms with differing probabilities (50%, 25%, 12%, 7% and 6%, respectively). The subjects were instructed to pay attention to the word and perform a passive reading task, or simply to attend to the checkerboard when it was presented21. A pseudorandomized stimulus sequence was alternated between subjects, and a short rest was taken after each run.
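As a rough illustration of the jitter scheme, the sketch below draws inter-trial intervals from the five durations with the stated probabilities and computes their expected value. The names `ITI_MS`, `ITI_WEIGHTS`, `sample_itis` and `expected_iti_ms` are hypothetical, and independent sampling per trial is an assumption; the original study may have used a fixed, pre-generated schedule.

```python
import random

ITI_MS = [500, 2000, 3500, 5000, 6500]   # inter-trial intervals from the text
ITI_WEIGHTS = [50, 25, 12, 7, 6]         # percentages from the text

def sample_itis(n_trials, seed=0):
    """Draw one ITI per trial, independently, with the stated probabilities."""
    rng = random.Random(seed)
    return rng.choices(ITI_MS, weights=ITI_WEIGHTS, k=n_trials)

def expected_iti_ms():
    """Probability-weighted mean ITI in milliseconds."""
    return sum(v * w for v, w in zip(ITI_MS, ITI_WEIGHTS)) / sum(ITI_WEIGHTS)

# Trial counts implied by the design: 120 verbs x 3, 20 fillers x 3, 60 baseline.
n_trials = 120 * 3 + 20 * 3 + 60
itis = sample_itis(n_trials)
```

Under these assumptions the weights sum to 100 and the mean interval is 1910 ms, so an average trial (2500 ms stimulus plus interval) lasts about 4.4 s.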

Finally, to identify the somatotopic motor regions in each volunteer individually, participants were asked to perform a motor localizer task, during which they had to move their left or right leg, their left or right index finger, or their tongue, as cued by body-part words (see ref. 21). After the fMRI scan, participants were asked to rate the imageability of all the action verbs used in the main experiment on a 7-point scale92.

Image acquisition

Subjects were scanned in a 3 T Siemens MR system using a head coil. The echo-planar imaging (EPI) parameters of the fMRI sequence were as follows: TR = 1500 ms; TE = 28 ms; acquisition matrix = 64 × 64; flip angle = 75°; in-plane resolution = 3.1 × 3.1 mm²; field of view = 200 × 200 mm. The functional images consisted of 28 slices covering the whole brain (slice thickness = 3 mm).

Data analysis

Data were preprocessed and analyzed using SPM5 (Wellcome Department of Imaging Neuroscience, Institute of Neurology, London; http://www.fil.ion.ucl.ac.uk/spm), implemented in Matlab (Mathworks, USA). MNI coordinates93 were converted into Talairach coordinates94 according to the criteria specified by http://www.mrc-cbu.cam.ac.uk/Imaging/Common/mni space.shtml. Talairach coordinates were mapped to brain regions using the Talairach Daemon database95. The first two scans were discarded from the analysis to eliminate nonequilibrium effects of magnetization. Preprocessing involved realignment through rigid-body registration to correct for head motion (head motion never exceeded 3 mm or 3°), slice-timing correction to the onset of the first slice, normalization to Montreal Neurological Institute space, interpolation of voxel sizes to 2 × 2 × 2 mm, smoothing (8-mm full-width at half-maximum kernel) and filtering (high-pass filter set at 128 s, low-pass filter achieved by convolution with the hemodynamic response function).
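The MNI-to-Talairach conversion can be sketched as follows. This assumes that the cited criteria correspond to the widely used piecewise-linear approximation by Matthew Brett (often distributed as mni2tal), which applies one linear transform above the AC plane and another below it; the text does not state the coefficients, so those in the code are an assumption, and the function name `mni2tal` is used for illustration only.

```python
def mni2tal(x, y, z):
    """Approximate Talairach coordinates (mm) from MNI coordinates (mm).

    Assumption: Brett's piecewise-linear transform, with different
    coefficients for points at or above (z >= 0) and below (z < 0)
    the AC plane.
    """
    x_t = 0.9900 * x
    if z >= 0:
        y_t = 0.9688 * y + 0.0460 * z
        z_t = -0.0485 * y + 0.9189 * z
    else:
        y_t = 0.9688 * y + 0.0420 * z
        z_t = -0.0485 * y + 0.8390 * z
    return (x_t, y_t, z_t)

# Example: convert one of the reported MNI peaks (left leg ROI).
leg_peak_tal = mni2tal(-25, -25, 53)
```

The resulting coordinates would then be looked up in an atlas such as the Talairach Daemon to obtain anatomical labels.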

A group-level random-effects analysis using one-sample t-tests was first conducted for the six word categories against the baseline (voxelwise uncorrected p < 0.001, clusterwise corrected p < 0.05) (Fig. 2C, Table 1). To examine the somatotopic representation of effector-cued and uncued verbs in the motor and premotor cortex, we conducted three different region-of-interest (ROI) analyses. All ROIs were spheres of radius 10 mm, constructed using MarsBar for SPM96. Percent signal change values were computed with MarsBar and are reported in brackets in the Results section. First, we carried out an effector-specific ROI analysis21 with six symmetrical ROIs identified in the random-effects analysis of the motor localizer task (voxelwise uncorrected p < 0.001, clusterwise corrected p < 0.05) (Fig. 2C) for the three body parts of arm, leg and mouth. These ROIs were selected from the peak voxels in the activation clusters of right hand, right leg and tongue movement, in the postcentral gyrus (left: −56 −19 48, t(14) = 5.47), the paracentral gyrus (left: −6 −38 62, t(14) = 8.75) and the postcentral gyrus (left: 66 −19 24, t(14) = 5.89), respectively (Fig. 3). The average parameter estimates over the voxels in each ROI were then calculated for each individual subject and entered into ANOVAs. ANOVAs with more than 1 degree of freedom in the numerator were adjusted for violations of sphericity according to the method of Greenhouse and Geisser. All the reported post hoc results were Bonferroni corrected.
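To make the ROI construction concrete, the sketch below enumerates the voxel centres that fall inside a 10 mm sphere on a 2 mm isotropic grid and averages a per-voxel value over them. It is a simplified illustration, not MarsBar's implementation: the helpers `sphere_voxels` and `roi_mean` are hypothetical, and the grid is assumed to be aligned with the sphere centre.

```python
def sphere_voxels(center_mm, radius_mm=10.0, voxel_mm=2.0):
    """List mm coordinates of grid points within `radius_mm` of `center_mm`.

    Assumes an isotropic grid aligned with the sphere centre.
    """
    cx, cy, cz = center_mm
    steps = int(radius_mm // voxel_mm)
    offsets = [i * voxel_mm for i in range(-steps, steps + 1)]
    return [
        (cx + dx, cy + dy, cz + dz)
        for dx in offsets for dy in offsets for dz in offsets
        if dx * dx + dy * dy + dz * dz <= radius_mm ** 2
    ]

def roi_mean(values_by_voxel, voxels):
    """Average a per-voxel value (e.g. a parameter estimate) over ROI voxels."""
    vals = [values_by_voxel[v] for v in voxels if v in values_by_voxel]
    return sum(vals) / len(vals) if vals else float("nan")

# Example with the reported arm-region peak and a made-up constant map.
arm_roi = sphere_voxels((-56, -19, 48))
toy_map = {v: 1.0 for v in arm_roi}   # hypothetical parameter estimates
arm_mean = roi_mean(toy_map, arm_roi)
```

In the actual analysis one such mean would be obtained per subject, per ROI and per verb type before being entered into the ANOVAs.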

Second, we performed an additional “motor strip ROI analysis” (see refs 25,78) with six a priori selected ROIs along the left motor and premotor cortex (Fig. 4). The MNI coordinates of those ROIs were adopted from Boulenger et al.25. The vertical z-coordinates of the 12 ROIs lie between 25 and 68 mm in standard MNI space, which covers the maximal activation probabilities (t-values) in the precentral gyrus for arm words at z = 48 mm, for leg words at z = 64 mm21,44 and for tongue movement at z = 25 mm78. The average parameter estimates for all of the 12 ROIs were obtained and entered into further ANOVAs. ANOVAs with more than 1 degree of freedom in the numerator were adjusted for violations of sphericity according to the method of Greenhouse and Geisser. All the reported post hoc results were Bonferroni corrected.
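The Greenhouse and Geisser adjustment mentioned above can be sketched for a one-way repeated-measures design. The code estimates the sphericity parameter epsilon from the double-centred covariance matrix of the conditions and scales both degrees of freedom by it. The function names `gg_epsilon` and `adjusted_df` and the toy data are assumptions for illustration; statistical packages apply the same idea to each within-subject effect of a factorial design.

```python
def gg_epsilon(data):
    """Greenhouse-Geisser epsilon for `data`: one row per subject, k conditions."""
    n, k = len(data), len(data[0])
    means = [sum(row[j] for row in data) / n for j in range(k)]
    # Sample covariance matrix of the k conditions.
    cov = [[sum((row[i] - means[i]) * (row[j] - means[j]) for row in data) / (n - 1)
            for j in range(k)] for i in range(k)]
    # Double-centre: remove row means and column means, add back the grand mean.
    row_means = [sum(cov[i]) / k for i in range(k)]
    grand = sum(row_means) / k
    dc = [[cov[i][j] - row_means[i] - row_means[j] + grand for j in range(k)]
          for i in range(k)]
    trace = sum(dc[i][i] for i in range(k))
    sum_sq = sum(dc[i][j] ** 2 for i in range(k) for j in range(k))
    return trace ** 2 / ((k - 1) * sum_sq)

def adjusted_df(eps, k, n):
    """Scale the numerator and denominator degrees of freedom by epsilon."""
    return (eps * (k - 1), eps * (k - 1) * (n - 1))

# Toy data: 5 hypothetical subjects x 3 conditions (not from the study).
toy = [[1, 2, 4], [2, 1, 5], [3, 5, 4], [4, 3, 8], [5, 6, 7]]
eps = gg_epsilon(toy)
df1, df2 = adjusted_df(eps, k=3, n=5)
```

Epsilon lies between 1/(k − 1) and 1; a value of 1 means sphericity holds and leaves the degrees of freedom unchanged, which is always the case when there are only two conditions.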

Third, we directly compared uncued vs. cued verbs for arm-, leg- and mouth-related verbs in the motor and premotor cortex by conducting a one-sample t-test group analysis (voxelwise uncorrected p < 0.001, clusterwise corrected p < 0.05 after Small Volume Correction), masked by the precentral gyrus, postcentral gyrus and paracentral lobule from the WFU_PickAtlas 2.40 (Fig. 5). The Small Volume Correction was carried out for each hemisphere separately within the precentral and postcentral gyrus. The contrast of uncued vs. cued arm verbs did not reveal any significant activation and was thus excluded from the ROI analysis. Average parameter estimates for the four ROIs centred on the peak activation of each cluster (LH leg ROI: −25 −25 53, t(18) = 4.17; RH leg ROI: 28 −6 58, t(18) = 3.98; LH mouth ROI: −56 −19 14, t(18) = 5.68; RH mouth ROI: 63 3 24, t(18) = 3.91) were then entered into ANOVAs. ANOVAs with more than 1 degree of freedom in the numerator were adjusted for violations of sphericity according to the method of Greenhouse and Geisser. All the reported post hoc results were Bonferroni corrected.