Introduction

An advantage for concrete words as compared to abstract words has been demonstrated in a series of psycholinguistic studies. Neurologically unimpaired participants perform better on concrete than abstract words in free recall, cued recall, paired-associate learning and recognition; their reaction times in visual lexical decision are shorter with concrete than abstract words1. This effect is known as the “concreteness effect”, and it increases in aphasic patients. This is especially evident in non-fluent aphasia, for example in patients with agrammatism2, where it has been found in spontaneous speech3, reading4, writing5, repetition6, naming7, and comprehension8.

Several theories9,10,11,12 have been proposed to explain this advantage of concrete words. They share a common feature, namely a quantitative distinction between concrete and abstract concepts: concrete items are more strongly represented than abstract ones, either because they benefit from both a verbal and a visuo-perceptual representation10, or thanks to greater contextual support12 or a larger number of semantic features9,11. For instance, the Dual Coding Theory10 postulates that concrete concepts are supported by both perceptual and verbal representations, while abstract words are based exclusively on linguistic information. From the Dual Coding Theory perspective, the advantage of concrete compared to abstract concepts is attributed to the additional contribution of the sensory-motor systems triggered by richer, imagery-based representations, presumably involving both hemispheres (not only the left hemisphere), and to a greater number of units activated in the semantic system for concrete words10.

The hub-and-spokes model assumes that words are processed in a neural network containing one or more amodal hubs, sensorimotor modality-specific regions, and connections between them (cross-modal conjunctive representations)13,14. Initially, it was hypothesized that the anterior temporal lobes (ATLs) were the main hub but, later on, other potential high-order and low-order hubs were introduced (e.g., left posterior cingulate cortex, dorsomedial pre-frontal cortex, inferior frontal gyrus, inferior parietal cortex, precuneus)15,16. From the Dual Hub Theory17,18 perspective, the ATL processes taxonomic knowledge (shared features, e.g., dog → wolf), while the temporo-parietal areas, including the posterior middle temporal gyrus (pMTG), are involved in thematic knowledge (contiguity relations based on co-occurrence in events or scenarios, e.g., dog → leash).

However, these theories cannot explain the reversal of the concreteness effect that has been documented in a number of brain-damaged patients, in both single-case studies19,20,21,22,23,24,25,26,27,28,29 and group studies30,31,32,33,34, who consistently show better performance on abstract than on concrete words.

To account for the reversed concreteness effect, it has been proposed that abstract and concrete concepts are distinguished by the manner in which they are acquired and by the relative weight of sensory-perceptual features in their representation20. An alternative explanation by Crutch and Warrington35 points to a fundamental difference in the architecture of concrete and abstract word representations: the primary organization of concrete concepts is categorical, whereas abstract concepts are predominantly represented by association to other items. In this framework, a reversed concreteness effect might result from selective damage to categorical information (which would selectively affect the conceptual representations of concrete words).

The selective impairment of concrete and abstract concepts suggests different anatomical correlates. In aphasic patients, an increase of the concreteness effect has been associated with vascular damage in the territory of the left middle cerebral artery, involving the prefrontal cortex. Cases of reversed concreteness effect, in contrast, are associated with herpes simplex encephalitis26,29 and semantic dementia, both in single cases20,21,22,25 and in group studies30,31,32,34; both conditions typically affect anterior temporal regions. These results have been confirmed in patients after left temporal pole resection33 and during direct electrical stimulation in awake surgery36. All these data seem to suggest a role of the left prefrontal cortex and of the anterior temporal lobe in processing abstract and concrete concepts, respectively. Notably, with the exception of Yi et al.’s34 and Bonner et al.’s30 studies, the reversal of the concreteness effect has been found for nouns but not for verbs.

Neuroimaging data, however, do not fully match the clinical evidence. Indeed, while the role of the left inferior frontal gyrus (IFG) for abstract words is confirmed, a previous meta-analysis37, based on 19 fMRI and PET studies, also showed an activation of the middle temporal gyrus (MTG); crucially, concrete concepts compared to abstract ones seem to activate the left posterior cingulate, precuneus, fusiform gyrus and parahippocampal cortex, that is, posterior regions. However, Wang et al.37 took into consideration not only nouns and verbs, but also sentences and fixed expressions, such as idioms. Thus, we hypothesized that this incongruence between clinical and neuroimaging studies could partly depend on the use of very different types of stimuli.

The present systematic review and meta-analysis aimed to identify which regions are consistently activated across experiments that require participants to process abstract and concrete words, adopting more stringent criteria regarding the type of stimuli and the modality of presentation (visual or auditory). The rationale for these sub-analyses is based on the fMRI literature suggesting that stimulus type and presentation modality, but also task, can affect the pattern of activation38,39. Accordingly, we did not include studies using complex stimuli such as sentences or short stories, since these publications might tap into different cognitive processes including, for example, attention and working memory.

Consequently, our study differs from previous meta-analyses37,40 in two aspects:

  1. We exclusively selected papers that used only word stimuli and reported specific contrasts (concrete > abstract and abstract > concrete stimuli).

  2. We used a different method, choosing the more popular Activation Likelihood Estimation41,42,43 (ALE) rather than the multilevel kernel density analysis (MKDA)44 applied by Wang et al.37. MKDA and ALE produce similar results, and both use the locations (xyz-coordinates) of the local maxima reported by the individual studies; however, MKDA uses a spherical kernel whose radius is determined by the analyst45, while ALE applies a Gaussian kernel whose FWHM is empirically determined. Moreover, our analyses were conducted with the latest version of the GingerALE software, which rectified some previous limitations of this instrument (e.g., the frequently used FDR correction is no longer supported43) and proposes new best-practice ALE recommendations46, such as a cluster-level family-wise error (FWE) corrected threshold of p < 0.05.
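As a rough illustration of the kernel difference described in point 2, the following Python sketch contrasts a spherical (MKDA-style) kernel with a Gaussian (ALE-style) kernel. The function names are hypothetical, and the 10 mm radius and FWHM are illustrative assumptions rather than values used in this study; in GingerALE the FWHM is derived empirically and is not set by hand.

```python
import math


def fwhm_to_sigma(fwhm_mm):
    """Convert a full width at half maximum to a Gaussian standard deviation."""
    return fwhm_mm / (2.0 * math.sqrt(2.0 * math.log(2.0)))


def gaussian_weight(distance_mm, fwhm_mm):
    """ALE-style weight: decays smoothly with distance from the reported peak.

    The kernel is left unnormalized (weight 1.0 at the peak) for readability.
    """
    sigma = fwhm_to_sigma(fwhm_mm)
    return math.exp(-(distance_mm ** 2) / (2.0 * sigma ** 2))


def spherical_weight(distance_mm, radius_mm):
    """MKDA-style weight: 1 inside an analyst-chosen sphere, 0 outside."""
    return 1.0 if distance_mm <= radius_mm else 0.0


# Illustrative values only (assumptions, not parameters from the study).
for distance in (0.0, 5.0, 9.0, 11.0):
    print(distance, gaussian_weight(distance, 10.0), spherical_weight(distance, 10.0))
```

The spherical kernel treats every voxel inside the radius identically, whereas the Gaussian kernel gives progressively less weight to voxels farther from the reported peak.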

Finally, this is also an update of the previous reviews, including publications from the last 10 years.

Materials and methods

The present systematic review was conducted under the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines47.

Studies selection

Our meta-analysis is based on 32 neuroimaging studies exploring the neural basis of concrete and abstract word processing, using either PET or fMRI on adult participants, published between January 1996 and February 2021. Studies were selected using four electronic databases: MEDLINE (accessed by PubMed, https://www.ncbi.nlm.nih.gov/pubmed), PsycARTICLES (via EBSCOHost, https://search.ebscohost.com), PsycINFO (via EBSCOHost) and Web of Science (https://webofknowledge.com/). The search terms used were: (1) “semantic decision”, “semantic judgment”, “abstract words”, “concrete words”, “abstract concepts”, “concrete concepts”, “lexical decision” AND (2) “imaging”, “MRI”, “PET”. Additional sources such as reference lists of included studies and relevant systematic reviews were also checked.

Titles, abstracts, and full-text articles were screened and evaluated for eligibility based on the following criteria:

Inclusion criteria:

  • Imaging technique: PET or fMRI,

  • Reported stereotaxic coordinates (in the MNI or Talairach atlases),

  • Whole-brain voxel-based data analyses,

  • More than 5 participants in each study,

  • Sample population of healthy, adult participants,

  • Reported concrete > abstract or abstract > concrete contrast,

  • Word stimuli,

  • Published in English,

Exclusion criteria:

  • Region-of-interest analyses,

  • Multiple single-case analyses,

  • Sample population of minors,

  • Sample population of neurological, brain-damaged, cognitively impaired or psychiatric patients,

  • Only concrete > baseline or abstract > baseline contrasts,

  • Articles from the gray literature (i.e., literature that is not formally published in sources such as books or journal articles, e.g. unpublished Ph.D. thesis),

  • Presentations from international meetings with no specific data provided, perspective and opinion publications, case reports, series of cases, previous reviews or meta-analyses,

  • Studies not published in, or translated into English,

  • Phrase or sentence stimuli,

  • Studies without adequate information (e.g., stereotaxic coordinates) to analyze the concrete vs. abstract contrasts and no reply from the authors after asking for the missing data.
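The inclusion criteria listed above can be read as a simple conjunctive filter. The sketch below is only an illustration of that logic: the `Study` record, its field names, and the two example entries are hypothetical and do not correspond to any study in the review.

```python
from dataclasses import dataclass


@dataclass
class Study:
    """Hypothetical screening record; field names are assumptions."""
    technique: str              # e.g. "PET", "fMRI", or another method
    reports_coordinates: bool   # MNI or Talairach peaks are available
    whole_brain: bool           # False for region-of-interest analyses
    n_participants: int
    healthy_adults: bool
    direct_contrast: bool       # concrete > abstract or abstract > concrete
    word_stimuli: bool          # False for phrases or sentences
    in_english: bool


def is_eligible(study: Study) -> bool:
    """Return True only if every inclusion criterion is satisfied."""
    return (
        study.technique in {"PET", "fMRI"}
        and study.reports_coordinates
        and study.whole_brain
        and study.n_participants > 5
        and study.healthy_adults
        and study.direct_contrast
        and study.word_stimuli
        and study.in_english
    )


# Two made-up records: the second one fails because it is an ROI analysis.
candidate_a = Study("fMRI", True, True, 16, True, True, True, True)
candidate_b = Study("fMRI", True, False, 16, True, True, True, True)
print(is_eligible(candidate_a), is_eligible(candidate_b))
```

In practice, screening also involved judgment calls (for instance contacting authors for missing coordinates), which a boolean filter of this kind does not capture.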

We used a general and broad initial search, looking for publications that reported word stimuli and concrete > abstract or abstract > concrete MRI and PET contrasts. In a second step, we classified the included papers according to the type of knowledge, type of task, type of stimuli, and type of investigation method. These classifications led to exploratory sub-analyses, with the purpose of controlling confounding factors as much as possible. As previously specified, we looked for publications that reported a concrete > abstract words or abstract > concrete words contrast, without analyzing the exact strategies that the authors applied to divide word stimuli into the two categories. Often, the abstractness/concreteness constructs are operationalized in the papers through one of two rating methods: (1) asking participants to classify a word as concrete according to the degree to which it refers to a tangible entity in the world (i.e., it has clear references to material objects); or (2) evaluating its imageability, i.e., the ease with which the word elicits a mental image. Generally speaking, words referring to something that exists in reality and can be experienced immediately through the senses are considered concrete (e.g., animals, tools), while words whose meaning cannot be experienced directly but can be defined by other words, internal sensory experience, and linguistic information are classified as abstract (e.g., emotions, morality, social interaction, time).

After removing duplicates, research papers that did not satisfy the above criteria were excluded. For example, several studies focused on sentences or phrases48,49, or reported only words > baseline contrasts50. The more conservative concrete > abstract and abstract > concrete contrasts (as opposed to concrete > baseline or abstract > baseline contrasts) were chosen in order to avoid a variety of baselines, which could range from resting state and fixation cross51 to pseudowords52 and numbers or letters53 and could affect the interpretation of the results, since subtractions from different baselines create different activation patterns. We acknowledge that this type of contrast does not reveal which brain regions equally support the processing of both concrete and abstract words. The question that this meta-analysis can answer is which findings are most consistently replicated in the literature (in terms of brain activation and word representation) when contrasting abstract and concrete words. Moreover, since concrete and abstract words can dissociate, we aimed to assess the anatomical correlates in which they differ, not the common ones.

If the same data were reported in different publications, we chose the most recent one with the highest number of participants54,55.

Uncertainties regarding some inclusions were resolved by the authors through discussion.

The PRISMA flow of information diagram was used to track the search process as presented in Fig. 1 and the main characteristics of the studies included in this meta-analysis are reported in Table 1.

Figure 1

PRISMA flowchart of the selection process for included articles.

Table 1 Descriptive information of the 32 experiments included in the meta-analysis.

Classification of the raw data before clustering analyses

From the selected papers, only the stereotactic coordinates representing the concrete > abstract or abstract > concrete contrasts were extracted. Following this procedure, we obtained 295 foci from a total sample of 535 participants. The stereotaxic coordinates reported in terms of the Talairach and Tournoux atlas56 were transformed into the MNI (Montreal Neurological Institute) stereotaxic space57 using the tal2icbm transforms implemented in the GingerALE software41,43,58.

For all the stereotaxic coordinates we extracted the relevant information about the statistical comparisons that generated them. More explicitly, we reported the MNI coordinates (MNI x,y,z), the name of the first author, the journal and the year of publication of the paper, the technique (PET or fMRI) and the stereotactic space used, the age of participants, the type of task, the nature of the contrast from which the peak was extracted, the statistical thresholds, the stimulus type (nouns or verbs) and the presentation modality (auditory or visual).

Clustering procedure

Once the set of MNI coordinates was obtained, the meta-analyses were carried out using the revised ALE algorithm41,43 implemented in the GingerALE software58, version 3.0.2 (http://brainmap.org/ale). The ALE algorithm aims to identify areas where the convergence of reported coordinates across experiments is higher than expected from a random spatial association. The logic behind this approach implies a spatial probability distribution modeled for each activation peak included in the dataset of interest. Reported foci are treated as centers of 3D Gaussian probability distributions capturing the spatial uncertainty associated with each focus58. The between-subject variance is weighted by the number of participants per study, since larger sample sizes should provide more reliable approximations of the “true” activation effect. The voxel-by-voxel union of these distributions is used as an activation likelihood map, which is subsequently tested for statistical significance against randomly generated sets of foci. ALE has proven to be a reliable way of combining evidence from multiple studies43 and has been used successfully in different fields (e.g.,59).
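To make the "voxel-by-voxel union" concrete, the toy sketch below works on a one-dimensional grid rather than a 3D brain volume. It is not the GingerALE implementation: the function names, the fixed kernel width, the peak probability of 0.5, the foci positions, and the choice of taking the maximum over foci within one experiment are all assumptions made for illustration, and the sample-size weighting of the kernel is omitted.

```python
import math


def gaussian(distance, sigma, peak=0.5):
    """Toy activation probability around a focus (peak value is an assumption)."""
    return peak * math.exp(-(distance ** 2) / (2.0 * sigma ** 2))


def modeled_activation(grid, foci, sigma):
    """One map per experiment: at each position, the largest value over its foci."""
    return [max(gaussian(abs(x - focus), sigma) for focus in foci) for x in grid]


def ale_union(ma_maps):
    """Probabilistic union across experiments: 1 - prod(1 - p_i) at each position."""
    ale = []
    for values in zip(*ma_maps):
        prob_none = 1.0
        for p in values:
            prob_none *= 1.0 - p
        ale.append(1.0 - prob_none)
    return ale


# Made-up foci (grid positions) for three hypothetical experiments.
grid = list(range(0, 21))
experiments = [[5, 6], [6], [15]]
ma_maps = [modeled_activation(grid, foci, sigma=2.0) for foci in experiments]
ale = ale_union(ma_maps)
peak_position = grid[ale.index(max(ale))]
print(peak_position)
```

Positions where foci from different experiments converge receive a higher ALE value than positions supported by a single experiment, which is the property the significance test then evaluates.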

More specifically we used the following procedure:

  • Anatomical filtering: we first filtered the coordinates using the most conservative (smallest) mask available in the GingerALE software; 17 of the 295 foci fell outside the mask.

  • ALE maps, which quantify the degree of overlap in peak activation across experiments, were calculated using the modified ALE algorithm and the random-effects model41,43.

  • Thresholding procedure: for each ALE calculation described below, significance was tested using 1000 permutations with a cluster-forming threshold of p < 0.001 (uncorrected). To limit false positives, significance was corrected with a cluster-level family-wise error threshold46 of p < 0.05, as used in other meta-analytic studies60.
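The thresholding step can be sketched as follows on a one-dimensional toy map. This is a simplified illustration, not the GingerALE procedure: the function names are hypothetical, the cluster-forming threshold is applied to raw values rather than to voxel-wise p-values, and the 20 hand-made null maps stand in for the 1000 permutations with randomly relocated foci.

```python
def find_clusters(values, threshold):
    """Return lists of contiguous indices whose value exceeds the threshold."""
    clusters, current = [], []
    for index, value in enumerate(values):
        if value > threshold:
            current.append(index)
        elif current:
            clusters.append(current)
            current = []
    if current:
        clusters.append(current)
    return clusters


def max_cluster_size(values, threshold):
    """Size of the largest supra-threshold cluster in one map (0 if none)."""
    return max((len(c) for c in find_clusters(values, threshold)), default=0)


def significant_clusters(observed, null_maps, threshold, alpha=0.05):
    """Keep observed clusters that are larger than expected under the null.

    A cluster's corrected p-value is the fraction of null maps whose largest
    cluster is at least as big as the observed cluster.
    """
    null_sizes = [max_cluster_size(m, threshold) for m in null_maps]
    kept = []
    for cluster in find_clusters(observed, threshold):
        n_as_large = sum(1 for size in null_sizes if size >= len(cluster))
        if n_as_large / len(null_sizes) < alpha:
            kept.append(cluster)
    return kept


# Made-up data: one 6-voxel cluster and one isolated supra-threshold voxel.
observed = [0.0] * 5 + [0.95] * 6 + [0.0] * 5 + [0.95] + [0.0] * 3
null_maps = [[0.0, 0.95, 0.0, 0.0] * 5 for _ in range(19)]
null_maps.append([0.95, 0.95, 0.0, 0.0] * 5)
result = significant_clusters(observed, null_maps, threshold=0.9)
print(result)
```

Because each observed cluster is compared with the largest cluster of every null map, the error rate is controlled across the whole map rather than voxel by voxel; small isolated clusters are therefore discarded even when they pass the cluster-forming threshold.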

Unfortunately, ALE cannot deal with designs involving multiple independent variables, and in this paper we intended to consider the role of different variables such as (1) stimulus type (nouns only, verbs only or all word stimuli), (2) modality of presentation (visual only, auditory only or both visual and auditory), and (3) task specificity (e.g., lexical, semantic tasks or all tasks). The ALE strategy we chose in this case was to consider separate sets of foci for each variable and to run one meta-analysis for each of these sets when the number of papers was large enough. For this purpose, the overall dataset was divided a posteriori into several subsets, which necessarily implied running meta-analyses on a smaller number of foci (lowering the power). An important limitation of this approach is that we cannot statistically assess the interaction between variables such as stimulus type and task.
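The subsetting strategy described above can be sketched as a plain filter over a table of foci. Everything in this example is hypothetical: the record fields, the study labels, the coordinates, and the `min_experiments` cut-off, which is an assumption of the sketch (the text only states that the number of papers had to be "large enough").

```python
def select_foci(foci, **conditions):
    """Return the foci whose fields match every requested condition."""
    return [f for f in foci if all(f.get(k) == v for k, v in conditions.items())]


def count_experiments(foci):
    """Number of distinct studies contributing at least one focus."""
    return len({f["study"] for f in foci})


def has_enough_experiments(foci, min_experiments):
    """Decide whether a subset is large enough for its own meta-analysis."""
    return count_experiments(foci) >= min_experiments


# Made-up foci; coordinates and labels are for illustration only.
toy_foci = [
    {"study": "A", "contrast": "concrete>abstract", "stimulus": "nouns",
     "modality": "visual", "xyz": (-30, -40, -12)},
    {"study": "A", "contrast": "concrete>abstract", "stimulus": "nouns",
     "modality": "visual", "xyz": (-8, -54, 14)},
    {"study": "B", "contrast": "concrete>abstract", "stimulus": "verbs",
     "modality": "visual", "xyz": (44, -70, 30)},
    {"study": "C", "contrast": "abstract>concrete", "stimulus": "nouns",
     "modality": "auditory", "xyz": (-48, 24, 6)},
]

nouns_ca = select_foci(toy_foci, contrast="concrete>abstract", stimulus="nouns")
visual_ca = select_foci(toy_foci, contrast="concrete>abstract", modality="visual")
print(len(nouns_ca), count_experiments(nouns_ca))
print(len(visual_ca), count_experiments(visual_ca))
```

Each subset is analyzed on its own, so a focus can contribute to several sub-analyses, and no subset tests the interaction between two variables.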

The analyses were based on the following contrasts:

  1. An analysis including the activation peaks associated with word processing, independently of stimulus type and task.

  2. An analysis of peaks associated with noun processing only; the number of studies including verbs only (4 studies70,76,82,83) was too small for a specific analysis of this type of stimuli.

    • concrete nouns > abstract nouns included 107 stereotactic activation loci from 15 studies (5 foci out of mask), 251 participants;

    • abstract nouns > concrete nouns included 99 stereotactic activation loci from 18 studies (8 foci out of mask), 324 participants.

  3. An analysis including the activation peaks associated with word processing independently of the stimulus type (verbs, nouns or adjectives), but taking into consideration only visually presented stimuli.

    • concrete words > abstract words, visual stimuli only, included 121 stereotactic activation loci from 18 studies, 301 participants;

    • abstract words > concrete words, visual stimuli only, included 135 stereotactic activation loci from 22 studies, 374 participants.

    Since only 5 studies included auditory stimuli51,54,62,79,89, we could not perform a specific analysis for this category.

  4. An analysis of peaks associated with lexical decision tasks (word vs. non-word classification) or semantic decision tasks (e.g., pleasantness decision task, answering a question about the stimuli), excluding all the studies based on memory tasks (2 studies), perceptual decision tasks (1 study), mental image generation (3 studies), or passive reading (2 studies).

    • concrete > abstract words (lexical and semantic tasks only) included 114 stereotactic activation loci from 16 studies, 273 participants;

    • abstract > concrete words (lexical and semantic tasks only) included 116 stereotactic activation loci from 17 studies, 289 participants.

We explored a posteriori the role of (1) stimulus type (nouns), (2) modality of presentation (visual stimuli), and (3) task specificity (lexical and semantic tasks) in order to control as much as possible for each of these variables, i.e., to increase the accuracy of the results.

One might argue against including PET and fMRI studies in the same meta-analysis because of the substantial methodological differences between the two techniques in terms of experimental design, processing, spatial localization and cluster accuracy. Because the number of studies investigating concrete vs. abstract words is small, we decided to include data from both techniques in order to increase power in the analyses. Nevertheless, in Appendix A (supplementary materials), we present the data analysis after excluding all PET studies (figures and tables are numbered as, e.g., Fig. 1A and Table 1A).

For anatomical labeling and figures, we relied on the Automated Anatomical Labeling (AAL) template available in the MRIcron visualization software (https://www.nitrc.org/projects/mricron).

Results

Once the appropriate studies were collected, we used activation likelihood estimation (ALE) to meta-analytically remodel available neuroimaging data.

Concrete > abstract meta-analysis

The GingerALE procedure run over the concrete words > abstract words set of coordinates identified a total of 5 clusters, with 1–4 individual peaks each, from 4 to 11 different studies (Fig. 2). Regions that were consistently activated across experiments were localized in the bilateral middle temporal gyrus and posterior cingulate, the left parahippocampal gyrus, left fusiform gyrus, bilateral precuneus and angular gyri, left superior occipital gyrus and left cerebellum culmen. The peaks distribution for each significant cluster is reported in Table 2.

Figure 2

Clusters activated by the concrete > abstract words contrast. The crosses are centered on the areas corresponding to the stereotactic coordinates reported in Table 2. The images are presented in neurological convention.

Table 2 Concrete > abstract word clusters.

A similar activation pattern, except for the right hemisphere involvement, was observed when only studies reporting exclusively noun stimuli were taken into consideration (concrete nouns > abstract nouns). We observed three left activation clusters (Fig. 3, Table 3) situated in the middle temporal gyrus, parahippocampal gyrus, posterior cingulate, precuneus, superior occipital gyrus, and culmen (left cerebellum anterior lobe).

Figure 3

Clusters activated by the concrete > abstract nouns contrast. The crosses are centered on the areas corresponding to the stereotactic coordinates reported in Table 3. The images are presented in neurological convention.

Table 3 Concrete > abstract nouns clusters.

The ALE procedure run over the concrete words > abstract words, visual stimuli only set of coordinates, identified a total of 5 clusters, with 1–6 individual peaks each, from 4 to 8 different studies (Fig. 4). Regions that were consistently activated across experiments were localized in the left middle temporal gyrus, bilateral posterior cingulate, and parahippocampal gyrus, left fusiform gyrus, bilateral precuneus and angular gyri, left superior occipital gyrus and left cerebellum culmen. The peaks distribution for each significant cluster is reported in Table 4.

Figure 4

Clusters activated by the concrete > abstract words contrast (visual stimuli). The crosses are centered on the areas corresponding to the stereotactic coordinates reported in Table 4. The images are presented in neurological convention.

Table 4 Concrete > abstract words (visual stimuli) clusters.

A comparable activation pattern was observed when only studies based on lexical and semantic tasks were taken into consideration. The analysis indicated 4 activation clusters associated with the concrete words > abstract words contrast (lexical and semantic tasks): bilateral middle temporal gyrus, left posterior cingulate and left parahippocampal gyri, bilateral precuneus, left angular gyrus, left superior occipital gyrus and left cerebellum culmen (Fig. 5, Table 5).

Figure 5

Clusters activated by the concrete > abstract words contrast (semantic and lexical tasks). The crosses are centered on the areas corresponding to the stereotactic coordinates reported in Table 5. The images are presented in neurological convention.

Table 5 Concrete > abstract words (semantic and lexical tasks only) clusters.

Abstract > concrete meta-analysis

The revised ALE algorithm discriminated four clusters that correlated with abstract word processing in a healthy population (Fig. 6), from four to 12 different papers (Table 6). Our analyses identified a robust neural pattern of activity in the left frontal and temporal lobes, specifically, the inferior frontal gyrus, the superior and middle temporal gyri and left inferior parietal.

Figure 6

Clusters activated by the abstract > concrete words contrast. The crosses are centered on the areas corresponding to the stereotactic coordinates reported in Table 6. The images are presented in neurological convention.

Table 6 Abstract > concrete word clusters.

When only abstract nouns (abstract nouns > concrete nouns) were analyzed, the results indicated a single cluster with two peaks, from 9 studies, in the left inferior frontal gyrus (Fig. 7, Table 7).

Figure 7

Clusters activated by the abstract > concrete nouns contrast. The crosses are centered on the areas corresponding to the stereotactic coordinates reported in Table 7. The images are presented in neurological convention.

Table 7 Abstract > concrete nouns clusters.

We identified three clusters associated with abstract words processing in a healthy population when only studies reporting abstract visual stimuli were included (Fig. 8), from 4 to 12 different papers (Table 8). Our analyses revealed a robust neural pattern of activity in the frontal and temporal lobes, specifically, the inferior frontal gyrus and the superior and middle temporal gyri.

Figure 8

Clusters activated by the abstract > concrete words contrast (visual stimuli). The crosses are centered on the areas corresponding to the stereotactic coordinates reported in Table 8. The images are presented in neurological convention.

Table 8 Abstract > concrete words (visual stimuli) clusters.

When only foci from lexical and semantic tasks were analyzed, the results indicated 2 clusters (with 1–4 individual peaks each, from 3 to 9 different studies), in the left inferior frontal gyrus, superior and middle temporal gyrus (Fig. 9, Table 9).

Figure 9

Clusters activated by the abstract > concrete words contrast (semantic and lexical tasks). The crosses are centered on the areas corresponding to the stereotactic coordinates reported in Table 9. The images are presented in neurological convention.

Table 9 Abstract > concrete words (semantic and lexical tasks only) clusters.

As previously specified, due to the very small number of studies we could not conduct sub-analyses based on the (1) verbs only, (2) other types of tasks present in the included publications like mental image generation, memory tasks, or perceptual decision task only; (3) auditory stimuli only.

In Table 10 the descriptive information for each sub-analysis is reported.

Table 10 Descriptive information for each sub-analysis.

In Appendix A (supplementary materials) we present the analyses without PET data. Except for a few clusters that have a smaller number of voxels and one cluster that fragmented into two smaller ones (for the abstract > concrete contrast), all the other clusters overlap perfectly (for details, see Tables 2A–9A and Figs. 1A–8A). No cluster disappeared and no new clusters were observed, indicating good data consistency. In Appendix A we also added Table 10A, in which we report the Brodmann area (BA) for each activated cluster with a brief description.

Discussion

As we pointed out in the introduction, neuropsychological studies suggest a role of the lateral prefrontal cortex in processing abstract words and of the left anterior temporal lobe in processing concrete ones. These data are not confirmed by neuroimaging studies. We ran a meta-analysis using more stringent criteria to assess not only whether imaging data can support this segregation but also in which components the two networks differ. Many variables could influence our findings concerning the neural correlates, such as the type of task, the type of stimuli, and the stimulus presentation modality. We tried to control for all these factors in order to obtain accurate results. Since the number of studies was limited, we could not analyze data according to type of task (e.g., lexical decision task vs. semantic task), but at least we excluded those studies without a semantic or lexical decision task. The task performed during the fMRI scan is particularly relevant because the activation observed while passively hearing or reading words might be very different from that observed during semantic judgments on concrete vs. abstract words (e.g., deciding which is more closely associated with a table: a chair or a bench?). Regarding the stimulus type, there is an ongoing debate concerning nouns vs. verbs in general38. This question becomes even more difficult when we try to separate abstract and concrete nouns, and abstract and concrete verbs (we could not analyze verbs separately because of the lack of studies). Both Wang et al.37 and Binder et al.40 combined different types of stimuli (e.g., words, sentences, fixed expressions such as idioms, and short stories) without further focusing on the stimulus type.
Furthermore, since Binder et al.’s40 objective was to investigate semantic processing in general and not the concrete vs. abstract distinction (although they ran a sub-analysis on these two categories), the activation peaks they meta-analyzed were obtained from different contrasts: concrete and abstract stimuli > baseline, concrete > abstract and abstract > concrete stimuli. This choice is understandable given their objective, but the results could be biased by the type of contrast applied; indeed, discrepancies in the patterns of cortical activation across studies may be attributable, at least in part, to differences in baseline tasks and hence reflect the limits of the subtractive logic.

Thirty-two imaging studies were included, which evaluated the activation patterns in response to concrete and abstract concepts. All the data included in the ALE analysis are based on the general linear model (GLM). We also looked for studies that used the more modern multivariate pattern analysis, i.e., a set of methods that analyze neural responses as patterns of activity90, in order to have a separate dataset based on this type of method. Unfortunately, we found only a very small number of publications48,91,92, which prevented a further meta-analytic procedure.

The results of this meta-analysis, consistent with those of previous ones37,40, confirmed that concrete and abstract words processing relies, at least in part, on different brain regions. Based on the currently available data we could not investigate the existence of overlapping networks between concrete and abstract words. The ALE procedure was completely data-driven, without a prior theoretical basis, and the results are constrained only by the nature of our data (e.g., the limited temporal resolution of the neuroimaging techniques, the correlational nature of the data), and by our inclusion/exclusion criteria.

As previously mentioned, experiments testing for greater activation for concrete than abstract words (concrete words > abstract words) converge in temporo-parieto-occipital regions, namely the left middle temporal gyrus, left fusiform, left parahippocampal and lingual gyri, bilateral angular gyrus and precuneus, bilateral posterior cingulate, left superior occipital gyrus and left culmen in the cerebellum. The neuroimaging evidence indicates that concrete concept processing is at least partly associated with the perceptual system and also relies on mental imagery (precuneus, superior occipital gyrus). Binder et al.40 found significant overlap for concrete stimuli in the angular gyrus bilaterally, left mid-fusiform gyrus, left posterior cingulate, and left dorsomedial prefrontal cortex (DMPFC). With the exception of the DMPFC, whose involvement might be related to stimulus complexity and/or different baselines, all the other regions are confirmed by our data. At variance with Wang et al.’s meta-analysis37, we found a bilateral involvement of the posterior cingulate cortex (PCC), angular gyrus and precuneus. Although involved in many semantic-based tasks, the function of the PCC in semantic cognition is still debated. The following hypotheses have been proposed: (1) this region could act as a supramodal convergence zone40, (2) PCC activation could reflect the greater engagement of an imagery-based perceptual system for concrete stimuli, or (3) the PCC might be an interface between semantic knowledge and episodic memory91. The precuneus also seems to be associated with visuospatial imagery, a hypothesis supported by experiments on episodic memory retrieval and linguistic tasks that required the processing of high-imagery words or mental image generation83. The same regions were found when only nouns were considered (concrete nouns > abstract nouns contrast), with the difference that the right hemisphere activation disappeared.
The two right hemisphere clusters might be specifically correlated with action verbs, but this result could also be a consequence of a lack of power due to the limited number of studies (15 studies in the nouns dataset vs. 22 in the noun-and-verb dataset).

The results on abstract words replicated those reported by Wang and colleagues37 and Binder et al.40; higher activation for abstract compared to concrete word conditions (abstract words > concrete words) is more frequently reported in a left-lateralized network, encompassing the inferior frontal gyrus (IFG, Brodmann areas 45, 47), a very small portion of the precentral gyrus, the superior and middle temporal gyri, and the inferior parietal cortex. These findings are also in line with the results observed in brain-damaged patients.

It has been suggested that the ventrolateral prefrontal cortex (VLPFC) implements semantic control in two steps93. Step 1 constitutes controlled access to stored representations when bottom-up input is not sufficient. Step 2 operates post-retrieval and is thought to bias competition among representations that have been activated during Step 1. According to Badre and Wagner94, both steps recruit the VLPFC, though different parts of it, with BA 45 involved in Step 2. In other words, IFG activation could reflect a higher level of semantic control processes (additional resources), since abstract stimuli might require semantic selection, inhibition of irrelevant cues, effortful integration, top-down control and working-memory-related processes95, in agreement with the context availability theory96. In line with this hypothesis, this region showed greater activation for abstract words when a judgment task was performed following irrelevant cues, and reduced activation when semantic decisions were made with contextual help. This supports the idea that this area responds more strongly to abstract words because their meanings are inherently more variable and require more control during linguistic processing than those of concrete words53,97. An alternative explanation is offered by Della Rosa and colleagues98, who used a lexical decision task and found that the left IFG was particularly active during the presentation of words characterized by low imageability and low context availability. The authors’ interpretation was that this area could be a functional convergence zone between imageability and context availability, differentiating abstract from concrete concepts.

In neuroimaging studies, besides the IFG, additional clusters were found in the left superior and middle temporal gyri. However, when only nouns were considered (and not verbs), these clusters lost significance, supporting the idea that the cerebral networks underlying noun and verb processing might be slightly different.

On the other hand, the results on concrete words are not consistent with neuropsychological data. Indeed, apart from several single case reports, a study comparing the behavioral variant of frontotemporal dementia (FTD), in which there is a predominant prefrontal atrophy, to the semantic variant, with anterior temporal atrophy, showed that while the former group had an increased concreteness effect, a reversal was found in the semantic variant group. Similarly, patients with left anterior temporal lobe (ATL) resection show the same pattern of a reversed concreteness effect33.

One possible explanation for these inconsistent results is the type of task: the neuroimaging studies used pleasantness judgments, memory tasks, lexical decision, etc., while patients are generally examined by means of naming and comprehension tasks and, occasionally, semantic judgments. Orena et al.36, for example, using direct electrical stimulation (DES) for brain mapping during awake surgery, found no behavioral differences between BA 44 and BA 38 stimulation while patients performed a lexical decision task, but they registered a dissociation between abstract and concrete words during a concreteness judgment task; in particular, abstract words were impaired during stimulation of BA 44 and concrete words during BA 38 stimulation.

Neuroimaging studies are often hard to compare, and many variables could influence the reported results, such as the duration of stimulus presentation, the number of stimuli, and the stimulus types. For example, for the same type of experiment, some studies presented a large number of stimuli (e.g., 164 nouns74), while in others only four words were repeated for more than 140 trials78. Moreover, the selected stimuli varied greatly among studies, encompassing emotions, mind states, and living and nonliving things, and differing in frequency of use, age of acquisition and imageability. In addition, many studies used “concreteness” and “imageability” interchangeably, although these are in fact two distinct properties that can differently affect naming and recall99,100,101.

We also controlled for presentation modality. No relevant differences were observed between the analysis combining auditory and visual stimuli and the analysis restricted to visually presented words (see Figs. 5, 9). This may be partly due to the very small number of studies using auditory presentation (only 5 studies out of 32).

Another relevant element is the participants’ age, since aging can modify neural organization due to neuroplasticity102. With two exceptions69,77 in which the participants’ mean age was > 70, all other studies included a young population with a mean age < 30 (see Table 1). Neuropsychological studies (on patients) involve a different population, with ages ranging from 55 to 75. Information obtained from healthy young people may not be optimal for interpreting data from elderly, brain-damaged patients.

According to Eickhoff et al.46, the statistical power of the current meta-analysis to detect not only large, but also small- and medium-size effects can be considered acceptable. Nevertheless, meta-analytic power is intrinsically limited by the amount of currently available data, especially for two sub-analyses: (1) concrete nouns > abstract nouns, with only 15 independent experiments, and (2) concrete words > abstract words in lexical and semantic tasks, with 16 studies. This indicates that, in these two cases, we cannot properly control the influence of individual experiments and that we might have failed to detect small effects. Another limitation is related to the sample size of the included experiments, which ranged from 6 to 28 participants. We acknowledge the need to consider only well-designed and controlled studies, but given the limited number of papers we were forced to include data from studies with uncorrected p values (see Table 1), with the risk that this affects the detection of the subtle activation differences that may underlie abstract-concrete differences.

This meta-analysis is focused on how representations of abstract and concrete words are processed in the brain. In this regard, future research should clarify the specific role of each region within the semantic network and how these regions are connected, and should specify how task and stimulus characteristics interact and modify activation patterns.

Considering the main question, we can confirm that concrete and abstract words involve at least partially segregated brain areas, the IFG being relevant for abstract nouns and verbs; in contrast, we could not find evidence of ATL involvement for concrete items. Our data indicate a more posterior activation for concrete words in regions that are often correlated with mental imagery processes. This meta-analysis seems to support the hypothesis that abstract and concrete words have partly separate neural correlates, but the specific features that differentiate between these two classes of stimuli are still open to discussion. The cortical regions that are commonly activated in imaging studies investigating concrete and abstract words seem more congruent with the Dual Coding Theory10, i.e., concrete words have richer representations, depending on both hemispheres. Regarding the hub-and-spoke model (hub regions interacting with modality-specific processing areas), we observed activation patterns in areas considered neural crossroads of the semantic network like the posterior cingulate region, the anterior temporal lobe, and the left inferior frontal gyrus91,98,103, but these data cannot be interpreted in the frame of this theoretical model.

The lack of converging evidence from clinical neuropsychological and neuroimaging data might be explained by several variables, such as task and stimulus type and differences in age and brain plasticity between the two populations (young vs. elderly people). These discrepancies deserve further investigation, for example by means of balanced groups of healthy and clinical participants and by combining different techniques in the same experiment, such as TMS-EEG or TMS and fMRI.