Abstract
Sound symbolism, which is the systematic and non-arbitrary link between a word and its meaning, has been suggested to bootstrap language acquisition in infants. However, it is unclear how sound symbolism is processed in the infant brain. To address this issue, we investigated the cortical response in 11-month-old infants in relation to sound-symbolic correspondences using near-infrared spectroscopy (NIRS). Two types of stimuli were presented: a novel visual stimulus (e.g., a round shape) followed by a novel auditory stimulus that either sound-symbolically matched (moma) or mismatched (kipi) the shape. We found a significant hemodynamic increase in the right temporal area when the sound and the referent were sound-symbolically matched, but this effect was limited to the moma stimulus. The anatomical locus corresponds to the right posterior superior temporal sulcus (rSTS), which is thought to process sound symbolism in adults. These findings suggest that prelinguistic infants have the biological basis to detect cross-modal correspondences between word sounds and visual referents.
Introduction
In traditional linguistics, the arbitrariness of the relationship between sound and meaning is considered a core principle of language1. For the majority of words in a lexicon, mapping between sound and meaning may indeed seem arbitrary. However, recent large-scale computational research in which word lists covering nearly two-thirds of the world’s languages were analyzed2 found strong associations between speech sounds and meanings for some property words (e.g., “small” and i, “full” and p or b) and body part terms (“tongue” and l, “nose” and n). Psychologically, people generally have good sensitivity to sound symbolism, i.e., the non-arbitrary relationships between linguistic sound and meaning3,4,5,6. For example, the open vowel [a] tends to be associated with large object size while the closed vowel [i] tends to be associated with small object size (i.e., vowel-size symbolism4). People judge the nonsense word maluma to be better associated with round than angular shapes, while takete sounds better for angular shapes (i.e., the bouba/kiki effect5,6).
The relationship between sound symbolism and children’s language development has been discussed extensively in the recent literature. Sound symbolic words abound in children’s early speech production7,8 and in talk directed to infants by caretakers9. Previous behavioral studies have also demonstrated that sound symbolism plays a facilitative role in word learning. For example, research has shown that 14-month-old infants benefit from bouba/kiki type sound-shape correspondence in an associative word learning task10. This scaffolding effect continues into toddlerhood, particularly in verb learning11,12,13.
The idea that sound symbolism scaffolds lexical development3 presupposes the ability of infants to detect inherent similarities across sound and other perceptual modalities. A handful of recent behavioral and brain research studies have suggested that prelinguistic infants indeed have such abilities. For example, it has been shown that four-month-old infants detect vowel-size sound symbolism14. Some other studies have demonstrated that prelinguistic infants are also sensitive to bouba/kiki type sound-shape correspondence15,16,17. Fort, Lammertink, Peperkamp, Guevara-Rukoz, Fikkert, and Tsuji18 conducted a meta-analysis of the sound symbolism effect in infants by examining both published and unpublished work, most of which employed behavioral measures. These authors concluded that young infants by and large are sensitive to audition-vision correspondence.
However, understanding of the neural mechanism of sound symbolism remains limited, particularly for infants, but also for adults. Ramachandran and Hubbard6 hypothesized that multi-sensory integration at the temporal-parietal-occipital (TPO) junction is critical for sensing sound symbolism. The TPO junction includes the posterior part of the superior temporal sulcus (STS)19, which is known to play a key role in the integration of complex featural information such as facial movements and vocal sounds, particularly between audiovisual linguistic signals (reviewed in20). Although investigations on the neural mechanism underlying sound symbolism processing are not plentiful even in adults, two previous adult functional magnetic resonance imaging (fMRI) studies that used conventional sound symbolic words in Japanese across different semantic domains have produced results broadly consistent with Ramachandran and Hubbard’s hypothesis, in that both identified the involvement of the right STS area in sound symbolism processing.
In one study, the auditory presentation of Japanese mimetic words for animal sounds (e.g., ka-ka, an onomatopoeia for crow caws) was found to activate the right STS more strongly than the names of the animals (e.g., karasu, the Japanese word for “crow”)21. Another study also identified the activation of the right STS area22 in two different non-auditory semantic domains, i.e., shape and motion. Regardless of the domain, when the word sound matched the referent, the posterior part of the right STS was activated more strongly than when the word sound and the referent were mismatched. Thus, the previous results suggest that the right STS plays a critical role in processing sound symbolism, serving as a hub to integrate language sound and visual information.
Little is known about how sound symbolism is processed in the infant brain or its ontogenesis. Thus far, only one published study has investigated the brain response to sound symbolism in young infants, and it was included in the meta-analysis by Fort et al.18. Asano and colleagues examined the electroencephalography (EEG) responses of 11-month-old infants presented with sound-symbolically matching and mismatching pairs of pseudo-words and visual shapes (e.g., moma for round shapes and kipi for spiky shapes)15. In terms of the event-related potential (ERP) pattern, infants responded differently to the sound-symbolically matching word-shape pairs than mismatching word-shape pairs. The timing and topography were similar to the typical N400 response, which is an index of semantic integration difficulty for both adults23 and infants24,25. Furthermore, the phase synchronization of the neural oscillations (phase locking value, PLV) increased (as compared with the baseline period) significantly more in the mismatch condition than in the match condition, suggesting that cross-modal binding was achieved quickly in the match condition, but that sustained effort was required in the mismatch condition. An additional brain oscillation analysis showed an increase in the early (<200 ms latency) gamma-band oscillations in the match condition compared with the mismatch condition, which was thought to be related to multisensory integration (reviewed in26). Taken together, these results provide some evidence for the hypothesis that infants detect the correspondence between word sounds and referents through spontaneous cross-modal mapping.
However, more direct evidence for this hypothesis is warranted: if early sensitivity to sound-shape symbolism in young infants15,16,17,18 reflects the spontaneous cross-modal mapping ability available before or at the time at which infants start to make conscious efforts in connecting word sounds to their referents, a significant hemodynamic response would be expected in the right TPO junction, especially around the STS, where audio-visual information is integrated (adults27; 3-month-old infants28).
In this study, we investigated this hypothesis using near-infrared spectroscopy (NIRS). We chose to study 11-month-old infants because multiple studies have consistently reported that infants around this age are sensitive to sound symbolism15,17, while the results were unstable for younger infants16,17,29. Another reason is that the participants of the previous EEG study15 were 11-month-old infants. While EEG has high temporal resolution, its spatial resolution is limited; the opposite holds true for NIRS. The previous EEG study15 and the current NIRS study can therefore complementarily address the neural basis of sound symbolism processing in 11-month-old infants. The word-shape pairs used in the present study were identical to those used in our previous infant study15 (refer to Fig. 1A for examples). The stimulus pairs, which were confirmed to elicit sound-symbolic responses in both 11-month-old infants and adults with various linguistic backgrounds15, allowed us to examine the loci of sound symbolism processing in the 11-month-old infant brain.
In each block, the infants were presented with a sequence of three spiky or round visual shapes followed by the novel word moma or kipi, the words used in Asano et al.’s EEG study15. The hemodynamic responses to the matching and the mismatching pairs were contrasted against the responses during baseline (i.e., neutral shapes followed by white noise).
Methods
Participants
All infants were full-term at birth and healthy at the time of the experiment. This study was approved by the Ethics Committee of Chuo University and was conducted in accordance with the Declaration of Helsinki. We received written informed consent from the parents of all infant participants. To obtain a sufficient number of valid trials for the match and mismatch cases for analysis, the participants heard only one of the two word stimuli (i.e., the moma condition or the kipi condition; see the Stimuli section for details). The participants included in the analyses were 22 healthy 11-month-old infants (11 in the moma condition and 11 in the kipi condition; 10 males and 12 females, mean age = 353 days, ranging from 336 days to 386 days). These sample sizes were determined by a power calculation based on a previous fNIRS study30 that revealed the activation of the right STS in 7- and 8-month-old infants in response to a visual-auditory association. An additional 20 infants were excluded because they did not complete a sufficient number of trials that could be included in the analyses (fewer than three trials for either the match or mismatch presentation).
Apparatus
Each infant sat on an experimenter’s lap in an experimental booth throughout the experiment. A 21-inch color cathode ray tube (CRT) display (Sony GDM-F520) was used to present the visual stimuli. The resolution of the CRT was set at 1024 × 768 pixels with an 8-bit color mode. The display was placed in front of the infant at a distance of approximately 40 cm. The infant’s viewing behavior was monitored by a hidden video camera set beneath the CRT display. An experimenter controlled the presentation of the stimulus.
Stimuli
Twenty spiky shapes, twenty rounded shapes, and twenty neutral shapes, which were neither spiky nor rounded, were prepared. The shapes were drawn with black lines on a white background. In each trial, infants were presented with a sequence of three different spiky or rounded shapes. Each shape was presented for 2 s; therefore, each trial lasted 6 s (Fig. 1A). Two nonsense words, kipi and moma, recorded by a Japanese female (400 ms in duration), were used as the auditory stimuli in the two conditions, respectively. In the moma condition, the auditory stimulus “moma” was presented 200 ms after the onset of a visual shape, either a round shape (matching) or a spiky shape (mismatching) (Fig. 1B). Likewise, in the kipi condition, the auditory stimulus “kipi” was presented after the onset of the same shapes. These stimulus pairs were the same as in our previous infant sound symbolism study15, in which 11-month-old Japanese infants were found to show different EEG responses to sound-symbolically matched and mismatched word-shape pairs (see Discussion for possible issues related to the low-level acoustic features of the auditory stimuli). During the baseline period, a sequence of neutral shapes (Fig. 1A) was presented, followed by 400 ms of white noise. The duration of each visual shape and the onset timing of the auditory stimulus were identical to those in the test trials. The baseline period continued until two criteria were met: (i) the duration was >10 s; and (ii) the infant continued to look at the baseline stimuli during the last 2 s. To confirm that the baseline shapes were neutral between the spiky and the round shapes, 16 adults (mean age = 26 years, range = 20–38, SD = 4.8, 10 females) rated the associations between the nonsense words (i.e., “kipi” and “moma”) and the three types of shapes. The results revealed no significant difference between the ratings of the “kipi” sound and the “moma” sound against the neutral shapes (t(15) = 1.696, n.s.).
Furthermore, the dispersion of the ratings was similar across the three shape types (Levene’s test: F(1,30) = 0.209, n.s.). Therefore, any effect found in the results should not stem from the shape variability within each shape type.
Procedure
Each infant was tested while sitting on an experimenter’s lap facing a CRT monitor placed 40 cm away from the chair. The infants watched the stimuli passively while brain activity was recorded. The infants were allowed to watch the stimuli as long as they were willing to do so. Their behavior was recorded on videotape during the experiment.
The NIRS instrument
We used a Hitachi ETG-4000 system (Hitachi Medical, Chiba, Japan), which recorded NIRS from 24 channels simultaneously; 12 channels were for recording the right temporal region and 12 were for the left temporal region. The instrument generated two wavelengths of near-infrared light (695 and 830 nm) and measured the time courses of the levels of oxyhemoglobin (oxy-Hb), deoxyhemoglobin (deoxy-Hb), and total-hemoglobin (total-Hb) concentrations in each channel with a 0.1 s time resolution. We used NIRS sensor probes that were developed for infants (Hitachi Medical, infant probe 3 × 3 mode). These probes were lighter in weight and had softer skin contact than other probe types. Most of the infant participants appeared comfortable during the experiments. We used a pair of sensor probes, each of which contained nine optical fibers (3 × 3 arrays). Of the nine fibers, five were used to emit infrared light, and four were used to detect the scatter of the infrared light through the brain tissue. The optical fibers of each probe were mounted on a soft silicone holder. The emitter and detector fibers were separated by 2 cm. Each pair of adjacent emitting and detecting fibers was assigned to a single measurement channel, which allowed for the measurement of hemodynamic changes at each of the 12 channels in each hemisphere. In each hemisphere, the centers of the probes were placed at the locations of electrodes T3 and T4 as defined by the International 10–20 electrode system. When the probes were positioned, the experimenter confirmed that the fibers were touching the infant’s scalp correctly. The NIRS system automatically evaluated whether the contact was adequate to measure the emerging photons in each channel after the scattering and refraction of infrared light under the scalp.
Data analysis of the NIRS measurements
By examining the infants’ behavior recorded on the videotape, we excluded from the analysis the trials during which the infant looked away from the visual stimulus or became fussy. In addition, we excluded the trials during which infants looked back at the experimenter during the preceding baseline period, as well as the trials with movement artifacts, which were identified automatically by a computer program that detected sharp changes in the time series of the raw NIRS data. The raw oxy-Hb data from the individual channels were digitally bandpass-filtered at 0.02–1 Hz to remove longitudinal signal drifts and the noise from the NIRS system. Next, the mean concentration value of each channel within each participant was calculated by averaging the data across the trials in a time series from 2 s before trial onset to 6 s after the end of the trial, recorded with a time resolution of 0.1 s. Using the mean concentrations in the time series, we normalized the oxy-Hb concentration during the matching and mismatching presentations for each channel within each participant by calculating Z-scores against the hemodynamic response during the last 2 s of the baseline. The Z-scores (z) were calculated by subtracting the mean concentration of the last 2 s of the baseline (μ2) from the concentration (μ1) at each time point during the stimulus presentation and then dividing this difference by the standard deviation of the concentration during the last 2 s of the baseline (σ), as follows:

z = (μ1 − μ2) / σ
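The per-channel preprocessing described above (band-pass filtering of the raw signal, then Z-scoring the trial-averaged time series against the last 2 s of the baseline) can be sketched as follows. This is an illustrative reconstruction with hypothetical function names and synthetic data, not the authors’ analysis code; filter order and design are assumptions.

```python
import numpy as np
from scipy.signal import butter, filtfilt

FS = 10.0  # NIRS sampling rate: 0.1-s time resolution -> 10 Hz

def bandpass(raw, low=0.02, high=1.0, fs=FS):
    """Band-pass filter one channel's raw oxy-Hb series (0.02-1 Hz).

    A 3rd-order Butterworth filter is an assumption; the paper only
    specifies the pass band.
    """
    b, a = butter(3, [low / (fs / 2), high / (fs / 2)], btype="band")
    return filtfilt(b, a, raw)  # zero-phase filtering

def baseline_z(trial_mean, fs=FS, baseline_s=2.0):
    """Z-score a trial-averaged time series against its baseline.

    `trial_mean` is assumed to start 2 s before stimulus onset, so its
    first `baseline_s` seconds are the last 2 s of the baseline period:
    z = (mu1 - mu2) / sigma for each time point.
    """
    n_base = int(baseline_s * fs)
    mu2 = trial_mean[:n_base].mean()
    sigma = trial_mean[:n_base].std()
    return (trial_mean - mu2) / sigma
```

Averaging `baseline_z` outputs across participants would then yield the group time courses plotted in Fig. 2.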
The difference in the signal from the last 2 s in the baseline was statistically tested. Our null hypothesis was that the brain activities in infants during the presentation of sound-symbolically matching and mismatching novel word-visual shape pairs are identical.
Results
We measured the hemodynamic responses in the temporal regions (Fig. 2). As the absolute concentration values of oxyhemoglobin (oxy-Hb) differ substantially between participants, we applied Z normalization to the oxy-Hb concentrations for each channel within each participant on the basis of the mean concentration in the time series. We first compared the responses (Z-scores), averaged across the 12 channels in each of the left and right temporal regions, against the baseline to assess whether the hemodynamic response was modulated by the sound-symbolic correspondence between the word sound and the shape. Figure 2A,B present the time course of responses to the matching and mismatching sound-symbolic pairs. Upon visual inspection, when the sound matched the shape in the moma condition, the increase in the concentration of oxy-Hb in the right temporal region appeared to be much larger than in the left temporal region (Fig. 2A; results of deoxy-Hb and total-Hb changes are shown in Fig. 3). In contrast, such a difference between the hemispheres was not found when the mismatching pairs were presented. However, in the kipi condition, as opposed to the moma condition, no obvious increase in the hemodynamic response was observed for either the matching or the mismatching pairs (Fig. 2B; results of deoxy-Hb and total-Hb changes are shown in Fig. 4).
To select a time window for further statistical analysis, we segmented the NIRS data into 2-second bins from stimulus onset to 10 s following stimulus presentation, and conducted two-tailed one-sample t-tests with a null hypothesis that hemodynamic responses during the experimental trials (word sound-shape match or mismatch) are not different from those during the baseline (i.e., white noise) period. The results revealed a significant increase in the oxy-Hb concentration in the 0–2 s, 2–4 s, and 4–6 s time bins [t(10) = 4.22, p < 0.01, d = 1.27; t(10) = 5.16, p < 0.01, d = 1.56; t(10) = 4.87, p < 0.01, d = 1.46, respectively; Bonferroni-corrected] in the moma condition during the presentation of the matching pairs, but not during the presentation of the mismatching pairs. In the kipi condition, no increase was detected in any of the time windows. Given the significant increase in the oxy-Hb concentration observed from 0 to 6 s after onset, we averaged the Z-scores over this time window (means and SDs of the responses in each time bin are provided in Table 1) to test whether the hemodynamic response was modulated by the sound-symbolic correspondence of the word-shape pairs.
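The time-window selection above amounts to a one-sample t-test per 2-s bin with Bonferroni correction across bins. A minimal sketch with synthetic stand-in data follows; the function name and array layout are illustrative assumptions, not the study’s records.

```python
import numpy as np
from scipy.stats import ttest_1samp

FS = 10      # samples per second (0.1-s resolution)
BIN_S = 2    # bin width in seconds
N_BINS = 5   # five 2-s bins spanning 0-10 s after stimulus onset

def bin_ttests(zscores, n_bins=N_BINS):
    """zscores: (n_infants, n_samples) array, time 0 = stimulus onset.

    For each 2-s bin, average within the bin per infant, then run a
    two-tailed one-sample t-test against 0 (i.e., no change from
    baseline). Returns Bonferroni-corrected p-values per bin.
    """
    pvals = []
    step = BIN_S * FS
    for b in range(n_bins):
        per_infant = zscores[:, b * step:(b + 1) * step].mean(axis=1)
        t, p = ttest_1samp(per_infant, 0.0)
        pvals.append(min(p * n_bins, 1.0))  # Bonferroni correction
    return np.array(pvals)
```

Bins whose corrected p-value falls below .05 would then define the averaging window (0–6 s in the moma matching condition).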
A repeated-measures ANOVA with three factors was applied to the oxy-Hb data, with word type (moma vs. kipi), congruency (match vs. mismatch), and hemisphere (left vs. right) as the factors. This analysis revealed significant main effects of congruency [F(1,20) = 5.254, p < 0.05, η2 = 0.065] and hemisphere [F(1,20) = 5.671, p < 0.05, η2 = 0.022] and a significant two-way interaction of congruency and hemisphere [F(1,20) = 4.715, p < 0.05, η2 = 0.019]. The main effect of word type [F(1,20) = 0.114, n.s.] and the three-way interaction [F(1,20) = 1.954, n.s.] did not reach statistical significance. To explore the effect of word type suggested by the time course of responses (Fig. 2A,B), we conducted separate repeated-measures ANOVAs for the moma and kipi conditions. The analysis of the moma condition yielded a significant congruency × hemisphere interaction [F(1,10) = 5.313, p < 0.05, η2 = 0.069], as well as significant main effects of congruency [F(1,10) = 8.420, p < 0.05, η2 = 0.248] and hemisphere [F(1,10) = 5.313, p < 0.05, η2 = 0.069]. The hemodynamic response to the sound-symbolically matched pairs was stronger than to the mismatched pairs in the right temporal region [t(10) = 4.038, p < 0.01, d = 0.829], but no difference in response was observed in the left temporal region [t(10) = 1.059, n.s.]. In contrast, no significant main effect or interaction was found in the kipi condition [main effect of congruency: F(1,10) = 0.243, n.s.; main effect of hemisphere: F(1,10) = 2.314, n.s.; interaction: F(1,10) = 0.373, n.s.]. Thus, consistent with our hypothesis, a reliable response to the sound-symbolically matching pairs was found in the right temporal region, but the effect was limited to the “moma” sound.
To further pinpoint the cortical regions relevant to the processing of sound symbolism, we examined which NIRS channels exhibited a significant oxy-Hb signal increase. Multiple t-test comparisons revealed that channels 14, 15, 16, 20, and 23 showed a significant hemodynamic response in the moma condition during the presentation of the matching sound-shape pairs [ch14: t(10) = 4.75, d = 1.93; ch15: t(10) = 2.56, d = 1.06; ch16: t(10) = 4.00, d = 1.89; ch20: t(10) = 2.85, d = 1.16; ch23: t(10) = 3.35, d = 1.37; false discovery rate-adjusted p value < 0.05]. Again, in the kipi condition, no channels showed an increase from the baseline. According to the estimation of the correspondence between the channel positions in the International 10–20 EEG system and their anatomical loci31, these activated channels were near the right superior temporal region, which is in close topographic proximity to the key region for sound symbolism processing in adults22.
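The channel-wise screening above pairs a one-sample t-test per channel with a false-discovery-rate adjustment. A minimal sketch of this procedure, assuming the standard Benjamini-Hochberg step-up adjustment (the paper does not specify which FDR procedure was used) and synthetic per-channel Z-scores:

```python
import numpy as np
from scipy.stats import ttest_1samp

def fdr_bh(pvals):
    """Benjamini-Hochberg adjusted p-values (step-up procedure)."""
    p = np.asarray(pvals, dtype=float)
    n = len(p)
    order = np.argsort(p)
    adj = p[order] * n / np.arange(1, n + 1)
    adj = np.minimum.accumulate(adj[::-1])[::-1]  # enforce monotonicity
    out = np.empty(n)
    out[order] = np.minimum(adj, 1.0)
    return out

def significant_channels(channel_z, alpha=0.05):
    """channel_z: (n_infants, n_channels) mean Z-scores per channel.

    Returns indices of channels whose FDR-adjusted p-value is < alpha.
    """
    p = np.array([ttest_1samp(channel_z[:, c], 0.0).pvalue
                  for c in range(channel_z.shape[1])])
    return np.where(fdr_bh(p) < alpha)[0]
```

Mapping the surviving channel indices to anatomical loci would then rely on the 10–20-based estimation cited in the text31.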
Discussion
The results of this research revealed that the superior temporal region, which is known to be a cross-modal integration area, plays a key role in sound symbolism processing in 11-month-old infants, as it does in adults. Infants showed increased hemodynamic responses to sound-symbolically matched round/moma-type novel word-visual shape pairs in the right superior temporal area, while no such response was observed for spiky/kiki-type or sound-symbolically mismatched pairs. The hypothesis that sound symbolism scaffolds lexical development3 presupposes the ability of infants to detect the correspondence between a word sound and a referent through spontaneous cross-modal mapping prior to commencing active efforts to associate linguistic sounds with their referents. Our findings provide additional support for this hypothesis, although the results suggest that not all sound symbolism that adults sense may be detectable by prelinguistic infants18. More importantly, this research identified the brain loci for sound symbolism, which is an important step toward uncovering the neural mechanism underlying sound symbolism sensitivity.
The modulation of hemodynamic responses by sound-symbolic correspondence in infants in the moma-sound trials was lateralized to the right temporal region. This result is consistent with the findings of the previous adult fMRI studies that identified the involvement of the right STS area in sound symbolism processing21,22. As it was reported that the processing of linguistic sounds is lateralized to the left temporal lobe while that of non-linguistic sounds (e.g., animal sounds, environmental sounds) is lateralized to the right temporal lobe32,33, the authors of the adult fMRI studies argued that sound symbolic words have properties of both linguistic symbols and non-linguistic iconic symbols and are processed correspondingly. Combined with the findings that the STS is involved in audio-visual perceptual integration in infants as well as in adults27,28, the right superior temporal area is predicted to be the key structure for the detection of sound-meaning correspondence in prelinguistic infants.
The attempt to draw structural analogies between the brains of infants and adults may cause concern, as the former is in a state of continuous growth. In fact, previous research investigating the structural changes that occur in the brain from infancy to adulthood has suggested that maturation of the association cortex, which includes the STS area, occurs much later than that of the cortical areas for the individual sensory modalities34. Despite this concern, several studies have reported the involvement of a cortical region analogous to the adult STS area in infant multisensory processing as well as in face processing. First, a previous fNIRS study28 showed that audio-visual multisensory events triggered significant activation in a global network of cortical areas, including the temporal areas, in 3-month-old infants. Second, functional connectivity (resting-state networks) between the STS area and the visual (MT, V4), auditory (A1), and somatosensory cortices has been found to exist in the neonatal brain35. Thus, the neural processing of multisensory integration may involve the STS from a very early postnatal age. Additionally, previous NIRS studies have shown that activation of the right STS occurs in 7- and 8-month-old infants in response to face stimuli, similar to that typically observed in the adult brain36,37,38,39,40,41.
One unanticipated finding was the lack of hemodynamic change from baseline in the left temporal area in both the sound-symbolically matching and mismatching conditions. Previous brain imaging (fMRI and NIRS) studies have reported that infants as young as 3 to 4 months of age recruit the left temporal areas for processing speech sounds, including pseudo-words42,43,44. Thus, it was somewhat unexpected that the presentation of novel speech sounds did not increase activation above the level observed for white noise in the left temporal region. One possible reason is that the duration of the sound stimulus was substantially shorter in our study (400 ms) than in the previous studies that reported activation of the left temporal region (700–12,000 ms)42,43,44. It could be that 400 ms was too short for infants of this age to invoke language processing. However, the duration of the stimulus used in our study was sufficient to invoke the response in the right STS, which strengthens the conclusion that the increased activation in the right STS reflected the perceptual cross-modal integration between vision and audition.
Another point of discussion is the finding of sound-symbolic effects for the “moma” stimulus but not for the “kipi” stimulus. Although it is difficult to specify the reason for this asymmetry, the result is strikingly consistent with the findings of a recent meta-analysis on sensitivity to sound symbolism in infants by Fort and colleagues18. Across the 11 studies examined, the meta-analysis found a greater sensitivity to sound symbolism for bouba-type pseudo-words than for kiki-type pseudo-words in infants, and found that sensitivity to the latter word type emerges later: children between 4 and 15 months of age showed a lower sensitivity to kiki-type pseudo-words than children between 25 and 28 months of age. The asymmetry between the sensitivity to the bouba-round and the kiki-spiky correspondence may arise because the sound-symbolic correspondence is subtler for the latter than for the former. The results of the meta-analysis and the current study suggest two possibilities. The first possibility is that infants of about 11 months of age are not sensitive to sound symbolism for kiki-type pseudo-words (i.e., they show only a “round-moma” correspondence effect rather than the full bouba/kiki effect). Perhaps the “round-moma” combination was easier to map than the “spiky-kipi” correspondence in the infants’ brain, and the latter sound symbolism must await further brain maturation or more exposure to linguistic input. Fort et al.18 suggested several possible reasons for the absence of the “spiky-kiki” correspondence effect, including the possibilities that infants prefer specific acoustic and/or visual features (e.g., low-frequency bouba-type sounds and/or curved objects) over others (e.g., high-frequency kiki-type sounds and/or angular objects), or that infants have more experience with round objects and/or bouba-type words than with spiky objects and/or kiki-type words in their direct perceptual environment18.
The absence of the “spiky-kiki” effect in the present study could also have been due to the low-level acoustic features of the auditory stimuli. The sound used in this study, “kipi,” contains a vowel repetition, while “moma” contains a consonant repetition. As previous research has revealed that infants respond differently to consonants and vowels45, the differences between the acoustic structures of the two auditory stimuli might have led to the asymmetric results.
Alternatively, it may be possible that the 11-month-old infants enrolled in our study were actually sensitive to sound symbolism for kiki-type pseudo-words, but that this sensitivity was not reflected in the experimental results due to confounding methodological factors. The temporal duration of “bouba/moma” is longer than that of “kiki/kipi”, and thus a stronger and temporally more sustained cross-modal mapping process should be induced by the former. The temporal resolution of testing methods like NIRS and preferential looking, which was used in most of the studies included in the meta-analysis18, might be too low to capture the transient sound-symbolic responses induced by “kiki/kipi” sounds. In fact, we observed sound-symbolic responses in both moma- and kipi-sound trials in our previous EEG study with 11-month-old infants15; the effect size of sound symbolism, calculated from the mean ERP amplitudes in the sound-symbolically matched and mismatched conditions, was larger in the kipi-sound trials (Hedge’s g = 0.59) than in the moma-sound trials (g = 0.11) [see the additional analyses for18 available online]. Since EEG provides high temporal resolution, this observation is consistent with the possibility that the lack of the sound-symbolic effect was due to the methodological properties of NIRS (as well as the preferential looking paradigm) rather than a particular sound property of the word. Of course, this last possibility is speculative, and we do not argue that it is more tenable than the other two; further research is required to verify these possibilities. Nevertheless, the fact that 11-month-old infants showed sound-symbolic responses in both moma- and kipi-sound trials in our previous EEG study15 suggests that 11-month-old infants may be sensitive to spiky-kiki type sound symbolism.
It also suggests that the asymmetric results of the current fNIRS study may not be attributable solely to the differences between the acoustic structures of the “kipi” and “moma” sounds. To disambiguate these possibilities, future studies would benefit from a larger pool of novel words in which consonants and vowels are systematically combined, in order to exclude effects specific to the acoustic characteristics of the auditory stimuli.
One final remark pertains to the relevance of the present result to the neonatal synesthesia theory previously proposed by some theorists6,46. Our result suggests that some sound-referent correspondences are processed as spontaneous multimodal mappings, and in this sense are biologically based. However, it is premature to interpret the present result as evidence for or against the neonatal synesthesia theory. First, our participants were not neonates, so we cannot directly speak to this theory. Furthermore, it is not clear whether the theory predicts that infants would detect, without learning, every sound symbolism that adults sense. As noted earlier, infants may be more sensitive to particular types of sound-referent correspondences than to others prior to language learning, acquiring the remaining types later through language experience. Whether this possibility counters the neonatal synesthesia theory is uncertain, and the question is beyond the scope of the present research.
In any case, the fact that we obtained hemodynamic changes for one type (moma) of sound-referent correspondence that infants have been reported to be sensitive to in previous studies18 shows that our experimental method was valid for assessing sensitivity to sound symbolism in young infants. Although prelinguistic infants may not be sensitive to all the sound symbolism that adults sense, what they do sense is processed in the right posterior temporal area, where adults process sound-symbolic words. This finding is an important first step towards understanding the neural mechanism of sound symbolism processing, as well as the ontogenesis of sound symbolism, although substantial future work is needed.
References
De Saussure, F. Course in General Linguistics (eds Bally, C. & Sechehaye, A.; trans. Harris, R.) (Open Court, La Salle, IL, 1983) (Original work published 1916).
Blasi, D. E., Wichmann, S., Hammarström, H., Stadler, P. F. & Christiansen, M. H. Sound–meaning association biases evidenced across thousands of languages. Proc Natl Acad Sci USA 113(39), 10818–10823 (2016).
Imai, M. & Kita, S. The sound symbolism bootstrapping hypothesis for language acquisition and language evolution. Philos Trans R Soc Lond B 369(1651), 20130298 (2014).
Sapir, E. A study in phonetic symbolism. J Exp Psychol 12(3), 225–239 (1929).
Köhler, W. Gestalt psychology (New York: Liveright) (1929/1947).
Ramachandran, V. S. & Hubbard, E. M. Synaesthesia – a window into perception, thought and language. J Conscious Stud 8(12), 3–34 (2001).
D’Odorico, L., Carubbi, S., Salerni, N. & Calvo, V. Vocabulary development in Italian children: A longitudinal evaluation of quantitative and qualitative aspects. J Child Lang 28(2), 351–372 (2001).
Toda, S., Fogel, A. & Kawai, M. Maternal speech to three-month-old infants in the United States and Japan. J Child Lang 17(2), 279–294 (1990).
Fernald, A. & Morikawa, H. Common themes and cultural variations in Japanese and American mothers’ speech to infants. Child Dev 64(3), 637–656 (1993).
Imai, M. et al. Sound symbolism facilitates word learning in 14-month-olds. PLoS One 10(2), e0116494 (2015).
Imai, M., Kita, S., Nagumo, M. & Okada, H. Sound symbolism facilitates early verb learning. Cognition 109(1), 54–65 (2008).
Kantartzis, K., Imai, M. & Kita, S. Japanese sound-symbolism facilitates word learning in English-speaking children. Cogn Sci 35(3), 575–586 (2011).
Yoshida, H. A cross-linguistic study of sound symbolism in children’s verb learning. J Cogn Dev 13(2), 232–265 (2012).
Peña, M., Mehler, J. & Nespor, M. The role of audiovisual processing in early conceptual development. Psychol Sci 22(11), 1419–1421 (2011).
Asano, M. et al. Sound symbolism scaffolds language development in preverbal infants. Cortex 63, 196–205 (2015).
Ozturk, O., Krehm, M. & Vouloumanos, A. Sound symbolism in infancy: evidence for sound–shape cross-modal correspondences in 4-month-olds. J Exp Child Psychol 114(2), 173–186 (2013).
Pejovic, J. & Molnar, M. The development of spontaneous sound-shape matching in monolingual and bilingual infants during the first year. Dev Psychol 53(3), 581–586 (2017).
Fort, M. et al. Symbouki: a meta-analysis on the emergence of sound symbolism in early language acquisition. Dev Sci e12659 (2018).
Corbetta, M., Patel, G. & Shulman, G. L. The reorienting system of the human brain: from environment to theory of mind. Neuron 58(3), 306–324 (2008).
Calvert, G. A. Crossmodal processing in the human brain: insights from functional neuroimaging studies. Cereb Cortex 11(12), 1110–1123 (2001).
Hashimoto, T. et al. The neural mechanism associated with the processing of onomatopoeic sounds. Neuroimage 31(4), 1762–1770 (2006).
Kanero, J., Imai, M., Okuda, J., Okada, H. & Matsuda, T. How sound symbolism is processed in the brain: a study on Japanese mimetic words. PLoS One 9(5), e97905 (2014).
Kutas, M. & Federmeier, K. D. Thirty years and counting: finding meaning in the N400 component of the event-related brain potential (ERP). Annu Rev Psychol 62, 621–647 (2011).
Friedrich, M. & Friederici, A. D. Lexical priming and semantic integration reflected in the event-related potential of 14-month-olds. Neuroreport 16(6), 653–656 (2005).
Parise, E. & Csibra, G. Electrophysiological evidence for the understanding of maternal speech by 9-month-old infants. Psychol Sci 23(7), 728–733 (2012).
Senkowski, D., Schneider, T. R., Foxe, J. J. & Engel, A. K. Crossmodal binding through neural coherence: implications for multisensory processing. Trends Neurosci 31(8), 401–409 (2008).
Beauchamp, M. S., Argall, B. D., Bodurka, J., Duyn, J. H. & Martin, A. Unraveling multisensory integration: patchy organization within human STS multisensory cortex. Nat Neurosci 7(11), 1190–1192 (2004).
Watanabe, H. et al. Effect of auditory input on activations in infant diverse cortical regions during audiovisual processing. Hum Brain Mapp 34(3), 543–565 (2013).
Fort, M., Weiß, A., Martin, A., & Peperkamp, S. Looking for the bouba-kiki effect in prelexical infants. Presented at the 12th International Conference on Auditory-Visual Speech Processing, Annecy, 71–76 (2013).
Ujiie, Y., Yamashita, W., Fujisaki, W., Kanazawa, S. & Yamaguchi, M. K. Crossmodal association of auditory and visual material properties in infants. Scientific reports 8(1), 9301 (2018).
Okamoto, M. et al. Three-dimensional probabilistic anatomical cranio-cerebral correlation via the international 10–20 system oriented for transcranial functional brain mapping. Neuroimage 21(1), 99–111 (2004).
Specht, K. & Reul, J. Functional segregation of the temporal lobes into highly differentiated subsystems for auditory perception: an auditory rapid event-related fMRI-task. Neuroimage 20(4), 1944–1954 (2003).
Thierry, G., Giraud, A. L. & Price, C. Hemispheric dissociation in access to the human semantic system. Neuron 38(3), 499–506 (2003).
Gogtay, N. et al. Dynamic mapping of human cortical development during childhood through early adulthood. Proc Natl Acad Sci USA 101(21), 8174–8179 (2004).
Sours, C. et al. Cortical multisensory connectivity is present near birth in humans. Brain Imaging Behav 11(4), 1207–1213 (2017).
Ichikawa, H., Kanazawa, S., Yamaguchi, M. K. & Kakigi, R. Infant brain activity while viewing facial movement of point-light displays as measured by near-infrared spectroscopy (NIRS). Neurosci Lett 482(2), 90–94 (2010).
Kobayashi, M., Otsuka, Y., Kanazawa, S., Yamaguchi, M. K. & Kakigi, R. Size-invariant representation of face in infant brain: fNIRS-adaptation study. Neuroreport 23(17), 984–988 (2012).
Honda, Y. et al. How do infants perceive scrambled face?: A near-infrared spectroscopic study. Brain Res 1308, 137–146 (2010).
Nakato, E. et al. When do infants differentiate profile face from frontal face? A near-infrared spectroscopic study. Hum Brain Mapp 30(2), 462–472 (2009).
Nakato, E., Otsuka, Y., Kanazawa, S., Yamaguchi, M. K. & Kakigi, R. Distinct differences in the pattern of hemodynamic response to happy and angry facial expression in infants – a near-infrared spectroscopic study. Neuroimage 54(2), 1600–1606 (2011).
Otsuka, Y. et al. Neural activation to upright and inverted faces in infants measured by near infrared spectroscopy. Neuroimage 34(1), 399–406 (2007).
Dehaene-Lambertz, G. et al. Language or music, mother or Mozart? Structural and environmental influences on infants’ language networks. Brain Lang 114(2), 53–65 (2010).
Minagawa-Kawai, Y. et al. Optical brain imaging reveals general auditory and language-specific processing in early infant development. Cereb Cortex 21(2), 254–261 (2011).
Homae, F., Watanabe, H. & Taga, G. The neural substrates of infant speech perception. Lang Learn 64, 6–26 (2014).
Bonatti, L. L., Peña, M., Nespor, M. & Mehler, J. Linguistic constraints on statistical computations: The role of consonants and vowels in continuous speech processing. Psychol Sci 16(6), 451–459 (2005).
Spector, F. & Maurer, D. Synesthesia: A new approach to understanding the development of perception. Dev Psychol 45(1), 175–189 (2009).
Acknowledgements
Special thanks to the infants and their parents for their kindness and cooperation. This research was supported by a Grant-in-Aid for JSPS Research Fellow (16J05067 to J.Y.), Grant-in-Aid for Scientific Research on Innovative Areas “Shitsukan” (16H01677 to M.K.Y. and 18H05014 to S.K.) and Grant-in-Aid for Scientific Research on Innovative Areas “Evolinguistics” (18H05084 to M.I.) both from the Ministry of Education, Culture, Sports, Science and Technology, Japan, and Grant-in-Aid for Scientific Research from the JSPS (19K23388 to J.Y., 26285167 to M.K.Y. and 16H01928 to M.I.).
Author information
Contributions
M. Imai developed the study concept. All authors contributed to the study design. Testing and data collection were performed by J. Yang. J. Yang and M. Asano performed the data analysis and interpretation under the supervision of S. Kanazawa, M.K. Yamaguchi and M. Imai. J. Yang and M. Asano drafted the manuscript, and S. Kanazawa, M.K. Yamaguchi and M. Imai provided critical revisions. All authors approved the final version of the manuscript for submission.
Ethics declarations
Competing Interests
The authors declare no competing interests.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.
Cite this article
Yang, J., Asano, M., Kanazawa, S. et al. Sound symbolism processing is lateralized to the right temporal region in the prelinguistic infant brain. Sci Rep 9, 13435 (2019). https://doi.org/10.1038/s41598-019-49917-0