Colours’ Impact on Morality: Evidence from Event-related Potentials

Black and white have been shown to be representations of moral concepts. The purpose of this study was to investigate whether colours other than black and white have similar effects on words related to morality and to determine the time course of these effects. We presented moral and immoral words in three colours (red, green and blue) in a Moral Stroop task and used the event-related potential (ERP) technique to identify the temporal dynamics of the impact of colours on moral judgement. The behavioural results showed that people took longer to judge immoral words than moral words when the words were green, whereas this difference was not significant when the words were red or blue. The ERP results revealed the time course of these effects: three stages were identified from the significant effects on the P200, N300 and LPC components. These findings suggest a metaphorical association between the colour green and moral information.

are often afraid of darkness at night, and the stories they read connect the colour black to evil and monsters. Thus, when they see black, bad and immoral concepts are evoked. The opposite is true for the colour white.
In addition to black and white, other colours convey metaphorical information about morality. For example, when referring to traffic lights, it is universally known that a red light means stop and a green light means go. Red can be an indication of a serious warning, whereas green represents peaceful, healthy and permissible action. In moral judgements, we need to know what is right or wrong. Therefore, we predict that there are metaphorical associations between these colours and moral concepts: green may be associated with morality, and red may be associated with immorality. We expect the effects of these colours on moral judgement to differ. This concept has not previously been explored.
Despite the considerable interest in metaphor and the corresponding abundance of prior studies, the neural correlates of the effect of colours on morality are poorly understood. Recently, neuroscientists demonstrated that many factors are involved in moral processes (such as emotion [28][29][30][31], culture 32, and individual dispositions 33) and identified the complex neural mechanisms underlying these factors. However, the neural mechanism responsible for the metaphorical connection between colours and moral concepts remains unknown. Our study attempts to address this gap. Event-related potentials (ERPs) enable researchers to follow the temporal course of moral processing because of their millisecond temporal resolution. Previous ERP studies identified several components related to moral and colour processing. The first component is P200, a positive wave that peaks approximately 200 ms after stimulus onset. Some studies have revealed that P200 is connected to automatic attention attribution in early time windows 34,35. Negative information, such as immoral stimuli, consistently elicits a larger P200 with a shorter latency than neutral and positive stimuli do [36][37][38][39]. The second component is N300, a negative component present in the time window approximately 300 ms after target word presentation 34,40. Bramão et al. suggested that this component is crucial for the integration of shape and colour information, which can be used to access the structural and semantic representations of an object in long-term memory 40. The third component is the late positive component (LPC), which a large body of ERP studies has reported in a late time window from 300 to 800 ms post-stimulus. This component was related to mental resource distribution 41 and was found to reflect controlled and elaborative processes, such as moral evaluation and reasoning 31,42.
The previous literature has demonstrated the effect of the colours black and white on morality. However, other colours may also influence moral processes, although their effect remains unknown. The neural correlates of these effects are poorly understood. The aim of this study was to investigate the effects of colour on morality and to explore the way in which these effects occur. We used a variant of the Stroop task (the Moral Stroop task), which has been employed by other researchers. In this type of Stroop task, the experimenter asks the participants to judge the different types or valences of the words instead of the colour 20,26 to evaluate the way colour influences processing in the judgement of moral words. We conducted the Moral Stroop task with ERPs to explore the neural correlates of the effects of colours on moral judgement. We predicted the following: the cognitive speed of moral judgement would be faster when words in green referred to morality rather than immorality; when words in red referred to immorality, the cognitive speed would be faster than when the words referred to morality; and this phenomenon would not exist for the colour blue. ERPs reveal the time course of these effects. According to previous studies of ERPs, P200 can indicate attention attribution at an early stage 34,37 . We expected that P200 could be helpful in distinguishing moral words from immoral words at an early processing stage. As N300 may be related to the semantic integration of visual features 40 , we expected a significant influence of colour on moral judgement during the time window of N300. Given that LPC may reflect the elaborative processing and moral reasoning in late time windows 31,42 , we predicted that the interaction between colours and morality would be significant for the LPC amplitude, which would reveal the specific distinctions of morality in different colours.

Results
Behavioural Results. We excluded incorrect trials as well as trials with implausibly long or short reaction times (RTs that deviated by more than ±3 standard deviations from the mean) from the analysis. In total, 4.88% of the trials were removed (3.04% incorrect and 1.84% excessively long/short response time). The accuracy and reaction time, not including the incorrect and extreme trials of each condition, are shown in Table 1. The two-way repeated-measures ANOVA of moral types and colours revealed a significant main effect of the moral type (F(1, 24) = 10.99, p < 0.01, ηp² = 0.31); responses to immoral words were longer than responses to moral words. The main effect of colours was also significant (F(2, 48) = 5.25, p < 0.01, ηp² = 0.18); the responses to green words were longer than responses to red and blue words. Most importantly, the interaction between moral types and colours reached significance (F(2, 48) = 7.33, p < 0.01, ηp² = 0.23). A post-hoc analysis showed that in conditions in which the words were green, the reaction time to immoral words was significantly longer than it was to moral words. However, this difference was not significant in conditions in which the words were red or blue. The positive and negative affect scale (PANAS) result suggested that the evaluation of emotion before and after the Moral Stroop task did not differ (see Table 2).
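The two exclusion steps described above (dropping incorrect trials, then dropping RTs beyond ±3 standard deviations from the mean) can be sketched as follows. This is a minimal illustration, not the authors' analysis script: the `(rt_ms, is_correct)` trial layout, the function name, the use of the sample standard deviation, and the pooling of all correct trials into one mean are assumptions, since the text does not state whether the cut-off was computed per participant or per condition.

```python
from statistics import mean, stdev

def trim_reaction_times(trials, n_sd=3.0):
    """Keep RTs from correct trials that lie within n_sd SDs of the mean.

    `trials` is a list of (rt_ms, is_correct) pairs (hypothetical layout).
    """
    # Step 1: drop incorrect responses.
    correct_rts = [rt for rt, is_correct in trials if is_correct]
    # Step 2: drop RTs beyond mean +/- n_sd * SD (sample SD is an assumption).
    m, sd = mean(correct_rts), stdev(correct_rts)
    lower, upper = m - n_sd * sd, m + n_sd * sd
    return [rt for rt in correct_rts if lower <= rt <= upper]

# Illustrative data only: 30 typical RTs, one extreme RT, one error trial.
example_trials = [(500 + (i % 10), True) for i in range(30)]
example_trials += [(5000, True), (480, False)]
kept = trim_reaction_times(example_trials)
```

In this toy data set the 5000 ms response and the incorrect trial are removed and the 30 typical responses are retained.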
Latency of N300. The ANOVA for the latencies of N300 revealed a significant main effect of the areas (F(4, 96) = 11.75, p < 0.001, ηp² = 0.33), with the shortest N300 latency measured in the parietal area (295 ± 21 ms). The main effect of colours was significant (F(2, 48) = 3.72, p = 0.031, ηp² = 0.13). The N300 latency of the red words (299 ± 20 ms) was shorter than that of the green words (305 ± 20 ms). Most importantly, the interaction of the areas, colours and moral types was significant (F(8, 192) = 2.88, p = 0.005, ηp² = 0.11). The post-hoc tests showed that in the centro-parietal area, the N300 latencies of immoral words were longer than those of moral words only when the words were green.
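The text does not state how component latencies were measured. One common approach, shown here purely as an assumption, is to take the time of the most negative (or most positive) sample inside the component's time window. The sketch below uses a synthetic waveform sampled every 2 ms (matching the 500 Hz sampling rate reported in the Methods) and the 250-350 ms window given for the middle panel of Fig. 2; the function name and the waveform are hypothetical.

```python
def peak_latency_ms(waveform_uv, times_ms, window_ms, polarity):
    """Return the latency of the extreme sample inside a time window.

    polarity=-1 finds the most negative sample (e.g. N300),
    polarity=+1 finds the most positive sample (e.g. P200 or LPC).
    """
    start, end = window_ms
    in_window = [(t, v) for t, v in zip(times_ms, waveform_uv) if start <= t <= end]
    if not in_window:
        raise ValueError("window contains no samples")
    # Multiply by polarity so that the wanted peak is always the maximum.
    peak_time, _ = max(in_window, key=lambda tv: polarity * tv[1])
    return peak_time

# Synthetic waveform: flat except for a triangular negative dip centred on 300 ms.
times = list(range(0, 600, 2))
wave = [-(5.0 - abs(t - 300) / 10.0) if abs(t - 300) <= 50 else 0.0 for t in times]
n300_latency = peak_latency_ms(wave, times, (250, 350), polarity=-1)
```

Peak picking is sensitive to noise in single-participant averages, so alternatives such as fractional-area latency are sometimes preferred; which measure applies here is not specified in the text.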
Amplitude of LPC. The ANOVA for amplitudes of LPC revealed significant main effects of areas (F(4, 96) = 20.45, p < 0.001, ηp² = 0.30), indicating that the largest LPC was measured in the central area (10.93 ± 2.91 μV). The main effect of electrodes was significant (F(4, 96) = 26.91, p < 0.001, ηp² = 0.53), indicating that the largest LPC was measured in the middle electrodes (11.03 ± 3.28 μV). The interaction between areas and electrodes was significant (F(16, 384) = 18.20, p < 0.001, ηp² = 0.43), with the largest LPC measured at the Cz (12.28 ± 3.97 μV) site. The main effect of moral type was significant (F(1, 24) = 20.19, p < 0.001, ηp² = 0.47), with immoral words (11.05 ± 3.05 μV) eliciting larger LPC than moral words (9.39 ± 2.94 μV). More importantly, the interaction between moral types and colours reached significance (F(2, 48) = 4.26, p = 0.02, ηp² = 0.15). The post-hoc tests showed that the differences between immoral and moral words in green were significantly larger than the differences for words in red and blue. The interaction of moral types, colours and electrodes was also significant (F(8, 192) = 2.01, p = 0.047, ηp² = 0.08). The post-hoc tests showed that immoral words elicited a larger LPC than moral words did in green at all electrodes. However, at some electrodes (e.g., F1, C1, P1), the differences between immoral and moral words did not reach significance in red and blue.
Latency of LPC. The ANOVA for latencies of LPC revealed a significant main effect of areas (F(4, 96) = 13.74, p < 0.001, ηp² = 0.36), with the shortest LPC latency measured in the parietal area (494 ± 24 ms). The interaction between areas and electrodes was significant (F(16, 384) = 1.75, p = 0.037, ηp² = 0.07). The latencies of LPC at the parietal electrodes were the shortest (e.g., P4). The main effect of the moral types was significant (F(1, 24) = 5.83, p = 0.024, ηp² = 0.20), and the LPC latencies of immoral words (512 ± 29 ms) were longer than those of moral words (499 ± 27 ms). The three-way interaction of moral types, areas and electrodes was significant (F(16, 384) = 2.16, p = 0.006, ηp² = 0.08). The post-hoc tests showed that the LPC latencies of immoral words were longer than those of moral words at many electrodes in the central, centro-parietal and parietal areas (e.g., Cz, CPz and Pz). Figure 1 shows the overview of grand-averaged ERP waveforms at example electrodes (Fz, Cz and Pz). To summarize the electrophysiological results, topographical maps displaying significant ERP differences between immoral and moral words when they were green, blue and red in the specific time windows of their ERP components are shown in Fig. 2.
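The partial eta squared values reported alongside each F ratio can be recovered from the F value and its degrees of freedom through the standard identity ηp² = (F × df_effect) / (F × df_effect + df_error). The short sketch below applies this identity to three of the LPC latency effects reported above; the function and variable names are illustrative only.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Recover partial eta squared from an F ratio and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)

# Effects reported for the LPC latency analysis: (label, F, df_effect, df_error).
reported = [
    ("areas", 13.74, 4, 96),
    ("areas x electrodes", 1.75, 16, 384),
    ("moral types", 5.83, 1, 24),
]
recovered = {
    label: round(partial_eta_squared(f, d1, d2), 2)
    for label, f, d1, d2 in reported
}
```

The recovered values (0.36, 0.07 and 0.20) agree with the effect sizes stated in the text for these three effects.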

Discussion
In the present study, we conducted an ERP experiment using the Moral Stroop task to explore the impact of colours on the processing of words related to morality. Consistent with the predictions, the behavioural results revealed a significant effect of moral type on reaction time; immoral words were recognized more slowly than moral words. More importantly, the significant interaction between moral types and colours showed that the difference in reaction time between moral and immoral words reached significance only when the words were green. These results demonstrated the different effects of colours on morality, especially the impact of green on moral judgement, despite the fact that the stimulus colour was completely irrelevant to the task. According to previous studies, green has a metaphorical meaning that is often peaceful and positive 43,44. These meanings conflict with immoral concepts, which led participants to respond more slowly to green immoral words than to green moral words. However, the differences in the participants' reaction time between moral and immoral words did not reach significance when the words were red or blue. We suggest that this is because red and blue have multiple contradictory metaphorical representations. Red usually represents danger and errors. For example, warning signs are often red. However, in China, red is often associated with enthusiasm, happiness and love. One specific shade of red known as "Chinese red" is traditionally used for celebratory occasions. Similarly, blue can be linked to depression and frustration, but it can also evoke feelings of peace and relaxation. These metaphorical representations of colours are consistent with previous findings. Zhang et al. conducted a Free Association Test to investigate Chinese people's associations with various colours.
Their results showed that blue and red could be associated with both positive and negative conceptions, whereas green could be associated with positive conceptions 43 . Huang et al. also found that Chinese people have both positive and negative associations with red 44 . These interpretations of red and blue may eliminate some subjective judgements between moral and immoral concepts.
The second aim of this study was to investigate the temporal course of the brain activation of moral judgement under the impact of different colours. From 200 ms to 1600 ms post-stimulus, three ERP components were found, and the effects of these components suggested three processing stages during moral judgement.
In the first stage, a positive component peaking approximately 200 ms post-stimulus showed the main effects of moral types and colours. Consistent with our expectation, compared to moral words, immoral words elicited a larger and faster P200. These findings were consistent with previous studies of ERP on morality. For instance, Leuthold et al. reported a larger P200 for a word that involved a socio-normative violation 36. Wang et al. found that impersonal dilemmas elicited larger P200 than personal dilemmas did in moral judgement 32. Previous studies have shown that the P200 component is related to attention 34,35, which is especially sensitive to negative visual stimuli 37,38. The larger and faster P200 elicited by immoral words in this study suggested that compared to moral information, immoral information is noticed much more quickly and may attract more attention in this early stage. Another effect of P200 was that the amplitude of green words was larger than that of blue and red words. Previous studies have investigated people's ocular sensitivity to different colours and found that human eyes are most sensitive to green and least sensitive to blue 45,46. According to these findings, we suggest that the sensitivity of human eyes to the colour green makes people focus more on green words than on blue and red words, as revealed by the larger P200. The main effects of moral types and colours on P200 indicate that during this early stage, the human brain processes colours and morality information in parallel.

Figure 2. Topographical scalp distribution of the difference waves computed by subtracting the moral words from the immoral words in the time windows of 160-260 ms after the stimulus onset (top), 250-350 ms after the stimulus onset (middle) and 450-600 ms after the stimulus onset (bottom). Note that the greater positivity displayed in the time window of 450-600 ms reflecting the difference between immoral words and moral words is much higher in the green tests than in the red and blue tests.
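The difference waves described in the Figure 2 caption (immoral minus moral) and their summary within each time window can be sketched as below. The three windows are taken from the caption; pairing them with the P200, N300 and LPC labels, the use of the mean amplitude as the summary measure, and the synthetic waveforms are assumptions made for illustration.

```python
def difference_wave(immoral_uv, moral_uv):
    """Subtract the moral-word ERP from the immoral-word ERP, sample by sample."""
    return [i - m for i, m in zip(immoral_uv, moral_uv)]

def mean_amplitude(wave_uv, times_ms, window_ms):
    """Average the waveform over all samples inside the time window."""
    start, end = window_ms
    samples = [v for t, v in zip(times_ms, wave_uv) if start <= t <= end]
    return sum(samples) / len(samples)

# Time windows from the Figure 2 caption; the component labels are assumed.
WINDOWS_MS = {"P200": (160, 260), "N300": (250, 350), "LPC": (450, 600)}

# Synthetic single-electrode ERPs sampled every 2 ms (500 Hz).
times = list(range(0, 800, 2))
moral = [0.0 for _ in times]
immoral = [2.0 if 450 <= t <= 600 else 0.0 for t in times]

diff = difference_wave(immoral, moral)
window_means = {name: mean_amplitude(diff, times, w) for name, w in WINDOWS_MS.items()}
```

Computing one such value per electrode yields the scalp values that a topographical map interpolates; in this toy example only the late window shows a non-zero difference.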
In the continuation of the process, in the second stage approximately 300 ms post-stimulus, the amplitudes of N300 demonstrated a significant main effect of colours, with red and blue words eliciting more negative N300 than green words. In a review of previous studies, N300 has been associated with perceptual as well as semantic processing 47,48 . Lu et al. conducted an ERP study to investigate the role of colour knowledge in the perceptual process 34 . They found that incongruently coloured pictures (e.g., a blue apple) were associated with more negative N300 than congruently coloured pictures were (e.g., red apple) and suggested that N300 may be related to the integration of perceptual information (e.g., colour and shape) with semantic processing. Similarly, Bramão et al. found that atypically coloured objects can elicit more negative N300 compared to typically coloured objects 40 . In this study, the colours red and blue both had two metaphorical meanings. The larger N300 under these two conditions showed the complex processing of integrating the colours' metaphorical meanings with the words' moral semantic information. For the latency of N300, the interaction of areas, moral types and colours was significant. Specifically, in the centro-parietal area, the N300 latencies of immoral words were longer than those of moral words only in the green condition. This effect was consistent with our expectation and suggested that it took longer to integrate the metaphorical meaning of the colour green with immoral information compared to moral information.
In the third stage, during the late time window, the LPC showed a significant main effect of moral types on amplitude and latency. Immoral words elicited larger and longer LPCs than moral words did. The differences of the LPC between moral and immoral stimuli have been demonstrated in previous studies. Van et al. found that value-inconsistent words elicited larger LPCs than value-consistent words did and suggested that strongly disagreeable statements automatically require additional processing resources 49 , just as negatively valenced words and pictures do 39,50,51 . Recently, Hundrieser and Stahl found a higher amplitude of LPC for value-incongruent words compared to value-congruent words in moral sentences with extreme topics and suggested that the LPC is rather sensitive to emotionally intense or arousing events 52 . Leuthold et al. also reported a larger LPC that was elicited by morally unacceptable words compared to acceptable words and used this effect to reflect on the implicit evaluation processing in moral judgement 36 . Based on these findings, the larger and longer LPCs of immoral words indicate that people require more time and distribute more mental resources to manage immoral stimuli compared to moral stimuli. Another important finding regarding LPCs was the interaction effect between moral types and colours. The amplitude differences between immoral and moral words were significantly larger in green than in red and blue. This ERP effect was consistent with the behavioural findings that the reaction time differences between moral and immoral words only reached significance when the words were green rather than other colours. Previous studies have suggested that LPC is related to mental resource distribution 41 and have considered it a representation of slow but controlled and elaborative processes, such as moral evaluation and reasoning 31,36,42 . 
The larger LPC differences in green words in the present study showed that the conflict between immoral words and the metaphorical meanings of the colour green might lead people to devote more cognitive resources to completing moral evaluations and reasoning processes in late time windows.
In summary, this study investigated the impact of colours on words related to morality. Specifically, the behavioural results revealed the special effect of the colour green on moral judgement and demonstrated consistency between the metaphor of green and morality and inconsistency between the metaphor of green and immorality. This finding is a good complement to research on the association between morality and metaphors related to colours, which has expanded our understanding of how perception influences moral processing. Our results were consistent with the study by Lakoff and Johnson, who suggested that this concept was based on physical metaphors 53 . Colours, as one type of physical information, relate to different metaphors. When people process the conception of morality, these various metaphors provide a link between colours and morality. When the association is consistent, the reaction is much easier and faster; when it is not, the reaction is much more difficult and slower. The ERP results further revealed the time course and neural mechanism behind this association. To the best of our knowledge, this is the first study to use ERPs to assess the temporal dynamics of the impact of colour on moral judgement. Three stages may be involved in moral judgement in the Stroop task. The first stage involves early attention attribution to morality and colours. The main effects of colours and moral types on the amplitudes of P200 suggested that during this stage, the processing of basic visual features and the processing of social information may be parallel. The second stage involves the integration of perceptual information with semantic processing, which was indexed by the interaction of colours and moral type on N300. The longer N300 latency of immoral words compared to moral words in the green condition revealed the longer time interval that people require to integrate the conflicting meanings between green's metaphorical and immoral information. 
This interaction lasted until the third stage, which involves elaborative evaluation and reasoning processing. The larger LPC differences in green words compared to red and blue words showed that a large amount of cognitive resources was devoted to resolving the conflict between green's metaphorical association and immoral words during the moral evaluation and reasoning processing stage. The ERP effects revealed the impact of colours on morality, from attention, visual features and semantic processing to advanced moral evaluation and reasoning.
There are some limitations of this study. First, the present study demonstrates only the impact of colours on the moral judgement of words. Future studies should investigate whether these impacts exist in more complex situations, such as moral dilemmas or jury judgements. Second, previous studies have demonstrated the influence of colour on emotion 5. Emotion plays an important role in moral judgement 29. Is the impact of colour on morality related to the effect of emotion? Since we did not find significant differences in the PANAS evaluations before and after moral judgement in the present study, the possible role of emotion in the effect of colours on morality is unknown and should be explored in future studies. Finally, this study investigated only three colours. The impact of other colours, such as gold and yellow, remains unexplored. In addition, we manipulated only hue and held brightness and saturation constant, although both may also influence moral judgement [54][55][56]. This issue will be explored in future studies.

Materials and Methods
Participants. After excluding 2 participants because they had too few valid trials, a sample of 25 participants (13 males, mean age = 20.5 years, SD = 2.5) with normal or corrected-to-normal visual acuity and no history of neurological diseases or colour blindness remained for the data analysis. All participants were right-handed, and informed written consent was obtained from each participant. All experimental methods were conducted in accordance with the approved guidelines regarding all relevant aspects, including recruitment, experimental process information, compensation and debriefing of participants. Ethical approval for the study was obtained from the Research Ethics Committee of the Zhejiang Sci-Tech University, where this experiment was conducted.

Materials. We selected two categories of Chinese words, immoral words (e.g., evil, violence) and moral words (e.g., kind, honest). Each category consisted of 60 words. All of these words were high frequency words according to the "Frequency dictionary of the Chinese Language" (1986). Twenty participants (9 males, mean age = 22.5 years, SD = 1.3) who were not part of the ERP experiment were asked to judge the morality of these words on a scale from 1 (extremely immoral) to 7 (extremely moral) before the experiment. The independent-sample t-tests showed that the immoral words (M = 2.07, SD = 0.96) were rated as significantly more immoral than the moral words were (M = 6.12, SD = 0.94), t = 85.45, p < 0.001. We also conducted independent-sample t-tests to compare the familiarity of immoral words and moral words on a scale from 1 (extremely unfamiliar) to 5 (extremely familiar). The findings showed that there was no significant difference between immoral words (M
The word colours in each trial were selected from a colour spectrum based on the HSB colour model 55. The HSB model allows users to choose a colour with a hue from 0° to 360° and with saturation and brightness values from 0% to 100%. All the colours we chose were fully saturated (100% saturation) and extremely bright (100% brightness). The angle for the hue was defined as the angle relative to pure red on the colour circle. The hues of the colours we chose were red (0°), green (135°), and blue (225°), based on previous studies 55,56. Each word appeared three times, once in each of the colours (red, green and blue). The word (the horizontal visual angle was 1.56°, and the vertical visual angle was 0.78°) was located at the centre of the computer screen with a white background.
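To make the HSB specification concrete, the three stimulus colours can be converted to 8-bit RGB triplets. The sketch below relies on Python's standard `colorsys.hsv_to_rgb` (HSB and HSV are the same model) and assumes a plain conversion without monitor calibration or gamma correction, neither of which is described in the text; the helper and dictionary names are illustrative.

```python
import colorsys

def hsb_to_rgb255(hue_deg, saturation_pct, brightness_pct):
    """Convert an HSB colour (degrees, percent, percent) to 8-bit RGB."""
    r, g, b = colorsys.hsv_to_rgb(
        hue_deg / 360.0, saturation_pct / 100.0, brightness_pct / 100.0
    )
    return tuple(round(channel * 255) for channel in (r, g, b))

# Hues reported for the stimuli; saturation and brightness were both 100%.
STIMULUS_HUES_DEG = {"red": 0, "green": 135, "blue": 225}
stimulus_rgb = {
    name: hsb_to_rgb255(hue, 100, 100) for name, hue in STIMULUS_HUES_DEG.items()
}
```

Under these assumptions the stimuli correspond to (255, 0, 0) for red, (0, 255, 64) for green and (0, 64, 255) for blue, so the green and blue hues are slightly shifted away from the pure primaries.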
Procedure. The experiments were conducted in a dimly lit, soundproofed room. The participants were seated on a comfortable chair with their eyes approximately 90 cm away from a 17-inch computer screen. In each trial, a fixation cross was presented on screen for a random duration between 300 ms and 700 ms, after which the target word was presented until a response was made or for a maximum of 2000 ms. The participants were instructed to indicate whether the word was related to morality or immorality by pressing the "F" or "J" key as quickly and as accurately as possible. The key assignment was counterbalanced across the participants. The inter-trial interval was 1000 ms. The experiment included 360 trials, and the sequence of the trials was random. The participants were allowed to take a break after 120 trials. To monitor emotion, each participant was asked to complete the positive and negative affect scale (PANAS) 57,58 before and after the Stroop task.
EEG Recording and Analysis. The electroencephalogram (EEG) was recorded from 64 scalp sites using tin electrodes mounted in an elastic cap (NeuroScan Inc.) with an online reference to the right mastoid. A horizontal electrooculogram (EOG) was recorded from the electrodes placed at the outer canthi of both eyes. A vertical EOG was recorded from electrodes placed above and below the left eye. All interelectrode impedances were maintained under 5 kΩ. The EEG and EOG signals were amplified with a 0.05-100 Hz bandpass filter and were continuously sampled at 500 Hz/channel.
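The viewing distance given in the Procedure (about 90 cm) and the visual angles reported for the target word (1.56° horizontally, 0.78° vertically) together determine the approximate physical size of the word on the screen, using size = 2 × distance × tan(angle / 2). The sketch below works this out; the function name is illustrative, and the result is only as precise as the approximate viewing distance.

```python
import math

def size_from_visual_angle(angle_deg, distance_cm):
    """Physical extent (cm) that subtends angle_deg at the given viewing distance."""
    return 2.0 * distance_cm * math.tan(math.radians(angle_deg) / 2.0)

VIEWING_DISTANCE_CM = 90.0  # "approximately 90 cm" in the Procedure

word_width_cm = size_from_visual_angle(1.56, VIEWING_DISTANCE_CM)
word_height_cm = size_from_visual_angle(0.78, VIEWING_DISTANCE_CM)
```

Under these values the word would measure roughly 2.45 cm wide and 1.23 cm high on the screen.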
During the offline analysis, the EEG was re-referenced to the average of the right and left mastoids. The ocular artefacts were removed from the EEG signal using a regression procedure implemented in the Neuroscan software 59. The EEG was averaged in 1600 ms epochs (200-ms baseline) that were time-locked to the presentation of the target word. These averages were digitally filtered with a 30 Hz low-pass filter and were baseline corrected by subtracting the average activity of that channel during the baseline period from each sample. Any trials in which the EEG voltages exceeded a threshold of ±80 μV during the recording epoch were excluded from the analysis.
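The baseline correction and amplitude-based rejection described above can be sketched for a single channel as follows. This is a simplified illustration rather than the Neuroscan pipeline: it assumes that the 1600 ms epoch includes the 200 ms baseline, applies baseline correction to each epoch before rejection and averaging (the text corrects the averages), and omits re-referencing, ocular correction and low-pass filtering. All names and the synthetic epochs are hypothetical.

```python
SAMPLING_RATE_HZ = 500
EPOCH_MS = 1600
BASELINE_MS = 200
REJECTION_THRESHOLD_UV = 80.0

def n_samples(duration_ms, rate_hz=SAMPLING_RATE_HZ):
    """Number of samples covering duration_ms at the given sampling rate."""
    return int(duration_ms * rate_hz / 1000)

def baseline_correct(epoch_uv, n_baseline):
    """Subtract the mean of the pre-stimulus samples from every sample."""
    baseline_mean = sum(epoch_uv[:n_baseline]) / n_baseline
    return [v - baseline_mean for v in epoch_uv]

def is_artifact(epoch_uv, threshold_uv=REJECTION_THRESHOLD_UV):
    """Flag an epoch if any sample exceeds +/- threshold_uv."""
    return any(abs(v) > threshold_uv for v in epoch_uv)

def average_clean_epochs(epochs_uv, n_baseline):
    """Baseline-correct each epoch, drop artefact epochs, average the rest."""
    corrected = [baseline_correct(e, n_baseline) for e in epochs_uv]
    clean = [e for e in corrected if not is_artifact(e)]
    if not clean:
        raise ValueError("no epochs survived artefact rejection")
    return [sum(samples) / len(clean) for samples in zip(*clean)]

n_base = n_samples(BASELINE_MS)          # 100 baseline samples
n_post = n_samples(EPOCH_MS) - n_base    # 700 post-stimulus samples

# Synthetic epochs: two clean trials and one with a large voltage excursion.
epochs = [
    [1.0] * n_base + [3.0] * n_post,
    [2.0] * n_base + [6.0] * n_post,
    [0.0] * n_base + [150.0] * n_post,
]
erp = average_clean_epochs(epochs, n_base)
```

In this toy case the third epoch is rejected, and the resulting average is 0 μV during the baseline and 3 μV after stimulus onset.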
Scientific Reports | 6:38373 | DOI: 10.1038/srep38373
A four-way repeated-measures analysis of variance (ANOVA) was conducted using the following variables as within-subject factors: moral types (two levels: moral, immoral), colours (three levels: red, green, blue), areas (five levels: frontal, fronto-central, central, centro-parietal, and parietal) and electrodes (five levels: left, left-middle, middle, right-middle, and right electrodes). The Greenhouse-Geisser correction was adopted when the sphericity assumption was violated. Post-hoc testing of significant main effects and interactions was conducted using the least significant difference (LSD) method.
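The degrees of freedom reported throughout the Results follow directly from this fully within-subject design with 25 participants: for any effect, the effect df is the product of (levels − 1) over the factors involved, and the uncorrected error df is that product multiplied by (participants − 1). The sketch below reproduces this bookkeeping; the function and constant names are illustrative, and the Greenhouse-Geisser adjustment is not modelled.

```python
from math import prod

N_PARTICIPANTS = 25
FACTOR_LEVELS = {"moral types": 2, "colours": 3, "areas": 5, "electrodes": 5}

def anova_df(effect_factors, n_participants=N_PARTICIPANTS):
    """Uncorrected (effect df, error df) for a within-subject effect."""
    df_effect = prod(FACTOR_LEVELS[factor] - 1 for factor in effect_factors)
    df_error = df_effect * (n_participants - 1)
    return df_effect, df_error

# Number of design cells each participant contributes to the ERP analysis.
cells_per_participant = prod(FACTOR_LEVELS.values())

df_areas = anova_df(["areas"])
df_moral_by_colour = anova_df(["moral types", "colours"])
df_areas_by_electrodes = anova_df(["areas", "electrodes"])
df_three_way = anova_df(["areas", "colours", "moral types"])
```

These correspond to the F(4, 96), F(2, 48), F(16, 384) and F(8, 192) tests reported in the Results, with 150 design cells per participant.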