Effects of subconscious and conscious emotions on human cue–reward association learning

Life demands that we adapt our behaviour continuously in situations in which much of our incoming information is emotional and unrelated to our immediate behavioural goals. Such information is often processed without our consciousness. This poses an intriguing question of whether subconscious exposure to irrelevant emotional information (e.g. the surrounding social atmosphere) affects the way we learn. Here, we addressed this issue by examining whether the learning of cue-reward associations changes when an emotional facial expression is shown subconsciously or consciously prior to the presentation of a reward-predicting cue. We found that both subconscious (0.027 s and 0.033 s) and conscious (0.047 s) emotional signals increased the rate of learning, and this increase was smallest at the border of conscious duration (0.040 s). These data suggest not only that the subconscious and conscious processing of emotional signals enhances value-updating in cue–reward association learning, but also that the computational processes underlying the subconscious enhancement is at least partially dissociable from its conscious counterpart.

Life demands that we adapt our behaviour continuously in situations in which much of our incoming information is emotional and unrelated to our immediate behavioural goals. Such information is often processed without our consciousness. This poses an intriguing question of whether subconscious exposure to irrelevant emotional information (e.g. the surrounding social atmosphere) affects the way we learn. Here, we addressed this issue by examining whether the learning of cue-reward associations changes when an emotional facial expression is shown subconsciously or consciously prior to the presentation of a reward-predicting cue. We found that both subconscious (0.027 s and 0.033 s) and conscious (0.047 s) emotional signals increased the rate of learning, and this increase was smallest at the border of conscious duration (0.040 s). These data suggest not only that the subconscious and conscious processing of emotional signals enhances value-updating in cue-reward association learning, but also that the computational processes underlying the subconscious enhancement is at least partially dissociable from its conscious counterpart.
T o achieve our behavioural goals, we must continuously adapt our behaviour and learn from changing circumstances. However, the great majority of incoming signals in real-life social situations is irrelevant to our immediate goals, and may be processed unconsciously in many situations. An intriguing question is whether such irrelevant and subconsciously received information can affect behavioural adaptation.
Many studies report that emotional information not necessary for achieving an immediate task goal can affect aspects of human behaviour including decision making 1 , clarity of memory 2 , and learning rates during cuereward association learning 3 , and that this is true even when the people are aware that the information is irrelevant to achieving the task goal. For instance, in a cue-reward association-learning study, presentation of a taskindependent fearful face just before the reward-predicting cue accelerated the learning rates compared with presentation of a neutral face; an enhancement effect that was not found in a similarly designed short-term memory task 3 . However, all of these experiments employed an emotional signal that subjects could consciously perceive, and did not account for incoming information that is processed subconsciously (e.g. the surrounding social atmosphere such as feelings of tension in a classroom). Although shorter duration of stimulus presentation generally induces smaller behavioural effects and neuronal responses, some studies report that subconscious presentation of information or subconscious thought results in larger effects than does conscious counterpart 4-7 , and can affect human behaviour in daily life 8,9 . Therefore, it is important to clarify whether and how subconscious emotional information influences human learning.
Here, we performed a computational model-based analysis of behaviour to examine how learning of a probabilistic cue-reward association is affected when emotional facial expressions are shown subconsciously or consciously before presentation of the reward-predicting cue. We have previously found that learning was enhanced when the duration of face presentation was long (1.0 s) 3 and thus focus here on how learning is affected by a duration (0.027-0.047 s) that yields less recognisable faces.

Results
Facial Discrimination task. Before the main learning task, we conducted a discrimination task (n 5 91) to estimate duration thresholds for conscious discrimination of facial expressions that were based on objective (correct rate) and subjective (confidence scoring) measures (Figure 1a). We regarded a presentation as 'conscious' if it was delivered above both subjective and objective thresholds, and as 'subliminal' if it was lower than both thresholds. We define 'subconscious' presentation as being at a duration between subliminal and conscious presentations.
We conducted a series of t-tests to determine the threshold duration. Analysis showed that performance accuracy (the correct rate, [CR]) at a duration of 0.040 s was higher than at 0.033 s (paired t-test, t ( Additionally, although the CRs in 0.020 and 0.033 s were not different from chance level (paired t-test, 0.020 s: t (90) 5 0.156, p < 1.000; 0.033 s: t (90) 5 2.250, p 5 0.405 with BC), CR in 0.027 s was slightly and significantly higher than the chance level (paired t-test, t (90) 5 3.551, p 5 0.015 with BC) (Figure 1b red).
Consistent with the CR analysis, the subjective confidence score index (CSI) showed that participants discriminated facial expressions when they were presented for longer than 0.040 s significantly better than at shorter durations (0.033 s vs. 0.040 s: paired t-test, t (90) 5 217.033, p , 0.0001 with BC) (Figure 1b black). While CSI comparisons did not differ significantly between 0.027 s and 0.033 s (t (90) 5 22.347, p 5 0.211 with BC) or between 0.040 s and 0.047 s (t (90) 5 22.373, p 5 0.199 with BC), they did differ significantly between 0.020 s and 0.027 s (t (90) 5 219.632, p , 0.001 with BC).
These results showed that participants correctly discriminated facial expressions with high confidence at presentation durations of 0.040 s and 0.047 s, which thus represents conscious presentations as we defined them. In contrast, it was impossible to discriminate facial expressions either objectively or subjectively when faces were presented for only 0.020 s. The other two durations (0.027 s and 0.033 s) represent subconscious presentation because participants showed similar confidence levels in correct and error trials with better-than-random CR at 0.027-s durations, while at 0.033-s durations they could not discriminate faces objectively even with the high CSI in the correct trials. Based on these observations, we used 0.027 s or 0.033 s for the subconscious condition and 0.040 s or 0.047 s for the conscious condition in the learning task. This definition of the subconscious and conscious conditions is similar to that in other studies using facial expressions 10,11 . In the learning task, each participant was randomly assigned to one of these four durations. To rule out other possible factors affecting learning performance, we assessed several individual differences including age, sex, the time we conducted the experiment, and intelligence level. We did not find any factor that was biased among the groups (Table 1, see statistical analyses for sampling bias section).
Learning task. To examine the computational processes behind the interaction between reward learning and subconscious/conscious emotional processing (Figure 2a and 2b), we analysed behaviour using a reinforcement learning model. More specifically, we estimated the following four parameters. The learning rate (e) controls reward prediction error in each trial. The exploration parameter (b) controls how deterministically a value function leads to advantageous behaviour, and reward sensitivity (d) transforms the actual reward into a subjective reward, as the emotional stimulus can change subjective sensitivity to reward. The last parameter is a ¥100 choice bias (b), which is a value-independent bias for choosing the ¥100 option. This parameter represents the possibility that participants were biased to choose one of the two rewards depending on facial emotional expression, regardless of cue-reward associations. We estimated these parameters separately for fearful or neutral conditions (see Reinforcement learning model-based analysis). Before the detailed analysis, we quantified the appropriateness of our statistical models using Akaike information criteria (AIC) and Bayesian information criteria (BIC). As shown in Figure 2c, the ebb model, which includes learning rate, exploration parameter and ¥100 bias,  was selected by AIC, and the eb and ebb models were comparable using BIC (eb was slightly better). Model comparisons were highly consistent with our previous report 3 and we used the ebb model in subsequent analyses. Learning curves averaged separately for each cue, irrespective of face presentation duration (n 5 91), are shown in Figure 2d. Especially in the early stages, learning was faster for cues associated with fearful faces and the ¥100 reward than other cues (solid red line). To conduct a more quantitative analysis, we examined the effects of emotion (fear vs. neutral) on each parameter of the computational model (learning rates, ¥100 choice bias, and exploration). Consistent with our previous report with 1.0-s face presentations 3 , we found that the learning rate was higher in the fearful condition than in the neutral condition (t (90) 5 3.077, p 5 0.003) (Figure 2e, left). Additionally, ¥100 choice bias was negative in the fearful condition (t (90) 5 24.687, p , 0.001 with BC), and no difference was found in the exploration parameter (t (90) 5 21.552, p 5 0.124) (Figure 2e, middle and right). The only notable difference from our previous study was that ¥100 choice bias was negative in the neutral condition (t (90) 5 23.401, p 5 0.002 with BC) (Figure 2e, middle).
Having seen that emotional face presentation modulates the learning rates and ¥100 choice bias, we then investigated how subconscious presentation of emotional faces affects learning rates and the ¥100 choice bias. To achieve this, we separately computed learning rates for each presentation duration (0.027 s: n 5 20; 0.033 s: n 5 20; 0.040 s: n 5 31; 0.047 s: n 5 20) (Figure 3a). A two-way ANOVA (2 Emotions 3 4 Presentations) showed a significant main effect of emotion (F (1,87) 5 13.306, p , 0.001) and no main effect of presentation duration (F (3,87) 5 2.508, p 5 0.064). Importantly, the interaction between emotion and presentation duration was significant (F (3,87) 5 2.946, p 5 0.037), suggesting that the learning enhancement provided by the fearful faces may disappear at some durations. Therefore, we looked into the effects of emotion on the learning rate of each presentation duration.

Discussion
In this paper, we used a computational model-based behavioural analysis of probabilistic cue-reward association learning to determine whether subconscious and task-independent emotional signals affect learning. We found that the learning rate for cues paired with a fearful face was larger than for cues paired with neutral faces, and that this enhancement effect was significant when the face was presented subconsciously (durations of 0.027 s or 0.033 s) and consciously (0.047 s). However, this effect disappeared at 0.040 s. Furthermore, not only does the effect of emotional signals on learning rates vanish at the presentation duration of 0.040 s, but this duration also corresponds to the 70%-90% CR level, validating the discontinuity of the learning-enhancement effect. Because we did not observe this effect in the discrimination task or in the ¥100 choice bias, it is likely to be specific to the associative learning paradigm.
The discontinuity of the learning-rate enhancement effect might have been caused by some malfunction in our experimental devices for stimulus presentation. However, if this was the case, we would expect the same problem to have occurred for the objective CRs in the discrimination task. As we did not observe any significant performance trough in Figure 1b at 0.040 s, and because almost all  participants reported that they were conscious that two facial expressions were presented in the 0.040 s condition (post-experimental questionnaire), we can rule out the possibility of an experimental device-dependent problem. Another possibility is that the trough resulted from some sampling bias among different groups. However, we examined sex, age, experiment time, and academic scores (as shown in Table 1) and did not find any difference among the groups (see Statistical analyses for sampling bias).
One plausible explanation for the disappearance of the enhancement effect is that there are two pathways for emotional signal processing in the brain 12,13 . One system is the cortical pathway, which is routed through several visual stages such as the retina, lateral geniculate nucleus of the thalamus, primary visual area cortex, higherorder brain areas, and finally extending to the amygdala. This route of information processing results in precise perception in which we are conscious of presented stimuli. The other system is the subcortical pathway, which is routed through the retina, superior colliculus, pulvinar nucleus of the thalamus, and extends to the amygdala. Although information processing via this route is comparatively crude, it is thought to be an implicit system that works faster than the cortical pathway. Several behavioural and brain-imaging studies have shown that the subcortical pathway has sensitivity to the rapid presentation (faster than 0.033 s) of emotional facial expressions 11,14,15 . Therefore, the subconscious presentation (0.027 s and 0.033 s) of stimuli presented here could have driven the subcortical pathway, whereas the 0.047 s presentation drove the cortical pathway. These two systems could have different effects on reward-based learning systems that include the substantia nigra, ventral striatum, and amygdala as implicated in previous studies 3,11,[14][15][16][17] .
Similar discontinuity effects observed in behavioural responses to visual stimuli have also been reported as the 'performance-dip effect 5,6 , which is defined as the lowered accuracy in a main task when it is paired with the presentation of a para-threshold task-irrelevant stimulus. These experiments and our current observations are compatible in the sense that performance of the main task was affected when either a subconscious or clear task-irrelevant visual stimulus was presented. Importantly however, while previous experiments showed that the task-irrelevant stimuli reduced performance, our results showed the opposite effect: subconscious emotional signal enhanced learning.
One might wonder which enhances learning more, conscious or subconscious perception of emotional stimulus. Although the effects of subconscious stimulation tend to be weaker in general than conscious stimulation, some studies have reported that subconscious presentation of stimuli was more effective 4-6 . Here, we showed that enhancement by emotion perception was significant in both subconscious and conscious conditions, except when the stimulus duration was 0.040 s. However, as shown in Figure 3b, participants were most affected by the emotional signal when their accuracy was between 60% and 70%. This result seems to suggest that the learning-enhancement effect is strongest when the emotional signal is presented obscurely. Figure 3b also indicates that overly quick stimulus presentation (,60% CRs: mean CR 5 0.478 6 0.023) does not enhance learning rates. These results may indicate that there is an optimal range of presentation durations for emotional signals that yield subconscious enhancement of learning.
Finally, while the ¥100 choice bias (which was independent of learning) was also affected by presentation duration, no trough in the effect was observed. Although faces were unrelated to our main learning task, the subconsciously presented faces may have induced uncertainty 18 or anxiety concerning subjective perception, and the negative feeling may have led to negative choices (smaller reward). Such a transfer of the task-independent feeling to the main task could well be linked with Pavlovian Instrumental Transfer (PIT) 19,20 . The PIT is a phenomenon in which previously conditioned Pavlovian cues affect the subjective prediction and motivation in subsequent instrumental conditioning from the outset, despite no explicit association between the Pavlovian cue and the new learning 19,20 . In the current learning experiment, the subconscious presentation of facial expressions could have induced negative emotion, and this emotion then transferred the subsequent associative learning from the very first trial. Such a negative bias might have been quantified as the negative ¥100 choice bias.

Methods
Participants. Participants in this study were undergraduate and graduate students who did not declare any history of psychiatric or neurological disorders. All experiments were conducted according to the principles in the Declaration of Helsinki and were approved by the ethics committee of the National Institute of Information and Communications Technology. All 130 participants gave informed consent prior to the experiments. Thirty-nine people (30.0%) were unable to learn all four of the associations. Therefore, we analysed data from the remaining 91 participants (64 male; mean age 21.5 6 1.7 years).
Experimental design. Stimuli were presented via a Dell precision T7500 computer with a graphics accelerator (NVIDIA Quadro 4000) and 19 inch CRT display (SONY CPD-G420) to achieve 150 Hz refresh rates. Stimulus presentation and response acquisition were controlled using Psychtoolbox-3 software (www.psychtoolbox.org) with MATLAB. Stimuli were presented within an area subtending 4.49 3 6.16 degrees of visual angle.
Facial discrimination task. Prior to the learning task, all 91 participants performed the facial expression discrimination task (Figure 1a), which measured the presentation-time threshold for subconscious and conscious facial expression discrimination. We used 8 happy and 8 sad faces of the same 4 actors and 4 actresses including 6 Caucasoid, 1 Negroid, and 1 Mongoloid from the NimStim 21 collection that have high validity and reliability of expressions. Three masks (presented for 0.3 s each) and two emotional faces (displayed for 0.020 s, 0.027 s, 0.033 s, 0.040 s or 0.047 s) were presented alternately on a screen (see Figure 1a). To maximise the effects in the main learning task, we did not use fearful or neutral faces in this task. We reasoned that prior knowledge of the facial expressions might affect participant's behaviour in the main learning task. Additionally, repetitive presentation of the same emotional pictures could lead to reduced stimulus saliency.
Participants were required to discriminate the two expressions of an identical actor or actress by answering whether the first expression was the ''same'' as the second one within 3.0 s. Participants indicated their answers by pressing a button with the right index (same) or ring (different) fingers. Additionally, they were asked to indicate how confident they were in their answers (''low confidence'', ''medium confidence'', or ''high confidence'') with the right index, middle, and ring fingers, respectively. As we used two different pictures of an identical actor or actress with forward and backward masks for each trial, participants could not judge the difference of expressions based on outlines of faces or afterimages. This task included 80 trials (8 same and 8 different trials 35 presentation conditions in a pseudo-random order). As the participants were trained for several practice trials with another stimulus set (happy and sad faces), they executed this task flawlessly.
Learning task. For the main learning task, participants learned probabilistic associations (65% or 35%) between four visual cues and two rewards (¥100 or ¥1) through trial and error (Figure 2a). The design was similar to a previous experimental paradigm 3 except for the brief presentation of facial expressions. Each participant was randomly assigned to one of four face-presentation durations (0.027 s, 0.033 s, 0.040 s, or 0.047 s). We used a between-participants design for the four durations because of task difficulty and to avoid the effects of repetition, such as habituation to the task or meta-learning of task structure 22 .
Face stimuli were 20 fearful and 20 neutral faces of 10 actors and 10 actresses, including 10 Caucasoid, 7 Negroid, and 3 Mongoloid. Just before the visual cue (0.3 s), either a fearful or neutral face interleaved with four masks (0.3 s) was presented three times on a screen for an individually and randomly assigned duration in a pseudo-random order. Only one emotion was used within a given trial. Following the last face, one of the cues was presented, followed by a choice between ¥100 and ¥1. Participants then pressed a button within 1.5 s to indicate which of the two rewards they expected. The order of cue presentation and the assignment of the two buttons (left or right) with rewards were randomised across trials. After making their choice, the actual reward was shown in yellow letters for 1.0 s. Over time, participants could then learn the association between each cue and the corresponding reward. Before the experiments, we confirmed that the participants fully understood this task. They were instructed that the face and noise presentations would signal the appearance of a cue. No participants reported noticing any associations between particular facial expressions and the cues. The combinations of the four visual cues, facial expressions, and rewards were counterbalanced across participants (Figure 2b). The total number of trials was 320 (80 3 4 conditions).
Statistical analyses for perceptive discrimination task. The correct rates (CR) were calculated by dividing the sum of the hit rate and correct rejection rate by the number of trials (16) (Figure 1b red). We calculated the subjective confidence level for each judgment using the confidence score index (CSI). For this index, each raw rating www.nature.com/scientificreports SCIENTIFIC REPORTS | 5 : 8478 | DOI: 10.1038/srep08478 was 1, 2, or 3, representing ''low confidence'', ''medium confidence'', or ''high confidence'', respectively. This rating was independent of the correctness of the judgment, and was averaged for each duration (1 # CSI # 3) (Figure 1b black). Additionally, we sorted the CSI data based on the CR (Supplementary Fig. S1).
Statistical analyses for sampling bias. The learning task was conducted using a between-participants design for the four presentation durations to avoid fatigue, habituation, and meta-learning of task structure 22 . However, this might have induced sampling bias. We therefore examined four possible biases: age, sex, the time of day the experiment started, and the intelligence level based on university-department academic scores. The mean experimental start time was taken into account because experiments conducted in the early morning or late at night may be associated with different arousal levels, even though we reminded participants by email before participation to get enough sleep. Results are summarised in Table 1 and there was no bias in any of the four groups.
Reinforcement learning model-based analysis. To conduct a trial-based analysis of the learning process, we adopted a reinforcement learning model 3,23,24 . This model assumes that each participant assigns the value function Q t (s t , a t ) to action a t for the cue s t at time t. Learning increases the accuracy of value representation by updating the value in proportion to the reward prediction error (RPE) R t 2 Q t (s t , a t ), which is the difference between the expected and actual reward at time t (Equation 1): Our learning model contains four free parameters: a learning rate (e f ), reward sensitivity (d f ), value-independent bias for the choice of ¥100 (a t ), (b f (a t )), and an exploration parameter (b f ). The learning rate controls the effects of the RPE, and reward sensitivity transforms the actual reward (r t ) in yen into a subjective reward (R t ) for each participant (Equation 2): In relation to behavioural choice (Equation 3), the bias term represents a valueindependent bias or inclination towards the choice of ¥100, and the exploration parameter controls how deterministically the value function leads to an advantageous behaviour: We estimated each participant's free parameters (denoted as the vector h) from their trial-by-trial learning using the maximum likelihood-estimation method, which minimises the negative log-likelihood of the participant's behaviour (D), as shown in equations 4 and 5. This non-linear minimisation of equation 4 was conducted using the MATLAB function ''fmincon''.
min{log P Djh ð Þ ð4Þ P Djh ð Þ~P t P a t js t ð Þ ð5Þ The probability of choosing an action, a t (¥100 or ¥1), given a visual cue, s t , was computed based on equation 3. We evaluated the significance of each parameter using Akaike information criteria (AIC) and Bayesian information criteria (BIC) by comparing four models using the learning rate and exploration parameter (eb), eb with reward sensitivity (ebd), eb with ¥100 bias (ebb), and eb with both reward sensitivity and ¥100 bias (ebdb). We calculated these information criteria for each participant and compared the mean scores (n 5 91).