A-Situ: a computational framework for affective labeling from psychological behaviors in real-life situations

This paper presents a computational framework for providing affective labels to real-life situations, called A-Situ. We first define an affective situation as a specific arrangement of affective entities relevant to emotion elicitation in a situation. Then, the affective situation is represented as a set of labels in the valence-arousal emotion space. Based on psychological behaviors in response to a situation, the proposed framework quantifies the expected emotion evoked by the interaction with a stimulus event. The accumulated result in a spatiotemporal situation is represented as a polynomial curve called the affective curve, which bridges the semantic gap between cognitive and affective perception in real-world situations. We show the efficacy of the curve for reliable emotion labeling in real-world experiments concerning (1) a comparison between the results from our system and existing explicit assessments for measuring emotion, (2) physiological distinctiveness in emotional states, and (3) physiological characteristics correlated with continuous labels. The efficiency of affective curves in discriminating emotional states is evaluated through subject-dependent classification performance using bicoherence features to represent discrete affective states in the valence-arousal space. Furthermore, electroencephalography-based statistical analysis revealed the physiological correlates of the affective curves.


Affective Situation Dataset Participant Self-Assessment (SAM-rated Situation Dataset)
Participants were asked to rate their feelings spontaneously each day if they had encountered any situation where a certain visual content elicited a specific feeling. In our work, the visual content is gathered by the proposed wearable device and comprises a book, a coffee cup, a media device (including a cell phone), a research paper, or a monitor. In addition, every five days, we manually retrieved unrated situations containing the visual content that the participant had previously chosen as emotional stimuli and asked the participant to rate their feelings in those situations if they could recall them.

Affective Situation Labeling System Parameter Settings and Label Comparison
To model affective situations and represent them as affective curves, the parameters introduced in Section 3 were set for each participant through a five-fold cross-validation scheme, except $\lambda_2$ and $\lambda_3$. These two parameters were set to 2 and -1, indicating that the contentment component has a minimum value of $-1/l_{\max}$ at the first frame in a situation. The values $m_{\max}$, $l_{\max}$, $r_{\max}$, and $o_{\max}$ are determined by the maximum values during every five days. Figure 1 shows the results from the five-fold cross-validation in terms of valence and arousal rating distances between the SAM and A-Situ with respect to different parameters $\alpha_1$, $\lambda_1$, and $\nu_1$. Because an affective curve represents affective dynamics in a spatiotemporal situation and consists of multiple pairs of affective labels, it is difficult to compare it directly with a pair of SAM ratings, in which the valence and arousal ratings are a pair of discrete values representing an emotion in the same situation. For comparison, we calculate the root-mean-squared errors (RMSEs) of a pair of affective labels scaled 0 to 6 over all participants, as follows:
$$\mathrm{RMSE} = \sqrt{\frac{1}{N_s} \sum_{i \in S_\zeta} \left(\hat{y}_i - y_i\right)^2},$$
where $N_s$ is the number of situations in $S_\zeta$, $\hat{y}_i$ is the mean value of the pairs of predicted affective labels $(V, A)$, and $y_i$ is the pair of ground truth labels in situation $i \in S_\zeta$. Note that 0 represents negative, 3 neutral, and 6 positive valence ratings, while 0 represents neutral, 3 low, and 6 high arousal ratings.
While $\alpha_w$ and $\lambda_w$ ($w = 1, 2$) determine the arousal value $A(S^i_t)$, the valence value $V(S^i_t)$ is determined by the parameters $\lambda_w$ and $\nu_w$. The parameter $\lambda_1$ determines when the contentment component $l(S^i_t)$ becomes zero; this directly affects the sign of the valence value $V(S^i_t)$ and the increment of the arousal value $A(S^i_t)$. With a smaller $\lambda_1$ value, the contentment component $l(S^i_t)$ becomes zero and the valence value $V(S^i_t)$ becomes positive at an earlier $t$. As shown in Figure 1a, the best performance on valence ratings was obtained at $\lambda_1 = 0.5$, and two other points, 0.25 and 0.75, were the next best values for valence ratings. Conversely, the distances between arousal ratings are minimized for values above 0.5 for most participants, as shown in Figure 1b. The parameter $\alpha_1$ and its counterpart $\alpha_2$ $(= 1 - \alpha_1)$ determine the level of arousal in terms of the motion and contentment components. The results reveal that the proposed system performs best on arousal ratings for $\alpha_1$ between 0.7 and 0.8 for most participants (see Figure 1c). Outside this range, the distances remain almost the same when $\alpha_1$ is greater than 0.85 or less than 0.4. The parameter $\nu_1$ and its counterpart $\nu_2$ $(= 1 - \nu_1)$ determine the level of valence. With a large value of $\nu_1$, the resulting valence may not properly represent the emotion associated with approach-withdrawal behaviors. As shown in Figure 1d, the performance of our proposed system fluctuates significantly when the parameter $\nu_1$ is varied. We set $\alpha_1$ and $\nu_1$ to 0.75 and 0.5, respectively, for all participants, and $\lambda_1$ was chosen from 0.25, 0.5, and 0.75 to yield the minimum distance for each individual in each of the following experiments.
Figure 2. Mean RMSEs of all SAM-rated situations on dataset $S_\zeta$ between predicted labels and ground truth labels, scaled 0 to 6 over valence (red) and arousal (blue) dimensions. Note that 0 represents negative, 3 neutral, and 6 positive valence ratings, while 0 represents neutral, 3 low, and 6 high arousal ratings.
Figure 2 shows the mean RMSEs of all SAM-rated situations on dataset $S_\zeta$ between labels predicted by the proposed system and ground truth labels as rated by the SAM. The mean RMSEs for valence and arousal ratings between the two sets of labels were 2.42 (±0.59) and 2.27 (±0.7), respectively, equivalent to 59.7% and 62.2% when expressed as $1 - \mathrm{RMSE}/6$ on the 0-to-6 scale.
In both cases, neutral ratings had the smallest errors. While higher ratings on arousal showed closer agreement, negative ratings on valence had larger errors with larger standard deviations. We emphasize that A-Situ does not aim to evaluate affective labels at the same precision level as the SAM ratings; instead, the primary purpose of the proposed system is to provide reliable emotion labels associated with physiological characteristics derived from psychological behaviors.
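The comparison described in this section can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: each affective curve is reduced to the mean of its labels before being compared with the single SAM rating, the percentage is taken as 100 × (1 − RMSE/6), which reproduces the reported 59.7% and 62.2%, and the function names and the per-candidate distances in the usage note are hypothetical.

```python
import math

SCALE_MAX = 6.0  # ratings are scaled 0 to 6 in the text


def curve_mean(labels):
    """Mean of the labels along one affective curve (one situation)."""
    if not labels:
        raise ValueError("an affective curve needs at least one label")
    return sum(labels) / len(labels)


def rmse(predicted, ground_truth):
    """Root-mean-squared error between two equal-length rating sequences."""
    if len(predicted) != len(ground_truth) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(p - y) ** 2 for p, y in zip(predicted, ground_truth)]
    return math.sqrt(sum(squared) / len(squared))


def normalized_score(error, scale_max=SCALE_MAX):
    """Express an RMSE as a percentage of the rating range (assumed form)."""
    return 100.0 * (1.0 - error / scale_max)


def select_lambda1(candidates, distance_for):
    """Pick the candidate with the smallest rating distance (hypothetical helper)."""
    return min(candidates, key=distance_for)
```

For example, `normalized_score(2.42)` rounds to 59.7 and `normalized_score(2.27)` rounds to 62.2, and `select_lambda1([0.25, 0.5, 0.75], {0.25: 2.6, 0.5: 2.4, 0.75: 2.5}.get)` returns 0.5 for these made-up distances.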

Motivation Component of Emotional Approach-Withdrawal Behaviors
To compute the divergence, we first compute the flow using multi-scale block-based matching between adjacent frames. Then, the flow is standardized as six primitive optical flow patterns [1]: 1) rotation around a vertical axis; 2) rotation around a horizontal axis; 3) approach toward an object; 4) rotation around the optical axis of the image plane; and 5) and 6) complex hyperbolic flows. Given the motion vector field, the velocity $e_p$ at the pixel $p(x, y)$ is represented in terms of $e_{p_0} = (u_1, u_2)^T$, the velocity at pixel $p_0$, and a matrix $\bar{\chi}$ with elements $\chi_1, \ldots, \chi_4$. The matrix $\bar{\chi}$ is then decomposed into four matrices with coefficients $d_1$, $d_2$, $h_1$, and $h_2$, where $d_1 = \chi_1 + \chi_4$ and $d_2 = \chi_3 - \chi_2$ refer to divergent and rotating optical flows, respectively, and $h_1 = \chi_1 - \chi_4$ and $h_2 = \chi_2 + \chi_3$ refer to different types of hyperbolic optical flows. The velocity of the motion vector field can therefore be approximately characterized by six parameters: $(u_1, u_2, d_1, d_2, h_1, h_2)$. Given an optical flow field of the attentive object, we estimate the parameter vector using (4) and the least-squares method. The parameter $u_1$ is associated with right and left rotations, $u_2$ is associated with heading up and down, $d_1$ is associated with approaching the object, and the last three parameters indicate combined motion. Using the six parameters, we compute the motivation component $o(S^i_t)$ at time $t$ from the ratio $d_{1(x,y)} / (d_{2(x,y)} + h_{1(x,y)} + h_{2(x,y)})$ evaluated over the pixels $(x, y)$ of the optical flow field of the attentive object, where $X$ and $Y$ denote the width and height of that field. The motivation component $o(S^i_t)$ in (6) increases when approaching an object; otherwise, it remains near zero. For instance, as shown in Figure 3a, raising a hand while interacting with a mobile phone only increases the values of the component, whereas laying down decreases the values in the situation. Figure 3b shows minimal values reflecting non-emotional behaviors.
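The least-squares estimation of the six flow parameters can be sketched as follows. This is a minimal illustration under assumptions, not the authors' code: it assumes the first-order model u = u1 + χ1·x + χ2·y and v = u2 + χ3·x + χ4·y with coordinates measured from the reference pixel, takes the rotation term as χ3 − χ2, and fits each velocity component separately through the normal equations. All function names are hypothetical, and the sketch stops at the six parameters rather than guessing the exact form of the motivation component.

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("degenerate flow field: pixels must not be collinear")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def fit_plane(points, values):
    """Least-squares fit of values ~ a + b*x + c*y; returns [a, b, c]."""
    ata = [[0.0] * 3 for _ in range(3)]
    atz = [0.0] * 3
    for (x, y), z in zip(points, values):
        row = (1.0, x, y)
        for i in range(3):
            atz[i] += row[i] * z
            for j in range(3):
                ata[i][j] += row[i] * row[j]
    return solve_linear(ata, atz)


def fit_flow_parameters(flow, origin=(0.0, 0.0)):
    """Estimate (u1, u2, d1, d2, h1, h2) from samples ((x, y), (u, v))."""
    x0, y0 = origin
    pts = [(x - x0, y - y0) for (x, y), _ in flow]
    u1, chi1, chi2 = fit_plane(pts, [uv[0] for _, uv in flow])
    u2, chi3, chi4 = fit_plane(pts, [uv[1] for _, uv in flow])
    return {
        "u1": u1,
        "u2": u2,
        "d1": chi1 + chi4,  # divergence (approach)
        "d2": chi3 - chi2,  # rotation (assumed sign convention)
        "h1": chi1 - chi4,  # hyperbolic flow, first type
        "h2": chi2 + chi3,  # hyperbolic flow, second type
    }
```

On a synthetic expanding field with u = 0.5·x and v = 0.5·y, the fit yields d1 ≈ 1 and the remaining deformation terms ≈ 0. A pure rotation with u = −y and v = x yields d2 ≈ 2 and d1 ≈ 0, which is the separation between approach and other motion that the motivation component relies on.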

Wearable Device Configuration
We designed a simple, easily wearable device so that users could act freely in everyday situations while the device simultaneously and accurately records their emotions. Since human affect is sophisticated and subtle, it is vulnerable to personal, social, and contextual attributes. The noticeability and visibility of wearable devices could elicit unnecessary and irrelevant emotions, so the recording of human affect should be unobtrusive when measured in the natural environment. To design an unobtrusive device, we imitated the design of existing easy-to-use wireless headsets. We note that the term "unobtrusive device" means that the device is not easily noticed and does not draw attention to itself; it does not imply that our device aims to be small or concealable. This easy-to-use device provides comfort and performance to users during long-term activities.
Our device consists of multimodal sensors to capture various affects surrounding daily life, as follows:
• Frontal Camera for Collecting Visual Content: Visual information has been widely used to detect situations faced by an experimental participant. Analysis of scenes and activities in camera images has provided an understanding of this contextual information. Hence, in our system, a small frontal viewing camera with a 30 fps sampling rate was used to record the images.
• Small Physiological Sensor to Capture Human Affect: Patterns of physiological changes have been increasingly analyzed in the context of affect recognition. To evaluate the reliability of the affective labels predicted by our system, we analyzed the distinctiveness of physiological signals as categorized by the predicted labels. We used a two-channel EEG sensor for the left and right hemispheres, with a sampling rate of 250 Hz, using OpenBCI, a tool that has been applied successfully in several works [2, 3].
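Because the camera and the EEG sensor run at different rates, any analysis that pairs frames with EEG segments needs a mapping between the two streams. The sketch below shows one way this could look. It is an assumption-laden illustration rather than a description of the actual device software: it supposes both streams start at the same instant with no clock drift, and the class and function names are hypothetical. Only the 30 fps, 250 Hz, and two-channel figures come from the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DeviceConfig:
    camera_fps: int = 30    # frontal camera frame rate stated in the text
    eeg_rate_hz: int = 250  # EEG sampling rate stated in the text
    eeg_channels: int = 2   # left- and right-hemisphere channels


def eeg_window_for_frames(first_frame, last_frame, cfg=DeviceConfig()):
    """Half-open EEG sample range [start, stop) covering the given frames.

    Frames are zero-indexed and inclusive; synchronized stream starts are assumed.
    """
    if first_frame < 0 or last_frame < first_frame:
        raise ValueError("frame range must satisfy 0 <= first_frame <= last_frame")
    start = (first_frame * cfg.eeg_rate_hz) // cfg.camera_fps
    stop = ((last_frame + 1) * cfg.eeg_rate_hz) // cfg.camera_fps
    return start, stop
```

With these rates, one second of video (frames 0 to 29) maps to EEG samples 0 up to but not including 250, and consecutive frame ranges map to adjacent, non-overlapping sample ranges.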

Analysis of Physiological Characteristics
We have shown the efficacy of affective curves as reliable emotion labels. However, this finding only clarifies that the affective labels reflect physiological characteristics; that is, it only shows the distinctiveness of EEG signals associated with the labels, which were produced from psychological measurements. Therefore, to bridge the gap between psychological measurements and physiological evidence, we investigated the statistical relationship between the affective labels and the EEG spectral power in the four frequency bands from two electrodes (F3, F4). Table 1 indicates that the alpha frequency components have higher correlations with both ratings than the other frequencies. Since our dataset contains participants' hand movements, the motor cortex activation related to the movements could be correlated with either valence or arousal ratings, because Mu rhythms (8-11 Hz) activated by movements in the motor cortex area have strong associations with the alpha frequency components. However, this does not imply that psychological measurements in hand movements are uncorrelated with physiological changes. To determine whether the alpha frequency band is contaminated by hand movements or correlated with psychological measurements in action, we computed correlation coefficients between the power of the four frequency bands and the three components. Table 2 shows the correlation between physiological brain activity and the affective states for each component. As described in Section 3, the motion and motivation components may both be influenced by the movements; however, only the motivation component reflects a subject's approach behaviors in the psychological sense and quantifies them in valence ratings.
Taken together with the coefficient values in the alpha frequency band for the motion and motivation components, the association between physiological changes in the alpha frequency band and psychological measurements from the movements cannot be a result of motor cortex activation: Mu rhythms (8-11 Hz) are related to that activation, yet they were positively associated only with the motivation component and not with the motion component.
The characteristics of the brain signals under different affective labels were investigated and analyzed through the above correlations. The EEG-based statistical analysis revealed that physiological responses correlate with continuous affective labels.
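The correlation analysis between band power and affective labels can be sketched as follows. This is a minimal illustration that assumes a Pearson correlation coefficient computed over per-situation values; the text does not state which coefficient was used. The function names, the band name "beta", and the toy numbers in the usage note are hypothetical, and no values from Table 1 or Table 2 are reproduced.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0.0 or var_y == 0.0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


def correlation_table(band_powers, targets):
    """Correlate every band with every target.

    band_powers: {band name: [power per situation]}
    targets: {label or component name: [value per situation]}
    """
    return {
        band: {name: pearson(power, values) for name, values in targets.items()}
        for band, power in band_powers.items()
    }
```

Calling `correlation_table({"alpha": [1, 2, 3, 4], "beta": [4, 3, 2, 1]}, {"valence": [2, 4, 6, 8]})` on this toy input gives a coefficient of 1.0 for alpha and -1.0 for beta. The same helper applies when the targets are the three components rather than the valence and arousal labels.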