K-EmoCon, a multimodal sensor dataset for continuous emotion recognition in naturalistic conversations

Park, Cheul Young; Cha, Narae; Kang, Soowon; Kim, Auk; Khandoker, Ahsan Habib; Hadjileontiadis, Leontios; Oh, Alice; Jeong, Yong; Lee, Uichin

doi:10.1038/s41597-020-00630-y

Data Descriptor
Open access
Published: 08 September 2020

Cheul Young Park ORCID: orcid.org/0000-0003-0414-272X¹,
Narae Cha¹,
Soowon Kang¹,
Auk Kim¹,
Ahsan Habib Khandoker²,
Leontios Hadjileontiadis^2,3,
Alice Oh⁴,
Yong Jeong ORCID: orcid.org/0000-0002-5907-3787⁵ &
…
Uichin Lee¹

Scientific Data volume 7, Article number: 293 (2020)

Abstract

Recognizing emotions during social interactions has many potential applications with the popularization of low-cost mobile sensors, but a challenge remains with the lack of naturalistic affective interaction data. Most existing emotion datasets do not support studying idiosyncratic emotions arising in the wild as they were collected in constrained environments. Therefore, studying emotions in the context of social interactions requires a novel dataset, and K-EmoCon is such a multimodal dataset with comprehensive annotations of continuous emotions during naturalistic conversations. The dataset contains multimodal measurements, including audiovisual recordings, EEG, and peripheral physiological signals, acquired with off-the-shelf devices from 16 sessions of approximately 10-minute long paired debates on a social issue. Distinct from previous datasets, it includes emotion annotations from all three available perspectives: self, debate partner, and external observers. Raters annotated emotional displays at intervals of every 5 seconds while viewing the debate footage, in terms of arousal-valence and 18 additional categorical emotions. The resulting K-EmoCon is the first publicly available emotion dataset accommodating the multiperspective assessment of emotions during social interactions.

Measurement(s)	acceleration • cardiac output measurement • electrodermal activity measurement • Beats per Minute • RR interval • body temperature trait • electroencephalogram measurement • heart rate • emotion/affect behavior trait
Technology Type(s)	Accelerometer • photoplethysmography • sensor • skin temperature sensor • electroencephalography (EEG) • electrocardiography • Self-Report • visual observation method
Sample Characteristic - Organism	Homo sapiens

Machine-accessible metadata file describing the reported data: https://doi.org/10.6084/m9.figshare.12618797

Background & Summary

Emotion recognition research seeks to enable computers to identify emotions. It is a foundation for creating machines capable of understanding emotions and, possibly, even expressing them. Such a set of skills to recognize, understand, and express emotions forms emotional intelligence^1,2. It has been suggested that emotional intelligence is necessary for navigating a society, as it allows one to reason about what is desirable and what is not, and to regulate the behaviors of oneself and others accordingly^3,4.

Then why do machines need emotional skills? With advances in Machine Learning and Artificial Intelligence, the transition from human to machine is noticeable in all areas of the society, including those requiring expertise such as medical prognosis/diagnosis^5,6 or automobile driving⁷. It seems inevitable that these narrow AI systems⁸ supersede human experts in respective domains, as it has already been demonstrated with AlphaGo’s superior performance in the game of Go over human champions^9,10.

Not all AI systems will compete with humans, despite their superhuman abilities. Instead, many will work with us or for us. Emotional intelligence is critical for such human-computer interaction systems¹¹. Imagine a smart speaker that delightfully greets users when they come home. How should the speaker greet a user who has had a rough day? A speaker neglectful of the user’s emotional state may aggravate the user, but a speaker aware of the user’s mood could remain silent to avoid trouble. Similarly, emotional intelligence is critical for AI systems designed for complex tasks. For example, on roads where autonomous and human-driven vehicles mix, accurate recognition of human drivers’ emotions by autonomous vehicles would improve safety, as autonomous vehicles could better judge human drivers’ intentions¹².

Now for machines to become emotionally intelligent, they must first learn to recognize emotions, and the prerequisite to learning is data. However, there lie several challenges in the acquisition of emotion data. While emotions are prevalent, their accurate measurement is difficult. Most commonly, emotions are viewed as psychological states expressed through faces, with distinct categories¹³, but research evidence claims the contrary. Rather than distinct, facial expressions are compound¹⁴, relative¹⁵, and misleading¹⁶. A recent review of scientific evidence also presses against the common view, suggesting that facial expressions lack reliability, specificity, and generalizability¹⁷, together with past studies on contextual dependency^18,19,20 and individual variability of emotions^21,22.

Such inherent elusiveness of emotion renders many existing emotion datasets inapplicable for studying emotions in the wild. The majority of emotion datasets consist of emotions induced with selected stimuli in a static environment, i.e., a laboratory^{23,24,25,26,27,28,29}. This method provides experimenters with full control over data collection, allowing assessment of specific emotional behaviors^30,31 and acquisition of fine-grained data with advanced techniques like neuroimaging. Nevertheless, lab-generated data may generalize poorly to realistic scenarios, as they frequently contain intense expressions of prototypical emotions, which are rarely observed in the real world^32,33, and are acquired from only a subset of the population³⁴.

An alternative approach utilizes media contents^35,36,37,38 and crowdsourcing³⁹, compensating for the shortcomings of the conventional method. The abundance of contents available online, such as TV-shows and movies, allows researchers to glean rich emotion data representative of various contexts efficiently. Crowdsourcing further supports inexpensive data annotation while serving as another data source^40,41. Datasets of this type have advantages in sample size and the diversity of subjects, but generalizability remains an issue. Datasets based on media contents often contain emotional displays produced by trained actors supposing fictitious situations. To what extent such emotional portrayals resemble spontaneous emotional expressions is debatable^42,43,44. They also provide no access to physiological signals, which are known to carry information vital for detecting less visible changes in emotional states^{45,46,47,48,49,50}.

To address this lack of a dataset for recognizing emotions in their natural forms, we introduce K-EmoCon, a multimodal dataset acquired from 32 subjects participating in 16 paired debates on a social issue. It consists of physiological sensor data collected with three off-the-shelf wearable devices, audiovisual footage of participants during the debate, and continuous emotion annotations. It contributes to the current literature on emotion recognition as, to our knowledge, it is the first dataset with emotion annotations from all possible perspectives: the subject him/herself, the debate partner, and external observers.

Methods

Dataset design

Intended usage

Inspired by previous works that set out to investigate emotions during conversations^38,51,52,53, K-EmoCon was designed in consideration of a social interaction scenario involving two people and wearable devices capable of unobtrusive tracking of physiological signals. The dataset aims to allow a multi-perspective analysis of emotions with the following objectives:

1. Extend the research on how having multiple perspectives on emotional expressions may improve their automatic recognition.
2. Provide a novel opportunity to investigate how emotions can be perceived differently from multiple perspectives, especially in the context of social interaction.

Previous research has shown that having multiple sources for emotion annotations can increase their recognition accuracy^54,55. However, no research that we are aware of employs all three available perspectives in the annotation of emotions (i.e., subject him/herself, interacting partner, and external observers). Having multiple perspectives relates to the issue of establishing ground truth in emotion annotations. Emotions are inherently internal phenomena, and their mechanism is unavailable for external scrutiny, even for the person who is experiencing them. As a result, there may not be a ground truth for emotions. Should we consider as the ground truth what is most agreed upon by external observers of emotions, or what the person who experiences the emotions reports to have felt⁵⁶? The two views are likely to match if emotions are intense and pure, but as discussed, such emotions are rare. Instead, self-reported and observed emotions are likely to disagree for a variety of reasons. People often conceal their true emotions; sometimes, people are not fully mindful of their internal states; and some have difficulties interpreting or articulating emotions^57,58.

With K-EmoCon, we intend to enable the comprehensive examination of such cases where perceptions of emotions do not match, by bringing all three available perspectives into the annotation of emotions, in the context of a social interaction involving three parties of:

1. The subject – is the source who experiences emotions firsthand and produces self annotations, particularly the “felt sense”⁵⁵ of the emotions.
2. The partner – is the person who interacts with the subject, experiencing the subject’s emotions secondhand; thus, he or she has a contextual knowledge of the interaction that induced the subject’s emotions and produces partner annotations based on that.
3. The external observers – are people who observe the subject’s emotions without the exact contextual knowledge of the interaction that induced the emotions, producing external observer annotations.

Note that while our definition of perspectives involved in emotion annotation is similar to definitions previously used by other researchers (self-reported vs. perceived⁵⁵/observed⁵⁹), we further segment observer annotations based on whether the contextual information of the situation in which the emotion was generated is available to an observer, as we wish to consider the role of contextual knowledge in emotion perception and recognition.

Existing datasets of emotions in conversations provide a limited scope on this issue, as they at most contain emotion annotations from subjects and external observers⁵¹, leaving out annotations from other people who engaged in the conversation (whom we call partners). Others either consider only a particular type of annotation that is sufficient to serve their research goal⁵³, or have designs that do not allow acquiring multi-perspective annotations^38,52 (e.g., a dataset constructed from conversations in a TV show only allows the collection of external observer annotations). Refer to Table 1 to see how K-EmoCon is distinguished from existing emotion datasets.

Table 1 Comparison of the K-EmoCon dataset with the existing multimodal emotion recognition datasets.

Context of data collection

In this regard, we chose a semi-structured, turn-taking debate on a social issue with randomly assigned partners as the setting for data collection. This setting is appropriate for collecting emotions that may naturally arise in the course of a day, as it is similar to a social interaction that one could engage in at a workplace.

Also, the setting is particularly suitable for studying the misperception of emotions. It is sufficiently formal and spontaneous as it involves randomly assigned partners. We expect such formality and spontaneity of the setting compelled participants to regulate their emotions in a socially appropriate manner, allowing us to observe less pronounced emotions from participants, which were more likely to be misperceived by their partners⁶⁰.

Data collection apparatus

Our choice of mobile, wearable, and low-cost devices to collect affective physiological signals together with audiovisual recordings, while primarily aimed at making findings based on our data more reproducible and extensible, was also made in consideration of our goal of investigating mismatches in perceptions of emotions in the wild. Research has shown that fusing implicit and explicit affective information can result in more accurate recognition of subtle emotional expressions from professional actors⁶¹. However, no work we are aware of has shown that a similar result can be achieved for subtle emotions collected from in-the-wild social interactions of individuals without professional training in acting. Therefore, our dataset provides an opportunity to examine if emotions of lower intensity, produced by non-actors during communication, can be recognized accurately.

It is also interesting to examine whether subtle emotions could signal instances where emotions are misperceived during communication if their accurate detection is possible. In the same vein, to what extent the intensity of emotions influences their decoding accuracy during a social interaction, where a broader array of contextual information is present, is also worth exploring. K-EmoCon could enable an in-depth investigation of such issues.

Further, we considered the use case of mobile and wearable technologies for facilitating emotional communication. Researchers are actively exploring the potential for using expressive biosignals collected via wearables to communicate one’s emotional and psychological states with others^{62,63,64,65,66}. Our dataset can contribute to the research of biosignal-based assistive technologies to enable affective communication by providing insights on when are apposite moments for communicating emotions.

Ethics statement

The construction of the K-EmoCon dataset was approved by the Korea Advanced Institute of Science and Technology (KAIST) Institutional Review Board. KAIST IRB also reviewed and approved the consent form, which contained information on the following: the purpose of data collection, data collection procedure, types of data to be collected from participants, compensation to be provided for participation, and the protocol for the protection of privacy-sensitive data.

Participants were given the same consent forms upon arriving at the data collection site and were asked to provide written consent after fully reading the form indicating that they are willing to participate in data collection. Since K-EmoCon is to be open to public access, a separate consent was obtained for the disclosure of the data that contains personally identifiable information (PII), which is the audiovisual footage of participants during debates, including their faces and voices. Participants were also notified that their participation is voluntary, and they can terminate the data collection at any point. The resulting K-EmoCon dataset includes the audiovisual recordings of 21 participants, out of 32, who agreed to disclose their personal information, excluding the 11 who did not agree.

Participant recruitment and preparation

Thirty-two participants were recruited between January and March of 2019. An announcement calling for participation in an experiment on “emotion-sensing during a debate” was posted on an online bulletin board of a KAIST student community. The post stated that participants would have a debate on the issue of accepting Yemeni refugees on Jeju Island of South Korea for 10 minutes. It also stated that the debate must be in English, and participants should be capable of speaking competently in English, but not necessarily at the level of a native speaker. Specifically, participants were required to have at least three years of experience living in an English-speaking country, or to have achieved a score above the criterion in any one of the following standardized English speaking tests: TOEIC speaking level 7, TOEFL speaking score 27, or IELTS speaking level 7.

Once participants were assigned a date and time to participate in data collection, they were provided four news articles on the topic of the Jeju Yemeni refugee crisis via email. The email included two articles with neutral views on the issue^67,68, one in favor of refugees⁶⁹, and one in opposition to refugees⁷⁰. We instructed the participants to read the articles beforehand to familiarize themselves with the debate topic.

All selected participants were students at KAIST, but their ages varied from 19 to 36 years old (mean = 23.8 years, stdev. = 3.3 years), as well as their gender and nationality. We randomly paired participants into 16 dyads based on their available times. See Table 2 for the breakdown of participants’ gender, nationality, and age.

Table 2 Participant pairs for debates.

Data collection setup

All data collection sessions were conducted in two rooms with controlled temperature and illumination. Two participants sat across a table facing each other, at a distance suitable for comfortable communication (see Fig. 1). Two Samsung Galaxy S7 smartphones mounted on tripods were placed in the middle of the table facing each participant, capturing facial expressions and upper-body movements from the 2nd-person point of view (POV) along with the speech audio, via the camera app.

During a debate, participants wore a suite of wearable sensors, as shown in Fig. 2, which includes:

1. Empatica E4 Wristband – captured photoplethysmography (PPG), 3-axis acceleration, body temperature, and electrodermal activity (EDA). Heart rate and the inter-beat interval (IBI) were derived from Blood Volume Pulse (BVP) measured by a PPG sensor.
2. Polar H7 Bluetooth Heart Rate Sensor – detected heart rates using an electrocardiogram (ECG) sensor and was used to complement a PPG sensor in E4, which is susceptible to motion.
3. NeuroSky MindWave Headset – collected electroencephalogram (EEG) signals via two dry sensor electrodes, one on the forehead (fp1 channel-10/20 system at the frontal lobe) and one on the left earlobe (reference).
4. LookNTell Head-Mounted Camera – with a camera attached at one end of a plastic circlet, was worn on participants’ heads to capture videos from a first-person POV.

All listed devices can operate in a mobile setting. Empatica E4 keeps the data on the device, and the collected data is later uploaded to a computer. Polar H7 sensor and MindWave headset can communicate with a mobile phone via Bluetooth Low Energy (BLE) to store data. Table 3 summarizes sampling rates and signal ranges of data collected from each device.

Table 3 Data collected with each wearable device, with respective sampling rates and signal ranges.

Data collection procedure

Administration

All data collection sessions were conducted in four stages: (1) onboarding, (2) baseline measurement, (3) debate, and (4) emotion annotation. Two experimenters administered each session (see Table 4 for an overview of the data collection procedure). One experimenter served as a moderator during debates, notifying participants of the remaining time and intervening when necessary, such as when a debate got too heated or a participant exceeded the allotted time of 2 minutes in his or her turn.

Table 4 Steps for a data collection session. Each session lasted approximately two hours.

Onboarding

Upon their arrival, participants were each provided a consent form asking for two written consents: the first, which was mandatory, for participation in data collection, and the second for the disclosure of privacy-sensitive data collected during the session, which participants could opt out of without any disadvantage.

Once they agreed to participate in the research, participants decided whether they would argue for or against admitting the Yemeni refugees in Jeju. Participants could either briefly discuss to settle on their preferred positions or toss a coin to decide at random. The same procedure was followed for deciding who goes first in the debate.

Next, participants were given up to 15 minutes to prepare their arguments. Each participant was given a pen, paper, and printouts of the articles that they had previously received via email. After they finished preparing, experimenters equipped participants with wearable devices. Participants wore E4 wristbands on their non-dominant hands, as arm movements may impede an accurate measurement of PPG. Experimenters ensured that the wristbands were tightly fastened and that the electrodes were in good contact with participants’ skin. Experimenters also ensured that the EEG headsets and head-mounted cameras fit well on participants’ heads, and manually adjusted the head-mounted cameras’ lenses so that the captured views were similar to participants’ subjective views. Participants wore Polar H7 sensors attached to flexible bands underneath their clothes, placed above the solar plexus so that the electrodes were in contact with their skin.

Baseline measurement

With all devices equipped, sensor measurements were taken from participants while they watched a short clip. This step was to establish a baseline that constitutes a neutral state for each participant. Establishing a neutral baseline is commonly used in the construction of emotion datasets to account for individual biases and reduce the effect of previous emotional states, especially when repeated measurements are taken.

The procedure for a baseline measurement varies across researchers and often depends on the purpose of an experiment⁷¹. In stimuli-based experiments, researchers take measurements as their subjects watch a stimulus intended to induce a neutral emotional state^23,24 or measure resting-state activities between stimuli if they are taking multiple consecutive measurements²⁵. Similarly, for K-EmoCon, participants watched the Color Bars clip, which was previously reported in the work of Gross et al. to induce a neutral emotion⁷². Experimenters also ensured that no devices were malfunctioning during the baseline measurement.

Debate

A debate began at the moderator’s signal and lasted approximately 10 minutes. Participants’ facial expressions, upper-body movements, and speech were recorded throughout a debate. Participants were allowed to speak for up to two consecutive minutes during their turns, with turns alternating between the two participants. However, participants were also notified that they could intervene during an opponent’s turn, to allow more natural communication. The moderator notified participants 30 and 60 seconds before the end of their turns and intervened if they exceeded two minutes. A debate stopped at the ten-minute mark, with some flexibility to allow the last speaker to finish his or her argument.

Emotion annotation

Participants took a 15-minute break upon finishing a debate. Participants then were each assigned to a PC and annotated their own emotions and their partner’s emotions during the debate. Specifically, each participant watched one audiovisual recording of him/herself and another recording of his/her partner (both recordings from 2nd-person POV, including facial expressions, upper body movements, and speeches), to annotate emotions at intervals of every 5 seconds from the beginning to the end of a debate. We chose 5 seconds based on the report of Busso et al. that the average duration of the speaker turns in IEMOCAP was about 4.5 s⁵¹, and findings from linguistics research also support this number^73,74,75.
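The 5-second annotation grid described above can be sketched in Python as follows. This is only an illustration: the function names are hypothetical, offsets are assumed to be measured in seconds from the start of a debate, and counting a partial final window as a full window is an assumption rather than something the text specifies.

```python
import math

WINDOW_SECONDS = 5  # annotation interval stated in the text


def window_index(offset_seconds: float, window: int = WINDOW_SECONDS) -> int:
    """Return the 0-based index of the annotation window containing an offset.

    offset_seconds is the time elapsed since the start of the debate.
    """
    if offset_seconds < 0:
        raise ValueError("offset must be non-negative")
    return int(offset_seconds // window)


def window_count(debate_seconds: float, window: int = WINDOW_SECONDS) -> int:
    """Number of windows needed to cover a debate.

    Assumption: a trailing partial window still receives one annotation.
    """
    return math.ceil(debate_seconds / window)


# A 10-minute debate is covered by 120 five-second windows.
print(window_count(600))
# A sensor sample taken 12.3 s into the debate falls into window 2.
print(window_index(12.3))
```

Mapping each sensor timestamp to a window index in this way is one simple route to pairing physiological samples with the annotation that covers them.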

This annotation method we employed, a retrospective affect judgment protocol, is widely used in affective computing to collect self-reports of emotions, especially in studies where an uninterrupted engagement of subjects during an emotion induction process is essential^76,77,78,79. Likewise, we opted for this method as participants’ natural interaction was necessary for acquiring quality emotion data.

Note that we did not provide 1st-person POV recordings captured from head-mounted cameras to participants, and they only had 2nd-person POV recordings to annotate felt emotions. One may have a reasonable concern regarding this choice, that participants watching their faces likely caused them to occupy a perspective similar to an observer. Hence, this might have resulted in an unnatural measurement of felt emotions. Indeed, the headcam footage could have been a more naturalistic instrument, as we intuitively take an embodied perspective to recall how we felt at a specific moment in the past.

However, we found the extent of information captured by the headcam footage insufficient for accurate annotation of felt emotions. Experimenters manually adjusted headcam lenses, so the recordings resembled participants’ subjective views, but the headcam footage was missing fine-grained information such as participants’ gazes. Also, past research on memories for emotions has shown that they are prone to biases and distortion^80,81,82. In that regard, it seemed headcam videos, which contain limited information compared to frontal face recordings, would only result in an incorrect annotation of felt emotions, especially in retrospect. Further, we noted that it is not uncommon for people to infer emotions from their faces, as they frequently do when looking in a mirror or taking a selfie.

As a result, participants were given 2nd-person recordings of themselves for the retrospective annotation of felt emotions. In total, participants annotated emotions with 20 unique categories, as shown in Table 5. Experimenters assisted participants throughout the annotation procedure. Before participants began annotating, experimenters explained the individual emotion categories to participants, so that they correctly understood the meaning of each item and its specific annotation procedure. Experimenters also explicitly instructed participants to report felt emotions, not the emotions perceived on their faces. Lastly, experimenters ensured that the start and end times for the two participants matched, to obtain synchronized annotations.

Table 5 Collected emotion annotations.

External emotion annotation

Additionally, we recruited five external raters to annotate participants’ emotions during debates (see Table 6). We applied the same criteria we used for recruiting participants in data collection to recruit the raters. The raters were provided the 2nd-person POV recordings of participants during debates and annotated emotions following the same procedure our participants followed. External raters performed their tasks independently, and the experimenters communicated remotely with the raters. Once a rater finished annotating, an experimenter checked the completed annotations for incorrect entries and asked the rater to review the annotations if there were any missing values or misplaced entries.

Table 6 Gender and age of external raters.

Data Records

Dataset summary

The resulting K-EmoCon dataset contains multimodal data from 16 paired-debates on a social issue, which sum to 172.92 minutes of dyadic interaction. It includes physiological signals measured with three wearable devices, audiovisual recordings of debates, and continuous annotations of emotions from three distinct perspectives of the subject, the partner, and the external observers. Table 7 summarizes data collection results and dataset contents.

Table 7 Summary of data collection results and the dataset.

Preprocessing

For time-wise synchronization across data, we converted all timestamps from Korea Standard Time (UTC +9) to UTC +0 and clipped the raw data so that only the parts corresponding to debates and baseline measurements are included. For debate audio and video, subclips corresponding to debates were extracted from the raw footage. Audio tracks containing participants’ speech were copied and saved separately as WAV files. Physiological signals were clipped from the respective beginnings of data collection sessions to the respective ends of debates, as the initial 1.5 to 2 minutes immediately after a session begins corresponds to a baseline measurement for a neutral state. The parts between baseline measurements and debates correspond to debate preparation and may be excluded from the analysis. Note that we do not provide unedited audio/video recordings and raw log-level data, nor the code for preprocessing this data, as they contain privacy-sensitive information outside the boundary of information we have been permitted to share. See the Code Availability section for further detail.
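Since the original preprocessing code is not released, the following Python sketch only illustrates the two operations named above: shifting KST wall-clock times to UTC and clipping a signal to a time range. The function names, the millisecond epoch representation, and the (timestamp, value) sample layout are all assumptions made for this example.

```python
from datetime import datetime, timedelta, timezone

KST = timezone(timedelta(hours=9), "KST")  # Korea Standard Time, UTC +9
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def kst_to_utc_ms(local: datetime) -> int:
    """Interpret a naive datetime as KST and return UTC epoch milliseconds."""
    aware = local.replace(tzinfo=KST)
    # Dividing two timedeltas with // yields an exact integer.
    return (aware - EPOCH) // timedelta(milliseconds=1)


def clip(samples, start_ms, end_ms):
    """Keep (timestamp_ms, value) pairs with start_ms <= timestamp <= end_ms."""
    return [(t, v) for t, v in samples if start_ms <= t <= end_ms]


# 09:00 KST on 1 January 2019 is 00:00 UTC on the same day.
start = kst_to_utc_ms(datetime(2019, 1, 1, 9, 0, 0))
end = start + 10 * 60 * 1000  # ten minutes later

# Made-up samples: one before, two inside, one after the clip range.
signal = [(start - 1, 0.1), (start, 0.2), (end, 0.3), (end + 1, 0.4)]
print(clip(signal, start, end))
```

Working with timezone-aware datetimes rather than subtracting nine hours by hand keeps the conversion explicit and avoids silent errors when the host machine runs in a different local timezone.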

Dataset contents

The K-EmoCon dataset⁸³ is available upon request on Zenodo (https://doi.org/10.5281/zenodo.3931963). In the following, we describe directories and files in the dataset and their contents.

metadata.tar.gz

includes files with auxiliary information about the dataset. Included files are:

1. subjects.csv – each row contains a participant ID (pid) and three timestamps in UTC +0. Three timestamps respectively mark the beginning of a data collection (initTime), the start of a debate (startTime), and the end of a debate (endTime).
2. data_availability.csv – shows files available for each participant. For each participant (row), if a data file (column) is available, the corresponding cell is marked TRUE, otherwise FALSE.
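As a rough illustration of how these two metadata files might be consumed, the Python sketch below parses small made-up tables shaped like the descriptions above. The column names pid, initTime, startTime, and endTime come from the text; the millisecond unit of the timestamps, the name of the identifier column in data_availability.csv, and the example file columns (E4_ACC, Polar_HR) are assumptions.

```python
import csv
import io

# Made-up rows for illustration only; real values live in subjects.csv.
SUBJECTS_SAMPLE = """pid,initTime,startTime,endTime
1,1000000,1900000,2500000
2,1000000,1900000,2500000
"""

# Made-up availability table; the real column set is not reproduced here.
AVAILABILITY_SAMPLE = """pid,E4_ACC,Polar_HR
1,TRUE,FALSE
2,TRUE,TRUE
"""

TIME_COLUMNS = ("initTime", "startTime", "endTime")


def load_subjects(handle):
    """Map each participant ID to its three integer timestamps."""
    subjects = {}
    for row in csv.DictReader(handle):
        subjects[int(row["pid"])] = {name: int(row[name]) for name in TIME_COLUMNS}
    return subjects


def debate_seconds(entry, ticks_per_second=1000):
    """Debate length in seconds (assumes millisecond timestamps by default)."""
    return (entry["endTime"] - entry["startTime"]) / ticks_per_second


def load_availability(handle, id_column="pid"):
    """Map each participant ID to the list of columns marked TRUE."""
    table = {}
    for row in csv.DictReader(handle):
        pid = int(row.pop(id_column))
        table[pid] = [name for name, cell in row.items() if cell.strip().upper() == "TRUE"]
    return table


subjects = load_subjects(io.StringIO(SUBJECTS_SAMPLE))
availability = load_availability(io.StringIO(AVAILABILITY_SAMPLE))
print(debate_seconds(subjects[1]))
print(availability[1])
```

With the real files, the `io.StringIO` objects would be replaced by file handles opened on the extracted CSVs.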

data_quality_tables.tar.gz

includes seven CSV tables with information regarding the quality of physiological signals in the dataset. With participant IDs (pid) in rows and file types (ACC, BVP, EDA, HR, IBI, and TEMP for E4 data, and Attention, BrainWave, Meditation, and Polar_HR for NeuroSky + Polar H7 data) in columns, included files are as follows:

1. e4_durations.csv – contains the duration of each file in seconds, where duration = (last timestamp - first timestamp) / 1000.
2. neuro_polar_durations.csv – same as above.
3. e4_zeros.csv – contains the number of zero values in each file. ACC and BVP were excluded as zero crossings are to be expected during their measurement.
4. neuro_polar_zeros.csv – same as above. Note that zero values for NeuroSky data (Attention, BrainWave, Meditation) indicate the inability of a device at a given moment to obtain a sufficiently reliable measurement due to various reasons.
5. e4_outliers.csv – contains the number of outliers in each file. Chauvenet’s criterion was used for outlier detection (refer to Code Availability section for its implementation in Python).
6. e4_completeness.csv – contains the completeness of each file as a ratio in the range of [0.0, 1.0]. 1.0 indicates a file without any missing value or an outlier. The completeness ratio was calculated as completeness = (total number of values − (number of outliers + number of zeros))/total number of values.
7. neuro_polar_completeness.csv – same as above, with completeness calculated as completeness = (total number of values − number of zeros)/total number of values.
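The quality measures listed above can be sketched in Python as follows. This is not the authors' released implementation: the function names are hypothetical, the outlier test is a textbook form of Chauvenet's criterion (a value is flagged when the expected number of equally extreme values among n normally distributed samples is below 0.5), and the use of the population standard deviation is an assumption.

```python
import math
from statistics import mean, pstdev


def duration_seconds(first_ms, last_ms):
    """duration = (last timestamp - first timestamp) / 1000."""
    return (last_ms - first_ms) / 1000


def count_zeros(values):
    """Number of entries exactly equal to zero."""
    return sum(1 for v in values if v == 0)


def chauvenet_flags(values):
    """Return one boolean per value, True where Chauvenet's criterion rejects it."""
    n = len(values)
    if n < 2:
        return [False] * n
    mu = mean(values)
    sigma = pstdev(values)  # assumption: population standard deviation
    if sigma == 0:
        return [False] * n
    flags = []
    for x in values:
        z = abs(x - mu) / sigma
        # Two-sided tail probability of a standard normal beyond z.
        tail = math.erfc(z / math.sqrt(2))
        flags.append(n * tail < 0.5)
    return flags


def completeness(total, zeros, outliers=0):
    """(total - (outliers + zeros)) / total; pass outliers=0 for NeuroSky/Polar files."""
    if total <= 0:
        raise ValueError("total must be positive")
    return (total - (outliers + zeros)) / total


# Made-up readings: nine values near 10 and one implausible spike.
readings = [10, 10.1, 9.9, 10.05, 9.95, 10, 10.02, 9.98, 10.01, 50]
flags = chauvenet_flags(readings)
print(flags.index(True))          # position of the flagged spike
print(completeness(100, 5, 5))    # 100 values, 5 zeros, 5 outliers
```

Because a single extreme value inflates both the mean and the standard deviation, Chauvenet's criterion is sometimes applied iteratively; the sketch performs a single pass only.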

debate_audios.tar.gz

contains 16 audio recordings of debates in the WAV file format. The name of each file follows the convention of p<X>.p<Y>.wav, where <X> and <Y> stand for IDs of two participants appearing in the audio. The start and the end of each recording correspond to startTime and endTime values in the subjects.csv file, respectively.

debate_recordings.tar.gz

contains 2nd-person POV video recordings of 21 participants during debates in the MP4 file format. The name of a file p<X>_<T>.mp4 indicates that the file is the recording of participant <X> that is <T> seconds long.
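The two naming conventions can be parsed with regular expressions, as in the sketch below. The example file names are hypothetical, and the patterns assume that participant IDs are plain integers and that <T> is a number of seconds written with digits (optionally with a decimal part).

```python
import re

# p<X>.p<Y>.wav, e.g. "p1.p2.wav" (hypothetical example name)
AUDIO_RE = re.compile(r"^p(\d+)\.p(\d+)\.wav$")
# p<X>_<T>.mp4, e.g. "p1_618.mp4" (hypothetical example name)
VIDEO_RE = re.compile(r"^p(\d+)_(\d+(?:\.\d+)?)\.mp4$")


def parse_audio_name(name):
    """Return the two participant IDs in an audio file name, or None if no match."""
    m = AUDIO_RE.match(name)
    return (int(m.group(1)), int(m.group(2))) if m else None


def parse_video_name(name):
    """Return (participant ID, length in seconds) for a video file name, or None."""
    m = VIDEO_RE.match(name)
    return (int(m.group(1)), float(m.group(2))) if m else None


print(parse_audio_name("p1.p2.wav"), parse_video_name("p1_618.mp4"))
```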

neurosky_polar_data.tar.gz

includes subdirectories for each participant, from P1 to P32, which may contain up to four files as follows:

1.
Attention.csv – contains eSense Attention ranging from 1 to 100, representing how attentive a user was at a given moment. Attention values can be interpreted as follows: 1 to 20 – “strongly lowered”, 20 to 40 – “reduced”, 40 to 60 – “neutral”, 60 to 80 – “slightly elevated”, and 80 to 100 – “elevated”. 0 indicates that the device was unable to calculate a sufficiently reliable value, possibly due to signal contamination by noise.
2.
BrainWave.csv – records the relative power of brainwave in the following 8 bands of EEG: delta (0.5–2.75 Hz), theta (3.5–6.75 Hz), low-alpha (7.5–9.25 Hz), high-alpha (10–11.75 Hz), low-beta (13–16.75 Hz), high-beta (18–29.75 Hz), low-gamma (31–39.75 Hz), and middle-gamma (41–49.75 Hz). The values are without a unit and are only meant for inferring the fluctuation in the power of a certain band or comparing the relative strengths of bands with each other.
3.
Meditation.csv – contains eSense Meditation ranging from 0 to 100, measuring the relaxedness of a user. For their interpretation, use the same ranges and the meanings as those for the attention values.
4.
Polar_HR.csv – contains heart rates measured with ECG sensors during debates.
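The interpretation ranges for eSense Attention and Meditation values can be expressed as a small lookup, sketched below. Because the stated ranges share their boundary values (20, 40, 60, and 80), the sketch assumes that a boundary value belongs to the higher band; this choice, the function name, and the "unreliable" label for 0 are illustrative and not specified by the dataset.

```python
def esense_label(value):
    """Map an eSense Attention or Meditation value to its interpretation."""
    if value == 0:
        # 0 means the device could not compute a sufficiently reliable value.
        return "unreliable"
    if not 1 <= value <= 100:
        raise ValueError("eSense values are expected to lie in 0..100")
    # Upper bounds are exclusive: boundary values go to the higher band (assumption).
    bands = [
        (20, "strongly lowered"),
        (40, "reduced"),
        (60, "neutral"),
        (80, "slightly elevated"),
    ]
    for upper, label in bands:
        if value < upper:
            return label
    return "elevated"


print([esense_label(v) for v in (0, 10, 50, 100)])
```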

e4_data.tar.gz

contains subdirectories for each participant (except P2, P3, P6, and P7), which may contain up to six files as follows:

1.
E4_ACC.csv – measurements from a 3-axis accelerometer sampled at 32Hz in the range [−2g, 2g] under columns x, y, and z. Multiply raw numbers by 1/64 to convert them into units of g (i.e., a raw value of 64 is equivalent to 1g).
2.
E4_BVP.csv – PPG measurements sampled at 64Hz.
3.
E4_EDA.csv – EDA sensor readings in units of μS, sampled at 4Hz.
4.
E4_HR.csv – the average heart rates calculated in 10-second windows. The values are derived from the BVP measurements, and the values are entered at the frequency of 1Hz. The first 10 seconds of data after the beginning of a recording is not included as the derivation algorithm requires the initial 10 seconds of data to produce the first value.
5.
E4_IBI.csv – IBI measurements in milliseconds computed from the BVP. From the second row onwards, each row is separated from the previous row by the distance between two consecutive peaks (i.e., t_(i+1) − t_i = IBI_i). Note that HR in BPM can be derived from IBI as 60/IBI × 1000.
6.
E4_TEMP.csv – body temperature in degrees Celsius, sampled at 4Hz.
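The unit conversions and sampling rates listed above can be summarized in a short sketch. The function and constant names are hypothetical; the conversion factors and rates are those stated in the file descriptions.

```python
# Sampling rates in Hz as listed in the file descriptions above.
SAMPLING_HZ = {"ACC": 32, "BVP": 64, "EDA": 4, "HR": 1, "TEMP": 4}


def acc_raw_to_g(raw):
    """Convert a raw accelerometer reading to units of g (raw 64 equals 1 g)."""
    return raw / 64.0


def ibi_to_bpm(ibi_ms):
    """Heart rate in BPM from an inter-beat interval in milliseconds.

    Algebraically the same as 60 / IBI * 1000.
    """
    return 60000.0 / ibi_ms


def expected_samples(signal, duration_s):
    """Number of samples expected for a signal over a duration in seconds."""
    return int(SAMPLING_HZ[signal] * duration_s)


print(acc_raw_to_g(64), ibi_to_bpm(750), expected_samples("EDA", 600))
```

Note that the expected count for HR would need to account for the first 10 seconds, which are not included in E4_HR.csv.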

Note that E4 data entries for P29, P30, P31, and P32 are entered with each row designated with either one of two unique device_serial values. It is necessary that the dataset users only use rows corresponding to a single device_serial. We further recommend using rows with the following device_serial values:

P29, P31 – A013E1 for all files, except A01525 for IBI.
P30, P32 – A01A3A for all files.
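The recommendation above can be applied with a filter such as the following sketch. The mapping reproduces the recommended device_serial values; the helper names and the row representation (dictionaries keyed by column name, as produced by csv.DictReader) are assumptions for illustration.

```python
# Recommended device_serial values from the note above, keyed by participant ID.
RECOMMENDED_SERIAL = {
    29: {"default": "A013E1", "IBI": "A01525"},
    30: {"default": "A01A3A"},
    31: {"default": "A013E1", "IBI": "A01525"},
    32: {"default": "A01A3A"},
}


def recommended_serial(pid, signal):
    """Return the recommended serial, or None when the participant needs no filtering."""
    per_signal = RECOMMENDED_SERIAL.get(pid)
    if per_signal is None:
        return None
    return per_signal.get(signal, per_signal["default"])


def keep_single_device(rows, pid, signal):
    """Keep only the rows recorded by the recommended device for this file."""
    serial = recommended_serial(pid, signal)
    if serial is None:
        return list(rows)
    return [row for row in rows if row["device_serial"] == serial]


# Hypothetical rows; only the device_serial column name comes from the dataset.
sample = [
    {"value": "0.31", "device_serial": "A013E1"},
    {"value": "0.29", "device_serial": "A01525"},
]
print(keep_single_device(sample, 29, "EDA"), keep_single_device(sample, 29, "IBI"))
```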

emotion_annotations.tar.gz

includes four subdirectories as listed below, which each contain annotations for participant emotions during debates at intervals of every 5 seconds, acquired from three distinct perspectives:

1.
self_annotations – annotations of participant emotions by participants themselves.
2.
partner_annotations – annotations of participant emotions by respective debate partners.
3.
external_annotations – annotations of participant emotions by five external raters. Files follow the naming convention of P<X>.R<Z>.csv, where <X> is a participant ID, and <Z> is a rater number.
4.
aggregated_external_annotations – contains external rater annotations aggregated across five raters via majority voting. Refer to Code Availability section for the Python code implementing the majority vote aggregation.
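Majority voting across the five external raters can be sketched as below. This is not the authors' implementation (see the Code Availability section for that); in particular, how ties are resolved is not described here, so the sketch assumes that the smallest tied value wins.

```python
from collections import Counter


def majority_vote(ratings):
    """Return the most frequent rating; ties go to the smallest value (assumption)."""
    counts = Counter(ratings)
    top = max(counts.values())
    return min(value for value, count in counts.items() if count == top)


def aggregate_raters(per_rater):
    """Aggregate one emotion column across raters, interval by interval.

    per_rater holds one list of ratings per rater; zip stops at the shortest list.
    """
    return [majority_vote(column) for column in zip(*per_rater)]


# Hypothetical ratings from five raters over three 5-second intervals.
ratings = [
    [3, 3, 4],
    [3, 2, 4],
    [2, 2, 4],
    [3, 2, 1],
    [3, 1, 4],
]
print(aggregate_raters(ratings))
```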

The first row in a valid file has annotations for the first five seconds, and rows coming afterward contain annotations for the next consecutive five-second intervals, non-overlapping. Also, each row in a valid file contains 10 non-empty values (eight numeric values, including seconds column, and two x’s). Note that annotation files for a participant may not have an equal number of rows (e.g., there may be more self-annotations than partner/external annotations for some participants). In that case, longer files should be truncated from the start such that they have the same number of rows as shorter files since the extra annotations at the beginning are possibly from participants mistakenly annotating emotions during baseline measurements.
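The alignment rule described above, dropping extra rows from the start of longer files, can be sketched as follows. The helper names are hypothetical, and annotation rows are represented as plain lists for brevity.

```python
def align_from_end(*tracks):
    """Truncate longer annotation sequences from the start so all have equal length."""
    n = min(len(track) for track in tracks)
    # Slice with an explicit start index so that n == 0 yields empty lists.
    return [list(track[len(track) - n:]) for track in tracks]


def interval_bounds(row_index):
    """Return the (start, end) seconds covered by a row, assuming 5-second intervals."""
    return 5 * row_index, 5 * (row_index + 1)


# Hypothetical example: two extra self-annotation rows at the beginning.
self_rows = [1, 2, 3, 4, 5]
partner_rows = [10, 20, 30]
aligned_self, aligned_partner = align_from_end(self_rows, partner_rows)
print(aligned_self, aligned_partner, interval_bounds(0))
```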

Technical Validation

Emotion annotations

Distribution and frequency of emotions

The distributions and frequencies of emotion annotations are shown in Fig. 3. Overall, annotations for emotions measured on Likert scales (arousal, valence, cheerful, happy, angry, nervous, and sad) are biased towards neutral, with only a minuscule fraction of annotations for non-neutral states. Categorical emotion annotations (common and less common BROMP affective categories) are similarly biased, with a predominant portion of annotations falling under only two categories, concentration and none. This imbalance is expected, as emotion data in the wild is commonly imbalanced by nature (i.e., people are more often neutral than angry or sad)^84,85,86.

Inter-rater reliability

As individual-level information is missing in aggregated data, we used Krippendorff’s alpha⁸⁷, which is a generalized statistic of agreement applicable to any number of raters, to measure the inter-rater reliability (IRR) of emotion annotations from different perspectives for each participant. Figure 4 shows heatmaps of alpha coefficients computed for seven emotions measured on ordinal scales (arousal, valence, cheerful, happy, angry, nervous, and sad).

All annotation values were interpreted as rank-ordered (ordinal scaled) for the IRR computation. The Likert scales we used are not interval or ratio scales with meaningful distances between values. While participants and raters were provided numeric scales labeled with semantic meanings (see Table 5), individual interpretations of the scales were likely disparate.

Given that, before the computation, annotation values were scaled relative to a neutral, by estimating modes of columns as neutrals and deducting them from respective column values (i.e., if the mode of a cheerful column for a particular participant was one, then one was subtracted from all values in that cheerful column). This mode-subtraction step was necessary to prevent the underestimation of IRRs.

Annotations in our dataset for scaled emotions are highly biased, as shown in Fig. 3. However, while arousal and valence are explicitly centered at zero (which corresponds to 3 = neutral), the five emotions measured on a scale of 1 = very low to 4 = very high (cheerful, happy, angry, nervous, and sad) are systematically biased without a zero neutral. All of their values indicate that some emotion is present, and this absence of zero results in a widely varying interpretation of scale values by our participants and raters.

Consider the following scenario further elaborating this issue: a subject rates that she was cheerful as much as 1 for the first half of a debate, then 2 for the rest, but her debate partner rates that she was cheerful as much as 3 for the first half then 4 for the rest. In this example, self and partner annotations both imply that the subject was less cheerful for the first half of the debate. However, an IRR of two sets of annotations is close to zero without subtracting modes. Indeed, it is possible that the partner perceived the subject as more cheerful overall, compared to the subject herself. In that case, a low IRR correctly measures the difference between emotion perceptions of the subject and partner. Nevertheless, this assumption cannot be confirmed, as there is no neutral baseline.

Therefore, we applied the proposed mode-subtraction to emotion annotations such that alpha coefficients measure raters’ agreement on relative changes in emotions rather than their absolute agreement with each other. This adjustment mitigates spuriously low alpha coefficient values obtained from raw annotations (refer to Code Availability section for the code implementing the mode-subtraction and plotting of heatmaps).
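The mode-subtraction step can be illustrated with the scenario described above. This is a minimal sketch rather than the authors' code: the function name is hypothetical, and when several values tie for the mode the smallest is taken, which is an assumption.

```python
from collections import Counter


def subtract_mode(values):
    """Shift a column so that its mode (treated as neutral) becomes zero."""
    counts = Counter(values)
    top = max(counts.values())
    # Ties for the most frequent value go to the smallest one (assumption).
    mode = min(value for value, count in counts.items() if count == top)
    return [value - mode for value in values]


# The scenario above: self-ratings of 1 then 2, partner ratings of 3 then 4.
self_cheerful = [1, 1, 2, 2]
partner_cheerful = [3, 3, 4, 4]

print(subtract_mode(self_cheerful))
print(subtract_mode(partner_cheerful))
```

After the shift both sequences become [0, 0, 1, 1], so an agreement statistic computed on them reflects the shared relative change rather than the offset between the two raters.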

These adjusted alpha coefficients are generally low. In particular, a noticeable pattern emerges when comparing alpha coefficients of self-partner (SP) annotations and self-external (SE) annotations. As shown in the last rows of heatmaps (Diff. [SE - SP]) in Fig. 4, the differences between the IRRs of SE annotations and SP annotations tend to be above zero (for 22 out of 32 participants for arousal: mean = 0.145, stdev. = 0.279). This pattern possibly indicates that there exists a meaningful difference in the perception of emotions from different perspectives, although further study is required to validate its significance.

Physiological signals

Data quality

The quality of physiological signal measurements in the dataset has been thoroughly examined. The examination results are included as a part of the dataset in the data_quality_tables.tar.gz archive file.

Missing data

E4 data of 4 participants (P2, P3, P6, and P7) were excluded due to a device malfunction during data collection. While physiological signals in the dataset are mostly error-free with most of the files complete above 95%, a portion of data is missing due to issues inherent to devices or a human error:

IBI – data from P26 is missing as the internal algorithm of E4 that derives IBI from BVP automatically discards an obtained value if its reliability is below a certain threshold.
EDA – data from P17 and P20 is missing, possibly due to poor contact between the device and a participant’s skin.
NeuroSky (Attention, Meditation) – measurements from P1 and P20 are missing due to a poorly equipped device. A portion of data is missing for P19 (∼32%), P22 (∼59%) and P23 (∼36%) for the same reason. No BrainWave data was lost.
Polar HR – data from seven participants (P3, P12, P18, P20, P21, P29, and P30) are missing due to a device error during data collection. Parts of data are missing from P4 (∼38%) and P22 (∼38%) due to poor contact.

Usage Notes

Potential applications

In addition to the intended usage of the dataset discussed above, there are uncertainties as to how physiological markers of an individual’s capacity for flexible physiological reactivity relate to experiences of positive and negative emotions. Our dataset could potentially be useful to examine the role of physiological signal based markers in assessing an individual’s use of emotion regulation strategies, such as cognitive appraisal.

Additionally, various data mining and machine learning techniques could be applied to set up models for an individual’s emotional profile based on sensor-based physiological and behavioral recordings. This could further be transferred to various positive computing use-cases⁸⁸, such as helping children with autism in their social communication^89,90, helping people who are blind to read facial expressions and get the emotion information of their peers⁹¹, finding opportune moments for conversational user interactions^92,93, assisting social anxiety disorder patients to overcome their conditions⁹⁴, allowing robots to interact more intelligently with people^95,96, and monitoring signs of frustration and emotional saturation that affect attention while driving, to enhance driver safety^97,98.

Limitations

Data collection apparatus

Contact-based EEG sensors are known to be susceptible to noise; for example, frowning or eye movements might have caused peaks in the data. Other devices may also have been subject to similar systematic errors.

Data collection context

The context of the turn-taking debate may have caused participants to regulate or even suppress their emotional expressions, as an unrestrained display of emotions is often regarded as undesirable during a debate. This may have contributed to a deflated level of agreement between self-reports and partner/external perceptions of emotions, which may not be the case for more natural interactions in the wild.

Retrospective emotion annotation with 2nd-person footage

We used a retrospective affect judgment protocol in which our participants annotated the emotions they felt during debates while watching 2nd-person footage of themselves. This approach may have introduced unintended effects on self-ratings of emotions, which pertain to the interaction between interoception⁹⁹, emotional reasoning, and self-perception. Nonetheless, we explain our rationale for choosing this annotation method in the Emotion annotation section. Further, our dataset includes annotations of participant emotions from debate partners and external raters who watched the same footage. Therefore, rather than being flawed, our dataset opens a window for investigating the effects mentioned above while also enabling a comprehensive study of emotions by comparing their perceptions across multiple perspectives.

Mode-subtraction in IRR computation

With mode-subtraction, inter-rater reliability values represent the agreement of raters on relative emotion changes rather than on perceived emotions in an absolute sense (see Inter-rater reliability). Dataset users should therefore keep this in mind and determine whether mode-subtraction is appropriate for their particular use cases.

Demographics

The participant demographics have likely introduced bias in the data. All of our participants and raters are young (their ages were between 19 and 36) and highly educated, and the majority of them are of Asian ethnicity. Therefore, our data may not generalize well to individuals of different ethnic groups or of younger or older age groups.

Unaccounted variables

Many variables unaccounted for during data collection, such as the level of rapport between debating pairs, a participant’s competence in spoken English, and a participant’s familiarity with the debate topic, may also have contributed to variance in the level of mismatch between the perceptions of emotions across different perspectives.

Code availability

Python code implementing outlier detection using Chauvenet’s criterion, majority voting, mode-subtraction, and other utility functions, including the generation of heatmap plots, is available at https://github.com/Kaist-ICLab/K-EmoCon_SupplementaryCodes. The Krippendorff package (https://github.com/pln-fing-udelar/fast-krippendorff) was used for the computation of Krippendorff’s alpha. Python version 3.6.9 was used throughout.

Code for preprocessing the raw log-level data in SQL databases into CSV files was implemented in Python with the SQLAlchemy package. However, this code and the raw log-level data are not made available as they include privacy-sensitive information outside the agreed boundary for public sharing of the dataset, which was collected only for logistic reasons. Nevertheless, we welcome dataset users to contact the corresponding authors if they need any further assistance or information regarding the raw data and its preprocessing.

References

Salovey, P. & Mayer, J. D. Emotional intelligence. Imagination, Cogn. Pers. 9, 185–211 (1990).
Mayer, J. D., Caruso, D. R. & Salovey, P. Emotional intelligence meets traditional standards for an intelligence. Intell. 27, 267–298 (1999).
Salovey, P. E. & Sluyter, D. J. Emotional development and emotional intelligence: educational implications. (Basic Books, 1997).
Lopes, P. N. et al. Emotional intelligence and social interaction. Pers. Soc. Psychol. Bull. 30, 1018–1034 (2004).
Esteva, A. et al. Dermatologist-level classification of skin cancer with deep neural networks. Nat. 542, 115–118 (2017).
Mastoras, R.-E. et al. Touchscreen typing pattern analysis for remote detection of the depressive tendency. Sci. Reports 9, 1–12 (2019).
Yurtsever, E., Lambert, J., Carballo, A. & Takeda, K. A survey of autonomous driving: common practices and emerging technologies. IEEE Access 8, 58443–58469 (2020).
Pennachin, C. & Goertzel, B. Contemporary approaches to artificial general intelligence. In Artificial General Intelligence, 1–30 (Springer, 2007).
Silver, D. et al. Mastering the game of go with deep neural networks and tree search. Nat. 529, 484 (2016).
Silver, D. et al. Mastering the game of go without human knowledge. Nat. 550, 354–359 (2017).
Reeves, B. & Nass, C. I. The media equation: how people treat computers, television, and new media like real people and places. (Cambridge University Press, 1996).
Turpen, A. Mit wants self-driving cars to traffic in human emotion. New Atlas, https://newatlas.com/automotive/mit-self-driving-cars-human-emotion/ (2019).
Barrett, L. F. How emotions are made: the secret life of the brain (Houghton Mifflin Harcourt, 2017).
Du, S., Tao, Y. & Martinez, A. M. Compound facial expressions of emotion. Proc. Natl. Acad. Sci. 111, E1454–E1462 (2014).
Yannakakis, G. N., Cowie, R. & Busso, C. The ordinal nature of emotions. In 2017 Seventh International Conference on Affective Computing and Intelligent Interaction (ACII), 248–255 (IEEE, 2017).
Frank, M. G. & Svetieva, E. Microexpressions and deception. In Understanding Facial Expressions in Communication, 227–242 (Springer, 2015).
Barrett, L. F., Adolphs, R., Marsella, S., Martinez, A. M. & Pollak, S. D. Emotional expressions reconsidered: challenges to inferring emotion from human facial movements. Psychol. Sci. Public Interest 20, 1–68 (2019).
Carroll, J. M. & Russell, J. A. Do facial expressions signal specific emotions? judging emotion from the face in context. J. Pers. Soc. Psychol. 70, 205 (1996).
Cauldwell, R. T. Where did the anger go? the role of context in interpreting emotion in speech. In ISCA Tutorial and Research Workshop (ITRW) on Speech and Emotion (2000).
Barrett, L. F., Mesquita, B. & Gendron, M. Context in emotion perception. Curr. Dir. Psychol. Sci. 20, 286–290 (2011).
Larsen, R. J. & Diener, E. Affect intensity as an individual difference characteristic: a review. J. Res. Pers. 21, 1–39 (1987).
Gross, J. J. & John, O. P. Individual differences in two emotion regulation processes: implications for affect, relationships, and well-being. J. Pers. Soc. Psychol. 85, 348 (2003).
Soleymani, M., Lichtenauer, J., Pun, T. & Pantic, M. A multimodal database for affect recognition and implicit tagging. IEEE Transactions on Affect. Comput. 3, 42–55 (2011).
Koelstra, S. et al. Deap: a database for emotion analysis; using physiological signals. IEEE Transactions on Affect. Comput. 3, 18–31 (2011).
Abadi, M. K. et al. Decaf: meg-based multimodal database for decoding affective physiological responses. IEEE Transactions on Affect. Comput. 6, 209–222 (2015).
Subramanian, R. et al. Ascertain: emotion and personality recognition using commercial sensors. IEEE Transactions on Affect. Comput. 9, 147–160 (2016).
Katsigiannis, S. & Ramzan, N. Dreamer: a database for emotion recognition through eeg and ecg signals from wireless low-cost off-the-shelf devices. IEEE J. Biomed. Heal. Informatics 22, 98–107 (2017).
Correa, J. A. M., Abadi, M. K., Sebe, N. & Patras, I. Amigos: a dataset for affect, personality and mood research on individuals and groups. IEEE Transactions on Affect. Comput., https://doi.org/10.1109/TAFFC.2018.2884461 (2018).
Sharma, K., Castellini, C., van den Broek, E. L., Albu-Schaeffer, A. & Schwenker, F. A dataset of continuous affect annotations and physiological signals for emotion analysis. Sci. Data 6, 1–13 (2019).
Yan, W.-J., Wu, Q., Liu, Y.-J., Wang, S.-J. & Fu, X. Casme database: a dataset of spontaneous micro-expressions collected from neutralized faces. In 2013 10th IEEE International Conference and Workshops on Automatic Face and Gesture Recognition (FG), 1–7 (IEEE, 2013).
Schmidt, P., Reiss, A., Duerichen, R., Marberger, C. & Van Laerhoven, K. Introducing wesad, a multimodal dataset for wearable stress and affect detection. In Proceedings of the 20th ACM International Conference on Multimodal Interaction, 400–408 (2018).
Watson, D. Mood and temperament (Guilford Press, 2000).
Batliner, A., Fischer, K., Huber, R., Spilker, J. & Nöth, E. How to find trouble in communication. Speech Commun. 40, 117–143 (2003).
Henrich, J., Heine, S. J. & Norenzayan, A. The weirdest people in the world? Behav. Brain Sci. 33, 61–83 (2010).
Dhall, A., Goecke, R., Lucey, S. & Gedeon, T. Collecting large, richly annotated facial-expression databases from movies. IEEE Multimed. 34–41 (2012).
Mollahosseini, A., Hasani, B. & Mahoor, M. H. Affectnet: a database for facial expression, valence, and arousal computing in the wild. IEEE Transactions on Affect. Comput. 10, 18–31 (2017).
McDuff, D., Amr, M. & El Kaliouby, R. Am-fed+: an extended dataset of naturalistic facial expressions collected in everyday settings. IEEE Transactions on Affect. Comput. 10, 7–17 (2018).
Poria, S. et al. Meld: a multimodal multi-party dataset for emotion recognition in conversations. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, 527–536 (2019).
McDuff, D., El Kaliouby, R. & Picard, R. W. Crowdsourcing facial responses to online videos. IEEE Transactions on Affect. Comput. 3, 456–468 (2012).
Morris, R., McDuff, D. & Calvo, R. Crowdsourcing techniques for affective computing. In The Oxford Handbook of Affective Computing, 384–394 (Oxford Univ. Press, 2014).
Korovina, O., Baez, M. & Casati, F. Reliability of crowdsourcing as a method for collecting emotions labels on pictures. BMC Res. Notes 12, 1–6 (2019).
Motley, M. T. & Camden, C. T. Facial expression of emotion: a comparison of posed expressions versus spontaneous expressions in an interpersonal communication setting. West. J. Speech Commun. 52, 1–22 (1988).
Jürgens, R., Grass, A., Drolet, M. & Fischer, J. Effect of acting experience on emotion expression and recognition in voice: non-actors provide better stimuli than expected. J. Nonverbal Behav. 39, 195–214 (2015).
Juslin, P. N., Laukka, P. & Bänziger, T. The mirror to our soul? comparisons of spontaneous and posed vocal expression of emotion. J. Nonverbal Behav. 42, 1–40 (2018).
Cacioppo, J. T. et al. The psychophysiology of emotion. Handb. Emot. 2, 173–191 (2000).
Picard, R. W., Vyzas, E. & Healey, J. Toward machine emotional intelligence: analysis of affective physiological state. IEEE Transactions on Pattern Analysis Mach. Intell. 23, 1175–1191 (2001).
Lisetti, C. L. & Nasoz, F. Using noninvasive wearable computers to recognize human emotions from physiological signals. EURASIP J. on Adv. Signal Process. 2004, 929414 (2004).
Rainville, P., Bechara, A., Naqvi, N. & Damasio, A. R. Basic emotions are associated with distinct patterns of cardiorespiratory activity. Int. J. Psychophysiol. 61, 5–18 (2006).
Nummenmaa, L., Glerean, E., Hari, R. & Hietanen, J. K. Bodily maps of emotions. Proc. Natl. Acad. Sci. 111, 646–651 (2014).
Pace-Schott, E. F. et al. Physiological feelings. Neurosci. & Biobehav. Rev. 103, 267–304 (2019).
Busso, C. et al. Iemocap: interactive emotional dyadic motion capture database. Lang. Resour. Eval. 42, 335 (2008).
McKeown, G., Valstar, M., Cowie, R., Pantic, M. & Schroder, M. The semaine database: annotated multimodal records of emotionally colored conversations between a person and a limited agent. IEEE Transactions on Affect. Comput. 3, 5–17 (2011).
Busso, C. et al. Msp-improv: an acted corpus of dyadic interactions to study emotion perception. IEEE Transactions on Affect. Comput. 8, 67–80 (2016).
Healey, J. Recording affect in the field: towards methods and metrics for improving ground truth labels. In Affective Computing and Intelligent Interaction, 107–116 (Springer, 2011).
Zhang, B., Essl, G. & Mower Provost, E. Automatic recognition of self-reported and perceived emotion: does joint modeling help? In Proceedings of the 18th ACM International Conference on Multimodal Interaction, 217–224 (2016).
Truong, K. P., van Leeuwen, D. A. & Neerincx, M. A. Unobtrusive multimodal emotion detection in adaptive interfaces: speech and facial expressions. In International Conference on Foundations of Augmented Cognition, 354–363 (Springer, 2007).
Grossman, J. B., Klin, A., Carter, A. S. & Volkmar, F. R. Verbal bias in recognition of facial emotions in children with asperger syndrome. The J. Child Psychol. Psychiatry Allied Discip. 41, 369–379 (2000).
Dickson, H., Calkins, M. E., Kohler, C. G., Hodgins, S. & Laurens, K. R. Misperceptions of facial emotions among youth aged 9–14 years who present multiple antecedents of schizophrenia. Schizophr. Bull. 40, 460–468 (2014).
Truong, K. P., Van Leeuwen, D. A. & De Jong, F. M. Speech-based recognition of self-reported and observed emotion in a dimensional space. Speech Commun. 54, 1049–1063 (2012).
Hess, U., Blairy, S. & Kleck, R. E. The intensity of emotional facial expressions and decoding accuracy. J. Nonverbal Behav. 21, 241–257 (1997).
Ranganathan, H., Chakraborty, S. & Panchanathan, S. Multimodal emotion recognition using deep learning architectures. In 2016 IEEE Winter Conference on Applications of Computer Vision (WACV), 1–9 (IEEE, 2016).
Min, H. C. & Nam, T.-J. Biosignal sharing for affective connectedness. In CHI ’14 Extended Abstracts on Human Factors in Computing Systems, 2191–2196 (2014).
Hassib, M., Buschek, D., Wozniak, P. W. & Alt, F. Heartchat: heart rate augmented mobile chat to support empathy and awareness. In Proceedings of the 2017 CHI Conference on Human Factors in Computing Systems, 2239–2251 (2017).
Liu, F., Dabbish, L. & Kaufman, G. Supporting social interactions with an expressive heart rate sharing application. Proc. ACM Interact. Mob. Wearable Ubiquitous Technol. 1, 1–26 (2017).
Liu, F. et al. Animo: sharing biosignals on a smartwatch for lightweight social connection. Proc. ACM Interact. Mob. Wearable Ubiquitous Technol. 3, 1–19 (2019).
Liu, F., Kaufman, G. & Dabbish, L. The effect of expressive biosignals on empathy and closeness for a stigmatized group member. Proc. ACM on Human-Computer Interact. 3, 1–17 (2019).
Kim, S. South korea’s refugee debate eclipses a deeper, more fundamental question. The Hill, https://thehill.com/opinion/international/395977-south-koreas-refugee-debate-eclipses-a-deeper-more-fundamental-question (2018).
Kang, J.-k. Yemeni refugees become a major issue on jeju. Korea JoongAng Daily, http://koreajoongangdaily.joins.com/news/article/article.aspx?aid=3049562 (2018).
Park, N. South korea is going crazy over a handful of refugees. Foreign Policy, https://foreignpolicy.com/2018/08/06/south-korea-is-going-crazy-over-a-handful-of-refugees/ (2018).
Seo, B. In south korea, opposition to yemeni refugees is a cry for help. CNN, https://edition.cnn.com/2018/09/13/opinions/south-korea-jeju-yemenis-intl/index.html (2018).
Diers, K., Weber, F., Brocke, B., Strobel, A. & Schönfeld, S. Instructions matter: a comparison of baseline conditions for cognitive emotion regulation paradigms. Front. Psychol. 5, 347 (2014).
Gross, J. J. & Levenson, R. W. Emotion elicitation using films. Cogn. Emot. 9, 87–108 (1995).
Kemper, S. & Sumner, A. The structure of verbal abilities in young and older adults. Psychol. Aging 16, 312 (2001).
Yuan, J., Liberman, M. & Cieri, C. Towards an integrated understanding of speaking rate in conversation. In Ninth International Conference on Spoken Language Processing (2006).
Gabig, C. S. Mean length of utterance (mlu). Encycl. Autism Spectr. Disord. 1813–1814 (2013).
Graesser, A. & Chipman, P. Detection of emotions during learning with autotutor. In Proceedings of the 28th Annual Meetings of the Cognitive Science Society, 285–290 (Erlbaum, 2006).
Afzal, S. & Robinson, P. Natural affect data - collection annotation in a learning context. In 2009 3rd International Conference on Affective Computing and Intelligent Interaction and Workshops, 1–7 (IEEE, 2009).
D’Mello, S. K., Lehman, B. & Person, N. Monitoring affect states during effortful problem solving activities. Int. J. Artif. Intell. Educ. 20, 361–389 (2010).
D’Mello, S. K. On the influence of an iterative affect annotation approach on inter-observer and self-observer reliability. IEEE Transactions on Affect. Comput. 7, 136–149 (2015).
Levine, L. J. & Safer, M. A. Sources of bias in memory for emotions. Curr. Dir. Psychol. Sci. 11, 169–173 (2002).
Safer, M. A., Levine, L. J. & Drapalski, A. L. Distortion in memory for emotions: the contributions of personality and post-event knowledge. Pers. Soc. Psychol. Bull. 28, 1495–1507 (2002).
Lench, H. C. & Levine, L. J. Motivational biases in memory for emotions. Cogn. Emot. 24, 401–418 (2010).
Park, C. Y. et al. K-EmoCon, a multimodal sensor dataset for continuous emotion recognition in naturalistic conversations. Zenodo https://doi.org/10.5281/zenodo.3931963 (2020).
Calix, R. A., Mallepudi, S. A., Chen, B. & Knapp, G. M. Emotion recognition in text for 3-d facial expression rendering. IEEE Transactions on Multimed. 12, 544–551 (2010).
Wang, W., Chen, L., Thirunarayan, K. & Sheth, A. P. Harnessing twitter “big data” for automatic emotion identification. In 2012 International Conference on Privacy, Security, Risk and Trust and 2012 International Conference on Social Computing, 587–592 (IEEE, 2012).
Xu, R. et al. Word embedding composition for data imbalances in sentiment and emotion classification. Cogn. Comput. 7, 226–240 (2015).
Krippendorff, K. Computing krippendorff’s alpha-reliability. Retrieved from, https://repository.upenn.edu/asc_papers/43 (2011).
Lee, U. et al. Intelligent positive computing with mobile, wearable, and iot devices: literature review and research directions. Ad Hoc Networks 83, 8–24 (2019).
Picard, R. W. Future affective technology for autism and emotion communication. Philos. Transactions Royal Soc. B: Biol. Sci. 364, 3575–3584 (2009).
Washington, P. et al. Superpowerglass: a wearable aid for the at-home therapy of children with autism. Proc. ACM Interact. Mob. Wearable Ubiquitous Technol. 1, 1–22 (2017).
Buimer, H. P. et al. Conveying facial expressions to blind and visually impaired persons through a wearable vibrotactile device. Plos One 13 (2018).
Cha, N. et al. “Hello there! is now a good time to talk?”: understanding opportune moments for proactive conversational interaction with smart speakers. Proc. ACM Interact. Mob. Wearable Ubiquitous Technol. 4 (2020).
Kim, A., Park, J.-M. & Lee, U. Interruptibility for in-vehicle multitasking: influence of voice task demands and adaptive behaviors. Proc. ACM Interact. Mob. Wearable Ubiquitous Technol. 4, 1–22 (2020).
Warnock-Parkes, E. et al. Seeing is believing: using video feedback in cognitive therapy for social anxiety disorder. Cogn. Behav. Pract. 24, 245–255 (2017).
Breazeal, C. Emotion and sociable humanoid robots. Int. J. Human-Computer Stud. 59, 119–155 (2003).
Kwon, D.-S. et al. Emotion interaction system for a service robot. In RO-MAN 2007 - The 16th IEEE International Symposium on Robot and Human Interactive Communication, 351–356 (IEEE, 2007).
Nass, C. et al. Improving automotive safety by pairing driver emotion and car voice emotion. In CHI ’05 Extended Abstracts on Human Factors in Computing Systems, 1973–1976 (2005).
Eyben, F. et al. Emotion on the road—necessity, acceptance, and feasibility of affective computing in the car. Adv. Human-Computer Interact. 2010 (2010).
Craig, A. D. How do you feel? interoception: the sense of the physiological condition of the body. Nat. Rev. Neurosci. 3, 655–666 (2002).
Markova, V., Ganchev, T. & Kalinkov, K. Clas: a database for cognitive load, affect and stress recognition. In 2019 International Conference on Biomedical Innovations and Applications (BIA), 1–4 (IEEE, 2019).
Russell, J. A. A circumplex model of affect. J. Pers. Soc. Psychol. 39, 1161 (1980).
Plarre, K. et al. Continuous inference of psychological stress from sensory measurements collected in the natural environment. In Proceedings of the 10th ACM/IEEE International Conference on Information Processing in Sensor Networks, 97–108 (IEEE, 2011).
Ocumpaugh, J. Baker Rodrigo Ocumpaugh Monitoring Protocol (BROMP) 2.0 technical and training manual. New York, NY Manila, Philipp. Teach. Coll. Columbia Univ. Ateneo Lab. for Learn. Sci. 60 (2015).

Acknowledgements

This research was supported by the 2019 KK-JRC Smart Project and by the Next-Generation Information Computing Development Program through the National Research Foundation of Korea (NRF), funded by the Ministry of Science and ICT (NRF-2017M3C4A7065960). The authors cordially thank all participants for contributing their data to the development of the dataset.

Author information

Authors and Affiliations

Korea Advanced Institute of Science and Technology, Graduate School of Knowledge Service Engineering, Daejeon, 34141, South Korea
Cheul Young Park, Narae Cha, Soowon Kang, Auk Kim & Uichin Lee
Khalifa University of Science and Technology, Department of Biomedical Engineering, Abu Dhabi, 127788, United Arab Emirates
Ahsan Habib Khandoker & Leontios Hadjileontiadis
Aristotle University of Thessaloniki, Department of Electrical and Computer Engineering, Thessaloniki, 54124, Greece
Leontios Hadjileontiadis
Korea Advanced Institute of Science and Technology, School of Computing, Daejeon, 34141, South Korea
Alice Oh
Korea Advanced Institute of Science and Technology, Department of Bio and Brain Engineering, Daejeon, 34141, South Korea
Yong Jeong

Contributions

C.Y.P. designed, prepared, and conducted the data collection, constructed and pre-processed the collected dataset, performed the technical validation, and wrote the manuscript. N.C. prepared and conducted the data collection. S.K. designed and assisted in the data collection. A.K. assisted in the data collection, constructed the dataset, and revised the manuscript. A.H.K. and L.H. advised, revised, and verified the manuscript. A.O. and Y.J. revised and verified the manuscript. U.L. supervised the dataset design and the data collection, advised the overall project, and revised the manuscript.

Corresponding authors

Correspondence to Cheul Young Park or Uichin Lee.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.

The Creative Commons Public Domain Dedication waiver http://creativecommons.org/publicdomain/zero/1.0/ applies to the metadata files associated with this article.

About this article

Cite this article

Park, C.Y., Cha, N., Kang, S. et al. K-EmoCon, a multimodal sensor dataset for continuous emotion recognition in naturalistic conversations. Sci Data 7, 293 (2020). https://doi.org/10.1038/s41597-020-00630-y

Received: 19 May 2020
Accepted: 30 July 2020
Published: 08 September 2020
DOI: https://doi.org/10.1038/s41597-020-00630-y
