Kinematic dataset of actors expressing emotions

Human body movements can convey a variety of emotions and even create advantages in some special life situations. However, how emotion is encoded in body movements has remained unclear. One reason is the lack of public human body kinematic datasets covering the expression of various emotions. Therefore, we aimed to produce a comprehensive dataset to assist in recognizing cues from all parts of the body that indicate six basic emotions (happiness, sadness, anger, fear, disgust, surprise) and neutral expression. The present dataset was created using a portable wireless motion capture system. Twenty-two semi-professional actors (half male) completed performances according to the standardized guidance and preferred daily events. A total of 1402 recordings at 125 Hz were collected, consisting of the position and rotation data of 72 anatomical nodes. To our knowledge, this is currently the largest emotional kinematic dataset of the human body. We hope this dataset will contribute to multiple fields of research and practice, including social neuroscience, psychiatry, computer vision, and biometric and information forensics.

Measurement(s): body movement coordination trait • emotion/affect behavior trait
Technology Type(s): motion capture system
Factor Type(s): emotion category • sex
Sample Characteristic - Organism: Homo sapiens

Machine-accessible metadata file describing the reported data: https://doi.org/10.6084/m9.figshare.12821150

emotion dimensions (e.g., pleasant-unpleasant, valence; deactivated-activated, arousal) 29. Therefore, there is a lack of public human body kinematic datasets covering the expression of various emotions. Considering this lack of available information, we aimed to produce a comprehensive dataset to assist in recognizing emotional cues from all parts of the body in richer daily situations with more ecological validity.
Taken together, we report a human body kinematic dataset that consists of 1402 trials of actors expressing six basic emotions (happiness, sadness, anger, fear, disgust, surprise) and neutral. Twenty-two semi-professional actors (half male) participated in this study. A low-cost and validated inertial motion capture system with 17 sensors was used [30][31][32]. The actors performed according to the standardized guide and carefully screened daily events. The resulting kinematic dataset contains the position and rotation data of 72 anatomical nodes, which are stored in the BioVision Hierarchy (BVH) structure. This work expands the scope of emotion recognition and can help us to better understand human emotions conveyed via body movements. To the best of our knowledge, this is, at present, the largest emotional kinematic dataset based on the whole human body. These data are expected to be used repeatedly and to foster progress in several fields, including social neuroscience, psychiatry, human-computer interaction, computer vision, and biometric and information forensics.

Methods
Preparation phase.
Equipment and environment. The kinematic data were collected using a wireless motion capture system (Noitom Perception Neuron, Noitom Technology Ltd., Beijing, China) with 17 wearable sensors. This apparatus was connected to the Axis Neuron software (version 3.8.42.8591, Noitom Technology Ltd., Beijing, China) on a laptop computer (Terrans Force T5 SKYLAKE 970 M 67SH1, Windows 10 operating system, Intel Core i7 6700HQ processor). These sensors, with a sampling rate of 125 Hz, were placed on both sides of the actors, including their upper and lower arms, hips, spine, head, feet, hands, shoulders, and both upper and lower legs (see Fig. 1a). The tasks were conducted in a quiet laboratory (see Fig. 1b). The actors needed to execute each performance on a square stage of 1 m × 1 m that was 0.5 m from a wall. A suitably limited performance space also controlled the horizontal distance between the actor and the object of the emotion, so that the horizontal displacement did not differ greatly across recordings.
Scenarios. To guide the actors to perform in a typical and natural manner, we created 70 daily event scenarios (10 for each emotion and neutral; see Supplementary File 1) according to the basic concepts of emotions [33][34][35] and previous research 6,7,[20][21][22],26,27,36. To test the validity of the scenarios, 70 college students (mean age = 23.10 years, SD = 1.64, 42 females) were asked to classify the emotion expressed in the scenarios, which were displayed in random order on a seven-alternative forced-choice questionnaire (see Supplementary File 2). The five most recognizable scenarios for each emotion were retained (see Online-only Table 1). Thus, 35 scenarios were used in the recording phase. Note that, for emotional but not neutral recordings, free (non-scripted) performances were added, during which the performance was spontaneous and the actors were free to interpret and express the emotions as they saw fit, without being restricted by the scenarios.
Actors. Another group of 24 college students (mean age = 20.75 years, SD = 1.92; mean height = 1.69 m, SD = 0.07; mean weight = 58.52 kg, SD = 12.42; mean BMI = 20.25, SD = 3.30) was recruited from the drama and dance clubs of the Dalian University of Technology to perform as actors for this study (see Table 1). Actors F04 and F13 were excluded because they dropped out. All actors were physically and mentally healthy and right-handed. Each actor gave written informed consent before performing and was told that their motion data were to be used only for scientific research. The study was approved by the Human Research Institutional Review Board of Liaoning Normal University in accordance with the Declaration of Helsinki (1991). After the recording phase, the actors were paid appropriately.

Recording phase.
The actors wore black tights, and 17 Neuron sensors were attached to the corresponding body locations. A four-step calibration procedure using four successive static poses was carried out in the Axis Neuron software before performances and whenever necessary (e.g., after a poor Wi-Fi signal or a rest) (see Fig. 2; for details, see https://neuronmocap.com/content/axis-neuron). To prevent inconsistencies in interpretation by different actors, standardized instructions were given before each recording (see Supplementary File 3). The actors started in a neutral stance (i.e., facing forward with arms naturally at their sides). For each emotion, we asked the actors to give, in succession, a six-second free performance based on their own understanding, followed by the scenario performances. The order of emotions displayed for the actors was random, and the order of scenarios within each emotion was random as well. When the actors were ready and we said "start", the Axis Neuron software began recording the motion data.
After each performance, we reviewed it and evaluated the signal quality; hence, some performances needed to be repeated several times. The recording phase took approximately two hours, during which the actors could have a rest when they felt tired.

Data records
A total of 1406 trials were collected. The commercial Axis Neuron software uses a RAW file format to store data. We exported those RAW trial files to BVH files, as BVH is a standard file format that can be analyzed using various software (e.g., 3ds Max, https://www.autodesk.com/products/3ds-max; MotionBuilder, https://www.autodesk.com/products/motionbuilder). Four raw files were corrupted during this process (i.e., F09D0V1, M04F4V2, M06N1V1, and M06SU1V1). Therefore, the resulting human body kinematic dataset, available from https://doi.org/10.13026/kg8b-1t49 [37] (mean duration = 7.22 s, SD = 1.57), consists of 1402 trials expressing six emotions and neutral.
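The trial identifiers quoted above (e.g., F09D0V1, M06SU1V1) appear to follow a regular pattern. The following minimal sketch splits such an identifier into its parts. The field meanings are assumptions inferred only from the identifiers quoted in this text (sex letter, two-digit actor number, one- or two-letter emotion code, one-digit scenario index, and a version number after "V"); the names `TrialId` and `parse_trial_id` are hypothetical, and the dataset documentation should be consulted for the authoritative naming scheme.

```python
import re
from typing import NamedTuple


class TrialId(NamedTuple):
    """Parts of a trial identifier; field meanings are inferred, not official."""
    sex: str            # "F" or "M"
    actor: int          # two-digit actor number
    emotion_code: str   # e.g. "D", "F", "N", "SA", "SU" as seen in the text
    scenario: int       # single digit following the emotion code
    version: int        # number following "V"


# Assumed pattern: sex, actor number, emotion code, scenario digit, "V", version.
TRIAL_ID = re.compile(r"^([FM])(\d{2})([A-Z]{1,2})(\d)V(\d+)$")


def parse_trial_id(name: str) -> TrialId:
    """Split an identifier such as 'M06SU1V1' into its assumed components."""
    match = TRIAL_ID.match(name)
    if match is None:
        raise ValueError(f"unrecognized trial identifier: {name!r}")
    sex, actor, code, scenario, version = match.groups()
    return TrialId(sex, int(actor), code, int(scenario), int(version))


if __name__ == "__main__":
    for name in ["F09D0V1", "M04F4V2", "M06N1V1", "M06SU1V1"]:
        print(name, parse_trial_id(name))
```

A parser of this kind could be used to group recordings by actor or emotion code before analysis, provided the assumed pattern is first checked against the actual file names.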
A BVH file contains ASCII text in two sections (i.e., HIERARCHY and MOTION). Beginning with the keyword HIERARCHY, the first section defines the joint tree, the name of each node, the number of channels, and the relative position between joints (i.e., the bone length of each part of the human body). There are 72 nodes in total (i.e., 1 Root, 58 Joints, and 13 End Sites) in this section (see Fig. 3), which are calculated by the commercial Axis Neuron software from the 17 sensors (see Fig. 1a). The MOTION section records the motion data. Following the joint sequence defined in the HIERARCHY section, the data of each frame are provided, recording the position and rotation information of each joint node. The following keywords appear in a BVH file:
• HIERARCHY: beginning of the header section
• ROOT: location of the Hips (see Fig. 3)
• JOINT: location of the skeletal joint relative to its parent-joint (see Fig. 3)
• CHANNELS: number of channels, including position and rotation channels
• OFFSET: X, Y, and Z offsets of the segment relative to its parent-joint
• End Site: end of a JOINT which has no child-joint (see Fig. 3)
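To illustrate how the keywords listed above delimit the node tree, the following minimal sketch counts ROOT, JOINT, and End Site entries in the HIERARCHY section of a BVH text. The embedded `SAMPLE_BVH` is a tiny hypothetical file written for this example, not an excerpt from the dataset, and `count_nodes` is an illustrative helper name. Applied to a file from the dataset, the counts would be expected to match the 1 Root, 58 Joints, and 13 End Sites reported above, although this has not been verified here.

```python
from collections import Counter

# Hypothetical three-node BVH text used only to exercise the function.
SAMPLE_BVH = """HIERARCHY
ROOT Hips
{
  OFFSET 0.0 0.0 0.0
  CHANNELS 6 Xposition Yposition Zposition Yrotation Xrotation Zrotation
  JOINT Spine
  {
    OFFSET 0.0 10.0 0.0
    CHANNELS 3 Yrotation Xrotation Zrotation
    End Site
    {
      OFFSET 0.0 5.0 0.0
    }
  }
}
MOTION
Frames: 2
Frame Time: 0.008
0.0 90.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
1.0 90.5 0.0 0.0 0.0 0.0 0.0 0.0 0.0
"""


def count_nodes(bvh_text: str) -> Counter:
    """Count ROOT, JOINT, and End Site entries in the HIERARCHY section."""
    # Everything before the MOTION keyword is the header section.
    header = bvh_text.split("MOTION", 1)[0]
    counts = Counter()
    for line in header.splitlines():
        tokens = line.split()
        if not tokens:
            continue
        if tokens[0] in ("ROOT", "JOINT"):
            counts[tokens[0]] += 1
        elif tokens[:2] == ["End", "Site"]:
            counts["End Site"] += 1
    return counts


if __name__ == "__main__":
    counts = count_nodes(SAMPLE_BVH)
    print(dict(counts), "total nodes:", sum(counts.values()))
```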
Calibration. As described in the recording phase (see Methods), the motion capture system was calibrated with the four-step calibration procedure before performances and as needed. We also reviewed and visually checked the quality of the motion signal and the naturalness of the performances trial by trial. After all sensors have been calibrated, the pose of the model in the recording software is consistent with the actor's initial stance (i.e., facing forward with arms naturally at the sides; see Fig. 4a), and the spatial position of the mass center is relatively stable across all models in the recording software. Otherwise, the model is deformed, and the initial spatial position of the mass center is inconsistent across performances. Therefore, the spatial positions of the first-frame mass center across all recordings reflect the calibration quality. To evaluate the calibration quality, we used the Axis Neuron software to extract the X, Y, and Z positions of the mass center (see Fig. 4a) in the first frame of each recording (see https://physionet.org/content/kinematic-actors-emotions/2.1.0/). The data distribution in these three dimensions was relatively centralized, showing that the initial states of the actors were consistent (see Fig. 4b-e), which suggests good calibration quality in our study.
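A check of this kind can be approximated directly from the BVH files. The following minimal sketch reads the first frame of each recording and summarizes the spread of the starting position per axis. It rests on two assumptions that are not taken from the paper: the root (Hips) position is used as a proxy for the mass center that the authors extracted with Axis Neuron, and the first three channels of each frame are taken to be the root X, Y, and Z positions (this should be confirmed against the CHANNELS line of the real files). The helper names and the sample recordings are hypothetical.

```python
import statistics


def make_sample(x: float, y: float, z: float) -> str:
    """Build a tiny hypothetical BVH text whose root starts at (x, y, z)."""
    return (
        "HIERARCHY\n"
        "ROOT Hips\n"
        "{\n"
        "  OFFSET 0.0 0.0 0.0\n"
        "  CHANNELS 6 Xposition Yposition Zposition Yrotation Xrotation Zrotation\n"
        "  End Site\n"
        "  {\n"
        "    OFFSET 0.0 10.0 0.0\n"
        "  }\n"
        "}\n"
        "MOTION\n"
        "Frames: 1\n"
        "Frame Time: 0.008\n"
        f"{x} {y} {z} 0.0 0.0 0.0\n"
    )


def first_frame_root_position(bvh_text: str) -> tuple:
    """Return the first three channel values of the first frame.

    Assumes these are the root X, Y, Z positions (true for the sample above).
    """
    lines = [line.strip() for line in bvh_text.splitlines()]
    start = lines.index("MOTION")
    for line in lines[start + 1:]:
        # Skip blank lines and the two MOTION header lines.
        if not line or line.startswith(("Frames:", "Frame Time:")):
            continue
        values = [float(v) for v in line.split()]
        return tuple(values[:3])
    raise ValueError("no frame data found after MOTION")


def spread_per_axis(positions: list) -> list:
    """Return (mean, sample SD) for each of the X, Y, and Z axes."""
    return [(statistics.mean(axis), statistics.stdev(axis))
            for axis in zip(*positions)]


# Three hypothetical recordings with slightly different starting positions.
recordings = [make_sample(0.0, 90.0, 0.0),
              make_sample(2.0, 92.0, 0.0),
              make_sample(4.0, 94.0, 0.0)]
positions = [first_frame_root_position(text) for text in recordings]
summary = spread_per_axis(positions)

if __name__ == "__main__":
    for name, (mean, sd) in zip("XYZ", summary):
        print(f"{name}: mean = {mean:.2f}, SD = {sd:.2f}")
```

A small standard deviation on each axis relative to the stage size would be in line with the centralized distribution described above; a recording whose first-frame position lies far from the mean could be inspected for calibration problems.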

Usage Notes
BVH files can be imported directly into 3ds Max (https://www.autodesk.com/products/3ds-max), MotionBuilder (https://www.autodesk.com/products/motionbuilder), and other 3D applications. Therefore, these data can be used to build different avatars in virtual reality and augmented reality products. Previous studies on emotion recognition in the field of computer and information science have mainly focused on human faces and voices; hence, the dataset created in this study can improve current technologies and contribute to scientific advancements. In the fields of psychiatry and psychology, researchers can also create experimental stimuli based on the present study, such as emotional point-light displays that contain biological motion information 38. Such material has been applied in the field of social cognitive impairment and can contribute to the clinical diagnosis of autistic spectrum disorders, schizophrenia, and other psychiatric conditions 39,40.
Among the trials produced, some special cases should be noted. Although we asked the actors to complete each performance within six seconds, some trials are, in fact, much shorter or longer because of differences in human time perception or operator error (e.g., 21.84 s for F07SA0V1, 2.688 s for M01D1V2). Future studies should aim for more consistent trial durations where possible.
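Users who wish to screen out such trials can compute each trial's duration from the MOTION header of its BVH file. The following minimal sketch assumes that duration equals the number of frames multiplied by the frame time (0.008 s at 125 Hz). The frame counts in the sample headers (336, 2730, and 750) are derived by arithmetic from the durations quoted above and are not copied from the dataset files; the cut-offs of 3 s and 12 s and the helper names are hypothetical choices for illustration only.

```python
import re


def trial_duration(bvh_text: str) -> float:
    """Return duration in seconds as Frames x Frame Time from the MOTION header."""
    frames = re.search(r"^\s*Frames:\s*(\d+)", bvh_text, re.MULTILINE)
    frame_time = re.search(r"^\s*Frame Time:\s*([0-9.eE+-]+)", bvh_text,
                           re.MULTILINE)
    if frames is None or frame_time is None:
        raise ValueError("MOTION header with Frames and Frame Time not found")
    return int(frames.group(1)) * float(frame_time.group(1))


def is_atypical(duration_s: float, low_s: float = 3.0, high_s: float = 12.0) -> bool:
    """Flag durations outside hypothetical bounds around the six-second target."""
    return duration_s < low_s or duration_s > high_s


# Hypothetical MOTION headers; frame counts chosen to reproduce quoted durations.
short_trial = "MOTION\nFrames: 336\nFrame Time: 0.008\n"
long_trial = "MOTION\nFrames: 2730\nFrame Time: 0.008\n"
typical_trial = "MOTION\nFrames: 750\nFrame Time: 0.008\n"

if __name__ == "__main__":
    for label, text in [("short", short_trial), ("long", long_trial),
                        ("typical", typical_trial)]:
        duration = trial_duration(text)
        print(f"{label}: {duration:.3f} s, atypical = {is_atypical(duration)}")
```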
Although creating more daily scenarios and recruiting more actors yielded richer data that reflect the variability of body movements, and the present set gives researchers more opportunities to select experimental materials, the differences among scenarios within the same emotion may introduce some "noise". For example, if the object of anger is a nearby dog, the expression may be targeted at a lower vertical level than if the object is a car speeding off. We will further examine the specific relationship between scenario-induced movement and subjective emotional experience (e.g., emotional intensity, valence, and arousal).

Because all the actors were told before the performance that their motion data were to be used only for scientific research, we followed the recommendation of PhysioNet and chose a suitable license as the data use agreement (Restricted Health Data License 1.5.0, https://physionet.org/content/kinematic-actors-emotions/view-license/2.1.0/). Therefore, users need to sign this agreement online before downloading and using the present dataset.

Figure 4 shows that, after the calibration procedure, the initial spatial positions of the actors across all performances are relatively centralized and consistent, reflecting good calibration quality in the present study.