Demographic effects on facial emotion expression: an interdisciplinary investigation of the facial action units of happiness

Understanding demographic differences in the facial expression of happiness has crucial implications for social communication. However, prior research on facial emotion expression has mostly focused on the effect of a single demographic factor (typically gender, race, or age), and has been limited by small image datasets collected in laboratory settings. First, we used 30,000 (4800 after pre-processing) real-world facial images from Flickr to analyze the facial expression of happiness, as indicated by the intensity levels of two distinctive facial action units, the Cheek Raiser (AU6) and the Lip Corner Puller (AU12), obtained automatically via a deep learning algorithm that we developed after training on 75,000 images. Second, we conducted a statistical analysis of the intensity level of happiness, examining both the main effects and the interaction effects of three core demographic factors on AU6 and AU12. Our results show that females generally display a higher AU12 intensity than males. African Americans tend to exhibit a higher AU6 and AU12 intensity when compared with Caucasians and Asians. The older age groups, especially the 40–69-year-old group, generally display a stronger AU12 intensity than the 0–3-year-old group. Our interdisciplinary study provides better generalization and a deeper understanding of how different gender, race and age groups express the emotion of happiness differently.


Results
Figure 1 provides an overview of the three demographic variables and their effects on AU6 and AU12 intensity. The independent variables are gender (male and female), race (Caucasian, Asian and African American), and age (0-3, 4-19, 20-39, and 40-69-year-old), whereas the dependent variables are AU6 and AU12 intensity. Our results are based on a gender-, race- and age-balanced sample of 4800 images, with each demographic subgroup (e.g., Caucasian, male, 0-3-year-old) consisting of 200 images. To gain a deeper understanding of the main and the interaction effects, a three-way (race × gender × age) analysis of variance (ANOVA) was conducted on AU6 and AU12 intensity separately (Table 1), followed by post-hoc comparison tests that compare the means across all demographic groups. The F value and the p value generated by the three-way ANOVA are used to evaluate statistical significance; the mean difference (MD) and the corresponding standard error (SE) generated by the post-hoc comparison are used to determine whether there exists any difference across groups.
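The main-effect testing described above can be illustrated in miniature. The sketch below runs a one-way ANOVA on synthetic AU12 intensities for three hypothetical race groups of 200 images each; the full analysis is a three-way factorial ANOVA in SPSS, and the group means, spreads and 0-5 intensity scale here are illustrative assumptions, not our data:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Synthetic AU12 intensities on an assumed 0-5 scale, 200 samples per
# group, mirroring the balanced design (values are illustrative only).
groups = {
    "African American": rng.normal(2.8, 1.0, 200).clip(0, 5),
    "Caucasian":        rng.normal(2.6, 1.0, 200).clip(0, 5),
    "Asian":            rng.normal(2.3, 1.0, 200).clip(0, 5),
}

# One-way ANOVA on the race factor: the F statistic compares the
# between-group variance to the within-group variance.
f_stat, p_value = stats.f_oneway(*groups.values())
print(f"F = {f_stat:.3f}, p = {p_value:.4g}")
```

A small p value here would indicate that at least one group mean differs; identifying which pairs differ is the job of the post-hoc comparisons discussed below.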
The main effect. Race main effect. The race effect on AU6 intensity and AU12 intensity is similar, as shown in Fig. 1b,e. Moreover, the race effect on AU12 intensity is stronger than that on AU6 intensity, implying that the change in the activity of muscles in the mouth area (AU12) is more pronounced than that around the cheek area (AU6). The post-hoc comparison suggests that the African American group has a significantly higher AU12 intensity than the Asian group (MD = 0.436, SE = 0.034, p < 0.001, Cohen's d = 0.454) and the Caucasian group (MD = 0.182, SE = 0.034, p < 0.001, Cohen's d = 0.183), and that the Caucasian group has a significantly higher AU12 intensity than the Asian group (MD = 0.254, SE = 0.034, p < 0.001, Cohen's d = 0.261). A similar trend is also found for AU6 intensity. Our test results imply that the African American group and the Caucasian group will in general display a higher intensity of AU6 and AU12 than the Asian group.

Table 1. Three-way (Race × Gender × Age) ANOVA of the main effect and the interaction effect on (a) AU6 intensity and (b) AU12 intensity. Significant variables and their corresponding p values are highlighted in grey. A p value < 0.05 is considered statistically significant. The effect size is represented by the partial Eta-squared (η²); a larger value indicates a larger effect size.

www.nature.com/scientificreports/

A cross-cultural psychological study29 revealed that the cultural difference in the arousal of emotional expression between the East and the West is prominent. People in the West tend to embrace a higher-arousal emotional state, whilst people in the East tend to embrace a lower-arousal emotional state29,30. Besides, cultural theory31 posited that the Chinese culture and the American culture conceptualize happiness differently.
For Americans, happiness is conceived as being upbeat and unmistakably positive, whilst Chinese tend to conceive happiness as a more solemn and calm entity 31 . Such difference in the conceptualization of happiness between the West and the East might explain why the Caucasian group and the Asian group display highly different AU6 and AU12 intensity.
Gender main effect. Based on the three-way ANOVA, a significant gender effect on AU12 intensity (F = 35.318, p < 0.001, partial η² = 0.007) has been identified, whilst no significant gender effect on AU6 intensity (F = 1.195, p = 0.274, partial η² < 0.001) has been found. Accordingly, the post-hoc test suggests that the female group shows a significantly higher AU12 intensity than the male group (MD = 0.163, SE = 0.027, p < 0.001, Cohen's d = 0.165). This significant difference in AU12 intensity can also be observed in Fig. 1d, indicating that the female group is generally more expressive in happiness, displaying a higher AU12 intensity (i.e., a bigger smile) than the male group.
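The effect measures reported above (MD, SE, Cohen's d) can be computed directly from two groups of intensity scores. A minimal sketch with synthetic data; the group sizes and values are assumptions for illustration, not the study's data:

```python
import numpy as np

def cohens_d(a, b):
    """Cohen's d using the pooled sample standard deviation."""
    a, b = np.asarray(a, float), np.asarray(b, float)
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * a.var(ddof=1) + (nb - 1) * b.var(ddof=1)) / (na + nb - 2)
    return (a.mean() - b.mean()) / np.sqrt(pooled_var)

rng = np.random.default_rng(1)
female = rng.normal(2.7, 1.0, 2400)  # hypothetical AU12 intensities
male   = rng.normal(2.5, 1.0, 2400)

md = female.mean() - male.mean()                                   # mean difference (MD)
se = np.sqrt(female.var(ddof=1) / 2400 + male.var(ddof=1) / 2400)  # standard error (SE)
d  = cohens_d(female, male)
```

Since both groups share the same unit standard deviation here, MD and Cohen's d come out nearly equal, which is why the reported MD and d values above track each other so closely.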
From a psychological perspective, Dodd et al.32 concluded that a gender difference can be detected in the way people smile, and linked it to the cultural expectation that females should behave in a friendlier manner and be more emotionally expressive than males. Interestingly, Clancy and Dollinger33 provided compelling evidence that such a gender difference in smiling can be attributed to females' greater tendency to seek social connectedness. Besides, various studies11,34-36 supported the view that females are more expressive than males. Our experimental and statistical study partially supports these psychological views on the variation of emotional expressiveness by gender.
Age main effect. The three-way ANOVA results on AU6 and AU12 intensity (Table 1) indicate that age has a small but significant effect on both FAUs (AU6: F = 12.172, p < 0.001, partial η² = 0.008; AU12: F = 68.766, p < 0.001, partial η² = 0.041). In addition, the post-hoc comparison finds that the mean AU6 intensity of the 40-69-year-old group is significantly higher than that of the 20-39-year-old group (MD = 0.197, SE = 0.036, p < 0.001, Cohen's d = 0.191). A previous finding37 suggested that the wrinkles, folds and lower expressivity of older faces may affect how facial expressions are decoded. Based on this observation, we hypothesize that these age-related facial features may also affect the facial expression of happiness, resulting in a lower AU6 intensity for the old age group. Concerning the age-related effect on facial expressions, several studies16,17,38 found no difference in expressivity between the younger and the older age groups. These results contradict our finding that the older age groups may express happiness more intensively than the younger age groups. It would be worthwhile to conduct more studies to investigate the underlying mechanisms that govern the facial expressions of different age groups.
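The partial η² values reported alongside these F statistics can be recovered from the relation partial η² = (F·df_effect)/(F·df_effect + df_error). A quick consistency check, assuming the balanced 4800-image design (24 cells of 200 images, so df_error = 4800 − 24 = 4776, and df_effect = 3 for the four age levels):

```python
def partial_eta_squared(f, df_effect, df_error):
    """Partial eta-squared from an F statistic and its degrees of freedom."""
    return (f * df_effect) / (f * df_effect + df_error)

df_effect, df_error = 3, 4776  # age factor in the assumed balanced design

eta_au6  = partial_eta_squared(12.172, df_effect, df_error)  # ~0.008
eta_au12 = partial_eta_squared(68.766, df_effect, df_error)  # ~0.041
```

Both values round to the partial η² reported for the age main effect, which supports the assumed error degrees of freedom.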
Figure 1f clearly shows a large mean difference between the 0-3-year-old group and the other age groups, indicating that the infant group has a lower AU12 intensity than the rest of the age groups. With respect to the race × age interaction effect (Fig. 2d), this large mean difference between the 0-3-year-old group and the other age groups appears across all race groups; hence, it is not caused by a specific race group. Amongst all age groups, the 4-19-year-old smile much more intensively via AU12 when compared to the 0-3-year-old infant group.

The interaction effect. Race × Gender interaction effect on AU6 intensity. As displayed in Table 1a, the race × gender interaction effect on AU6 intensity is small but significant (F = 4.868, p = 0.008, partial η² = 0.002), indicating that there is a statistically significant difference in AU6 intensity between the female group and the male group when race is controlled for (see also Fig. 2a). To reveal whether a certain race group shows a higher AU6 intensity for females than for males, we further analyzed the gender effect within each of the three race groups by post-hoc comparison. Our results show that the Asian female group exhibits a marginally higher AU6 intensity than the Asian male group (MD = 0.089, SE = 0.044, p = 0.043, Cohen's d = 0.100). In contrast, the African American group and the Caucasian group do not show any significant difference between females and males.

We then analyzed the race effect after controlling for gender. The univariate test shows that the races display a statistically significant difference in AU6 intensity for both the female group (F = 17.816, p < 0.001) and the male group (F = 38.754, p < 0.001). For the female group, the African American group exhibits a significantly higher AU6 intensity than the Asian group (MD = 0.262, SE = 0.044, p < 0.001, Cohen's d = 0.451) and the Caucasian group (MD = 0.116, SE = 0.044, p = 0.024, Cohen's d = 0.451), while the Caucasian group displays a significantly higher AU6 intensity than the Asian group (MD = 0.145, SE = 0.044, p = 0.003, Cohen's d = 0.451). A similar trend is found for the male group, as observed in Fig. 2a.
Race × Age interaction effect on AU6 intensity. As shown in Fig. 2b, there is a small but statistically significant race × age interaction effect on AU6 intensity (F = 4.071, p < 0.001, partial η² = 0.005), implying that different age groups display a statistically significant difference in AU6 intensity when race is controlled for. To reveal whether a certain race group displays a higher AU6 intensity for one age group than for the others, we further analyzed the age effect within each of the three race groups by post-hoc comparison.

Race × Gender interaction effect on AU12 intensity. To decompose the race × gender interaction on AU12 intensity, we analyzed the race effect on AU12 intensity after controlling for gender, as well as the gender effect on AU12 intensity after controlling for race. For both the female group and the male group, the African American group displays a much higher AU12 intensity than the Caucasian group.

Race × Age interaction effect on AU12 intensity. To decompose the race × age interaction effect on AU12 intensity, we first analyzed the race effect on AU12 intensity after controlling for age, then the age effect on AU12 intensity after controlling for race. Table 2 summarizes the results. Across all age groups, the African American group's mean AU12 intensity is significantly higher than that of the Asian group (see Fig. 2d and Table 2a). Across all race groups, the 4-19-year-old, the 20-39-year-old and the 40-69-year-old groups exhibit a significantly higher AU12 intensity than the 0-3-year-old group, as shown in Fig. 2d and Table 2b. For the African American cohort, the 40-69-year-old group smiles most intensively in AU12, but the difference is not statistically significant when compared with the 4-19-year-old group.
For the Asian cohort, the young adult group (20-39-year-old) displays a bigger AU12 smile intensity than the rest of the age groups, but the difference is not statistically significant, except when compared with the 0-3-year-old group. For the Caucasian cohort, the old age group (40-69-year-old) displays a bigger AU12 smile intensity than the rest of the age groups, but the difference is not statistically significant when compared with the 4-19-year-old group.

Figure 2. Statistically significant two-way interactions on AU6 and AU12 intensity (p < 0.05): (a) race × gender interaction on AU6 intensity, (b) race × age interaction on AU6 intensity, (c) race × gender interaction on AU12 intensity, (d) race × age interaction on AU12 intensity, and (e) gender × age interaction on AU12 intensity. The error bar represents the standard error.

Gender × Age interaction effect on AU12 intensity. Lastly, we analyzed the gender × age interaction effect on AU12 intensity. We first analyzed the age effect on AU12 intensity after controlling for gender, then the gender effect on AU12 intensity after controlling for age (Fig. 2e). Figure 2c-e summarize the race × gender interaction (F = 3.646, p = 0.026, partial η² = 0.002), the race × age interaction (F = 2.736, p = 0.012, partial η² = 0.003), and the gender × age interaction (F = 8.776, p < 0.001, partial η² = 0.005) on AU12 intensity, respectively.

Discussion
The demographic effect on FEE has been studied across different disciplines, with most of the research focusing on a single demographic factor. In this study, we examined the main and the interaction effects of three demographic factors on FEH by studying the FAUs of happiness. Meanwhile, we acknowledge the ongoing debates about the relationship between human expressions and emotions39. Instead of relying on self-reported information, we adopted a more objective and standardized approach for measuring AU intensities, which may suffer less from human perception bias. The analysis of the main effect on both AU6 and AU12 intensity suggests that, first, in general, the female group shows a higher AU12 intensity than the male group in FEH. This result agrees with the traditional view12 that the female group is more emotionally expressive than the male group and is more likely to display its positive emotional state of happiness expressively. Our result is consistent with previous findings11,34-36. Second, when comparing the three races, African American, Caucasian and Asian, the African American group tends to have a higher AU6 and AU12 intensity in FEH than the other two groups, while the Caucasian group has a higher AU6 and AU12 intensity than the Asian group. Although the race effect may vary somewhat by gender or age, the overall trend of the average AU6 and AU12 intensity is consistent, as revealed in Fig. 1b,e. Third, our results show that the race effect interacts with the age effect on AU6 and AU12 intensity, respectively, as shown in Fig. 2b,d. In terms of AU6 intensity, for the African American cohort, the 40-69-year-old group exhibits a significantly higher intensity than the 20-39-year-old group. For the Asian cohort, the 4-19-year-old and the 40-69-year-old groups display a significantly higher AU6 intensity than the 0-3-year-old group.
For the Caucasian cohort, the 4-19-year-old group has the highest AU6 intensity across all age groups, showing a significantly higher AU6 intensity than the 20-39-year-old group in particular. However, for the Caucasian and the Asian cohorts, the 40-69-year-old and the 20-39-year-old groups do not show any significant difference in AU6 intensity, and the difference is very small. In terms of AU12 intensity, for the African American cohort, the 40-69-year-old group smiles more intensively than the rest of the age groups, but the difference is very small and not statistically significant when compared with the 4-19-year-old group. For the Asian cohort, the young adult group (20-39-year-old) displays a bigger AU12 intensity than the rest of the age groups, but the difference is very small and not statistically significant, except when compared with the 0-3-year-old group. For the Caucasian cohort, the old age group (40-69-year-old) displays a bigger AU12 intensity than the rest of the age groups, but the difference is very small and not statistically significant when compared with the 4-19-year-old group.
Fourth, significant age-related differences in AU6 and AU12 intensity can also be identified. To our surprise, the 4-19-year-old, the 20-39-year-old and the 40-69-year-old groups have an average AU12 intensity significantly higher than that of the 0-3-year-old group (see Fig. 1f). Lastly, the gender difference has a more pronounced effect on the 20-39-year-old group's AU12 intensity than on the 0-3-year-old group's (Fig. 2e). In particular, females exhibit a significantly higher AU12 intensity than males in the 20-39-year-old group, whereas no significant gender difference in AU12 intensity is observed for the 0-3-year-old group. What might be the reason that the gender difference in AU12 intensity is not obvious in the early (0-3) years? We speculate that this may be partially attributable to the lesser muscle activity in the mouth area during infancy37.
In the big data era, social network platforms have been extensively utilized for emotion40 or personality41 analysis. Our study combines social "big data" with FAU recognition technologies to address our social science/psychological research question. However, social big data present some limitations and call for further innovations. Despite Flickr's popularity, our current Flickr sample may not be fully representative of the entire American population. Studies42-45 show that social media data are not necessarily representative of the entire population. However, with the introduction of our data balancing techniques, we have managed to improve the representativeness of our sample of facial images downloadable from Flickr. We acknowledge that human facial expressions are complex and can be influenced by various external and internal factors, types of expressions (posed vs. spontaneous), cultural backgrounds, where the expressions were made, and how the pictures of these expressions were taken (selfies or pictures taken by others). Hence, demographic factors such as age, gender and race may not fully explain how facial expressions differ across people. However, even if the three demographic factors exhibit only small differences in the expression of happiness (with small effect sizes), they may still carry a significant statistical effect. Further, small effects can have large aggregated consequences46. Currently, our collected Flickr images cover both posed and spontaneous facial expressions. Hence, no distinction regarding the naturalness of the facial expressions (whether they are posed or spontaneous) can be made in our study. Our conclusions on the effects of race, age, and gender on the FAUs of happiness are drawn with this limitation in mind.
In the future, we will take into account the naturalness of the facial image expressions in our model, in order to obtain a more rigorous understanding of the demographic effects on FAUs of happiness.
Compared to other existing social science or psychologically driven facial-recognition studies, our study offers advantages in three dimensions. First, we created a balanced composition of facial image samples based on age, race and gender, extending beyond controlled laboratory settings by taking natural facial images as the inputs. Second, utilizing automated AI-driven techniques for measuring the FAU intensity of happiness, we analyzed the effects of three key demographic factors on FEH simultaneously, studying both their single and interactive effects on the FAUs of happiness and FEH. Third, our results demonstrate that AI-driven social "big data" analysis can be used to measure FEH, via examining the FAU intensity of happiness, to validate theories of human emotions in three closely related fields: psychology, anthropology and social studies.

Conclusion
In summary, the present study investigates the statistical effects of three demographic factors, namely gender, race and age, on the FAUs of happiness, using human facial images from Flickr and an automated AI-driven FAU recognition algorithm. Specifically, two important FAUs associated with happiness are used as the dependent variables in our statistical analysis. Our method uses the FAU recognition algorithm developed by us and the natural facial image data downloaded from Flickr to provide a full understanding of the main and the interaction effects of gender, race, and age on the FAUs of happiness. Given the strong link between the FAUs of happiness and FEH, our study provides new insights into how the FAU intensity of happiness and FEH can vary across demographic groups. Moreover, our FAU recognition and social media-based image collection methodology can provide a completely new avenue for decoding human FEE and facilitating future FEE studies on happiness and well-being. To further improve our research design, more image data can be added to each demographic group, while additional demographic factors that potentially affect FEH or other emotions can be considered in future FAU-based FEE studies. In our statistical analysis, both the main and the interaction effects of three core demographic factors on AU6 and AU12 have been thoroughly examined. Based on the Flickr dataset, our results suggest that females generally display a higher AU12 intensity than males. African Americans display a higher AU6 and AU12 intensity when compared with Caucasians and Asians. In addition, the older age groups, especially the 40-69-year-old group, generally display a stronger AU12 intensity than the 0-3-year-old group.
Hence, our interdisciplinary study provides a more automated approach, better generalization, and a deeper understanding of how different gender, race and age groups of the Flickr population express the emotion of happiness differently, partially representing the FEH of Americans.

Method
Figure 3 summarizes our overarching methodology. We started with data collection and pre-processing. Next, we obtained the intensity levels of AU6 and AU12 for three demographic groups and their interacting sub-groups using our FAU deep learning algorithm. Finally, we performed statistical analysis to investigate the single and interaction demographic effects on AU6 and AU12 intensity.

Data collection. We collected facial images from the existing large-scale Real-world Affective Faces Database18,19 (RAF-DB), which is publicly available and widely used by the research community. RAF-DB contains about 30,000 real-world facial expression images downloaded from Flickr, partially representative of the facial images of the American population. As stated in the database description18, the images with facial expressions of happiness were retrieved from Flickr using keyword search, based on a set of emotion keywords such as "happy" and "smile". Besides, 315 human annotators were asked to label the images with metadata including gender, race, age range and emotion categories. Flickr is a well-established platform that provides facial expressions in natural settings. The facial expressions of people across a wide age range create a new avenue for us to examine how the race or gender effects on the FAUs of happiness and FEH vary with age. Previous research studies usually do not fully consider how the age factor affects AU6 and AU12 intensity after controlling for race or gender. In addition, our study refines the categorization of age groups; for instance, we have divided the 0-19-year-old age group into a 0-3-year-old and a 4-19-year-old group. This allows us to examine the difference in FAU intensity across age groups at a finer granularity. First, we selected all images labelled "happy" and with a full facial profile in the original RAF-DB database for our own database; severely occluded faces and images with poor lighting or low resolution were removed, resulting in 5585 images. Second, we separated the images into 24 subgroups based on the demographic categories, i.e., race (Caucasian, Asian, and African American), gender (male and female), and age (0-3, 4-19, 20-39, and 40-69-year-old). Third, we additionally augmented the subgroups "African American, female, 0-3-year-old" and "Asian, female, 0-3-year-old" with pictures from Flickr, to ensure that the number of pictures is distributed relatively evenly across all subgroups. In addition, random down-sampling and augmentation were utilized to ensure that our sample consists of evenly distributed demographic sub-groups. Finally, we obtained a database with 4800 images.

Data pre-processing. During the data pre-processing stage, first, we utilized an open-source C++ library, Dlib47, to detect and extract 68 facial landmarks, i.e., the (x, y)-coordinates of 68 facial key points in the image. Second, based on the 68 facial landmarks, we applied affine transformations (scaling, rotation, translation, etc.) to project each image into a new output coordinate space, where the positions and sizes of all human faces were approximately uniform. Third, the detected face regions were cropped and resized to a resolution of 256 × 256 pixels, which is the standard input size of our developed AU intensity estimator28. Given the unequal distribution of the subgroups, we performed random down-sampling for the subgroups with a large number of images, and data augmentation for the subgroups with very few images. Data augmentation was implemented by randomly rotating images by −10° to +10° and adding Gaussian noise of different variances (0.001-0.02). This step ensured an evenly distributed sample size across all subgroups. Finally, we obtained 200 facial expression images for each demographic subgroup, leading to a gender-, race- and age-balanced sample of 4800 images for our experimental study and data analysis.
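The balancing step (down-sampling large subgroups, augmenting small ones) can be sketched as follows. This is an illustrative reimplementation, not our production pipeline: the stand-in image pools and the use of scipy.ndimage.rotate are assumptions, though the rotation range (±10°) and noise variances (0.001-0.02) follow the description above:

```python
import numpy as np
from scipy import ndimage

rng = np.random.default_rng(7)
TARGET = 200  # images per demographic subgroup

def augment(image, rng):
    """One augmented copy: random rotation in [-10, +10] degrees plus
    additive Gaussian noise with variance drawn from [0.001, 0.02]."""
    angle = rng.uniform(-10.0, 10.0)
    rotated = ndimage.rotate(image, angle, reshape=False, mode="nearest")
    noise = rng.normal(0.0, np.sqrt(rng.uniform(0.001, 0.02)), image.shape)
    return np.clip(rotated + noise, 0.0, 1.0)

def balance(images, rng):
    """Down-sample to TARGET images, or top up with augmented copies."""
    if len(images) >= TARGET:
        idx = rng.choice(len(images), TARGET, replace=False)
        return [images[i] for i in idx]
    out = list(images)
    while len(out) < TARGET:
        out.append(augment(images[rng.integers(len(images))], rng))
    return out

# Stand-in subgroup pools of 256x256 grayscale "faces" (random arrays).
big_pool   = [rng.uniform(0, 1, (256, 256)) for _ in range(230)]
small_pool = [rng.uniform(0, 1, (256, 256)) for _ in range(150)]

balanced_big   = balance(big_pool, rng)
balanced_small = balance(small_pool, rng)
```

Either branch yields exactly TARGET images per subgroup, which is what makes the downstream three-way ANOVA a balanced factorial design.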

Measurement.
In this study, the dependent variables are AU6 and AU12 intensity, the two core indicators of FEH. To estimate the corresponding AU intensities of a facial image, we developed a heatmap regression framework28 based on Convolutional Neural Networks (CNNs). Figure 4 illustrates the structure of our proposed heatmap regression framework. It is an encoder-decoder architecture, where the encoder is ResNet-5048 and the decoder consists of three deconvolutional layers and three semantic correspondence convolutional layers28. The deep learning models were developed with Tensorflow49, and the training was conducted on a server configured with eight NVIDIA GeForce GTX 1080Ti 11G GPUs. The trained deep learning model was validated on a sample of approximately 70,000 facial images from a challenging benchmark dataset50. Our automated algorithm achieved superior performance in estimating the intensity of spontaneous FAUs of happiness, as demonstrated in our previous study28. During the inference stage, the output provides estimates of the intensity of five fundamental AUs of happiness, including AU6 and AU12. We applied this model to calculate the intensity levels of AU6 and AU12 for each demographic subgroup. Details of the FAU estimation model can be found in the Supplementary Material.

Statistical analysis. All statistical analyses were performed using IBM SPSS version 26.0.0. Three-way analyses of variance (ANOVAs) were carried out to examine the effects of three independent demographic variables (gender, race, and age) on the dependent variables (AU6 and AU12 intensity). Follow-up simple effect analyses were conducted when the interaction effects were statistically significant. Besides, to compare the mean AU6 and AU12 intensities of different demographic subgroups, we used post-hoc pairwise multiple comparison tests with Bonferroni correction; the α level for all analyses was set at 0.05. Differences across groups were assessed via the mean difference (MD) and the corresponding standard error (SE); a p value < 0.05 was considered statistically significant.
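The post-hoc procedure can be sketched with pairwise t tests and a Bonferroni adjustment. SPSS reports Bonferroni-adjusted p values; multiplying each raw p by the number of comparisons, as below, is equivalent to testing each raw p against α divided by that number. The group data here are synthetic placeholders:

```python
import itertools

import numpy as np
from scipy import stats

rng = np.random.default_rng(3)

# Hypothetical AU6 intensities for the three race groups, 200 images each.
groups = {
    "African American": rng.normal(2.9, 1.0, 200),
    "Caucasian":        rng.normal(2.7, 1.0, 200),
    "Asian":            rng.normal(2.4, 1.0, 200),
}

alpha = 0.05
pairs = list(itertools.combinations(groups, 2))  # 3 pairwise comparisons

results = {}
for a, b in pairs:
    t, p = stats.ttest_ind(groups[a], groups[b])
    p_adj = min(1.0, p * len(pairs))  # Bonferroni adjustment, capped at 1
    results[(a, b)] = (p_adj, p_adj < alpha)
```

The correction guards the family-wise error rate: with three comparisons, each raw p must effectively fall below 0.05/3 to count as significant.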

Data availability
The database that supports the findings of our study is publicly available and can be requested from http://www.whdeng.cn/RAF/model1.html. The previous publications18,19 provide more details of the dataset.