Predictors of facial attractiveness and health in humans

Facial attractiveness has been suggested to provide signals of biological quality, particularly health, in humans. The attractive traits that have been implicated as signals of biological quality include sexual dimorphism, symmetry, averageness, adiposity, and carotenoid-based skin colour. In this study, we first provide a comprehensive examination of the traits that predict attractiveness. In men, attractiveness was predicted positively by masculinity, symmetry, and averageness, and negatively by adiposity. In women, attractiveness was predicted positively by femininity and negatively by adiposity. Skin colour did not predict attractiveness in either sex, suggesting that, despite recent interest in the literature, colour may play a limited role in determining attractiveness. Male perceived health was predicted positively by averageness, symmetry, and skin yellowness, and negatively by adiposity. Female perceived health was predicted by femininity. We then examined whether appearance predicted actual health using measures that have been theoretically linked to sexual selection, including immune function, oxidative stress, and semen quality. In women, there was little evidence that female appearance predicted health. In men, we found support for the phenotype-linked fertility hypothesis that male masculinity signalled semen quality. However, we also found a negative relationship between averageness and semen quality. Overall, these results indicate weak links between attractive facial traits and health.

Immune function. Principal components analyses (PCA) were conducted to summarize the interrelated immune function variables. Both male and female data returned two PCs (see supplementary material for details on data reduction). For men, PC1 was loaded mainly by bacterial killing capacity and overall bacterial immunity and PC2 was loaded mainly by bacterial suppression capacity and lysozyme activity. For women, PC1 was loaded mainly by bacterial killing capacity, overall bacterial immunity and lysozyme activity and PC2 was loaded mainly by bacterial suppression capacity. Residuals were extracted from women's PC1 after controlling for significant lifestyle factors (see supplementary material). No lifestyle factors were significantly related to either PC2 in women or any of the PCs in men. Therefore, we ran these analyses on the raw data.
Multiple regression results indicated that there were no significant appearance predictors of either immune PC in either men or women (Tables 5 and 6).
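As an illustration of the data-reduction step described above, the following is a minimal sketch of a PCA on standardised variables. It is not the authors' analysis script: the data are simulated, the variable names are hypothetical, and the rule used to decide how many PCs to retain (reported in the supplementary material) is not reproduced here.

```python
import numpy as np

def pca(data):
    """PCA on the correlation matrix (rows = participants, columns = variables)."""
    # Standardise each variable so the analysis uses correlations, not covariances
    z = (data - data.mean(axis=0)) / data.std(axis=0, ddof=1)
    # SVD of the standardised data: rows of vt are the component directions
    u, s, vt = np.linalg.svd(z, full_matrices=False)
    eigenvalues = s ** 2 / (len(data) - 1)
    scores = z @ vt.T                        # component scores per participant
    loadings = vt.T * np.sqrt(eigenvalues)   # variable-by-PC correlations
    return eigenvalues, loadings, scores

# Hypothetical, simulated immune measures for 100 participants (illustration only)
rng = np.random.default_rng(0)
killing = rng.normal(size=100)
overall = killing + rng.normal(scale=0.3, size=100)
suppression = rng.normal(size=100)
lysozyme = suppression + rng.normal(scale=0.5, size=100)
immune = np.column_stack([killing, overall, suppression, lysozyme])

eigenvalues, loadings, scores = pca(immune)
explained = eigenvalues / eigenvalues.sum()  # proportion of variance per PC
```

Each column of `loadings` shows which variables weight most strongly on the corresponding PC, which is how statements such as "PC1 was loaded mainly by bacterial killing capacity" would be read off.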
Oxidative stress. Raw descriptive statistics for the oxidative stress variables are presented in Tables 7 and 8.
The isoprostane data were log-transformed to approximate a normal distribution for the data analyses. The 8-OHdG analyses for women were based on residuals that were extracted after controlling for various lifestyle variables (see supplementary material). Lifestyle factors did not significantly affect isoprostane levels in women or either of the oxidative stress measures in men. Therefore, we ran the regression analyses on the raw data of these variables.
In men, multiple regression results indicated no significant appearance predictors of either oxidative stress measure (Tables 7 and 8). In women, there were no significant predictors of either oxidative stress measure (Tables 7 and 8).
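The two preprocessing steps mentioned for the oxidative stress variables (log transformation and extraction of residuals after controlling for lifestyle factors) can be sketched as follows. This is an illustrative sketch under stated assumptions: the data are simulated, the single binary covariate is hypothetical, and the natural logarithm is assumed because the text does not state the base.

```python
import numpy as np

def residualize(y, covariates):
    """Residuals of y after an ordinary least-squares fit on the covariates."""
    # Add an intercept column so the residuals are centred on zero
    X = np.column_stack([np.ones(len(y)), covariates])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return y - X @ beta

rng = np.random.default_rng(1)

# Hypothetical lifestyle covariate (0/1) and simulated 8-OHdG values
covariate = rng.integers(0, 2, size=80).astype(float)
ohdg = 7.0 + 1.5 * covariate + rng.normal(size=80)
ohdg_resid = residualize(ohdg, covariate)

# Simulated right-skewed isoprostane values; natural log is an assumption
isoprostane = rng.lognormal(mean=0.0, sigma=0.5, size=80)
log_isoprostane = np.log(isoprostane)
```

The residuals are uncorrelated with the covariate by construction, so any appearance predictor of `ohdg_resid` cannot be explained by that lifestyle factor.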
Semen quality. PCA was conducted on the interrelated semen quality variables, resulting in three PCs (see supplementary material). PC1 was weighted most strongly by variables related to rapid progressive motility. PC2 was weighted most strongly by variables related to the linearity of the sperm movement. PC3 was weighted most strongly by sperm concentration and percentage motile sperm. Because PC2 showed a negative skew and positive kurtosis, it was reverse-scored, square-root-transformed, and then reverse-scored again to approximate a normal distribution. Residuals were extracted from PCs 1 and 2 after controlling for variation in lifestyle, collection procedure, and sample abnormalities (see supplementary material).
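The reverse-score, square-root, reverse-score sequence applied to PC2 is a standard way of handling negative skew. A minimal sketch follows, assuming the reflection constant is the maximum value plus one (the text does not state the constant used) and using simulated, negatively skewed data.

```python
import numpy as np

def reflect_sqrt(x):
    """Reduce negative skew: reflect, take the square root, reflect back."""
    x = np.asarray(x, dtype=float)
    reflected = x.max() + 1.0 - x   # reverse-score; the largest value becomes 1
    rooted = np.sqrt(reflected)     # square root compresses the long tail
    return -rooted                  # reverse again so the original order is kept

def skewness(x):
    """Sample skewness (third standardised moment)."""
    d = np.asarray(x, dtype=float) - np.mean(x)
    return (d ** 3).mean() / (d ** 2).mean() ** 1.5

rng = np.random.default_rng(2)
pc2 = -rng.gamma(shape=2.0, scale=1.0, size=500)  # simulated negatively skewed scores
pc2_transformed = reflect_sqrt(pc2)
```

The second reflection matters for interpretation: without it, high transformed scores would correspond to low original scores, and the sign of any regression coefficient would flip.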
Multiple regression results indicated that PC1 was negatively predicted by skin yellowness (Table 9). PC2 was positively predicted by masculinity (Table 10). PC3 was negatively predicted by averageness (Table 11).

Discussion
This study provides a comprehensive assessment of the facial traits that predict attractiveness, perceived health, and various measures of actual health, including oxidative stress, immune function, and semen quality. Our results showed that male attractiveness was predicted positively by masculinity, symmetry and averageness, and negatively by adiposity. Female attractiveness was predicted positively by femininity and negatively by adiposity. Male perceived health was predicted positively by averageness, symmetry, and skin yellowness and negatively by adiposity. Female perceived health was predicted positively by femininity. In terms of actual health, there was little evidence that female facial appearance signalled health. In men, semen quality was positively predicted by masculinity, suggesting that masculinity may be a signal of male fertility.
Consistent with previous studies, our findings demonstrate the importance of sexual dimorphism, symmetry, averageness, and adiposity in determining attractiveness 2,18 . Sexual dimorphism, symmetry, and averageness have often been studied in terms of face shape. However, these traits can be influenced by colour as well 22,40 . Indeed, we found that redness was related to femininity in women. In men, lightness and yellowness were related to symmetry. It is possible, therefore, that the relationship between attractiveness and traits such as sexual dimorphism, symmetry, and averageness was partly accounted for by variations in skin colour. However, we did not find any significant relationships between skin colour and attractiveness in either sex. The null findings suggest that, despite recent interest in skin colour as a potential sexual signal, skin colour might play a limited role in facial attractiveness. Consequently, it is unlikely that the observed relationships between attractiveness and sexual dimorphism, symmetry, and averageness were due to variations in skin colour.
Earlier studies examining the relationship between skin colour and attractiveness have used morphed images that were manipulated to vary in skin colour, finding that increasing yellowness increased attractiveness 19,20 . Although such studies have the advantage of manipulating specific facial features while holding all others constant, it is important to know whether naturally occurring variation in skin colour, both within and between individuals, is related to attractiveness. A recent placebo-controlled experimental study conducted by our group using the carotenoid beta-carotene to manipulate skin colour found that beta-carotene supplementation increased skin yellowness and redness and thereby enhanced attractiveness within individuals 31 . In terms of variation across individuals, only two studies have examined the relationship using natural, unmanipulated images in men 21,35 and only one study has done so in women 41 . Scott et al. 35 found that skin yellowness positively predicted attractiveness in British Caucasian men (N = 75). Stephen et al. 21 found that yellowness positively predicted attractiveness in both British Caucasian (N = 34) and African (N = 41) men. They also found that lightness positively predicted attractiveness in the British Caucasian men and had a curvilinear relationship with attractiveness in the African men. In African women, Coetzee et al. 41 found that attractiveness was positively related to a skin colour principal component that indicated lighter, yellower and redder skin colour (N = 45). Using a larger sample of Australian Caucasians (101 men and 80 women), we found no evidence to support the findings of these three previous correlational studies. Collectively, the results from the literature suggest that skin colour may well influence attractiveness within individuals, but that its effect among individuals is less consistent.
Given that humans often interact with others on a recurrent basis, one possibility is that skin colour might function as a short-term signal of within-individual condition that allows us to monitor the well-being of individuals that we associate with.
Besides predicting attractiveness, masculinity also positively predicted semen quality. This result is in line with the phenotype-linked fertility hypothesis 13 , which proposes that male secondary sexual traits are attractive because they provide information about male fertility. However, our result is not consistent with those of previous studies. Soler et al. 38 found that the width of the face, a measurement that has been linked to masculinity, was negatively related to semen quality. Similarly, Simmons et al. 42 found that voice masculinity was negatively related to semen quality. In contrast, Peters et al. 39 found no relationship between rated masculinity and semen quality that was quantified manually. Key methodological differences exist between each of these studies and the present one, including how facial masculinity was quantified (i.e. measured vs rated; as discussed above), the domain from which masculinity was determined (i.e. voice vs face), and how semen quality was measured (i.e. manual vs automated). These methodological differences might contribute to the different findings.
In contrast to the positive relationship between masculinity and semen quality, we found a negative relationship between facial averageness and semen quality in men. In general, theories of sexual signalling predict that attractive appearance would be positively related to actual health 10,11,13 . However, life-history trade-offs can lead to negative relationships between appearance and health, especially in appearance traits that are under good-genes selection [43][44][45] . From a life history perspective, individuals have limited resources, which they have to allocate to different life history traits 43 . If one allocates more resources to one life history trait, less is available for other traits. Depending on the environment and the individual's ability to pay the cost of diverting resources from one life history trait to another, trade-offs can lead to either positive or negative relationships between life history traits 44,45 . Trade-offs can occur between different aspects of health. An example is the negative relationship between immune function and semen quality in men found in the present study, which is consistent with suggestions that male immune function is traded off against fertility 46,47 . Trade-offs may also occur between health and appearance, which has implications for understanding how attractive traits maintain honesty as signals of health. Animal studies have found negative relationships indicating trade-offs between appearance and health 48,49 . The negative relationship between averageness and semen quality that we found could suggest that trade-offs between appearance and health can occur in humans as well.
We did not find any evidence that female attractiveness signalled health. Neither attractiveness nor its components (i.e. femininity and adiposity) were positively related to any of the health variables. Although our results suggest that female attractiveness does not signal immune function or oxidative stress, it might signal other aspects of health, like fertility. Law Smith et al. 50 showed that female facial femininity is positively related to estrogen levels. Increased estrogen levels have been positively associated with the probability of conception in women 51,52 . It is possible, therefore, that men are picking up estrogen-related fertility cues in women via femininity.
In recent years, there has been increased interest in the role that carotenoid-based skin yellowness plays in human sexual selection [19][20][21] . In the present study, we found a negative relationship between skin yellowness and semen quality, which could suggest a trade-off relationship between the two traits. In a previous supplementation study, we found that oral supplementation of the carotenoid beta-carotene did not affect immune function, oxidative stress, or semen quality in humans, suggesting that carotenoids have little impact on human health 31 . Therefore, the negative relationship between skin yellowness and semen quality in the present study might be explained by other unknown factors and not carotenoids. Importantly, we did not find any evidence that skin yellowness predicted attractiveness independent of the other facial traits in either sex. Therefore, it is unlikely that carotenoid-based skin yellowness functions as a sexual signal of health among individuals.
We did not replicate Gangestad et al.'s 37 findings that facial attractiveness and rated masculinity are related to men's oxidative stress levels. We took steps based on their results to maximise our chances of finding significant results. The urine samples that we used to measure oxidative stress were taken during our afternoon lab sessions, which were shown in Gangestad et al. 37 to have a stronger relationship with facial appearance compared to morning-awakening samples. We also used more than one measure of oxidative stress to obtain a more representative measure of systemic oxidative stress 37 . We note that there was a huge difference in the average oxidative stress levels between our participants and those of Gangestad et al. 37 . The mean 8-OHdG levels in our participants were 6.5 ng/mg creatinine for men and 7.7 ng/mg creatinine for women. In contrast, the mean 8-OHdG levels of the male participants in Gangestad et al. 37 were more than 100 times that. We do not know why there was such a huge difference between the two studies, given that the participants from both studies were recruited from similar populations (i.e. relatively young individuals from a university community). The mean 8-OHdG levels in Gangestad et al. 37 were much higher even when compared to published reference values of 8-OHdG levels in healthy individuals, which typically range from 11.9 ng/mg creatinine 53 to 43.9 ng/mg creatinine 54 . In comparison to these values, the mean 8-OHdG levels in our sample are low. Therefore, it is possible we did not find significant results because all our participants had relatively low oxidative stress levels.

Table 7. Means and SDs (in ng/mg creatinine) and multiple regression models on the facial appearance predictors of 8-OHdG levels. Separate multiple regression models were conducted for skin colour and other potential components of attractiveness (i.e. sexual dimorphism, averageness, symmetry, and adiposity).

Table 8. Means and SDs (in ng/mg creatinine) and multiple regression models on the facial appearance predictors of 8-isoprostane levels. Separate multiple regression models were conducted for skin colour and other potential components of attractiveness (i.e. sexual dimorphism, averageness, symmetry, and adiposity).

Out of the health measures used in the present study, we only found relationships between appearance and male semen quality. Health is complex and multi-faceted. Although we have used multiple measures of health, especially those that have been theoretically linked to human sexual selection, there are many other aspects of health that could be related to human facial appearance. Some examples that have been examined in other studies include the major histocompatibility complex (MHC) genes 55 , health outcomes (e.g. sickness incidences) 56,57 , and longevity 56 . However, mixed results have also been found with such measures [55][56][57] . Apart from the myriad of possible health measures, the measurement of health is further complicated by potential trade-offs between different aspects of health, such as that between immune function and semen quality, observed in the present study. To gain a more complete understanding of the relationship between appearance and health in humans, future studies could examine other aspects of health that have not been studied, preferably with multiple measures of health.
In summary, sexual dimorphism, symmetry, averageness, and adiposity play important roles in attractiveness. Skin colour, on the other hand, did not directly predict attractiveness in either sex. In terms of actual health, there was no evidence that female attractive appearance signalled health. However, we found support for the phenotype-linked fertility hypothesis that male masculinity signalled semen quality in our sample of men.

Methods
Participants. One hundred and one Caucasian men (mean age ± SD = 20.8 ± 3.6 years, range = 17-35 years) and 80 Caucasian women (mean age ± SD = 21.9 ± 4.6 years, range = 17-35 years) were recruited from the University of Western Australia community. They received either course credits or travel remuneration. All participants reported being heterosexual. Forty-three women reported being on various forms of hormonal contraception, 36 reported not using any, and one did not report her hormonal contraceptive usage.
Procedure. Participants first attended a one-hour laboratory session. The sessions were conducted between 12 pm and 6 pm to control for potential variations in any of the health measures due to the circadian rhythm. For women who were not using any hormonal contraception, the session was conducted seven to 14 days from the start of their menstruation to control for potential changes in the health variables due to the menstrual cycle.
Participants were asked to abstain from eating anything or drinking any flavoured drinks for 1 hour before their session. Participants first collected 10 ml urine in a sample vial for oxidative stress assays. After rinsing their mouths and resting for ~15 minutes, they also collected 5 ml of saliva in a sample vial via passive drool for salivary innate immune function assays. Both samples were stored in a 4 °C fridge upon collection and frozen at −80 °C within 4 hours of collection. During the resting period between rinsing their mouth and collecting the saliva sample, we took photographs of the participants' faces and administered a lifestyle questionnaire. Participants' faces were photographed under standardized lighting. Men were cleanly shaven. Women were not wearing any make-up or artificial tanning. Participants were seated 1.3 m from a Nikon D7000 camera against a grey background with a grey cape draped over them to standardize the colour of their clothing. Spectacles and jewellery were removed. Participants had their hair pulled back with a hairband and were asked to look straight at the camera while adopting a neutral expression with their mouths closed. An X-Rite ColorChecker Classic chart (Grand Rapids, MI, USA) was placed next to the participants' faces for colour calibration purposes. The images were taken in Nikon's NEF raw image format and then converted to lossless PNG files 58 .
The questionnaire included items on their dietary and exercise habits, perceived stress levels, recent or long-term medical conditions, and exposure to various toxins, which we used to identify potential confounding lifestyle factors that might influence our health measures.
After the lab session, male participants also collected a sample of their semen for semen quality measurements. They had to abstain from ejaculating for at least two and no more than six days before the collection. They collected the sample at home in a sample vial via masturbation while looking at the front view images of four naked women taken from Thornhill and Grammer 59 . The visual stimulation provided by the images is important for producing a normal ejaculate 60 . Participants were asked to deliver the sample to the laboratory for analysis within one hour of collection. As sperm motility is highly sensitive to fluctuations in ambient temperature, participants were asked to wrap the sample vial in a piece of aluminium foil and maintain its temperature by keeping the vial under their arms or between their legs during delivery. Participants also completed an ejaculate questionnaire, which contained questions about the time of the collection, the percentage of ejaculate collected and the portions lost (initial, middle, end), the number of days since the last ejaculation, and the amount of time taken to collect the sample.

Preparation of images. All images were colour-calibrated using the program Psychomorph 61 to control for subtle random variations in colour due to lighting and photographic conditions. The program adjusts the colour of the images by comparing the CIELab values of the ColorChecker patches in the images to known values. We then rotated and aligned the faces so that the eyes were all sitting at the same height on a horizontal plane. A black oval mask was applied to hide most of the hair, ears, and neck. The masking procedure is widely used in facial perception studies 62-66 .
Face ratings. An additional 127 Caucasian men (mean age ± SD = 32.6 ± 8.0 years, range = 19-49 years) and 131 Caucasian women (mean age ± SD = 31.8 ± 7.5 years, range = 18-49 years) were recruited online via Amazon Mechanical Turk to provide opposite-sex ratings for the participants' faces. One potential issue with using online raters in studies involving skin colour is that online raters are using computer screens that have not been colour-calibrated, which might introduce additional noise to the colour of the presented images. However, studies on facial preferences and ratings, including those on skin colour, have been done using online samples [67][68][69] . One study showed that ratings of attractiveness in faces that varied in skin colour did not differ between raters who were tested in the laboratory using a colour-calibrated monitor versus those who were tested online using their own computers 68 . This result suggests that face ratings from online samples are comparable to those from laboratory samples. Each rater was randomly assigned to provide opposite-sex ratings on one of the following: attractiveness, perceived health, sexual dimorphism (masculinity for male faces and femininity for female faces), averageness, symmetry, and adiposity on a 9-point scale (1 = not at all, 9 = extremely).
For averageness, we asked raters to rate the distinctiveness of the opposite-sex faces following previous studies and reverse coded the ratings to obtain a measure of averageness 6,9,70 . The face images were cropped and presented at 372 × 491 pixels at a resolution of 72 pixels/in. Each face remained onscreen until the rater provided a response. The number of opposite-sex raters for each trait ranged from 16 to 31. With the exception of male ratings of female averageness, which had a moderate Cronbach's alpha of 0.53, the inter-rater reliability of all ratings was high, with a Cronbach's alpha range of 0.74 to 0.96 (see Table S4 for details). For each trait, we calculated an average rating for each face by averaging across raters.
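The reverse coding of distinctiveness ratings and the inter-rater reliability statistic can be sketched as follows. This is a minimal illustration, not the authors' code: the ratings are made up, raters are treated as the "items" in the standard Cronbach's alpha formula, and the function names are hypothetical.

```python
import numpy as np

def reverse_code(ratings, low=1, high=9):
    """Reverse-code ratings on a low-to-high scale (e.g. distinctiveness to averageness)."""
    return (low + high) - np.asarray(ratings, dtype=float)

def cronbach_alpha(ratings):
    """Cronbach's alpha for a faces-by-raters matrix (raters treated as items)."""
    ratings = np.asarray(ratings, dtype=float)
    k = ratings.shape[1]                                # number of raters
    rater_var_sum = ratings.var(axis=0, ddof=1).sum()   # sum of per-rater variances
    total_var = ratings.sum(axis=1).var(ddof=1)         # variance of summed ratings
    return k / (k - 1) * (1.0 - rater_var_sum / total_var)

# Made-up distinctiveness ratings: 4 faces (rows) rated by 3 raters (columns)
distinctiveness = np.array([
    [2, 3, 2],
    [5, 4, 6],
    [8, 7, 9],
    [4, 4, 3],
])
averageness = reverse_code(distinctiveness)
alpha = cronbach_alpha(averageness)
mean_averageness = averageness.mean(axis=1)  # one averaged rating per face
```

Reverse coding every rater's scores leaves alpha unchanged, because it only flips the sign of each deviation from the mean; the reliability of the averageness scores is therefore the reliability of the original distinctiveness ratings.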
Face colour measurements. We used the program ImageJ (http://imagej.nih.gov/ij/) to measure facial skin colour from the colour-calibrated images. The colour measurements were based on ten 60 × 60 pixel skin patches. Four patches were taken from the forehead and six were taken from the cheeks on both sides of the face. Previous studies have measured facial skin colour from similar regions 20 . Patches were taken from regions without blemishes, shadows, or specular highlights. Average RGB values were extracted from each patch and converted to CIELab values using the equations from the website EasyRGB (http://www.easyrgb.com/index.php?X=MATH). The CIELab colour space has been used in previous studies on carotenoid-based skin yellowness [19][20][21] . It contains three colour axes, namely lightness, redness, and yellowness, which approximate the human colour visual system. The CIELab values were averaged across the ten patches to form average lightness, redness, and yellowness values for each face.
Immune function assays. We measured salivary innate immune function using two measures. First, we measured salivary antibacterial capacity against Escherichia coli 71 . Supernatant from each sample was diluted 2:1 using CO2-independent media containing 4 mM L-glutamine and incubated with E. coli (ATCC no. 8739) for 30 minutes for bacteria killing to occur. The mixtures were then plated overnight on trypticase soy agar in triplicate. Positive control plates were prepared using the same procedure with bacteria that were diluted with media alone. The concentration of the E. coli was adjusted such that there would be ~100-300 colonies on the positive control plates. Images of the plates were taken with a ruler as a size reference.
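For the face colour measurements, the RGB-to-CIELab conversion of each patch can be sketched as below. This is an illustrative sketch only: it assumes the patch averages are 8-bit sRGB values and uses the commonly published sRGB-to-XYZ matrix with a D65 reference white, which may differ from the exact settings the authors used. The function name and the example patch values are hypothetical.

```python
def srgb_to_lab(r, g, b):
    """Convert 8-bit sRGB values to CIELab (assumes D65 white, 2 degree observer)."""
    def linearize(c):
        # Undo the sRGB gamma encoding
        c = c / 255.0
        return ((c + 0.055) / 1.055) ** 2.4 if c > 0.04045 else c / 12.92

    rl, gl, bl = (linearize(c) * 100.0 for c in (r, g, b))

    # Linear RGB to CIE XYZ
    x = rl * 0.4124 + gl * 0.3576 + bl * 0.1805
    y = rl * 0.2126 + gl * 0.7152 + bl * 0.0722
    z = rl * 0.0193 + gl * 0.1192 + bl * 0.9505

    def f(t):
        # Cube root with a linear segment near zero
        return t ** (1.0 / 3.0) if t > 0.008856 else 7.787 * t + 16.0 / 116.0

    fx, fy, fz = f(x / 95.047), f(y / 100.0), f(z / 108.883)
    lightness = 116.0 * fy - 16.0
    redness = 500.0 * (fx - fy)      # a*: green (negative) to red (positive)
    yellowness = 200.0 * (fy - fz)   # b*: blue (negative) to yellow (positive)
    return lightness, redness, yellowness

# Hypothetical mean RGB values for three skin patches from one face
patches = [(200, 150, 130), (205, 155, 132), (198, 148, 128)]
lab_values = [srgb_to_lab(*p) for p in patches]
# Average each CIELab axis across patches to get one value per face
face_lab = tuple(sum(axis) / len(axis) for axis in zip(*lab_values))
```

Averaging in CIELab rather than in RGB matches the order of operations described in the text, where each patch is converted first and the CIELab values are then averaged.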
We quantified the salivary bacteria killing capacity based on the percentage of colonies killed relative to the positive controls, the salivary bacteria growth suppression capacity based on the percentage reduction in average colony size relative to the positive controls, and the overall salivary bacterial immunity based on the percentage reduction in total bacteria area on the plate relative to positive controls. The number of colonies, average colony area, and total colony area were highly repeatable.
Second, we measured salivary lysozyme activity against Micrococcus lysodeikticus (ATCC no. 4698) in duplicate. Powdered M. lysodeikticus was reconstituted using phosphate-buffered saline (PBS) to form a cloudy suspension. Ten microliters of the M. lysodeikticus suspension was added to 80 μl of whole saliva in a 96-well plate. Positive controls were created using M. lysodeikticus and PBS. The plate was incubated for 10 minutes at 33 °C and the resultant absorbance was measured using a M5 SpectraMax microplate reader (Molecular Devices, Sunnyvale, CA). Lysozyme activity was calculated as the difference in absorbance in the sample wells relative to the positive controls. The absorbance was highly repeatable (R = 0.96, 95% CI [0.95, 0.97]).
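The three plate-based percentages defined above (killing capacity, growth suppression capacity, and overall bacterial immunity) can be illustrated with a short sketch. The function names and the colony areas are hypothetical, and the sketch handles a single sample plate against a single control plate; how triplicate plates were combined is not specified in the text and is not modelled here.

```python
def percent_reduction(sample_value, control_value):
    """Percentage reduction of a sample value relative to its positive control."""
    if control_value <= 0:
        raise ValueError("control value must be positive")
    return (1.0 - sample_value / control_value) * 100.0

def plate_measures(sample_areas, control_areas):
    """Summarise one saliva plate against its positive-control plate.

    Each argument is a list of individual colony areas (e.g. in mm^2).
    Assumes at least one colony on each plate.
    """
    n_sample, n_control = len(sample_areas), len(control_areas)
    total_sample, total_control = sum(sample_areas), sum(control_areas)
    return {
        # Fewer colonies than the control: bacteria were killed
        "killing": percent_reduction(n_sample, n_control),
        # Smaller average colony than the control: growth was suppressed
        "suppression": percent_reduction(total_sample / n_sample,
                                         total_control / n_control),
        # Less total colony area than the control: combined effect
        "overall": percent_reduction(total_sample, total_control),
    }

# Hypothetical plates: 200 control colonies of 2.0 mm^2, 150 sample colonies of 1.5 mm^2
control = [2.0] * 200
sample = [1.5] * 150
result = plate_measures(sample, control)
```

In this made-up example, killing and suppression are each 25% while the overall measure is 43.75%, which shows why the overall measure is interrelated with the other two and why the variables were later summarised with PCA.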
Oxidative stress. We measured the level of oxidative DNA damage and the level of lipid peroxidation by quantifying the urinary 8-OHdG and isoprostane levels, respectively, using enzyme-linked immunosorbent assay kits (Northwest Life Science Specialties, Vancouver, WA, USA). For the isoprostane assay, the urine samples were pre-treated with beta-glucuronidase before we ran the assay. The pre-treatment frees isoprostanes that are bound to glucuronic acid in the urine, thus giving us a more accurate measure of systemic lipid peroxidation 72 . Both assays were run in duplicate. We also measured urine concentration in duplicate using colorimetric creatinine assays (Northwest Life Science Specialties, Vancouver, WA, USA). We standardized both the 8-OHdG and isoprostane results against urine concentration by expressing the results in terms of ng/mg creatinine. The 8-OHdG, isoprostane, and creatinine assays were highly repeatable.
Semen quality. Semen quality was measured using a Hamilton Thorne computer-aided sperm analysis (CASA) system immediately upon delivery of each sample. Each sample was loaded into a Leja four-chamber semen analysis slide (3 μl per chamber). The slide was left on a 37 °C warming stage for 2 minutes before we ran the analyses. Six replicate measurements were taken for each sample. The CASA system measures sperm concentration, percentage motile sperm, and seven motility-related variables (Table S2). Nine samples had to be diluted because they were too concentrated for the CASA to analyse. Each sample was diluted 3:1 using its own seminal fluid, which was extracted by centrifuging a portion of the sample at 12470 × g for 5 minutes. The diluted samples were gently pipetted several times to ensure proper mixing before we ran the analysis. We also took note of whether each sample was completely liquefied and whether there were any observed abnormalities with each sample when viewed under the microscope (e.g. regions of dead sperm).