Consistently biased: documented consistency in self-reported holiday healthfulness behaviors and associated social desirability bias

Holiday healthfulness conversations are dominated by overindulgent consumption and, largely in reference to resolutions to do better, by physical activity and exercise aspirations. Consistency was found in self-reported agreement with a series of holiday healthfulness statements across time, holidays (Thanksgiving versus Christmas), and samples of respondents. The largest proportions of respondents displaying social desirability bias (SDB) were found for two statements, namely "I will consume more alcohol during the holiday season than at other times of the year" (63–66%) and "I make it a New Year's resolution to lose weight" (60–63%). Cheap talk was tested as a mechanism to reduce SDB in holiday healthfulness reporting but showed only limited efficacy compared with a control group surveyed simultaneously. Nonetheless, the consistency of reporting and of SDB across time is notable, both for self-reported health-related data and for the study of the unique consumption period around the holidays. Healthcare providers and researchers alike seek to improve the accuracy of self-reported data, which makes understanding reporting biases on sensitive topics, such as weight gain and eating over the holiday season, of particular interest.


Introduction
Holiday eating is frequently associated with excess. An average holiday meal in the United States contains between 3000 and 4500 calories (Jampolis, 2018), while Americans, on average, eat 2481 calories per day (Rehkamp, 2016). Ma et al. (2006) found that daily caloric intake was 86 kcal higher in the fall than in the spring. Holiday season indulgence results in an average gain in body weight of 0.4 to 0.7 kg each year between mid-November and mid-January (Schoeller, 2014). Recommendations for limiting or decreasing holiday weight gain include weighing yourself every day (Kaviani et al., 2019), reducing consumption by reflecting on the exercise required to offset the calories consumed (Mason et al., 2018), and increasing exercise as part of New Year's resolutions (Hawkes, 2016). Conversely, Stevenson et al. (2013) found that exercise did not prevent holiday weight gain and was not a significant predictor of changes in body weight. Researchers are interested in people's behaviors during the holiday season. However, due to social desirability bias (SDB), it can be difficult to ensure that self-reports of behaviors, including eating, exercise, or even holiday spending, reflect reality.
SDB has long been recognized in psychology, having been defined by Maccoby and Maccoby (1954). SDB occurs when, in a subconscious effort to make themselves look better, respondents answer a question in a way that deviates from their true behavior toward a real or perceived socially "correct" answer (Maccoby and Maccoby, 1954; Fisher, 1993). Many ways to combat this pervasive phenomenon have been established. Post-data-collection calibration methods for SDB include the Marlowe-Crowne Social Desirability Scale, a set of 13 questions chosen to establish how likely a person is to express SDB (Crowne and Marlowe, 1960). Based on the scale, researchers can set a threshold and estimate models that adjust results; however, these methods are complicated (Crino et al., 1983). Furthermore, the additional questions needed to measure SDB lengthen the survey instrument and may contribute to survey fatigue, which can decrease response rates and data quality (Galesic and Bosnjak, 2009). Another method to combat SDB is indirect questioning. When asked what they believe the average person does, respondents are likely to project their own beliefs and evaluations onto their answer without the social pressure associated with revealing their own actions (Fisher, 1993). Fisher (1993) found that indirect questioning mitigates SDB without systematically affecting the means of questions that were not socially sensitive. This method of combating SDB has been used in a wide range of areas beyond psychology, such as the importance of the environmental performance of automobiles (Johansson-Stenman et al., 2006), public goods (Lusk and Norwood, 2009), meat products (Olynk et al., 2010), and pet acquisition (Bir et al., 2018).
Various methods of combating or limiting biases in data collection have been developed, such as using a cheap talk statement to mitigate hypothetical bias. The term cheap talk was originally coined in the game theory literature in reference to a costless transmission of signals and information that does not affect the payoffs of the game (Farrell and Gibbons, 1989; Matthews et al., 1991; Farrell and Rabin, 1996). Cummings and Taylor (1999) built on that concept to develop a method for preventing hypothetical bias when asking respondents about hypothetical purchasing decisions in contingent valuation, as opposed to correcting for the bias after data collection. Their version of cheap talk is an explicit discussion of hypothetical bias, with reference to budget constraints and budgetary substitutes, presented before respondents make hypothetical choices (Cummings and Taylor, 1999). Including a cheap talk script in their experiments produced responses to hypothetical valuation questions that were indistinguishable from responses to valuation questions involving actual payments (Cummings and Taylor, 1999).
The objectives of this research are (1) to evaluate the prevalence of SDB related to holiday eating habits, using data collected between the Thanksgiving and Christmas holidays in the U.S., and (2) to evaluate the impact of a cheap talk statement, presented before SDB-prone questions, that is designed to increase awareness of SDB and mitigate its impact. The prevalence of SDB and the potential impact of a cheap talk statement likely differ across subjects. Holiday eating habits were chosen for this analysis because two recent studies used identical statements to measure SDB, allowing direct comparison with these results. Additionally, holiday eating and New Year's resolution setting remain topics of consistent social interest in the U.S., providing a good first test for the cheap talk experiment. The benefits of a cheap talk statement for SDB mitigation would include reducing the need for post-data-collection adjustments for SDB or for SDB scale questions, which lengthen survey instruments and contribute to respondent fatigue.

Results
General demographics and shopping behavior. Compared with the U.S. population (U.S. Census Bureau, 2016), the full sample and both subsamples (cheap talk and no cheap talk, henceforth control) had higher proportions of respondents residing in the South, and lower proportions of respondents residing in the Midwest or West and of respondents who did not graduate from high school (Table 1). The full sample had a lower percentage of male respondents than the U.S. population targets. The full sample and the cheap talk subsample had a lower proportion of people aged 25–34 than the U.S. population. Higher percentages of respondents had attended college or earned an Associate's or Bachelor's degree than the U.S. population targets. Between the two subsamples, statistical differences were found in the percentage of respondents with an income of $0–$24,999 and in the percentage aged 55–64.
Holiday shopping times reported by respondents included Cyber Monday (37%), Black Friday day (27%), and Black Friday morning (20%) (Table 2). On average, respondents estimated spending the most on holiday gifts ($437.60 for the cheap talk sample and $562.25 for the control sample). The spending descriptions selected for holiday meals and holiday travel were similar. Most respondents (57–61%) anticipated spending the same amount on holiday meals, gifts, and travel in 2018 (the year the survey was administered) as in 2017. A little over half of respondents (52%) indicated they gave to charities during the holiday season, with 74% indicating that they were giving the same amount to charity as in past holiday seasons.
When mean responses for self were compared with those for the indirect question regarding the average American, the two subsamples differed in their results for two statements (Table 3). For the cheap talk subsample, the average levels of agreement for self and for the average American were statistically different for all statements except "I will maintain my workout schedule during the holiday season", "I will be vigilant about my weight during the holiday season", and "I watch what I eat during the holiday season". For the other statements, the average for self was statistically higher than the average for the average American, indicating the presence of SDB. Conversely, in the control sample only the statement "I will be vigilant about my weight during the holiday season" was not statistically different between self and the average American. The average self-reported scores for the statements "I will maintain my workout schedule during the holiday season" and "I watch what I eat during the holiday season" were both statistically lower than the average scores for the average American, indicating the presence of SDB. For the remaining statements, the average for self was statistically higher than that for the average American, indicating SDB for those statements in the control sample. The distributions of the differences between self and average American scores were statistically compared using the Kolmogorov-Smirnov test (Kolmogorov, 1933) (Table 4). For the statement "I will maintain my workout schedule during the holiday season", the distributions of the differences were statistically different between the control and cheap talk groups, and between the cheap talk group and each of Bir et al. (2020) and Widmar et al. (2016). Additionally, for the statement "I will be vigilant about my weight during the holiday season", the distribution for the cheap talk group was statistically different from those of both Bir et al. (2020) and Widmar et al. (2016).
Finally, the distributions of the cheap talk and Bir et al. (2020) results were statistically different for the statement "I watch what I eat during the holiday season". Although the analysis of the distributions can serve as a robustness check, it shows only that the distributions differ, not whether incidences of SDB increased or decreased.
Therefore, incidences of SDB, defined as a difference of −1 or less or of 1 or greater depending on the statement and the associated direction of SDB, are presented for the cheap talk sample in Fig. 1 and the control sample in Fig. 2, as well as in Table 5. When comparing the proportions of respondents who exhibited SDB, only "I will maintain my workout schedule during the holiday season" had a statistically different percentage of SDB occurrences between the cheap talk (33%) and control (41%) samples. For this statement, the cheap talk statement decreased the percentage of respondents who exhibited SDB, while for all other statements no statistical difference (either positive or negative) was found.
In addition to reporting the percentage of respondents who exhibited SDB, Widmar et al. (2016) and Bir et al. (2020) reported the percentages of respondents with spreads of −4 to −3, −2 to −1, 0, 1 to 2, and 3 to 4. The percentages of respondents in each of those samples with a given spread between self and the average American were statistically compared with the cheap talk and control samples from this study (Table 5). Focusing specifically on the direction of SDB, either negative or positive scores depending on the statement, there were 7 incidences of statistical differences between the proportion of respondents within a particular spread in the control sample and in either the Widmar et al. (2016) … respondents that were in that spread. For the statement "I will maintain my workout schedule during the holiday season", lower percentages of respondents scored −4 to −3 when compared with the Widmar et al. (2016) and Bir et al. (2020) samples. Additionally, lower percentages of respondents scored −2 to −1 when compared with the Bir et al. (2020) sample.
Evaluating SDB occurrences more broadly, few differences in the percentage of respondents exhibiting SDB are found across studies. A smaller proportion of respondents exhibited SDB, defined as an SDB score of −1 or less, for the statement "I will maintain my workout schedule during the holiday season" in the current cheap talk sample than in Widmar et al. (2016) and Bir et al. (2020). The proportion of respondents who exhibited SDB for the statement "I watch what I eat during the holiday season" was also lower in the current cheap talk sample than in the Bir et al. (2020) sample. For the control sample, the proportions of respondents exhibiting SDB were statistically different from those of Bir et al. (2020) and Widmar et al. (2016) for the statement "I anticipate gaining weight during the holiday season". A lower percentage of respondents in the control sample than in Bir et al. (2020) exhibited SDB for the statement "I will be vigilant about my weight during the holiday season". Overall, given that these studies span 4 years, 4 samples, and two holidays, the level of consistency in SDB occurrences is notable.

Discussion
Due to the holiday-associated nature of this study, the precise timing afforded by online surveys was instrumental. The comparison of demographics, shopping behavior, and holiday spending indicates that systematic differences between the two subsamples are unlikely. Rotko et al. (2000) similarly compared subsample characteristics statistically to check for systematic differences in a study of European air pollution. The amount of money spent during the holidays is notoriously difficult to determine. The holiday shopping season, defined in the U.S. as the time between Thanksgiving and Christmas, can vary from 26 to 32 days, with large impacts on holiday spending (Basker, 2005). Each additional day results in ~$6.50 in spending, mostly attributed to impulse purchases (Basker, 2005). Byrnes (2019) stated that people in the U.S. would spend nearly $1050 on average on gifts, goodies, and travel. This is much higher than the combined averages found in this study. However, Byrnes (2019) used data from the National Retail Federation; in self-stated data, such as the results of this study, people may underestimate the amount spent or have trouble remembering all purchases made. Although there is social pressure to spend during the holiday season, spurred by the idea of gifts as expressions of love (Spector, 2018), there is also social pressure to limit spending, as exhibited by the many advice articles and books about not overspending (Spector, 2018; Epperson and Dickler, 2019; Karp, 2010).
Evidence of SDB in self-reported holiday healthfulness behaviors was found, which is unsurprising in itself, although perhaps notable for the consistency with which it was documented over time in this analysis and in those of Widmar et al. (2016) and Bir et al. (2020). The literature provides ample evidence of SDB in self-reported health behaviors and outcomes, including both underreporting of negative behaviors and overreporting of positive behaviors. Hébert et al. (2001) found that women with a college education working in the health system tended to underreport caloric intake, and Simons et al. (2015) found underreporting of sedentary gaming hours among youths who played non-active video games. Klesges et al. (2004) found that overestimates of self-reported activity, underestimates of sweetened beverage preferences, and lower ratings of weight concerns and dieting behaviors were related to SDB in 8–10-year-old girls. They suggested that more research was needed on the role of SDB in complicating observed relationships between self-reported diet and/or physical activity and health outcomes. Adams et al. (2005) suggested that SDB led to overreporting of physical activity in women's self-reported data. The use of cheap talk to mitigate the percentage of respondents exhibiting SDB was effective for only one of the eight holiday statements studied. Additionally, including the cheap talk statement resulted in fewer statistical differences between the average self-score of all respondents and the score assigned to the average American.
This decrease in statistical differences between self and average American scores indicates some convergence between the self-reported scores and the scores for the average American for more than just one statement. Despite this only mild success, incidences of SDB did not increase due to the inclusion of the cheap talk statement. The use of cheap talk to prevent hypothetical bias, as introduced by Cummings and Taylor (1999), has had mixed results when applied to other products and scenarios. List (2001) found that the cheap talk script for hypothetical bias did not work on experienced bidders. Champ et al. (2009) found that the hypothetical bias script worked only for some offer amounts and did not work on experienced bidders. Previous work has evaluated SDB in other health- or food-related contexts in other regions of the world. Bergen and Labonte (2020) evaluated SDB in neonatal and child health care use in Ethiopia. Their identification strategy used common cues, the nature of responses, and choice patterns. They warned that SDB is influenced by accepted attitudes and behaviors, social position, and affluence. Studying food-purchasing behaviors in Australia, Wheeler et al. (2019) found that SDB influenced responses regarding the purchase of organic food, increasing self-reported purchasing frequency. After accounting for SDB, they found that respondents were motivated to purchase organic food for unselfish reasons, including environmental and public-good concerns. Testing the effectiveness of cheap talk on such tangentially related health- and food-related questions in other countries would be an interesting extension of this work.
Further evidence of the prevalence and consistency of SDB in holiday eating and exercise-related statements comes from the few differences found between the SDB results of this study and those of Widmar et al. (2016) and Bir et al. (2020). The consistency of responses across samples and time is noteworthy for those studying holiday health. The few statistical differences found between the cheap talk subsample and the previous samples mostly reflected a lower percentage of respondents exhibiting SDB. This decrease also supports the idea that incorporating a cheap talk statement before SDB-sensitive questions may mildly decrease incidences of SDB. Limited evidence of cheap talk reducing SDB in sensitive questions related to holiday eating and healthfulness was found. However, notable consistency across time, samples of respondents, and holidays (Christmas versus Thanksgiving) was documented, in terms of both responses and SDB exhibited.
Limitations of research. Although the demographics of the full sample and subsamples closely mirrored the U.S. population, there were some statistical discrepancies. Samples of online survey respondents are often more highly educated than the general population (Szolnoki and Hoffmann, 2013). However, the benefits of online data collection, including short completion times and affordable implementation, are often thought to outweigh this shortcoming (Louviere et al., 2000; Gao et al., 2009). Cheap talk reduced the percentage of respondents exhibiting SDB for only one of the eight holiday statements studied. Cheap talk may be more successful at mitigating SDB for other SDB-prone questions than for the holiday-focused statements investigated here. It may simply be that SDB in the holiday eating and exercise statements is so ingrained that the cheap talk statement had minimal effect. Or perhaps holiday-related reporting differs inherently, for the same cultural, economic, and social reasons that make holiday spending and celebrations so fraught with debate. Further research implementing cheap talk for other SDB-prone questions could shed light on the situations in which this type of intervention mitigates SDB.

Methods
Survey instruments and data collection. The research project #60460205 was approved by the Purdue University institutional review board. Informed consent was obtained from all participants. Data collection took place during the peak of the 2018 holiday season, December 18–26, 2018, to correspond with the winter holiday season surrounding Christmas Day (December 25) in the U.S., because of the holiday dining and health-related questions specific to this data collection effort. Kantar, a company that hosts a large opt-in panel database (Kantar, 2020), was used to obtain survey respondents, who were required to be 18 years of age or older to participate. Quotas set within Qualtrics, an online survey tool (Qualtrics, 2020), were used to target the proportions of respondents to match U.S. census proportions for gender, age, education, income, and region of residence (U.S. Census Bureau, 2016). The test of proportions was used to evaluate whether there were statistical differences between the subsamples employed in this study, as well as between each subsample and the U.S. census. The one- and two-tailed tests of a population proportion, assuming a normal distribution, are calculated as
$$ z = \frac{\hat{P} - p_0}{\sqrt{p_0 (1 - p_0) / n}} \tag{1} $$
where $p_0$ is the hypothesized proportion (for example, the census percentage), $\hat{P}$ is the sample proportion, and $n$ is the sample size (Acock, 2018). Equation (1) was used to compare each subsample to the U.S. population. A test of the difference of two proportions $\hat{P}_1$ and $\hat{P}_2$, for example when comparing the demographics of the two subsamples, can be calculated as
$$ z = \frac{\hat{P}_1 - \hat{P}_2}{\sqrt{\hat{P} (1 - \hat{P}) \left( \frac{1}{n_1} + \frac{1}{n_2} \right)}} \tag{2} $$
$$ \hat{P} = \frac{x_1 + x_2}{n_1 + n_2} \tag{3} $$
where $x_1$ and $x_2$ are the total numbers of successes in the two samples and $n_1$ and $n_2$ are the sample sizes (Acock, 2018).
Fig. 1 Holiday SDB, cheap talk sample (n = 368). Exhibiting SDB is defined as having a difference of −1 or less or of 1 or greater, depending on the statement. Black boxes indicate the direction of SDB; the percentage of respondents who exhibited SDB can therefore be determined by adding the percentages within the box for that statement. For example, for the statement "I anticipate gaining weight during the holiday season", the percentage who exhibit SDB is 7 + 12 + 19 + 21 = 59, given to the right of the box.
HUMANITIES AND SOCIAL SCIENCES COMMUNICATIONS | (2020) 7:178 | https://doi.org/10.1057/s41599-020-00665-x
… Qualtrics. Data were checked for nonsensical answers and clear outliers, with special attention paid to write-in answers. One respondent was excluded from the analysis of the question regarding holiday spending: for this write-in question, the respondent reported spending $3000 on holiday meals (3 times the next highest amount), $199,000 on holiday gifts (25 times the next highest amount), and $13,000 on holiday travel (2 times the next highest amount), and was therefore considered an outlier. The statements presented to respondents were previously employed in evaluations of SDB by Widmar et al. (2016) and Bir et al. (2020), both of whom conducted national-scale U.S. data collection using online surveys around the American November holiday of Thanksgiving. Specifically, the statements investigated were: "I anticipate gaining weight during the holiday season", "I will gain more weight during the holiday season than during other times of the year", "I make it a New Year's resolution to lose weight", "I will maintain my workout schedule during the holiday season", "I will be vigilant about my weight during the holiday season", "I watch what I eat during the holiday season", "I will consume more desserts during the holiday season than at other times of the year", and "I will consume more alcohol during the holiday season than at other times of the year".
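The one- and two-sample tests of proportions (Eqs. (1)–(3)) can be sketched in Python as below. This is an illustrative reimplementation under the normal approximation, not the authors' Stata code; the function names are hypothetical, and the counts in the usage example (121 of 368 and 150 of 367) are back-calculated from the rounded 33% and 41% SDB shares reported for the workout statement, so they are only approximate.

```python
import math


def two_sided_p(z: float) -> float:
    """Two-sided p-value P(|Z| >= |z|) under the standard normal distribution."""
    return math.erfc(abs(z) / math.sqrt(2.0))


def one_sample_prop_z(p_hat: float, p0: float, n: int) -> float:
    """Eq. (1): sample proportion p_hat vs. hypothesized proportion p0 (e.g. a census share)."""
    return (p_hat - p0) / math.sqrt(p0 * (1.0 - p0) / n)


def two_sample_prop_z(x1: int, n1: int, x2: int, n2: int) -> float:
    """Eqs. (2)-(3): difference of two sample proportions using the pooled proportion."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)  # Eq. (3)
    se = math.sqrt(pooled * (1.0 - pooled) * (1.0 / n1 + 1.0 / n2))
    return (p1 - p2) / se  # Eq. (2)


if __name__ == "__main__":
    # Approximate counts back-calculated from the rounded 33% (cheap talk, n = 368)
    # and 41% (control, n = 367) SDB shares for the workout statement.
    z = two_sample_prop_z(121, 368, 150, 367)
    print(f"z = {z:.3f}, two-sided p = {two_sided_p(z):.4f}")
```

With these approximate counts, z falls a little below −2.2, beyond the −1.96 cutoff for a two-tailed test at the 0.05 level, which is in line with the reported difference for that statement.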
Respondents were asked to indicate how well each statement described them on a scale from 1 (it describes you very well) to 5 (this statement does not describe you at all). Then, following the indirect questioning approach (Fisher, 1993), respondents were asked to indicate how well each statement described the average American on a scale from 1 (it describes the average American very well) to 5 (this statement does not describe the average American at all). The order of the statements was randomized both when respondents rated themselves and when they rated the average American.
Two additional datasets are used in this analysis to facilitate robustness checks across samples, time, and holidays (Thanksgiving versus Christmas holiday periods). Widmar et al. (2016) first used the set of holiday health statements employed in this work; their sample of 620 U.S. respondents, targeted to be nationally representative, was collected November 17–19, 2014 (Widmar et al., 2016). Bir et al. (2020) used the same set of holiday health statements as part of a larger exploration of holiday eating focused on turkey consumption; their nationally representative sample of 565 U.S. respondents was collected November 12–19, 2018.
Developing cheap talk for SDB. Given the propensity of respondents to underreport bad behaviors and overreport good behaviors, this study sought to measure and assess the impact of a cheap talk statement on incidences of SDB. Respondents were randomly assigned to two subsamples, each of which indicated their level of agreement with identical statements about holiday healthfulness. One subsample was shown a cheap talk statement about SDB before providing their level of agreement with the statements for themselves and the average American; the other was shown no information before reporting their level of agreement with the same eight statements for themselves and the average American.
Since its introduction to the contingent valuation literature by Cummings and Taylor (1999), the use of cheap talk statements has expanded. Instructions to respondents, and specifically cheap talk statements, have been employed by Lusk (2003) in WTP experiments to minimize hypothetical bias and by Blumenschein et al. (2007) …
Fig. 2 Holiday SDB, control sample (n = 367). Exhibiting SDB is defined as having a difference of −1 or less or of 1 or greater, depending on the statement. Black boxes indicate the direction of SDB; the percentage of respondents who exhibited SDB can therefore be determined by adding the percentages within the box for that statement. For example, for the statement "I anticipate gaining weight during the holiday season", the percentage who exhibit SDB is 10 + 10 + 15 + 20 = 55, given to the right of the box.
"Human inclination may be to answer questions in a way that deviates from your true behavior in an effort to improve the impression you make on others. This desire to give what is believed to be the socially 'correct' or acceptable answer is often referred to as social desirability bias. Please keep this inclination in mind, and try to reflect on your true behavior when answering questions." Statistical comparisons between those who saw the cheap talk statement and those who did not were conducted in two ways.
Evaluating incidences of SDB, and comparing the results of indirect questioning across studies. First, mean scores for self and for the average American on each holiday healthfulness statement were calculated for both the cheap talk and control subsamples. Within each subsample, the mean scores for self and the average American were statistically compared using a paired t-test. The test statistic for paired observations is given by
$$ t = \frac{\bar{d}}{S_d / \sqrt{n}}, \qquad \bar{d} = \frac{1}{n} \sum_{i=1}^{n} (x_i - y_i) $$
where $x_i$ is observation $i$ from the self score, $y_i$ is observation $i$ from the average American score, $n$ is the number of observations, and $S_d$ is the standard deviation of the differences $x_i - y_i$ (Dixon and Massey, 1983). All analysis was conducted in STATA (StataCorp, 2019). It was hypothesized that there would be fewer statistical differences for the subsample who viewed the cheap talk statement, as their self-assessments would be closer to their idea of the average American. This hypothesis follows from the construct and purpose of indirect questioning (Fisher, 1993). Next, following Bir et al. (2020) and Widmar et al. (2016), an index was created of the difference between the score respondents gave themselves and the score they gave the average American on the scale from 1 (describes well) to 5 (does not describe well). For example, if a person indicated a score of 2 for themselves and a score of 4 for the average American, the difference would be −2. Depending on the specific statement, either a positive or a negative difference between the scores for self and the average American indicated the presence of SDB.
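A minimal Python sketch of the paired t-test on self versus average American scores follows, assuming both lists hold the 1–5 ratings of the same respondents in the same order. The function name `paired_t` and the sample ratings are hypothetical; the study's own analysis was run in STATA.

```python
import math
from statistics import mean, stdev


def paired_t(self_scores, avg_american_scores):
    """Paired t statistic t = d_bar / (S_d / sqrt(n)), where d_i = x_i - y_i.

    Returns (t, degrees_of_freedom).
    """
    if len(self_scores) != len(avg_american_scores):
        raise ValueError("self and average-American ratings must be paired")
    diffs = [x - y for x, y in zip(self_scores, avg_american_scores)]
    n = len(diffs)
    if n < 2:
        raise ValueError("need at least two paired observations")
    s_d = stdev(diffs)  # sample standard deviation of the differences (n - 1 denominator)
    if s_d == 0:
        raise ValueError("all differences are identical; t is undefined")
    return mean(diffs) / (s_d / math.sqrt(n)), n - 1


if __name__ == "__main__":
    # Hypothetical 1-5 ratings for eight respondents (1 = describes very well).
    self_scores = [2, 3, 1, 4, 2, 5, 3, 2]
    avg_scores = [3, 3, 2, 4, 3, 4, 4, 3]
    t, df = paired_t(self_scores, avg_scores)
    print(f"t = {t:.3f} with {df} degrees of freedom")
```

Because the test works on per-respondent differences, it uses the pairing of each respondent's two ratings rather than treating self and average American scores as independent samples.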
A positive difference would indicate the presence of SDB for the statements "I anticipate gaining weight during the holiday season", "I will gain more weight during the holiday season than during other times of the year", "I make it a New Year's resolution to lose weight", "I will consume more desserts during the holiday season than at other times of the year", and "I will consume more alcohol during the holiday season than at other times of the year". A negative difference would indicate the presence of SDB for the statements "I will maintain my workout schedule during the holiday season", "I will be vigilant about my weight during the holiday season", and "I watch what I eat during the holiday season". Each respondent was evaluated as exhibiting or not exhibiting SDB for each of the 8 statements studied. Exhibiting SDB was defined as having a difference between the scores for self and the average American of 1 or greater, or of −1 or less, depending on the statement. Using the test of proportions (Eqs. (2) and (3)), the proportion of respondents who displayed SDB was statistically compared between the cheap talk and control samples. Additionally, using Eq. (1), the incidences of respondents who displayed SDB in the two subsamples from this study were statistically compared with those of Widmar et al. (2016) and Bir et al. (2020).
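The difference index and the direction-dependent SDB rule can be expressed as a small Python sketch. The statement texts are the survey items listed in the Methods; the function names and the example ratings are hypothetical and meant only to illustrate the classification.

```python
# Statements for which a POSITIVE self-minus-average-American difference signals SDB
POSITIVE_DIRECTION = {
    "I anticipate gaining weight during the holiday season",
    "I will gain more weight during the holiday season than during other times of the year",
    "I make it a New Year's resolution to lose weight",
    "I will consume more desserts during the holiday season than at other times of the year",
    "I will consume more alcohol during the holiday season than at other times of the year",
}
# Statements for which a NEGATIVE difference signals SDB
NEGATIVE_DIRECTION = {
    "I will maintain my workout schedule during the holiday season",
    "I will be vigilant about my weight during the holiday season",
    "I watch what I eat during the holiday season",
}


def difference_index(self_score: int, avg_score: int) -> int:
    """Score for self minus score for the average American (both on the 1-5 scale)."""
    for s in (self_score, avg_score):
        if not 1 <= s <= 5:
            raise ValueError("ratings must be on the 1-5 scale")
    return self_score - avg_score


def exhibits_sdb(statement: str, self_score: int, avg_score: int) -> bool:
    """Direction-dependent rule: difference >= 1 or <= -1, depending on the statement."""
    d = difference_index(self_score, avg_score)
    if statement in POSITIVE_DIRECTION:
        return d >= 1
    if statement in NEGATIVE_DIRECTION:
        return d <= -1
    raise KeyError(f"unknown statement: {statement!r}")


def sdb_share(statement: str, ratings) -> float:
    """Share of (self, average American) rating pairs that exhibit SDB."""
    flags = [exhibits_sdb(statement, s, a) for s, a in ratings]
    return sum(flags) / len(flags)


if __name__ == "__main__":
    # The worked example from the text: self = 2, average American = 4 gives -2.
    print(difference_index(2, 4))
    print(exhibits_sdb("I watch what I eat during the holiday season", 2, 4))
```

The same pair of ratings can count as SDB for one statement and not for another, which is why the rule is keyed on the statement rather than on the sign of the difference alone. The shares produced by `sdb_share` are the proportions that the tests of proportions then compare.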
Finally, for a broader evaluation of potential incidences of SDB, Widmar et al. (2016) grouped the differences between self and average American scores as −4 to −3, −2 to −1, 0, 1 to 2, and 3 to 4. To compare results between Widmar et al. (2016), Bir et al. (2020), and this study, the same categories of differences between self and average American scores were determined for this study and for Bir et al. (2020). The proportions in each category were then statistically compared between studies using Eqs. (1)–(3). To better understand the distribution of the differences between self and average American scores for the holiday healthfulness questions, the Kolmogorov-Smirnov test for cumulative distributions was used (Kolmogorov, 1933). The cumulative distribution functions of the samples being compared were determined, and the largest difference between them was identified. This largest difference was then compared to the critical value. The critical value ($D$) for 0.05 statistical significance was calculated as
$$ D = \frac{\sqrt{-\tfrac{1}{2} \ln(\alpha / 2)}}{\sqrt{n}}, \qquad \alpha = 0.05 $$
Given the sample size of 368, the critical value was determined as 0.070796.
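The binning into spread categories and the Kolmogorov-Smirnov comparison can be sketched in Python as follows. The helper names are hypothetical, and the critical-value formula is the one that reproduces the reported 0.070796 for n = 368; note that a conventional two-sample KS critical value would instead scale by $\sqrt{(n_1 + n_2)/(n_1 n_2)}$ rather than by $1/\sqrt{n}$.

```python
import bisect
import math

SPREAD_BINS = ("-4 to -3", "-2 to -1", "0", "1 to 2", "3 to 4")


def spread_category(d: int) -> str:
    """Group a self-minus-average-American difference into the five spread bins."""
    if not -4 <= d <= 4:
        raise ValueError("differences on a 1-5 scale lie between -4 and 4")
    if d <= -3:
        return SPREAD_BINS[0]
    if d <= -1:
        return SPREAD_BINS[1]
    if d == 0:
        return SPREAD_BINS[2]
    if d <= 2:
        return SPREAD_BINS[3]
    return SPREAD_BINS[4]


def ks_statistic(a, b) -> float:
    """Largest absolute gap between the empirical CDFs of samples a and b."""
    a_sorted, b_sorted = sorted(a), sorted(b)
    gap = 0.0
    # Both empirical CDFs are step functions, so the maximum gap occurs at an observed value.
    for v in set(a_sorted) | set(b_sorted):
        f_a = bisect.bisect_right(a_sorted, v) / len(a_sorted)
        f_b = bisect.bisect_right(b_sorted, v) / len(b_sorted)
        gap = max(gap, abs(f_a - f_b))
    return gap


def ks_critical_value(n: int, alpha: float = 0.05) -> float:
    """Critical value sqrt(-ln(alpha / 2) / 2) / sqrt(n), matching the value reported in the text."""
    return math.sqrt(-0.5 * math.log(alpha / 2.0)) / math.sqrt(n)


if __name__ == "__main__":
    # Hypothetical difference scores for two small groups of respondents.
    group_a = [-2, -1, 0, 0, 1, -3, 0, -1]
    group_b = [0, 0, 1, 0, -1, 0, 2, 0]
    print([spread_category(d) for d in group_a])
    print("KS statistic:", ks_statistic(group_a, group_b))
    # Reproduces the reported critical value for n = 368.
    print("critical value (n = 368):", round(ks_critical_value(368), 6))
```

A distribution comparison is declared significant when the KS statistic exceeds the critical value; as the Results note, this indicates only that the distributions differ, not the direction of any change in SDB.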

Data availability
The data that support the findings of this study are available from the corresponding author upon reasonable request.