Boston College daily sleep and well-being survey data during early phase of the COVID-19 pandemic

While there was a necessary initial focus on physical health consequences of the COVID-19 pandemic, it is becoming increasingly clear that many have experienced significant social and mental health repercussions as well. It is important to understand the effects of the pandemic on well-being, both as the world continues to recover from the lasting impact of COVID-19 and in the eventual case of future pandemics. On March 20, 2020, we launched an online daily survey study tracking participants’ sleep and mental well-being. Repeated reports of sleep and mental health metrics were collected from participants ages 18–90 during the initial wave of the pandemic (March 20 – June 23, 2020). Given both the comprehensive nature and early start of this assessment, open access to this dataset will allow researchers to answer a range of questions regarding the psychiatric impact of the COVID-19 pandemic and the fallout left in its wake.


Background & Summary
The outbreak of COVID-19 and the societal responses taken to combat its spread have had far reaching consequences, providing a unique opportunity to examine how large groups of individuals fare when exposed to chronic stress and uncertainty. The present study was designed to better understand the repercussions of the COVID-19 pandemic and the subsequent response measures associated with it (e.g. social distancing and "lockdowns") on mental health and sleep patterns. In addition to the daily surveys, we collected comprehensive demographics and extensive assessments of factors related to well-being (e.g. prosocial behavior, emotion regulation strategies, tolerance of uncertainty, personality traits). The extent of the information collected on this large sample of global participants allows for investigation of a number of potential avenues of future investigation, such as individual differences in feelings of social isolation and exercise during the pandemic and how demographic variables such as age 1 or minority status 2 affect mental-health outcomes.
It is well documented that mental wellbeing suffers when under chronic stress, and particularly when individuals do not feel able to control the source of the stress 3 . It is also apparent that sleep can be impaired when under stress, and that changes in sleep can have negative consequences on emotions and mental wellbeing 4,5 . However, other evidence makes it clear that there are situational factors that can alter the resilience shown under stress. For instance, an extensive literature has revealed the importance of social connectedness for mental wellbeing (reviewed by [6][7][8] ) and has shown that there are not only physical benefits to staying physically active but also mental health benefits 9,10 . Therefore we included a number of questions about these situational factors that warrant further exploration.
At the time that this study was designed, in March 2020, it was becoming apparent that the spread of COVID-19 was not going to be easily contained. There had been widespread focus on the physical-health repercussions of the pandemic but relatively less discussion of the mental-health repercussions of the chronic stress associated with the pandemic and the societal responses being taken. On March 20, 2020-a day after the first US state-wide "stay-at-home" order was issued in California-we launched an online survey study meant to capture the impact of the first wave of the pandemic on individuals' mental wellbeing and sleep patterns. As made clear in a recent meta-analysis, the COVID-19 pandemic 11 has led to a high prevalence of sleep disruptions, affecting approximately 40% of people from general and healthcare populations, suggesting the importance of the topic examined in our survey study.
The study was designed to be comprehensive and long-term in nature, asking participants to fill out daily sleep reports and assessments of their mental wellbeing. As the weeks passed, the benefits of this longitudinal design came into focus. The early days and weeks provide insight into how individuals fared as they initially adapted to stay-home recommendations, school closures, and changes to job structures. As time went on, the data provide insight into how individuals cope in the midst of a continuing public health crisis that also has had dire economic consequences 12 . Despite the ongoing challenges that individuals needed to confront, completed analyses have revealed an improvement in participants' reported wellbeing over time: Individuals reported more stress, worry, and negative affect in the March and early April than they did in later April and May 1,2 . As made apparent through these analyses, the primary benefit our frequent and long-term assessment is that it permits a much more fine-grained analysis of changes in sleep and mental health as the days passed during the initial spread of the pandemic, as opposed to collecting snapshots of functioning at one or two time points.
At the time of this submission, the dataset includes a sample of 1,518 participants and 37,882 survey responses. Data has been collected from adults ages 18 to 90 and with an extensive demographic section covering location information (country and, in the case of US, state), employment status, COVID-risk level, race, ethnicity, and sexual orientation and gender identity. As such, these data will enable researchers to explore how the current pandemic experience is differently impacting people by age, economic impact, minority status, and risk-status. Additionally, we are planning a series of follow up assessments to add additional longitudinal information to this already rich dataset, which will also be made available on Open Science Framework (OSF).

participants.
Online recruitment for this dataset began on March 20, 2020 and ended August 5, 2020. In response to recruitment during this time frame, N = 1,899 individual participants completed the online informed consent and were enrolled in the study. Of this initial recruitment, N = 1,518 (age range: 18-90 years old; M = 35.2, SD = 15.1) completed the initial demographic survey, which was required before daily data collection began. All English-speaking adults 18+ were eligible for the study, regardless of pre-existing medical or mental health conditions. Demographics of the entire sample can be found in Online-only Table 1. For completion of daily surveys and one-time assessments described below, participants received entries into raffle drawings for gift cards as compensation. The Institutional Review Board at Boston College approved all consent and assessment procedures under IRB Protocol Number 20.212.01.

Recruitment methods.
Participants were recruited primarily via social-media postings (Facebook, Twitter, Reddit), direct emails to individuals who had expressed an interest in being contacted about research studies, and emails to listservs with members interested in relevant topics (e.g., scientific societies). The timing of the surveys was aligned with the first wave of the COVID-19 pandemic in North America, but attempts were made to encourage a more international sample by using social media outlets and listservs with international reach. However, surveys were only available in English and thus only English-speaking individuals could participate, limiting the ability to gain a representative international sample.
Assessment materials and design. After consenting to the study, participants were assigned a unique and de-identified Participant ID. Participants were asked to enter the ID at the beginning of each assessment they completed for the duration of the study, and to protect the confidentiality of the participants these IDs were the only means of linking their data over time and across assessments. Further, to ensure anonymization, the Participant IDs have been replaced with numeric placeholders in the publicly available datasets. Study data were collected and managed using REDCap 13,14 electronic data capture tools hosted at Boston College. REDCap is a secure, web-based software platform designed to support data capture for research studies. Along with their Participant ID, participants received a link to our initial demographic survey. After completion of the demographic survey, participants were placed in the pipeline to receive information on the rest of the assessments as described below for the duration of the study or until they requested to withdraw (<5%). PDF copies of all survey questions are available with the data on the OSF study page: https://doi.org/10.17605/OSF.IO/GPXWA. Demographic survey. Following the consent protocol, participants received an email with their Participant ID and a link to the initial demographic survey. Participants were required to complete this demographic survey prior to receiving information on any further assessments. Participants were asked to report age, race and ethnicity, natal sex, gender identity, sexual orientation, socioeconomic and military status, education, marital status, number of dependents, and previous diagnoses of serious medical and mental health conditions. Further demographic data was collected in the Round 3 one-time assessment (see below).
Daily survey. After completion of the initial Demographic Survey, participants were enrolled to begin receiving our daily survey assessment. To reduce participant burden, two versions of our daily survey were utilized during the assessment period: the Short Version and the Full Version. To establish a baseline of all metrics included in the daily surveys, participants received the Full Survey for at least three days following completion of the demographic survey. The Full Survey was then sent to all participants on randomly selected days 2 days/week, with the Short Survey sent the remaining 5 days/week. To enhance the quality of the data reported, participants were instructed not to try to make up surveys on days that they missed. The Short and Full Version of the daily surveys are described in detail below.
The Short Version of our daily survey included several questions relevant to the duration and quality of sleep, including bedtime, sleep attempt time, sleep latency, time spent awake after sleep onset, morning wake time, www.nature.com/scientificdata www.nature.com/scientificdata/ and the time participants got out of bed. We also collected daily dream reports, descriptions of activity and exercise, time spent virtually socializing, alcohol consumption, quarantine status, COVID-19 symptoms and diagnosis, and their subjective experience of overall stress. All questions within the Short Version of the survey were optional and participants were asked to respond to any that they were able to given their time and energy that day.
The Full Version of the survey included all questions from the Short Version, as well as questions related to their experience of worry on factors related to COVID-19 (i.e., individual health, health of family, friends, and community, public health, and financial impact), perception of social isolation, current mood using the Positive and Negative Affect Schedule (PANAS) 15 , and symptoms of depression using a modified version of the Patient Health Questionnaire-9 (PHQ-9) 16 that omitted the question assessing suicidality. Most questions within the Full Version were required in order to be submitted, but participation was optional each day it was received.
Participants received either the Short or Full Version of the daily survey at 08:00 in their local time zone every day of the assessment period from March 21, 2020 -May 20, 2020. After May 20th, we discontinued the Short Version of the survey, but continued to send the Full Version of the survey 2 days/week from May 21, 2020 -June 23, 2020. Participants that enrolled in the study after June 23, 2020 only received the Full Version of the survey for the initial three days following completion of the demographic survey. Round 2 Assessment. The Round 2 one-time assessment was launched on June 16, 2020. Initial invitations and reminders to complete the survey were sent via REDCap and email. The Round 2 assessment included the following measures: Insomnia Severity Index (ISI) 23 , Reduced Morningness Eveningness Questionnaire (RMEQ) 24 , Perceived Stress Scale 25 , Toronto Empathy Questionnaire 26 , and a series of questions specifically tailored for events surrounding the COVID19 pandemic. As part of these questions, we asked participants to reflect on their experience and memory of the onset of the pandemic from March -June, 2020 and to recount specific memories from this time-period and to report their general experience of how positive or negative they remember this period of time being. We also ask them to discuss how well their experience matched their initial predictions about the spread of the pandemic and assessed their future predictions for the success of an eventual reintegration process.
Round 3 Assessment. The Round 3 one-time assessment was launched on June 29, 2020. Initial invitations and reminders to complete the survey were sent via email only. The Round 3 assessment included the following measures: Short Urgency-Premeditation-Perseverance-Sensation Seeking-Positive Urgency (UPPS-P) Impulsive Behavior Scale 27 , Intolerance of Uncertainty Scale 28 , Emotion Regulation Questionnaire 29 , Brief Self-Control Scale 30 , and an Exit Survey collecting additional important demographic information that became apparent over the course of the assessment period and their experience as participants in the study. This included a more detailed assessment of pre-existing and current medical and mental health information, COVID-19 high risk factors, purchasing and use of personal protection equipment (PPE), information on essential workers and healthcare professionals, additional economic impacts of COVID19, implementation of stay-at-home orders or other government-ordered measures initiated in their area, and additional assessment of experience with COVID-related dreaming.
All participants had the opportunity to complete the one-time assessments from their initial launch date until August 26th, 2020. Approximately 55% of the participants completed the first one-time assessment, and approximately 42% of participants completed the Round 2 and Round 3 assessments. Of the total sample, 37.2% (n = 564) completed at least one daily survey and all three rounds of the one-time assessments. The full timeline of the study can be found in Fig. 1.

Data Records
All five datasets (demographics, daily survey, and Round 1-3 data) have been anonymised and both the raw and cleaned data are available in CSV (non-proprietary) formats on the Open Science Framework (OSF) platform 31 . We have also included PDF files of the questionnaires and README files in DOCX and PDF format that include variable descriptions and explanation of all data processing done in cleaned versions of the data sets. To assist with hypothesis generation that may be relevant to the spread of COVID-19 in different areas, we have also included an XLSX file with all participant locations that includes dates of first infection, peak infection, peak death rate, dates of lockdowns, and descriptions of lockdowns. All of these materials are available at the following link: https://doi.org/10.17605/OSF.IO/GPXWA.

technical Validation
A Python script was written to confirm that all variables were within expected ranges. Most variables were Likert-type scales that could only take on limited values (e.g., the integers 1 through 7). For these, we confirmed that all responses were in this set of possible values. For open-ended numeric responses (e.g., "How many people do you live with?") we confirmed that all responses were numeric. Impossible values were replaced with missing values (e.g., a person who responded that they worked 8 days per week on average or someone who reported www.nature.com/scientificdata www.nature.com/scientificdata/ drinking over 12,000 alcoholic beverages on a given day), but merely implausible values (e.g., reported sleep time as 22.25 hours) were retained. It is therefore left to researchers using the dataset to determine a plan for dealing with outliers in these variables. For other open-ended responses, obvious mistakes were corrected (e.g., misspelling of a country name) but were otherwise left as is. Details of the data cleaning procedure are available in the README files provided with the data. The original, uncleaned data is also provided.
Descriptive statistics of all calculated variables were checked to make sure that results were within the expected range. The quality control script, which checks for proper formatting of all variables, reports percentages of unlikely values, and reports the number of missing values for each variable, is shared with the data. Online-only Table 2 contains output from a sample of variables/scales from each phase of the study design as a demonstration of this quality control check.

Usage Notes
As noted, the data files on the OSF platform are accompanied by README files in DOCX and PDF format that include descriptions of variables within each dataset and explanations of all data processing that has been done in the processed versions of the data sets. The datasets include both responses to every individual question that was de-identifiable and calculated scale scores in the case of validated measures or composite metrics that we believe may be of interest to researchers. Descriptions of calculated scores can also be found in the README files.
Limitations of the data. Users should keep in mind limitations about the sample, e.g., primarily White participants, skewed toward participants who are female, well-educated, and from the United States (with further skew toward Massachusetts residents) and limitations related to potential biases in attrition over time as well as biases in the days on which daily surveys may have been completed (e.g., participants were encouraged to attend first to their health and thus would likely have skipped surveys on days when they were not feeling well). Users of this dataset should also be aware that the measures rely on self-report, and often on subjective assessments, without any attempts to externally validate the accuracy of those reports.
Another important consideration of this dataset is that the completion of each daily survey and each one-time assessment was optional. As such, the number of entries per participant varies, and because some questions were optional, the sample size additionally varies across metrics. The number of people that have completed each survey and each survey element can be easily determined using the participant ID code provided. Prior to use, data users should be sure to set restrictions on the quantity and timing of the assessments as needed (e.g. minimum number of daily surveys completed, order and timing of completion of one-time assessments, etc.) or use statistical analyses that allow for this type of variability.
Finally, as discussed above in Technical Validation, a small proportion of free response questions contain some potentially implausible responses. This was most apparent in questions regarding the timing of sleep and wake as it both requires free response and REDCap can only verify a standard time entry in 24-hour format, which a minority of participants struggled to remember to do. Further, there are a variety of ways the participants could make errors (e.g. incorrect time in bed, incorrect rise time, using 12-hour instead of 24-hour time format, etc.) and we did not determine a way to safely and confidently catch all errors without making potentially incorrect assumptions or potentially losing good data along with bad. Importantly, concerns of implausible responses even within the sleep data affect only a small percentage of responses (e.g. 93% of 'total sleep time' responses between 4-10 hours, 2.6% of total sleep time responses <2 hours or >12 hours). As such, the goal of our initial data cleaning was to take a conservative approach: instead of trying to correct problems by making additional, potentially incorrect assumptions, we took responses at face value as much as possible. When results were impossible or ambiguous (but not simply unlikely), we then replaced the value with a missing value in the clean version of the data set, rather than trying to guess at the correct response. This still leaves a number of known issues in a small