Real-world longitudinal data collected from the SleepHealth mobile app study

Conducting biomedical research using smartphones is a novel approach to studying health and disease that is only beginning to be meaningfully explored. Gathering large-scale, real-world data to track disease manifestation and long-term trajectory in this manner is quite practical and largely untapped. Researchers can assess large study cohorts using surveys and sensor-based activities that can be interspersed with participants’ daily routines. In addition, this approach offers a medium for researchers to collect contextual and environmental data via device-based sensors, data aggregator frameworks, and connected wearable devices. The main aim of the SleepHealth Mobile App Study (SHMAS) was to gain a better understanding of the relationship between sleep habits and daytime functioning utilizing a novel digital health approach. Secondary goals included assessing the feasibility of a fully-remote approach to obtaining clinical characteristics of participants, evaluating data validity, and examining user retention patterns and data-sharing preferences. Here, we provide a description of data collected from 7,250 participants living in the United States who chose to share their data broadly with the study team and qualified researchers worldwide.


Background & Summary
The Centers for Disease Control and Prevention (CDC) has deemed insufficient sleep to be a "public health epidemic" based on evidence that over one-third of American adults are not getting enough regular sleep 1,2 . More than 40 million Americans suffer from over 80 different sleep disorders, and another 20-30 million suffer from intermittent sleep problems 3 . Sleep accounts for one-quarter to one-third of our day and has a significant effect on health, well-being, and daytime performance 4 . Poor sleep, comprising both short and long sleep periods, is associated with impaired cognitive performance 5 . Short sleep is linked with seven of the top fifteen causes of death in the United States 6 , decreased workplace productivity 7 , and an estimated $680 billion in economic losses across five major countries 8 .
The diagnosis and treatment of sleep disorders have been an integral part of medical practice for the last several decades, but a more holistic approach focused on health and wellness has recently started to emerge. This integrated approach is known as "sleep health", and it focuses on studying sleep in the general population 9 . Sleep health research is beginning to demonstrate that many facets of sleep are related to important health outcomes 10 .
Traditional sleep research methods utilize questionnaires and sleep diaries to collect subjective reports of sleep 11 . In recent years, electronic surveys have been employed with limited scope and reach. Digital health methods can be used to measure variability in sleep and subsequent effects on daytime functioning over prolonged periods of time 12 . They also allow researchers to reach and enroll significantly larger study cohorts at a lower cost than traditional research methods 13,14 . Few studies carried out to date have explored the utility of assessing sleep health in this manner 15,16 .
The SleepHealth Mobile App Study (SHMAS) was an observational study developed using Apple's ResearchKit (RK) (www.researchkit.org) platform, which has also been used by other recent digital health studies 17-21 . The primary goal of the SHMAS was to explore the relationship between sleep habits and daytime functioning using a novel digital health approach. Secondary goals included investigation of participant characteristics, evaluation of data validity, and examination of user retention patterns and data-sharing preferences.
The SHMAS was launched on the Apple App Store on March 2, 2016. The app store landing page was viewed 86,507 times and the application was downloaded 27,502 times. A total of 12,356 participants provided consent to participate in the study through June 26, 2019 (Fig. 1). Study participants who agreed to share data broadly were enrolled in the study for an average of 935.4 days (SD = 355.1, range = 1-1212).
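The recruitment counts reported here can be read as a funnel. The following is a minimal sketch of that arithmetic, using the page view, download, and consent counts above together with the 7,250 broadly sharing participants reported in this descriptor. The stage labels and the function name are illustrative and are not part of the study's released code.

```python
# Counts are taken from the text; the stage labels are illustrative.
funnel = [
    ("App Store page views", 86_507),
    ("Downloads", 27_502),
    ("Consented", 12_356),
    ("Shared broadly", 7_250),
]


def stage_conversion(stages):
    """Return (label, count, fraction of the previous stage) for each stage after the first."""
    rows = []
    for (_, previous_count), (label, count) in zip(stages, stages[1:]):
        rows.append((label, count, count / previous_count))
    return rows


rates = stage_conversion(funnel)
for label, count, rate in rates:
    print(f"{label}: {count:,} ({rate:.1%} of previous stage)")
```

Under these counts, roughly 32% of page views led to a download, roughly 45% of downloads led to consent, and roughly 59% of consented participants chose broad sharing.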
Upon enrollment in the study, baseline demographic and survey data were collected. Participants were instructed to actively monitor their sleep and activity patterns for a minimum of five days in a row every quarter. Participants were encouraged to share Apple Health compatible application and wearable data. Participants were also asked to select whether to share their data with the SleepHealth team only (narrowly) or with the team and other qualified researchers (broadly). In order to complete the consent process, participants had to make an active choice selecting the scope of data sharing, as no default choice was specified 22 . The data described here and shared with the research community come from participants who chose to share their data broadly (7,250/12,356; 59%). The cohort was primarily white (78%) and male (79%), with a mean age of 37 years (SD = 13, range = 18-87). Forty percent of the sample had completed a 4-year college degree or higher. Baseline demographic characteristics did not appear to differ meaningfully between participants grouped by sharing status (Table 1), although the comparison yielded statistically significant differences. These statistically significant differences are likely attributable to the large sample size.
Large-scale research studies like the SHMAS are testing novel ways to gather real-world data that may contribute to our understanding of long-term disease trajectories and disease management. Early studies have demonstrated the utility of digital health methods for collecting real-world data as well as their potential to fundamentally transform biomedical research 23,24 . However, concerns remain about engaging participants to provide long-term real-world data and about the validity of inferences drawn from such data 25 .
To help address some of these challenges, there is a critical need to share the methods employed and lessons learned from studies carried out in this manner with the greater scientific community while following responsible data sharing guidelines.
We hope that by making SHMAS data available in accordance with FAIR principles 26 , we will provide a rich dataset to researchers interested in studying sleep health. These data capture various sleep health constructs that can help to improve our understanding of sleep habits, sleep patterns, and their relationship with daytime alertness and health. We also hope to raise awareness and interest in sleep research in general and pave the way for more refined sleep studies in the future.

Methods
Participant onboarding. Overview. The SHMAS was conducted remotely through an iPhone application.
After installing the application, prospective participants were given a brief overview of the study and information about the study team (Fig. 2). Eligibility. Prospective participants completed an in-application, three-item screener to determine their eligibility. The study was open to anyone with an iPhone who could install the application, was 18 years of age or older, lived in the United States, and could read and write fluently in English. There were no other criteria for inclusion in the study.

Participant expectations.
After successful completion of the eligibility screener, detailed information about the study was provided (Fig. 3). This included an explanation of how contributed data would be protected and used, the specific types of sensor-based and passive data that would be collected with permission, and the types of surveys and daily activities that prospective participants would be asked to complete.
www.nature.com/scientificdata
Data sharing. Prospective participants were asked whether they wanted to share their study data for secondary research narrowly (with the SleepHealth team only) or broadly (with the SleepHealth team and other qualified researchers worldwide). No default sharing choice was selected. The study protocol and consent procedure were approved by the Western Institutional Review Board, Puyallup, WA (WIRB #20151042).
Understanding of study conditions. Potential participants were given contact information for the study team if they had any questions or concerns. They were then asked three questions about the information contained in the electronic informed consent process to ensure that they understood the terms of the study (Fig. 4). The three questions reinforced the following concepts: (1) if sleep quality got worse over the course of the study, they should contact their medical provider and not rely on the application; (2) that they could withdraw from the study at any time but the data that they contributed would not be deleted; and (3) that the application was a research study and not a commercial application. All three questions had to be answered correctly in order to proceed.
Prospective participants were shown an electronic copy of the consent form and asked to carefully review it. They were then asked to either click "Agree" or "Cancel." If they clicked "Cancel," they were not allowed to proceed any further and were not included in the study. If they clicked "Agree," they were asked to confirm their decision by providing their first name, last name, and electronic signature.
Passive data. Participants were asked if they wanted to share passive data aggregated by Apple's HealthKit framework 27 with the study team. The default option was no sharing. Participants could share as much or as little passive data as they wanted, either by selecting "Turn All Categories On" to share all categories of HealthKit data or by selecting individual data categories one by one.
Registration. To complete study registration, participants were asked to provide an email address, password, birthdate, and gender. They also selected a 4-digit passcode and chose permissions for receiving notifications. Participants had the ability to modify their notification permissions within the application at any time.
Account verification, consent form, withdrawal. Participants were required to verify their account via email and received a copy of their electronically signed consent form in PDF format. They could withdraw their consent at any time by selecting the "Leave Study" button from the study dashboard. Any data that participants provided up until the point of withdrawal were retained by the study team.
Active data collection. The study consisted of two kinds of active data collection: surveys, which were administered once per quarter, and daily activities, which were intended to be completed on a minimum of 5 out of 7 days each quarter. Participants had the option to complete the daily activities as many times as they wanted.
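The daily activity target can be checked against a participant's completion dates. The sketch below takes one possible reading of the target, namely that some window of 7 consecutive calendar days within a quarter contains completions on at least 5 distinct days. This reading, the function name, and the parameter names are assumptions made for illustration. The sketch is not the study team's adherence definition and is not taken from the released processing code.

```python
from datetime import date, timedelta


def meets_daily_activity_target(completion_dates, min_days=5, window_days=7):
    """Return True if some window of `window_days` consecutive calendar days
    contains completions on at least `min_days` distinct days.

    `completion_dates` is an iterable of datetime.date values for one quarter.
    Repeated completions on the same day count once.
    """
    days = sorted(set(completion_dates))
    for i, start in enumerate(days):
        # It is enough to test windows that begin on a completion day.
        window_end = start + timedelta(days=window_days - 1)
        days_in_window = sum(1 for d in days[i:] if d <= window_end)
        if days_in_window >= min_days:
            return True
    return False


# Hypothetical completion dates for two participants in the same quarter.
adherent = [date(2016, 3, d) for d in (2, 3, 4, 6, 8)]
non_adherent = [date(2016, 3, d) for d in (2, 3, 4, 6, 10)]
print(meets_daily_activity_target(adherent))
print(meets_daily_activity_target(non_adherent))
```

A stricter reading, such as five days in a row, could be tested by setting both `min_days` and `window_days` to 5.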
Surveys. The names of the surveys administered to participants in the study were: About Me (demographics); My Family (family medical history); My Health (general health and medical conditions); Research Interest (prior research experience/willingness to participate in future research studies); Sleep Habits (sleep duration and quality), and Sleep Assessment (sleep disorder symptoms and daytime functioning). The SHMAS utilized a combination of validated and adapted assessments that are commonly used in sleep research. Specific survey and questionnaire items are described in detail on the SleepHealth Public Researcher Portal.
Daily activities. The AM and PM Check-in activities contained items from the Consensus Sleep Diary (CSD) 28 and queried participants about their previous night's sleep and their current day's activities. The Sleepiness Checker activity was a direct adaptation of the Karolinska Sleepiness Scale (KSS) 29 . It utilized a 9-point Likert scale to evaluate subjective sleepiness and was anchored by "extremely alert" and "extremely sleepy, fighting sleep" (Fig. 5). The KSS is a valid measure of daytime alertness/sleepiness and is highly correlated with EEG and other relevant behavioral metrics. The Sleep Quality Checker used a single item from the CSD, which utilized a 5-point Likert scale to rate the previous night's sleep quality. It was anchored by "very poor" and "very good" (Fig. 6). The Alertness Checker was an objective, sensor-based task that was included in the SHMAS to assess sustained attention, which has been shown to be impaired with sleep loss 30 . It was a smartphone adaptation of the 3-minute Psychomotor Vigilance Task (PVT-B) 29,31 . The PVT-B has been demonstrated to be sensitive to reduced alertness resulting from sleep loss as well as improvements in alertness following recovery sleep 32,33 . The PVT-B has also been utilized in situations where it is impractical to use the original 10-minute PVT 34 . Participants were instructed to use the thumb or index finger of their dominant hand to tap their screen as quickly and accurately as possible in response to a target stimulus (Fig. 7). The Nap Tracker was a simple tool designed by the SHMAS study team that allowed participants to measure the timing and duration of naps. Participants could tap the screen to start the nap tracker at the beginning of their nap and tap again to stop the nap tracker upon awakening. They were also able to rate the quality of their naps on a scale ranging from "poor" to "excellent."
Active task administration frequency. The SHMAS was designed to be a longitudinal observational study. The baseline onboarding surveys were gradually made available over the first three days of the study rather than all at once. This was done with the intention of making the baseline assessment less overwhelming. Table 2 outlines the survey and activity release schedule. Surveys were intended to be completed once during a baseline period and again at subsequent quarterly follow-ups. Daily activities were intended to supplement baseline and quarterly surveys, and users received notifications to complete them for 7 days in a row during the baseline period. After the baseline period ended, notifications automatically turned off by default. Users could re-enable notifications if desired, and were encouraged to continue to regularly use the daily activities to track their daytime sleepiness, sleep quality, and objective vigilance.

Fig. 3
Screenshots of Onboarding Process. Prospective participants were given an overview of how their data would be protected and used. They were briefed on the types of sensor-based and passive Apple Health data that would be collected with their permission and also given an explanation of the types of surveys and daily activities that they would be expected to complete during the course of the study.

Fig. 4
Screenshots of Informed Consent Test Questions. Prospective participants were asked three questions to ensure that they understood what their participation in research entailed: 1) If their sleep quality got worse during the study, they should stop relying on the application and see their doctor, 2) Once they started participating in the study, they were free to withdraw at any time but the data they contributed would not be deleted and 3) That the application was a research study and not a commercial application.
Fig. 5
The Sleepiness Checker. The Sleepiness Checker could be used by participants to report their levels of sleepiness throughout the study. Participants received push notifications (if enabled) to complete it twice randomly throughout the day, but there was no limit on the number of times it could be completed.

Fig. 6
The Sleep Quality Checker. The Sleep Quality Checker could be used by participants to report their sleep quality for the previous night's sleep. Participants received a push notification 1.5 hours after their reported wake-up times (if enabled) to provide this information, but it could be completed at any time on a given study day.
Passive HealthKit data collection. Participants had the option to share HealthKit data with the study team during the onboarding process. For participants who had granted HealthKit sharing permissions, HealthKit data were passively collected in the background while the study application was open.

Limitations
The SHMAS was only available to iPhone users residing in the United States. Location-based data were not collected for SHMAS participants in an effort to maintain participant privacy. Recruitment of participants for this iOS application-based study may have introduced socioeconomic, gender, and racial biases, which have been discussed in previous RK-based study publications 35,36 . Future digital health studies will benefit from being made available internationally and across iOS and Android platforms, which will allow more diverse study cohorts to be recruited.

Technical Validation
The data provided on the SleepHealth Public Researcher Portal (www.synapse.org/sleephealth) come from participants who agreed to share broadly with the study team and qualified researchers worldwide. All data are self-reported unless otherwise indicated, and should be treated as such. The study portal also contains passively collected Apple HealthKit data from participants who consented to data sharing.
Potentially sensitive data such as birthdates and free-text responses were excluded in order to safeguard participant privacy. In order to protect participants with potentially identifiable traits, self-reported height and weight data that fell outside thresholds established in a previously published data descriptor manuscript (height <60 or >78 inches, weight <80 or >350 lbs) 35 were excluded and set to 'CENSORED' in the corresponding data frames. Despite the fact that all study participants affirmed that they were 18 years of age or older at the time of study enrollment, some participants later provided dates of birth which indicated that they were under 18 years old. Data contributed by such participants were excluded. Additionally, data from test accounts that were created during the initial study period were also excluded in order to maximize data quality.
"These data were contributed by the participants of the SleepHealth Mobile App Study (SHMAS), which was sponsored by the American Sleep Apnea Association with scientific oversight by Carl Stepnowsky, PhD, as described in Synapse: https://www.synapse.org/sleephealth."
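The height and weight censoring rule described in this section can be illustrated with a short sketch. The thresholds and the 'CENSORED' marker come from the text. The field names `height_in` and `weight_lb`, the function names, and the record layout are hypothetical. This is not the released processing code, which is available from the repository named under Code availability.

```python
# Thresholds from the text: values outside these inclusive ranges are censored.
HEIGHT_RANGE_IN = (60, 78)   # inches
WEIGHT_RANGE_LB = (80, 350)  # pounds
CENSORED = "CENSORED"


def censor(value, low, high):
    """Return `value` if it lies within [low, high], else the CENSORED marker.

    Missing values (None) are passed through unchanged.
    """
    if value is None:
        return None
    return value if low <= value <= high else CENSORED


def censor_record(record):
    """Return a copy of `record` with out-of-range height and weight censored.

    The keys `height_in` and `weight_lb` are hypothetical field names.
    """
    cleaned = dict(record)
    cleaned["height_in"] = censor(record.get("height_in"), *HEIGHT_RANGE_IN)
    cleaned["weight_lb"] = censor(record.get("weight_lb"), *WEIGHT_RANGE_LB)
    return cleaned


# Hypothetical records for illustration only.
records = [
    {"height_in": 70, "weight_lb": 180},
    {"height_in": 59, "weight_lb": 350},
    {"height_in": 78, "weight_lb": 351},
]
cleaned_records = [censor_record(r) for r in records]
for row in cleaned_records:
    print(row)
```

Because the text states the exclusion as strictly less than or greater than each threshold, values equal to a threshold (for example 60 inches or 350 lbs) are kept in this sketch.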

Code availability
The SHMAS was built using Apple's open-source ResearchKit framework (https://github.com/researchkit/researchkit) and AppCore (https://github.com/ResearchKit/AppCore). The SHMAS source code is available upon request. Participant data hosting was handled by IBM Watson Health Cloud (https://www.ibm.com/watsonhealth/), a data collection solution where participant data were stored before being imported into Sage Synapse. The code used to process and clean the raw study data can be found on GitHub (https://github.com/apratap/SleepHealth_Data_Release). Suggested thresholds for excluding outliers are included in the code but commented out.