Physical activity, sleep and cardiovascular health data for 50,000 individuals from the MyHeart Counts Study

Studies have established the importance of physical activity and fitness for long-term cardiovascular health, yet limited data exist on the association between objective, real-world, large-scale physical activity patterns, fitness, sleep, and cardiovascular health, primarily due to difficulties in collecting such datasets. We present data from the MyHeart Counts Cardiovascular Health Study, wherein participants contributed data via an iPhone application built using Apple’s ResearchKit framework and consented to make these data freely available for further research applications. In this smartphone-based study of cardiovascular health, participants recorded daily physical activity, completed health questionnaires, and performed a 6-minute walk fitness test. Data from English-speaking participants aged 18 years or older with a US-registered iPhone who agreed to share their data broadly and who enrolled between the study’s launch and the time of the data freeze for this data release (March 10, 2015 to October 28, 2015) are now available for further research. It is anticipated that releasing this large-scale collection of real-world physical activity, fitness, sleep, and cardiovascular health data will enable the research community to work collaboratively towards improving our understanding of the relationship between cardiovascular indicators, lifestyle, and overall health, as well as inform mobile health research best practices.

In March 2015, MyHeart Counts (https://github.com/ResearchKit/MyHeartCounts) was launched as an observational smartphone-based study developed using Apple's ResearchKit software development library (http://researchkit.org/). The study's goal is to evaluate the feasibility of frequent, remote sampling of physiologic parameters as captured by smartphone measures of fitness, activity, and sleep. These data may facilitate a more complete understanding of the association between objective measures of health, self-reported disease, and quality of life. Researchers may use these data to characterize activity profiles so as to better understand the impact of the number of activity transitions on health 8. There are many questions that may stem from these data, each of which will require a community of researchers to explore.
Findings from the MyHeart Counts study revealed a clustering of participants by activity pattern into several distinct groups: sedentary, active, active only on workdays, and active only on non-workdays. These cluster assignments were found to correlate with participants' self-reported incidence of cardiovascular disease as well as self-reported mental well-being. Additionally, the MyHeart Counts study results suggest that patterns of activity correlate with different incidence of self-reported cardiovascular disease: individuals with multiple short bursts of physical activity throughout the day reported better health than counterparts who performed the same number of minutes of physical activity in one longer session. These findings are described in JAMA Cardiology 8.
MyHeart Counts utilized remote enrollment and consent in which participants self-guide through a visually engaging eConsent, in addition to a traditional consent form, prior to deciding to join the study 9 (Fig. 1). A critical aspect of this transparent consent process is providing participants with an explicit decision point, allowing them to specify if the de-identified data they donate to the study can also be studied by qualified researchers worldwide 10 .
The MyHeart Counts iOS app was downloaded by 110,056 users from March 9, 2015 to October 28, 2015. The numbers of users who enrolled in the study, consented, and shared data worldwide (broad sharing) or with Stanford only (narrow sharing) are shown in Fig. 1. The study cohort described here is composed of contributions from study participants who designated broad sharing of their data (n = 34,189).
Among consented participants, 4,990 (10.2%) completed a 6-minute walk test at the end of day 7 in the study. The 6-minute walk test activity collects not only distance traveled but also accelerometry during the walk. These data constitute the largest 6-minute walk data cohort to date [11][12][13][14][15]. Additionally, MyHeart Counts users were able to upload data from wearable devices compatible with HealthKit. The most popular wearable devices used are listed in Table 1. These additional data from wearables enabled more continuous monitoring of activity patterns than was possible through the phone alone.
Our aim in sharing data donated by MyHeart Counts participants is to encourage the consolidation of a broad, diverse, and collaborative community of mobile health researchers. We invite diverse solvers from around the world to engage in better understanding how mobile technologies can impact cardiovascular health.

Methods
These methods are expanded versions of descriptions in our related work 8 .
Participant onboarding. The MyHeart Counts app was made available starting in March 2015 through the Apple App Store (https://itunes.apple.com/us/app/myheart-counts/id972189947?mt=8) in the United States for iPhone 4S or newer requiring a minimum of iOS 8. Enrollment was open to individuals 18 years of age or older who were able to read and understand English and had iPhones registered in the United States. Participants then completed an interactive eConsent process that included animated icons, concise text, and links for more information 16 . In completing the eConsent, participants designated a data sharing preference: only with Stanford ("narrow sharing") or more broadly with qualified researchers worldwide (no default choice was presented) (Fig. 1c).
After completing the eConsent, participants were asked to e-sign an electronically rendered traditional consent form. A copy of the signed consent document was sent to participants by email, allowing for verification of their enrollment in the study. Following enrollment, participants could choose their next actions within the study, including setting a 4-digit passcode or registering a fingerprint scan to secure the study app, or completing preliminary study activities. These data were sent to Stormpath, a service used by the Bridge server to perform login and store PHI separately from other forms of study data. As part of onboarding, participants were invited to grant the study app access to their iPhone's HealthKit, Motion Activity, Notifications, and Location Services. Ethical approval for the study was obtained from Stanford University's Research Compliance Office (Protocol #IRB-31409).

Study tasks.
Consented participants contributed a range of data passively, as well as data that were contributed actively through forms and surveys, and via the 6-minute walk test.
Data collected as part of onboarding included participant account information (name, email, password), as well as study data (gender, height, weight, wake & sleep times). These study data were sent to the server the first time the participant opened the study application after verifying their email post-consent.
We enabled passive data collection from HealthKit and Core Motion when the participant opened their study app for the first time after verifying their email. HealthKit is a framework designed to capture, store, and facilitate sharing of health and physical activity data collected from iPhone sensors between apps. Additionally, a variety of apps and devices may write to HealthKit (e.g., Fitbit Sync Helper, Nike+ Run Club, Apple Watch, Beddit). The MyHeart Counts app captures a variety of body measurements (height, weight), physical activity data (active energy expenditure in kcal, cycling distance, flights climbed, sleep analysis, stand hours, steps, walking and running distance, workouts), health results (blood glucose), and vital signs (diastolic/systolic blood pressure, oxygen saturation) if they have been entered in HealthKit.
With users' permission, during the initial 7-day monitoring period, motion was recorded through the Core Motion coprocessor chip of their iPhone (iPhone 5S or newer). The low-power chip integrates a number of sensor signals, including a triaxial accelerometer, gyroscope, compass, and barometer, to estimate the presence of movement, the distance traveled, and the modality of movement (i.e., walking, running, cycling, driving). Throughout the study, users were able to visualize these data on a dashboard built into the app. Data were sent to the server whenever 50 Kb had been collected or when the data were older than 24 h, over Wi-Fi or a cellular connection.
On the final day of the study (eighth day post-enrollment), participants were presented with a final set of questionnaires. These consisted of the Well-Being and Risk Perception Survey with additional questions used to compute the participant's Atherosclerotic Cardiovascular Disease Risk Score 17,18, from which a Heart Age 19 was calculated. All survey questions as well as app screenshots of the survey presentation are available on the Synapse MyHeart Counts Public Researcher Portal (Wiki, Data Description, Survey Data Gathered in the MyHeart Counts App) 20.
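The upload rule described above (send when 50 Kb has accumulated or when buffered data are older than 24 h) can be illustrated with a minimal sketch. This is not the app's actual code: the `UploadBuffer` class, its method names, and the reading of "50 Kb" as 50 × 1024 bytes are all assumptions made for illustration.

```python
from dataclasses import dataclass, field

# Assumed thresholds: "50 Kb" read as 50 * 1024 bytes, and a 24 h age limit.
FLUSH_BYTES = 50 * 1024
FLUSH_AGE_S = 24 * 60 * 60


@dataclass
class UploadBuffer:
    """Hypothetical client-side buffer of (timestamp_s, payload) records."""

    records: list = field(default_factory=list)

    def add(self, timestamp_s: float, payload: bytes) -> None:
        """Queue one sensor record for a later upload."""
        self.records.append((timestamp_s, payload))

    def should_flush(self, now_s: float) -> bool:
        """Return True once either the size or the age threshold is met."""
        if not self.records:
            return False
        total_bytes = sum(len(payload) for _, payload in self.records)
        oldest_s = min(ts for ts, _ in self.records)
        return total_bytes >= FLUSH_BYTES or (now_s - oldest_s) > FLUSH_AGE_S
```

A caller would check `should_flush` after each `add` and, when it returns True, send the queued records and clear the buffer.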
During the same interval, participants were asked to complete a self-administered 6-minute walk test. The 6-minute walk test is a phone-guided task that triggers the collection of global positioning system displacement-based distances, pedometer-based distances, pedometer step counts, and accelerometer and gyroscope measurements in both raw and processed formats.
Correlation analysis was performed to determine whether a participant's duration of app usage during the first 8 days post-enrollment was associated with responses to the above-mentioned surveys (Fig. 2). Participants with self-reported heart disease, vascular disease, or a family history of heart disease used the app longer (Fig. 2a). Specifically, family history of heart disease correlated with 0.23 ± 0.13 (p = 1.734e-4) more days of app usage, presence of heart disease with 0.56 ± 0.18 (p = 8.78e-10) more days, and presence of vascular disease with 0.47 ± 0.25 (p = 1.90e-4) more days. Similarly, participants' mental well-being correlated with app usage (Fig. 2b). Participants who reported scores of 8-10 on the "feel worthwhile" and "happy" questions used the app longer than those who reported low (1-3) or medium (4-7) values. Conversely, those who reported high values on the "worried" and "depressed" questions used the app for a shorter period of time. Self-perceived risk was also significantly associated with duration of app usage (Fig. 2c). On the scale for this survey, those with a medium (score = 3) self-perceived 10-year risk used the app 0.29 ± 0.03 (p < 2.2e-16) days longer than those with low (score = 1 or 2) or high (score = 4 or 5) perceived risk.
Data collection and distribution. Data were sent by the app in encrypted form to Bridge Server, a RESTful API and researcher web interface developed and operated by Sage Bionetworks (http://sagebase.org/) and run on Amazon Web Services (AWS). Bridge is designed to allow collection and management of mobile health data from apps by providing apps the ability to securely create accounts for participants. The server then records consent and the identifying personal information required for account creation separately from study data. This separation is accomplished by storing personal information and accounts in a separate accounts database and storing study data in S3 buckets on AWS. A dictionary stored in the Bridge server can convert an account identifier, used by the app when sending data, into a healthCode, used by the research team to identify an individual in the coded data (https://developer.sagebridge.org/articles/security.html).
Coded study data, consisting of survey responses, mobile sensor measurements, and device data, were exported to Synapse (https://www.synapse.org/) for distribution to researchers. Synapse 21 is a general-purpose data and analysis sharing service where members can work collaboratively, analyze data, share insights, and track the attribution and provenance of those insights to share with others. Synapse is developed and operated by Sage Bionetworks as a service to the biomedical research community. These Bridge and Synapse services have been used to support numerous health studies, including all five of the initial ResearchKit apps launched in March 2015 8,22,23 as well as subsequent studies 24.
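As a rough illustration of the grouping used in the well-being comparison, the sketch below bins a self-reported score into low, medium, and high bands and averages app-usage days per band. The band edges (low 1 to 3, medium 4 to 7, high 8 to 10), the function names, and the input layout are assumptions for illustration; this is not the analysis code behind Fig. 2.

```python
from collections import defaultdict
from statistics import mean


def wellbeing_band(score: int) -> str:
    """Map a 1-10 well-being score to an assumed low/medium/high band."""
    if not 1 <= score <= 10:
        raise ValueError(f"score out of range: {score}")
    if score <= 3:
        return "low"
    if score <= 7:
        return "medium"
    return "high"


def mean_usage_by_band(rows):
    """Average days of app usage per band.

    `rows` is an iterable of (score, days_of_usage) pairs (hypothetical layout).
    """
    groups = defaultdict(list)
    for score, days in rows:
        groups[wellbeing_band(score)].append(days)
    return {band: mean(days) for band, days in groups.items()}
```

Comparing the per-band means gives only a descriptive view; the reported associations also carry uncertainty estimates and p-values, which this sketch does not reproduce.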
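The separation of identity from study data can be pictured with a small sketch: a lookup table held on the server side maps each account identifier to a healthCode, and only the healthCode travels with the study record. The `CodeDirectory` class, the random 32-character code format, and the record layout are hypothetical; Bridge's actual implementation is described in its own documentation and may differ.

```python
import secrets


class CodeDirectory:
    """Hypothetical stand-in for the dictionary mapping account IDs to healthCodes."""

    def __init__(self) -> None:
        self._by_account = {}

    def health_code_for(self, account_id: str) -> str:
        """Return a stable, random code for this account, creating it on first use."""
        if account_id not in self._by_account:
            self._by_account[account_id] = secrets.token_hex(16)
        return self._by_account[account_id]


def code_record(directory: CodeDirectory, account_id: str, study_payload: dict) -> dict:
    """Attach the healthCode to study data; the account ID itself is not copied."""
    return {**study_payload, "healthCode": directory.health_code_for(account_id)}
```

Because the code is random rather than derived from the account identifier, a researcher holding only coded records cannot recompute the mapping without access to the directory.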
Multiple updates of the MyHeart Counts app were released during the study period to address software-related concerns and to implement new features. Because of an initial technical issue with the integration of HealthKit and ResearchKit data, demographic information is missing for a number of early participants. Participants were subsequently emailed to request they upgrade so this missing information could be provided.

Data Records
Data were restricted to records shared by versions of the app released before . For tables containing records lacking an AppVersion column, such as HealthKit and 6MWT Displacement data, data sent before October 28, 2015 were included. A total of 48,968 participants consented to the study and agreed to share their data broadly with the research community. 40,017 participants completed at least one survey or task after joining the study, of whom 34,189 agreed to share their data broadly. 6,870 completed all surveys presented in the first 8 days of their participation and were aged 40-70 years, allowing for computation of their 10-year risk score. 4,990 completed at least one 6MWT, for a total of 6,927 completed 6MWTs. Clinical and demographic characteristics are provided in Table 2.
The numbers of study participants who provided daily HealthKit step data and activity pattern data derived from the phone's Core Motion accelerometer are illustrated in Fig. 3. As an example, the distribution of the total number of days of motion and HealthKit step data provided by users during the study period is also illustrated in Fig. 3.
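The proportions implied by these counts can be checked with a few lines of arithmetic. The sketch below uses only the counts stated in this section; the variable and function names are illustrative.

```python
# Counts as stated in the Data Records section.
CONSENTED = 48_968
COMPLETED_ANY_TASK = 40_017
BROAD_SHARERS = 34_189
WALK_TEST_PARTICIPANTS = 4_990
WALK_TESTS_TOTAL = 6_927


def percent(part: int, whole: int) -> float:
    """Percentage of `whole` represented by `part`, rounded to one decimal."""
    return round(100 * part / whole, 1)


walk_test_rate = percent(WALK_TEST_PARTICIPANTS, CONSENTED)
broad_share_rate = percent(BROAD_SHARERS, COMPLETED_ANY_TASK)
tests_per_walker = round(WALK_TESTS_TOTAL / WALK_TEST_PARTICIPANTS, 2)

print(f"6MWT completion among consented participants: {walk_test_rate}%")
print(f"Broad sharing among those completing any task: {broad_share_rate}%")
print(f"Mean 6MWTs per participant with at least one test: {tests_per_walker}")
```

With the stated counts, 6MWT completion among consented participants works out to about 10.2%, and participants with at least one test completed about 1.39 tests on average.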
For the 25,774 participants who supplied location data, we illustrate their geographic distribution (https://www.aggdata.com/free/united-states-zip-codes) by state in Fig. 4. The three states with the largest number of participants are California (n = 9,813), New Jersey (n = 3,560), and New York (n = 3,252).
All coded data sets (Table 3) are stored and accessible via the Synapse platform in a public project with associated metadata and documentation 21 .

Technical Validation
The survey data provided here are participant-reported outcome responses. For the current data release, participants were allowed to enter survey responses that may not be physiologically possible. This was corrected in version 2 of MyHeart Counts, released Dec 12, 2016 (data not included in this release).
The Core Motion data provided here are derived from Apple iPhone devices with proprietary technical validation. We do not provide test-retest or other technical validation data sets here; however, others have reported technical validation of the Core Motion sensor in a different context 25.
The 6-Minute Walk Test was validated in an outdoor setting by comparison of step count and distance reported by the MyHeart Counts app with corresponding values obtained in accordance with the ATS Statement: Guidelines for the Six-Minute Walk Test (n = 20) 1. On a validation set of 26 tests, mean error was −3.39 yards, mean absolute error was 56.65 yards, and standard deviation was 70.28 yards 8. A negative correlation (Pearson r = −0.58) was found between distance walked and Six-Minute Walk Test error. Due to limitations in how the ActiveTask was encoded, if the study app leaves the foreground during a test, the data collected may not be complete.
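The summary statistics quoted for the 6-Minute Walk Test validation (mean error, mean absolute error, standard deviation of the error, and the Pearson correlation between distance walked and error) can be computed as in the sketch below. It assumes that error is defined as app-reported distance minus reference distance and that the standard deviation is the sample standard deviation; neither definition is stated in this section, and the function names are illustrative.

```python
from math import sqrt
from statistics import mean, stdev


def error_summary(app_yards, reference_yards):
    """Summarize per-test error, assumed here to be app minus reference."""
    errors = [a - r for a, r in zip(app_yards, reference_yards)]
    return {
        "errors": errors,
        "mean_error": mean(errors),
        "mean_abs_error": mean(abs(e) for e in errors),
        "sd_error": stdev(errors),  # sample standard deviation (assumption)
    }


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)
```

Given paired lists of distances, `pearson(reference_yards, error_summary(app_yards, reference_yards)["errors"])` would yield a correlation of the kind reported above; a negative value would mean that tests with longer walked distances tended to have more negative error.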

Limitations. The MyHeart Counts study experienced limitations similar to those of the five other fully mobile, large-scale ResearchKit flagship studies 26. Although the study recruited over 50,000 users within an interval of 6 months, users did not sustain follow-up and there was a significant drop-off rate: the mean time of engagement with the app was 4.1 days, consistent with the Asthma Health and mPower studies 22,23. The high dropout rate was due to low barriers to entry and exit, which was a double-edged sword: it increased dropout but also facilitated engagement of individuals who are difficult to reach by more traditional means. No sign of systematic bias was found in the characteristics of users who dropped out early versus those who remained in the study longer. Readers may find a more comprehensive list of limitations related to the MyHeart Counts study in a previous publication 8.

While certain data types may have additional Conditions for Use (e.g., HealthKit data), the overarching Conditions for Use are as follows:
• You confirm that you will not attempt to re-identify research participants for any reason, including for re-identification theory research
• You reaffirm your commitment to the Synapse Awareness and Ethics Pledge
• You agree to abide by the guiding principles for responsible research use and data handling as described in the Synapse Governance documents
• You commit to keeping these data confidential and secure
• You agree to use these data exclusively as described in your submitted Intended Data Use statement
• You understand that these data may not be used for commercial advertisement or to re-contact research participants
• You agree to report any misuse or data release, intentional or inadvertent, to the Access and Compliance Team (ACT) within 5 business days by emailing act@sagebase.org
• You agree to publish findings in open access publications
• You promise to acknowledge the research participants as data contributors and study investigators on all publications or presentations resulting from using these data as follows: 'These data were contributed by users of