TILES-2018, a longitudinal physiologic and behavioral data set of hospital workers

We present a novel longitudinal multimodal corpus of physiological and behavioral data collected from direct clinical providers in a hospital workplace. We designed the study to investigate the use of off-the-shelf wearable and environmental sensors to understand individual-specific constructs such as job performance, interpersonal interaction, and well-being of hospital workers over time in their natural day-to-day job settings. We collected behavioral and physiological data from n = 212 participants through Internet-of-Things Bluetooth data hubs, wearable sensors (including a wristband, a biometrics-tracking garment, a smartphone, and an audio-feature recorder), together with a battery of surveys to assess personality traits, behavioral states, job performance, and well-being over time. Besides the default use of the data set, we envision several novel research opportunities and potential applications, including multi-modal and multi-task behavioral modeling, authentication through biometrics, and privacy-aware and privacy-preserving machine learning.


Background & Summary
Maintaining a healthy, productive workforce is an increasingly challenging problem in a complex and frenzied world. Optimal job performance relies on worker wellness, and as organizations strive to prepare their workforce for the evolving demands, worker wellness is increasingly important. Current standards are based on cross-sectional assessment of employee characteristics, often in controlled testing conditions that cannot account for the dynamic nature of working environments and employee performance and are therefore poorly suited for this task 1 . Fortunately, today's densely instrumented world offers tremendous opportunities for unobtrusive and persistent acquisition and analysis of diverse, information-rich time-series data that provide a multi-modal, spatio-temporal characterization of individuals' actions in, and of, the environment within which they operate. However, the connection between individual and group performance, well-being, and quantitative measurements from sensor data has not been established for such dynamic environments in the wild.
To connect job performance-related and well-being-related constructs through self-assessments with data from sensors, we frame well-being and performance within the overarching notion of psychological flexibility. Psychological flexibility refers to an individual's capacity to maintain fluid awareness and acceptance of current circumstances and, depending upon available opportunities, take effective action even when experiencing difficult or unwanted thoughts, emotions, and sensations 2 . Psychological flexibility has been identified as a primary individual determinant of behavioral effectiveness and well-being 3 . It has been shown that, in the workplace, the degree to which employees are psychologically flexible can have a profound effect on their productivity, well-being, and success 4,5 . Moreover, the connection between sensor measurements and mental states put forth by the Somatic Marker Hypothesis 6 suggests that the physiological status of our body (i.e., the somatic marker) is an indispensable part of our cognition and emotion, which are building blocks of our mental states. The purpose of our research is to connect psychological flexibility, job performance, and well-being with somatic and bio-behavioral markers using an in situ experimental study in a real-world workplace.

Fig. 1 Experimental design. Participants received instructions in a 2-hour on-boarding session, where they completed the first part of the baseline survey and were instructed in the use of sensors and smartphone apps. This session was followed by the second part of the baseline survey and then by 10 weeks of data collection, during which participants wore multiple wearable sensors (wristband, garment, and an audio badge) and answered two daily EMAs (Ecological Momentary Assessments) through their personal smartphones. During the off-boarding session, participants handed in their sensors and finished uploading data. After the sensor data collection, they completed a post-study survey.
• Task Performance was assessed using two different measures:
-In-Role Behavior (IRB) 11 : Consists of 7 items, each a Likert scale ranging from 1 (strongly disagree) to 7 (strongly agree). A score is obtained by summing all the responses, for a total between 7 and 49.
-Individual Task Proficiency (ITP) 12 : Consists of 3 items, each a Likert scale ranging from 1 (very little) to 5 (a great deal). A score is obtained by averaging all the responses, for a total between 1 and 5.
• Personality (BFI-2): It was measured using the Big Five Inventory-2 13 . It consists of 60 items, each a Likert scale ranging from 1 (disagree strongly) to 5 (agree strongly). Five domain scores are computed, all in the range 1 to 5, each by averaging the corresponding responses; for example: -Negative Emotionality (neuroticism): Scored by averaging all the negative emotionality responses.
• Affect (PANAS): It was measured using the Positive and Negative Affect Schedule-Expanded Form 14 . It consists of 60 items, each a Likert scale ranging from 1 (very slightly or not at all) to 5 (extremely). Two scores are computed, each in the range 10 to 50: -Positive Affect (POSAFFECT): Obtained by summing the positive-affect responses. -Negative Affect (NEGAFFECT): Obtained by summing the negative-affect responses.
• Anxiety (STAI): It was measured using the State Trait Anxiety Inventory 15 . It consists of 20 items, each a frequency scale ranging from 1 (almost never) to 4 (almost always). It is scored by summing the responses, yielding a value in the range 20 to 80.
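As an illustration, the scoring rules above (sums for IRB, STAI, and the PANAS sub-scales; a mean for ITP) can be sketched as follows. The function and variable names are ours for illustration, not part of any code released with the data set.

```python
# Illustrative scoring helpers; item responses are assumed to be lists of ints.

def score_irb(items):
    """In-Role Behavior: 7 items, each 1-7, summed to a total in 7..49."""
    assert len(items) == 7
    return sum(items)

def score_itp(items):
    """Individual Task Proficiency: 3 items, each 1-5, averaged to 1..5."""
    assert len(items) == 3
    return sum(items) / len(items)

def score_panas(items, subscale_idx):
    """PANAS: sum the 10 items of one sub-scale, giving a total in 10..50."""
    return sum(items[i] for i in subscale_idx)

def score_stai(items):
    """STAI: 20 items, each 1-4, summed to a total in 20..80."""
    assert len(items) == 20
    return sum(items)
```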
The following scales correspond to the second part of the baseline survey, and were assessed on a take-home questionnaire. We include a description of each measurement and a brief rationale.
• Demographics (DEMO): Additional demographics assessed several basic characteristics of participants. Specifically, they were asked about race, marital status, pregnancy, number of children living with them, and housing situation (e.g., rent or own). The survey also assessed characteristics germane to the particular sample at hand, including the position the participant currently held at the hospital from which they were recruited, their specific certifications (e.g., nurse practitioner), years in the profession, the shift they worked (e.g., day or night), the number of hours worked at the organization from which they were recruited, and the amount of overtime worked. In addition, participants were asked about the length of their commute, the mode of transportation used in their commute, and whether they held another job outside of the one from which they were recruited and, if so, how many hours they worked there. Lastly, they were asked whether they were currently a student, their gender, age, place of birth, whether English is their native language, education level, and job-related demographics (e.g., full-time or part-time, industry, tenure in the organization, and income).
• Health (RAND): Health was measured using the Rand Health Survey-Short Form 16 . This assesses eight health domains through 36 self-report items. These domains include physical function, role limitations due to physical health, role limitations due to personal or emotional problems, general mental health, social functioning, bodily pain, general health perceptions, and energy/fatigue. This measure also includes one scale that assesses perceived change in health. Scores are obtained by computing the mean of the items associated with each of the domains listed above.

Table 1. Selected sensors with a summary of measurements (output) and instructed use or sensing times. The first three sensing streams were obtained directly from participants through wearable sensors and apps installed on their personal smartphones. All surveys were completed by participants on their personal smartphones or in a web browser. The last four sensing streams were obtained by placing sensors in the hospital. PPG: photoplethysmography, ECG: electrocardiography.

• Life Satisfaction (SWLS): The Satisfaction with Life Scale 17 is a 5-item measure that aims to assess participants' general satisfaction with life. Participants rate the degree to which they agree with each statement on a scale of 1 (strongly disagree) to 7 (strongly agree). A total score is obtained by taking the average of the 5 items.
• Perceived Stress (PSS): The Perceived Stress Scale 18 is a 10-item scale that aims to assess how often one has experienced stress in the last month. Participants are asked to rate the frequency with which they perceived stress on a scale of 0 (never) to 4 (very often). After reverse coding the necessary items, a total score is obtained by taking the average of the 10 items.
• Psychological Flexibility (MPFI): The Multidimensional Psychological Flexibility Inventory 19 is a 24-item questionnaire that measures both psychological flexibility and inflexibility. The 24-item measure is the short-form version: 12 items measure flexibility and 12 measure inflexibility, each assessed on a scale from 1 (never true) to 6 (always true). The MPFI also measures a number of sub-dimensions: -Psychological Flexibility (PF): Under flexibility there are sub-dimensions for acceptance, present moment awareness, self as context, defusion, values, and committed action. -Psychological Inflexibility (PI): The inflexibility sub-scales include experiential avoidance, lack of contact with the present moment, self as content, fusion, lack of contact with values, and inaction.
Items on this measure ask participants to think about the last two weeks and to rate the frequency with which they experienced the feelings described in each item. PF, PI, and their sub-dimensions are scored by taking the mean of the items that comprise each scale or sub-dimension.
• Work Related Acceptance (WAAQ): Additionally, psychological flexibility as related to work was measured by the 7-item Work-related Acceptance and Action Questionnaire 20 . The WAAQ presents a statement and participants rate the degree to which each statement is true on a scale from 1 (never true) to 7 (always true). The WAAQ is scored by taking the mean of the items.
• Work Engagement (UWES): Work engagement is measured using the Utrecht Work Engagement Scale 21 .
The work engagement measure presents 9 items and participants rate the frequency with which they have experienced the feeling described, on a scale from 0 (never) to 6 (always). Scores are then averaged to obtain a total score. There are three sub-scales: vigor, dedication, and absorption.
• Psychological Capital (PCQ): Psychological capital can be thought of as a higher-order construct comprising hope, self-efficacy, resilience, and optimism 22 . It is assessed through a 12-item version of the Psychological Capital Questionnaire 23 . The PCQ asks participants the degree to which they agree on a 6-point scale from 1 (strongly disagree) to 6 (strongly agree).
• Challenge and Hindrance Stressors (CHSS): Challenge and hindrance stressors were measured using a 16-item measure in which participants were presented with a statement and asked to rate their degree of agreement or disagreement 24 . 8 items measure challenge stressors and 8 items measure hindrance stressors. Total scores are calculated by computing the mean over all hindrance-stressor items and, separately, the mean over all challenge-stressor items.
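The reverse coding mentioned for the PSS can be sketched as follows. In the standard PSS-10 the positively worded items (commonly items 4, 5, 7, and 8, 1-indexed) are reverse-scored on the 0-4 scale; the item positions and function name here are illustrative assumptions, and, following the text above, the total is taken as the item mean.

```python
# Illustrative PSS-10 scoring with reverse coding (0-4 response scale).
# Assumed reverse-coded items: 4, 5, 7, 8 (1-indexed) -> 0-indexed positions.
REVERSED = {3, 4, 6, 7}

def score_pss(items):
    """Reverse-code the positively worded items, then return the item mean."""
    assert len(items) == 10 and all(0 <= x <= 4 for x in items)
    recoded = [4 - x if i in REVERSED else x for i, x in enumerate(items)]
    return sum(recoded) / len(recoded)
```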
Ecological momentary assessments. The Ecological Momentary Assessments (EMAs) were received twice a day by participants and were divided into two groups. Note that some scales have a "D" appended to their name, compared to the baseline survey, to denote their daily versions. A first group of EMAs assessed job-related variables, health-related variables, and personality. The job-related questions were asked a total of 31 times during the study (every two days), the health-related questions were asked 35 times (every two days), and the personality-related questions were asked 5 times (every two weeks), for a total of 71 surveys administered over the 10 weeks of the study. Participants received one of these surveys daily. The job, health, and personality surveys were sent at either 6am, noon, or 6 pm, and expired 4 hours after they were sent.
Another group of EMAs assessed psychological flexibility and psychological capital. The psychological flexibility form was sent to participants a total of 50 times over the ten weeks (5 times per week), whereas the psychological capital form was received a total of 20 times throughout the same period (2 times per week). Participants received one of these surveys daily. The psychological flexibility and psychological capital EMAs were sent uniformly at random to day shift participants between 11am and 6 pm, and between 11 pm and 6am for night shift participants. They expired 6 hours after their delivery.
The surveys were implemented using ResearchKit for iOS and ResearchStack for Android (through the TILES app described in the Phone apps section).
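The delivery scheme for the psychological flexibility/capital EMAs (a uniformly random send time within the shift-specific window, a fixed expiry, and a reminder shortly before expiry) can be sketched as follows; the helper and its names are illustrative, not taken from the study software.

```python
# Minimal sketch of the flexibility/capital EMA delivery scheme.
import random
from datetime import datetime, timedelta

def flexibility_ema_schedule(day, night_shift=False, expiry_hours=6):
    """Pick a uniformly random delivery time in the shift-specific 7-hour window.

    Day shift: 11 am - 6 pm; night shift: 11 pm - 6 am (next day).
    Returns (sent, reminder, expires) datetimes.
    """
    start_hour = 23 if night_shift else 11
    window_start = day.replace(hour=start_hour, minute=0, second=0, microsecond=0)
    offset = timedelta(minutes=random.randrange(7 * 60))  # minute resolution
    sent = window_start + offset
    expires = sent + timedelta(hours=expiry_hours)
    reminder = expires - timedelta(minutes=30)  # push reminder before expiry
    return sent, reminder, expires
```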
The following items were asked daily of participants during the ten weeks and were presented at the beginning of each job, health, and personality EMA (base daily survey).
• Context measures (CONTEXT): These were 4 context questions. The first asked participants about interactions with other people and the communication channel. The second asked about the activity they were engaged in when they received the survey. The third asked for current location, and the fourth asked whether any atypical events had occurred.
• Stress (STRESSD): Stress was measured daily using a single item that read, "Overall, how would you rate your current level of stress?".
• Anxiety (ANXIETY): Anxiety was assessed daily using a single item which asked, "Please select the response that shows how anxious you feel at the moment".
• Affect (PAND): Participants' positive and negative affect were assessed daily using the 10 items from PANAS-Short 25 . 5 items were used to assess negative affect and 5 items were used to assess positive affect.
The purpose of the Job Performance Survey was to assess participants' perceived job performance, and it included the following measurements:
• Work today (WORK): Prior to completing the job performance survey, participants were asked if they had worked 1 or more hours that day. If they answered no, they were not shown the job performance survey.
• Task performance (ITPD, IRBD): Measured using the same items as in the baseline survey described previously.
• Organizational citizenship behavior (OCBD)/Counterproductive work behavior (CWBD): These were measured using a total of 16 items (DALAL) 26 , with 8 items per scale.
The purpose of the Health Survey was to assess a number of health-related variables:
• Sleep (SLEEPD): Sleep was assessed with a single item that asked participants to specify the number of hours they slept the previous night. Participants were instructed not to confuse this with the number of hours spent in bed.
• Physical Activity (EX): Physical activity was measured using two questions. The first asked participants to specify the number of minutes of vigorous activity they engaged in yesterday (e.g., sprinting, power lifting). The second asked how many minutes they spent the previous day engaging in moderate physical activity (e.g., jogging, biking).
• Tobacco Use (TOB): Tobacco use was measured using two items. The first asked whether the participant used a tobacco product yesterday; if so, a follow-up question probed how many times tobacco products were used and what type of product was used.
• Alcohol Use (ALC): Alcohol use was assessed using 2 items. The first asked whether participants consumed any alcohol yesterday; if they responded yes, they received a question asking them to specify how many beers, wines, and spirits they consumed.
The purpose of the Personality Survey was to assess personality:
• Personality (BFID): The personality survey uses the BFI-10 (a shortened version of the BFI-2 used in the baseline survey previously described).
The Psychological Flexibility Survey included context questions and measures of psychological flexibility:
• Context Question (Activity): The first question asked participants to select from a list the type of activity they were engaged in immediately before beginning the survey. Example options included travel or commuting, eating and/or drinking, work, and work-related activities. Participants could also respond "other" and specify in text what they were doing.
• Context Question (Experience): These items assessed experiences (both pleasant and unpleasant). The question was provided as a checklist (for positive and negative experiences), with options such as "Difficult thoughts or memories", "Pleasant physical sensations", and "Difficult urges or cravings".
• Psychological Flexibility (PF): 13 items were included to assess psychological flexibility 2 . Items of the PF survey are divided into 3 sub-scales. Participants were asked to report how true each statement was about themselves during the last 4 hours, rating each statement on a scale of 1 (Never) to 5 (Always). The mean over all items in each sub-scale gives a total score. This scale was created for this study.
The Engagement/Psychological Capital Survey assessed context (base daily survey), engagement, psychological capital, and challenge and hindrance stressors. It is comprised of items that are neither stigmatizing nor pathologizing, and that have demonstrated large effect sizes on significant outcomes (e.g., employee health and well-being, job performance, job retention and turnover) 27 .
• Context questions (Activity): The first question asked participants where they were; participants selected from a list (e.g., work, home, outdoors, etc.). The second question was the same as the first question participants answered in the context questions for the psychological flexibility questionnaire.
• Engagement (Engage): Participants completed a 3-item measure of work engagement 28 . They were asked to think about the activity they had just reported doing and how they felt while engaging in it. Statements were rated on a scale of 1 (not at all) to 7 (very much). A mean of the 3 items was computed to create a total score.
• Psychological Capital (Psycap): It was measured using 12 items from the CPC-12 29 . Participants were instructed to rate each statement based on how much they agreed with it, on a scale of 1 (not at all) to 7 (very much). The mean of all 12 items was used to compute the total score.
• Interpersonal Support (IS): A subset of 3 items from 30 is used to assess daily job resources.
• Challenge/Hindrance Stressors (CS, HS): A subset of 8 items from the baseline survey measure of Challenge/Hindrance Stressors was used, with 4 items measuring each type of stressor 24 . Participants were instructed to consider the degree to which they agreed with each statement based on the last day that they had worked, including the day on which they completed the survey. Items were rated on a scale of 1 (not at all) to 7 (very much).
Post-study survey. The Post-study survey is equivalent to the take-home part of the baseline survey, except for not including demographics.
Sensing devices. The initial goal of the study was to predict self-assessed psychological constructs (obtained through surveys) from sensor data. To this end, we selected a set of wearable and environment-sensing devices to obtain physiological and behavioral information from participants. Table 1 summarizes the sensors worn by participants and their intended use throughout the study. Details on the sensor selection can be found in 31 .
Wearable sensors. Participants were instructed to wear a Fitbit Charge 2 wristband (https://help.fitbit.com/?p=charge_2) at all times throughout the duration of the study. Furthermore, at work, they were asked to wear an OMsignal smart garment (https://web.archive.org/web/20181221115159/https://www.omsignal.com/, a T-shirt for men and a sports bra for women, both discontinued) and a Unihertz Jelly Pro smartphone (https://www.unihertz.com/shop/product/jelly-pro-black-21, Jelly phone for short) as a lapel microphone (or "audio badge"). The Jelly phone was programmed to obtain audio features from the raw audio (which was discarded) 32 . In parallel, these Jelly phones also sent Bluetooth packets at 1 Hz over 15-second windows every minute, to allow estimating their locations within the building/workplace. These packets carried a unique 4-byte identifier for every participant.
Environmental sensors. There were two kinds of environmental sensors: Owl-in-One (https://shop.reelyactive.com/products/owl-in-one-ble) Bluetooth data hubs and Minew sensors (https://en.minewtech.com/sensor.html). The Owl-in-Ones served two purposes: to estimate participant proximity by capturing the signal strength of the Bluetooth packets sent by the Jelly phones that participants wore in the hospital, and to collect environmental data sent over Bluetooth by the Minew sensors.
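The proximity idea can be sketched as follows: within a time window, the hub observing the strongest (highest) received signal strength from a participant's Jelly phone is taken as the nearest. The record layout and function name here are illustrative assumptions, not the data set's actual schema.

```python
# Hedged sketch of RSSI-based proximity estimation.
# observations: list of (hub_id, rssi_dbm) pairs seen in one time window,
# where RSSI is in dBm (less negative = stronger = closer, roughly).

def nearest_hub(observations):
    """Return the id of the hub with the strongest RSSI, or None if empty."""
    if not observations:
        return None
    return max(observations, key=lambda obs: obs[1])[0]
```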
The Owl-in-Ones were installed in fourteen nursing units (spread over seven of the building floors) and two hospital labs. A total of 244 Owl-in-Ones were installed, about 1.5m above the floor depending on space availability on wall areas near power outlets. Each nursing unit was equipped with an Owl-in-One sensor in these four room types: patient room, nursing station, lounge, and medication room. These different rooms were selected after observing the behavioral patterns of nurses during their shifts (by talking to nursing directors of Keck Hospital and shadowing nurses throughout a workday). Each Owl-in-One was labelled with the study logo, and the phrase "This is a data hub for the TILES study. For more information, please visit https://sail.usc.edu/tiles".
One Owl-in-One was installed in every other patient room, one in every medication room, one in every lounge, and between one and four in nursing stations, depending on the size, layout, and availability of power outlets. In the hospital labs, one Owl-in-One was installed in every lounge, and at least one in each major room (e.g., blood lab, micro-bio lab, shipping/receiving, patient lobby, etc.) depending on the room size and power outlet availability. Figure 2 shows an example of Owl-in-One placements in a nursing unit.
Through information collected from Minew sensors, the Owl-in-Ones also captured (door) motion information, humidity, temperature, and light information across the hospital. Two light (E6) and temperature/humidity (S1) Minew sensors were installed in each nursing unit and each laboratory. These sensors were placed in open areas near the main hallways and within one foot of an Owl-in-One sensor. In the nursing units, one pair of E6/S1 sensors was installed in the nursing stations nearest and farthest from the unit entrance. In the labs, one pair was located near the lab entrance and the other in a frequently occupied open room away from the entrance. Minew motion sensors (E8) were placed on the top outer corner of doors and captured information pertaining to foot traffic through the doorway. One motion sensor was placed on each medicine room door in the nursing units. No sensor was placed on the lounge room doors because they remained open at all times, and none were placed on the unit entrance/exit doors due to fire safety restrictions. In the labs, one motion sensor was placed on the main entrance door and one on the lounge door. A total of 52 motion sensors, 63 light sensors, and 63 temperature/humidity sensors were installed throughout the hospital.
Phone apps. Several phone apps were installed, with informed consent, on the participants' personal smartphones to interact with sensors, upload data, receive surveys, and communicate with the research team.
TILES app. This app was custom-developed for the TILES study and was used both for data collection and for communication with participants throughout the enrollment and data collection periods. It is available for both Android and iOS (see the Code availability section for details). The EMAs were administered via the TILES app. Participants received a push notification when an EMA was delivered, and again thirty minutes before it expired if it had not yet been completed. Bi-directional communication was enabled via the TILES app as well. Participants could contact the research team at any time through the Contact Info tab. The app also contained a Frequently Asked Questions (FAQs) page, which was updated in real time during the study as common questions were identified. In addition, participants were notified of any non-compliance via push notifications and via the activity feed within the app, and were reminded to sync each device with its companion app.
Fitbit app. The Fitbit app is a third party app that was used to pair the Fitbit wristband with each participant's personal smartphone using Bluetooth. Participants could visualize the data collected through their Fitbit wristband in this app, and could sync their data with Fitbit's servers.
OMsignal app. The OMsignal app is a third party app that was used to start and stop the recording of the OMsignal garments, update the firmware of OMsignal garments if necessary, and sync the data to OMsignal's servers.
RealizD app. RealizD is a third party smartphone application (no longer developed) for iOS and Android that records screen-on time and phone pickups. Data reported by RealizD take the form of a timestamp marking the start of a screen-on session and the duration of that session in seconds.
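Given this record format, daily screen-on totals can be derived in a few lines; the tuple layout below is an assumption based on the description above, not the data set's actual schema.

```python
# Sketch: aggregate RealizD records (session start timestamp, duration in
# seconds) into per-day screen-on totals.
from collections import defaultdict
from datetime import datetime

def daily_screen_seconds(records):
    """records: iterable of (iso_start_timestamp, duration_seconds) tuples."""
    totals = defaultdict(float)
    for start, seconds in records:
        day = datetime.fromisoformat(start).date()
        totals[day] += seconds  # attribute the whole session to its start day
    return dict(totals)
```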

Study procedures.
In this section we describe the mechanisms through which participants were deemed eligible and later recruited and enrolled in the study. We also describe the data collection process. All these steps were conducted in accordance with USC's Health Sciences Campus Institutional Review Board (IRB) approval (study ID HS-17-00876). We present an overview of the study in Fig. 1.
Requirements for eligibility. All volunteer participants were recruited from the University of Southern California's (USC) Keck Hospital. To participate, subjects were required to (a) be employed by the hospital and work, on average, at least 20 hours a week, (b) have exclusive access to an internet and Bluetooth-enabled mobile phone running Android 4.3 or higher or iOS 8 or higher for the 10 weeks of participation, (c) have exclusive access to a personal e-mail for the 10 weeks of participation, (d) have access to WiFi at home for the duration of the 10 week study, (e) be proficient in both speaking and reading English, and (f) be capable of wearing wearable sensors in a way that allows data to be collected and transmitted to the research team.
Recruitment. Participants were recruited using multiple methods, including (a) e-mails to employees from leaders within Keck Hospital informing them about the study and how to sign up, (b) attending employee meetings to inform employees about the study, (c) posting flyers in different parts of the hospital where employees would be likely to see them, (d) information tables set up in the cafeteria, where potential participants could learn more about the study and sign up. Participants who had indicated interest but had not completed the sign-up process were texted by one of the principal investigators to support completion of the sign-up process.
After completing a screening questionnaire to check eligibility, potential participants were sent a text message with a link to download the TILES app. The TILES app then walked them through identity verification, informed consent, downloading and syncing the necessary additional apps, and finally signing up for an in-person enrollment session.
Through the above methods, 365 individuals indicated interest in participating by completing a brief screening questionnaire and were found to be eligible. Of these 365 individuals, 212 provided their consent and participated in the study, while 153 did not complete the on-boarding procedures. Participants were recruited in three waves, each with different start and end dates. Table 2 summarizes the dates and number of participants per wave. Over the course of the study, eight participants chose to drop out for various reasons, such as a sensor becoming uncomfortable or no longer wanting to receive daily surveys. The data of these participants have been kept in the dataset.
Participant enrollment session. After providing their consent to participate, interested individuals signed up for a two-hour in-person enrollment session at the hospital through the TILES app. Upon arrival at the enrollment session, each participant was assigned a unique participant ID. During the first hour, participants completed part I of the baseline survey, under the supervision of a trained research team member. During the second hour, participants received their package of wearable sensors and instructions for use. Each participant received three wearable sensors along with a USB charging hub and two micro USB cables for charging, to help participants streamline the process of charging the sensors. The TILES app sent participants links to download all the necessary apps: Fitbit, OMsignal, and RealizD.
Participants were instructed to wear three sensors (a Fitbit Charge 2, an OMsignal garment, and a Unihertz Jelly Pro smartphone) that collected physiological and behavioral data over a 10-week period. We describe the instructions given to participants in the following paragraphs. Table 1 shows a list of the sensing streams and their instructed use. In addition, participants were instructed to fill part II of the baseline survey at home.
Daily surveys. Participants were informed that, from the first day of data collection, they would receive one text message each day they were enrolled in the study. The text message contained a link to the job, health, or personality EMA that they were expected to complete that day. Participants were instructed to complete the survey as soon as safely possible once they received the text message. A second daily EMA, with the psychological flexibility or capital surveys, was received via a push notification on the participant's phone and contained similar instructions.
The EMAs took no more than 15 minutes to complete, and on most days the survey could be completed in around 5 minutes. Participants who worked the night shift received the first EMA (job, health, or personality) at either 6 pm, 12am, or 6am, and participants who worked the day shift received it at either 6am, 12 pm, or 6 pm. Participants were informed that they had 6 hours to complete each survey and that they would receive a reminder notification from the TILES app 30 minutes before the link expired if the survey was not yet complete. The research team also distributed a calendar of the 10-week data collection period with a schedule of when to expect the daily survey. For the second (psychological flexibility or capital) EMAs, night shift participants received the surveys at a random time between 11PM and 6AM, and day shift participants at a random time between 11AM and 6PM; in both cases, participants were given 6 hours to complete the survey once it had been sent and received a reminder via push notification 30 minutes before the survey closed.
Fitbit Charge 2. The first wearable sensor distributed to participants was the Fitbit Charge 2. Participants were asked to wear this sensor on their non-dominant wrist day and night throughout their participation in the study. To set up this sensor, each participant created a Fitbit account, registered the Fitbit Charge 2 as a new device, and synced the Fitbit app to the TILES app. When prompted by the Fitbit app, participants were asked to grant Bluetooth permissions and deny location permissions.
OMsignal garments. Next, participants were given an OMbox and OMsignal garments; men were given five shirts and women were given three bras. The OMbox contains the hardware and software to process, collect, and transmit the information. Participants were asked to charge the sensor prior to each work shift, connect it to the OMsignal garment, wear the garment with the OMbox attached during their work shifts at the hospital, start an OMsignal recording in the OMsignal app at the beginning of each shift, and stop, save, and upload the recording at the end of each shift. During the enrollment session, each participant paired their OMsignal box to their account (created through the TILES app) on the OMsignal app on their mobile phone, and practiced connecting the OMsignal box to the garment and saving an OMsignal recording. At the beginning of the data collection, there was no Android version of the OMsignal app; as a workaround, we provided an iPod Touch with the OMsignal app installed to each participant whose personal smartphone ran Android, so that they could start and stop recordings and upload the data over WiFi. The research team also helped set up location-based reminders on the iOS devices to help participants remember to start and stop OMsignal recordings when arriving at and leaving the hospital.
Unihertz Jelly Pro. Participants were given a Unihertz Jelly Pro phone (running Android 7.0). These were either clipped to participants' clothing near the neckline or placed in a shirt pocket. The cases of the Jelly phones were modified so that the microphone pointed upwards, as described in 32, to better capture the speech of the wearer. Participants were asked to charge the Jelly phone prior to each work shift, unlock it, check that the TILES Audio app 32 was running, and upload the audio data at the end of each work shift by pressing the UPLOAD DATA button in the TILES Audio app. Each Jelly phone was linked to the TILES app on the participant's mobile device by scanning a QR code in the TILES app. When prompted by the TILES Audio app on the Jelly phone, participants were asked to enable permissions (e.g., allowing the TILES Audio app to run in the background, and granting access to photos, which was needed for the proper functioning of the app even though the camera was not used) and to disable location-related services. Additionally, participants were informed of the TILES Audio app's disable feature (to stop recording audio features) and instructed on how to use it.
RealizD app. Lastly, participants downloaded the RealizD app on their smartphone and were informed that this app would track how often the phone was picked up and for how long. Participants did not need to interact with the RealizD app during their participation, since it ran in the background.
Phone permissions. For the RealizD app to work, participants were asked to allow location permissions. Participants were also asked to keep WiFi and Bluetooth turned on on their personal mobile phones throughout their participation in the 10-week data collection period.
Environmental sensors. Finally, participants learned about the environmental sensors that were placed around the hospital and informed that no participant interaction with these sensors was required.
Completing the pre-study survey. Following completion of their enrollment session, participants were emailed a link to complete the pre-study survey, administered on the online survey platform REDCap.
Data collection. The 10-week data collection took place in three different participant waves. The data collection periods and number of participants per wave are shown in Table 2.
Off-boarding session. After the 10-week data collection from sensors and daily surveys ended, participants attended an in-person off-boarding session, which typically lasted between 15 and 20 minutes. During this session, participants exported mobile application data to members of the research team and returned their wearable sensors (except for the Fitbit; see Section Incentives).
Completing the post-study survey. Following completion of their off-boarding session, participants were emailed a link to complete a survey administered on the online survey platform REDCap. This survey was nearly identical to part II of the baseline survey; the only differences were that the demographics survey was removed and a study feedback survey was added. This survey took approximately 30 minutes to complete. This concluded the participant study procedures.
Incentives structure. A novel incentive scheme was developed for the TILES study to encourage compliance. Study participants were awarded monetary incentives (Table 3) and points for study-related activities, proportionate to the time required to complete each activity. These points later translated into monetary awards. The number of points awarded for each activity is summarized in Table 4. A survey was considered completed if the participant went through the entire survey (though they could skip questions). Note that for at least three consecutive days of Fitbit data, the participant received a 2× boost on the points received for wearing the Fitbit. Points were converted to monetary compensation on a weekly cadence, according to the set of thresholds in Table 5. The expected use of OMsignal garments and Jelly phones was 3 days a week for most of the participant population, so points for wearing and syncing these devices were added to the incentive scheme as bonuses.
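The 2× boost for consecutive Fitbit days can be sketched as below. The per-day point value is a placeholder (the actual values are in Table 4), and the exact boost semantics (doubling all days inside a streak of three or more) is our reading of the rule, stated here as an assumption.

```python
# Sketch of the Fitbit streak bonus described above. FITBIT_POINTS_PER_DAY is
# a placeholder value, not the study's actual per-day award (see Table 4).
FITBIT_POINTS_PER_DAY = 5

def fitbit_week_points(worn):
    """worn: list of booleans, one per day (Fitbit data synced that day).

    Days that belong to a streak of at least 3 consecutive worn days
    earn double points; shorter streaks earn the base rate."""
    total, streak = 0, 0
    for w in worn + [False]:          # trailing sentinel flushes the last streak
        if w:
            streak += 1
        else:
            mult = 2 if streak >= 3 else 1
            total += mult * streak * FITBIT_POINTS_PER_DAY
            streak = 0
    return total
```

For example, a full week of wear earns double points every day, while two separate two-day streaks earn only the base rate.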
In addition to the weekly gift cards, points accumulated throughout the duration of the study, and grand prizes were awarded to the top three point earners per wave. Each participant's current point total and ranking were displayed in the TILES app activity feed. Bonus points were awarded for various activities, as summarized in Table 6. The first-, second-, and third-place point earners in each wave were awarded $250, $200, and $100, respectively.
Participants who finished the 10-week data collection period also kept the Fitbit Charge 2 that they wore during the study.

Data acquisition and flow. Figure 3 depicts the architecture for the data collection from sensors. The left column shows all the wearable and environmental sensors. Wearable sensors such as the Fitbit and the OMsignal garments connect to the participants' personal smartphones using Bluetooth, and the data are uploaded to a third-party server over an available wireless internet connection (WiFi or LTE). The Jelly phones (used here as audio-feature recorders) uploaded data to the research server directly, using WiFi only. The Jelly Pro smartphones also broadcast Bluetooth packets programmed with a unique identifier, which were captured by the Owl-in-One hubs installed throughout the hospital. These packets were combined with their received signal strength indicator (RSSI), computed by the Owl-in-Ones.
The Owl-in-Ones also received data from the environmental sensors. Data from both the Minew sensors and the Jelly phones were sent through Keck Hospital's public WiFi network to reelyActive's servers over UDP, from which they were collected using the Pareto API 33 over HTTPS. These data were stored securely in the research server after filtering to retain only the Bluetooth packets generated by our sensors.
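A minimal sketch of this server-side filtering step is shown below. The JSON field names in the sample packets are illustrative assumptions, not the actual reelyActive packet schema; only the keyword list matches the one described in the Owl-in-One preprocessing.

```python
import json

# Sketch of the filtering step described above: keep only Bluetooth packets
# generated by our own sensors, identified by keyword matching on the decoded
# packet. Field names below are illustrative, not the real schema.
KEYWORDS = ("minew", "reelyActive_RA-R436", "jelly")

def filter_packets(jsonl_lines):
    kept = []
    for line in jsonl_lines:
        packet = json.loads(line)
        blob = json.dumps(packet).lower()           # search the whole packet
        if any(k.lower() in blob for k in KEYWORDS):
            kept.append(packet)
    return kept

raw = [
    '{"deviceId": "aa:bb", "name": "Minew_S1_temp", "rssi": -61}',
    '{"deviceId": "cc:dd", "name": "someones-headphones", "rssi": -80}',
    '{"deviceId": "ee:ff", "name": "jelly-P042", "rssi": -55}',
]
```

Applied to the three sample packets, only the Minew and Jelly entries survive the filter.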
Data were also collected directly from the participants' personal smartphones through the TILES app and the RealizD app. The TILES app uploaded data directly to the research server, while the RealizD app uploaded data to the RealizD server, from which the data were later pulled to the research server. The research server (code available at https://github.com/usc-sail/tiles-data-collection-pipeline/) consists of a RESTful API hosting a series of endpoints to collect push-type data streams (e.g., Owl-in-One, TILES app), in addition to a suite of tasks to fetch pull-type data streams (e.g., Fitbit, OMsignal).
(2020) 7:354 | https://doi.org/10.1038/s41597-020-00655-3

Data preprocessing. Survey data.
Once the data collection period ended, the baseline survey and EMAs were scored using R scripts (available at https://git.io/JePgE).
Data for the baseline survey were stored in a table where each column represents a single survey question or metadata variable and each row represents a single participant. In contrast, data for the various EMAs were measured daily and stored in multiple files, where each row contains the answers of a single participant to one survey. The files are split by shift (day/night), date, survey kind, and time of administration. Surveys left unanswered by participants were later added as empty surveys. All of these files were aggregated and curated to obtain three files: one for the first group of EMAs (job, health, and personality, in addition to the base survey), one for the psychological flexibility group, and one for the psychological capital group (curation scripts are available in the folder src/curation/ of the companion code). We have removed most of the raw questions from the EMAs to preserve participants' privacy (except for those included in Table 7), but have kept the aggregated scores. We also list in Table 8 the demographic variables that underwent additional curation to prevent re-identification.
Free-text responses in the EMAs have been manually annotated into categories. Three questions are affected: location when answering, activity engaged in right before answering, and atypical events that happened or were expected to happen. Since some of these categories are subjective, we have between 2 and 5 annotations (one per annotator) for each text response, and each text response can have between 1 and 3 categories associated with it by each annotator. The annotations are then fused, and the top 3 categories appearing at least twice are reported alongside the frequency of the category among the annotations (e.g., if 2 out of 5 annotators use a category, that category has frequency 2/5 = 0.4). We refer the reader to the README file in the dataset for further details on these categories and how they are reported in the data.
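The fusion rule above can be written compactly. This is a sketch of the stated rule (top 3 categories used by at least two annotators, reported with their frequency); the example labels are invented for illustration.

```python
from collections import Counter

# Sketch of the annotation-fusion rule described above.
def fuse_annotations(per_annotator):
    """per_annotator: list of category lists, one entry per annotator
    (each annotator may assign 1-3 categories to a free-text response).

    Returns up to 3 (category, frequency) pairs for categories used by
    at least two annotators, most frequent first."""
    n = len(per_annotator)
    counts = Counter(cat for cats in per_annotator for cat in cats)
    fused = [(cat, c / n) for cat, c in counts.most_common() if c >= 2]
    return fused[:3]

# Example: 5 annotators label a "location when answering" response.
labels = [["nursing station"], ["nursing station", "hallway"],
          ["break room"], ["nursing station"], ["break room"]]
```

Here "nursing station" is reported with frequency 3/5 = 0.6 and "break room" with 2/5 = 0.4, while "hallway" (a single annotator) is dropped.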
Data for the enrollment session baseline survey (part I), take-home baseline survey (part II), and study-completion survey (part II) were each stored in a single file. Variables were renamed to correspond to what each question measured. After the above steps were taken, total scores for each psychological measurement were calculated (scored folder in Table 9).

Fitbit data. Fitbit data retrieved using the Fitbit API contained separate time series for measured heart rate and step count, in addition to a daily summary of physical activity and sleep. The heart rate data are reported at non-uniform intervals of anywhere between approximately 5 s and 15 min, depending on the participant's physical activity. Occasionally, long strings of repeated identical heart rate values (usually 70 bpm) were reported in the raw data, spanning durations typically less than 15 minutes but sometimes up to 20 hours. Because of consumer observations that Fitbit technology sometimes incorrectly reports exactly 70 bpm (see https://community.fitbit.com/t5/Blaze/Blaze-s-Heart-Rate-Stuck-on-70-bpm/td-p/2727738), and because repeated measures of the same heart rate over several minutes are highly unlikely, these long strings were interpreted as artifacts. Thus, sequences of at least 50 repeated identical heart rate values were replaced with NaN (Not a Number, equivalent to missing values). As a result, an average of 0.8% ± 1.7% of each participant's heart rate samples were removed. The step count, daily summary, and sleep data did not contain these long-string artifacts and were therefore not pre-processed.
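The artifact-removal rule described above can be sketched in a few lines. This is an illustrative re-implementation of the stated rule (runs of at least 50 identical values become NaN), not the released preprocessing code.

```python
# Sketch of the heart-rate artifact removal described above: runs of >= 50
# repeated identical values (e.g. the stuck-at-70 bpm artifact) become NaN.
NAN = float("nan")
MIN_RUN = 50

def mask_repeated_runs(hr, min_run=MIN_RUN):
    """Replace values inside runs of >= min_run identical samples with NaN."""
    out = list(hr)
    i = 0
    while i < len(hr):
        j = i
        while j < len(hr) and hr[j] == hr[i]:   # scan to the end of the run
            j += 1
        if j - i >= min_run:
            out[i:j] = [NAN] * (j - i)
        i = j
    return out

series = [72, 75] + [70] * 60 + [68, 71]   # 60 stuck samples at 70 bpm
cleaned = mask_repeated_runs(series)
```

The two genuine samples on either side of the stuck run are preserved; only the 60 repeated values are masked.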
OMsignal data. The data obtained from the OMsignal's API contained no obvious visible artifacts, and so they were not modified during the pre-processing stage.
Owl-in-One data. The Owl-in-One devices captured packets from all Bluetooth devices broadcasting Bluetooth advertisements at Keck Hospital. We filtered all of these packets and stored only those coming from Minew sensors, Jelly phones, and Owl-in-Ones, by matching keywords expected to be found in the packets ("minew", "reelyActive_RA-R436", "jelly"). These were originally stored in JSONL format and later translated to CSV files containing only the relevant information, for easier processing (details below).
RSSI. The RSSI information was pre-processed separately for the Minew sensors, the Jelly phones, and the Owl-in-Ones themselves, and stored in CSV files. All MAC addresses were translated into hospital rooms or locations and formatted into a directory name as follows: [building name]:[floor#]:[wing/area]:[room type][room #]. These files also include relevant IDs (such as the participant ID associated with a Jelly phone), when appropriate. We have hashed the actual directory names so that the hospital's floor plans are not made publicly available: the floor number, unit, and room numbers are kept private. An example is c25c:lounge:2fec.
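For illustration, the anonymization above (hashed location components with the room type left in the clear, as in c25c:lounge:2fec) could look like the sketch below. The choice of SHA-256, the 4-character truncation, and the component grouping are assumptions; the actual hashing used for the release is not specified here.

```python
import hashlib

# Illustrative sketch of the directory anonymization described above: the
# building/floor/wing prefix and the room number are hashed, while the room
# type is kept readable. Hash function and truncation length are assumptions.
def short_hash(text, n=4):
    return hashlib.sha256(text.encode()).hexdigest()[:n]

def anonymize_directory(building, floor, wing, room_type, room_number):
    prefix = short_hash(f"{building}:{floor}:{wing}")
    return f"{prefix}:{room_type}:{short_hash(room_number)}"

anon = anonymize_directory("keck", "2", "west", "lounge", "214")
```

Because the hash is deterministic, the same physical room always maps to the same anonymized directory, so RSSI records remain joinable across files.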

Instance      Action                                                Points
On-boarding   Download and install the TILES app                    50
              Authorize Fitbit access                               50
Weekly        Reach at least 275 weekly points                      20
              Earn more points than the previous week               20
              Wear and sync the OMsignal device at least two days   20
              Wear and sync the Jelly Pro device at least two days  20
Off-boarding  Export RealizD data                                   50

Table 6. Bonus points scheme for study participation stages and milestones. Participants received weekly points by wearing the sensors and answering the surveys. These were converted to weekly monetary rewards (Table 5) and added to a global ranking that awarded prizes at the end of the study.

Environmental data. Bluetooth packets sent by the Minew sensors contained the measured temperature and humidity, light level, or motion information in their payload. Each packet was received by the Owl-in-One devices, time-stamped, and sent to reelyActive's cloud servers, where it was processed and forwarded to the research server. In the research server, the packets were filtered so that only packets containing Minew data were kept as environmental data. All environmental data were further filtered so that the only packets recorded contained identifier values that appeared on the research team's list of identifiers for the installed sensors. Less than 0.1% of the received packets contained corrupted data in the form of invalid source-sensor identifiers, which is consistent with the Bluetooth low-energy (BLE) bit error rate. None of the other packet values were observed to be corrupted, including the measured environmental data, so no additional preprocessing was performed.
Audio. Each file contains raw audio features extracted as a combination of the Interspeech 2013 ComParE Vocalization Challenge feature set 34 and openSMILE's emobase feature set 35. The openSMILE toolkit was applied in this configuration to extract acoustic low-level descriptors (LLDs) of 127 dimensions every 10 ms, using either 25 ms or 60 ms frame sizes. The configuration file used to extract features is provided with the app itself (it is also available at https://git.io/JeiC7). The feature set contains prosodic measures (pitch, intensity), cepstral information (MFCCs 1-12), RASTA-PLP features, spectral features (band energy between 250-650 Hz, centroid of the frequency distribution, spectral rolloffs), and other acoustic characteristics (e.g., LPC 0-7, zero crossing rate).
We did not perform any preprocessing on the raw audio before feature extraction. To extract foreground speech information, we trained a machine learning model to differentiate foreground from background speech on a separate corpus collected in-house with the same audio feature extraction hardware and software, but with the ground truth audio available, and applied it to the TILES-2018 Audio Data Record's raw features 36. The output of this model is temporal foreground predictions in the interval [0, 1], where values close to 1 predict foreground. These temporal foreground predictions are also included in the TILES-2018 Audio Data Record and described in the Data Records section. To extract data with foreground speech, we recommend median-filtering the foreground speech predictions with a window length of 101 samples (corresponding to a 1 s window) and then thresholding at 0.5; a non-zero value then corresponds to a row with detected foreground speech.
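The recommended post-processing can be sketched as follows, using a plain sliding-window median so the example stays dependency-free; in practice a vectorized filter (e.g. from SciPy) would be the natural choice.

```python
from statistics import median

# Sketch of the recommended foreground-speech post-processing: median-filter
# the per-frame predictions with a 101-sample window (~1 s at one prediction
# every 10 ms), then threshold at 0.5. Window edges are simply truncated.
def foreground_mask(predictions, window=101, threshold=0.5):
    half = window // 2
    mask = []
    for i in range(len(predictions)):
        lo, hi = max(0, i - half), min(len(predictions), i + half + 1)
        mask.append(median(predictions[lo:hi]) > threshold)
    return mask

preds = [0.1] * 60 + [0.9] * 200 + [0.1] * 60   # a 2 s burst of foreground speech
mask = foreground_mask(preds)
```

The median filter suppresses isolated spikes shorter than about half a second, so only sustained foreground segments survive the 0.5 threshold.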
For the current data release, we further curated the data by including only a subset of the collected features, omitting filterbank features such as MFCCs and PLPs, as well as LPC features. We believe filterbank features should only be released with some form of information obfuscation or encryption, as they contain potentially recoverable language information and pose privacy concerns. We intend to release privacy-preserving embeddings of the filterbanks at a later stage. For information on the features included in the release, refer to Section Audio.
Inference of days at work. For convenience, we also provide an estimate of working days for all participants. This was obtained using the EMAs, as well as the data collected from the OMsignal garments and a combination of the Jelly phones and Owl-in-One data.
One of the base EMA questions asked where the participant currently was (a value equal to 2 indicated currently at work). All of the participants' responses were saved into a table (each row representing a participant, each column a date). Equivalent tables were saved for days on which participants had recorded data through their OMsignal garments and through the Owl-in-Ones receiving pings from the Jelly phones.
All of this information was combined by performing a logical OR operation across the tables: if any of the sources of information regarded a given day as a day spent at work, that day was inferred as a day at work.
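The OR combination amounts to the following sketch; the per-source tables are represented here as simple date-to-boolean mappings for one participant, which is an illustrative simplification of the released table layout.

```python
# Sketch of the days-at-work inference described above: a day counts as a
# workday if ANY source (EMA location response, OMsignal recording, or
# Jelly/Owl-in-One pings) indicates the participant was at work.
def infer_days_at_work(ema, omsignal, owl):
    """Each argument maps date (yyyy-mm-dd string) -> bool for one participant."""
    dates = set(ema) | set(omsignal) | set(owl)
    return {d: ema.get(d, False) or omsignal.get(d, False) or owl.get(d, False)
            for d in sorted(dates)}

ema      = {"2018-03-05": True,  "2018-03-06": False}
omsignal = {"2018-03-06": True}
owl      = {"2018-03-07": False}
days = infer_days_at_work(ema, omsignal, owl)
```

In this example, March 6 is inferred as a workday from the OMsignal recording alone, even though the EMA response for that day was negative.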

Data records
The TILES-2018 data 37 are split into two data records: the main data record and the audio data record. Each record is described in detail in this section.

TILES-2018 main data record.
The main data record comprises several different data streams: fitbit, realizd, omsignal, owlinone, and surveys (following the names of the folders in the record), plus a metadata folder. Depending on the kind of data collected, each stream may have subfolders; these are described in the following subsections. A summary of the main data record is presented in Table 9. The total size of the record is about 100 GB (compressed), presented in csv.gz files. The files for each participant are named using the participant IDs. Detailed descriptions of all the data sources are included in each folder under a README file.
Participant summary. The participants were 212 hospital employees who volunteered to participate in the study. They enrolled in 1 of 3 waves of participation, each with different start and end dates (Table 2).

Fitbit (fitbit folder). Heart-rate folder. Each file has rows with a timestamp and PPG heart rate values (beats per minute). The PPG heart rate samples are made available by the Fitbit Charge 2 sensors aggregated over intervals of less than 1 min, but the time differences between two consecutive samples are non-uniform.
Sleep-data folder. Each file has rows with the sleepId it corresponds to in sleep-metadata, a timestamp, and the sleep phase with its total duration in seconds. The phase is reported either in classic form (one of asleep, restless, or awake) or in stages form (one of deep, light, rem, or wake). The timestamp marks the beginning of the sleep phase.
Sleep-metadata folder. Each file has rows for each period of sleep, and metadata for that sleep, including beginning and end, nap versus main sleep, type of inferred sleep phases (classic or stages), duration, and various metrics.
Step-count folder. Each file has rows with a timestamp and step count value. In contrast to heart rate values, step count data is sampled with an interval of 1min, and reports the number of steps taken within that minute.
Metadata. Days-at-work folder. Contains a file for all participants. The information is presented in four tables (one per stream, plus aggregated) where each participant corresponds to a column and each row is a date in the format yyyy-mm-dd.
Participant-info folder. Contains a single file with hash-based participant IDs, nursing unit(s) (if available, using the same hashing as for the Owl-in-One directories), and kind of shift (day or night). We have also included the dropout date if it exists.
OMsignal (omsignal folder). Ecg folder. Each file has raw, 15 s-long electrocardiogram (ECG) snippets sampled at 250 Hz and recorded every 5 min. Each file corresponds to a single participant. Each row belongs to a single recording identified by record_id, mapping to the corresponding row in the metadata subfolder.
Features folder. Each file contains rows with a timestamp and a set of physiological and physical activity measurements in real-time (aggregated and saved every second), as well as high-level descriptive features (every 5min). The real-time measurements include breathing rate, breathing depth, intensity, cadence, heart rate, RR intervals (defined as the time elapsed between two successive R waves of the QRS signal on the ECG 38 ), and step count. The high-level descriptive features include statistical aggregations and derived features of real-time measurements over the 5min intervals. Examples include the average and standard deviation of the breathing rates as well as posture.
Metadata folder. Contains one file per participant with metadata information such as dates of usage, usage time in hours, and RR coverage (the ratio of successive R-wave detections over a given time interval) for a given recording.

Table 9. TILES-2018 Main Data Record. There are five main folders containing information for each stream of data, plus a sixth folder containing participant metadata (all presented in alphabetical order). The details of each data stream (including measurements and features) are included in each of the subfolders of the data record as README files.
Owl-in-One (owlinone folder). Jelly folder. The Jelly subfolder is organized with files per participant. Each file contains rows with a timestamp, a participant ID, and the directories of the receiving Owl-in-Ones with corresponding RSSI values.
Minew folder. This folder contains three subfolders:
• Data folder. Contains one file per device, whose timestamped content depends on the type of sensor: light sensor, yes/no light detection; motion sensor, acceleration along the X, Y, and Z axes in m/s²; temperature and humidity sensor, temperature in °C and relative humidity in %.
• Locations folder. Contains a file with X and Y coordinates in m. The origin of the coordinate system (i.e., the point (x, y) = (0, 0)) is arbitrary so that the floor maps of the hospital are not revealed, but the pairwise distances between sensors within the same unit have been preserved.
• Rssi folder. This folder has one file per Minew sensor. Each file contains rows with a timestamp (sorted), the hashed directory of the receiving Owl-in-One, and the corresponding RSSI value.
Owls folder. This folder contains two subfolders:
• Locations folder. Contains a file with X and Y coordinates in m, from the same arbitrary origin as the Minew locations.
• Rssi folder. The Owl-in-One files are organized by Unix-time days (meaning that the cutoff is at midnight UTC). Each file contains all of the signals sent by Owl-in-Ones and received by them. Sending and receiving MAC addresses are included, together with the senders' and receivers' associated directories.

RealizD (realizd folder). Each RealizD file describes the interactions that participants had with their smartphones. These files include a column with timestamps for initial interaction times and a column with times in seconds corresponding to the duration of each interaction.
Surveys. The surveys folder contains two subfolders including raw and scored surveys.
Raw folder. This folder contains a README file with all the questions, and two subfolders:
• EMAs. Contains three files. The first file has the information for the health, personality, and job surveys, plus the base daily survey included with each of these; it records when participants were sent, started, and completed each survey. Except for context questions, we have removed the answers to specific questions to help maintain participants' privacy, but have kept the information on the time until the first click on each page, the last click, and the total time spent on each survey page. The second and third files have the responses for psychological capital and psychological flexibility, respectively; these include the times at which the surveys were completed and the total survey times. Some anonymization details may be found in Table 7; for full details, please refer to the README file.
• Post-study. Contains a single file, named with all the assessed scales. Each row corresponds to a participant's answers to each question.
Scored folder. This folder contains three subfolders:
• Baseline. Contains two files named with the assessed scales in each part of the survey. In each file, rows correspond to participants and columns contain the values of the scored scales. Many of these values have been binned to help protect participants' privacy; for details, we refer the reader to Table 8.
• EMAs. Each file corresponds to a scored item/scale assessed throughout the study. Each row in each file corresponds to a participant's scored answers.
• Post-study. Contains a single file, named with all the assessed scales. Each row corresponds to a participant's answers to each question.

TILES-2018 audio data record. The TILES-2018 Audio Data Record contains two different kinds of files (see Table 10), related to the audio features obtained as per Section Audio. Consent to publish the audio data was given by 186 out of 212 participants, as detailed in Table 11.
Folder structure. Raw-features folder. This folder is organized into subfolders per participant. The name of each data snippet in a participant's subfolder is the Unix time at which the recording started.
Fg-predictions folder. This folder is arranged into subfolders per participant, like the raw-features folder. Each file in a participant's subfolder is a NumPy (.npy) file and corresponds to a file in the raw-features folder. The foreground prediction file stores an array differentiating foreground (FG) and background (BG) speech activity, with values indicating the likelihood of foreground speech in each row of the corresponding file in the raw-features folder; here, FG refers to audio features generated by the participant wearing the audio badge, as opposed to background noise generated by third parties (more details can be found in 36).
Features. The features are computed over overlapping frames of raw audio. Frame lengths are typically 25 ms (and 60 ms for some features), and features are updated and recorded every 10 ms (amounting to an overlap of 60% for 25 ms frames and roughly 80% for 60 ms frames). Finally, some features are computed over windows of several frames.
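The frame/hop arithmetic above can be made explicit with two small helpers; the function names are ours, introduced only for illustration.

```python
# With a hop of 10 ms, a 25 ms frame overlaps its successor by 15 ms (60%),
# and a 60 ms frame by 50 ms (about 83%, i.e. "roughly 80%" in the text).
HOP_MS = 10

def overlap_fraction(frame_ms, hop_ms=HOP_MS):
    """Fraction of a frame shared with the next frame."""
    return (frame_ms - hop_ms) / frame_ms

def n_frames(signal_ms, frame_ms, hop_ms=HOP_MS):
    """Number of full frames that fit in a signal of the given length."""
    return 0 if signal_ms < frame_ms else 1 + (signal_ms - frame_ms) // hop_ms
```

For instance, a 100 ms signal yields eight full 25 ms frames at a 10 ms hop.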

Technical Validation
Sensor validation. In this section, we give an overview of works in the literature, as well as work we have conducted with the sensor data, that validate the results.
OMsignal. We have run two studies to validate the data obtained from the OMsignal garments. In 39, we studied the differences between the heart rate data from the Fitbit and the OMsignal garments in the TILES-2018 dataset, where we observed a higher correlation between the two than reported in previous studies. In 40, we also compared the accuracy of the Fitbit's PPG-based heart rate measurements against the OMsignal garments' ECG-based heart rate measurements, extracted several heart rate variability (HRV) features, and studied correlations with stress and anxiety.
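Two standard HRV features of the kind mentioned above are SDNN and RMSSD, computed from RR intervals; the sketch below is illustrative and does not restate which exact features were used in 40.

```python
from math import sqrt

# Illustrative HRV features computed from RR intervals in milliseconds.
def sdnn(rr):
    """Standard deviation of RR intervals (population SD)."""
    m = sum(rr) / len(rr)
    return sqrt(sum((x - m) ** 2 for x in rr) / len(rr))

def rmssd(rr):
    """Root mean square of successive RR-interval differences."""
    diffs = [b - a for a, b in zip(rr, rr[1:])]
    return sqrt(sum(d * d for d in diffs) / len(diffs))

rr_ms = [812, 790, 834, 801, 820, 795]   # invented RR intervals for illustration
```

SDNN captures overall variability across a recording, while RMSSD emphasizes beat-to-beat changes and is commonly used as a short-term, vagally mediated HRV index.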
Since the discontinuation of the OMsignal garments, several white papers comparing their performance with medical-grade devices have been removed from the public domain. These papers, however, showed similar ECG quality between properly fitted OMsignal garments and medical-grade devices.
Fitbit. There are many studies validating the data from Fitbit devices. For a full list of publications, please refer to https://healthsolutions.fitbit.com/research-library/.
ReelyActive Owl-in-One. The datasheet of the reelyActive Owl-in-One version RA-H443 can be found in 41. In particular, each Owl-in-One uses a Texas Instruments CC2541 Bluetooth chipset, which runs reelyActive firmware without the Bluetooth stack in order to capture Bluetooth packets effectively (this information was obtained through private communication with Jeffrey Dungen, Co-Founder and CEO of reelyActive). We did not run further validation studies ourselves; however, the validation of this sensor needs to be done at two levels: the sensor level and the network level. Professor Kevin Berisso from The University of Memphis performed a saturation study of the Owl-in-Ones working as receivers (currently unpublished work, kindly shared with us). In this experiment, 491 Minew beacons transmitted packets at 1 Hz for approximately 6.5 hours, with a single Owl-in-One receiver within 2 m of all the Minew beacons. No lost packets were reported for 303 of the 491 Minew beacons; the remaining 188 beacons reported lost packets, with a median of 71.5 lost packets per beacon. The analysis at the network level is in the following section (Section Data Integrity).

TILES Audio Recorder. We presented an analysis of the audio recorder (TAR) in 32. TAR extracts the audio features primarily using openSMILE 42, a widely used tool for extracting a wide range of features from audio signals. To test the feature distortion introduced by the recording device, a recording setup was proposed in 32 that allows TAR to record speech amplified through a speaker. In this feature degradation experiment, 1000 gender-balanced utterances from the TIMIT 43 database were randomly sampled and concatenated into one file. The audio file was then played through a loudspeaker, and multiple TARs set at distances of 15 cm, 20 cm, 25 cm, and 30 cm from the speaker extracted the audio features simultaneously. The experiment quantified the feature distortion by measuring the root-mean-squared error (RMSE) and cosine distance between the features extracted from the source file and the recorded features. The results showed that energy-related features were sensitive to recording distance, but pitch and spectral features yielded consistent patterns across recording distances. The results also showed that the errors for pitch, MFCC, and LSP features were reasonably low (e.g., under 10 Hz for pitch), which confirmed the robustness of the features recorded by TAR.

Table 11. Consent given by participants to share the audio data. The consent was given at the beginning of the study through the TILES app. We have only included the data from participants who allowed their data to be shared.
Data integrity. The TILES-2018 data set was collected in a demanding, real-world workplace setting, where participants were asked to use wearable sensors even though their workload and responsibilities did not change. In this scenario, the compliance rates obtained were in line with compliance rates reported for smaller studies, as discussed in 31. Table 12 shows an overview of the compliance rates for each data stream, across all participants. Opt-out reasons included privacy concerns (for the Jelly audio-feature recordings), as well as discomfort or negative skin reactions to the sensors' materials. Figure 4 shows histograms of the average usage hours for all the wearable sensors, across all participants; for the Jelly histogram, we used start and stop times, which could lead to noisy estimates. Table 13 shows a measure of sensor compliance in two-week intervals, where we see a slight decrease in compliance as the study weeks passed.
Sensor data. OMsignal. Table 14 shows the length of the recording sessions across waves, where each recording session corresponds to the data available for one day of wearing the garment. In all waves, over 88% of the recordings are longer than 4 hours. Table 15 shows the integrity of the collected data, measured as the RR coverage of the ECG signal: about 60% of recordings have an RR coverage of at least 85%. Both tables show that usage and quality remain stable across waves under the defined usability and quality criteria. Table 16 also reports the total number of hours recorded with the OMsignal devices and the number of participants from whom these recordings were obtained.
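RR coverage can be sketched as the fraction of a recording session accounted for by valid RR intervals. The 300-2000 ms plausibility bounds below are illustrative assumptions for filtering artifacts, not the paper's exact validity criterion.

```python
import numpy as np

def rr_coverage(rr_intervals_ms, session_length_ms):
    """Fraction of a recording session covered by valid RR intervals.

    rr_intervals_ms: candidate RR interval lengths in milliseconds; only
    physiologically plausible intervals (assumed 300-2000 ms) are kept.
    """
    rr = np.asarray(rr_intervals_ms, float)
    valid = rr[(rr >= 300) & (rr <= 2000)]
    return float(valid.sum() / session_length_ms)

# Toy session: 10 s long, 9 s worth of valid beats plus one artifact.
rr = [800] * 10 + [80] + [1000]        # 80 ms is an implausible interval
print(rr_coverage(rr, 10_000))         # 9000/10000 = 0.9
```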
Fitbit. Table 14 shows the daily number of hours the Fitbit device was used, for all participants across waves. There is a slight downward trend across waves, with over 85% of recordings in Wave 1 exceeding 8 hours, compared to close to 79% in Wave 3. The median amount of data discarded by the Fitbit preprocessing steps described in the Data Preprocessing section was 0.3% (approximately 1760 of 586681 total samples), with all but one participant having less than 7% of their data excluded; the remaining participant had approximately 20% of their data removed during preprocessing.
Owl-in-one. Figure 5 shows data integrity plots for the Owl-in-One data. Figure 5(a) shows a typical Owl-in-One layout in a nursing unit, where an Owl-in-One sends a packet that is decoded by several Owl-in-Ones in its vicinity; note that not all owls within reach are able to decode the packet. Figure 5(b) (top) shows the total number of packets stored on the server, as well as the total number of decodings (on average, 7 decodings per packet sent). This figure also reveals some network failures during the study (dips in the discontinuous blue line). Figure 5(b) (bottom) shows the proportion of daily corrupted packets and the proportion of daily corrupted decodings over the length of the study. On a daily average, 5.99% of packets had corrupted sender information, while 8.49% of decodings had corrupted receiver information.
Audio. Table 16 shows the total number of hours recorded through the Jelly Pro phones and the number of participants from whom these recordings were obtained. Computing the data integrity for the audio recordings is a very challenging problem, since the length of the recordings is variable and depends on a 2-tier sampling procedure: uniform sampling in time over windows, and voice activity detection (VAD) within these sampling windows. If we assume that the VAD is triggered throughout all windows for a given person in a quiet room where only that person is speaking, the expected number of hours recorded is 120 hours over the length of the study (assuming three 12-hour-long shifts per week during 10 weeks).
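The 120-hour upper bound can be reproduced with simple arithmetic. The `duty_cycle` factor below is our assumption, chosen so that the windowed-sampling-plus-VAD procedure over three weekly 12-hour shifts yields the quoted figure; the actual fraction of a shift captured depends on the sampling schedule and VAD behavior.

```python
# Expected upper bound on recorded audio under stated assumptions:
# three 12-hour shifts per week over 10 weeks, with the 2-tier sampling
# (windowed sampling + VAD) capturing a fraction `duty_cycle` of each shift.
# duty_cycle = 1/3 is an assumed value that reproduces the 120 h figure.
shifts_per_week = 3
hours_per_shift = 12
weeks = 10
duty_cycle = 1 / 3

expected_hours = shifts_per_week * hours_per_shift * weeks * duty_cycle
print(expected_hours)  # 120.0
```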
Survey data. Baseline and post-study. Table 17 shows Cronbach's α for the baseline and post-study surveys, together with a validation α value from the literature. Most of the assessments had an average α over 0.75, except for the agreeableness and alcohol use scales. Compared to the reliabilities reported in validation studies, we obtained mixed results: in some cases our computed reliabilities are higher, while in others they are lower. However, reliabilities increased for all assessed scales in the post-study survey compared to the baseline survey.
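For readers computing reliabilities on these scales, Cronbach's α follows the standard formula α = k/(k-1) · (1 - Σ item variances / variance of total score). A minimal implementation, not the authors' analysis code:

```python
import numpy as np

def cronbach_alpha(items):
    """Cronbach's alpha for an (n_respondents, n_items) score matrix."""
    X = np.asarray(items, float)
    k = X.shape[1]
    item_vars = X.var(axis=0, ddof=1)       # per-item sample variances
    total_var = X.sum(axis=1).var(ddof=1)   # variance of each respondent's total
    return float(k / (k - 1) * (1 - item_vars.sum() / total_var))

# Toy example: 4 respondents answering a 3-item scale.
scores = [[3, 4, 3],
          [2, 2, 3],
          [5, 5, 4],
          [1, 2, 1]]
print(round(cronbach_alpha(scores), 3))  # 0.945
```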
EMAs. Figure 6 reports Cronbach's α for each construct of each administered EMA. Several of the assessed constructs show α > 0.7 for most of the administration period (challenge stressors, hindrance stressors, support, psychological capital, engagement, individual task proficiency, psychological flexibility, negative affect, and positive affect). Table 18 shows the percentage of participants who opted to participate in each survey type, the average percentage of surveys per type started by participants, and the percentage of questions answered in started surveys. The table underscores that once a participant elected to start a survey, nearly every question was answered. Figure 7 depicts the cumulative number of participants who started answering at least some percentage of all surveys of each type administered throughout the study. The histogram for each survey type illustrates that the majority of participants started responding to at least 75% of each type of survey. Figure 8 shows the median response times for the portion of the EMAs asked on a daily basis; these medians are taken over the average time spent by each participant on a given scale. We observe a downward trend in the median time spent per question, which we hypothesize could be associated with participants learning the questions over time, as well as with careless responding as questions are repeated. We have run unpublished analyses to detect careless responding using response times and other response-level features; however, they are not conclusive, since we observe a continuous spectrum of careful/careless responding among participants.
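The response-time statistic reported in Fig. 8 (the median across participants of each participant's average time per question on a scale) can be sketched as a two-step aggregation. Column names below are illustrative, not the record's actual schema.

```python
import pandas as pd

# Toy response log: per-question response times (seconds) for one scale.
log = pd.DataFrame({
    "participant": ["p1", "p1", "p2", "p2", "p3"],
    "scale": ["affect"] * 5,
    "response_time_s": [4.0, 6.0, 3.0, 5.0, 10.0],
})

# Step 1: average response time per participant for the scale.
per_participant = log.groupby("participant")["response_time_s"].mean()
# Step 2: median of those averages across participants.
print(per_participant.median())  # median of [5.0, 4.0, 10.0] = 5.0
```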
Works using the data set. We have published several papers using this data, where we discuss various data processing challenges and opportunities.
In 44 , we proposed a technique for clustering and discovering patterns in proximity-based location data of hospital workers, by extracting motifs (repeating patterns) from the lengths of stay in each location in the proximity-based time series of locations. Using data from this data set, including locations of over 200 participants and over 240 proximity sensors during the ten weeks of the study, we discovered that rooms of similar types (e.g., patient rooms) in the hospital exhibited a unique motif signature. The results suggest that motif features could be used in place of knowing the room types in advance, thereby simplifying very large-scale data collection.
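The idea of a motif signature over lengths of stay can be illustrated with a minimal sketch: discretize stay durations into symbols and count repeating n-grams. This is a generic illustration of motif counting, not the algorithm published in 44 ; the bin thresholds and symbols are assumptions.

```python
from collections import Counter

def stay_motifs(stays, n=3, bins=(5, 30)):
    """Count length-n motifs in a sequence of stay durations (minutes).

    Each stay is discretized into short/medium/long ("S"/"M"/"L") using the
    assumed bin edges, and motifs are overlapping n-grams of these symbols.
    """
    def symbol(minutes):
        if minutes < bins[0]:
            return "S"
        return "M" if minutes < bins[1] else "L"
    symbols = [symbol(s) for s in stays]
    grams = ["".join(symbols[i:i + n]) for i in range(len(symbols) - n + 1)]
    return Counter(grams)

# Toy trace of stay lengths (minutes) observed at one proximity sensor.
trace = [2, 10, 45, 3, 12, 50, 2, 9, 40]
print(stay_motifs(trace).most_common(1))  # the dominant motif for this room
```

Rooms of the same type would be expected to share their dominant motifs, which is the intuition behind using motif signatures in place of known room types.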
A different approach involves using these proximity-based measurements to localize hospital workers. In our Main Data Record, we provide proximity-based information for 16 different high-traffic indoor settings. We use this information in 45 to propose a novel indoor localization algorithm based on tools from the metric learning community.
In 46 , we explore the usage of physiological time series collected from the Fitbit Charge 2 wristband. We particularly study how to obtain optimal-length motifs from heart rate time series to capture intuitive physiological patterns of workers in their daily lives. The results revealed that regular routine patterns, such as sleep, can be reflected through heart rate time series data.

Table 12 footnote: † RealizD data only show the times when phone interaction occurs; it is therefore not possible to differentiate between periods with no interaction and periods when the application was not working.

Fig. 5 caption: (a) A single packet is transmitted (purple square) and received and decoded by several other owls (blue circles; the numbers correspond to the RSSI values of the decodings at the receivers). These packets contain the sender's and receivers' directories; some were processed and stored by our pipeline with corrupted information due to transmission errors. (b) (top) Total number of distinct packets sent daily in the full Owl-in-One network (purple) and total number of decodings by all receiving owls (blue); (bottom) proportion of packets whose sender directory was corrupted and therefore lost among all packets sent (purple), and proportion of decodings whose receiver directory was corrupted and lost among all receiver decodings (blue).
As emphasized in 31 , one major challenge in conducting studies in naturalistic settings relates to the quality of the data being collected with wearables. This requires the development of sensor quality metrics, missing data imputation methods, as well as quality-aware and artifact-robust parameters. To this end, we developed several such measures for breathing and heart rate time series [47][48][49] .
The data we are publishing with the Audio Data Record poses a new set of challenges not previously described in the literature, related to privacy-aware audio processing in a real-world setting with sensitive information. Since the Jelly phones ("audio badges") recorded both foreground (egocentric) and background audio of each participant's environment, in 36 we trained a machine learning model to distinguish foreground from background audio content on a different, in-house corpus that included raw audio time series alongside the same set of extracted features. We applied it to the Audio Data Record to generate foreground vs. background predictions that allow us to retrieve the egocentric information of a participant.
Finally, in 50 we took a more global approach and used several of the data streams collected through sensors to infer self-assessments of participants, i.e., scored surveys.

Usage Notes
Data access. Due to privacy concerns, we require a signed Data Usage Agreement (DUA) to grant access to all data records. A user signing this DUA agrees to the following: (1) not to de-anonymize the data, (2) not to try to identify language content, and (3) not to share the data record with anyone who has not signed a DUA. The document and the form to submit the signed document are available online at http://tiles-data.isi.edu. Once validated, the user will receive an email with information on how to download each data record.

Main record. The main data record has a total size of about 100 GB. To be mindful of the use of resources, we ask users to download this record only once.
Audio record. Due to the size of this data record (about 305 GB), we provide two subsets of it for convenience. The first subset is about 100 GB and contains only data for which foreground speech has been detected; we believe most users will be interested in this subset. The second subset, of about 10 GB, is from a single user and contains all features extracted from the raw audio, unfiltered, i.e., including segments where no foreground speech was detected. The complete data record is only accessible upon request, after testing has been performed on the second subset described above, to be mindful of bandwidth usage. As with the main record, we ask users to download each data record only once.

Reading the files. We share all the files as compressed comma-separated values (CSV) files (.csv.gz), except for the foreground predictions, which are stored as NumPy (.npy) files. We recommend reading the compressed files directly; this can be easily done in Python and R (examples follow in Box 1). Note that .csv.gz files can also be opened directly in LibreOffice Calc (a free software alternative to Microsoft Excel: https://libreoffice.org) without decompression. If you need to decompress the files, we recommend the command-line utility gzip.
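The foreground predictions are plain NumPy arrays read with `np.load`. A minimal round-trip sketch (filenames and array contents here are placeholders, not the record's actual files):

```python
import os
import tempfile

import numpy as np

# Save a toy prediction array, then load it back the way users would load
# the record's .npy files. The filename below is a placeholder.
toy = np.array([0, 1, 1, 0], dtype=np.int8)   # 1 = foreground speech detected
path = os.path.join(tempfile.mkdtemp(), "foreground.npy")
np.save(path, toy)

preds = np.load(path)
print(preds.tolist())  # [0, 1, 1, 0]
```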
Data records: Use cases. This data set was initially developed to model and predict self-reported mental states from wearable sensors. However, we envision additional uses for it, and hope that researchers will find novel applications for various aspects of the data.

Box 1 Read data files in Python and R.
# In Python, using pandas
import pandas as pd
df = pd.read_csv("file.csv.gz")

# In R, using data.table
library(data.table)
df <- fread("file.csv.gz")

# or using the tidyverse
library(tidyverse)
df <- read_csv("file.csv.gz")

Multimodal signal processing. This data set proposes several problems in core (multimodal) signal processing. There are several opportunities for data quality enrichment, including denoising of the ECG snippets and Fitbit heart rate data, denoising of proximity information for localization, time alignment and synchronization of events from multimodal streams, and voice activity detection from breathing information. There are also new opportunities, from a signal processing standpoint, in the processing of longitudinal survey information.
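Time alignment of multimodal streams can be sketched with a nearest-timestamp join. The timestamps, sampling rates, and column names below are illustrative, not the record's actual schema.

```python
import pandas as pd

# Two toy streams sampled at different rates: a heart-rate stream
# (Fitbit-like) and a sparser location stream (Owl-in-One-like).
hr = pd.DataFrame({
    "time": pd.to_datetime(["2018-03-05 09:00:00", "2018-03-05 09:00:05",
                            "2018-03-05 09:00:10"]),
    "heart_rate": [72, 75, 74],
})
loc = pd.DataFrame({
    "time": pd.to_datetime(["2018-03-05 09:00:02", "2018-03-05 09:00:09"]),
    "room": ["nursing_station", "patient_room"],
})

# Align each heart-rate sample with the most recent location fix
# (both frames must be sorted by the join key).
aligned = pd.merge_asof(hr, loc, on="time", direction="backward")
print(aligned["room"].tolist())  # first sample has no earlier fix -> NaN
```

`merge_asof` with `direction="backward"` is one simple alignment strategy; a `tolerance` argument can additionally drop matches that are too far apart in time.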
Statistical modeling and machine intelligence. This data set presents many opportunities for machine learning. The data set was initially designed to predict self-assessments of participants from sensor data. However, there are opportunities to explore the behavioral dynamics of participants throughout time, including through unsupervised learning, behavioral time series forecasting, and causal inference. Other opportunities involve spatio-temporal modeling of behavior, individualized and group-level behavioral modeling, and multitask learning of behavior patterns.
Privacy. We envision several uses of this data set for privacy researchers. Given the total number of hours of physiologic and behavioral data, an obvious use case is exploring the fingerprinting of individuals from physiologic data and behavioral patterns. We are, however, strongly against using the data set for re-identification of participants.

Table 17. Cronbach's α for scales assessed in the baseline and post-study surveys. α was not calculated for GATS or the PSQI, as internal consistency is not necessary for reliability on these measures. N.A.: not assessed.

Fig. 8 Median EMA response times across participants. Each line shows the median of the average response times per question in a given scale across all participants. We include the baseline EMA questions present in the job, health, and personality surveys, which were asked on a daily basis.

Table 18. Survey participation and compliance rates over started surveys. Compliance is measured as the percentage of answered questions in all surveys that were started. We also include the number of started surveys.