Longitudinal dataset of human-building interactions in U.S. offices

Adaptive interactions between building occupants and their surrounding environments affect both energy use and environmental quality, as demonstrated by a large body of modeling research that quantifies the impacts of occupant behavior on building operations. Yet, available occupant field data are insufficient to explore the mechanisms that drive this interaction. This paper introduces data from a one year study of 24 U.S. office occupants that recorded a comprehensive set of possible exogenous and endogenous drivers of personal comfort and behavior over time. The longitudinal data collection protocol merges individual thermal comfort, preference, and behavior information from online daily surveys with datalogger readings of occupants’ local thermal environments and control states, yielding 2503 survey responses alongside tens of thousands of concurrent behavior and environment measurements. These data have been used to uncover links between the built environment, personal variables, and adaptive actions, and the data contribute to international research collaborations focused on understanding the human-building interaction.

population and individual-level thermal preferences. New longitudinal datasets that address such shortcomings are needed to improve the understanding of interactions between occupants and the built environment.
This paper introduces longitudinal data from a one-year study of occupant thermal comfort and several related behavioral adaptations in an air-conditioned office setting in the U.S. Offices were chosen as the context for the research because of their prevalence in the existing literature on occupant behavior data collection and because they are responsible for the most energy use of any single commercial building type in the U.S. 18 . The study develops a longitudinal protocol based in part on the data collection and analysis approaches of previous long-term occupant studies, adding new items that reflect the theoretical underpinnings of Humphrey's adaptive principle 19 and related theories of personal control from the psychology literature 20 . The primary objective of the data collection approach was to record a comprehensive range of exogenous and endogenous factors that may drive personal comfort and behavior outcomes over time, where the latter have not been explored in previous longitudinal studies of occupant behavior despite their suggested importance in previous behavior research [21][22][23][24] .
The longitudinal dataset includes a wide range of variables covering four of the six primary building performance measurement categories identified in the ontology developed by Mahdavi and Taheri 25 , as shown in Fig. 1. For the category of inhabitants, the data include information on occupant position in the building, available control actions, personal attributes and attitudes collected from online surveys. Datalogger measurements of indoor conditions span the hygro-thermal, visual and indoor air quality sub-categories, while concurrent measurements of outdoor conditions include the hygro-thermal sub-category only. Several occupant control actions were also measured on a longitudinal basis, using dataloggers where possible and online surveys otherwise; all data streams were merged based on time stamp as described further in the Methods and Data Records sections.

Methods
Longitudinal data on building occupant behavior, comfort, and environmental conditions were collected between July 2012 and August 2013 at the Friends Center office building in Center City Philadelphia, Pennsylvania, United States. Data collection proceeded in three stages, described further below: (1) semi-structured interviews, (2) site selection and subject recruitment for the longitudinal study, and (3) longitudinal survey and datalogger measurements. Drexel University Institutional Review Board (IRB) approval was received before collecting any of the data reported in this study (protocol #1204001220), and informed consent was obtained from all subjects.
Semi-structured interviews. Semi-structured interviews identify aspects of behavior that are not yet well known or understood and provide a rich qualitative context for developing and interpreting responses from structured survey instruments 26 . To inform the longitudinal portion of the current study, 32 interviews about thermal comfort and related behaviors were first conducted with office occupants from 7 air-conditioned buildings around the Philadelphia region. The building sample ranged from aging to recently renovated; small to large size; and suburban to urban. Interview responses were recorded and scored using the approach reported in Langevin et al. 26 .
Site selection and subject recruitment for the longitudinal study. The Friends Center office building in downtown Philadelphia was chosen for the full longitudinal study following its inclusion in the semi-structured interviews. At 5200 m 2 and four floors (one sub-grade), the Friends Center represents a medium-sized office for the region. The brick-faced building, which was constructed in 1972, has a Building Automation System (BAS) following a renovation to LEED Platinum in 2009, and offers many adaptive opportunities to its occupants. Moreover, the building's interior environment was reported to vary noticeably across the seasons and occasionally within a day, according to the occupant interviews.
Work spaces in the Friends Center include single and shared private offices (enclosed by full height, opaque walls with interior windows), partially open offices (cubicles defined by partial height, opaque partitions), and fully open offices; offices are located at either the core or perimeter of the building floor plan. Perimeter offices are each in close proximity to an exterior window, and the majority of perimeter offices face east, with smaller numbers facing north and west. Spectrally selective, high performance exterior glazing surfaces are offset by a few feet from the brick surface of the building facade, but are otherwise unshaded.
Subject recruitment was initiated through an e-mail message sent to all employees in the Friends Center by its Executive Director. The message outlined the general research approach, the project time period, and remuneration for participants. Included in the e-mail was a link to an online background survey that occupants could fill out to indicate their interest in participating in the full study. The survey was administered via SurveyGizmo and participation was compensated with a $5 gift card. The following question areas were included: (a) demographic information, (b) office characteristics, (c) thermal comfort and preferences, (d) control options, (e) personal values, and f) typical work schedule (arrival, lunch, departure times). A full background survey instrument is available for testing 27 .
The background survey generated a total of 45 occupant responses; from this initial sample, a final sample of 24 occupants was selected for participation in the full longitudinal study using a non-proportionate quota sampling strategy 28 . The strategy looked to achieve at least 1/3 of the final sample (or 8 occupants) in each of a series of key groupings, defined in order of importance by: gender; office type and location; behavioral control options; and comfort/perceived control satisfaction level. Table 1 presents the final occupant sample in terms of these categories.

Longitudinal survey measurements.
Over the course of the following year, the final occupant sample participated in a series of subjective and objective measurements of thermal comfort, adaptive behavior, and related items. These measurements were carried via longitudinal online surveys, as well as through parallel datalogger and BAS measurements of the local environment and behavioral actions; the surveying approach is first outlined here.
Longitudinal surveys were distributed online via SurveyGizmo three times daily (morning, mid-day, afternoon) for a two week period in each of the four seasons, with the time of each survey distribution tailored to work schedules that occupants reported on the background survey. At the time of data collection, SurveyGizmo did not have a native function for repeated surveying of the same subjects; accordingly, the Gmail plugin Boomerang was used to send participants survey links at pre-specified daily times. To incentivize participation, occupants were offered a $5 gift card for every full day of surveying completed, in addition to the $5 previously received for participation in the background survey.
The daily survey instruments included questions about recent work flow, current thermal comfort and sensation, current and recent clothing levels, recent activity, and recent control use. Figure 2 diagrams the full progression of daily survey questions, indicating both the general question areas and associated areas of sub-questioning. A daily survey instrument is available for testing 29 .
Many of the daily survey questions shown in Fig. 2 follow a "show when" logic, where their appearance is conditional on a response to a previous question. The use of "show when" logic minimizes unnecessary questioning, keeping survey completion times down and combating possible survey fatigue. Fatigue was also countered through the use of several image response options, including a series of clothing checklist icons, for example.
Once the daily surveys had been filled out for the full two week period, each occupant completed a final survey that asked about the whole of their two week experience. A two week retrospective survey instrument is available for testing 30 . The final survey asked occupants to log any times when they were out of the office for a significant www.nature.com/scientificdata www.nature.com/scientificdata/ time (e.g., a whole or half day), also querying general comfort and the frequency of behaviors during the previous two week period. The survey also asked occupants to assess the accessibility, time expense, and general clarity of the daily survey instruments. Pairing this feedback with a close examination of the daily survey responses during the given two week period helped inform small revisions to the daily surveys between each seasonal run. Revisions included streamlining the general flow of questions and the addition of short follow-up prompts (for example, on the bodily location of reported thermal discomfort). Revised daily survey instruments were tested through two-day distributions in the time between the full two week seasonal runs to ensure the readiness of these revised instruments for the next two-week implementation.
Overall, the daily surveys generated consistently high response rates (>=85% in each season), yielding a total of 2503 responses from the 24 subjects. The surveys took a median time of just 2.3 minutes to complete and were positively evaluated by occupants. Table 2 summarizes key statistics regarding the performance of the daily survey instruments during each two week seasonal run as well as during the two day test runs in between. environment and behavior datalogger and building automation system (BaS) measurements. Alongside the longitudinal survey data collection, continuous measurements were made of the weather, the local indoor environment of each occupant's office, and a subset of the occupants' behavioral actions. Table 3 summarizes each of these measurements and their observed ranges during the two week surveying periods and across the entire year of the project.  www.nature.com/scientificdata www.nature.com/scientificdata/ Historical hourly weather data (temperature, humidity, and air velocity) were downloaded on a monthly basis from the Weather Analytics service (now Athenium Analytics) for the "Center_City_Philadelphia_580144" grid point.
Regarding interior conditions, ambient temperature was measured for every occupant using the nearest thermostat reading logged by the BAS system at 15 minute intervals. Ambient temperature and relative humidity were also logged at 5 minute intervals for 11 occupants using HOBO U12-013 base sensors (accuracy: +/−0.35 °C temperature, +/−2.5% humidity), as was globe temperature in eight cases (TMC6-HE attachment covered with 40 mm black matte ping pong ball; +/−0.25 °C accuracy), and air velocity in four cases (T-DCI-F900-S-O attachment; +/−0.05 m/s accuracy). Globe temperature sensors were placed on the perimeter and core areas of each floor space, and one air velocity sensor was placed on each floor. Additionally, illuminance and CO 2 concentration were logged for certain occupants across floors using the HOBO U12-012 sensor (accuracy dependent on wavelength 31 ) and the TelAir 7001 CO 2 sensor (accuracy: greater of +/−50 ppm or 5% of reading), respectively. Sensor calibration was not performed after initial deployment. Further information on the location of each sensor including floor number, perimeter vs. core position within each floor, and office type are provided in the data record.
Datalogger measurements of occupant behavior recorded window adjustments and the use of personal fans and heaters. For windows, HOBO UX-90 state loggers were attached to 10 different windows that occupants in the sample reported using in the background survey. The UX-90 loggers record the time stamp of any state change during the measurement period. For fans and heaters, WattsUp? Pro power meters logged wattage at occupant-designated fan/heater outlets at 15 minute intervals. The WattsUp? meters were installed at the desks of all 9 occupants who reported using a heater or fan at some point throughout the year on the background survey.
Together with the daily survey responses, the above environment and behavior information comprises six longitudinal data streams (daily surveys, outdoor weather, BAS thermostats, local environment, personal fans and heaters, and windows). Figure 3 provides an overview of the collection and integration of these various data streams. The Data Records section provides additional detail on how each data stream was post-processed and matched with the other data streams.

Data records
A series of MATLAB 32 scripts were used to process and integrate the environmental and behavior data streams listed in Table 3 as well as all survey data. The general post-processing and integration procedure used by these scripts is as follows: 1. Download the raw data at intervals specified in Table 3 (CSV format); survey data were downloaded at the end of the initial background survey period, and at the end of each two day test period and two week surveying period. 2. Remove any missing readings in the data or data points that are determined to represent faulty measurements based on observation of the building at the time of data download (e.g., measurement device was disturbed during measurement period or atypical operational patterns were observed during the measurement period). 3. Organize raw CSV data readings into larger CSV files along the following dimensions: • Measurement type (BAS, HOBO, weather, heater/fan, window, daily survey).
• Occupant number (1 to 24).  Table 3. Summary of environment and behavior measurements across the year. a Restricted to the occupied hours of 7 AM-6 PM. b Read every 1 week during survey times, and every 2 weeks otherwise. c Based on observation of the WattsUp meter readings during the one month test period before the formal start of the study, fan readings between 3 and 125 Watts were marked as "On, " heater readings between 50 and 1500 Watts were marked as "On, " readings of 0 Watts were marked as "Off " for both devices, and any other reading was marked "Faulty". The dataset that was produced by these steps is made available under an Creative Commons Attribution 4.0 International license through Zenodo 34 . All data are stored in a compressed .txt file ("LANGEVIN_DATA.txt"), which is accompanied by a data legend file in XLSX format ("langevincodebook.xlsx"). The data legend file characterizes each variable column found in the .txt file by type (environment, behavior, etc.), name, data collection source (surveys, dataloggers, etc.), value type (discrete, continuous), description, units, and response levels (for discrete variables).

technical Validation
Several measures were taken to ensure the validity of the collected data, following data collection guidance included in the final report for International Energy Agency Annex 66: Definition and Simulation of Occupant Behavior in Buildings 35 . These measures are outlined below. Survey preparation phase. Longitudinal survey design was informed by preceding semi-structured interviews with office occupants along with background surveys of each occupant's demographic characteristics, thermal comfort, and adaptive behavioral tendencies. Information collected during this preparation phase informed the design of longitudinal survey instruments, ensuring that the questions included on these surveys would be correctly interpreted by occupants and relevant to their daily experience of the office environment. encourage high response rates. The inclusion of financial incentives for participants and use of "show-when" logic to minimize survey fatigue yielded a high overall response rate of 86% across the longitudinal surveys (Table 2), ensuring the survey data are not biased towards respondents with higher participation rates. pilot studies. Three small pilot studies were conducted during the course of the full longitudinal study: one in the month preceding the start of the study, during which time all required dataloggers were installed and collecting test data, and two in the periods between the summer and fall and fall and winter daily surveys, during which times improved survey deployments were tested alongside logger data collection. The pilot studies verified that all data collection devices and the online survey platform were functioning properly before each of the two week daily surveying periods was conducted.
Quality control. Datalogger readings were downloaded regularly (at maximum every 2 weeks) to ensure their proper functionality and, in the case of environmental sensors, to verify that readings had not drifted significantly. Between these download times, sensor operation was visually inspected every few weekdays to ensure the devices were actively recording data and had not been moved or dislodged by occupants.
redundancy. In the case of outdoor and indoor temperature, parallel data streams were compared to address internal validity. For outdoor temperature, hourly data points provided by Weather Analytics were spot-checked against concurrent measurements from the nearest weather station (KPHL) on Weather Underground. For www.nature.com/scientificdata www.nature.com/scientificdata/ indoor temperature, 5 minute interval HOBO temperature measurements were compared against concurrent BAS temperature measurements from the nearest thermostat. Here, summary statistics for whole year BAS temperature readings are within the margin of error of the HOBO temperature readings, indicating a close correspondence between the two measurements. Additionally, state logger measurements of behavior (heaters, fans, windows) were verified against corresponding daily survey responses that reported recent behavioral actions.

Fig. 4
Using daily survey data collected across two week periods in each season of the study and concurrent datalogger measurements of the thermal environment and behavior, relationships between indoor operative temperature and the probability of behavior states are plotted. Logistic regression curves are shown for the total sample of occupants with each behavior available and for each sample grouped by cooler and warmer acceptability range. Gray dots are observed proportions of behavior use in the study with 95% confidence intervals. Increasing probabilities of high clothing insulation, fans turned on, and open windows are observed with increasing operative temperature, while increasing probabilities of heavier clothing and heaters turned on are observed with decreasing operative temperature. Occupants with warmer acceptability ranges are significantly less likely to engage in behaviors that address warm discomfort, while occupants with cooler acceptability ranges are significantly less likely to engage in behaviors that address cool discomfort. Adapted with permission from 38 ; see 38 for details on clothing and acceptability range definitions.

Fig. 5
Using daily survey data collected across two week periods in each season of the study, the evolution of mean thermal sensation, acceptability, and comfort responses is plotted. Thermal sensation generally remains within the acceptable ranges, with associated high ratings of general thermal comfort observed across each two week period. Comparing acceptability ranges between the summer/spring and fall/winter periods, a substantial shift in this range is observed in the direction of warmer thermal sensations. Adapted with permission from 38 . www.nature.com/scientificdata www.nature.com/scientificdata/ Comparison against expected conditions. One year inter-quartile ranges for each of the environmental variables in Table 3 are consistent with those expected of an air-conditioned U.S. office building with occupants that report high levels of general environmental satisfaction, specifically:

Usage Notes
The longitudinal dataset described herein was first posted on OpenEI in July of 2015; in the years since, several researchers have contacted the author for clarifications on certain aspects of the data. Based on these inquiries, notes to guide further use and interpretation of the data are summarized below.
• When reading in the main .txt file to a program such as R or Python for analysis, variables of interest should be indexed by the associated column number listed in the variable codebook file ("langevincodebook.xlsx"). • NaNs for certain variables do not generally imply missing data; rather, they indicate that the variable was not being measured for a given occupant at a given time stamp. For example, per Table 2 only four occupants had a direct illuminance measurement at their office location; for the occupants without this direct measurement, the illuminance variable will show NaNs across all time steps. Similarly, variables such as current and acceptable thermal sensation will show NaNs for all time steps that fall outside of the two week daily surveying windows. • The variable "Occupant Number" refers to an occupant ID (e.g., a value between 1 and the 24 occupants who participated in the study); it does not refer to an occupancy (presence/absence) measurement. www.nature.com/scientificdata www.nature.com/scientificdata/ daily mid-day and afternoon surveys, as well as periods of prolonged absence reported on the retrospective surveys conducted after each two week period. Occupancy state was not continuously measured with dataloggers in this study.
Example applications of these longitudinal data in analysis and modeling efforts are found in both the author's own work and those of other researchers pursuing research on occupant comfort and behavior. Regarding the author's work, these data were used in 38 to develop insights on the drivers of behavior (Fig. 4), long-term trends in thermal comfort and acceptability (Fig. 5), and the sequencing of adaptive actions (Fig. 6). The author also used the data to develop and validate an agent-based model of occupant behaviors in 39 , demonstrating the richness of the dataset for studies of either individual-or group-level behaviors in office settings.
Outside of the authors' own work, several other researchers have used the longitudinal data to explore relationships between the built environment and occupant comfort and behavior, most notably: (1) the dataset was included in the recently developed ASHRAE Database II project 40 , a collection of 81,846 complete sets of objective indoor climatic observations with accompanying "right-here-right-now" subjective evaluations by the building occupants who were exposed to them, which is intended to support diverse inquiries about thermal comfort in field settings, and (2) the data were used in 41 to develop synthetic profiles of building occupant personal characteristics and comfort preferences that can be used to intialize models of individual-level behaviors (for example, in an agent-based framework). The latter effort continues to be pursued and expanded through the recently approved International Energy Agency Annex 79 on Occupant-Centric Building Design and Operation 42 .

code availability
All MATLAB scripts used to implement the post-processing steps described in the Data Records section are available upon request, as are the raw data files described in steps 1 and 2 (e.g., pre-and post-cleaning). Interested readers should contact the corresponding author for access to the scripts so that further instructions on their use may be provided.
Additionally, links to the background survey instrument 27 , daily survey instrument 29 , and retrospective survey instrument conducted after each two week daily surveying period 30 are available for testing.