Background & Summary

In recent decades, global consensus has aligned with the imperative of fostering low-carbon development, and researchers have made significant efforts and innovations toward achieving this objective. Among them, electricity flexibility and demand response technology can guide users toward maximizing the utilization of clean electricity produced by solar or wind energy, improving power grid stability, and reducing carbon emissions1. Buildings are major energy consumers, accounting for more than 34% of the total energy demand and approximately 37% of CO2 emissions globally, while residential buildings comprise over 50% of this portion2,3. Therefore, it is essential to incorporate residential building loads into demand response schemes.

Smart home technologies allow more flexible electricity usage for residential buildings, making them more amenable to participating in demand response than ever before4. From the perspective of electricity flexibility, domestic appliances in residential buildings can be divided into schedulable and non-schedulable devices. Typically, non-schedulable loads, such as lights, TVs, and refrigerators, are not easy to shift without affecting the normal use of occupants. For example, it is unreasonable for occupants to turn off the lights at night. In contrast, schedulable loads, including buffered appliances such as air conditioners (ACs) and water heaters, and postponable appliances such as rice cookers and washing machines, can be shifted easily1. Typically, schedulable appliances are high-power devices and considered as flexible resources, whereas non-schedulable appliances, such as lights and all types of low-power electrical equipment, are not well-suited for participation in demand response5.

The household is the basic unit of a residential community or building. To determine the quantity of electricity flexibility and find a way to shift loads at household level, it is necessary to know the existing load characteristics and understand the actual needs of households before they participate in power grid interaction. This knowledge and understanding rely on detailed data measured per household to the extent possible. Unlike commercial and industrial buildings, residential buildings have more occupant behavior-related appliances. Owing to private ownership, appliance use is freely controlled by the occupants, and the diversity of occupant behavior is an important reason for the difference in building energy use6,7. For example, the electricity use of appliances depends on how long they are switched on, and their on/off state is typically related to the occupancy state (presence or absence). Split ACs are commonly used in Chinese residential buildings, contributing to the electric power peak in summer, and are important flexibility resource. Moreover, their usage is closely related to the indoor/outdoor thermal environment and window-opening behavior8,9. A typical behavior pattern is that the AC in a room is turned on only when the room is occupied and the room temperature reaches a certain level8,10. In China, window-opening is the most effective and convenient natural ventilation method for improving the indoor environment quality11, and is typically related to the use of split ACs, that is, occupants may close the windows or leave them open when turning on ACs, which affects the power consumption of ACs. In other words, residential energy use is affected by many factors and has strong intrinsic relationships. A comprehensive dataset combining such data as occupant behavior, the indoor and outdoor environment, and appliance electricity use will help in better understanding intrinsic relationships in residential buildings.

High-resolution, long-term in-situ monitoring datasets are strongly needed to study residential building energy. For example, Ren et al.12 established several typical usage patterns for household appliances by measuring the power of appliances in a single-family residence. Jin et al.13 verified a TV usage behavior model using a single-family residence’s long-term per-minute occupancy and television power data. Lu et al.14 proposed an AC turning-on behavior model based on room-level occupancy and temperature data. Based on the monitoring of water temperature, indoor temperature, and electric power, Tejero-Gomez et al.15 designed an energy management system for electric heaters, and tested it in a single-family residence. However, the collection of such data is often time-consuming and challenging, owing to cost and privacy concerns. Traditional building energy monitoring relies on many wired sensors and a central monitoring system, which are typically available only for new buildings. Therefore, independently packaged meters are generally used for existing residential buildings. However, collecting occupant behavior and appliance electricity use data is not convenient owing to the size, cost, and required amount of meters8,12,16.

To date, few public residential datasets have included detailed energy use and occupant behavior data. The details of existing publicly available residential datasets are shown in Supplementary Table 117,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33. These datasets have their own application purposes and focus on different data elements. The REDD17, BLUED18, FIRED19, DEDDIAG20, ENERTALK21, UK-DALE26, SustDataED28, REFIF29, EMBED30, and fIEECe31 datasets focus on the high-frequency electricity consumption data of the entire house and individual appliances, aiming to investigate building electricity disaggregation and energy use models. Typically, these datasets have a sampling rate in seconds or milliseconds and a short collection period ranging from a few days to dozens of days. The BLUED18 dataset only labels the switching state of individual appliances and does not include energy consumption data. The DEDDIAG20 and ENERTALK21 datasets include data on many electrical appliances; however, the appliance types are relatively few and include only non-schedulable appliances such as kitchen, cleaning, and entertainment appliances, while schedulable appliances such as ACs are not included. Several datasets23,24,25 focus on occupancy data and are intended for use in building occupancy detection analysis. The Global Building Occupant Behavior Database25 focuses on occupant behavior, and includes four in-situ residential datasets: Dataset-834, which is a dataset of the indoor and outdoor environment and window status over three months from one Canadian residential building; Dataset-1135, which is a dataset of occupant presence over one year from three American houses; Dataset-1336, which is a dataset of the outdoor environment and daily AC energy consumption over one year from one Polish residential building; Dataset-15, which is a dataset of the outdoor environment and window status over six months from four Chinese apartments. A few datasets27,32,33 focus on building energy use, but lack occupant behavior and indoor environment data.

Public residential datasets generally include inadequate data on the occupant behavior, thermal environment, and electricity use, and have inadequate balance between the measurement period and sampling frequency, weakening their application potential in residential energy flexibility analysis. Particularly, there is a lack of data considering the occupancy, window-opening behavior, and electricity of schedulable appliances (split ACs) simultaneously. One important reason for the lack of such data is the lack of small, portable, low-cost meters for various measurements in residential buildings16. With the popularity of Internet-of-Things (IoT) devices in recent years, it has become possible to use smart IoT devices to carry out convenient, low-cost, and long-term comprehensive monitoring of residential occupant behavior and energy use.

This paper introduces an IoT-based Data Collection Platform (IDCP) consisting of gateways, sensors, and cloud servers. This platform can simultaneously collect and upload data on occupant presence, the indoor thermal environment, window-opening behavior, and appliance electricity consumption. The IDCP was built based on smart IoT devices and cloud services, and has the following advantages: 1) tiny wireless sensors consume little space and have little impact on home appearance; 2) low-power sensors support long-term measurement and do not require frequent battery replacement; 3) automatic, continuous, and remote real-time data collection; 4) the devices have low cost. This study deployed the IDCP in a single-family apartment in Beijing, China, and compiled a dataset, namely, CN-OBEE37, which includes per-minute data on the occupant behavior, thermal environment, and appliance electricity use of an apartment over an entire year (from May 31, 2021, to May 31, 2022), and hourly weather data for the same period, which were collected from the nearest national weather station. The selected family home is representative of Beijing’s urban family residences. As one of the regional features in the dataset, the dataset compiled by this study covers window-opening behavior and the use of split ACs, which is currently the mainstream cooling method in Chinese residential buildings. To the authors’ knowledge, the CN-OBEE dataset37 is the first publicly available and detailed occupant behavior and electricity consumption dataset of Chinese homes, contributing to the regional diversification of globally available datasets.

The features of the CN-OBEE dataset37 are as follows:

  • This study focused on most Chinese homes’ appliances, including schedulable appliances, such as split ACs, water heaters, washing machines, and rice cookers, and non-schedulable appliances, such as fridges and TVs.

  • The data were collected over one year, and the sampling frequency is one minute after processing.

  • The dataset was automatically, continuously, and remotely collected using the IDCP.

  • The dataset consists of data on room-level indoor environmental parameters, window-opening states, occupancy presence, appliance electricity use, and outdoor meteorological conditions from the nearest national weather station.

This dataset can be directly used to analyze the quantitative and temporal characteristics of the electricity loads of various appliances (e.g., power peaks, seasonal changes, differences between weekdays and weekends, dependence on occupancy and indoor/outdoor environment, correlation between the use of various appliance), and is the basis for investigating electricity flexibility at household level. In addition, the dataset can be used for (1) electricity load shape analysis to reveal the whole-family and appliance-level demand profiles12,13; (2) electric power forecasting using statistical or machine-learning algorithms38; (3) physical-based or data-driven modeling and validation of occupancy, window-opening behavior, and energy-use behavior for appliances6,39; (4) analytics on the indoor thermal environment and indoor thermal comfort40. Although the dataset represents a single household, it is still an important initial contribution to this research field owing to its comprehensiveness and uniqueness.

Methods

Description of apartment and family

In this study, data were collected from a single-family apartment that was recruited and screened by the research team. The recruitment was aimed at college students, and information such as family structure, income, housing type, and appliance type was collected through an online questionnaire. The family considered in this study was recruited as a representative family, because it is a middle-class nuclear family residing in a privately-owned, well-maintained apartment with a wide variety of modern appliances, which is representative of Beijing’s urban family residences. The family’s participation in this study was completely voluntary and monetary compensation was not provided.

The apartment building has the most common housing form in urban China, and is located in Miyun District, Beijing. The building (Fig. 1) was constructed in 2005 and consists of 13 floors and 48 households.

Fig. 1
figure 1

Building appearance (south elevation).

The apartment is a two-story duplex located at the middle floor of the building, and has an area of 172.73 m2 and floor height of 3 m. The apartment has eight rooms, including a master bedroom, secondary bedroom, cloakroom, home office, living room, kitchen, and two bathrooms. The apartment’s floor plans are shown in Fig. 2.

Fig. 2
figure 2

Floor plans of apartment and sensor locations.

The apartment is used by a family of three: a middle-aged couple and their adult offspring. The couple occupies the master bedroom and works only on weekdays. Their adult offspring is a full-time student at a local university and occupies the secondary bedroom, but often resides at school instead of living at home. To assist in collecting and sharing the CN-OBEE dataset37, the family kindly agreed to install metering devices in their apartment and allow public access to the measured data. They acknowledged that their typical living habits were not affected by the study. Unnecessary or private information about the family was removed from the final dataset by pre-processing.

Data collection

The IDCP is currently deployed for monitoring, and various sensors are installed throughout the apartment to collect data on the occupancy presence, indoor thermal environment, window-opening state, and appliance electricity usage. The measurand and platform details are described in the following section.

In addition, hourly outdoor weather data, including the dry-bulb temperature, relative humidity, atmospheric pressure, wind speed, wind direction, ground temperature, horizontal total solar radiation intensity, and horizontal diffuse solar radiation intensity, were obtained from the national weather station in Miyun District.

Measurands and sensor deployment

The measurands in the apartment were obtained using smart IoT sensors from a brand manufacturer with industrial-grade reliability and accuracy. The deployment layout of all sensors is shown in Fig. 2. Table 1 lists the sensors installed in each room. The measurands were divided into four categories as described below.

Table 1 Number of sensors in each room.

Indoor thermal environment

The dry-bulb temperature and relative humidity of the six main rooms (master bedroom, secondary bedroom, living room, home office, cloakroom, and kitchen) were monitored. To measure the mean indoor thermal environment more effectively, temperature and humidity sensors were placed in the middle or corner of a wall at the height of 1.5 m, and away from the heat source. In addition to temperature and humidity, the air pressure of each room was also monitored by these sensors to assist in calculating the humid air’s thermophysical properties and analysing the natural ventilation between the rooms.

Occupant presence

The occupancy state (presence or absence) in the six main rooms was monitored by infrared motion sensors mounted near the door of each room and facing the occupant activity area. Because the rooms are not very large, one motion sensor can cover almost the entire room without dead corners. Motion is detected when someone enters, leaves, or moves in a room. Because the family has no pets, the infrared motion sensors only detected human movement.

Window-opening state

The open or closed state of all operable windows in the apartment was monitored using magnetic switch sensors. The window sensor consists of two parts: a primary unit and a magnet. These sensors were mounted along the edges of the fixed border and movable part of each window, respectively, and aligned with each other. The window-open or window-closed state was detected by considering the distance between the primary unit and the magnet. When a window was opened or closed, the action was detected and recorded. For windows with two movable parts, a window sensor was installed on each part for independent monitoring. Because the cloakroom windows are always closed, a window sensor was not installed there.

Appliance electric power

The use of major electrical appliances (six split-type ACs, one computer, one TV, one kettle, one rice cooker, one fridge, one washing machine, and two water heaters, amounting to a total of 14 appliances) were monitored. The six ACs were located in the six main rooms. The computer was in the home office on the first floor, which is typically used by the family’s offspring. The TV and electric kettle were located in the living room. The rice cooker and fridge were located in the kitchen. The washing machine was located in the bathroom on the first floor, and the two water heaters were located in the upstairs and downstairs bathroom, respectively. Each appliance was equipped with a plug power sensor to monitor the operating power and cumulative electricity consumption. The plug sensor had ultra-low standby power consumption (approximately 70 mW), which has minimal impact on power measurement.

Sensors

All smart sensors were lightweight and compact, and occupied minimal space when installed. The temperature and humidity sensor, infrared motion sensor, and window sensor were powered by built-in batteries, while the plug sensor drew power from a standard socket. Owing to their excellent low-power design, sensors using new button batteries can operate continuously for over a year. Smart sensors only support wireless communication and have no data storage function. Unlike traditional sensors, which sample data at regular intervals, smart sensors are designed to generate data when the cumulative change of the sensing status exceeds a pre-set threshold or when there is no change beyond the threshold during a long pre-set period. The threshold and period are pre-set in the firmware by the manufacturer and cannot be configured by users. Once sampled data are generated, the sensors immediately report the data to the cloud through the data collection platform developed by this study.

The manufacturer’s technical specifications (https://developer.aqara.com/console/equipment-resources) for each sensor type (exposed view, measuring variables, data unit, sensor range and accuracy, and reporting mechanism) are summarized in Table 2.

Table 2 Technical specifications of IoT Sensors.

IoT-based data collection platform

A schematic of the IDCP and data flow process is shown in Fig. 3. Smart sensors, gateway devices, and cloud services are standard configurations of mainstream IoT smart home systems in the current market. The components used in this study belong to the same IoT manufacturer brand. Each smart sensor was connected to a gateway through the ZigBee protocol, while the gateway was connected to a wireless router through the Wi-Fi protocol. One gateway device and two Wi-Fi routers were used to ensure complete signal coverage. Notably, the plug power sensor (16 A), which is referred to as the AC companion (https://www.aqara.com/prodCenter), is an integrated device containing a power sensor and a gateway directly connected to a Wi-Fi router. Through the gateways and wireless routers, the successive sensing data of the sensors are uploaded and stored on the manufacturer’s cloud server. Typically, the data flow in a closed platform built by the manufacturer and can only be viewed through specific mobile apps. This study selected a specific IoT manufacturer because they provide an open cloud service and have released RESTful HTTP APIs for remote calls by authorized third-party applications, thus enabling device status queries, message pushes, and other functions. The message push service allows the reporting of real-time data collected by the sensors to a third-party server. Hence, this study built a virtual private server and an authorized app to accept the pushed messages, sort out the target data, and store the data in a SQLite database. Then, the single-file database can be downloaded for processing and analysis. Moreover, issues during the entire data collection process can be identified through real-time data analysis, and solved in time.

Fig. 3
figure 3

Schematic of IDCP.

The advantages of the IDCP are as follows: 1) tiny wireless sensors occupy minimal space and have little impact on home appearance; 2) low-power sensors support long-term measurement and do not require frequent battery replacement; 3) automatic, continuous, and remote real-time data collection is achieved; 4) the devices have low cost. Therefore, this platform is convenient for large-scale deployment and the long-term monitoring of buildings. During deployment in a single-family apartment, the platform has been running stably for approximately two years.

Data pre-processing

This section describes the data pre-processing steps required to obtain the dataset from the SQLite database. The database stores the raw timestamp records of the indoor thermal environment, occupant presence, window-opening state, and appliance electric power, which comes in at varying intervals, as mentioned above. The pre-processing tasks include the regularization of irregular time series data and identification and handling of missing data.

Regularization

The raw time series data reported by the smart sensors have irregular time intervals, as shown in Fig. 4(a), which is inconvenient for display and analysis. Therefore, the irregular time series data must be converted to regular time series data using a unified starting time, ending time, and time interval. This study used a one-minute interval to maintain high temporal resolution. The regularized timestamps follow the date-time format (Year-Month-Day Hour: Minute: 00), in the UTC + 08:00 time zone (Beijing Time). Figure 4 shows an example of temperature time-series regularization.

Fig. 4
figure 4

Regularization of temperature time series.

The following strategies are used to calculate temporal values at one-minute intervals, fill the “normal” gaps caused by the sensor-reporting mechanism, and merge multiple data records corresponding to the one-minute interval.

  • For temperature and humidity sensors, window sensors, and power sensors, the resampled value at one minute is the average of the data records occurring in that minute (from the 0th second to the 59th second); the gap between every two records is filled with the value recorded at the last valid timestamp (forward propagation).

  • For occupancy motion sensors, the resampled value for a given minute is 1 if any motion record occurs; gaps are filled with zeros, because an algorithm that can effectively interpolate presence data does not exist. This study retains the status quo for further research in the future.

Handling of missing data

In addition to the normal gaps caused by the sensor-reporting mechanism, there are two additional gap types in the raw dataset: 1) gaps caused by network interruptions, including lost connection between the sensors, gateways, and Wi-Fi router, and the router being powered off; 2) gaps caused by occupant habits, such as unplugging the sensors of appliances that are unused (AC, rice cooker).

The following gaps are caused by the occupants’ unplugging habits: 1) the AC power sensor is unplugged when the AC is not in use, resulting in data gaps but not missing data. These gaps were filled with zeros to ensure that the AC is not in use; 2) the plug of the rice cooker is often inserted when the appliance is in use and pulled out when the appliance is not in use. The plug often needs to be removed from the power sensor, and the sensor is occasionally removed with it, which causes inconvenience to the occupants. The occupants unplug the sensor to avoid trouble, resulting in missing data. These gaps are deleted in the regularized data.

Gaps caused by network interruptions are deleted in the regularized data. The rules for determining missing data are as follows:

  • Data are considered to be missing if the temperature and humidity sensors have not reported data for 12 hours.

  • Window sensors and infrared motion sensors report data when the window state changes and occupant motion is detected, respectively; therefore, the online status of the sensors is considered to determine whether data are missing. Data are considered to be missing when the sensors are offline.

  • Data are considered to be missing if plug power sensors have not reported data for 36 hours.

Data Records

The CN-OBEE dataset is publicly available for download from figshare37, and consists of eight comma-separated value (CSV) files. Electric power data were placed in a separate file to facilitate the analysis of electricity use. These files can be classified into the following three types.

Room data file

There are six composite data files for the six main rooms, including five categories of measurement variables: dry bulb temperature, relative humidity, air pressure, window state, and occupant presence. Notably, the number of data columns for each variable is different owing to the different number of window sensors in each room.

Electric power file

A file with instantaneous electric power data is available for all appliances. This file can be used for power analysis of each appliance, but can also be used to calculate the cumulative power consumption of all appliances in the family home.

Outdoor weather file

There is an outdoor weather file, which includes data on the dry bulb temperature, relative humidity, atmospheric pressure, wind speed, wind direction, ground temperature, horizontal total solar radiation intensity, and horizontal diffuse solar radiation intensity.

The first column of each file is a regularized timestamp. Because data regularization follows 1-minute intervals, this amounts to 1440 data points per day (there are 24 data points per day in the outdoor weather data). The data collection period was 365 days, from May 31, 2021 to May 31, 2022. The details of the eight files are listed in Table 3.

Table 3 Files in dataset.

Technical Validation

This section discusses the technical effectiveness of the proposed dataset. First, a preliminary analysis of the missing data was conducted. Then, the accuracy of the data was validated.

Missing data

A detailed breakdown of the missing data in the relevant columns for each room is presented in Table 4. A time chart of the missing data in each column is shown in Fig. 5.

Table 4 Detailed breakdown of amount of missing data in relevant columns in each room.
Fig. 5
figure 5

Time chart of missing data for each column in each room.

For the rice cooker in the kitchen, there are no data in the later period because the plug power sensor was unplugged. The remaining data gaps were caused by network interruptions, which led to data not being uploaded in time. For the temperature and humidity sensor in the master bedroom, the sensor was not found to be offline in time, owing to the authors’ negligence, resulting in a large amount of missing data.

Indoor and outdoor thermal environment

The dry-bulb temperature, relative humidity, and air pressure box diagram for each room and the outdoor environment over one year are shown in Fig. 6. As can be seen, the indoor dry-bulb temperature is between 15 °C and 30 °C, and the outdoor temperature is between -15 °C and 35 °C throughout the year. The indoor relative humidity is lower than the outdoor relative humidity, and mostly lies between 20% and 60%. The indoor thermal environment changes normally under the outdoor conditions, and extreme values do not exist. The outdoor weather data conform to the typical climate of Beijing.

Fig. 6
figure 6

Box plot of dry-bulb temperature, relative humidity, and air pressure for each room and outdoor environment.

Occupant presence

The provided source data represent occupant motion and are used to determine the presence of occupants. Because this study considered a three-person family, most of the time, no more than three occupant simultaneous motions should be detected by all motion sensors. Owing to the occupants’ work routine, the occupant motion patterns on weekdays are different to those on weekends.

Figure 7 shows the number of occupant motions detected every minute in the apartment. On some occasions, the number of occupant motions exceeds 3, with a proportion of 0.024%, which may reflect occupants moving into multiple rooms in one minute or guests visiting the apartment.

Fig. 7
figure 7

Number of occupant motions detected every minute in apartment.

Figure 8 shows the hourly occupant presence probability for each room on weekdays and weekends over one year. For the master bedroom, cloakroom, living room, and kitchen, there is a significant delay in the peak of occupancy probability between weekdays and weekends because the couple wakes up, eats breakfast, and has dinner later on weekends compared with weekdays. For the secondary bedroom and home office, the occupancy probability on weekends is greater than that on weekdays because the family’s adult offspring typically stays at school on weekdays and returns home on weekends.

Fig. 8
figure 8

Hourly occupant presence probability for each room on weekdays and weekends.

Relationship between window-opening and occupancy

Occupant presence is a prerequisite for the occurrence of window-opening behavior. The opening or closing of a window must be accompanied by occupant motion. Figure 9 shows the relationship between the number of window opening and closing actions and the occupancy probability in the living room. To make the graph smooth and clear, the number of window opening and closing actions was averaged over 15 minutes. The occupancy probability per minute was obtained by averaging the original motion data. As expected, window opening peaks after occupants move to the living room in the morning, and the two curves are in good agreement.

Fig. 9
figure 9

Number of window opening and closing actions and occupancy probability in living room.

Relationship between appliance power and occupancy

Occupant presence is also a prerequisite for the use of most electrical appliances. Figure 10 shows the relationship between the per-minute average power of the upstairs water heater and occupancy probability in the master bedroom. Figure 11 shows the relationship between the per-minute average power of the TV and occupancy probability in the living room. The couple mainly uses the upstairs water heater, therefore, its use peaks after the couple wakes up or before they go to sleep, which is consistent with their bathing habits. The TV usage peaks mainly during dinner or before sleep and matches well with the peak of occupancy probability in the living room.

Fig. 10
figure 10

Per-minute average power of upstairs water heater and occupancy probability in master bedroom.

Fig. 11
figure 11

Per-minute average power of TV and occupancy probability in living room.

Peak power of appliances

Air conditioners and water heaters, which have high instantaneous power and energy consumption, are typical peak loads in residential buildings. Moreover, they have significant thermal storage capacity and can be used as free resources to participate in demand response.

The peak power of appliances by category is shown in Fig. 12a. Figure 13 shows the instantaneous power of all appliance types in two typical summer days, including all above-mentioned electrical appliances, but excluding lighting and other small electrical appliances (mobile electronic equipment), which were not measured. As expected, the electric power peak of the family mainly results from the use of water heaters and ACs.

Fig. 12
figure 12

Peak power and total electricity consumption of appliances by category.

Fig. 13
figure 13

Instantaneous power of all appliance types in two typical summer days.

Whole-year electricity consumption of appliances

The whole-year electricity consumption of appliances by category is shown in Fig. 12b. As can be seen, ACs do not have large total consumption on the annual time scale because they are only used in summer. The family’s total and AC electricity consumption were compared with those of other households in Beijing.

According to the China Building Energy Model41,42, the average annual electricity consumption of an urban household with three members and high-quality living standards in Beijing, is approximately 3300 kWh/a, excluding lighting. The reference value measured for the 120-m2 apartment of a single-family middle-class with five members in Beijing is approximately 2250 kWh/a42. In this dataset, the family’s electricity consumption, excluding lighting and other small electrical appliances, is approximately 2833 kWh/a, which is close to the reference value.

Figure 14 shows the family’s whole-year AC electricity consumption in this dataset, and that of 23 Beijing households41, from lowest to highest. The data for 23 households were obtained in an apartment building with split ACs during 2006 and 2007. The results reveal that the difference in the AC electricity consumption can be more than 10-fold, owing to the different lifestyles and different AC usage patterns of different families. The total electricity consumption of ACs in this dataset is 1.12kWh/ (m2·a), which is close to the median of the 23 households (1.33kWh/ (m2·a)) and indicates that the family’s AC usage is at a normal level.

Fig. 14
figure 14

Electricity consumption of ACs in Beijing households.

Usage Notes

The presented dataset includes the electricity consumption of each appliance, indoor temperature and humidity, open or closed state of windows, and the state of occupancy for a typical urban family in Beijing, China, at room and appliance level and one-minute time intervals over an entire year. This dataset can support the following: electricity load shape analysis; electric power prediction; occupant behavior modeling and validation; building energy modeling and validation; thermal comfort analysis; suitability of energy-related measures; other types of research for residential buildings at household level.

In addition, this paper proposes an IoT-based data collection method, which is highly available to facilitate data collection. The proposed method has the following advantages: 1) miniscule wireless sensors consume little space and have little impact on home appearance; 2) low-power sensors support long-term measurement and do not require frequent battery replacement; 3) automatic, continuous, and remote real-time data collection; 4) the devices have low cost. The proposed method has good applicability to the measurement of room- and appliance-level information related to energy, the environment, and occupant behavior in residential buildings, and can be used to collect more data from households in the future, which will in turn help in responding to concerns on wide variations from household to household.

All data files in this study are available in CSV format with a total file size of 180 MB. The CSV data format is easy to use, and the file can be processed and analyzed by many popular programming languages, such as Python, Java, and C++. The Python pandas library is recommended for data processing and visualization.