With the cost of consuming resources increasing (both economically and ecologically), homeowners need to find ways to curb consumption. The Almanac of Minutely Power dataset Version 2 (AMPds2) has been released to help computational sustainability researchers, power and energy engineers, building scientists and technologists, utility companies, and eco-feedback researchers test their models, systems, algorithms, or prototypes on real house data. In the vast majority of cases, real-world datasets lead to more accurate models and algorithms. AMPds2 is the first dataset to capture all three main types of consumption (electricity, water, and natural gas) over a long period of time (2 years) and to provide 11 measurement characteristics for electricity. No other such dataset from Canada exists. Each meter has 730 days of captured data. We also include environmental and utility billing data for cost analysis. AMPds2 data has been pre-cleaned so that different researchers and machine learning algorithms can obtain consistent and comparable accuracy results.
Machine-accessible metadata file describing the reported data (ISA-tab format)
Background & Summary
Currently, much of the world is focused on reducing electricity consumption; our increase in consumption is neither economically nor environmentally sustainable. Additionally, there is a growing consensus that environmental and economic sustainability are inextricably linked1. As the cost of power rises, we must find technological solutions that help reduce and optimize energy use2,3. Residential homes contribute about 34% of the total power consumption in the USA, and their share is projected to increase to 39% by 2030 (ref. 4). One way to help homeowners and occupants reduce their consumption is to monitor and present how much power their appliances are using through an effective eco-feedback device or display mechanism5,6.
The Almanac of Minutely Power dataset (AMPds) was initially released in 2013 with one year of meter data but without environmental or utility billing data7 (Data Citation 1: Harvard Dataverse. http://dx.doi.org/10.7910/DVN/MXB7VO). The first release also had data integrity issues: missing readings and a counter reset in the water meter data. With this second release (AMPds2), we extended the monitoring period to two years (730 days of captured data per meter) and corrected the integrity problems of the first release. We have also added historical climate data, two years of hourly weather data, and two years of utility billing data.
AMPds has been used (see Google Scholar: https://scholar.google.ca/scholar?cites=9977726888743581483) and can be used in research on: non-intrusive load monitoring (NILM, a.k.a. load disaggregation)8, energy use behaviour, eco-feedback and eco-visualizations6,9, application and verification of theoretical algorithms/models, appliance studies, demand forecasting, smart home frameworks, grid distribution analysis, time-series data analysis, energy efficiency studies, occupancy detection, energy policy and socio-economic frameworks10, and advanced metering infrastructure (AMI) analytics. Testing accuracy with real-world datasets is crucial in these fields of research. Synthesized data does not realistically represent an actual dataset because ‘a real-world dataset would normally have certain complexity that is harder [to] predict and in many cases can be very difficult to deal with’ [ref. 11, p.114].
There are indeed other datasets that exist from the USA12,
AMPds2 data has been cleaned so that different researchers and machine learning algorithms can obtain consistent and comparable accuracy results. Other datasets (e.g., REDD14) leave the onus of data cleaning on each researcher, which means that the same dataset can be cleaned very differently. This leads to an inability to reproduce and compare published algorithms.
Residential house characteristics
Our data was collected from a house built in 1955 in the Greater Vancouver metropolitan area of British Columbia (Canada). The house underwent major renovations in 2005 and 2006, which raised its Canadian Government EnerGuide23 rating from 61% to 82%. The house is located in Burnaby, the municipality east of Vancouver, at an elevation of 80 m above sea level, and the front of the house faces south. The house has one level above grade and a basement, making up a total of 2,140 ft² (199 m²) of living space (1,070 ft² or 99.5 m² per floor). The main floor ceiling height is 8 ft (2.44 m) and the basement ceiling height is 7 ft (2.13 m). A rental unit takes up approximately half of the basement (603 ft² or 56 m² of living space). The detached garage is approximately 161 ft² (15 m²), and its overhead door faces the back alleyway (see Fig. 1).
The house has the original wood-frame construction. In 2006, all existing exterior wall stucco was removed, proper vent covering was installed under the eaves, and the exterior walls were re-stuccoed with a light green ‘California’ finish. The house had a black asphalt shingle roof that was replaced in 2007; the new asphalt shingles are light brown. When the stucco and roof were replaced, -inch plywood was nailed to the existing shiplap boarding.
Originally, the above-grade walls were insulated with batt insulation rated at R7, and the roof was insulated with blown-in insulation rated at R19. During the renovations, R24 batt insulation was added on top of the existing ceiling insulation; the main floor wall insulation was not improved. In the basement, R24 insulation was added to the ceiling and above-grade walls, and the below-grade walls had R9 extruded polystyrene rigid insulation affixed to the concrete. The basement floor was upgraded with DRIcore sub-flooring (see Manual_dricore.pdf, Data Citation 2: Harvard Dataverse. http://dx.doi.org/10.7910/DVN/FIE0S4), which is rated at R1.7.
Windows are double-pane low-e glass and were replaced in 2005 (see Table 1). All doors are insulated core metal and were replaced in 2005. The basement walls are approximately 25.4 cm thick with the South basement wall (front of house) almost completely below grade, while the North wall (back of house) is about 1 m below grade. The house has three full bathrooms (tub with shower, toilet, and sink) and a master bedroom ensuite (toilet and sink). Two of the bathrooms are in the basement; one is in the rec room, and the other is in the rental suite. Faucets and showerheads are restricted to a maximum flow of 9.5 l/min (2.6 GPM). All toilets have 6 l tanks and are dual-flush.
The main house is occupied by a family of three: a male and a female adult in their late 30s and a daughter aged between 5 and 6. The male adult is a full-time student at a local university, the female adult is self-employed, and the child attends elementary school full time. The rental suite houses one male occupant in his early 20s with full-time employment.
HVAC system elaboration
Our test house has a dual-fuel HVAC system where a heat pump is used alongside a forced air gas furnace. The heat pump cools the house in summer and heats the house in winter. The gas furnace is used, but only when it is too cold outside for the heat pump to operate effectively. When the outside ambient temperature is 2 °C or lower, the HVAC system changes over from electric heating (heat pump) to natural gas heating using the furnace. At low temperatures, heat pumps are not efficient for heating and can strain the compressor.
During data collection, the HVAC thermostat was set to a constant heating set-point of 21 °C, and the cooling set-point ranged from 24 to 26 °C. The furnace fan was set to run continuously, 24 hours a day, to circulate the air. The furnace is 2-stage with a variable-speed fan and is rated at 93% efficiency. The heat pump has a 2-stage compressor and is rated at 17 SEER. It is the central unit for air conditioning; there are no other air conditioning units in the house (other than opening the windows).
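The changeover and set-point rules described above can be summarized as a small decision function, for example to flag periods when furnace gas use or heat pump electricity use is expected. This is a minimal sketch, not the thermostat's actual control logic: it assumes a single cooling set-point of 25 °C (picked from the 24–26 °C range), ignores hysteresis and the two compressor/furnace stages, and the names `hvac_state` and `HvacState` are hypothetical.

```python
from dataclasses import dataclass

# Hypothetical, simplified model of the dual-fuel HVAC rules described above.
HEAT_SETPOINT_C = 21.0  # constant heating set-point during data collection
CHANGEOVER_C = 2.0      # at or below this outdoor temperature, the furnace heats


@dataclass
class HvacState:
    mode: str     # "heat", "cool", or "idle"
    source: str   # "furnace", "heat_pump", or "none"
    fan_on: bool  # the furnace fan ran continuously in this house


def hvac_state(indoor_c: float, outdoor_c: float,
               cool_setpoint_c: float = 25.0) -> HvacState:
    """Assumed decision rule: no hysteresis, no staging."""
    if indoor_c < HEAT_SETPOINT_C:
        # Heat pumps lose efficiency in the cold, so gas takes over at <= 2 degC.
        source = "furnace" if outdoor_c <= CHANGEOVER_C else "heat_pump"
        return HvacState("heat", source, True)
    if indoor_c > cool_setpoint_c:
        # The heat pump is the only air conditioning unit in the house.
        return HvacState("cool", "heat_pump", True)
    return HvacState("idle", "none", True)
```

For example, `hvac_state(20.0, -5.0).source` is `"furnace"`, while `hvac_state(20.0, 8.0).source` is `"heat_pump"`. Combined with the outdoor temperature from the weather file, such a rule gives a rough expectation of which energy source should be active.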
Our main concern when designing the data collection system for AMPds2 was integrity and accuracy. For these reasons we chose to use industry-standard equipment for monitoring and acquisition. Data was stored off-site on a database server that was hosted at a co-location facility with proper power backup and network connection redundancy. Figure 2 depicts the setup of our data collection system. Table 2 summarizes the specifications of the metering equipment used, including the accuracy standards each meter adheres to.
After two years of collection, only 2,029 electricity readings and 437 water and natural gas readings were missing out of a total of 1,051,200 readings for each resource (discussed in more detail below for each resource). The missing readings were algorithmically filled in during the data cleaning process, which is discussed in detail in the Dataset file preparation subsection24.
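The reading totals above follow directly from the collection period: 730 days with one reading per minute. The short sketch below reproduces that count and expresses the missing readings as a percentage; the helper name `missing_percent` is ours, not part of the AMPds2 scripts.

```python
DAYS = 730
MINUTES_PER_DAY = 24 * 60
EXPECTED = DAYS * MINUTES_PER_DAY  # one reading per minute for each meter


def missing_percent(missing: int, expected: int = EXPECTED) -> float:
    """Hypothetical helper: share of readings that were filled in, in percent."""
    return 100.0 * missing / expected


for resource, missing in [("electricity", 2029), ("water", 437), ("natural gas", 437)]:
    print(f"{resource}: {missing} of {EXPECTED:,} missing "
          f"({missing_percent(missing):.3f}%)")
```

This prints 0.193% for electricity and 0.042% for water and natural gas, i.e., fewer than 0.2% of the readings of any resource had to be filled in.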
Electricity supply & metering
BC Hydro is the provincial utility that provides electricity to the house via a 240 V, 200 A service. As is standard in Canadian homes, two 120 V lines enter the house: leg 1 (L1) and leg 2 (L2) of the same phase. Pole transformers convert the single phase into these two legs; each transformer services about five homes.
Electricity measurements were taken by two DENT PowerScout 18 units metering 24 loads at the electrical circuit breaker panel. Only 21 loads were kept. Three loads were removed because no activity was recorded on them (all current and current-based measurements were zero): the gas stovetop plug breaker, the microwave plug breaker, and a randomly chosen lighting breaker. The gas stovetop used electricity only to ignite the gas burners. The microwave was never used and was removed at one point. The lighting breaker supplied a backyard light that was never used; its bulb had burned out and was not replaced.
Measurements were read over an RS-485/Modbus communication link by an Obvius AcquiSuite EMB A8810 data acquisition unit. During the data cleaning process for electricity, we found and corrected 55 readings where 1 of the 21 meters had missing measurements and 2,029 readings where more than one of the 21 meters had missing measurements (see the Dataset file preparation subsection for more details).
Water supply & metering
Burnaby's water distribution system is fed by four water pump stations and four water reservoirs, with twenty-one pressure reducing stations that control and regulate water pressure. Water pressure is produced by gravity from the higher-elevation water reservoirs that Metro Vancouver manages.
Water service is via a -inch pipe at a pressure between 108–118 psi (744.6–813.6 kPa) [reported by Engineering Department]. A pressure regulator is used (see specifications in Manual_WilkinsModel70.pdf, Data Citation 2: Harvard Dataverse. http://dx.doi.org/10.7910/DVN/FIE0S4) to maintain water pressure in the house at 60 psi (413.7 kPa).
Water measurements were taken by two Elster/Kent V100 water meters, which send pulses to a data acquisition unit. These are volumetric cold water meters that measure water with a rotary piston. Before July 14, 2012 (timestamp 1342287780), the water main was metered by a DLJ 75C meter and hot water was metered by an Elster S130 meter. These meters pulsed once per gallon, which was too coarse a measurement for the amount of water consumed by the house's occupants, so they were replaced with meters that pulse more frequently. See Table 2 for details on these water meters (e.g., standards compliance and accuracy data).
Pulse data was collected using an Obvius AcquiSuite EMB A7810. Note that the Obvius AcquiSuite units can sample at most once per minute; capturing data at a faster rate is not possible, which we considered an acceptable cost for their reliability. During the data cleaning process for water, we found and corrected 437 readings that were missing from both water meters.
Dishwasher water (DWW) consumption data was annotated by hand25,26. The electricity consumption data, together with the appliance manual's description of how the dishwasher uses water, made this task relatively easy. This is discussed further in the Technical Validation section.
Natural gas supply & metering
Natural gas is supplied to the house by FortisBC at a pressure of 1.75 kPa and is composed of methane, ethane, propane, and butane. FortisBC uses the Higher Heating Value (HHV) as the conversion factor from gas volume to energy in gigajoules (GJ); HHV is the total heat obtained from combustion. FortisBC measures the heating value of the gas daily (see file NaturalGas_HeatValues, Data Citation 2: Harvard Dataverse. http://dx.doi.org/10.7910/DVN/FIE0S4). For the Lower Mainland (Zone 24), the energy density values are given in GJ/10³ m³. FortisBC assumes a temperature of 15 °C and a pressure of 101.325 kPa when converting gas volumes into energy values.
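Because the heating values are reported per thousand cubic metres, converting a metered volume to energy amounts to one multiplication per day. This is a minimal sketch, assuming volumes already expressed in m³ at FortisBC's reference conditions (15 °C, 101.325 kPa) and daily heating values held in a dict keyed by date; it applies no temperature or pressure correction, and the function name and the 38.0 GJ/10³ m³ example value are illustrative only, not values taken from NaturalGas_HeatValues.

```python
from datetime import date


def gas_energy_gj(daily_volume_m3: dict, daily_heat_value: dict) -> float:
    """Hypothetical helper: energy (GJ) from daily volumes (m3) and
    daily heating values (GJ per 10^3 m3)."""
    total = 0.0
    for day, volume in daily_volume_m3.items():
        hv = daily_heat_value[day]       # GJ per 1,000 m3 for that day
        total += volume * hv / 1000.0    # m3 * (GJ / 10^3 m3) -> GJ
    return total


volumes = {date(2013, 1, 15): 4.0, date(2013, 1, 16): 6.0}        # hypothetical m3
heat_values = {date(2013, 1, 15): 38.0, date(2013, 1, 16): 38.0}  # hypothetical
print(gas_energy_gj(volumes, heat_values))  # roughly 0.38 GJ
```

Keeping the heating value per day, rather than using a single average, mirrors the fact that FortisBC measures it daily.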
Natural gas measurements were taken by an Elster AC250 gas meter and an Elster BK-G4 gas meter; both are diaphragm meters that send pulses to a data acquisition unit. See Table 2 for details on meter standards compliance and accuracy. Pulse data was collected using an Obvius AcquiSuite EMB A7810. During the data cleaning process for gas, we found and corrected 437 readings that were missing from both gas meters.
Environmental & weather records
Hourly weather data was downloaded from Environment Canada's Weather Office, which has a weather station at YVR (Vancouver International Airport) located at latitude 49.20, longitude −123.18, and elevation 4.30 m. Our test house is approximately 18 km from YVR, with an elevation difference of approximately 75 m. YVR is located next to the water, which might account for slight differences in outdoor temperature between the two locations. There is no precise method to determine this difference; anecdotally, we have seen differences of up to ±2 °C. Dates and times listed in this file are in Local Standard Time (LST); add 1 h to adjust for Daylight Saving Time when it is observed. The Data Quality column (and other columns) may contain M (missing), E (estimated), NA (not available), or ** (partner data that is not subject to review by the National Climate Archives).
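Because the weather file uses Local Standard Time all year, a simple way to join it with the per-minute meter data is to convert both to Unix time. A minimal sketch: Vancouver's standard time is a fixed UTC−8 offset (PST), so no time-zone database is needed. The date format string and the assumption that each weather row is labelled by the start of its hour are ours; adjust both to the actual file.

```python
from datetime import datetime, timedelta, timezone

# Local Standard Time in Vancouver is Pacific Standard Time: always UTC-8,
# never shifted for Daylight Saving Time.
PST = timezone(timedelta(hours=-8), "PST")


def lst_to_unix(lst_text: str, fmt: str = "%Y-%m-%d %H:%M") -> int:
    """Convert an LST date/time string (assumed format) to a Unix timestamp."""
    naive = datetime.strptime(lst_text, fmt)
    return int(naive.replace(tzinfo=PST).timestamp())


def weather_hour(meter_ts: int) -> int:
    """Start of the hour containing a meter timestamp (hypothetical helper).

    PST is a whole-hour offset from UTC, so flooring in UTC also floors in LST.
    """
    return meter_ts - meter_ts % 3600
```

For example, the smart-meter replacement at timestamp 1336152840 (10:34am local daylight time on May 4, 2012) falls in the weather row labelled 2012-05-04 09:00 LST, since `weather_hour(1336152840) == lst_to_unix("2012-05-04 09:00")`.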
Historical climate normals data (from 1981 to 2010) was downloaded from Environment Canada's Weather Office, which had a weather station at Burnaby Capitol Hill (latitude of 49.17, longitude of −122.59, and elevation of 182.9 m). This weather station was closer to our test house but closed down in 2010. Precipitation data about rainfall and snowfall is included.
Utility bills & invoice records
The billing data for all three forms of consumption was created from the values on the included redacted utility billing statements. We were able to download 50% of the billing data from our account on the utility's website; the remaining data was entered manually. All billing data was verified by hand against each billing statement, and hand-entered data was rechecked after the values of each bill were recorded.
Code used to store the data collected by the data acquisition units on the database server can be downloaded from the online code repository GitHub24. The scripts used to convert the database tables to the final dataset files can be downloaded from the same repository (see the Technical Validation section).
AMPds2 is publicly available for download from Harvard Dataverse (Data Citation 2: Harvard Dataverse. http://dx.doi.org/10.7910/DVN/FIE0S4) in several formats, including the original CSV, tab-delimited, and RData. Table 3 describes each file that is part of AMPds2. File names describe the contents by listing the type of data and the meter ID, separated by an underscore. There are four types of data: Electricity, Water, NaturalGas, and Climate. For example, Electricity_CDE.csv contains electricity data from the clothes dryer (CDE) meter, and NaturalGas_Billing.csv contains natural gas billing data. Refer to Table 4 (available online only) for a description of meter IDs and data file column names.
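Because every file name follows the type-underscore-ID pattern described above, the data type and meter ID can be recovered programmatically, for instance when loading every file in a directory. A small sketch; `parse_ampds2_name` is a hypothetical helper that accepts only the four data types listed above and works for any of the distributed extensions.

```python
from pathlib import Path

DATA_TYPES = {"Electricity", "Water", "NaturalGas", "Climate"}


def parse_ampds2_name(filename: str) -> tuple[str, str]:
    """Hypothetical helper: 'Electricity_CDE.csv' -> ('Electricity', 'CDE')."""
    stem = Path(filename).stem                       # drop the extension
    data_type, sep, meter_id = stem.partition("_")   # split at the first underscore
    if not sep or data_type not in DATA_TYPES:
        raise ValueError(f"not an AMPds2 file name: {filename!r}")
    return data_type, meter_id
```

Splitting at the first underscore only keeps any further underscores inside the ID part, so the helper does not need to know the full list of meter IDs in Table 4.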
Each row in each meter data file represents a single per-minute meter reading with an associated Unix timestamp. Each reading contains all the measurements and calculations provided by the meter; refer to Table 4 (available online only) for specific information on each measurement. For pulse metering, the data acquisition unit calculated three measurements (counter, avg_rate, inst_rate) as pulses were received from each meter.
This integer timestamp is the number of seconds since 1970-01-01 00:00:00 UTC (the Unix epoch). Because readings are one minute apart, the timestamp increases by 60 with every reading. The two data acquisition units use the Network Time Protocol (NTP) for clock synchronization. Some records had timestamps that were off by up to ±10 s; in these cases, our data cleaning script24 corrected the timestamp by setting its seconds to zero. This slight variation in time was caused by having to download the readings of 24 loads over the limited, fixed baud rate (9600 bps) used by the DENT meters.
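The timestamp handling above, together with the plus-sign marker for filled-in rows described under Dataset file preparation, can be sketched as follows. The function names and the `start_ts` parameter (a minute-aligned first timestamp of the dataset) are hypothetical; the snapping rule simply zeroes the seconds, as described above.

```python
def parse_timestamp(field: str) -> tuple[int, bool]:
    """Hypothetical helper: return (unix_ts, was_filled).

    A leading '+' marks an algorithmically added row; int() accepts a
    leading '+', so the marker does not break the conversion.
    """
    field = field.strip()
    return int(field), field.startswith("+")


def snap_to_minute(ts: int) -> int:
    """Zero out the seconds of a Unix timestamp."""
    return ts - ts % 60


def record_index(ts: int, start_ts: int) -> int:
    """Row number of a reading, given the minute-aligned first timestamp."""
    return (snap_to_minute(ts) - start_ts) // 60
```

For example, a reading stamped 1336152847 snaps to 1336152840, and `parse_timestamp("+1336152840")` returns `(1336152840, True)`.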
Table 4 (available online only) describes the column names found within each file; no single file contains all of the listed columns. Figures 3, 4, and 5 give some insight into how the house consumed resources over the two years. Additionally, Table 5 (available online only) gives detailed information about each of the major appliances that consumed resources in our test house.
Climate data files are kept in the original format provided by Environment Canada. Each row in Electricity_Billing.csv and NaturalGas_Billing.csv will match a utility billing statement found in Electricity_Statements.csv and NaturalGas_Statements.csv, respectively. Statements are not available for Water_Billing.csv data.
One-time events & oddities
On May 4, 2012 at 10:34am local time (timestamp 1336152840), the house's existing electro-mechanical meter was replaced with a digital ‘smart’ meter. This explains why all electricity readings were recorded as zero at that time.
On July 14, 2012, between 10:43am and 5:03pm local time, the house's water supply was disconnected to perform a repair: the instantaneous hot water unit had an internal leak due to micro-imperfections in its copper pipe. During this time, the existing water meters (which pulsed once per gallon) were replaced with water meters that pulse once per 0.5 l.
During the period of data collection, the main house family went on holidays/travel during the following periods: May 1–7, 2012; June 9–18, 2012; and July 31–August 8, 2013. Consumption during these times should be near zero. The rental occupant's holidays/travel were not tracked.
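The holiday periods above are convenient for sanity checks, such as confirming near-zero consumption. A minimal sketch that converts them to Unix timestamp ranges; it assumes each period runs from midnight on the first day through the end of the last day, and uses a fixed UTC−7 offset because all three periods fall within Pacific Daylight Time. The names `HOLIDAYS`, `holiday_ranges`, and `during_holiday` are hypothetical.

```python
from datetime import datetime, timedelta, timezone

# All three periods fall within Daylight Saving Time, i.e. UTC-7 in Vancouver.
PDT = timezone(timedelta(hours=-7), "PDT")

# (first day, last day), inclusive, in local time
HOLIDAYS = [
    ((2012, 5, 1), (2012, 5, 7)),
    ((2012, 6, 9), (2012, 6, 18)),
    ((2013, 7, 31), (2013, 8, 8)),
]


def holiday_ranges() -> list[tuple[int, int]]:
    """Half-open [start, end) Unix timestamp ranges for each holiday period."""
    ranges = []
    for first, last in HOLIDAYS:
        start = datetime(*first, tzinfo=PDT)
        end = datetime(*last, tzinfo=PDT) + timedelta(days=1)  # end of last day
        ranges.append((int(start.timestamp()), int(end.timestamp())))
    return ranges


def during_holiday(ts: int) -> bool:
    """True if a meter timestamp falls inside any holiday period."""
    return any(start <= ts < end for start, end in holiday_ranges())
```

Filtering the per-minute rows with `during_holiday` isolates the readings where the main house was unoccupied; the rental suite and the continuously running furnace fan will still contribute some load.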
Our test house was used in previous home occupancy research27,28. There is an additional dataset available for download from Harvard Dataverse named ODD: Occupancy Detection Dataset (Data Citation 3: Harvard Dataverse. http://dx.doi.org/10.7910/DVN/2K9FFE) which contains power meter (mains and heat pump), ambient light and ambient temperature sensor readings (from 10 locations within the house), and outside weather and daylight data. Sensors communicated via a ZigBee mesh network and readings were captured in 15-minute intervals from January 22 to August 29, 2010.
The meters and data acquisition equipment were manufactured by well-known companies that produce meters for industrial and residential installations around the world. The meters were calibrated at the factory by the manufacturer before shipping. The calibration process is proprietary, and we were not privy to it.
Dataset file preparation
Scripts were created to export the data from the database to the final comma-separated values (CSV) files. During this process we checked the integrity of the data, and missing readings were algorithmically added24. To mark these additions, a plus sign was added to the beginning of each affected timestamp; this does not affect the programmatic conversion from a string to an integer. Our data cleaning scripts (make_AMPds2_power.py and make_AMPds2_pulse.py) work as follows:
1. Export the data from MySQL into CSV files (one raw file per meter).
2. Execute ./make_AMPds2_power.py or ./make_AMPds2_pulse.py.
3. Load all raw data CSV files into memory.
4. Create empty records that will store the clean data.
5. For each meter and each CSV row:
    6. Zero out the seconds in the timestamp.
    7. Convert the timestamp to a record index i.
    8. If the record at index i is empty, then:
        9. Convert each measurement to the proper data type.
        10. Add the measurements to this record.
11. For each record in the CDE meter:
    12. Fix the record by removing the phantom 0.4 A and 27–30 VA load.
13. For each timestamp and each meter:
    14. Use equations (1) and (2) so that WHE ≥ MHE + RSE + GRE.
    15. If the previous record was missing data, then:
        16. Fill in the missing measurement data.
        17. Evenly distribute the accumulation for Pt, Qt, and St.
18. Save the clean records for each meter.
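The loop structure above can be sketched in Python for a single cumulative measurement, such as one of the Pt, Qt, or St accumulators. This is a simplified illustration, not the published make_AMPds2_*.py scripts: it assumes the raw rows are already parsed into (timestamp, value) pairs, keeps the first reading that lands in each minute, and fills interior gaps by spreading the change in the accumulator evenly across the missing minutes. The CDE fix and the WHE check are omitted, leading and trailing gaps are left as `None`, and all names are hypothetical.

```python
def clean_accumulator(raw_rows, start_ts, n_records):
    """Hypothetical sketch: return (values, filled) for one cumulative measurement.

    raw_rows: iterable of (unix_ts, value) pairs, in any order.
    filled[i] is True where a value was created, i.e. where the
    published files would prefix the timestamp with '+'.
    """
    values = [None] * n_records
    for ts, value in raw_rows:
        i = (ts - ts % 60 - start_ts) // 60   # zero the seconds, then index
        if 0 <= i < n_records and values[i] is None:
            values[i] = float(value)          # keep the first reading per minute
    filled = [False] * n_records
    last = None                               # index of the last known reading
    for i, value in enumerate(values):
        if value is None:
            continue
        gap = i - last if last is not None else 0
        if gap > 1:
            step = (value - values[last]) / gap   # even share per missing minute
            for j in range(1, gap):
                values[last + j] = values[last] + step * j
                filled[last + j] = True
        last = i
    return values, filled
```

For example, with readings of 10.0 at minute 0 and 13.0 at minute 3, the two missing minutes receive 11.0 and 12.0 and are flagged as filled; a second reading landing in minute 0 is ignored.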
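Soft-meter data was calculated during this process. Figure 6 shows how the meters relate to one another and which meters are soft-meters. The main house electricity soft-meter (MHE) is calculated by the formula
$$\mathrm{MHE} = \mathrm{WHE} - (\mathrm{RSE} + \mathrm{GRE}) \tag{1}$$
where WHE is the whole-house meter, RSE the rental suite meter, and GRE the garage meter.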
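The unmetered electricity soft-meter (UNE) is calculated by the formula
$$\mathrm{UNE} = \mathrm{MHE} - \sum_{m \in \mathcal{M}} m \tag{2}$$
where $\mathcal{M}$ is the set of sub-meters on the main house circuits (see Fig. 6).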
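To calculate cold water consumption, use the formula
$$V_{\mathrm{cold}} = V_{\mathrm{main}} - V_{\mathrm{hot}} \tag{3}$$
where $V_{\mathrm{main}}$ is the volume recorded by the water main meter and $V_{\mathrm{hot}}$ is the volume recorded by the hot water meter.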
The cold water calculation works over longer periods of time (say, one day). Equation (3) does not work over shorter periods, because the water meters pulse in coarse increments of 0.5 l, and when only small amounts of water are used, the time between pulses can span multiple minutes.
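The effect of the 0.5 l pulse size on Equation (3) is easy to see with cumulative counters. In this hypothetical example all of the water drawn is hot, but the two meters' pulses land in different minutes, so the per-minute cold-water estimate swings between −0.5 l and +0.5 l even though the total over the period is zero. The function name and the counter values are illustrative only.

```python
def cold_water_per_interval(main, hot):
    """Hypothetical helper: per-interval cold water (l) from cumulative
    main and hot water counters (l), applying Equation (3) to each interval."""
    return [
        (m1 - m0) - (h1 - h0)
        for m0, m1, h0, h1 in zip(main, main[1:], hot, hot[1:])
    ]


# Hypothetical per-minute cumulative readings, in 0.5 l pulses.
main = [0.0, 0.0, 0.5, 0.5, 1.0]
hot = [0.0, 0.5, 0.5, 1.0, 1.0]

per_minute = cold_water_per_interval(main, hot)           # [-0.5, 0.5, -0.5, 0.5]
whole_period = (main[-1] - main[0]) - (hot[-1] - hot[0])  # 0.0
```

Summing the per-minute values recovers the whole-period figure, which is why aggregating to a day before applying Equation (3) gives sensible results.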
The dishwasher water soft-meter (DWW) was manually annotated as discussed previously25,26. DWW consumption followed a very specific pattern of 3 l spurts of water correlating to patterns in the electrical data. In most cases, this was the only water being consumed in the house, making the annotation as simple as copying these readings. When there was simultaneous water use, usually the signal could easily be visually decomposed and/or a nearby reading could be used to infer the proper labelling. There were very few cases where an arbitrary choice between two equally likely labellings had to be made.
Measurement uncertainty between main & sub-meters
The DENT power meter used is rated revenue class (Class 0.5), with very high accuracy: better than 1%, and typically better than 0.5%. Meter accuracy classes are defined by two standards: ANSI C12.20 in North America and IEC 62053 elsewhere (see Table 2).
For this class of meter, the absolute error is limited to 0.5% of the full-scale reading. In practice, however, the error is roughly proportional to the reading, with higher readings subject to larger absolute errors than lower readings. As a simple model, we can consider each meter to add Gaussian error to the true value, with variance proportional to the true value. Because each individual meter adds such noise independently, the variance of the sum of their readings is the sum of the individual variances (i.e., proportional to the sum of the true values). Under the same model, the main meter makes a Gaussian error with variance proportional to the whole-house power usage, which is the sum of the true power values of the individual meters. Hence, the error in the main meter has the same variance as the error in the sum of the individual meter readings. This holds because all meters are Class 0.5, so we expect them to share the same constant of proportionality for the variance. If the main meter had a higher accuracy class (a rating better than 0.5%), it would produce less uncertainty than the sum of the individual meters.
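The variance argument above can be checked numerically. This Monte Carlo sketch assumes the simple model in the text: each meter adds independent zero-mean Gaussian noise with variance k times the true value, with the same k for every Class 0.5 meter. The load values, k, and the function name are arbitrary illustrative choices, not measured data.

```python
import random
import statistics


def error_variances(true_loads, k=0.01, trials=20_000, seed=1):
    """Hypothetical check: return (variance of main-meter error,
    variance of sub-meter-sum error) under the proportional-variance model."""
    rng = random.Random(seed)
    total = sum(true_loads)
    main_err, sum_err = [], []
    for _ in range(trials):
        # Each sub-meter reading: true value plus noise with variance k * value.
        subs = [p + rng.gauss(0.0, (k * p) ** 0.5) for p in true_loads]
        # Main meter: noise with variance k * (whole-house total).
        main = total + rng.gauss(0.0, (k * total) ** 0.5)
        sum_err.append(sum(subs) - total)
        main_err.append(main - total)
    return statistics.variance(main_err), statistics.variance(sum_err)


loads_w = [500.0, 1200.0, 80.0, 3000.0]  # hypothetical per-circuit loads (W)
v_main, v_sum = error_variances(loads_w)
# Both should be close to k * sum(loads_w) = 0.01 * 4780 = 47.8
```

Giving the main meter alone a smaller k (a better accuracy class) would shrink `v_main` below `v_sum`, matching the last sentence of the paragraph above.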
For the electricity data, an additional step was performed. We checked that the whole-house reading was never less than the sum of all sub-meter readings; where it was, the whole-house reading was set equal to that sum. This can happen because not all meters can be read simultaneously. Each DENT PowerScout 18 meter has 6 three-phase sub-meters (labelled A through F), which can be configured as 18 single-phase sub-meters. The storage registers within each of the 6 three-phase sub-meters are updated with new measurements once per second. As discussed previously, timestamps between sub-meters could be off by ±10 s because the meters use a limited, fixed baud rate of 9600 bps. This slight variation in reading time causes some whole-house readings to be less than the sum of all sub-meters. Suppose, for example, that the electricity mains are metered by sub-meter A and the heat pump by sub-meter F. The data acquisition unit downloads the measurement data from sub-meter A, then B, and so on up to F, taking a total of 10 s. If the heat pump turns ON within that 10 s window, the reading from sub-meter A will not reflect this event, but the reading from sub-meter F will.
The second factor that can cause the sum of sub-meter readings to exceed the whole-house reading is rounding. Although the meter is quite precise, the values stored in its memory registers are rounded to the nearest whole number for some measurements and to the nearest tenth for others. When these rounded numbers are summed, they can exceed the whole-house reading. No change was made to the whole-house reading when it exceeded the sum of the sub-meters, because many unmetered loads in the house could be running at any given time.
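The whole-house adjustment described in the two paragraphs above amounts to a one-sided check per timestamp. A minimal sketch over plain Python lists; the function name and example readings are hypothetical, and the real script works on the full set of measurements rather than a single value per meter.

```python
def reconcile_whole_house(whe_series, sub_series):
    """Hypothetical sketch: raise WHE to the sub-meter sum where it is lower.

    whe_series: whole-house readings, one per timestamp.
    sub_series: per-timestamp lists of sub-meter readings.
    Returns (adjusted readings, number of readings raised).
    """
    adjusted, raised = [], 0
    for whe, subs in zip(whe_series, sub_series):
        total = sum(subs)
        if whe < total:       # timing skew or rounding pushed the sum above WHE
            adjusted.append(total)
            raised += 1
        else:                 # WHE above the sum is expected: unmetered loads
            adjusted.append(whe)
    return adjusted, raised
```

For example, `reconcile_whole_house([100, 50], [[60, 45], [20, 10]])` raises the first reading to 105 and leaves the second at 50. The check is deliberately one-sided because a whole-house reading above the sum is explained by unmetered loads.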
We found an additional problem affecting the metering of the clothes dryer (CDE) on PowerScout 18 Unit 1, Meter E. L3 (line 3, used for 3-phase loads) was not used, yet the meter was recording a phantom load of 0.4 A and between 27 and 30 VA. We verified with a multimeter that no such load existed. An additional step removes the phantom load measurements from the CDE data file; for details, refer to the make_AMPds2_power.py script24.
For the water data, 14 discrepancies were found between the counter and avg_rate. The counter should always be the cumulative sum of avg_rate. In most of the discrepancies (9 of 14), a pulse failed to be recorded in the avg_rate column. In 4 of 14, the avg_rate was not a multiple of the pulse size. For both of these types of error, the avg_rate was simply overwritten with the true value derived from the change in the counter. The remaining discrepancy (1 of 14) was an accuracy error of 0.001 in the counter; this and all subsequent counter values were adjusted to correct it. For details, refer to the make_AMPds2_pulse.py script24.
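The avg_rate repair can be sketched as recomputing each per-minute value from the change in the cumulative counter and overwriting it wherever the two disagree, which covers both the dropped-pulse and the non-multiple-of-pulse-size cases. This is an illustrative sketch, not the make_AMPds2_pulse.py code; the tolerance, the function name, and the example values are assumptions, and the one-off counter offset is not handled here.

```python
def repair_avg_rate(counter, avg_rate, tol=1e-6):
    """Hypothetical sketch: make avg_rate[i] equal counter[i] - counter[i - 1]
    wherever they disagree.

    Returns (repaired avg_rate list, indices that were overwritten).
    """
    fixed = list(avg_rate)
    changed = []
    for i in range(1, len(counter)):
        delta = counter[i] - counter[i - 1]   # true per-minute volume
        if abs(fixed[i] - delta) > tol:
            fixed[i] = delta
            changed.append(i)
    return fixed, changed


counter = [0.0, 0.5, 1.5, 1.5, 2.0]   # hypothetical cumulative litres
avg_rate = [0.0, 0.5, 0.5, 0.0, 0.3]  # a dropped pulse at 2, a bad value at 4
print(repair_avg_rate(counter, avg_rate))
# → ([0.0, 0.5, 1.0, 0.0, 0.5], [2, 4])
```

Treating the counter as the source of truth follows the text above: avg_rate is derived per minute, so any disagreement is resolved in favour of the cumulative value.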
How to cite this article: Makonin, S. et al. Electricity, water, and natural gas consumption of a residential house in Canada from 2012 to 2014. Sci. Data 3:160037 doi: 10.1038/sdata.2016.37 (2016).
Makonin, S. Harvard Dataverse. http://dx.doi.org/10.7910/DVN/2K9FFE (2010)
We would like to thank the British Columbia Institute of Technology (BCIT) Electrical and Computer Engineering Technology students and faculty member Bob Gill for their past collaborations.