Skip to content

energietransitie/twomes-dataset-windesheim-assendorp2021

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

39 Commits
 
 
 
 
 
 

Repository files navigation

Twomes Dataset Assendorp 2021 (metadata only; real data follows)

Dataset containing anonymized energy and temperature data from homes in the Assendorp neighbourhood in Zwolle, the Netherlands

The dataset is currently in review to ensure it meet our standards for anonymization. Here, you can already view the metadata, so you know what you may expect.

Table of contents

General info

This repository will contain an anonymized dataset comprising time series measurement data about energy and temperatures in residential homes (mostly) in the Assendorp neighbourhood in Zwolle, the Netherlands, obtained during the heating season of 2021-2022 in the Twomes research project.

Note: Git LFS is required to clone big CSV files

Recruitment

Subjects were recruited from the municipality of Zwolle during autumn 2021. Recruitment was primarily targeted at the Assendorp neighbourhood, via 50 Tinten Groen Assendorp, e.g. via a news message. In November 2021, additional subjects were recruited amongst colleagues and students of Windesheim University of Applied Sciences, e.g. via an internal message.

Subjects could volunteer to participate and give informed consent by filling out this generic online recruitment survey (this survey is also available in XML-format). Subjects that satisfied all inclusion criteria were also asked to fill out this online survey to provide their static Bluetooth MAC addressto us (this survey is also available in XML-format) and were asked to forward this latter survey to the other members of the household with a smartphone.

Inclusion criteria

Inclusion criteria were:

  • the home address lies in the municipality of Zwolle, the Netherlands;
  • the home is equipped with an internet connection and wireless internet (Wi-Fi);
  • the home is heated by a gas-fired heating boiler that is controlled by a thermostat and is not predominantly heated via other means;
  • a smart energy meter is installed in the home;
  • at least one of the occupants has an Android/iOS smartphone.

Data management

We documented our Data Management Plan online. The privacy policy (in Dutch) is available online as well in a layered structure: short summary, summary and full version.

Data

In the sections below, the data pre-processing and data formats used in the data files will be described.

Homes

TODO: describe

Measurement Devices

We used the following measurement device types to collect data. Some devices consisted of a main device and one or two satellite devices.

Device type name Category Main device repo Satellite device 2 repo Satellite device 2 repo
OpenTherm-Monitor comfort + installation + occupancy twomes-opentherm-monitor-firmware
DSMR-P1-gateway energy twomes-p1-gateway-firmware
DSMR-P1-gateway-Tin energy + comfort twomes-p1-gateway-firmware twomes-room-monitor-firmware
DSMR-P1-gateway-TinTsTr energy + comfort + installation twomes-p1-gateway-firmware twomes-room-monitor-firmware twomes-boiler-monitor-firmware
DSMR-P1-gateway-TinTsTrCO2 energy + comfort + installation + occupancy/ventilation twomes-p1-gateway-firmware twomes-room-monitor-firmware twomes-boiler-monitor-firmware

Date and time information

All timestamps were measured in Unix time format, using device clocks regularly synchronized via NTP with the correct UTC time. Setting the local device clock to the proper UTC time via NTP was one of the first steps performed by the measurement devices after they were connected to the internet via the home Wi-Fi network of a subject. Each measurement device synchronized its device clock via NTP every 6 hours. Uploads of measurement data (which could contain more than one measurement) were timestamped both by the measurement device according to the local device clock and by the server. We did not yet check for deviations between the last device timestamp of a measurement upload and the upload timestamp at the server.

Timestamps were converted to a timezone-aware pandas.Timestamp value, in the Europe/Amsterdam timezone. In the csv files we use ISO 8601 format with time offset: YYYY-MM-DDThh:mm:ss±hhmm.

Raw measurements

Raw measurements will be available in the folder /raw-measurements/ in three formats:

  • twomes_raw_measurements.parquet: a single parquet file with all 23 homes for which we have more than 3 weeks data;
  • 8nnnnn_raw_measurements.parquet: 23 parquet files, one for each home;
  • 8nnnnn_raw_measurements.zip: 23 zipped csv files, one for each home;

All measurement data is structured according to the table below. By importing the parquet variant using pandas.read_parquet(), you automatically get a DataFrame wih the recommended indices and data types.

Alternatively, you can also read the zipped csv files, but this typically takes much longer. You can use the code below to end up with a DataFrame with the recommended indices and data types:

TODO: insert pandas.read_csv() code here
Index/Column Name Type Description
index id category unique code of the home
index device_name category unique name of the measurement device
index source category device type name of the measurement device
index timestamp Timestamp start of the interval (timezone aware)
index property category property name of the measurement
column value object value of the measurement
column unit category unit of the measurement value

Raw properties

In the folder /raw-properties/ we will make various measured properties available in an 'unstacked' format with each property in its own column and an appropriate datatype. Similar to measurements, we will make data available in three formats:

  • twomes_raw_properties.parquet: a single parquet file with all 23 homes for which we have more than 3 weeks data;
  • 8nnnnn_raw_properties.parquet: 23 parquet files, one for each home;
  • 8nnnnn_raw_properties.zip: 23 zipped csv files, one for each home;

All property data is structured according to the table below. By importing the parquet variant using pandas.read_parquet(), you automatically get a DataFrame wih the recommended indices and data types.

Alternatively, you can also read the zipped csv files, but this typically takes much longer. You can use the code below to endup with a DataFrame with the recommended indices and data types:

TODO: insert pandas.read_csv() code here
Index/Column Name Type Description
index id category unique code of the home
index source category device type name of the measurement device
index timestamp Timestamp start of the interval (timezone aware)
column property_1; see property table below data_type_1 measured value of this property
column property2 data_type_2 measured value of this property
... ... ... ...
column property_n data_type_n measured value of this property

Measured Properties

Below is a table that lists all properties that were measured, the data type in the raw-properties DataFrame, the measurement unit, the measurement interval, the source device and sensor that measured it, as well as the property name and value format as retrieved from the Twomes database.

Property Type Unit Measurement interval [h:mm:ss] Description Source Sensor /
OpenTherm ID/byte/bit) /
OBIS reference
Database property Database format
co2__ppm float32 ppm 0:05:00 CO₂ concentration DSMR-P1-gateway-TinTsTrCO2 SCD41 CO2concentration %d
rel_humidity__0 float32 - 0:05:00 relative humidity DSMR-P1-gateway-TinTsTrCO2 SCD41 humidity %d
temp_in__degC float32 °C 0:05:00 air temperature DSMR-P1-gateway-TinTsTrCO2 SCD41 roomTempCO2 %.1f
temp_in__degC float32 °C 0:05:00 air temperature DSMR-P1-gateway-TinTsTr Si7051 roomTemp %.1f
temp1__degC float32 °C 0:00:10 temperature of hydronic supply/return pipe DSMR-P1-gateway-TinTsTr DS18B20 boilerTemp1 %.1f
temp2__degC float32 °C 0:00:10 temperature of hydronic supply/return pipe DSMR-P1-gateway-TinTsTr DS18B20 boilerTemp2 %.1f
heartbeat Int16 - 0:10:00 measurement system heartbeat OpenTherm-Monitor ESP32 heartbeat %d
ch__bool Int8 bool 0:00:30 STATUS/CH mode OpenTherm-Monitor 0/LB/1 isCentralHeatingModeOn 0/1
dhw__bool Int8 bool 0:00:30 STATUS/DHW mode OpenTherm-Monitor 0/LB/2 isDomesticHotWaterModeOn 0/1
flame__bool Int8 bool 0:00:30 STATUS /Flame status OpenTherm-Monitor 0/LB/3 isBoilerFlameOn 0/1
mod_max__0 Int8 - 0:00:30 CAPACITY SETTING OpenTherm-Monitor 14 maxModulationLevel %d
cap_max__kW Int8 kW 0:00:30 MAX CAPACITY OpenTherm-Monitor 15/LB maxBoilerCap %d
mod_min__0 Int8 - 0:00:30 MIN-MOD-LEVEL OpenTherm-Monitor 15/HB minModulationLevel %d
temp_set__degC float32 °C 0:05:00 ROOM SETPOINT OpenTherm-Monitor 16 roomSetpointTemp %.2f
mod__0 Int8 - 0:00:30 RELATIVE MODULATION LEVEL OpenTherm-Monitor 17 relativeModulationLevel %d
temp_in__degC float32 °C 0:05:00 ROOM TEMPERATURE OpenTherm-Monitor 24 roomTemp %.2f
temp_ch_max__degC float32 °C 0:05:00 MAX CH WATER SETPOINT OpenTherm-Monitor 57 boilerMaxSupplyTemp %.2f
temp_sup__degC float32 °C 0:00:10 BOILER WATER TEMP. OpenTherm-Monitor 25 boilerSupplyTemp %.2f
temp_ret__degC float32 °C 0:00:10 RETURN WATER TEMPERATURE OpenTherm-Monitor 28 boilerReturnTemp %.2f
presence__dBm_csv str [dBm] 1:00:00 Bluetooth presence OpenTherm-Monitor ESP32 listRSSI %d
heartbeat Int16 - 0:10:00 measurement system heartbeat DSMR-P1-gateway ESP32 heartbeat %d
e_use_lo_cum__kWh float64 kWh 0:05:00 electricity meter reading DSMR-P1-gateway 1-0:1.8.1 eMeterReadingSupplyLow %.3f
e_use_hi_cum__kWh float64 kWh 0:05:00 electricity meter reading DSMR-P1-gateway 1-0:1.8.2 eMeterReadingSupplyHigh %.3f
e_ret_lo_cum__kWh float64 kWh 0:05:00 electricity meter reading DSMR-P1-gateway 1-0:2.8.1 eMeterReadingReturnLow %.3f
e_ret_hi_cum__kWh float64 kWh 0:05:00 electricity meter reading DSMR-P1-gateway 1-0:2.8.2 eMeterReadingReturnHigh %.3f
e_timestamp__YYMMDDhhmX str local time; tz=Europe/Amsterdam 0:05:00 electricity meter reading DSMR-P1-gateway 0-0:1.0.0.255 eMeterReadingTimestamp YYMMDDhhmX
g_timestamp__YYMMDDhhmX str local time; tz=Europe/Amsterdam 0:05:00 / 1:00:00 1 gas meter reading DSMR-P1-gateway 0-0:1.0.0.255 gMeterReadingTimestamp YYMMDDhhmX
g_use_cum__m3 float64 m3 0:05:00 gas meter reading DSMR-P1-gateway 0-n:24.2.1.255 gMeterReadingSupply %.3f

Weather measurements

Weather data was collected and geospatially interpolated using HourlyHistoricWeather from the Royal Netherlands Meteorological Institute (KNMI), based on average hourly values.

For all homes, we used the same location for geospatial interpolation of weather data: lat, lon = 52.50655, 6.09961, the center of the Assendorp neighbourhood in Zwolle, the Netherlands. Average values were converted from the source units to the units as indicated in the table below.

Index/Column Property Type Unit Measurement interval [h:mm:ss] Description Source Source property Source value format Source unit
index timestamp Timestamp start of the measurement interval KNMI YYYMMDD, H H=1: 0:00:00 - 0:59:59; H=24: 23:00:00 - 23:59:59;
column temp_out__degC float32 °C 1:00:00 outdoor temperature KNMI T %d 0.1 °C
column wind__m_s_1 float32 m/s 1:00:00 wind speed KNMI FH %d 0.1 m/s
column ghi__W_m_2 float32 W/m2 1:00:00 global horizontal irradiance KNMI Q %d J/(h·cm2)

Preprocessed data

Preprocessing steps include:

  • if available, use timestamps based on e_timestamp__YYMMDDhhmX or g_timestamp__YYMMDDhhmX for smart meter reading measurements, instead of the timestamp obtained from the twomes-p1-gateway-firmware;
  • remove duplicate measurements (including duplicates that arise from the previous step);
  • filter out smart meter resets: for each property with _cum in the name, for each time series of a particular home for that property, typically by taking a diff() of series, followed by setting any negative values to zero, then taking the cumsum() of the series;
  • calculate energy flow rates for meter readings: for each property with _cum in the name, for each time series of a particular home for that property, take a diff() of the series and assign it to a series with the same name, but without _cum in the name;
  • remove absolute outliers, i.e. measurement values smaller than the value in the column Min or larger than the value in the column Max in the table below;
  • remove statistic outliers, i.e. measurement values with an absolute z-score higher than the value indicated in the Sigma column in the table below;
  • interpolate measurements to intervals of 15 minutes, but to not interpolate between measurements that are 60 minutes apart or more;
  • calculate derived properties as a combination of other properties, as indicated in the column Calculation in the table below.

All column values in a preprocessed data frame represent the average during the interval that starts at the timestamp indicated.

Index/ Column Name Type Unit Description Calculation Min Max Sigma
index id Int16 unique code of the home 800000 899999
index timestamp Timestamp start of the interpolated interval (timezone aware)
column temp_out__degC float32 °C outdoor temperature -28 40
column wind__m_s_1 float32 m/s wind speed 0 35
column ghi__W_m_2 Int16 W/m2 global horizontal irradiance 0 1000
column temp_in__degC float32 °C indoor temperature 0 40 3
column temp_set__degC float32 °C thermostat setpoint temperature 0 40
column g_use__W Int16 W natural gas power used (superior calorific value) Δg_use_cum__m3 · h_sup__J_m_3 / Δtimestamp 2 0 1e5
column e_use__W Int16 W electrical power obtained from the grid e_use_hi_cum__m3+ Δe_use_lo_cum__m3) · J_kWh_1 / Δtimestamp 3 0 2e4
column e_ret__W Int16 W electrical power returned to the grid e_ret_hi_cum__m3+ Δe_ret_lo_cum__m3) · J_kWh_1 / Δtimestamp 3 0 2e4

Status

Dataset is: collected, anonymization-in-progress

License

This data is made available under the CC BY 4.0 by the Research group Energy Transition, Windesheim University of Applied Sciences

Credits

Thanks go to those who are the ultimate source of this dataset:

  • all anonymous subjects who volunteered to make their measurement data available

We use and gratefully acknowledge the efforts of the makers of the following source code and libraries:

Footnotes

  1. Smart meters with DSMR ≥ 5.0 report meter readings every 5 minutes, smart meters with DSMR < 5.0 report gas meter readings only every hour.

  2. h_sup__J_m_3 = 35.17e6 J/m3
    Conversion factor for the superior calorific value of natural gas from the Groningen field.

  3. J_kWh_1 = 3.6e6 J/kWh
    Conversion factor from kWh to J. 2

About

Dataset containing anonimized energy and temperature data from homes in the Assendorp neighbourhood in Zwolle, the Netherlands

Resources

License

Stars

Watchers

Forks

Languages