OPERAnet: A Multimodal Activity Recognition Dataset Acquired from Radio Frequency and Vision-based Sensors

This paper presents a comprehensive dataset intended to evaluate passive Human Activity Recognition (HAR) and localization techniques with measurements obtained from synchronized Radio-Frequency (RF) devices and vision-based sensors. The dataset consists of RF data including Channel State Information (CSI) extracted from a WiFi Network Interface Card (NIC), Passive WiFi Radar (PWR) built upon a Software Defined Radio (SDR) platform, and Ultra-Wideband (UWB) signals acquired via commercial off-the-shelf hardware. It also consists of vision/Infra-red based data acquired from Kinect sensors. Approximately 8 hours of annotated measurements are provided, which are collected across two rooms from 6 participants performing 6 daily activities. This dataset can be exploited to advance WiFi and vision-based HAR, for example, using pattern recognition, skeletal representation, deep learning algorithms or other novel approaches to accurately recognize human activities. Furthermore, it can potentially be used to passively track a human in an indoor environment. Such datasets are key tools required for the development of new algorithms and methods in the context of smart homes, elderly care, and surveillance applications.

environment. The dataset is comprehensive insofar as it contains over 1 million annotated data points.
• The presented data can be exploited to advance human activity recognition technology in different ways, for example, using various pattern recognition and deep learning algorithms to accurately recognize human activities. For this purpose, the users can apply different signal processing pipelines to analyze the recorded WiFi CSI, PWR, UWB and Kinect data and extract salient features that can be used to recognize the human activities and/or concurrently track the target's position within an indoor environment.
• This is the first dataset that is collected with an explicit aim to accelerate the development of self-supervised learning techniques. Such techniques are extremely data hungry, requiring orders of magnitude larger datasets compared to more traditional supervised learning.
This open-source dataset is intended for both HAR and non-cooperative localization, which are areas of growing interest to research communities including but not limited to radar, wireless sensing, IoT and computer vision. To ensure that the dataset aligns to the FAIR (Findable, Accessible, Interoperable, Reusable) Data principles of Open Science, we have (i) made it publicly available for download via the figshare portal, (ii) provided an in-depth and clear description of the dataset for each modality, (iii) formatted our dataset using standard filetypes and encoding, and (iv) provided example scripts/codes that will allow the user to load and analyze the data from each modality.

Methods
Experiments were performed in a university environment in two furnished rooms, with desks, chairs, screens, and other office objects in the surroundings. The room layouts are depicted in Fig. 1 along with their physical dimensions. A maximum of six subjects of different age groups participated in the experiments, which were intended for the sensing of day-to-day activities as well as non-collaborative localization. The description of the various experiments performed is provided in Table 1. Approximately 8 hours of data were collected across multiple modalities including WiFi Channel State Information (CSI), Ultra-Wideband (UWB), Passive WiFi Radar (PWR) and Kinect sensor systems. The breakdown of the activities' durations is given in Table 2. The monitoring devices were installed along the boundaries of the rooms such that enclosed spaces of dimensions 2.45 m × 4.40 m and 4.06 m × 4.53 m were used as monitoring areas for Room 1 and Room 2, respectively.

exp056-exp061
Person performing the six activities (2-7) in a predefined order, starting with activity "walking" and ending with activity "body rotating".

exp028
Crowd counting. A maximum of six people walking continuously and randomly. The experiment starts with six people and then, after every 5 min, one person steps out of the monitoring area.

exp035-exp043
Device-free static localization. CSI transmitter (NUC3) and CSI receiver (NUC2) are placed side by side and the target stands still at a given position for each experiment number.

exp044-exp048
Device-free dynamic localization. CSI transmitter (NUC3) and CSI receiver (NUC2) are placed side by side and the target moves along a short straight path for each experiment number.

exp049-exp054
Device-to-device localization. No human target present. CSI transmitter (NUC3) and CSI receiver (NUC2) are placed at different angles with respect to each other (-30°, 0°, +30°, -60°, 0°, +60°) for each experiment number.

Referring to the experiment numbers in Table 1, exp001-exp054 were performed in Room 1 while exp055-exp061 were carried out in Room 2. exp028 is the crowd counting experiment whereby six people walked randomly and continuously within the monitoring area of Room 1. Then, after approximately every 5 min, one person moved out of the room. Fig. 4 shows the particular instant of exp028 where 5 out of the 6 people had already left the monitoring area and only the last person's ground truth walking trajectory is shown. For illustration purposes only, a moving average filter is applied to the raw ground truth positions to smooth the target's trajectory path. The experiments exp034-exp048 were device-free localization experiments involving a human target who was standing still at several positions or walking along a short straight path in a number of directions, as shown in Fig. 2. The target wore a tag to obtain his/her ground truth position. Note that only the WiFi CSI transmitter (NUC3) and receiver (NUC2) were used for recording data during these experiments, and they were placed side by side. As for the device-to-device localization experiments (exp049-exp054), the CSI transmitter (NUC3) and receiver (NUC2) were placed at different angles with respect to each other (0°, 30°, -30°, 60°, -60°), as shown in Fig. 3 (no human target). Note that one tag was placed on the CSI transmitter and another on the receiver to obtain their fixed ground truth locations within the environment.

WiFi CSI
The WiFi CSI system consisted of three Intel Core i5 vPro Next Unit of Computing (NUC) devices. Each device is fitted with an Intel 5300 Network Interface Card (NIC). In order to extract the CSI from the NICs, the Linux 802.11n CSI tool 19 needs to be installed on the devices, which must run an appropriate kernel version of the Linux operating system. Appropriate firmware and drivers also need to be installed on the devices in order to expose the CSI. More information regarding the installation steps can be found in 20. The CSI provides information about the wireless channel characteristics such as multipath propagation, attenuation, phase shift, etc. It is regarded as fine-grained information since it describes the amplitude and phase of the signal across multiple overlapping but orthogonal subcarriers in the Orthogonal Frequency Division Multiplexing (OFDM) physical layer waveform. In a WiFi system based on OFDM, the CSI is used by the equalizer in the receiver to reverse the detrimental effects of the channel and recover the transmitted signal. The channel estimate (i.e., CSI) is obtained by transmitting a training sequence (pilot symbols) which is known by both the WiFi transmitter and receiver. This process is often referred to as channel sounding. The terms CSI and Channel Frequency Response (CFR) are often used interchangeably, and both represent the wireless channel in the frequency domain. Applying the Inverse Fast Fourier Transform (IFFT) to the CFR gives the Channel Impulse Response (CIR) in the time domain, which characterizes the amplitude and phase over multiple propagation paths, as shown in Fig. 6. The Intel 5300 NIC extracts CSI over 30 subcarriers, spread evenly among the 56 subcarriers of a 20 MHz WiFi channel or the 114 subcarriers of a 40 MHz channel 19.
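The CFR-to-CIR relationship described above can be illustrated with a minimal sketch. It uses a synthetic two-path channel rather than dataset files, and the path delays, gains and the assumption of uniformly spaced subcarriers are ours, chosen only so that the result is easy to check.

```python
import numpy as np

def cfr_to_cir(cfr):
    """Convert a channel frequency response (one complex value per subcarrier)
    into a channel impulse response using the inverse FFT."""
    return np.fft.ifft(np.asarray(cfr, dtype=complex))

# Synthetic two-path channel (illustrative values, not taken from the dataset).
n_sub = 30                 # number of subcarriers reported by the Intel 5300 NIC
k = np.arange(n_sub)       # subcarrier index, assumed uniformly spaced
delays = [2, 7]            # path delays in units of IFFT taps (assumed)
gains = [1.0, 0.4]         # path amplitudes (assumed)
cfr = sum(g * np.exp(-2j * np.pi * k * d / n_sub) for g, d in zip(gains, delays))

cir = cfr_to_cir(cfr)
# The two strongest taps of the CIR sit at the two path delays.
strongest_taps = sorted(np.argsort(np.abs(cir))[-2:].tolist())
print(strongest_taps)  # [2, 7]
```

With measured CSI the taps are less clean, because the bandwidth is limited and the reported subcarriers are a subset of those in the channel, so the sketch should be read as a statement of the transform pair only.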
To record the CSI, the injector mode was used, whereby one NUC was configured as the transmitter (injector) while the receivers monitored the channel into which packets were injected. This method requires that both the transmitter and receiver be equipped with the Intel 5300 NIC. In the alternative access point (AP) method, only the receiver needs to be equipped with the Intel 5300 NIC, and CSI data is logged at the receiver by pinging an access point. However, the AP method is not very stable since many packets may be dropped (e.g., due to interference) and the packet rate is limited. The CSI data was stored on the receiver NUCs for offline processing. The raw data is in .dat format and needs to be parsed by appropriate Matlab utilities 20 to obtain a format that can be easily interpreted. Since the Intel 5300 supports Multiple Input Multiple Output (MIMO) operation, the CSI data was logged as a 3D tensor of complex numbers for each received packet, with n_t × n_r × N complex values, where n_t is the number of transmit antennas, n_r is the number of receive antennas and N is the number of subcarriers. The parameters of the CSI system are summarized in Table 3. Referring to the green boxes in Fig. 1, the WiFi CSI system comprised a single transmitter (NUC3) and two receivers (NUC1 and NUC2). NUC1 was installed facing the transmitter in a Line-of-Sight (LoS) geometry, while receiver NUC2 was placed in a bi-static configuration (90°) with respect to the transmitter.
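As an illustration of the per-packet n_t × n_r × N layout, the sketch below stacks packets along a leading axis and extracts the amplitude and phase time series of one transmit antenna, receive antenna and subcarrier. The array sizes, the random data and the helper name `link_series` are assumptions made for the example; the dataset files themselves are not read here.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes (assumed): packets x transmit antennas x receive antennas x subcarriers.
n_packets, n_t, n_r, n_sub = 1000, 3, 3, 30
shape = (n_packets, n_t, n_r, n_sub)
csi = rng.standard_normal(shape) + 1j * rng.standard_normal(shape)

def link_series(csi, tx, rx, subcarrier):
    """Return the amplitude and unwrapped phase over time for one
    transmit antenna / receive antenna / subcarrier combination."""
    h = csi[:, tx, rx, subcarrier]
    return np.abs(h), np.unwrap(np.angle(h))

# Zero-based indices: transmit antenna 1, receive antenna 1, subcarrier 10.
amp, phase = link_series(csi, tx=0, rx=0, subcarrier=9)
print(amp.shape, phase.shape)
```

Keeping the packet axis first means a time series for any link is a simple slice, which is convenient for the filtering and spectrogram steps that usually follow.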

UWB
Three UWB systems were used during the experiments. The first system (see red nodes in Fig. 1) was used to obtain the ground truth position of the target while he/she wore one or more tags and moved within the monitoring area. The other two systems (yellow and blue nodes) consisted of fixed nodes installed in a multi-static configuration, which exchanged Channel Impulse Response (CIR) data among themselves. The 4 passive nodes of UWB system 1 (yellow nodes in Fig. 1) were implemented using Decawave's EVK1000 21 modules. The modules were programmed with custom firmware so as to record CIR data on all modules. Node '0' acted as an initiator, exchanging Single-Sided Two-Way Ranging (SS-TWR) messages (poll, response and final) with each of the other 3 nodes. When a given node replies, the frame is broadcast and heard by all other nodes operating on the same channel. In this way, each node can read the received frames in its accumulator and extract the CIR data. Therefore, CIR data is available in a bidirectional mode between all pairs of nodes. This means that all nodes act as transmitters and receivers, giving rise to a maximum of 12 communication channels. The 4 nodes were connected to laptops in order to record the CIR data via a serial terminal. The 5 passive nodes of UWB system 2 (blue nodes in Fig. 1) were implemented using Decawave's MDEK1001 22 modules. The units were flashed with custom firmware so as to record CIR data on all nodes. Node '0' acted as an initiator and transmitted a packet at a set period. The packet essentially includes a transmission time schedule for the other 4 nodes. In this way, each node knows which node needs to transmit next and when, with minimal delay. Thus, the transmission was performed in a round-robin fashion to avoid collisions. Nodes with IDs 1-4 were connected to laptops to record the CIR data via a serial terminal.
The average packet rate for UWB system 1 (yellow nodes) was around 400 Hz while for UWB system 2 (blue nodes), the average packet rate was around 195 Hz, considering combined communication links. The other parameters for UWB system 1 and 2 are summarized in Table 4.

Table 5. PWR system parameters.

PWR
For the Passive WiFi Radar (PWR) system, a USRP-2945 25, which is equipped with four synchronized channels, was used as the receiver. The USRP-2945 features a two-stage superheterodyne architecture with four independent receiving channels that share local oscillators for phase-coherent operation. Each receiving channel was equipped with a 6-dB directional antenna. The collected raw data are then routed through a PCIe port to a computing unit, which is a desktop computer in this work. A PWR system consists of a minimum of two synchronized channels: a surveillance channel which records reflected WiFi signals from the monitoring area and a reference channel which records the direct signal emitted from the transmitter. As mentioned previously, four channels are used in the USRP-2945, where one channel was used as the reference channel (denoted as "rx1" in Fig. 1) while the other three channels were used as surveillance channels (denoted as "rx2", "rx3" and "rx4" in Fig. 1). Since PWR does not transmit a signal (it only monitors received signals), it can use any third-party signal source as the illuminator; however, a reference signal is needed. In this work, we used the CSI transmitter (NUC3) as the PWR source for convenience, allowing a direct comparison between the two systems' performance. PWR correlates the signals from the surveillance and reference channels to estimate two parameters: relative range and Doppler frequency shift. Additionally, a CLEAN 26 algorithm has been used to remove the direct signal interference. More details on this signal processing can be found in 26. However, due to the limited WiFi signal bandwidth (40 MHz in this work), the range resolution is limited to 3.75 meters, which is too coarse for indoor applications. Therefore, only the Doppler frequency shifts are recorded, in the form of Doppler spectrograms.
The output from the PWR system is specified as n_s × n_b × N_t real values, where n_s is the number of surveillance channels, n_b is the number of Doppler bins and N_t is the number of time frames.
Other details about the PWR's parameters are given in Table 5.
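The two quantities discussed above can be sketched as follows: the range resolution c/(2B) that gives 3.75 m for a 40 MHz bandwidth, and a Doppler estimate obtained by correlating a surveillance channel with the reference channel. The sketch computes only the zero-delay slice of the cross-ambiguity function on synthetic signals. The low demonstration sampling rate, the 40 Hz Doppler shift and the function names are assumptions, and the CLEAN step is not included.

```python
import numpy as np

C = 3e8  # speed of light in m/s (rounded)

def range_resolution(bandwidth_hz):
    """Range resolution c / (2B) for a signal of bandwidth B."""
    return C / (2.0 * bandwidth_hz)

def doppler_profile(surv, ref, fs):
    """Zero-delay slice of the cross-ambiguity function: the FFT of
    surv * conj(ref). Returns Doppler frequencies (Hz) and bin magnitudes."""
    prod = surv * np.conj(ref)
    spec = np.fft.fftshift(np.fft.fft(prod))
    freqs = np.fft.fftshift(np.fft.fftfreq(len(prod), d=1.0 / fs))
    return freqs, np.abs(spec)

print(range_resolution(40e6))  # 3.75

# Synthetic demonstration over a 1 s coherent processing interval.
fs = 2000.0                                   # demo sampling rate (assumed, not the 20 MHz used in the work)
t = np.arange(int(fs)) / fs
rng = np.random.default_rng(1)
ref = np.exp(2j * np.pi * rng.uniform(size=t.size))   # unit-modulus reference with random phase
f_d = 40.0                                            # assumed Doppler shift in Hz
surv = 0.1 * ref * np.exp(2j * np.pi * f_d * t)       # weak, Doppler-shifted echo of the reference

freqs, mag = doppler_profile(surv, ref, fs)
peak_hz = freqs[np.argmax(mag)]
print(peak_hz)
```

Repeating the Doppler profile over successive processing intervals and stacking the results over time yields an array with one Doppler-bin column per time frame, which is the general shape of a Doppler spectrogram.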

Kinect
We used two of Microsoft's Kinect v2 sensors to gather motion capture data from different human activities 27. Kinect v2 incorporates an infrared depth sensor, an RGB camera, and a four-element microphone array, which together provide functionalities such as three-dimensional skeletal tracking, facial recognition, and voice recognition. Although the device was originally developed to play games, numerous researchers have used it for applications beyond its initial intended purpose. Due to its low cost and wide availability, it has been used extensively in research areas such as video surveillance, where multiple Kinect devices are synchronized to track groups of people even in complete darkness 28, live three-dimensional videoconferencing 29, and medicine, where it has been used to measure a range of conditions such as autism, attention-deficit disorder and obsessive-compulsive disorder in children 30. Note that in skeleton tracking, Kinect might suffer from occlusion when some parts of the human body are hidden by others and therefore cannot be tracked accurately. Therefore, in this work, we used two Kinects to track three-dimensional time-varying skeletal information of the human body from two different directions, including 25 joints such as the head center, knee joints, elbow joints, and shoulder joints. The main advantage of using motion capture technology is that it captures more accurate, more realistic, and more complex human motions. This three-dimensional joint information can further be used for simulating the corresponding radar scatterings, mimicking a typical PWR sensing system. In one of our previous works, we presented an open-source motion capture data-driven simulation tool, SimHumalator, that can generate large volumes of human micro-Doppler radar data at multiple IEEE WiFi standards (IEEE 802.11g, ax, and ad) 18.
Radar scatterings were simulated by integrating the animation data of humans with IEEE 802.11 compliant WiFi transmissions to capture features that incorporate the diversity of human motion characteristics and the sensor parameters. More importantly, we have demonstrated that the human micro-Doppler data generated using the simulator can be used to augment limited experimental data 31,32. Interested researchers can download the simulator from https://uwsl.co.uk/.

The output from the Kinect system is specified as N_t × N_b × N_d real values, where N_t is the number of time frames, N_b is the number of tracked joints on the human body, and N_d is the three-dimensional position (x, y, z) information.

Ground Truthing
• Decawave's (now Qorvo) MDEK1001 development kit 22 was used for obtaining the ground truth position of the targets. 11 units were configured as anchors and mounted on walls in the experiment rooms (see red nodes in Fig. 1). Their xy coordinates were measured manually using a laser measuring device and then entered in the DRTLS Android app. A maximum of 6 tags were configured for exp028, while for the activity recognition experiments, the person wore two tags, one on each arm. Two UWB units were also configured as listeners so as to record the xy coordinates of the tags on a serial terminal on two laptops (for redundancy).
• As for the labelling of activities, a program was developed in Matlab with automated voice output to instruct the person when to perform the various activities such as sitting, standing, etc. At the same time, the programmable script recorded the timestamps at which the activity was instructed to be performed. The person just had to listen to the voice command and perform the activity accordingly. As a backup solution, another activity labelling application was developed in Matlab where one can insert the labels for the required activities. Then, an observer constantly looked at the person doing the activities and clicked on the appropriate button in the app to record the start and stop times of the activity. All labels were stored in text files along with their timestamps. Note that all modalities were synchronized to the same local Network Time Protocol (NTP) server, resulting in synchronisation accuracy across all modalities of < 20ms.
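Since labels are stored as timestamps while every modality is sampled at its own rate, a common step is to assign an activity to each sample by looking up its timestamp in the labelled intervals. The following is a minimal sketch of that lookup. The interval values, the activity names and the default label `noactivity` are hypothetical and do not reflect the layout of the label text files.

```python
import numpy as np

def label_samples(sample_ts, starts, stops, names, default="noactivity"):
    """Assign an activity name to each sample timestamp.

    starts and stops are sorted, non-overlapping interval bounds in the same
    unit as sample_ts. Samples outside every interval receive `default`."""
    sample_ts = np.asarray(sample_ts)
    starts = np.asarray(starts)
    stops = np.asarray(stops)
    names = np.asarray(names, dtype=object)

    # Index of the last interval that starts at or before each sample.
    idx = np.searchsorted(starts, sample_ts, side="right") - 1
    valid = idx >= 0
    inside = np.zeros(sample_ts.shape, dtype=bool)
    inside[valid] = sample_ts[valid] <= stops[idx[valid]]

    labels = np.full(sample_ts.shape, default, dtype=object)
    labels[inside] = names[idx[inside]]
    return labels

# Hypothetical intervals in milliseconds.
starts = [1000, 5000]
stops = [3000, 8000]
names = ["sit", "stand"]
ts = [500, 1000, 2999, 3500, 5000, 8000, 9000]
labels = label_samples(ts, starts, stops, names)
print(labels.tolist())
# ['noactivity', 'sit', 'sit', 'noactivity', 'stand', 'stand', 'noactivity']
```

Because all modalities share the same NTP time base, the same lookup can in principle be applied to each of them with only the sample timestamps changed.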

Data Records
Four modalities have been used during the experiments, namely, WiFi CSI, UWB, PWR and Kinect sensor. With respect to WiFi CSI and UWB, there were two systems in each case. The experimental datasets can be accessed and downloaded from the figshare repository at https://figshare.com/s/c774748e127dcdecc667. The datasets have been compressed (zipped) into separate folders for each modality, allowing the user to download only the data of interest. The zipped folders' names and the number of files in each folder, along with their file formats, are specified in Table 6. The directories wificsi1 and wificsi2 refer to the data collected by the WiFi CSI receivers, denoted by "NUC1" and "NUC2" in Fig. 1, respectively. uwb1 and uwb2 refer to the data collected by the two passive UWB systems, represented by the yellow and blue nodes in Fig. 1, respectively. The directory pwr contains the PWR spectrogram data recorded from the three surveillance channels ("rx2", "rx3" and "rx4", represented as black triangles in Fig. 1) for each experiment (excluding exp001, exp019, and exp034-exp054).

Experiment Directory
Each file in the directories in Table 6 corresponds to a given experiment number (the filename contains the strings exp001, exp002, etc.), the details of which are provided in Table 1.

Table 6. Dataset directory details.

WiFi CSI dataset description
This section describes the structure of the data files residing in the wificsi1 (NUC1) and wificsi2 (NUC2) directories. The files are in .mat format and each row in the file corresponds to a received CSI packet. The columns in the dataset have the following headers:
• timestamp: UTC+01:00 timestamp in milliseconds when the CSI packet was received by the NUC devices.
• tag4422_x, tag4422_y, tag89b3_x, tag89b3_y: refer to the ground truth position of the person in the monitoring area in terms of 2D x- and y-coordinates. Note that for all experiments except exp001, exp019, exp034, exp055, exp028 and exp035-exp054 (NUC2 only), the person wore two UWB tags, one on each arm, bearing IDs 4422 and 89b3. The information regarding which tag is worn on which arm is given in the columns with headers left_arm_tag_id and right_arm_tag_id. For the crowd counting experiment (exp028), there were a maximum of 6 people and hence 6 UWB tags were used to obtain the ground truth position of each person. Each person wore the tag on his/her left arm. In the WiFi CSI files for exp028, the x- and y-coordinates of each person are given in the columns tag4422_x, tag4422_y, tag89b3_x, tag89b3_y, tag122c_x, tag122c_y, tag4956_x, tag4956_y, tag1e85_x, tag1e85_y, and tag9118_x, 9118_y. The person bearing UWB tag ID 4956 was the first to step out of the monitoring area, followed by 9118, 1E85, 4422, 89B3, and finally 122C.
• anchor_node_xy_positions: x- and y-coordinates of the eleven UWB anchor nodes distributed across the room (see red nodes in Fig. 1) for obtaining the ground truth position of the tag(s).
• tx_x_coord, tx_y_coord, target_x_coord, target_y_coord (for exp035-exp054 only): For exp035-exp048, tx_x_coord and tx_y_coord respectively correspond to the x- and y-coordinates of both the CSI transmitter (NUC3) and CSI receiver (NUC2), since they were placed side by side while the target was standing still at several positions or walking along a short path. The human target was holding a tag and its ground truth x- and y-coordinates are given by target_x_coord and target_y_coord, respectively. As for exp049-exp054, tx_x_coord and tx_y_coord refer to the x- and y-coordinates of the CSI transmitter (NUC3), respectively, while target_x_coord and target_y_coord refer to the x- and y-coordinates of the CSI receiver (NUC2), respectively. No human target was present in this case.

UWB dataset description
This section describes the structure of the data files residing in the uwb1 (UWB system 1, yellow nodes) and uwb2 (UWB system 2, blue nodes) directories. The files are in .csv format and each row in the file corresponds to a received UWB packet. The UWB dataset files have the following fields in common with the WiFi CSI datasets: timestamp, activity, exp_no, person_id, room_no, tag4422_x, tag4422_y, tag89b3_x, tag89b3_y, tag122c_x, tag122c_y, tag4956_x, tag4956_y, tag1e85_x, tag1e85_y, tag9118_x, 9118_y, left_arm_tag_id, right_arm_tag_id, and anchor_node_xy_positions. The additional column headers, or those that differ from the WiFi CSI dataset headers, are described below:
• fp_pow_dbm: estimate of the first path power level (in dBm) of the UWB signal between a pair of nodes. The formula for computing this value is given in 23.
• rx_pow_dbm: estimate of the receive power level (in dBm) of the UWB signal between a pair of nodes. The formula for computing this value is given in 23 .
According to the manufacturer, the above two estimated parameters can be used to infer whether the received signal is Line-of-Sight (LoS) or Non-Line-of-Sight (NLoS). It is stated that, as a rule of thumb, if the difference of the two parameters, i.e., rx_pow_dbm - fp_pow_dbm, is less than 6 dB, the signal is most likely to be LoS, whilst if the difference is greater than 10 dB, the signal is likely to be NLoS 23.
• tx_rx_dist_meters: separation distance between the pair of transmitting and receiving nodes in meters.
• fp_index: accumulator first path index as reported by the Leading Edge Detection (LDE) algorithm of the DW1000 UWB chipset in register 0x15 (in FP_INDEX field). It is a sub-nanosecond quantity, consisting of an integer part and a fractional part.
• fp_amp1: first path amplitude (point 3) value reported in the FP_AMPL1 field of register 0x15 of the DW1000 UWB chipset.
• fp_amp2: first path amplitude (point 2) value reported in the FP_AMPL2 field of register 0x12 of the DW1000 UWB chipset.
• fp_amp3: first path amplitude (point 1) value reported in the FP_AMPL3 field of register 0x12 of the DW1000 UWB chipset.
• max_growth_cir: Channel Impulse Response (CIR) power value reported in the CIR_PWR field of register 0x12 of the DW1000 UWB chipset. This value is the sum of the squares of the magnitudes of the accumulator from the estimated highest power portion of the channel, which is related to the receive signal power 23 .
• rx_pream_count: Preamble Accumulation Count (PAC) value reported in the RXPACC field of register 0x10 of the DW1000 UWB chipset. RXPACC reports the number of accumulated preamble symbols. The DW1000 chip estimates the CIR by correlating a known preamble sequence with the received signal and accumulating the result over a time period. The number of preambles used for the CIR estimation is dependent on the quality of the received signal.
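The manufacturer's rule of thumb on rx_pow_dbm and fp_pow_dbm quoted in this list can be applied row by row, as in the minimal sketch below. The function name and the label "uncertain" for differences between 6 dB and 10 dB are our assumptions, since the rule does not classify that band, and the power values are made up for the example.

```python
import numpy as np

def classify_los(rx_pow_dbm, fp_pow_dbm, los_max_db=6.0, nlos_min_db=10.0):
    """Apply the rule of thumb on rx_pow_dbm - fp_pow_dbm.

    Differences below los_max_db are labelled 'LoS', differences above
    nlos_min_db are labelled 'NLoS', and anything in between 'uncertain'."""
    diff = np.asarray(rx_pow_dbm, dtype=float) - np.asarray(fp_pow_dbm, dtype=float)
    out = np.full(diff.shape, "uncertain", dtype=object)
    out[diff < los_max_db] = "LoS"
    out[diff > nlos_min_db] = "NLoS"
    return out

# Made-up power levels in dBm giving differences of 2, 8 and 15 dB.
rx = [-80.0, -80.0, -80.0]
fp = [-82.0, -88.0, -95.0]
print(classify_los(rx, fp).tolist())  # ['LoS', 'uncertain', 'NLoS']
```

The thresholds are exposed as parameters because they are a heuristic rather than a hard boundary.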

PWR dataset description
This section describes the structure of the data files residing in the pwr directory. The files are in .mat format and each row in the files corresponds to a PWR measurement from each of the three receivers (surveillance channels) at a given point in time.
• exp_no: experiment number which is specified as "exp_002", "exp_003", etc. See Table 1 for more details. Note that the PWR system does not need a background scan. Hence, background data for "exp_001" and "exp_019" are omitted for the PWR system.

Kinect dataset description
This section describes the structure of the data files residing in the kinect directory. The files are in .mat format and each row in the files corresponds to three-dimensional skeleton information captured from each of the two Kinects at a given point in time.
• exp_no: experiment number which is specified as "exp_002", "exp_003", etc. See Table 1 for more details. Note that the Kinect system does not need a background scan. Hence, background data for "exp_001" and "exp_019" are omitted for the Kinect data.
• timestamp: UTC+01:00 timestamp in milliseconds when the Kinect skeleton data were recorded.
• Kinect1: A velocity-time profile derived from the Kinect skeleton data over a period of time is shown in Fig. 5(c). It captures the human motion characteristics and is qualitatively similar to the envelope of the human micro-Doppler signatures presented in Fig. 5(d).

Technical Validation
WiFi CSI
Fig. 5(a) shows a 196 s portion of the received WiFi CSI signals on NUC1 and NUC2 for exp018. The CSI values for transmit antenna 1, receive antenna 1 and subcarrier 10 have been considered here. The injection packet rate of WiFi CSI was set at 1600 Hz. For illustration purposes only, the CSI data has been filtered using a 1D wavelet denoising technique and the corresponding results are shown in Fig. 5(a). It can be observed that the CSI measurements in the time domain capture variations in the wireless signal due to its interaction with surrounding objects and human bodies. Therefore, machine or deep learning algorithms can be trained on the observed patterns to automatically extract features from raw signals and predict human activities. The CSI data can be processed and interpreted in different ways (feature extraction), for example as spectrograms, which are generated by applying the Short Time Fourier Transform (STFT) to the CSI amplitude data [3][4][5]. Note that the two receiving NUCs (NUC1 and NUC2) were arranged differently with respect to the transmitter (NUC3). As shown in Fig. 1, in both rooms NUC1 was facing the transmitter in a 180° configuration while NUC2 was in a bi-static geometry (90°) with respect to the transmitter. By having multiple views of the same activity being performed, it is envisaged that the prediction accuracy can be improved using algorithms such as contrastive learning 33.

Fig. 5(b) shows the UWB signals between nodes '0' and '3' for UWB system 1 and nodes '1' and '2' for UWB system 2, considering the same experiment number and time window. The raw CIR data has been converted to Channel Frequency Response (CFR) using the Fast Fourier Transform (FFT) and the signals are plotted for the 10th CFR sample for each system. Note that the terms CFR and CSI can be used interchangeably.
Although the sampling rate of the UWB systems is much lower (typically <100 Hz considering bidirectional data which are reciprocal) than the WiFi CSI system, the activities cause variations in the UWB signals and these variations can be fed to machine or deep learning algorithms for activity prediction. The raw CIR can also be used for human activity recognition (HAR) and yield high prediction accuracy, as demonstrated in 5 .
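As an illustration of the spectrogram feature mentioned above, the following sketch computes a Hann-windowed STFT of a CSI amplitude time series using NumPy only. The window length, hop size and the synthetic 50 Hz amplitude modulation are assumptions chosen for the example. The sketch also assumes evenly spaced packets, so recorded data with dropped packets would first need resampling onto a uniform time grid.

```python
import numpy as np

def stft_spectrogram(x, fs, win_len=256, hop=64):
    """Power spectrogram of a real 1D signal using a Hann-windowed STFT.

    Returns frequencies (Hz), frame start times (s) and a power array
    of shape [frequency bin, time frame]."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()                      # remove the static (DC) component
    window = np.hanning(win_len)
    n_frames = 1 + (len(x) - win_len) // hop
    frames = np.stack([x[i * hop:i * hop + win_len] * window
                       for i in range(n_frames)])
    spec = np.fft.rfft(frames, axis=1)
    freqs = np.fft.rfftfreq(win_len, d=1.0 / fs)
    times = np.arange(n_frames) * hop / fs
    return freqs, times, (np.abs(spec) ** 2).T

fs = 1600.0                               # injection packet rate stated in the text
t = np.arange(int(2 * fs)) / fs           # 2 s of samples
# Synthetic CSI amplitude with a 50 Hz modulation (assumed, for illustration only).
amp = 1.0 + 0.2 * np.sin(2 * np.pi * 50.0 * t)

freqs, times, power = stft_spectrogram(amp, fs)
dominant_hz = freqs[np.argmax(power.mean(axis=1))]
print(power.shape, dominant_hz)
```

A longer window sharpens the frequency resolution at the cost of time resolution, so the window and hop would normally be tuned to the duration of the activities of interest.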

UWB
Considering the crowd counting experiment, Fig. 7(a) and 7(b) show the first path power level (in dBm) for the two UWB systems between a given pair of nodes in each case. The first path power level has been computed using the formula given in the DW1000 manual 23 . As can be observed, the first path power level increases gradually as each person was moving out of the monitoring area. This is an expected behaviour since the LoS signal becomes less and less obstructed. By using the fp_pow_dbm parameter together with other parameters such as overall received UWB power level (rx_pow_dbm), UWB CIR data and WiFi CSI data, the number of people in a given environment can be inferred through the use of artificial intelligence algorithms.
While UWB modules such as Decawave's EVK1000 and DWM1000 are used for active localization, providing the 2D or 3D coordinates of a target carrying a tag, in this experiment we deployed fixed UWB nodes programmed with custom software to record CIR data in a multi-static configuration. The idea is to extend the functionality of pulse-based UWB systems from active localization to sensing of the environment using the CIR data 13,24. Fig. 6(a) and 6(b) show 1000 aligned and accumulated CIR measurements recorded between a given pair of UWB modules for each system in a static environment (exp001). As can be observed from Fig. 6(a) and 6(b), when the room is empty, the accumulated CIR measurements are stable. However, when a person is performing activities as in exp003, some variations occur in the accumulated CIR, as can be observed in the region starting around τ − τ_FP ≈ 8 ns and 10 ns in Fig. 6(c) and 6(d), respectively. The earliest time at which changes are observed in the CIR is the bi-static delay. Since the transceivers are fixed in the multi-static network and their positions are known, the distance travelled by the direct (first path) signal between pairs of devices can be computed along with its delay τ_FP. The black vertical lines in Fig. 6(c) and 6(d) represent the ground truth bi-static delay, which is computed from the tags' coordinates. Note that the target was wearing two tags, one on each arm. Also, exp003 refers to the sitting on chair and standing up from chair activities, which were performed at a given location within the monitoring area. Therefore, the reported locations for each tag were averaged over the 1000 accumulated CIR measurements to obtain a single 2D position per tag. The two tags' coordinates are around 60 cm apart, corresponding to the approximate width of the human body (arm-to-arm distance).
Therefore, the midpoint xy coordinates are taken as the ground truth position of the target. The ground truth bi-static delay can then be computed by finding the transmitter-target-receiver path length and subtracting the direct signal path length from it, given that the fixed UWB transmitter and receiver positions are known. Assuming that the signal emitted from the transmitter reflects off the target and reaches the receiver without additional scattering, the bi-static range defines an ellipse on which the target is located 13. This ellipse has the positions of the transmitter and receiver as its foci, and its major axis length is equal to the bi-static range. In an ideal scenario, the common intersection point of the ellipses from multiple transmitter and receiver pairs indicates the location of the target.
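The ground truth bi-static delay computation described above can be written compactly, as in the sketch below. The node and tag coordinates are made-up values chosen so that the path lengths are easy to verify by hand, and the function name is hypothetical.

```python
import numpy as np

C = 299_792_458.0  # speed of light in m/s

def bistatic_delay(tx, rx, tag_a, tag_b):
    """Ground truth bi-static delay for a target wearing two tags.

    The target position is taken as the midpoint of the two tags. Returns
    the excess delay (s) of the reflected path over the direct path, and
    the transmitter-target-receiver path length (m)."""
    tx = np.asarray(tx, dtype=float)
    rx = np.asarray(rx, dtype=float)
    target = (np.asarray(tag_a, dtype=float) + np.asarray(tag_b, dtype=float)) / 2.0
    reflected = np.linalg.norm(target - tx) + np.linalg.norm(rx - target)
    direct = np.linalg.norm(rx - tx)
    return (reflected - direct) / C, reflected

# Made-up 2D coordinates in meters: the tags are 60 cm apart.
tx, rx = (0.0, 0.0), (4.0, 0.0)
tag_a, tag_b = (2.0, 1.2), (2.0, 1.8)

delay_s, path_m = bistatic_delay(tx, rx, tag_a, tag_b)
print(path_m, delay_s * 1e9)  # 5.0 m path, about 3.34 ns excess delay
```

In this example the reflected path of 5.0 m is also the major axis length of the ellipse with foci at the transmitter and receiver on which the target lies.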
It should be noted that in the multi-static UWB network, each transceiver device runs its own independent RF clock and therefore CIR measurements between pairs of devices may be sampled at different times 13. The DW1000 chip organizes the CIR buffer in such a way that the reported first path index (FP_INDEX) in each CIR measurement is usually around 750 34. The chipset also estimates FP_INDEX in each CIR measurement with a resolution of 1.0016/64 ns, such that it is represented as a real number with integer and fractional parts (see column fp_index in the UWB datasets). Since each CIR measurement has a different FP_INDEX value but the same sampling resolution of 1.0016 ns, the accumulated CIR measurements need to be aligned on their respective estimated FP_INDEX, which can be shifted to the beginning of the CIR buffer, as shown in Fig. 6. Furthermore, in order to remove outliers in the accumulated CIR, those CIR measurements where the number of accumulated preamble symbols (see column rx_pream_count in the UWB datasets) is less than half of the number of transmitted preamble symbols (a preamble length of 128 was used in the experiments) can be discarded 13.
Fig. 5(d) illustrates the Doppler spectrogram collected by the PWR system from a given angle (see placement of "rx2", "rx3" and "rx4" in Fig. 1). As mentioned previously, the PWR system uses the CSI transmitter (NUC3) as the signal source with an injection packet rate of 1600 Hz. The coherent processing interval was set to 1 second, and the sampling rate was set to 20 MHz for each channel. The signatures in the Doppler spectrogram reflect the relative velocity between the human and the receiver antenna. The bi-static velocity is maximum in a monostatic layout (0 degrees) and minimum in a forward scatter layout (180 degrees). We can see large differences between the signatures of the "sit/stand" and "walk" activities in terms of Doppler frequency shifts. 
This is because the velocity of the walking activity is much higher than that of the other activities. The activities "lie down" and "stand from floor" have opposite Doppler signatures because the human body moves in opposite directions when performing them. Such signatures can potentially be used for various machine learning applications such as activity recognition, people counting, localization, etc.
Fig. 5(c) illustrates the velocity-time profile of a human undergoing a set of activities. The motion profile is generated using the time-varying three-dimensional position information of different joints on the human body, such as the torso, arms, legs, and the head, at a frame rate of 10 Hz. To approximate radar reflections from the target, we assume that the radar scattering centers lie approximately at the centers of the bones joining these joints. Thus, the signatures in the velocity-time plot show the relative velocity between the human scattering centers and the Kinect position. Fig. 5(c) and 5(d) compare the velocity-time profile (generated using motion capture data) and the measured spectrogram, respectively, for a human undergoing a series of motions. The envelope of the velocity-time profile is visually very similar to the measured spectrogram, indicating how well both systems capture the motion characteristics. For example, as the human sits down, we observe negative Doppler due to the bulk body motion. The positive micro-Doppler arises from arm motion and from the legs moving slightly in the sensor direction while sitting down. After a 5 s delay, the human subject stands up from the chair, resulting in primarily positive Doppler shifts. Similarly, the latter part of the spectrogram presents signatures corresponding to a human transitioning from walking to falling, then standing up from the ground and rotating their body while standing at a fixed position.
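The velocity-time profile derived from the Kinect joints can be approximated with the short Python sketch below. It is only an illustration under stated assumptions: the function names are hypothetical, each joint track is assumed to be a list of (x, y, z) positions in metres (one per frame), the sensor is assumed to sit at the origin, and motion toward the sensor is assumed to be positive, consistent with the sign convention described above. The radial velocity is estimated with a simple finite difference between consecutive frames.

```python
import math

FRAME_RATE_HZ = 10.0  # Kinect frame rate stated in the text


def bone_center(joint_a, joint_b):
    """Scattering center assumed at the midpoint of the bone joining two joints."""
    return tuple((a + b) / 2.0 for a, b in zip(joint_a, joint_b))


def radial_velocity(frames_a, frames_b, sensor=(0.0, 0.0, 0.0),
                    frame_rate=FRAME_RATE_HZ):
    """Radial velocity (m/s) of one bone's scattering center relative to the sensor.

    frames_a and frames_b hold the (x, y, z) positions of the two joints of the
    bone, one entry per frame. Positive values mean the scattering center is
    approaching the sensor (assumed sign convention).
    """
    ranges = [math.dist(bone_center(a, b), sensor)
              for a, b in zip(frames_a, frames_b)]
    # Finite difference of the range; the sign is flipped so that a shrinking
    # range (motion toward the sensor) gives a positive velocity.
    return [-(r1 - r0) * frame_rate for r0, r1 in zip(ranges, ranges[1:])]


if __name__ == "__main__":
    # Hypothetical forearm moving 10 cm toward the sensor in one frame.
    elbow = [(-0.1, 0.0, 3.0), (-0.1, 0.0, 2.9)]
    wrist = [(0.1, 0.0, 3.0), (0.1, 0.0, 2.9)]
    print(radial_velocity(elbow, wrist))
```

Repeating this for every bone and stacking the results over time gives one velocity trace per scattering center, which is the kind of envelope that is compared against the measured spectrogram.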

Kinect
In most realistic scenarios, human motions might not be restricted to a single aspect angle with respect to the radar. In such scenarios, the spectrograms might differ significantly. This could be due to the shadowing of some part of the human body when it is captured at different angles. Therefore, we can feed the animation data captured by the Kinect as input to our human radar simulator, SimHumalator, and synthesize radar returns as a function of different target parameters (varying aspect angles) and different sensor parameters (varying bi-static angles and operational parameters such as waveform design and antenna characteristics). Such signatures can potentially augment otherwise limited real radar databases for various machine learning applications such as activity recognition, people counting, and identification.
[Figure 5, caption partially recovered] ... CFR data (considering the 10th CFR sample between nodes '0' and '3' for UWB system 1 and nodes '1' and '2' for UWB system 2), (c) Velocity information extracted from Kinect sensor data and (d) PWR Doppler spectrogram extracted from surveillance channel 'rx2'. Only a 196 s portion of exp018 is considered for the four synchronized modalities in this illustration.

Usage Notes
The dataset repository is available at https://figshare.com/s/c774748e127dcdecc667. The different directories are specified in Table 6. Furthermore, the interested reader is encouraged to navigate to the codes directory where example scripts on how to load and analyze a specific modality data are included. These are described in the following section.

Code availability
Figure 6. 1000 accumulated and aligned CIR measurements in a (a) static environment (exp001) recorded between nodes '1' and '3' of UWB system 1 (yellow nodes), (b) static environment (exp001) recorded between nodes '2' and '3' of UWB system 2 (blue nodes), (c) dynamic environment (exp003) recorded between nodes '1' and '3' of UWB system 1 (yellow nodes), (d) dynamic environment (exp003) recorded between nodes '2' and '3' of UWB system 2 (blue nodes). Note: Bidirectional CIR data are reciprocal. τ_FP represents the first path (direct) signal time-of-flight between the pair of nodes.
Some Matlab and Python scripts have been made available in the codes directory for users to replicate some of the figures in this Data Descriptor:
• plot_wificsi.m: This script is used to load the complex WiFi CSI data recorded by each NUC device and visualize the amplitude variations over time, as illustrated in Fig. 5(a). The user can specify the start and stop timestamps and visualize the CSI stream in that time segment (for a given transmit antenna, receive antenna, and subcarrier index). Furthermore, for comparison purposes, the generated plots consist of the raw (unfiltered) CSI data and those which have been denoised using DWT.
• plot_uwb.m: This script is used to load the complex CIR data recorded by each passive UWB system, convert it into CFR using FFT and visualize the amplitude variations over a given time segment (between a given pair of UWB nodes), as illustrated in Fig. 5(b). Furthermore, this script allows the users to plot the aligned CIR measurements as shown in Fig. 6.
• plot_uwb_fppow_crowdcount.m: This script is used to load the UWB data for the crowd counting experiment (exp028) and plot the first path power level (in dBm) over time for each UWB system (between a given pair of UWB nodes), as illustrated in Fig. 7.
• plot_PWR_demonstration.m and plot_pwr_spectrogram.py: These scripts allow the users to visualize the PWR spectrograms from the three surveillance channels, "rx2" (as illustrated in Fig. 5(d)), "rx3" and "rx4", as a function of time and Doppler.
• plot_kinect_data.m: This script allows the user to plot the motion capture data (as a function of velocity versus time) from one of the two Kinect systems, as illustrated in Fig. 5(c). Furthermore, the users can visualize the stick (skeletal) representation of the Kinect motion capture data as an animation over the specified time segment.
• oddet.py: This Python script allows the user to extract only the modalities and features needed, rather than loading entire files and then stripping out unused features. Through its command line interface, one can select the modality, experiment number and features needed. Additionally, if a specific set of features is required, one can specify all the columns needed through YAML configurations, which allows the user to curate the dataset into a format that more closely suits the intended usage. This Python script is available at the following GitHub repository: https://github.com/RogetK/ODDET.
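For readers working outside Matlab, the UWB pre-processing described in this Data Descriptor (discarding outliers based on rx_pream_count, aligning each CIR on its fp_index, and converting CIR to CFR) can be sketched in plain Python as below. This is not a port of plot_uwb.m. The record layout (a list of dictionaries with a hypothetical "cir" key holding complex samples) and all function names are assumptions made for illustration; only the column names fp_index and rx_pream_count and the preamble length of 128 come from the text. As a simplification, the alignment uses only the integer part of fp_index, and the CFR is computed with a direct DFT rather than an optimized FFT.

```python
import cmath
import math

PREAMBLE_LENGTH = 128  # transmitted preamble symbols, as stated in the text


def keep_measurement(rx_pream_count, preamble_length=PREAMBLE_LENGTH):
    """Outlier rule from the text: discard a CIR when fewer than half of the
    transmitted preamble symbols were accumulated."""
    return rx_pream_count >= preamble_length / 2


def align_cir(cir, fp_index, n_taps):
    """Return n_taps samples starting at the first-path sample.

    Simplification (assumption): only the integer part of fp_index is used;
    the fractional part would require interpolation, which is omitted here.
    """
    start = math.floor(fp_index)
    window = list(cir[start:start + n_taps])
    # Zero-pad if the buffer ends before n_taps samples are available.
    window.extend([0j] * (n_taps - len(window)))
    return window


def accumulate_aligned(records, n_taps):
    """Filter outliers and align every remaining CIR measurement."""
    return [align_cir(r["cir"], r["fp_index"], n_taps)
            for r in records if keep_measurement(r["rx_pream_count"])]


def cir_to_cfr(cir):
    """Channel frequency response as the discrete Fourier transform of the CIR."""
    n = len(cir)
    return [sum(x * cmath.exp(-2j * cmath.pi * k * m / n)
                for m, x in enumerate(cir))
            for k in range(n)]
```

In practice, a library FFT would replace `cir_to_cfr` for full-length CIR buffers; the direct DFT is used here only to keep the sketch free of external dependencies.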