5G-enabled contactless multi-user presence and activity detection for independent assisted living

Wireless sensing is the state-of-the-art technique for next-generation health activity monitoring. Smart homes and healthcare centres need multi-subject health activity monitoring to cater for future requirements. 5G-sensing coupled with deep learning models has enabled smart health monitoring systems, which have the potential to classify multiple activities based on variations in the channel state information (CSI) of wireless signals. Proposed is the first 5G-enabled system operating at 3.75 GHz for multi-subject, in-home health activity monitoring, to the best of the authors' knowledge. Classified are activities of daily life performed by up to 4 subjects, in 16 categories. The proposed system combines the subject count and the activities performed into joint classes, resulting in simultaneous identification of occupancy count and activities performed. The CSI amplitudes obtained from 51 subcarriers of the wireless signal are processed and combined to capture variations due to simultaneous multi-subject movements. A deep learning convolutional neural network is engineered and trained on the CSI data to differentiate multi-subject activities. The proposed system provides a high average accuracy of 91.25% for single subject movements and a multi-class accuracy of up to 83% (79.5% on average over ten repetitions) for 4 subjects and 16 classification categories. The proposed system can potentially fulfill the needs of future in-home health activity monitoring and is a viable alternative for monitoring public health and well-being.

www.nature.com/scientificreports/
1. Presence and activity detection of multiple subjects performing different activities in parallel. Activity recognition for single subjects has been explored extensively in the literature [38][39][40][41][42]. The research team presenting this paper has led several studies on activity recognition using software defined radios [43][44][45], which motivated this study of whether the technology can do the same for multiple subjects. By utilising RF-sensing technology and Artificial Intelligence (AI), a single system is presented that can simultaneously monitor occupancy, that is, count the number of people in a room, and detect the parallel activities of all subjects. This contribution is twofold. The first part is accounting for combinations of three different activities occurring in parallel amongst four different subjects. The second is the intra-class variation introduced into the training data, that is, training the system to classify the same activity or combination of activities, performed in different positions within the room and by different test subjects, as the same class. This was done to strengthen the machine learning model and improve its detection accuracy.
2. 5G-enabled sensing, that is, the proposed RF-sensing system was designed to operate in the 5G band, specifically at 3.75 GHz. To the best of the authors' knowledge, the use of 5G for multi-user sensing applications has not been considered elsewhere in the literature. The motive and ultimate goal for the use of this frequency is to utilise 5G technology, with its high data rates and ultra-low latency, in developing real-time non-invasive activity and presence detection systems for assisted living.
Moreover, primary findings from the experiments conducted in this paper have shown that CSI patterns reflecting the activities of a test subject are much more distinct when captured at the 5G frequency of 3.75 GHz than when captured at the Wi-Fi frequency. To confirm this, the experimental setup presented later in "Methodology and framework" was used to collect CSI samples of the "Standing Activity" at both frequencies, and of an "Empty Room" at the Wi-Fi frequency of 5.00 GHz. All three captures are shown in Fig. 1, where Fig. 1a and 1b show the "Standing Activity" captured at 5 GHz and 3.75 GHz, respectively, whilst Fig. 1c shows the "Empty Room" captured at 5.00 GHz. A distinct pattern in the captured CSI, such as that in Fig. 1b, is more likely to increase classification accuracy than the pattern in Fig. 1a, especially in a real-time system where massive amounts of data are captured and processed.
3. The data sets collected during the experiments presented in this paper are made publicly available online 46 . The lack of a comprehensive data set for this type of activity motivated the research team to make the data available, so that the wider community can benefit from this rich data set covering a wide range of activities. Researchers can use the data set to apply different processing techniques, replicate the experiment and collect more data for benchmarking. The online data set contains a total of 1777 samples divided amongst 16 classes, as detailed later in "Experimental design".

Methodology and framework
This section details the methodology and framework adopted to conduct the experiments and achieve the reported results. It starts by presenting the specifications of the hardware design stage, followed by a detailed outline of the experimental design stage including experimental variables, data collection, data processing, system training, and testing. The system conceptualisation and main building blocks are presented in Fig. 2.
Hardware design specifications. The experiments reported in this paper used two USRP devices, each equipped with a VERT2450 omnidirectional antenna. One USRP is used as the transmitter and the other as the receiver. Each USRP is connected to a separate PC with an Intel(R) Core(TM) i7-7700 3.60 GHz processor and 16 GB of RAM. The system uses virtual machines to provide the Ubuntu 16.04 operating system. On the Ubuntu virtual machines, GNU Radio is used to communicate with the USRPs. GNU Radio allows the creation of flow diagrams that define the functions the USRPs carry out. These flow diagrams can then be converted to Python scripts. One Python script is created to transmit data and another is configured to receive the data from the transmitter. The transmitter transmits random numbers between 0 and 255 using OFDM. The receiver side is configured to receive the signal from the transmitting USRP. The script running on the receiver side then outputs the complex CSI values to the terminal. This output is then processed to extract the amplitude values from the complex CSI values. The system's main configuration parameters are shown in Table 1.
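The amplitude extraction step can be sketched as follows. The text does not describe the exact terminal format printed by the receiver script, so the parser below assumes a hypothetical format of one packet per line containing 51 whitespace-separated Python-style complex literals (for example `0.82-0.31j`); the function names `parse_csi_lines` and `csi_amplitude` are illustrative only. The amplitude of each subcarrier is simply the modulus of its complex CSI value.

```python
import numpy as np

N_SUBCARRIERS = 51  # subcarriers per packet, as reported in the text


def parse_csi_lines(lines):
    """Parse receiver output into a (packets, 51) complex array.

    HYPOTHETICAL format: one packet per line, 51 whitespace-separated
    Python-style complex literals such as "0.82-0.31j".
    """
    rows = [[complex(tok) for tok in line.split()] for line in lines if line.strip()]
    csi = np.array(rows, dtype=np.complex128)
    if csi.ndim != 2 or csi.shape[1] != N_SUBCARRIERS:
        raise ValueError(f"expected (packets, {N_SUBCARRIERS}) values, got shape {csi.shape}")
    return csi


def csi_amplitude(csi):
    """Amplitude |H_i(f)| of every subcarrier in every packet."""
    return np.abs(csi)
```

For a 3-second sample of 1200 packets, `csi_amplitude(parse_csi_lines(lines))` would return a 1200 × 51 array of amplitudes, one row per packet.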

Experimental design.
To validate the effectiveness of the proposed framework, various experiments were performed in a rectangular 2.8 m × 3 m activity area, as shown in Fig. 3. The two X Series USRP devices used for transmitting and receiving the CSI signals were installed in opposite corners, facing each other. To capture maximum intra-class variation for all the activities, namely sitting, standing and walking, the subjects kept changing their positions randomly within the prescribed activity area while keeping a 1 m distance from one another over the course of multiple experiments. This simulates a small setting, such as a care home with a limited number of people. Currently, the experiment accommodates up to 4 people; part of the future work will seek to increase this number. The proposed deep learning-based classification methodology for recognising multi-user activities comprises two major modules, (i) System Training and (ii) System Testing, as depicted in the high-level signal flow diagram in Fig. 4. The system training module is an offline process that uses an already acquired and preprocessed CSI data set to train the 1-D Convolutional Neural Network (CNN). System testing is performed in an online setting, in which an input CSI data sample, after all necessary preprocessing, is classified as one of the human activities. A detailed description of the two modules is provided in the subsequent sub-sections.
System training. The system training module of the proposed classification architecture is based on three major components (see Fig. 4): (i) data collection, (ii) data processing and (iii) the 1-D CNN. The CSI data acquisition was performed using the setup shown in Fig. 3. The data collection and data processing components acquire the CSI data and prepare it for subsequent classification, respectively, whereas the third component deals with training the 1-D CNN model. A detailed discussion of each of these components is presented in the following sections.
Data collection. The data collection for training the proposed system involved five steps, with four volunteers performing three different activities, "Sitting", "Standing" and "Walking", in a lab environment using the setup shown in Fig. 3b. The setup was replicated in two different lab environments as a means of introducing different clutter levels, which increases the variability of the data and further strengthens the model. Nevertheless, the data for a particular class collected in both environments were treated as the same data set; that is, clutter level due to the environment was not a measured variable in the conducted experiments.
As with any experiment, there were some fixed attributes as well as variable ones. For the experiments presented in this paper, the fixed attributes included (1) the hardware and its configuration, (2) the data processing and deep learning techniques and (3) the experimental setup shown in Fig. 3. The experimental variables included (1) the number of subjects, (2) the subject identity, in the data collection for one and two subjects, and (3) the location of the performed activity; for example, a single subject performed the "Sitting Activity" at different chair locations, as per Fig. 3. The first variable was measured, as highlighted in the remainder of this section and in the results, whilst the second and third variables were only used to introduce maximum intra-class variation into the collected data. All the data was collected over a calendar week, with a random number of samples collected for all 16 classes on every day of the week to ensure the repeatability of the data over different days.
The "Sitting" and "Standing" activities represent the action of performing these activities, not the posture or position of the person in the sitting or standing state. Moreover, while the activity data was being captured, the volunteers were not required to keep their upper body still, so both the "Sitting" and "Standing" activity data include small upper-body movements.
In the first data collection step, the CSI data for a single subject were collected separately for all three activities, "Sitting", "Standing" and "Walking", giving a total of 420 samples. To introduce maximum variation into the data set, three different subjects participated in this data collection phase. Each subject contributed equally, that is, each participant was involved in the collection of 140 samples, divided among the three activity classes. For each CSI data sample, 1200 packets are transmitted over three seconds.
The second step involved data collection for two subjects performing the above-mentioned three activities, giving a total of 400 samples, as highlighted in Table 2. The same three participants were involved in this stage, with equal contributions from each; that is, each participant was involved in the collection of at least 33 samples for each of the four classes identified for this stage.
In the third and fourth data collection steps, three and four participants, respectively, were recruited to collect data for three and four subjects performing activities, as outlined in Table 2. The participants recruited for these stages were fixed throughout. In these two steps, 540 and 300 data samples were collected, respectively. In addition to the activity data, 117 data samples were collected for the class "Empty", which represents the state of the room when no subjects are present. All 16 classes are shown in Table 2. Some of the data samples representing different activities are shown in Fig. 5. The inter-class variation between the data samples of different activity classes is evident and can be exploited in the subsequent classification process to obtain better results.
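As a quick consistency check on the figures above, the per-step sample counts add up to the 1777 samples of the public data set, and capturing 1200 packets in 3 seconds implies an effective rate of 400 packets per second, which would be the sampling rate of each averaged CSI time series in the processing stage. The snippet only restates numbers given in the text; the variable names are illustrative.

```python
# Samples per data collection step, as reported in the text (see Table 2).
SAMPLES_PER_STEP = {
    "1 subject": 420,
    "2 subjects": 400,
    "3 subjects": 540,
    "4 subjects": 300,
    "Empty": 117,
}

# Each CSI sample spans 1200 packets transmitted over 3 s.
PACKETS_PER_SAMPLE = 1200
SAMPLE_DURATION_S = 3.0

total_samples = sum(SAMPLES_PER_STEP.values())
packet_rate_hz = PACKETS_PER_SAMPLE / SAMPLE_DURATION_S
# Single-subject step: 3 participants contributed equally.
per_participant_1_subject = SAMPLES_PER_STEP["1 subject"] // 3

print(total_samples)              # 1777
print(packet_rate_hz)             # 400.0
print(per_participant_1_subject)  # 140
```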
Data processing. CSI for one transmitter and one receiver antenna forms a matrix that contains the frequency responses of all N = 51 subcarriers, as shown in Eq. (1):

$$H = \left[ H_1, H_2, \ldots, H_N \right], \qquad N = 51 \qquad (1)$$

where the frequency response of each subcarrier, $H_i$, can be represented as

$$H_i(f) = |H_i(f)|\, e^{\,j \angle H_i(f)} \qquad (2)$$

where $|H_i(f)|$ and $\angle H_i(f)$ are the amplitude and phase responses of the $i$th subcarrier. Each subcarrier response is related to the system input and output as given in Eq. (3),

$$H_i(f) = \frac{Y_i(f)}{X_i(f)} \qquad (3)$$

where $X_i(f)$ and $Y_i(f)$ are the Fourier transforms of the input and the output of the system. In general, the acquired CSI data is masked by high-frequency environmental noise and by the multipath propagation of the CSI signal. Therefore, to denoise the data and prepare it for the subsequent training of the 1-D CNN, the data is passed through the following processing steps:
• In the first step, each CSI data sample is averaged across all 51 subcarriers, as in Eq. (4), to obtain one averaged data sample for subsequent processing,

$$x_i = \frac{1}{51} \sum_{j=1}^{51} y_{ij} \qquad (4)$$

where $x_i$ is the $i$th data sample, representing the average across the corresponding subcarriers $y_{ij}$ for $j = 1, 2, \ldots, 51$.
• Afterwards, a Butterworth lowpass filter of order n = 4 is used to smooth each averaged data sample $x_i$ and remove small variations from it, giving the smoothed sample $s_i$.
• Lastly, a discrete wavelet transform (DWT) at level 3 with a Haar basis function is applied to obtain the approximation coefficients $A_i$ of each smoothed data sample $s_i$. The approximation coefficients are the output of the lowpass filter in the DWT, so this step further reduces the noise. Mathematically, the convolution and downsampling involved in the wavelet decomposition for all three levels can be represented as follows:

$$A_i^{0}[t] = \sum_{k=1}^{K} g[k]\, s_i[2t - k], \qquad A_i^{l}[t] = \sum_{k=1}^{K} g[k]\, A_i^{l-1}[2t - k], \quad l = 1, 2$$

where $g[k]$ for $k = 1, 2, \ldots, K$ is the lowpass filter of length $K$ used at each decomposition level, $s_i[m]$ for $m = 1, 2, \ldots, M$ is the smoothed signal of length $M$ after applying the Butterworth lowpass filter, and $A_i^{l}$ for levels $l = 0, 1, 2$ represents the approximation coefficients of the three DWT levels.

Figure 6 shows a raw data sample and the results obtained after each of the data processing steps. Once all the samples are processed, the data set is ready to train the 1-D CNN.
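The three processing steps above can be sketched in Python with NumPy and SciPy. This is a sketch under stated assumptions, not the authors' implementation: the 400 Hz sampling rate is inferred from 1200 packets per 3-second sample, the 10 Hz cutoff is a hypothetical placeholder because the text does not report a cutoff frequency, and zero-phase filtering with `filtfilt` is an assumption. The input is a (1200, 51) array of CSI amplitudes, and the Haar step implements the convolution-and-downsampling recursion directly with the Haar lowpass filter g = [1, 1]/sqrt(2).

```python
import numpy as np
from scipy.signal import butter, filtfilt

FS_HZ = 1200 / 3.0  # effective sampling rate: 1200 packets per 3 s sample
CUTOFF_HZ = 10.0    # ASSUMPTION: the cutoff frequency is not given in the text
FILTER_ORDER = 4    # Butterworth order n = 4, from the text
DWT_LEVEL = 3       # Haar DWT level, from the text


def average_subcarriers(amplitude: np.ndarray) -> np.ndarray:
    """Eq. (4): average a (packets, 51) amplitude array across subcarriers."""
    return amplitude.mean(axis=1)


def butter_lowpass(x: np.ndarray) -> np.ndarray:
    """Order-4 Butterworth lowpass; zero-phase filtfilt is an ASSUMPTION."""
    b, a = butter(FILTER_ORDER, CUTOFF_HZ, btype="low", fs=FS_HZ)
    return filtfilt(b, a, x)


def haar_approximation(s: np.ndarray, level: int = DWT_LEVEL) -> np.ndarray:
    """Haar DWT approximation coefficients after `level` decompositions.

    Each level convolves with g = [1, 1] / sqrt(2) and keeps every
    second output, i.e. it downsamples by 2.
    """
    a = np.asarray(s, dtype=float)
    for _ in range(level):
        if a.size % 2:  # ASSUMPTION: extend odd-length input by repeating the last sample
            a = np.append(a, a[-1])
        a = (a[0::2] + a[1::2]) / np.sqrt(2.0)
    return a


def preprocess(amplitude: np.ndarray) -> np.ndarray:
    """Average -> Butterworth smoothing -> Haar approximation coefficients A_i."""
    x = average_subcarriers(amplitude)
    s = butter_lowpass(x)
    return haar_approximation(s)
```

Under these assumptions a 1200 × 51 amplitude sample yields 150 approximation coefficients (1200 -> 600 -> 300 -> 150). With PyWavelets installed, `pywt.wavedec(s, "haar", level=3)[0]` should give the same level-3 approximation for inputs whose length is divisible by 8; the hand-written loop simply keeps the sketch dependency-light.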

1-D Convolutional Neural Network
The CNN 47 is one of the most widely used Deep Neural Networks (DNNs) for pattern classification from both 2-D images and 1-D data signals. 1-D CNNs have been introduced recently and have gained considerable popularity for classification problems involving 1-D signals 48 . Motivated by their high accuracy in classification applications 49 , this paper adopts a 19-layer 1-D CNN to recognise multiple human activities performed by multiple subjects in parallel. The proposed 1-D CNN structure comprises 6 blocks of convolutional layers and one block of fully connected layers. Of the 6 convolutional blocks, the first contains one convolutional layer and one pooling layer, whereas each of the remaining 5 blocks contains 3 convolutional layers and one pooling layer. Finally, the fully connected block contains 3 layers. This gives 16 convolutional and 3 fully connected layers, which accounts for the 19 layers if the 6 pooling layers, which have no trainable weights, are not counted. The complete architecture of the network is shown in Fig. 7 and the values of its various parameters are given in Table 3. Once all the data is preprocessed and the 1-D CNN is trained, the trained model is stored for the subsequent testing phase, where it classifies each incoming test CSI signal into one of the activity classes.
System testing. The second phase of the proposed methodology is system testing, which involves the following steps:
• In the first step, the CSI signal obtained from the USRPs is processed by a Butterworth filter to smooth it by removing small variations. The smoothed signal is then passed through the wavelet transform, as described in the previous section, to obtain the approximation coefficients at level three.
• In the second step, the trained 1-D CNN model is used to classify the processed signal into one of the activity classes.
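The layer count of the 1-D CNN described above can be checked with a framework-free listing of its block structure. This is only a sketch: it encodes the block layout stated in the text, not the filter counts, kernel sizes or activations of Table 3. The input length of 150 assumes a 1200-packet sample reduced by the level-3 Haar DWT, and the length calculation assumes 'same'-padded convolutions and non-overlapping pooling of size 2; both are assumptions, and the layer names are hypothetical.

```python
# Block structure taken from the text; Table 3 details are not reproduced.
CONV_PER_BLOCK = [1, 3, 3, 3, 3, 3]  # block 1: 1 conv; blocks 2-6: 3 conv each
FC_LAYERS = 3                        # fully connected block


def layer_spec():
    """Flat list of hypothetical layer names: convs plus one pool per block, then FC."""
    names = []
    for block, n_conv in enumerate(CONV_PER_BLOCK, start=1):
        names += [f"conv{block}_{i}" for i in range(1, n_conv + 1)]
        names.append(f"pool{block}")
    names += [f"fc{i}" for i in range(1, FC_LAYERS + 1)]
    return names


def length_after_pooling(n: int, n_pools: int, pool_size: int = 2) -> int:
    """Sequence length after n_pools pooling layers.

    ASSUMPTION: 'same'-padded convolutions (length preserved) and
    non-overlapping pooling of size 2 that drops any remainder.
    """
    for _ in range(n_pools):
        n //= pool_size
    return n


spec = layer_spec()
weighted = [name for name in spec if not name.startswith("pool")]
print(len(spec), len(weighted))                        # 25 19
print(length_after_pooling(150, len(CONV_PER_BLOCK)))  # 2
```

With six halvings, a 150-step input would shrink to 2 steps before the fully connected block; if Table 3 specifies other pooling sizes or padding, these lengths change accordingly.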
Ethics. The current study was approved by the University of Glasgow's ethics board (application number: 300190109). All experiments were performed in accordance with relevant guidelines and regulations, and informed consent was sought from all participants prior to conducting the experiments.

Results and discussion
The proposed human activity monitoring system was evaluated using two different types of experiments. The first set of experiments focused on determining the accuracy of the system in counting the number of people performing particular activities in a room (see Phases 1 and 2 in Table 4), whilst the second set was conducted to measure the system's accuracy in identifying different postures/activities of multiple people in the same room. Both types of experiments used a train-and-test split strategy in which 80% of the data, selected at random, was used for training and the remaining 20% for testing. Furthermore, each experiment was repeated 10 times to obtain average accuracy rates. The 1-D CNN was trained for 100 epochs using the Adamax optimiser with a learning rate of 0.001. A detailed discussion of the results of both experiments is given in the following subsections.
Multi-user presence. The purpose of this experiment is to determine the number of people in an indoor setting. The experiment is done in multiple phases to gradually incorporate different activities as shown in Table 4.
• In the first phase, the experiment involves only the standing activity, performed by different numbers of subjects in parallel. The data is divided into 5 classes, including the "Empty" class that represents "no subject in the room", as shown in Table 4. For this phase of the experiment, the total number of data samples used is 557, of which 80% (i.e. 445 samples) are randomly selected to train the model and the remaining 20% (i.e. 112 samples) are used to test the system.
• The second phase involves only the sitting activity data of all the subjects. Again, the total number of samples is 557, and the data is divided into 5 classes: 4 sitting activity classes and 1 "Empty" class.
• In the third phase, the sitting and standing activity data for each number of subjects are merged to form 4 activity classes and 1 "Empty" class. The total number of data samples therefore becomes 1417, of which 80% (i.e. 1133 samples) are used to train the system while the remaining 20% (i.e. 284 samples) are used for testing. This data also includes mixed activities, such as one subject sitting and one standing.
• Similarly, in the fourth phase, the sitting, standing and walking data are used together to form 4 activity classes plus 1 "Empty" class. In this phase, the total number of data samples used is 1777 (1421 training samples and 356 testing samples). This data also includes mixed activity data, as described previously.
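The train/test counts quoted in the phases above correspond to taking floor(0.8 N) samples for training and the rest for testing, which reproduces the reported 445/112, 1133/284 and 1421/356 splits exactly. The sketch below draws such random hold-out splits with NumPy. The seeding and the use of an independent split for each of the 10 repetitions are assumptions, as the text does not state how the repetitions were randomised; each split would then be used to train the 1-D CNN for 100 epochs with Adamax at a learning rate of 0.001, and that training is omitted here.

```python
import numpy as np


def holdout_split(n_samples: int, train_frac: float = 0.8, rng=None):
    """Random hold-out split; the training set gets floor(train_frac * n) samples."""
    rng = np.random.default_rng(rng)
    order = rng.permutation(n_samples)
    n_train = int(np.floor(train_frac * n_samples))
    return order[:n_train], order[n_train:]


def repeated_holdout(n_samples: int, n_repeats: int = 10, seed: int = 0):
    """Independent random 80/20 splits, one per repetition (ASSUMPTION)."""
    rng = np.random.default_rng(seed)
    return [holdout_split(n_samples, rng=rng) for _ in range(n_repeats)]


# Phase sizes from the text: 557 (Phases 1 and 2), 1417 (Phase 3), 1777 (Phase 4).
for n in (557, 1417, 1777):
    train_idx, test_idx = holdout_split(n, rng=0)
    print(n, len(train_idx), len(test_idx))
# 557 445 112
# 1417 1133 284
# 1777 1421 356
```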
The average classification accuracies for all the phases across the 10 repetitions of each experiment are shown in the bar graph in Fig. 8. In general, the proposed system works well in all cases. However, it works better when the sitting and standing activities are used separately to form the activity classes (Phase-1 and Phase-2). When the sitting and standing activities are merged in Phase-3, and when the new activity "walking" is introduced along with the other mixed activities in Phase-4, the overall accuracy decreases because of the increase in intra-class variation and the decrease in inter-class variation. Further analysis of the confusion matrices, shown in Fig. 9, reveals that most misclassifications involve the 3-subject or 4-subject classes. This may be because more subjects in a relatively small space cause more noise in the USRP data. The performance could potentially be improved by changing the experimental environment.
Activity recognition. In this experiment, the aim is to classify human activity into one of the 16 classes given in Table 2. For this experiment, 80-20% hold-out validation is used: all 1777 data samples are split into training (80%) and test (20%) data, and the experiment is repeated 10 times to obtain average performance results. The average accuracy across all 10 repetitions is 79.5% (± 2.6), which is very promising considering the large number of classes and the considerable variation within the data. Figure 10 shows the normalised confusion matrix for the repetition with the highest accuracy, 83%, in the activity recognition experiment. Here, the numbers 1, 2, ..., 16 represent the 16 classes given in Table 2, respectively. It can be seen in this normalised confusion matrix that, in general, classification performance is good for most of the activities except a few classes.
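The per-class figures discussed next, such as 100% for the empty room or 82% for 1 subject standing, are the diagonal entries of a row-normalised confusion matrix, and the off-diagonal entries in a row are the misclassification rates for that class. A minimal NumPy sketch of this computation follows; the function names are illustrative and are not taken from the authors' code, which the text does not describe.

```python
import numpy as np


def normalised_confusion_matrix(y_true, y_pred, n_classes: int) -> np.ndarray:
    """Row-normalised confusion matrix.

    Entry [t, p] is the fraction of class-t samples predicted as class p,
    so each non-empty row sums to 1. Labels must be 0-based integers.
    """
    y_true = np.asarray(y_true, dtype=int)
    y_pred = np.asarray(y_pred, dtype=int)
    counts = np.zeros((n_classes, n_classes), dtype=float)
    np.add.at(counts, (y_true, y_pred), 1.0)  # count each (true, predicted) pair
    row_sums = counts.sum(axis=1, keepdims=True)
    # Rows of classes absent from y_true stay zero instead of dividing by 0.
    return np.divide(counts, row_sums, out=np.zeros_like(counts), where=row_sums > 0)


def per_class_accuracy(cm_norm: np.ndarray) -> np.ndarray:
    """Diagonal of the normalised matrix: fraction of each class classified correctly."""
    return np.diag(cm_norm).copy()


def overall_accuracy(y_true, y_pred) -> float:
    """Fraction of all test samples classified correctly."""
    return float(np.mean(np.asarray(y_true) == np.asarray(y_pred)))
```

Class numbers 1 to 16 of Table 2 would be shifted to 0 to 15 before calling these functions. Overall accuracy weights each class by its number of test samples, so it need not equal the mean of the per-class accuracies.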
The normalised confusion matrix shows the highest accuracies for empty (class 1), 1 subject sitting (class 2) and 1 subject walking (class 4), at 100%, 90% and 93%, respectively. However, sitting and standing produce similar CSI variations, as they are similar movements: 1 subject standing is misclassified as sitting in 14% of the samples, resulting in a lower accuracy of 82%. Overall, as expected, the single-subject activities achieve higher accuracies owing to smaller variations in the CSI data compared with multi-subject activities.
In the multi-subject case, some activities have similar patterns and are difficult to differentiate, such as "1 sit + 1 stand" (class 5), "2 sitting" (class 7) and "2 standing" (class 8). This is because sitting and standing result in quite similar motions, one moving down and the other up. Also, "1 sit + 1 stand" has at least one subject performing a motion similar to those in "2 sitting" and "2 standing", resulting in 15% and 10% of "1 sit + 1 stand" samples being misclassified as "2 sitting" and "2 standing", respectively.
Moreover, "1 walking + 1 sitting" (class 6) and "1 walking + 2 sitting" (class 10) show a higher rate of confusion between the two activities, with accuracies of 80% and 75%, respectively. As illustrated in the confusion matrix, 21% of "1 walking + 2 sitting" samples are misclassified as "1 walking + 1 sitting", and 15% of "1 walking + 1 sitting" samples are misclassified as "1 walking + 2 sitting", reducing the respective class accuracies. Similarly, "2 sit + 1 stand" (class 11) and "3 sitting" (class 12) resemble each other because at least two subjects perform the same activity. Furthermore, as mentioned before, standing involves a motion quite similar to sitting, resulting in 20% of class 11 samples being misclassified as class 12. Similar CSI patterns also exist between "4 sitting" and "2 sit + 2 stand".
For the above-mentioned reasons, and because several classes involve sitting and standing activities with different numbers of subjects, class 5, which represents two subjects (1 sitting and 1 standing), and class 11, which represents three subjects (1 sitting and 2 standing), have the lowest accuracy rates in our deep learning classification model, at 60% and 50%, respectively.

Conclusions and future work
In this paper, a novel 5G-enabled contactless RF-sensing system has been presented to monitor human presence and to detect multi-user parallel activity using CSI signals. The system's frequency of operation was 3.75 GHz, which falls within the 3.4 to 3.8 GHz 5G band, and, to the best of the authors' knowledge, no other study has implemented 5G-sensing for this purpose before. The main idea of the paper was to present a 5G-sensing based non-invasive system that is capable of detecting the presence and activities of multiple users in the same room. Furthermore, the results presented earlier in the paper have shown that by combining RF-sensing technology with standard machine learning algorithms such as CNNs, it is possible to detect different human activities, including counting the number of people in a room, with high accuracy. The system was tested for its capability to count subjects and to detect parallel activities amongst a variable number of subjects, between 0 and 4, as highlighted in "Multi-user presence" and "Activity recognition", respectively. The subject counting experiment reported high accuracies between approximately 86% and 95%, with accuracy decreasing as intra-class variation increased and inter-class variation decreased. On the other hand, the activity recognition experiment reported approximately 80% accuracy in recognising multiple activities performed amongst all test subjects. The accuracy in the activity recognition experiment was greatly affected by the intra-class variation introduced into the data; however, this variation was introduced deliberately to train the system on the maximum possible combination of input activities and so mimic a real-life scenario. The results obtained in this paper are promising and have high potential to be improved through more data collection and the implementation of different learning algorithms.
Furthermore, as the major focus of this paper is to show the significance and effectiveness of 5G-sensing in capturing variation in CSI data caused by human activities, the paper currently focuses on a small setting, such as a room in a care home, with four persons performing three major and common activities, i.e. "Sitting", "Standing" and "Walking". In future, the aim is to scale the system up to cover most of the human activity spectrum, performed by a larger group in multiple rooms. Moreover, the current implementation of the proposed system is based on one transmitter/receiver antenna pair and performs better than most of the existing work in which more than one transmitter and receiver antenna have been used for the same purpose. Given that the current system is a proof of concept focused on showing the significance and effectiveness of the 5G frequency band in capturing variation in CSI data, future experiments will assess the impact of the number of rooms versus the number of transmitter/receiver antennas on the performance of the proposed system, as well as different heights and positions for the transmitter and receiver devices. Finally, as mentioned earlier, the data set used to achieve the reported results is made publicly available 46 to encourage other researchers and the wider community to take this system a step further.