Wearable magnetic induction-based approach toward 3D motion tracking

Activity recognition using wearable sensors has gained popularity due to its wide range of applications, including healthcare, rehabilitation, sports, and senior monitoring. Tracking body movement in 3D space facilitates behavior recognition in different scenarios. Wearable systems have limited battery capacity, and many critical challenges must be addressed to balance power consumption, computational complexity, robustness to environmental interference, and tracking accuracy. This work presents a motion tracking system based on magnetic induction (MI) to tackle the challenges and limitations inherent in designing a wireless monitoring system. We integrated a realistic prototype of an MI sensor with machine learning techniques and investigated one-sensor and two-sensor configuration setups for motion reconstruction. The approach is evaluated using measured data and synthesized datasets generated by the analytical model of the MI system. The system has an average distance root-mean-squared error (RMSE) of 3 cm compared to the ground-truth real-world data measured with Kinect.


Results
Operating principle. The MI-based communication system is a short-range wireless physical layer that transmits signals by inductive coupling between the wire coils rather than radiating as is done in conventional methods 27,28 . The transmitter node uses a coil to produce an oscillating magnetic field at a specific frequency. Due to the small radiation resistance of the coil, a negligible amount of energy propagates to the far-field. It removes the multipath fading effect resulting in a better quality of service (QoS) compared to conventional propagating wave systems 29 . Each sensor node's (receiver) main component is a coil, which is lightweight, portable, inexpensive, simple, and wearable to capture the transmitter's generated magnetic field. An MI sensor module can be manufactured for less than $20, compared to expensive sensors such as Bluetooth IMU with an average cost of $100 20 . The MI system experiences much less energy absorption in lossy dielectric media (e.g., human body) compared to conventional radio-wave propagation technologies, and therefore can transmit a signal with much less power for the same range. The signal also remains in a 'bubble' around the coil, which minimizes the leakage outside the targeted coverage range, reduces interference, increases security, and provides a personalized space for the user. These characteristics make the MI system power-efficient compared to other short-range communication systems such as Bluetooth 20,30-32 . According to Faraday's law, the time-varying magnetic field induces a voltage in sensor nodes proportional to the rate of magnetic flux change through their coils. For a predefined coil geometry and operating frequency below 30 MHz, where the environmental effects are negligible, the flux change rate is a function of the sensor coils' position, and orientation relative to the transmitter 20,27 . 
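As a worked form of the Faraday's-law statement above, consider a simplified case that is an assumption for illustration and not the analytical model used in this work: a receiver coil with $N$ turns and area $A$ in a locally uniform sinusoidal field of amplitude $B_0$ and angular frequency $\omega$, whose direction makes an angle $\alpha$ with the coil's surface normal. The symbols $N$, $A$, $B_0$, and $\alpha$ are introduced here only for this sketch.

```latex
\Phi(t) = B_0 A \cos\alpha \, \sin(\omega t), \qquad
V_{\mathrm{ind}}(t) = -N \frac{d\Phi}{dt} = -N \omega B_0 A \cos\alpha \, \cos(\omega t)
```

Under these assumptions the voltage amplitude depends on position through $B_0$ and on orientation through $\cos\alpha$, which is the dependence the tracking problem tries to invert.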
The relationship function from spatial data into induced voltage is non-linear and surjective, and the tracking problem objective is to estimate the sensors' positions given the induced voltage measurements.
System architecture. We used an analytical model of the MI system presented in 20,27,33 to calculate the induced voltage at each sensor coil given its position and orientation. This forms the basis of the data-driven backward estimation algorithm that retrieves a node's position using its observed data. It helps assess the system performance under different configurations, such as changing the number or arrangement of sensor coils to find the near-optimal setup with acceptable tracking accuracy. Since the model is a function of relative distance and alignment of coils to the transmitter, we transform the coordinate system to locate the new coordinate system's origin at the center of the transmitter coil, with the coil's surface normal oriented in the Z direction. Given the sensors' spatial data, we compute the coordinate transformation matrix and calculate each coil's position and orientation in the new coordinate frame. We explored the node's position p = (x, y, z) with the resolution of 1 cm, and alignment n = (sin θ cos φ, sin θ sin φ, cos θ) with the resolution of 5° as these resolutions are expected to satisfy the accuracy requirements for motion tracking applications 34,35. It also provides enough data points within the search domain for comprehensive performance analysis of the system with different configuration settings. The possible solutions, which are a unique single-point in an optimal configuration, are retrieved for a given set of observed data. The domain of search space for both the center of a coil and its surface normal alignment is defined as follows.
The search domains for the xyz parameters are set such as to represent the average ranges of distances where sensors can be placed for both male and female subjects relative to an on-body central node on their torso. The θ and φ parameters generate the coil's alignment, and therefore their search domains are defined such that it is possible to describe rotations for the coil that do not result in values close to zero.
We studied the performance of an MI sensor (single sensor setting), where the coil can be aligned in any direction. We also adopted two-sensor configurations and investigated different alignment setups. Among these setups, we present the performance analysis of setups where the coils' surface normals are aligned in the same direction (parallel setting) or perpendicular to each other (orthogonal setting). Figure 1 depicts the configuration of sensors in each described setting. In these experiments, the induced voltage measured at the coils is used as input for location estimation. Figure 2 shows an example result of the data-driven backward estimation algorithm. As the results display, there are many possible solutions for a single sensor setup, and this number reduces by adding another sensor. The sensor voltage data are assumed to be measured with 1 mV accuracy and given as inputs to the algorithm. A comparison between the two-sensor configurations shows that the parallel setting outperforms the orthogonal setting. Although a unique solution cannot be returned as an output, results suggest that the regression methods with proper constraints can meet the minimum required accuracy for position tracking.
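The backward estimation idea can be illustrated with a toy grid search. The sketch below swaps in an ideal magnetic-dipole field for the forward model, which is an assumption made here for brevity and not the analytical MI model cited in this work; the gain `k`, the 10 cm grid, and the function names are likewise hypothetical. It only demonstrates that candidate positions consistent with one sensor form a larger set than those consistent with two sensors.

```python
import math
from itertools import product

def dipole_voltage(p, n, k=0.05):
    """Toy forward model: k * |n . B(p)| for a z-aligned dipole at the origin."""
    x, y, z = p
    r = math.sqrt(x * x + y * y + z * z)
    rz = z / r
    # Dipole field direction terms: 3 (z_hat . r_hat) r_hat - z_hat
    b = (3 * rz * x / r, 3 * rz * y / r, 3 * rz * z / r - 1.0)
    return k * abs(sum(bi * ni for bi, ni in zip(b, n))) / r ** 3

def candidates(v_obs, normals, grid, tol=1e-3):
    """Grid positions whose predicted voltage matches every sensor within tol volts."""
    return [p for p in grid
            if all(abs(dipole_voltage(p, n) - v) <= tol
                   for n, v in zip(normals, v_obs))]

axis = [i / 10 for i in range(-6, 7)]     # -0.6 m to 0.6 m in 10 cm steps
heights = [i / 10 for i in range(1, 7)]   # 0.1 m to 0.6 m above the coil plane
grid = list(product(axis, axis, heights))

true_p = (0.3, 0.2, 0.4)
n1, n2 = (0.0, 0.0, 1.0), (1.0, 0.0, 0.0)  # two example sensor normals
v1, v2 = dipole_voltage(true_p, n1), dipole_voltage(true_p, n2)

single = candidates([v1], [n1], grid)
double = candidates([v1, v2], [n1, n2], grid)
print(len(single), len(double))
```

In this toy setting the second sensor removes candidates that share the same voltage on the first coil, while mirror-symmetric positions remain ambiguous, which is in line with the observation that a unique solution cannot always be returned.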
Data collection. We designed and built an MI sensor for 3D motion tracking (see "Methods" section), representing the movements by variation in the MI signals received from the transmitter instead of measuring spatial data via conventional sensors such as IMUs. To evaluate the capability of the proposed MI sensor, we employed regression algorithms and investigated their performance on the MI sensor's data. Validating and testing machine learning methods is critical and challenging due to the difficulty of collecting realistic valid data and the lack of labeled data. One solution is to create synthetic data for training the model, and here, we used a variational autoencoder (VAE) model to produce time-series motion data. The MI data corresponding to the synthetic movements are then generated using an analytical MI system model 20,27. The regressors are then trained on these synthesized data, which removes the need for measured data in supervised training. A point to consider is that the MI system model must be calibrated only once to scale the synthetic training data to sensor measurements and tune the regression model.
Evaluation. We deployed machine learning regression algorithms to solve the inverse problem of estimating a node's 3D position (x, y, z), in meters, from its sensors' measurements. The performance of several regression models, including extra trees (ET), random forest (RF), K-nearest neighbors (KNN), light gradient boosting machine (LightGBM), multi-layer perceptrons (MLP), decision trees (DT), and linear regression (LR) is compared using PyCaret 36, an open-source machine learning library in Python. The models are trained on 70% of synthetic data and then scored on the remaining data using the 10-fold cross-validation method. The metrics used for comparison are RMSE, mean absolute percentage error (MAPE), and R-squared (R²). Before feeding data into the regressors, each feature is standardized individually, and the missing values are substituted with previous non-missing values.
The processed data are then divided into fixed-length segments of 2 s using the sliding window technique with a 0.1 s step size. Table 1 summarizes the performance results of all models on the synthetic data for different settings. As the results show, the moving node's distance and position in the Z direction with respect to the transmitter coordinate frame can be tracked with accuracy competitive with other methods using wearable sensors (e.g., accelerometer) for motion tracking 19,35. All of the results and metrics on motion tracking are reported in units of meters, and the best scores are denoted in bold. It is worth mentioning that the mutual inductance between two coils varies as their distance, lateral alignment, or angular alignment changes. As we assume that the transmitter coil is centered at the origin and aligned in the Z direction, any movement in the X or Y direction results in a similar lateral misalignment and consequently the same path loss. This characteristic makes it challenging for the regression model to estimate MI sensors' location accurately and differentiate between motion in the X and Y directions. Because the method is able to estimate the distance and location in the Z direction with good accuracy, adding another transmitter with an antenna surface orthogonal to the primary antenna enables the node's motion tracking in the new direction (e.g., X), resulting in 3D positional tracking. However, the dual-transmitter setup can drain power at twice the rate of a single-transmitter system, which can be addressed with proper design modifications. For example, time-division multiplexing (TDM) or frequency-division multiplexing (FDM) approaches can be adopted as low-complexity hardware techniques using a single transmitter to reduce the power, area, and cost of a dual-antenna system instead of using two separate transmitters.
Then the receiver sensor can record transmitted signals from two perpendicularly aligned antennas, which provides adequate data for tracking its location in 3D. This system keeps power consumption the same as a single-antenna system, while increasing tracking accuracy in all three dimensions.
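The preprocessing described in the Evaluation paragraphs (forward filling of missing values, per-feature standardization, and 2 s windows with a 0.1 s step) can be sketched in plain Python. This is an illustrative sketch only: the 10 Hz rate is an assumption taken from the 100 ms resampling interval reported in the Methods, the order of the two cleaning steps is assumed, and the function names are hypothetical.

```python
from statistics import mean, pstdev

def forward_fill(values):
    """Replace None entries with the previous non-missing value.

    A leading None is left as None, so the first sample is assumed to be present.
    """
    filled, last = [], None
    for v in values:
        if v is not None:
            last = v
        filled.append(last)
    return filled

def standardize(values):
    """Scale one feature to zero mean and unit (population) standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]

def sliding_windows(values, fs_hz=10.0, window_s=2.0, step_s=0.1):
    """Split a series into fixed-length, overlapping segments."""
    win = round(window_s * fs_hz)            # 20 samples at 10 Hz
    step = max(1, round(step_s * fs_hz))     # 1 sample at 10 Hz
    return [values[i:i + win] for i in range(0, len(values) - win + 1, step)]

raw = [0.50, None, 0.52, 0.55, None] + [0.60] * 25
segments = sliding_windows(standardize(forward_fill(raw)))
```

With these assumed settings, consecutive segments overlap by 19 of their 20 samples, so a 30-sample recording yields 11 segments.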
Among the research studies on 3D motion tracking, the work in 37, for example, has reported results on tracking subjects' arm motion using smartwatch IMU data. The results show that the system can achieve the highest accuracy when the torso is static, with a median error of 8.8 cm. Moreover, 38 presents a framework for reconstructing human motion with the highest accuracy of 6 cm using four 3D accelerometers attached to the user. The work in 39 has proposed the utilization of spinning linearly polarized antennas to track translation of an object attached to a passive radio frequency identification (RFID) tag array in 3D and has reported an average error of 13.6 cm. To provide a realistic assessment of real-world performance, we evaluated each of the optimal models' tracking accuracies on measured data as well. According to the score measures reported on synthetic data, the LightGBM regressor in the single-sensor setting and the ET regressor in the two-sensor (orthogonal and parallel) settings outperform other models. Figure 3 presents the evaluation measures of optimal models using the measured data for each setting. Representative samples of motion tracking in all settings are also displayed in Fig. 4. Our results indicate that the parallel setting with the optimal regression model outperforms other settings on both measured and synthetic MI data.
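For reference, the three comparison metrics named in the Evaluation (RMSE, MAPE, and R²) can be computed as in the following minimal sketch. These are the standard textbook definitions, written here without any library; whether the library used in this work reports MAPE as a fraction or a percentage is not assumed.

```python
import math

def rmse(y_true, y_pred):
    """Root-mean-squared error, in the same unit as the targets (meters here)."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def mape(y_true, y_pred):
    """Mean absolute percentage error as a fraction; undefined if a true value is 0."""
    return sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / len(y_true)

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_t = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

z_true = [0.30, 0.35, 0.40, 0.45]   # example Z positions in meters
z_pred = [0.32, 0.33, 0.41, 0.47]
print(rmse(z_true, z_pred), mape(z_true, z_pred), r_squared(z_true, z_pred))
```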

Discussion
We proposed a 3D motion tracking system based on magnetic induction and provided a proof of concept by experimental measurements conducted using off-the-shelf devices and prototypes. We employed an HF RFID transmitter module equipped with a loop antenna and an MI sensor as the central node and receiver, respectively. The designed sensor is a simple integrated circuit equipped with an Arduino to record the samples of received signals from the transmitter. To implement the proposed system for real-world applications, proper modifications should be taken into account. For example, the MI coils should be designed to be suitable for wearing on the human wrist, arm, and ankle. Furthermore, a wearable custom-designed central node capable of driving a controlled amount of current at the operating frequency through its coil is required. The receivers should cover the range of about 0.5 m to 1 m with minimum power consumption. The RF output power of the reader used in this work is 1 Watt, which can be reduced by designing a customized MI system capable of communication and data transmission with high accuracy. The reader sends continuous sine waves while the sensor records samples of received power. However, a customized MI transmitter (central node) operating with pulsed shape sine wave signals can achieve similar accuracy within the targeted coverage range at significantly lower power. Determining optimal pulse rate and sampling rate plays a critical role in designing a power-efficient high-accuracy MI-based motion tracking system. Hardware development at the sensor side is another factor that affects system performance. For example, impedance matching reduces power losses and consequently enhances the system transfer efficiency and gain. There are research studies focused on details of designing low-power MI-based communication systems. 
The research work presented in 40 proposes a transceiver design exploiting the low path loss of Magnetic Human Body Communication (mHBC) communication channels toward ultra-efficient body area networking. The transmitter and receiver, respectively,   Another approach for realizing the system with lower power requirements is reducing the number of nodes with batteries. One implementation strategy is to make the central node serve as both transmitter and receiver. It means that the central unit can broadcast the signal and listen back to the responses reflected from the sensors, similar to an RFID system based on passive (battery-less) tags. In an RFID system, the reader sends an interrogation signal to the transponders, which is also used to energize the tag. The tag activates and sends back its unique identifier (UID) if the received power is higher than its sensitivity 41 . A modulation resistance connected in parallel with the tag antenna switches between two different (usually conjugate matching and a short circuit) load impedances at the clock rate of the signal transmitted from the reader to modulate the backscattered signal 42 . Therefore, the central node can communicate with the tags via a secure near-field link backscattering from them. The amplitude of the demodulated signal is calculated and reported at the reader side by a value proportional to the received signal's power level, known as the received signal strength indicator (RSSI). A point to consider is that load modulation is not a practical solution for data transmission in an MI-based motion tracking system. The reason is that the backscattered field, and consequently, the voltage signal received by the reader, switches over two values 12 . The average power returned to the reader is no longer a direct function of distance and misalignment between coils since it varies by the number of zeros and ones in the data stream. 
Therefore, proper modulation and modifications are required to be able to employ existing RFID protocols.
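The argument about load modulation can be made concrete with a small numerical sketch. It assumes an idealized two-level backscatter signal; the level values and the function name are hypothetical and chosen only to show that the average level changes with the bit content even when the coil geometry is fixed.

```python
def mean_backscatter_level(bits, level_zero, level_one):
    """Average received level for a two-level load-modulated bit stream."""
    ones = sum(bits)
    return (ones * level_one + (len(bits) - ones) * level_zero) / len(bits)

# Same distance and alignment (same two levels), different data payloads.
mostly_ones = mean_backscatter_level([1, 1, 1, 0], level_zero=0.2, level_one=1.0)
mostly_zeros = mean_backscatter_level([1, 0, 0, 0], level_zero=0.2, level_one=1.0)
print(mostly_ones, mostly_zeros)
```

Because the two averages differ for identical geometry, the averaged level alone cannot be mapped back to distance and misalignment without accounting for the transmitted data.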
Here we have compared the relationship between RSSI and MI signals with motion data by recording RSSI data of RFID tags in addition to the MI-sensor data. The experiments are performed using a framework similar to the setup explained for MI measurements (see "Methods" section) using HF RFID tags instead of MI sensors. We employed custom air-cored, three-layer copper coils with a 5 cm radius and 34 American wire gauge (AWG) wire diameter as the tag antenna attached to STMicroelectronics ST25DV04K RFID tag. We measured motion and RSSI data of RFID tags reported from the reader for 112 experiments. The best calculated average R 2 and the correlation between RSSI and the distance of the tag from the reader are respectively 0.11 and 0.33. For an MI sensor, the calculated R 2 and correlation over 220 samples are 0.61 and 0.78, respectively. These results indicate that the MI signal has a stronger relationship with its motion compared to a passive tag (see Supplementary  Information).
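The correlation measure used in this comparison can be sketched as a plain Pearson coefficient. The exact procedure used to obtain the reported averages is not given in this section, so the code below is only the standard definition. For a single-predictor linear fit, R² equals the squared correlation, and the reported pairs are numerically consistent with that relation (0.78² ≈ 0.61 and 0.33² ≈ 0.11).

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

distance = [0.2, 0.3, 0.4, 0.5]      # example distances in meters
signal = [0.9, 0.6, 0.45, 0.3]       # example signal levels (arbitrary units)
r = pearson_r(distance, signal)
print(r, r ** 2)
```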

Methods
Hardware design. The system consists of a transmitter (central) node generating an oscillating signal at 13.56 MHz. We used the ISC.LRM1002 long-range RFID reader module 43 attached to the ISC.ANT310/310 long-range HF antenna 43 to generate the RF signal. Since we used this setup for the RFID measurements presented in the discussion, we used the same transmitter for a better comparison. The receiver node consists of MI sensors. Each sensor includes an air-cored, single-layer copper coil with a 5 cm radius and 10 AWG wire diameter to capture the transmitter's signal and measure the induced voltage. The resistance and self-inductance of the coil measured by a vector network analyzer (VNA) at the resonance frequency are 101 mΩ and 241 nH, respectively. To improve the system efficiency, we employed resonant inductive coupling by attaching a tuning circuit to the coil. The tuning circuit can be as simple as a capacitor to tune the frequency, or it can be a π or T matching circuit to tune the frequency, control the Q-factor, and match input and output impedances for higher power transfer 44. Here, we used a 560 pF capacitor in parallel with a trimmable capacitor with an adjustable range of 3-10 pF to accurately tune the circuit to resonance.
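The tuning range implied by these component values can be checked with the ideal LC resonance formula $f_0 = 1/(2\pi\sqrt{LC})$. The sketch below assumes an ideal parallel LC tank and ignores stray and parasitic capacitance, which a real board would add.

```python
import math

def resonant_frequency_hz(inductance_h, capacitance_f):
    """Resonant frequency of an ideal LC tank."""
    return 1.0 / (2.0 * math.pi * math.sqrt(inductance_h * capacitance_f))

L_COIL = 241e-9      # measured coil self-inductance (H)
C_FIXED = 560e-12    # fixed capacitor (F)

for c_trim in (3e-12, 10e-12):   # limits of the trimmable capacitor (F)
    f0 = resonant_frequency_hz(L_COIL, C_FIXED + c_trim)
    print(f"trim {c_trim * 1e12:.0f} pF -> {f0 / 1e6:.2f} MHz")
```

Under this idealization the tank resonates at roughly 13.6 MHz across the trim range, within about 1% of the 13.56 MHz carrier; any additional stray capacitance would lower the resonance further.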
The transmitted AC signal attenuates as a function of distance and alignment of the node with respect to the transmitter antenna. To track the signal's amplitude changes, we used an envelope detector consisting of a 1N5817 Schottky diode, a resistor of 1 kΩ, and a capacitor of 1 nF. The envelope detector's output, which is the resistor's voltage, is measured by an Arduino Nano (ATmega168) microcontroller. The resolution of the ADC (analog pin A1) is 10 bits for a defined measurement range. Figure 5 depicts the MI sensor components.

Measurements.
We employed a Microsoft Kinect v2 to capture the 3D position and alignment of the transmitter and the MI sensor node. The Kinect sensor consists of a depth camera, an RGB camera, and a microphone. Its data are expressed in three coordinate spaces, the color space, the depth space, and the camera space with points $(x_w, y_w, z_w)$, representing a point in the color images, depth images, and the real world, respectively. The software development kit (SDK)'s mapping function can be used to map a point from one coordinate space to another. We used colored markers to facilitate motion tracking of the devices and developed a video processing algorithm analyzing the color frames to locate pixels corresponding to the target color. The transmitter antenna and the MI node are labeled with distinct colored markers and placed in front of a white background. A threshold range is set for each color to extract pixels with the color value within the defined range. The detected pixels are classified into $N_m$ clusters, where $N_m$ is the number of markers, using K-means clustering. Then, the connected neighboring pixels of each cluster are grouped. Since the markers are colored foam balls, the circle with the minimum area enclosing each set is calculated, and the largest region is given as the target circle. The next step is mapping color to camera space to find the corresponding spatial location of each extracted color pixel. The result is a list of 3D real-world points mapped from the target circle's pixels, and each marker's location is computed by taking the median over all the calculated values. This process repeats for each new color frame that Kinect captures.
The analytical model requires the center and alignment of the transmitter and receiver coils/antennas as inputs to estimate the induced voltage. To determine a coil's surface normal, at least three markers ($M_i$, $i \in \{1, \ldots, N_m\}$ with $N_m > 2$) are required. Hence, we used four red and three blue markers to track the transmitter antenna and the MI sensor node. The center of each device is calculated by averaging over its markers' locations, $c = \frac{1}{N_m} \sum_{i=1}^{N_m} M_i$, and its surface normal is calculated by the cross product of vectors passing through the markers. We applied the median filter, a non-linear digital filtering technique, to remove noise and spikes in the extracted location and alignment data.
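The marker-based geometry described above can be sketched as follows. The exact cross-product expression is not reproduced in this text, so the choice of the two in-plane vectors (from the first marker to the second and third) is an assumption, as are the function names and the filter kernel length.

```python
import math
from statistics import median

def centroid(markers):
    """Device center as the mean of its marker positions."""
    n = len(markers)
    return tuple(sum(m[i] for m in markers) / n for i in range(3))

def normal_from_markers(m1, m2, m3):
    """Unit surface normal from the cross product of two in-plane vectors."""
    u = tuple(b - a for a, b in zip(m1, m2))
    v = tuple(b - a for a, b in zip(m1, m3))
    c = (u[1] * v[2] - u[2] * v[1],
         u[2] * v[0] - u[0] * v[2],
         u[0] * v[1] - u[1] * v[0])
    length = math.sqrt(sum(x * x for x in c))
    return tuple(x / length for x in c)

def median_filter(signal, kernel=5):
    """1D median filter; the window is truncated at the ends of the signal."""
    half = kernel // 2
    return [median(signal[max(0, i - half): i + half + 1])
            for i in range(len(signal))]

markers = [(0.0, 0.0, 1.0), (0.1, 0.0, 1.0), (0.0, 0.1, 1.0)]
print(centroid(markers), normal_from_markers(*markers))
```

The sign of the normal depends on the marker ordering, so a consistent ordering (or a sign convention) would be needed across frames.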
The induced voltage, $V_{ind}$, at the MI sensors is measured for 30 s via Arduino by using a Python script that controls the recording in order to synchronize Kinect's motion data and Arduino's measurements. The sampling frequency is 100 Hz, and the reference voltage range is 0 V to 5 V, which results in a quantization interval of 5/1024 V. The data streams of the node's MI sensors are recorded and used as inputs for the regression model to estimate the device's location. The sampling rates of the motion data recorded by Kinect and of the sensors' data are different. Therefore, all recordings are resampled with a sampling interval of 100 ms, which also handles the missing sample values. The experimental measurement setup is presented in Fig. 6.
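The quantization step and the resampling onto a common 100 ms grid can be sketched as below. The resampling method is not specified in this section, so linear interpolation is an assumption made here, and the function names are hypothetical.

```python
def adc_to_volts(count, v_ref=5.0, bits=10):
    """Convert a raw ADC count to volts using a step of v_ref / 2**bits."""
    return count * v_ref / (2 ** bits)

def resample_linear(times, values, interval_s=0.1):
    """Linearly interpolate samples onto a uniform grid starting at times[0].

    Assumes at least two samples and strictly increasing timestamps.
    """
    out_t, out_v = [], []
    j = 0
    n_steps = int((times[-1] - times[0]) / interval_s + 1e-9) + 1
    for k in range(n_steps):
        t = times[0] + k * interval_s
        # Advance to the input interval [times[j], times[j + 1]] that contains t.
        while j + 1 < len(times) - 1 and times[j + 1] < t:
            j += 1
        t0, t1 = times[j], times[j + 1]
        w = (t - t0) / (t1 - t0)
        out_t.append(t)
        out_v.append(values[j] + w * (values[j + 1] - values[j]))
    return out_t, out_v

counts = [0, 512, 1023]
volts = [adc_to_volts(c) for c in counts]
grid_t, grid_v = resample_linear([0.0, 0.25, 0.5], volts)
```

Applying the same grid to both the Kinect stream and the sensor stream gives sample-aligned pairs for training and evaluation.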

Synthetic data.
A VAE is based on the auto-encoder architecture and is composed of encoder and decoder networks. The encoder compresses the data into a lower-dimensional space called the latent space representation. The decoder decompresses the reduced representation code to reconstruct the original data. The VAE learns the probabilistic interpretation of these networks and generates new samples using different latent variables as input. Consider a dataset $\{x^{(i)}\}_{i=1}^{N}$ that consists of $N$ i.i.d. samples of some variable $x$. VAEs assume that the data are generated by a random process with a continuous latent variable, and each latent variable $z$ is related to its corresponding observation $x$ through the likelihood $p_\theta(x|z)$, where $p_\theta$ is a probability distribution with parameters $\theta$. This probabilistic interpretation of the decoder can decode a latent (hidden) representation code into a distribution over the observation. Similarly, the encoder network returns a latent code sampled from the posterior density distribution $p_\theta(z|x)$ given a sample from the data space 46. While both the prior $p(z)$ and the likelihood $p(x|z)$ can be formulated exactly, the posterior $p(z|x)$ requires an intractable integral over the latent space. Hence, an approximate posterior $q_\phi(z|x)$ closest in Kullback-Leibler (KL) divergence to the actual, intractable posterior is used, and the objective can be equivalently written as:
$$\mathcal{L}(x; \theta, \phi) = \mathbb{E}_{q_\phi(z|x)}\left[\log p_\theta(x|z)\right] - D_{KL}\left(q_\phi(z|x) \,\|\, p_\theta(z)\right) \quad (3)$$
On the right-hand side of equation (3), the first term, the reconstruction error, represents the likelihood of the model reconstructing the input data. The second term, the variational regularization term, is the KL divergence and makes the approximate posterior $q_\phi(z|x)$ close to $p_\theta(z)$. $\mathcal{L}(x; \theta, \phi)$ is a lower bound on the log probability of the data $p_\theta(x)$, known as the evidence lower bound (ELBO). Maximizing the ELBO with respect to the model parameters $\theta$ and variational parameters $\phi$ respectively maximizes the marginal probability $p_\theta(x)$ and minimizes the KL divergence 46.
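The two ELBO terms can be evaluated numerically under common simplifying assumptions that are not stated in this section: a diagonal Gaussian approximate posterior, a standard normal prior, and a Gaussian likelihood with unit variance. Under these assumptions the KL term has a closed form, and the reconstruction term is estimated from a single decoded sample. The function names are hypothetical.

```python
import math

def kl_to_standard_normal(mu, log_var):
    """KL( N(mu, diag(exp(log_var))) || N(0, I) ) in closed form."""
    return -0.5 * sum(1.0 + lv - m * m - math.exp(lv)
                      for m, lv in zip(mu, log_var))

def gaussian_log_likelihood(x, x_hat, sigma=1.0):
    """log p(x | z) for a Gaussian decoder with mean x_hat and fixed sigma."""
    return sum(-0.5 * math.log(2.0 * math.pi * sigma ** 2)
               - (a - b) ** 2 / (2.0 * sigma ** 2)
               for a, b in zip(x, x_hat))

def elbo(x, x_hat, mu, log_var):
    """Single-sample ELBO estimate: reconstruction term minus KL regularizer."""
    return gaussian_log_likelihood(x, x_hat) - kl_to_standard_normal(mu, log_var)

x, x_hat = [0.30, 0.20, 0.40], [0.28, 0.22, 0.41]   # a position and its reconstruction
print(elbo(x, x_hat, mu=[0.1, -0.2], log_var=[-0.1, 0.05]))
```

The KL term is zero only when the approximate posterior equals the prior, so the ELBO never exceeds the reconstruction log-likelihood.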
We trained the VAE model using the sensors' motion data tracked by the Kinect to produce synthetic time-series samples. After training the model, new time-series data can be generated by sampling from the latent space z with a normal distribution parametrized by the mean and the variance 47. The generated data include the motion of the coils' center and alignment in 3D space for a predefined sensor setting. We synthesized the angular variables θ and φ to calculate the corresponding coil's surface normal n, defined as n = (sin θ cos φ, sin θ sin φ, cos θ), where θ and φ can take values in the ranges of 0-90 and 0-360 degrees, respectively.
We have performed the experiment for 220 motions, including spatial translation and rotation ($N_s = 220$). The measured motion data samples of these experiments are used for training the VAE to generate synthetic motion data. Then their corresponding MI signal is estimated using the two-port network model of the MI system 20,27 given the node motion data. To evaluate the performance of the analytical model, we fed the motion data captured by the Kinect system as input and estimated the corresponding induced voltage at the MI sensors for each measurement experiment. The circuit model is calibrated by finding the scale and bias of the synthesized data with respect to the measurements. Considering $s_i$ and $m_i$ as the generated synthetic data and measurements corresponding to a motion sample, the scale $a = \frac{1}{N_s} \sum_{i=1}^{N_s} \frac{\sigma_{m_i}}{\sigma_{s_i}}$ and bias $b = \frac{1}{N_s} \sum_{i=1}^{N_s} \left( \mu_{m_i} - \frac{\sigma_{m_i}}{\sigma_{s_i}} \mu_{s_i} \right)$ can be calculated, where $\mu_{s_i}$, $\sigma_{s_i}$, $\mu_{m_i}$, $\sigma_{m_i}$ represent the mean and standard deviation of the synthetic data and measurements corresponding to the $i$th motion sample from $N_s$ samples. Figure 7 shows the measured and simulated sensors' data during their movement, taken from the evaluation dataset after calibrating the model.
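The calibration constants defined above can be computed as in the following sketch. Applying them as `a * s + b` to rescale synthetic data is an assumed usage that follows from the definitions, and the function name and example values are hypothetical.

```python
from statistics import mean, pstdev

def calibration(synthetic, measured):
    """Scale a and bias b averaged over paired synthetic/measured motion samples."""
    scales, biases = [], []
    for s, m in zip(synthetic, measured):
        ratio = pstdev(m) / pstdev(s)          # sigma_m / sigma_s for this sample
        scales.append(ratio)
        biases.append(mean(m) - ratio * mean(s))
    return mean(scales), mean(biases)

# Two toy motion samples in which the measurement equals 2 * synthetic + 1.
synthetic = [[0.0, 1.0, 2.0], [1.0, 3.0, 5.0]]
measured = [[1.0, 3.0, 5.0], [3.0, 7.0, 11.0]]
a, b = calibration(synthetic, measured)
calibrated = [[a * v + b for v in s] for s in synthetic]
```

In this toy case the recovered constants are close to 2 and 1, and the calibrated synthetic series match the measured ones up to floating-point error.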
The average normalized root-mean-squared error (NRMSE) and cross-correlation of the synthesized and measured data for all experiments are 12% and 0.91, respectively. It should be noted that the reported metrics reflect not only the MI system model inaccuracy but also the error associated with the Kinect-based marker tracking algorithm and the Arduino measurements. The variation between the real-world and synthetic samples affects the performance of the motion tracking algorithm that trains on the synthetic MI data. We re-assessed the performance of the regression model trained on noisy synthetic datasets to evaluate the errors caused by the analytical MI system model in motion tracking. We considered the single sensor setting and its corresponding optimal regression model, LightGBM, for the analysis. Gaussian noise with zero mean and standard deviation σ varying between 0 and 1 is added to the data generated by the MI model. The resulting datasets are separately given to a pre-trained regression model for training and then evaluated on the measured samples. The NRMSE value for each noisy dataset is calculated by comparing the measured data and their corresponding noisy synthetic data. Figure 8 displays the performance of a machine learning regressor in motion tracking trained on these datasets with different NRMSE values (noise levels). The results show how the realism of the samples generated by the MI model affects the performance of the motion tracking algorithm.
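The noise-injection step and the NRMSE computation can be sketched as follows. The normalization used for NRMSE is not specified in this section, so dividing the RMSE by the range of the measured signal is an assumption (one common convention), and the example values and function names are hypothetical.

```python
import math
import random

def add_gaussian_noise(signal, sigma, seed=0):
    """Add zero-mean Gaussian noise with standard deviation sigma."""
    rng = random.Random(seed)   # fixed seed keeps the sketch reproducible
    return [v + rng.gauss(0.0, sigma) for v in signal]

def nrmse(measured, synthetic):
    """RMSE between the two series divided by the range of the measured series."""
    err = math.sqrt(sum((m - s) ** 2 for m, s in zip(measured, synthetic))
                    / len(measured))
    return err / (max(measured) - min(measured))

measured = [1.0, 1.4, 2.1, 2.6, 3.0]
synthetic = [1.1, 1.3, 2.0, 2.7, 3.1]
for sigma in (0.0, 0.25, 0.5, 1.0):
    noisy = add_gaussian_noise(synthetic, sigma)
    print(sigma, round(nrmse(measured, noisy), 3))
```

Each noise level yields one noisy training set and one NRMSE value, which is the pairing used to relate model realism to tracking performance.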