Introduction

Activity classification using wearable activity monitors among the ambulatory population has been well documented.1, 2, 3, 4, 5 The benefits of detecting physical activities (PAs) using wearable devices include the ability to track regular PA, provide accurate energy expenditure (EE) estimation and assist in behavioral modifications that may lead to a healthier active lifestyle in community settings.6, 7, 8, 9, 10 However, only a limited number of studies have used wearable devices to detect and classify PAs performed by individuals who rely on wheelchairs for mobility.11, 12, 13 Identification of wheelchair-related PAs using wearable devices provides not only all the benefits mentioned above but also pertinent information on the functional use of the upper limbs, an important factor in the upper limb pain and injury prevalent in wheelchair users.14 The clinical practice guideline ‘Preservation of Upper Limb Function Following Spinal Cord Injury,’ published by the Paralyzed Veterans of America, has indicated that minimizing the frequency of upper extremity use in wheelchair users during repetitive tasks such as wheelchair propulsion can decrease the risk of repetitive strain injury and/or wrist pain.14

Previous research by Postma et al.11 showed that a wearable activity monitor consisting of six accelerometers and two electrocardiogram electrodes connected to a portable data recorder (0.7 kg) was able to detect wheelchair propulsion in ten manual wheelchair users (MWUs) with spinal cord injury (SCI). The results showed that wheelchair propulsion episodes were detected with an overall agreement, sensitivity and specificity of 92%, 87% and 92%, respectively. In another study, French et al.13 showed that wheelchair propulsion patterns, surface types and self-propulsion versus external pushing of a wheelchair could be detected using two dual-axis accelerometer-based eWatches secured to the wrist and the wheelchair’s frame. The results in three persons without disabilities showed that the classification accuracy rates varied from 80 to 90% for arcing versus non-arcing propulsion patterns, carpet versus tile surfaces and self-propulsion versus external pushing using classification algorithms such as k-nearest neighbor and support vector machines. Along similar lines, Ding et al.12 studied activity classification in 27 MWUs performing a series of representative activities of daily living in a semi-structured setting with an eWatch and a wheel rotation datalogger placed on the wrist and the wheelchair’s wheel, respectively. The results indicated that k-nearest neighbor, support vector machine, Naïve Bayes (NB) and decision tree (C4.5) classification algorithms could classify the activities into self-propulsion, external pushing and sedentary activity with an accuracy of 89.4–91.9%. The studies discussed here focused specifically on detecting propulsion activity versus other activities with activity monitoring systems composed of multiple components.

The primary objective of this study was to develop and evaluate machine learning-based classification algorithms to detect PAs including resting, wheelchair propulsion, arm ergometer exercises and deskwork performed by MWUs with SCI, based on data collected from an off-the-shelf multisensor SenseWear (SW) activity monitor. Our previous research has shown that an activity-specific EE prediction model consisting of four EE estimation equations, one for each of the four types of PAs mentioned above, had smaller EE estimation errors than a general model consisting of a single EE estimation equation applied to all the activities.15 Therefore, to use the activity-specific EE prediction model in the field, we first need to detect the four types of PAs. Our secondary aim was to evaluate how the activity classification accuracy affects the performance of the activity-specific EE prediction model for MWUs with SCI described in our previous work.15

Materials and methods

Experimental protocol

The study was approved by the institutional review board at the University of Pittsburgh and the VA Pittsburgh Healthcare System. The target population of this study was MWUs with SCI. Participants were recruited through institutional review board-approved registries, flyers and advertisements in print media. Convenience sampling was used to recruit participants who expressed an interest in the study. Little or no research has been published on validating activity monitors for EE estimation among MWUs with SCI. Power analysis using a correlational design with α=0.05 (two-tail) and medium effect size (r=0.4) indicated that a total of 40 participants would provide a statistical power of 74%.16 On the basis of this estimation, we recruited 45 MWUs with SCI, all of whom provided written informed consent before participating in the study. The data collection for the study took place between February 2009 and May 2011. Participants were included if they were between 18 and 60 years of age, used a manual wheelchair as a primary means of mobility, had an SCI, were at least 6 months post-injury and were able to use an arm-ergometer for exercise. Participants were excluded if they were unable to tolerate sitting for 4 h, had active pelvic or thigh wounds or failed to obtain their primary care physician’s consent to participate in the study. The study required the participants to make one visit to the Human Engineering Research Laboratories, University of Pittsburgh to complete the data collection. All 45 participants who provided written informed consent participated in the study.
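The power figure quoted above can be roughly checked with a short calculation. The sketch below uses the Fisher z approximation for a two-tailed test of a correlation; this is an assumption made for illustration, since the text does not state which method or software produced the 74% value, and the function name is hypothetical.

```python
import math
from statistics import NormalDist


def correlation_power(r, n, alpha=0.05):
    """Approximate two-tailed power for testing H0: rho = 0 (Fisher z method)."""
    nd = NormalDist()
    # Expected test statistic under the alternative: atanh(r) * sqrt(n - 3)
    z_effect = math.atanh(r) * math.sqrt(n - 3)
    z_crit = nd.inv_cdf(1 - alpha / 2)
    # Probability of rejecting in either tail
    return nd.cdf(z_effect - z_crit) + nd.cdf(-z_effect - z_crit)


power = correlation_power(r=0.4, n=40)
print(f"approximate power: {power:.2f}")
```

With r=0.4 and n=40 this approximation gives a value in the low-to-mid 0.70s, in the neighborhood of the reported 74%; an exact method may differ slightly.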

The research study protocol has been described in detail elsewhere.15, 17 As part of the pre-activity session, the participants answered a demographics questionnaire and had their heights and weights measured. During the activity session, the participants took part in resting and three other activities including wheelchair propulsion, arm-ergometer exercises and deskwork. The three activities were counterbalanced and the trials within each activity were randomized to counter order effects. During the activity session, all participants wore a SW activity monitor on their right upper arm over the triceps and a Cosmed K4b2 portable metabolic cart (COSMED srl, Rome, Italy). The participants performed each activity trial for a maximum period of 8 min, with a resting period of 5–10 min between activity trials and a period of 30–40 min between activities. During the wheelchair propulsion activity, the participants propelled their wheelchairs for two trials of 2 and 3 mph on a stationary dynamometer, and a trial of 3 mph on a flat-tiled surface. The arm-ergometer exercises included two trials at 60 r.p.m. with 20 and 40 W of resistance and a trial at 90 r.p.m. with 40 W of resistance. During the deskwork session, the subjects typed on a computer for 4 min and read a book for another 4 min.

Instrumentation and data collection

The SW activity monitor was used to collect the average, the mean absolute difference (variability of upper limb motion) and the number of peaks (turning points of the upper limb) in transverse and longitudinal accelerations sampled at 32 Hz and recorded at 16 Hz; and the average galvanic skin response (skin conductance due to moisture or sweat), skin temperature and near-body temperature sampled at 32 Hz and recorded at 1-min intervals. The multisensor data from the SW was retrieved using the InnerView Research software 7.0 (Bodymedia Inc., Pittsburgh, PA, USA). In addition, a portable K4b2 metabolic cart was synchronized with the SW and used to collect the criterion EE. The EE in terms of kcal min−1 was retrieved using the Cosmed K4b2 software (version 9.0). The investigators annotated the start and end of each activity trial during data collection, and these annotations were used as the reference for developing and testing the classification algorithms.

Data analysis

The first step of developing an activity classification algorithm was to separate the data into a training data set and a validation data set. A stratified approach with the injury level (paraplegia versus tetraplegia) as the stratified variable was used to select 80% of the participants into the training data set and 20% into the validation data set. The total amount of activity time was 1645 min (about 27.4 h) including 1319 min (about 22.0 h) in the training data set (n=36) and 326 min (about 5.4 h) in the validation data set (n=9).
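A stratified by-subject split of this kind can be sketched as follows. This is a minimal illustration in Python, not the authors' MATLAB program: the subject IDs are hypothetical, the group sizes follow the 38/7 paraplegia/tetraplegia split reported in the Results, and the per-stratum rounding rule is an assumption.

```python
import random


def stratified_subject_split(strata, validation_fraction=0.2, seed=0):
    """Split subject IDs into training and validation lists, stratum by stratum.

    strata maps a stratum label (here, injury level) to a list of subject IDs.
    """
    rng = random.Random(seed)
    train, validation = [], []
    for label in sorted(strata):
        subjects = list(strata[label])
        rng.shuffle(subjects)
        # Assumed rule: round the validation share within each stratum
        n_val = round(validation_fraction * len(subjects))
        validation.extend(subjects[:n_val])
        train.extend(subjects[n_val:])
    return train, validation


# Hypothetical subject IDs for illustration only
strata = {
    "paraplegia": [f"P{i:02d}" for i in range(38)],
    "tetraplegia": [f"T{i:02d}" for i in range(7)],
}
train_ids, validation_ids = stratified_subject_split(strata)
print(len(train_ids), len(validation_ids))
```

Splitting by subject rather than by minute keeps all of a participant's data on one side of the split, so the validation set contains only unseen participants.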

The next step was to extract a set of features, which are statistical measures, used to distinguish between the four types of activities. The feature data included characteristic information such as the mean, the mean absolute difference and the number of peaks per minute that were directly obtained from various sensors in the SW activity monitor. In addition, linear and nonlinear features using the multisensor data from SW were calculated on the basis of statistical characteristics, such as time domain features, biomechanical and physiological features specific to PAs.15 We chose a 1-min window size (duration or period) for feature estimation to be consistent with the EE estimation. The features obtained from the SW and the estimated features resulted in a feature space of thousands of variables for the PA classification. We also manually labeled each 1-min activity segment as belonging to one of the four categories, that is, wheelchair propulsion, arm ergometry, resting and deskwork based on the annotations, which served as a reference for training and testing the activity classification algorithms. The data collected from the SW was processed through data analysis programs written in MATLAB (The Mathworks, Inc., Natick, MA, USA).
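The window-level summary measures named above can be sketched for a single acceleration axis as shown below. This is an illustrative Python sketch under stated assumptions: the SW reports these measures itself, its exact definitions are not given in the text, and here the mean absolute difference is assumed to be the mean absolute change between consecutive samples while a peak is assumed to be any sample where the slope changes sign.

```python
from statistics import fmean


def window_features(samples):
    """Summarize one window of acceleration samples from a single axis."""
    mean = fmean(samples)
    # Assumed definition: mean absolute change between consecutive samples
    diffs = [abs(b - a) for a, b in zip(samples, samples[1:])]
    mean_abs_diff = fmean(diffs)
    # Assumed definition: a peak (turning point) is a sign change in slope
    peaks = 0
    for prev, cur, nxt in zip(samples, samples[1:], samples[2:]):
        if (cur - prev) * (nxt - cur) < 0:
            peaks += 1
    return {"mean": mean, "mean_abs_diff": mean_abs_diff, "peaks": peaks}


# Toy window for illustration; real windows would hold 1 min of samples
features = window_features([0.0, 1.0, 0.0, 1.0, 0.0])
print(features)
```

Each labeled 1-min segment would then be represented by one such feature vector per sensor channel, paired with its annotated activity.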

We then developed three activity classification algorithms based on the training data set using machine learning algorithms including linear discriminant analysis (LDA), quadratic discriminant analysis (QDA) and NB. For each classification algorithm, we performed the leave-one-subject-out (LOSO) and sixfold by-subject cross-validation to select the most appropriate features and evaluate the classification algorithm’s performance. The LOSO cross-validation method leaves one subject out and then develops the model on the remaining subjects. The model developed on these remaining subjects is evaluated on the left-out subject. This procedure was repeated 36 times, as there were 36 subjects in the training group. The sixfold by-subject cross-validation method is similar to LOSO, except that the subjects are split into six random groups (or folds), and each time one group is left out and the models are developed on the remaining five groups. With 36 participants in the training data set, each fold contained six subjects and the procedure was repeated six times, once per fold. In addition to cross-validation, the performance of the three activity classification algorithms was also evaluated using the validation data set. Several performance measures were calculated including per-minute precision (true positive/(true positive+false positive)), recall (true positive/(true positive+false negative)), specificity (true negative/(true negative+false positive)) and overall accuracy ((true positive+true negative)/(number of the cases)).18 Precision indicates the proportion of detections of an activity that are correct. Recall, also known as sensitivity, is the proportion of actual activities that are correctly identified. Specificity is the proportion of activities not performed that are correctly identified as not performed, or in other words it is the classification algorithm’s ability to recognize true-negative cases. Overall accuracy is the proportion of all cases that are classified correctly.

We also evaluated how the performance of the activity classification algorithms affected the EE estimation using the activity-specific EE prediction model that was previously developed.15 In our previous work, the EE estimation based on the activity-specific prediction model assumed 100% activity classification accuracy. In this study, however, we evaluated the performance of the activity-specific EE prediction model based on the actual classification results. Similar to our previous work,15 the estimated EE was compared with the criterion EE from the metabolic cart by calculating the minute-by-minute mean absolute error and the mean-signed error.
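The by-subject cross-validation schemes and the per-minute performance measures described in this section can be sketched as follows. This is a minimal Python illustration, not the authors' MATLAB code: the function names, subject IDs and toy labels are hypothetical, and the classifier itself is omitted.

```python
import random


def loso_folds(subject_ids):
    """Yield (training_subjects, held_out_subjects), one fold per subject."""
    subjects = sorted(set(subject_ids))
    for held_out in subjects:
        yield [s for s in subjects if s != held_out], [held_out]


def k_fold_by_subject(subject_ids, k=6, seed=0):
    """Yield (training_subjects, held_out_subjects) for k random subject groups."""
    subjects = sorted(set(subject_ids))
    random.Random(seed).shuffle(subjects)
    groups = [subjects[i::k] for i in range(k)]
    for held_out in groups:
        yield [s for s in subjects if s not in held_out], held_out


def per_class_metrics(true_labels, predicted_labels, positive):
    """One-vs-rest precision, recall, specificity and accuracy for one activity."""
    pairs = list(zip(true_labels, predicted_labels))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = len(pairs) - tp - fp - fn
    nan = float("nan")
    return {
        "precision": tp / (tp + fp) if tp + fp else nan,
        "recall": tp / (tp + fn) if tp + fn else nan,
        "specificity": tn / (tn + fp) if tn + fp else nan,
        "accuracy": (tp + tn) / len(pairs),
    }


# Hypothetical training subjects: 36 folds for LOSO, 6 folds of 6 for sixfold
subjects = [f"S{i:02d}" for i in range(36)]
n_loso = sum(1 for _ in loso_folds(subjects))
sixfold = list(k_fold_by_subject(subjects, k=6))

# Toy per-minute labels for illustration only
true = ["rest", "rest", "propulsion", "propulsion", "ergometry", "deskwork"]
pred = ["rest", "deskwork", "propulsion", "ergometry", "ergometry", "deskwork"]
propulsion = per_class_metrics(true, pred, positive="propulsion")
print(n_loso, len(sixfold), propulsion)
```

In both schemes the held-out minutes always come from subjects the model has not seen, which is what distinguishes by-subject cross-validation from a random split of minutes.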

Results

The participants included 37 males and 8 females with a mean (s.d.) age of 40.2 (11.0) years, weight of 78.5 (21.9) kg, height of 178.2 (8.6) cm and manual wheelchair usage of 13.8 (9.1) years. Thirty-eight participants had paraplegia (SCI of T4 and below) and seven participants had tetraplegia (SCI of T3 and above). Detailed demographics have been reported in our previous work.15 Table 1 shows the performance of the LDA, QDA and NB classification algorithms applied to the training data set (n=36) using the LOSO and sixfold by-subject cross-validation methods. The results showed that the classification accuracy depended less on the choice of algorithm and more on the type and number of features. For the sake of brevity, we have chosen to present detailed results of the QDA and NB classification algorithms. Table 2 shows the classification performance in terms of the precision, recall, specificity and overall accuracy of the QDA and NB classification algorithms using four features in the validation data set. The overall classification accuracy was 96.3% and 94.8% for the QDA and NB classification algorithms, respectively. Table 3 shows the confusion matrix, which tabulates the actual or true activity against the activity detected by the classification algorithm. The results from Table 3 indicate that misclassification most often occurred between wheelchair propulsion and arm ergometry exercises, both of which involve repetitive upper extremity usage. Furthermore, Table 4 shows the EE estimation errors including the mean absolute error and mean-signed error for the validation data set (n=9) when the activity-specific EE prediction model was used in conjunction with the QDA or NB classification algorithms with four features.

Table 1 Classification performance in terms of the overall accuracy (%) for the LDA, QDA and NB classification algorithms to detect four wheelchair-related activities with varied number of features using the LOSO and sixfold by-subject cross-validation methods in the training data set
Table 2 Classification performance in terms of the precision (positive predictive value), recall (sensitivity), specificity (true-negative rate) and overall accuracy (%) of the QDA and NB classification algorithms using four features to detect the four wheelchair-related activities in the validation data set
Table 3 Confusion matrix for the QDA and NB classification algorithms using four features to classify the four wheelchair-related activities in the validation data set
Table 4 EE estimation error in terms of the mean absolute error and mean-signed error for the validation data set when the activity-specific EE prediction model was used in conjunction with the QDA or NB classification algorithms with four features

Discussion

Accessible activity monitors for wheelchair users will allow users themselves, researchers and clinicians to track regular PA, estimate EE, monitor PA levels in community settings and assess the functional use of the upper limbs, which is related to pain and injury prevalence in wheelchair users. Results from this study indicate that the SW activity monitor, along with custom machine learning classification algorithms such as LDA, QDA and NB, can be used to classify wheelchair-related PAs in MWUs. Compared with the study conducted by Postma et al.,11 where six accelerometers were used to detect wheelchair propulsion episodes from a series of activities, we used a single SW activity monitor to achieve a higher classification accuracy (96% for the QDA classification algorithm versus 92%) with a larger number of subjects (n=45 versus n=10). Similarly, the classification algorithms discussed here outperformed those in the previous studies by Ding et al.12 and French et al.,13 who classified wheelchair-related PAs by using two devices in a smaller number (n=27) of wheelchair users and in three non-wheelchair users, respectively.

Several strategies were used to reduce overfitting during the classification algorithm development. As shown in Table 1, the classification accuracy improved with an increased number of features, indicating that a reasonable number of features is necessary to classify multiple PAs. Given the number of participants in the study, we chose a small set of four features for further analysis of the classification algorithms, as we wanted to balance classification accuracy against the risk of overfitting so that the algorithms would generalize to unseen participants. Furthermore, the results showed that the LOSO cross-validation technique, which tends to have higher variance and lower bias in a small sample, performed similarly to the sixfold by-subject cross-validation technique. This led us to use the LOSO cross-validation for classification algorithm development, which helps improve the generalizability of the classification algorithms to unseen participants. The four features for the QDA classification algorithm were the resultant acceleration and three other variables derived from the mean absolute difference and number of peaks of the transverse acceleration. Similarly, the four features for the NB classification algorithm were the resultant acceleration and three other variables derived from the mean absolute difference of the transverse acceleration, and the mean absolute difference and number of peaks of the longitudinal acceleration. The features chosen by both the QDA and NB classification algorithms captured the direction, total motion and frequency of upper arm movement from the SW’s accelerometer, indicating that the classification algorithms were sensitive to movement-based variables when classifying the wheelchair-related PAs. Even though the QDA classification algorithm yielded slightly higher accuracy than NB, the NB classification algorithm is computationally simpler and has greater potential for real-time activity classification.

In our previous work,15 we developed an activity-specific EE prediction model, which involves detecting the type of PA before applying a specific EE estimation equation for the detected PA. However, the previous work evaluated the model performance assuming that the types of PAs can be detected and classified with 100% accuracy. With the high classification accuracies (approximately 95% or above) yielded by the QDA and NB classification algorithms, we found that the performance of the activity-specific EE prediction model was minimally affected by the actual classification results. The previous study showed that the mean absolute error and mean-signed error for all activities were 16.8% and 4.9±20.7%, respectively.15 In this study, the mean absolute error and mean-signed error for all activities were 17.4% and 5.3±21.5%, respectively, for the QDA classification algorithm, and 18.2% and 4.6±22.8%, respectively, for the NB classification algorithm. The results in Table 3 also showed that the wheelchair propulsion and arm ergometry activities were occasionally misclassified by the QDA and NB classification algorithms; yet the misclassification may not significantly affect the EE prediction as the two activities have similar EE. Further, the activity-specific EE estimation equations and the classification algorithms share some common variables including the mean absolute difference of transverse acceleration, and the mean absolute difference and average number of peaks of longitudinal acceleration.15
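The two error measures compared here can be sketched as follows. This is an illustrative Python sketch under stated assumptions: errors are taken per minute as a percentage of the criterion EE, the sign convention is estimated minus criterion, and the "±" values are assumed to be the standard deviation of the signed errors; the exact definitions are given in the previous work rather than in this text, and the example values are hypothetical.

```python
from statistics import fmean, stdev


def ee_percent_errors(estimated, criterion):
    """Return (mean absolute error %, mean signed error %, s.d. of signed error %)."""
    # Assumed convention: positive values mean the model overestimates EE
    signed = [100.0 * (e - c) / c for e, c in zip(estimated, criterion)]
    mean_abs = fmean([abs(x) for x in signed])
    return mean_abs, fmean(signed), stdev(signed)


# Hypothetical per-minute EE values in kcal/min, for illustration only
estimated = [2.2, 3.0, 4.5]
criterion = [2.0, 3.0, 5.0]
mae, mse, mse_sd = ee_percent_errors(estimated, criterion)
print(f"MAE {mae:.1f}%, signed error {mse:.1f} ± {mse_sd:.1f}%")
```

The signed error can be close to zero even when individual minutes are far off, because over- and underestimates cancel; this is why the mean absolute error is reported alongside it.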

One limitation of this study is the small number of PAs tested in the protocol. In addition, the activities were performed in a controlled laboratory setting and prescribed in a precise manner, such as propelling a wheelchair and exercising with an arm ergometer at a certain speed and/or intensity. Future studies should evaluate a larger number of PAs in the home and community environments of MWUs. To our knowledge, there is no device that can be directly used by wheelchair users to classify PAs and estimate EE. We chose to investigate the potential of the SW activity monitor in this population owing to its ready availability in the market and multisensor capabilities.

Conclusion

Availability of physical activity monitors for MWUs can empower them to monitor everyday PA participation and EE, and make informed decisions toward healthier behaviors. The high classification accuracy of the QDA and NB classification algorithms and the low EE estimation errors when using the actual classification results suggest that the SW activity monitor can be used to classify the four activities tested in this study and to estimate their EE among MWUs with SCI.

Data Archiving

There were no data to deposit.