Classification of human walking context using a single-point accelerometer

Real-world walking data offers rich insights into a person’s mobility. Yet, daily life variations can alter these patterns, making the data challenging to interpret. As such, it is essential to integrate context for the extraction of meaningful information from real-world movement data. In this work, we leveraged the relationship between the characteristics of a walking bout and context to build a classification algorithm to distinguish between indoor and outdoor walks. We used data from 20 participants wearing an accelerometer on the thigh over a week. Their walking bouts were isolated and labeled using GPS and self-reporting data. We trained and validated two machine learning models, random forest and ensemble Support Vector Machine, using a leave-one-participant-out validation scheme on 15 subjects. The 5 remaining subjects were used as a testing set to choose a final model. The chosen model achieved an accuracy of 0.941, an F1-score of 0.963, and an AUROC of 0.931. This validated model was then used to label the walks from a different dataset with 15 participants wearing the same accelerometer. Finally, we characterized the differences between indoor and outdoor walks using the ensemble of the data. We found that participants walked significantly faster, longer, and more continuously when walking outdoors compared to indoors. These results demonstrate how movement data alone can be used to obtain accurate information on important contextual factors. These factors can then be leveraged to enhance our understanding and interpretation of real-world movement data, providing deeper insights into a person’s health.

Walking is a fundamental human movement that has many benefits for mental and physical health [1][2][3][4]. With the advancements of micro-electromechanical systems (MEMS), researchers are now able to measure human motion outside the lab for extended periods. Real-world measurements are often conducted over long periods where there is little to no control over or explicit knowledge of a participant's behavior. Generally, people engage in numerous activities across different contexts in their daily life. Various methods have been developed to gather information about what an individual is doing. Human activity recognition (HAR) serves as the initial step in understanding real-world data as it allows for the classification of an individual's activities and the identification of walking instances. HAR research has successfully utilized different combinations of wearable sensors (such as mobile phones and inertial measurement units) and methods (including classic machine learning models and deep learning) to achieve accurate classification performance [5][6][7][8][9]. Research has also been conducted in the field of transportation mode detection using mobile phones and wearable sensors [10,11]. However, even within the walking activity itself, there exists a range of contexts that give rise to different behaviors.
There are many factors that can cause changes in gait. Firstly, the location where an individual is walking has a significant impact on their kinematics. Different terrains have been shown to influence how people negotiate their walk [12,13]. The location where someone walks can provide insight into their habits, such as whether they explore beyond their home and engage with the community [14]. Additionally, certain features of the built environment that individuals navigate can impact their mobility [15,16]. These various factors are directly related to health and well-being and therefore important to monitor. Secondly, the purpose of a walk can result in different walking strategies, even when individuals are walking in the same location [17]. For example, people tend to walk faster when commuting compared to a leisurely walk, despite both taking place outdoors in similar locations. Thirdly, there are internal factors that can affect human movement, such as mood. Contrasting moods, like happiness versus sadness, can lead individuals to exhibit different walking behavior [18][19][20]. Overall, acquiring information about these factors is crucial for understanding any observed variability in real-world walking.
• Algorithm development: We developed a classification algorithm that leveraged the natural grouping of real-world walking into bouts to identify walks indoors versus outdoors.
• Validation of our algorithm with a real-world dataset: To train and validate the model, we used a dataset generated from a data collection on 20 participants in the real world over a week, where GPS and self-reporting information were collected to label the different walks.
• Analysis of differences in walking kinematics with an extended dataset: Once validated, we used our model to label indoor and outdoor walks from a different dataset, where 15 participants were equipped with the accelerometer over two consecutive weeks. Finally, we characterized the influence of walking indoors versus outdoors on walking kinematics.
This novel approach has the potential to facilitate the parsing and analysis of real-world walking data while utilizing only an accelerometer.

Overview
Figure 1 shows an overview of the data collection and processing framework. We leveraged two datasets in this study: Datasets A and B. All methods were carried out in accordance with relevant guidelines and regulations. For both datasets, the University of Michigan's Institutional Review Board approved the procedures. Every participant gave their informed consent before the studies commenced. Dataset A was collected using a thigh-worn accelerometer, self-report, and GPS data over 7 days with 20 participants. Participants were asked to report the purpose and location of the walks carried out throughout their day. Dataset B was collected using only the thigh-worn accelerometer over 14 days with 15 participants. We used the self-report and GPS data to label the walking periods from Dataset A. The walking periods labeled exclusively inside or outside were used to train, validate, and test a classification model. Then, we used the classification model to label the walking periods labeled as mixed (i.e., both inside and outside), as well as the unlabeled walking periods from Dataset A, and all the walking periods from Dataset B. Finally, all the labeled walking periods from both Datasets A and B were used to characterize the differences between indoor and outdoor walking periods.

Datasets
The details of the data collection for Datasets A and B can be found in [17,31]. Briefly, for Dataset A, we collected data on 20 participants over a week in the real world (Table 1). Thirteen females and seven males between 21 and 49 years old, with an average age of 26.1 years, were recruited from a population of students during the summer in Ann Arbor, Michigan. The participant aged 49 years was a non-employed adult. Contextual information was collected using self-reporting and GPS data from their phones using the Ethica app (Ethica Data [Toronto, Canada]). For the self-reporting, participants were asked to maintain an activity log describing where they walked (e.g., home, work) and the purpose of their walk (e.g., going to work). We also conducted an exit interview after the real-world data collection to ensure that the self-reporting was complete and accurate. Motion data was collected using the activPAL (activPAL™ [PAL Technologies Ltd., Glasgow, UK]), a thigh-worn accelerometer. This device can be placed on the thigh using tape and offers 2-week continuous monitoring. It samples at 20 Hz and possesses a 3-axis accelerometer (range = ±4 g). The sensor's size and battery life allow for an unobtrusive placement and high compliance. Additionally, the proprietary algorithm of the sensor offers an accurate classification of activities that we used to isolate walking [32,33]. Dataset B was collected on 15 participants over 2 weeks in the real world, using only the activPAL (Table 1). All the participants were students between 20 and 30 years old. It is important to note that 4 subjects are in both datasets. Additionally, although Dataset B covered a longer period, the data collection took place amidst the COVID-19 pandemic, which might have led to a decrease in measured activity. Information about fitness and occupation was not part of the exclusion criteria and was not collected.

Classification algorithm development
Figure 2 shows the data processing framework we used to create the algorithm to classify walking periods into outdoor or indoor.
Figure 1. Data collection and processing overview: (A) Dataset A was collected over 7 days on 20 participants, using an accelerometer, self-report, and GPS data from the participants' phones. Dataset B was collected over 14 days on 15 participants using an accelerometer. (B) The walking periods from Dataset A were labeled using the GPS and self-report data. Walking periods labeled exclusively indoor or outdoor were used to train, validate, and test a classification model. The walking periods labeled mixed (i.e., both inside and outside), as well as all the unlabeled walking periods from Datasets A and B, were classified using the model to characterize indoor versus outdoor walks.

Walking period extraction
Walking in the real world can be understood as an ensemble of walking bouts. However, some bouts of walking are likely to belong to the same walking activity. For instance, if an individual stops at a pedestrian light, the walks before and after that stop could be grouped together. We introduced the notion of a walking period to capture this start-and-stop dynamic of real-world walking. This method was created and outlined in earlier work [31]. Briefly, we used the activPAL to identify stepping bouts. Then, bouts that were separated by a standing period of less than 1 min were grouped together into a stepping period. Lastly, we developed a classification algorithm to extract walking periods from these stepping periods, since the activPAL does not distinguish between walking and running [31].
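The grouping rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Bout` class, the `group_bouts` function, and the example timestamps are hypothetical, and the sketch assumes that the gap between two consecutive bouts consists entirely of standing (in the study, this information comes from the activPAL activity classification).

```python
from dataclasses import dataclass

MAX_STANDING_GAP_S = 60.0  # bouts separated by less than 1 min of standing are merged


@dataclass
class Bout:
    start: float  # bout start time, in seconds
    end: float    # bout end time, in seconds


def group_bouts(bouts, max_gap_s=MAX_STANDING_GAP_S):
    """Group stepping bouts into stepping periods.

    Assumes the time between two consecutive bouts is spent standing.
    Returns a list of periods, each period being a list of bouts.
    """
    periods = []
    for bout in sorted(bouts, key=lambda b: b.start):
        # Merge with the current period if the standing gap is short enough.
        if periods and bout.start - periods[-1][-1].end < max_gap_s:
            periods[-1].append(bout)
        else:
            periods.append([bout])
    return periods


# Hypothetical example: a 15 s stop (merged) followed by a 3 min stop (new period).
bouts = [Bout(0, 40), Bout(55, 120), Bout(300, 330)]
periods = group_bouts(bouts)
```

In this toy example the first two bouts form one stepping period and the third bout forms a second one; deciding which stepping periods are walking (as opposed to running) is a separate step that is not shown here.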

Data labeling
We manually labeled the walking periods from Dataset A using the self-report and GPS data. We combined a visualization of the GPS data with a satellite map and the information given in the participants' self-report to assign either an inside or outside label, or a mixed label when it appeared participants were walking both inside and outside (Fig. 3). Some walking periods were not labeled, either because there was no associated GPS data, because the self-report was missing, or both.

Feature extraction
We used a custom algorithm to extract strides from the thigh-worn accelerometer. Using the timing between strides, we computed stride time and stride frequency for each stride in a walking period. We first extracted what we named the biomechanics feature set, based on our domain knowledge. This set includes walking period duration, walking period continuity (i.e., based on the proportion of total standing time within a period, with 100 being no standing time), and the mean and standard deviation of stride frequency. Stride frequency was normalized by $\sqrt{g \cdot l_0}$, with $l_0$ being leg length [34]. Leg length was measured from the anterior superior iliac spine to the floor. Walking period continuity was defined as:
$$\text{continuity} = \left(1 - \frac{\text{total standing time}}{\text{walking period duration}}\right) \times 100$$
This reduced feature set was selected because of the demonstrated relationship between walking period duration and continuity and walking variability [31]. Stride frequency is also a key parameter that is likely to vary with the environment. Moreover, this feature set can be derived using other accelerometer-based sensors with different body placements. Additionally, we computed 20 other features from both stride frequency and the raw accelerometer signal. These other features were selected based on the existing literature [31,35,36]. We compared model performance using both feature sets to identify features best capable of distinguishing between different walking environments. A table describing the features is available in the Supplementary Material.
Figure 2. Classification model: We extracted walking periods from the accelerometer data, which we then labeled using self-report and GPS data. We extracted features from the accelerometer data and the stride detection. Labeled walking periods were then split into training, validation, and testing sets. We trained and validated two different learning algorithms, Random Forest and Ensemble SVM, using a leave-one-participant-out scheme. This led to 15 trained models that were then tested on the 5 remaining untouched participants' data. The best-performing model was chosen for the rest of the analyses.
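As an illustration of the biomechanics feature set, the sketch below computes the four features for a single walking period. It is a minimal example under assumptions: the function name, argument names, and input values are hypothetical; continuity is taken to be the percentage of the period not spent standing (consistent with "100 being no standing time"); and the leg-length normalization of stride frequency is left out for brevity.

```python
from statistics import mean, stdev


def biomechanics_features(period_start_s, period_end_s, standing_s, stride_times_s):
    """Compute the four biomechanics features for one walking period.

    standing_s: total standing time within the period, in seconds.
    stride_times_s: duration of each detected stride, in seconds.
    """
    duration_s = period_end_s - period_start_s
    # Assumed definition: 100 means no standing time within the period.
    continuity_pct = 100.0 * (1.0 - standing_s / duration_s)
    # Stride frequency is the inverse of stride time (in Hz, not normalized here).
    stride_freqs = [1.0 / t for t in stride_times_s]
    return {
        "duration_min": duration_s / 60.0,
        "continuity_pct": continuity_pct,
        "stride_freq_mean": mean(stride_freqs),
        "stride_freq_sd": stdev(stride_freqs),
    }


# Hypothetical 2 min walking period with 30 s of standing and four strides.
features = biomechanics_features(0.0, 120.0, 30.0, [1.0, 1.25, 1.0, 0.8])
```

Because these features only need stride timing and standing time, the same function could in principle be fed by any sensor that provides stride detection, which is the portability argument made in the text.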

Classification model
Learning algorithms and hyperparameters. We compared the performance of two supervised ensemble learning algorithms that use different classification principles to separate indoor versus outdoor walking periods: Ensemble Support Vector Machine (SVM) [37] and Random Forest [38]. We used ensemble methods both to address the issue of imbalanced classes and to improve the generalizability of our model [39]. Ensemble methods combine multiple base models to improve classification performance. Both SVM and Random Forest algorithms have successfully been used for terrain classification tasks with both humans and robots [13]. We used grid search to optimize the hyperparameters of both algorithms. The Ensemble SVM we used is a bagging (i.e., bootstrap aggregation) classifier with SVM as a base model, 60 estimators, and a radial basis function kernel. The Random Forest algorithm used 40 estimators and used bootstrapping to build the trees. We trained both algorithms on the set of biomechanics features and on all features. In summary, we evaluated 4 different cases: Ensemble SVM with biomechanics features, Ensemble SVM with all features, Random Forest with biomechanics features, and Random Forest with all features. We used the Scikit-learn library (version 1.0.2) from Python to train and evaluate the different learning algorithms [40].
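The two classifiers described above could be instantiated in scikit-learn roughly as follows. This is a sketch and not the authors' code: only the number of estimators, the RBF kernel, and the use of bootstrapping come from the text. The `make_models` helper, the `StandardScaler` step, and the `random_state` value are assumptions added for illustration, and all other hyperparameters are left at library defaults because the grid-search results are not reported here.

```python
from sklearn.ensemble import BaggingClassifier, RandomForestClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def make_models(random_state=0):
    """Build the two ensemble classifiers compared in the text."""
    # Bagging of RBF-kernel SVMs. The base model is passed positionally so the
    # call works whether the keyword is `base_estimator` (older releases)
    # or `estimator` (newer releases).
    # Feature scaling is an assumption; the text does not mention it.
    base_svm = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
    ensemble_svm = BaggingClassifier(
        base_svm, n_estimators=60, random_state=random_state
    )
    # Random Forest with 40 bootstrapped trees.
    random_forest = RandomForestClassifier(
        n_estimators=40, bootstrap=True, random_state=random_state
    )
    return {"ensemble_svm": ensemble_svm, "random_forest": random_forest}


models = make_models()
```

Each of the two models would then be trained once per feature set, giving the four cases listed in the text.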
Model training, validation, and testing. To build our model, we only used walking periods labeled exclusively indoor or outdoor (Fig. 1). We divided our dataset into training, validation, and testing sets. The training and validation set contained walking periods from 15 participants and followed a leave-one-participant-out method to tune the hyperparameters of our models and evaluate the generalizability to new participants. This means that we trained a model on data from 14 participants and validated it on the remaining participant. This process was iteratively performed 15 times, with each participant serving as the validation set once, and for each of the four cases described in the previous paragraph (2 algorithms × 2 feature sets). This provided 15 models per case, which we evaluated, and we chose the best-performing case to use on the 5-participant testing set. For instance, if the average performance (i.e., average accuracy, F1-score, and AUROC) of the models trained and validated using Ensemble SVM on all features were the highest, we would use the 15 models from this case on the testing set. The model that performed the best (i.e., highest accuracy, F1-score, and AUROC) on the testing set among the 15 trained models for the best case was chosen for the classification of the walking periods labeled as mixed and the unlabeled walking periods of Dataset A, as well as all the walking periods in Dataset B (Fig. 1).
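A leave-one-participant-out loop of this kind can be written with scikit-learn's `LeaveOneGroupOut`, using the participant identifier as the group. The sketch below runs on synthetic data (the feature values, labels, and number of walking periods per participant are invented for illustration) and keeps one fitted model per held-out participant, mirroring the 15 models described above.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import LeaveOneGroupOut

# Synthetic stand-in for the labeled walking periods of 15 participants.
rng = np.random.default_rng(0)
n_participants, periods_each = 15, 20
groups = np.repeat(np.arange(n_participants), periods_each)  # participant IDs
y = rng.integers(0, 2, size=groups.size)                     # binary context label
X = rng.normal(size=(groups.size, 4)) + 2.0 * y[:, None]     # 4 toy features

fold_models, fold_accuracy = [], []
for train_idx, val_idx in LeaveOneGroupOut().split(X, y, groups):
    # Train on 14 participants, validate on the one held out.
    model = RandomForestClassifier(n_estimators=40, random_state=0)
    model.fit(X[train_idx], y[train_idx])
    fold_models.append(model)
    fold_accuracy.append(accuracy_score(y[val_idx], model.predict(X[val_idx])))

mean_accuracy = float(np.mean(fold_accuracy))
```

Splitting by participant, not by walking period, is what makes the validation score an estimate of performance on an unseen person: no walking period from the held-out participant is seen during training.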
Model evaluation. We used different metrics to evaluate our model and represent its performance both during training and testing in the different cases and for the different models. First, we used accuracy to determine the overall proportion of correctly classified walking periods. We complemented accuracy with F1-score since accuracy can be misleading when dealing with imbalanced datasets. F1-score takes into account both false positives and false negatives to ensure that the performance is not biased towards the more frequent class. Finally, we also used Area Under the ROC Curve (AUROC) to measure the model's ability to separate between indoor and outdoor classes. AUROC is also useful for imbalanced datasets as it takes into account false positives and false negatives. The combination of these different performance metrics provides a more comprehensive picture of the model's performance.
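The three metrics can be computed with `sklearn.metrics` as in the small worked example below. The labels and scores are invented, and treating indoor as the positive class is an assumption made here for illustration (the text does not state which class was positive).

```python
from sklearn.metrics import accuracy_score, f1_score, roc_auc_score

# Toy imbalanced example: 1 = indoor (assumed positive class), 0 = outdoor.
y_true = [1, 1, 1, 1, 1, 1, 0, 0]
# Hypothetical predicted probabilities of the positive class.
y_score = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.65, 0.2]
# Hard labels obtained with a 0.5 decision threshold.
y_pred = [int(s >= 0.5) for s in y_score]

accuracy = accuracy_score(y_true, y_pred)  # (TP + TN) / total = 6 / 8
f1 = f1_score(y_true, y_pred)              # 2TP / (2TP + FP + FN) = 10 / 12
auroc = roc_auc_score(y_true, y_score)     # 9 of 12 (positive, negative) pairs ranked correctly
```

Here accuracy and F1-score depend on the chosen threshold, whereas AUROC is computed from the scores and summarizes the ranking of positives above negatives across all thresholds.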
Feature importance. We used feature permutation to estimate the importance of each feature in the performance of our model. This method consists of randomly shuffling the values of a given feature in the validation set and calculating the change in performance of the model with the shuffled data. We used the test set to evaluate feature importance.
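Permutation importance as described above is available in scikit-learn as `sklearn.inspection.permutation_importance`. The sketch below applies it to synthetic data in which only the first feature carries information; the feature names, the data, the number of repeats, and the train/held-out split are illustrative assumptions.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance

rng = np.random.default_rng(0)
feature_names = ["duration", "continuity", "stride_freq_mean", "stride_freq_sd"]

# Synthetic data: only "duration" is informative about the label.
y = rng.integers(0, 2, size=400)
X = rng.normal(size=(400, 4))
X[:, 0] += 3.0 * y

# Fit on the first 300 samples, evaluate importance on the 100 held out.
model = RandomForestClassifier(n_estimators=40, random_state=0).fit(X[:300], y[:300])
result = permutation_importance(
    model, X[300:], y[300:], scoring="accuracy", n_repeats=20, random_state=0
)

# Mean decrease in accuracy per feature, ranked from most to least important.
ranking = [feature_names[i] for i in np.argsort(result.importances_mean)[::-1]]
```

`result.importances_mean` holds the average drop in accuracy per shuffled feature and `result.importances_std` its standard deviation over repeats, which is the kind of quantity shown with error bars in a feature-importance plot.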
Analysis of mixed walking periods, labeled indoor and outdoor. We did not use walking periods that happened both inside and outside to train our model to avoid decreasing the model's ability to distinguish between indoor and outdoor. However, we investigated how the chosen model classified these mixed walking periods. The researcher used both the GPS data and the participant's self-report to label as best as possible the mixed walking periods based on whether they appeared to be mostly indoor or mostly outdoor. Then, we investigated whether the model would classify a mostly indoor walk as indoor and vice versa. It is important to note that the GPS data can be noisy and interpretation can be difficult, even with the support of the self-report data (Fig. 3).

Outdoor versus indoor walking analysis
Once we chose the best-performing model, we classified the walking periods labeled mixed (i.e., both indoor and outdoor) and the unlabeled walking periods of Dataset A, as well as all the walking periods in Dataset B (Fig. 1). Then, we characterized the differences between indoor and outdoor walking periods. We looked at the differences in walking period duration and continuity, as well as an essential health indicator: walking speed.

Walking speed estimation
We used the method described in Baroudi et al. [41] to estimate stride speed from the accelerometer. Briefly, this method leverages a relationship between stride speed v and stride frequency f [42,43] with two model parameters, a and b. Stride frequency f can be accurately estimated from stride detection using the accelerometer, and in previous studies we identified the parameters a and b for each subject in both datasets using a foot-worn inertial measurement unit [17,31]. Researchers reported that 97% of the stride speed error was under 0.2 m·s⁻¹ using this method. This framework can be used to estimate the relative speed differences for individuals walking in the real world using only an accelerometer.

Statistical analysis
Our goal was to understand the difference in walking speed, walking period duration, and continuity between indoor and outdoor settings, whilst taking into account the nested nature of our data: walking speeds are nested within walking periods, and walking periods are nested within participants. To tackle these inherent dependencies and repeated measures from each participant, we used linear mixed-effects models. This strategy facilitated the management of our multilevel data structure, properly adjusting for the correlations among multiple walking speeds, walking period duration, and continuity measures taken from the same walking period and participant. We built three models using walking speed, walking period duration, and walking period continuity as the dependent variables, while the indoor/outdoor condition was the fixed effect, and the walking period (for the walking speed model only) and participant identifiers were the random effects. We tested the assumption of normally distributed and homogeneous residuals by visualizing QQ-plots and residuals versus fitted values plots. We normalized walking speed by $\sqrt{g \cdot l_0}$, with $l_0$ being the participant's leg length [34]. This design effectively captured the variability in walking period parameters both within and across periods and participants. The models were implemented in R using the 'lme' function from the 'nlme' package [44]. We explored different correlation structures for the random effects and selected the most appropriate model based on the Akaike Information Criterion (AIC) [45,46]. The model with the lowest AIC was deemed the best fit for our data. The estimated fixed effect for the condition serves as an indication of the expected change in parameters when transitioning from indoor to outdoor environments, while accounting for the nested structure of the data.
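For readers less familiar with nested mixed-effects models, the walking speed model described above can be written out as follows. This is a sketch of the simplest form, assuming random intercepts only; since several random-effects structures were compared using AIC, the structure that was retained may differ. The symbols are introduced here for illustration: $v_{ijk}$ is the normalized speed of stride $i$ in walking period $j$ of participant $k$, and $\mathrm{outdoor}_{jk}$ equals 1 for an outdoor walking period and 0 for an indoor one.

```latex
v_{ijk} = \beta_0 + \beta_1\,\mathrm{outdoor}_{jk} + u_k + w_{jk} + \varepsilon_{ijk},
\qquad
u_k \sim \mathcal{N}(0, \sigma_u^2),\quad
w_{jk} \sim \mathcal{N}(0, \sigma_w^2),\quad
\varepsilon_{ijk} \sim \mathcal{N}(0, \sigma^2)
```

In this notation, $\beta_1$ is the fixed effect of context (the expected change from indoor to outdoor), $u_k$ is the participant-level random intercept, and $w_{jk}$ is the random intercept of the walking period nested within the participant. For the duration and continuity models, which have one observation per walking period, the $w_{jk}$ term is dropped.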

Model validation
Tables 2 and 3 summarize the validation results for the different cases. The data from subjects 1, 4, 7, 17, and 19 were randomly selected to be kept for the test set. The training set contained between 957 and 1040 walking periods, with 67.9 ± 4.4% (range) of indoor labels. The validation set contained between 29 and 112 walking periods, with 66.0 ± 51.4% of indoor labels. This wide range can be explained by the fact that participants had different habits and behavior. Most participants had mostly indoor walking periods, but S6, for instance, had only 34.5% of indoor walks. The test set contained 339 walking periods, and 78.8% of these periods were labeled indoor. The models performed better when using the biomechanics feature set. The average accuracy, F1-score, and AUROC increased by approximately 0.2 for both Ensemble SVM and Random Forest models when moving from all features to the biomechanics features only. Random Forest and Ensemble SVM performed comparably with both sets of features. Because Random Forest algorithms are easier to interpret and use, we chose the Random Forest algorithm with the biomechanics feature set to use with our test set. The model trained and validated using Random Forest and the biomechanics features with S8 held out performed the worst (accuracy = 0.828, F1-score = 0.857, AUROC = 0.897), while the one with S9 held out performed the best (accuracy = 1.000, F1-score = 1.000, AUROC = 1.000).

Model testing and choice
We evaluated the 15 models validated using Random Forest and the biomechanics feature set on the 5-participant test set. We found that Model S16 (i.e., the model that was trained and validated with S16 held out) outperformed the other models, with an accuracy of 0.941, an F1-score of 0.963, and an AUROC of 0.931, which was at least 0.01 higher than the other models on all metrics (Table 4). As such, we chose this model to label the rest of the data and characterize the differences between indoor and outdoor walking periods.

Feature importance
We found that walking period duration was the most important feature in the model we chose (Fig. 4). When the values of this feature were shuffled within the validation set, it led to a decrease in model accuracy of 0.19. Walking period continuity, mean stride frequency, and standard deviation of stride frequency led to decreases in accuracy of 0.04, 0.06, and 0.02, respectively.

Model classifications on mixed walking periods
Figure 5 shows a visualization of walking periods labeled mixed (i.e., both indoor and outdoor) using principal component analysis, together with the walking periods labeled only indoor or outdoor. We observed that the majority of walks labeled mostly indoor cluster with the walks labeled only indoor, and vice versa for outdoor walking periods. The model we chose classified 71% of mostly outdoor walks as outdoor and 88% of mostly indoor walks as indoor. Figure 5 also shows the self-report and GPS data from two walking periods. It is likely that the researcher incorrectly labeled these mixed walks and that the model classified them more accurately, given that its good validation performance indicates it learned the characteristics of indoor versus outdoor walks.

Characterization of indoor versus outdoor walking periods
After all walking periods were labeled, we had 69,616 stride speed values and 3766 walking periods from Dataset A, compared to 53,930 stride speed values and 3701 walking periods from Dataset B. The ratio of indoor versus outdoor was approximately 80:20 for both datasets. Different linear mixed models were built to evaluate the effect of context (i.e., walking environment) on walking speeds, walking period duration, and continuity. First, we found a large significant effect of context on walking speed, b = 0.095, t(117678) = 79.5, p < 0.001. This indicates that normalized walking speed increases by 0.095 from indoor to outdoor walks (the normalized value of 0.095 corresponds to approximately 0.28 m·s⁻¹, depending on the participant's leg length). We also found a large significant effect of context on walking period duration and continuity, b = 9.25, t(7446) = 48.9, p < 0.001 and b = 20.14, t(7446) = 26.2, p < 0.001. These results suggest that duration increases by 9.25 min and continuity by 20.14% from indoor to outdoor walks. These results are illustrated in Figs. 6 and 7. We observe that outdoor walking periods have overall higher duration (µ_outdoor = 11.4 min versus µ_indoor = 2.2 min) and continuity (µ_outdoor = 81.7% versus µ_indoor = 61.6%). Participants walked on average 0.28 m·s⁻¹ faster when walking outdoors compared to indoors. We also observe a larger variability in the distribution of stride speed indoors compared to outdoors (σ_indoor = 0.43 m·s⁻¹ versus σ_outdoor = 0.31 m·s⁻¹).
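The conversion between the normalized effect (0.095) and a speed in m·s⁻¹ can be checked with a short calculation. The sketch below assumes that the normalizing factor is the usual non-dimensional speed scaling, the square root of g times leg length, and uses a hypothetical leg length of 0.90 m; neither the function name nor the leg length value comes from the study.

```python
import math

G = 9.81  # gravitational acceleration, m/s^2


def denormalize_speed(v_norm, leg_length_m, g=G):
    """Convert a non-dimensional speed back to m/s.

    Assumes the speed was normalized by sqrt(g * leg_length).
    """
    return v_norm * math.sqrt(g * leg_length_m)


# Fixed effect of context on normalized walking speed, for a 0.90 m leg (hypothetical).
delta_speed = denormalize_speed(0.095, 0.90)  # approximately 0.28 m/s
```

With leg lengths in a typical adult range, the result stays close to 0.28 m·s⁻¹, which matches the value quoted in the text.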

Discussion
In the real world, individuals exhibit great variability in walking patterns and are able to adapt to diverse environmental contexts. Understanding these contexts is essential for extracting meaningful information about a person's mobility. Determining whether an individual is walking indoors or outdoors is a critical element of context, given the stark differences in environment and conditions that these two settings present. Here, we developed a novel framework that utilizes only an accelerometer to accurately classify indoor versus outdoor walks. To be able to ensure ecological validity from this reduced sensor set, we leveraged a unique dataset with an extended sensor suite that contained the accelerometer. Then, we used this approach to quantify the differences between indoor and outdoor walking patterns. This framework not only demonstrates the potential to use a minimal sensor suite to successfully gain important contextual information but also enables a more comprehensive understanding of real-world walking behavior.
Both Ensemble SVM and Random Forest trained with different feature sets were able to learn the characteristics of indoor and outdoor walks. The performance across models was very high, as reflected by average accuracies, F1-scores, and AUROC exceeding 0.88, 0.90, and 0.86, respectively, for all training scenarios (Tables 2, 3). Trained models performed better with the biomechanics feature set, suggesting that these features are sufficient to capture the inherent differences between indoor and outdoor walking periods (Table 3). Notably, walking period duration was the most important feature (Fig. 4), with outdoor walks being longer on average than indoor walks (∼ +9 min) (Fig. 6). Average stride frequency was also an important feature, as outdoor walks tend to have a higher intensity than indoor walks. The high performance of the classifier also suggests that the grouping of walking bouts into walking periods is an effective representation of real-world walking [31]. We chose the best-performing model that used the Random Forest algorithm trained with the biomechanics feature set (Table 4). Using the biomechanics feature set as opposed to the raw data enables the model to be reused with different sensor types and placements. In fact, stride frequency was chosen because it can easily be derived directly from various sensors, even from smartwatches [47][48][49]. The choice of Random Forest also increases the interpretability and generalizability of our model for other populations. As such, the model we developed could be extended to other studies of mobility and help improve the understanding of human data from wearable sensors.
Table 4. Model choice: Results of the 15 models with the 5-participant test set. Model Sx corresponds to the model trained and validated with the leave-one-participant-out scheme with Sx held out, using the Random Forest algorithm with the biomechanics feature set as shown in Table 3. Model S16 performed the best, as it had the highest accuracy, F1-score, and AUROC.
We used the developed model to characterize the differences between walking indoors compared to outdoors using a large dataset. We found that outdoor walking periods were significantly longer, more continuous (i.e., less standing time), and had higher stride speed (Figs. 6, 7). Researchers have been increasingly interested in the measurement of walking speed in the real world, as it is a critical health indicator for various health issues [1,50-52]. Our observations show that individuals greatly vary their walking speed indoors (Fig. 7). On the other hand, individuals took more strides outdoors, with less variability in walking speed. This suggests that isolating outdoor walks could potentially improve estimates of preferred walking speed in the real world. This substantiates the findings that longer walks show greater discriminative power for clinical populations [53]. Understanding the proportion of activity spent indoors versus outdoors can also be useful for the improvement of physical activity.
Although any increase in physical activity matters, there are proven benefits to walking outdoors [54][55][56]. As such, our models could be used in the different stages of intervention design, from the baseline physical activity assessment to the monitoring of intervention efficacy. The reduced sensor set also enables higher compliance (i.e., the degree to which users correctly and consistently wear the device as intended), which is essential for the reliability and validity of data collected in the real world. While this study integrates fundamental elements of context for real-world walking, there are numerous other contextual factors that can have an impact on walking behavior and biomechanics. There are also nuances within indoor and outdoor walks, such as terrain type, that induce changes in walking patterns [12,13]. Future work should investigate those factors and their relationships with motion to potentially integrate additional classes or sub-classes into the model we developed. Further, we developed our model with 20 young adults, who were mostly students. Populations with specific habits, such as nurses, would potentially show long walks indoors that could be mistaken for outdoor walks given the importance of walking period duration in our model. Additionally, individual habits may be affected by climate and thus geographical location. Thus, the accuracy and generalizability of our model can be improved by collecting a larger dataset with a more diverse population over an extended period. Additionally, the off-the-shelf system used for this work was designed to be placed on the thigh. The gait metrics and the activity classification algorithm were tuned to measurements made from this location. However, this particular placement can become inconvenient for users during extended measurement periods. Future research should explore the use of consumer-grade wearables like smartwatches. The feature set we derived for our classification algorithm can potentially be accurately obtained from sensors with different placements. Lastly, our method was mainly developed for offline classification, in the scenario where data is retrieved and post-processed to gain information on an individual's behavior. Expanding this framework for online classification should be pursued, for potential use in fields like rehabilitation or assistive robotics.

Figure 3. Data labeling using GPS and self-report data: We show here 3 examples for walking periods labeled indoor, outdoor, and both indoor and outdoor. The GPS sampling was variable and dependent on the type of phone used. The text in quotations corresponds to the participant's self-report.

Figure 4. Feature importance using feature permutation: Decrease in model accuracy for each shuffled feature. The error bars represent standard deviation. A decrease in accuracy indicates that, when the given feature was perturbed (i.e., by randomly shuffling its values), the performance of the model degraded. This means that this feature contains meaningful information on which the model relies to make its predictions.

Figure 5. Analysis of mixed walks: (A) GPS and self-report data are shown for a sample of walks labeled by the model as indoor (left two images) and outdoor (right two images), as indicated by the coloring. The second and fourth images were originally labeled as mostly outdoor and mostly indoor by the researcher, as indicated by the star and square shapes. The mixed walking periods highlighted here were mislabeled by the researcher. The sampling rate of the GPS data is variable and depends on the participant's phone type. (B) Principal component analysis with the mixed walking periods colored based on the model classification. The shapes represent the researcher's labels. We can see that the majority of mostly indoor walks are labeled indoor and vice versa, and that the walking periods that were mislabeled by the researcher are correctly labeled by the model.

Figure 6. Indoor and outdoor walking period duration and continuity: Relationship between walking period duration and continuity. (A) Scatter plot with marginal distributions of walking period duration and continuity. Each dot corresponds to a walking period. (B,C) We binned all walking periods by their context and looked at their durations and continuity.

Figure 7. Stride speed for walking periods indoor vs. outdoor: Distribution of stride speeds for all participants grouped by context. Each shaded area represents the shape of the distribution, and the horizontal lines mark the mean.

Table 2. Model validation with all features.

Table 3. Model validation with biomechanics features.