Association between behavioral phenotypes and sustained use of smartphones and wearable devices to remotely monitor physical activity

Smartphones and wearable devices can be used to remotely monitor health behaviors, but little is known about how individual characteristics influence sustained use of these devices. Leveraging data on baseline activity levels and demographic, behavioral, and psychosocial traits, we used latent class analysis to identify behavioral phenotypes among participants randomized to track physical activity using a smartphone or wearable device for 6 months following hospital discharge. Four phenotypes were identified: (1) more agreeable and conscientious; (2) more active, social, and motivated; (3) more risk-taking and less supported; and (4) less active, social, and risk-taking. We found that duration and consistency of device use differed by phenotype for wearables, but not smartphones. Additionally, “at-risk” phenotypes 3 and 4 were more likely to discontinue use of a wearable device than a smartphone, while activity monitoring in phenotypes 1 and 2 did not differ by device type. These findings could help to better target remote-monitoring interventions for hospitalized patients.

support and financial stability, for which credit score may serve as a proxy, have been linked to adherence to remote health monitoring systems 28 .
Though these characteristics are likely to independently influence adherence to and efficacy of remote monitoring interventions, previous research has yet to fully appreciate the dynamic ways in which they may interact to produce distinct behavioral profiles with meaningful implications for health behavior. For instance, while high levels of the personality trait neuroticism have been associated with poor health behaviors such as lower medication adherence, research suggests high levels of neuroticism may in fact promote healthier behavior when co-occurring with high levels of conscientiousness 27 . Thus, our objective was to employ a person-centered approach to understanding the socio-behavioral correlates of remote activity monitoring behaviors. One promising approach for identifying groups of individuals who may have similar characteristics, and thus may respond similarly to health behavior change interventions, is to construct "behavioral phenotypes" of users based on individual behaviors, preferences, and motivations 16 .
In this study, we leveraged demographic, behavioral, and physical activity data collected at baseline to identify underlying homogeneous subgroups of individuals using latent class analysis (LCA). We next examined associations between the resulting subgroups, termed 'latent classes' and hereafter referred to as behavioral phenotypes, and patterns of remote physical activity monitoring among patients randomized to use either a smartphone or wearable device. Specifically, we compared the duration and consistency of data transmission between device types among members of each behavioral phenotype, as well as between behavioral phenotypes within each device type in a series of survival and ANOVA analyses.
Considering model fit parameters, the results of likelihood ratio tests, the distribution of the sample across classes, and model interpretability, a 4-class LCA model was selected (Supplementary Table 1). This model yielded low values for Akaike information criterion (AIC) and sample size adjusted Bayesian information criterion (BIC) and high entropy. Though the 3-class model yielded slightly lower BIC, the likelihood ratio test for the 4-class model was statistically significant, indicating significantly improved model fit when using 4 classes compared to 3 classes. Additionally, while the 5-class model yielded slightly lower AIC and higher entropy than the 4-class model, the likelihood ratio test was not statistically significant, indicating that these differences do not significantly improve model fit. The sample was also the most evenly distributed across classes in the 4-class model.

Behavioral phenotype descriptions
The four behavioral phenotypes differed significantly on all latent class indicators aside from age and physical activity (p-values for ANOVA test < 0.01; Table 1). Phenotypes also differed significantly on a number of additional sociodemographic criteria such as race, insurance type, education level, marital status, and household income (Table 2). However, they did not differ in body mass index or Charlson comorbidity index. Below we elaborate on the defining features of each phenotype (Table 3) and on sociodemographic characteristics of note.
Phenotype 1: more agreeable and conscientious. Phenotype 1 was the largest subgroup, comprising 35.7% of the sample (n = 158). This phenotype scored the highest in agreeableness (+0.67 SD above the sample mean) and conscientiousness (+0.67 SD), with participants in this phenotype reporting only high levels of both features (Supplementary Table 2). They also had the lowest credit scores (−0.59 SD) and were characterized by lower scores in neuroticism (−0.33 SD). In terms of sociodemographic features, phenotype 1 was composed of more non-Hispanic Black participants and participants with lower income and levels of education.
Phenotype 2: more active, social, and motivated. Phenotype 2 comprised 23.8% of the sample (n = 105). This phenotype was older than the overall sample (+0.59 SD from sample mean) and males were overrepresented in this phenotype relative to the whole sample. Participants in this phenotype reported the highest number of MET minutes per week across classes (+0.11 SD), a measure of the energy expended in physical activity. They also scored the highest in openness (+0.33 SD), extroversion (+0.25 SD), and social support (+0.33 SD), and had the highest credit scores (+1.03 SD), reflecting a greater degree of social and motivated behavior. There were fewer non-Hispanic Black participants and more Hispanic participants, college graduates, and higher income participants in this phenotype relative to the overall sample.
Table 2. Baseline sociodemographic characteristics included in Cox proportional hazard models, summarized by behavioral phenotype. SD, standard deviation; CCI, Charlson Comorbidity Index; IQR, interquartile range. a Chi-squared test, aside from age, BMI, and CCI, for which one-way ANOVA tests were used. b Calculated as weight in kilograms divided by height in meters squared. *p-value is significant (p < 0.05).
Phenotype 3: more risk-taking and less supported. This phenotype reported the highest levels of risk-taking in both the health and safety domain (+1.00 SD) and social domain (+0.75 SD). Other defining characteristics include the lowest scores in social support (−0.33 SD) and agreeableness (−0.83 SD). Males were also overrepresented in this phenotype relative to the whole sample, as were participants on Medicaid.
Phenotype 4: less active, social, and risk-taking. Participants in this phenotype reported the lowest levels of physical activity (−0.09 SD) and social risk-taking preferences (−0.83 SD). They also scored the lowest in extroversion (−0.63 SD), conscientiousness (−1.00 SD), and openness (−0.83 SD), and the highest in neuroticism (+0.66 SD). Males were underrepresented in this group.
Behavioral phenotype and sustained device use
Differences within each phenotype. Figure 2 shows the proportion of participants in each phenotype providing data over the 180 days after hospital discharge, comparing patients randomized to use smartphones versus wearable devices. The more agreeable and conscientious phenotype 1 and more active, social, and motivated phenotype 2 showed no differences in duration of data provision with smartphones versus wearables. However, the more risk-taking and less supported phenotype 3 and less active, social, and risk-taking phenotype 4 were more likely to discontinue use of a wearable than a smartphone, though this difference was only statistically significant in phenotype 3 (unadjusted log rank test: p = 0.029).
The increased likelihood of discontinuing wearable use in phenotype 3 remained significant in Cox proportional hazard models adjusted for sociodemographic characteristics (Table 4). Consistent with the unadjusted analyses, smartphones and wearable devices did not differ significantly in the other phenotypes. Results did not change appreciably in a sensitivity analysis defining a day of data transmission as a day with at least 1000 steps reported (Supplementary Table 3).
ANOVA analyses comparing the proportion of days with step data provided during the 180-day study period revealed significantly more consistent activity monitoring in smartphone users compared to wearable users in both the more risk-taking and less supported phenotype 3 and less active, social, and risk-taking phenotype 4 (p = 0.014 and p = 0.047, respectively; Supplementary Table 4). In the sensitivity analysis these differences were not statistically significant, though the increased consistency in data provision among smartphone users relative to wearable users showed trend-level significance in phenotype 3 (p = 0.063; Supplementary Table 4).
Differences between phenotypes. Unadjusted log rank tests showed a significant difference in data provision between phenotypes among wearable users (p = 0.046), but not among smartphone users (p > 0.05; Fig. 3). In adjusted Cox proportional hazard models, the log-rank test comparing phenotypes among wearable users trended towards significance (p = 0.051), as did the decreased likelihood to stop providing data in the more agreeable and conscientious phenotype 1 and more active, social, and motivated phenotype 2 relative to the less active, social, and risk-taking phenotype 4 (Table 5). Findings were similar in the sensitivity analysis (Supplementary Table 5).
There were no significant differences between phenotypes in the proportion of days with data provided in either device type (Supplementary Table 6). However, there were trend-level differences between phenotypes in both device types in the sensitivity analysis (smartphones: p = 0.057, wearables: p = 0.093; Supplementary Table 6). Otherwise, results did not change appreciably in the sensitivity analysis.

Discussion
In this study, we demonstrate that "behavioral phenotypes," or subgroups of individuals defined by co-occurring social, behavioral, psychological, and demographic traits, had different patterns of sustained use of smartphones and wearable devices for tracking physical activity. In prior work, the overall sample had higher sustained use in the smartphone group when compared to the wearable group 22 . Our findings demonstrate that this only holds true for a subset of "at-risk" individuals. Specifically, duration and consistency of sustained activity monitoring differed by device type only in the more risk-taking and less supported phenotype 3 and less active, social, and risk-taking phenotype 4. In both phenotypes, smartphone users displayed significantly greater tracking consistency compared to wearable users, while duration of activity monitoring was only statistically significantly higher among smartphone users in the more risk-taking and less supported phenotype 3. Additionally, behavioral phenotypes differed in duration of activity monitoring only among wearable users, not among smartphone users. This is driven in part by the high rates of early drop-off in use of wearable devices observed in the more risk-taking and less supported phenotype 3 and less active, social, and risk-taking phenotype 4.
The less sustained activity monitoring seen in these phenotypes is in line with previous research. The more risk-taking and less supported phenotype 3 and less active, social, and risk-taking phenotype 4 reported the lowest baseline physical activity levels, which research has linked to a decreased likelihood of forming successful activity tracking habits 30 . These phenotypes also both display personality traits seen in "Type D" or distressed personality type, a well-established phenotype marked by a tendency toward negative emotions and social inhibition. Type D personality is correlated with high levels of neuroticism and low levels of conscientiousness, agreeableness, and social support, seen in both phenotypes 3 and 4, and has been associated with poor health outcomes and a sedentary lifestyle 31-33 .
Whereas the less active, social, and risk-taking phenotype 4 displayed poor activity monitoring performance across both device types, the more risk-taking and less supported phenotype 3 was a top performer among smartphone users, showing less sustained and consistent activity monitoring only among wearable users. This may be because the more risk-taking and less supported phenotype 3 diverges from the less active, social, and risk-taking phenotype 4 and the Type D profile on a number of traits related to sociability: phenotype 3 displays high extroversion, high openness, and high degrees of social risk taking, while phenotype 4 reported the lowest scores on these metrics. This suggests that sustained use of wearable devices may be more reliant on characteristics such as neuroticism, conscientiousness, and social support, while sustained activity monitoring using smartphone devices may be more dependent on traits related to sociability and openness. Future research might seek to investigate other baseline characteristics that may mediate the relationship between these characteristics and successful physical activity monitoring via a smartphone device, such as general daily smartphone use.
Table 4. Cox proportional hazard models associating study arm with last day of data transmission, censoring on patient death and adjusting for patient-level sociodemographic characteristics. Models were fit separately for each behavioral phenotype. HR, hazard ratio; CI, confidence interval; BMI, body mass index; CCI, Charlson Comorbidity Index. *p-value is significant (p < 0.05).
The more active, social, and motivated phenotype 2 displayed high levels of sustained and consistent activity monitoring across device types, most notably showing increased adherence to activity monitoring compared to the other phenotypes in the wearable arm. Though this is generally unsurprising, a driving feature of this phenotype was older age, which contrasts with previous research suggesting older adults are less likely to develop successful habits using wearable activity monitoring technology 34,35 . It is possible that the high levels of social support reported by this phenotype, in addition to support from the study team, may have partially offset initial acceptance and usability barriers oft-cited in older adults 35,36 . This reflects the importance of considering individual differences when examining patterns in and barriers to use of monitoring technologies in older adults, particularly given that they represent a rapidly growing segment of the population likely to benefit from activity monitoring technologies 34 .
In this study, we directly compared activity monitoring use patterns between device types while accounting for differences in users' behavioral, psychological, and demographic profiles. Findings suggest that smartphones may be a better option when prioritizing the scalability of activity monitoring interventions, given that smartphone users provided at least as many days of step data as wearable users across all four behavioral phenotypes. However, given that wearables did not underperform smartphones in every subgroup, trade-offs between device types should be considered within the context of the goals and target population of a specific program. For instance, most people already own a smartphone 37 , which reduces program costs and barriers related to users forgetting to carry or charge an extra device 38 . However, not everyone carries their phone throughout the entire day, so some activity may not be recorded. Additionally, wearables can track biometric and sleep data that smartphones cannot 29 . When feasible, we recommend conducting qualitative assessments of user characteristics as well as preferences and perceived barriers to adoption and use to better tailor remote-monitoring interventions to the given population.

Limitations
This study is limited in that it is a secondary analysis of a randomized controlled trial, which was not designed to detect differences across two study arms and four subgroups of participants. Thus, the analyses in the current study may be underpowered. Additionally, our sample consists of patients from a single health system who had recently been discharged from the hospital, which may limit the generalizability of our findings. Nevertheless, the divergence in activity monitoring patterns we identify within subgroups of this sample points to the utility of addressing individual variation in traits related to health behaviors and technology use.

Conclusion
To our knowledge, our study is the first to investigate the association between socio-behavioral profiles and sustained physical activity tracking, notably comparing monitoring patterns across multiple device types. We demonstrate the importance of accounting for individual differences in the implementation and evaluation of activity monitoring programs. Four "behavioral phenotypes" of participants differentiated by personality traits, behavioral tendencies, and social resources showed distinct patterns in the sustained duration and consistency of remote activity monitoring, particularly among individuals randomized to use wearable devices. We find that "at-risk" phenotypes characterized by tendencies toward negative affect and lower levels of baseline physical activity and social support were more likely to discontinue use of wearable devices. The differences in adherence to wearable- versus smartphone-mediated activity tracking we identify across behavioral subgroups point to the presence of distinct barriers to activity tracking experienced by different populations. Future research should aim to establish socio-behavioral profiles in larger populations and characterize the unique barriers associated with them, particularly among potentially at-risk profiles as described here, to inform the strategic design of remote-monitoring technologies and the health-promoting interventions reliant on them.

Methods
Study design. This is a secondary analysis of a randomized clinical trial (ClinicalTrials.gov identifier: NCT02983812). The design and protocol of the trial have been previously published 29 . Patients were approached in-hospital between January 23, 2017 and January 7, 2019 and were eligible for participation in the trial if they were above the age of 18, could ambulate, had a smartphone compatible with the Withings HealthMate application, and planned to be discharged to home.
Prior to hospital discharge, patients were randomly assigned to track their physical activity for 6 months using a smartphone or a wrist-worn wearable device. Participants in the smartphone arm tracked their physical activity using the Withings HealthMate application, connected via any compatible smartphone device. Participants in the wearable arm used a Withings Steel device provided by the study team with a battery lasting approximately 8 months.
All participants received $50 to enroll and $50 upon trial completion. To level incentives across study arms, participants assigned to use a smartphone alone were also given the wearable device after completing the trial. For each participant, the first day of the 6-month study began day one after they were discharged. This study was approved by the University of Pennsylvania Institutional Review Board and participants provided written informed consent to participate in the clinical trial. All methods were performed in accordance with the relevant guidelines and regulations.
Using the Withings HealthMate application, physical activity data were transmitted from the devices to Way to Health 39 , a research technology platform used in prior work for activity interventions involving remote monitoring 8,9,40-44 . If data had not been transmitted for four consecutive days, patients were sent a notification to synchronize their device via their selected communication preference (text message, email, or telephone voice recording). A day of data transmission was defined as a day in which more than zero steps were reported.
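The transmission rule and the four-day reminder described above can be illustrated with a minimal sketch. It is not the Way to Health implementation: the function names, the use of `None` for a day with no data received, and the choice to fire one reminder when a gap first reaches four days are all assumptions made for illustration.

```python
from typing import List, Optional


def transmission_days(daily_steps: List[Optional[int]], min_steps: int = 0) -> List[bool]:
    """Flag each study day as a day of data transmission.

    A day counts when its reported step total is strictly greater than
    `min_steps`. None marks a day with no data received (an assumption
    about how missing days are represented).
    """
    return [steps is not None and steps > min_steps for steps in daily_steps]


def reminder_days(flags: List[bool], gap: int = 4) -> List[int]:
    """Return the 1-indexed days on which a sync reminder would be sent.

    Assumption: one reminder is sent when a run of days without
    transmitted data first reaches `gap` consecutive days.
    """
    reminders = []
    run = 0
    for day, transmitted in enumerate(flags, start=1):
        run = 0 if transmitted else run + 1
        if run == gap:
            reminders.append(day)
    return reminders


# Hypothetical week of step counts for one participant.
steps = [5200, 0, None, None, 0, 310, 4100]
flags = transmission_days(steps)
reminders = reminder_days(flags)
```

Passing a larger `min_steps` to `transmission_days` gives a stricter definition of a transmission day, which is how a threshold-based sensitivity analysis could be expressed in the same sketch.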

Latent class analysis variable selection.
During enrollment, participants were asked to complete a sociodemographic survey and series of validated instruments to evaluate physical activity level 45 , personality 46 , risk-taking preferences 47 , and social support 48 . Data on patient credit scores (VantageScore V3) were obtained from Experian within 6 months of hospital discharge.
Variables were selected for inclusion in LCA model construction based on established associations with physical activity intervention responsiveness and success of remote health monitoring 21,23-28 . Indicators with insufficient variability were excluded given that they are unlikely to aid in identifying subgroups. All variables were converted into categorical variables in order to be included in the LCA, which requires discrete input 49 .
Latent class indicators across the following domains were included:
• Demographics, including age (coded as 18-34 years, 35-49 years, and > 50 years) and sex.
• Baseline physical activity, which was assessed using the International Physical Activity Questionnaire and scored as low, moderate, and high levels 50 .
• Risk-taking preferences, which were assessed based on patients' self-reported likelihood to engage in risky behaviors related to health/safety and social situations, measured using the DOSPERT survey. The DOSPERT uses a 7-point Likert scale and was converted into low (1-2.9), medium (3-4.9), and high (5-7) levels.
• Social support, which was measured using the overall score on the Medical Outcomes Study (MOS) Social Support survey, computed as the average of subscores assessing emotional/informational support, tangible support, affectionate support, and positive social interactions.
• Personality, which was assessed using the Big Five traits of extroversion, agreeableness, conscientiousness, neuroticism, and openness.
The MOS and Big Five surveys both use 5-point Likert scales, and were converted to low (1-2.9), medium (3-3.9), and high (4-5) levels of each trait, as done in previous work 23 .
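The conversion of survey scores into categorical indicators can be sketched as follows. This is an illustrative reading of the printed cut points, not the study's code. Two assumptions are made and marked in the comments: scores that fall between printed bounds (for example 2.95) are placed in the lower category, and a participant aged exactly 50 is placed in the oldest age band. The function names are hypothetical.

```python
def likert_level(score: float, scale_max: int) -> str:
    """Map a mean Likert score to "low", "medium", or "high".

    5-point scales (MOS, Big Five): low 1-2.9, medium 3-3.9, high 4-5.
    7-point scale (DOSPERT):        low 1-2.9, medium 3-4.9, high 5-7.
    Assumption: the thresholds are applied as "< 3" and "< 4" (or "< 5"),
    so a score such as 2.95 is treated as low.
    """
    if scale_max not in (5, 7):
        raise ValueError("only 5-point and 7-point scales are described")
    if not 1 <= score <= scale_max:
        raise ValueError("score is outside the range of the scale")
    high_cut = 4.0 if scale_max == 5 else 5.0
    if score < 3.0:
        return "low"
    if score < high_cut:
        return "medium"
    return "high"


def age_group(age: int) -> str:
    """Code age into the three bands used as a latent class indicator.

    Assumption: age 50 belongs to the oldest band; the text prints the
    band as "> 50 years", which leaves exactly 50 unassigned.
    """
    if age < 18:
        raise ValueError("participants were adults")
    if age <= 34:
        return "18-34"
    if age <= 49:
        return "35-49"
    return "50+"
```

Keeping the scale length as an explicit argument makes the difference between the DOSPERT and the 5-point instruments visible at each call site.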
Statistical analyses. LCA is a statistical method used to identify distinct subgroups within a population based on patterns discerned among at least two observed dependent variables ('latent class indicators') 51 . Given a set of latent class indicators, the objective of LCA is to determine the optimal number of subgroups, or 'latent classes' , to divide a population into such that latent classes are sufficiently distinct, and individuals can be categorized into their most likely class with high accuracy. LCA was selected because it has demonstrated superior performance to other common classification techniques such as multidimensional scaling and cluster analysis in its reliability and accuracy, ability to objectively evaluate model fit, and balance of parsimony and complexity in its output 52 . This approach has previously been used to identify subgroups that differ meaningfully in response to a behavioral intervention to increase physical activity 23 and in adherence to therapeutic interventions 53,54 .
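To make the idea of assigning individuals to their most likely class concrete, the sketch below applies Bayes' rule to a fitted latent class model with categorical indicators. It assumes the standard LCA assumption of conditional independence of indicators within a class. The class weights and item-response probabilities are hypothetical values chosen for illustration, not estimates from this study, and the analysis reported here was run in Mplus rather than with this code.

```python
from math import prod
from typing import List, Sequence


def posterior_class_probs(
    response: Sequence[int],
    class_weights: Sequence[float],
    item_probs: Sequence[Sequence[Sequence[float]]],
) -> List[float]:
    """Posterior probability of each latent class for one response pattern.

    response[j]         observed level (0-based) of indicator j
    class_weights[c]    estimated proportion of the sample in class c
    item_probs[c][j][k] P(indicator j takes level k | class c)
    """
    joint = [
        weight * prod(item_probs[c][j][level] for j, level in enumerate(response))
        for c, weight in enumerate(class_weights)
    ]
    total = sum(joint)
    return [value / total for value in joint]


def modal_class(posteriors: Sequence[float]) -> int:
    """Index of the class with the highest posterior probability."""
    return max(range(len(posteriors)), key=lambda c: posteriors[c])


# Hypothetical two-class model with two three-level indicators.
weights = [0.6, 0.4]
probs = [
    [[0.7, 0.2, 0.1], [0.6, 0.3, 0.1]],  # class 0
    [[0.1, 0.3, 0.6], [0.2, 0.3, 0.5]],  # class 1
]
posteriors = posterior_class_probs([2, 2], weights, probs)
assigned = modal_class(posteriors)
```

The same posterior probabilities feed the entropy statistic used to judge how cleanly a model separates its classes.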
Scientific Reports | (2021) 11:21501 | https://doi.org/10.1038/s41598-021-01021-y
The LCA was performed in Mplus (Version 8.2), a software package commonly used for LCA 55 . To identify the number of latent classes that yielded optimal model fit, we fit a series of latent class models beginning with the most parsimonious 2-class model and iteratively increasing the number of subgroup divisions up to five. Model fit was evaluated holistically based on quantitative measures of model fit as well as qualitative assessments of model interpretability 49 . Statistical indices of model fit considered include Akaike information criterion and Bayesian information criterion values, which are measures of prediction error, and entropy, which is a measure of classification accuracy, with higher values reflecting increased class distinctiveness 56,57 . We used the Vuong-Lo-Mendell-Rubin likelihood ratio test (LRT) to evaluate if adding another class statistically significantly improved model fit 57,58 . The distribution of patients throughout latent classes was also considered to maximize statistical power and model interpretability.
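The fit indices named above follow standard definitions, which the sketch below writes out. It is offered as a reference for how the quantities relate to the log-likelihood, the number of free parameters, and the posterior class probabilities, not as a reproduction of the Mplus output. The function names are hypothetical, and the likelihood ratio test is not implemented here.

```python
from math import log
from typing import Dict, Sequence


def lca_free_parameters(n_classes: int, levels_per_indicator: Sequence[int]) -> int:
    """Free parameters in an unrestricted LCA with categorical indicators:
    (classes - 1) class proportions plus, for each class, (levels - 1)
    response probabilities per indicator."""
    return (n_classes - 1) + n_classes * sum(k - 1 for k in levels_per_indicator)


def information_criteria(log_likelihood: float, n_params: int, n_obs: int) -> Dict[str, float]:
    """AIC, BIC, and sample-size adjusted BIC; lower values indicate better fit."""
    deviance = -2.0 * log_likelihood
    return {
        "AIC": deviance + 2.0 * n_params,
        "BIC": deviance + n_params * log(n_obs),
        "SABIC": deviance + n_params * log((n_obs + 2) / 24),
    }


def relative_entropy(posteriors: Sequence[Sequence[float]]) -> float:
    """Relative entropy in [0, 1]; values near 1 mean individuals are
    assigned to classes with little ambiguity.

    posteriors[i][c] is the posterior probability that individual i
    belongs to class c.
    """
    n_obs = len(posteriors)
    n_classes = len(posteriors[0])
    uncertainty = -sum(p * log(p) for row in posteriors for p in row if p > 0)
    return 1.0 - uncertainty / (n_obs * log(n_classes))
```

Comparing these values across the 2-class to 5-class models, alongside the likelihood ratio test and class sizes, mirrors the holistic selection process described in the text.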
After determining the best fit model, we used descriptive statistics to evaluate differences in baseline and sociodemographic variables between latent classes in R (Version 3.5.1; R Foundation for Statistical Computing). Next, we characterized the key factors driving class distinctions based on group differences described in Table 1. This selection of driving factors was generally supported by an assessment of the characteristics that were either over- or underrepresented in each group. These characteristics were identified by examining the distribution of patients in each variable level (e.g., low, medium, or high degree of openness) in each class relative to the distribution in the overall sample (Supplementary Table 2). To do so, probability weights reflecting the estimated proportion of each class that fell into each level category were generated in Mplus.
To examine differences in duration of activity monitoring between classes and study arms, we first generated survival curves using Kaplan-Meier estimates, plotting the proportion of patients providing data over the 180 days after discharge, censoring on patient death. The duration of data transmission was estimated using the last day a step value was received. Using log rank tests, we examined the unadjusted differences between study arms in each latent class, as well as between latent classes in each study arm. Subsequently, Cox proportional hazard models were fit and adjusted for age, gender, race/ethnicity, insurance, education, marital status, annual household income, body mass index, and Charlson Comorbidity Index score.
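The Kaplan-Meier step can be illustrated with a short product-limit estimator. This is a sketch under stated assumptions rather than the study's analysis, which was run in R: the event is taken to be the last day a step value was received, participants who died are treated as censored as described above, and participants still transmitting on day 180 are assumed to be censored at day 180. The log rank tests and adjusted Cox models are not shown.

```python
from typing import List, Sequence, Tuple


def kaplan_meier(
    durations: Sequence[int], observed: Sequence[bool]
) -> List[Tuple[int, float]]:
    """Product-limit estimate of the proportion still providing data.

    durations[i]  last day with a step value for participant i
    observed[i]   True if the participant stopped transmitting on that day
                  (event), False if censored (death, or end of follow-up)

    Returns (day, survival probability) at each day with at least one event.
    """
    event_days = sorted({d for d, e in zip(durations, observed) if e})
    survival = 1.0
    curve = []
    for day in event_days:
        # Everyone whose follow-up reaches this day is still at risk.
        at_risk = sum(1 for d in durations if d >= day)
        events = sum(1 for d, e in zip(durations, observed) if d == day and e)
        survival *= 1.0 - events / at_risk
        curve.append((day, survival))
    return curve


# Hypothetical arm of six participants.
days = [30, 60, 60, 90, 180, 180]
stopped = [True, True, False, True, False, False]
curve = kaplan_meier(days, stopped)
```

Computing one curve per study arm within each phenotype, or one per phenotype within each arm, gives the two sets of comparisons described in the text.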
To evaluate differences in the consistency of data transmission, we compared the proportion of days of data transmission using one-way ANOVA tests. This is in line with previous research that has defined consistency of activity tracking as the percentage of days tracked relative to the number of days during a trial 59 . As a sensitivity analysis, we repeated all analyses defining a day of data transmission as a day with over 1000 steps reported, since values less than 1000 are unlikely to capture actual activity throughout a whole day, indicating a degree of data missingness 60,61 . Investigators and analysts were blinded to group assignment.
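The consistency measure and the one-way ANOVA comparison can be sketched as follows. The code computes only the F statistic and its degrees of freedom. Obtaining a p-value requires the F distribution, for which a statistics library such as SciPy (`scipy.stats.f_oneway`) could be used. The function names and the example proportions are hypothetical, and the study's own tests were run in R.

```python
from statistics import fmean
from typing import Sequence, Tuple


def consistency(transmitted: Sequence[bool], study_days: int = 180) -> float:
    """Proportion of study days on which data were transmitted."""
    if len(transmitted) != study_days:
        raise ValueError("expected one flag per study day")
    return sum(transmitted) / study_days


def one_way_anova_f(groups: Sequence[Sequence[float]]) -> Tuple[float, int, int]:
    """F statistic with between-group and within-group degrees of freedom.

    Each inner sequence holds the consistency values of one group, for
    example one study arm within a phenotype. Assumes at least some
    within-group variation, otherwise the ratio is undefined.
    """
    values = [v for group in groups for v in group]
    grand_mean = fmean(values)
    ss_between = sum(len(g) * (fmean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((v - fmean(g)) ** 2 for g in groups for v in g)
    df_between = len(groups) - 1
    df_within = len(values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical consistency values for smartphone and wearable users.
smartphone = [0.92, 0.85, 0.78, 0.88]
wearable = [0.61, 0.74, 0.55, 0.80]
f_stat, df_between, df_within = one_way_anova_f([smartphone, wearable])
```

With two groups, as in the within-phenotype comparisons of study arms, the one-way ANOVA is equivalent to a two-sample t-test with pooled variance.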

Data availability
Ms. Fendrich had full access to all the data in the study and takes responsibility for the integrity of the data and the accuracy of the data analysis.