Smartphone-recorded physical activity for estimating cardiorespiratory fitness

While cardiorespiratory fitness is strongly associated with mortality and diverse outcomes, routine measurement is limited. We used smartphone-derived physical activity data to estimate fitness among 50 older adults. We recruited iPhone owners undergoing cardiac stress testing and collected recent iPhone physical activity data. Cardiorespiratory fitness was measured as peak metabolic equivalents of task (METs) achieved on cardiac stress test. We then estimated peak METs using multivariable regression models incorporating iPhone physical activity data, and validated the models with bootstrapping. Individual smartphone variables most significantly correlated with peak METs (p-values both < 0.001) included daily peak gait speed averaged over the preceding 30 days (r = 0.63) and root mean square of the successive differences of daily distance averaged over 365 days (r = 0.57). The best-performing multivariable regression model included the latter variable, as well as age and body mass index. This model explained 68% of variability in observed METs (95% CI 46%, 81%), and estimated peak METs with a bootstrapped mean absolute error of 1.28 METs (95% CI 0.98, 1.60). Our model using smartphone physical activity estimated cardiorespiratory fitness with high performance. Our results suggest larger, independent samples might yield estimates accurate and precise enough for risk stratification and disease prognostication.


Materials and methods
Patients undergoing treadmill testing in the Beth Israel Deaconess Cardiovascular Stress Testing Laboratory between September 2017 and June 2018 were recruited in person at the laboratory or by telephone after leaving the laboratory. As such, participants included individuals undergoing diagnostic workup or risk stratification for coronary heart disease or other heart diseases. We restricted enrollment to owners of iPhone models 5S and above or Apple Watches because we observed that these devices automatically and continuously track physical activity within the Apple Health application (Apple Inc., Cupertino, CA), which allowed for retrospective data collection. Participants were included if they provided consent and were excluded if they did not complete a stress test. Physical activity data exported from the Apple Health application are formatted per episode of activity. Each episode of activity is an observation with a recorded start time, stop time, number of steps, and distance in miles. These parameters are all measured using Apple's proprietary algorithms with input from onboard sensors such as accelerometers, gyroscopes, barometers, and GPS. Activity episodes are recorded with a duration between 1 and 600 s, and a new observation is created if the activity extends beyond 600 s. When cleaning data we also excluded participants whose data were recorded by a non-Apple device, and those missing data for the two weeks immediately preceding the stress test (Supplementary Fig. S1). We used the export function of the Apple Health application to securely e-mail physical activity data to study servers owned and maintained by the Beth Israel Deaconess Medical Center, Boston, Massachusetts. Informed consent was obtained from all subjects. The study protocol was approved by the Institutional Review Board at Beth Israel Deaconess Medical Center in accordance with relevant guidelines and regulations.
Age, self-reported race and sex, height, weight, and resting blood pressure and heart rate were collected from the electronic medical record [42][43][44][45]. Blood pressure and heart rate were measured at rest before the stress test using an automated oscillometer. From the downloaded physical activity data, we used the time, steps, and distance for each activity episode to calculate velocity in both steps/second (i.e., cadence) and meters/second, observed active time, and stride. Then, using pre-specified intervals prior to treadmill testing (1, 7, 30, 90, 180, and 365 days), we computed the sum of observed active time, sum of steps, sum of distance, peak velocity in steps/second and meters/second, and average stride length. The root mean square of the successive differences (RMSSD), a measure of variability, was calculated for each of these measures by taking the daily sum, peak, or average, finding the difference between successive days, squaring the differences, and then taking the square root of the average of the squared differences. For those with fewer than 365 days of observation, we imputed missing data by carrying existing observations backward in time. In some activity episodes, steps were missing but distance was present, or vice versa. We imputed missing steps or distance by using whichever variable was available and the mean stride length of that individual for that day. Observations exceeding maximal physiologic ranges, such as velocity ≥ 10.44 m/second or stride length ≥ 2.47 m, were excluded 46.
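The per-episode calculations and the RMSSD described above can be illustrated with a minimal Python sketch. This is not the study code (the analysis was performed in RStudio). The function names, the miles-to-meters conversion, and the treatment of stride as distance per step are assumptions made for illustration only.

```python
import math

MILES_TO_METERS = 1609.344  # exported distances are reported in miles
MAX_SPEED_M_PER_S = 10.44   # physiologic ceilings quoted in the text
MAX_STRIDE_M = 2.47


def episode_metrics(duration_s, steps, distance_miles):
    """Return cadence, speed, and stride for one episode, or None if excluded.

    Hypothetical helper: stride is taken as distance per step (an assumption).
    """
    if duration_s <= 0 or steps <= 0:
        return None
    distance_m = distance_miles * MILES_TO_METERS
    speed = distance_m / duration_s
    stride = distance_m / steps
    # Exclude observations at or beyond the stated physiologic limits.
    if speed >= MAX_SPEED_M_PER_S or stride >= MAX_STRIDE_M:
        return None
    return {
        "cadence_steps_per_s": steps / duration_s,
        "speed_m_per_s": speed,
        "stride_m": stride,
    }


def rmssd(daily_values):
    """Root mean square of the successive differences of a daily series."""
    if len(daily_values) < 2:
        raise ValueError("need at least two daily values")
    diffs = [later - earlier for earlier, later in zip(daily_values, daily_values[1:])]
    return math.sqrt(sum(d * d for d in diffs) / len(diffs))
```

For example, a daily distance series of 1, 3, 2, and 6 miles has successive differences of 2, -1, and 4, so its RMSSD is the square root of 21/3 = 7.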
Cardiorespiratory fitness as peak metabolic equivalents of task (METs) was estimated by maximal treadmill stress testing using the extensively validated Bruce protocol 47 or a modified protocol. Modified protocols with lower intensity stages at the beginning were selected for frail or deconditioned individuals based upon a pre-test query of daily activities to allow for titration of heart rate over approximately 10 min.
Next, taking the above anthropometric and computed physical activity variables, we calculated univariable Pearson correlations with peak METs and excluded those with a p-value > 0.05. Then we computed covariance between remaining variables. For each remaining variable we gathered other variables with covariance > 0.7 and selected the one with the highest univariable Pearson correlation with peak METs. This variable was included in a pool of candidates for the multivariable regression model. Using the pool of candidate variables, we built a multivariable regression model to estimate peak METs using bidirectional stepwise selection maximizing adjusted R-squared. We validated model performance, as estimated by mean absolute error (MAE), using bootstrapping with 10,000 samples and performed sensitivity analysis with tenfold cross-validation. Statistical analysis was performed using RStudio version 1.3.1093 (RStudio, Boston, MA).
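The candidate-pooling and bootstrap steps can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the authors' R implementation. It assumes the p-value filter has already been applied, interprets the 0.7 threshold as an absolute Pearson correlation between predictors, and bootstraps the absolute errors of fixed predictions without refitting the model on each resample. All function names are hypothetical.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def pool_candidates(features, target, threshold=0.7):
    """For each variable, gather its highly inter-correlated variables and
    keep the one most strongly correlated with the target.

    features: dict mapping variable name -> list of values.
    Assumption: the threshold applies to absolute Pearson correlation.
    """
    strength = {name: abs(pearson(vals, target)) for name, vals in features.items()}
    pool = set()
    for name, vals in features.items():
        group = [name] + [
            other
            for other, other_vals in features.items()
            if other != name and abs(pearson(vals, other_vals)) > threshold
        ]
        pool.add(max(group, key=strength.get))
    return sorted(pool)


def bootstrap_mae(observed, predicted, n_boot=10_000, seed=0):
    """Point estimate and 95% percentile interval for mean absolute error.

    Simplification: resamples absolute errors of fixed predictions; the
    model is not refit on each bootstrap sample.
    """
    rng = random.Random(seed)
    errors = [abs(o - p) for o, p in zip(observed, predicted)]
    n = len(errors)
    point = sum(errors) / n
    means = sorted(sum(rng.choices(errors, k=n)) / n for _ in range(n_boot))
    lower = means[int(0.025 * n_boot)]
    upper = means[int(0.975 * n_boot) - 1]
    return point, lower, upper
```

The stepwise selection itself is omitted here. Under these assumptions, the pooled variables would be passed to a regression routine, and the observed and predicted peak METs from the fitted model to `bootstrap_mae`.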

Results
Baseline characteristics. Baseline characteristics of the study population are displayed in Table 1. Of 50 participants, median age was 67 (inter-quartile limits 55, 71) years and 19 (38%) were female. Data cleaning yielded 1.1 million unique activity episodes.
Univariable analysis. In univariable analysis, smartphone variables most significantly correlated with peak METs (p-values all < 0.001; covariance < 0.7) included daily peak gait speed in meters/second averaged over 30 days (r = 0.63), RMSSD of daily distance in miles averaged over 365 days (r = 0.57), and stride length averaged over 90 days (r = 0.43). Supplementary Fig. S2 shows the relationship of these variables with peak METs averaged over pre-specified intervals from 1 to 365 days. In general, peak gait speeds and stride length measured closer to stress test date seemed to correlate more closely with peak METs, while the relationship of RMSSD of daily distance with peak METs was stronger with more observation time. Supplementary Table S1 lists Pearson correlations and p-values for all variables with p-value < 0.05. Figure 1 displays the best-performing multivariable regression model. This model explained 68% of variability in observed METs (95% confidence limits 46%, 81%), and included age, body mass index, and RMSSD of daily distance in miles averaged over 365 days (equation parameters reported in Supplementary Table S2 and Bland-Altman plot included in Supplementary Fig. S3).

Discussion
A model using smartphone data estimated cardiorespiratory fitness with high performance in a health care setting. Our study is significant in that it demonstrates the feasibility of estimating cardiorespiratory fitness using smartphones. Such cardiorespiratory fitness predictions have potential utility in estimating mortality, identifying those at risk for diseases, estimating prognosis of diseases, and tracking the progress of interventions.
Previous attempts to estimate fitness using self-reported physical activity achieved varying degrees of success. These estimates have relied upon self-reported questionnaire data subject to recall bias [25][26][27][28][29][30], time-consuming walking tests [31][32][33][34], or specialized accelerometry or fitness tracking equipment [35][36][37][38][39][40]. In contrast, we present an objective, point-of-care fitness estimation with exceptional performance that relies upon nearly ubiquitous smartphones. Utilizing smartphone physical activity data has multiple advantages: (1) large numbers of measurements are collected passively, permitting retrospective averaging that strengthens associations, (2) these data are at least as accessible as basic clinic variables (e.g. height, weight, and blood pressure), (3) the data are widely available (85% of US adults own a smartphone, and physical activity trackers are easily activated on Android devices) 48, and (4) they appear to perform well when compared to a gold-standard clinical measurement and compared to previous estimates. Based upon day-to-day physical activity, Kwon et al. built a model predicting cardiorespiratory fitness with an R² of 0.66 using activity data from Fitbit activity trackers 38. Bonomi et al. attained an R² of approximately 0.77 using both a heart rate monitor and activity tracker 37. However, all these models relied upon specialized fitness tracking equipment and minute-to-minute heart rate measurements. In contrast, we achieved an R² of 0.68 using smartphone physical activity data. Interestingly, Altini et al. used the HRV4Training smartphone app, in addition to a heart rate monitor, to predict VO₂ max, but achieved only an R² of 0.64 49.
Our observation that the variability of daily distance was most predictive of fitness was surprising, as we had expected that peak gait speed would be the best predictor. In fact, the variability of daily distance was not only strongly correlated with peak gait speed but also superior in predictive power when combined with age and body mass index. We hypothesize the fitness-predicting value of the variability of daily distance lies in its ability to capture an individual's reserve capacity to increase walking distance upon demand. Those with low fitness may be unable to drastically vary their travelling distance, whereas more fit individuals exercise a flexible option to travel further when needed.
Some have questioned the accuracy of fitness trackers, especially smartphones, in measuring step data [50][51][52]. At least one study has demonstrated modest accuracy of step counting using iPhone fitness data 53. The same study indicated dedicated fitness tracking devices, such as the Fitbit activity tracker, may be more accurate than smartphones for counting steps. In this regard, our distinctive focus on fitness, rather than on simply estimating activity, is crucial, as a smartphone need not be carried at all times. For some variables, such as peak gait speed, data collection closer to stress test date seemed to correlate more closely with peak METs. For other variables, such as RMSSD of daily distance, association with peak METs seemed stronger with more observation time, suggesting that averaging more data may reduce smartphone measurement error. It is also possible that the accuracy of smartphone data is dependent upon phone-carrying location (e.g. hip versus purse), yet our data show powerful estimation of fitness even without knowing phone-carrying location and in both sexes.
Our study is limited by a small sample of patients at risk for heart disease and restriction to a single, albeit well-known, manufacturer (Apple Inc.). Also, since physical activity was assessed exclusively through an iPhone, our study was unable to include physical activity performed when not carrying an iPhone. Furthermore, we used a stress treadmill estimation of fitness (METs) rather than VO₂ max. However, cardiopulmonary exercise treadmill test measurement of VO₂ max is rarely performed. Nonetheless, these promising results demonstrate that the incremental predictive utility of smartphone data, combined with its ready accessibility, opens exciting new opportunities for clinical and research estimation of cardiorespiratory fitness. Based upon the feasibility demonstrated in this study, it seems likely that larger, independent studies can improve sufficiently on our algorithm to yield truly useful estimates of fitness for clinical and epidemiological purposes.

Data availability
Deidentified datasets generated during and/or analyzed during the current study are available from the corresponding author with completion of an approved data distribution agreement from the sponsoring institution.