Introduction

Malignant ventricular arrhythmias are a main cause of sudden cardiac death (SCD)1. In individuals at increased risk of SCD, patterns in physical behaviour (such as physical activity levels, sedentary behaviours, sleep behaviour) have emerged as potential prognostic indicators for ventricular arrhythmia onset, heart failure progression, and patient-reported outcomes2,3,4,5. Wearable accelerometers provide a means for continuous measurement of these day-to-day physical behaviours in free-living environments6. Identifying patterns or clusters within behavioural time-series data requires dimensionality reduction, as traditional clustering algorithms are unable to effectively process the granularity of such complex datasets. The process of dimensionality reduction, for instance reducing data to summary measures, may lead to the loss of intricate, non-linear associations in the data. Alternatively, deep neural networks are capable of learning low-dimensional latent representations from these complex datasets, while preserving the richness and intrinsic information present in the data7,8. Unsupervised machine learning algorithms can then operate on these latent space representations to categorise similar samples into one cluster9,10.

In this study, we aimed to identify and characterise behavioural profiles through deep representation learning in patients at risk of malignant ventricular arrhythmias (Fig. 1). Patients with an implantable cardioverter-defibrillator (ICD) were followed for six consecutive months using a wearable accelerometer to continuously monitor physical behaviour. A deep neural network was trained to learn a compressed representation from the behavioural time-series data while preserving the relevant information. We hypothesised that through the clustering of these deep behavioural representations, we would be able to identify clinically meaningful behavioural profiles. These profiles were evaluated for their clinical relevance and association with the risk of ventricular arrhythmia.

Fig. 1: Workflow of the study.

The workflow of the study is illustrated, including recruitment, data collection through a wearable device and data processing to identify behavioural profiles. a Recruitment of 303 patients from two international sites, who wore a wearable accelerometer for 180 consecutive days, during which behavioural metrics were recorded. b A convolutional residual variational autoencoder learned the latent behavioural representations. The model was provided with time-series data for each subject that consisted of 27 variables measured at 180 timepoints. c Unsupervised clustering of the behavioural representations using a k-means algorithm identified distinct behavioural profiles.

Results

A total of 303 participants were enrolled in SafeHeart, of whom 272 met the eligibility criteria for this study (21 patients did not wear the GENEActiv wearable accelerometer and 10 patients did not meet the required minimum of 30 days of behavioural data). A total of 37,478 days of wearable data were collected (mean 138 ± 47 days per patient). Table 1 shows the clinical characteristics of the patient cohort. Patients had a mean age of 63.1 ± 10.4 years and 80.9% were male; 133 (48.9%) patients had ischaemic heart disease as cause of heart failure, 147 (54.0%) had heart failure with reduced ejection fraction (HFrEF), and 187 (68.8%) had a secondary prevention ICD indication. Fifty (18.4%) patients received cardiac resynchronisation therapy (CRT), and the majority of patients used a β-blocker (80.5%). All patients completed one year of follow-up, during which 46 (16.9%) patients received appropriate ICD therapy for a malignant ventricular arrhythmia, five (1.8%) patients received inappropriate ICD therapy and four (1.4%) patients died.

Table 1 Baseline characteristics in the total patient cohort and across behavioural profiles

Characteristics of the identified behavioural profiles

Five behavioural profiles were identified: A (n = 46), B (n = 70), C (n = 63), D (n = 51) and E (n = 42). Mean values for behavioural metrics across the clusters are displayed in Table 2; Supplementary Fig. 4 provides a granular representation of behaviours during the 180-day monitoring period. Figure 2 shows the individual behavioural metrics for each profile, relative to the cohort averages. In summary, Clusters B and C were characterised by active profiles with high daily step counts (15353 ± 4062 and 13577 ± 4130 steps). The volume of activity (2921 ± 1410 gs) in Cluster B was accumulated over a longer period (370 ± 76 min) and at a lower average intensity (123 ± 17 mg), while the volume of activity (2527 ± 810 gs) in Cluster C was linked to a greater number of faster walking steps (3689 ± 2100 steps) at higher cadences (92 ± 10 steps/min) and intensities (132 ± 22 gs). Clusters D and E had less active profiles with fewer daily steps (9971 ± 3103 and 9291 ± 3378 steps). Cluster D had the longest inactive bout durations and the fewest inactive bouts per day (0.72 ± 0.25 min and 632 ± 110). The behavioural patterns of Cluster E were more fragmented, with shorter active bouts (0.43 ± 0.12 min) and more inactive bouts (863 ± 135). Cluster A had the highest inactive duration (835 ± 110 min), the lowest activity intensity (103 ± 33 mg) and the fewest steps (6246 ± 2406 steps). In the sleep domain, Cluster A was characterised by the shortest total sleep duration (281 ± 86 min), the longest average duration of wake after sleep onset interruptions (6.8 ± 2.1 min) and the longest sleep onset latency (10.8 ± 5.1 min). The nocturnal patterns of Cluster E were fragmented, similar to the day, with the lowest sleep efficiency (51.5 ± 8.6%), the most wake after sleep onset interruptions (32 ± 8) and the shortest maximum sleep bout lengths (42 ± 10 min). Cluster D had the longest sleep interval and total sleep durations (611 ± 122 and 369 ± 75 min) with the longest maximum sleep bout lengths (57 ± 11 min). The sleep profiles of Clusters B and C were unexceptional other than Cluster B having the shortest sleep interval duration (507 ± 89 min).

Table 2 Mean values of each behavioural metric collected over a 180-day period in the total patient cohort and across behavioural profiles
Fig. 2: Average values for physical behaviour measurements across the behavioural profiles.

The average values for the behavioural measurements for each behavioural profile are displayed. a The bar charts displayed on the left depict the average values for the metrics that reflect movement behaviour across behavioural profiles, relative to the cohort average. b Bar charts depicted on the right display the average values for the metrics that reflect sleep behaviours across the behavioural profiles. All values were scaled using z-scores.

Cluster characterisation

SHAP values were computed to identify the most important behavioural markers that characterised each behavioural profile. Figure 3a shows the variables with the highest feature importance across behavioural profiles. The duration spent in moderate activity, the number of slow steps and the number of sleep events were the behavioural markers that differentiated most between clusters. Figure 3b–f illustrates the top behavioural features that predict membership of each of the clusters. Clusters C and E were predicted by a combination of sleep behaviours and movement behaviours, while the other clusters were predominantly predicted by movement behaviours alone. No statistically significant differences were observed between clusters in terms of medication usage and medical history, apart from hypertension (p = 0.037) (Table 1).

Fig. 3: Characterisation of the behavioural profiles through Shapley values obtained from a trained machine learning model.

A trained machine learning classifier (extreme gradient boosting) was used to predict membership of a profile based on daily behavioural measurements. a The bar chart represents the importance of features used by the extreme gradient boosting model to predict each profile. Horizontal bars represent the average contribution of a behavioural metric for the predicted profile. The features are ranked based on the summed importance of that feature to predict each profile. b–f The SHAP summary plot is displayed for each behavioural profile. The features are ranked by the mean absolute SHAP value. A positive SHAP value suggests a positive contribution, while a negative value indicates a negative contribution. The model predicted membership of the behavioural profile based on the daily measurements with an AUROC of 0.99.

Patient-reported outcomes across behavioural profiles

A total of 239 patients filled out questionnaires at the study baseline (non-response rates for Clusters A to E were 11.4%, 15.6%, 12.7%, 9.5%, and 15.2%, respectively). Median scores for the EQ-5D-5L and KCCQ domains are provided in Supplementary Table 2. In particular, Cluster A reported physical limitations, Cluster C reported high self-efficacy but worse social limitations, Cluster D reported the highest disease-specific quality of life, and Cluster E reported the highest burden of symptoms (Supplementary Fig. 5). Differences in patient-reported outcomes between clusters were not statistically significant.

Incidence of the outcomes of interest across behavioural profiles

Figure 4a shows the risk of malignant ventricular arrhythmias treated by the ICD across the clusters during one-year follow-up. Event rates for Clusters A to E were 30.4%, 17.1%, 17.5%, 9.8% and 9.5%, respectively (log-rank p value 0.06). As displayed in Fig. 4b, the risk of malignant ventricular arrhythmias was significantly higher in Cluster A (unadjusted HR 2.26, 95% CI 1.20–4.23, p = 0.01), and this association remained after adjusting for clinical covariates (adjusted HR 2.30, 95% CI 1.21–4.36, p = 0.01). In addition, the risk of malignant ventricular arrhythmias in the low-risk behavioural profiles (Clusters D and E) was significantly lower compared to the other clusters (unadjusted HR 0.45, 95% CI 0.22–0.94, p = 0.03). Inappropriate ICD therapy was delivered in two patients in Cluster A (4.3%), two patients in Cluster B (2.9%), and one patient in Cluster C (1.6%). In total, four patients died during follow-up: two in Cluster D (3.9%), one in Cluster A (2.2%) and one in Cluster C (1.6%). A significant difference in the composite endpoint between clusters was observed (log-rank p value 0.04) (Fig. 4c). Unadjusted and adjusted hazard ratios for the respective clusters for the composite endpoint are displayed in Table 3. In Supplementary Fig. 6, ROC curves of logistic regression models predicting cases of malignant ventricular arrhythmias and the composite endpoint are presented. Regression models that included cluster membership within their feature set demonstrated superior performance compared to models that excluded this variable.

Fig. 4: Comparative results of the outcomes of interest for patients stratified by the behavioural profiles.

Time-to-event analyses according to the behavioural profiles are presented. a Kaplan-Meier curves for malignant ventricular arrhythmias treated by the ICD, and b hazard ratios and 95% confidence intervals obtained from the Cox proportional-hazards model. c Kaplan-Meier curves for the composite endpoint of all ICD therapy and mortality, and d hazard ratios and 95% confidence intervals. The prevalence of the outcome is displayed as a percentage, represented by the blue and green circles. Distributions of times to events were compared with the log-rank test.

Table 3 Associations of the five behavioural profiles with malignant ventricular arrhythmias treated by the ICD, and the composite of all ICD therapies and mortality

Discussion

In this study, we demonstrated deep representation learning of complex day-to-day movement and sleep behaviours to enable the identification of clinically relevant behavioural profiles. These profiles were associated with an annual risk of malignant ventricular arrhythmias ranging from 30.4% to 9.5%. Our research extends prior work, bringing forth two novelties. First, while prior studies have evaluated physical behavioural metrics over monitoring intervals up to 14 days3, we identified distinct behavioural profiles derived from continuous accelerometer measurements spanning six months. Second, earlier studies have mainly focused on individual metrics for activity or sleep, despite these 24-hour rest-activity behaviours being highly interrelated. In the present work, we took a more holistic approach to physical behaviour by modelling the interplay between various concurrent behavioural mechanisms and their potential implications for clinical events.

Despite considerable variations in clinical trajectories among patients with an ICD, current follow-up strategies remain one-size-fits-all. Advances in wearable technologies have removed barriers for the continuous measurement of behavioural patterns, which could make wearables suitable as screening tools to identify individuals at risk of disease progression. Several studies have shown that continuous activity measurements could indicate a decline in functional status, progression of heart failure, or onset of atrial fibrillation, each potentially increasing the risk of ventricular arrhythmia onset3,4,11,12. However, physical activity is also a modifiable risk factor that may reduce ventricular arrhythmia risk by alteration of autonomic tone, mitigation of the catecholamine release observed during exercise and an increase of resting parasympathetic tone13. Recent analyses of data from the UK Biobank have demonstrated a reduction in the risk of ventricular arrhythmia amongst physically active individuals14,15. With data from this prospective study, we demonstrated that an active behavioural profile does not necessarily reduce the risk of ventricular arrhythmia, which highlights the importance of considering various behaviours simultaneously. In particular, Clusters B and C had annual event rates of ~17% despite their daily time spent physically active being substantially higher compared to the other profiles. In contrast, Clusters D and E were half as likely to experience the outcome of interest, despite having less active profiles. This suggests that interplays between various behaviours, such as intermittent sedentary behaviour with isolated bouts of physical exertion, rather than isolated measurements of activity characteristics, may explain differences in risk of ventricular arrhythmia onset. Furthermore, the absence of significant associations between patient-reported outcomes (e.g. symptom severity and physical limitations) and behavioural profiles might indicate that these are phenotypic in their origin rather than representing more transient behavioural patterns.

Our findings support the notion that abnormalities in 24-hour rest-activity patterns modulate the risk of ventricular arrhythmia onset. Circadian rhythm disruption has been associated with increased risk of atrial fibrillation onset16 and heart failure17 in previous studies. We observed an annual risk of ventricular arrhythmia exceeding 30% in the behavioural profile characterised by sedentary behaviour, a lack of high-intensity activity, and disturbed sleep behaviour. Adjusted for clinical covariates, this profile was associated with a three- to four-fold risk of experiencing a ventricular arrhythmia compared to the low-risk profiles. While these findings should be validated in larger cohorts, they emphasise the importance of comprehensive modelling of physical behaviour. The use of wearable devices for behavioural profiling holds promise for follow-up strategies tailored to an individual patient.

Clustering of deep learning-derived latent representations comes with the limitation of interpretability, as the latent representations are inferred from the underlying data and are not directly explainable (black box). To provide transparency, we characterised clusters by assessment of feature importance of a trained machine learning classifier that predicts cluster memberships based on day-to-day behavioural metrics18. A second limitation of our study is the use of processed output from the accelerometer, instead of the underlying raw accelerometry output. Some of these metrics are created through application of specific thresholds that rely on calibration studies, which poses a challenge when comparing metrics among different studies or populations19. Third, from our findings, it remains uncertain whether the behavioural profiles can be generalised to other populations, such as heart failure patients who do not satisfy the criteria for an ICD; this warrants future research. Fourth, despite cluster membership showing significant associations with the outcome of interest after adjusting for clinical covariates, there is a risk of residual confounding. For instance, the high-risk behavioural profile (Cluster A) was characterised by a higher proportion of patients with a prior myocardial infarction; however, these patients did not receive prescriptions for β-blockers, lipid-lowering drugs, or ACE inhibitors. This could point towards a potential undertreatment of these patients. This study was entirely decentralised in its design, with patient recruitment, informed consent, and study procedures conducted without physical contact between the study staff and participants. Consequently, information from imaging modalities (e.g., LVEF) and electrocardiography at the time of enrolment was not available. Future studies exploring the interplay between these clinical patient characteristics and behavioural profiles are warranted.

Deep representation learning of physical behavioural patterns identifies distinct behavioural profiles with significant differences in their risk of malignant ventricular arrhythmia and death. Behavioural profiling using objective and real-time measurements obtained from wearable devices may enable clinicians to adjust and optimise treatment and prevention strategies to an individual patient. Interpretability of clustered latent representations and relatively small sample sizes prompt the need for further investigation into the mechanisms underlying their influence on ventricular arrhythmia risk and SCD.

Methods

Ethics

The study was approved by the Institutional Review Boards of the Amsterdam University Medical Center (date 09-04-2021, approval number 2020/248) and Copenhagen University Hospital Rigshospitalet (date 19-04-2021, approval number H-20081068). All participants provided informed consent prior to their enrolment. The study was conducted in accordance with the Declaration of Helsinki.

Study design and setting

This is an analysis of the international SafeHeart study, a prospective, observational study conducted at two tertiary academic centres in Europe (Amsterdam University Medical Center, the Netherlands and Copenhagen University Hospital Rigshospitalet, Denmark). The purpose of this study was to develop a personalised model to predict ICD therapy for malignant ventricular arrhythmia20. Data used to create the prediction model included recordings from a wearable accelerometry recording device. Patient inclusion was conducted through telephone-based procedures between May 2021 and September 2022; the enrolment date was defined as the day when the wearable device was delivered to the patient. Throughout the study, participants had the option to withdraw at any stage, either partially (by discontinuing the use of the wearable device) or completely. We adhered to the Strengthening the Reporting of Observational Studies in Epidemiology (STROBE) reporting guidelines for observational studies21. The study was registered at the National Trial Registration in the Netherlands (Trial NL9218; https://www.onderzoekmetmensen.nl/en).

Participants

Participants qualified for enrolment in the SafeHeart study if they fulfilled the following conditions: i) received an ICD with or without cardiac resynchronisation therapy (CRT-D) in the five years leading up to enrolment, ii) experienced appropriate or inappropriate ICD therapy (high voltage shock therapy or anti-tachycardia pacing (ATP)) or demonstrated evidence of ventricular arrhythmias within eight years prior to enrolment, iii) engaged in a remote ICD monitoring programme, and iv) were at least 18 years old. Exclusion criteria were severe physical disability, end-stage heart failure, and a life expectancy of less than one year. The study protocol with the entire list of inclusion and exclusion criteria has been published previously20.

Physical behaviour measurements

Accelerometer-based wearable devices allow for continuous and objective quantification of daily physical behaviour by the recording of body movement along reference axes and signal analysis (e.g., intensity, frequency, and volume of activity and postural changes). In this study, various behavioural metrics were collected including daily activity and inactivity durations, duration of activity and inactivity episodes, activity intensity, activity volume, step count (total, slow and fast), cadence, sleep duration, sleep efficiency, wake after sleep onset (WASO), nap duration and sleep onset latency. A complete overview of the collected metrics and their definitions is displayed in Supplementary Table 1. To collect these metrics, participants wore the GENEActiv Original 1.1 accelerometer (Activinsights Ltd, Cambridgeshire, United Kingdom) on the wrist for 6 months. Devices were returned (for data extraction) and replaced biweekly or every 4 weeks. Continuous raw data were recorded at 50 Hz or 20 Hz and converted into daily summaries22,23. Patients were eligible for this study if they had at least 30 days of wearable data.

Outcome of interest

The prospective collection of the outcomes of interest occurred at both sites from enrolment in the study onwards. These outcomes were: i) any malignant ventricular arrhythmia defined as an episode of sustained ventricular tachycardia or ventricular fibrillation, treated by the ICD through a shock and/or ATP; ii) a composite endpoint comprising all ICD therapies and death. ICD therapies encompassed those for malignant ventricular arrhythmias, in addition to those in response to rhythms other than sustained ventricular tachycardia or ventricular fibrillation (e.g. atrial fibrillation, sinus tachycardia).

Patient reported outcome measures

Two patient reported outcome measures (PROMs), the EuroQoL 5-Dimensions 5-Levels (EQ-5D-5L) and Kansas City Cardiomyopathy Questionnaire (KCCQ), were used in the SafeHeart study24,25. Both PROMs were filled out by participants at study enrolment. The EQ-5D-5L assesses health across five domains, yielding a utility score ranging from −0.590 to 1.000. The KCCQ, designed for heart failure patients, provides scores on a scale of 0 to 100, subdivided into the domains symptom burden, physical limitation, social limitation, quality of life and self-efficacy.

Deep representation learning of physical behaviour data

We derived deep representations from the day-to-day behavioural time-series collected during the first six months of the study (Fig. 1a). Specifically, we used a β-variational autoencoder (VAE) that encodes input data through a probabilistic approach (mapping data into a probability distribution) and decodes from this distribution back into reconstructed data (Fig. 1b)26,27. Supplementary Fig. 1 presents a schematic overview of the VAE architecture. The inputs were longitudinal trajectories of 27 behavioural metrics over 180 days, resulting in an input dimension of 272 × 27 × 180. Missing values of behavioural metrics were linearly interpolated and normalised. Our trained VAE reconstructed the behavioural time-series with a Pearson Correlation Coefficient of 0.988 ± 0.0379, a root mean square error (RMSE) of 0.031 ± 0.026 and a percentage root-mean-square difference (PRD) of 10.553 ± 0.038. Supplementary Fig. 2 depicts an example of the trends in behavioural measurements along with the reconstructed trend derived from 32 latent variables. The VAE models were developed using PyTorch (version 2.0.5) in Python (version 3.6.7).
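The preprocessing and reconstruction metrics described above can be illustrated with a minimal sketch that uses only the Python standard library. It is not the study's code. The function names, the use of None to mark a missing day, the handling of gaps at the edges of a series, min-max scaling as the form of normalisation, and the PRD definition (100 times the root of the summed squared error over the summed squared signal) are all assumptions made for illustration; the text does not specify them.

```python
import math

def interpolate_missing(series):
    """Linearly interpolate None entries; edge gaps copy the nearest observed value (assumption)."""
    known = [i for i, v in enumerate(series) if v is not None]
    if not known:
        raise ValueError("series has no observed values")
    out = list(series)
    for i, v in enumerate(series):
        if v is not None:
            continue
        left = max((k for k in known if k < i), default=None)
        right = min((k for k in known if k > i), default=None)
        if left is None:
            out[i] = series[right]
        elif right is None:
            out[i] = series[left]
        else:
            w = (i - left) / (right - left)
            out[i] = series[left] + w * (series[right] - series[left])
    return out

def min_max_scale(series):
    """Scale to [0, 1]; the normalisation actually used in the study is not stated."""
    lo, hi = min(series), max(series)
    if hi == lo:
        return [0.0 for _ in series]
    return [(v - lo) / (hi - lo) for v in series]

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length series."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)

def rmse(x, y):
    """Root mean square error between original x and reconstruction y."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)) / len(x))

def prd(x, y):
    """Percentage root-mean-square difference of reconstruction y relative to x."""
    num = sum((a - b) ** 2 for a, b in zip(x, y))
    den = sum(a * a for a in x)
    return 100.0 * math.sqrt(num / den)

# Hypothetical five-day step-count series with two missing days.
steps = [8000, None, 10000, 12000, None]
filled = interpolate_missing(steps)
scaled = min_max_scale(filled)

# Hypothetical decoder output for the same five days.
reconstructed = [0.0, 0.3, 0.5, 0.9, 1.0]
r = pearson(scaled, reconstructed)
err = rmse(scaled, reconstructed)
diff = prd(scaled, reconstructed)
```

In the study these quantities would be computed per subject over all 27 metrics and 180 days; the single short series here only shows how each metric responds to a small reconstruction error.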

We then applied an unsupervised machine learning algorithm to cluster these representations (Fig. 1c). The k-means algorithm aims to minimise the within-cluster variance, making data points within the same cluster as similar as possible and data points in different clusters as dissimilar as possible. The appropriate number of clusters was assessed using within-cluster variation (inertia), silhouette scores, and Davies-Bouldin index (Supplementary Fig. 3). Considering that the k-means algorithm operates stochastically, and initialisation of the model may affect the decision of the optimal k, we averaged the results over multiple iterations to reduce the impact of randomness28. We evaluated cluster stability by computing the Jaccard index across 100 bootstrapped samples29. Clustering was performed using the scikit-learn library (version 1.3.0)30.
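One common way to turn the Jaccard index into a bootstrap stability score is to re-cluster each bootstrap sample and, for every original cluster, record its best Jaccard match among the new clusters. The sketch below follows that idea using only the standard library. The exact procedure used in the study is not described in the text, so the matching rule, the function names, and the `cluster_fn` callback (a stand-in for whatever clustering routine is applied to each resample) are assumptions.

```python
import random

def jaccard(a, b):
    """Jaccard index |A ∩ B| / |A ∪ B| of two collections of subject ids."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0

def clusterwise_stability(original, resampled, sampled_ids):
    """Best Jaccard match for each original cluster, using only resampled subjects."""
    sampled = set(sampled_ids)
    scores = {}
    for label, members in original.items():
        restricted = set(members) & sampled
        scores[label] = max(jaccard(restricted, c) for c in resampled.values())
    return scores

def bootstrap_stability(ids, original, cluster_fn, n_boot=100, seed=0):
    """Average clusterwise Jaccard over n_boot bootstrap resamples of the subjects."""
    rng = random.Random(seed)
    totals = {label: 0.0 for label in original}
    for _ in range(n_boot):
        # Sample with replacement, then keep the distinct subjects that were drawn.
        sample = sorted(set(rng.choices(ids, k=len(ids))))
        resampled = cluster_fn(sample)  # hypothetical: returns {label: set_of_ids}
        for label, s in clusterwise_stability(original, resampled, sample).items():
            totals[label] += s
    return {label: t / n_boot for label, t in totals.items()}

# Toy example with six subjects and two original clusters.
original = {"A": {0, 1, 2}, "B": {3, 4, 5}}

# One hypothetical resample in which subject 3 switches cluster.
one_run = clusterwise_stability(original, {0: {0, 1, 3}, 1: {4, 5}}, [0, 1, 3, 4, 5])

# A stand-in clustering rule that always recovers the original split.
def threshold_clusters(sample):
    return {0: {i for i in sample if i < 3}, 1: {i for i in sample if i >= 3}}

stable = bootstrap_stability(list(range(6)), original, threshold_clusters)
```

Values close to 1 indicate that a cluster is recovered almost unchanged across resamples, whereas low values suggest that the cluster dissolves when the sample is perturbed.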

Cluster characterisation through cluster membership prediction

We aimed to characterise the identified clusters using SHapley Additive exPlanations (SHAP) values31. SHAP values are widely used to determine the contribution of particular features to the predicted outcome. We derived SHAP values from a supervised machine learning classifier (eXtreme Gradient Boosting), which was trained to predict cluster membership from daily behavioural values (48,960 days)32. Subsequent ranking of these SHAP values provides insight into behavioural metrics that contribute positively (or negatively) to cluster membership. We assessed the performance of these classifications using the receiver operating characteristic curve (ROC).
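To make the notion of a feature's contribution concrete, the sketch below computes exact Shapley values for a single sample by enumerating every feature subset. This illustrates the quantity that SHAP estimates and is not the tree-based procedure applied to the gradient boosting classifier. The toy `predict` function, its three unnamed features, and the choice to replace absent features with a fixed baseline are assumptions for illustration.

```python
from itertools import combinations
from math import factorial

def shapley_values(predict, x, baseline):
    """Exact Shapley values for one sample; absent features take their baseline value."""
    n = len(x)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight for coalitions of this size that exclude feature i.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                present = set(subset)
                without_i = [x[j] if j in present else baseline[j] for j in range(n)]
                with_i = list(without_i)
                with_i[i] = x[i]
                phi[i] += weight * (predict(with_i) - predict(without_i))
    return phi

# Hypothetical model with one interaction term between features 0 and 2.
def predict(f):
    return 2 * f[0] + 3 * f[1] + f[0] * f[2]

x = [1.0, 1.0, 2.0]
baseline = [0.0, 0.0, 0.0]
phi = shapley_values(predict, x, baseline)

# Efficiency property: contributions sum to the prediction minus the baseline prediction.
total = sum(phi)
```

The interaction term is shared equally between the two features involved, which is why feature 2 receives a non-zero contribution even though it has no standalone effect in this toy model. Exact enumeration grows exponentially with the number of features, which is why approximate or model-specific algorithms are used in practice.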

Statistical analysis

Continuous variables were presented as mean with standard deviation or median with interquartile range. Categorical socio-demographic and clinical variables were presented as frequencies (percentages) and compared using the χ2 test. T-tests were used for pairwise comparisons, and analysis of variance (ANOVA) was used to assess differences among multiple groups with normally distributed data. The Mann–Whitney U test was used for non-normally distributed variables, and the Kruskal-Wallis test for comparisons involving more than two groups with non-normally distributed data. The risk of the outcomes of interest during follow-up was estimated using the Kaplan–Meier method; log-rank tests were used to compare survival between clusters. Cox Proportional Hazard models were used to assess the association between behavioural profiles and the risk of outcomes of interest. The model included the clinical covariates age, sex, indication for ICD implantation, presence of atrial fibrillation, heart failure, and type of ICD. Schoenfeld residuals were used to check the proportional hazards assumption. A two-sided p value < 0.05 was considered significant. The prognostic significance of the behavioural profiles for the outcomes of interest was assessed through logistic regression models. Two models were constructed for each outcome of interest: the first model included clinical patient information (medical history and medication status) along with cluster membership as input features, while the second model excluded cluster membership. Prediction accuracy was assessed through stratified k-fold cross-validation, and quantified using the area under the receiver operating characteristic curve (AUROC).
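As a minimal illustration of the Kaplan–Meier method mentioned above, the following standard-library sketch computes the product-limit survival estimate at each event time. The function name and the toy follow-up data are hypothetical and do not reproduce the study's analysis or results; subjects censored at the same time as an event are treated as still at risk at that time, which is the usual convention.

```python
def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct event time.

    times: follow-up in days; events: 1 = event observed, 0 = censored.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        same_t = [e for tt, e in data if tt == t]
        n_events = sum(same_t)
        if n_events:
            # Product-limit step: multiply by the fraction surviving time t.
            survival *= 1 - n_events / at_risk
            curve.append((t, survival))
        # Everyone observed at time t (event or censored) leaves the risk set.
        at_risk -= len(same_t)
        i += len(same_t)
    return curve

# Hypothetical follow-up of six patients over one year.
times = [30, 90, 90, 200, 365, 365]
events = [1, 1, 0, 1, 0, 0]
curve = kaplan_meier(times, events)
```

One minus the final survival estimate gives the cumulative event rate at the end of follow-up, the kind of quantity plotted per cluster in Kaplan–Meier curves.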