TIHM: An open dataset for remote healthcare monitoring in dementia

Palermo, Francesca; Chen, Yu; Capstick, Alexander; Fletcher-Loyd, Nan; Walsh, Chloe; Kouchaki, Samaneh; True, Jessica; Balazikova, Olga; Soreq, Eyal; Scott, Gregory; Rostill, Helen; Nilforooshan, Ramin; Barnaghi, Payam

doi:10.1038/s41597-023-02519-y

Data Descriptor
Open access
Published: 09 September 2023

Scientific Data volume 10, Article number: 606 (2023)

Abstract

Dementia is a progressive condition that affects cognitive and functional abilities. There is a need for reliable and continuous health monitoring of People Living with Dementia (PLWD) to improve their quality of life and support their independent living. Healthcare services often focus on addressing and treating already established health conditions that affect PLWD. Managing these conditions continuously can inform better decision-making earlier for higher-quality care management for PLWD. The Technology Integrated Health Management (TIHM) project developed a new digital platform to routinely collect longitudinal, observational, and measurement data within the home and to apply machine learning and analytical models for the detection and prediction of adverse health events affecting the well-being of PLWD. This work describes the TIHM dataset collected during the second phase (i.e., feasibility study) of the TIHM project. The data was collected from the homes of 56 PLWD and is associated with events and clinical observations (daily activity, physiological monitoring, and labels for health-related conditions). The study recorded an average of 50 days of data per participant, totalling 2803 days.

Background & Summary

Dementia is most commonly characterised by symptoms of cognitive decline, such as memory loss and problems with attention; however, up to 90% of PLWD will also experience behavioural and psychological symptoms, including sleep disturbances, agitation, and apathy¹. In addition, up to about one in four unplanned hospital admissions for PLWD are due to potentially preventable causes such as severe urinary tract infections (UTIs), falls, and respiratory problems. These symptoms and events affect the health and well-being of PLWD, increase the stress and anxiety of caregivers, and increase the demand on healthcare services. As such, providing timely and effective interventions is a significant challenge in dementia care and requires frequent, reliable, and privacy-aware activity and health monitoring for PLWD.

The TIHM project employs low-cost, Internet of Things (IoT) sensing technologies to enable predictive and proactive in-home healthcare monitoring. By supporting the integration of analytical solutions, the TIHM platform allows us to develop clinically applicable machine intelligence and decision-support methods for early and personalised interventional care.

Within TIHM, remote devices for collecting vital signs, environmental data, and activity data were used to monitor the day-to-day well-being of PLWD²,³. The use of these technologies can help PLWD retain their independence for longer periods of time and provide caregivers with evidence-based information that may reduce potential anxiety and depression in PLWD⁴. Furthermore, the integration of machine learning methods and in-home monitoring technologies allows for the identification of changes in cognition and physical well-being. Several studies have applied machine learning and analytical techniques to the data collected as part of the TIHM project to investigate activity and health patterns and develop methods to detect and predict conditions that affect the well-being of PLWD and caregivers⁵,⁶.

A major issue with current remote monitoring systems is the heterogeneity of the underlying devices and technologies⁷,⁸. Different devices use different data formats and proprietary interfaces and applications to present the data, making it difficult to integrate information from various sources and process it in (near-) real-time. These conflicting setups hinder the process of extracting patterns, detecting anomalies, and performing predictive analysis using integrated data from different digital sources. TIHM provides an integration of data from various sources and modalities to transform in-home monitoring applications and create intelligent decision-making support systems using routinely collected data. By applying machine learning models that are designed with partially labelled, multi-modal, noisy and dynamic data in mind, we have developed several explainable methods for detecting and predicting adverse health conditions and events⁶,⁹.

In this paper, we present the TIHM dataset¹⁰ collected during the feasibility study phase of the project. This includes anonymised information on daily activities, sleep monitoring, clinical and physiological data, and corresponding labelled health events. The dataset collected through the TIHM project can be employed for studies that develop analytical and machine learning solutions for continuous healthcare monitoring, especially in dementia care. It can offer preliminary data to design and validate methods that analyse multi-modal data with sparse annotations for healthcare monitoring applications. For example, new AI methods could be developed to detect: i) vital sign abnormalities; ii) neuropsychiatric symptoms; iii) social isolation; and iv) functional decline.

Methods

A secure digital platform was developed to integrate in-home IoT and remote monitoring technologies to collect routine physiological, sleep, movement, and ambient data¹¹.

Digital markers

Digital markers are measurable physiological and in-home movement data gathered and assessed by digital devices, including portable and passive monitoring sensors. Digital markers can deliver novel and useful insights into an individual’s activity patterns and physiological health, allowing for continuous and non-invasive healthcare monitoring. Remote monitoring technologies also provide a novel approach to monitoring the effect of new interventions in clinical trials and observational and interventional studies¹².

In TIHM, sensory devices were installed in participants’ homes, and activity data was continuously recorded via passive infrared (PIR) sensors (installed in the hallway and living room), movement sensors (on kitchen, bedroom and bathroom doors), a door sensor (installed at the main entrance), and an under-the-mattress sleep-mat (for monitoring sleep and in-and-out-of-bed activity). Participants were supplied with Bluetooth-enabled devices to measure their blood pressure, heart rate, body temperature, weight, and hydration daily. Figure 1 shows an example of the residential setting of a participant in the study equipped with the sensors. Details of the devices and digital markers are shown in Tables 1 and 2.

Table 1 Overview of the digital markers collected in the TIHM dataset¹⁰, detailing the monitoring device used and the frequency of measurement for the collection of data.

Table 2 List of devices used for data collection in the study, including manufacturer, device type, and specific product model with links to specifications.

Participants

To be eligible for this study, participants needed to be over 50 years old, have a verified diagnosis of dementia (of any type) or mild cognitive impairment, have the capacity to provide informed consent to participate in the study, and either have received treatment from an Old Age Psychiatry department in the past or currently be on its caseload. In addition, participants required a study partner or caregiver who had known the PLWD for at least six months and was able to attend research assessments with them. If a participant was unable to provide information about their health, their partner or caregiver completed the necessary assessments on their behalf. Individuals with unstable mental states, including severe depression, severe psychosis, agitation, anxiety, or active suicidal thoughts, and those receiving treatment for terminal illnesses, were not included in the study. A total of 56 people were selected as participants. All participants consented to the publication of this dataset. The demographic details of the participants in the dataset are shown in Table 3. Some participants in the study requested that all or part of their information not be shared outside the study. For these cases, the corresponding information is represented by “N/A” (Not Available) in Table 3 and their data was not included in the dataset.

Table 3 Demographics of the participants in the study (n = 56).

Ethical approval

The TIHM study received ethical approval from the London-Surrey Borders Research Ethics Committee; TIHM 1.5 REC: 19/LO/0102. The study is registered with the National Institute for Health and Care Research (NIHR) in the United Kingdom under Integrated Research Application System (IRAS) registration number 257561. To the best of our knowledge, this is the first publicly available dataset for remote healthcare monitoring for PLWD that includes in-home activity and sleep data, physiological measurements, and labelled health and care-related events during the monitoring period. TIHM is also currently being offered as a service by the Surrey and Borders National Health Service (NHS) Trust in the United Kingdom.

Dataset collection

We combined in-home sensory data with individuals’ healthcare information extracted from General Practitioner (GP) records and hospital visits to create a holistic view of their well-being and care needs.

The sensor deployment relied on off-the-shelf devices to monitor in-home activities and physiology. These sensors continuously collected and communicated the data to a data collection and integration platform. The data from the sensors in this release are de-identified, cleaned (removing redundant and multiple records) and merged based on their categories into four different tables which are further explained in Section Data Records. The annotations and data labels for this study were collected by a monitoring team who contacted the participants to determine if they had experienced a health-related event. The data was labelled as true if the monitoring team validated the presence of a health-related event and false if there was no event.

The initial alert generation for triaging a healthcare event was governed by a set of rules and thresholds applied to physiological measurements and the output of an analytical model designed to analyse in-home activity and physiology data¹³. This initial analytical model was only intended to guide the monitoring team in identifying episodes of agitation and creating a labelled dataset for further data analysis and machine learning developments.

By combining the data from the in-home sensors, we obtained a comprehensive understanding of an individual’s home activity and health, and used this information to determine the risk or presence of health-related conditions⁹. For example, we detected changes in an individual’s activity patterns, such as a change in room usage that may indicate social isolation or agitation⁵.

Dataset de-identification

Two types of de-identification have been applied to the data. During the study, the data was pseudo-anonymised for the clinical monitoring team and for developing analytical models. The data includes the demographics (age and sex) in addition to raw sensory observations and measurements. Information governance and control methods and procedures were applied to the data during the project. An NHS-approved Data Processing and Impact Assessment was conducted for the data collection, storage and access procedures. Before making the TIHM dataset¹⁰ available online, the data was fully anonymised by removing all personally identifying information and identifiable attributes. Each participant was randomly assigned a universally unique identifier (UUID) to strengthen the de-identification.
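The identifier-replacement step can be illustrated with a minimal sketch. It assumes a simple in-memory mapping; the helper names (`assign_uuids`, `deidentify`) and the record fields (`study_key`, `name`, `heart_rate`) are hypothetical and are not taken from the TIHM pipeline.

```python
import uuid


def assign_uuids(participant_keys):
    """Map each original participant key to a random (version 4) UUID."""
    # uuid4() is generated from random bits, so the new identifier carries
    # no information about the original key or the order of enrolment.
    return {key: str(uuid.uuid4()) for key in set(participant_keys)}


def deidentify(records, key_field, drop_fields):
    """Replace the key field with a UUID and drop identifying fields."""
    mapping = assign_uuids(r[key_field] for r in records)
    cleaned = []
    for r in records:
        row = {k: v for k, v in r.items()
               if k != key_field and k not in drop_fields}
        row["participant_id"] = mapping[r[key_field]]
        cleaned.append(row)
    return cleaned


# Hypothetical toy records; not real study data.
records = [
    {"study_key": "A001", "name": "Jane Doe", "heart_rate": 71},
    {"study_key": "A001", "name": "Jane Doe", "heart_rate": 69},
    {"study_key": "B002", "name": "John Roe", "heart_rate": 80},
]
anonymised = deidentify(records, key_field="study_key", drop_fields={"name"})
```

Records from the same participant share one UUID, so they can still be linked across tables, while the mapping back to the original key is discarded once the function returns.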

Data Records

The TIHM dataset¹⁰ is available at Zenodo. It consists of five separate tables (Activity, Sleep, Physiology, Labels, and Demographics) containing information about various aspects of remote healthcare monitoring. A description of the data files included in the TIHM dataset¹⁰ is shown in Table 4. Each table includes timestamps related to each event and the assigned UUIDs of the participants to allow for cross-referencing and synchronisation among the various records.

1. The Activity table includes data from motion and door sensors that track movement in different locations in the home. The temporal resolution for this data is in seconds. For each recorded activity, the locations may be a subset of the commonly recorded locations in the home, which include ‘Back Door’, ‘Fridge Door’, ‘Hallway’, ‘Kitchen’, ‘Lounge’, ‘Bedroom’, ‘Bathroom’, ‘Front Door’, and ‘Dining Room’.
2. The Sleep table includes sleep data collected using sleep tracking mats. This data includes four sleep states (i.e., awake, light, deep, REM), as well as information on snoring, heart rate, and respiratory rate reported by the sleep-mat device. The temporal resolution of the heart rate, breathing rate, and sleep state data is per minute, whilst a PLWD is in bed, on top of the device.
3. The Physiology table contains daily records of vital signs, including body temperature, skin temperature, diastolic blood pressure, systolic blood pressure, heart rate, muscle mass, body water, and body weight. Some participants may not have recorded this information on a daily basis, resulting in sparsity in the physiology data.
4. The Labels table includes data on six types of alerts that have been verified by the monitoring team in the TIHM study. These labels include episodes of agitation, abnormally high or low blood pressure, abnormally high or low body temperature, low body water (i.e. dehydration), abnormally high or low heart rate, and weight changes. Seven participants did not have any confirmed alerts during the project and are not included in this table. The Labels table can be used for training predictive models. The thresholds used to raise and verify these alerts are shown in Table 5.
5. The Demographics table provides sex and age group information for each participant. All participants are separated into three age groups: (70, 80], (80, 90], (90, 110].

Table 4 An overview of the data files included in the TIHM dataset¹⁰.

Table 5 List of health-related event alerts that are generated based on the measurements of different health parameters and their respective thresholds.
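Because every table carries a participant UUID and timestamps, records can be cross-referenced with an ordinary join. The following is a minimal sketch using pandas on toy stand-ins for the Activity and Labels tables; the column names (`patient_id`, `date`, `location_name`, `type`) are assumptions for illustration and should be checked against the README of the dataset.

```python
import pandas as pd

# Toy stand-ins for two of the five tables. In practice each frame would come
# from pd.read_csv(...) on the corresponding CSV file. Column names here are
# assumptions, not confirmed by the dataset description.
activity = pd.DataFrame({
    "patient_id": ["u1", "u1", "u2"],
    "date": pd.to_datetime(["2019-06-01 08:15:02", "2019-06-01 08:17:40",
                            "2019-06-02 21:03:11"]),
    "location_name": ["Kitchen", "Hallway", "Bedroom"],
})
labels = pd.DataFrame({
    "patient_id": ["u1"],
    "date": pd.to_datetime(["2019-06-01"]),
    "type": ["Agitation"],
})

# Reduce second-resolution activity timestamps to calendar days so they can
# be matched against day-level labels.
activity["day"] = activity["date"].dt.normalize()
labels["day"] = labels["date"].dt.normalize()

# A left join keeps every activity record; rows without a label get NaN.
merged = activity.merge(
    labels[["patient_id", "day", "type"]],
    on=["patient_id", "day"],
    how="left",
)
merged["has_alert"] = merged["type"].notna()
```

Note that participants with no confirmed alerts are absent from the Labels table, so a left join (rather than an inner join) is needed to keep their activity records.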

An overview of the Activity table is shown in Fig. 2, which summarises total in-home movement in each location daily for all the participants. The increasing trend of the total number of in-home movements aligns with the increasing number of participants in the study. The figure also shows a large drop on the 14th of June 2019, which was caused by a technical failure in the data collection server. A similar phenomenon can also be observed in other tables.
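A daily summary of this kind can be sketched in a few lines of pandas. The example below uses a toy activity log with assumed column names (`date`, `location_name`) and is only meant to show the shape of the computation, not to reproduce Fig. 2.

```python
import pandas as pd

# Toy activity log; values and column names are illustrative assumptions.
activity = pd.DataFrame({
    "date": pd.to_datetime([
        "2019-06-13 09:00:00", "2019-06-13 09:05:00", "2019-06-13 18:30:00",
        "2019-06-14 10:00:00",
        "2019-06-15 07:45:00", "2019-06-15 12:10:00",
    ]),
    "location_name": ["Kitchen", "Kitchen", "Lounge",
                      "Kitchen",
                      "Kitchen", "Lounge"],
})
activity["day"] = activity["date"].dt.normalize()

# One row per day, one column per location, cells hold the event counts.
daily = (
    activity.groupby(["day", "location_name"])
    .size()
    .unstack(fill_value=0)
)

# Total movements per day across all locations, and the quietest day.
total_per_day = daily.sum(axis=1)
quietest_day = total_per_day.idxmin()
```

Days with unusually low totals, such as a server outage, show up directly in `total_per_day` and can be excluded or flagged before modelling.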

It should be noted that the movement data was collected over the whole household and includes the movements of PLWD, their carers, and any visitors in the house. This data can be used for trend and pattern analysis to identify changes over time or during specific time windows. For example, we have used the in-home movement data in a model to analyse the risk of agitation in PLWD⁵. Figure 3 illustrates examples of activity patterns of study participants extracted from the dataset. Figure 3a displays the irregular in-home movements of a PLWD who experienced frequent neuropsychiatric symptoms. Figure 3b illustrates activities of a PLWD with no neuropsychiatric symptoms, where clear habitual patterns are present in daily activities.

As an example of multiple sources in the data, in Fig. 4, we combine information from multiple sources of physiology data (e.g., blood pressure, body weight, temperature) for a single participant on a daily basis and display this data aligned with the alerts reported in the dataset. We can see in Fig. 4 that blood pressure alerts were generated when the participant’s blood pressure was higher than the threshold.

Technical Validation

In order to verify the usability and applicability of the observations and measurements in the dataset for health risk detection or prediction, we have trained and tested a set of classifiers for identifying the risk of agitation. Before training the classifiers, we first aggregated and pre-processed the activity and physiology data according to the following steps:

1) Aggregating location movements by computing statistical attributes of movements at each hour of each day (i.e. sum, mean, maximum, and standard deviation). For example, we obtain four features for describing daily movements in the bathroom: “Bathroom_count_sum, Bathroom_count_mean, Bathroom_count_max, Bathroom_count_std”. In this case, “Bathroom_count_mean” indicates the mean of the number of movements in the bathroom at each hour of a given day.
2) Aggregating physiology information by taking the maximum values of all measurements in each day. Since most physiological measurements only have one record per day, this step aligns these measurements to one daily figure.
3) Filling in missing values in all numerical features with 0. We intentionally did not apply a data imputation technique at this step to show the effect of missing values in the modelling results. Applying carefully guided imputation methods could improve the results of future experiments.
4) Normalising all numerical features by min-max normalisation for each participant as: \(x_{i,p}=\frac{x_{i,p}-\min\left(x_{i,p}\right)}{\max\left(x_{i,p}\right)-\min\left(x_{i,p}\right)}\), where \(x_{i,p}\) denotes the subset of the i-th feature for participant p.
5) Up-sampling positive cases (samples with validated agitation alerts) in the training set to overcome the class imbalance issue. This is because the positive samples make up less than 10% of all training sets in the cross-validation.
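The five steps above can be sketched with pandas as follows. This is an illustrative outline under stated assumptions, not the code from the accompanying repository: the column names (`patient_id`, `date`, `location_name`, `device_type`, `value`), the toy values, and the toy agitation labels are all hypothetical, and the hourly statistics are taken over hours in which at least one movement was recorded.

```python
import pandas as pd

# Hypothetical toy activity log; column names are assumptions.
activity = pd.DataFrame({
    "patient_id": ["u1"] * 5 + ["u2"] * 2,
    "date": pd.to_datetime([
        "2019-06-01 08:05", "2019-06-01 08:40", "2019-06-01 09:10",
        "2019-06-02 08:20", "2019-06-02 22:30",
        "2019-06-01 07:00", "2019-06-02 07:30",
    ]),
    "location_name": ["Bathroom"] * 7,
})
activity["day"] = activity["date"].dt.normalize()
activity["hour"] = activity["date"].dt.hour

# Step 1: count movements per hour, then summarise the hourly counts per day.
hourly = (activity.groupby(["patient_id", "day", "location_name", "hour"])
          .size().rename("count").reset_index())
stats = (hourly.groupby(["patient_id", "day", "location_name"])["count"]
         .agg(["sum", "mean", "max", "std"]).unstack("location_name"))
# Flatten (statistic, location) pairs into names like "Bathroom_count_sum".
stats.columns = [f"{loc}_count_{stat}" for stat, loc in stats.columns]

# Step 2: keep the daily maximum of each physiological measurement.
physiology = pd.DataFrame({
    "patient_id": ["u1", "u1", "u2"],
    "date": pd.to_datetime(["2019-06-01 09:00", "2019-06-01 19:00",
                            "2019-06-02 10:00"]),
    "device_type": ["Heart rate"] * 3,
    "value": [70.0, 82.0, 64.0],
})
physiology["day"] = physiology["date"].dt.normalize()
daily_phys = (physiology.groupby(["patient_id", "day", "device_type"])["value"]
              .max().unstack("device_type"))

# Step 3: combine both sources and fill missing values with 0 (no imputation).
features = stats.join(daily_phys, how="outer").fillna(0)

# Step 4: min-max normalisation within each participant.
grouped = features.groupby(level="patient_id")
lo = grouped.transform("min")
hi = grouped.transform("max")
span = (hi - lo).replace(0, 1)  # constant columns: avoid division by zero
features = (features - lo) / span

# Step 5: up-sample positive rows in the training set (labels are toy values).
train = features.reset_index()
train["agitation"] = [1, 0, 0, 0]
pos = train[train["agitation"] == 1]
neg = train[train["agitation"] == 0]
pos_up = pos.sample(n=len(neg), replace=True, random_state=0)
train_balanced = pd.concat([neg, pos_up], ignore_index=True)
```

Filling with 0 before normalising makes the missing days the per-participant minimum for physiological features, which is one way the effect of missing values mentioned in step 3 can surface in a model. Up-sampling should be applied to the training folds only, so that duplicated positive samples do not leak into the test data.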

Five baseline models were evaluated, including Gradient Boosting Trees, Multi-Layer Perceptron, Logistic Regression, Naïve Bayes, and Gaussian Process. In our experiments, we applied a 5-fold cross-validation (as shown in Fig. 5a) to evaluate the performance of the baseline models, taking into account the sequential nature of time series data. Figure 5b shows the performance of all baseline models, which demonstrates the potential of developing predictive and analytical models using the TIHM dataset¹⁰ for applications in health and well-being analysis. We also visualise the feature importance metrics learned by the Logistic Regression model in Fig. 6. The SHapley Additive exPlanations (SHAP) value¹⁴ of each feature represents its impact on the model output for a given input. Figure 6 illustrates the distribution of SHAP values for each feature, which are estimated over all test samples during the cross-validation. The colour spectrum in Fig. 6 indicates whether the raw value of a feature is high or low. This helps to verify which features contribute more to the positive or negative predictions.
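One common way to respect temporal order in cross-validation is forward chaining, where each fold trains on an initial segment of the series and tests on the block that follows it. The sketch below shows that scheme in plain Python; it is an assumption about how a sequence-aware 5-fold split might look, and the exact partition used in the experiments is the one depicted in Fig. 5a. The function name `forward_chaining_splits` is hypothetical.

```python
def forward_chaining_splits(n_samples, n_splits=5):
    """Yield (train_indices, test_indices) for time-ordered samples.

    Each fold trains on an initial segment and tests on the block that
    immediately follows it, so no test sample precedes its training data.
    """
    fold = n_samples // (n_splits + 1)
    if fold == 0:
        raise ValueError("not enough samples for the requested number of splits")
    for k in range(1, n_splits + 1):
        # The training window grows by one block with every fold.
        train_end = n_samples - (n_splits - k + 1) * fold
        yield list(range(train_end)), list(range(train_end, train_end + fold))


# Example: 12 daily samples, ordered by date, split into 5 folds.
splits = list(forward_chaining_splits(n_samples=12, n_splits=5))
for train_idx, test_idx in splits:
    print(f"train up to {train_idx[-1]}, test {test_idx}")
```

Compared with shuffled k-fold, this avoids training on days that come after the test period, at the cost of smaller training sets in the early folds.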

More advanced methods for feature engineering and data modelling can potentially improve the predictive performance of this experiment by further consideration of the temporal dependencies within the longitudinal data that are not captured in these baseline models. Here we mainly focused on presenting a baseline sample and showcasing the use of the dataset.

Usage Notes

The TIHM dataset¹⁰ offers preliminary data to design and validate clinically applicable machine intelligence and decision-support methods for continuous healthcare monitoring. We have provided raw data and guidelines on how to access, visualise, and manipulate the data and how to predict health-related events within the dataset, available in the GitHub repository (https://github.com/PBarnaghi/TIHM-Dataset). The Jupyter Notebooks have been developed using Python 3.9.

The dataset is organised in five tables stored as separate CSV files: Activity, Sleep, Physiology, Labels and Demographics. Data can be cross-referenced across the files. Instructions and sample code for loading and using the dataset are provided in the supplementary code.

Code availability

The TIHM dataset is available in the corresponding Zenodo repository¹⁰ and consists of five separate tables (Activity, Sleep, Physiology, Labels, and Demographics). For further information on the data records, please refer to the README file. The code for the experiments presented in the manuscript is available in the GitHub repository (https://github.com/PBarnaghi/TIHM-Dataset). The libraries used in the code, with their versions and dependencies, are also provided as a separate configuration file in JSON/YAML format.

References

1. Feast, A. et al. Behavioural and psychological symptoms in dementia and the challenges for family carers: systematic review. The British Journal of Psychiatry 208, 429–434, https://doi.org/10.1192/bjp.bp.114.153684 (2016).
2. Enshaeifar, S. et al. The internet of things for dementia care. IEEE Internet Computing 22, 8–17, https://doi.org/10.1109/MIC.2018.112102418 (2018).
3. Ray, P. P., Dash, D. & De, D. A systematic review and implementation of iot-based pervasive sensor-enabled tracking system for dementia patients. Journal of medical systems 43, 1–21, https://doi.org/10.1007/s10916-019-1417-z (2019).
4. Buchanan, J., Christenson, A., Houlihan, D. & Ostrom, C. The role of behavior analysis in the rehabilitation of persons with dementia. Behavior therapy 42, 9–21, https://doi.org/10.1016/j.beth.2010.01.003 (2010).
5. Palermo, F. et al. Designing A Clinically Applicable Deep Recurrent Model to Identify Neuropsychiatric Symptoms in People Living with Dementia Using In-Home Monitoring Data, Workshop on Bridging the Gap: From Machine Learning Research to Clinical Practice, NeurIPS, https://doi.org/10.48550/arXiv.2110.09868 (2021).
6. Fletcher-Lloyd, N. et al. Home monitoring of daily living activities and prediction of agitation risk in a cohort of people living with dementia. Alzheimer’s & Dementia 17, e058614, https://doi.org/10.1002/alz.058614 (2021).
7. Gong, J. et al. Home wireless sensing system for monitoring nighttime agitation and incontinence in patients with alzheimer’s disease. In Proceedings of the conference on Wireless Health, 1–8, https://doi.org/10.1145/2811780.2822324 (2015).
8. Spasojevic, S. et al. A pilot study to detect agitation in people living with dementia using multi-modal sensors. Journal of Healthcare Informatics Research 1–17, https://doi.org/10.1007/s41666-021-00095-7 (2021).
9. Enshaeifar, S. et al. Machine learning methods for detecting urinary tract infection and analysing daily living activities in people with dementia. PloS one 14, e0209909, https://doi.org/10.1371/journal.pone.0209909 (2019).
10. Palermo, F. et al. Tihm: An open dataset for remote healthcare monitoring in dementia. Zenodo https://doi.org/10.5281/zenodo.7622128 (2023).
11. Enshaeifar, S. et al. A digital platform for remote healthcare monitoring. In Companion Proceedings of the Web Conference 2020, WWW ‘20, 203–206, https://doi.org/10.1145/3366424.3383541 (Association for Computing Machinery, New York, NY, USA, 2020).
12. Kourtis, L. C., Regele, O. B., Wright, J. M. & Jones, G. B. Digital biomarkers for alzheimer’s disease: the mobile/wearable devices opportunity. NPJ digital medicine 2, 1–9, https://doi.org/10.1038/s41746-019-0084-2 (2019).
13. Enshaeifar, S. et al. Health management and pattern analysis of daily living activities of people with dementia using in-home sensors and machine learning techniques. PloS one 13, e0195605, https://doi.org/10.1371/journal.pone.0195605 (2018).
14. Lundberg, S. M. & Lee, S.-I. A unified approach to interpreting model predictions. Advances in neural information processing systems 30, https://doi.org/10.1145/3366424.3383541 (2017).

Acknowledgements

We would like to thank our project partners and sponsors for their contributions to the project. The technology deployment and data collection in this study was supported in collaboration with Howz (https://www.howz.com). A complete list of contributors and partners can be found at https://www.sabp.nhs.uk/tihm. This project was supported by a grant from the Office of Life Sciences at the Department of Health UK and NHS England, grant number (TS/N009894/1). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript. The DCARTE library (https://github.com/esoreq/dcarte/) was used for downloading a snapshot of the data from the data repository. The dataset is provided for research purposes and supporting patient care. Please acknowledge the Surrey and Borders Partnership NHS Foundation Trust in any publication or use of this dataset.

Author information

These authors contributed equally: Francesca Palermo, Yu Chen.

Authors and Affiliations

Imperial College London, Department of Brain Sciences, London, W12 0NN, UK
Francesca Palermo, Yu Chen, Alexander Capstick, Nan Fletcher-Loyd, Chloe Walsh, Eyal Soreq, Gregory Scott, Helen Rostill, Ramin Nilforooshan & Payam Barnaghi
The UK Dementia Research Institute, Care Research and Technology Centre, London, W1T 7NF, UK
Francesca Palermo, Yu Chen, Alexander Capstick, Nan Fletcher-Loyd, Chloe Walsh, Samaneh Kouchaki, Eyal Soreq, Gregory Scott, Helen Rostill, Ramin Nilforooshan & Payam Barnaghi
Surrey and Borders Partnership NHS Trust, Leatherhead, KT22 7AD, UK
Chloe Walsh, Jessica True, Olga Balazikova, Helen Rostill & Ramin Nilforooshan
University of Surrey, Guildford, GU2 7XH, UK
Samaneh Kouchaki

Contributions

P.B. and H.R. conceptualised the study. F.P. and Y.C. analysed and de-identified the data and drafted the manuscript. Y.C. and A.C. conducted the experiments and developed the code for repeatability. F.P. assisted with data visualisation. R.N. led the clinical design and monitoring. O.B. and J.T. contributed to the study protocol and governance. All authors reviewed the manuscript.

Corresponding author

Correspondence to Payam Barnaghi.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Palermo, F., Chen, Y., Capstick, A. et al. TIHM: An open dataset for remote healthcare monitoring in dementia. Sci Data 10, 606 (2023). https://doi.org/10.1038/s41597-023-02519-y

Received: 06 March 2023
Accepted: 30 August 2023