The future of sleep health: a data-driven revolution in sleep science and medicine

In recent years, there has been a significant expansion in the development and use of multi-modal sensors and technologies to monitor physical activity, sleep and circadian rhythms. These developments make accurate sleep monitoring at scale a possibility for the first time. Vast amounts of multi-sensor data are being generated with potential applications ranging from large-scale epidemiological research linking sleep patterns to disease, to wellness applications, including the sleep coaching of individuals with chronic conditions. However, in order to realise the full potential of these technologies for individuals, medicine and research, several significant challenges must be overcome. There are important outstanding questions regarding performance evaluation, as well as data storage, curation, processing, integration, modelling and interpretation. Here, we leverage expertise across neuroscience, clinical medicine, bioengineering, electrical engineering, epidemiology, computer science, mHealth and human–computer interaction to discuss the digitisation of sleep from an interdisciplinary perspective. We introduce the state of the art in sleep-monitoring technologies, and discuss the opportunities and challenges from data acquisition to the eventual application of insights in clinical and consumer settings. Further, we explore the strengths and limitations of current and emerging sensing methods with a particular focus on novel data-driven technologies, such as Artificial Intelligence.


Supplementary Note 1 Sleep Metrics
Beyond the sleep staging guidelines provided by the AASM, several sleep metrics are commonly used when assessing sleep-wake cycles. Sleep onset time is defined as the boundary marking the transition from a period in which the person is awake to one in which they are asleep. Similarly, the boundary marking the transition from sleep to wake is known as the sleep awakening time. The following table introduces some of the most commonly used sleep metrics based on these definitions.
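Additionally, Phillips and colleagues recently described the Sleep Regularity Index (SRI) as "the likelihood that any two time-points, on a minute-to-minute basis, 24-hours apart were the same wake-sleep state, across all days" [1]. If we were to derive the SRI from accelerometer data using 30-second epochs, the equation would be as follows. Given N days of recording divided into M daily epochs (epoch = 30 s, so M = 2880), let $s_{i,j} = 1$ if the person is asleep on day i in epoch j and $s_{i,j} = 0$ if they are awake. Then
$$\mathrm{SRI} = -100 + \frac{200}{M(N-1)} \sum_{j=1}^{M} \sum_{i=1}^{N-1} \delta(s_{i,j}, s_{i+1,j}),$$
where $\delta(s_{i,j}, s_{i+1,j}) = 1$ if $s_{i,j} = s_{i+1,j}$ and 0 otherwise.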
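As a minimal sketch of the SRI computation, the Python function below scores the agreement between each epoch and the same epoch 24 hours later, then rescales it to the range from -100 to 100, i.e. SRI = -100 + 200/(M(N-1)) × (number of matching epoch pairs). The function name, the list-of-lists input layout and the toy data are assumptions made for illustration; they are not taken from Phillips and colleagues.

```python
def sleep_regularity_index(states):
    """Return the SRI for states[i][j] (1 = asleep, 0 = awake) on day i, epoch j."""
    n_days = len(states)
    if n_days < 2:
        raise ValueError("SRI needs at least two days of data")
    n_epochs = len(states[0])
    if n_epochs == 0 or any(len(day) != n_epochs for day in states):
        raise ValueError("every day must hold the same, non-zero number of epochs")
    # Count the epochs whose state equals the state of the same epoch 24 h later.
    matches = sum(
        today[j] == tomorrow[j]
        for today, tomorrow in zip(states, states[1:])
        for j in range(n_epochs)
    )
    # Rescale the agreement fraction from [0, 1] to [-100, 100].
    return -100.0 + 200.0 * matches / (n_epochs * (n_days - 1))


EPOCHS_PER_DAY = 24 * 60 * 2  # 30-second epochs, so M = 2880
# Hypothetical toy day: 8 h asleep followed by 16 h awake.
toy_day = [1] * 960 + [0] * (EPOCHS_PER_DAY - 960)
regular_week = [toy_day] * 7
print(sleep_regularity_index(regular_week))
```

A recording in which every day is identical scores 100, whereas one in which the state is always the opposite of the state 24 hours earlier scores -100.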
New sleep metrics based on EEG data are currently being derived in search of reliable sleep EEG biomarkers that could be used to phenotype patients [2, 3]. However, it is important that new sleep metrics also address the prevalence and representativeness of the data being used, and account for the sampling and data-collection bias associated with sleep studies.

Supplementary Table 1
Understanding how well a specific method performs at the classification task is of great importance for research, clinical, industry and lifestyle applications. While evaluating the accuracy of a method or model can be insightful, it is not sufficient: in sleep-wake and sleep stage classification tasks, not all errors are equal. Indeed, there are two categories of error: predicting a negative when the instance is positive (false negative) and predicting a positive when the instance is negative (false positive). Likewise, there are two categories of correct prediction: predicting a positive when the instance is positive (true positive) and predicting a negative when the instance is negative (true negative). A successful prediction is thus termed true and an unsuccessful one false. These four variants form a confusion matrix (Supplementary Table 2). Supplementary Table 3 introduces the most common metrics used to evaluate the performance of sleep-wake classification algorithms.
Supplementary Table 3
Metric: Formula. Description
Precision: $TP / (TP + FP)$. Agreement between the data labels and positive labels given by the algorithm
Recall (Sensitivity): $TP / (TP + FN)$. Effectiveness of the algorithm to identify positive labels
Specificity: $TN / (TN + FP)$. Effectiveness of the algorithm to identify negative labels
F1 Score: $2TP / (2TP + FP + FN)$. Harmonic mean of precision and recall
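As a minimal sketch of how the Supplementary Table 3 metrics follow from the four confusion-matrix counts, the Python function below computes precision, recall, specificity and the F1 score. The function name, the choice of sleep as the positive class, the example counts and the F1 formula (the standard harmonic mean of precision and recall) are assumptions made for illustration.

```python
def classification_metrics(tp, fp, tn, fn):
    """Compute common sleep-wake performance metrics from confusion-matrix counts."""

    def ratio(numerator, denominator):
        # A metric is undefined when its denominator is zero; report NaN.
        return numerator / denominator if denominator else float("nan")

    return {
        "precision": ratio(tp, tp + fp),      # TP / (TP + FP)
        "recall": ratio(tp, tp + fn),         # TP / (TP + FN), i.e. sensitivity
        "specificity": ratio(tn, tn + fp),    # TN / (TN + FP)
        "f1": ratio(2 * tp, 2 * tp + fp + fn),  # harmonic mean of precision and recall
    }


# Hypothetical counts, with "asleep" treated as the positive class.
metrics = classification_metrics(tp=90, fp=10, tn=30, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

With these example counts, precision, recall and F1 are all 0.90 while specificity is only 0.75, which illustrates why a single figure can hide poor detection of the minority (wake) class.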

Supplementary Note 3 Performance Metrics on Sleep-Wake Classification
From a classification perspective, the correctness of a classification algorithm can be evaluated by the number of instances or events correctly recognised as belonging to a class (true positives), the number correctly identified as not belonging to that class (true negatives), the number wrongly assigned to the class (false positives) and the number that belong to the class but were not recognised (false negatives). Together, these four counts form what is known as a confusion matrix. Several important metrics for evaluating sleep classification algorithm performance are derived from the confusion matrix. These are described in Supplementary Table 3.
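To illustrate how the four counts are obtained in practice, the following Python sketch tallies them from two epoch-by-epoch label sequences. The function name, the encoding (1 = sleep, 0 = wake), the choice of sleep as the positive class and the toy sequences are assumptions made for illustration.

```python
def confusion_counts(reference, predicted, positive=1):
    """Return (TP, FP, TN, FN) for two equal-length label sequences."""
    if len(reference) != len(predicted):
        raise ValueError("reference and predicted must have the same length")
    tp = fp = tn = fn = 0
    for ref, pred in zip(reference, predicted):
        if pred == positive:
            if ref == positive:
                tp += 1  # predicted sleep, reference sleep
            else:
                fp += 1  # predicted sleep, reference wake
        else:
            if ref == positive:
                fn += 1  # predicted wake, reference sleep
            else:
                tn += 1  # predicted wake, reference wake
    return tp, fp, tn, fn


# Hypothetical epochs: reference labels (e.g. from polysomnography) and algorithm output.
reference = [1, 1, 1, 0, 0, 1, 0, 0]
predicted = [1, 1, 0, 0, 1, 1, 0, 0]
tp, fp, tn, fn = confusion_counts(reference, predicted)
print(f"TP={tp} FP={fp} TN={tn} FN={fn}")
```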
Furthermore, when evaluating classification performance, other metrics are used depending on the characteristics of the data set. For instance, Cohen's Kappa compares the observed accuracy with the accuracy expected by random chance (e.g. in sleep-wake classification) and is defined as:
$$\kappa = \frac{\Pr(a) - \Pr(e)}{1 - \Pr(e)}$$
where Pr(a) is the observed accuracy and Pr(e) is the expected accuracy. Other commonly used metrics of classification performance are the Hamming loss, the hinge loss, the Matthews correlation coefficient and the zero-one classification loss.
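As a hedged illustration for the binary sleep-wake case, the Python sketch below computes Cohen's Kappa as (Pr(a) - Pr(e)) / (1 - Pr(e)) and the Matthews correlation coefficient from confusion-matrix counts. The function names and example counts are assumptions; Pr(e) is estimated here from the marginal label frequencies of the reference and the algorithm, which is the usual convention but is not spelled out in the text.

```python
import math


def cohens_kappa(tp, fp, tn, fn):
    """Cohen's Kappa for a binary confusion matrix."""
    n = tp + fp + tn + fn
    if n == 0:
        return float("nan")
    observed = (tp + tn) / n  # Pr(a): observed accuracy
    # Pr(e): chance agreement from the marginal frequencies of each label.
    expected = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / (n * n)
    if expected == 1.0:
        return float("nan")  # undefined when only one class ever occurs
    return (observed - expected) / (1.0 - expected)


def matthews_corrcoef(tp, fp, tn, fn):
    """Matthews correlation coefficient for a binary confusion matrix."""
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        return float("nan")
    return (tp * tn - fp * fn) / denominator


# Hypothetical counts from a short recording.
kappa = cohens_kappa(tp=3, fp=1, tn=3, fn=1)
mcc = matthews_corrcoef(tp=3, fp=1, tn=3, fn=1)
print(f"kappa={kappa:.2f} mcc={mcc:.2f}")
```

In this example the observed accuracy is 0.75 but chance agreement is 0.5, so Kappa is 0.5, which shows how chance-corrected metrics temper a raw accuracy figure.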

Supplementary Note 4 Actigraphy Specific Sleep Metrics
Actigraphy data can be analysed with a focus on either sleep or circadian rhythms. Traditional sleep metrics like the ones explored previously (wake after sleep onset, total sleep time, etc.) can be extracted from actigraphy data. Moreover, metrics related to circadian rhythms can also be derived. Some of the most common ones are interdaily stability (IS), intradaily variability (IV), activity during the least active five hours (L5) and relative amplitude (RA) [4]. These metrics provide information beyond that of traditional sleep metrics derived from actigraphy data.
For instance, intradaily variability (IV) is a measure of sleep fragmentation and interdaily stability (IS) can be used to assess sleep regularity.
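As a rough sketch of how IS and IV can be derived from an activity series, the Python functions below follow the commonly used nonparametric definitions: IS is the variance of the average 24-hour profile relative to the total variance, and IV is the mean squared difference between consecutive samples relative to the total variance. These formulas are not given in the text and are assumed here, as are the function names, the hourly binning (24 samples per day) and the toy data.

```python
def _mean_and_sum_of_squares(activity):
    mean = sum(activity) / len(activity)
    return mean, sum((x - mean) ** 2 for x in activity)


def interdaily_stability(activity, period=24):
    """IS: close to 1 when every day follows the same profile, close to 0 for noise."""
    n = len(activity)
    if n == 0 or n % period:
        raise ValueError("activity must cover a whole number of days")
    mean, total_ss = _mean_and_sum_of_squares(activity)
    if total_ss == 0:
        return float("nan")  # constant signal, IS undefined
    days = n // period
    # Average activity at each time of day across all days.
    profile = [sum(activity[h::period]) / days for h in range(period)]
    profile_ss = sum((m - mean) ** 2 for m in profile)
    return (n * profile_ss) / (period * total_ss)


def intradaily_variability(activity):
    """IV: grows as the signal switches more often between rest and activity."""
    n = len(activity)
    if n < 2:
        raise ValueError("activity must contain at least two samples")
    _, total_ss = _mean_and_sum_of_squares(activity)
    if total_ss == 0:
        return float("nan")  # constant signal, IV undefined
    diff_ss = sum((b - a) ** 2 for a, b in zip(activity, activity[1:]))
    return (n * diff_ss) / ((n - 1) * total_ss)


# Hypothetical hourly activity counts: 8 h of rest then 16 h of activity, for 3 days.
consolidated = ([0] * 8 + [10] * 16) * 3
# Hypothetical fragmented pattern: rest and activity alternate every hour.
fragmented = [0, 10] * 36
print(interdaily_stability(consolidated), intradaily_variability(consolidated))
print(interdaily_stability(fragmented), intradaily_variability(fragmented))
```

Both toy series repeat identically each day, so their IS is essentially 1, but the hourly alternating series has a much higher IV than the consolidated one.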