Introduction

Every year, more than 5.7 million adults are admitted to intensive care units (ICU) in the United States, costing the health care system more than 67 billion dollars per year1. A wealth of information is recorded on each patient in the ICU, including high-resolution physiological signals, various laboratory tests, and detailed medical history in electronic health records (EHR)2. Nonetheless, important aspects of patient care are not yet captured in an autonomous manner. For example, environmental factors that contribute to sleep disruption and ICU delirium3, such as loud background noise, intense room light, and excessive rest-time visits, are not currently measured. Other aspects of patients’ well-being, including facial expressions of pain and emotional states, mobility, and functional status4,5, are not captured in a continuous and granular manner and instead require self-reporting or repetitive observations by ICU nurses6,7. Self-report and manual observations suffer from subjectivity, poor recall, a limited number of administrations per day, and high staff workload. This lack of granular and continuous monitoring can prevent timely intervention strategies8,9,10,11,12,13.

With recent advancements in artificial intelligence (AI) and sensing, many researchers are exploring complex autonomous systems in real-world settings14. In the ICU, doctors must make life-saving decisions under strict time constraints and high uncertainty while synthesizing high volumes of complex physiological and clinical data. The assessment of patients’ response to therapy and acute illness, on the other hand, relies mainly on repetitive nursing assessments and is therefore limited in frequency and granularity. AI technology could assist not only in administering repetitive patient assessments in real time, but also in integrating and interpreting these data sources together with EHR data, thus potentially enabling more timely and targeted interventions15,16. AI in the critical care setting could reduce nurses’ workload, allowing them to spend time on more critical tasks, and could augment human decision-making by offering low-cost, high-capacity intelligent data processing.

In this study, we examined how pervasive sensing technology and AI can be used to monitor patients and their environment in the ICU. We used three wearable accelerometer sensors, a light sensor, a sound sensor, and a high-resolution camera to capture data on patients and their environment (Fig. 1). We applied computer vision and deep learning techniques to recognize the patient’s face, posture, facial action units, facial expressions, and head pose from video data. We also analyzed the video data to estimate visitation frequency by detecting the number of visitors or medical staff in the room. To complement the vision information for activity recognition, we analyzed data from wearable accelerometer sensors worn on the wrist, ankle, and arm. Additionally, we captured the room’s sound pressure levels and light intensity levels to examine their effect on patients’ sleep quality, assessed by the Freedman Sleep Questionnaire17. For recruited patients, we retrieved all available clinical and physiological information from the EHR (Table 1). For a pilot study, we prospectively recruited 22 critically ill patients with and without ICU delirium to determine whether the Intelligent ICU system can characterize differences in their functional status, pain, and environmental exposure.
The Confusion Assessment Method for the Intensive Care Unit (CAM-ICU)19 was administered daily as the gold standard for detecting delirium.

Figure 1

(a) Intelligent ICU uses pervasive sensing for collecting data on patients and their environment. The system includes wearable accelerometer sensors, a video monitoring system, a light sensor, and a sound sensor. (b) The Intelligent ICU information complements conventional ICU information. Pervasive information is provided by performing face detection, face recognition, facial action unit detection, head pose detection, facial expression recognition, posture recognition, extremity movement analysis, sound pressure level detection, light level detection, and visitation frequency detection. Face Detection Icon, Face Recognition Icon, Facial AU Detection Icon, Facial Expression Recognition Icon, and Head Pose Detection Icon: © iStock.com/bitontawan.

Table 1 Cohort characteristics.

Pilot Study Results

We recruited 22 patients in the surgical ICU at the quaternary academic University of Florida Health Hospital (Table 1, Supplementary Fig. S1), and five were excluded based on the study’s exclusion criteria. Nine patients (53%) were diagnosed with ICU delirium by the daily CAM-ICU assessment for at least one day during the enrollment period. Delirious patients (defined as patients who were delirious throughout their enrollment period; four patients) and non-delirious patients (defined as patients who were not delirious on any day during their enrollment period; eight patients) did not differ significantly in baseline characteristics, except for the number of comorbidities (Table 1). All delirious patients were identified as the hyperactive subtype using the Delirium Motor Subtyping Scale (DMSS-4)18. During the enrollment period, data were collected continuously for up to seven days from each patient. We collected 33,903,380 video frames visibly containing a face, 16,123,925 video frames of patient posture, and 3,203,153 video frames of patient facial expressions. We also collected 1,008 hours of accelerometer data, 768 hours of sound pressure level data, 456 hours of light intensity level data, and 1,416 hours of physiological data. Occasionally, one or more sensors were removed at the patient’s request, during bathing, or during clinical routines. For training our deep learning models on ground truth labels, we annotated 65,000 video frames containing individual faces and 75,697 patient posture video frames. All model training and testing were performed on an NVIDIA Titan X Pascal Graphical Processing Unit (GPU).

Face detection

To detect all individual faces in each video frame (including the patient, visitors, and clinical staff), we used a pretrained Multi-Task Cascaded Convolutional Network (MTCNN) for joint face detection and alignment19. Face detection was evaluated on 65,000 annotated frames containing at least one individual face, resulting in a mean average precision (mAP) of 0.94.

Patient face recognition

To recognize the patient’s face among the detected faces, we implemented the FaceNet algorithm20 as an Inception-ResNet v1 model21. The algorithm achieved an overall mAP of 0.80, with a higher mAP of 0.82 among non-delirious patients compared with 0.75 among delirious patients.

Patient facial action unit detection

We detected 15 facial action units (AUs) from 3,203,153 video frames using the pretrained OpenFace deep neural network22. The 15 AUs included six binary AUs (0 = absent, 1 = present) and 12 intensity-coding AUs (0 = trace, 5 = maximum), with three AUs reported as both binary and intensity (Supplementary Table S1). Successful detection, defined as the toolbox detecting both the face and its facial AUs, was achieved for 2,246,288 of the 3,203,153 video frames (70.1%). The 15 detected AUs were compared between the delirious and non-delirious patients (Fig. 2). All AUs were significantly different between the two groups (p-value < 0.01).
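To illustrate how such per-frame AU outputs can be summarized per patient and compared between groups, a minimal sketch is shown below. It assumes the CSV layout of the OpenFace 2.x FeatureExtraction tool (a success flag plus intensity columns such as AU04_r); the file paths, group folders, and choice of a Mann-Whitney test are illustrative assumptions rather than the study’s exact procedure.

```python
# Illustrative sketch: summarize OpenFace AU output per patient and compare groups.
# Column names (e.g. "success", "AU04_r") follow the OpenFace 2.x CSV format;
# the directory layout (one CSV per patient, split by group) is hypothetical.
import glob
import pandas as pd
from scipy.stats import mannwhitneyu

def mean_au_intensity(csv_path, au="AU04_r"):
    df = pd.read_csv(csv_path)
    df.columns = df.columns.str.strip()      # some OpenFace versions pad column names with spaces
    df = df[df["success"] == 1]              # keep only frames where the face was tracked
    return df[au].mean()

delirious = [mean_au_intensity(p) for p in glob.glob("openface/delirious/*.csv")]
non_delirious = [mean_au_intensity(p) for p in glob.glob("openface/non_delirious/*.csv")]

stat, p_value = mannwhitneyu(delirious, non_delirious, alternative="two-sided")
print(f"AU04 (brow lowerer) intensity: U = {stat:.1f}, p = {p_value:.4f}")
```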

Figure 2

(a) Distribution of intensity-coding facial Action Units (AUs) among delirious and non-delirious patients, shown as boxplots where the middle line represents the median and the lower and upper lines represent the 25th and 75th percentiles, respectively. Facial AUs are coded from 0 (absence of the facial AU) to 5 (maximum intensity of the facial AU); (b) Percentage of frames with each binary-coding facial AU present among delirious and non-delirious patients during their enrollment period. Binary-coding facial AUs are coded either 0 (absent) or 1 (present). This bar plot shows how often a certain action unit is observed in all recorded video frames, as a percentage with standard error bars. In (a,b), *indicates a statistically significant difference between the delirious and non-delirious groups (p-value < 0.001).

Patient facial expression recognition

We used the Facial Action Coding System (FACS) to identify common facial expressions from their constituent AUs (Supplementary Table S2)23,24. Eight common expressions were considered: pain, happiness, sadness, surprise, anger, fear, disgust, and contempt. The occurrence rate of facial expressions was compared between the delirious and non-delirious patients (Fig. 3). We showed that the distributions of several facial AUs differ between the delirious and non-delirious groups. These differences point to differences in the affect of delirious and non-delirious patients. For instance, the brow lowerer AU signals negative valence25 and was stronger among delirious patients than non-delirious patients (Fig. 2). Facial expression patterns can also potentially be used to predict deterioration risk in patients26. Delirious patients showed suppressed expression of seven of the eight emotions. All facial expressions except anger had significantly different distributions between the delirious and non-delirious patients (p-value < 0.001).

Figure 3

Percentage of frames with each facial expression present among the delirious and non-delirious patients, calculated based on constituent AUs (Supplementary Table S2). This bar plot shows how often a certain expression is observed in all recorded video frames, as a percentage with standard error bars. *Indicates a statistically significant difference between the delirious and non-delirious groups (p-value < 0.001).

Head pose detection

We detected three head pose angles (yaw, pitch, and roll) using the pretrained OpenFace deep neural network tool22. Head rotation in radians around the Cartesian axes was compared between the delirious and non-delirious patients, using a left-handed positive sign convention with the camera as the origin. Delirious patients exhibited significantly less variation in roll (in-plane rotation), pitch (up-and-down movement), and yaw (side-to-side movement) compared to the non-delirious patients (Fig. 4). The wider range of head poses in non-delirious patients (Fig. 4) might result from more communication and interaction with the surrounding environment.

Figure 4

(a) Distribution of head poses among delirious and non-delirious patients during their enrollment days, shown as boxplots. Pitch, yaw, and roll describe the orientation of the head in its three degrees of freedom. Pitch is rotation around the right-left axis, up and down, as in nodding the head “Yes”. Yaw is rotation around the inferior-superior axis, side to side, as in shaking the head “No”. Roll is rotation around the anterior-posterior axis, tilting the head toward the shoulder, as in a “Maybe” gesture; (b) Percentage of frames spent in each posture among delirious and non-delirious patients, shown with standard error bars; (c) Percentage of frames with visitors present in the room (disruption) for delirious and non-delirious patients during the day (7 AM–7 PM) and during the night (7 PM–7 AM), shown with standard error bars. In (a–c), *indicates a statistically significant difference between the delirious and non-delirious groups (p-value < 0.001).

Posture recognition

To recognize patient posture, we used a multi-person pose estimation model27 to localize anatomical key-points of joints and limbs. We then used the lengths of body limbs and their relative angles as features for recognizing lying in bed, standing, sitting in bed, and sitting on chair. We obtained an F1 score of 0.94 for posture recognition. The highest misclassification rate (11.3%) was obtained for sitting on chair (misclassified as standing). The individual classification accuracies were: lying = 94.5%, sitting on chair = 92.9%, and standing = 83.8% (Supplementary Table S3). Delirious patients spent significantly more time lying in bed and sitting on chair compared to non-delirious patients (p-value < 0.05 for all four postures, Fig. 4).

Extremity movement analysis

We analyzed the data from the three accelerometer sensors worn on the patient’s wrist, ankle, and arm, and compared the results between delirious and non-delirious patients. For feature calculation, we considered daytime as 7 AM to 7 PM and nighttime as 7 PM to 7 AM, based on nursing shift transitions. Figure 5 shows the smoothed accelerometer signal averaged over all delirious and all non-delirious patients. We also derived 15 features from each accelerometer (Table 2), resulting in 45 features in total for the wrist, ankle, and arm sensors. We compared the extracted features of delirious and non-delirious patients for the wrist-worn, arm-worn, and ankle-worn sensors (Table 2, Supplementary Tables S4 and S5). Delirious patients had higher movement activity for the wrist and the lower extremity (ankle) and lower movement activity for the upper extremity (arm) during the entire 24-hour cycle, daytime (7 AM–7 PM), and nighttime (7 PM–7 AM). The 10-hour window with maximum activity intensity showed different levels of activity between the two patient groups. However, activity in the 5-hour window with the lowest activity intensity was not significantly different, possibly due to generally low activity levels in the ICU. The number of immobile minutes during the day and during the night also differed between the two groups, with fewer immobile minutes detected for the delirious patients, hinting at restlessness and lower sleep quality. The extremity movement features did not show significant differences for the arm and ankle sensors, which might stem from the overall limited body movement of ICU patients.

Figure 5

Delirious and non-delirious group comparisons for (a–e) sensor data and (f–j) physiological data. Sensor data include accelerometer data recorded on the wrist, arm, and ankle, light intensity levels recorded using an actigraph capable of recording light intensity, and sound pressure levels recorded using an iPod on the wall behind the patient’s bed. Physiological data include heart rate, systolic and diastolic blood pressure, respiration rate, and oxygen saturation, and were collected at a resolution of approximately once per hour as part of the patient’s care. The graphs show the smoothed average value per group, with the transparent band around each average line showing the 95% confidence interval. The bar at the top of each panel shows night (black) and day (white) time.

Table 2 Movement features for the wrist, compared between the delirious and non-delirious groups.

Visitation frequency

The pose estimation model was also used on the video data to identify the number of individuals present in the room at any given time, including visitors and clinical staff. Delirious patients on average had fewer visitor disruptions during the day, but more disruptions during the night (Fig. 4).

Room sound pressure levels and light intensity

The sound pressure levels for delirious patients’ rooms during the night were on average higher than the sound pressure levels of non-delirious patients’ rooms (Fig. 5). Average nighttime sound pressure levels were significantly different between the delirious and non-delirious patients (p-value < 0.05). Delirious patients on average experienced higher light intensity during the evening hours, as can be seen in Fig. 5. Average nighttime light intensity levels were significantly different between the delirious and non-delirious patients (p-value < 0.05).

Sleep characteristics

We examined the Freedman Sleep Questionnaire17 responses of the delirious and non-delirious patients to compare their sleep patterns. While the median overall quality of sleep in the ICU and the effect of acoustic disruptions and visitations during the night differed between the delirious and non-delirious groups, these differences were not statistically significant. However, delirious patients reported a lower overall ability to fall asleep compared to non-delirious patients and were more likely to find the lighting disruptive during the night (p-value = 0.01 and p-value = 0.04, respectively; Supplementary Fig. S2).

Physiological and EHR data

Patients’ demographics and primary diagnoses were not significantly different between the delirious and non-delirious groups (Table 1). Delirious patients had higher average heart rate, oxygen saturation, and respiration rate, a potential sign of respiratory distress and agitation. The systolic and diastolic blood pressure of delirious patients were lower than those of non-delirious patients during the evenings (Fig. 5). All delirious patients received continuous enteral feeding orders and were fed throughout the night, while 50% of non-delirious patients had enteral feeding orders during their enrollment days.

Discussion

In this study, we showed the feasibility of pervasive monitoring of patients in the ICU. This is the first study to develop an autonomous system for patient monitoring in the ICU. We performed face detection, face recognition, facial action unit detection, head pose detection, facial expression recognition, posture recognition, extremity movement analysis, sound pressure level detection, light level detection, and visitation frequency detection in the ICU. As an example, we evaluated our system for characterizing patient and ambient factors relevant to delirium syndrome3,28,29. Such a system can potentially be used for detecting activity and facial expression patterns in patients, and for quantifying modifiable environmental factors such as noise and light in real time. This system can be built for an estimated cost of less than $300 per ICU room, a relatively low cost compared to daily ICU costs of thousands of dollars per patient. After proper cleaning procedures, the same devices can also be reused for other patients, further reducing the amortized cost per patient in the long term.

To the best of our knowledge, this is the first study to continuously assess critically ill patients’ affect and emotions using AI. AI introduces the ability to combine these features for autonomous detection of delirium in real time and could offer a paradigm shift in diagnosing and monitoring mood and behavior in the hospital setting.

Our system also uniquely offers autonomous detection of patients’ activity patterns by applying deep learning to sensor data obtained from video and accelerometers. This previously unattainable information can optimize care by providing more comprehensive data on patient status through accurate and granular quantification of movement. While previous work has used video recordings in the ICU to detect patient status30, it did not measure the intensity of patients’ physical activity. The combined knowledge of patients’ functional status from video data and their physical activity intensity from movement analysis can help health practitioners better decide on rehabilitation and assisted mobility needs.

Our collected data hint at several interesting observations, including more significant disruption of the circadian rhythm of physical activity in delirious patients, as confirmed by other studies31,32,33. To concur with existing practice in the literature, we used standardized tools accepted as gold standards, including the Freedman sleep questionnaire and the CAM-ICU assessment. In contrast to previous studies, we collected and analyzed information at a granularity and with an accuracy that is impossible to obtain using conventional methods such as questionnaires. Circadian rhythm, which is important for many health regulatory processes in the body, is generally severely disrupted in ICU patients31,32,33. Delirious patients reported a lower overall ability to fall asleep in the ICU. Delirious patients also reported a higher degree of disruption from lighting compared to non-delirious patients. Delirious patients’ noise perception was not statistically significantly different from the non-delirious group, even though the average sound pressure level in the delirious patients’ rooms during sleep time was higher, almost equivalent to street traffic noise. This could possibly be due to the affected hearing and visual perception of delirious patients. As shown in other studies on noise and light exposure in the ICU, both delirious and non-delirious patients experienced sound pressure levels well above the World Health Organization (WHO) recommendation of 35 dB, with a maximum of 40 dB overnight (Fig. 5)34,35. Continuous measurement of noise and light exposure in ICU patients allows for implementing real-time interventions to improve patients’ sleep hygiene.

Using our vision techniques, we were also able to observe differences in visitor disruptions between delirious and non-delirious patients. More disruptions were observed during the day for non-delirious patients, which possibly points to their greater capacity for interaction with others, including family caregivers and visitors. This may both contribute to and stem from delirium: interactions with others can have reorienting effects and reduce the risk of delirium, while delirious patients who do not engage in interactions might receive shorter visits from both family and clinical staff. This was further corroborated by the smaller head pose variation among delirious patients compared to non-delirious patients. The ICU environment can instill a sense of loneliness, as reported by many delirious patients36,37,38,39. Visitations during the day may help prevent delirium because of the reorienting effect of conversation and the sense of relief that seeing recognizable faces invokes. On the other hand, isolation precautions have been reported to up to double the rate of delirium40,41,42. Delirious patients also had more disruptions during the night, possibly due to their more severe condition and more frequent visits from providers, which could aggravate their sleep disruption.

Using our extremity movement data, we showed that patients’ activity at the wrist was significantly different between delirious and non-delirious patients, but this was not the case for the arm and ankle. It should be noted that the ideal location for wearing an actigraphy device is not necessarily the same for different clinical conditions43,44,45. This insight can potentially be used to reduce the number of on-body sensors required for monitoring specific conditions such as delirium. To the best of our knowledge, this is the first study to examine the wear location of actigraphy devices for characterizing activity patterns in delirious patients. The wrist-based features can potentially be used in delirium detection, as well as in tracking the efficacy of mobility and rehabilitation interventions.

While our system shows great promise for future ICU patient monitoring applications, there are several limitations that need to be considered. Our aim of simultaneously incorporating several noninvasive monitoring techniques for extended periods in the ICU environment requires considerable effort in data collection, data annotation, and data analysis. As a pilot study to assess the feasibility of the system, we recruited a small number of patients. Nonetheless, different video frames of the same patient show large variation in lighting, occlusion, visiting staff, patient posture, patient expressions, and medical equipment in the highly dynamic and fast-paced environment of the ICU. Our analyses were based on recording units (frames) and days of data, which were carefully isolated between the training and test sets. However, the relatively small number of patients could still affect some of the results and their variability once examined in a larger cohort. For example, we did not detect any significant differences between movement features derived from the ankle and arm sensors, but a larger sample size and more diverse features might show higher discriminative power. A larger sample size would also allow us to better customize our deep learning vision models to the ICU environment.

Another limitation of the study is that we did not consider the medications that patients were receiving during their stay in the ICU. Sedatives, analgesics, and anticholinergic medications may affect patients’ sleep, pain, activity patterns, and delirium. To correctly account for these factors, the drugs’ dosages and half-lives need to be extracted and their continuous effect on each individual patient needs to be considered.

Implementing our system in a real ICU also poses many challenges, resulting in several limitations in our study. One major issue is the crowded scene in the ICU room, which leads to patient face occlusion and the inability to detect expressions at all times. One potential solution is deploying multiple cameras with different view angles. The patient’s face and body might also be obstructed by ventilation devices or bandages, or simply blocked from view by blankets. Another issue arises from the multitude of medical devices on the patient, making it difficult to use wearable sensors on all patients. A related issue with the wearable sensors is that they might get lost, or clinical staff might not place them in the right location after removing them for a bath.

Another challenge was inaccurate environmental data collection, stemming from the physical properties of light waves. While we made every effort to place the light sensor near the patient’s head to record the same light exposure that the patient experiences, individuals or medical equipment might block the light sensor. Again, deploying multiple light sensors, as well as developing vision-based light analysis modules, can alleviate this problem.

Privacy can also be a major concern for any system that uses video monitoring. To develop and validate our system, we needed to annotate the video frames to establish ground truth for every event (face recognition, posture detection, and disruptions). However, a future operational version could rely on real-time, online vision analysis without storing any video data. This approach would also reduce storage requirements.

In summary, as a proof of concept, we demonstrated the feasibility of pervasive monitoring of patients in the ICU. We expect that future similar systems can assist in administering repetitive patient assessments in real time, potentially enabling more accurate prediction and detection of negative events and more timely interventions, reducing nursing workload, and opening new avenues for characterizing critical care conditions at a much more granular level.

Materials and Methods

This study was carried out in surgical ICUs at the University of Florida Health Hospital, Gainesville, Florida. It was approved by the University of Florida Institutional Review Board (IRB 201400546), and all methods were performed in accordance with the relevant guidelines and regulations. Written informed consent was obtained from patients and/or their surrogates before enrollment in the study. All adult patients 18 years or older who were anticipated to stay longer than 48 hours in the ICU and were able to wear an ActiGraph were eligible for enrollment. For each patient, camera and accelerometer data, along with survey data, were collected for up to one week, or until discharge or transfer from the ICU, whichever occurred first. EHR data were collected for the duration of the patient’s hospitalization. All historical diagnosis and procedure data were also retrieved for analysis.

Among consenting patients, two withdrew before data collection started, one was excluded after being transferred from the ICU before data collection commenced, and two stayed in the ICU for less than a complete day. Five patients were not considered in the delirium analysis because they had both delirium and non-delirium days.

For characterizing delirium in terms of activity, facial expressions, environmental light and sound pressure levels, and disruption, delirious patients were defined as those who were delirious throughout their enrollment period, as detected by the CAM-ICU test. DMSS-4 and Memorial Delirium Assessment Scale (MDAS)46 tests were used to detect the motoric subtype and severity of delirium, respectively. To better observe the differences between the delirious and non-delirious patients, we only compared the data between patients who were delirious throughout their enrollment period and those who were non-delirious throughout their enrollment period.

Data acquisition

The pervasive sensing system for data acquisition included (1) a high-resolution, wide-field-of-view camera, (2) three wearable accelerometer sensors, (3) a light sensor, (4) a microphone for capturing sound pressure levels, and (5) a secure local computer. A user-friendly touchscreen interface allowed nurses and caregivers to stop data collection at any time. Data were captured on the local secure computer throughout the patient’s enrollment period and transferred to a secure server for analysis upon patient discharge.

Vision

We captured video using a camera with a 90° diagonal field of view and 10× optical zoom for zooming in on the patient’s face, recording at 15 frames per second (fps). The camera was placed against the wall, facing the patient. We took several privacy precautions with respect to the video data, including posting clearly visible “recording in session” signs. A sliding lens cover was provided as a quick privacy alternative. No audio information beyond aggregate sound pressure levels was collected. A simple user interface also allowed nurses and family caregivers to stop recording at any time, or to delete any scenes if needed.

Wearable sensors

Wearable accelerometer sensors provide information that complements vision information for activity recognition in the ICU. We used three ActiGraph GT3X devices (ActiGraph, LLC, Pensacola, Florida) to record patients’ activity intensity throughout their enrollment period. We placed one GT3X device on the patient’s dominant wrist, one on the dominant arm, and another on the dominant ankle to examine both lower and upper body movements. For patients with medical equipment on the dominant wrist, arm, or ankle, or those who did not wish to wear the device at any of these positions, we placed the GT3X device on the opposite side or removed it altogether if necessary. Nurses were instructed to remove the devices for bathing and medical procedures, if necessary, and to replace them afterward. The GT3X sensor weighs less than 19 grams and can record data for up to 25 days. It records activity intensity in the form of activity counts47. We recorded data at a 100 Hz sampling rate and used 1-minute activity counts in our analysis.

Sound and light

To capture the effect of environment disruptions on sleep quality, we recorded light intensity and sound pressure levels in the room throughout the patient’s enrollment period.

Physiological signals and EHR data

We retrieved electronic health record (EHR) data using the University of Florida Health (UFH) Integrated Data Repository as an honest broker. EHR data included physiological signals recorded in the ICU via bedside monitors, including heart rate, temperature, systolic and diastolic blood pressure, respiratory rate, and oxygen saturation. Additionally, we retrieved EHR information on demographics, admission information, comorbidities, severity scores, pain and CAM-ICU scores, laboratory results, medications, procedure and diagnosis codes, and enteral feeding status.

Questionnaires

We administered daily questionnaires to assess patients’ sleep quality during their enrollment, using the Freedman sleep questionnaire as the gold standard. We also administered the CAM-ICU delirium assessment daily. For delirious patients, we administered the DMSS-4 and MDAS tests daily to identify the subtype and severity of delirium. For patients who were asleep or unavailable because of clinical procedures, the questionnaires were administered as soon as the patient became available, and before 3 PM.

Analysis

Once data were captured by our pervasive sensing system, they were analyzed to examine different aspects of patient status and environment. We used vision information for face detection, face recognition, facial action unit detection, head pose detection, facial expression recognition, posture recognition, and visitation frequency detection. We employed several deep neural network algorithms to analyze video data, and we used statistical analysis for the accelerometer sensor data and to compare environmental factors and activity intensity between the delirious and non-delirious patients. To describe numerical variables, we used the mean and standard deviation for approximately normal distributions and the median and interquartile range (25th, 75th percentiles) when normality assumptions were not satisfied. Differences between groups were analyzed using the Student’s t-test, Kruskal-Wallis test, or Mann-Whitney-Wilcoxon test for continuous variables, and the chi-square or Fisher’s exact test for categorical variables, as appropriate. We used Python 2.7, TensorFlow 1.1.0, OpenCV 3.2.0, and Caffe 1.0 to implement the vision algorithms, and scikit-learn 0.18.1 and R 3.4.1 for conventional machine learning and statistical analysis.
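As a concrete illustration of this statistical toolkit, the sketch below applies the corresponding SciPy tests to placeholder data; the arrays and the 2 × 2 contingency table are hypothetical, and in the study the choice of test followed the normality and variable-type criteria described above.

```python
# Minimal sketch of the group-comparison tests described above, using SciPy.
# All data below are hypothetical placeholders.
import numpy as np
from scipy import stats

delirious = np.array([3.1, 4.2, 2.8, 5.0])          # e.g. a movement feature per patient
non_delirious = np.array([1.9, 2.2, 3.0, 2.5, 1.7])

# Continuous variable, approximately normal: Student's t-test
t_stat, p_t = stats.ttest_ind(delirious, non_delirious)

# Continuous variable, non-normal: Mann-Whitney-Wilcoxon test
u_stat, p_u = stats.mannwhitneyu(delirious, non_delirious, alternative="two-sided")

# Categorical variable: chi-square on a 2x2 contingency table,
# or Fisher's exact test when expected counts are small.
table = np.array([[4, 0],        # e.g. enteral feeding yes/no, by group
                  [4, 4]])
chi2, p_chi2, dof, expected = stats.chi2_contingency(table)
odds_ratio, p_fisher = stats.fisher_exact(table)

print(p_t, p_u, p_chi2, p_fisher)
```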

Face detection

As a first step, we detected all individuals present in the room. We used a Multi-Task Cascaded Convolutional Network (MTCNN) for joint face detection and alignment to detect individuals in each video frame. This framework employs a cascaded architecture with three stages of deep convolutional neural networks (CNN) to predict face and landmark locations in a coarse-to-fine manner (Supplementary Fig. S3). We evaluated the accuracy of the MTCNN model against ground truth provided by an expert annotator. A total of 65,000 video frames were annotated by delineating a bounding box around each individual.
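A minimal sketch of this detection step is shown below; it uses the facenet-pytorch reimplementation of MTCNN purely for illustration (the study’s own pipeline was built on the TensorFlow/Caffe stack listed above), and the frame path is a hypothetical placeholder.

```python
# Illustrative sketch of MTCNN face detection on a single extracted video frame,
# using the facenet-pytorch reimplementation of MTCNN.
from PIL import Image
from facenet_pytorch import MTCNN

detector = MTCNN(keep_all=True)               # keep every detected face, not just the largest
frame = Image.open("frames/room_000001.jpg")  # hypothetical frame from the room camera

boxes, probs = detector.detect(frame)         # boxes: N x 4 [x1, y1, x2, y2]; probs: confidences
if boxes is not None:
    for box, prob in zip(boxes, probs):
        print(f"face at {box.round().tolist()} with confidence {prob:.2f}")
```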

Face recognition

After individual faces were detected, we performed face recognition to identify the patient in each video frame. This step is necessary since at any given moment several individuals can be present in the room, including the patient, nurses, physicians, and visitors. To perform face recognition, we implemented the FaceNet algorithm, which consists of an Inception-ResNet v1 model. First, we extracted 7 seconds of still images at 15 fps containing the patient’s face as training data. The training data were passed through the face detection pipeline, so the input to the FaceNet model was the set of aligned face images obtained from MTCNN. The trained classifier for each patient was tested on 6,400 randomly selected frames of the same patient, which contained both the patient and non-patients. We evaluated the accuracy of the FaceNet model for each patient. If a patient was recognized with a probability of 0.9 or higher, it was reported as a positive recognition. The pipeline of the patient face recognition system is shown in Fig. 6.
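The sketch below illustrates this recognition step under stated assumptions: aligned face crops are embedded with a pretrained Inception-ResNet v1 from the facenet-pytorch package (a stand-in for the study’s trained model), a k-nearest-neighbor classifier is fit on patient versus non-patient embeddings, and recognition is counted as positive at the 0.9 probability threshold mentioned above. The random tensors merely stand in for real aligned 160 × 160 face crops.

```python
# Sketch of FaceNet-style patient recognition: embed aligned faces, then classify
# with KNN and apply the 0.9 probability threshold. Training tensors are random
# stand-ins for real MTCNN-aligned face crops of the patient and other people.
import torch
from facenet_pytorch import InceptionResnetV1
from sklearn.neighbors import KNeighborsClassifier

embedder = InceptionResnetV1(pretrained="vggface2").eval()

def embed(aligned_faces):                        # aligned_faces: float tensor (N, 3, 160, 160)
    with torch.no_grad():
        return embedder(aligned_faces).numpy()   # 512-dimensional embeddings

train_X = embed(torch.randn(40, 3, 160, 160))    # placeholder aligned crops
train_y = [1] * 20 + [0] * 20                    # 1 = patient, 0 = non-patient

knn = KNeighborsClassifier(n_neighbors=5).fit(train_X, train_y)

test_X = embed(torch.randn(4, 3, 160, 160))
patient_prob = knn.predict_proba(test_X)[:, 1]   # probability of the "patient" class
print(patient_prob >= 0.9)                       # positive recognition at the 0.9 threshold
```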

Figure 6

(a) The pipeline of the patient recognition system begins with alignment of faces in the image using MTCNN (Multi-Task Cascaded CNN). The aligned images are then provided as input to the FaceNet network, which extracts features using a pretrained Inception-ResNet v1 model, performs L2 normalization on them, and stores them as feature embeddings. These embeddings are used as input for a k-nearest neighbor (KNN) classifier to identify the patient. (b) The pipeline of the posture recognition system includes a two-branch, three-stage CNN and a KNN classifier. At each stage of the CNN, branch 1 predicts the confidence maps for the different body joints, and branch 2 predicts the part affinity fields for the limbs. These predictions are combined at the end of a stage and refined over the subsequent stage. After stage 3, the part affinity fields of the limbs are used to extract the lengths and angles of the body limbs. Any missing values are imputed using KNN imputation, and a pretrained KNN classifier is used to detect posture from the extracted features.

Facial action unit and expression recognition

For each video frame, facial action units (AUs) were obtained from the OpenFace toolbox and were used to detect eight common facial expressions: pain, happiness, sadness, surprise, anger, fear, disgust, and contempt (Supplementary Table S2). Facial expressions were only considered during daytime (7 AM–7 PM), when there is sufficient light in the room. Based on the FACS formulas (Supplementary Table S2), the pain expression was identified as a combination of the following AUs: AU4 (brow lowerer), AU6 (cheek raiser), AU7 (lid tightener), AU9 (nose wrinkler), AU10 (upper lip raiser), and AU43 (eyes closed). Other facial expressions were identified similarly. Once an expression was detected, we computed how often that expression $e_i$ was observed, denoted as the expression frequency $f_i$. The relative expression frequency $f_i$ is calculated as in equation (1), where $N_i$ refers to the number of frames in which expression $e_i$ was observed and $N$ refers to the total number of frames.

$${f}_{i}=\frac{{N}_{i}}{N}$$
(1)
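As a worked example of equation (1), the sketch below maps per-frame AU detections to the pain expression using the constituent AUs listed above (AU4, AU6, AU7, AU9, AU10, AU43) and computes its relative frequency; the per-frame AU sets and the requirement that all constituent AUs be present are illustrative assumptions rather than the exact FACS rule used in the study.

```python
# Worked sketch of equation (1): count frames in which an expression's constituent
# AUs are present and divide by the total number of frames. The pain AUs come from
# the text; the frame data and the "all AUs present" rule are illustrative.
PAIN_AUS = {"AU4", "AU6", "AU7", "AU9", "AU10", "AU43"}

def has_expression(frame_aus, constituent_aus=PAIN_AUS):
    """frame_aus: set of AU labels detected as present in one frame."""
    return constituent_aus.issubset(frame_aus)

# Hypothetical per-frame AU detections for a short clip.
frames = [
    {"AU4", "AU6", "AU7", "AU9", "AU10", "AU43"},   # pain expression present
    {"AU6", "AU12"},                                # smile-related AUs only
    {"AU4", "AU7"},                                 # partial combination only
]

N_i = sum(has_expression(f) for f in frames)        # frames showing the expression
f_i = N_i / len(frames)                             # relative expression frequency
print(f"pain observed in {f_i:.0%} of frames")      # -> 33%
```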

Posture classification

After detecting and recognizing the patient’s face, we localized anatomical key-points of joints and limbs using a real-time multi-person 2D pose estimation model27 with part affinity fields (Fig. 6). This allowed us to recognize poselets, which describe a particular part of a posture under a given viewpoint48. The part affinity fields are 2D vector fields that contain information about the location and direction of limbs with respect to body joints. Our pose detection model consisted of two fully convolutional network (FCN)49 branches, where one branch detects the locations of the joints and the other detects the association of those body joints as limbs. Identified poselets were provided to a k-nearest neighbor (KNN) classifier to identify the full posture. To train the model on a balanced dataset, we augmented ICU patient data with scripted data (Appendix B). We considered four main posture classes: lying in bed, standing, sitting in bed, and sitting on chair. Several ICU functional and mobility scales are based on evaluating patients’ ability to perform these activities50,51.
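A minimal sketch of this classification step is shown below, under stated assumptions: limb lengths and angles are computed from 2D keypoints, missing values from occluded joints are imputed with a KNN imputer, and a k-nearest-neighbor classifier assigns one of the four postures. The joint names, keypoint coordinates, and training data are placeholders rather than the study’s annotated frames.

```python
# Sketch of posture classification from 2D keypoints: limb lengths and angles as
# features, KNN imputation for occluded joints, and a KNN classifier over the four
# posture classes. All keypoints, labels, and training rows are placeholders.
import numpy as np
from sklearn.impute import KNNImputer
from sklearn.neighbors import KNeighborsClassifier

LIMBS = [("shoulder", "elbow"), ("elbow", "wrist"), ("hip", "knee"), ("knee", "ankle")]

def limb_features(keypoints):
    """keypoints: dict joint -> (x, y) or None; returns [length, angle] per limb."""
    feats = []
    for a, b in LIMBS:
        if keypoints.get(a) is None or keypoints.get(b) is None:
            feats += [np.nan, np.nan]                    # occluded limb, imputed later
        else:
            v = np.subtract(keypoints[b], keypoints[a])
            feats += [np.linalg.norm(v), np.arctan2(v[1], v[0])]
    return feats

POSTURES = ["lying in bed", "sitting in bed", "sitting on chair", "standing"]

# Placeholder training set: one feature row per annotated frame.
X_train = np.random.rand(200, 2 * len(LIMBS))
y_train = np.random.choice(POSTURES, size=200)

imputer = KNNImputer(n_neighbors=3)
clf = KNeighborsClassifier(n_neighbors=5).fit(imputer.fit_transform(X_train), y_train)

x = limb_features({"shoulder": (0.40, 0.30), "elbow": (0.45, 0.42), "wrist": None,
                   "hip": (0.42, 0.55), "knee": (0.46, 0.70), "ankle": (0.47, 0.85)})
print(clf.predict(imputer.transform([x])))
```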

Extremity movement

We calculated several statistical features to summarize the accelerometer data obtained from the wrist, arm, and ankle. We used LOESS non-parametric regression to show the smoothed average activity intensity for each group over the course of the day. We compared fifteen features extracted from the accelerometer data of the delirious and non-delirious patients. These features fall into four groups that reflect different aspects of activity intensity patterns. The first group includes the mean and standard deviation (SD) of activity intensity for the whole day, during daytime, and during nighttime. The second group includes the activity intensity of the 10-hour window with the highest total activity (M10), the start time of the M10 window, the activity intensity of the 5-hour window with the lowest total activity (L5), the start time of the L5 window, and the relative amplitude, defined as (M10 − L5)/(M10 + L5). The third group consists of the root mean square of successive differences (RMSSD) and RMSSD/SD. The fourth group includes the number of immobile minutes during daytime and during nighttime, where immobile minutes are defined as minutes with an activity intensity of zero. The mean activity intensity reflects the amount of a patient’s activity, while the standard deviation reflects the amount of change in the patient’s movement. M10 and L5, and their corresponding start times, show when the most and least activity occur and whether these windows correspond to daytime and nighttime, respectively. RMSSD detects immediate changes in activity intensity, which can point to unintentional activity, since ICU patients do not normally perform fast actions for long periods, and RMSSD/SD normalizes these immediate changes by the overall variability captured by the standard deviation. The number of immobile minutes shows the proportion of time patients were inactive during the day (undesirable) and during the night (desirable).
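To make these feature definitions concrete, the sketch below computes them from one synthetic day of 1-minute activity counts; the simulated counts and the minute-indexing convention (minute 0 at midnight, daytime 7 AM to 7 PM) are illustrative assumptions.

```python
# Sketch of the activity-intensity features above, computed from one synthetic day
# of 1-minute activity counts (1,440 values). The simulated counts and the indexing
# convention (minute 0 = midnight, daytime = 7 AM-7 PM) are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
counts = rng.poisson(lam=30, size=1440).astype(float)
counts[rng.random(1440) < 0.3] = 0.0                      # inject immobile (zero-count) minutes

minutes = np.arange(1440)
day = (minutes >= 7 * 60) & (minutes < 19 * 60)           # 7 AM - 7 PM
night = ~day

def windowed_mean(x, width):
    """Mean activity of every contiguous window of `width` minutes."""
    cs = np.concatenate(([0.0], np.cumsum(x)))
    return (cs[width:] - cs[:-width]) / width

m10 = windowed_mean(counts, 10 * 60)                      # 10-hour windows
l5 = windowed_mean(counts, 5 * 60)                        # 5-hour windows

features = {
    "mean_24h": counts.mean(), "sd_24h": counts.std(),
    "mean_day": counts[day].mean(), "mean_night": counts[night].mean(),
    "M10": m10.max(), "M10_start_min": int(m10.argmax()),
    "L5": l5.min(), "L5_start_min": int(l5.argmin()),
    "RMSSD": np.sqrt(np.mean(np.diff(counts) ** 2)),
    "immobile_day": int((counts[day] == 0).sum()),
    "immobile_night": int((counts[night] == 0).sum()),
}
features["relative_amplitude"] = (features["M10"] - features["L5"]) / (features["M10"] + features["L5"])
features["RMSSD_over_SD"] = features["RMSSD"] / features["sd_24h"]
print(features)
```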

Sound and light

We placed an iPod with a sound pressure level recording application and a GT3X with a light sensor on the wall behind the patient’s bed to record sound pressure levels and light intensity levels (Supplementary Table S6).

Sound pressure level collection was performed using the built-in microphone of the iPod. Sound waves can be described as a sequence of pressure changes in the time domain, and the ear detects changes in sound pressure52. Sound pressure level (SPL) is a logarithmic measure of the effective pressure of a sound relative to a reference value and is measured in decibels (dB). Sound intensity is proportional to the square of the sound pressure, and SPL is defined as in equation (2).

$$SPL=\ln\left(\frac{p}{p_{0}}\right)\,\mathrm{Np}=2\log_{10}\left(\frac{p}{p_{0}}\right)\,\mathrm{B}=20\log_{10}\left(\frac{p}{p_{0}}\right)\,\mathrm{dB}$$
(2)
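As a numerical illustration of equation (2), the sketch below converts a calibrated pressure waveform to dB SPL relative to the standard 20 µPa reference; the synthetic tone and its amplitude are placeholders, and in practice a consumer microphone such as the iPod’s needs calibration against a reference meter before absolute levels are meaningful.

```python
# Worked sketch of equation (2): RMS sound pressure level in dB re 20 uPa.
# The 440 Hz test tone and its 0.02 Pa amplitude are illustrative placeholders.
import numpy as np

P0 = 20e-6                                    # reference sound pressure, 20 micropascals

def spl_db(pressure_pa):
    """Sound pressure level in dB for a pressure waveform given in pascals."""
    p_rms = np.sqrt(np.mean(np.square(pressure_pa)))
    return 20.0 * np.log10(p_rms / P0)

t = np.linspace(0, 1, 44100, endpoint=False)  # one second sampled at 44.1 kHz
p = 0.02 * np.sin(2 * np.pi * 440 * t)        # ~0.02 Pa amplitude tone
print(f"{spl_db(p):.1f} dB SPL")              # approximately 57 dB
```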