Quantifying arousal and awareness in altered states of consciousness using interpretable deep learning

Consciousness can be defined by two components: arousal (wakefulness) and awareness (subjective experience). However, neurophysiological consciousness metrics able to disentangle these components have not been reported. Here, we propose an explainable consciousness indicator (ECI) using deep learning to disentangle the components of consciousness. We employ electroencephalographic (EEG) responses to transcranial magnetic stimulation under various conditions, including sleep (n = 6), general anesthesia (n = 16), and severe brain injury (n = 34). We also test our framework using resting-state EEG under general anesthesia (n = 15) and severe brain injury (n = 34). ECI simultaneously quantifies arousal and awareness under physiological, pharmacological, and pathological conditions. In particular, ketamine-induced anesthesia and rapid eye movement sleep with low arousal and high awareness are clearly distinguished from other states. In addition, parietal regions appear most relevant for quantifying arousal and awareness. This indicator provides insights into the neural correlates of altered states of consciousness.


Supplementary Note 1: TMS-evoked responses according to TMS target sites
In our study, the sleep and anesthesia experiments targeted the parietal region (except for one participant under xenon-induced anesthesia, who was stimulated over the motor cortex). In patients, we stimulated the parietal and premotor regions; however, because there was sometimes no response, we also recorded TMS-EEG data at different locations. As a result, the TMS target sites in the UWS and MCS patients were over the premotor, motor, and parietal regions. To assess the variability introduced by different TMS target sites, we examined the TMS-evoked responses in patients with UWS across several target sites. Supplementary Figure S1a shows the TMS-evoked potentials at four TMS target sites (right premotor, left motor, left parietal, and right parietal regions) in four different UWS patients. At 15-100 ms, high potentials were observed at the TMS target site. Earlier TMS-evoked potentials were more local and more dependent on the TMS target site than later TMS-evoked components. At 200-400 ms, however, the effects of the TMS target site disappeared. We additionally compared the TMS-evoked responses when stimulating the left or right parietal region during NREM sleep with no subjective experience (Supplementary Figure S1b). As in UWS patients, the initial TMS-evoked responses were influenced by the TMS target site, whereas later responses were less affected. The spatiotemporal profiles at 200-400 ms were similar when stimulating the left or right parietal region. Therefore, we used the responses at 200-400 ms after TMS for the calculation of ECI.

Supplementary Note 2: Comparison of single-trial classification performance according to input types and classifiers
We first focused on TMS-EEG data collected during both sleep and wakefulness. During NREM sleep, TMS yielded a local and stereotypical positive-negative response, similar to the spontaneous oscillations of sleep slow waves. In contrast, during REM sleep and healthy wakefulness, TMS triggered a rapidly changing and spatially differentiated cortical response (Supplementary Figure S2). Two types of 3D input were considered for the classifier: (i) data representation 1 (DR1), spatio-spectral information, and (ii) data representation 2 (DR2), spatiotemporal information. Under sleep and healthy wakefulness, we classified TMS-EEG data (200-400 ms after TMS) into low or high states of arousal and awareness using a CNN.
LDA and SVM were applied as comparative classifiers. In the LOPO cross-validation, the classifier was trained on the data of five of the six participants and then tested on the data of the remaining (target) participant. Because there were six participants in the sleep dataset, this process was repeated six times (Supplementary Figure S3a). No information on the target participant was included in the training phase. The classification performance for arousal and awareness in the sleep experiment is listed in Supplementary Table S1. The classification accuracy differed depending on the classifier (LDA, SVM, and CNN) and the input type (DR1 and DR2) for both arousal (Chi-square(5, 30) = 19.49, p = 0.002) and awareness (Chi-square(5, 30) = 21.82, p < 0.001).
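The LOPO procedure described above can be sketched as follows. This is only an illustration of how the folds are formed, not the authors' code; the function name `lopo_splits` and the participant labels are hypothetical.

```python
def lopo_splits(participants):
    """Yield (train_ids, test_id) pairs, holding out one participant per fold."""
    for held_out in participants:
        # Every participant except the held-out target is used for training.
        train = [p for p in participants if p != held_out]
        yield train, held_out


# Hypothetical labels for the six participants of the sleep dataset.
sleep_participants = ["S01", "S02", "S03", "S04", "S05", "S06"]
folds = list(lopo_splits(sleep_participants))

for train_ids, test_id in folds:
    print(f"train on {train_ids}, test on {test_id}")
```

With six participants this yields six folds, each trained on five participants and tested on the sixth, so no data from the target participant enters training.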
The highest classification accuracy was 87.79 ± 2.74% (mean ± standard deviation) and 91.95 ± 5.19% using CNN based on DR2 for the arousal and awareness states, respectively (Supplementary Table S2). With the exception of SVM-DR2, the two-class classification accuracy of CNN-DR2 was significantly higher than that of the other classifiers and inputs. Further, although there were no statistical differences between CNN-DR2 and SVM-DR2 after multiple comparison corrections, the performance of CNN-DR2 was superior to that of SVM-DR2 for all six participants. Consequently, the classification performance using CNN based on DR2 was higher than that of the other input (i.e., DR1) and the other classifiers (i.e., LDA and SVM). Therefore, in all subsequent classifications, CNN-DR2 based on the LOPO approach was used.
We speculate that the data within the time interval 200-400 ms best discriminate between consciousness and unconsciousness in DoC patients (Chi-square(2) = 14.71, p < 0.001; see Supplementary respectively. There was no significant difference in accuracy among the 200 ms intervals in either sleep or anesthesia (sleep: Chi-square(2) = 0.50, p = 0.77; anesthesia: Chi-square(2) = 0.81, p = 0.67). We speculate that the uniformly high accuracy across time intervals in the sleep and anesthesia conditions arises because the TMS target sites are consistent across participants and between conditions (wake vs. sleep and wake vs. anesthesia). We believe that, in DoC patients, the time interval 0-200 ms does not provide satisfactory performance, possibly because early EEG responses to TMS may differ significantly across stimulation sites. Thus, we deliberately selected the second interval (200-400 ms) to make the proposed framework applicable across conditions (sleep, anesthesia, and DoC) and across different TMS target sites.

Supplementary Note 3: Similarity in TMS-evoked responses using the hierarchical clustering for transfer learning
The averaged TMS-evoked potentials at 200-400 ms after TMS were compared to investigate the relationship among the three domains. We calculated the domain similarity between the sleep, anesthesia, and DoC domains based on cosine distance (Supplementary Figure S4). A small cosine distance between two domains indicates high similarity between them. Low arousal states in the sleep (NREM and REM sleep) and anesthesia domains (ketamine, propofol, xenon) were close to each other. High arousal states in the sleep (healthy wakefulness before sleep) and anesthesia domains (wakefulness before anesthesia) were also close and formed a cluster. Similarly, low awareness states in the sleep (NREM) and anesthesia domains (propofol, xenon) were adjacent to each other. Further, high awareness states in the sleep domain (REM and healthy wakefulness) were also close to high awareness states in the anesthesia domain (ketamine and wakefulness before anesthesia). For arousal, the high state in the DoC domain (MCS and UWS patients) was surprisingly close to the low states, especially the low state of the anesthesia domain (ketamine, propofol, and xenon).
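As a minimal sketch of the similarity measure, cosine distance can be taken as one minus the cosine of the angle between two averaged response vectors. The vectors below are toy values, not study data, and flattening each averaged potential into a one-dimensional vector is an assumption made here for illustration.

```python
import math


def cosine_distance(u, v):
    """Return 1 - cos(angle between u and v); 0 means identical direction."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


# Toy averaged responses (hypothetical values, flattened to 1D).
states = {
    "state_A": [0.9, 0.1, -0.4],
    "state_B": [0.8, 0.2, -0.5],
    "state_C": [-0.7, 0.6, 0.3],
}

# Pairwise distance table that a hierarchical clustering step could consume.
names = sorted(states)
distances = {
    (a, b): cosine_distance(states[a], states[b])
    for i, a in enumerate(names)
    for b in names[i + 1:]
}
for pair, d in distances.items():
    print(pair, round(d, 3))
```

Because the measure depends only on the direction of the vectors, two states with the same response shape but different overall amplitude have a distance of zero.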

Supplementary Note 4: Comparison of classification performance for transfer learning
In TMS-EEG data, we classified arousal and awareness states into two classes (low versus high) using domain transfer learning (Table 2). When the LOPO cross-validation was applied in domain transfer learning, the classifier was trained on all data except those of one participant in the target domain, for whom ECI was then calculated. Thus, if the source domains were sleep (n = 6) and anesthesia (n = 16), and the target domain was sleep, then the data of 21 of the 22 participants were used for training, and ECI was calculated for the excluded participant. As there were six participants in the sleep dataset, this was repeated six times (Supplementary Table S4).
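The participant counts in this example can be checked with a short sketch of LOPO over pooled source domains. The function name `transfer_lopo_splits` and the participant labels are hypothetical; only the domain sizes (6 and 16) come from the text.

```python
def transfer_lopo_splits(domains, target):
    """Yield (train, held_out) pairs for LOPO with pooled source domains.

    domains: dict mapping a domain name to a list of participant ids.
    target:  name of the target domain whose participants are held out in turn.
    """
    pooled = [(name, pid) for name, ids in domains.items() for pid in ids]
    for pid in domains[target]:
        held_out = (target, pid)
        train = [item for item in pooled if item != held_out]
        yield train, held_out


# Hypothetical participant labels; sizes follow the text (sleep n = 6, anesthesia n = 16).
domains = {
    "sleep": [f"S{i:02d}" for i in range(1, 7)],
    "anesthesia": [f"A{i:02d}" for i in range(1, 17)],
}
folds = list(transfer_lopo_splits(domains, target="sleep"))
print(len(folds), "folds,", len(folds[0][0]), "training participants per fold")
```

Only target-domain participants are held out, so every anesthesia participant stays in the training set for all six folds.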
In resting-state EEG data (without TMS), we also classified arousal and awareness as low or high using domain transfer learning (Table 3). For arousal, the classification performance was higher when only anesthesia was used as the source domain in the training phase than when anesthesia was combined with DoC (t(14) = 7.242, p < 0.001). For awareness, however, there was no significant difference in classification accuracy between using only the anesthesia domain and using the combined anesthesia and DoC domains as the source domain (t(14) = 0.708, p = 0.490). In patients with DoC, only high arousal existed; therefore, the classifier overfitted when trained on the DoC domain only. In addition, there was no significant difference in classification accuracy for awareness between using only the DoC domain and using the DoC and anesthesia domains as source domains (t(29) = -0.322, p = 0.749). This was the same result as the classification performance of TMS-EEG data using domain transfer learning, given the absence of sleep data.

Supplementary Note 5: Range of ECI and PCI in arousal and awareness
The optimal cutoff for ECI aro and ECI awa was 0.5 in all conditions. During sleep, ECI aro ranged from 0.001 to 0.145 in NREM sleep, from 0.072 to 0.460 in REM sleep, and from 0.587 to 0.982 in healthy wakefulness. In addition, ECI awa ranged from 0.001 to 0.462 in NREM sleep, from 0.519 to 0.989 in REM sleep, and from 0.893 to 0.996 in healthy wakefulness. Under anesthesia, ECI aro ranged from 0.571 to 0.945 for the high state (wakefulness) and from 0.036 to 0.394 for the low state (ketamine-, propofol-, and xenon-induced anesthesia). Similarly, ECI awa ranged from 0.568 to 0.976 for the high state (wakefulness and ketamine-induced anesthesia) and from 0.030 to 0.466 for the low state (propofol- and xenon-induced anesthesia). Finally, in MCS patients, ECI aro ranged from 0.573 to 0.995 and ECI awa ranged from 0.551 to 0.930. In UWS patients, ECI aro ranged from 0.527 to 0.968 and ECI awa ranged from 0.062 to 0.473.
Consequently, low and high states were perfectly distinguished by an optimal cutoff of 0.5 in both ECI aro and ECI awa.
For ECI using resting-state EEG data, the optimal cutoff was 0.5, as with ECI using TMS-EEG data. Under anesthesia, ECI aro ranged from 0.771 to 0.993 for the high state (wakefulness) and from 0.001 to 0.456 for the low state (ketamine-, propofol-, and xenon-induced anesthesia). Similarly, ECI awa ranged from 0.563 to 0.999 in the high state (wakefulness and ketamine-induced anesthesia) and from 0.015 to 0.421 in the low state (propofol- and xenon-induced anesthesia). In patients with DoC, ECI aro ranged from 0.589 to 0.995 for MCS and UWS patients. Finally, ECI awa ranged from 0.520 to 0.985 in MCS patients and from 0.045 to 0.468 in UWS patients.
We calculated PCI in TMS-EEG sessions. During sleep, PCI ranged from 0.100 to 0.293 in NREM sleep, from 0.315 to 0.522 in REM sleep with subjective experience, and from 0.440 to 0.668 in healthy wakefulness. PCI ranged from 0.125 to 0.273 under propofol- and xenon-induced anesthesia and from 0.388 to 0.678 in ketamine-induced anesthesia and wakefulness. Finally, in the severely brain-injured patients, PCI ranged from 0.120 to 0.300 for the patients with UWS and from 0.342 to 0.620 for MCS patients. In the four MCS* patients, PCI ranged from 0.416 to 0.505.

Classification performance according to the number of trials in the ECI calculation
Originally, ECI is calculated by averaging the interclass probability of all trials in a single session. Here, ECI was computed while varying the number of trials in a single session from 1 to 80 to explore its performance as a function of the number of trials. For example, if a single session consisted of five trials, the average value over the five trials was taken as ECI.
Figure 5 shows the classification performance from a single trial up to the standard number of trials in anesthesia and patients with DoC. In anesthesia and wake conditions, calculating ECI with single trials achieved a specificity of 0.742, a sensitivity of 0.917, and an AUC of 0.885. Sensitivity was always above 0.9, even with a single trial, and AUC was above 0.9 from 2 trials. Specificity, on the other hand, was above 0.9 from 9 trials. However, specificity did not improve steadily as the number of trials increased: it sometimes fell below 0.9, although it always remained higher than 0.821. In patients with DoC, a specificity of 0.717, a sensitivity of 0.806, and an AUC of 0.740 were obtained using single trials. Specificity, sensitivity, and AUC were above 0.9 from 13 trials, 7 trials, and 4 trials, respectively.
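The averaging step can be sketched as below, assuming each trial already has a classifier probability of belonging to the high state. The function names, the choice to take the first `n_trials` trials of a session, and the treatment of a value exactly at the cutoff as high are assumptions made for illustration; the probabilities are toy values.

```python
def eci(trial_probabilities, n_trials=None):
    """Average the per-trial probability of the high state over a session.

    n_trials: if given, use only the first n_trials trials (an assumed
    selection rule; the source does not state how trials were chosen).
    """
    probs = list(trial_probabilities)
    if n_trials is not None:
        probs = probs[:n_trials]
    if not probs:
        raise ValueError("at least one trial is required")
    return sum(probs) / len(probs)


def classify(eci_value, cutoff=0.5):
    """Label a session as 'high' or 'low' using the 0.5 cutoff from the text."""
    return "high" if eci_value >= cutoff else "low"


# Toy session with five trials (hypothetical probabilities).
session = [0.9, 0.8, 0.7, 0.6, 0.5]
for k in range(1, len(session) + 1):
    value = eci(session, n_trials=k)
    print(k, round(value, 3), classify(value))
```

Averaging over more trials damps the influence of any single misclassified trial, which is one plausible reading of why performance rises with the number of trials.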

Difference between correct and incorrect trials
We compared the differences between correct and incorrect trials for all participants during sleep and healthy wakefulness in terms of spatiotemporal information. The TMS-evoked amplitude from 200 to 400 ms was averaged over the frontal, temporal, and parietal regions and compared using a paired t-test with Bonferroni correction. The division into regions was the same as that used when calculating relevance scores for a specific region.
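A paired t-test with Bonferroni correction can be sketched with the standard library alone. The amplitudes below are toy values, not study data; correcting over three comparisons (one per region) is an assumption, and the p-value is obtained here by numerically integrating the Student's t density rather than by any method stated in the source.

```python
import math
from statistics import mean, stdev


def paired_t(x, y):
    """Return (t statistic, degrees of freedom) for paired samples x and y."""
    d = [a - b for a, b in zip(x, y)]
    n = len(d)
    return mean(d) / (stdev(d) / math.sqrt(n)), n - 1


def two_sided_p(t, df, steps=2000):
    """Two-sided p-value: integrate the t density over [0, |t|] with Simpson's rule."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))

    def pdf(v):
        return c * (1 + v * v / df) ** (-(df + 1) / 2)

    upper = abs(t)
    h = upper / steps  # steps must be even for Simpson's rule
    total = pdf(0.0) + pdf(upper)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    return max(0.0, 1.0 - 2.0 * total * h / 3.0)


def bonferroni(p_values):
    """Multiply each p-value by the number of comparisons, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


# Toy mean amplitudes per participant (hypothetical), correct vs. incorrect trials.
regions = {
    "frontal": ([1.2, 0.9, 1.5, 1.1, 1.3, 1.0], [0.8, 0.7, 1.0, 0.9, 0.8, 0.9]),
    "temporal": ([0.5, 0.6, 0.4, 0.7, 0.5, 0.6], [0.5, 0.7, 0.3, 0.6, 0.6, 0.5]),
    "parietal": ([2.0, 1.8, 2.2, 1.9, 2.1, 2.3], [1.1, 1.2, 1.0, 1.3, 1.2, 1.1]),
}
raw = [two_sided_p(*paired_t(c, i)) for c, i in regions.values()]
for name, p in zip(regions, bonferroni(raw)):
    print(name, round(p, 4))
```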
Then, ECI was calculated in each single session.

propofol (P02 to P05), and (c) xenon (P02 to P05). Each colored box indicates the probability that the corresponding trial is high in each participant. Wakefulness before anesthesia and anesthesia with ketamine are classified as high awareness, whereas anesthesia with propofol and xenon is classified as low awareness.

Supplementary Figure 9. Interclass probability in single trials for ECI awa in patients with DoC:
Each colored box indicates the probability that the corresponding trial is high in each participant.

Supplementary Figure 14. Data input for ECI framework:
The 2D time-series EEG data were converted to 3D features using the EEG electrode map. The electrodes were arranged in a 10 × 11 matrix according to their spatial positions, and positions without an electrode were set to zero. The z axis represents frequency or time information. For spectral information, the frequency axis comprised the delta, theta, alpha, beta, and gamma bands. For temporal information, the time period of 200-400 ms after TMS was considered. Thus, the input was a 10 × 11 × 5 matrix (DR1; spatio-spectral information) or a 10 × 11 × 72 matrix (DR2; spatiotemporal information).
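The conversion from channel-by-time data to a 10 × 11 × T array can be sketched as follows. The electrode coordinates below are hypothetical placeholders, since the actual electrode map is not given here; only the grid size and the 72-sample depth for DR2 come from the text.

```python
def to_grid(eeg, positions, rows=10, cols=11):
    """Place each channel's samples at its (row, col) cell; other cells stay zero.

    eeg:       dict mapping a channel name to a list of samples.
    positions: dict mapping a channel name to a (row, col) grid coordinate.
    """
    n_samples = len(next(iter(eeg.values())))
    grid = [[[0.0] * n_samples for _ in range(cols)] for _ in range(rows)]
    for channel, samples in eeg.items():
        r, c = positions[channel]
        grid[r][c] = list(samples)
    return grid


# Hypothetical coordinates for two channels; a real map would cover every electrode.
positions = {"Cz": (5, 5), "Pz": (7, 5)}
n_samples = 72  # depth of DR2 in the text (samples within 200-400 ms)
eeg = {
    "Cz": [0.1 * i for i in range(n_samples)],
    "Pz": [-0.1 * i for i in range(n_samples)],
}
grid = to_grid(eeg, positions)
print(len(grid), len(grid[0]), len(grid[0][0]))  # grid dimensions
```

For DR1 the same layout would apply with five band-power values per electrode in place of the 72 time samples.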

Supplementary Table 1. Classification accuracy (%) during sleep and healthy wakefulness:
For both arousal and awareness, two-class (low versus high) classification was performed using three classifiers and two data inputs. Two-sided multiple t-tests were used for post-hoc analysis, with Fisher's least significant difference method for multiple comparisons. CNN = convolutional neural network; DR1 = 10 × 11 × 5 (spatio-spectral information); DR2 = 10 × 11 × 72 (spatiotemporal information); p-value = comparison with CNN-DR2 using Fisher's least significant difference method for multiple comparisons.

Supplementary Table 2. Statistics of classification performance under sleep data for post-hoc analysis:
The p-value was determined using two-sided multiple t-tests with Fisher's least significant difference method for multiple comparisons, and a p-value less than 0.05 is indicated in italics.

Columns: Classifier-Input | Arousal: 95% CI (lower limit, upper limit), p-value | Awareness: 95% CI (lower limit, upper limit), p-value

disorders of consciousness with post-hoc analysis: CNN-DR2 was used as the classifier and input. In DR2, the time periods of the temporal information were generated at 200 ms intervals after the TMS. The p-value was determined using two-sided multiple t-tests with Fisher's least significant difference method for multiple comparisons, and a p-value less than 0.05 is indicated in italics.

Columns: Temporal information | Awareness: 95% CI, p-value