Deep anomaly detection of seizures with paired stereoelectroencephalography and video recordings

Real-time seizure detection is a resource-intensive process, as it requires continuous monitoring of patients on stereoelectroencephalography (SEEG). This study improves real-time seizure detection in drug-resistant epilepsy (DRE) patients by developing patient-specific deep learning models that utilize a novel self-supervised dynamic thresholding approach. Deep neural networks were constructed on over 2000 h of high-resolution, multichannel SEEG and video recordings from 14 DRE patients. Consensus labels from a panel of epileptologists were used to evaluate model efficacy. Self-supervised dynamic thresholding exhibited improvements in positive predictive value (PPV; difference: 39.0%; 95% CI 4.5–73.5%; Wilcoxon–Mann–Whitney test; N = 14; p = 0.03) with similar sensitivity (difference: 14.3%; 95% CI −21.7 to 50.3%; Wilcoxon–Mann–Whitney test; N = 14; p = 0.42) compared to static thresholds. In some models, training on as little as 10 min of SEEG data yielded robust detection. Cross-testing experiments reduced PPV (difference: 56.5%; 95% CI 25.8–87.3%; Wilcoxon–Mann–Whitney test; N = 14; p = 0.002), while multimodal detection significantly improved sensitivity (difference: 25.0%; 95% CI 0.2–49.9%; Wilcoxon–Mann–Whitney test; N = 14; p < 0.05). Self-supervised dynamic thresholding improved the efficacy of real-time seizure predictions, and multimodal models demonstrated potential to further improve detection. These findings are promising for future deployment in epilepsy monitoring units to enable real-time seizure detection without annotated data and with only minimal training time for individual patients.

www.nature.com/scientificreports/

Acquiring large, annotated datasets and screening for artifacts is time- and cost-prohibitive, which diminishes utility unless the pre-trained models are exceptionally well-generalizable. Given the variety of waveforms, dynamic noise, and other idiosyncrasies often present in patient recordings, seizure detection remains challenging. We present our results from training individually tailored, self-supervised Long Short-Term Memory (LSTM) deep neural networks on continuous in-hospital multichannel SEEG and video recordings with no explicitly labeled data (Fig. 1). Here, we define seizure detection as the task of anomaly detection in high-dimensional sequences. A dynamic thresholding method, developed by NASA for use on the Mars rover, was adapted to improve detection sensitivity and mitigate false positives, suggesting feasibility as a new, more dynamic paradigm for real-time anomaly detection in video and electroencephalographic data14.
Concurrent SEEG and video signals, totaling over 2000 h across all patients and channels analyzed, were processed by adapting previously described methods ("Signal processing")15,16. LSTMs and convolutional LSTM autoencoders were trained for each patient as described in "LSTM training and parameters". Dynamic thresholding was compared to conventional static thresholding, crossover experiments (Figs. 2, 3) were performed to characterize the models' patient-specificity, and joint models incorporating SEEG and video detection were constructed to assess the added benefit of multimodal detection. Model outputs were compared to ground truth anomalous sequences agreed upon by three fellowship-trained epileptologists who were blinded to the results of the model. The positive predictive value (PPV), sensitivity, and F1 scores were compared between models. Mean absolute percent error and the minimal duration of recording data needed to train each model were also noted.

Figure 1.
Overview of the workflow for continuous monitoring with video and SEEG and real-time analysis in the epilepsy monitoring unit. Patients with DRE receive continuous monitoring of their intracranial SEEG leads (red) and simultaneous video recording in their hospital beds (blue). A convolutional LSTM autoencoder (CNN + LSTM) was applied to the video recordings to calculate a regularity score for each frame over time. This regularity score time series and the SEEG time series (green sequence, bottom left) were then separately fed into an LSTM network to reconstruct their signals (blue sequence, bottom middle) and calculate a reconstruction error (red sequence, bottom right), which was then subjected to a self-supervised dynamic thresholding method to identify anomalous events in real-time.

Crossover testing produced a large increase in the number of false positive results, indicating that trained models are attuned to the unique electrical signal of a given patient. Green shading refers to prediction mismatches that correspond to correctly identified anomalies, whereas red shading refers to prediction mismatches that correspond to false positives. After training on continuous data from a given patient, testing the network on an unseen sequence derived from the same patient resulted in high fidelity of predicted sequences, with most of the prediction mismatches (Fig. 3, left, top, green circle) corresponding to true anomalies (Fig. 3, left, bottom, green bar). Testing this same model on an unseen, normalized sequence derived from a different patient produced considerably more prediction mismatches (Fig. 3, right, top, red and green circles), resulting in higher false positive rates (Fig. 3, right, bottom, red bars).
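The per-frame regularity score can be illustrated with a short sketch: the frame-wise reconstruction error is summed over pixels, min-max normalized, and inverted, following the scheme of Hasan et al. (the array shapes and exact normalization here are illustrative assumptions, not the production code):

```python
import numpy as np

def regularity_scores(frames, reconstructions):
    """Compute a per-frame regularity score from autoencoder output.

    frames, reconstructions: arrays of shape (n_frames, height, width).
    Higher scores indicate more "regular" (normal) frames.
    """
    # Sum pixel-wise reconstruction errors within each frame.
    err = np.abs(frames - reconstructions).reshape(len(frames), -1).sum(axis=1)
    # Min-max normalize the error, then invert it into a regularity score.
    abnormality = (err - err.min()) / (err.max() - err.min())
    return 1.0 - abnormality
```

Dips in this score over time mark frames the autoencoder reconstructs poorly, which is the video-side anomaly signal fed to the LSTM.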
Multimodal detection. Joint models incorporating self-supervised anomaly detection in video and SEEG recordings were constructed to determine the potential added benefit of multimodal detection (Fig. 4C).

Discussion
This study is the first to implement a multimodal self-supervised deep learning workflow for intracranial seizure detection in DRE patients. While previous studies have used bedside recordings to classify hypermotor seizures, few have jointly evaluated video and electroencephalographic feeds to detect seizures17,18. This study provides a novel proof-of-concept in this arena by demonstrating the potential of self-supervised anomaly thresholding to improve the sensitivity and PPV of automated seizure detection on continuous multimodal recordings in real-time. Because error residuals in anomaly detection are often non-Gaussian, the nonparametric dynamic thresholding method for error classification used in this study overcomes a major limitation of prior studies, whose parametric thresholding methods assumed a distribution that does not fit the residuals. The pipeline presented in this work utilizes an LSTM network and a convolutional LSTM autoencoder to enable real-time detection of anomalous events in high-resolution SEEG and video data, respectively, making it valuable in a prospective setting. Models were trained on only 5-10 min of SEEG recordings, which did not necessarily include a seizure event, and labeled data were not required, thereby reducing the time and cost of analysis. Crossover studies suggested the self-learned representations of SEEG recordings are patient-specific, which provides confidence in the ability of our algorithm to identify clinically relevant features given the diversity of signal properties between patients. Taken together, clinical translation of this work could personalize the care of patients and augment the workflow of staff in the epilepsy monitoring unit (EMU). By ingesting a few initial minutes of a patient's recording, this pipeline would enable continuous long-term monitoring of ictal events and reduce frequent false alarms in the context of subtle environmental changes, which would otherwise be time-intensive and cost-prohibitive.
Earlier methods in inpatient epileptic seizure detection have traditionally relied on constant monitoring of patient recordings by trained personnel. This is available in approximately 56-80% of EMUs, whereas automatic online EEG warning systems are present in only 15-19% of EMUs19. While clinical seizure semiology provides some critical information to help elucidate the zone of onset and propagation pathways, peri-ictal behavioral assessments facilitate an even more comprehensive understanding of these details. Most algorithms for EEG-based seizure detection in clinical settings center around multiple-channel analyses rather than single-channel analyses19. Following data acquisition by the electrodes, these systems typically employ a method for artifact rejection followed by an algorithm for event detection, usually involving analysis of the electrographic changes during seizures in terms of amplitude, frequency, or rhythmicity. Methods for these analyses in previous algorithms for patient-specific seizure detection have included both linear and nonlinear time-frequency signal analysis techniques20-22. More recent studies focusing on automated seizure detection have relied on other machine learning techniques, including support vector machines, k-nearest neighbors, and convolutional neural networks, which require complete electroencephalograms before determining whether anomalies are present23-25. Such properties limit the application of these approaches primarily to retrospective data. Furthermore, unlike deep learning methods, which learn the best features to achieve optimal performance, these older methods require manual feature extraction and careful programming of the network to obtain acceptable results. Other work has focused on developing large pre-trained models with the goal of successful generalization to other patients26.
Of note, there are several generalized, commercially available seizure detection algorithms currently on the market, including Persyst-Reveal27, IdentEvent28, BESA29, and EpiScan30. The primary limitation of these methods, however, is that they may not generalize well to other patients given the wide variety of signal characteristics that may exist as a result of recording quality, patient disease and electrophysiological characteristics, or other uncontrollable factors. This, in turn, may limit clinical efficacy. In contrast, as described previously, the workflow presented in this study could be rapidly deployed in clinical settings to create patient-specific models with improved adaptability for prospective prediction.
Limitations. This study's limitations include using retrospective data for training and a relatively small patient cohort, which could introduce selection bias. While overfitting is always a concern in deep learning, we controlled for this by holding out validation data for each patient and using early stopping criteria during model training ("LSTM training and parameters"). Additionally, although incorporating videos improved sensitivity, it also increased false positives. Developing more sophisticated tiered or weighted systems for escalating anomalies detected in concurrent multimodal recordings could reduce false positives in this workflow. Future work is underway to adapt these methods to a prospective, randomized format to confirm the utility of self-supervised dynamic thresholding for seizure detection in a clinical setting.

Conclusions
Self-supervised dynamic thresholding of patient-specific models significantly improves the PPV of seizure detection in continuous SEEG recordings from DRE patients compared to traditional static thresholds. Incorporating concurrent video recordings into multimodal models significantly improved sensitivity but reduced PPV, though the reduction was not statistically significant. The characteristics of these models are promising for future deployment in clinical settings to improve the speed, precision, and cost-effectiveness of epilepsy monitoring, which may ultimately improve the safety profile of SEEG monitoring for our patients.

Methods
Study protocol. Patients with drug-resistant epilepsy (DRE) at an academic medical center were retrospectively enrolled in the study. Subjects with significant progressive disorders or unstable medical conditions requiring acute intervention, those taking more than three concomitant antiepileptic drugs (AEDs) or with changes in AED regimen within 28 days, and patients with onset of epilepsy treatment less than two years prior to enrollment were excluded from the study. In total, 14 consecutive DRE patients underwent surgical implantation of 10-18 multichannel SEEG leads from 2018 to 2019 as per standard hospital protocols (average: 15 leads, 147 channels) and subsequent in-hospital video and SEEG monitoring for 4-8 days (average: 6 days). Patients were 16-38 years old (average: 24.5 years), 57% were female, and 71.4% were taking AEDs during the recording period. All patients had recordings with at least one epileptiform event (Table 1). This study was approved by the Mount Sinai Health System Institutional Review Board (IRB). Informed consent was waived by the IRB with oversight from the Program for the Protection of Human Subjects Office. All methods were performed in accordance with the relevant guidelines and regulations.
Signal processing. High-resolution SEEG recordings sampled at 512 Hz were obtained from the Natus NeuroWorks platform, filtered with a one-pass, zero-phase, non-causal 50 Hz low-pass finite impulse response filter, and scaled to (−1, 1). Concurrent video recordings for each patient in the monitoring unit were acquired at 480p resolution and 30 frames per second. Videos were segmented into sequential clips, converted to .tiff image files using FFmpeg, and fed into a convolutional LSTM autoencoder structured with 2 convolutional layers, 3 convolutional LSTM layers, and 2 deconvolutional layers16. A regularity score time series was calculated for all video frames by computing the reconstruction error of each frame as the sum of all pixel-wise errors, as described by Hasan et al.15. Signal processing was conducted using MNE 0.17.1 and SciPy Signal in Python 3.7.
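As a rough illustration of this preprocessing step, the sketch below low-pass filters and rescales a single SEEG channel with SciPy. The tap count, the peak-amplitude scaling, and the use of two-pass `filtfilt` as a zero-phase stand-in for MNE's one-pass delay-compensated FIR filtering are simplifications for illustration:

```python
import numpy as np
from scipy import signal

def preprocess_seeg(x, fs=512.0, cutoff=50.0, numtaps=513):
    """Low-pass filter one SEEG channel at 50 Hz and rescale it.

    Illustrative sketch only: the production pipeline used MNE's
    one-pass zero-phase FIR filtering; filtfilt below is a two-pass
    zero-phase approximation.
    """
    # Design a linear-phase FIR low-pass filter (Hamming window).
    taps = signal.firwin(numtaps, cutoff, fs=fs)
    # Apply it forward and backward for zero phase distortion.
    filtered = signal.filtfilt(taps, [1.0], x)
    # Scale the channel into (-1, 1) by its peak absolute amplitude.
    return filtered / (np.abs(filtered).max() + 1e-12)
```

Scaling each channel to a common range keeps the LSTM's reconstruction errors comparable across electrodes with very different raw amplitudes.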

LSTM training and parameters.
A self-supervised training regimen was established in which each channel from the SEEG recordings and the regularity score time series was divided into training and testing sequences using variable train:test splits ranging from 20:80 to 50:50; 29% of recordings in the training set had epileptiform events (Table 2). An LSTM network with 80 hidden layers was initialized for each channel and trained on the unlabeled training sequence for up to 35 epochs (or until early stopping criteria were met) with a sequence length typically between 250,000 and 750,000 elements, which spanned anywhere from 10 to 30 min overall and may or may not have included known anomalies. To mitigate the risk of model overfitting, early stopping criteria were applied during training: with a minimum loss improvement of 0.003 and a training "patience" of 5, up to 5 consecutive epochs were allowed to pass without decreasing the loss by at least 0.003 before model training was stopped early. Each LSTM used a mean-squared error loss metric, an Adam optimizer, and a dropout of 0.3. Within the training sequences, 20% of the data was set aside for validation before testing. After training, the performance of each model was assessed on the unseen test sequences. The network was assessed for its ability to predict future values in real-time (Fig. 4A), compare the predictions to the actual values, and compute a smoothed error based on the difference between the actual and predicted values (Fig. 4B). LSTMs and convolutional autoencoders were implemented using TensorFlow.
Self-supervised dynamic thresholding method. A novel dynamic thresholding approach, developed by the NASA Jet Propulsion Laboratory to detect real-time anomalies in telemetry data from the Mars rover Curiosity, was adapted to our models to label anomalies based on the error values from the time series predictions14. In contrast to the conventional static thresholds frequently used for anomaly detection (e.g., mean ± 2 standard deviations), this dynamic method uses a sliding window approach to find optimal local thresholds, such that the percent decrease in the mean and standard deviation of the smoothed error in the window is maximized when values above the set threshold are excluded. To mitigate false positives, an error pruning procedure was implemented: the sequence of smoothed errors was incrementally stepped through, the percent change between time steps was computed, and steps with a percent change greater than 10% remained anomalies while steps with a change less than 10% were reclassified as normal.
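A simplified sketch of these two steps, in the spirit of the NASA/JPL method, is below; the candidate z-range, the scoring function, and the boundary handling are illustrative assumptions rather than the exact implementation:

```python
import numpy as np

def dynamic_threshold(errors, zs=np.arange(2.0, 10.0, 0.5)):
    """Pick a local threshold mean + z*std that maximizes the percent
    decrease in the mean and std of the smoothed errors once the
    above-threshold values are excluded (simplified sketch)."""
    mu, sigma = errors.mean(), errors.std()
    best_t, best_score = mu + zs[-1] * sigma, -np.inf
    for z in zs:
        t = mu + z * sigma
        below = errors[errors <= t]
        if below.size in (0, errors.size):
            continue  # threshold excludes nothing (or everything)
        # Combined percent decrease in mean and std after exclusion.
        score = (mu - below.mean()) / mu + (sigma - below.std()) / sigma
        if score > best_score:
            best_score, best_t = score, t
    return best_t

def prune_anomalies(errors, flags, min_change=0.10):
    """Error pruning: reclassify flagged steps whose percent change
    from the previous step is below 10% (treatment of the first step
    is an assumption)."""
    kept = flags.copy()
    for i in range(1, len(errors)):
        if kept[i]:
            change = abs(errors[i] - errors[i - 1]) / (abs(errors[i - 1]) + 1e-12)
            if change < min_change:
                kept[i] = False
    return kept
```

In practice the threshold search runs over a sliding window of recent smoothed errors, so the threshold adapts as signal statistics drift over the recording.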

Crossover and multimodal video/SEEG detection experiments.
To evaluate the patient-specific nature of the LSTM models, crossover experiments were conducted in which models were trained on recordings from one patient and tested on another, while all other conditions remained identical to previous testing conditions, including the dynamic thresholding and error pruning methods (Fig. 2). Fourteen combinations of train and test sequences derived from the study population were randomly selected to conduct the crossover experiments.

Combined video + SEEG anomaly detection results: sensitivity (mean ± SEM) 100.0 ± 0%.
To assess the added value of multimodal detection, the concurrent video and SEEG recordings for each patient were separately fed into the corresponding deep neural networks described previously. The resulting anomalous sequence predictions, made by the self-supervised dynamic thresholding of the LSTM decision function for each detection modality, were then pooled before comparing the predicted anomaly times with the consensus labels of the expert panel of epileptologists. We did not encounter any disagreements among the panel regarding consensus labeling within this dataset. The results of model performance on individual patient recordings are detailed in Table 3, along with each patient's clinical and electrophysiologic seizure manifestations.
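One simple reading of the pooling step is a logical OR across modalities, which also explains the observed trade-off: a union of detections can only raise sensitivity, while either modality's false alarms carry through and can lower PPV. The exact pooling rule is not spelled out above, so this one-liner is an illustrative sketch:

```python
def pool_predictions(seeg_flags, video_flags):
    """Pool per-timestep anomaly flags from the two modalities: a time
    step counts as anomalous if either the SEEG or the video model
    flags it."""
    return [s or v for s, v in zip(seeg_flags, video_flags)]
```

More elaborate tiered or weighted pooling rules, as discussed in the limitations, would trade some of this sensitivity gain for fewer false positives.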
Metrics for assessing signal reconstruction quality. We assessed the models' ability to capture the underlying signal using the standard time series metric of mean absolute percentage error (MAPE), computed for each recording channel reconstructed by the LSTM for each patient. MAPEs ranged from 0.15 to 1.57% across patients (average: 0.75%; Table 2), suggesting generally excellent reconstruction of the SEEG signal by the LSTM. Video regularity score signals were noisier due to diverse events occurring during recording, leading to higher MAPEs (average: 19.95%; Table 2).
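MAPE follows its standard definition; in the sketch below, the small epsilon guarding division by zero is an added safeguard, not part of the metric:

```python
import numpy as np

def mape(actual, predicted):
    """Mean absolute percentage error between a recorded channel and
    its LSTM reconstruction, in percent."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    # Average the per-sample absolute percentage deviations.
    return 100.0 * np.mean(np.abs(actual - predicted) / (np.abs(actual) + 1e-12))
```
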

Statistics.
For continuous variables in this study, the Kolmogorov-Smirnov test was first used to test for normality. Because the data were not normally distributed, continuous variables were compared using the Wilcoxon-Mann-Whitney test. A threshold of p < 0.05 with two-tailed testing was used to determine statistical significance. Statistics were conducted using Prism 7.
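The same two-step procedure can be reproduced outside Prism with SciPy; the fallback to a t-test when both groups pass the normality check is an illustrative choice (the data here were non-normal, so the Wilcoxon-Mann-Whitney branch is the one that applies):

```python
import numpy as np
from scipy import stats

def compare_groups(a, b, alpha=0.05):
    """Normality check, then parametric or nonparametric comparison.

    Returns (both_groups_normal, p_value). Sketch of the analysis
    workflow, not the Prism implementation used in the study.
    """
    # Kolmogorov-Smirnov test of each group against a fitted normal.
    normal = all(
        stats.kstest(x, "norm", args=(np.mean(x), np.std(x, ddof=1))).pvalue > alpha
        for x in (a, b)
    )
    # Non-normal data falls back to the Wilcoxon-Mann-Whitney test.
    test = stats.ttest_ind if normal else stats.mannwhitneyu
    return normal, test(a, b).pvalue
```
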

Data availability
Data from this study are available upon reasonable request. In accordance with institutional policy for data protection, a Data Transfer Agreement must be completed between Mount Sinai and the requesting institution.

Publisher's note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.