A personalized and evolutionary algorithm for interpretable EEG epilepsy seizure prediction

Seizure prediction may improve the quality of life of patients suffering from drug-resistant epilepsy, which accounts for about 30% of the total epileptic patients. The pre-ictal period determination, characterized by a transitional stage between normal brain activity and seizure, is a critical step. Past approaches failed to attain real-world applicability due to lack of generalization capacity. More recently, deep learning techniques may outperform traditional classifiers and handle time dependencies. However, despite the existing efforts for providing interpretable insights, clinicians may not be willing to make high-stake decisions based on them. Furthermore, a disadvantageous aspect of the more usual seizure prediction pipeline is its modularity and significant independence between stages. An alternative could be the construction of a search algorithm that, while considering pipeline stages’ synergy, fine-tunes the selection of a reduced set of features that are widely used in the literature and computationally efficient. With extracranial recordings from 19 patients suffering from temporal-lobe seizures, we developed a patient-specific evolutionary optimization strategy, aiming to generate the optimal set of features for seizure prediction with a logistic regression classifier, which was tested prospectively in a total of 49 seizures and 710 h of continuous recording and performed above chance for 32% of patients, using a surrogate predictor. These results demonstrate the hypothesis of pre-ictal period identification without the loss of interpretability, which may help understanding brain dynamics leading to seizures and improve prediction algorithms.


Materials and methods
The strategy followed can be divided into data preprocessing, feature extraction, training, testing, and phenotype study (see Fig. 1). In short, the raw EEG is filtered and segmented into time-windows from which features are extracted. Then, the first 60% chronological seizures are used as input to the EA, which is executed 30 times. The best individual (set of features and correspondent pre-ictal time) of each execution is selected. The EA output features are then tested and evaluated with the last 40% chronological seizures. An SPH of 10 min was used both in training and testing stages. This procedure is explored for three different minimum pre-ictal periods: 40, 50, and 60 min. If the results for a given patient are satisfactory, a phenotype study can be made. These steps are described in this section and are shown for one patient, as our approach is patient-specific.
Regarding the EA output features, these will be based on a feature construction [32][33][34] process: the application of a set of constructive operators to a set of existing features, which results in the construction of new ones. The latter are believed to be more powerful, as these higher-level generated features take into account the interactions in the previous feature space. In this case, the features from the feature extraction stage (first-level features) will be used to construct second-level ones by windowing and applying a mathematical operator, constituting the phenotype features. In the following, first-level features will be named as features, while the second-level ones will be referred to as hyper-features.
Database. From the European Epilepsy Database also known as EPILEPSIAE database 1,4 , 19 DRE patients (11 males and 8 females, aged 40.26±13.52 years) from the Universitätsklinikum Freiburg in Germany were selected. The dataset comprises 120 seizures (71 for training and 49 for testing), 284 h of training data and 710 h of testing data ( ≈ 1 month). It is important to mention that, for training seizures, we selected only the last recorded 4 h before each seizure. Nevertheless, our testing data is continuous, as it comprises all available interictal data without any segment removal. Our patient selection criteria were the following: (i) patients containing www.nature.com/scientificreports/ only seizures with focus on the temporal lobe, as these are the most representative in DRE patients; (ii) patients having an average of 2-5 daily seizures and a minimum of 5 recorded seizures that were separated by periods of at least 4 h; and (iii) EEG scalp recorded specifically with a sampling frequency of 256 Hz. We selected only patients with temporal lobe epilepsy as it is the most common type of focal epilepsies 35 . All electrodes were placed according to the 10-20 system. The data was collected while patients were in the clinic for routine presurgical monitoring. The use of this data for research proposals has been approved by the Ethical Committee of the three hospitals involved on the database development (Ethik-Kommission der Albert-Ludwigs-Universität, Freiburg; Comité consultatif sur le traitement de l'information en matière de recherche dans le domaine de la santé, Pitié-Salpêtrière University Hospital; and Ethics Committee of the Coimbra University Hospital). All methods were performed following the relevant guidelines and regulations. Informed written patient consent was also obtained.
Evolutionary algorithm. We used a Genetic Algorithm (GA) as the EA, whose steps are described in Fig. 2. Population, which is a set of possible solutions, is initialized randomly with a fixed number of 100 individuals. Each individual has its encoding, which can be seen as the bridge between the problem context and the problem-solving space, where phenotypes (possible solutions) are encoded into genotypes (a chain of characters coded from the individual). Then, each individual is evaluated based on the fitness function, which is a mathematical criterion that results in a measure related to the seizure prediction performance. Then, half of   www.nature.com/scientificreports/ the individuals (parents) are selected to reproduce through binary tournament selection (parent selection). The evolution operators are recombination and mutation, where we used a recombination rate of 0.80 (80% of times, two parents produced an offspring) and a mutation rate of 1.00 (all offspring suffered a mutation). The individuals among parents and offspring with the best fitness evaluation are selected and comprise the next generation of individuals ((1+ ) replacement strategy 29 ). Evolution occurred until one of the following criteria was met: i) maximum fitness was reached, ii) fitness did not increase over the last 50 generations, iii) 15000 new individual evaluations were performed (see Supplementary Material for more information concerning the EA configuration). As a reduced computation time is desired in order to have real-life applicability, a fast convergence is desired. Thus, a greedy approach was used, i.e., points offering the most obvious and immediate benefits are chosen. Despite this strategy may not usually produce an optimal solution, it is believed that it approximates the global optimum one in a reasonable amount of time 37 .
Codification and evolution operators. Figure 3 illustrates the rationale behind genotype, phenotype decoding and mutation. Concerning genotype, a population is represented by a group of individuals, where each one is defined by five hyper-features. Each hyper-feature is encoded with seven genes: dominant feature, band-wave feature, non-band wave feature, mathematical operator, electrode, window length and time instant (minutes before the minimum pre-ictal period) (see Fig. 3a). Genotype-phenotype mapping (see Fig. 3b) consists in: (i) finding the feature that will be decoded to the phenotype for each hyper-feature by inspecting the dominant feature gene, e.g. if the dominant feature gene value is band-wave, then the band-wave feature is decoded; (ii) constructing each hyper-feature by windowing the decoded feature, from the given electrode, within the window length, and then by applying the respective mathematical operator; and (iii) placing each hyper-feature chronologically in a timeline according to its respective time instant and obtaining the pre-ictal period (the temporal distance between the first chronological hyper-feature and the seizure). The latter allows to not only analyzing a sequence of instants instead of only one instant but also to find the best pre-ictal time (see Supplementary Material for a genotype-phenotype decoding example). Then, with the hyper-features constructed and placed chronologically, it is possible to perform sliding-window analysis, classification, regularization and evaluation, all these addressed in the fitness function.
In Fig. 3c, one can visualize all possible values for each gene and its neighbourhood that must be established to perform recombination and mutation. These neighbourhoods were designed while accounting the relationship between each gene value (see Supplementary Materials for more details). Mutation, interpreted as a unitary step that will cause a random and unbiased change 29 , occurs in the following form for an individual (see Fig. 3d): one of the hyper-features is chosen randomly, and then one gene of that hyper-feature is chosen randomly to mutate. The remaining hyper-features and genes continue unaltered. Recombination is a stochastic operator that combines genetic information from two parents (individuals) into one or more offspring 29 . After selecting two parents to reproduce, this operator performs the recombination of all paired hyper-features. Thus, hyper-feature pairing is the first step and then, the recombination operator works at the hyper-feature gene level. Each offspring Fitness function. An individual fitness evaluation is made iteratively (retraining the logistic regression classifier with new seizures) according to seizure prediction performance (see Fig. 4a). For each tested seizure (see Fig. 4b), hyper-features and labels are collected from previous seizures by performing time-moving analysis (1-min step). Delayed features with a lag l = 1, 2, 3 min are also extracted, which has the objective of transforming static hyper-features into temporal ones since the used classifier does not handle time explicitly. Redundancy is handled, by removing features with an absolute correlation coefficient |ρ| > 0.95 . Then, they are standardized by a z-score process. Before classifier training, classes' weight was balanced with an inverse proportion to their frequency of occurrence (see Supplementary Material for mathematical formulation). We chose the logistic regression classifier, as it is computationally light, its decision curve takes the form of a logistic function, it is an intrinsically interpretable model and therefore, incorporates interpretability directly to its structure 27,38,39 . It models the probability p(x) of a sample x with n predictors belonging to a certain class as shown in Eq. (1) 40 , where β n is the regression coefficient value concerning the hyper-feature x n : For the tested seizure, the same procedure was applied, but using z-score parameters and logistic regression from training seizures. Then, a regularization technique is applied as it is desired to have a predictor robust to noise: the Firing Power 41 (see Supplementary Material for mathematical formulation). It quantifies the classifier rate output as pre-ictal in a past-time window with the size of the pre-ictal period. When an arbitrary threshold (the maximum tolerance for the prediction error) is surpassed, an alarm is triggered. The latter was set to a reasonable limit of 0.70.
Prediction performance is based on four measures: seizure sensitivity S p (the ratio of correctly predicted seizures), sample sensitivity S s (ratio of samples classified as pre-ictal within all pre-ictal samples); time under false alarm T f (ratio of samples classified as pre-ictal within all inter-ictal samples); and FPR/h (number of false alarms divided by the total time inter-ictal period).
Accordingly, the performance for a tested seizure is given by Eq. (2), which we consider optimal if its value is 1. The latter corresponds to correctly predicting all samples ( S s =1 and T f = 0 ) and therefore, predicting the seizure ( S p = 1 ) while not triggering any false alarm (FPR/h=0). By measuring these metrics simultaneously, performance is not only based on a seizure prediction system, but also on correctly classifying the maximum number of samples. Furthermore, FPR/h is multiplied by T f , as it is not meaningful to have the same T f with different number of alarms: it is preferred to have a shorter number of false alarms. Thus, it emphasizes the downside of having simultaneously a high FPR/h and a longer T f . Finally, the fitness function evaluation is obtained by averaging all tested seizure performances: Training, testing and statistical validation. For a patient, in a real-life context, one would choose a determined minimum pre-ictal period and would run one execution of the EA. The best set of features would then be used for predicting seizures. In an academic context, as this paper concerns an exploratory study, we executed an EA 30 times for each patient, for each minimum pre-ictal period. These sets of selected features were then prospectively tested with the patient's past 40% seizures (unseen data), using the same pipeline used for the fitness function but with one extra step: after raising an alarm, a refractory period of SOP+SPH was used. Due to the latter, we excluded refractory periods from the inter-ictal period in our FPR/h calculations, in order to only account for the period during which false alarms can be triggered, and which enables a proper comparison with other methods 16 . Performance is based on S p , FPR/h and comparison with a surrogate predictor 42 . The latter was implemented with the objective of understanding if the proposed algorithm performed above chance level. www.nature.com/scientificreports/ The surrogate predictor makes use of Monte Carlo simulations by random shifting seizure times. A model is considered to perform above chance if its performance is higher than the surrogate one with statistical significance, under the following null hypothesis that the proposed method performance is not above chance level. Unspecific methods, as the random predictor 43 , which assume that alarms are triggered randomly without using information from the EEG signal, are also commonly used. Nevertheless, despite a surrogate approach requires more computation time, it offers greater confidence in determining if a model performs above chance 11 (see Supplementary Materials for a detailed implementation of our surrogate predictor).
Since we performed 30 executions for each minimum pre-ictal period, we did two different statistical validations: one on the overall set of executions and another on the number of executions that perform above chance. Concerning the first, we considered a performance to be above chance level if its average value is higher than the one observed for the averaged surrogate predictor, with statistical significance of α=0.01 (using a one-tailed t − test ). The second validation is related to the real-life context, where we would only run one execution (and not 30) of the EA and use it: we need to understand how likely the selected features would perform above chance level, and if that probability is statistically significant. Towards the latter, we calculated the number of executions that outperformed the surrogate predictor with statistical significance of α=0.01 and verified if this number was significant for the whole set of 30 executions, by comparing the obtained ratio with the one from a binomial distribution (this procedure was inspired by Alvarado Rojas et al. 19 , as they used it to check if the number of validated patients was significant for the whole group). Thus, for a significance of α=0.05, the probability to observe, for at least i of I (individuals) executions that outperformed the surrogate predictor, is given by: Phenotype study. As EAs are associated with random components (as initialization, parent selection, and evolution operators), it is possible to obtain, for each execution, a different solution (set of hyper-features) with similar performance. Thus, the objective of performing a phenotype study is to understand the overall influence of each gene value. It is possible to calculate the gene value predictive power from a gene using by assigning to it the absolute of the correspondent logistic regression coefficient. Presence was also studied, where a binary value (1/0) was assigned considering the gene value presence in a hyper-feature. By computing these values to all hyper-features that compose an individual, one obtains the correspondent gene value predictive power for an individual. After this, one can compute the correspondent normalized gene value predictive power and normalized presence for all individuals (see Supplementary Material for more details and for mathematical formulation). Figure 5 presents the statistical validation on testing seizures for all patients and for all minimum pre-ictal periods, along with patient stratification. Thus, colour represents the ratio of executions (N executions out of 30) that outperformed the surrogate predictor while the diamond-shaped marker represents which sets of executions had an overall performance above chance level. It is possible to see that, for 40, 50, and 60 min of minimum pre-ictal periods, 42% (8 in 19), 37% (7 in 19), and 42% (8 in 19) of patient models performed above chance level, respectively. Furthermore, 32% (6 in 19) presented a performance above chance level for all three pre-ictal periods, and therefore, we consider this value as our number of validated patients, and 48% (9 in 19) for at least one pre-ictal period. Furthermore, it was possible to develop a significant number of executions that are significant for the whole set, for 89% (17 in 19) of patient-models for all three pre-ictal periods simultaneously, and for 100% (19 in 19) for at least one pre-ictal period.  Figure 5. The performance of all patient-models, organized by the minimum pre-ictal period. Colour represents the ratio of executions that outperformed the surrogate predictor, while the diamond shape means that a given overall set of 30 executions outperformed the surrogate predictor. On the top line, patient stratification is presented concerning seizure classification (*, FOA or FOIA/FBTC seizures), sleep stage at seizure onset (+, sleep/wake), circadian cycle at seizure onset (., day/night) and annotated activity pattern (x, rhythmic/non-rhythmic). On the overall column, one can see the ratio of patients whose models overall performance is above chance level. www.nature.com/scientificreports/ The average fitness value in training was 0.62 ± 0.12 and the SOP duration was 40.46 ± 8.85 min. In testing, we obtained an S p of 0.38 ± 0.19 , 0.36 ± 0.24 , and 0.37 ± 0.27 , and an FPR/h of 1.03 ± 0.84 , 0.76 ± 0.39 , and 0.58 ± 0.31 for the minimum pre-ictal periods of 40, 50, and 60 min, respectively.

Results and discussion
Patient stratification was based on seizure classification (FOA or FOIA/FBTC), seizure activity (rhythmic/ non-rhythmic), sleep stages (sleep/awake), the period of the day (night/day, with 10 pm and 7 am as time thresholds 19 ) at seizure onset. In these, a patient was selected if a determined stratification criterion was met both in training and testing seizures. Activity pattern was the only criterion that improved significantly the percentage of patient models performing above chance for at least one pre-ictal period: 58% (7 in 12). With the remaining stratification parameters, the obtained percentages were 50%, 42% and 33% for seizure classification, circadian cycle, and sleep stage, respectively. Concerning S p , it is worth mentioning that we have obtained a Pearson correlation coefficient of ρ = 0.39 and ρ = 0.29 between this metric and stratification criteria of seizure classification and activity pattern, respectively. Table 1 presents training and testing results, as well as information concerning the considered group of patients: number of seizures and recording duration, seizure activity pattern, seizure classification, and day period (i.e. Day or Night) at seizure onset. It is worth noting that we are only presenting here one set of 30 executions for each patient, which corresponds to the pre-ictal period that presented the best performance. One can find the performance for all patient-models in Table S2 from Supplementary Material. In fact, one could have included the pre-ictal period in the genotype instead of searching for different minimum values, but this can be justified by the fact that its duration influences directly the used seizure performance metrics. By including the pre-ictal period in the genotype, we would take the risk of seeing pre-ictal time changes just because it would immediately increase fitness value, while not being related to brain dynamics. Furthermore, EA hyper-features are also capable of increasing the pre-ictal period duration through the increase of all hyper-features' time instant, as explained in genotype-phenotype mapping. This operation was allowed since it depends on all hyper-features simultaneously. Moreover, experimenting with different and consecutive pre-ictal periods allowed us to explore the idea of seizure susceptibility that may be envisioned as a regression problem 10 . The fact that it was possible to build prediction models that achieved performance above chance level for all tested three pre-ictal periods, for 32% of patients, might suggest this.
It is worth noting the existence of a relation between the number of training seizures and fitness, with a Pearson correlation value of ρ = −0.49 . We assume this as natural, as it is considered a more difficult task to identify all pre-ictal samples without raising false alarms for a higher number of seizures. Furthermore, relations concerning fitness and testing S p ( ρ = 0.19 ) and testing FPR/h ( ρ = −0.25 ) were also found. These findings may lead us to believe in the existence of concepts drift that can not be handled by simply using in the EA all available data and by retraining the used features with upcoming data. Perhaps, results would differ if a new EA would be executed whenever a new seizure was available, using only the last N seizures. In other words, this would require us to re-select our features periodically with the availability of new seizures. Despite this iterative procedure would largely increase computational complexity in our study, along with the necessity of more testing seizures, it could be applied in real life, as our training stage is relatively fast (an EA execution that uses 3 seizures for training and that reaches the maximum number of iterations, takes approximately 2 h 50 min to run on a computer with an Intel Core i7-8700 CPU 3.20GHz 3.19 GHz processor, 32Gb of RAM, on Windows 10 Pro, with Python 3.7 on Spyder 3.3.4). In this study, we tried to take into account computational complexity, as real-time applicability, power-efficiency and minimal computation are important for a real-life context 11 . This is the major reason why we only used the last 4 h of data before each training seizure, as using all available one would enlarge significantly training duration. This is a limitation of this methodology since it is advised to use all inter-ictal control data, as a restriction could introduce a confounding bias 16 .
The performance results from the proposed approach and other studies [17][18][19] , based on patients from the EPI-LEPSIAE database, can be compared here. Authors in those studies used lower SPH intervals. As these methods can be used for not only the development of closed-loop systems integrated with seizure-suppression strategies but also used for a patient warning-system only, we opted for a longer SPH in order to account all possibilities. We believe that a 10-min SPH may be a reasonable limit for a patient to minimize seizure consequences, as safely stop driving a car before a seizure. The results reported by Alvarado Rojas et al. 19 for patients with temporal-focus seizures, despite outperforming the proposed methodology's sensitivity ( S p = 0.66 ), and presenting a lower FPR/h ( FPR/h = 0.33 ), obtained a percentage of patients performing above chance level of about 10% (3 out of 34), which is lower than ours for any of the three pre-ictal periods. Direito et al. 17 performed one of the largest studies with EPILEPSIAE concerning the number of patients: 218 with only 11% patient-models performing above chance, and an overall sensitivity of 0.39, which is similar to ours. Nevertheless, it is important to stress that these authors used the random predictor 43 for statistical validation while we used a surrogate method.
Other studies 18,20,44 using EPILEPSIAE database could be compared, but in these, several models were trained and tested, where the best predictor was selected based on testing performance. This selection procedure results in an over-estimation of the real performance, given that if a higher number of predictors are tested, the chance to hit seizures, only by chance, increases. This can explain the higher performance obtained. It is somehow limited in real-life application, as it is not possible to choose the best model based on testing values. A fair comparison with the present approach would correspond to select, for each patient, the set of features and corresponding pre-ictal period that best performed on our testing set.
Alvarado Rojas et al. 19 used a threshold classifier. Concerning features, despite easy to be understood in terms of feature engineering, it may be somehow difficult for a physician to understand the interactions between the phase of low-frequency rhythms (slow waves and theta) and the amplitude of different sub-bands of gamma rhythms. The remaining studies 17,18,20,44 were based on Support Vector Machines (SVMs) classifiers being fed with the same features as the ones in this work, and with additional ones which can be more difficult to explain to a physician, such as the autoregressive modelling predictive error, decorrelation time, Hjorth parameters, third , rhythmic sharp waves (s), rhythmic alpha waves (a), rhythmic delta waves (d), rhythmic theta waves (t), rhythmic beta waves (b), repetitive spiking (r), cessation of inter-ictal activity (c), amplitude depression (m)), seizure classification (unclassified (UC), Focal Onset Aware (FOA), Focal Onset Impaired Awareness (FOIA), Focal to Bilateral Tonic-Clonic (FBTC)), and period of the day at onset (day, between 7 am and 10 pm (D), and night, between 10 pm and 7 am (N)). Training and testing results are also presented, where the last two columns concern the two performance statistical validations (* stands for statistical significance).  www.nature.com/scientificreports/ and fourth-order statistics and energy wavelet coefficients. Additionally, their number of features per model 10 features in average 20 . Based on the above, we believe our methodology, despite not being fully explainable to a physician, it is more intuitive than the remaining. Concerning classifiers, a threshold classifier is clearly the most simple and intuitive, which was the reason behind the usage of a logistic regression: its binary decision also concerns a threshold. Furthermore and as mentioned, it was also inspired by the concept of being more appropriate to consider the seizure prediction as a regression problem 10 , despite its final transformation into a classifier. These findings lead us to believe that it may be possible to build more interpretable models that perform above chance level for a higher ratio of patients ( ≈ 32%) when comparing with other more complex methods. However, our methodology is clearly outperformed in terms of FPR/h and sensitivity, which addresses its incapacity to handle the remaining cases and enhances the demand for increasing the model complexity. Additionally, we would like to highlight the existing limitations on these comparisons: the mentioned studies (except for 44 ) presented a higher number of patients and seizures, as well as larger inter-ictal periods when comparing to our study. As we had a small number of testing seizures for each patient, our results also present significantly large standard deviations, which must be considered. This was also the reason why we implemented the surrogate predictor, as it is more flexible 16,42 than the random predictor, it allows for adaptation to the used data and may provide a more solid validation. Moreover, as we are working with data that we have available for unlimited testing, the sliding-step size, number of used features, and number of lag features were reasonably chosen based on computation time, without any tuning concerning testing results. Despite our results may be low in terms of sensitivity and FPR/h, we are avoiding a publication bias by excessively testing our data, as this may constitute a severe problem in this type of data 11 .
Concerning a real-life context, the development of a closed-loop system that disarms seizures by electrical stimulation, which requires iEEG data, seems to be the most viable option, concerning recent developments 10,11,16,45 . Nevertheless, scalp EEG has also some advantages over the iEEG, as it is non-invasive and it would allow for a wider use: it contains fewer risks for the patient and a simple warning system could be cheaper. Another reason that led us to choose the scalp EEG was our objective of providing more knowledge concerning the ictogenesis process along with the network theory, as it proposes that even focal seizures may arise from abnormal activity resulting from a large-scale functional network that spans across lobes and hemispheres 11,16 . Furthermore, our idea of iteratively re-selecting our features by running our EA periodically would also consider the existing dynamics of the epileptic network, which is not static and would provide a greater insight 11 .
One of the major reasons to propose this method is the possibility to develop auditable predictors and to extract patient-specific clinical knowledge. Thus, patient 53402 (for a 40-min minimum pre-ictal period) was selected as an example, since the testing results were considered satisfactory ( S p = 0.78 ± 0.31 and FPR/ h=0.35 ± 0.11 , and outperformed the surrogate predictor). Figure 6 presents the phenotype study for the presence ( Presence(gene value ) and predictive power ( Pp(gene value ) of all genes (electrode, window length and time instant genes) from the correspondent obtained hyper-features. Regarding electrodes and their correspondent lobe and hemisphere (spatial study), there are interesting findings related to patient comfort and signal acquisition factors. For instance, it is possible, for this patient, to choose a setting of electrodes that are placed in only three different lobes. In fact, 53% of cases account for electrode placement in three different lobes, which was the most frequent scenario. This can be medically important to understand brain phenomena and to overcome real-life obstacles concerning EEG acquisition and patient comfort.
Looking at the results from the time-related genes (window length and time instant), it is possible to understand the demand for (i) the investigation of simultaneous temporal scales (different window lengths) and (ii) the search for a sequence of events. When considering all the 30 executions, not in just a single one a set of hyperfeatures used the same window length or used only one single instant. It is possible to see the presence of at least two different time windows and two different time instants for all executions. The most frequent case was the simultaneous presence of two different instants (47% of times). This demonstrates that the EA tends to choose a sequence of two instants. Thus, it searches for a seizure-related pattern instead of searching for a particular feature in a determined instant. Concerning window lengths, at least three different values were present in, at least, 80% of times, where the 1-min window was always present, followed by the 5-min one (90% of times). It is also important to mention that the temporal lobe electrodes were not more chosen than others. Nevertheless, we believe it would be likely that, for a lower SPH, we would have found a significantly higher predictive power in the temporal lobe electrodes, as we would be closer to seizure onset.
Concerning features and mathematical operators, theta band relative power (97% of cases), mean normalized frequency (67% of cases) and medium temporal intensity (60%), and average power (53% of cases) were the most extracted features. For each set of hyper-features, the minimal number of different features was three, where four was the most frequent scenario (60% of times). In all cases, the set of hyper-features presented features from both groups: non-band waves and band-waves (see Supplementary Material for a figure concerning the phenotype study of the decoded features and mathematical operators).
From the 30 sets of hyper-features, one was selected to demonstrate its interpretability. Thus, the chosen one is represented in Fig. 7 and has a SOP period of 45 min (pre-ictal period of 55 min). Its training performance was S p =1.00 and FPR/h=0.00 for training and S p =1.00 and FPR/h=0.16 for testing seizures. As it is possible to see, two different feature instants (sequence instants) with four different window lengths are present, as well as a change on the selected electrodes over time, which supports the network theory. This is a model that can be more easily explained to a physician, as the extracted features are relatively simple and in a reduced number, and the used classifier is a logistic regression, instead of a black-box model.
In short, this method offers the possibility of extracting medical knowledge from a different perspective due to the phenotype study. Additionally, it is also possible to find a set of different solutions to find which ones need to be always present and others that have interaction, i.e., which features always appear in the presence of others www.nature.com/scientificreports/ (association learning). The use of a reduced set of electrodes may provide more comfort to the patient while its acquisition may be more simple. In the case of genes with a significant predictive power when compared to others, it may indicate the existence of key properties for the EEG seizure prediction context.

Conclusion
This work can be considered as a proof-of-concept study of using EA for seizure prediction. Performance above chance level was achieved for a significant number of patients ( ≈ 32% ) while maintaining interpretability, by accounting synergy between features and all pipeline stages. Despite only 32% patient models have performed above chance level in all three pre-ictal periods, it was possible to develop for 89% of patients a number of executions that were statistically significant for all tested pre-ictal periods, which gives us hope in this methodology. Even though the training stage of this methodology may be computationally expensive and therefore, only the last recorded 4 h before each seizure were used, its real-time application is light and simple: light pre-processing and feature extraction processes, followed by the application of a logistic regression. Nevertheless, our methodology in terms of FPR/h and sensitivity is considerably outperformed by other methodologies using data from the same database, which may indicate the need for higher complexity models. Despite the obtained percentage of patients performing, an FPR/H<0.15 was not obtained 7 . Moreover, since we used data from surgical monitoring, this study can only be envisioned as a hypothesis. Towards a clinical validation, additional studies must be performed with long-term recordings from patients in their daily life, as the study carried out by Cook et al. 45 . It is also important to mention that we did not test this framework in other types of epilepsies, which concerns future work. We plan on testing the robustness of this approach in patients with other types of epilepsy including both focal onset (e.g., frontal lobe epilepsy) and generalized onset. We believe these results can be improved and that this methodology, combined with other developed approaches, confounding variables and other biosignals [10][11][12]16,41 , can help the design of novel prediction algorithms aiming at clinical acceptance.