The value of intra-operative electrographic biomarkers for tailoring during epilepsy surgery: from group-level to patient-level analysis

Signal analysis biomarkers, in an intra-operative setting, may be complementary tools to guide and tailor the resection in drug-resistant focal epilepsy patients. Effective assessment of biomarker performances are needed to evaluate their clinical usefulness and translation. We defined a realistic ground-truth scenario and compared the effectiveness of different biomarkers alone and combined to localize epileptogenic tissue during surgery. We investigated the performances of univariate, bivariate and multivariate signal biomarkers applied to 1 min inter-ictal intra-operative electrocorticography to discriminate between epileptogenic and non-epileptogenic locations in 47 drug-resistant people with epilepsy (temporal and extra-temporal) who had been seizure-free one year after the operation. The best result using a single biomarker was obtained using the phase-amplitude coupling measure for which the epileptogenic tissue was localized in 17 out of 47 patients. Combining the whole set of biomarkers provided an improvement of the performances: 27 out of 47 patients. Repeating the analysis only on the temporal-lobe resections we detected the epileptogenic tissue in 29 out of 30 combining all the biomarkers. We suggest that the assessment of biomarker performances on a ground-truth scenario is required to have a proper estimate on how biomarkers translate into clinical use. Phase-amplitude coupling seems the best performing single biomarker and combining biomarkers improves localization of epileptogenic tissue. Performance achieved is not adequate as a tool in the operation theater yet, but it can improve the understanding of pathophysiological process.

Inter-ictal ioECoG recordings are richer in content than only sparse events such as spikes, which is why more sophisticated approaches exploiting different background features of the recorded signals have been suggested [25][26][27] . During the past years a plethora of biomarkers based on signal analysis have been developed. These biomarkers can be conceptually subdivided in three categories: univariate, bivariate and multivariate methods. Univariate methods provide information related to each signal separately, bivariate methods investigate the relationship, most typically correlation, between each couple of signals (known as functional connectivity 28 ), while multivariate methods estimate the global relationship between all the signals available. Further differentiation is possible within categories: for example when calculating signal correlations whether the calculation of the biomarker is amplitude or phase-based, time or frequency based, undirected or directed, linear or nonlinear [25][26][27] .
Several studies have shown that certain of these signal biomarkers in the ongoing ECoG can distinguish epileptogenic tissue from non-epileptogenic tissue without reliance on sparse events like spikes or even sparser events like seizures [29][30][31][32][33][34][35][36] , but only statistically on the aggregate group level. A biomarker set is needed that precisely identifies the tissue that should be surgically removed at the patient level, to be applied for clinical care.
Clinical translation of such signal biomarkers is currently hampered by high inter-subject variability, and not easily generalizable. This problem is compounded by the lack of an objective (and commonly accepted) way to assess the biomarker performance in comparison to a "gold standard". The closest approximation of a 'groundtruth' scenario is typically built relying on the outcome after resection.
We aimed to develop a pragmatic test-bed for different kind of signal biomarkers proposed in recent literature [30][31][32][34][35][36][37][38][39][40][41][42] for which an effect was reported on group level. We would like to test if these signal biomarkers can move beyond group level effect and help guiding the surgery.
We defined a 'ground-truth' scenario with the attempt to preserve the cause-effect relationship between the removal of the suspected EZ and seizure outcome, with the implicit assumption that is possible to instantaneously detect an effect (i.e. interruption of the epileptic network) in ioECoG after resection. This is possible thanks to the nature of our data: we have both pre-resection and post-resection recordings, information related to the resected area in patients with a good seizure outcome (i.e. Engel 1A). We reasoned that the EZ was sufficiently removed in those patients who become seizure-free without medication after the resection, such that we may use the post-resection recordings to compute a reliable signal biomarker reference value for 'normal' tissue (i.e. not able to trigger a seizure). Comparing this reference to the pre-resection biomarker values, computed on the resection area, makes it possible to have a proper estimate of the performances, in terms of the biomarker effectiveness to localize the epileptogenic tissue. Ideally, the successful biomarker should have a value bigger than the reference biomarker value in order to pinpoint the suspected epileptogenic location.
We investigated, in our ground-truth scenario, the performances of different signal biomarkers that have been used in previous studies [30][31][32][34][35][36][37][38][39][40][41][42] . We defined our biomarker pool with the attempt to be exhaustive according to the three different categories of univariate, bivariate and multivariate biomarkers. We combined all the biomarkers together, given that they potentially carry different information, and assessed the performance of our multi-feature biomarker. The purpose of this study was to assess the effectiveness and the impact on clinical translation of signal biomarkers based on ioECoG. Methods patients. We selected patients from a retrospective database of refractory epilepsy patients (RESPect) who underwent ioECoG-tailored resective surgery at the University Medical Center (UMC) in Utrecht, the Netherlands, between 2008 and 2018. The database was collected following the guidelines of the institutional ethical committee and all the methods were carried out and approved in accordance with the Medical Ethics Committee of the UMC Utrecht (Metc 18-109). For the retrospective part, that informed consent was waived by the Medical Ethical committee of the UMC Utrecht.
We included patients if (1) data was anonymized, visually assessed, annotated and imported in BIDS (2) at least one minute artefact free pre-and post-resection ECoG recording was available (3) post-surgical seizure outcome after 1 year was available, (4) the recording grids format was 4 × 5 electrodes with or without additional strips , (5) availability of pictures pre-and post-resection to label the electrodes (resected or not), (6) ECoG was sampled at a 2,048 Hz, (7) patients were not included in the HFO trial 44 , (8) patients had a 1 year good seizure outcome (Engel 1A) after surgery.
These criteria restricted our dataset to 47 patients: 30 of whom underwent anterior temporal lobe resection with amygdalohippocampectomy and the remaining 17 being patients with an extra-temporal lobe resection. We defined a subset of patients as 'cured' if after one year they belonged to Engel 1A class and stopped using anti-seizure medication.
Data acquisition. IoECoG signals were recorded for clinical purposes using 4 × 5 electrode grids and 1 × 6 or 1 × 8 electrode strips (Ad-Tech, Racine, WI) placed directly on the cortex. The grids and strips consist of platinum electrodes with 4.2 mm 2 contact surface, embedded in silicone, and 1 cm inter-electrode distances. Recordings were made with a 64-channel EEG system (MicroMed, Veneto, Italy) at 2,048 Hz sampling rate using an anti-aliasing filter at 538 Hz. The signal was referenced to an external electrode placed on the mastoid. Grids and electrode strips were placed in multiple arrangements on the cortex/cortical resection area before and after resection. Propofol was used to induce general anesthesia and maintained using a propofol infusion pump. Propofol was interrupted during ioECoG recordings until a continuous background pattern was achieved.
Data preprocessing and processing. The recordings from channels with visually marked noise (double checked by at least two people in a common reference montage) were excluded. The data was then re-referenced using a bipolar montage. The bipolar montage for the grid was computed both along the horizontal and vertical directions of the grid. This was done in order to take into account possible different orientations of the sources underneath and to optimally use all electrodes. For each situation the selected minute was divided in 5 s segments and the following preprocessing steps were applied independently for each segment: detrending, demeaning and z-score transformation. Depending on the specific measure, additional pre-processing steps were applied (see "Biomarkers" section for details). If the specific measure required filtering a finite impulse response (FIR) filter was used.
For every univariate measure (Auto-Regressive Residual Modulation, and phase-amplitude coupling, see "Biomarkers") we averaged the values across the segments to obtain a unique value for each bipolar channel across the situation. For bivariate and multivariate measures, producing functional connectivity matrix in each time segment, we first averaged in order to have one value per bipolar channel; second we averaged these values across time segments. Granger Causality (GC) was a special case because the multivariate model was fitted pooling the segments all together which resulted in one functional connectivity matrix (i.e. no need to average across the segments).
If the functional connectivity matrix was obtained from a directional measure (non linear correlation coefficient h 2 , GC, short-time direct Directed Transfer Function, sdDTF see Biomarkers) we considered the outstrength (i.e. the effect that a channel has on the other channels).
Furthermore we repeated the analysis using a common average montage.
Identification of electrodes covering resected tissue. Electrodes were classified into resected or nonresected using photographs taken during surgery. We labeled a bipolar channel (1) as resected if both monopolar channels were included in the resection area; (2) as not-resected if both the monopolar channels were excluded from the resected area and (3) bipolar derivations for which one monopolar channel was resected and the other not were not considered/excluded from analysis.
Measuring effect across all the channels. We considered for each biomarker all values computed in the pre-resection situations for channels that were eventually resected (from now on 'pre-resection resected channels'), and we compared them with the channel values computed on all post-resection situations in cured patients to assess if the different biomarkers could detect an effect on a group level. We tested for differences in the distributions with a two-sample one-sided Kolmogorov-Smirnov test (testing pre-resection resected values > post-resection values) since we used a priori information regarding the directionality of the effect from previous works 30-32,34-42 . Measuring effect using maximum per patient. All biomarkers that showed a significant difference (p < 0.01, no correction for multiple comparison) in the previous analysis were further analyzed. We calculated the maximum value of the biomarker across all pre-resection situations in resected channels of all patients and the maximum value across all post-resection situations in only cured patients. Then, we compared the distributions of maxima across patients between pre-resection resected channels and post-resection channels using a two-sample one-sided Kolmogorov-Smirnov test (testing pre-resection resected values > post-resection values). We defined a threshold to discriminate between pathological and healthy tissue by taking the maximum value of the biomarker across all channels of the post-resection situations in the cured patients. We reasoned that if a patient becomes seizure-free without medication after surgery, it means that the operation was successful: enough tissue was removed and the remaining tissue can be considered not able to generate seizures. Therefore, measuring the biomarker in this tissue (what is left after resection, post-resection situations) can give an estimate of 'normal' values of the biomarker. Choosing a threshold as the maximum across channels/situations/patients represents a way to define a 'universal' threshold that can be applied to discriminate between normal and epileptogenic tissue even for new patients.
For each subject and each biomarker, we considered as a successful outcome the detection of at least one value of the pre-resection resected biomarker values above the threshold. We defined an overall performance across subjects counting the number of patients for whom such condition was fulfilled (see Fig. 1).
We avoid the comparison between resected versus not-resected (i.e. sensitivity and specificity) channels for two main reasons: (1) such a comparison may reveal more insight about the relationship between the biomarker and the resection strategy rather than the epileptogenicity and the biomarker, because the resected area is usually larger than the epileptogenic zone; (2) specificity (i.e. ratio between pre-resection not-resected values below the threshold and all pre-resection not-resected values) is biased since we cannot have full coverage of the brain recording not-resected areas.
In addition, we defined a 'cumulative' biomarker combining together all the biomarkers. Specifically, we considered a patient to be a successful outcome if for any of the biomarkers at least one (or more) is above its respective threshold value. We then computed the overall number of successful patients (i.e. patients for whom at least one biomarker out of the pool was able to localize epileptogenic tissue in one of the pre-resection resected electrodes). Mesiotemporal versus neocortical channels. For temporal patients, the first three electrodes of the electrode strip directed at the mesiotemporal structure (hippocampus, amygdala and enthorhinal cortex) were classified as mesiotemporal channels and the other channels (i.e. grid + remaining strip electrodes) were classified as neocortical channels. We compared the values of hippocampal channels and neocortical channels using a two-sample one-sided Kolmogorov-Smirnov test (testing mesiotemporal channel values > neocortical channel values) in order to understand the effect of the anatomical structure on the overall result. computed signal biomarkers. Univariate biomarkers. Auto-regressive residual modulation. The auto-regressive residual modulation (ARRm) provides the amount of non-harmonicity in the signal quantified as the high residual variation after auto-regressive modelling 37,38 . It has been shown that brain tissue with high non-harmonicity corresponds to areas with high frequency oscillations (HFOs) which in turn may be an indication of epileptogenic tissue [45][46][47][48][49] .
Following Geertsema's work 38 we defined the ARRm parameters as: (1) window length of 40 samples, which with a sample frequency of 2048 Hz, corresponds to approximately 20 ms; (2) consecutive 50% overlapping windows. For the detailed formula see "Appendix".
Phase amplitude coupling. Phase-amplitude coupling (PAC) is a form of cross-frequency coupling 50 where the amplitude of higher frequency oscillation is modulated by the phase of lower frequency oscillation. Recent studies [39][40][41][42]51 have shown that high PAC values are related to the SOZ. There are many proposed methods to estimate PAC 52-58 and different parameter choices that can be made (i.e. the low and high frequency interval where estimate the phase and amplitude). We decided to investigate our dataset computing PAC between the modulating phase of theta band activity (3-4 Hz) and the amplitude of gamma activity (80-500 Hz) because this frequency band pairs were successfully investigated in recent epilepsy related studies [39][40][41][42]51 . For the detailed formula see "Appendix".
Bivariate biomarkers. Phase locking value. Phase locking value 59 (PLV) is a non linear bivariate measure quantifying frequency-specific phase synchronization between two signals. Mormann and colleagues 34 have been one of the first group to show how mean phase coherence (another name for PLV) can correctly lateralize the side of the epileptic focus using inter-ictal ECoG recordings. We computed PLV in the gamma frequency band (30-80 Hz), see the formula in the "Appendix". . This threshold is compared to the pre-resection resected recordings for all patients. For each patient if any of these latter values is higher than the reference threshold we have a successful outcome (i.e. the biomarker was able to localize the epileptogenic tissue). The number of successful outcomes was used as an overall performance measure for the biomarker.
Scientific RepoRtS | (2020) 10:14654 | https://doi.org/10.1038/s41598-020-71359-2 www.nature.com/scientificreports/ Phase lag index. The phase lag index 60 (PLI) is a bivariate measure quantifying the asymmetry of the distribution of the phase differences between two signals. Van Dellen et al. 31 investigated inter-ictal ECoG using PLI in temporal lobe patients. They showed that PLI was related to disease history. Moreover, van Diessen et al. 32 found that network based PLI quantities (strength and eigenvector centrality) in theta and gamma frequency bands, were associated with areas with HFOs and SOZ in temporal lobe patients. We chose to compute the PLI in the gamma band (30-80 Hz) using the formula in the "Appendix".
Non linear correlation coefficient. The non linear correlation coefficient h 2 xy between signal x and y is an extension of the linear correlation coefficient that captures both linear and non linear interactions. It has been widely used to analyze brain signals in the field of epilepsy (see recent reviews 25,26 ). Of note, the work of Bettus and colleagues 35 showed that h 2 provided information related to the localization of the epileptogenic focus using inter-ictal ECoG recordings. In this latter work the effect measured by h 2 was significant for theta, alpha, beta and gamma bands and mostly independent to inter-ictal spiking. We computed h 2 in gamma band  using the formula in the "Appendix". We looked also at the possible delayed effect computing h 2 shifting one signal compared to the other for different delays ([− 0.0332 s 0.0332 s] in steps of 0.0083 s) and we chose the delay which gave the maximum h 2 .
Granger causality (time-domain). Granger causality (GC) methods are statistical approaches based on autoregressive modelling determined on the data that estimated the amount directed relationship among different time-series. In a bivariate scenario, one time-series x 'Granger cause' another time series y if the inclusion of past values of x reduces the variance of the modelling error compared to modelling error using only y past values. This can be generalized to a multivariate scenario where the reduction of modelling error of the multivariate model is used instead (see Blinowska 61 for detailed review).
Park et al. 36 successfully applied time based multivariate Granger causality to inter-ictal ECoG recordings showing that ictal networks can be inferred from inter-ictal recordings. They showed that there was a significant correlation between the epileptogenic location inferred using ictal recordings (i.e. defined by a neurologist team) with the location pointed out using GC on inter-ictal recordings. We applied the analysis pipeline suggested by Park et al. 36 , therefore we added the first order differentiation as an extra step in the pre-processing analysis before to compute the z-scores, furthermore the Akaike's Information Criterion 62 was used to select the optimal model order. See the "Appendix" for formulas.
Short-time direct directed transfer function. An extension of Granger causality methods to the frequency domain is the directed transfer function 63 . It estimates the causal (in Granger sense, reduction of the modelling error) influence a time-series x exerts on a y time-series in a multivariate modelling of the time-series. Shorttime DTF (sdDTF) represents a further development of the DTF in order to capture the dynamic changes of the causal relationship 61 .
Zweiphenning et al. 30 using sdDTF on ioECoG recordings observed that the out-strength (i.e. quantification of the 'driving' behaviour of a channel) of a channel in high-frequency bands (gamma and ripple band) matched the resected channels in patients with a seizure-free outcome. We computed sdDFT following the Zweiphenning's pipeline, therefore we chose a model order of 30 samples, since this model order gave the best results. See "Appendix" for details.
Code implementation. All the code is available at https ://githu b.com/sufor raxi/multi ple_bioma rkers . We used MATLAB (Release R2019a, The MathWorks, Inc., Natick, Massachusetts, United States.) as a software framework plus the following toolboxes fieldtrip 64 , SIFT 65,66 (for the computation of sdDTF) and MVGC 67 (for the computation of GC).

Results
patient description. Table 1 shows the patients characteristics. Our dataset consisted of 47 drugs-resistant epilepsy patients (23 male; mean age 25.3) with good outcome (Engel 1A). Thirty of these patients were temporal patients who underwent hippocampectomy as part of the resection, while the remaining were extra-temporal. Thirteen patients had successfully withdrawn all medication after surgery (cured patients); 16 patients managed to control seizures with a lower dosage of anti-epileptic medication and 18 kept the same dosage of medication. The primary pathology diagnosis is reported in Table 1. Pathology diagnosis revealed in the majority of the patients (N = 17) a low grade tumor (WHO I + II), 7 patients a focal cortical dysplasia, 7 patients a cavernoma and 7 patients gliosis/scar tissue. In addtion, there were 5 patients with mesiotemporal sclerosis, 2 patients with cortical malformation development, one patient with tuberous-sclerosis and one patient with no abnormalities. The total number of bipolar channels in the post-resection recordings were 1,864, while 1,138 channels were recorded during the pre-resection phase were eventually resected ("resected channels"). For our analysis we did not use the non-resected channels (1754) nor the channels that we could not assign a label ("cut channels", 417) from the pre-resection recordings. On average we had around 40 channels per subject recorded in the postresection and about 24 channels labeled as resected in the pre-resection recordings.
Measuring effect across all the channels. Figure 2 shows the comparison between the biomarker distributions of values computed in pre-resection resected channels in improved patients (Engel 1A) and postresection channels in cured patients (Engel 1A without medication). Five out of seven biomarkers (ARR, PAC, PLI, H2, GC) were significant (p < 0.01) using a one-sided Kolmogorov-Smirnov test. www.nature.com/scientificreports/ Measuring effect using maximum per patient. Figure 3 shows the comparison between the distribution of maximum values computed in pre-resection resected channels for improved patients and maximum post-resection channel values for cured patients. We show the five biomarkers for which a significant effect was reported across all the channels (Fig. 2). Each coloured dot represents the maximum value of the biomarker for each individual patient across all the channels and situations. Although for all the biomarkers the pre-resection resected distribution has a longer tail than the post-resection one, only the distributions related to PAC are significantly different (p < 0.01). We defined a threshold for each biomarker as the maximum across all patients in the post-resection distribution so that we could quantify the number of patients for whom in the pre-resection recordings we could localize the channel to be resected. The best performance was obtained using PAC, 17 out of 47 patients are above the threshold.
pooling together all the biomarkers. Figure 4 shows for each patient in the pre-resection recordings the number of biomarkers above their respective thresholds (i.e. the maximum across all channels and all patients in the post-resection cured group). For each patient, we counted if at least one channel showed a value higher than the respective threshold for each biomarker. The biomarker counts are higher in temporal patients compared to extra-temporal ones. There was no patient with all the biomarkers above the thresholds. These two results suggest that the biomarkers convey different kind of information related to epileptogenicity.
For the 'cumulative' biomarker, combining together the contribution of all biomarkers (i.e. at least one biomarker above the threshold), we obtained a performance of 23 out of 47 patients outperforming the best single biomarker performance by 12%.
Upon investigation of the two subgroups (temporal vs. extra-temporal patients), we recomputed the threshold for each biomarker for each subgroup. Figures 5 and 6 show the number of biomarkers above the thresholds per patient in the two subgroups. The biomarkers seem more sensitive in the temporal subgroup (median number of patients above threshold across biomarkers 15 out of 30 versus 2 out of 17 in the extra-temporal subgroup). Figure 7 summarizes the performances of the 'cumulative' biomarker for the whole group, the temporal and extra-temporal subgroups. The best performance was obtained considering the temporal subgroup: 29/30 patients showed a value in the pre-resection recordings higher than the threshold for at least one biomarker of the pool of biomarkers. For the extra-temporal patients 4 out of 17 patients were above the threshold. common average montage. Repeating the analysis with a common average montage did not change the results when we measured the effect across all channels Supplementary Fig. S3. The same five biomarkers (ARR, PAC, PLI, H2 and GC) were significantly different (Kolmogorov-Smirnov test: p < 0.01) between pre-resection resected channels in improved patients (Engel 1A) and post-resection channels in cured patients (Engel 1A without medication). When we computed the difference using the maximum statistic per patient, we could not find any significant results (even though the trend was similar, see Supplementary Fig. S4).
For the 'cumulative' biomarker, combining together the contribution of all biomarkers (i.e. at least one biomarker above the threshold), we obtained a performance of 27 out of 47 patients outperforming the case using the bipolar montage (23/47) (see Figs. 8 and 9).
Mesiotemporal versus neocortical channels. In temporal patients we found a significant (p < 0.01) difference between mesiotemporal channels and neo-cortical channels for the PAC (using only the bipolar montage) and GC (both bipolar and common average montage), while no significant difference was found for the other biomarkers.

Discussion
This study investigates the performances of different univariate, bivariate and multivariate signal biomarkers, used separately and combined, to discriminate between non-epileptogenic and epileptogenic tissue using interictal data derived from ioECoG. We performed all the analyses in a ground-truth scenario, using post-resection recordings of completely cured patients (for whom seizure control without medication was achieved for at least one year after resection) as a way to define a reference threshold for non-epileptogenic tissue to be compared with channels in pre-resection recordings of improved patients.
We could replicate previous findings regarding the detection of an overall effect, epileptogenic versus nonepileptogenic tissue, comparing separately the distribution of pre-resection recordings against post-resection recordings in 5 out of 7 biomarkers (this was true independently of the reference montage, bipolar or average). This is a remarkable result considering the differences in methodological (and arbitrary) choices we used to harmonize the analysis pipelines of the aforementioned studies. For the biomarkers for which we failed to observe a significant effect (PLV and sdDTF) this failure could be indeed related to different signal processing pipelines regarding the epoch length 69 , the reference montage 70,71 , the different state of vigilance [72][73][74] .
The resection area in good seizure outcome patients often includes normal brain tissue along with electrophysiologically abnormal tissue. In order to overcome this problem, we repeated our analysis using the maximum value across channels for each subject. Although each tested biomarker showed a longer tail of the pre-resection values compared to the post-resection values, only the PAC showed a significant difference between the two www.nature.com/scientificreports/ distributions (only in the bipolar montage). PAC also revealed the best performance allowing to detect the pathological tissue in 17 out of 47 patients (15 out of 47 using average montage). Our results confirm the important role of cross-frequency coupling in neuronal communications 50,75 and also reinforce the idea that abnormal PAC values are linked to ictogenesis [39][40][41][42]51,68,[76][77][78] . A possible shortcoming for PAC is that it may be affected by ringing artifacts of sharp transients 57 and it has been shown how ECoG inter-ictal spikes affect PAC estimation 51 . We did not account for this possible bias, however since inter-ictal spikes contribute to the definition of the irritativezone 79 we believed that our analytic approach is justifiable.  www.nature.com/scientificreports/ A possible explanation of the failure of the other biomarkers might be the fact that the maximum across channels represents a too strict and crude statistic to detect an effect. In fact, the maximum statistic works on the implicit assumption that one channel with an electrophysiologically abnormal value (i.e. higher value than the threshold) is enough for ictogenesis. The maximum as a statistic may overlook important global network features that go beyond single channel statistics, as critical mass in terms of epileptogenic tissue that is needed to trigger seizures [79][80][81][82][83][84][85][86] . Moreover, choosing only one channel makes the method sensitive to artefacts not recognized during the pre-processing.
Furthermore, we used the maximum across cured patients to build a universal threshold and this results in the threshold being dependent on the group studied (i.e. using a different subset of cured subjects may change  www.nature.com/scientificreports/ the threshold and the results). However, as it is clear from Fig. 3, we are in the worst-case scenario (i.e. any subset of the cured patients will improve the performances).
The combined biomarkers using a bipolar montage improved the overall performances from 17 out of 47 to 23 out of 47 individual patients for whom the detection of pathological tissue in the pre-resection recordings was feasible. When applying an average montage and combining the biomarkers, performances further improved reaching 27 out of 47. These results, as a whole, suggest that different biomarkers may capture different mechanisms of ictogenesis and it is inline with recent literature suggesting that more robust results are shown by combining different biomarkers 40,41,87 since they potentially exploit independent information.  www.nature.com/scientificreports/ Indeed, the reference montage has an effect on the biomarkers 70,71 , however on the overall it seems that results are robust. A more in depth analysis comparing the effect of different montage on ECoG data is worthwhile, but beyond the scope of this study.
When we performed separately the analysis depending on the type of epilepsy (temporal or extra-temporal), the combination of multiple biomarkers for temporal patients held the remarkable result of 29 out of 30 patients for whom we could localize the epileptogenic tissue, while this was true only in 4 out 17 patients for the extratemporal group. Using an average montage we observed a similar divergence (28 out of 30 for temporal patients and 8 out of 17).
This performance difference may point out different structure related mechanisms (i.e. neo-cortical versus mesiotemporal) involved in the ictogenesis or different structure related physiological variation in biomarker values. Hence, considering the two different anatomical patient groups allows for a definition of a better reference threshold (i.e. more structure tuned) to discriminate between normal and pathologic tissue. Indeed, results may be influenced by the different neurophysiological properties of the tissue independently from the epileptogenicity (i.e. biomarkers are detecting a difference in terms of structure, mesiotemporal vs neocortical, rather than epileptogenicity). A recently published intracranial ECoG atlas 88 of recording in healthy tissue points into this direction, highlighting how different anatomical brain areas have specific electrophysiological signatures in terms of spectral oscillatory and non-oscillatory properties. Furthermore, the benefit and the need to assess biomarkers relative to anatomically normative values has been reported for univariate measures 51,89,90 and bivariate measures 91 . Given the limited spatial extent of our recordings and the inability to precisely localize the electrodes we did not perform such analysis.
We found a significant difference (PAC and GC for bipolar montage, while only GC for average montage) in temporal patients comparing mesiotemporal channels with neocortical ones. However, the results of 29 out of 30 (for bipolar montage) cannot be fully explained in structure related terms since for temporal patients the maximum value above the biomarker reference was found in the mesiotemporal channels 13 times out of 24 for PAC and 5 out of 13 for GC.
Nevertheless, our results regarding temporal patients are comparable to a recent similar work on epileptogenic localization on a dataset of predominantly temporal patients 41 . The poor performances, using a bipolar montage, in the extra-temporal group could also be affected by the limited amount of resected channels considered in the pre-resection recordings. The mean number of resected channels available in the extra-temporal group was around 13 compared to 30 in the temporal group, and for two extra-temporal patients only one channel was available.
It is important to realize that, in this retrospective study, the total amount of data analyzed per situation (1 min) is a drawback since it has been shown that longer periods of data are needed to detect pathological signatures 40,41,73 . The time constraint in intra-operative recordings will always be an issue, as the goal is not only to find a biomarker able to discriminate what is epileptogenic/non-epileptogenic but it is to accomplish it in a reasonable amount of time during surgery.
The choice of solely the gamma frequency band for some of the biomarkers is a limitation, since it has been reported that brain networks findings in intracranial recordings are frequency-dependent 29,92 . We therefore investigated in the Supplementary materials the different frequency bands for PLV and PLI, since recent literature 32,93,94 reported an effect for some of the classical frequency bands. Overall, the analysis using the gamma frequency band seems the one with the better performances. Our a priori choice was motivated by previous works in which gamma frequency band appeared consistently to reveal significant results using inter-ictal intracranial recordings 29,30,33,35 .
We did not compare and integrate an high-frequency oscillations analysis in our results, even though recent literature reported on the predictive power of such biomarker alone (both interictally 15-17 and ictally 78 ) or combined with other biomarkers 40,41 . However, since an unequivocal definition of a HFOs is still missing (even though some efforts have been done in this respect 95 ) and their automatic detections is biased by artefacts in intra-operative data and depend on visual scoring, we preferred not to include the HFO analysis in our study and applied only channel-based automatic methods with the attempt to find an objective automatic way that could be easily implemented during surgery to assist the clinical neurophysiologist in accessing the ECoG. The ARR biomarker should account for the effect of HFO since they are highly correlated 38 . Future investigations considering different aspects of HFO analysis pipeline (i.e. visual scoring, reference montage, different detection algorithms) are desirable but are out of the scope of the current study.
There are three main limitations related to localization matters. The first consists of the not straightforward way to project the bivariate and multivariate biomarkers computed from signals recorded from two (or more) different locations to a single location. This is not an issue for univariate measures since they provide a more confined measure, in terms of localization. The employment of a bipolar montage, even though to a lesser extent, posit the same obstacle. The use of high density grids can be a possible approach to improve the localization precision.
The second is related to the unavailability of accurate electrode localization to compare the value of the biomarkers on the same tissue pre-and post-resection (what is left after resection). In fact, using the pre-and post-resection pictures is enough to mark (not-)resected channels, but it does not allow to quantify the value of the biomarker in the same location pre-and post-resection. Third, in temporal patients, there is uncertainty on the part of the mesiotemporal structure we are recording from since we do not have the exact position of the placement of the strip.
Another limitation was related to the effect of propofol on the different biomarkers. It is known that propofol induces changes in the EEG 96 . Moreover, signal based biomarkers have been shown to be sensitive to the transition from wakefulness to unconsciousness 74,97-104 . We tried to limit the propofol effect choosing the last minute of recording where the ECoG background was more stable and less affected by known ECoG patterns induced Scientific RepoRtS | (2020) 10:14654 | https://doi.org/10.1038/s41598-020-71359-2 www.nature.com/scientificreports/ by propofol (i.e. burst suppression, slowing of the signals). However, further studies properly designed (e.g. synchronization of ECoG traces with the propofol-injection pump) are required to investigate and take into account the effect of propofol on signal biomarkers during intra-operative respective surgery.
In conclusion, in this retrospective study, using a substantial number of patients for whom seizure control was improved 1 year after the operation, we pointed out the importance to work on a ground-truth scenario to evaluate biomarker performance at patient level. Our results suggest that a universal unique biomarker is insufficient to pinpoint the epileptogenic tissue. The combination of different biomarkers improved the localization performances. The results should be considered more from a perspective of pathophysiological understanding rather than as a tool for the operation theater since performance achieved is not yet adequate. r 1,w i + r 2,w i . r 3,w i > 95th percentile r 3,w and D(w i ) < 0.9.
PAC θ ,γ = T t=1 A γ (t)e iφ θ (t) www.nature.com/scientificreports/ Phase lag index. Given x and y as two signals, we filtered the signals in gamma frequency band (30-80 Hz), obtaining two filtered signals x γ and y γ . We then computed the Hilbert transform of the two filtered signals to obtain the instantaneous phases φ x γ (t) and φ y γ (t) for each time point t . We then compute PLI with the following formula: where �φ(t) = φ x γ (t) − φ y γ (t) , 〈 〉 is the average across all time points t , | | is the absolute operator and sgn is the sign function: Non linear correlation coefficient. The implementation of h 2 used in this work follows the implementation suggested by Kalitzin et al. 105 that is obtained using the following formula: where N is the time length of the two signal where e x and e xy are the residual variance from the autoregressive model using only previous values of x and the residual variance using previous values of x and y. This definition can be generalized to a multivariate (multi-channels) case with the following formula: where Granger causality from i to j is equal to the natural logarithm of the ratio between the variance of the residual using the reduced regressive model (considering all the time-series n − 1 other than i ) and the variance of the residual obtained from the full model (considering all the n time series). For the time based GC we used the implementation in the MVGC toolbox 67 .

Short-time direct directed transfer function.
The sdDFT is can be computed using the following formula 106 : where H ij is the transfer function describing the directed causal relationship from j to i at frequency f , χ ij is the direct partial coherence between i and j . The combination between the H ij and χ ij gives a measure of directed causal interaction between j and i in a multivariate (multi-channel) system. We used the implementation in the SIFT toolbox for sdDTF 65,66 .