Radiomics features of hippocampal regions in magnetic resonance imaging can differentiate medial temporal lobe epilepsy patients from healthy controls

To investigate whether radiomics features in bilateral hippocampi from MRI can identify temporal lobe epilepsy (TLE). A total of 131 subjects with MRI (66 TLE patients [35 right and 31 left TLE] and 65 healthy controls [HC]) were allocated to training (n = 90) and test (n = 41) sets. Radiomics features (n = 186) from the bilateral hippocampi were extracted from T1-weighted images. After feature selection, machine learning models were trained. The performance of the classifier was validated in the test set to differentiate TLE from HC and ipsilateral TLE from HC. Identical processes were performed to differentiate right TLE from HC (training set, n = 69; test set, n = 31) and left TLE from HC (training set, n = 66; test set, n = 30). The best-performing model for identifying TLE showed an AUC, accuracy, sensitivity, and specificity of 0.848, 84.8%, 76.2%, and 75.0% in the test set, respectively. The best-performing radiomics models for identifying right TLE and left TLE subgroups showed AUCs of 0.845 and 0.840 in the test set, respectively. In addition, multiple radiomics features significantly correlated with neuropsychological test scores (false discovery rate-corrected p-values < 0.05). The radiomics model from the hippocampus can be a potential biomarker for identifying TLE.

Image preprocessing and radiomics feature extraction. Preprocessing of the images was performed to standardize the data analysis across patients. We resampled the original images to a 1 mm isovoxel and used FreeSurfer 5.3.0 (surfer.nmr.mgh.harvard.edu) to obtain subject-specific masks of brain regions as defined by the Desikan-Killiany atlas 26 . This procedure involved motion correction of T1-weighted images, removal of nonbrain tissue 27 , automatic Talairach transformation, segmentation of subcortical white matter and deep gray matter structures 28 , intensity normalization, tessellation of the gray matter/white matter boundary 29 , automated topology correction, and surface deformation following intensity gradients 30 . Once the cortical models were complete, they were registered to a spherical atlas, and the cerebral cortex was parcellated into units with respect to gyral and sulcal structures 31 . We mapped the brain parcellation mask for each subject from FreeSurfer space to the isovoxel native space and extracted two region of interest (ROI) masks (right and left hippocampi). We visually checked for segmentation or registration errors by overlaying each subject's native-space-transformed ROI masks onto their T1-weighted images.
Non-uniform low-frequency intensity was removed by applying the N4 bias correction algorithm. Next, arbitrary T1 signal intensities were normalized using the WhiteStripe normalization algorithm 32 . All images were resampled to 1 mm isovoxels across all patients.
The following radiomics features were extracted from each ROI on T1-weighted images using an open-source package (PyRadiomics, version 1.3.0) 33 : (1) 14 shape features, (2) 18 first-order features, and (3) 61 second-order features (including gray-level co-occurrence matrix, gray-level run-length matrix, gray-level size zone matrix, and neighboring gray tone difference matrix features) (detailed information in the Supplementary material).
Radiomics feature selection and machine learning models. All imaging features were normalized using z-score normalization. For feature selection, the F-score, least absolute shrinkage and selection operator (LASSO), and mutual information (MI) were applied with ten-fold cross-validation. After feature selection, the radiomics classifiers were constructed with machine learning models, namely support vector machine (SVM), logistic regression (LR), or AdaBoost, with ten-fold cross-validation. Hyperparameters were optimized by random search. Thus, a total of 9 combinations of feature selection methods and machine learning algorithms were trained and validated. Performance was evaluated in the training set and validated in the test set to differentiate the entire TLE group from the HC group, without controlling for the side of seizure onset. Further analysis was performed to differentiate the ipsilateral TLE group from the HC group.
In subgroup analyses of (1) right TLE vs. HC and (2) left TLE vs. HC, feature selection and machine learning were performed separately. In addition, to overcome data imbalances in subgroup analyses, each machine learning model was trained (1) without over-sampling, or (2) with SMOTE (synthetic minority over-sampling technique) 34,35 . Thus, a total of 18 combinations of feature selection, machine learning algorithms, and over-sampling were trained and validated for each subgroup. Performance was evaluated in the training set and validated in the test set. Area under the curve (AUC), accuracy, sensitivity, and specificity were estimated.
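The 3 x 3 grid of feature selection methods and classifiers described above can be sketched with Scikit-Learn as follows. This is a minimal illustration under stated assumptions, not the authors' code: the data are synthetic (with fewer features than the 186 in the study, to keep the run short), the number of retained features (`k = 16`), the LASSO penalty, the hyperparameter ranges, and the number of random-search iterations are all hypothetical choices, and SMOTE is omitted.

```python
from functools import partial

import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.feature_selection import (SelectFromModel, SelectKBest,
                                       f_classif, mutual_info_classif)
from sklearn.linear_model import Lasso, LogisticRegression
from sklearn.model_selection import RandomizedSearchCV, StratifiedKFold
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for a 90-subject training set (NOT the study data).
X, y = make_classification(n_samples=90, n_features=60, n_informative=10,
                           random_state=0)

k = 16  # hypothetical number of features to keep
selectors = {
    "F-score": SelectKBest(f_classif, k=k),
    # threshold=-inf makes SelectFromModel keep exactly the top-k |coefficients|.
    "LASSO": SelectFromModel(Lasso(alpha=0.01), threshold=-np.inf,
                             max_features=k),
    "MI": SelectKBest(partial(mutual_info_classif, random_state=0), k=k),
}
classifiers = {
    "SVM": (SVC(), {"clf__C": [0.1, 1.0, 10.0]}),
    "LR": (LogisticRegression(max_iter=1000), {"clf__C": [0.1, 1.0, 10.0]}),
    "AdaBoost": (AdaBoostClassifier(random_state=0),
                 {"clf__n_estimators": [25, 50, 100]}),
}

cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
results = {}
for s_name, selector in selectors.items():
    for c_name, (clf, params) in classifiers.items():
        # Scaling and selection live inside the pipeline, so they are refit
        # on each training fold and never see the held-out fold.
        pipe = Pipeline([("scale", StandardScaler()),
                         ("select", selector),
                         ("clf", clf)])
        search = RandomizedSearchCV(pipe, params, n_iter=2, scoring="roc_auc",
                                    cv=cv, random_state=0)
        search.fit(X, y)
        results[(s_name, c_name)] = search.best_score_

best = max(results, key=results.get)
print("Best combination by cross-validated AUC:", best)
```

Placing z-score normalization and feature selection inside the pipeline is a design choice of this sketch: it prevents information from the validation folds from leaking into the selected feature set.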
Based on the performance of the radiomics classification models in the training set, the best combination of feature selection, classification method, and over-sampling for each task was applied to the test set.
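For reference, the four reported performance measures can be computed from classifier scores as in the following sketch. The labels, scores, and the 0.5 decision threshold are made-up illustrative values, and the function name `classification_metrics` is hypothetical; the AUC is computed as the probability that a randomly chosen positive case scores higher than a randomly chosen negative case.

```python
def classification_metrics(y_true, scores, threshold=0.5):
    """Return AUC, accuracy, sensitivity, and specificity for binary labels."""
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t == 0]
    # AUC: fraction of (positive, negative) pairs ranked correctly; ties count 0.5.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    auc = wins / (len(pos) * len(neg))
    tp = sum(s >= threshold for s in pos)  # positives called positive
    tn = sum(s < threshold for s in neg)   # negatives called negative
    return {
        "auc": auc,
        "accuracy": (tp + tn) / len(y_true),
        "sensitivity": tp / len(pos),
        "specificity": tn / len(neg),
    }


# Toy example: 1 = TLE, 0 = HC (illustrative values only).
y_true = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.6, 0.4, 0.7, 0.3, 0.2]
metrics = classification_metrics(y_true, scores)
print(metrics)
```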
Feature selection, over-sampling, and classification were performed using Python 3 with the Scikit-Learn library (v0.21.2). The threshold for statistical significance was set at p < 0.05. The overall process is shown in Fig. 1.
Comparison of the diagnostic performance with that of human readers. We tested the diagnostic performance for differentiating TLE from HC on T1-weighted images in the training and test sets by a consensus of 2 readers (a neuroradiologist with 9 years of experience and a neurologist with 27 years of experience, respectively), who were blinded to the clinical information. The performances of the radiomics model and the human readers were compared using AUC with DeLong's method 36 .
Correlations between radiomics features and neurocognitive function. Pearson's correlation coefficient was calculated to evaluate the relationships between the identified radiomics features and the neuropsychological results that differed significantly between TLE patients and HCs (FDR-corrected p < 0.05). Table S3).
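The false discovery rate correction applied to the correlation p-values can be illustrated with the sketch below. The text does not name the FDR procedure, so the Benjamini-Hochberg step-up method is an assumption here; the function name `benjamini_hochberg` and the raw p-values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity of the
    # adjusted values with a running minimum.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Illustrative raw p-values from five feature-score correlations (made up).
raw_p = [0.001, 0.008, 0.039, 0.041, 0.60]
adj = benjamini_hochberg(raw_p)
significant = [p < 0.05 for p in adj]
print(adj, significant)
```

In this toy example, two raw p-values just below 0.05 are no longer significant after correction, which is the situation the correction is designed to guard against when many features are tested.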
The clinical characteristics of the subjects in the training set and test set are summarized in Table 1. There were no differences in the clinical characteristics between the training and test sets. Table 2 summarizes the results of the best-performing models in the training and test sets. The performance of the 9 combinations of models in the training set is shown in Fig. 2a. In the training set, the AUCs of the models ranged from 0.680 to 0.920. LASSO feature selection with SVM showed the best diagnostic performance in the training set, with an AUC, accuracy, sensitivity, and specificity of 0.920 (95% confidence interval [CI] 0.870-0.970), 80.5%, 85.7%, and 80%, respectively. The selected features (10 from right and 6 from left hippocampus) consisted of 2 first-   Fig. 2b.

Right TLE vs. HC. The performance of the 18 combinations of models in the training set is shown in
In the training set, the AUCs of the models ranged from 0.734 to 0.891. F-score feature selection with LR and SMOTE showed the best diagnostic performance in the training set, with an AUC, accuracy, sensitivity, and specificity of 0.920 (95% CI 0.870-0.970), 81.1%, 95%, and 68.5%, respectively. The selected features (14 from right and 16 from left hippocampus) included 23 first-order features, 3 second-order features, and 4 shape features. In the test set, F-score feature selection with LR and SMOTE showed an AUC, accuracy, sensitivity, and specificity of 0.845 (95% CI 0.723-0.968), 77.4%, 72.7%, and 80%, respectively.
Left TLE vs. HC. The performance of the 18 combinations of models in the training set is shown in Fig. 2c. In the training set, the AUCs of the models ranged from 0.845 to 0.935. LASSO feature selection with LR and SMOTE showed the best diagnostic performance in the training set, with an AUC, accuracy, sensitivity, and specificity of 0.935 (95% CI 0.893-0.977), 87.8%, 82.5%, and 93%, respectively.
Comparison of the diagnostic performance of human readers. The reviewers correctly identified 14 and 7 cases of TLE in the training and test sets, respectively. The AUC, accuracy, sensitivity, and specificity were 0.617 (0.452-0.764), 17.1%, 23.3%, and 100% in the test set, respectively. The radiomics model showed significantly better performance than the human readers did (p-value = 0.001).

Correlations between radiomics features and neuropsychological scores.
Among the 16 significant radiomics features from LASSO in differentiating TLE from HC, three were significantly correlated with K-BNT score, two were significantly correlated with CVLT direct recall score, and one was significantly correlated with RCFT immediate recall score (all FDR-corrected p-values < 0.05) (Table 3).

Discussion
In our study, radiomics analysis of the hippocampus showed promising results, with AUCs of 0.848, 0.845, and 0.840 for identifying the entire TLE group, right TLE, and left TLE in the test set, respectively. Surgical resection of the epileptogenic zone, the region that is necessary and sufficient for the initiation of seizures, is an additional treatment option that can achieve seizure freedom 37 . Our radiomics approach with machine learning may be applicable to surgical planning through early identification, and may thus allow for improved patient selection and counseling. Our radiomics model also showed better diagnostic performance than visual inspection by human readers, which is the current standard approach. Previous radiomics studies in the neuroradiology field have mostly focused on brain tumors 38-42 . Our study shows that a radiomics model can be a useful biomarker for identifying TLE.
Notably, our model included both right and left TLE in the entire TLE dataset. TLE is considered to display a strongly asymmetrical distribution of abnormalities such as volume loss or white matter abnormalities, primarily observed ipsilateral to the seizure onset site 7 . However, the side contralateral to seizure onset may also show volume loss or white matter alterations, although less prominently than the ipsilateral side 7,17,43,44 . Our radiomics model showed good performance (AUC 0.848) in differentiating the entire TLE group from HC, indicating that radiomics has the potential for creating a generalized model that is not influenced by the laterality of TLE. Our results are in agreement with previous studies reporting that microstructural changes precede macroscopic atrophy 45 , and that radiomics may reflect microstructural information different from that provided by volumetric measures.
Neuropathological research has revealed various patterns of neuronal cell loss or gliotic changes within the hippocampus, including hippocampal sclerosis, which is the most common histopathologic abnormality 46,47 . MRI T1 relaxation time is a direct reflection of tissue characteristics and has been reported to independently predict histological measures of neuronal density in TLE 48 . Such variations in relaxation time, which directly cause variations in MRI signal intensity, may provide information beyond that provided by volumetric measures. Radiomics features, especially second-order features, capture the spatial variation in T1 signal intensity that may reflect the underlying pathophysiology, which may explain our observation.
In our study, we applied domain knowledge by performing radiomics analysis in the bilateral hippocampi, which are the key structures involved in TLE 7 . However, because radiomics features tend to capture agnostic information that is invisible to the human eye, one cannot assume which features will be most important in identifying TLE. Thus, we further narrowed down the significant features by using various feature selection methods. This methodology helps create a more generalized and stable classifier that is robust against the idiosyncrasies of the training data 49,50 .
Further, it is known that TLE patients show cognitive impairments, affecting not only memory, but also a broad array of cognitive capacities including executive functions, language, and sensorimotor skills 12-14 . Previous MRI studies have shown structural and functional compromises associated with cognitive impairment in TLE patients 12,15-17 , but these studies were also focused on single conventional parameters. Other studies have shown dysfunctional networks related to cognitive impairment, seizure severity, or seizure-related change in TLE patients in task-based and/or resting-state functional MRI studies 8,50-55 . In our study, significant correlations were found between radiomics features and neuropsychological test scores, suggesting that radiomics features can also serve as imaging biomarkers for cognitive capacity in TLE patients.
Our study has several limitations. First, this is a retrospective study in a single institution with a relatively small sample size. Further studies with a larger dataset and external validation are warranted for better assessment. Second, because only the hippocampus masks were included, other important regions such as the amygdala and parahippocampus should be investigated in future studies. Third, T2-weighted or FLAIR images were not included in this study because the MRI protocol for healthy controls did not include these sequences. Further studies including radiomics features from T2-weighted or FLAIR images should be performed. Fourth, we preserved the laterality of TLE rather than flipping the images (right to left or left to right TLE) to evaluate the radiomics model, although flipping would have reduced the need for over-sampling algorithms. We chose this approach because previous studies have shown that right and left TLE demonstrate asymmetrical and different qualities of hippocampal injuries 53-58 .
In conclusion, a radiomics model from the hippocampus can be a potential biomarker for identifying TLE.

Data availability
Our anonymized data can be obtained by any qualified investigators for the purposes of replicating procedures and results, after ethics clearance and approval by all authors.

Code availability
The code for machine learning was modified from our previously available radiomics analysis package (https://github.com/ChoiDM/YBIGTA_AI_HeLP).