Introduction

Febrile seizures (FS) are the most common convulsive disorder in childhood, usually associated with a fever of 38 °C (100.4°F) or higher and an incidence of 2–11%1,2,3,4. Fever is not triggered by metabolic disorders or central nervous system (CNS) infection; both etiologies need to be excluded in the differential diagnostic workup. Children aged four months to 5 years are mostly affected with the peak incidence at 18 months of age3,4. Besides an increased risk of epilepsy5 and psychiatric disorder6, the recurrent FS (RFS) represent the most common long-term effect of FS4,6 with an estimated 14–24% of the recurrence within the same febrile illness2. Overall, the recurrence decreases with age from 50% in children younger than 12 months at the first FS to 30% afterward. After a second FS, the probability of additional FS is 50%3,4. The accurate prediction of FS/RFS can outline strategies to prevent FS and mitigate the burden of FS on a child’s health or avoid repeated hospital visits, which may deepen the anxiety in children and families1,2,3,4.

Multiple FS/RFS risk predictors have been proposed such as body peak temperature, iron status, electrolyte imbalance, age, sex, and genetics, but inconsistent or contradictory results across various studies were observed. Combining more than one predictor may drive the risk prediction higher2,7,8, thus, estimating the FS/RFS risk with higher accuracy. Iron deficiency (ID) is frequent in infants and toddlers with the concurrent peak age as FS7,8. The association of iron status and febrile seizures (FS) has been postulated but supported by equivocal or inconclusive reports1,7,9,10,11. This controversy may be explained by cultural and geographic differences, as iron status is closely linked to socioeconomic status, malnutrition, and weaning practices1,8,9,10,11.

In this work, we collected blood iron status and demographic data in prospective cohorts of children with FS, RFS, fever without seizures, and afebrile healthy controls. Next, we investigated separation ability, i.e., sensitivity (SE) and specificity (SP)12, of individual variables. Then, we designed multivariate linear mixture models sensitive and specific to FS risk and recurrence.

Methods

This prospective case–control study was performed at the University Hospital Brno, Czechia, from April 1, 2015 to August 31, 2017 with a subsequent 5-year follow-up (until August 31, 2022) under the Masaryk University Ethics Review Board approval and was conducted in accordance with the ethical principles of the Declaration of Helsinki. The informed consent form was obtained from every participant's parent/legal guardian prior to the study enrollment.

Participants

A total of 162 Caucasian children were enrolled and formed one case group (FS group) and two control groups (febrile patients without seizures and healthy controls). Inclusion criteria were age 4–72 months, electroencephalograph (EEG) without epileptiform abnormality and normal background activity corresponding to age (FS group), normal neurodevelopment, and neurological exam. The diagnostic criteria of FS followed the American Academy of Pediatrics clinical guidelines3,4. The FS group consisted of 53 children (15 females) aged 4–70 months and formed two subgroups; non-recurrent FS (36 children, 11 females) and RFS (17 children, 4 females). Three children (one female) presented complex non-recurrent FS (one with repeated seizure within 24 h and two with transient focal post-ictal deficit); all the other FS children presented with a simple FS. Fifty-three children (26 females), aged 6–70 months, had a febrile illness caused by respiratory or urinary tract infection but without seizures. The healthy control group, recruited from children coming for a regular preventive care exam, comprised 56 children (30 females) aged 6–67 months. Exclusion criteria were age below four or above 72 months, peak body temperature ≤ 37.5 °C (99.5°F) for febrile groups, psychomotor developmental delay, malnourishment, seizures lasting more than 15 min, focal signs or lateralization in a neurological exam, epilepsy, genetic epilepsy with febrile seizures plus, antiepileptic-drug usage, history of afebrile seizures, history of CNS infection or severe head trauma, electrolyte, glucose, or homeostasis imbalance. Children suffering from chronic illnesses such as cardiovascular, renal, rheumatological, or malignant diseases, hemoglobinopathies, or other blood disorders that are associated with a higher likelihood of anemia were excluded. Demographics are summarized in Table 1.

Table 1 Demographic and blood iron status variables.

Data collection

Each participant underwent a blood draw with the analysis of red blood cell count (RBC), hemoglobin (HGB), serum iron (Fe), iron saturation (satFe), ferritin (Fer), transferrin (TF), and unsaturated iron-binding capacity (UIBC). In FS and RFS patients, electrolytes and vitamin D were also measured. Blood draw analysis results, peak body temperature, age, sex, gestational age (GA), height and weight percentiles were utilized in between-group difference testing and multivariate statistical modeling. In addition, all available screening values for all seizures were reported in participants with RFS. The precise FS duration and interval between fever onset and FS were not collected as these parent-reported outcomes tend to be inaccurate. The study participants were followed up for five years to record RFS or treatment for ID or anemia.

Statistical analysis

Between-group differences were evaluated with the Wilcoxon rank-sum test for each examined variable (critical threshold value pFWE < 0.05; FEW—family-wise error correction; non-corrected p < 0.05 was considered as a trend in the data). For variables demonstrating significant differences between case and control groups, a maximal sum of SE + SP defined the variable-specific separating threshold (Fig. 1). The healthy control group was not included in the SE + SP estimations, as healthy children without fever do not seek medical attention. The SE + SP sum is proportional to a minimal Euclidean distance to the ideal “gold standard” classifier, i.e., the classifier with SE = 100% and SP = 100%, in the receiver operating characteristics (Fig. 4b).

Figure 1
figure 1

Between-group differences at the univariate level. The figure-embedded table summarizes between-group differences with highlighted significant findings. Graphs show value distributions for selected variables. Automatically enumerated discriminating thresholds (dashed gray lines) and corresponding SE and SP values are displayed for satFe, Fe, and UIBC variables, which demonstrated the strongest separation between groups. 1 healthy controls, 2 febrile patients without seizures, 3 febrile patients with non-recurrent FS, 4 febrile patients with recurrent FS, GA gestational age, Age age at the first febrile seizure attack, Height height percentile, Weight weight percentile, HGB hemoglobin, Fe serum iron concentration, Fer serum ferritin concentration, FS febrile seizures, TF serum transferrin concentration, satFe iron saturation, UIBC unsaturated iron-binding capacity, thr threshold, SE sensitivity, SP specificity.

Pearson cross-correlation analysis (r) investigated the presence of mutual linear relationships between variables (critical value |r|> 0.26 ≈ p < 0.001 for 162 samples).

A univariate analysis does not usually reach the gold standard classifier property. As the blood and demographic screenings provide a low cross-correlated multi-dimensional dataset of “independent” variables, several data analysis approaches utilizing step-wise linear regression were designed to find a multivariate linear mixture model (Eq. (1)) that increases the SE + SP to FS risk or recurrence and gets closer to the gold standard classifier.

$${\textbf{y}}\, = \,{\textbf{x}}_{0} \, + \,\beta_{1} {\textbf{x}}_{1} \, + \, \cdots \, + \,\beta_{n} {\textbf{x}}_{n} \, + \,\boldsymbol{\varepsilon}$$
(1)

The vector x0 is the constant member and the vector ϵ is Gaussian random noise. Vectors xm where index m  {1, 2,…,n} represent n variables (i.e., variables derived from the blood screening or demographic variables) significantly contributing (p < 0.05) to the expected signal y. Coefficients βm define magnitudes of contributions. The crucial part of linear mixture modeling is the definition of the expected signal y.

Three models (i.e., model1, model2, or model3) with three different expected signals (i.e., y1, y2, or y3) were designed and tested. In model1, y1 equals 0 at positions of healthy controls, equals 1 at positions of patients without FS, and 2 at positions of patients with FS. In the model2, only patients were considered, and y2 equals 0 at positions of patients without FS and 1 at positions of patients with FS. In model3, only patients with FS were considered, and y3 equals 1 at positions of patients with non-recurrent FS and equals 2 at positions of patients with RFS. Model-specific Wilcoxon rank-sum test, SE, SP, and the separating threshold maximizing the SE + SP sum were evaluated in the same fashion as for the univariate approach while getting closer to the gold standard classifier was the set goal.

Model1 and model2 represent two concurrent models potentially separating non-seizure and seizure patients with high SE and SP. Therefore, we tested whether an orthogonal projection (f) of both models into one bi-linear model y12 (Eq. (2)) can even increase the SE and SP and improve the developing classifier. Two scalar separating thresholds y1 and y2 were again identified by maximizing the SE + SP sum.

$${\textbf{y}}_{12} \, = \,f\left( {{\textbf{y}}_{1} ,\,{\textbf{y}}_{2} } \right)$$
(2)

Continuous biological factors, such as age, gestational age, height percentile, and weight percentile, were additional inputs for the linear mixture modeling via the step-wise linear regression for model1, model2, and model3. For model3, maximal body temperature and sodium and vitamin D concentrations were additional input variables in the regression analysis. Categorical biological factors should be spread uniformly over the dataset to guarantee a fair design of any classifier. Sex was distributed equally in the control groups. However, FS and RFS demonstrated higher prevalence and incidence in males. Therefore, we employed the adaptive synthetic sampling approach matching the number of female samples in the case (FS and RFS) groups to minimize the risk of imbalanced learning13,14. As initial conditions were randomized, each model training was repeated 5000 times to test and guarantee model stability and reliability. Moreover, sex was also used as a binary input variable in the regression.

The sample size of our dataset was limited. To test dataset power to establish stable FS risk and recurrence model/s, we have permutatively down-sampled the dataset to 90%, 80%, 70%, 60%, and 50% of its original size, while intra-group sex distributions remained unchanged. Again, the adaptive synthetic sampling matched the number of female samples in the case groups. Model training was 5000 times repeated for each dataset size. Objective measures assessing model/s’ stability and reliability were as follows: (i) frequency occurrence of the most common model (a priori defined by the full 100% dataset size); averages and variances of (ii) regression coefficient; (iii) explained variance; (iv) Pearson correlation coefficient between modeled and predicted signal y (Eq. (1)); (v) between-group separating threshold determined via the SE + SP sum maximization; and (vi) SE and SP. In under-sampled datasets, the SE and SP were assessed for selected (training) and unselected (testing) data points.

Data and computer code availability and license statement

Raw input anonymized data and MATLAB language script (written in version R2018b) making statistical testing and deriving the regression models are available under the GNU General Public License version 3 at: https://github.com/umn-milab/febrile-seizure-blood-models (release r20231005).

Tools for cross-correlation analysis are available under the same license at: https://www.mathworks.com/matlabcentral/fileexchange/74204-corrplotg.

The MATLAB basic programming environment, MATLAB Statistics, Machine Learning Toolbox, and Econometrics Toolbox licenses need to be available to an end-user for full program compatibility.

The MATLAB implementation of the adaptive synthetic sampling is available in the ADASYN toolbox under the copyright© 2015, Dominic Siedhoff: https://www.mathworks.com/matlabcentral/fileexchange/50541-adasyn-improves-class-balance-extension-of-smote.

Results

Iron status results and demographics are summarized in Table 1. The prospective enrollment revealed a 2.5-fold higher incidence of FS and 3.25-fold of RFS in males than females, respectively. Control groups showed balanced sex distributions. Complex FS were all non-recurrent and occurred in three children (5.7%). Family history in the first-degree relatives for FS was positive in four cases (two females; 7.6%), who all presented with simple non-recurrent FS. Family history for epilepsy was positive in one male (1.9%) with simple RFS. Peak body temperature did not differ between FS subgroups. The EEG was recorded after the seizure and did not show a pathological finding in any case. In the follow-up, none of the study participants was treated for ID or anemia.

Univariate between-group differences

Figure 1 shows significant between-group differences or trends for single variables. Group-specific demographics with iron status are in Table 1. Serum Fe, satFe, and UIBC were the only three variables demonstrating a significant difference between control and case groups (Fig. 1). The automatically enumerated thresholds with corresponding SE and SP are presented in Fig. 1. There were no significant differences for FS case subgroups at the single-variable level (Fig. 1). The significant difference in Fer levels was only between afebrile healthy controls and febrile children without seizures. The visualization of control and case groups for the single variables is shown in Fig. 1. Within-group differences were present in healthy controls when divided based on sex. The median and interquartile range (IQR, defined as 25–75% percentiles) of iron concentration was 10.4 (7.9–14.2) μmol/l in males and 15.3 (10.7–20.2) μmol/l in females (p = 0.021); iron saturation was 0.15 (0.12–0.24) in males; 0.21 (0.18–0.26) in females (p = 0.032). No other sex-related within-group differences were observed.

Serum electrolytes and vitamin D did not differ between FS and RFS groups. Sodium concentrations were 133 (130–137) mmol/L in FS and 133 (131–138) mmol/L in RFS. Vitamin D concentrations were 89.9 (46.9–135.1) nmol/L in FS and 77.0 (45.3–105.2) nmol/L in RFS.

Linearly dependent variables

As expected, height and weight percentiles were linearly dependent. In addition, several blood iron status variables were mutually cross-correlated. Demographics and iron status were not significantly correlated, except for the positive correlation between age and hemoglobin. A detailed view of the cross-correlation analysis is shown in Fig. 2. Simultaneously, we did not observe any clear non-linear relationships between variables (Fig. 2), which would suggest a potential necessity for the non-linear transformation of some variable/s before further linear mixture modeling.

Figure 2
figure 2

Cross-correlation matrix plot for investigated variables. Value in the upper-left corner of each plot is the Pearson correlation coefficient (r) for corresponding variable pairs. Value r is red-highlighted for the significant coefficient with p < 0.001. The correlation regression line is presented as a black dashed line. The values at x- and y-axes are fixed for each variable across the plot. Histograms at the main plot diagonal display the value distribution for each corresponding variable. GA gestational age (weeks), Age age at the first febrile seizure attack, Height height percentile, Weight weight percentile, HGB hemoglobin, Fe serum iron concentration, Fer serum ferritin concentration, TF serum transferrin concentration, satFe iron saturation, UIBC unsaturated iron-binding capacity.

Multivariate linear models maximizing between-group differences

Multivariate linear mixture models with enhanced separating properties between case and control groups (i.e., model1 or model2) or between case sub-groups (i.e., model3) were defined.

The model1 (Eq. (3), Fig. 3a, Table 2) identified the significant contribution of four linearly mixture variables (i.e., Fe, UIBC, height percentile, and Fer) forming a predicted signal yp1 with increased separating properties (SE = 95.49 ± 1.61%, SP = 69.43 ± 1.15%) between non-seizure and seizure patients with the separating threshold 0.5744 ± 0.0317. The quantitative characteristics of the estimated model1 (Eq. (3)) were as follows: F-value F = 38.41 ± 0.66, root mean square error RMSE = 0.6239 ± 0.0024, explained variance R2 = 46.04 ± 0.42%, and Pearson correlation coefficient r between the modeled signal y1 and predicted signal yp1 r = 0.643 ± 0.000. Means, including variances of derived regression coefficients, are listed in Table 2.

$$\mathbf{y}_{1} \propto {\mathbf{y}}_{p1} = \, - 0.071*{\mathbf{Fe}} + 0.012*{\mathbf{UIBC}} + 0.005*{\mathbf{Height}} + 0.003*{\mathbf{Fer}}$$
(3)
Figure 3
figure 3

Between-group differences with multivariate linear mixture models. Left-sided panels: (a-b) represent dataset 3D visualizations in the space of three significant variables (in figure (a) height, UIBC, Fe; in figure (b) height, UIBC, and satFe) with p-values for respective between-group comparisons under each panel; c shows linear dependence between height percentile and HGB evaluated with Pearson correlation coefficient (r) for subgroups of patients with non-recurrent and recurrent febrile seizures. Right-sided panels: (a-b) show distributions of regressed values for all investigated groups, c for subgroups of patients with non-recurrent and recurrent febrile seizures. Automatically enumerated discriminating thresholds are shown with dashed gray lines; corresponding SE and SP values for separation properties of control and case groups are based on model1 (a), model2 (b), model3 (c). Model equations are displayed in the y-axis label descriptions. 1 healthy controls, 2 febrile patients without seizures, 3 febrile patients with non-recurrent FS, 4 febrile patients with recurrent FS, Fe serum iron concentration, satFe iron saturation, Fer serum ferritin concentration, Age age at the first febrile seizure attack, Height height percentile, FS febrile seizures, UIBC unsaturated iron-binding capacity, HGB hemoglobin, thr threshold, SE sensitivity, SP specificity, *p-values were evaluated with the Wilcoxon rank-sum test.

Table 2 Quantitative characteristics and stability of identified multivariate linear mixture models tested on full and undersampled dataset.

Single-subject predicted yp1 values significantly separated all examined groups between themselves except for case sub-groups, and control subgroups (Fig. 3a).

The model2 (Eq. (4), Fig. 3b, Table 2) identified the significant contribution of four linearly mixture variables (i.e., satFe, UIBC, height percentile, and Age) forming a predicted signal yp2 with increased separating properties (SE = 83.53 ± 1.04%, SP = 82.89 ± 0.92%). The quantitative characteristics of the estimated model2 (Eq. 4) were as follows: F = 28.82 ± 0.63, RMSE = 0.3620 ± 0.0019, R2 = 47.38 ± 0.54%, and r = 0.660 ± 0.000. Means, including variances of derived regression coefficients, are listed in Table 2.

$$\mathbf{y}_{2} \propto {\mathbf{y}}_{p2} = \, - 3.224*{\mathbf{satFe}} + 0.004*{\mathbf{Height}} + 0.009*{\mathbf{UIBC}} - 0.005*{\mathbf{Age}}$$
(4)

Same as the model1, single-subject predicted yp2 values significantly separated all examined groups between themselves except for case sub-groups and control sub-groups (Fig. 3b).

The mutual orthogonal projection (Eq. (2)) of model1 (Eq. (3)) and model2 (Eq. (4)) formed a bi-linear classifier providing the strongest separating properties (SE = 81.5%, SP = 88.7%; Fig. 4a).

Figure 4
figure 4

Increased specificity of the case group separation and receiver operating characteristics while combining model1 and model2. (a) Visualization of the mutual model1 (x-axis)—model2 (y-axis) projection for all investigated groups. Right panel shows the zoomed-in area (delimited by dashed grey line) of the upper-right quadrant. The bi-linear classifier represents the thresholds of each separate model1 and model2 derived from data distributions shown in Fig. 3a,b. Thresholds are visualized as black solid lines. (b) Receiver operating characteristics and Euclidean distance (E) between an ideal “gold standard” classifier and the optimal classifier fit for the corresponding model/variable. Fe serum iron concentration, satFe iron saturation, Height height percentile, Age age at the first febrile seizure attack, UIBC unsaturated iron-binding capacity, HGB hemoglobin, thr threshold, SE sensitivity, SP specificity, ROC receiver operating characteristics.

All three presented linear mixture models (i.e., model1, model2, and bi-linear model1-model2 classifier; Fig. 3a,b, and 4a) improved separating properties and predictive power to FS risk when compared to the univariate analysis (Fig. 4b). The bi-linear classifier demonstrated the lowest Euclidean distance to the gold standard classifier (Fig. 4b).

Model3 (Eq. (5), Fig. 3c, Table 2) estimated a trivariate model (i.e., height percentile, HGB and satFe) forming a predicted signal yp3 with separation properties (p = 0.00128), which improved predictive power to FS recurrence when compared to model1 (p = 0.0032) or model2 (p = 0.0036; Fig. 3), or to univariate trends (height percentile p = 0.0050; weight percentile p = 0.0199; satFe p = 0.0202; and Fe p = 0.0363; Fig. 1). Quantitative characteristics of the model3 (Eq. 5) were as follows: F = 8.24 ± 0.83, RMSE = 0.4182 ± 0.0055, R2 = 26.04 ± 1.93%, and r = 0.441 ± 0.005. Due to suboptimal model characteristics, the subgroup-specific yp3 values remained overlapping, and separating SE/SP were limited to 83.86 ± 7.67% / 58.44 ± 6.69% (Fig. 3c). Means, including variances of derived regression coefficients, are listed in Table 2.

$$\mathbf{y}_{3} \propto {\mathbf{y}}_{p3} = \, - 0.0072*{\mathbf{Height}} + 0.0129*{\mathbf{HGB}} + 6.1796*{\mathbf{satFe}}$$
(5)

The parameter sensitivity analysis on under-sampled datasets showed the stability of the proposed regression coefficients in all three models. Still, their standard deviation increased as the dataset got more under-sampled (Table 2). Similar mean and standard deviation properties were applied for the models’ RMSE, R2, Pearson correlation, SE, and SP (Table 2). Models’ F value decreased, and the separating threshold increased as the dataset got more under-sampled (Table 2). When the dataset was divided into training and testing sub-datasets, the SE and SP were slightly lower on the testing dataset than obtained on the training dataset. However, both measurements remained proportional (Table 2). Model2 was the most stable and reproducible model as it remained the most often detected model even if the dataset was under-sampled to 70% of its original size (Table 2). Simultaneously, no other model was detected for the original 100% dataset size (Table 2). Model1 remained reproducible and the most often detected when the dataset was under-sampled to 90% of its original size (Table 2). Model3 was stable and reproducible only for the dataset of the original 100% dataset size (Table 2).

In summary, the under-sampled datasets led to models with either a sub-set of significant variables or a full set of significant variables and additional tested variables. However, such models were suboptimal compared to our models1–3. The significant contribution of presented variables can be expected in all three investigated models, but a certain validation of models1 and 3 would benefit from a larger dataset (Table 2).

Discussion

We confirmed the previous findings in febrile seizure research, such as blood iron status association with the risk of FS and higher incidence of FS in males than females with fever. More importantly, we designed novel multivariate linear mixture models for a potential accurate risk prediction of FS risk and recurrence based on blood iron status and demographic data. The models and, specifically, the derived bi-linear classifier demonstrated high SE + SP to discriminate between children who developed seizures and those who stayed seizure-free during the febrile episode. The accurate FS risk prediction among children with fever bears an unimagined potential in managing FS, such as FS prevention and avoiding the related stress and anxiety from seizure and hospitalization. Although our data were from a single center and the sample size is relatively limited, we propose the application of similar approach relying on multivariate models and classifiers to predict the risk of FS or RFS.

Multiple predictors have been identified1,2,6,7,8,9,10,11, pointing towards the multifactorial etiology of FS. One of the common FS predictors was the presence of ID8,11. Iron is an essential nutrient for brain maturation and overall body growth with unprecedented indispensability during “critical periods” of accelerated brain development spanning ages 6 to 24 months15,16,17. Within this time, the brain is prone to structural and functional alterations that may manifest immediately or arise later in life in the form of epilepsy18,19, neurodevelopmental problems such as memory problems, learning deficit, poor attention span, intellectual disability, behavioral disturbance15,19,20, or even as various psychiatric disorders6,20,21. Although the peak onset of FS is concurrent with this time period8, the impact of altered blood iron status on brain iron status, and consequently on brain structure and function, is unclear.

The previous literature on the blood iron status and FS mainly reported the association of ID and FS1,8,11,22, with some studies demonstrating non-existing or even opposite association8,9,10. Our findings showed a strong association between blood iron status and FS. Lower serum Fe levels and higher UIBC were in febrile children with seizures compared to those without seizures or afebrile healthy controls. The sensitivity of the serum iron measures to distinguish between the group with and without FS was high. Still, the specificity of these tests was relatively low, limiting their applicability in the clinical setting. Therefore, we generated multivariate mixture models for the group separation to increase the specificity. The models yielded the equations using specific variables such as ferritin and UIBC, iron concentration, and saturation. But also, body height and age were factors applied in the model to predict FS, despite the comparable and non-significantly different distribution across groups. Body height, age, and iron are interrelated with increased iron requirements in infancy and early years of life23,24,25. ID usually associates with faster growth whenever iron demands for growth exceed intake26. In the first two years of life, the risk of negative iron balance and organ prioritization may negatively affect brain development. The prioritization of iron distribution, which favors RBC (i.e., erythropoiesis) over the brain, heart, and skeletal muscles15,16, implies that ID may result in impaired neurodevelopment presenting with various degree of intellectual disability. Moreover, the elevated ferritin accompanying inflammation as an acute phase reactant is sequestered and, thus, not available for erythropoiesis and other organ systems. This defense mechanism, which aims to restrict serum iron from utilization by pathogens or tumors27, may lead or further contribute to ID, resulting in an increased risk of FS. Therefore, blood screening with an eventual iron-rich diet or iron supplementation may be warranted to prevent FS and neurodevelopmental sequelae.

We demonstrated that the bi-linear classifier consisting of two multivariate mixture models for the group separation provided high sensitivity and much improved specificity compared to univariate assessments or the models applied separately. Thus, carefully weighing the study limitations, we consider that the bi-linear classifier based on the presented models may represent a practical screening tool to determine the FS risk in febrile children. However, the robustness of the bi-linear classifier needs to be verified with a larger and more geographically and racially diverse cohort providing updated model coefficients or an extended variable list, which may result in the SE and SP at the proximity of the gold-standard classifier.

None of our models identified sex as a significant variable, although we observed higher FS incidence in males, which further confirms the findings of previous studies1,2,6,11. Significant sex effects were not observed in iron status and demographics in the febrile group without seizures and FS subgroups. In the healthy control group, lower iron concentration and saturation were noted in males compared to females. In analogy, the male sex represents a risk factor for ID or ID anemia in infants and young children23,24,26,28. Moreover, sex may determine seizure susceptibility and type, as demonstrated in the animal model20,21. The sex difference or male overrepresentation in FS human studies is well documented1,2,6,11. In the Japanese population, the male sex was identified as one of the major predictors of FS recurrence2. Our study showed more frequent RFS in males. Sex hormones control many molecular and cellular processes in brain differentiation, including the modification of the neural response to stress or brain injury. Thus, the increased FS susceptibility in males is likely influenced by multiple factors, including iron status alteration.

Regarding FS recurrence, the unique trivariate model consisting of HGB concentration, body height percentile, and Fe saturation was derived. The model’s reasonable separation (i.e., SE + SP) and model reproducibility were suboptimal, requiring further improvement and additional variables to define a model with optimal FS recurrence predictive power.

We only utilized linear mixture modeling between investigated variables. It is possible that the proposed analysis may benefit from a non-linear transformation of some variables before the regression analysis. However, we consider that strategy of a lower potential for a marginal improvement on the current dataset as we have not observed any non-linear relationships between variables. When the dataset is enlarged, the training of a non-linear classifier in the space of the orthogonal model1-model2 projection may lead to an improved models’ prediction.

Study limitations

A small and geographically limited Caucasian sample size represented the primary study constraint. Thus, using the full dataset for model regression with the subsequent classifier evaluation may lead to classifier overfitting in all derived and tested models. Therefore, a re-test of fixed models will be necessary at a fully independent and larger dataset that will enumerate and validate true models’ SE and SP.

Body height or weight percentile tables normalized for the Czech population may differ across nations, and slightly varying regression coefficients may be derived (i.e., β coefficients in Eqs. (2), (3), or (4)). Future multi-center experiments with diverse pediatric populations may re-test or derive regression coefficient expectations with a specific variance and define more generalizable models’ normative values.

Imbalanced sex distribution in case groups may bias our findings. The employed adaptive synthetic sampling was performed in an effort to minimize such a dataset effect. Future research needs to collect vitamin D samples in all investigated cases to rightfully determine its role.

Similar to the previous FS studies, Refs.1,11 the serum Fer levels may be elevated in various inflammatory conditions as ferritin is an acute phase reactant and marker of acute and chronic inflammation. Reference26 Although the ferritin levels were not significantly different across febrile groups of children with or without seizures, the influence on overall iron status during inflammatory conditions, mainly restricted serum iron utilization26, is noteworthy and may contribute to the FS development.

Conclusion

We confirmed the relationship between iron status and FS with a higher incidence in males. More importantly, we proposed a novel approach to evaluate the FS risk in infants and young children with fever. First, multivariate linear mixture models were derived based on blood iron status and demographic variables. The approach emphasized between-group separation properties when height percentile and age were included in the iron status observation. Next, a bi-linear classifier consisting of two multivariate mixture models provided the optimal SE + SP for FS risk. Finally, we have designed an innovative trivariate model sensitive to FS recurrence, utilizing height percentile, hemoglobin, and Fe saturation. We also hypothesize that a future extension of the novel FS recurrence model about the vitamin D variable can substantially improve its sensitivity and specificity. Future multi-center studies with a larger and more geographically and racially diverse cohort will re-test and validate the robustness of derived models to prove or disclaim them as classifiers with predictive power to FS risk or recurrence.