The immune system is a complex network of interacting cells and molecules that protects the host against invading microorganisms and prevents disease1. Many factors affect immune function, and exercise is one of them. Many studies have proposed exercise as a tool for studying the interactions between metabolic stress and the immune system and have used the sportomics approach to understand exercise-induced cellular and metabolic modifications2,3,4,5. Previous studies show that, although exercise promotes health, exercise of unsuitable intensity and duration weakens the immune system and increases the risk of contracting various diseases6,7. Findings on the impact of exercise on the immune system also show that, in addition to the intensity and duration of exercise training, factors such as gender8,9,10, age, physical fitness level and body mass index (BMI)11, and body fat percentage12 affect the level of white blood cells (WBCs). Despite considerable research on the interaction between exercise and the immune system (WBCs), no comprehensive result or specific pattern has yet been established for the immune response to different intensities and durations of exercise, because this relationship is nonlinear and highly complex13. Researchers have therefore not yet identified an optimal exercise pattern for individuals. Discovering this pattern is vital, because exercise of proper intensity and duration can boost the immune system and reduce the chance of contracting diseases such as viral infections, cancer, and inflammatory diseases7.

In recent years, machine learning (ML) models have attracted increasing attention as a powerful tool for information processing, prediction, and modelling14,15,16,17,18,19. ML algorithms can be divided into three categories according to how the machine is taught: supervised, unsupervised and semi-supervised. Supervised ML algorithms rely on response variables that supervise the analysis20,22. Decision-tree (DT) algorithms are among the main ML tools and have been used in a wide range of clinical applications, including the diagnosis and prediction of cardiovascular diseases and cancers20. Random forest (RF) has been described as the most successful general-purpose algorithm of modern times21 and has shown the highest accuracy among supervised ML algorithms in most clinical studies20.

Despite the increased use of intelligent techniques in medical decision support systems23, very few studies have applied them in exercise immunology24,25. Furthermore, to the best of our knowledge, no study has used ML models (e.g., RF) to develop an efficient tool for predicting the number of WBCs during exercise. We therefore propose a novel RF-based approach to predict the number of lymphocytes (LYMPH), neutrophils (NEU), monocytes (MON), eosinophils (EOS), basophils (BASO) and total WBC during exercise in healthy people. The proposed method is easy to apply and places few restrictions on the factors that can be included. The present study has two main objectives: (1) to investigate an RF model for predicting the number of WBCs during exercise and (2) to investigate the importance of exercise intensity and duration in that prediction.



This study involved human participants and was approved by the Research Ethics Committee, and all methods were performed in accordance with the relevant regulations. The objectives and the research process were clearly explained to all subjects, and all participants provided written consent before the start of the study. A total of 200 eligible healthy subjects (100 men, 50.0%) aged 18–60 years participated. To assess health history (e.g., the presence of infectious, cardiovascular, inflammatory or immune diseases), subjects were screened with a questionnaire before the study period. Participants were also asked not to take anti-inflammatory agents, steroids or vitamin supplements for 2 weeks before the exercise sessions and to refrain from exercise training or vigorous physical activity. Descriptive statistics for the 200 participants are summarised in Table 1.

Table 1 Characteristics of participants and input and output data.

The protocol

We measured the anthropometric indicators (weight, height and BMI) using standard techniques. To evaluate VO2 max, the subjects completed a Bruce test to voluntary exhaustion on a calibrated treadmill26 in the cardiology clinic. Because changes in the immune system depend on exercise intensity (low, moderate, high)27, the exercise protocol was planned according to the intensity categories suggested by the American College of Sports Medicine (ACSM): low intensity (50–63% of HRmax), moderate intensity (64–76% of HRmax), and high intensity (77–93% of HRmax)28. Before the exercise session, the maximum heart rate (HRmax) was computed using the Tanaka method29. The minimum and maximum target heart rates (HRtarget) for the intensity assigned to each subject were then obtained by the Karvonen method30.
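The heart-rate calculation described above can be sketched as follows. This is a minimal illustration, not the authors' code. It assumes the commonly cited forms of the two formulas: Tanaka, HRmax = 208 − 0.7 × age, and Karvonen, HRtarget = HRrest + fraction × (HRmax − HRrest). Resting heart rate is not reported in the text and appears here as a hypothetical input. The text quotes the intensity bands as percentages of HRmax, whereas the Karvonen formula applies a fraction to the heart-rate reserve; how the two were combined is not stated, so applying the band limits as Karvonen fractions is an assumption. All function and variable names are illustrative.

```python
def tanaka_hr_max(age_years: float) -> float:
    """Age-predicted maximum heart rate (Tanaka form): 208 - 0.7 * age."""
    return 208.0 - 0.7 * age_years


def karvonen_target(hr_max: float, hr_rest: float, fraction: float) -> float:
    """Karvonen target heart rate: HRrest + fraction * (HRmax - HRrest)."""
    return hr_rest + fraction * (hr_max - hr_rest)


# Intensity bands quoted in the text, expressed as fractions.
# Using them as Karvonen fractions is an assumption of this sketch.
INTENSITY_BANDS = {
    "low": (0.50, 0.63),
    "moderate": (0.64, 0.76),
    "high": (0.77, 0.93),
}


def target_range(age_years: float, hr_rest: float, intensity: str):
    """Return (minimum HRtarget, maximum HRtarget) in beats per minute."""
    lower, upper = INTENSITY_BANDS[intensity]
    hr_max = tanaka_hr_max(age_years)
    return (karvonen_target(hr_max, hr_rest, lower),
            karvonen_target(hr_max, hr_rest, upper))
```

For example, `target_range(40, 60, "moderate")` gives a window of roughly 137 to 151 beats per minute for a hypothetical 40-year-old with a resting heart rate of 60 beats per minute.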

The participants performed the exercise protocol on a treadmill (Rodby, RL1602E, Sweden) at the determined HRtarget (i.e., between the minimum and maximum HRtarget). Heart rate was monitored continuously during the protocol with a Polar watch and chest strap (Polar Electro Oy, Kempele, Finland) to ensure that the exercise was performed at the intensity specified by ACSM. Subjects were tested individually in a public fitness centre, and each subject performed only one of the above-mentioned intensities. Exercise duration was set according to each subject's capacity, so no fixed duration was determined in advance. An individual's capacity is influenced by factors such as age, gender, BMI, and exercise intensity31. Blood samples (3 ml of peripheral venous blood) were taken at baseline and immediately after completion of the exercise to determine leukocyte levels. Finally, the collected data were used as the inputs and outputs of the RF model to predict WBCs levels.

Random forest (RF)

RF, a DT-based algorithm, is a highly successful classification and regression method. Aside from having few parameters to tune, it is generally recognized for its accuracy and its ability to deal with small sample sizes32. The approach combines several randomized decision trees into a forest. Each tree produces its own prediction, and the final prediction is obtained by averaging the predictions of all trees19. Before modelling, the data were rescaled to the range 0 to 1, because normalization minimizes bias and ensures that all variables receive the same attention within the model33. To avoid over-fitting in WBCs modelling, K-fold cross-validation was applied to train and test the RF model. In this approach, the whole dataset was randomly partitioned into 5 equal-sized subsamples (40 cases each). Of the five subsamples, four were used for training (160 cases) and one for testing (40 cases). This process was repeated 5 times, with a different subsample used as the validation data each time19.
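The pre-processing and data-splitting steps described above can be sketched in plain Python. This is an illustrative sketch under stated assumptions, not the study's implementation: the function names, the fixed random seed, and the choice to map a constant column to 0 are all hypothetical, and the RF model itself is not fitted here.

```python
import random


def min_max_scale(values):
    """Rescale a list of numbers linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Constant column: no spread to rescale, so map everything to 0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train_idx, test_idx) pairs for k-fold cross-validation.

    Assumes n_samples is divisible by k, as with 200 cases and k = 5.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)   # random partition, reproducible
    fold_size = n_samples // k
    for fold in range(k):
        test_idx = indices[fold * fold_size:(fold + 1) * fold_size]
        held_out = set(test_idx)
        train_idx = [i for i in indices if i not in held_out]
        yield train_idx, test_idx


# 200 cases split into 5 folds: 160 training and 40 testing cases per round.
folds = list(k_fold_indices(200, k=5))
```

Each case appears in exactly one test fold, so every observation is used once for testing and four times for training.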

Model structure and features importance

The choice of proper input vectors is important when modelling with supervised ML algorithms34. In this simple prediction model, the inputs were the factors that past studies have shown to affect WBCs: BMI, VO2 max, exercise intensity (HRtarget1 and HRtarget2) and duration of exercise training. We also included the WBCs values before exercise training as a required input, because the number of WBCs differs between individuals. The model output was the number of WBCs after exercise training. In total, 6 different scenarios were established for modelling, as listed in Table 2.

Table 2 Input and output scenarios.

Feature importance is an important and widely used analysis method in ML modelling because feature rankings are simple and interpretable. Most supervised ML algorithms, including RF, provide feature importance19. In this study, the importance of each input was estimated from the mean decrease in impurity (MDI).
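For orientation, MDI can be written out as below. This is the general textbook definition for tree ensembles, not a formula taken from this study. All symbols are introduced here for illustration, and taking the impurity \(i(t)\) to be the within-node variance of the target, as is usual for regression trees, is an assumption about the implementation.

$$ \Delta i(t) = i(t) - \frac{N_{t_L}}{N_t}\, i(t_L) - \frac{N_{t_R}}{N_t}\, i(t_R) $$
$$ \text{MDI}(x_j) = \frac{1}{M}\sum_{m=1}^{M} \; \sum_{t \in T_m,\; v(t) = x_j} \frac{N_t}{N}\, \Delta i(t) $$

Here \(M\) is the number of trees, \(T_m\) is the set of internal nodes of tree \(m\), \(v(t)\) is the feature used to split node \(t\), \(N_t\) is the number of training samples reaching node \(t\) (out of \(N\) in total), and \(t_L\) and \(t_R\) are the child nodes of \(t\). The scores are often normalized so that they sum to 1 across features.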

Evaluation criteria

Six quantitative metrics were used to analyse the performance of the model on the testing dataset: the coefficient of determination (R2, computed here as the squared Pearson correlation coefficient), root mean squared error (RMSE), mean absolute error (MAE), relative absolute error (RAE), root relative squared error (RRSE)34, and the Nash–Sutcliffe efficiency coefficient (NSE). The NSE has been used to evaluate ML models in different fields (e.g., hydrology, physics)33,35,36,37 and has been reported33 to be a more reliable efficiency index than R2. We therefore used it to evaluate the results of this study. The equations for these indices are as follows:

$$ \text{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n} \left( O_i - P_i \right)^2} $$
$$ \text{MAE} = \frac{1}{n}\sum_{i=1}^{n} \left| O_i - P_i \right| $$
$$ \text{NSE} = 1 - \frac{\sum_{i=1}^{n} \left( O_i - P_i \right)^2}{\sum_{i=1}^{n} \left( O_i - \overline{O} \right)^2} $$
$$ R^2 = \left[ \frac{\sum_{i=1}^{n} \left( O_i - \overline{O} \right)\left( P_i - \overline{P} \right)}{\sqrt{\sum_{i=1}^{n} \left( O_i - \overline{O} \right)^2}\, \sqrt{\sum_{i=1}^{n} \left( P_i - \overline{P} \right)^2}} \right]^2 $$
$$ \text{RAE} = \frac{\sum_{i=1}^{n} \left| O_i - P_i \right|}{\sum_{i=1}^{n} \left| O_i - \overline{O} \right|} $$
$$ \text{RRSE} = \sqrt{\frac{\sum_{i=1}^{n} \left( O_i - P_i \right)^2}{\sum_{i=1}^{n} \left( O_i - \overline{O} \right)^2}} $$

where n is the number of data points, and \(O_i\) and \(P_i\) are the ith actual and predicted values, respectively. \(\overline{O}\) and \(\overline{P}\) are the means of the actual and predicted values, respectively. RMSE and MAE range from 0 to +∞, NSE ranges from −∞ to 1, and R2 ranges from 0 to 1. Higher NSE and R2 values and lower RMSE and MAE values indicate better model performance33,34.
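The six indices defined above translate directly into code. The following is a minimal sketch in plain Python. The function name `evaluate` and the dictionary keys are illustrative choices, not part of the study, and the function assumes that neither the actual nor the predicted values are all identical (otherwise a denominator is zero).

```python
import math


def evaluate(observed, predicted):
    """Compute RMSE, MAE, NSE, R2, RAE and RRSE for paired value lists."""
    n = len(observed)
    o_mean = sum(observed) / n
    p_mean = sum(predicted) / n
    pairs = list(zip(observed, predicted))

    sq_err = sum((o - p) ** 2 for o, p in pairs)         # sum (O_i - P_i)^2
    abs_err = sum(abs(o - p) for o, p in pairs)          # sum |O_i - P_i|
    sq_dev = sum((o - o_mean) ** 2 for o in observed)    # sum (O_i - mean O)^2
    abs_dev = sum(abs(o - o_mean) for o in observed)     # sum |O_i - mean O|
    p_dev = sum((p - p_mean) ** 2 for p in predicted)    # sum (P_i - mean P)^2
    cov = sum((o - o_mean) * (p - p_mean) for o, p in pairs)

    return {
        "RMSE": math.sqrt(sq_err / n),
        "MAE": abs_err / n,
        "NSE": 1 - sq_err / sq_dev,
        "R2": (cov / (math.sqrt(sq_dev) * math.sqrt(p_dev))) ** 2,
        "RAE": abs_err / abs_dev,
        "RRSE": math.sqrt(sq_err / sq_dev),
    }
```

It follows from these definitions that RRSE squared equals 1 − NSE, so the two indices rank models identically. R2, being a squared correlation, can stay high even when predictions are systematically offset from the actual values, which NSE penalizes.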

Ethics approval

This study involved human participants and was approved by the Research Ethics Committee of Allameh Tabataba'i University (reference number IR.ATU.REC.1401.052). Participants gave informed consent to participate in the study before taking part.


Results of scenario analysis

The RF model to predict the number of WBCs was evaluated using performance indices (RMSE, MAE, RAE, RRSE, NSE, and R2). Their values for all scenarios during the testing phase are shown in Table 3.

Table 3 Performance of the RF model for prediction of WBCs levels.

Results of feature importance analysis

We also estimated feature importance in all scenarios. The feature importance scores are given in Table 4 and shown graphically in Fig. 1.

Table 4 Feature importance based on MDI for prediction of WBCs levels.
Figure 1 Graphical representation of feature importance.

To assess the efficiency of the developed model, the correlations between actual and predicted values of WBC, NEU, LYMPH, MON, BASO, and EOS during the testing phase are presented in Fig. 2. Comparison of all tested models showed that the model for predicting BASO had the lowest correlation (R2 = 0.11) and the model for predicting WBC had the highest (R2 = 0.77). The predicted WBC values were in closer agreement with the actual values than the predictions for NEU, LYMPH, MON, BASO, and EOS.

Figure 2 Scatter plots of the actual WBCs and predicted WBCs by RF during the testing phase.

Moreover, the variation of actual versus predicted values for the best scenario (i.e., WBC) during the testing phase is shown in Fig. 3.

Figure 3 Curve of the actual WBC versus predicted WBC by RF during the testing phase.


Evaluation of the results

Based on the obtained results, the most effective feature for predicting the number of WBC, LYMPH, NEU, and MON was the corresponding pre-exercise value, followed by the intensity and duration of exercise. For predicting the number of EOS, the most effective feature was the pre-exercise EOS value, followed by VO2 max and BMI. For predicting the number of BASO, no feature was effective. These results are consistent with the physiological function of the body. The central nervous system adjusts the immune response through bidirectional signals between the nervous, endocrine and immune systems38. Two important pathways for immune system dysregulation are the hypothalamic–pituitary–adrenal axis and the autonomic nervous system. Exercise can activate the hypothalamic–pituitary–adrenal axis and the sympathetic nervous system, which stimulates the secretion of hormones such as catecholamines (adrenaline and noradrenaline), adrenocorticotropic hormone, and cortisol. Each of these hormones can cause quantitative and qualitative changes in immune function39. For example, an increase in adrenaline concentration and, to a lesser degree, in noradrenaline concentration is the main driver of LYMPH dynamics in acute exercise40. Some studies have also shown that cortisol causes neutrophilia, primarily by demargination of cells from the blood vessel walls, with a minor contribution from the bone marrow41. Most researchers in exercise immunology believe that the immune system reflects the magnitude of physiological stress experienced by the exerciser42.

Exercise-induced muscle tissue injury and inflammation elicit a strong immune response involving NEU, EOS, BASO, MON, and macrophages. Immune-related mediators (e.g., oxylipins) are produced to modulate the innate immune response and are involved in initiating, mediating, and resolving this process43,44. The majority of the expressed immune-related proteins (e.g., lysozyme C, neutrophil elastase, defensin 1, cathelicidin antimicrobial peptide, α-actinin-1, and profilin-1) are involved in pathogen defense and in immune cell chemotaxis and locomotion. Other proteins (e.g., serum amyloid A-4, myeloperoxidase, plasma protease C1 inhibitor, α-2-HS-glycoprotein, and α-1-acid glycoprotein 2) increase during recovery and affect the inflammatory acute phase response43. This profound, exercise-induced perturbation in metabolites, lipid mediators, and proteins likely has a direct influence on immune function and results in transient immune dysfunction45.

The low importance of exercise intensity and duration in predicting the number of EOS may reflect the fact that EOS act mainly in allergic diseases and parasitic infections46. It may also indicate that these cells respond only to stress more severe than that induced in this study47. Conversely, the high impact of exercise intensity and duration on the prediction of WBC levels is plausible, given the effect of exercise on NEU, LYMPH, and MON and the large share of leukocytes that these cells make up (NEU about 60%, LYMPH about 30%, and MON about 5.3%)48.

A comparison of the scenarios based on standard statistics (RAE, RRSE, NSE and R2) showed that scenario 1, which predicts the number of WBC, had the highest performance, whereas the RF model's results for predicting the number of BASO were not acceptable. Based on the NSE metric, the RF model showed good performance (0.65 < NSE ≤ 0.75) for predicting NEU, LYMPH, MON, and EOS levels and very good performance (0.75 < NSE ≤ 1.00) for predicting WBC33,49.
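The two NSE bands quoted above can be expressed as a small lookup. This is only an illustrative sketch: the function name is hypothetical, and because the text does not specify any bands below 0.65, values there are reported simply as falling below the "good" band.

```python
def nse_rating(nse: float) -> str:
    """Map an NSE value to the performance bands quoted in the text."""
    if nse > 1.0:
        raise ValueError("NSE cannot exceed 1")
    if nse > 0.75:
        return "very good"    # 0.75 < NSE <= 1.00
    if nse > 0.65:
        return "good"         # 0.65 < NSE <= 0.75
    # Bands below 0.65 are not given in the text.
    return "below good"
```

Note that the upper limit of each band is inclusive, so an NSE of exactly 0.75 is rated "good" rather than "very good".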

The comparison of actual and predicted WBC in Fig. 3 confirms that, although agreement between actual and predicted WBC values is relatively good, the predictions were inaccurate in some cases. This often occurs in modelling and is partly due to the limited amount of data50. More precise data51 could also produce better results. Moreover, other types of ML model (e.g., M5 Prime (M5P)) and hybrid algorithms (e.g., random committee (RC)-RF) may enhance modelling accuracy20. In this study, model parameters were optimized by trial and error; meta-heuristic optimization algorithms (e.g., genetic algorithm (GA))37 could increase the efficiency of the ML model. In addition, because obesity is an inflammatory disease that can interfere with the results, using body fat percentage, a more precise characteristic12, as an input instead of BMI could improve the results. Finally, WBCs can also be influenced by other factors, including the menstrual cycle in females (progesterone concentration)52, diet, psychological stress, and environmental stress (e.g., temperature and relative humidity)11. These factors were not controlled in our study, and controlling them may increase the accuracy of WBCs prediction with the RF model.

Overall, as an initial step, the results of the present study confirmed the performance of an ML model in predicting the number of WBC during exercise. Furthermore, the proposed RF model may help to reduce the incidence of diseases by identifying the appropriate intensity and duration of exercise.


Determining the optimal pattern of exercise training (i.e., an intensity and duration that do not suppress immune function) is important for maintaining people's health. Because no solution to this problem has been presented so far, this study was designed to develop a new RF-based method that uses relevant and accessible variables to estimate WBCs levels during exercise accurately. Our results demonstrated that the RF model could predict WBCs values during exercise from personal characteristics (BMI, VO2 max, and WBCs values before exercise training) and the intensity and duration of exercise, which is a step towards an optimal pattern of exercise training for individuals. Future studies can investigate the potential of this approach with more subjects to provide a simple RF-WBCs calculator for convenient use by athletes, non-athletes, coaches, and doctors. It should be noted that our sample reflects healthy people aged 18–60 years, so the results may not be applicable to other populations (e.g., children, older people, and individuals with medical conditions).