Reliability and suitability of physiological exercise response and recovery markers

There is currently insufficient evidence on how to reliably quantify exercise load and manage athletes’ recovery when monitoring training processes. Therefore, this test–retest study investigated the reliability of various subjective, muscle force, and blood-based parameters in order to evaluate their suitability for monitoring exercise and recovery cycles. Sixty-two subjects completed two identical 60-min continuous endurance exercise bouts separated by a four-week recovery period. The parameters were analyzed before, immediately after, 3 h after, and 24 h after each exercise bout. Significant changes over time were found for rating of perceived exertion (RPE), multidimensional mood state questionnaire (MDMQ), maximum voluntary contraction parameters (MVCs), and blood-based biomarkers (p < 0.05). Excellent reliability was calculated for MVCs, mean corpuscular volume and 5-bound distance (ICC > 0.90). Good reliability was found for thiobarbituric acid reactive substances (TBARS) (ICC = 0.79) and haematological markers (ICC = 0.75–0.86). For RPE, MDMQ, interleukin (IL)-1RA, IL-6, IL-8, IL-15, cortisol, lactate dehydrogenase (LDH), and creatine kinase (CK), only moderate reliability was found (ICC < 0.75). Significant associations of IL-1RA and CK with MVC were found. The excellent to moderate reliability of TBARS, LDH, IL-1RA, six measured haematological markers, MVCs and MDMQ indicates their suitability as physiological exercise response and recovery markers for monitoring athletes’ load management.

The reliable quantification of individuals' physiological responses to acute exercise bouts is of major importance for monitoring training. Both subjective and objective markers are used to control athletes' training in accordance with individual abilities. When training stimuli are wrongly applied due to a lack of load management, the result is an imbalance in the exercise load-recovery cycle, with the well-known consequences of overtraining, increased risk of injury, and the inhibition of fundamental adaptation processes [1][2][3]. Therefore, it seems desirable to further explore the reliability of subjective and objective markers in order to understand how individual athletes deal with physical strain, and thus to optimize the exercise load-recovery balance. The need for accurate and reproducible biomarkers in clinical practice, which reflect objective, quantifiable medical signs or the effects of treatments and interventions on physiological or biochemical processes, has become commonplace 4.
In order to evaluate and establish suitable markers for monitoring training and recovery, it is important to investigate subjective and objective process-based changes with high levels of accuracy and precision 5. However, whether various parameters meet these requirements is questionable, since many biomarkers have crucial limitations and fluctuate widely [1][2][3]. Previous studies concluded that it is important to assess the reliability and specificity of markers in controlled interventional trials 6. Particularly in the context of sport, there are only limited data on reliable markers that reflect the physiological exercise response and recovery processes 7.
There is still insufficient knowledge as to which specific parameters are most suitable for monitoring exercise load-recovery status 1. Some metabolic, immunological and haematological markers are commonly used as objective physiological exercise response indicators 8. However, markers such as creatine kinase (CK), which is released in response to muscle fiber damage, are known to have a large intraindividual and interindividual variability 9,10. Moreover, the detailed evaluation of selected haematological parameters, inflammatory markers, enzymes, and metabolic markers such as haemoglobin (HGB) 8, interleukin (IL)-1 receptor antagonist (IL-1RA) 11, lactate dehydrogenase (LDH) 12 or thiobarbituric acid reactive substances (TBARS) 13 is still pending. These markers were chosen because they are used in the exercise context to analyze immune activation, metabolic demands, or oxidative stress 14. Besides, for a comprehensive and sound assessment of athletes' load state, combinations of suitable parameters including functional testing, subjective testing and biochemical analyses should be considered 15,16. In this regard, suitability is defined by a certain exercise sensitivity and a correlation with muscle force parameters, such as maximum voluntary contraction (MVC) 17. Some studies have demonstrated the suitability of muscle force and subjective parameters such as the rating of perceived exertion (RPE) or the multidimensional mood state as tools for physical performance assessments [18][19][20]. Accordingly, the discovery of additional markers might contribute to creating a panel of parameters that allows multiple aspects of human performance and health status to be analyzed.
Scientific Reports | (2020) 10:11924 | https://doi.org/10.1038/s41598-020-69280-9
The aim of the current study was to examine the exercise responses, the changes during recovery, and the test-retest reliability of various subjective parameters, muscle force values and blood biomarkers after strenuous bouts of endurance exercise, thereby evaluating their suitability for monitoring exercise load and recovery. We hypothesized that some of the analyzed parameters are suitable and reliable markers that reflect the physiological exercise response and recovery processes and can be used in sports practice.

Muscle force parameters.
No changes over time were found for the 5-bound distance (Fig. 4a).
Correlation analysis. A correlation was found between MVC in knee extension and TBARS immediately after (r = 0.26, p = 0.044), 3 h after (r = 0.34, p = 0.009) and 24 h after (r = 0.30, p = 0.020) the RFT at testing day 1 (TD1). Interestingly, there were also correlations in the changes between the pre-exercise value and 3 h after (r = -0.30, p = 0.021). In addition, the MVC in knee flexion correlated with TBARS 3 h after (r = 0.31, p = 0.031) and 24 h after (r = 0.28, p = 0.030). In confirmation with these results, the differences in measuring

Discussion
The novel findings of the present study are the high reliability of TBARS, LDH, IL-1RA, MCV, HGB, PLT, RBC, HCT, and MCHC after two identical controlled bouts of endurance exercise, suggesting their suitability as blood-based biomarkers for monitoring physiological exercise response and recovery status in endurance athletes. Based on its high association with MVC, CK seems to be a suitable biomarker. However, the reliability of CK was only moderate, which calls into question its use as a marker in sports practice. MVCs and the MDMQ seem to represent appropriate monitoring tools that complement the blood markers.
Regarding blood-based biomarkers, the highest reliability was found for TBARS, followed by LDH and IL-1RA values. A similar physiological exercise response-recovery curve for TBARS concentrations was shown in a recent study by Krüger et al. (2016) after a 30 min continuous bicycle test 21. Other findings confirm a high responsiveness of TBARS to exercise, which physiologically reflects increased oxidative stress and subsequently enhanced lipid peroxidation after acute exercise 22. The observed correlations between TBARS and MVCs after the endurance exercise support an association between levels of oxidative stress and muscle force, and also indicate the potential suitability of TBARS as a diagnostic parameter in endurance sports 2,10. Despite the observed high reliability of TBARS as a parameter for evaluating physiological exercise response and recovery processes, the method of TBARS analysis has generally been viewed critically in the literature due to several limitations. As discussed by Cobley et al. (2017), a major flaw of TBARS measurement is the low specificity of TBARS, which react with various substrates to form malondialdehyde (MDA), such that most MDA is generated by the assay itself 23. In addition, the heating step during TBARS analysis causes partial lipid decomposition, leading to the formation of extraneous MDA. However, the current data suggest that standardized procedures might compensate for some methodological flaws. LDH values showed kinetics similar to those of TBARS over time. Given the high reliability of blood LDH values and the results of previous studies, LDH might represent another suitable biomarker for exercise load and recovery 24. Among the cytokines, IL-1RA showed the highest reliability and, in parallel, seems to be associated with physiological exercise response and progressive recovery. IL-1RA is secreted by various cell types, including immune cells, and inhibits pro-inflammatory activities of various cytokines.
The observed changes in IL-1RA concentration correspond with earlier observations which showed a peak at one and a half to two hours after exercise and a decrease back to baseline levels 24 h after treadmill running 25. However, the reliability of IL-1RA has not been established so far. Interestingly, we found an association with MVC, suggesting a link to muscular fatigue. Thus, the anti-inflammatory cytokine IL-1RA seems to be a reliable and suitable marker which reflects the physiological exercise response and recovery processes in athletes during endurance exercise. IL-10 is primarily expressed by monocytes and represents an anti-inflammatory cytokine, while IL-15 is a regulator of the activity of T cells and natural killer cells 26. Limitations in the methodological analysis of IL-10 and IL-15 are suggested to be the reason for the weak reliability of these cytokines. The results for both markers were below the limit of detection of all commercially available enzyme-linked immunosorbent assay (ELISA) kits. Thus, IL-10 and IL-15 cannot be reliably quantified by this method. CK is a frequently used diagnostic marker for detecting exercise-induced muscle disruption or increased cell permeability. With regard to the assessment of suitability, CK is exercise sensitive and CK blood concentrations were highly correlated with MVC values at different time points. Contrary to our expectations, only moderate reliability was found for CK.
Figure 5. Rating of perceived exertion (a), recorded using the BS, and multidimensional mood state (b), assessed using the MDMQ, before, after, 3 h after and 24 h after two identical endurance running field tests (Test 1 and Test 2). Data are presented as mean ± standard deviation. *Significantly different from previous measuring time point of both tests. #Difference against pre-exercise (before) in both tests (p < 0.05).
Due to the high interindividual variability in serum CK, the assignment of reliable reference values for athletes is complicated 10. In accordance with these results, a study by Roe et al. (2016) found only low reliability with a high coefficient of variation and poor sensitivity in rugby players 27. It is suggested that the variability in CK release after exercise is caused by the existence of high and low responders due to different gene polymorphisms. However, other factors, such as training status or gender, might also affect the reliability of CK 9. In addition, the peak of CK values was somewhat delayed at 24 h after the RFTs, making it difficult to associate this parameter with muscular fatigue and to represent possible muscular recovery processes. This finding is consistent with other data showing a poor correlation of CK values with other objective physiological markers 25. However, it should be evaluated whether CK is a more suitable marker for determining muscular damage after eccentric or unaccustomed exercise over several days.
For the majority of the haematological markers, a high reliability was found. These parameters include MCV, HGB, PLT, RBC, HCT, and MCHC. Excellent reliability was calculated for MCV. MCV is a predictive indicator for haematological diseases and, according to the present data, also a suitable marker to represent the physiological exercise response and recovery processes. A review of seasonal variations of haematological parameters in athletes summarizes concordant characteristics within the same sport discipline 28. It could conceivably be hypothesized that the methodological analyses of haematological markers have a higher stability than the cytokine or enzyme measurements. It is assumed that the immediate processing and the lack of any centrifugation or freezing procedures result in a higher reliability. In this context, a previous study has shown a potential negative impact on the stability of cytokine measurements in plasma 29. While haematological markers might be affected by an increased mechanotransduction and plasma volume contraction during strenuous endurance exercise bouts 30, most cytokines are released by different cell types and are involved in multiple physiological processes. However, data on the reliability of haematological markers are rare, although many studies have called for follow-up studies to review the quality criteria 31.
Interestingly, MVCs can be classified as highly reliable. In line with previous studies, a decreased MVC of knee extension, as well as a reduction in MVC of knee flexion, were found after endurance exercise trials 18 . Similarly, an excellent reliability of MVC was previously confirmed 32 . Previous studies which quantify the load condition after endurance exercise use the MVCs of the lower limbs to examine muscular fatigue directly by functional testing 18 . Thus, MVC analysis of knee flexion and knee extension might be useful functional tests to analyze exercise load and recovery in endurance exercise.
It is somewhat surprising that the MDMQ data showed higher reliability than the Borg scale (BS) values. RPE is regularly used in sports science research and is a valid parameter of internal training load 33. However, not much is known about its reliability in the context of endurance exercise. In contrast, the multidimensional mood state score is rarely used in sports science. Nevertheless, a reliability of ICC = 0.70 was found, indicating that the questionnaire might be a suitable complementary diagnostic tool for the assessment of exercise load and recovery processes in endurance sports.
Interestingly, the training status had negligible effects on the physiological exercise response. Accordingly, the suitability of the exercise response and recovery markers was not significantly different between the subgroups of trained and untrained individuals. However, the reason might be that we did not recruit highly trained endurance athletes for this study. Surprisingly, we found a difference in the reliability of CK and CRP. Here, CK showed a lower reliability for trained compared to untrained subjects, while for CRP higher reliability was found in the trained subgroup.
Finally, a number of important limitations need to be considered. The preliminary testing, as well as the further exercise trials in the study, were performed in the field rather than under laboratory conditions. In addition, we did not use maximum oxygen consumption (VO2max) as a gold standard to control exercise intensities 34. Nevertheless, we see greater benefits from our method, particularly regarding transferability to sports practice. Due to limited space, we could not analyze differences related to gender, menstrual cycle and training status. This study focused exclusively on identifying suitable and reliable physiological exercise response and recovery markers in a large number of randomly recruited participants.
In conclusion, the best reliability and suitability were found for TBARS and IL-1RA, suggesting their eligibility as markers for monitoring exercise response and recovery processes in endurance sports. Good reliability was also found for LDH, while CK showed good suitability but only moderate reliability. Accordingly, the data suggest expanding the panel of blood biomarkers in order to monitor athletes' load-recovery status reliably. The use of a combination of selected blood markers, MVC measurements, and subjective assessment tools such as the MDMQ should be discussed. Further research in this field is needed to evaluate the suitability of marker combinations that comprehensively assess physiological exercise response and recovery processes in athletes. These findings should be combined with the development of innovative analysis tools that can be applied by athletes and trainers.

Methods
Subjects. For a test-retest study design, 106 trained and untrained male and female subjects, aged 19-43 years, were recruited randomly and voluntarily to participate. Of these, 62 (31 male and 31 female) completed all examinations and were included in the statistical analysis. According to the American College of Sports Medicine guidelines for exercise testing and prescription, the participants were divided, based on their endurance capacity, into subgroups of trained (T) (N = 37) and untrained (UT) (N = 25) individuals 35. The T group consisted mainly of runners, strength athletes, and semi-professional team sports players such as soccer, handball, and volleyball players. All other subjects were defined as either recreationally active or inactive individuals. Their personal characteristics and anthropometric data are shown in Table 1. All subjects were informed about the nature, purpose, and potential risks of the study and signed an informed consent statement prior to study participation. The local Ethical Committee of the Justus-Liebig-University Giessen (Germany) reviewed the study and granted ethical approval in accordance with the Declaration of Helsinki. In order to ensure that all subjects were physically healthy and fit enough to participate in sporting activities, they were medically screened. Exclusion criteria were smoking, pregnancy, lactation, cardiovascular diseases, acute infections, musculoskeletal injuries, acute symptomatic respiratory deficits, and chronic diseases.
Experimental approach: preliminary testing. The first step of the experimental approach comprised testing of endurance capacity parameters, which made it possible to monitor the kinetics of various markers during the two subsequent identical strenuous exercise trials under controlled conditions. Subjects were tested for their endurance capacity during a continuous progressive exercise field test using lactate diagnostics, as previously described 36. Briefly, subjects started on a 200 m running track at 6 km/h and increased their running speed by 2 km/h every three minutes until subjective exhaustion. Prior to the field test, during a 30 s break between the three-minute stages, and immediately after exhaustion, 20 µL of capillary blood was taken from the earlobe with an end-to-end glass capillary. Heart rate (HR) was continuously tracked using HR monitors (Polar FT1, Polar Electro Oy, Finland). Blood lactate values were analyzed using enzymatic-amperometric detection (Bosen S-Line Plus, EKF-Diagnostics Sales GmbH, Magdeburg, Germany). HR and blood lactate values were used to evaluate the individual anaerobic threshold (IAT) using the Ergonizer Software for medical application (Ergonizer Software 4.9.4, Freiburg, Germany). The IAT was used to determine the individual running intensity during the subsequent strenuous exercise trials. The IAT was calculated by adding the constant value of 1.5 mmol/L to the lactate concentration at the individual's lactate threshold 37.
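The IAT rule described above (lactate threshold plus 1.5 mmol/L) can be illustrated with a minimal Python sketch. It is not the Ergonizer implementation: the lactate threshold is taken as a given input, HR at the IAT is obtained by simple linear interpolation between stages (an assumption), and the stage values and function names are hypothetical.

```python
def interpolate(x, xs, ys):
    """Linearly interpolate y at x along the points (xs, ys).

    Assumes xs rises from stage to stage (lactate increasing with speed).
    """
    for i in range(len(xs) - 1):
        x0, x1 = xs[i], xs[i + 1]
        y0, y1 = ys[i], ys[i + 1]
        if x0 <= x <= x1:
            if x1 == x0:
                return y0
            return y0 + (y1 - y0) * (x - x0) / (x1 - x0)
    raise ValueError("value lies outside the measured range")


def iat_from_threshold(lactate, heart_rate, lactate_threshold, offset=1.5):
    """Return (lactate at IAT in mmol/L, interpolated HR at IAT in beats/min).

    offset=1.5 mmol/L is the constant named in the text; how the lactate
    threshold itself is determined is not shown here.
    """
    iat_lactate = lactate_threshold + offset
    return iat_lactate, interpolate(iat_lactate, lactate, heart_rate)


# Hypothetical stage data from a progressive field test (illustration only).
stage_lactate = [1.0, 1.2, 2.0, 4.0, 7.0]   # mmol/L
stage_hr = [120, 140, 160, 175, 188]        # beats/min

iat_lactate, iat_hr = iat_from_threshold(stage_lactate, stage_hr, lactate_threshold=1.5)
print(iat_lactate, iat_hr)
```

With these made-up values the IAT lies at 3.0 mmol/L, halfway between the third and fourth stage, so the interpolated HR is 167.5 beats/min.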
Experimental approach: testing days of strenuous exercise trials. Approximately one week after the preliminary test, the first testing day (TD1) of the strenuous exercise trials took place. Both testing days (TDs) started between 8:00 and 9:00 am for each subject. Prior to the TDs, subjects were instructed on several standardized conditions with which they had to comply. From four days before each TD, subjects were not allowed to take part in any exhausting physical activity; only regenerative training was acceptable. Furthermore, alcohol consumption was forbidden on the day before. Subjects had to keep a nutrition protocol that included all drinks and meals consumed on the day before TD1 as well as breakfast on the testing day. The protocol served as a guideline for food intake prior to the second testing day (TD2) to ensure standardized conditions. On each testing day, subjects had to fill out a questionnaire concerning their regular physical activity and their usual nutrition. None of the participants changed their habits in nutrition or physical activity between the exercise trials. In female subjects, the menstrual cycle was documented as well. These questionnaires were used to document large deviations in these habits and to exclude possible changes in physical performance between TD1 and TD2. The testing procedure comprised two identical 60-min continuous endurance running field tests (RFTs), separated by a recovery period of approximately four weeks. In order to examine the test-retest reliability of the measured parameters, great importance was attached to the previously described standardized conditions on both TDs. The exercise protocol consisted of 40 min of running at an intensity corresponding to 95% of HR at IAT, followed by 20 min at 110% in order to ensure exhaustion. The participants completed both RFTs with the same duration at the respective HR.
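As a small illustration of the two-stage intensity prescription, the following Python sketch derives target heart rates for one subject. It assumes that both percentages refer to HR at the IAT (the text states this explicitly only for the 95% stage); the function name and the HR value are hypothetical.

```python
def rft_schedule(hr_at_iat):
    """Return the 60-min running field test as (minutes, target HR) stages.

    Assumption: 95% and 110% are both taken relative to HR at the IAT.
    """
    stages = [(40, 0.95), (20, 1.10)]
    return [(minutes, fraction * hr_at_iat) for minutes, fraction in stages]


# Hypothetical subject with an HR of 160 beats/min at the IAT.
schedule = rft_schedule(160)
for minutes, target_hr in schedule:
    print(f"{minutes} min at about {target_hr:.0f} beats/min")
```

For this example the stages come out at roughly 152 and 176 beats/min, and the durations add up to the 60 min of the field test.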
Specific duration and intensity were chosen on the basis of a pilot study and previous studies 38. The outcome measures were collected before, immediately after, 3 h after and 24 h after each exercise test, each by double analysis. These measuring time points (MTPs) were chosen to make the data comparable to previous studies 11,21.
Muscle force parameters. To investigate the muscle force performance of the lower limbs, a 5-bound test (5BT) for measuring jump distance and an isometric strength test for measuring maximum voluntary contraction of knee flexion and extension were used. The 5BT was carried out as previously described 40. Briefly, subjects were required to stand with their preferred foot forward at a marked starting point and perform five consecutive bounds with alternating left and right foot. Three trials were performed and the trial with the largest horizontal distance (m) was documented. The jump distance was measured from the marked starting position to the heel of the rear foot after the fifth bound. MVC of the knee extensors and knee flexors was analyzed using the isometric strength dynamometer m3diagnos (Schnell, Peutenhausen, Germany). First, subjects were seated and fixed in a standardized position with a defined device angle of 60° to measure the MVC of the knee extensors. An abdomen belt and crossed arms in front of the chest limited any extraneous movements of the upper body. Secondly, the MVC of the knee flexors was examined in a lying position with a defined device angle of 150°. For each test, the best value of two trials was recorded. MVCs were calculated by analysis software (Diagnos Professional V1.0, Schnell).
Subjective parameters. The subjective RPE was recorded using the BS 41. Furthermore, each subject had to complete a German version of the MDMQ 42. It contains twelve items rated on a five-point Likert scale and measures three subscales (good-bad mood, alertness-tiredness and calmness-restlessness). The item ratings of each subscale are summed, yielding a subscale score between 4 and 20, with higher scores indicating better mood, higher alertness and calmness. From all three subscales, an index between 12 and 60 was calculated, which reflects the acute multidimensional mood state.
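The scoring arithmetic described above (twelve items, three subscales of 4 to 20 points, overall index of 12 to 60) can be sketched in Python as follows. The assignment of items to subscales is hypothetical, since the text does not list it, and the sketch assumes that any reverse-keyed items have already been recoded so that higher ratings mean a better state.

```python
# Hypothetical item-to-subscale mapping (four items each); the real
# questionnaire key is not given in the text.
SUBSCALES = {
    "good_bad_mood": (0, 1, 2, 3),
    "alertness_tiredness": (4, 5, 6, 7),
    "calmness_restlessness": (8, 9, 10, 11),
}


def score_mdmq(ratings):
    """Sum twelve 1-5 ratings into three subscale scores and an overall index."""
    if len(ratings) != 12:
        raise ValueError("expected twelve item ratings")
    if any(not 1 <= r <= 5 for r in ratings):
        raise ValueError("ratings must lie on the five-point scale (1-5)")
    scores = {name: sum(ratings[i] for i in items) for name, items in SUBSCALES.items()}
    scores["index"] = sum(scores.values())  # overall index between 12 and 60
    return scores


# Made-up ratings for one subject at one measuring time point.
example = score_mdmq([4, 5, 4, 3, 2, 3, 2, 2, 4, 4, 5, 3])
print(example)
```

The lowest possible rating pattern (all ones) yields subscale scores of 4 and an index of 12, the highest (all fives) yields 20 and 60, which matches the ranges stated in the text.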
Statistics. Data of all subjects are presented as means ± standard deviation together with the minimum and maximum values. In cases of normal or log-normal distribution (Kolmogorov-Smirnov test), data were analyzed using a two-way ANOVA to observe mean differences between the MTPs depending on the TDs. If the analysis revealed any significant main effects between the MTPs (p < 0.05), post hoc analysis was conducted using the Bonferroni test. To consider training status, we separated the participants into subgroups of trained and untrained individuals and added these as a between-subjects factor to the ANOVA. Furthermore, the test-retest reliability between the MTPs of TD1 and TD2 was analyzed with the intraclass correlation coefficient (ICC; model: two-way mixed effects; type: single measurement; definition: absolute agreement). In all cases, ICC > 0.5 was accepted as minimal test-retest reliability. Values between 0.5 and 0.75, between 0.75 and 0.9, and greater than 0.9 are indicative of moderate, good, and excellent reliability, respectively 43. Pearson's correlation analysis was used to analyze the suitability of the parameters. In all cases, p < 0.05 was accepted as being significant. Statistical power analysis was performed according to Cohen et al. (1988) 44.
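To make the reliability analysis concrete, the following Python sketch computes a single-measurement, absolute-agreement ICC from the mean squares of a two-way layout (subjects by testing days) and maps it onto the categories given above. It is an illustration rather than the authors' statistics workflow: the data are invented, the function names are hypothetical, the label for values of 0.5 or less and the handling of values exactly on a category boundary are assumptions.

```python
from statistics import mean


def icc_absolute_single(data):
    """ICC for absolute agreement, single measurement, two-way model.

    data: one row per subject, one column per testing day.
    """
    n, k = len(data), len(data[0])
    grand = mean(v for row in data for v in row)
    row_means = [mean(row) for row in data]
    col_means = [mean(row[j] for row in data) for j in range(k)]

    ss_rows = k * sum((m - grand) ** 2 for m in row_means)   # between subjects
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)   # between testing days
    ss_total = sum((v - grand) ** 2 for row in data for v in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


def classify_icc(icc):
    """Map an ICC onto the categories used in the text (boundaries assumed)."""
    if icc > 0.9:
        return "excellent"
    if icc >= 0.75:
        return "good"
    if icc > 0.5:
        return "moderate"
    return "below accepted minimum"


# Invented values for three subjects on TD1 and TD2 (illustration only).
identical = [[1.0, 1.0], [2.0, 2.0], [3.0, 3.0]]
shifted = [[1.0, 2.0], [2.0, 3.0], [3.0, 4.0]]   # constant offset on TD2

print(icc_absolute_single(identical), icc_absolute_single(shifted))
print(classify_icc(0.79), classify_icc(0.70))
```

In this toy data, identical values on both days give an ICC of 1, whereas a constant offset of one unit on the second day lowers it to about 0.67 even though the rank order of subjects is unchanged. This is the practical consequence of choosing absolute agreement rather than consistency.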

Data availability
The datasets generated during and/or analysed during the current study are available from the corresponding author (Karsten Krüger; Karsten.Krueger@sportwiss.uni-giessen.de) on reasonable request and with permission from all involved institutions.