Introduction

Everyday military work involves a variety of movement patterns, including walking, running, jumping, pushing, pulling, lifting, throwing, kicking, dragging, crawling, shooting a firearm, and carrying a heavy backpack and/or other equipment, all of which put considerable strain on the soldier's body1. Soldiers must be very physically fit in order to carry out their occupational responsibilities with ease and increased efficiency1,2. As in other physically demanding occupations such as firefighting and law enforcement, muscular strength, strength endurance, and cardiorespiratory fitness (CRF) have been shown to be among the most important components of a soldier's physical fitness profile3,4,5. Many individual tests and test batteries exist to assess these attributes, of which the Army Physical Fitness Test (APFT) is the most widely used for military personnel due to its simplicity and ease of administration1,3. The APFT, which consists of sit-ups, push-ups and a 2-mile run (2MR), was introduced in 1980 for members of the US Army. It is designed to measure upper-body and core muscle strength endurance and CRF1,3. While muscle strength and core muscle strength endurance are relatively easy to measure, assessing CRF in this special population appears to be more challenging due to the specifics of military operations, which are complex and often unpredictable1,6,7. CRF testing is routinely performed by most armed and tactical forces around the world as part of their recruitment process for new members or simply as an annual examination of their personnel1,7. Direct measurement of CRF during an incremental run on a treadmill (TR) is considered the "gold standard" for estimating maximal relative oxygen uptake (VO2max)8. However, because of its high cost, complex measurement procedures, and unsuitability for group testing, this method is not routinely used to examine CRF in soldiers1,3.
In addition to the 2MR, several other field tests such as the 12-mile run, the 3000-m run, the 2400-m run, and the 1-mile run are used for CRF evaluation of military personnel3,9. The latter tests are usually conducted outdoors and were found to have questionable validity, ranging from low to high10,11. Because they are performed outdoors, weather, climate, and terrain can influence the results and may limit maximal and reliable performance. Although standardized, the most commonly used tests, such as the 2MR, have been shown to be difficult for individuals because the pacing strategy is internally controlled7. As such, they do not reflect the real situation on the battlefield, where most activities are externally driven by the environment and the enemy. Running patterns on the battlefield are characterized by intermittent, high-intensity shuttle runs with constant changes of direction combined with forward and backward running and turning, while unexpected interruptions of running are common when soldiers are exposed to open fire from the enemy. Furthermore, the high standard of strength and conditioning planning and programming required for military personnel cannot be met with such a test, because the final result is expressed only as the total time required to run a given distance, without any of the other information needed for training prescription and optimization. Over the past two decades, intermittent shuttle-run testing with the 30–15 Intermittent Fitness Test (30–15IFT) has been successfully implemented for CRF testing in various populations12,13,14. The 30–15IFT is an incremental test consisting of 30-s shuttle runs interspersed with 15-s active recovery periods. In addition, it has been demonstrated to accurately predict VO2max values in elite and young male handball, soccer and hockey players12,14. This suggests that it can be utilized for both CRF assessment and individual training prescription12,15.
Other useful measurements, such as maximal heart rate (HRmax) and end-running speed (ERS), can be obtained from the 30–15IFT in addition to predicted VO2max. According to a recent study14, elite handball players' ERS and VO2max values were significantly greater when determined from the 30–15IFT than when obtained from the TR test. As a result, an aerobic training intensity prescribed on the basis of ERS may vary significantly (i.e., up to 3.2 km/h or 19%)14 depending on the test used, and may harm athletes' CRF development.

The recent reports of the International Congress on Soldiers' Physical Performance emphasize that the two most important priorities in military research and practice are the monitoring of physiological condition and the study of physical demands in operational environments, which need to be improved16. Considering all this, we believe that the 30–15IFT could be a suitable tool for measuring CRF of military personnel.

To date, no study has examined the reliability and validity of the 30–15IFT compared to a standard continuous incremental running test and/or a 2MR in infantry members. From a practical perspective, it would be of great interest to the Slovenian Armed Forces (SAF) to provide their strength and conditioning coaches with a valuable measure of CRF to determine and monitor the readiness of SAF infantry members. We hypothesized that the 30–15IFT would prove to be a highly reliable and valid measure of CRF, while other parameters such as HRmax and ERSIFT would be valuable indicators for prescribing, monitoring, and optimizing high-intensity interval training in this population.

Methods

Study design

This is a randomized cross-over study using a within-subjects test–retest design. To test the current hypothesis, a TR test, a 2MR test, and two 30–15IFT were performed in the population of SAF infantry members. The present study was conducted over a maximum period of 2 weeks per participant. Participants were instructed to visit the laboratory four times at least 72 h apart. At the first visit, they were familiarized with the experimental procedures and asked to sign a written informed consent form to participate in the study. Then, participants were randomly assigned to four experimental conditions (i.e., TR, 2MR and two 30–15IFT running tests).

Additionally, participants were divided into two groups based on their score on the APFT (score ranging from 1 to 5). Thus, the highest-scoring group (HSG) is defined as an APFT score of 5; and the lowest-scoring group (LSG) is defined as an APFT score of ≤ 2.

In the conceptualization phase of the study, we conducted an a priori power analysis for the estimated intraclass correlation coefficient (ICC). Based on previous studies with a similar aim, we expected to find a high to nearly perfect ICC for the reliability of the 30–15IFT15,17. Therefore, with a statistical power of 0.90 and a two-tailed α = 0.05, a minimum sample size of 9 participants was shown to be sufficient to detect an ICC value of ≥ 0.80.

Participants

Thirty-four SAF infantry members (males, N = 27) were recruited for the study. Inclusion criteria were: age 18–40 years; either sex; SAF members performing regular military tasks on a daily basis; no injuries in the 6 months prior to recruitment; no reported musculoskeletal pain; and no history of chronic musculoskeletal, metabolic, pulmonary, neurological or cardiovascular disease. Failure to satisfy any inclusion criterion, such as being younger than 18 or older than 40 years, or having a chronic or recent musculoskeletal injury or illness, constituted an exclusion criterion. Participants were instructed to avoid any strenuous physical activity for at least three days prior to the start of the first testing session and during the course of the study, which was monitored by their superiors. They were requested to refrain from consuming ergogenic substances prior to the tests. Before the initial assessment on each testing day, a brief meeting was held to explain the study protocol in detail.

Experimental procedures

The experimental procedures were thoroughly explained in the recently published protocol of the current study18. In brief, all tests were performed at the facilities at the Faculty of Sport (intentionally deleted) between 8 a.m. and 1 p.m. The actual arrival of participants to the testing location was pre-planned and coordinated with the commanding officers. Each daily testing group had 6–8 participants. The timeline of testing procedures is explained in detail in Table 1. The study was approved by the Slovenian National Medical Ethics Committee, Ministry of Health (Ljubljana, Slovenia; reference number: 0120-495/2021/6) and conducted in accordance with the 1964 Declaration of Helsinki and its subsequent amendments.

Table 1 Timeline of the study protocol.

Body composition assessment

Body height was measured using a stadiometer and scale anthropometer (GPM, Model 101, Zurich, Switzerland) to the nearest 0.1 cm, while body mass was assessed with multifrequency bioelectrical impedance (InBody 720: Biospace, Tokyo, Japan) to the nearest 0.05 kg. Additionally, muscle mass (MM%) and fat mass (FM%) were calculated using the manufacturer’s algorithm.

Physiological testing

A portable metabolic gas analyzer (K5, COSMED, Italy) was used to obtain physiological parameters. Simultaneously, heart rate was measured with a heart rate monitoring belt (Garmin Edge 830 Pack, Kansas, United States). The data were recorded at 5-s intervals and automatically analyzed using the original Polar software.

Continuous incremental running treadmill test

After 5 min of baseline measurements while standing on the treadmill (HP Cosmos, Germany), the participants warmed up by running at 8 km/h at a constant gradient of 1%, as previously suggested19. The test was then performed by running until volitional exhaustion, with running speed increased progressively by 2 km/h per minute. The achievement of VO2max was identified as the plateauing of VO2 (< 2.1 ml/kg/min decrease) despite an increase in workload20. If this criterion was not fulfilled, the participants were asked to perform a further constant-speed test at a speed equal to or higher than the highest speed achieved at the end of the incremental test, as recommended21. Throughout the test, respiratory gases were continuously measured breath-by-breath and reduced to 10-s averages22. ERS was determined as the minimal running velocity that elicited VO2max over a period of 30 s.

Continuous 2-mile run test

The 2MR was used as a continuous field test and was performed on a 400-m synthetic athletic track with the supervision of the research team. Participants were required to complete the 2MR course without any physical help in the shortest time possible. The heart rate at the end of the 2MR was considered the HRmax achieved in the test.

30–15 intermittent fitness test

The 30–15IFT consists of 30-s shuttle runs interspersed with 15-s active recovery periods. Running speed was set at 8 km/h for the first 30-s run and increased by 0.5 km/h in each 30-s phase thereafter, while the participants were required to run back and forth between two lines 40 m apart at a pace dictated by a prerecorded beep, as described elsewhere14,18. The speed of the last successfully completed stage was recorded as the test result, that is, ERSIFT15, while VO2max was calculated using the previously proposed formula15.
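The stage schedule described above, and a prediction of VO2max from ERSIFT, can be sketched as follows. This is a minimal illustration under stated assumptions: the function names are hypothetical, and the prediction equation is the one commonly cited for the 30–15IFT (sex coded 1 for males and 2 for females, age in years, body mass in kg). The text refers only to "the previously proposed formula", so the coefficients below are an assumption and should be checked against reference 15 before use.

```python
def stage_speed_kmh(stage: int, start_kmh: float = 8.0, step_kmh: float = 0.5) -> float:
    """Running speed of a given 30-s stage (stage 1 = 8 km/h, +0.5 km/h per stage)."""
    if stage < 1:
        raise ValueError("stage numbering starts at 1")
    return start_kmh + step_kmh * (stage - 1)


def predicted_vo2max(ers_ift_kmh: float, age_years: float,
                     body_mass_kg: float, is_female: bool) -> float:
    """Predicted VO2max (ml/kg/min) from ERSIFT.

    ASSUMPTION: coefficients are those commonly cited for the 30-15IFT;
    the source document does not print the equation.
    """
    g = 2 if is_female else 1  # sex code assumed by the equation
    return (28.3
            - 2.15 * g
            - 0.741 * age_years
            - 0.0357 * body_mass_kg
            + 0.0586 * age_years * ers_ift_kmh
            + 1.03 * ers_ift_kmh)


if __name__ == "__main__":
    # Hypothetical participant: male, 30 years, 80 kg, ERSIFT of 17.0 km/h.
    print(stage_speed_kmh(19))                            # speed of stage 19
    print(round(predicted_vo2max(17.0, 30, 80, False), 1))
```

The age and body mass in the usage example are illustrative values, not data from the study.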

Statistical analysis

All data are presented as mean ± SD with 95% confidence interval limits [95% CI]. All statistical analyses were performed using the SPSS statistical software (version 27.0, IBM Inc, Chicago, USA). Descriptive statistics were used to summarize demographic characteristics of participants and outcomes. Normality of the data was confirmed by the Shapiro–Wilk test, while the homogeneity of variances of normally distributed variables was tested by Levene’s test.

Differences in key physiological outcomes between the different CRF tests (TR vs. 2MR vs. 30–15IFT) were assessed by one-way analysis of variance (ANOVA). Cohen’s d (ES) was used to assess the magnitude of the difference between tests and was interpreted as follows23: trivial: < 0.20, small: 0.20–0.50, moderate: 0.50–0.80, or large: > 0.80. The relative reliability of all dependent variables between the two 30–15IFT trials was estimated using the ICC, two-way random effects model (consistency type). ICC values were classified as: very high: > 0.90, high: 0.70–0.89, and moderate: 0.50–0.69. The following criteria17 were used to declare good reliability: CV < 5% and ICC > 0.69. In addition, the standard error of measurement (SEM) and the coefficient of variation (CV) were calculated as measures of absolute reliability, indicating the within-subject variation, as previously proposed24. To test the usefulness of the IFT, the spreadsheet provided by Hopkins was used25. Usefulness was determined by comparing the smallest worthwhile change (SWC) with the typical error of measurement (TE) and was interpreted as follows25: Marginal: TE > SWC, OK: TE = SWC, and Good: TE < SWC.
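The usefulness rating described above can be sketched in a few lines. This is an illustration, not the Hopkins spreadsheet itself: it assumes that TE is the SD of the test–retest differences divided by √2 and that SWC is 0.2 × the between-subject SD, two common conventions that the text does not state explicitly. All function names are hypothetical.

```python
import math
import statistics


def typical_error(test, retest):
    """TE, ASSUMED here to be SD of the paired differences divided by sqrt(2)."""
    diffs = [b - a for a, b in zip(test, retest)]
    return statistics.stdev(diffs) / math.sqrt(2)


def smallest_worthwhile_change(values, factor=0.2):
    """SWC, ASSUMED here to be 0.2 x the between-subject SD."""
    return factor * statistics.stdev(values)


def usefulness(te, swc, tol=1e-9):
    """Rate usefulness: Good (TE < SWC), OK (TE = SWC), Marginal (TE > SWC)."""
    if abs(te - swc) <= tol:
        return "OK"
    return "Good" if te < swc else "Marginal"


if __name__ == "__main__":
    # Hypothetical ERSIFT values (km/h) for five participants, test and retest.
    test = [15.0, 16.5, 17.0, 18.5, 19.0]
    retest = [15.5, 16.5, 17.5, 18.5, 19.5]
    te = typical_error(test, retest)
    swc = smallest_worthwhile_change(test)
    print(usefulness(te, swc))
```

Comparing TE with SWC in this way shows whether a change of the smallest practically relevant size would be distinguishable from measurement noise.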

The relationships between VO2max, HRmax and ERS obtained from the 30–15IFT, TR and 2MR were assessed using Pearson’s correlation (r). Also, the relationship between VO2max obtained from TR, 2MR and 30–15IFT was investigated. The following thresholds of the correlation coefficient26 were used to assess the magnitude of the relationships analyzed: weak: ≤ 0.35; moderate: 0.36–0.67; and high: ≥ 0.68.
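The correlation thresholds above translate directly into a small classifier. In this sketch the function name is hypothetical, the absolute value of r is used so that negative correlations are rated by strength, and values falling between the listed bands (for example 0.355) are assigned to "moderate", which is an assumption about how the gaps are handled.

```python
def correlation_magnitude(r: float) -> str:
    """Classify Pearson's r as weak (<= 0.35), moderate (0.36-0.67) or high (>= 0.68)."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie between -1 and 1")
    strength = abs(r)  # the sign gives direction, not magnitude
    if strength >= 0.68:
        return "high"
    if strength > 0.35:
        return "moderate"
    return "weak"


if __name__ == "__main__":
    for r in (0.930, 0.621, -0.916, 0.30):
        print(r, correlation_magnitude(r))
```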

In addition, to determine performance differences between the two groups of SAF members, a repeated-measures General Linear Model was used, with the main parameters estimated from the different tests (TR vs. 30–15IFT vs. 2MR) as the within-subject factor and group (HSG vs. LSG) as the between-subject factor. Partial eta squared (η2) values of 0.01, 0.06 and 0.14 were taken to indicate small, moderate and high differences, respectively23. Statistical significance for all analyses conducted was accepted at p ≤ 0.05.

Results

Shapiro–Wilk’s test confirmed that all data were normally distributed (p > 0.05).

Similar ERSIFT (test: 17.25 ± 1.75 km/h; re-test: 17.43 ± 1.67 km/h) and VO2maxIFT (test: 49.24 ± 5.94 ml/kg/min; re-test: 49.76 ± 5.66 ml/kg/min) values were observed between the two 30–15IFT trials, while HRmax values (test: 193.50 ± 8.97 bpm; re-test: 190.85 ± 10.01 bpm) differed (Table 2). Very high test–retest reliability was observed (ICC > 0.90; CV ≤ 1.82%) (Table 2).

Table 2 Reliability analysis for end-running speed (ERSIFT), maximal heart rate (HRmax) and maximal relative oxygen consumption (VO2max) during 30–15 intermittent fitness test (30–15IFT).

The TE values for ERSIFT (TE = 0.24 km/h), HRmax (TE = 0.29 bpm) and VO2maxIFT (TE = 0.22 ml/kg/min) were lower than the presumed SWC (ERSIFT = 0.34 km/h; HRmax = 1.90 bpm; VO2maxIFT = 1.16 ml/kg/min), and these measures were therefore rated as “Good” (Table 2).

Significant differences were observed between TR, 30–15IFT and 2MR tests for ERS (high η2 = 0.623; p < 0.001), HRmax (high η2 = 0.158; p < 0.001), and VO2max (high η2 = 0.160; p < 0.001) (Table 3).

Table 3 Observed results for end-running speed (ERS), maximal heart rate (HRmax) and maximal relative oxygen consumption VO2max in treadmill running test (TR), 30–15 intermittent fitness (30–15IFT) test and 2-mile run (2MR) test.

Post hoc analysis showed significant differences between ERSIFT and ERSTR (MD: 3.60 km/h; large ES = 3.01; 95% CI [2.21; 3.80]; p < 0.001); ERSIFT and ERS2MR (MD: 5.29 km/h; large ES = 7.62; 95% CI [5.75; 9.47]; p < 0.001); ERSTR and ERS2MR (MD: 1.69 km/h; large ES = 1.54; 95% CI [1.03; 2.03]; p < 0.001); VO2maxIFT and VO2max2MR (MD: 6.68 ml/kg/min; large ES = 1.18; 95% CI [0.73; 1.59]; p < 0.001); HRmaxIFT and HRmaxTR (MD: 8.89 bpm; large ES = 1.37; 95% CI [0.90; 1.84]; p = 0.001); and HRmaxIFT and HRmax2MR (MD: 8.32 bpm; large ES = 1.40; 95% CI [0.92; 1.87]; p = 0.001). Furthermore, a high correlation was observed between ERSIFT and VO2maxIFT (r = 0.930; p < 0.001), ERSTR (r = 0.786; p < 0.001), VO2maxTR (r = 0.695; p < 0.001), ERS2MR (r = 0.919; p < 0.001), VO2max2MR (moderate r = 0.621; p < 0.001) and running time on the 2MR (r = − 0.916; p < 0.001) (Table 4). Also, HRmaxIFT showed a high correlation with HRmaxTR (r = 0.751; p < 0.001) and HRmax2MR (r = 0.816; p < 0.001) (Table 4).

Table 4 Pearson’s correlation coefficient (r) between end-running speed (ERS), maximal heart rate (HRmax) and maximal relative oxygen consumption VO2max parameters obtained during a treadmill running test (TR), 30–15 intermittent fitness (30–15IFT) test and 2-mile run (2MR) test.

HSG ERSIFT (mean difference [MD]: 2.57 km/h; 95% CI [1.31, 3.83]; p = 0.001), and predicted VO2maxIFT (MD: 8.79 ml/kg/min; 95% CI [4.83, 12.74]; p < 0.001) were higher than in LSG, whereas HRmax did not differ (p = 0.333) (Table 5).

Table 5 Comparisons between highest scoring group (HSG) and lowest scoring group (LSG) on Army Physical Fitness test in end-running speed (ERS), maximal heart rate (HRmax) and maximal relative oxygen consumption VO2max parameters obtained during a treadmill running test (TR), 30–15 intermittent fitness (30–15IFT) test and 2-mile run (2MR) test.

The differences observed between HSG and LSG were greater when values derived from the IFT were considered, compared to TR and 2MR, for both ERS and predicted VO2max (Table 5). Furthermore, HSG had a higher muscle mass percentage (MD: 4.27%; 95% CI [1.35, 7.19]; p = 0.006), a lower fat mass percentage (MD: − 7.09%; 95% CI [− 12.01, − 2.16]; p = 0.007), and a lower body mass index (MD: − 2.69 kg/m2; 95% CI [− 5.03, − 0.35]; p = 0.026) compared to LSG (Table 5).

Discussion

The purpose of this study was to examine the reliability, criterion validity and usefulness of the 30–15IFT in SAF infantry members. In agreement with previous research on the topic, our results suggest that ERSIFT obtained during the 30–15IFT provides a reliable, valid and useful method of assessing soldiers’ CRF. Very high reliability ratings (ICC = 0.971–0.975) were observed for ERSIFT, HRmaxIFT and VO2maxIFT, with small (ERSIFT and VO2maxIFT) to moderate (HRmaxIFT) test–retest differences. Although outcome measures obtained during the 30–15IFT demonstrated high correlations (r = 0.695–0.930) with the same measures obtained during the gold-standard continuous TR, ERS, HRmax and VO2max were higher in the 30–15IFT (Table 3). This is in good agreement with previous findings14,27. Also, by showing a good rating for all 30–15IFT outcome measures, our results confirmed the usefulness of the 30–15IFT for testing soldiers’ CRF.

The 30–15IFT has gained popularity among practitioners and researchers in the field of exercise science. Before it can be used as a field instrument to routinely monitor CRF in infantry members, it is important to investigate its physiometric properties such as reliability and criterion validity. Our data demonstrate both relative (i.e., ICC) and absolute (i.e., CV) reliability of all outcome measures collected with the 30–15IFT in this specific population. Comparable findings were observed elsewhere27,28,29. The most recent systematic review29 investigating the test–retest reliability of the 30–15IFT found that ICC values for ERS ranged from 0.80 to 0.99, whereas ICC values for HRmax ranged from 0.90 to 0.97. In addition, the authors29 found that the CV values for ERS ranged from 1.5 to 6% (1.72% in the present study), whereas the CV values for HRmax ranged from 0.6 to 4.8% (1.29% in the present study). The variation in ICCs and CVs between the reviewed studies might be ascribed to the different populations studied and the level of familiarization with the test29,30. The populations studied comprised athletes competing in different sports, and thus the 30–15IFT might be more specific to some sports than to others. Also, it is well known that an individual may become more proficient in a test with increased experience30. However, the small difference observed between test–retest trials for ERS (MD: 0.18 km/h), which is smaller than the SWC of 0.34 km/h and less than one stage (0.5 km/h), suggests the absence of a learning effect in our sample. Furthermore, our results show that high reliability can be achieved with a single familiarization session and detailed instructions on how to perform the test.

Besides reliability and validity, information on the usefulness of the 30–15IFT allows sports scientists to draw firm conclusions about whether the changes observed in response to a strength and conditioning intervention are important or not24. Our results showed a good usefulness rating for ERSIFT, which is in good agreement with previous findings28,31,32,33. For example, the usefulness of ERSIFT was rated as marginal (SWC = 0.20 km/h, TE = 0.56 km/h), marginal (SWC = 0.20 km/h, TE = 0.31 km/h) and OK (SWC = 0.34 km/h, TE = 0.32 km/h) in female basketball players32, female football players27 and male futsal players31, respectively. Thus, comparable with results observed in athletic populations, the 30–15IFT proved useful for testing and monitoring the progress of training interventions in military personnel. These data suggest that improvements in performance as small as 0.5 km/h (one stage) can be considered real and meaningful in military personnel.

A comparison of HSG and LSG soldiers showed differences in the 30–15IFT. HSG reached a greater ERSIFT (MD: 2.57 km/h) and had a higher VO2maxIFT (MD: 8.79 ml/kg/min). The mean ERSIFT and VO2maxIFT differences between groups were greater than the SWCs (ERSIFT > 0.5 km/h; VO2maxIFT > 1.16 ml/kg/min), suggesting that there were meaningful differences between HSG and LSG. Furthermore, HRmax is also considered useful for detecting meaningful performance changes in the individual as small as 2 bpm. This confirms previous findings in female football players27, and male rugby34 and futsal31 players. Although this is the first study to investigate the reliability, validity and usefulness of the 30–15IFT in military personnel, our results are in good agreement with findings observed in athletic populations27,34. The latter studies revealed differences in 30–15IFT performance between players of different levels; for example, national-team female football players achieved a 1.15 km/h greater ERS and a 2.2 ml/kg/min greater predicted VO2max than national club players27. It is interesting to note that the magnitude of the differences observed between HSG and LSG was greater when values derived from the IFT were considered, compared to TR and 2MR, for both ERS and predicted VO2max (Table 5). This suggests that performance parameters obtained during the IFT, such as ERS and VO2max, may be more sensitive markers for distinguishing between better and less prepared soldiers than the same variables derived from the TR and 2MR tests. We believe these results can help practitioners working with military personnel to guide and optimize their strength and conditioning programming, especially when it comes to CRF interval training. The ERS obtained with the 30–15IFT was 5.29 km/h higher than the ERS obtained from the 2MR test.
This valuable information suggests that practitioners still using the 2MR test for CRF evaluation in military personnel and looking for the optimal running speed for interval training targets should add 5.29 km/h to the average speed obtained with the 2MR test to achieve the required aerobic speed.
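As a worked illustration of this adjustment, the sketch below converts a 2MR finishing time into an average speed and adds the 5.29 km/h group-mean offset reported in this study. The function names are hypothetical, and because the offset is a group mean, the result for any single soldier should be read as a rough starting point rather than a measured ERSIFT.

```python
TWO_MILES_KM = 3.218688        # 2 miles expressed in kilometres
IFT_OFFSET_KMH = 5.29          # mean ERSIFT - ERS2MR difference reported in the text


def average_speed_2mr_kmh(minutes: int, seconds: float = 0.0) -> float:
    """Average running speed (km/h) over the 2-mile run."""
    total_hours = (minutes * 60 + seconds) / 3600
    if total_hours <= 0:
        raise ValueError("finishing time must be positive")
    return TWO_MILES_KM / total_hours


def estimated_interval_speed_kmh(minutes: int, seconds: float = 0.0) -> float:
    """2MR average speed plus the group-mean offset (an approximation)."""
    return average_speed_2mr_kmh(minutes, seconds) + IFT_OFFSET_KMH


if __name__ == "__main__":
    # Hypothetical soldier finishing the 2MR in 16 min 00 s.
    print(round(average_speed_2mr_kmh(16), 2))
    print(round(estimated_interval_speed_kmh(16), 2))
```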

Although the present study has several unique aspects, such as the novelty of the population studied and the direct comparison of the results of three different endurance tests, certain limitations need to be pointed out. Due to the low proportion of female participants, we were unable to conduct separate reliability analyses by sex. However, women make up only 7% of the SAF's soldiers, and considering equal rights and opportunities, our sample is not far from the real situation in the SAF. Further studies aimed at establishing normative values for the 30–15IFT for SAF infantry members and investigating its relationship with occupation-specific test batteries such as the Army Combat Fitness Test (ACFT) and/or the Marine Combat Fitness Test are warranted.

Conclusion

The results of this study show that the 30–15IFT test is a reliable, valid and useful tool for assessing CRF in military personnel. Moreover, the ERS and predicted VO2max values derived from the IFT test could be more sensitive markers of combat readiness than the same variables derived from the TR and 2MR tests. Therefore, the 30–15IFT test can be reliably used by researchers and strength and conditioning coaches working with military personnel to test and monitor cardiorespiratory adaptations to training interventions.