Physical fitness (hereinafter simply ‘fitness’) has been related to a person’s ability to perform physical activities that require aerobic capacity, endurance, strength or flexibility, which seems to be linked to inherited and environmental factors.1 Fitness has been considered a powerful marker of health, both in childhood and in adulthood,2 independent of physical activity.3,4 A low fitness status during childhood and adolescence is associated with important health-related outcomes, such as increased future risk for obesity5 and cardiovascular diseases,6 impaired skeletal health,7 reduced quality of life8 and poor mental health.9 In spite of the health benefits of high fitness, children’s and adolescents’ performance in fitness tests has declined over the last three decades.10

The most frequently evaluated fitness component is cardiorespiratory fitness (CRF).11 However, in recent years other fitness components such as flexibility, muscular fitness and speed/agility have also been evaluated and associated with health outcomes.9

Fitness reference values in adolescents from different countries in America,12, 13, 14 Asia,10 Africa11 and Europe9 have now been published. However, the literature offers few reference values for pre-pubertal children, in Europe or on other continents, that are based on harmonised measures of fitness.14 Reference values are necessary to classify children and to monitor the fitness status of the population. The IDEFICS study (Identification and prevention of dietary and lifestyle-induced health effects in children and infants)15 offers a good opportunity to establish normative values for a large set of fitness components using common and well-standardised methods in a large sample of children from eight European countries. Therefore, the aim of this study is to provide sex- and age-specific fitness reference standards for European children aged 6–9 years.

Materials and methods

Study design

The IDEFICS study15 is a prospective cohort study with an embedded controlled intervention aiming to prevent childhood obesity in a community-oriented approach. IDEFICS is a multi-centre study on lifestyle and nutrition among children aged 2–9 years from eight European countries (Sweden, Germany, Hungary, Italy, Cyprus, Spain, Belgium and Estonia).

In the first survey, data collection took place from September 2007 to June 2008 (T0, baseline survey); the follow-up survey was conducted 2 years later, from September 2009 to May 2010 (T1, follow-up survey). The present analysis is based on cross-sectional data only. A detailed description of IDEFICS sampling and recruitment approaches, standardisation and harmonisation process, data collection, analysis strategies, quality control measures and inclusion/exclusion criteria has already been published.15 The study was approved by the Research Ethics Committees of each institution and region involved. Parents signed an informed consent form.

Study population

A cohort of 18 745 children aged 2–10.9 years was established including all children recruited at baseline and children newly recruited at the first follow-up. Fitness was measured in children older than 6 years (N=10 302, 50.7% girls) fulfilling the inclusion criteria (having complete data on weight, height, age and sex). Because not all children took part in all components of the physical fitness battery, analysis groups and sample sizes vary for the different physical fitness tests (Figure 1). We compared these varying analysis groups with the overall study population of children older than 6 years. Since the groups of children who participated in the back-saver sit-and-reach test, the standing long jump or the handgrip strength test were nearly identical, we considered these children as one analysis group. The second analysis group consisted of children who participated in the 40-m sprint test; the third group of children who participated in the Flamingo balance test; and the children who conducted the shuttle run test formed the fourth group. Mean age varied between 7.6 and 7.7 years, and the percentage of boys was highest in the total study population (49.4%) and lowest (47.3%) in the third analysis group (Flamingo balance test). The prevalence of overweight and obese children was highest (25.3%) in group 1, compared with 24.5% in the total study population, and lowest (20.9%) in the children participating in the shuttle run test. The most pronounced differences were seen with respect to the ISCED (International Standard Classification of Education) level, where the maximum level of both parents was considered. The distribution of ISCED levels was rather similar in the overall study population and in analysis groups 1 and 3, but fewer children from families with ISCED level 0–2 participated in the 40-m sprint (4.5%; analysis group 2) and in the shuttle run test (6.8%; analysis group 4) as compared with 10.5% in the overall study population, 10.3% in group 1 and 10.0% in group 3.

Figure 1 Flow chart of the population involved in this study from the IDEFICS study.

Physical fitness

Components of the fitness tests were mostly adapted from the ALPHA health-related fitness test battery, and a published review has shown their reliability in children and adolescents.9,16, 17, 18 The test battery thus included: the flamingo balance test, back-saver sit-and-reach test, handgrip strength test, standing long jump test and 40-m sprint test. Moreover, CRF was assessed by the 20-m shuttle run test (20mSRT).19

The protocols used for fitness testing are described in detail below:

The flamingo balance test measures the ability to balance successfully on a single leg. The child has to bend his/her free leg backwards, grip the back of the foot with the hand on the same side, and stand like this for 1 min. The child is given one practice try beforehand to become familiar with the test. Then the number of attempts needed to stand on one leg for 1 min is counted for each leg. Children were excluded if they put down the other foot 15 times or more within the first 30 s. The test score is calculated as the sum of attempts with both legs; lower scores indicate better performance.
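The scoring rule above can be sketched in a few lines of Python. This is only an illustration of the rule as worded in the text; the function and argument names are hypothetical and were not part of the study's software.

```python
from typing import Optional


def flamingo_score(attempts_left: int,
                   attempts_right: int,
                   touchdowns_first_30s: int) -> Optional[int]:
    """Sum of attempts over both legs, or None if the child is excluded.

    touchdowns_first_30s: number of times the free foot was put down
    within the first 30 s (hypothetical argument name).
    """
    # Exclusion rule from the text: 15 or more touch-downs in the first 30 s.
    if touchdowns_first_30s >= 15:
        return None
    # Lower totals indicate better balance.
    return attempts_left + attempts_right


print(flamingo_score(3, 5, touchdowns_first_30s=2))   # valid test
print(flamingo_score(3, 5, touchdowns_first_30s=15))  # excluded child
```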

The back-saver sit-and-reach test measures the flexibility of the hamstring muscles and was conducted as in the FITNESSGRAM battery.20 The test is performed with a standard box with a scale on the top. The participant sits with the untested leg bent at the knee; the tested leg is placed straight with the foot against the box, and the participant slowly reaches forward as far as possible. The test is similar to the traditional sit-and-reach test, except that only one leg is evaluated at a time, so a specific score is obtained for each side. The score is calculated as the average of both sides; higher scores indicate better performance.

The handgrip strength test measures the maximal isometric force that can be generated mainly by the forearm. The child stands in a standard bipedal position with the arms in complete extension, holding the dynamometer (TKK 5101; Takei, Tokyo, Japan) without touching any part of the body with it. The dynamometer is adjusted to sex and hand size for each child.21 The measurement scale starts at 5 kg. Children who did not reach this minimum (N=5; 0.1%) were scored as 2.5 kg (the midpoint of 0–5 kg). The score is calculated as the average of right and left handgrip strength; higher scores indicate better performance.
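As a minimal sketch of this scoring rule, the Python function below averages the two hands and substitutes 2.5 kg for a reading under the 5 kg scale minimum. Two points are assumptions rather than facts from the text: the substitution is applied per hand, and a reading below the minimum is represented as `None`. All names are hypothetical.

```python
from typing import Optional

SCALE_MINIMUM_KG = 5.0   # lowest value on the dynamometer scale (from the text)
BELOW_MINIMUM_KG = 2.5   # substitute value, the midpoint of 0-5 kg (from the text)


def handgrip_score(right_kg: Optional[float], left_kg: Optional[float]) -> float:
    """Average of right and left handgrip strength in kg.

    None (or any value under the scale minimum) stands for a reading that
    did not reach 5 kg; applying the substitution per hand is an assumption.
    """
    def substitute(value: Optional[float]) -> float:
        if value is None or value < SCALE_MINIMUM_KG:
            return BELOW_MINIMUM_KG
        return value

    return (substitute(right_kg) + substitute(left_kg)) / 2


print(handgrip_score(12.0, 11.0))  # both hands on scale
print(handgrip_score(None, 6.0))   # right hand below the scale minimum
```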

The standing long jump test22 assesses lower-limb explosive strength. The child jumps as far as possible from a standing position, trying to land with both feet together and to maintain equilibrium once landed (putting the hands on the floor was not allowed). The score was obtained by measuring the distance between the last heel-mark and the take-off line. Two tries were allowed and the best score was retained. Higher scores indicate better performance.

The 40-m sprint test measures the maximum running speed of the child. This test is carried out along 40 m delimited by five aligned marker cones, with a distance of 10 m between neighbouring cones. Five more marker cones are placed in parallel at a distance of 3 m to mark the running track. The child is instructed to run as quickly as he/she can after the starting signal. Two tries were allowed and the best score was retained. In this test lower scores indicate better performance.

The 20mSRT estimates the CRF (aerobic capacity). Children run back and forth between two lines 20 m apart following beep signals played from a pre-recorded CD. The test is finished when the child stops owing to fatigue or when he/she does not reach the line in time with the audio signal on two consecutive occasions. During the fieldwork, 20mSRT was performed using four different versions: multistage fitness test, sports coach UK (applied in Germany, Estonia and Cyprus); Leger test, CAEP Quebec Faca (applied in Spain and Hungary); multistage fitness test (applied in Sweden) and Uithouding shuttle run test (applied in Belgium). Performance was recorded in shuttles, and results were unified according to the Leger test protocol. The initial speed in the Leger test is 8.5 km h−1, with progressive increases of 0.5 km h−1. Taking into account the speed in the other three protocols, we estimated the equivalent shuttles for every protocol. Shuttles were converted to stages in order to calculate maximal oxygen consumption (VO2max) using Leger’s equation.19 A greater number of shuttles indicates better performance. This test was not performed in Italian children (N=2440; 51.8% girls).
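To illustrate the last step, the sketch below converts a completed stage into running speed and then into an estimated VO2max. The speed schedule (8.5 km h−1 at stage 1, plus 0.5 km h−1 per stage) comes from the text. The coefficients are those of the widely cited Leger et al. (1988) equation for children, which the text references but does not state, so treating it as the equation used here is an assumption. The conversion from shuttles to stages is not described in enough detail to reproduce and is left out; function names are hypothetical.

```python
def stage_speed_kmh(stage: float) -> float:
    """Running speed at a given stage: 8.5 km/h at stage 1, +0.5 km/h per stage."""
    if stage < 1:
        raise ValueError("stage must be >= 1")
    return 8.5 + 0.5 * (stage - 1)


def vo2max_leger(stage: float, age_years: float) -> float:
    """Estimated VO2max in ml/kg/min.

    Assumed form (Leger et al., 1988):
        VO2max = 31.025 + 3.238*S - 3.248*A + 0.1536*S*A
    with S the speed in km/h at the last stage and A the age in years.
    """
    speed = stage_speed_kmh(stage)
    return 31.025 + 3.238 * speed - 3.248 * age_years + 0.1536 * speed * age_years


# Hypothetical child: 8 years old, stopped at stage 2 (9.0 km/h).
print(round(vo2max_leger(stage=2, age_years=8.0), 1))
```

Under this formula, the estimate at a fixed stage decreases with age, so a child must reach higher stages when older to keep the same estimated VO2max.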

Anthropometric measurements

International guidelines for anthropometry in children were used in the IDEFICS study.23,24 Body weight (kg) and height (cm) were measured in barefoot children, clothed in underwear, using an adapted version of electronic scale TANITA BC 420 SMA (Tanita Europe BV, Amsterdam, The Netherlands), precision 100 g, range 0–150 kg and a portable stadiometer (seca 225, seca, Birmingham, UK), precision 0.1 cm, range 70–200 cm, respectively. Body mass index was calculated as body weight (kg) divided by height (m) squared.

Statistical analysis

We calculated percentile curves of the physical fitness outcome variables as a function of the covariate age, stratified by sex, using the Generalized Additive Model for Location, Scale and Shape (GAMLSS) method. The GAMLSS method is an extension of the LMS method, which models three parameters depending on one explanatory variable: the median (M) of the outcome variable; the coefficient of variation (S), which accounts for the variation around the mean and adjusts for non-uniform dispersion; and the skewness (L), which accounts for the deviation from a normal distribution using a Box–Cox transformation. In addition, the GAMLSS method can model the kurtosis using other distributions and can include more than one covariate. We used the gamlss package (version 4.2-6)25 of the statistical software R (version 3.0.1).26 Different distributions, that is, the Box–Cox power exponential (BCPE), Box–Cox Cole and Green, inverse Gaussian and Delaporte distributions, were fitted to the observed distribution of the physical fitness outcome variables. Moreover, the influence of age on the parameters of the considered distributions was modelled either as a constant, as a linear function or as a cubic spline. Goodness of fit was assessed by the Bayesian Information Criterion and Q–Q plots to select the final model including the fitted distribution of the physical fitness outcome variables. Worm plots were used as a diagnostic tool to assess whether adjustment for kurtosis was required.27 Finally, percentile curves for the 1st, 3rd, 10th, 25th, 50th, 75th, 90th, 97th and 99th percentiles were calculated based on the model that showed the best goodness of fit.25,28 For the Flamingo balance test only the 50th, 75th, 90th, 97th and 99th percentile curves are presented, because the lower percentile curves of the best model did not correspond to the percentage of values below them; for example, only 12.6% of the girls’ values, instead of the targeted 25%, lay below the 25th percentile curve. Similar results were observed for the 3rd, 10th and 25th percentile curves of the flamingo balance test in boys and girls.
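To make the role of the three LMS parameters concrete, the following Python sketch evaluates a percentile from given L, M and S values using the standard LMS (Box–Cox) formula of Cole and Green. It covers only the three-parameter LMS case; it is not the BCPE or Delaporte fitting performed with the gamlss package in R, and the parameter values in the usage lines are invented for illustration.

```python
import math
from statistics import NormalDist


def lms_percentile(L: float, M: float, S: float, p: float) -> float:
    """Value of the outcome at cumulative probability p (0 < p < 1).

    Standard LMS formula: M * (1 + L*S*z)**(1/L), or M * exp(S*z) when L == 0,
    where z is the standard normal quantile of p.
    """
    z = NormalDist().inv_cdf(p)
    if abs(L) < 1e-12:
        return M * math.exp(S * z)
    return M * (1.0 + L * S * z) ** (1.0 / L)


def lms_zscore(x: float, L: float, M: float, S: float) -> float:
    """Inverse operation: z-score of an observed value x."""
    if abs(L) < 1e-12:
        return math.log(x / M) / S
    return ((x / M) ** L - 1.0) / (L * S)


# Hypothetical parameters for one age and sex (not taken from the tables).
L, M, S = 0.5, 20.0, 0.10
for p in (0.03, 0.50, 0.97):
    print(p, round(lms_percentile(L, M, S, p), 2))
```

In a fitted model, L, M and S would each be a function of age (constant, linear or spline), so a percentile curve is obtained by repeating this calculation along the age axis.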

The chosen GAMLSS models in boys and girls are listed in Table 1; for example, the best model for the 40-m sprint test in boys and girls was achieved with a Box–Cox power exponential distribution where the four parameters were modelled as follows: the location parameter μ linearly, the scale parameter log(σ) and the shape parameters ν and log(τ) as constants. In contrast to all other models, the discrete Delaporte distribution was chosen for the flamingo balance test to take into account the discrete distribution of its data.

Table 1 Chosen GAMLSS models to calculate the physical fitness reference values


Results

Tables 2, 3 and 4 and Figures 2, 3 and 4 show the sex- and age-specific reference values (P1, P3, P10, P25, P50, P75, P90, P97 and P99) for the different fitness tests in European children.

Table 2 Percentiles of coordination/equilibrium; flexibility and speed/agility calculated with GAMLSS in normal-weight children
Table 3 Percentiles of upper-limb maximal strength and lower-limb explosive strength calculated with GAMLSS in normal-weight children
Table 4 Percentiles of cardiorespiratory fitness: 20-m shuttle run test (shuttles and VO2max (ml kg−1 min−1)) calculated with GAMLSS in normal-weight children
Figure 2 Percentile curves of the flamingo balance test (attempts), back-saver sit-and-reach test (cm) and 40-m sprint test (s) at median age in normal-weight children.

Figure 3 Percentile curves of handgrip strength test (kg) and standing long jump test (cm) at median age in normal-weight children.

Figure 4 Percentile curves of the 20-m shuttle run test (20mSRT), which estimates CRF (aerobic capacity), at median age in normal-weight children.

Boys performed better than girls in speed/agility (40-m sprint), muscular strength (handgrip strength and standing long jump) and CRF (20mSRT; shuttles and VO2max). Nevertheless, girls performed better than boys in balance (flamingo) and flexibility (back-saver sit-and-reach).

Older children performed better than younger children in all tests, except P50 in back-saver sit-and-reach (in both sexes) and <P90 in 20mSRT (VO2max in both sexes).

Finally, for all fitness tests, the range of fitness levels between P3 and P99 is wider for a given age than the range of fitness levels across age groups and between sexes.


Discussion

The main findings of this study are (1) boys performed better than girls in speed, lower- and upper-limb strength and CRF; (2) girls performed better in balance and flexibility; (3) overall, older children performed better than younger children; and (4) the range of fitness levels between P3 and P99 is wider for a given age than the range of fitness levels across age groups and between sexes.

The majority of the fitness tests used in the IDEFICS study were selected based on scientific evidence of associations with different health outcomes.29 These tests have also been shown to be valid, reliable and feasible for health-monitoring purposes at the population level.16 Fitness has been identified as a predictor of morbidity and mortality for all causes.30, 31, 32, 33 In this regard, various studies have shown that low fitness in children and adolescents is associated with adiposity,7,34,35 cardiovascular risk factors36,37 and skeletal38 and mental health.39

In 2009, Ruiz et al.9 systematically reviewed whether fitness in childhood was a predictor of cardiovascular disease risk factors, events and syndromes, quality of life and low back pain later in life. The authors found moderate evidence for the association between changes in CRF and cardiovascular risk factors, and between CRF and the risk of developing metabolic syndrome and arterial stiffness. In addition, CRF has also been widely studied in relation to metabolic risk and adiposity.36,37,38 Interestingly, handgrip strength in Swedish male adolescents (aged 16–19 years) has been identified as a risk factor for the major causes of death in adulthood (<55 years).6 Moreover, results from the Amsterdam Growth and Health Study, following adolescents aged 13 years until the age of 27 years, showed that the longitudinal improvements in VO2max were related to a healthy cardiovascular risk profile.41

To the best of our knowledge, our study is the first to provide sex- and age-specific fitness cutoff values for pre-pubertal European children. These values may be useful in identifying children at higher risk of developing unfavourable health outcomes owing to their low fitness level. Previously, Casajus et al.42 published physical fitness levels in children from Aragon (Spain) aged 7–12 years. Our results are roughly comparable to these with respect to flexibility (they used sit-and-reach), muscular strength (handgrip strength and standing long jump) and CRF (20mSRT, VO2max) only for those children aged 7–9 years. Similar values were observed in both studies for these fitness tests in boys and girls. For example, P50 of sit-and-reach was 17.3 cm vs 19.4 cm for Spanish and European boys, respectively, and 19.3 cm vs 21.9 cm for Spanish and European girls, respectively. P50 of handgrip as average of the right and left side was 12.7 kg vs 11.8 kg for Spanish and European boys, respectively, and 11.6 kg vs 10.8 kg for Spanish and European girls, respectively. P50 of standing long jump was 117.2 cm vs 120.5 cm for Spanish and European boys, respectively, and 104.5 cm vs 110.7 cm for Spanish and European girls, respectively. Finally, P50 of 20mSRT was 48.1 ml kg−1 min−1 vs 47.3 ml kg−1 min−1 for Spanish and European boys, respectively, and 46.7 ml kg−1 min−1 vs 46.2 ml kg−1 min−1 for Spanish and European girls, respectively.

Castro-Pinero et al.44 published percentile values for muscular strength (standing long jump)43 and CRF (20mSRT, stages) in children and adolescents from Cadiz (Spain) aged 6–17 years. Our results are also comparable to those from Castro-Pinero et al.43,44 for those children aged 6–9 years. For example, P50 of standing long jump was 111.6 cm vs 120.5 cm for Spanish and European boys, respectively, and 114.8 cm vs 110.7 cm for Spanish and European girls, respectively. In addition, P50 of 20mSRT (stage) was 2.7 vs 2.0 for Spanish and European boys, respectively, and 1.9 vs 2.0 for Spanish and European girls, respectively.

In adolescent populations, Ortega et al.45 first published European fitness reference values for 12.5–17.5-year-old youths from 10 cities (HELENA study), reporting sex- and age-specific physical fitness levels. Similarly, Haugen et al.46 recently reported normative fitness data for Norwegian 13–15-year-old adolescents. A systematic review10 identified Australian studies reporting normative fitness data for children and adolescents aged 9–17 years. In addition, Tremblay et al.14 reported normative data for aerobic fitness, flexibility and muscular strength for Canadian 6–19-year-old youths; however, only the 50th percentile was reported. The present results are comparable to those obtained by Tremblay et al.14 for flexibility (sit-and-reach) and muscular fitness (handgrip strength) in Canadian children (for the age range 6–10 years). For these fitness dimensions, European children performed worse than Canadian children. For example, in the study of Tremblay et al.,14 the performance in sit-and-reach at P50 of 6–10-year-old children was 25 cm and 30 cm in boys and girls, respectively. In the present study, the performance in back-saver sit-and-reach at P50 of 6–9-year-old children was 19.4 cm and 21.9 cm for boys and girls, respectively. In addition, in the study of Tremblay et al.,14 the performance in handgrip strength at P50 was 25 kg and 22 kg in boys and girls, respectively, calculated as the sum of the best right- and left-hand attempt. In our study, the performance in handgrip test at P50 of 6–9-year-old children was 23.2 kg and 21 kg for boys and girls, respectively. The differences observed in flexibility might be explained by slightly different methodologies. Tremblay et al.14 measured flexibility with the sit-and-reach test, using the best performance of two attempts. In our study, the back-saver sit-and-reach values were calculated as the average of the left and right sides.

There is a scarcity of data on reference standards of fitness for children. Hence, the presented European data are relevant in various respects. A fitness level below P5 may be considered as potentially pathologic since, for example, low scores on CRF and handgrip tests are associated with cardiovascular risk. Therefore, it may be recommended to monitor children with a fitness level below this percentile for cardiovascular risk markers.47,48 For practical use of these data, schools, sport clubs and so on may consider using a Likert-type scale to classify children’s performance X as follows: very poor (X<P10); poor (P10≤X<P25); medium (P25≤X<P75); good (P75≤X<P90); and very good (X≥P95).
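The proposed scale can be sketched as a small Python function that maps a child's performance percentile to one of the five categories. The input is assumed to be a performance percentile in which higher always means better, so for tests where lower raw scores are better (40-m sprint, flamingo balance) the percentile would have to be inverted first. The text gives P95 as the lower bound of the top category while the 'good' category ends at P90; so that every child receives a category, this sketch assumes the top category starts at P90, which is an assumption and not a statement from the source. The function name is hypothetical.

```python
def classify_fitness(performance_percentile: float) -> str:
    """Map a performance percentile (0-100, higher = better) to a category."""
    if not 0 <= performance_percentile <= 100:
        raise ValueError("percentile must lie between 0 and 100")
    if performance_percentile < 10:
        return "very poor"
    if performance_percentile < 25:
        return "poor"
    if performance_percentile < 75:
        return "medium"
    if performance_percentile < 90:
        return "good"
    # Assumption: top band starts at P90 so that the bands are contiguous.
    return "very good"


for pct in (5, 10, 50, 80, 95):
    print(pct, classify_fitness(pct))
```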

Some limitations should be considered when interpreting the findings of this study. The study was not planned to be representative with respect to the broad range of variables that we investigated, but given the population-based approach our study sample should be considered as an unselected population. As the children were free to opt out of examination modules, the analysis groups and the sample sizes of children participating in the various components of the physical fitness test battery varied from test to test. However, comparing the respective analysis groups with the overall sample of 6–10.9-year-old children, no major differences were detected with respect to mean age, sex distribution and prevalence of overweight and obesity. It would have been interesting to investigate whether the reliability of the fitness tests differs in apparently pre-pubertal children and adolescents. However, this was not feasible within our IDEFICS sample but may be investigated in later follow-ups of the cohort. The main strengths of this study are the large and heterogeneous sample of European children, the standardised use of well-known and validated health-related fitness tests and a strong statistical method to obtain normative values of fitness tests.

In summary, our results provide sex- and age-specific physical fitness reference standards in European children. These data in children aged 6.0–9.9 years complement the study published by Ortega et al.45 in adolescents aged 12.5–17.5 years. Unfortunately, there is a gap between 9.9 and 12.4 years without updated fitness reference data at the European level, which has to be filled. In the meantime, country-specific data such as those discussed above should be used in children of this age.