Introduction

An ultra-marathon is any race that exceeds the traditional marathon distance of 42.195 km or lasts longer than 6 h1,2. These races can be distance-limited (such as 50 km or 100 km)3 or time-limited (such as 6 h or 12 h)4. As such, a 50-km run is considered the shortest distance-limited ultra-marathon. The 50 km distance is the most popular ultra-marathon race, offering a challenging but achievable step up from the traditional marathon distance2. There has been significant scientific interest in this specific ultra-marathon race format. Studies have explored a range of topics related to this type of race, including performance trends5,6, the age at which the best performances are achieved5,7,8, age-related performance decline5,9, the sex difference in performance5,10,11,12, the aspect of inflammatory processes13, nutritional aspects14, exercise-induced influences on the heart15,16,17,18 and the foot strike pattern19,20.

One aspect that has not been explored is the country of affiliation of the fastest 50-km ultra-marathoners. It is widely acknowledged that runners from East African countries are the fastest in the marathon distance21. However, little comparable information is available for ultra-marathon races, especially for shorter distances like the 50-km race. Previous studies have investigated the country of affiliation of top performers in longer ultra-marathons, such as the 100-km22,23 and 100-mile races24.

One study found that Japanese runners were the fastest in the 100-km ultra-marathons23, while another study reported that Russians were the fastest in the same distance race22. For 100-mile ultra-marathons, the fastest women were found to be from Sweden, Hungary, and Russia, while the fastest men originated from Brazil, Russia, and Lithuania24. The exceptional results of the Russian athletes during this time span could have been affected by doping and the use of illicit substances25,26.

Regarding the age group presenting better results in long-distance events, Rust et al.27 demonstrated that in a 100-km ultra-marathon, runners in the age group 18–24 years were slower than runners in the older age groups. Therefore, it seems that very young runners do not achieve the best results in long-distance races. However, it remains unknown which age group achieves the best results in 50-km ultra-marathons.

The aim of this study was to identify the country of affiliation and age of the fastest 50-km ultra-marathon runners, as well as the locations of the fastest races. While the dominance of East African runners in traditional marathon events is well documented, similar insights for ultra-marathons, especially at the shorter 50-km distance, are noticeably absent. Previous studies have explored the origins of top performers in more extended ultra-marathons, such as the 100-km and 100-mile races. However, to the best of our knowledge, this is the first study to analyze the country of affiliation for elite athletes in the 50-km category.

Based on the findings of previous studies, we hypothesized that there would be a significant diversity in the nationalities of the leading runners in the 50-km ultra-marathon, with a specific expectation that Russian athletes would emerge the fastest. Additionally, we predicted that runners aged 35 years and older would achieve the fastest times.

Methods

Data set and data preparation

For this study, race results were obtained from the official DUV website (https://statistik.d-u-v.org). Each race record included the participant’s age, gender, country of affiliation, the event location and year, and the average race speed in km/h. Data processing entailed general clean-up (discarding duplicates and incomplete or erroneous records) and classifying records into 5-year age groups. The country variables (‘athlete country’ and ‘event country’) showed high cardinality, and records from countries with fewer than 10 records were filtered out to reduce noise.
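The clean-up steps described above can be sketched in plain Python. This is a minimal illustration and not the authors' code: the record field names (`athlete_id`, `year`, `speed_kmh`, and so on), the rule used to flag erroneous records, and the exact age-group boundaries are all assumptions made for the example.

```python
from collections import Counter

MIN_RECORDS = 10  # countries with fewer records are dropped, as stated in the text
REQUIRED = ("age", "gender", "athlete_country", "event_country", "speed_kmh")


def age_group(age):
    """Return the lower bound of a 5-year age group (bin edges are assumed)."""
    return min((age // 5) * 5, 75)  # everything from 75 upward is one group


def prepare(records):
    """Drop incomplete, erroneous and duplicate rows, then filter rare countries."""
    seen, clean = set(), []
    for rec in records:
        if any(rec.get(key) is None for key in REQUIRED):
            continue  # incomplete record
        if rec["speed_kmh"] <= 0:
            continue  # erroneous record (hypothetical rule)
        key = tuple(rec[k] for k in REQUIRED) + (rec.get("athlete_id"), rec.get("year"))
        if key in seen:
            continue  # duplicate record
        seen.add(key)
        clean.append({**rec, "age_group": age_group(rec["age"])})
    # Single pass per country column; counts are taken on the rows kept so far.
    for column in ("athlete_country", "event_country"):
        counts = Counter(rec[column] for rec in clean)
        clean = [rec for rec in clean if counts[rec[column]] >= MIN_RECORDS]
    return clean
```

Filtering the two country columns one after the other is a design choice of this sketch; the text does not say whether the threshold was applied once or repeated until stable.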

Statistical analysis

Histograms of the number of records and the average race speed by age/age group were visualized, displaying approximate Gaussian distributions. The variables ‘athlete country’ (the athlete’s country of affiliation) and ‘event country’ (the country where the race took place) were used to rank the countries by average race speed, by aggregating on the country columns and then sorting by average race speed. Records from countries with fewer than 10 records were removed from the set to reduce noise and ensure that the results were statistically representative.

The resulting dataset contained 1,398,845 race records from 549,154 unique runners from 122 countries, participating in 50-km races held in 86 countries worldwide between 1894 and 2022. The descriptive statistical data in the ranking tables includes the number of records and the mean, standard deviation (std), max, and min values of the race speed.
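The aggregation behind the ranking tables can be illustrated as follows. This is a sketch under stated assumptions rather than the authors' implementation: records are assumed to be dictionaries with a hypothetical `speed_kmh` field, and the sample standard deviation is used because the text does not say which variant was reported.

```python
from collections import defaultdict
from statistics import mean, stdev


def rank_by_speed(records, column):
    """Group records by `column` and sort the groups by mean speed, fastest first."""
    speeds = defaultdict(list)
    for rec in records:
        speeds[rec[column]].append(rec["speed_kmh"])
    table = [
        {
            "country": country,
            "count": len(values),
            "mean": mean(values),
            # Sample standard deviation (assumed); undefined for a single record.
            "std": stdev(values) if len(values) > 1 else 0.0,
            "max": max(values),
            "min": min(values),
        }
        for country, values in speeds.items()
    ]
    return sorted(table, key=lambda row: row["mean"], reverse=True)
```

Calling `rank_by_speed(records, "athlete_country")` and `rank_by_speed(records, "event_country")` would give one table per country variable, each with the count, mean, std, max and min columns named in the text.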

In addition, an XGBoost regression model was built, with the following variables used as predictors or inputs to the model:

  • Athlete_gender_ID

  • Age_group_ID

  • Athlete_country_ID

  • Event_country_ID

These variables are the encoded versions of the original variables (‘athlete gender’, ‘age group’, ‘athlete country’, and ‘event country’). Athlete_gender_ID was encoded as 0 = female and 1 = male. Age_group_ID was encoded as the lowest value included in the age group (“18–24” becomes 18, “25–29” becomes 25, etc.). The Athlete_country_ID and Event_country_ID variables were encoded as per the country’s position in the descriptive country ranking tables. The predicted variable, or model output, was ‘race speed’ (km/h). Two evaluation metrics, MAE and R2, were calculated to assess the model's accuracy and behavior, along with the model's relative feature importances and prediction distribution plots. Following some basic hyper-parameter tuning, the model was trained and tested with the full sample (in-sample testing).
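The encoding rules above can be sketched as three small helpers. The gender labels (`"F"`, `"M"`), the 1-based ranking positions, and the record field names are assumptions made for illustration; the text only specifies the resulting codes.

```python
def encode_gender(gender):
    """0 = female, 1 = male, as stated in the text (input labels are assumed)."""
    return {"F": 0, "M": 1}[gender]


def encode_age_group(label):
    """Return the lowest age in a group label, e.g. "18–24" -> 18."""
    digits = ""
    for char in label:
        if not char.isdigit():
            break  # stop at the dash or plus sign
        digits += char
    return int(digits)


def country_rank_ids(ranked_codes):
    """Map each country code to its position in a ranking table (1-based, assumed)."""
    return {code: position for position, code in enumerate(ranked_codes, start=1)}


def encode_record(rec, athlete_ids, event_ids):
    """Build the four-element predictor vector for one race record."""
    return [
        encode_gender(rec["gender"]),
        encode_age_group(rec["age_group"]),
        athlete_ids[rec["athlete_country"]],
        event_ids[rec["event_country"]],
    ]
```

With this scheme, a lower country ID corresponds to a country nearer the top of the ranking table.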

To further qualify the results, an MLR (Multivariate Linear Regressor) model and four individual ULR (Univariate Linear Regressor) models, all based on the OLS (Ordinary Least Squares) method, were built. The results were then compared to the XGBoost model results to quantify the statistical importance of the variables. All data processing and analysis were done using Python (http://www.python.org/) and a Google Colab notebook (https://colab.research.google.com/).
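For readers unfamiliar with the univariate case, the following sketch fits a single-predictor OLS line and reports its R2 using the closed-form solution. It is a stand-alone illustration with a hypothetical function name (`fit_ulr`), not the library routine used in the study, which the text does not name.

```python
from statistics import mean


def fit_ulr(x, y):
    """Ordinary least squares fit of y = intercept + slope * x.

    Returns (intercept, slope, r2) for one predictor and one outcome.
    """
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    return intercept, slope, 1.0 - ss_res / ss_tot
```

Fitting one such model per encoded predictor, with race speed as `y`, gives one R2 value per variable, which is how a univariate comparison of predictors can be read.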

Ethical approval

This study was approved by the Institutional Review Board of Kanton St. Gallen, Switzerland, with a waiver of the requirement for informed consent of the participants as the study involved the analysis of publicly available data (EKSG 01/06/2010). The study was conducted in accordance with recognized ethical standards according to the Declaration of Helsinki adopted in 1964 and revised in 2013.

Results

After the dataset was processed and countries with fewer than 10 records were discarded, the dataset used to train and evaluate the model contained 1,398,845 race records (1,026,546 for men and 372,299 for women) from 549,154 unique runners from 122 countries, participating in races held in 86 different countries between 1894 and 2022. During this period, the overall number of female and male runners increased, and the men-to-women ratio decreased (Fig. 1).

Figure 1

The number of runners and men-to-women ratio over the years.

Runners’ country ranking

The dataset was grouped by the values in the ‘athlete country’ column and subsequently arranged in descending order based on the average (mean) race speed, with the fastest at the top. The resulting 122-country ranking is shown in Table 1, where the fastest runners originated from African countries (Ethiopia (ETH), Lesotho (LES), Malawi (MAW) and Kenya (KEN)).

Table 1 Athletes’ country sorted by average (mean) race speed.

Event country ranking

The dataset was aggregated based on the values in the ‘event country’ column, followed by sorting according to the average (mean) race speed, placing the fastest at the top. Table 2 displays the resulting ranking of 86 countries. The fastest race times were achieved in races held in Europe (Luxembourg (LUX), Belarus (BLR), and Lithuania (LTU)) and the Middle East (Qatar (QAT) and Jordan (JOR)).

Table 2 List of event countries sorted by average (mean) running speed.

Model interpretability charts

The charts and plots presented in Figs. 2, 3, 4 and 5 provide a detailed visualization that combines a descriptive view of the full 50-km race sample with the predictive model insights. For each of the four predicting variables (age, gender, country of affiliation and country of event), a set of three charts is shown: a prediction distribution chart at the top, drawn as a boxplot with the 2nd quartile (median value) in the box label; a red line chart in the middle, representing the average race speed for each group and setting a target for the model prediction distributions; and a counting chart at the bottom, showing the number of race records for each value of the predictor or group. For the ‘Age group’ and ‘Athlete gender’ predictors, all values were displayed. For the ‘Athlete country’ and ‘Event country’ predictors, however, only the first 20 (the fastest 20) were displayed because of high cardinality (these match the top 20 countries in the ranking tables).

Figure 2

Model interpretability charts for feature: Gender. (a) Predictions of race speed (km/h); (b) Average race speed (km/h); (c) Records count.

Figure 3

Model interpretability charts for feature: Age group. (a) Predictions of race speed (km/h); (b) Average race speed (km/h); (c) Records count.

Figure 4

Model interpretability charts for feature: Athlete country. (a) Predictions of race speed (km/h); (b) Average race speed (km/h); (c) Records count.

Figure 5

Model interpretability charts for feature: Event country. (a) Predictions of race speed (km/h); (b) Average race speed (km/h); (c) Records count.

Men (8.17 km/h) were faster than women (7.38 km/h) (Fig. 2). Age group 20 (ages 20 to 24 years) was the fastest (8.34 km/h) in the 50-km races with running speed decreasing with age (Fig. 3).
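To make the reported speeds easier to interpret, an average speed can be converted into the 50-km finish time it implies. The helper below is an illustrative sketch with a hypothetical name (`finish_time`); note that the finish time implied by a mean speed is not the same as the mean of the individual finish times.

```python
RACE_DISTANCE_KM = 50.0


def finish_time(speed_kmh, distance_km=RACE_DISTANCE_KM):
    """Convert an average speed in km/h into an h:mm:ss finish time."""
    total_seconds = round(distance_km / speed_kmh * 3600)
    hours, remainder = divmod(total_seconds, 3600)
    minutes, seconds = divmod(remainder, 60)
    return f"{hours}:{minutes:02d}:{seconds:02d}"
```

For example, `finish_time(8.17)` gives 6:07:12 for men and `finish_time(7.38)` gives 6:46:30 for women, a gap of roughly 39 minutes over 50 km.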

The fastest runners’ countries were Ethiopia (ETH, 14.09 km/h), Lesotho (LES, 13.05 km/h), Malawi (MAW, 12.35 km/h), and Kenya (KEN, 12.34 km/h) as per the ‘athlete country’ ranking table (Fig. 4). The fastest 50-km races took place in Luxembourg (LUX, 11.36 km/h), Belarus (BLR, 11.31 km/h), Lithuania (LTU, 11.21 km/h), Qatar (QAT, 11.16 km/h), and Jordan (JOR, 10.70 km/h) as in the ‘event country’ ranking table (Fig. 5). The visualization further reinforced the earlier findings regarding the higher speed of runners from certain African countries and the optimal race conditions in specific European and Middle Eastern locations.

Evaluation metrics and features importance

The model for the 50-km race class achieved a coefficient of determination of R2 = 0.36, which indicates a weak but existing association of the predicting variables with the model output. In terms of feature importance, ‘Event country’ was the most important predictor (66%), followed by ‘Athlete gender’ (23%), ‘Age group’ (7%), and ‘Athlete country’ (5%) (Fig. 6). This hierarchy underscores the relative impact of these factors on race performance.

Figure 6

Evaluation metrics and features importance. Sample size: 1,398,845. XGBoost trees: 500. MAE: 1.39 km/h. Feature importance: Event_country_ID 0.66, Athlete_gender_ID 0.23, Age_group_ID 0.07, Athlete_country_ID 0.05.
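The two evaluation metrics reported here follow their standard definitions, sketched below in plain Python. The function names are hypothetical and this is not the code used in the study; it only shows what an MAE in km/h and an R2 value measure.

```python
def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between observed and predicted speeds (km/h)."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r_squared(y_true, y_pred):
    """Share of the variance in observed speeds explained by the predictions."""
    y_bar = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - y_bar) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot
```

Read this way, an MAE of 1.39 km/h means the predicted race speed differs from the recorded one by 1.39 km/h on average, and R2 = 0.36 means the four predictors account for about a third of the variance in race speed.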

The MLR model achieved an R2 = 0.325, which is only marginally worse than the XGBoost result. All four predictors contributed statistically significantly to the MLR model output, with a P value reported as 0.000 in all cases. In the ULR models, ‘Athlete gender’ (R2 = 0.025) and ‘Age group’ (R2 = 0.006) were statistically significant but explained little of the variance, whereas ‘Event country’ (R2 = 0.279) and ‘Athlete country’ (R2 = 0.260) explained considerably more. The ‘Athlete country’ and ‘Event country’ variables are therefore nearly equally important when used individually, suggesting a high correlation between them (e.g., runners in each country’s events were mostly affiliated with that same country).

Discussion

The primary objective of this study was to investigate the country of affiliation of the fastest ultra-marathoners in the 50-km race category. Another aim was to identify the countries where the fastest 50-km ultra-marathon races are held and the age of the fastest runners participating. The main findings were (i) the fastest runners in the 50-km ultra-marathon originate from African countries (Ethiopia, Lesotho, Malawi, and Kenya), (ii) the countries with the fastest 50-km racecourses are in Europe (Luxembourg, Belarus, and Lithuania) and in the Middle East (Qatar and Jordan) and (iii) the age group 20–24 years showed the fastest 50-km ultra-marathon times. The results refute the authors' hypothesis since the fastest runners were from African countries, and the age group of the fastest 50-km ultra-marathon runners was younger than expected.

Runners from Ethiopia, Lesotho, Malawi, and Kenya are the fastest 50-km ultra-marathoners

The first finding was that runners from Ethiopia, Lesotho, Malawi, and Kenya were the fastest 50-km ultra-marathoners. Several factors contribute to the prevalence of runners from East African countries in long-running events, such as marathons and ultra-marathons. These include a genetic predisposition, adherence to a traditional diet, living and training at high altitudes, and sociocultural background28,29. It is also important to note that the country's infrastructure requires Ethiopians to walk or run daily for long periods carrying heavy school bags30,31.

For a significant period, there has been a prevailing suggestion that genetic background significantly influences sporting potential by determining the anthropometric, cardiovascular, and muscular characteristics contributing to adaptation during physical training32. This has suggested that runners from East African countries possess an inherent genetic advantage that predisposes them to superior athletic abilities32. Genetic studies conducted on elite African runners have not identified any unique genetic makeup; instead, they underscore the substantial genetic diversity among the general population and elite runners from East African countries33. Based on the available evidence, the subjects’ phenotype, shaped by various factors over time, exerts a greater influence on their success in long-distance running than their genotype34.

However, Kenyan runners have been found to exhibit a significantly higher activity of the enzyme hydroxyacyl-CoA dehydrogenase, which plays a crucial role in generating energy from lipids35. This suggests that Kenyan runners may have a more efficient ability to derive energy from lipid sources than some of their competitors35. Currently, there is no available information regarding enzymatic activity among elite Lesothan, Malawian or Ethiopian distance runners.

Larsen et al.36 examined the anthropometric characteristics of Kenyan distance runners, revealing that their legs were 5% longer compared to elite distance runners from Scandinavian countries. Additionally, the Kenyan runners had thinner and lighter calves, weighing 12% less when compared to runners from Scandinavian countries. Supporting these findings, Saltin et al.37 demonstrated that Kenyan distance runners exhibited greater metabolic efficiency, particularly at race-pace running speeds, compared to runners from Scandinavian countries. These observations suggest that the inherent ectomorphic somatotype of elite Kenyan runners may contribute to their success on the track and roads by enhancing their biomechanical and metabolic efficiency. However, it should be noted that there is a lack of evaluation regarding the biomechanical and metabolic efficiency of Lesothan and Ethiopian distance runners.

A study examining the dietary patterns of long-distance runners from Africa has revealed that they comply with most nutritional guidelines for endurance runners38. The traditional Ethiopian diet consists of 13% protein, 23% fat, and 64% carbohydrates38. The traditional Kenyan diet consists of 10% protein, 13% fat, and 77% carbohydrate39. The national dish of Lesotho is a fermented sorghum porridge40. Some staple foods include cornmeal porridge covered with a sauce consisting of vegetables40. The carbohydrate portion of the diet primarily consists of vegetables, fruits, rice, and unrefined sugar40. Malawi’s culinary culture revolves around integral ingredients such as sugar, corn, potatoes, sorghum, and fish, including the staple food Nsima made from ground corn41. People from African countries have consumed these low-fat, high-carbohydrate diets for centuries, and their composition is consistent with research-based recommendations for endurance runners29. While these diets seem beneficial for training and excelling in middle- and long-distance running competitions, they do not appear to differ in any unique way from the training diets of runners from other continents29. As a result, factors beyond food play a significant role in determining athletic superiority, as these diets are unlikely to provide a distinctive competitive advantage. It is important to highlight that a high-carbohydrate diet maintains muscle glycogen but has a negative effect on high-intensity exercise performance42. However, this may not be an issue at the lower intensities typical of long-distance running.

Certain factors, such as total hemoglobin mass, may be influenced by the environment where elite runners from the Kalenjin people in Kenya and the Arsi people in Ethiopia live and train29. The Kalenjin and Arsi people have a long history of residing at higher altitudes ranging from 2000 to 2500 m29. In particular, Ethiopian elite runners originate from high-altitude areas exceeding 4000 m, with approximately 80% of the population residing at or above 2000 m43. Malawi’s central plateaus, reaching 760 to 1370 m, cover approximately three-fourths of the entire land area44. Lesotho is the sole sovereign nation on Earth that exists entirely at an elevation surpassing 1000 m45. Consequently, its lowest point, reaching a remarkable altitude of 1400 m, is the world's highest among all countries. More than 80% of Lesotho’s landmass resides at elevations exceeding 1800 m46. The environmental context of living and training at higher altitudes could potentially contribute to developing specific physiological characteristics, including total hemoglobin mass29.

For many people of Ethiopia, Lesotho, Malawi and Kenya, running is a routine aspect of daily life, often utilized for transportation or as part of household chores, and children frequently start running at a young age as their main method of travel to school43,47. A theory suggests that long-distance runners may achieve a higher maximal oxygen uptake (\(\dot{V}\mathrm{O}_{2\max}\)) due to their early exposure to extensive walking and running29. Again, this could explain their exceptional endurance-running performance in later years.

African countries, most prominently Ethiopia and Kenya, have a strong running tradition, and many experienced coaches and trainers work with young runners to develop their skills and talents28,48,49. These countries have a well-established infrastructure for running, with numerous running camps and facilities that support the development of elite runners48,49,50,51. It is important to remember that extraordinary athletic achievements among specific populations undoubtedly result from the successful combination of numerous factors.

Considering the marathon, which is ~8 km shorter than a 50-km race, there has been a consensus, with the exception of one study52, that Kenyans and Ethiopians are the fastest runners. The exception was an analysis of the fastest marathon runners listed by "World Athletics" from 1999 to 2015, which found that Latvians and Ethiopians were the fastest women and men, respectively52. On the other hand, a study of trends in the "New York City Marathon" from 2006 to 2016, as well as separate research covering 50 years, showed that Kenyans and Ethiopians were the fastest53. This observation was confirmed in another popular American race, the Boston Marathon, in analyses of data from 1972 to 2018 and from 1897 to 201754,55. Moreover, these two East African nationalities were the fastest in the "World Marathon Majors" (Boston, Berlin, Chicago and New York) and the "Stockholm Marathon" from 2000 to 201456, and marathon races held in Switzerland from 1999 to 2014 confirmed this observation57. Thus, the findings of the present study in 50-km races agreed with those in marathon races, which might be explained by the close similarity of these two race distances. The model's predictor showed the highest output values for African countries, underlining the expected result. It might be assumed that similar physiological and psycho-social characteristics would play an important role in performance for these race distances.

The effect of the socioeconomic status of the participants should not be ignored. Because of the higher socioeconomic status and, in turn, the higher participation rate of athletes from Europe, Asia, or North America compared with athletes from African countries, their samples will necessarily be more heterogeneous and have a lower average speed, resulting in a comparatively higher average speed for athletes from African countries. Further studies could attempt a stepwise analysis to gain more insight into the prediction strength of the athletes’ country affiliation itself.

The fastest 50-km racecourses are in Europe (Luxembourg, Belarus, and Lithuania) and the Middle East (Qatar and Jordan)

Another significant finding was that the fastest mean race times were recorded in races held in Europe (Luxembourg, Belarus, and Lithuania) and in the Middle East (Qatar and Jordan). This result is highlighted by the highest output of the event country predictor.

Although Luxembourg has the fastest mean race times, it should be considered an outlier. The high mean race speed is due to the exceptionally high minimal race speed compared to the other countries. Based on the minimal race speeds, which elevate the mean race speed, we can assume that the participating runners were well above the average participant in other races. This effect should be considered a limitation of this study, since a high proportion of lower-performing runners will skew the mean downward, as seen in the example of the United States of America, which has the highest average race speed but is downgraded by the high number of slower participants. Therefore, for precise measures, events with mean race speeds should not be considered in future analysis.

All the races mentioned above share a common characteristic—they are held on flat courses with minimal elevation changes. The racecourses in Belarus (indoor)58, Lithuania (road race)59, Qatar (road race, flat trail race)60, and Jordan (road race)61 are known for their flat terrain, which greatly contributes to achieving faster race times. These races offer smoother terrain and predictable conditions, allowing runners to maintain a steady pace without hindrance from inclines or steep descents. In addition, a study has demonstrated that flat terrain race results have been affected by the new advanced shoe technology62.

In contrast, trail running races are characterized by a sequence of off-road sections that involve uphill and downhill segments, resulting in significant physiological and mechanical changes63,64. Uphill sections involve prolonged and intense concentric muscle actions, while downhill sections require eccentric actions in the lower limb muscle–tendon unit65. These muscle actions and the duration of contractions differ from those in level road running, which primarily involve repetitive and continuous stretch–shortening cycles in the lower limb extensors66. In level road running, the upward and downward movements of the center of mass are generally balanced, along with the positive and negative external work within each step67. However, during incline running, the “bouncing” mechanism gradually diminishes as speed and slope increase67. On positive slopes, the step period decreases, and the body’s downward movement is reduced, while on negative slopes, the step period increases, and the upward movement decreases67,68. Steep changes in slope also lead to noticeable alterations in ground reaction forces, including a decrease in normal impact force peaks and parallel braking force peaks, accompanied by an increase in parallel propulsive force peaks68. Consequently, the repeated variations in slope and the associated mechanical responses in trail running races are likely to influence the manner of muscular contraction and metabolic demands69. To sum up, flat terrain plays a crucial role in achieving faster race times by providing more predictable conditions that enable runners to maintain a steady pace70. This allows runners to sustain their rhythm throughout the race and optimize their energy usage.

It is important to consider that the racecourse alone does not determine the entire outcome of the race. Again, factors such as runners’ preparation, training methods, nutrition, and individual capabilities also play significant roles. A combination of favorable racecourse characteristics and various other factors contributes to the overall faster race times observed in these countries. Although environmental factors like humidity and temperature can influence performance, this study did not include them in its analysis because of the unreliable and incomplete data for the analyzed events.

The fastest 50-km ultra-marathoners are in the age group 20–24 years

An unexpected finding was that the fastest 50-km ultra-marathoners were in the age group 20–24 years. Typically, the age range when the fastest ultra-marathon race times are achieved is around 35 years or older7. The average age for first-time ultra-marathoners has remained unchanged in recent decades71. Individuals participating in an ultra-marathon were approximately 36 years old and had prior experience competing in shorter distances for approximately seven years71. Several studies have analyzed the age of the best ultra-marathon performance27,72,73,74,75, revealing that peak performance is generally achieved at an older age compared to the best performance in half-marathons and marathons76. For marathon racing, the best race time is typically achieved around the age of 3077,78, whereas in ultra-marathons, the age of best performance has generally been observed to be around 35 years or older4,7,79,80, with the age of peak ultra-marathon performance seemingly increasing as race distance increases. In particular, in 50-km ultra-marathon running, the best performance age is usually around 39–40 years7. This is plausible considering that peak performance occurs near 30 years of age81 and declines after 4082.

Furthermore, this finding might be explained by the variation in participation across age groups. A much smaller number of runners was in the age group 20–24 years than in the older age groups. This difference in participation might indicate that this age group is relatively more ‘selective’ than the older and more ‘massive’ age groups.

In summary, our finding that the fastest 50-km ultra-marathoners were in the age group 20–24 years is unexpected. The analysis of our model shows that, after the partial dependence plot (PDP) peak for the age group 20–24 years, the average race speed decreases continuously, reaching approximately −1.75 km/h for the age group 75+. This contradicts the general belief that peak performance in ultra-marathons is achieved at an older age. This suggests that younger runners may have an advantage in this race distance, and further research is needed to understand the factors contributing to this age group’s success.

Conclusion

In conclusion, this study provides valuable insights into the country of affiliation and performance of the fastest 50-km ultra-marathoners. Runners from Ethiopia, Lesotho, Malawi, and Kenya emerged as the top performers in this race format, benefiting from genetic predisposition, traditional diets, high-altitude living and training, and sociocultural background. The fastest mean race times, on the other hand, were observed in Europe (Luxembourg, Belarus, and Lithuania) and the Middle East (Qatar and Jordan), attributed to flat racecourses, well-developed infrastructures, and favorable conditions. A surprising finding was that the fastest ultra-marathoners in the 50-km distance were in the age group of 20–24 years, challenging the notion of peak performance in older age groups for ultra-marathons. Further research is needed to understand the underlying factors contributing to the success of younger runners in this specific race distance.