Introduction

The Ironman triathlon is highly popular and attracts a considerable number of participants, most of whom are age group triathletes, including recreational and master triathletes1,2,3. An Ironman triathlon is a long-distance race comprising a 2.4-mile (3.86 km) swim, a 112-mile (180.25 km) bicycle ride, and a 26.22-mile (42.195 km) run, the latter equivalent to a full marathon. The disciplines are completed in this sequence, covering a total of 140.6 miles (226.3 km) [www.Ironman.com/].

In the case of ‘Ironman Hawaii,’ which serves as the Ironman World Championships, there has been an increase in the participation of master triathletes, while the involvement of younger triathletes has declined2. Notably, over the years, master triathletes have displayed ongoing improvements in both split and overall race times, indicating that they have yet to reach their performance limits1,2,3,4.

For Ironman triathletes, it is important to know what influences their race performance4. Previous experience, such as the personal best time in an Ironman race4,5, in a marathon4,6,7, and in an Olympic distance triathlon4,5,7,8,9, has been shown to be predictive of faster Ironman overall race times. Training also plays a significant role4,10,11,12,13, with both training volume3,8,9,13 and training intensity4,5,9,12,14 showing varied predictability. Interestingly, personal best times are more predictive of performance than training volume7. Additionally, a somatotype with low body fat has emerged as a strong predictor of Ironman race performance12,15,16,17. Furthermore, originating from the USA also seemed predictive18,19, and the age of best performance is usually around 30–35 years18,20.

Several studies have aimed to identify the most predictive split discipline in an Ironman triathlon21,22,23. For professional Ironman triathletes, the bike split appears to be the most predictive split discipline22. However, when analyzing a larger dataset, cycling showed limited predictability for overall race time, while running emerged as the most predictive split discipline22. This is most likely because the fastest Ironman triathletes were also the fastest in running21. Both cycling and running probably play crucial roles in determining performance in an Ironman race23. In addition, knowing the best pacing strategies to achieve a fast Ironman race time is also essential24,25,26.

This study aimed to examine the racecourses of various Ironman races to determine which ones are the fastest. This information is particularly relevant for age group athletes aiming to qualify for the Ironman World Championship in ‘Ironman Hawaii’ and to attain a fast overall race time. At the moment, athletes have seven different options to qualify for ‘Ironman Hawaii’: (1) standard qualification, in which age group athletes compete in one of the 47 Ironman races held globally and try to earn a slot in their age group based on the fastest age group race times; (2) competing in one of the rare Ironman 70.3 events offering a slot in an age group category; (3) achieving an extra women’s slot in one of the 17 Ironman events held globally; (4) the Ironman Legacy Program, for which an athlete must have completed 12 full-distance Ironman triathlons and never have raced at the Ironman World Championship; (5) the Ironman Foundation Annual Ironman World Championship Auction; (6) the Physically Challenged Open/Exhibition Division Drawing; and (7) the Ironman Executive Challenge as a last opportunity [www.Ironman.com/news_article/show/1241591].

By identifying the fastest racecourses, athletes can make informed decisions when selecting the most suitable event to improve their chances of qualification for Ironman Hawaii. Since originating from the USA18,19 seems to be predictive of achieving a fast Ironman race time, we hypothesized that a majority of Ironman age group triathletes would come from the USA and preferentially compete in Ironman races held within the USA. If most Ironman age group triathletes originate from the USA, we might also assume that the fastest Ironman age group triathletes are from the USA.

Methods

Data set and data preparation

All athlete data was downloaded from the official Ironman website (www.Ironman.com) using a Python script (www.python.org/). The triathletes’ gender, age, country of origin, event location and year, and times for swimming, running, cycling, and transitioning were thus obtained. We also considered environmental characteristics such as water temperature for swimming and air temperature for both cycling and running. Furthermore, characteristics of the swim course (e.g., swimming in a bay, lake, ocean, reservoir, or river) and both the cycling and running courses (e.g., flat, hilly, or rolling) were included. We considered recreational finishers’ race data of all age groups competing between 2002 and 2022 in all Ironman races recorded on the Ironman website (www.Ironman.com). From an original dataset of 684,656, including professional and age group Ironman triathletes’ records, the age group sub-sample consisted of 677,702 records after all the required data processing. We defined each successful finish of an athlete as a race record.

Statistical analysis

Summary tables and boxplot charts (by age group, event location, racecourse type, etc.) were computed by aggregating records and calculating the statistical parameters of the resulting groups. Descriptive statistics are presented using mean, standard deviation, frequencies, percentages, and min/max values. A two-way ANOVA was used to analyze the differences between countries, locations, and sexes. ML (machine learning) regression models were built, trained, tested, and compared using four different algorithms: Random Forest Regressor, XGBoost Regressor, CatBoost Regressor, and Decision Tree Regressor. These algorithms are popular examples of ML tree-based models. Although mostly applicable to tabular data, they are also widely used in image-related tasks. The Decision Tree is the most basic of the four algorithms. It implements a single large tree, splitting the data into smaller sets by placing conditions on the predicting variables, always seeking to minimize the heterogeneity (impurity) of the target within each resulting set. The Random Forest implements several independent decision trees and then averages their outputs to give a final result that is often more accurate than any of the individual trees. CatBoost and XGBoost are similar to the Random Forest in that they build several trees. However, they use the gradient boosting technique during model training, where each new tree uses the knowledge from earlier ones. In all cases, the models’ target (dependent or predicted variable) is the race Finish Time. The following (independent) variables are used as predictors:

  • numerical variables:

    • Gender

    • AgeGroup

    • Country

    • EventLocation

    • Water temperature (°C)

    • Air temperature (°C)

  • categorical variables:

    • Swim (‘bay’, ‘lake’, ‘ocean’, ‘reservoir’, ‘river’)

    • Bike (‘flat’, ‘hilly’, ‘rolling’)

    • Run (‘flat’, ‘hilly’, ‘rolling’)
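The difference between a single tree and an averaged ensemble of trees can be illustrated with a deliberately small sketch. It is not the implementation used in this study: it assumes one numerical predictor, trees of depth one ("stumps"), a squared-error splitting criterion (a common choice for regression trees, assumed here), and bootstrap resampling for the ensemble. All function names and the toy data are hypothetical.

```python
import random
from statistics import fmean


def sse(values):
    """Sum of squared deviations from the mean (0 for an empty list)."""
    if not values:
        return 0.0
    m = fmean(values)
    return sum((v - m) ** 2 for v in values)


def fit_stump(xs, ys):
    """Fit a depth-one regression tree on a single numerical predictor.

    Returns (threshold, mean_left, mean_right), choosing the threshold
    that minimizes the summed squared error of the two resulting sets.
    """
    best = None
    for t in sorted(set(xs))[:-1]:  # the largest value cannot split the data
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        score = sse(left) + sse(right)
        if best is None or score < best[0]:
            best = (score, t, fmean(left), fmean(right))
    if best is None:  # all predictor values identical: predict the overall mean
        m = fmean(ys)
        return (float("inf"), m, m)
    return best[1:]


def predict_stump(stump, x):
    threshold, mean_left, mean_right = stump
    return mean_left if x <= threshold else mean_right


def fit_forest(xs, ys, n_trees=25, seed=0):
    """Fit several stumps, each on a bootstrap resample of the data."""
    rng = random.Random(seed)
    n = len(xs)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]
        forest.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return forest


def predict_forest(forest, x):
    """Average the predictions of all trees in the ensemble."""
    return fmean([predict_stump(stump, x) for stump in forest])


# Hypothetical toy data: encoded age group vs. finish time in hours.
age_codes = [25, 30, 35, 40, 45, 50, 55, 60]
finish_h = [12.1, 11.9, 12.0, 12.4, 12.9, 13.3, 13.9, 14.5]
single_tree = fit_stump(age_codes, finish_h)
ensemble = fit_forest(age_codes, finish_h)
print(predict_stump(single_tree, 50), predict_forest(ensemble, 50))
```

Gradient boosting differs from this sketch in that each new tree is fitted to the errors left by the trees before it, rather than independently on a resample.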

Some variables were originally categorical but were encoded as numerical: Gender was encoded as 0 = women, 1 = men, and the age groups were encoded by taking the first two digits of each age group (so “18–24” becomes 18, “25–29” becomes 25, etc.). To encode the country variable, the full set of race records was aggregated by country, and the resulting set was sorted by number of records (frequency) in descending order. The index of the resulting list (starting at zero) was then used to numerically encode the country names. For instance, the USA, at the top of the list, has ID 0. An equivalent process was followed to encode the event location predictor. The fastest racecourses are defined as those race locations where the average finish time is the lowest. Similarly, the fastest countries are those with the best (lowest) average race times.

The two most basic models (the single Decision Tree and the Random Forest) use only the numerical predictors, while CatBoost and XGBoost use all numerical and categorical variables. We used the full dataset to train and evaluate the models (in-sample training), which, while giving no indication of the models’ generalization capabilities, allows us to obtain the maximum knowledge from the data available. We calculated the mean absolute error (MAE) and the coefficient of determination (R2) as the accuracy metrics. The MAE is the mean of the absolute values of the individual prediction errors over all evaluated instances; higher values mean larger prediction errors. R2 is a measure of the model’s “goodness of fit”. The relative importance of each feature was also calculated, giving a score for each feature in a specific model. Higher values correspond to features that are of higher importance in predicting the target variable.

Due to their mathematical complexity, ML models are often used as black boxes whose suitability can only be assessed by the accuracy of the results with existing and new samples. To interpret the models’ logic and draw conclusions on the impact of each predictor on the target variable, we used model explainability tools such as the SHAP library. The aggregated SHAP values indicate which predictor is most important for a given model and how each predictor influences the model output. A two-way ANOVA was used to analyze the differences between women and men regarding race course characteristics. All data processing and analysis were done using Python (www.python.org/) and related libraries in a Google Colab notebook (https://colab.research.google.com/).
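The encoding of gender, age group, and country described above can be sketched as follows. This is a minimal illustration and not the original script: the function names, the record values, and the alphabetical tie-breaking rule for countries with equal record counts are assumptions.

```python
from collections import Counter


def frequency_rank_encoding(values):
    """Map each category to its rank by descending number of records.

    The most frequent category receives ID 0. Ties are broken
    alphabetically (an assumption made here for determinism).
    """
    counts = Counter(values)
    ordered = sorted(counts, key=lambda name: (-counts[name], name))
    return {name: rank for rank, name in enumerate(ordered)}


def encode_age_group(label):
    """Keep the first two digits of the age group label, e.g. '25–29' -> 25."""
    return int(label[:2])


def encode_gender(label):
    """Encode gender as 0 = women, 1 = men."""
    return {"women": 0, "men": 1}[label]


# Hypothetical race records: (country, gender, age group).
records = [
    ("USA", "men", "30–34"),
    ("USA", "women", "25–29"),
    ("USA", "men", "40–44"),
    ("United Kingdom", "men", "35–39"),
    ("United Kingdom", "women", "30–34"),
    ("Canada", "men", "18–24"),
]

country_ids = frequency_rank_encoding(country for country, _, _ in records)
encoded = [
    (country_ids[country], encode_gender(gender), encode_age_group(age_group))
    for country, gender, age_group in records
]
print(country_ids)
print(encoded)
```

The same `frequency_rank_encoding` helper would apply to the event location predictor. Note that the resulting IDs reflect participation counts only, so a tree-based model treats them as an ordering by frequency, not by performance.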

Results

A total of 677,702 Ironman age group finishers’ records (544,963 from men and 132,739 from women) from 444 different Ironman events across 66 locations between 2002 and 2022 were analyzed. Men achieved an average race time of 12.64 ± 1.78 h, and women an average race time of 13.53 ± 1.69 h (Fig. 1).

Fig. 1 Distributions of overall race times of women and men Ironman age group triathletes.

Table 1 summarizes the number of athletes by age group with their overall race times (mean, SD, min, and max). The variables age group, gender, and their combined effect have a statistically significant effect on overall race time (p < 0.001). The fastest men were in age group 30–34 years, the fastest women in age group 25–29 years.

Table 1 Number of athletes by age group with their race times (mean, SD, min, and max) in hours.
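A summary of the kind shown in Table 1 can be produced by grouping race records and computing the statistics per group. The sketch below is illustrative only: the field names (`age_group`, `finish_time_h`), the sample records, and the use of the sample standard deviation are assumptions, not details taken from the original analysis.

```python
from collections import defaultdict
from statistics import mean, stdev


def summarize_by_group(records, key, value="finish_time_h"):
    """Return count, mean, SD, min, and max of `value` for each level of `key`."""
    groups = defaultdict(list)
    for record in records:
        groups[record[key]].append(record[value])
    summary = {}
    for name, times in sorted(groups.items()):
        summary[name] = {
            "n": len(times),
            "mean": mean(times),
            # Sample SD needs at least two values; report 0.0 otherwise.
            "sd": stdev(times) if len(times) > 1 else 0.0,
            "min": min(times),
            "max": max(times),
        }
    return summary


# Hypothetical records with finish times in hours.
records = [
    {"age_group": "25–29", "finish_time_h": 11.5},
    {"age_group": "30–34", "finish_time_h": 10.0},
    {"age_group": "30–34", "finish_time_h": 12.0},
]

for group, stats in summarize_by_group(records, key="age_group").items():
    print(group, stats)
```

Passing a different `key` (for example a hypothetical `event_location` or `country` field) would give the corresponding summaries by location or by country.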

Most of the successful age group triathletes originated from the USA (274,553), followed by triathletes from the United Kingdom (55,410) and Canada (38,264) (Fig. 2).

Fig. 2 Top 25 countries by number of race records, and average race time by country.

Most of the triathletes competed in Ironman Wisconsin (38,545), followed by Ironman Florida (38,157) and Ironman Lake Placid (34,341) (Fig. 3).

Fig. 3 Top 25 Ironman event locations by number of race records, and average race time (in hours) by location.

Table 2 gives an overview of the top 25 Ironman event locations by participation (number of unique records), including the number of races, the number of recorded race times, the number of athletes, the overall race times (mean, SD, min, max), the descriptions of the courses for swimming, cycling, and running, and the water and air temperatures. The fastest overall race times were achieved in Ironman Copenhagen (11.68 ± 1.38 h), followed by Ironman Hawaii (11.72 ± 1.86 h), Ironman Barcelona (11.78 ± 1.43 h), Ironman Florianópolis (11.80 ± 1.52 h), Ironman Frankfurt (12.03 ± 1.38 h), and Ironman Kalmar (12.08 ± 1.47 h), to list the top six races. N-way ANOVA for country and event location indicated statistically significant differences (p < 0.001). Table 3 gives an overview of the top 25 countries by participation (number of unique records), including the number of races, the number of recorded race times, the number of athletes, and the overall race times (mean, SD, min, max). The fastest athletes originated from Belgium (11.48 ± 1.47 h), followed by athletes from Denmark (11.59 ± 1.40 h), Switzerland (11.62 ± 1.49 h), Austria (11.68 ± 1.50 h), Finland (11.68 ± 1.40 h), and Germany (11.74 ± 1.51 h), to name the best six nations. N-way ANOVA for country and event location indicated statistically significant differences (p < 0.001).

Table 2 Overview of the top 25 Ironman event locations by participation (number of unique records).
Table 3 Overview of the top 25 countries by participation (number of records).

With respect to the ML models, Table 4 summarizes the evaluation results; all four models performed very similarly.

Table 4 Summary of ML models set up and performance.
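For reference, the two metrics used to compare the models, MAE and R2, follow directly from their standard definitions and can be computed as in the sketch below. The function names and the example values are hypothetical and do not reproduce any figure from Table 4.

```python
def mean_absolute_error(y_true, y_pred):
    """Mean of the absolute prediction errors."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


# Hypothetical observed and predicted finish times in hours.
observed = [10.0, 12.0, 14.0]
predicted = [11.0, 12.0, 13.0]

print(mean_absolute_error(observed, predicted))
print(r_squared(observed, predicted))
```

Because the models were evaluated on the same records they were trained on, values obtained this way describe fit to the available data and not performance on unseen athletes.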

Figure 4 shows the feature relative importance for the first predictive model (Random Forest Regressor). The athlete’s country was the most important variable (0.46), followed by the athlete’s age group (0.18), event location (0.13), gender (0.09), air temperature (0.08), and water temperature (0.06).

Fig. 4 Relative feature importances of the Random Forest Regressor.

Figure 5 shows the features’ relative importance for the second predictive model (XGBoost Regressor). Gender (0.18) was the most important variable, followed by the split discipline running (0.16), the split discipline cycling (0.14), athlete’s country (0.13), athlete’s age group (0.08), air temperature (0.08), the split discipline swimming (0.08), event location (0.07), and water temperature (0.07).

Fig. 5 Relative feature importances of the XGBoost Regressor.

Figure 6 shows the features’ relative importance for the third predictive model (CatBoost Regressor). The athlete’s country (0.38) was the most predictive variable, followed by the athlete's age group (0.15), event location (0.11), gender (0.09), air temperature (0.09), water temperature (0.06), and split disciplines cycling (0.05), swimming (0.04), and running (0.04).

Fig. 6 Relative feature importances of the CatBoost Regressor.

Figure 7 shows the features’ relative importance for the fourth and last model (Decision Tree Regressor). The athlete’s country (0.47) was the most predictive variable, followed by the athlete’s age group (0.18), event location (0.12), gender (0.10), air temperature (0.09), and water temperature (0.05).

Fig. 7 Relative feature importances of the Decision Tree Regressor.

Figure 8 shows the swim times by type of swim course and by gender. Differences between genders and between course types were statistically significant.

Fig. 8 Swimming times by the type of swimming course.

Figure 9 shows the cycling times by type of cycling course and by gender. Differences between genders and between course types were statistically significant.

Fig. 9 Cycling times by the type of cycling course.

Figure 10 shows the running times by type of running course and by gender. Differences between genders and between course types were statistically significant.

Fig. 10 Running times by the type of running course.

Discussion

This study aimed to investigate the locations of the fastest Ironman racecourses globally, focusing on age group triathletes participating in all Ironman races. The hypothesis was that the USA would host the fastest Ironman racecourses, given the considerable number of Ironman age group triathletes from the USA who actively compete in races held within the USA. The most important findings were that (i) a majority of successful Ironman age group triathletes originated from the USA, followed by athletes from the United Kingdom and Canada, although these countries exhibited average overall race times that were significantly slower than those of the fastest countries, (ii) most of the age group triathletes completed Ironman races held in the USA, such as Ironman Wisconsin, Ironman Florida, and Ironman Lake Placid, (iii) the fastest Ironman race times were achieved by athletes aged 35 years or younger (fastest men in age group 30–34 years, fastest women in age group 25–29 years), (iv) the fastest age group Ironman triathletes originated from Belgium, Denmark, Switzerland, Austria, Finland, and Germany, and (v) the fastest overall race times were recorded in the Ironman races held in Copenhagen, Hawaii, Barcelona, Florianópolis, Frankfurt, and Kalmar. Further important findings were that three of the four predictive models identified the country of origin and the age group of the athletes as the most important predictors. Environmental conditions such as weather (i.e., water and air temperatures) and course characteristics showed the lowest influence on performance compared to the other variables. Flat cycling and flat running courses were associated with faster overall race times.

Most successful Ironman finishers originated from the USA

Our hypothesis was confirmed by the first important finding, which indicated that most of the successful Ironman age group triathletes originated from the USA, followed by athletes from the United Kingdom and from Canada. Moreover, our hypothesis was further confirmed as most age group triathletes completed Ironman races held in the USA. These outcomes can be explained by the distribution of Ironman race locations, with the five races with the most finishers being situated in the USA (Ironman Wisconsin, Ironman Florida, Ironman Lake Placid, Ironman Arizona, and Ironman Hawaii), followed by the Ironman races held in Austria and France. A study investigating the origin and age group of the fastest Ironman age group triathletes competing in Ironman Hawaii between 2003 and 2019 showed that North American athletes were the best performing and the most frequent participants in Ironman Hawaii27.

The annual schedule of the Ironman circuit includes numerous races (www.Ironman.com/races). In the North American region for 2023 (www.Ironman.com/im-north-america), one Ironman race is scheduled in Mexico, two in Canada, and 12 in the USA. In contrast, the European race calendar for 2023 (www.Ironman.com/im-europe) lists a larger number, with 20 planned Ironman races. While one might assume that the popularity of the Ironman triathlon is higher in North America, given its origin as a US–American invention and the larger number of athletes competing there, it is noteworthy that fewer Ironman races are offered in North America than in Europe (www.Ironman.com/races). Hence, the geographical location of an Ironman race within a country or a continent may significantly influence participation26,27,28. A study analyzing the participation and performance trends in Ironman Switzerland from 1995 to 2011 showed that 90% of the triathletes originated from Europe, with 31.9% from Switzerland and 18.9% from Germany28.

Additionally, the importance of a race might also influence the participation of the athletes. In the Powerman Duathlon World Championship held in Switzerland from 2002 to 2011, most of the finishers were from Switzerland, followed by participants from other European countries (i.e., Germany, France, Italy, Belgium, Spain, Great Britain, the Netherlands, and Denmark)29. An analysis of Ironman Hawaii as the Ironman World Championship and of its qualifying races showed that American triathletes dominated both participation and performance in Ironman Hawaii and in its qualifiers30. An analysis of 39,706 finishers from 124 countries who competed in Ironman Hawaii between 1985 and 2012 showed that most finishers originated from the USA, followed by triathletes from Germany, Japan, Australia, Canada, Switzerland, France, Great Britain, New Zealand, and Austria31. Although fewer races are offered in North America than in Europe, more US–Americans compete in Ironman races. Future studies should explore potential explanations for this difference by comparing the motivations of North American and European Ironman triathletes.

The fastest Ironman race courses are in Europe

A further important finding was that the fastest Ironman race times were recorded in the Ironman races held in Copenhagen, Hawaii, Barcelona, Florianópolis, Frankfurt, and Kalmar. One might expect that the fastest race times would be achieved in US–American races since most athletes were from the USA. However, although the USA had the largest group of Ironman age group triathletes, this group was not the fastest. The second-largest group (United Kingdom) and the third-largest group (Canada) were also not among the fastest, which was unexpected given their participation rates. The observation that triathletes originating from the USA were not the fastest is based on the average race time. This should be considered a limitation of this study since a high proportion of slower athletes results in a slower average race time. Ironman Hawaii is the only race among the six fastest that is located in the USA. This is simply explained by the fact that Ironman Hawaii is the Ironman World Championship.
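The effect of averaging noted above can be made concrete with a small numerical sketch. The two groups and all finish times below are invented for illustration and do not correspond to any country in the dataset; `fastest_n_mean` is a hypothetical helper.

```python
from statistics import mean


def fastest_n_mean(times, n):
    """Mean finish time of the n fastest athletes in a group."""
    return mean(sorted(times)[:n])


# Hypothetical finish times in hours. Group A has many participants,
# including a large share of slower finishers; group B is small.
group_a = [9.5, 10.0, 10.5, 11.0, 13.0, 14.0, 15.0, 15.5, 16.0, 16.5]
group_b = [9.5, 10.0, 10.5, 11.0, 11.5]

print(mean(group_a), mean(group_b))          # overall averages
print(fastest_n_mean(group_a, 4), fastest_n_mean(group_b, 4))  # top four only
```

In this constructed case the four fastest athletes of both groups are identical, yet the larger group has the slower overall average. Comparing groups on a fixed number or share of their fastest finishers is one way to separate depth of participation from top-end performance.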

The fastest Ironman triathletes are from Europe

We found that the fastest Ironman age group triathletes were from Europe, especially from Belgium, Denmark, Switzerland, Austria, Finland, and Germany. A study of 302,535 Ironman triathletes competing between 2002 and 2015 in 253 different Ironman races explored the impact of nationality on pacing30. The findings showed that Germans (both women and men) had the fastest performance, closely followed by Australian, Austrian, and Brazilian triathletes. US–Americans did not rank among the fastest nationalities30.

The dominance of a particular country in a specific sports discipline can be attributed to a combination of environmental and individual factors31. However, the available evidence mainly pertains to long-distance running events, where factors such as altitude, lifestyle, natural surroundings, historical background, and genetic characteristics have been linked to the emergence of athletes32,33. As has been done for running events, future studies in triathlon should explore the environmental factors associated with athletes’ place of origin, including social aspects, training culture, socio-economic aspects, and political support. These studies should also consider a finer level of analysis, such as athletes’ cities of origin.

The best Ironman race times were achieved at the age of 35 years or younger

A further important finding was that the fastest Ironman race times were achieved by triathletes aged 35 years or younger. Previous studies have shown that the peak performance age for the Ironman triathlon is around 32–33 years for both women and men34. However, over the years, the age of peak performance in elite women and men triathletes has shifted to approximately 34–35 years7,35,36,37. The variations observed between studies can be attributed to differences in the time periods analyzed and the specific age groups considered. Considering the growing popularity of Ironman events among non-professional athletes and the increasing commitment of participants38, individuals aiming to qualify for ‘Ironman Hawaii’ should consider getting involved in the sport at a younger age18.

The present study's findings align with an analysis of Olympic track and field data, which demonstrated that the age of peak performance increases with the distance of the foot race, with women generally achieving peak performance at younger ages39. An analysis of the World Championships or Olympics triathlon between 2008 and 2012 found that the age of peak total performance was at ~ 28 years40. Considering the mode of exercise and duration of endurance events, the age of peak performance increases with the event duration, ranging from ~ 20 years (swimming, ~ 2–15 min) to ~ 39 years (ultra-distance cycling, ~ 27–29 h) with minimal difference between men and women41.

The aspect of environmental characteristics

The algorithms showed that the origin of the athlete was the most predictive variable whereas environmental characteristics showed the lowest influence on overall race time. Little is known regarding the influence of environmental conditions such as ambient temperature in Ironman triathlon. In Ironman Hawaii, body core temperature increased during the race and correlated negatively with the position in the age group42. We also found that flat cycling and flat running courses were associated with faster overall race times. A case study investigated the pacing strategy of a female winner regarding elevation changes. The authors found that velocity varied with changes in elevation, but the athlete minimized fluctuations in heart rate and watts43. Furthermore, Ironman triathletes maintaining faster relative speed in the downhill segments were more successful regarding their estimated final race time44. Future studies need to investigate more deeply the influence of environmental conditions on Ironman race performance.

Limitations, strengths, practical applications, and implications for future research

A limitation of this study is that it focuses specifically on the Ironman triathlon. Therefore, caution should be exercised when generalizing the findings to other triathlon formats, such as the sprint triathlon, the Olympic distance triathlon, and the Half-Ironman (Ironman 70.3), as these formats have shorter durations. Another limitation is the use of averages, as seen in the results for the origin of the fastest age group athletes. As seen for the athletes originating from the USA, a high proportion of slower athletes results in a slower average race time. The R2 of the best model shows that our models are limited. For future studies, more variables (e.g., altitude, humidity, experience of the athletes, training regimes) should be included. Unmeasured confounding variables might also have an influence on race times. The ML models used are limited in terms of their interpretability and potential biases. A further limitation is that neither cross-validation nor testing on a separate validation set was performed; doing so would help to assess the generalizability of the results.

A strength of the study is its innovative data analysis approach, which incorporates various machine learning regression models. A further strength is the inclusion of environmental characteristics, which showed that these variables have no major influence on overall race performance.

These findings provide valuable insights to age group triathletes and coaches. They highlight the importance of race location, suggesting that participating in Ironman races held in Europe may offer better performance opportunities and, by extension, a better chance to qualify for the Ironman World Championship. For governmental bodies involved in sports and event planning, these findings indicate the significance of race location and its impact on participation and performance trends. Government agencies that promote sports and organize events can use this information to make informed decisions about hosting Ironman races. Researchers and academics can use these findings to further investigate the factors influencing performance in Ironman triathlons. They can examine in more depth the environmental and individual characteristics that contribute to the success of athletes from specific countries. Additionally, researchers can explore motivations and psychological factors that differ between North American and European triathletes. Finally, future studies need to investigate more deeply the influence of environmental conditions on Ironman race performance.

Conclusions

The origin of the athlete was the most predictive variable, whereas environmental characteristics showed the lowest influence. Flat cycling and flat running courses were associated with faster overall race times. Most successful Ironman age group triathletes originated from the USA, the United Kingdom, and Canada, but the fastest athletes originated from European countries such as Belgium, Denmark, Switzerland, Austria, Finland, and Germany. The fastest overall race times were achieved in Ironman Copenhagen, Ironman Hawaii, Ironman Barcelona, Ironman Florianópolis, Ironman Frankfurt, and Ironman Kalmar. For any Ironman age group triathlete aiming to achieve a fast Ironman race time and to qualify for Ironman Hawaii, it is advisable to consider participating in an Ironman race held in Europe, preferably before reaching the age of 35 years.