Introduction

It is important to be able to estimate one’s own cognitive abilities accurately. If we underestimate our capacity for a task, we may preclude ourselves from beneficial outcomes. On the other hand, overestimating our abilities can be dangerous if we are then unable to cope with the required challenges. One such challenge all humans face is spatial navigation. There is evidence that self-reported navigation ability is related to objective measures of this ability1,2,3,4. A number of studies have shown this relationship using standardised measures of self-reported spatial abilities, such as Santa Barbara Sense of Direction Scale5 or Wayfinding Self-Efficacy Questionnaire6. The relation holds both on an individual level and on a group level: men tend to perform better on navigation tasks7,8,9,10, and, as a group, also to self-rate themselves as better navigators11. Beyond the topic of navigation, previous studies have found that positive self-estimates of ability are associated with better outcomes in tasks such as motor learning12, 13, game-based learning environments14, science assessments15, and overall academic success16. These self-estimates are often viewed from the standpoint of self-efficacy: how estimates go on to predict outcomes17.

While self-estimates are related to navigation ability, the relationship between the two is complicated by additional evidence that some groups of people may over- or under-estimate their navigation performance. These findings do not deny the well-established relation between self-reports and navigation ability. Rather, they indicate that factors other than navigation ability itself must influence self-reports of navigation ability. Given the dynamic and multi-dimensional nature of the relationship between self-reports and navigation ability, it is likely that between-subject factors (e.g. gender, education, age) and cross-cultural differences (e.g. gender stereotypes, country-specific education, social policies, etc.) affect both self-estimates and performance in wayfinding tasks18, 19. This complexity of intersecting factors has meant that prior efforts to explore this relationship have necessarily been limited in scale. Only recently have large-scale studies appeared (e.g. based on the Sea Hero Quest mobile game application18, 19) that enable more comprehensive, population-level assessments of navigation and the factors that influence it.

Cross-cultural effects are likely to play a role in the efficacy of self-estimates. For instance, in some cultures, it may be seen as a positive to present oneself as competent at tasks, while in others it may be important to be self-effacing8, 11, 19,20,21,22,23,24,25,26,27. Existing research also indicates that gender and age are important factors influencing the reliability of self-estimates. Previous studies have reported that older adults and men tend to overestimate their navigation abilities22, 28,29,30. This overestimation reported by older adults is often associated with a possible lack of awareness of their cognitive decline, which becomes especially noticeable in laboratory-based settings and in navigation tasks implemented in unfamiliar places or using modern virtual reality (VR) software31,32,33. Other studies have found that women tend to rate their spatial skills lower than men do, even when there is no objective difference in performance29, 34,35,36,37,38. To explain this gender disparity, Cross et al.35 attributed women’s lower spatial self-evaluation to their higher susceptibility to social influences and their greater dependency on social information. As spatial skills have traditionally been regarded as more masculine qualities39, women are more likely to conform to this gender-stereotyped expectation of their own abilities and, consequently, tend to respond more modestly to self-estimate questions in spatial, navigation or wayfinding tasks. The awareness of these negative stereotypes of women’s abilities may in turn influence women’s performance on these tasks. However, Nori and Piccardi34 reported that in some cases in which the overall population of individuals claims high levels of self-efficacy on a typically masculine task, gender differences disappear if women share the general belief of high competency on the specific task (e.g. in studies with Italian Air Force pilots40, 41).
This positive self-perception may be caused by more assumed ‘masculine cognitive characteristics’ (related to analytical, rational and mathematical thinking42) among women in such populations, which appear to be predictive of their better overall wayfinding ability43.

We hypothesise that the impact of cultural norms on self-estimates in women also applies to larger cross-national samples where self-estimates are likely to be affected by country-specific stereotypic gender beliefs, societal norms and cultural variation. In support of this claim, Oettingen44 argued that common value systems to which individuals living in different countries or geographical regions are exposed during their lifetime can contribute to their self-evaluation on a particular task. When characterising the role of these correlates, she pointed to six cultural dimensions (power distance, individualism, masculinity, uncertainty avoidance, long-term orientation and indulgence) formulated by Geert Hofstede45, 46 as the theoretical framework for understanding between- and within-countries variations in self-reported abilities and their effects on task performance. Oettingen’s theory has been applied in studies explaining the effects of selected cultural determinants of self-beliefs such as individualism vs. collectivism dimension in various fields e.g. education and business47, 48, and recently cultural factors were found to be a significant predictor of the academic attributional style in a study of British and Turkish students49, maths self-evaluation, anxiety and performance on maths tests across 41 countries50, maths competence across 34 countries51, intelligence and differences in self-estimates between US and Nicaraguan children and adolescents52, as well as academic achievement, learning styles, and teacher and student self-beliefs53, 54. Most of the referenced studies agree that self-estimates are predictive of performance across different countries, but that cultural dimensions and various other socio-demographic factors may additionally influence how self-evaluation influences individual or group behaviours on different types of tasks.

Cross-cultural and between-countries analyses in the field of human navigation are rather scarce55. The ones that exist often suffer from small sample sizes and limited geographical coverage. And while there are some studies that investigate general spatial abilities and cognition, these do not include route and wayfinding tasks. Because of the limitations in existing studies, the question remains of how much of the differences in self-reports of navigation ability are due to actual ability and how much they are due to other factors such as age, gender and cross-cultural differences. This question is difficult to answer because it is challenging to test many people across diverse cultures using traditional methods.

Here, we overcome this challenge by testing over 4.3 million individuals with our navigation task embedded in the videogame app Sea Hero Quest. Using Sea Hero Quest, we are able to explore the impact of quantified cultural dimensions on the gap between wayfinding performance and self-estimated navigation ability across 46 countries.

Results

Wayfinding performance was measured using the Sea Hero Quest mobile app19. This test of navigation is embedded in a video game in which players steer a virtual boat through nautical environments seeking mystical sea creatures (see Fig. 1). Position was tracked during navigation and performance was calculated from the distance travelled (see the “Methods” section for details). Self-estimates were collected along with other demographics in the app via a question that asked players whether they judged their ability to navigate to be very good, good, bad, or very bad.

Figure 1

Navigation task Sea Hero Quest. (A) View during active navigation (level 11). (B) Map viewed before active navigation of level 11, indicating start position (triangle) and checkpoints (numbered circles).

Age, gender, self-estimates and wayfinding performance

Following the data processing and sampling operations described in the “Methods” section, the resulting sample size used to explore the relationships between age, gender, self-estimates of navigating skills and wayfinding performance included 383,187 individuals (from 46 countries) who completed six wayfinding levels of the Sea Hero Quest game: 3, 6, 7, 8, 11 and 12 (NFemales = 172,969, MeanAgeFemales = 37.08 ± 14.53, NMales = 210,218, MeanAgeMales = 37.14 ± 13.39; for more details, see Supplementary Information, Appendix A, Table S1).

The ordinal logistic regression analysis (with self-estimate responses as the dependent variable; gender and age bands as the independent variables) revealed that males were significantly more likely to report good navigating skills than females (LR \({\chi }_{1}^{2}\) = 21,895.2, p < 0.001). Based on the model (Supplementary Information, Appendix A, Table S2), men were almost twice as likely as women to rate themselves as very good navigators and approximately half as likely as women to rate their navigating skills as either very bad or bad across all age bands (Fig. 2A). Age was also found to be a statistically significant positive predictor of navigating self-estimates (LR \({\chi }_{4}^{2}\) = 1299.5, p < 0.001). The best self-reported navigation ability in the sample was reported by 40–59-year-old males (Fig. 2B). However, older males (between 60 and 70 years old) rated their navigating skills more favourably than the youngest men in the study (19–29-year-olds), despite their much poorer actual wayfinding performance when compared to all younger age bands of the male sample (Fig. 2C). This finding supports the results of earlier studies22, 28, which found similar overestimation of navigating abilities amongst the oldest male participants. With our larger sample we can now reveal that at each age band there is a preserved ordering of self-ratings in relation to performance, e.g. people who self-rate as very good are the best navigators (Fig. 2C). This provides evidence that when people self-rate they may do so in relation to their peers, both with regard to people of the same age and people of the same gender.
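To illustrate how an ordinal (proportional-odds) logistic model turns a gender coefficient into probabilities for the four response categories, the following minimal sketch may help. It is not the analysis code used in the study, and the thresholds and the coefficient are hypothetical values chosen only for illustration; they are not the fitted estimates reported in Table S2.

```python
from math import exp


def logistic(z):
    """Standard logistic function."""
    return 1.0 / (1.0 + exp(-z))


def category_probabilities(thresholds, linear_predictor):
    """Probabilities of each ordered category under a proportional-odds model.

    P(Y <= k) = logistic(threshold_k - linear_predictor); each category
    probability is the difference of successive cumulative probabilities.
    `thresholds` must be strictly increasing.
    """
    cumulative = [logistic(t - linear_predictor) for t in thresholds] + [1.0]
    probabilities = []
    previous = 0.0
    for value in cumulative:
        probabilities.append(value - previous)
        previous = value
    return probabilities


# Hypothetical values for illustration only (NOT estimates from the paper).
CATEGORIES = ["very bad", "bad", "good", "very good"]
THRESHOLDS = [-3.0, -1.0, 1.5]
BETA_MALE = 0.6  # assumed positive shift for male respondents

female = category_probabilities(THRESHOLDS, 0.0)
male = category_probabilities(THRESHOLDS, BETA_MALE)

for name, p_female, p_male in zip(CATEGORIES, female, male):
    print(f"{name:>9}: female {p_female:.3f}, male {p_male:.3f}")
```

With a positive coefficient, probability mass shifts towards the higher categories, which is the qualitative pattern described above for male respondents.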

Figure 2

Self-estimated navigation skills and wayfinding performance by gender and age. (A) Probabilities of reporting specific navigation self-estimates by females and males across five age bands represented in the sample. (B) Average self-estimates of navigation skills (with standard error bars) for males and females across five age bands. (C) Wayfinding performance (with standard error bars) of both female and male players for six SHQ wayfinding tasks (Levels 3, 6, 7, 8, 11, and 12) across five age bands.

We used a hierarchical multivariate linear regression analysis (Supplementary Information, Appendix A, Tables S3 and S4) to establish whether the reported ratings of navigation skills could predict wayfinding performance as the dependent variable. A model with self-ratings included captures more of the variance than a model without them (\({F}_{3}\) = 536.73, p < 0.001), with self-rated ability the fourth strongest predictor of wayfinding performance amongst all independent variables used in the model (\({F}_{3,370561}\) = 536.73, p < 0.001, \({\eta }^{2}\) = 0.004), behind age (\({F}_{1,370561}\) = 55,125.38, p < 0.001, \({\eta }^{2}\) = 0.123), gender (\({F}_{1,370561}\) = 19,685.08, p < 0.001, \({\eta }^{2}\) = 0.044) and home environment (\({F}_{2,370561}\) = 587.35, p < 0.001, \({\eta }^{2}\) = 0.003), but before education (\({F}_{3,370561}\) = 361.11, p < 0.001, \({\eta }^{2}\) = 0.002) and commute time (\({F}_{2,370561}\) = 61.27, p < 0.001, \({\eta }^{2}\) = 0.0003). Independent variables in both models were tested for multicollinearity with the variance inflation factor and the generalized variance inflation factor; the latter was used because some terms (e.g. categorical predictors) had more than one degree of freedom. Both metrics for each predictor were close to 1, suggesting that the results of the hierarchical regression models were not due to high correlations between each predictor and the remaining independent variables.

Cultural determinants of self-estimates and wayfinding performance

To explore how cultural differences might relate to self-ratings, we aggregated the data from the 46 countries sampled into the 11 cultural clusters derived by Ronen and Shenkar56 (Fig. 3A; also see “Methods” for details): Germanic, Latin America, Nordic, Far East, Near East, Confucian Asia, Latin Europe, Eastern Europe, Anglo, Arabic and African. The highest proportions of population with either good or very good self-estimates of their navigating skills were recorded in the Germanic, Eastern European and Latin American cultural clusters (96%, 90% and 88% of their respective samples), whereas the groups of countries with the most modest beliefs about their navigating skills were Latin Europe, Nordic and Confucian Asia (85%, 84%, and 81%, respectively; proportions of self-reported navigation skills by country and cultural cluster are available in the Supplementary Information, Appendix A, Tables S5 and S6). However, more positive self-estimates did not translate to better wayfinding performance at the national level; there was no significant correlation between these two metrics (Fig. 3H). Notably, we replicated our prior finding that GDP correlates with navigation performance19 (here focusing on wayfinding in 46 countries): Spearman’s rho = 0.572, p < 0.001. By contrast, self-estimates do not correlate with GDP (Spearman’s rho = 0.11, p = 0.48). Thus, it appears that increased economic wealth in a country is associated with better performance, but not better self-estimates of performance.
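The country-level associations reported here are rank correlations. As a minimal sketch of how Spearman's rho can be obtained (this is not the study's analysis code, which was written in R), the example below assigns average ranks to tied values and then computes the Pearson correlation of the ranks. The function names are hypothetical and no values from the study are used.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks; tied values receive the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the value at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation computed on average ranks."""
    return pearson(average_ranks(x), average_ranks(y))
```

In practice a statistics library would also supply the p-value; this sketch covers only the coefficient.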

Figure 3

Self-estimated navigation skills vs. wayfinding performance across countries and cultural clusters. (A) Global map of cultural clusters as defined by Ronen and Shenkar56. Map generated with custom code in the R programming language version 4.2.1 https://cran.r-project.org/. Conditional modes (controlled for age and gender) of self-estimated navigation skills for cultural clusters (B) and countries (E) with indicated standard errors. Conditional modes (controlled for age and gender) of measured wayfinding performance for cultural clusters (C) and countries (F) with indicated standard error bars. Self-estimated ability vs. wayfinding performance gap by cultural cluster (D) and country (G), estimated by calculating the difference between min–max normalised conditional modes of navigation skills and wayfinding performance (range [−1, 1], where −1 denotes maximum possible underestimation and 1 indicates maximum overestimation). (H) National self-estimated navigation ability and wayfinding performance are not significantly correlated (Spearman's rho = −0.2, p = 0.181). See Supplementary Information, Appendix A, Fig. S1 for results split by gender.
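The gap measure described in this caption can be sketched as follows. This is an illustrative reading of the caption rather than the original R code: it assumes that both inputs are oriented so that larger values mean higher self-estimates and better performance, and the function names and toy values are hypothetical.

```python
def min_max(values):
    """Rescale values linearly so the minimum maps to 0 and the maximum to 1."""
    lowest, highest = min(values), max(values)
    if highest == lowest:
        raise ValueError("min-max normalisation needs at least two distinct values")
    return [(v - lowest) / (highest - lowest) for v in values]


def estimate_performance_gap(self_estimates, performance):
    """Gap per country: normalised self-estimate minus normalised performance.

    Assumes larger numbers mean higher self-estimates and better performance.
    The result lies in [-1, 1]: -1 is maximum possible underestimation and
    1 is maximum possible overestimation.
    """
    scaled_estimates = min_max(self_estimates)
    scaled_performance = min_max(performance)
    return [s - p for s, p in zip(scaled_estimates, scaled_performance)]


# Toy values for three hypothetical countries (not data from the study).
toy_gap = estimate_performance_gap([1.0, 2.0, 3.0], [3.0, 2.0, 1.0])
print(toy_gap)
```

In the toy example the first country has the lowest self-estimate and the best performance, so it sits at the underestimation extreme, while the third sits at the overestimation extreme.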

We next explored the gap between a country's self-rated performance and actual performance. We found that, as with self-ratings, cultural clusters also showed tendencies to be over- or under-confident as a group. Near Eastern and Germanic countries were most likely to overestimate their ability and Nordic countries most likely to underestimate their ability (Fig. 3). Examining the geographic spread of the data further highlights the tendency for neighbouring countries and countries within cultural groups to be more similar (Fig. 4). We found a similar pattern for both men and women when the linear models were fit to men and women separately (Supplementary Information, Appendix A, Fig. S1). Thus, overestimation of performance in some countries is not due to, for example, men predominantly overestimating their performance while women estimate theirs accurately.

Figure 4

Global representation of self-estimated navigation ability, measured wayfinding performance and the self-estimate-performance gap. All maps generated with a custom code written in the R programming language version 4.2.1 https://cran.r-project.org/. (A) Self-estimated navigation skills by country; players from countries in darker red colour reported higher self-estimates. (B) Wayfinding performance measured during the navigation tasks; countries filled with darker red colour performed better. (C) Self-estimated navigation skills vs. measured wayfinding performance gap; players from countries filled with light pink to dark red colours overestimated their navigation skills, whereas countries in darker blue largely underestimated their navigation ability. Countries with no data filled with light grey.

Based on past literature42, 43, we considered that one reason some countries might overestimate ability is that they have a stronger association with male-typical stereotypes, with navigation being one of these stereotypes. If a country feels it is important to be good at male-associated activities, it is possible that nationals of that country would be more confident in these abilities. In support of this, we found that greater masculinity ratings for a country were associated with greater overconfidence of its population (Spearman’s rho = 0.43, p = 0.004; Fig. 5). We also found a similar association with self-estimate ratings alone (Spearman’s rho = 0.40, p = 0.008). To explore more broadly the cultural dimensions that might impact the gap, we examined the full set of six metrics developed by Hofstede45, 46 (Supplementary Information, Appendix B, Table S7). We found that masculinity was the strongest predictor of the gap and survived Bonferroni correction for significance in this model. Hofstede's cultural dimension of uncertainty avoidance was also a predictor of the gap in this threshold-corrected model. Nations that are keener to avoid uncertain situations were more likely to overestimate navigation ability (see Supplementary Information, Appendix B).
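As a brief illustration of what a Bonferroni-corrected threshold means when six cultural dimensions are tested, consider the sketch below. It assumes a conventional family-wise alpha of 0.05 and one test per Hofstede dimension; the text does not state the exact alpha or test family used, so these are assumptions, and the helper names are hypothetical.

```python
HOFSTEDE_DIMENSIONS = [
    "power distance",
    "individualism",
    "masculinity",
    "uncertainty avoidance",
    "long-term orientation",
    "indulgence",
]


def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold under Bonferroni correction."""
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return alpha / n_tests


def survives_correction(p_value, alpha, n_tests):
    """True if an uncorrected p-value is below the corrected threshold."""
    return p_value < bonferroni_threshold(alpha, n_tests)


# Assumed family-wise alpha of 0.05 across the six dimensions.
threshold = bonferroni_threshold(0.05, len(HOFSTEDE_DIMENSIONS))
print(f"corrected threshold: {threshold:.4f}")
```

Under these assumptions, a predictor must reach roughly p < 0.0083 to count as significant, a stricter bar than the uncorrected 0.05.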

Figure 5

Relationship between the self-estimate-performance gap and Hofstede's masculinity metric (Spearman’s rho = 0.43, p = 0.004).

Because we found differences between men and women in self-ratings, we explored how national patterns differed for men and women. We found relatively consistent patterns for both men and women across countries. In countries with high self-ratings, both women and men rated themselves similarly highly, and in countries with low self-ratings, similarly low self-estimates were present for both gender groups. Thus, the patterns observed at the national level do not appear to be strongly influenced by gender, but rather by overall cultural attitudes.

Having previously reported that the navigation performance gap between men and women was correlated with the gender gap index19, we also explored the self-rating–performance gap in relation to such a metric. We found a significant positive correlation between the self-rating–performance gap and the gender inequality index of a country (Spearman’s rho = 0.40, p = 0.007). Thus, the more unequal a country in terms of gender gap, the greater the overconfidence.

Discussion

Here we explored variation across nations in terms of their population’s self-estimates at navigating, their ability to perform a virtual navigation task and the gap between these two measures. Extending prior studies in greater depth, we found that older men were more likely to overestimate their ability. We now report that, while across the global sample performance across all participants was related to pooled self-estimates of navigation ability, at the national level a nation’s self-estimates and performance were not correlated. We found that participants in Germanic countries were much more likely to report high self-estimates than those in other countries, whereas Confucian Asian countries reported the lowest self-estimates. Near Eastern countries showed the largest over-estimation of their abilities, whereas Nordic countries showed the greatest under-estimation. We found that across nations the gap between self-estimates and performance was associated with the extent of affirmation for gender roles: the greater the association with masculinity in a country, the more likely citizens of that country were to overestimate performance. We discuss how these findings advance understanding of how self-estimates of ability relate to actual performance for a core human ability: spatial navigation.

Consistent with prior research, we found that across the entire sample male participants were twice as likely as females to rate themselves as very good navigators5, 34,35,36,37,38. Men also displayed a much lower tendency to report their navigating skills as either bad or very bad. Middle-aged and older participants generally held more favourable views about their navigating abilities than younger adults and this effect was particularly strong in the male group. For example, the oldest males in the sample (i.e. 60–70-year-olds) rated their wayfinding skills higher than the youngest 19–29-year-old men despite their performance being much worse than all younger males including the youngest age group. This finding is congruent with the research from van der Ham et al.29, 30 who reported that overestimation of spatial navigation abilities increases with age, with this more common for males than females. It also validates the earlier research by Taillade et al.22, 28 who found that older individuals rated their spatial abilities more positively than younger individuals, but they also displayed poor accuracy in their self-estimates due to age-related decline in performance.

We also replicate our past finding that the GDP of countries correlates with the performance of these countries19, and now extend this to reveal that GDP is not correlated with self-ratings. Thus, while the economic wealth of a country (raising education, health care and access to travel) appears to enhance cognition, it does not relate to the way in which a population will self-reflect on their abilities.

This is the largest cross-cultural study of the difference between self-reported navigation skill and measured ability. While across the whole world population higher self-estimates are associated with better performance, we found no correlation between a nation's average performance and a nation's self-estimated ability. Germanic countries were found to self-rate their abilities much more positively than other countries, with other cultural clusters (such as Nordic nations) showing similar overall ratings scores for self-estimates. If self-ratings for navigation are indicative of self-ratings in general, this would suggest that Germanic cultures value taking a positive attitude to self-achievement at tasks, while Nordic nations value modesty in considering one’s abilities. We found that countries from the Confucian Asia and Far East clusters were striking examples of underestimation. Despite reporting very modest ratings of their navigating abilities, they performed well in comparison to other cultural clusters. On the other hand, countries representing the Germanic cluster were more susceptible to overestimation: they performed relatively poorly while expressing very positive beliefs about their spatial navigation abilities. We hypothesised, given past evidence for gender bias in studies of navigation self-estimates40, 41, that the attitude towards masculinity in the population might manifest as bias in the self-estimates, with high masculinity associated with higher self-estimates. We found that both the population self-estimates and the over/under-estimation in performance were associated with population attitudes to masculinity. This finding supports Oettingen’s44 theory that a domain-specific sense of competence, in this case wayfinding, is likely to vary between countries and cultures based on the shared value systems which an individual has experienced.
We show here this extends not only to a self-reported estimate, but to over- and under-estimation of performance. Notably, we found little evidence that such relationships with masculinity were different for men and women. Thus, generally, it is not the self-estimates of men or the self-estimates of women within a country (or their gap to performance) that underlie the association with masculinity, but collective populations.

Exploring beyond masculinity as a metric we examined other metrics collected by Hofstede45, 46 in a model with corrected thresholds and found that uncertainty avoidance was also significantly correlated with overestimation of abilities, albeit less so than masculinity. This indicates that people in countries where certainty is valued have a tendency to consider themselves good navigators, whether or not they, as a group, are. It is currently unclear why this might be and future research will be useful to explore this.

A limitation of the current research is the treatment of culture at the nation-state level. Within most nations, different cultures exist, and in many nations, different languages are spoken by sub-populations. Prior research has shown that language, attitudes to childhood, and the environment occupied by sub-groups of a nation can have an impact on spatial cognition18, 57,58,59,60,61,62,63,64,65. Thus, it seems likely that both the language and the environment of relevant sub-groups will affect the relationship between self-ratings and navigation ability at a population level. One further challenge is that different cultural groups will differ in access to technology and smartphones, and our current results are limited to a self-selected group of participants who have access to such technology. However, our current results are unlikely to be mediated by such effects since Germanic and Nordic countries both have good access to technology and occupy different ends of the self-rating scale for nations. Future research involving more traditional cultures will require careful consideration regarding the use of technology in the assessment of cognition. Another limitation of our method was that we used only one question with four levels to probe self-rated navigation ability, rather than asking a set of questions with known reliability, such as the Santa Barbara Sense of Direction Scale5. However, in designing the game and setting out to support 17 different languages, we opted for a single simple question that would appear clear on a mobile phone screen and be easy to answer. See ref. 62 for a discussion of the design process.

In summary, here we reveal world-wide patterns in self-ratings and over/under-confidence of a core cognitive skill: navigating. We replicate prior age and gender patterns with significantly more power than previous studies. We find that attitudes to masculinity across a nation are associated with self-estimates in ability and their over/under-estimation. Future research will be useful to extend these findings to other cognitive domains.

Methods

Sea Hero Quest game design

The research questions were tested using a large-scale dataset collected via Sea Hero Quest—a mobile game application designed to obtain benchmarks of typical spatial abilities and navigation behaviours of healthy populations worldwide. The game contains eighty wayfinding, path integration, radial maze and other, research-unrelated, game engagement levels along with additional optional socio-demographic questions which asked players about their age, gender, country, education, rating of their navigation skills, hand used for writing, the environment they grew up in as well as their average daily travel time and a typical amount of sleep at night. A detailed description of the game design is provided in Coutrot et al.19. The ecological validity of Sea Hero Quest has already been confirmed by Coutrot et al.63 who found that the navigational performance recorded using the game reflects real-life wayfinding behaviour.

The full Sea Hero Quest dataset contains behavioural and socio-demographic data collected between April 2016 and April 2019 from over 4.3 million players globally with over 50 million gameplays recorded across all wayfinding and path integration tasks. Access to the full dataset is described in the Data and code availability statement of this manuscript.

Informed consent and ethical considerations

Informed consent was obtained from the participants. The starting screen of the Sea Hero Quest game included debriefing and informed consent sections with clearly identified goals of the study as well as the purpose and extent of data collection. Additionally, the full explanation of the study was provided within the gameplay (via the ‘journal’ icon) and the consenting players could access this section and withdraw from playing at any point of the game.

All consenting participants voluntarily downloaded and played the Sea Hero Quest mobile application game. This study was conducted as part of a larger research project which has been approved by the UCL Ethics Research Committee under the Project Number: CPB/2013/015. All experiments and methods including data collection, processing and analysis were carried out in accordance with relevant guidelines and regulations.

Participants and sample size

A subset of 383,187 individuals (representing 46 countries; NFemales = 172,969, MeanAgeFemales = 37.08 ± 14.53, NMales = 210,218, MeanAgeMales = 37.14 ± 13.39) out of the total number of over 4.3 million people who downloaded and played the Sea Hero Quest mobile game application was used in the analysis. The sample size was restricted by selecting only those players who provided the self-estimate scores of their navigation skills along with other socio-demographic information such as age and gender, and by narrowing the age range of individuals to between 19 and 70 years old in order to control for fake age demographics as well as a previously reported selection bias19, which was found to translate to an unusual spike in performance in older players. Due to a large number of missing values in the age, gender and self-estimate variables, these data filtering operations excluded 3,394,521 individuals. Also, as not every participant played all game levels, only the records of those who completed six wayfinding testing levels (3, 6, 7, 8, 11, and 12) and the first two practice levels (levels 1 and 2) were considered in this study. By following this criterion, we excluded 398,487 out of the remaining 818,193 individuals who played the game and provided their socio-demographic information. The final sample size was additionally adjusted by (a) removing outliers, defined as players who scored at least two standard deviations above or below the mean of the wayfinding performance metric (see below for details), and (b) keeping the records of only those individuals who represented countries with at least 500 players. These two data processing operations led to 34,065 participants being excluded from the sample. Finally, we removed gameplays of individuals representing countries (e.g. Vietnam, Albania, Saudi Arabia, and Croatia) which did not feature in any of the cultural clusters described in the Cultural clustering of countries subsection below (2,454 excluded).
Following these data cleaning activities, the resulting final sample size for testing the hypotheses was 383,187 individuals representing 46 countries. The country-by-country descriptives of the sample are provided in the Supplementary Information, Appendix A, Table S1.
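The exclusion steps above can be checked with simple arithmetic. The short sketch below starts from the 818,193 players who provided the required socio-demographic information and subtracts each reported exclusion in turn; the step labels and the helper name are paraphrases introduced here for illustration.

```python
# Counts taken from the text; labels are paraphrased for illustration.
PLAYERS_WITH_DEMOGRAPHICS = 818_193
EXCLUSION_STEPS = [
    ("did not complete practice levels 1-2 and the six wayfinding levels", 398_487),
    ("outliers or countries with fewer than 500 players", 34_065),
    ("countries not assigned to any cultural cluster", 2_454),
]


def apply_exclusions(start, steps):
    """Return the remaining sample size after each exclusion step."""
    remaining = start
    history = []
    for label, excluded in steps:
        remaining -= excluded
        history.append((label, remaining))
    return history


history = apply_exclusions(PLAYERS_WITH_DEMOGRAPHICS, EXCLUSION_STEPS)
for label, remaining in history:
    print(f"{remaining:>8,} remaining after removing: {label}")

final_sample = history[-1][1]
```

The final count matches the reported sample of 383,187 individuals.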

Wayfinding tasks

For the purpose of this study, we only used a selection of wayfinding levels in which the overall goal was to navigate a boat from the origin to the destination in various spatial layouts. These levels included a number of checkpoints (represented as buoys) which players had to visit in a specific sequence in order to complete the task. Before each game level, players were first shown a map of the corresponding layout, which they could look at for as long as they needed. At some levels the maps were obscured, which forced players to learn the layouts by exploration and made the navigation tasks more difficult. The first two levels of the game were designed as tutorials to allow players to familiarise themselves with the game controls. The performance on these two tasks was used in the analysis to account for any variation in computer experience amongst the players: the distances travelled on the testing wayfinding levels were divided by the sum of distances recorded at Levels 1 and 2. This approach to Sea Hero Quest data processing was first defined by Coutrot et al.19 and has since been implemented in other works based on data collected with Sea Hero Quest to ensure reproducibility and robustness of the subsequent analyses.

Cultural clustering of countries

One of the focal points of the analysis was to assess the differences in wayfinding performance and navigation self-estimates between countries, as well as between groups of countries clustered together based on their cultural similarities. In order to determine the groups of culturally similar countries, we followed the cultural clustering implemented in the GLOBE 2004 study64 with further extensions proposed by Ronen and Shenkar56. The original GLOBE 2004 project grouped 52 countries that still exist today into 10 culturally distinct clusters based on cultural dimensions such as power distance, uncertainty avoidance, societal institutional collectivism, assertiveness, and gender egalitarianism, which overlapped substantially with Hofstede’s previously introduced cultural dimensions45. Ronen and Shenkar56 additionally analysed clusters defined in the 11 most influential studies (including Hofstede's study and the GLOBE project) and proposed 11 global and 15 consensus clusters to group 70 distinct countries. In this study, Ronen and Shenkar’s solution was employed to group the 46 countries represented in the sample into 11 global clusters, as displayed in Table 1. As some countries (i.e. Albania, Croatia, Egypt, Lebanon, Macedonia, Puerto Rico, Saudi Arabia, Serbia, and Vietnam) never featured in the GLOBE or Ronen and Shenkar’s studies, their cluster membership could not be determined and we therefore removed players who represented these countries from further analysis.

Table 1 Ronen and Shenkar’s cultural clusters for 46 countries represented in the sample.

Measuring self-estimates of navigation skills

The self-estimates of one’s navigation skills were collected from the players via the optional question “How good are you at navigating?” asked after completion of the second wayfinding task of the game (Level 2). The available answers were distributed on a four-point Likert-type scale without a neutral/average score: ‘very bad’, ‘bad’, ‘good’, and ‘very good’. The obtained responses were used as a proxy for self-reported navigation ability. Our analysis treated the self-estimate responses as either categorical (as recorded by the game) or numeric values. A numeric version of this variable was obtained by assigning a numeric value to each response according to the following coding: ‘very bad’ = −2, ‘bad’ = −1, ‘good’ = 1, ‘very good’ = 2. This numeric measure of navigation skills was used to quantify the average (arithmetic mean) self-estimates for each gender, age band and country, as well as to calculate the country-specific gap between the self-reported ability and measured wayfinding performance.
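
As an illustration of the numeric coding described above, the following Python sketch maps the four response labels to their numeric values and averages them. The function names and the example responses are hypothetical; only the coding itself is taken from the text.

```python
from statistics import mean

# Coding taken from the text; the scale deliberately has no neutral (0) category.
LIKERT_CODES = {"very bad": -2, "bad": -1, "good": 1, "very good": 2}

def code_self_estimate(response: str) -> int:
    """Map a categorical response label to its numeric code."""
    return LIKERT_CODES[response.strip().lower()]

def mean_self_estimate(responses):
    """Arithmetic mean of the numeric codes for a group of responses."""
    return mean(code_self_estimate(r) for r in responses)

# Hypothetical responses for one group of players.
example = ["good", "very good", "bad", "good"]
print(mean_self_estimate(example))  # 0.75
```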

The average self-estimated navigation skills for each country and cultural cluster in the sample were estimated with conditional modes, by extracting the intercepts for the 46 countries and 11 cultural clusters from linear mixed models with fixed effects for age and gender and a random effect for country (or cluster, respectively): Self-Estimated Navigation Skills ~ Age + Gender + (1|Country) and Self-Estimated Navigation Skills ~ Age + Gender + (1|Cluster). We used the lme4 package65 and the lmerTest package66 for the R programming language to implement these models.
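
For readers less familiar with the model formula notation, the country-level model can be written out as a random-intercept model. The notation below is ours and is offered as a sketch: i indexes players, j indexes countries (or clusters), and Gender is assumed to enter as a dummy-coded indicator.

```latex
y_{ij} = \beta_0 + \beta_1\,\mathrm{Age}_{ij} + \beta_2\,\mathrm{Gender}_{ij} + u_j + \varepsilon_{ij},
\qquad u_j \sim \mathcal{N}(0, \sigma_u^2), \quad \varepsilon_{ij} \sim \mathcal{N}(0, \sigma^2)
```

Under this notation, the conditional mode for country j is the estimate of u_j given the observed data, i.e. the country-specific deviation from the overall intercept after accounting for age and gender. Whether the extracted value is u_j alone or the sum of the overall intercept and u_j is not specified in the text; the two differ only by a constant shift, which does not affect the ranking of countries.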

Wayfinding performance metrics

Measuring wayfinding performance of individual gameplays followed the methodology of correcting distance presented by Coutrot et al.19 and was estimated by calculating the Euclidean distances between point coordinates (sampled at Fs = 2 Hz) for each individual gameplay-trajectory recorded at the first attempt of the relevant game level. It should be noted here that, during the game, players were allowed to repeat levels as many times as they wished and, for that reason, the scope of the analysis was limited to the first attempts only.
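
The length of a single gameplay trajectory, as described above, is the sum of Euclidean distances between consecutive sampled positions. A minimal Python sketch follows, assuming each trajectory is available as a list of (x, y) coordinates; this data layout and the function name are our assumptions.

```python
import math

def path_length(points):
    """Sum of Euclidean distances between consecutive (x, y) samples."""
    return sum(math.dist(p, q) for p, q in zip(points, points[1:]))

# Hypothetical trajectory of three sampled positions.
trajectory = [(0, 0), (3, 4), (3, 10)]
print(path_length(trajectory))  # 11.0
```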

Following Coutrot et al.19, the distances of trajectories in the relevant wayfinding tasks were then corrected by dividing them by the sum of the mean Euclidean distances travelled at Levels 1 and 2 across up to the first three attempts (corrected distances). As some participants played Levels 1 and 2 only once while others attempted them multiple times, we considered only up to the first three attempts to control for any undesired outcomes or technical issues encountered in these early tutorial levels and to minimise the influence of a potential training effect on these two levels. The corrected distances for each wayfinding level were additionally scaled (i.e. z-score standardised) to allow unbiased comparison between levels with distinct spatial layouts.

We then calculated the average (i.e. arithmetic mean) distance travelled by each individual player across the six wayfinding tasks used in this analysis by summing the corrected and standardised distances obtained for each level and dividing the sum by the number of levels the player completed. Finally, as the resulting metric was a mean distance travelled from the origin to the destination (i.e. the smaller the value, the better the performance), we reversed its sign for easier interpretation (i.e. the larger the value, the better the performance). It represented a single and unbiased measurement of one’s wayfinding performance across multiple wayfinding tasks.
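
The correction, standardisation and averaging steps described above can be sketched in Python as follows. The function names and example values are hypothetical, and the use of the population standard deviation in the z-score is our assumption, as the text does not specify which variant was used.

```python
from statistics import mean, pstdev

def tutorial_baseline(level1_attempts, level2_attempts, max_attempts=3):
    """Sum of the mean distances at Levels 1 and 2 over up to the first three attempts."""
    return mean(level1_attempts[:max_attempts]) + mean(level2_attempts[:max_attempts])

def zscores(values):
    """Standardise one level's corrected distances across players.

    Assumption: population standard deviation (pstdev).
    """
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]

def wayfinding_performance(standardised_by_level):
    """Negated mean of a player's standardised corrected distances (larger is better)."""
    return -mean(standardised_by_level.values())

# Hypothetical player: two attempts at Level 1, one attempt at Level 2.
baseline = tutorial_baseline([100, 120], [80])   # 110 + 80 = 190
corrected = 380 / baseline                       # corrected distance for one test level
print(baseline, corrected)                       # 190 2.0

# Hypothetical standardised distances for two completed levels.
print(wayfinding_performance({3: 0.5, 6: -1.5}))  # 0.5
```

In the full analysis the z-scores would be computed per level across all players before averaging within each player.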

Wayfinding performance for countries and cultural clusters was estimated with conditional modes, by extracting the intercepts for the 46 countries and 11 cultural clusters from linear mixed models with fixed effects for age and gender and a random effect for country (or cluster, respectively): Wayfinding Performance ~ Age + Gender + (1|Country) and Wayfinding Performance ~ Age + Gender + (1|Cluster). We used the lme4 package65 and the lmerTest package66 for the R programming language to implement these models.

Measuring the gap between self-estimated navigation skills and wayfinding performance

The gap between self-reported navigation skills and wayfinding performance was estimated, for each country or cultural cluster, as the difference between two min–max normalised (range [0, 1]) quantities: the self-estimated ability (i.e. average self-estimates controlled for age and gender, extracted as country/cluster conditional modes from the linear mixed models described earlier) minus the wayfinding performance (i.e. average wayfinding performance controlled for age and gender, extracted as country/cluster conditional modes from the linear mixed models defined above). The resulting gap metric ranged over [−1, 1], where −1 denoted the maximum possible underestimation of navigation skills compared to the measured wayfinding performance, 0 indicated a perfectly accurate estimation of navigation skills across the country/cluster, and 1 denoted the maximum possible overestimation of wayfinding abilities.
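
The gap metric can be illustrated with a short Python sketch. The country labels and values below are hypothetical stand-ins for the conditional modes, and the function names are ours.

```python
def min_max(values):
    """Rescale a {group: value} mapping to the range [0, 1]."""
    lo, hi = min(values.values()), max(values.values())
    return {k: (v - lo) / (hi - lo) for k, v in values.items()}

def estimation_gap(self_estimates, performance):
    """Normalised self-estimate minus normalised performance, per group."""
    s, p = min_max(self_estimates), min_max(performance)
    return {k: s[k] - p[k] for k in s}

# Hypothetical conditional modes for three countries.
self_estimates = {"A": -1.0, "B": 0.0, "C": 1.0}
performance = {"A": 2.0, "B": 0.0, "C": 1.0}

print(estimation_gap(self_estimates, performance))
# {'A': -1.0, 'B': 0.5, 'C': 0.5}
```

In this toy example country A has the lowest self-estimate but the best performance, giving the maximum possible underestimation of −1, while B and C both overestimate by 0.5.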

Statistical analyses and additional predictors of wayfinding performance

Due to the applied standardisation and outlier removal approaches, the normality assumptions were met for some analyses but not for others, and therefore our methods implemented a range of parametric and non-parametric statistical techniques depending on underlying distributions of wayfinding performance metrics, specific cultural dimensions, global indices and particular nature of research questions. These are indicated in the main text of the manuscript and the supplementary appendices whenever statistical tests are reported.

Also, apart from self-estimated navigation skills and typical demographic variables such as age and gender, we used a number of other socio-demographic and behavioural variables such as the highest level of education obtained, the daily amount of commute time, and the type of home environment in which the participants grew up as predictors of wayfinding performance in the hierarchical multivariate linear models reported in the Results section and the Supplementary Information (Appendix A, Tables S3 and S4).