Quantifying human performance in chess

From sports to science, the recent availability of large-scale data has made it possible to gain insights into the drivers of human innovation and success in a variety of domains. Here we quantify human performance in the popular game of chess by leveraging a very large dataset comprising over 120 million games between almost 1 million players. We find that individuals encounter hot streaks of repeated success, longer for beginners than for expert players, and even longer cold streaks of unsatisfying performance. Skilled players can be distinguished from the others based on their gaming behaviour. Differences appear from the very first moves of the game, with experts tending to specialize and repeat the same openings while beginners explore and diversify more. However, experts experience a broader response repertoire, and display a deeper understanding of different variations within the same line. Over time, the opening diversity of a player tends to decrease, hinting at the development of individual playing styles. Nevertheless, we find that players are often not able to recognize their most successful openings. Overall, our work contributes to quantifying human performance in competitive settings, providing a first large-scale quantitative analysis of individual careers in chess and helping unveil the determinants separating elite from beginner performance.


I. INTRODUCTION
Countless individual careers shoulder the forward momentum in the sciences [1][2][3][4][5], arts [6][7][8][9][10] and sports [11][12][13][14]. Indeed, the recent availability of large-scale datasets provides an unprecedented opportunity to study the drivers of human performance in all these domains. In science, for example, Google Scholar makes it possible to track the individual careers of academics, where individual performance can be quantified via impact, i.e. the attention received from the research community in the form of citations. In areas like the arts, despite the definition of quality being somewhat elusive, new data has recently shed light on the role of early-career exhibitions in reputed venues in the eventual success of the artist [7].
Data-driven investigations of individual activity are a pillar of sports performance. Who can forget the success story described in the book Moneyball for baseball [15]? The day Sandy Alderson realized that on-field strategies and player evaluations were better conducted based on statistical data than on the collective wisdom of old baseball men [15], the game as we knew it changed. In tennis, network techniques suggested the identification of Jimmy Connors as the best player of the past [16], a difficult task which requires arbitrary external criteria when comparing across eras. All in all, sports analytics is now commonplace in most major sports, providing clues on individual and team performance to boost success rates [17,18]. Interestingly, while sports have benefited from scientific methods, they have in turn become a frontier for developing new scientific tools to investigate success, innovation and learning, as one of the primary domains where growth and success are measurable in a data-driven fashion.
In this work we focus on individual careers in the competitive sport of chess. Chess is also a highly intellectual activity which shares similarities with science. It is thus often located between the two domains: a game where players use simple rules resulting in highly complex play, often developing different personal styles able to influence long-term success in the game. Besides, the volume of online chess games freely available for analysis (several billion) makes chess a perfect candidate for testing hypotheses involving human performance in competitive settings. So far, chess has predominantly been studied at the level of single games. For example, past research focused on the role of memory in games [19] and showed that opening popularity follows the well-known Zipf's law [20]. However, these analyses did not use individual player-level data, treating games from different players on an equal footing [19,20], or focused on a small number of players [21]. Indeed, little attention has been devoted to individual careers and their evolution. In particular, we ask: what separates skilled players from the rest? Earlier studies found that the answer is not intelligence [22], and the role of deliberate practice remains heavily debated [23][24][25]. Here we perform a comprehensive large-scale analysis of the habits of skilled and less skilled individual players over time, providing an anatomy of human performance in the popular game of chess. We characterize players' careers in terms of hot streaks, diversity and specialization in the opening sequences of their games, and analyze their diversity as a function of career stage. We find evidence for the presence of both hot and cold streak phenomena, revealing a surprising tendency for beginners to have longer hot streaks than expert players. By sequencing the opening moves of players at different skill levels, we show that beginners start with a more diverse set of first moves, while advanced players and experts rarely start their games differently when playing as white. Yet, expert players display a broader response repertoire, showing the ability to surprise their opponent with a greater variety of responses. Moreover, when accounting for different variations of the openings, experts show a deeper knowledge of different variations within the same line, hinting at a deeper understanding of the game. Lastly, analyzing behaviour over time, we find that players explore more at the beginning of their careers, but tend to specialize at later career stages, using and exploiting fewer openings. Overall, our large-scale characterization of individual gaming behavior supports chess as a suitable laboratory to quantitatively investigate individual careers and human performance, demonstrating simple differences in the playing habits and behaviours of beginners and experts.

II. RESULTS
In this work, we rely on large-scale data extracted from lichess.org, a popular open-source Internet chess server, consisting of 123 million games between 0.98 million players (see Sec. IV A). In the lichess dataset, each player's career can be tracked over time, with detailed information on each of the played games, i.e. moves, opening, win/loss, and the player's skill level. Skill is quantified by the Elo rating (see Sec. IV D 2), which measures the level of past performance of the player: it increases when a player beats an opponent and decreases upon a loss. As an illustrative example, in Fig. 1a we show the career of Grandmaster (GM) Magnus Carlsen on lichess.org, indicating his Elo in each game and the game outcome: win, loss or draw.
Figure 1a seems to suggest that for GM Carlsen wins and losses tend to be clustered together. Indeed, prior works tracking wins and losses in sports hotly debate the existence of hot streaks [26,27], a phenomenon that has also been found to be ubiquitous in artistic and scientific careers [6,28].
To quantitatively check for the existence of such phenomena in all chess careers, we calculate the length of hot (series of wins) and cold (series of losses) streaks for each player in the dataset, and compare them with the lengths expected in a reshuffled null model (see Sec. IV C) for each player. In Fig. 1b we show the resulting curves, normalised by the null model. We find the existence of statistically significant hot streaks, possibly associated with confidence spillovers from previous victories. Long streaks of chess wins are reminiscent of players entering the so-called zone, a state of focus where peak performance is possible [29,30].
Interestingly, cold streaks are also observed, and are typically longer than hot ones, indicating that times of poor performance tend to last longer than periods of intense success. In physical sports, losing streaks are often found to be the effect of an injury. Here, we speculate that similar phenomena might be at play even in chess, possibly due to lack of confidence, loss of focus and a similar decrease in mental fitness rather than in purely athletic shape.
We can refine this analysis by further separating players by skill (Elo), categorizing them into 4 categories: beginner, intermediate, advanced and expert (see Methods). We find that weaker players experience comparatively longer hot streaks than stronger players (Fig. 1c). A reason for this could be that confidence spillovers from the last victory may have a greater impact on future outcomes at lower skill levels.
Another possible driver of the observed disparity in hot streaks between beginners and experts may reside in how experts diversify their moves. In competitive sports, some players diversify their techniques while others may specialize. Strategy diversification might make players harder to predict, thus enabling them to surprise their opponents. By contrast, specialization, e.g., deeper knowledge of certain opening positions, may allow players to exploit opponents by navigating familiar situations. Indeed, such an exploitation-exploration (specialization-diversification) dichotomy is a common mechanism governing the dynamics of many diverse self-organized and adaptive systems [31][32][33][34].
In chess, and in sports in general, the balance of this trade-off may depend on skill. We thus investigate the extent to which skill level influences the approach to the game. In particular, we study the diversity in the player's arsenal of game openings across different Elo ratings (Fig. 2). We calculate the Shannon entropy of the distribution (see Methods) of the first move as white for each player and report the results in Fig. 2a. We find that beginners tend to open games with a more diverse collection of first moves (as white) than stronger players. Thus, our analysis captures beginners exploring a wider variety of first moves than experts, who instead are likely to begin with a typical move. At first glance, this result might seem surprising, as skilled players are supposed to have better knowledge of opening theory. Yet, this may be linked to the ability of more skilled players to easily transpose into different opening variations in the following moves. Better awareness of transposition theory among experts may allow them to reach many different openings from the same starting move, thus potentially eliminating the need to diversify in the first move itself. So, overall, do experts specialize at the cost of diversity? To investigate further, we ask: how does skill level determine response diversity (as black)? For the top 5 white moves observed (e4, d4, Nf3, c4 and e3), we group the games of each player based on these moves and calculate the diversity of black's responses to the white player. Results are shown in the different boxplots of Fig. 2b. Surprisingly, we observe a contrasting result. As white, beginners encounter the lowest diversity in black responses. This is captured by the low response entropy for all 5 of white's most played opening moves. Hence, beginners lack exposure to the plethora of possible responses, which perhaps leaves gaps in their game.
Lastly, we point out that this increase in the diversity of responses at higher skill levels might be what prevents players from increasing their Elo, as the potential to be surprised by one's opponent keeps increasing as one climbs the Elo ladder.
From the first move onward, players enter into established chess theory, where the many top variations of opening moves are well explored. The next natural question to ask at this point is: how do players diversify beyond the first move as they move into opening theory? The beginning usually plays out like a well-choreographed dance, evolving in already classified opening sequences with standard names such as "Sicilian Defense", "Queen's Pawn Game", and so on. In Fig. 3a, we show the top 9 openings used by players on lichess.org. Focusing on such opening sequences, we explore the specialization players achieve in the opening sequence. Results are shown in Fig. 3b, where we define the "favorite opening" of a player as the most used one, provided it is played at least 100 times.
Interestingly, the majority of players end up in their favorite openings only around 10% to 30% of the time. Furthermore, we find that expert players start with their favorite opening significantly more often than with their second favourite. This is marked by the distribution falling below the diagonal line. By contrast, beginners lie much closer to the diagonal, indicating that their favorite opening is played about as often as the runner-up, thus pointing to a lack of opening specialization.
Further analyses reveal that expert playing behavior comes in a variety of shapes and sizes, i.e., there are players who specialize and players who flexibly switch openings (diversify). This is marked by the bimodal nature of the distributions observed in Fig. 3b, column 4.
At the individual level, we find on average less diversity in opening selection (main lines) among experts, as shown in Fig. 3c. As mentioned earlier, the ability to arrive at known openings through transposition, i.e., different sequences of moves that players may use to reach the same final configuration, might be unique to expert players. Arriving at fewer openings may allow experts to apply learned chess theory and play optimal moves from memory, saving crucial time and preventing the build-up of mental fatigue during the game.
However, accounting for the many different variations of the openings (see Methods), it is the experts instead who encounter the most diversity. This hints that experts like to enter certain main openings, perhaps the ones they specialize in, which they follow up by expanding their repertoire in the variations to surprise opponents and catch them off-guard, a strategy not unique to chess but key in many competitive sports. Furthermore, upon investigating the temporal organization of openings (main lines) used by a player, we find that experts switch openings between consecutive games more often than beginners (see Supplementary Information, Sec. S3). Thus, experts encounter higher temporal diversity in openings.
At this point one might wonder: how much exactly does specialized knowledge of favorite openings aid in victory? A naive argument would suggest that players tend to prefer those main lines that give them the best results. If this is the case, the favourite opening of each player, i.e. the one used most, would be the one that gives the best performance, that is, the highest winrate.
To investigate this, we calculate for each player the winrate of each of the player's top 3 most played openings and plot it against the frequency of their use. Results are shown in Fig. 4a for a sample of the players. Surprisingly, there are players whose most used opening performs worse than their less used openings. Besides, optimal players (black curves), i.e. those who play their better performing openings more often, are just a few.
To quantify this effect in the whole population, we calculate for each player the difference between the winrate of the most played opening and that of the second most played one, showing its distribution in Fig. 4b. Our analysis reveals that, compared to beginners, when expert players do encounter their favorite opening, their winrate is more likely to be lower than with their second favourite opening. We note that players who do better in their second most played opening than in their most played one are experiencing sub-optimal opening encounters. Thus, we find that stronger players encounter sub-optimal openings more often than weaker players. Discovering such sub-optimal encounters may be an opportunity for players to improve.
Lastly, we explore diversity as a function of different stages of players' careers. Selecting players with at least 3000 games, we split their games into 3 equal stages: early (0-1k), mid (1k-2k) and late career (2k-3k). For each player, we compute opening diversity in the different career stages and report it in Fig. 5. For both the opening move (a) and the opening sequence (main lines) (b), we find that players explore more in the initial stages of their careers, becoming more specialized in later stages, perhaps exploiting the knowledge of certain openings they have learned.
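The career-stage comparison can be illustrated with a short Python sketch. It assumes, hypothetically, that a player's games are available as a chronologically ordered list of opening labels and that only the first 3000 games are used; the function names and the base-2 logarithm are illustrative choices, not the code used for the analysis.

```python
from collections import Counter
from math import log2


def entropy_bits(items):
    """Shannon entropy (base 2, an assumed choice) of the empirical distribution."""
    counts = Counter(items)
    total = sum(counts.values())
    return sum((c / total) * log2(total / c) for c in counts.values())


def diversity_by_stage(openings, stage_size=1000, n_stages=3):
    """Opening diversity in consecutive, equally sized career stages.

    `openings` is a chronologically ordered list of opening labels
    (hypothetical input format). Games beyond stage_size * n_stages are ignored.
    """
    if len(openings) < stage_size * n_stages:
        raise ValueError("player has too few games for this analysis")
    return [
        entropy_bits(openings[i * stage_size:(i + 1) * stage_size])
        for i in range(n_stages)
    ]
```

A decreasing sequence of values from early to late stage would correspond to the specialization trend described above.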

III. DISCUSSION
Quantifying performance and unveiling the drivers of success is a ubiquitous pursuit in modern society, especially important in competitive settings, where skills, techniques, strategies, and achievements need to be compared. Indeed, in many sports, measuring and analysing performance has become common practice [35,36]. In this work, we propose chess as a natural laboratory to investigate human behavior [37][38][39][40]. Unlike most other disciplines, chess has no stochastic component, hence performance can be more directly associated with skill, as quantified by Elo. With this in mind, here we performed a data-driven investigation of almost 1 million individual careers carving their way to success on the online platform lichess.org. Looking at entire careers, we found the presence of hot and cold streaks. These are bursts of victories and losses, already observed in science and other domains [6,9,28], which might be due to periods of particularly good physical fitness, creativity or even confidence. Accounting for skill level, we noticed that beginners have a higher chance of experiencing a repeated sequence of wins, and that the typical length of a hot streak is inherently related to the skill level of a player. Moreover, regardless of individual ability, player performance is characterized by even longer periods of repeated failure.
Even just looking at simple patterns in the openings, thus neglecting the full complexity of game sequences, we were able to characterise individual playing behavior across different career stages. In particular, expert players were shown to behave differently from the very first move of the game, displaying a lower diversity in openings. Looking at chess as a process of interactions and reactions, we focused on black's response to the white player's moves, finding that experts encounter the highest diversity from black. However, after accounting for different variations within the openings, we discovered that experts were more diverse instead, hinting at a deeper understanding of the complexity of the different variations within the same line. Such findings corroborate some very recent ideas on opening similarity and complexity independently presented in Ref. [41], focusing on prediction of future openings and opening preparation, of which we recently became aware. Looking at individual careers over time, opening diversity was found to decrease at their later stages, pointing towards higher specialization as a player becomes more experienced. In addition, experts tend to play their favorite opening sequence much more than beginners, providing evidence for a tendency towards specialization. Nevertheless, counter-intuitively, we also found that players often do not have the ability to recognize their most successful opening, i.e. the one associated with the highest winrate. Surprisingly, this is particularly true for more expert players, who have a higher chance of sub-optimal encounters in the opening, possibly because of the depth of responses and variations within opening lines coming from a skilled opponent.
The study we have presented has two limitations. First, we kept our focus on openings only, which constitute one of the many phases of chess games. Nevertheless, this simple approach proved to be enough to reveal how experts differ from beginners in simple quantifiable ways. It also complements existing work on the recall abilities of players for chess positions as a function of skill level [38]. A first natural extension in this direction would consist in also analysing other parts of the game, such as the middlegame and the endgame. A second limitation is that, when associating a skill level with a player, we inevitably considered Elo as a static, immutable measure. Instead, this rating is clearly in constant evolution throughout the career of a player. While including this dynamical aspect of ranking would surely add a missing aspect to the analysis, it is worth stressing that our measure is still a good proxy for skill level, as we have neglected the initial phase of the careers, which is associated with the steepest growth/change in Elo.
Taken together, our work represents a first step towards understanding the game mechanisms associated with performance in the careers of chess players. Future work might enrich this analysis by considering the complexity of chess games as a whole, using the full sequences of moves instead of focusing only on selected important phases of the game. Finally, and more broadly, it would be interesting to extend our approach to other ecosystems, investigating tensions between specialization and diversity in other contexts, from Go to tennis and boxing matches, where opening and response constitute a crucial part of the game.

IV. METHODS

A. Data
For our analysis, we use all games played on the online chess server lichess.org between 2013 and 2016. The data covers 123 million games played between 0.98 million players. There are different game types available to players on the platform: bullet, blitz, and rapid. The analysis presented in this paper is restricted to blitz games, which are fast and tactical but still allow for some overall strategy, unlike bullet games, which last at most 1 minute and are littered with pre-moves. The most popular time controls for blitz are 5 minutes and 3 minutes. We specifically focused on this type of game since "speed" chess is played across all levels, from beginners to grandmasters.

B. Measuring Diversity
We measure the diversity of openings of a player by calculating the Shannon entropy [42] of the frequency distribution of opening moves or opening sequences (see Fig. 2 and Fig. 3 respectively). Notice that for the analysis in Fig. 2b, we selected only games where the player starts as white.
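As a minimal sketch of this diversity measure, the Python snippet below computes the Shannon entropy of the empirical frequency distribution of a player's openings. The input format (a plain list of move or opening labels), the function name and the base-2 logarithm are assumptions made for illustration; the text does not specify the logarithm base.

```python
from collections import Counter
from math import log2


def shannon_entropy(items):
    """Shannon entropy, in bits, of the empirical distribution of `items`.

    `items` is any iterable of hashable labels, e.g. first moves ("e4", "d4")
    or opening names. Base 2 is an assumed choice of logarithm.
    """
    counts = Counter(items)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    # H = sum_i p_i * log2(1 / p_i), with p_i = count_i / total
    return sum((c / total) * log2(total / c) for c in counts.values())


# Hypothetical player who mostly opens with e4
first_moves = ["e4", "e4", "d4", "e4", "c4", "e4", "e4", "Nf3"]
diversity = shannon_entropy(first_moves)
```

A player who always plays the same first move has zero entropy, while a player who spreads games evenly over k moves reaches the maximum value log2(k).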

C. Null models for hot and cold streaks
To calculate the expected lengths of hot and cold streaks in a player's career, we build a null model where we reshuffle the temporal order of the associated sequence of games, thus preserving the total number of victories, losses and draws. Then, we compute the length of each hot and cold streak (sets of consecutive wins and losses, respectively) observed in this reshuffled sequence. The presence of hot- and cold-streak phenomena can then be investigated by comparing the number of hot and cold streaks of a given length in the actual careers with that in these reshuffled sequences.
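The reshuffling procedure can be sketched in Python as follows. The outcome encoding ("W", "L", "D"), the function names, the number of reshuffles and the averaging over reshuffles are assumptions introduced for illustration; the exact normalisation used for Fig. 1 is not specified here.

```python
import random
from itertools import groupby


def streak_lengths(outcomes, target):
    """Lengths of all maximal runs of `target` (e.g. "W" or "L") in `outcomes`."""
    return [sum(1 for _ in run) for key, run in groupby(outcomes) if key == target]


def count_at_least(lengths, max_len):
    """For each L in 1..max_len, the number of streaks of length >= L."""
    return {L: sum(1 for n in lengths if n >= L) for L in range(1, max_len + 1)}


def null_streak_counts(outcomes, target, max_len, n_shuffles=100, seed=0):
    """Average streak counts over reshuffled copies of one career.

    Shuffling changes only the temporal order, so the numbers of wins,
    losses and draws are preserved. `n_shuffles` and `seed` are
    illustrative parameters.
    """
    rng = random.Random(seed)
    sequence = list(outcomes)
    totals = {L: 0.0 for L in range(1, max_len + 1)}
    for _ in range(n_shuffles):
        rng.shuffle(sequence)
        counts = count_at_least(streak_lengths(sequence, target), max_len)
        for L in totals:
            totals[L] += counts[L]
    return {L: totals[L] / n_shuffles for L in totals}


# Hypothetical career: W = win, L = loss, D = draw
career = list("WWWLLDWWLLLLWDW")
observed = count_at_least(streak_lengths(career, "W"), max_len=3)
expected = null_streak_counts(career, "W", max_len=3)
```

One possible way to compare the two, under the assumptions above, is the ratio observed[L] / expected[L] for each length L: values above 1 would indicate more long streaks than expected from the reshuffled careers.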

Openings
A chess opening is the initial stage of a chess game, i.e. the sequence of its first few moves. It usually consists of established theory; the other phases are the middlegame and the endgame. All games can be associated with a unique main opening line, within which there can be many variations. Many opening sequences have standard names such as the Sicilian Defense, Ruy Lopez, Italian Game, Scotch Game, etc.

Elo rating
The Elo rating system is a method for calculating the relative skill levels of players in zero-sum games such as chess. A player's Elo rating is represented by a number which may change depending on the outcome of rated games played.
We show the Elo ratings of all players in our dataset in Supplementary Information (Sec. S1). The career lengths of players as a function of their skill are shown in Supplementary Information (Sec. S2). Experts tend to play more than beginners.
After every game, the winning player takes points from the losing one. The difference between the ratings of the winner and loser determines the total number of points gained or lost after a game. Two players with equal ratings who play against each other are expected to score an equal number of wins. A player whose rating is 100 points greater than their opponent's is expected to score 64%. For each player, we work with the Elo averaged over all their games (except the first 100 games, when it fluctuates a lot). We note that players of similar skill levels (Elo ratings) are matched to compete.
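The expected scores quoted above follow from the standard logistic Elo formula, sketched below in Python. The function names and the K-factor value are illustrative assumptions; the platform's own rating computation may differ in detail.

```python
def expected_score(rating_a, rating_b):
    """Expected score of player A against player B under the standard Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400))


def updated_rating(rating, expected, actual, k=20):
    """Rating after one game; `actual` is 1 for a win, 0.5 for a draw, 0 for a loss.

    The K-factor `k` is an assumed value, chosen only for illustration.
    """
    return rating + k * (actual - expected)


# A 100-point rating advantage gives an expected score of about 0.64
advantage = expected_score(1600, 1500)
```

Because the two players' expected scores sum to 1 and both use the same K-factor here, the points gained by the winner equal the points lost by the loser, in line with the description above.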

Separating players by skill level
We separated the players into 4 skill levels as follows. We first arranged the players in ascending order of their Elo rating (average calculated over all their games). We then created Elo bins that divide players into 4 equally sized skill categories. Finally, we labelled these bins as beginner, intermediate, advanced and expert, respectively.
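A minimal Python sketch of this binning is given below. It assumes, hypothetically, that each player's per-game ratings are available as a list and that average ratings are stored in a dictionary keyed by player; the function names and the handling of ties (by sort order) are illustrative choices.

```python
def average_elo(game_ratings, skip=100):
    """Mean rating over a career, ignoring the first `skip` games.

    Returns None when the player has no games left after skipping.
    """
    stable = game_ratings[skip:]
    return sum(stable) / len(stable) if stable else None


def assign_skill_levels(avg_elo_by_player,
                        labels=("beginner", "intermediate", "advanced", "expert")):
    """Split players into equally sized groups by ascending average Elo."""
    ranked = sorted(avg_elo_by_player, key=avg_elo_by_player.get)
    n = len(ranked)
    # Rank r (0-based) falls into bin floor(r * number_of_bins / n)
    return {
        player: labels[rank * len(labels) // n]
        for rank, player in enumerate(ranked)
    }


# Hypothetical players and average ratings
ratings = {"p1": 1200, "p2": 1900, "p3": 1500, "p4": 2300,
           "p5": 1100, "p6": 1700, "p7": 1400, "p8": 2100}
levels = assign_skill_levels(ratings)
```

With this rank-based rule the four groups differ in size by at most one player, so the Elo boundaries between categories are set by the data rather than fixed in advance.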

FIG. 1. Hot and cold streaks in chess careers. (a) Visualization of the career of the Grandmaster (GM) Magnus Carlsen. Wins and losses drive the Elo rating up or down. (b) Relative number of hot streaks (red) and cold streaks (blue) of length ≥ L_streak as a function of L_streak, calculated for each player. Results are averaged over all players. Losses tend to be more clustered than victories, as individual cold streaks are on average longer than hot streaks. (c) Relative number of hot streaks of length ≥ L_hotstreak as a function of L_hotstreak, averaged over the players in each skill category separately (i.e. beginner, intermediate, advanced, expert). Weaker players have longer hot streaks than more expert ones.

FIG. 2. Diversity and specialization in the first move and black's response. (a) Boxplots showing the diversity (entropy) of the first move by a player as white, calculated over all players individually and aggregated into the 4 different skill levels. Weak players start games with a more diverse collection of first moves as white than stronger players. (b) Boxplots showing the diversity of black's response experienced by the white player, for each of white's top 5 most played first moves: e4, d4, Nf3, c4 and e3 (in descending order of popularity). As white, the weakest players encounter the lowest diversity in responses, captured by low response entropy, for all of white's most played opening moves except Nf3.

FIG. 4. Sub-optimal opening encounters. (a) Winrate of the top 3 openings of a player against opening frequency. Each connected curve corresponds to a player. We show 15 random players who play at least 100 games with each of their top 3 openings. Curves of players whose winrate increases monotonically with the frequency of the associated opening are depicted in black and are deemed optimal. (b) Distribution of the difference δw between the winrate of a player's favourite opening and the winrate of their second favourite opening, for the whole population of players. Different curves correspond to different skill levels. Dashed lines indicate mean values of the distributions. Stronger players encounter less optimal openings more often than weaker players.

FIG. 5. Diversity in opening with career stage. We show the diversity in the opening move (a) and in the opening sequence of moves (b). In later parts of a player's career, diversity decreases: players prefer certain openings and specialize more in them, playing them more often.