Quantifying the complexity and similarity of chess openings using online chess community data

Chess is a centuries-old game that continues to be widely played worldwide. Opening Theory is one of the pillars of chess and requires years of study to be mastered. In this paper, we use the games played in an online chess platform to exploit the “wisdom of the crowd” and answer questions traditionally tackled only by chess experts. We first define a relatedness network of chess openings that quantifies how similar two openings are to play. Using this network, we identify communities of nodes corresponding to the most common opening choices and their mutual relationships. Furthermore, we demonstrate how the relatedness network can be used to forecast future openings players will start to play, with back-tested predictions outperforming a random predictor. We then apply the Economic Fitness and Complexity algorithm to measure the difficulty of openings and players’ skill levels. Our study not only provides a new perspective on chess analysis but also opens the possibility of suggesting personalized opening recommendations using complex network theory.


INTRODUCTION
Since its creation in the 6th century, chess has fascinated countless people and nowadays, 1500 years after its birth, it counts more than 600 million regular players (https://www.un.org/en/observances/world-chess-day). Chess, which is considered by many to be among the noblest intellectual arts, has strongly influenced human history. For instance, chess has been one of the main arenas where the Soviet Union and the United States fought for intellectual supremacy, as exemplified by the 1972 world championship match between Bobby Fischer and Boris Spassky. Iconic is also the match between world champion Garry Kasparov and the IBM supercomputer Deep Blue, won by the latter, which established the superiority of the computer over the human mind in computational problems, making this event a milestone in the history of artificial intelligence. Despite this, people did not lose interest in chess and instead started using computers to further improve their understanding of the game.
Given the popularity of chess, it is not surprising that science has also devoted considerable attention to this game. Early theoretical studies date back to C. E. Shannon [1]. However, only recently, thanks to the advent of the internet and online chess platforms, has it become possible to analyze a vast amount of data with the tools of statistical physics and complex systems [2]. For instance, Refs. [3-5] found that chess openings obey Heaps' and Zipf's laws, two statistical regularities which are often considered the footprint of complexity [6]. Also, the history of recorded games shows long-range memory effects [7,8]. Other studies focused on chess players' ratings and their evolution [9,10], and on the popular level learning of the game [11].
One of the factors determining the complexity of chess is the vast number of different possible playable games, which C. E. Shannon [1] estimated to be around $10^{120}$. However, even if there are more than 71,000 different reachable positions after the first four moves, only a tiny fraction is observed in real games since not all moves are equally good. The study of the sequence of initial good moves is called Chess Opening Theory. Currently, the most authoritative resource on this subject is the Encyclopedia of Chess Openings, which classifies openings in 500 different ECO codes [12]. Openings are a central part of chess, and top-level players spend a significant fraction of their time studying novel opening ideas and memorizing new lines.
Mastering Opening Theory requires a deep knowledge of chess that is typically well beyond the capabilities of amateurs. As a consequence, only world-level professional players have a complete understanding of the Opening Theory in its entirety. Here we show that by considering a large community of chess players and leveraging complex network theory, it is possible to exploit the emerging social intelligence of the community ("the wisdom of the crowd") to overcome this limit. The idea is that even if each player in the community has a partial individual knowledge, all this knowledge can be suitably combined to obtain a complete picture of the whole Opening Theory. This approach allows us to quantify chess features that only chess experts could appreciate so far: (i) the similarity between openings; (ii) the complexity of openings; (iii) the quality of players' opening repertoires.

The bipartite network of chess players and openings
Network theory is one of the pillars of complexity since most complex systems spontaneously arrange into graphs, chess being no exception [13]. Recently, bipartite networks have received increasing interest since the graph representation of many systems displays this peculiar arrangement of nodes. A graph is bipartite if two classes of nodes exist such that the nodes of the same class do not connect, while links connect the nodes of different classes. For example, one can represent the world trade network with the bipartite network formed by the products and by the countries exporting them. By leveraging that network, it is possible to obtain state-of-the-art long-term GDP forecasts [14] and to predict the industrial upgrading of countries [15,16]. Remarkably, we can identify bipartite networks also in chess, where they can be used to gather novel insight into this game.
In technical terms, a bipartite network is a network whose nodes can be divided into two sets $\mathcal{P}$, $\mathcal{O}$ such that there are no links between nodes belonging to the same set. Denoting by $P$ the number of nodes in the first set and by $O$ the number of nodes in the second one, an unweighted undirected bipartite network can be represented by a $P \times O$ matrix $M$ such that $M_{po} = 1$ if $p \in \mathcal{P}$ and $o \in \mathcal{O}$ are connected and $M_{po} = 0$ otherwise. In this study, we consider the bipartite network of chess players and chess openings, which we built using games played on the online chess platform lichess.org with a Blitz time control (see Data section). We chose Blitz games since this format is the most played online. The Lichess platform uses the Glicko-2 system to rate players (see Data section). We consider the 500 chess openings with their ECO code as appearing in the "Encyclopedia of Chess Openings" [12,17]. Each ECO code corresponds to two nodes in the network since we distinguish between playing with White pieces or with Black pieces. For instance, if a game between player A with White and player B with Black falls under ECO code C20 (King's pawn game), then A is connected to the opening C20W (King's pawn game with White), while B is connected to the opening C20B (King's pawn game with Black). In this first part of the analysis, we selected a sub-sample of chess players with a rating above 2000 who played at least 100 games with Black and 100 games with White during the period considered, from October 2015 to September 2016 (one year). This way, we ended up with a network composed of 2513 players and 982 openings (no player played 18 of the 1,000 openings during the time span we analyzed). The matrix $M$ thus satisfies $M_{po} = 1$ if player $p$ played opening $o$ and $M_{po} = 0$ otherwise.
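As an illustration of the construction described above, the following minimal sketch builds the player-opening matrix from a list of games. The record layout `(white_player, black_player, eco_code)` and the function names are assumptions made for this example and are not taken from the Lichess data format; the row and column sums of the matrix give the diversification of players and the ubiquity of openings.

```python
from collections import defaultdict


def build_bipartite(games):
    """Build the binary player-opening matrix M from game records.

    `games` is an iterable of (white_player, black_player, eco_code)
    tuples; this record layout is a hypothetical one chosen for the sketch.
    """
    played = defaultdict(set)
    for white, black, eco in games:
        played[white].add(eco + "W")  # opening seen from White's side
        played[black].add(eco + "B")  # same ECO code seen from Black's side
    players = sorted(played)
    openings = sorted({o for used in played.values() for o in used})
    M = [[1 if o in played[p] else 0 for o in openings] for p in players]
    return players, openings, M


def diversification(M):
    """Number of distinct openings used by each player (row sums)."""
    return [sum(row) for row in M]


def ubiquity(M):
    """Number of players using each opening (column sums)."""
    return [sum(col) for col in zip(*M)]


games = [("A", "B", "C20"), ("B", "A", "B01"), ("A", "C", "C20")]
players, openings, M = build_bipartite(games)
```

In this toy example player A is linked to C20W and B01B, mirroring the C20W/C20B convention used in the text.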

The network of chess openings
Relatedness networks have been widely used in the economic complexity literature to quantify the similarity between nodes belonging to the same layer of bipartite networks [18]. For instance, considering the country-product network, the assumption is that two products require similar capabilities for being produced if they appear together in the export basket of many different countries or, in other words, if they co-occur often. In the same way, here we build the relatedness network of chess openings leveraging the idea that two of them are similar to play if many players play them both. We thus define the relatedness $W^*_{o_1 o_2}$ between two openings $o_1$, $o_2$ as

$$W^*_{o_1 o_2} = \sum_{p} M_{p o_1} M_{p o_2}. \tag{1}$$

However, the resulting matrix $W^*$ contains many spurious co-occurrences. Two openings may often co-occur just because players with high diversification use them both by chance or because the openings are ubiquitous but not similar. In order to filter out such spurious co-occurrences, we exploit a null model, namely the Bipartite Configuration Model (BiCM) [19]. The idea is to retain only those links that cannot be explained solely through the ubiquity and the diversification of nodes and that thus are statistically significant; we report more details in Methods. In the following we denote by $W$ the matrix resulting from this procedure, whose elements satisfy

$$W_{o_1 o_2} = \begin{cases} 1 & \text{if } W^*_{o_1 o_2} \text{ is statistically significant} \\ 0 & \text{otherwise} \end{cases} \tag{2}$$

and we call relatedness network of openings, or simply network of openings, the network defined by such matrix.
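The co-occurrence count behind the relatedness matrix can be sketched in a few lines. This is a minimal illustration, assuming that relatedness is the number of players who use both openings and that the diagonal is set to zero; the function name is hypothetical.

```python
def cooccurrence(M):
    """Count, for each pair of openings, how many players use both.

    M is a list of rows (one per player) with 0/1 entries (one per opening).
    The diagonal is left at zero, which is a choice made for this sketch.
    """
    n_openings = len(M[0])
    W_star = [[0] * n_openings for _ in range(n_openings)]
    for row in M:
        used = [o for o, flag in enumerate(row) if flag]
        for a in used:
            for b in used:
                if a != b:
                    W_star[a][b] += 1
    return W_star


# Three players, four openings.
M = [[1, 0, 0, 1], [0, 1, 1, 0], [1, 0, 1, 1]]
W_star = cooccurrence(M)
```

Here openings 0 and 3 are used together by two players, so their raw relatedness is 2; whether such a value is significant is decided only after comparison with the null model.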
We show the network of chess openings in Fig. 1. It is composed of 924 nodes out of the initial 982 since we filtered out isolated nodes and small components formed by pairs of nodes. We then applied the Leiden algorithm [20] to this network to detect communities, which we indicate with different colours in the figure. There are ten clusters, three corresponding to openings played from the White perspective and the remaining seven to those played from the Black perspective. The three White clusters almost perfectly correspond to the three main choices White has as the first move: 1. e4 (65% of games), 1. d4 (24% of games), 1. c4 (3% of games) (the percentages come from the Lichess database, https://lichess.org/analysis#0). More precisely, they are:
• King's pawn opening (light blue), where White's first move consists in advancing the king's pawn by two squares (1.e4)
• Queen's pawn opening (red), characterized by White choosing as first move to advance the queen's pawn by two squares (1.d4)
• English opening (light green), where White opens by moving the c2 pawn by two squares (1.c4). The Reti opening (1.Nf3) is also contained in this cluster since it often involves advancing the c2 pawn by two squares as second move (2.c4)
Black clusters also map nicely to Black's main choices, but their number is higher since Black's reply also depends on White's first move. The clusters are:
• e5 openings (khaki), where Black plays the move e5 as reply to a King's pawn opening (1.e4 e5) or as reply to the English opening (1.c4 e5)
• e6 openings (dark blue), which can be divided in two main categories, as also evident from the shape of the cluster: the French opening (1.e4 e6) on the right, where e6 comes as reply to the King's pawn opening, and Queen's pawn game openings (Catalan, Bogo-Indian, Queen's Indian and Nimzo-Indian) on the left, where Black plays e6 as second move after White has opened with her Queen's pawn (1.d4 Nf6 2.c4 e6)
• Sicilian defense (orange), characterized by Black replying with c5 to White's King's pawn opening (1.e4 c5). This community also contains other openings characterized by Black playing c5 such as the Symmetric English (1.c4 c5).
It is worth remarking that both these openings are very similar to the Benoni defense, and this explains their closeness in the network.

Forecast of future openings
As explained in the previous section, the relatedness between two openings quantifies how similar to play they are. Thus, the opening network allows measuring the distance between openings. We already assessed this qualitatively by inferring the clustered structure of the network. Now, we go a step further by showing that we can use the opening network to predict which openings a player will start to play in the future. The idea is that those openings close to a player's opening repertoire are easier for her to learn. We sketch this situation in Fig. 2a. We plotted in green the openings a randomly selected player used during July 2016-September 2016 and in red those openings she started to play during the following three months, while small grey dots are all the remaining openings the player did not use. We observe that new openings are close to those used in the first time interval, giving rise to a sort of "adjacent possible" effect [21,22]. The adoption of a brand new opening opens the possibility of playing new openings which are connected to it: the learning of Opening Theory can be seen as an expansion into the "adjacent possible" identified by the opening network.
In order to quantify the role played by the closeness of the opening network in the learning of new openings, we considered a sample of 8831 players, forecasting the activations of new openings. We report the details about the data used in the Methods. We denote by $M^{(i)}_{po}$ the adjacency matrix obtained considering the matches played by these players in the period July 2016-September 2016 and by $M^{(f)}_{po}$ the matrix corresponding to October 2016-December 2016. In these terms, the activations $a$ are those openings not played in the first period, that is $M^{(i)}_{pa} = 0$. We then define the density $\rho_{pa}$ of activation $a$ for player $p$ as [18]

$$\rho_{pa} = \frac{\sum_{o} W_{ao} M^{(i)}_{po}}{\sum_{o} W_{ao}},$$

where $W$ is the adjacency matrix of the openings network previously defined. The density $\rho_{pa}$ is thus the fraction of openings connected to $a$ that player $p$ already used in the first time period. We define the transition probability $P(\rho)$ as the probability that an activation with density $\rho$ is played in the second period considered, i.e.

$$P(\rho) = \Pr\left(M^{(f)}_{pa} = 1 \mid \rho_{pa} = \rho\right).$$

This quantity is plotted as a function of the density in Fig. 2b. We see that the transition probability is an increasing function of the density and for large values of $\rho$ the probability for an opening to start to be played is about four times larger than at $\rho \approx 0$. This plot indicates that the density can be used to forecast the adoption of new openings. Therefore, we define the transition predictor $y^{\mathrm{pred}}_{pa}$ as

$$y^{\mathrm{pred}}_{pa} = \begin{cases} 1 & \text{if } \rho_{pa} \geq \beta \\ 0 & \text{otherwise} \end{cases} \tag{3}$$

where $\beta$ is the density threshold separating openings that are predicted to be used from those that are predicted to remain unused. We tested the performance of this predictor on the activations using as ground truth the bipartite matrix of the second period $M^{(f)}_{pa}$. Its Best F1 Score, corresponding to $\beta = 0.2$, is 0.16, to be compared to a Best F1 Score of 0.04 obtained with a random predictor. We report the definition of the Best F1 Score and more details about the predictor's performance in the Methods. This score, even if not notably high, is a good result for several reasons. First, we notice that players usually have many possible openings to start to play, but they use only a few. Second, in the similar context of the bipartite network country-product, state-of-the-art machine learning techniques reach a Best F1 Score ≈ 0.04 [16]. Finally, here we are not interested in obtaining the best forecast possible. Instead, we want to demonstrate that our opening network contains useful information.
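The density-based forecast can be illustrated with a short sketch. It assumes that the density is the fraction of network neighbours of an opening that the player already uses, that openings with no neighbours get density zero, and that an opening is predicted when its density is at least the threshold; the function names and the toy network are hypothetical.

```python
def density(W, M_i, p, a):
    """Fraction of the neighbours of opening `a` already used by player `p`.

    W is the 0/1 adjacency matrix of the opening network and M_i the 0/1
    player-opening matrix of the first period. Returning 0.0 for an opening
    without neighbours is an assumption of this sketch.
    """
    neighbours = [o for o, link in enumerate(W[a]) if link]
    if not neighbours:
        return 0.0
    return sum(M_i[p][o] for o in neighbours) / len(neighbours)


def predict_activations(W, M_i, beta):
    """Predict, for every opening a player has not used, whether it will be adopted."""
    predictions = {}
    for p, row in enumerate(M_i):
        for a, used in enumerate(row):
            if not used:  # only openings absent in the first period are candidates
                predictions[(p, a)] = 1 if density(W, M_i, p, a) >= beta else 0
    return predictions


# Toy opening network: a chain 0 - 1 - 2 - 3.
W = [[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]]
M_i = [[1, 0, 0, 0]]  # a single player who only uses opening 0
predictions = predict_activations(W, M_i, beta=0.2)
```

In the toy chain only opening 1, which is adjacent to the opening already in the repertoire, has non-zero density and is therefore the only predicted adoption.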

The fitness of players and the complexity of openings
Not all openings are equally easy to play since some require a deep knowledge of chess theory. This is an aspect our opening network does not capture, but, as we will show, we can exploit the information contained in the player-opening bipartite network to estimate how difficult to play an opening is. First of all, we introduce the normalized matrix $N$ whose entries $N_{op}$ are given by the fraction of games in which player $p$ used opening $o$,

$$N_{op} = \frac{n_{op}}{\sum_{o'} n_{o'p}},$$

where $n_{op}$ is the number of times player $p$ chose opening $o$. This matrix defines a bipartite network of players and openings with links weighted by the frequencies of the played openings. We then exploit the Economic Fitness and Complexity (EFC) algorithm [23] to compute the complexity of openings $Q_o$ and the fitness of players $F_p$. The former quantifies how tough to play openings are, while the latter measures the opening skills of players. The EFC algorithm is a recursive non-linear map which has been successfully applied to rank nodes of bipartite networks [23-26] and which, in its original form, is defined by the following map

$$Q_o^{(t+1)} = \frac{1}{\sum_{p} N_{op} / F_p^{(t)}}, \qquad F_p^{(t+1)} = \sum_{o} N_{op}\, Q_o^{(t)}, \tag{4}$$

where by $t$ we denote the iteration step, while $Q_o$ and $F_p$ are given by the fixed point of such a map. The original map of Eqs. (4) suffers from convergence issues, so we use a slightly different map, the non-homogeneous EFC (NHEFC), which delivers almost the same results but with no convergence problems [27]. We report the details of the NHEFC algorithm, its convergence and its implementation in the Methods. We add here a few considerations about Eqs. (4). The first expression implies that an opening has low complexity if low-fitness players play it. This is no surprise since one expects low-fitness players to use only simple-to-play openings. The second expression states that the fitness of a player is the weighted average of the complexity of the openings she uses, with weights given by the frequencies of the played openings. Note that this differs from the standard EFC algorithm. In the original formulation, the fitness at the first iteration is given by the diversification, $F^{(1)}_p = d_p$, while in our implementation $F^{(1)}_p = 1$. As already mentioned, openings play a significant role in chess games, so we expect players with high fitness also to have a high rating. In order to assess if this is the case, we considered a sample of 18,253 players. We built the corresponding frequency matrix $N$ and we applied the NHEFC algorithm. Again, details on the data are available in the Methods. In Fig. 3, we show the scatter plot with players' rating against their fitness. There is a strong correlation between these two quantities, as confirmed by a Spearman correlation coefficient of 0.64. In the inset, we report the average rating as a function of the fitness (error bars defined by the standard deviation). Remarkably, there are two flat regions corresponding to low-rated and high-rated players, respectively, in which an increase in the fitness does not affect the rating. This suggests that beginners should first learn the basic ideas of chess before focusing on opening theory. At the same time, for high-rated players, it is hard to improve by focusing on openings alone since other aspects, such as endgames or time management, also start to be very relevant. Finally, we studied the complexity's meaningfulness by analyzing how players' opening repertoire changes depending on their rating. This is done in Figs. 4a, 4b and 4c, where we considered players with rating in the ranges 1500-1600, 1900-2000 and 2300-2400 and we plotted their average opening repertoire on the opening network. The sizes of nodes are inversely proportional to their complexity so that easy-to-play openings are represented as large circles, while dark colours indicate frequently used openings. We see that the opening repertoire of low-rated players is concentrated mainly on low-complexity openings, which are less frequently used by high-rated players. Analogously, Fig. 4d shows the opening repertoire of world champion Magnus Carlsen (nickname DrNykterstein): here, some of the less complex openings are completely absent, while several small nodes, corresponding to more complex openings, are dark and so frequently used. This confirms that complexity is a good indicator to quantify the difficulty of openings.

Figure 4: Complexity of openings. a) Average opening repertoire of low-rated chess players visualized using the opening network. We selected all players with ratings between 1500 and 1600, and for each opening, we computed how frequently they use it. Dark colours correspond to frequently used openings, while light colours to infrequently used ones. The size of nodes is inversely proportional to the openings' complexity, so small nodes are hard-to-play openings, while big nodes are easy-to-play openings. We see that low-rated players tend to use low-complexity openings frequently. b) As panel a, but for players with ratings between 1900 and 2000. c) As panel a, but for players with ratings between 2300 and 2400. d) As panel a, but considering the opening repertoire of world champion Magnus Carlsen. As a player's rating increases, low complexity openings tend to be used less frequently, while the frequency of more complex openings increases.
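To make the role of the frequency matrix concrete, the sketch below normalizes raw usage counts per player and applies one update of a fitness-complexity map. The specific form of the update (complexity as the inverse of the frequency-weighted sum of inverse fitnesses, fitness as the frequency-weighted sum of complexities, no normalization step) is an assumption of this example, as are the function names; every opening is assumed to be played by at least one player.

```python
def frequency_matrix(counts):
    """Turn usage counts into frequencies; counts[p][o] = games of player p with opening o."""
    N = []
    for row in counts:
        total = sum(row)
        N.append([c / total for c in row])  # each player's row sums to one
    return N


def efc_step(N, F, Q):
    """One update of the assumed fitness-complexity map, using the previous F and Q."""
    n_players, n_openings = len(F), len(Q)
    new_Q = [
        1.0 / sum(N[p][o] / F[p] for p in range(n_players))
        for o in range(n_openings)
    ]
    new_F = [
        sum(N[p][o] * Q[o] for o in range(n_openings))
        for p in range(n_players)
    ]
    return new_F, new_Q


N = frequency_matrix([[3, 1], [0, 4]])
F1, Q1 = efc_step(N, F=[1.0, 1.0], Q=[1.0, 1.0])
```

Because each player's frequencies sum to one, starting from unit complexities gives a fitness equal to 1 for every player after the first update, whereas a binary matrix would give the diversification instead.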

DISCUSSION
Chess is probably the most fascinating board game, and, despite its old origins, a considerable number of people still play it all around the world. Opening Theory is one of the most complex aspects of this game and requires years of study and practice to be mastered. As a consequence, amateurs only have minimal knowledge of chess openings. However, as we showed in this work, it is possible to extract information about the Opening Theory that goes well beyond the knowledge of single players by considering a whole online chess community. This allows us to analyze aspects and answer questions that otherwise would require the help of chess experts. As a first step, we use the player-opening bipartite network to obtain the one-layer relatedness network of chess openings. Two openings linked in this network are similar to play; thus, the opening network quantifies the distance among chess openings. We obtain ten clusters by applying a community detection algorithm to such a network. Three of them contain openings seen from White's perspective and almost perfectly correspond to White's three main choices for her first move (1.e4, 1.d4, 1.c4). The remaining seven correspond to Black's most common replies to White's first move. This structure is non-trivial and cannot be directly derived from the Encyclopedia of Chess Openings (ECO) classification, thus showing that our entirely data-driven network-based approach allows unveiling hidden similarities between chess openings. We then exploit the opening network to forecast which openings a player will start to play in the future. We do this relying on the assumption that those openings that are "surrounded" by openings the player already uses should be easy to learn since they are similar to what she already knows. In practical terms, we introduce the density, which measures the surroundedness of openings, and we show that the probability for an opening to be used is an increasing function of this quantity. Forecasts based on the density reach a Best F1 Score about four times larger than that obtained by a random predictor. Finally, we exploit a variant of the Economic Fitness and Complexity algorithm to obtain a data-driven definition of openings' complexity and players' fitness. The latter quantifies how skilled players are in openings and shows a 0.64 correlation with the rating of players. The former measures how challenging to play openings are, and we demonstrate its meaningfulness by showing that low-rated players tend to focus on low-complexity openings, while high-rated players also exploit complex openings.
We conclude by pointing out that the analysis here discussed also opens the possibility of devising personalized recommendations for chess players. For instance, by leveraging the opening network, it is possible to suggest to players openings they could learn easily and, taking into account their fitness and the complexity of openings, such suggestions can be modulated based on players' skill. Moreover, the opening network combined with the complexity allows visualizing in a single image the opening repertoire of a player, thus making it possible to understand weaknesses in her opening preparation or compare two players effortlessly. This makes us think these tools can be helpful to scientists interested in studying the game of chess and to any chess player willing to improve their opening repertoire.

Data
We used data gathered from the online chess platform lichess.org for carrying out our analysis. These data are freely available at https://database.lichess.org/.

Definition of Blitz games
We only selected Blitz games since this format is the most played online. Two integer numbers define the time control of one player, e.g., "X+Y", where X is the clock initial time in minutes and Y is the clock increment in seconds. Lichess considers a game to fall in the Blitz category if the estimated time of an average match, which is supposed to end in 40 moves per player, $T = X \times 60 + 40 \times Y$ in seconds per player, is such that $179 \leq T < 479$. For instance, the estimated duration of a "5+4" game is $5 \times 60 + 40 \times 4 = 460$ seconds for each player, so that "5+4" games belong to the Blitz category.
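The Blitz criterion above translates directly into a small helper. This is only a sketch: the function names are hypothetical, the "X+Y" string format follows the notation of this section, and the bounds are the ones quoted in the text.

```python
def estimated_duration(time_control):
    """Estimated time per player in seconds for an 'X+Y' time control.

    X is the initial clock time in minutes and Y the increment in seconds;
    a game is assumed to last 40 moves per player, as stated in the text.
    """
    minutes, increment = time_control.split("+")
    return int(minutes) * 60 + 40 * int(increment)


def is_blitz(time_control):
    """Apply the bounds quoted in the text: 179 <= T < 479 seconds."""
    return 179 <= estimated_duration(time_control) < 479


examples = {tc: is_blitz(tc) for tc in ["5+4", "3+0", "1+0", "10+0"]}
```

With these bounds "5+4" and "3+0" are classified as Blitz, while "1+0" is too short and "10+0" too long.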

Players' strength
Lichess estimates the ability of players with the Glicko-2 rating system [28]. After each rated match, the Glicko-2 value of both players changes according to the result of the match. Generally, the winner's Glicko-2 value increases. Therefore, the higher the Glicko-2 value, the more skilled the player. Many gaming platforms use the Glicko-2 system. With respect to the Elo system adopted by the International Chess Federation (FIDE, from its French name Fédération Internationale des Échecs), Glicko-2 takes into account confidence intervals, i.e., an uncertainty in the assigned rating.

List of chess openings
We collect the list of 500 chess openings with their ECO code according to the "Encyclopaedia Of Chess Openings" [12,17]. A comprehensive list of openings can be extracted from https://en.wikipedia.org/wiki/List_of_chess_openings. We consider each opening from the point of view of both the White player and the Black player, so that, in fact, the total number of openings in our analysis is 1,000.

Bipartite network and relatedness network
In order to build the bipartite network we used Blitz games played from October 2015 to September 2016 and we applied the following filtering procedure:
• we selected only players with rating above 2000 because we expect low-rated players to use some openings just by chance; considering them as well would add noise to the projected opening network;
• we selected the games where the rating of the two players differs at most by 50, since if the difference between the two players is very high, then the high-rated player could use a bad opening just to have more fun;
• we removed players playing fewer than 100 games with White and 100 games with Black, so as to have large statistics for each player.
We ended up with a total of 472,183 games involving 2,513 players and 982 different openings.

Forecast
In order to forecast the use of openings we used two different time periods. We used Blitz game data from July 2016 to September 2016 to compute the density, while the goodness of predictions has been assessed using data ranging from October 2016 to December 2016. Note that this last period does not overlap with that considered in the building of the opening network, this being an important point in order to reliably assess the goodness of the predictions. Also in this case, we retained only games with a maximum rating difference of 50 and we selected only players who played at least 50 games with the White pieces and 50 games with the Black ones, so as to have large enough statistics for computing the density, ending up with a total of 8,831 players.

Fitness and Complexity
The fitness of players and the complexity of openings have been obtained using games played in the period October 2015 to September 2016. Differently from what was done for building the bipartite network, we considered all ratings, but we applied the same filtering with respect to the rating difference and the number of games played. This gives us a total of 3,746,135 games played between 18,253 players who used 988 different openings. We also used the Lichess Elite Database (https://database.nikonoel.fr/) to get 138 games played by Magnus Carlsen in November 2021.

Validation of projected networks
In order to build the relatedness network of chess openings we project the bipartite matrix $M_{po}$ connecting players to openings. As explained in the main text, this can be done by using Eq. (1), that is

$$W^*_{o_1 o_2} = \sum_{p} M_{p o_1} M_{p o_2}.$$

In this way, we obtain the matrix $W^*$ connecting those openings appearing together in the opening repertoires of many different players. However, such a matrix is generally almost fully connected due to spurious co-occurrences.
Consequently, one has to use a null model to retain only statistically significant links and filter out the spurious ones. In this work, we exploit the Bipartite Configuration Model (BiCM) [19,29,30], which is based on the theory of exponential random graphs; in particular we used the Python bicm library (https://bipartite-configuration-model.readthedocs.io/en/latest/).
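The BiCM is based on a canonical ensemble of random graphs defined by constraining (on average) the degree sequences of both node sets (so ubiquity and diversification). We then obtain the probability distribution of such an ensemble by maximizing Shannon entropy under this constraint. This probability distribution reads

$$P(\tilde{M}|\{\theta_p\},\{\mu_o\}) = \frac{e^{-H(\tilde{M}|\{\theta_p\},\{\mu_o\})}}{Z(\{\theta_p\},\{\mu_o\})},$$

where
• $\tilde{M}$ is the adjacency matrix of the random bipartite network
• $\{\theta_p\}$ and $\{\mu_o\}$ are the Lagrange multipliers associated respectively to the diversification of players $\{d_p\}$ and to the ubiquity of openings $\{u_o\}$
• $H(\tilde{M}|\{\theta_p\},\{\mu_o\})$ is the Hamiltonian, defined as $H(\tilde{M}|\{\theta_p\},\{\mu_o\}) = \sum_p \theta_p d_p(\tilde{M}) + \sum_o \mu_o u_o(\tilde{M})$
• $Z(\{\theta_p\},\{\mu_o\})$ is the partition function of the Hamiltonian $H(\tilde{M}|\{\theta_p\},\{\mu_o\})$.
It can be shown that the probability distribution factorizes and the probability for player $p$ and opening $o$ to be connected in the random network is

$$p_{po} = \frac{1}{1 + e^{\theta_p + \mu_o}},$$

while the numerical values of the Lagrange multipliers are obtained by solving the system

$$\sum_{o} p_{po} = d_p \quad \forall p, \qquad \sum_{p} p_{po} = u_o \quad \forall o.$$

Once we have the probability of the links we can generate N random bipartite networks and, for each of them, compute the projected matrix $W^*$. At this point, we validate the links of $W^*$, setting to one all those links which are in 99% of the cases larger than the corresponding link in the random matrices. We set to zero the rest of the links. In this way, we obtain the validated matrix $W$ defined in Eq. (2). Here the threshold 99% is arbitrary and sets the confidence level.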
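The validation step can be sketched as a simple Monte Carlo procedure. The code below assumes that the link probabilities are already available as a matrix `prob` (in practice they would come from solving the null model, which is not shown here), and all function names are hypothetical; it is not the interface of the bicm library.

```python
import random


def cooccurrence(M):
    """Project a players x openings 0/1 matrix onto the openings layer."""
    n = len(M[0])
    W = [[0] * n for _ in range(n)]
    for row in M:
        used = [o for o, flag in enumerate(row) if flag]
        for a in used:
            for b in used:
                if a != b:
                    W[a][b] += 1
    return W


def sample_bipartite(prob, rng):
    """Draw one random bipartite network with independent link probabilities."""
    return [[1 if rng.random() < q else 0 for q in row] for row in prob]


def validate_links(M, prob, n_samples=200, confidence=0.99, seed=0):
    """Keep a link only if the observed co-occurrence exceeds the random one
    in at least `confidence` of the sampled networks."""
    rng = random.Random(seed)
    W_star = cooccurrence(M)
    n = len(W_star)
    exceed = [[0] * n for _ in range(n)]
    for _ in range(n_samples):
        W_rand = cooccurrence(sample_bipartite(prob, rng))
        for a in range(n):
            for b in range(n):
                if W_star[a][b] > W_rand[a][b]:
                    exceed[a][b] += 1
    needed = confidence * n_samples
    return [[1 if exceed[a][b] >= needed else 0 for b in range(n)] for a in range(n)]


M = [[1, 1, 0], [1, 1, 0], [0, 0, 1]]
never = [[0.0] * 3 for _ in range(3)]   # degenerate null model: no links at all
always = [[1.0] * 3 for _ in range(3)]  # degenerate null model: all links present
W_sparse_null = validate_links(M, never, n_samples=10)
W_dense_null = validate_links(M, always, n_samples=10)
```

The two degenerate probability matrices are used only to make the outcome deterministic: against an empty null model every observed co-occurrence is validated, while against a fully connected one none is.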

Goodness of prediction test
Using the density measure we introduced above, it is possible to predict if a player will start to use a certain opening. Here we discuss how to evaluate the goodness of these predictions. We recall that our predictions, denoted by $y^{\mathrm{pred}}_{pa}$, are defined by Eq. (3) and are obtained using data in the period July 2016-September 2016, while the ground truth is obtained from data in the period October 2016-December 2016 and is defined as $y^{\mathrm{true}}_{pa} = M^{(f)}_{pa}$. We recall that by $a$ we denote the activations, so those openings not used by player $p$ during the first period; $\beta$ is the density threshold separating the openings that we predict will be played from those that we predict will not. The most common metrics to evaluate the goodness of predictions are [31]:
• Precision, defined as the ratio between true positives and positives (true positives plus false positives). In our case it is the ratio between the number of activations we correctly predict to be played in the second period and the total number of activations we predict to be played. High Precision means that openings that are predicted to be played are often played in the second period;
• Recall, given by the ratio of true positives and the sum of true positives and false negatives. A high Recall implies that openings that are played in the second period are rarely predicted not to be played;
• F1 Score, defined as the harmonic mean of Precision and Recall. The F1 Score is particularly suitable when the data are unbalanced, meaning that there are many more negatives than positives (or vice versa). A high F1 Score thus implies that both the Precision and the Recall are also high.
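The three metrics listed above, together with a scan over candidate thresholds, can be sketched as follows. The function names, the convention of returning zero when a ratio is undefined, and the toy data are assumptions of this example.

```python
def precision_recall_f1(y_true, y_pred):
    """Precision, Recall and F1 Score for binary labels (0.0 when undefined)."""
    tp = sum(1 for t, y in zip(y_true, y_pred) if t == 1 and y == 1)
    fp = sum(1 for t, y in zip(y_true, y_pred) if t == 0 and y == 1)
    fn = sum(1 for t, y in zip(y_true, y_pred) if t == 1 and y == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def best_f1(y_true, densities, thresholds):
    """Scan the thresholds and return (best F1 Score, threshold achieving it)."""
    best_score, best_beta = 0.0, None
    for beta in thresholds:
        y_pred = [1 if rho >= beta else 0 for rho in densities]
        f1 = precision_recall_f1(y_true, y_pred)[2]
        if f1 > best_score:
            best_score, best_beta = f1, beta
    return best_score, best_beta


y_true = [1, 0, 1, 0]                # was the opening adopted in the second period?
densities = [0.9, 0.1, 0.4, 0.5]     # density of each candidate opening
score, beta = best_f1(y_true, densities, thresholds=[0.2, 0.45, 0.8])
```

On this toy data the lowest threshold gives the highest F1 Score, since it recovers both adopted openings at the cost of a single false positive.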
All these indicators depend on the threshold $\beta$. We then compute the Best F1 Score using the threshold $\beta$ which maximizes the F1 Score, that is

$$\text{Best F1 Score} = \max_{\beta} \text{F1 Score}(\beta).$$

In the case under consideration the best threshold is $\beta = 0.2$. Using this threshold we obtain
• Precision 0.10
• Recall 0.47
• Best F1 Score 0.16

The Economic Fitness and Complexity algorithm
The Economic Fitness and Complexity (EFC) algorithm [23,32] is an iterative non-linear map initially designed to study the Country-Product bipartite network. It allows one to compute the fitness, which is an indicator of the manufacturing capabilities of a country, and the complexity, which quantifies how sophisticated and challenging it is to produce a good. This approach outperforms other techniques and allows obtaining state-of-the-art long-term GDP forecasts [14]. In our study, we apply this algorithm to the player-opening bipartite network, thus associating to each player $p$ a fitness $F_p$. The higher the fitness, the more challenging the openings the player plays. To each opening $o$ we associate a complexity $Q_o$, which measures how difficult that opening is to play. These quantities, as mentioned above, are defined through a non-linear map given by Eqs. (4). The iteration of Eqs. (4) leads to a fixed point which has been proved to be stable and non-dependent on initial conditions [33]. However, the EFC algorithm as defined above has convergence issues in some situations, so we decided to follow the Servedio et al. approach [27] to estimate players' fitness and the complexity of openings. Instead of Eqs. (4) we use the non-homogeneous map introduced there, where $\delta$ is a parameter that can be taken arbitrarily small and does not influence the fixed point of the map; here we use $\delta = 10^{-3}$. We denote by $P_o$ and $F_p$ the fixed point of the map; in these terms the complexity of openings $Q_o$ is recovered from $P_o$, while the fitness of players is simply given by $F_p$.
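A possible implementation of a non-homogeneous fitness-complexity iteration is sketched below. The form of the map used here, $F_p \leftarrow \delta + \sum_o N_{op}/P_o$ and $P_o \leftarrow 1 + \sum_p N_{op}/F_p$ with $Q_o = 1/(P_o - 1)$, is our reading of the approach of Ref. [27] and should be treated as an assumption, as should the unit initial conditions, the fixed number of iterations and the function name; every opening is assumed to be played by at least one player.

```python
def nhefc(N, delta=1e-3, n_iter=500):
    """Iterate an assumed non-homogeneous fitness-complexity map.

    N[p][o] is the frequency with which player p uses opening o.
    Returns the player fitnesses F and the opening complexities Q.
    """
    n_players, n_openings = len(N), len(N[0])
    F = [1.0] * n_players
    P = [1.0] * n_openings
    for _ in range(n_iter):
        # Both updates use the values of the previous iteration.
        new_F = [
            delta + sum(N[p][o] / P[o] for o in range(n_openings))
            for p in range(n_players)
        ]
        new_P = [
            1.0 + sum(N[p][o] / F[p] for p in range(n_players))
            for o in range(n_openings)
        ]
        F, P = new_F, new_P
    Q = [1.0 / (P[o] - 1.0) for o in range(n_openings)]
    return F, Q


# Player 0 uses both openings; player 1 only ever plays opening 1.
N = [[0.75, 0.25], [0.0, 1.0]]
F, Q = nhefc(N)
```

In this toy case opening 0 is used only by the player with the broader repertoire and ends up with the higher complexity, while opening 1, which is also the sole choice of the other player, is ranked as easier.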

Figure 1: Network of openings. Relatedness network of openings obtained by projecting and validating the player-opening bipartite network. Openings close on this network are similar to play since they often appear together in players' opening repertoires. Using the Leiden community detection algorithm, we identified ten clusters that compose the network. These clusters are represented in the network using different colors, each corresponding to a different opening choice.

$$M_{po} = \begin{cases} 1 & \text{if player } p \text{ played opening } o \\ 0 & \text{otherwise} \end{cases}$$
We can then define for each player $p$ their diversification $d_p$ as the number of distinct openings they use, $d_p = \sum_{o=1}^{O} M_{po}$, and for each opening $o$ its ubiquity $u_o$ as the number of players who used it, $u_o = \sum_{p=1}^{P} M_{po}$.

Figure 2: Prediction of future openings. a) We plot, on top of the opening network of Fig. 1, the openings a randomly selected player used during the period July 2016-September 2016 (green nodes) and the new openings he/she used in the period October 2016-December 2016 (red nodes). As can be seen, new openings are close to those openings the player already used. b) Probability for a never used opening to start to be played as a function of its density, defined as the fraction of neighbour openings the player already uses. This probability is increasing in density, meaning players tend to learn openings close to those they already know. The network topology defines closeness.

Figure 3: Fitness of players. Comparison between players' fitness and players' Glicko-2 rating. We see a strong correlation between the two quantities, meaning that players' opening preparation is crucial in determining their ability to win games and so to reach a high rating. In the inset, we show the binned average rating as a function of the fitness. The standard deviation of the sample determines error bars.
Instead of Eqs. (4) we use the non-homogeneous map
$P(\tilde{M}|\{\theta_p\},\{\mu_o\}) = e^{-H(\tilde{M}|\{\theta_p\},\{\mu_o\})} / Z(\{\theta_p\},\{\mu_o\})$, where
• $\tilde{M}$ is the adjacency matrix of the random bipartite network
• $\{\theta_p\}$ and $\{\mu_o\}$ are the Lagrange multipliers associated respectively to the diversification of players $\{d_p\}$ and to the ubiquity of openings $\{u_o\}$
• $H(\tilde{M}|\{\theta_p\},\{\mu_o\})$ is the Hamiltonian, defined as