Collective dynamics of stock market efficiency

Summarized by the efficient market hypothesis, the idea that stock prices fully reflect all available information is always confronted with the behavior of real-world markets. While there is plenty of evidence indicating and quantifying the efficiency of stock markets, most studies assume this efficiency to be constant over time so that its dynamical and collective aspects remain poorly understood. Here we define the time-varying efficiency of stock markets by calculating the permutation entropy within sliding time-windows of log-returns of stock market indices. We show that major world stock markets can be hierarchically classified into several groups that display similar long-term efficiency profiles. However, we also show that efficiency ranks and clusters of markets with similar trends are only stable for a few months at a time. We thus propose a network representation of stock markets that aggregates their short-term efficiency patterns into a global and coherent picture. We find this financial network to be strongly entangled while also having a modular structure that consists of two distinct groups of stock markets. Our results suggest that stock market efficiency is a collective phenomenon that can drive its operation at a high level of informational efficiency, but also places the entire system under risk of failure.


Results
Our results are based on the daily closing indices of 43 major world stock markets from January 2000 to October 2020 (see the "Methods" section for details and Table S1 for the list of market indices). These historical time series represent the stock market indices after adjustments for all applicable splits and dividend distributions (daily adjusted closing prices). From these time series, we estimate the log-returns R(t) of each market index (see the "Methods" section for details), where t stands for the closing date. Figure 1A illustrates the time evolution of R(t) for the S&P 500 index (an influential indicator of the USA equity market), while Fig. S1 shows the behavior of R(t) for all stock markets in our study. Next, we sample the log-return series with a 500-day sliding window (shaded gray in Fig. 1A), roughly corresponding to 2 years of economic activity. The sliding window moves with daily steps, and for every step, we estimate the normalized permutation entropy H(t) 32. This procedure creates a time series of the permutation entropy H(t) for each stock market, as shown in Fig. 1B for the S&P 500 (see Fig. S2 for all markets). As detailed in the "Methods" section, the permutation entropy is estimated from ordering patterns among consecutive values of R(t) and quantifies the degree of randomness in the occurrence of these patterns. We expect an entirely regular series to have H ≈ 0, while a completely random series displays H ≈ 1. Thus, the higher the value of H(t), the more random the log-return series is around time t, and so the more informationally efficient the stock market is at that particular time. Conversely, a decrease in H(t) indicates the emergence of a more regular (and possibly more predictable) behavior of the log-return series, and thus a less efficient period of the stock market. We have also estimated the average behavior of the efficiency H(t) over all stock markets.
Figure 1C shows that the aggregate behavior is smoother than the behavior observed for individual stock markets and appears to reflect major financial events such as the "global financial crisis" (2007–2008), as H(t) displays lower values around this period.
Investors' behavior tends to synchronize during stock market crashes 5, and investment strategies can propagate shocks through financial networks and lead to the emergence of strong correlations among financial markets 33. Similarly, we expect these collective behaviors to strongly affect the efficiency dynamics of stock markets and produce joint movements in H(t) that may organize markets into hierarchical structures with similar efficiency trends. To investigate this possibility, we first estimate the correlation distance of the efficiency time series among all pairs of markets (see the "Methods" section for details), creating the correlation distance matrix shown in Fig. 2A. Next, we use Ward's minimum variance method (see the "Methods" section for details) to build a dendrogram representation of the distance matrix, which is also depicted in Fig. 2A. Our results indicate that stock markets form a hierarchical structure regarding their long-term efficiency evolution. However, we do not observe large clustering structures in the distance matrix. Indeed, when we determine the number of clusters by maximizing the silhouette score (as shown in Fig. 2B, see the "Methods" section for details), we end up with 16 clusters, of which 15 consist of only a few markets (the largest cluster has 5 markets) and 1 contains a single market. These groups of markets exhibit similar long-term temporal profiles of H(t) (see Fig. S3), but this global analysis does not capture short-term movements of H(t) among stock markets.
To start investigating short-term collective behavior in the efficiency of stock markets, we sample the time series of H(t) with a 1-year sliding window and create a ranking of markets based on the average efficiency within each time-window. Next, we investigate the stability of these efficiency rankings by estimating the Kendall rank correlation coefficient (Kendall-τ) among all possible pairs of time-windows. This analysis results in the correlation matrix shown in Fig. 3A, where rows and columns represent the last date of each time-window. By definition, the diagonal elements of this matrix are equal to one (that is, the efficiency rank in a given time-window is perfectly correlated with itself). Values along a given row or column indicate how similar the efficiency rank of that particular date is to past and future rankings. Thus, we would expect large diagonal blocks with high Kendall-τ values if these efficiency ranks were stable over long periods. However, we observe small diagonal blocks with about 1-month width, indicating that these efficiency ranks are stable only over very short periods. We have also calculated the correlation distance of the efficiency H(t) among all pairs of markets for each time-window and applied the same clustering approach used for the entire time series of H(t) (that is, Ward's minimum variance dendrogram segmented at the maximum silhouette score). This approach produces groups of markets with similar efficiency evolution in each time-window. We compare the temporal stability of these groups by estimating the adjusted Rand index among all clustering results in each time-window. This coefficient measures the agreement between two clusterings by counting pairs of elements assigned to the same group while controlling for the overlap expected by chance.
Values of the adjusted Rand index around zero indicate that the two clusterings are no more similar than random partitions, while a value of 1 indicates perfect agreement between them. Figure 3B shows the matrix plot of the adjusted Rand index for all pairs of clustering results. The small block-diagonal structures of the adjusted Rand index matrix indicate that groups of markets with similar H(t) profiles remain stable only for approximately 4 months.
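The two stability measures used above can be illustrated with a minimal Python sketch. It is not the code used in the study: the function names are hypothetical, the tau-b variant of Kendall-τ is an assumption (the text does not state which variant was used), and the adjusted Rand index follows the standard pair-counting definition. Inputs without variation (all values tied, or fewer than two elements) are not handled.

```python
import math
from collections import Counter


def kendall_tau(x, y):
    """Kendall tau-b between two equally long score (or rank) lists."""
    n = len(x)
    concordant = discordant = ties_x = ties_y = 0
    for i in range(n):
        for j in range(i + 1, n):
            dx, dy = x[i] - x[j], y[i] - y[j]
            ties_x += dx == 0
            ties_y += dy == 0
            if dx * dy > 0:
                concordant += 1
            elif dx * dy < 0:
                discordant += 1
    n_pairs = n * (n - 1) // 2
    return (concordant - discordant) / math.sqrt(
        (n_pairs - ties_x) * (n_pairs - ties_y)
    )


def adjusted_rand_index(labels_a, labels_b):
    """Chance-corrected agreement between two partitions of the same items."""
    total_pairs = math.comb(len(labels_a), 2)
    same_both = sum(math.comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    same_a = sum(math.comb(c, 2) for c in Counter(labels_a).values())
    same_b = sum(math.comb(c, 2) for c in Counter(labels_b).values())
    expected = same_a * same_b / total_pairs
    maximum = (same_a + same_b) / 2
    if maximum == expected:  # both partitions are trivial
        return 1.0
    return (same_both - expected) / (maximum - expected)


def pairwise_matrix(items, measure):
    """Matrix of measure(a, b) over all pairs of time-windows."""
    return [[measure(a, b) for b in items] for a in items]
```

Calling `pairwise_matrix` with one list of average efficiencies per time-window and `kendall_tau` gives a matrix analogous to Fig. 3A; calling it with one list of cluster labels per time-window and `adjusted_rand_index` gives the analogue of Fig. 3B.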
The results of Fig. 3 demonstrate that short-term collective patterns in the efficiency evolution of stock markets change with time and are different from those obtained at long time scales. The dynamical patterns we have found indicate that simple partitions are not enough to capture the complex interactions among financial markets and motivate the use of a different approach that considers these entangled interactions. To do so, we propose to build a complex network where nodes represent the stock markets, and links indicate two markets that clustered together at least once over time. We also assume that the connection between two stock markets is weighted by the number of times those particular markets are grouped in the same cluster. This representation allows us to aggregate the short-term information into a global and coherent picture where stock markets whose efficiency dynamics are correlated during some period are connected; furthermore, the strengths of these connections indicate the intensity of the interactions among the markets. Figure 4A shows this network representation for the 43 stock markets in our study. We observe that this complex network forms a complete graph as it presents all possible connections among all stock markets. Thus, world stock markets are strongly globalized regarding their efficiency such that simultaneous periods of high or low efficiency may involve a large number of markets. This result suggests the existence of systemic risk for the "spreading" of low-efficient states but, at the same time, it indicates that high-efficient states can also globally emerge. Although the density of this financial network is maximum, the interactions among the stock markets are not uniformly distributed. The Gini coefficient of the edge weights is 0.18 (on a scale where zero means perfect equality and one maximal inequality) and strengthens the idea that some markets may have a higher impact on the efficiency dynamics of the entire system. 
Figure 4B shows a centrality ranking based on PageRank 34, where the Amsterdam AEX Index (Netherlands) and the KOSPI Composite Index (South Korea) emerge as the most influential markets, while the two Russian indices (MOEX Russia Index and Russian Trading System (RTS) Index) are the least influential for the efficiency dynamics of the world stock markets.
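The construction of the weighted network and the two summary statistics discussed above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation used in the study: the function names are hypothetical, the Gini coefficient uses the standard sorted-values formula, and PageRank is a plain power iteration on the weighted undirected graph with the conventional damping factor of 0.85 and a fixed number of iterations (the text does not report these settings).

```python
from itertools import combinations


def coclustering_weights(partitions):
    """Count how often each pair of markets shares a cluster.

    partitions: one list of cluster labels per time-window.
    Returns a dict mapping (i, j) with i < j to the co-clustering count.
    """
    weights = {}
    for labels in partitions:
        for i, j in combinations(range(len(labels)), 2):
            if labels[i] == labels[j]:
                weights[(i, j)] = weights.get((i, j), 0) + 1
    return weights


def gini(values):
    """Gini coefficient of non-negative values (0 means perfect equality)."""
    xs = sorted(values)
    n = len(xs)
    weighted = sum((2 * i - n - 1) * x for i, x in enumerate(xs, start=1))
    return weighted / (n * sum(xs))


def pagerank(weights, n_nodes, damping=0.85, n_iter=100):
    """Weighted PageRank by power iteration on an undirected graph."""
    strength = [0.0] * n_nodes
    for (i, j), w in weights.items():
        strength[i] += w
        strength[j] += w
    rank = [1.0 / n_nodes] * n_nodes
    for _ in range(n_iter):
        # Rank held by isolated nodes is redistributed uniformly.
        isolated = sum(r for r, s in zip(rank, strength) if s == 0)
        base = (1.0 - damping + damping * isolated) / n_nodes
        new_rank = [base] * n_nodes
        for (i, j), w in weights.items():
            new_rank[j] += damping * rank[i] * w / strength[i]
            new_rank[i] += damping * rank[j] * w / strength[j]
        rank = new_rank
    return rank
```

For example, `gini(coclustering_weights(partitions).values())` measures how unevenly the link weights are distributed, and sorting markets by the output of `pagerank` yields a centrality ranking of the kind shown in Fig. 4B.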
The inequality in the edge weights distribution also suggests that the financial network of Fig. 4A may have a modular structure in which groups of markets are more similar among themselves than with other groups. To probe for this possible modular structure, we use the stochastic block model approach [35][36][37]. As detailed in the "Methods" section, we have fitted different stochastic block models to our network data, finding that the nested […] remaining 19 indices: 12 from Europe, and other markets from Argentina, Canada, Indonesia, Russia, Saudi Arabia, South Africa, and Thailand. Despite the existence of many exceptions, the geographical distance appears to play a role in these partitions. However, more important than understanding the particularities of each module is the emergence of this modular structure. Although the associations among stock markets are quite entangled, this modular structure suggests that some groups of markets are more similar to each other such that low- or high-efficiency states are more likely to encompass these modules.

Discussion
We have presented an investigation of dynamical efficiency patterns of 43 major world stock markets during the past 20 years. To do so, we have relied on a physics-inspired approach in which the efficiency of a stock market at a particular time is defined by estimating the permutation entropy within sliding time-windows of log-returns of stock market indices. Our results indicate that stock markets can be hierarchically organized into groups of markets having similar long-term efficiency trends. However, we have also found that these long-term clusters are not enough to fully understand the collective dynamics of market efficiency. Indeed, our research has revealed that short-term collective patterns in the evolution of efficiency vary with time, and are different from those at longer time scales. We have observed that efficiency ranks of stock markets are stable only during relatively short periods of time, no longer than a month or two. Similarly, the clustering of markets with similar efficiency profiles is stable only for approximately 4 months. Because of these facts, we have proposed a complex network representation where nodes are stock markets, and connections among them indicate markets that are clustered together at least once during the 20 years of our data. We have further considered that links between a pair of markets are weighted by the number of times they appear in the same cluster regarding their short-term efficiency dynamics. Our results show that this financial network is fully-connected, indicating that the efficiency of stock markets is strongly entangled and globalized. Previous works have already demonstrated that systemic failures in stock markets emerge from a synchronization process 9 that takes place in social networks 38,39 and may cause bubbles that eventually burst 12 . Studies also suggest the existence of strong correlations between fund investment strategies as a cause of systemic risks and shock propagation 33 . 
In this context, the intricate financial network uncovered by our work suggests a systemic risk for the "spreading" of low-efficiency states but, at the same time, indicates that high-efficiency states can also emerge at a global level. Although our financial network forms a complete graph, the link weights among stock markets are not uniformly distributed, allowing us to identify the most influential markets. Furthermore, we have found this financial network to have a modular structure that comprises two market groups whose efficiency dynamics are more similar within the groups than they are with markets outside of the groups. Therefore, despite stock markets being quite entangled in terms of their efficiency profiles, the modular structure indicates that low-efficiency and high-efficiency states are more likely to emerge within these groups.
Because efficiency in stock markets can be an opportunity for profit as well as an early signal of an impending financial crisis, it would be interesting to incorporate our approach into models that try to predict stock market prices to measure financial transaction risks. For instance, researchers used Google Trends to inform investors when to buy or sell stocks and found that their strategy was 326% better than the traditional buy-and-hold strategy 10. Despite the impressive improvement of this strategy, it does not inform the buying or selling volumes, or the risks that are involved in the proposed transactions. Thus, quantifying the time dependence of the stock efficiency may help economic agents to quantify their transaction risks.
Naturally, our study has its limitations. The correlations among stock markets expressed in our network representation do not carry information about the direction of the influence. It would therefore be interesting to consider other measures that are capable of determining the causality of these associations. While we have used the longest available period in our data set (20 years), further investigations may include longer historical data and probe the effects of financial market age and other features that the financial network carries. Similarly, it would be enlightening to create other financial networks with high-frequency intra-day data, or to use a multiscale approach. It is also worth noticing that stock market indices comprise the aggregated prices of several stocks, and it would be fascinating to investigate a similar financial network composed of a large number of individual stocks. We believe these limitations open several possibilities for future research, and we hope they will inspire other studies with a goal to better understand financial data with physics-inspired approaches.

Methods
Data. The data set used in our study was obtained from the Yahoo! finance historical data API (via the Python module yfinance 40 ), the Wall Street Journal market data 41 , and investing.com (via the Python module investpy 42 ). We have first collected the ticker symbol of all 43 major stock market indices (Table S1), and next retrieved the adjusted daily closing prices of each one from Yahoo! finance in the period from January 1, 2000 to October 31, 2020 (each time series has 5204 data points). The tickers missing or with incomplete data in the Yahoo! finance database were then retrieved from the Wall Street Journal markets. For tickers that were not available or remained incomplete with the previous data source, we retrieved data from investing.com. Thus, all data necessary to reproduce our findings are freely available.
Permutation entropy. The permutation entropy 32 is a complexity measure originally proposed for characterizing time series. This measure is calculated from a probability distribution related to local ordering patterns among consecutive time series elements. To define the permutation entropy, we consider a generic time series of n elements x_s and map each index s to the vector formed by its d most recent values [Eq. (1)]. Each vector is assigned an ordering pattern (type) π_i, the probability p(π_i) of each type is its relative frequency among the n − d + 1 vectors [Eq. (2)], and the permutation entropy is obtained from the terms p(π_i) ln p(π_i) of this distribution [Eq. (3)]. Mainly because of its simplicity, discrimination capabilities, and fast computational evaluation, the permutation entropy framework has successfully been used in many applications 44-52.
Efficiency degree of stock markets. To define the informational efficiency H(t) used in our study, we first estimate the logarithmic daily price returns (or log-return) series defined by 53 R(t) = log P(t) − log P(t − 1) [Eq. (4)], where log P(t) and log P(t − 1) are the natural logarithms of the closing indices at times t and t − 1. Next, we sample the log-return series with a 500-day sliding window that moves ahead one trading day at a time. For each of these time-windows, we calculate the normalized permutation entropy H(t); here, t stands for the last date in the time-window. This procedure defines a new time series H(t) representing the permutation entropy (our measure of informational efficiency) in each window (Fig. 1B). The permutation entropy estimates the degree of randomness in a time series; therefore, entropy values close to one indicate markets in a state of high informational efficiency. Conversely, values smaller than one indicate less efficient states of stock markets.
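The procedure described above can be sketched in a few lines of Python. This is a minimal illustration, not the code used in the study: the function names are hypothetical, the embedding dimension d = 3 is only an illustrative default (the value used in the study is not given in this section), ties are broken by position, and the normalization by ln(d!) is the usual choice that maps the entropy onto the interval from 0 to 1.

```python
import math
from collections import Counter


def log_returns(prices):
    """R(t) = log P(t) - log P(t - 1) for a sequence of positive prices."""
    return [math.log(b) - math.log(a) for a, b in zip(prices, prices[1:])]


def permutation_entropy(series, d=3):
    """Normalized permutation entropy for embedding dimension d >= 2."""
    n = len(series)
    if d < 2 or n < d:
        raise ValueError("need d >= 2 and at least d observations")
    # Ordering pattern of each run of d consecutive values: the index
    # permutation that sorts the run (ties are broken by position).
    patterns = Counter(
        tuple(sorted(range(d), key=lambda k: series[s + k]))
        for s in range(n - d + 1)
    )
    total = n - d + 1
    entropy = sum(-(c / total) * math.log(c / total) for c in patterns.values())
    return entropy / math.log(math.factorial(d))


def rolling_efficiency(returns, window=500, d=3):
    """H(t) for every window of `window` returns, moved one step at a time."""
    return [
        permutation_entropy(returns[end - window:end], d)
        for end in range(window, len(returns) + 1)
    ]
```

Given a list of adjusted closing prices, `rolling_efficiency(log_returns(prices))` returns one entropy value per window, each associated with the last date of that window. A strictly increasing series yields 0, and a series in which all d! ordering patterns occur equally often yields 1.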
Hierarchical clustering procedure. To compute the similarities in the time evolution of the efficiency degree among stock markets, we use the correlation distance matrix, which is computed from ρ(H_i, H_j), the Pearson correlation coefficient between the entropy time series H_i of the i-th stock market and the entropy time series H_j of the j-th stock market. For the dynamical clustering analysis, H_i and H_j represent the time series segments of stock markets i and j obtained by sampling the efficiency time series with a 1-year sliding window. From this correlation distance matrix (Fig. 2A), we use the Ward linkage criterion 54,55 to hierarchically cluster the stock markets (dendrogram in Fig. 2A). This clustering procedure recursively merges the pair of clusters that minimally increases the within-cluster variance.
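As a hedged illustration of the distance used above, the sketch below builds a correlation distance matrix in plain Python. It assumes the convention 1 − ρ, which is the definition of the correlation distance in SciPy; the exact expression used in the study is not reproduced in this section, and the function names are hypothetical. Constant series (zero variance) are not handled.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equally long series."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def correlation_distance_matrix(series):
    """Symmetric matrix of 1 - rho(H_i, H_j) (assumed convention)."""
    m = len(series)
    dist = [[0.0] * m for _ in range(m)]
    for i in range(m):
        for j in range(i + 1, m):
            dist[i][j] = dist[j][i] = 1.0 - pearson(series[i], series[j])
    return dist
```

With this convention, perfectly correlated efficiency series are at distance 0, uncorrelated series at distance 1, and perfectly anti-correlated series at distance 2.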
The threshold distance used to cut the dendrogram and determine the number of clusters is obtained by maximizing the silhouette score 56. This coefficient quantifies the consistency of the clustering procedure and is defined by the average value of $s_i = \frac{b_i - a_i}{\max(a_i, b_i)}$, where a_i is the cohesion (the average intra-cluster distance) and b_i is the separation (the average nearest-cluster distance) for the i-th index. The higher the average value of the silhouette for all indices, the better the cluster configuration. We use the Python module scikit-learn 57 to compute the silhouette scores and the SciPy 58 package to compute the correlation distance matrix.
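The selection rule above can be sketched with a plain-Python silhouette computed from a precomputed distance matrix. This is an illustration rather than the study's code (which relies on scikit-learn): the function names are hypothetical, the candidate partitions are assumed to come from cutting the Ward dendrogram at different heights, and, as in the usual convention, an index that is alone in its cluster receives a silhouette of 0.

```python
def mean_silhouette(dist, labels):
    """Average silhouette coefficient from a precomputed distance matrix."""
    clusters = {}
    for idx, label in enumerate(labels):
        clusters.setdefault(label, []).append(idx)
    if len(clusters) < 2:
        raise ValueError("the silhouette needs at least two clusters")
    scores = []
    for i, label in enumerate(labels):
        own = [j for j in clusters[label] if j != i]
        if not own:  # singleton cluster
            scores.append(0.0)
            continue
        # Cohesion: average distance to the other members of the cluster.
        a = sum(dist[i][j] for j in own) / len(own)
        # Separation: smallest average distance to any other cluster.
        b = min(
            sum(dist[i][j] for j in members) / len(members)
            for other, members in clusters.items()
            if other != label
        )
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / len(scores)


def best_partition(dist, candidate_labelings):
    """Pick the candidate partition with the highest average silhouette."""
    return max(candidate_labelings, key=lambda labels: mean_silhouette(dist, labels))
```

For four points located at 0, 1, 10, and 11 on a line, grouping the two nearby pairs gives an average silhouette of about 0.90, whereas mixing the pairs gives a negative value, so `best_partition` selects the former.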
Stochastic block models. We estimate the modular structure of Fig. 4B by using the stochastic block modeling approach [35][36][37]. This method has the advantage of directly estimating the marginal probabilities that the network is partitioned by a certain number of groups and the probability that a node belongs to a particular group during the inference process. We have tested different stochastic block models (SBM) to fit our network data: usual SBM, degree-corrected SBM (DCSBM), nested SBM, and nested DCSBM. We have considered the best model as the one with the smallest minimal description length 37, based on the statistical evidence that the model is not mistaking stochastic fluctuations for actual modular structure. This means that groups in the […]
Numbered equations of the "Methods" section:
(1) $(s) \mapsto (x_{s-(d-1)}, x_{s-(d-2)}, \ldots, x_{s-1}, x_s)$
(2) $p(\pi_i) = \frac{\text{the number of } s \text{ that has type } \pi_i}{n - d + 1}$
(3) $H = -\frac{1}{\ln d!} \sum_{i=1}^{d!} p(\pi_i) \ln p(\pi_i)$
(4) $R(t) = \log P(t) - \log P(t - 1)$
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.