Abstract
The understanding of complex systems has become a central issue because such systems exist in a wide range of scientific disciplines. We here focus on financial markets as an example of a complex system. In particular we analyze financial data from the S&P 500 stocks in the 19year period 1992–2010. We propose a definition of state for a financial market and use it to identify points of drastic change in the correlation structure. These points are mapped to occurrences of financial crises. We find that a wide variety of characteristic correlation structure patterns exist in the observation time window and that these characteristic correlation structure patterns can be classified into several typical “market states”. Using this classification we recognize transitions between different market states. A similarity measure we develop thus affords means of understanding changes in states and of recognizing developments not previously seen.
Introduction
Time series are typical experimental results we have about complex systems. In the analysis of such time series, stationary situations have been extensively studied and correlations have been found to be a very powerful tool. Yet most natural processes are nonstationary. In particular, in times of crisis, accident or trouble, stationarity is lost. As examples we may think of financial markets, biological systems, reactors (both chemical and nuclear) or the weather. In nonstationary situations analysis becomes very difficult and noise is a severe problem. Following a natural urge to search for order in the system, we endeavor to define states through which systems pass and in which they remain for short times. Success in this respect would allow to get a better understanding of the system and might even lead to methods for controlling the system in more efficient ways. We here concentrate on financial markets because of the easy access we have to good data, because of our previous experience and last but not least because of the strong nonstationary effects recently seen.
Results
Using a similarity measure we were able to classify several typical market states between which the market jumps back and forth. Some of these states can easily be identified in this similarity measure. However, there are several states in which the market only stays for a short period. Thus, these states are sparsely embedded in time. With a kmeans clustering analysis, we were able to identify these states and disclose a detailed dynamics of the market's state.
Our findings offer insight for constructing an “early warning system” for financial markets. By providing a simple instrument to identify similarities to previous states during an upcoming crisis, one can judge the current situation properly and be prepared to react if the crisis materializes. Certainly, an indication for a crisis is also given when the correlation structure undergoes rapid changes.
Another possible application of the similarity measure is risk management. Given the similarity measure, the portfolio manager is aware of periods in which the market behaved completely differently and thus can choose not to include them in his calculations. He can furthermore identify regions in which the market behaved similarly and refer to these regions when estimating the correlation matrix.
Our empirical study is a first step towards the identification of states in financial markets which are a prominent example of complex nonstationary systems.
Discussion
The effort to understand the dynamics in financial markets is attracting scientists from many fields^{1,2,3,4,5,6,7,8}. Statistical dependencies between stocks are of particular interest, because they play a major role in the estimation of financial risk^{9}. Since the market itself is subject to continuous change, the statistical dependencies also change in time. This nonstationary behavior makes an analysis very difficult^{10,11}. Changes in supply and demand can even lead to a two phase behavior of the market^{12,13}. Here, we use the correlation matrix to identify and classify the market state. In particular we ask: How similar is the present market state, compared to previous states? To calculate this similarity we measure temporal changes in the statistical dependence between stock returns.
For stationary systems described by a (generally large) number K of time series, the Pearson correlation coefficient is extremely useful. It is defined as
Here the r_{i} and r_{j} represent the time series of which the averages 〈…〉 are taken over a given time horizon T. σ_{i} and σ_{j} are their respective standard deviations. When calculating the correlation coefficients of K stocks, we obtain the K × K correlation matrix C, which gives an insight into the statistical interdependencies of the time series under study.
It is necessary to consider data over large time horizons T so as to obtain reliable statistics. This leads to a fundamental problem that arises in the case of nonstationary systems: To extract useful information from empirical data we seek a correlation matrix from very recent data, in order to provide a good description of current correlation structure. This is because correlations change dynamically due to the nonstationarity of the process, making it very difficult to estimate them precisely^{14,15,16,17}. However, if the length T of the time series is short, the correlation matrices C are noisy. On the other hand, to keep the estimation error low, T can be increased, but this leads to a correlation matrix that generally does not describe the present state very well. Various noise reduction techniques provide methods to conquer noise^{18,19,20,21,22}.
In several nonstationary systems, it is possible to obtain a large number of correlated data over time. Such systems include, but are not restricted to, financial markets (which show nonstationary behavior due to crises), biological or medical time series (such as EEG), chemical and nuclear reactors (nonstationary behavior includes, in particular, accidents) or weather data. In the following, we only consider the financial markets, since we have studied extensively some very high quality data of this system, the nonstationary features of which have been quite striking in the last years. We propose a definition of a state which is appropriate for such systems and suggest a method of analysis which allows for a classification of possible behaviors of the system. When T/K < 1, which is the case we are interested in, the correlation matrix becomes singular. However, one can still make significant statistical statements, e.g., for the average correlation level whose estimation error decreases as 1/K. In the following, we focus on correlation matrices C(t_{1}) and C(t_{2}) at different times t_{1} and t_{2} measured over a short time horizon. These have therefore a pronounced random element. We take these objects as the fundamental states of our system. We now propose, as a central element, to introduce the following concept of distance between two states. We define the similarity measure
to quantify the difference of the correlation structure for two points in time, where … denotes the absolute value and 〈…〉_{ij} denotes the average over all components. Note that in this case, the random component that is unavoidable in the definition of the states of the system is strongly suppressed by the average over numbers.
To apply the above general statements to a specific example, we analyze two datasets: (i) we calculate ζ(t_{1}, t_{2}) based on the daily returns of those S&P 500 stocks that remained part of the S&P during the 19year period 1992–2010 and (ii) we study the fouryear period 2007–2010 in more detail based on intraday data from the NYSE TAQ database. Since the noise increases for very highfrequency data^{23,24,25}, we extract onehour returns for dataset (ii). For onehour returns, we consider this market microstructure noise as reasonably weak.
However, sudden changes in drift and volatility are present on all time scales. They can result in erroneous correlation estimates. To address this problem, we employ a local normalization^{26} of the return time series in dataset (i). The results of dataset (i) are presented in left panel of Fig. 1. In this figure, each point is calculated on correlation matrices over the previous two months. This new representation gives a complete overview about structural changes of this financial market of the past 19 years in a single figure. It allows to compare the similarity of the market states at different times. To make this procedure concrete, consider the following example. Pick a point on the diagonal of this panel and designate it as “now”. From this point the similarity to previous times can be found on the vertical line above this point, or the horizontal line to the left of this point. Light shading denotes similar market states and dark shading denotes dissimilar states. We can furthermore identify times of financial crises with dark shaded areas. This indicates that the correlation structure completely changes during a crisis. There are also similarities between crises, as between the “credit crunch” that induced the 2008–2009 financial crisis and the “market meltdown”, the burst of the “dotcom bubble” in 2002. A further example is the overall rise in correlation level in the beginning of 2007. This event can be mapped to drastic events on the Shanghai stock exchange^{27}.
Using dataset (ii) we are able to obtain a more detailed insight into recent market changes, as shown in the right panel of Fig. 1. This area is represented by the lower right square of the left panel. Using intraday data we calculate the correlation matrices on shorter time scales. We choose a time horizon of one week because it provides insight into changes in the correlation structure on a much finer time scale. This enables us to identify a short subperiod within the 2008–2009 crisis (in the beginning of 2009) during which the market temporarily stabilizes before it returns to the crisis state. While the correlation structure during the crisis displays an overall high correlation level, the correlation structure of the stable period is similar to the period before the crisis, one of the typical states in a calm period, which is identified from daily data in dataset (i). This phenomenon might be related to the market's reaction to news about the progress in rescuing the American International Group (A.I.G.)^{28}. The correlation structure of the stable period and the 2008–2009 crisis is shown in Supplementary Fig. S2 online.
The evolutionary structure presented in Fig. 1 illustrates that the correlation matrix sometimes maintains its structure for a long time (bright regions), sometimes changes abruptly (sharp blue stripes) and sometimes returns to a structure resembling a structure the market has experienced before (white stripes). This suggests that the market might move among several typical market states. To extract such typical market states, we perform a clustering analysis in the results of dataset (i). From our clustering analysis (see section Methods), we find that there are “hidden” states sparsely embedded in time, in addition to regimes that dominate the market during a continuous period and are easily found by eye. The clustering analysis separates the set of all correlation matrices, which are measured on short disjoint subintervals of the observation period, into distinct clusters. The cluster centers then correspond to distinct correlation structures which we identify as market states. We note that this definition of market state depends on the length of the full observation period to some extend. Enlarging the observation period might join previously distinct clusters, reducing it might further divide a cluster.
For the clustering analysis, we use disjunct twomonth time windows ending at the respective dates. Because of the window length, some financial crashes cannot be resolved. Our aim is rather to identify the evolution of the market, which is, in some cases, induced by financial crisis. We can confirm in Fig. 2 that the typical states obtained from the clustering analysis indeed correspond to different characteristic correlation structures. To visualize the characteristic structures of each state, we calculate its average correlation matrix and sort the companies according to their industry branch, as defined by the Global Industry Classification Standard (GICS)^{29}. In the resulting matrices, the industry branches correspond to the blocks on the diagonal. The correlation between two branches are given by the offdiagonal blocks. The results are illustrated in Fig. 2. We can confirm that the typical states obtained from the clustering analysis indeed correspond to different characteristic correlation structures.
Our analysis also offers insight into market structure dynamics. Figure 4 shows the temporal behavior of the market state. The market sometimes remains for a long time in the same state and sometimes stays only for a short time. The typical duration depends upon the state: Some states (e.g., state 1 and state 2) appear in clusters in time while other states appear more sparsely in time (e.g., state 4). There seems to exist a global trend on a long time scale, although the market state is switching back and forth between states.
In Fig. 2, we can see differences between the states in the correlation between branches as well as in the correlation within a branch. The correlation within the energy, information technology and utilities branches is very strong in all states. State 1 shows an overall weak correlation, while states 3 and 4 feature in addition a strong correlation of the finance branch to other branches. State 2 shows very unusual behavior: In the period of the dotcom bubble, many branches are anticorrelated with one another. In states 5, 6 and 7, the overall correlation level rises, although certain branches, such as energy, consumer staples and utilities, are either strongly or weakly correlated with other branches. While some distinct characteristics of states can be easily identified, some states look quite similar by eye. Their unique properties can be illustrated when subtracting the market's overall correlation structure, as illustrated in Fig. 3. For example, state 3 and 4 look very similar in Fig. 2. However, Fig. 3 unveils that the correlation within the Energy sector (E) is completely diverse.
We observe that the energy branch (E) is either strongly correlated to the rest of the market, weakly correlated, or even anticorrelated. To analyze this behaviour we study the histogram of the correlation coefficients C_{ij}(t). We present the results in Fig. 5. In the months leading up to the credit crunch in October 2008, we observe a bimodal structure in the histogram (see also Ref. [30]). It corresponds to the time period when the Energy branch shows a strong anticorrelation with other branches. The bimodality suggests that a subset of stocks – in this case, predominantly the Energy stocks – decouples from the rest of the market. During the crash, the histogram shows a very narrow distribution around large values of the correlation coefficients, which corresponds to state 8 in Fig. 2, where the branch structure is lost almost completely in an overall strongly correlated market.
Methods
Construction of stock returns
Let S be the price of a specific stock and Δt the interval on which the return is calculated. For our study, we chose the arithmetic return, defined as
For dataset (i), we chose Δt to be 1 day and calculate the stock returns of each day. For dataset (ii), we chose Δt as 1 hour. Furthermore, we obtain this 1hour return for every minute of a trading day between 10:45 am and 2:45 pm. We obtained the daily data of dataset (i) from finance.yahoo.com. The intraday data are obtained from the New York Stock Exchange's TAQ database.
Local normalization
Sudden changes in drift and volatility can result in erroneous correlation estimates. To address this problem, we employ a local normalization method^{26}. For each return r(t) we subtract the local mean and divide by the local standard deviation,
The local average 〈…〉_{n} runs over the n most recent sampling points. For daily data, n = 13 yields nearly normal distributed time series, as discussed in Ref. [26].
Outline of topdown clustering
Our clustering analysis is based on a topdown scheme: All m correlation matrices are initially regarded as a single cluster and then divided into subclusters by a procedure based on the kmeans algorithm^{31,32,33}. The process can be described as follows:

1
Choose two initial cluster centers from all matrices. Label all other matrices by the more similar cluster center in terms of ξ^{(L)}.

a
Recast two new subcluster centers to the “center of mass” of the matrices in the subcluster.

b
Relabel all matrices to the new subcluster center.

c
Repeat this process until there is no change in labeling.

a

2
Take the best division out of all possible m(m – 1)/2 initial choices, which gives the least average (ξ^{(L)})^{2} to the cluster centers.
We stop this division process when the average distance from each cluster center to its members becomes smaller than a certain threshold. To identify the typical market states presented in the manuscript, we chose the threshold at 0.1465 as it represents (approximately) the best ratio of the distances between clusters and their intrinsic radius. One can obtain finer structures by choosing smaller threshold values, ultimately until all the matrices are identified as different components. The complete results of the clustering analysis is illustrated in Fig. 6.
A possible alternative method for obtaining the states may be choosing a certain number of cluster centers from the beginning. However, of besides being more computationally intense, we believe that predefining the number of states would be a significant disadvantage compared to our dynamic approach.
Alternative measure: Difference of largest eigenvalue of correlation matrices
A similar result can be archived using a different approach. The largest eigenvalue λ_{max} of the correlation matrix C describes the collective motion of all stocks. We can also define the similarity measure by the distance of these eigenvalues,
The advantage of this technique is that the noise in the correlation matrix only contributes to small eigenvalues (See Refs. [18] and [19]). Thus, by only taking into account the largest one, we filter out the noise. However, this approach also presumes that the corresponding eigenvector does not change. Our results indicate that the largest eigenvalue almost remains constant, but this might not always be the case. Especially in financial crises. An alternative distance measure which takes into account the changes in the eigenvectors is considered in Refs. [34, 35]. The obtained similarity measure is shown in Supplementary Fig. S1 online.
References
Voit, J. The Statistical Mechanics of Financial Markets (Springer, Heidelberg, 2001).
Mantegna, R. N. & Stanley, H. E. Stock market dynamics and turbulence: parallel analysis of fluctuation phenomena. Physica A 239, 255–266 (1997).
Borghesi, C., Marsili, M. & Miccichè, S. Emergence of timehorizon invariant correlation structure in financial returns by subtraction of the market mode. Physical Review E 76, 026104 (2007).
Cooley, T. F. & Quadrini, V. Financial markets and firm dynamics. The American Economic Review 91, 1286–1310 (2001).
Pelletier, D. Regime switching for dynamic correlations. Journal of Econometrics 131, 445–473 (2006).
Xu, X. E., Chen, P. & Wu, C. Time and dynamic volumevolatility relation. Journal of Banking & Finance 30, 1535–1558 (2006).
King, M. & Wadhwani, S. Transmission of volatility between stock markets. The Review of Financial Studies 3, 5–33 (1990).
Lee, S. The stability of the comovements between real estate returns in the uk. Journal of Property Investment & Finance 24, 434–442 (2006).
Bouchaud, J. & Potters, M. Theory of Financial Risks (Cambridge University Press, Cambridge, 2000).
Eckmann, J.P., Kamphorst, S. O. & Ruelle, D. Recurrence plots of dynamical systems. EPL (Europhysics Letters) 4, 973 (1987).
Casdagli, M. C. Recurrence plots revisited. Physica D: Nonlinear Phenomena 108, 12–44 (1997).
Plerou, V., Gopikrishnan, P. & Stanley, E. Econophysics: Twophase behaviour of financial markets. Nature 421 (2003).
Plerou, V., Gopikrishnan, P. & Stanley, H. E. Two phase behaviour and the distribution of volume. Quantitative Finance 5, 519–521 (2005).
Rosenow, B., Gopikrishnan, P., Plerou, V. & Stanley, H. E. Dynamics of crosscorrelations in the stock market. Physica A 324, 241–246 (2003).
Tastan, H. Estimating timevarying conditional correlations between stock and foreign exchange markets. Physica A 360, 445–458 (2006).
Drozdz, S., Kwapien, J., Grmmer, F., Ruf, F. & Speth, J. Quantifying the dynamics of financial correlations. Physica A 299, 144–153 (2001).
Schäfer, R., Nilsson, N. & Guhr, T. Power mapping with dynamical adjustment for improved portfolio optimization. Quantitative Finance (2009).
Laloux, L., Cizeau, P., Bouchaud, J.P. & Potters, M. Noise dressing of financial correlation matrices. Physical Review Letters 83, 1467–1470 (1999).
Plerou, V. et al. Random matrix approach to cross correlations in financial data. Physical Review E 65, 066126 (2002).
Guhr, T. & Kälber, B. A new method to estimate the noise in financial correlation matrices. Journal of Physics A: Mathematical and General 36, 3009–3032 (2003).
Schäfer, J. & Strimmer, K. A shrinkage approach to largescale covariance matrix estimation and implications for functional genomics. Statistical applications in genetics and molecular biology 4 (2005).
Ledoit, O. & Wolf, M. Improved estimation of the covariance matrix of stock returns with an application to portfolio selection. Journal of Empirical Finance 10, 603–621 (2003).
Epps, T. W. Comovements in stock prices in the very short run. Journal of the American Statistical Association 74, 291–298 (1979).
Münnix, M. C., Schäfer, R. & Guhr, T. Compensating Asynchrony Effects in the Calculation of Financial Correlations. Physica A 389, 767–779 (2010).
Münnix, M. C., Schäfer, R. & Guhr, T. Impact of the Ticksize on Financial Returns and Correlations. Physica A 389, 4828–4843 (2010).
Schäfer, R. & Guhr, T. Local normalization: Uncovering correlations in nonstationary financial time series. Physica A 389, 3856–3865 (2010).
What The Market Is Telling Us. Bloomberg Businessweek (2007). Cover story.
A.I.G. Sells $39.3 Billion in Assets to N.Y. Fed's Fund'. The New York Times (2008).
http://www.standardandpoors.com/indices/gics/en/us. accessed on 16 Aug 2012.
Wyart, M. & Bouchaud, J.P. Selfreferential behaviour, overreaction and conventions in financial markets. Journal of Economic Behavior & Organization 63, 1–24 (2007).
MacQueen, J. B. Some methods for classification and analysis of multivariate observations. In Cam, L. M. L. & Neyman, J. (eds.) Proc. of the fifth Berkeley Symposium on Mathematical Statistics and Probability, vol. 1, 281–297 (University of California Press, 1967).
Faber, V. Clustering and the continuous kmeans algorithm. Los Alamos Science 22, 138–144 (1994).
Tanaka, N. K., Awasaki, T., Shimada, T. & Ito, K. Integration of chemosensory pathways in the drosophila secondorder olfactory centers. Current biology 14, 449–457 (2004).
Reigneron, P.A., Allez, R. & Bouchaud, J.P. Principal regression analysis and the index leverage effect. Physica A 390, 3026–3035 (2011).
Allez, R. & Bouchaud, J.P. Eigenvector dynamics: theory and some applications Preprint: arXiv:1108.4258. (2011).
Acknowledgements
MCM acknowledges financial support from the Fulbright program and from Studienstiftung des Deutschen Volkes. TS acknowledges support from the JSPS Institutional Program for Young Researcher Overseas Visits, GrantinAid for Young Scientists (B) no. 21740284 MEXT, Japan and the Aihara Project, the FIRST program from JSPS, initiated by CSTP. THS acknowledges support from project 79613 of CONACYT, Mexico. HES thanks the NSF for support.
Author information
Authors and Affiliations
Contributions
MCM, TS and RS performed the calculations and analyzed the results. MCM and RS prepared the figures. All authors designed the research and wrote and reviewed the paper.
Ethics declarations
Competing interests
The authors declare no competing financial interests.
Electronic supplementary material
Supplementary Information
Supplementary Information
Rights and permissions
This work is licensed under a Creative Commons AttributionNonCommercialNo Derivative Works 3.0 Unported License. To view a copy of this license, visit http://creativecommons.org/licenses/byncnd/3.0/
About this article
Cite this article
Münnix, M., Shimada, T., Schäfer, R. et al. Identifying States of a Financial Market. Sci Rep 2, 644 (2012). https://doi.org/10.1038/srep00644
Received:
Accepted:
Published:
DOI: https://doi.org/10.1038/srep00644
This article is cited by

Multiscale Multifractal Detrended Fluctuation Analysis and Trend Identification of Liquidity in the China's Stock Markets
Computational Economics (2023)

Activity of vehicles in the bus rapid transit system Metrobús in Mexico City
Scientific Reports (2022)

Collective correlations, dynamics, and behavioural inconsistencies of the cryptocurrency market over time
Nonlinear Dynamics (2022)

An analysis of systemic risk in worldwide economic sentiment indices
Empirica (2020)

Traceability and dynamical resistance of precursor of extreme events
Scientific Reports (2019)
Comments
By submitting a comment you agree to abide by our Terms and Community Guidelines. If you find something abusive or that does not comply with our terms or guidelines please flag it as inappropriate.