Relationship between Macroeconomic Indicators and Economic Cycles in U.S.

We analyze monthly time series of 57 US macroeconomic indicators (18 leading, 30 coincident, and 9 lagging) and 5 other trade/money indexes. Using novel methods, we confirm statistically significant co-movements among these time series and identify noteworthy economic events. The methods we use are Complex Hilbert Principal Component Analysis (CHPCA) and Rotational Random Shuffling (RRS). We obtain significant complex correlations among the US economic indicators with leads/lags. We then use the Hodge decomposition to obtain the hierarchical order of each time series. The Hodge potential allows us to better understand the lead/lag relationships. Using both the CHPCA and Hodge decomposition approaches, we obtain a new lead/lag order of the macroeconomic indicators and perform a clustering analysis of positively serially correlated positive and negative changes in the analyzed indicators. We identify collective negative co-movements around the dot-com bubble in 2001 as well as the Global Financial Crisis (GFC) in October 2008. We also identify important events such as Hurricane Katrina in August 2005 and the Oil Price Crisis in July 2008. Additionally, we demonstrate that some coincident and lagging indicators actually show leading indicator characteristics. This suggests that there is room to improve the existing indicators.

"During six weeks in late 1937, Wesley Mitchell, Arthur Burns, and their colleagues at the National Bureau of Economic Research developed a list of leading, coincident, and lagging indicators of economic activity in the United States as part of the NBER research program on business cycles. Since their development, these indicators, in particular the leading and coincident indexes constructed from these indicators, have played an important role in summarizing and forecasting the state of macro-economic activity" 1 .
Business cycles are important. Economists and policymakers closely follow macroeconomic indicators, especially leading ones meant to precede business cycles, to discern whether we could expect expansions or contractions in the near future.
Given that there are many different causes of cyclical business expansions and contractions, there are also different symptoms or early warning indicators for economic upturns or downturns. Some macroeconomic indicators may perform better in specific periods, while others may be more suitable for forecasting business cycles under other sets of conditions. The criteria for indicator performance include: (1) economic significance; (2) statistical adequacy; and (3) consistency 2 . There have been numerous revisions of historical lists of leading economic indicators since the 1930s, when the first list was created by Mitchell and Burns 3 at the National Bureau of Economic Research (NBER). Based on a study of approximately 500 macroeconomic indicators, they identified 21 indicators as most trustworthy. A follow-up study conducted by Moore 4 investigated 800 indicators, and in 1961, the first composite index of leading indicators was created 5 . In 1967, Moore and Shiskin introduced a specific scoring system for the evaluation of one hundred time series 6 . In the early 1970s, in the midst of two recessions (1970 and 1974), NBER and the Bureau of Economic Analysis (BEA) jointly worked on revisions of the nominal indicators to account for high inflation 7 .
In 1989, Stock and Watson 1 proposed an interesting approach to constructing macroeconomic indicators. Their assumption was that the co-movements of the economic time series are related to an unobserved variable called the "state of the economy." The Stock and Watson leading indicator is very different from the older NBER and BEA indexes. It is based on a VAR model with seven selected leading variables, mostly focusing on interest rates and interest rate spreads, building permits, durable goods orders, and part-time work in non-agricultural industries 1 .
The Stock and Watson indicator was retired after 14 years of existence (1989-2003), during which it was measured and reported monthly. Based on newer methods for assessing the current state of the economy and considerable advancements in measurements of business cycle trends, research has produced the Chicago Fed National Activity Index (CFNAI), which is also the most direct replacement of the Stock-Watson indexes. CFNAI is a monthly index constructed from 85 monthly indicators, based on an extension of the methodology used to construct the original Stock-Watson indexes. Economic activity usually tends to grow or decline along a trend over time; a positive value of the CFNAI signifies growth above the trend, while a negative value corresponds to growth below the trend.
In 2011, researchers at the Conference Board proposed a structural change in the Leading Economic Index (LEI), replacing three components: (1) incorporating a Leading Credit Index (LCI) instead of real money supply M2; (2) replacing the ISM Supplier Delivery Index with the ISM New Order Index; and (3) replacing the Reuters/University of Michigan Consumer Expectation Index with a weighted average of consumer expectations based on surveys administered by the Conference Board and Reuters/University of Michigan 8 . These Conference Board changes, along with research reports indicating varying roles of macroeconomic indicators in their relationships with the business cycle, opened many interesting questions and motivated analysis of the current leading, coincident, and lagging indexes to improve forecasting of economic activity.
Sophisticated forecasting techniques have been used to infer the direction of the economy based on analysis of diffusion indexes 9 . Though sophisticated, they were based largely on trial and error. Principal component analysis (PCA) 10 provides a systematic method for defining indicators, but it suffers from two shortcomings. (1) Correlation with time leads and lags. In most cases, a change in one variable affects other variables with a time delay, which results in correlation with a time lead/lag. PCA, in its simplest form, studies equal-time correlation. Therefore, one needs to time-shift the variables relative to each other and seek to maximize the correlation coefficients as functions of the time shifts. This is complicated and requires a great deal of computing resources when one has a large number of time series. (2) Identification of the statistically meaningful eigenmodes among all the eigenmodes of the correlation matrix. There are few established significance tests for PCA. Random Matrix Theory (RMT) provides a systematic method for testing the significance of eigenmodes, but it critically depends on the requirement that the number of time series is "sufficiently" large and all the autocorrelations are trivial. This paper presents a novel method for overcoming the shortcomings of standard PCA and introduces a novel analytical framework for studying macroeconomic indicators and assessing their leading role in forecasting business cycle turning points.
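To make the cost of the time-shift approach in point (1) concrete, the following minimal Python sketch scans the lags for a single pair of series. The function name `best_lag`, the lag window, and the toy data are illustrative assumptions and are not part of the analysis in this paper.

```python
import numpy as np

def best_lag(x, y, max_lag):
    """Return (lag, corr) maximizing |corr(x[t], y[t + lag])| for |lag| <= max_lag."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    best = (0, 0.0)
    for lag in range(-max_lag, max_lag + 1):
        if lag >= 0:
            a, b = x[: len(x) - lag], y[lag:]      # pair x[t] with y[t + lag]
        else:
            a, b = x[-lag:], y[: len(y) + lag]
        r = np.corrcoef(a, b)[0, 1]
        if abs(r) > abs(best[1]):
            best = (lag, r)
    return best

# Toy data (hypothetical, not from the paper): y repeats x three months later.
rng = np.random.default_rng(0)
x = rng.standard_normal(240)
y = np.roll(x, 3)
lag, r = best_lag(x, y, max_lag=6)
```

With 62 time series, such a scan would have to be repeated for 62 × 61 / 2 = 1891 pairs, and the pairwise optimal lags obtained this way need not be mutually consistent.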
The rest of this paper is organized as follows. First, in the data section, we describe the data used in our empirical analysis. Then, in the methods section, we offer a detailed explanation of our methodologies. In the results section, we describe our results. The final section offers concluding remarks.

Data
We analyze 57 macroeconomic indicators (18 leading, 30 coincident, 9 lagging) and 5 other time series, listed in Table 1. The composite indexes of leading, coincident, and lagging indicators produced by the Conference Board, Inc. 11 are summary statistics for the U.S. economy. The other variables are the Import and Export Price Indexes, both for all commodities, the Japan/U.S. Foreign Exchange Rate, the M2 Money Stock, and the St. Louis Adjusted Monetary Base. Data are taken from Federal Reserve Economic Data (FRED) 12 . All the time series are monthly and run from January 1998 to December 2017, that is, 20 years or 240 months in total. A description of these data is given in Supplementary Information, Section 1.
In order to make the direction of positive/negative growth rates in these time series coincide with the boom/bust of business cycles, the sign of several indexes, such as the unemployment rate, is inverted; these are treated as "inversely cycled variables" (indicated by asterisks). In general, the unemployment rate decreases as business conditions improve. We also remark that the seasonally adjusted indexes given by FRED are used if available. The non-seasonally adjusted indexes are #8, #16-18, #51-55, and #58-59. We verified, however, that no significant seasonality is present in any of these time series (see Supplementary Figs. S1-S3).
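As an illustration of this preprocessing, the sketch below computes a standardized month-over-month change with an optional sign inversion. It assumes the standardized log-difference (or simple difference) mentioned later in the Results; the function name `standardized_growth`, its parameters, and the toy values are hypothetical.

```python
import numpy as np

def standardized_growth(series, inverse=False, use_log=True):
    """Standardized month-over-month change of one indicator.

    inverse=True flips the sign for an inversely cycled variable
    (for example, the unemployment rate).
    """
    x = np.asarray(series, dtype=float)
    step = np.diff(np.log(x)) if use_log else np.diff(x)
    if inverse:
        step = -step
    return (step - step.mean()) / step.std()

# Toy unemployment-rate values (hypothetical, not taken from FRED).
unemployment = [5.0, 5.2, 5.1, 4.9, 4.8]
w = standardized_growth(unemployment, inverse=True)
```

After the inversion, the initial rise in the toy unemployment rate shows up as a negative change, in line with a deteriorating business condition.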

Methods
In order to overcome the shortcomings illustrated in the introduction, we employ novel analytical tools for identifying statistically significant correlation in the time series data, and isolate co-movements in this paper. Although many of these methods are described in the book by some of the present authors 13 , to make the present paper self-contained, we give the following concise review.
Complex Hilbert PCA (CHPCA). CHPCA has been successfully used in various fields, such as meteorology/climatology, signal processing, finance, and economics 13-24 . This method adds to the original time series $w_\alpha(t)$ an imaginary part obtained by the Hilbert transformation. We refer to the complexified signal corresponding to $w_\alpha(t)$ as $\tilde{w}_\alpha(t)$ and to the complex correlation matrix as $\tilde{C}$.
Rotational random shuffling method (RRS). In order to identify which eigenmodes of $\tilde{C}$ are significant (signal) and which are noise, we employ RRS 23,25 . In this method, we cut off the intercorrelation between the time series by randomly shuffling each of them independently and then carrying out the CHPCA analysis. By doing this many times and comparing the actual eigenvalues with the distribution of the simulated eigenvalues, we can identify which modes are significant. We then construct the significant correlation matrix $\tilde{C}^{(\mathrm{sig})}$ made only from the significant modes and use it for the following analysis.
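The complexification step can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the analytic signal is computed by the standard FFT method, each complexified series is standardized to zero mean and unit mean-square magnitude, and the sign convention of the imaginary part (and hence of the phase) may differ from the one used in this paper. The helper names and the toy sinusoids are hypothetical.

```python
import numpy as np

def analytic_signal(x):
    """Complexify real series along the last axis: x + i * Hilbert[x] (FFT method)."""
    x = np.asarray(x, dtype=float)
    n = x.shape[-1]
    weights = np.zeros(n)
    weights[0] = 1.0                      # keep the zero-frequency term
    if n % 2 == 0:
        weights[1 : n // 2] = 2.0         # double the positive frequencies
        weights[n // 2] = 1.0             # keep the Nyquist term
    else:
        weights[1 : (n + 1) // 2] = 2.0
    return np.fft.ifft(np.fft.fft(x, axis=-1) * weights, axis=-1)

def complex_correlation(w):
    """(N, T) real array -> (N, N) Hermitian complex correlation matrix."""
    z = analytic_signal(w)
    z = z - z.mean(axis=1, keepdims=True)
    z = z / np.sqrt((np.abs(z) ** 2).mean(axis=1, keepdims=True))
    return z @ z.conj().T / z.shape[1]

# Toy data (not from the paper): a 24-month cycle and a copy lagging by 3 months.
t = np.arange(240)
leader = np.cos(2 * np.pi * t / 24)
follower = np.cos(2 * np.pi * (t - 3) / 24)
c = complex_correlation(np.vstack([leader, follower]))
eigenvalues = np.linalg.eigvalsh(c)[::-1]          # descending order
lag_in_months = np.angle(c[0, 1]) * 24 / (2 * np.pi)
```

In this toy case the phase of the off-diagonal element recovers the three-month lag in a single step, without scanning time shifts, and the eigenvalues sum to the number of series.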
Hodge decomposition. This is a tool used to untangle time leads and delays. For example, if a time series A is [...], how should we summarize the movements of A, B, and C? In the current analysis, we have 62 time series, which makes this problem difficult. The Hodge decomposition splits all the flows, namely the phases of the significant correlation coefficients $\tilde{C}^{(\mathrm{sig})}_{\alpha\beta}$, into a gradient flow on the one hand and a circular flow on the other. The former is proportional to the difference in the Hodge potentials of the two nodes at the ends of the link, and the latter is a divergence-free flow. The resulting Hodge potential serves as a measure of the hierarchical order of the individual nodes.
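A minimal Python sketch of this split on a small network is given below. It assumes unweighted links and obtains the potentials by least squares from the graph Laplacian; the function name `hodge_potential`, the sign convention (a positive flow from i to j means that i leads j), and the three-node toy flows are illustrative assumptions, not values from this paper.

```python
import numpy as np

def hodge_potential(flow, adjacency):
    """Split antisymmetric link flows into gradient and circular parts.

    flow[i, j] = -flow[j, i] is the flow from node i to node j, and
    adjacency[i, j] = 1 if the two nodes are linked (0 otherwise).
    """
    flow = np.asarray(flow, dtype=float)
    adjacency = np.asarray(adjacency, dtype=float)
    laplacian = np.diag(adjacency.sum(axis=1)) - adjacency
    divergence = (adjacency * flow).sum(axis=1)          # net outflow of each node
    phi, *_ = np.linalg.lstsq(laplacian, divergence, rcond=None)
    phi = phi - phi.mean()                               # fix the arbitrary constant
    gradient = adjacency * (phi[:, None] - phi[None, :])
    circular = adjacency * flow - gradient               # divergence-free remainder
    return phi, gradient, circular

# Toy network (hypothetical numbers): A leads B by 1, B leads C by 1, A leads C by 3.
flow = np.array([[0.0, 1.0, 3.0],
                 [-1.0, 0.0, 1.0],
                 [-3.0, -1.0, 0.0]])
adjacency = np.ones((3, 3)) - np.eye(3)
phi, gradient, circular = hodge_potential(flow, adjacency)
```

The three pairwise leads in this toy case are mutually inconsistent (1 + 1 ≠ 3). The potentials (4/3, 0, -4/3) give the best consistent ordering A, B, C, and the inconsistency is isolated in a loop flow of magnitude 1/3.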
Clustering analysis (CA). This method 25,26 was inspired by percolation analysis in condensed matter physics. We define the order of items from leading to lagging by their Hodge potential. Then the variable $w_\alpha(t)$ is linked with its nearest neighbor $w_\beta(t')$ if they are similar. This procedure creates two types of clusters: one made by combining positive changes and the other by aggregating negative changes.
Synchronization network. One can visualize the lead/lag relationships between nodes by using the links created from the significant correlation matrix $\tilde{C}^{(\mathrm{sig})}$ together with the Hodge potential. Details of the above methods are given in Supplementary Information, Section 3.
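The clustering step described above can be sketched in Python as follows. The similarity rule used here is an assumption made for illustration: two nearest-neighbor sites on the (indicator, month) lattice are linked when their changes have the same sign and differ by at most a threshold `g_c`. The exact criterion is given in the Supplementary Information and may differ; the function name and the toy grid are hypothetical.

```python
def find_clusters(grid, g_c, sign=1):
    """Group neighboring lattice sites whose changes share `sign` and differ by at most g_c.

    grid[i][t] is the standardized change of indicator i at month t, with the
    rows assumed to be sorted already from leading to lagging.
    """
    rows, cols = len(grid), len(grid[0])
    active = {(i, t) for i in range(rows) for t in range(cols) if sign * grid[i][t] > 0}
    parent = {site: site for site in active}

    def find(site):
        while parent[site] != site:
            parent[site] = parent[parent[site]]          # path halving
            site = parent[site]
        return site

    for (i, t) in active:
        for (j, s) in ((i + 1, t), (i, t + 1)):          # lower and right neighbors
            if (j, s) in active and abs(grid[i][t] - grid[j][s]) <= g_c:
                parent[find((j, s))] = find((i, t))

    clusters = {}
    for site in active:
        clusters.setdefault(find(site), []).append(site)
    return sorted((sorted(c) for c in clusters.values()), key=len, reverse=True)

# Toy lattice (hypothetical values): 3 indicators by 4 months.
grid = [[1.0, 1.2, -0.5, -0.6],
        [0.9, 2.5, -0.4, 0.3],
        [-1.0, 2.4, -2.0, 0.2]]
sizes = [len(c) for c in find_clusters(grid, g_c=0.5, sign=1)]
```

On this toy lattice the positive changes form three clusters for `g_c = 0.5`, fragment into seven single sites for `g_c = 0.05`, and merge into two clusters for `g_c = 2.0`.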

Results
Eigenvalue distribution. Among all the eigenmodes of the complex correlation matrix $\tilde{C}$, the RRS analysis shows that those with the six largest eigenvalues are statistically significant. The largest eigenvalue is $\lambda$ = [...], and the six significant eigenvalues together account for 55.4% of the sum of all eigenvalues (the sum rule). We thus see that the remaining 44.6%, about half of the total strength of fluctuations in the macro indicators, can be explained just by random noise. Details are given in the Supplementary Information, Section 4.1.
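A significance test of this kind can be sketched in Python as follows. The sketch assumes that the shuffle is a random cyclic rotation of each series, which preserves autocorrelation while destroying cross-correlation, and that actual and simulated eigenvalues are compared rank by rank at a chosen quantile. The function names, the number of trials, the quantile, and the toy data are all assumptions made for illustration.

```python
import numpy as np

def analytic_signal(x):
    """Complexify real series along the last axis (FFT method)."""
    n = x.shape[-1]
    weights = np.zeros(n)
    weights[0] = 1.0
    if n % 2 == 0:
        weights[1 : n // 2] = 2.0
        weights[n // 2] = 1.0
    else:
        weights[1 : (n + 1) // 2] = 2.0
    return np.fft.ifft(np.fft.fft(x, axis=-1) * weights, axis=-1)

def chpca_eigenvalues(w):
    """Eigenvalues of the complex correlation matrix, in descending order."""
    z = analytic_signal(np.asarray(w, dtype=float))
    z = z - z.mean(axis=1, keepdims=True)
    z = z / np.sqrt((np.abs(z) ** 2).mean(axis=1, keepdims=True))
    return np.linalg.eigvalsh(z @ z.conj().T / z.shape[1])[::-1]

def count_significant_modes(w, n_trials=200, quantile=0.99, seed=0):
    """Count leading modes whose eigenvalue exceeds the rotated-data quantile."""
    rng = np.random.default_rng(seed)
    n, t = w.shape
    actual = chpca_eigenvalues(w)
    null = np.empty((n_trials, n))
    for trial in range(n_trials):
        shifts = rng.integers(0, t, size=n)              # one random rotation per series
        rotated = np.array([np.roll(row, s) for row, s in zip(w, shifts)])
        null[trial] = chpca_eigenvalues(rotated)
    threshold = np.quantile(null, quantile, axis=0)
    below = np.nonzero(actual <= threshold)[0]
    count = int(below[0]) if below.size else n
    return count, actual, threshold

# Toy data (hypothetical): 8 noisy, lagged copies of one AR(1) common factor.
rng = np.random.default_rng(1)
factor = np.zeros(240)
for k in range(1, 240):
    factor[k] = 0.8 * factor[k - 1] + rng.standard_normal()
w = np.array([np.roll(factor, lag) + 0.5 * rng.standard_normal(240)
              for lag in (0, 0, 1, 1, 2, 2, 3, 3)])
n_significant, actual, threshold = count_significant_modes(w)
```

Because a rotation keeps each series' autocorrelation intact, the simulated eigenvalues reflect the spurious correlations expected from autocorrelated but mutually independent series, which is the situation that a test assuming trivial autocorrelations does not cover.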

Significant eigenvectors.
The components of the first eigenvector are shown in Fig. 1 on the phase-absolute value plane. The larger the absolute value, the more important its role in the co-movement. The statistical significance level is determined by adding a random time series as the 63rd component and measuring its magnitude. We have carried out this simulation $10^4$ times and determined the 1%, 5%, and 10% significance levels shown in this plot, making sure that the correlation structure of the original data (less the random time series) is kept unchanged.
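The idea behind these significance levels can be sketched in Python as follows. The sketch assumes that the extra series is Gaussian white noise and that the level is read off as an upper quantile of the magnitude of its component in the first eigenvector; the function names, the reduced number of trials, and the toy data are hypothetical, and the authors' exact procedure may differ.

```python
import numpy as np

def analytic_signal(x):
    """Complexify real series along the last axis (FFT method)."""
    n = x.shape[-1]
    weights = np.zeros(n)
    weights[0] = 1.0
    if n % 2 == 0:
        weights[1 : n // 2] = 2.0
        weights[n // 2] = 1.0
    else:
        weights[1 : (n + 1) // 2] = 2.0
    return np.fft.ifft(np.fft.fft(x, axis=-1) * weights, axis=-1)

def complex_correlation(w):
    """(N, T) real array -> (N, N) Hermitian complex correlation matrix."""
    z = analytic_signal(np.asarray(w, dtype=float))
    z = z - z.mean(axis=1, keepdims=True)
    z = z / np.sqrt((np.abs(z) ** 2).mean(axis=1, keepdims=True))
    return z @ z.conj().T / z.shape[1]

def component_significance_levels(w, n_trials=300, seed=0):
    """Magnitude a first-eigenvector component must exceed at the 1%, 5%, 10% levels."""
    rng = np.random.default_rng(seed)
    n, t = w.shape
    magnitudes = np.empty(n_trials)
    for trial in range(n_trials):
        extra = rng.standard_normal(t)                   # uncorrelated extra series
        c = complex_correlation(np.vstack([w, extra]))
        _, vectors = np.linalg.eigh(c)                   # eigenvalues in ascending order
        magnitudes[trial] = abs(vectors[-1, -1])         # extra component of the top mode
    return {level: float(np.quantile(magnitudes, 1.0 - level))
            for level in (0.01, 0.05, 0.10)}

# Toy data (hypothetical): 8 noisy, lagged copies of one AR(1) common factor.
rng = np.random.default_rng(1)
factor = np.zeros(240)
for k in range(1, 240):
    factor[k] = 0.8 * factor[k - 1] + rng.standard_normal()
w = np.array([np.roll(factor, lag) + 0.5 * rng.standard_normal(240)
              for lag in (0, 0, 1, 1, 2, 2, 3, 3)])
levels = component_significance_levels(w)
```

Appending the random series leaves the correlations among the original series untouched, so only the extra row and column of the matrix vary from trial to trial.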
In Fig. 1, time runs from left to right; that is, components lead those located to their right. On the whole, these results confirm the Conference Board's assignment of the macro indicators to the leading, coincident, and lagging categories. We find, however, that some indicators are categorized incorrectly. Specifically, the two lagging indicators, Bank Prime Loan Rate (#51) and Consumer Price Index for All Urban Consumers: All Items (#56), appear to be better regarded as coincident indicators. Furthermore, all of the other macro indicators, excluding the Japan/U.S. Foreign Exchange Rate (#60), move coherently as front runners among the coincident indicators or as rear runners among the leading indicators. We elaborate on this issue in later subsections.
Further details of the significant eigenvectors are discussed in the Supplementary Information.
The parallel analysis based on the RRS tells us that the eight largest eigenvalues of the PCA are significant, two more than for the CHPCA.
A similarity measure has been defined to make an explicit connection between the eigenvectors of CHPCA and those of PCA. Using the similarity measure we found that each aspect (real or imaginary) of the eigenvectors of CHPCA has a corresponding eigenvector of PCA. For instance, the real part of the first eigenvector of CHPCA is virtually indistinguishable from the corresponding eigenvector of PCA. Much more importantly, the imaginary part of the first eigenvector of CHPCA resembles the second and third eigenvectors of PCA. We thus see that the two orthogonal aspects of the first eigenvector are well described by the three eigenvectors of PCA. Certainly, partial information on the dynamic correlations between macroeconomic indicators is carried by the three eigenvectors of PCA. However, one could hardly reach the whole picture on the comovement of indicators as simply manifested in the first eigenvector of CHPCA with the results of PCA alone; reconstruction of the first eigenvector of CHPCA out of the three eigenvectors of PCA is very difficult.
These results allow us to conclude that the CHPCA is much better than the PCA for investigation of dynamic correlations involved in complex systems, including economic cycles. We refer the readers to the Supplementary Information, Section 4.3 for details of the comparison between CHPCA and PCA.
Hodge synchronization network. In order to address the issue noted above, namely the leading/coincidental/lagging property of the indicators, we need to take into account all six eigenmodes. This can be readily done by examining the significant correlation matrix made of only the significant eigenvectors (see (S12)).
The absolute value $|\tilde{C}^{(\mathrm{sig})}_{\alpha\beta}|$ measures the strength of the correlation between the indicators $\alpha$ and $\beta$.
However, because the time range is finite, it is not equal to zero even if there is no correlation. In order to remove those fictitious correlations, we have introduced a random time series as the 63rd data series and calculated $|\tilde{C}^{(\mathrm{sig})}_{\alpha\,63}|$ [...], we determine the indicators $\alpha$ and $\beta$ to be anti-correlated, with the time lead determined by the phase minus $\pi$.
In this manner, we obtain a network in which the links are made of the remaining $\tilde{C}^{(\mathrm{sig})}_{\alpha\beta}$ and the flows are the phases determined above. We name this the Hodge Synchronization Network (HSN). Community decomposition of the HSN leads to three communities, as seen in the left panel of Fig. 2. Here we detected the communities by maximizing the modularity with the greedy algorithm, and obtained the optimized layout for the network with a spring-electrical model.
We carry out the Hodge decomposition of this network. This is because the Hodge Potential is made of flows: The smaller the flow, the smaller the difference of the Hodge potentials of the pair of nodes, and vice versa. This property guarantees that the Hodge potential of each indicator reflects its lead time. The right panel of Fig. 2 shows the average and standard deviation of the Hodge potentials of each community. We find here that the red community is leading, the blue coincidental, the green lagging.
The Hodge potential values for individual indicators in each of the original classifications are given in Fig. 3, where the indicators are also classified according to which community they belong to. Here we observe that the indicators' categorization by community is almost identical to the one by the original assignment. This result opens a door for realizing the algorithmic categorization of a given data set of macroeconomic time series.
Using the value of the Hodge potentials as the vertical coordinate and determining the horizontal coordinate by the Charge-Spring algorithm, we obtain the synchronization network shown in Fig. 4. By construction, the indicators are listed from top to bottom in the leading order determined by their Hodge potentials.
Mode signals. Each of the complexified time series, to which the standardization procedure has been applied, is expanded in terms of the eigenvectors [...]. Figure 5(b,c) displays the relative intensities, $\hat{I}_1(t)$ and $\hat{I}_2(t)$, of the mode signals of the two dominant eigenmodes as a function of time. Apart from the second largest peak, the three major peaks in the total volatility $I(t)$ shown in Fig. 5(a) are explained by the first eigenmode. In contrast, the second largest peak, in the middle of 2005, is almost entirely ascribed to the second eigenmode. Also, the leading subpeak constituting the main peak of $I(t)$ coincides with the largest peak of $\hat{I}_2(t)$, located in the summer of 2008. We recall that Hurricane Katrina hit the Gulf area in August 2005 and the oil bubble burst in July 2008. Needless to say, these events had profound influences on the oil and oil-related industries in the U.S. Indeed, the second eigenmode is well represented by the industrial production indexes for crude petroleum and natural gas extraction (#34), crude oil (#33), mining (#30), and mining of natural gas (#31), as shown in Fig. S5 in the Supplementary Information. We thus see that the total intensity of the macroeconomic fluctuations is clearly resolved by the two dominant eigenmodes, with well-defined economic interpretations.
Economic cycles. The mode signals $a(t)$ enable us to see to what extent economic cycles are described by the significant eigenmodes. We first construct representative leading, coincident, and lagging indexes by averaging the standardized log-difference or simple difference of the original data over each of the three categories.
The results are given in panel (a) of Fig. 6, where the representative indexes are successively accumulated in the time direction. Panels (b), (c), and (d) of Fig. 6 compare the results based on the original data with the corresponding results obtained by selecting the first eigenmode alone, the first and second eigenmodes, and all six significant eigenmodes, respectively.
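The mode-by-mode comparison in panels (b) to (d) of Fig. 6 rests on truncating the eigenvector expansion of the complexified data. A minimal Python sketch of that truncation is given below. It assumes that the mode signals are the projections of the standardized complexified series onto the eigenvectors of the complex correlation matrix; the function names and the toy data are hypothetical.

```python
import numpy as np

def analytic_signal(x):
    """Complexify real series along the last axis (FFT method)."""
    n = x.shape[-1]
    weights = np.zeros(n)
    weights[0] = 1.0
    if n % 2 == 0:
        weights[1 : n // 2] = 2.0
        weights[n // 2] = 1.0
    else:
        weights[1 : (n + 1) // 2] = 2.0
    return np.fft.ifft(np.fft.fft(x, axis=-1) * weights, axis=-1)

def complexify(w):
    """Standardized complexified series: zero mean, unit mean-square magnitude."""
    z = analytic_signal(np.asarray(w, dtype=float))
    z = z - z.mean(axis=1, keepdims=True)
    return z / np.sqrt((np.abs(z) ** 2).mean(axis=1, keepdims=True))

def mode_signals(z):
    """Eigenvalues (descending), eigenvectors (columns), and mode signals a[n, t]."""
    values, vectors = np.linalg.eigh(z @ z.conj().T / z.shape[1])
    order = np.argsort(values)[::-1]
    values, vectors = values[order], vectors[:, order]
    return values, vectors, vectors.conj().T @ z

def reconstruct(vectors, a, k):
    """Part of the data carried by the k leading eigenmodes."""
    return vectors[:, :k] @ a[:k]

# Toy data (hypothetical): 8 noisy, lagged copies of one AR(1) common factor.
rng = np.random.default_rng(1)
factor = np.zeros(240)
for k in range(1, 240):
    factor[k] = 0.8 * factor[k - 1] + rng.standard_normal()
w = np.array([np.roll(factor, lag) + 0.5 * rng.standard_normal(240)
              for lag in (0, 0, 1, 1, 2, 2, 3, 3)])

z = complexify(w)
values, vectors, a = mode_signals(z)
mode_intensity = np.abs(a) ** 2                          # intensity of each mode over time
errors = [float(np.mean(np.abs(z - reconstruct(vectors, a, k)) ** 2))
          for k in range(z.shape[0] + 1)]
```

In this sketch the time average of each mode's intensity equals its eigenvalue, so the reconstruction error can only shrink as modes are added in order of decreasing eigenvalue.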
The essential features of the economic cycles, including the Great Recession after the Lehman crisis, are well explained by the first eigenmode alone. However, we see that the strength of the cyclic behavior of the lagging indicator is underestimated in the first eigenmode. The quantitative agreement with the original data is progressively improved by taking account of the contribution of the significant eigenmodes term by term, from the first mode up to the sixth mode. The results obtained with full account of the significant eigenmodes are almost indistinguishable from those based on the original data.
Cluster analysis. The propagation of macroeconomic shocks across the individual indexes from leading to lagging is displayed in Fig. 7(a,b), where the indicators are ordered vertically from top (leading) to bottom (lagging) not by their original identification numbers (1-62) but by their Hodge potentials, as shown in Fig. 3.
Although we can visually observe clusters of the indicators, such identification remains subjective. As described in the previous section, we adopt the percolation model to identify clusters in an algorithmic way. Obviously, the clusters thus detected depend crucially on the choice of the threshold $g_c$. If we adopt too small a value of $g_c$, the indicators fragment into a number of tiny pieces. For too large a value of $g_c$, on the other hand, most of the indicators are connected to form a single group. If we carefully adjust $g_c$ close to the percolation threshold in the indicator lattice system, clusters of various scales are formed with a power law distribution. Near the percolation threshold, we can thereby extract information on the clustering properties of the indicators in the most effective way. This algorithm for detecting clusters is illustrated in Supplementary Fig. S4. By repeating the percolation calculations with varied $g_c$, we have found that a percolation transition takes place around $g_c$ = 0.45 and 0.6 in the model system for positive and negative changes of the indicators, respectively. Incidentally, the total numbers of clusters thus obtained are 306 and 479 for positive and negative changes of the indicators, respectively. The results are shown [...]

Discussion
Understanding the challenges ahead of us, we suggest that the roles of macroeconomic indicators might change with time and with fluctuating economic dynamics, and real-time analysis using noise-reducing methodologies might be appropriate to offer improved forecasting of the business cycle.
In this paper we study 57 US macroeconomic indicators as well as 5 money/trade indexes and their relationships with the US business cycle. We analyze the importance of various macroeconomic indicators and their leading/lagging roles with respect to economic expansions and contractions. We build on almost a century's worth of research in investigating dynamic societal processes, such as innovation, and the influence of such processes on economic behavior. Business cycles reflect trends of economic activities captured by specific macroeconomic indicators whose characteristics we examine in this study.
We expand the set of methodological approaches to analyzing business cycles and important economic turning points by proposing novel methods such as the Hodge Decomposition for hierarchically ordering macroeconomic indicators by their leading/lagging roles in relation to business cycles. Additionally, we extract significant information by using noise-reducing algorithms to identify economically significant events. Moreover, by using a synchronization network approach and clustering analysis of temporal positive and negative changes of macroeconomic indicators, we find significant consecutive collective behavior among macroeconomic indicators.
In our study, we first addressed the question of identifying the most prominent leading macroeconomic indicators within the 20-year period that we analyze, between January 1998 and December 2017. We found that, besides the indicators classified as leading by the Conference Board, some of the coincident and lagging indicators as well as certain money/trade indexes show leading indicator characteristics. Namely, coincident indicators such as the industrial production indexes for wood product manufacturing (#39) and for iron and steel manufacturing (#44) seem to be indicative of the business cycle. Similarly, the lagging indicators Bank Prime Loan Rate (#51) and the Consumer Price Index (#56) surface as leading business cycle indicators. Additionally, the import and export price indexes (#58 and #59, respectively), as well as the M2 Money Stock (#61) and the adjusted monetary base (#62), seem to precede turning points in the business cycle.
Second, we have used our CHPCA and Hodge decomposition methodologies (described in detail in the methods section) to identify specific, significant economic events by analyzing the fluctuation intensity in macroeconomic indexes. We were able to identify the impact of the Global Financial Crisis [...]
Our results show that more than half (55.4%) of the variability in the macroeconomic indicators can be explained by only six significant eigenvectors, out of the 62 time series analyzed. We interpret this result as indicating a small number of significant sources of macroeconomic variability.
There are several possible directions for future research based on the results and the issues that we faced in this study. First, we used monthly data, and changing the data resolution to either weekly or daily may improve the findings. Second, we conducted the analysis only for U.S. macroeconomic indicators, and a future research direction may include application of our proposed methodologies to more countries and even regions or unions such as the European Union. Third, we based our investigations on only 62 time series covering a 20-year period, and if we increase the data set or complement it by including pricing data, the results may improve. Finally, creating a comprehensive dynamic forecasting tool for business cycles requires continuous, ongoing research. One such future research direction might include improving the methodology proposed in this paper or taking another methodological approach with a different data set.