A multilevel analysis of financial institutions’ systemic exposure from local and system-wide information

In the aftermath of the financial crisis of 2007–2009, the growing body of literature on financial networks has widely documented the predictive power of topological characteristics (e.g., degree centrality measures) to explain the systemic impact or systemic exposure of financial institutions. This study shows that considering alternative topological measures based on the local sub-network environment improves our ability to identify systemic institutions. To provide empirical evidence, we apply a two-step procedure. First, we recover network communities (i.e., close-peer environments) on a spillover network of financial institutions. Second, we regress alternative measures of vulnerability (i.e., a firm's losses) on three levels of topological measures: the global level (i.e., firm topological characteristics computed over the whole system), the local level (i.e., firm topological characteristics computed over the community to which it belongs), and the aggregated level, obtained by averaging individual characteristics over the community. The sample includes 46 financial institutions (banks, broker-dealers, and insurance and real-estate companies) listed in the Standard & Poor's 500 index. Our results confirm the informational content of topological metrics based on the close-peer environment. Such information is different from that embedded in traditional system-wide topological metrics and can help predict the distress of financial institutions in times of crisis.


Literature review
Our study relates to the fast-growing literature on contagion in financial networks (see 3 for a survey). A central question in this body of research is whether the network structure enhances the vulnerability of the financial system as a whole and of its components. The degree distribution is a common measure to characterize the network structure; for example, Boss et al. 4 address this question using real data on interbank liabilities in Austria. A similar exercise is proposed in 5 and 6 for the Brazilian interbank network. Caldarelli et al. 7 argue that, as in other types of networks such as social media, degree distributions in financial networks are well described by a power law. An interesting feature emerging from this research is the identification of a core-periphery structure in financial markets, especially in interbank networks. In this structure, the system comprises a small number of highly interconnected banks at the core along with poorly connected institutions on the periphery (see 8 ). Part of the literature shifts the perspective from the system to individual nodes and extends the analysis to centrality measures, which are designed to identify the importance of a node in a network (see 9 ). One of the primary targets of this research is then to assess whether the centrality of nodes explains their financial exposure or systemic importance. To document this issue, several centrality measures used in network science have been applied to financial systems, such as degree, eigenvector, or Katz centrality, to cite the most frequent. For example, Craig et al. 10 apply a modified version of eigenvector centrality to German credit registers to explain individual bank risk. They find a negative relationship between centrality measures and the probability of default. Martinez-Jaramillo et al. 11 combine several centrality measures into a composite measure to characterize the Mexican banking system.
Puhr 12 finds a positive relationship between Katz centrality and systemic risk. In these examples, centrality measures are applied to single-layer networks, implicitly assuming a unique source of connection between financial institutions, such as contractual obligations. However, the reality is more complex. Potential transmission channels are diverse 13,14 , including cross-lending relationships, derivatives, and similarities or common holdings in portfolio structures, among others 15 . Each channel in turn may create a specific network of dependence among the financial institutions at stake. To feature the different layers, one strand of the literature has developed a holistic approach that considers multiple channels of transmission while keeping the network representation simple. To this end, the links are recovered from the analysis of the dependence structure in stock market returns. The approach builds on the premise that stock prices reflect all the relevant information regarding an institution. As such, the dependence between stock returns enables us to assess whether two institutions are related, regardless of the specific channels through which the transmission operates. Thereby, it provides a synthetic measure of interconnectedness between institutions. In this vein, Billio et al. 16 made a pioneering contribution. The authors use Granger causality tests over monthly returns of hedge funds, banks, broker/dealers, and insurance companies to build their network. It is then possible to predict the systemic risk level of financial institutions out-of-sample based on centrality measures computed on the retrieved network structure. Similar studies were performed by Diebold and Yilmaz 17 , Giudici and Parisi 18 , and Betz et al. 19 . Gandica et al. 20 document the presence of sub-structures of densely connected nodes within the network, identifying sub-networks across time in the US market using a community detection algorithm.
An interesting finding of this research is that these sub-networks stemming from stock returns dependence do not fully coincide with trivial ex-ante categories such as financial industries (e.g., broker-dealers vs. banks). It should be noted that, while being different, this approach shares similarities with a related strand of the literature where stock market returns can also be used to compute systemic risk measures based on commonalities in the market as a whole (e.g., see 21 or the PCA analysis in 16 ).
Our contribution is closely related to the analysis developed in 16 . The main differences are threefold. First, we rely on a modified procedure, developed in 2 , to retrieve the financial network. As discussed in 2 , this approach is better suited when the underlying network is time varying. Second, we segment our network into sub-networks by applying a community detection algorithm. Third, we compute topological measures at three levels: (i) the firm level over the whole system, as in 16 , (ii) the firm level over the community, and (iii) the community level, by averaging individual characteristics within identified communities. Another related contribution is that of 1 , who examine the explanatory power of a large set of centrality measures for interbank spreads.

Dataset
Our empirical analysis of firms' systemic risk crucially depends on two main elements: (i) an accurate representation of the underlying pre-crisis financial network, on which our ability to properly assess the relevant network metrics that serve as explanatory variables in our regression settings also depends; and (ii) a sound measure of systemic risk in times of crisis (our endogenous variable). To address both issues, we rely on a purely market-based approach and retrieve the monthly cum-dividend stock prices from Thomson Reuters Eikon for each company in our sample, covering the period from January 1990 to December 2014 for the former and the crisis period (July 2007–December 2008) for the latter. We derive our financial spillover network using the methodology developed in 2 . The data feature the time-varying bilateral relationships of 155 financial institutions with Standard Industrial Classification (SIC) codes from 6000 to 6799 from the S&P 500 over the pre-crisis period (more details about the creation of the network based on raw return data are provided in the supplementary information). Three subsequent filters on this network were needed to recover the final network on which pre-crisis topological measures are computed. First, institutions appearing in the sample only after the start of the financial crisis in January 2008 were removed. Second, to filter out noise in our data, the analysis is restricted to stocks with at least 36 consecutive monthly observations. Finally, financial institutions that disappeared before or during the financial crisis were dropped. Our final sample consists of 46 financial institutions, including banks, broker-dealers, and insurance and real-estate companies listed on the S&P 500 index. Note that such a network is similar in size to, or larger than, those in most related studies on market-based financial spillovers, such as 22,23 , or 24 .
Based on the resulting pre-crisis network representation, network metrics are computed at various levels (firm-global, firm-local, community), as detailed in Step 2 below. Finally, systemic risk metrics are detailed in Step 3, based on transformations of either raw prices or log-returns in times of crisis.

Methodology
First, we recover the financial network for the set of institutions in our sample. Second, based on a community detection algorithm, we break down the whole network into sub-networks, and compute pre-crisis topological measures at both the system-wide and community levels. Third, we derive a measure of vulnerability in times of crisis. Fourth, we regress our measure of vulnerability on previous standard topological measures along with community-based ones to assess their marginal explanatory power. Each step is detailed below.
Step 1: Financial temporal networks. Our financial networks were built following 2 (see the supplementary information or 2 for more details on the computation of the network based on raw return data). The linkages between financial institutions are retrieved from stock market returns by means of a Bayesian time-varying VAR framework and Granger causality testing procedures. According to this approach, an incoming link is created from institution j to institution i if the time series of the return on firm j Granger causes the time series of the return on firm i. Figure 1 provides an illustrative snapshot of the network before the 2007–2009 crisis. The details of the methodology can be found in 2 , and the main features are summarized in the supplementary information.
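The full estimation relies on the Bayesian time-varying VAR of 2, but the core idea of drawing a directed edge from j to i when firm j Granger-causes firm i can be sketched with a plain lag-1 Granger test. The fixed rejection threshold `F_CRIT = 4.0` (a rough 5% cutoff) and the synthetic series below are illustrative assumptions, not the paper's procedure.

```python
import math
import random

def ols_ssr(X, y):
    """Least squares via normal equations (Gaussian elimination); returns the SSR."""
    k = len(X[0])
    A = [[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)]  # X'X
    c = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(k)]            # X'y
    for i in range(k):                       # forward elimination with partial pivoting
        p = max(range(i, k), key=lambda r: abs(A[r][i]))
        A[i], A[p], c[i], c[p] = A[p], A[i], c[p], c[i]
        for r in range(i + 1, k):
            f = A[r][i] / A[i][i]
            for j in range(i, k):
                A[r][j] -= f * A[i][j]
            c[r] -= f * c[i]
    beta = [0.0] * k                         # back substitution
    for i in reversed(range(k)):
        beta[i] = (c[i] - sum(A[i][j] * beta[j] for j in range(i + 1, k))) / A[i][i]
    return sum((yi - sum(b * xj for b, xj in zip(beta, r))) ** 2 for r, yi in zip(X, y))

def granger_f(x, y):
    """F-statistic for 'x Granger-causes y' with a single lag."""
    target = y[1:]
    restricted = [[1.0, y[t - 1]] for t in range(1, len(y))]
    unrestricted = [[1.0, y[t - 1], x[t - 1]] for t in range(1, len(y))]
    ssr_r, ssr_u = ols_ssr(restricted, target), ols_ssr(unrestricted, target)
    return (ssr_r - ssr_u) / (ssr_u / (len(target) - 3))

F_CRIT = 4.0  # illustrative fixed cutoff; the paper uses a proper testing procedure

# Synthetic example: y is driven by lagged x, while z is unrelated noise.
rng = random.Random(0)
x = [math.sin(0.3 * t) for t in range(120)]
z = [rng.gauss(0.0, 1.0) for _ in range(120)]
y = [0.0] + [0.8 * x[t - 1] + 0.05 * rng.gauss(0.0, 1.0) for t in range(1, 120)]
# A directed edge x -> y is created whenever granger_f(x, y) > F_CRIT.
```

In practice the test is run over every ordered pair of institutions, and the resulting boolean matrix is the adjacency matrix of the directed spillover network.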
Step 2: Independent variables. In a second step, we break down the whole network into sub-networks, grouping the most connected nodes together. To this end, we apply the Louvain method 25 , which is designed to detect communities within complex networks. This well-known method relies on a greedy algorithm that attempts to optimize the modularity of a partition of the network. Modularity measures the strength of division of a network into modules or communities: high modularity corresponds to dense connections between the nodes within modules but sparse connections between nodes in different modules. The optimization is performed in two steps. First, the method looks for "small" communities by optimizing modularity locally. Second, it aggregates nodes belonging to the same community and builds a new network whose nodes are the communities. These steps are repeated iteratively until a maximum of modularity is attained and a hierarchy of communities emerges (for further details, see 25 ). Equipped with a clearly identified community structure, we compute several topological measures considering either the whole network or only the community-based local environment. For the sake of clarity, the entire set of variables is divided into three groups. First, global topological metrics are computed at the firm level over the whole system, as is usually the case in the literature. Second, we compute local topological metrics at the firm level over close peers: unlike the previous metrics, for each node we exclusively consider the other nodes within the same community, thereby limiting the computation to intra-community links. Third, to further explore the influence of the local environment, we average individual characteristics over each community to compute aggregated topological metrics. This means that all the nodes included in the same community display the same values for this set of variables.
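The three measurement levels can be illustrated on a toy directed network. The sketch below is an illustration under assumed data structures (an edge list and a node-to-community map), not the paper's code: it computes a firm's out-degree over the whole system (global), its out-degree restricted to intra-community links (local), and the community average of the local metric (aggregated).

```python
from collections import defaultdict

def degree_metrics(edges, community):
    """Global, local (intra-community), and community-averaged out-degrees.

    edges: iterable of (source, destination) pairs; community: node -> community id.
    """
    global_deg = defaultdict(int)
    local_deg = defaultdict(int)
    for src, dst in edges:
        global_deg[src] += 1                    # global level: every outgoing link
        if community[src] == community[dst]:
            local_deg[src] += 1                 # local level: intra-community links only
    members = defaultdict(list)
    for node, com in community.items():
        members[com].append(local_deg[node])
    # aggregated level: average the individual (local) metric over each community
    aggregated = {com: sum(vals) / len(vals) for com, vals in members.items()}
    return dict(global_deg), dict(local_deg), aggregated

edges = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C")]
community = {"A": 1, "B": 1, "C": 1, "D": 2}
g, l, a = degree_metrics(edges, community)
# g["A"] == 3 (links to B, C, D), but l["A"] == 2 because D sits in another community.
```

By construction, every node in the same community shares the same aggregated value, which is exactly the property noted above for the third group of variables.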
Turning to the topological metrics, we account for a wide set of standard measures: in-degree centrality, out-degree centrality, betweenness centrality, clustering centrality, m-reach centrality, inverse m-reach centrality, in-Katz centrality, and out-Katz centrality (detailed explanations of the variables can be found in 20 ). To take advantage of historical changes in the network as well as auxiliary information on firms' industries, we complete our analysis with a set of less conventional metrics. Sectoral entropy assesses how diverse a community is in terms of sectors. Inter in(out)-degree counts the number of in(out)-links between communities. The inter-intra degree is computed as the ratio of the inter-community degree over the intra-community degree; the measure is applied separately to in- and out-degrees. In addition, we compute the ratio between a firm's in(out)-degree and its community's in(out)-degree as a measure of the node's commitment 26 . Finally, two temporal metrics are added. To this end, we compute a modified m-reach measure that allows the contagion process to involve a time lag. For example, if a firm displays an out-degree of 3 at time t, the value of its 1-reach centrality is 3. For the 2-reach centrality measure, we consider the connections of the 3 neighboring nodes at time t + 1, instead of at time t as in the standard definition. If this value is equal to 4, the 2-reach centrality measure is 7. The measure thus accounts for a one-period delay at each order of the propagation process. We consider separate measures for incoming and outgoing links. Figure 2 provides an overview of the variables constructed for the empirical analysis.
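Two of the less conventional metrics can be sketched directly from these definitions. Sectoral entropy is computed here as the Shannon entropy of the sector shares within a community (the natural-log base is an assumption), and the temporal 2-reach follows the worked example above, summing the neighbors' out-degrees observed one period later (counting distinct nodes instead would be an alternative convention).

```python
import math

def sectoral_entropy(sectors):
    """Shannon entropy (natural log) of the sector distribution within a community."""
    counts = {}
    for s in sectors:
        counts[s] = counts.get(s, 0) + 1
    n = len(sectors)
    return -sum((c / n) * math.log(c / n) for c in counts.values())

def temporal_two_reach(node, adj_t, adj_t1):
    """Out-degree at time t plus the out-degrees of those neighbors at time t + 1."""
    first_step = adj_t.get(node, [])
    return len(first_step) + sum(len(adj_t1.get(nb, [])) for nb in first_step)

# The worked example: out-degree 3 at t, the 3 neighbors hold 4 links at t + 1 -> 7.
adj_t = {"A": ["B", "C", "D"]}
adj_t1 = {"B": ["E", "F"], "C": ["G"], "D": ["H"]}
# temporal_two_reach("A", adj_t, adj_t1) == 7
```

A single-sector community has entropy zero, and entropy grows as the sector mix becomes more even, which is the "diversity" reading used in the text.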
Step 3: Dependent variables. Following 24 , we use two indicators of firm vulnerability based on the loss in a firm's stock value to test the predictive power of our topological measures: (i) the cumulative stock returns and, alternatively, (ii) the peak-to-trough (maximum drawdown) returns. The former measure is based on daily log-returns and the latter on prices. The cumulative returns are computed for each firm as the sum of its stock returns over the crisis period. The maximum drawdown is computed as the maximum loss of the stock price over the crisis period from a peak to a trough, with the trough defined as the minimum value of the price between the date of the peak and the end of the crisis. As in the literature, we build on the premise that firms' vulnerability is best revealed in the crisis period. The crisis can therefore be viewed as a natural experiment to identify vulnerable institutions and test whether such fragility could have been detected prior to the event by looking at alternative characteristics such as topological metrics. For robustness purposes, we use two definitions for the crisis period: a short window of 12 months and a longer window of 18 months (July 2007–December 2008).

Step 4: Penalized regression analysis. Given the large number of regressors, we apply the elastic net (EN) model 27 to perform shrinkage and identify the most relevant variables 28 . Along with the LASSO approach, this has been widely used in the literature to control the degree of sparsity in the model and select only the most significant coefficients 29 . The objective function in EN regression is expressed as follows:

\min_{\beta}\;\frac{1}{2n}\sum_{i=1}^{n}\bigl(y_i - x_i^{\top}\beta\bigr)^2 \;+\; \lambda\Bigl[\alpha\,\lVert\beta\rVert_1 \;+\; \frac{1-\alpha}{2}\,\lVert\beta\rVert_2^2\Bigr]

The EN penalty is controlled by the mixing parameter α , averaging two mainstream shrinkage penalty functions, namely the one referring to the LASSO (L1 penalty) 30 and that associated with the ridge regression (L2 penalty) 31 . Note that if α = 1 , the EN approach reduces to the LASSO selection model, whereas if α = 0 , we recover a full ridge regression approach.
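As an illustration, the EN objective can be evaluated directly for a given coefficient vector. The sketch below uses the standard glmnet-style scaling (1/2n on the fit term), which is an assumption about the exact normalization, and is meant only to make the roles of the L1 and L2 penalties concrete.

```python
def en_objective(beta, X, y, lam, alpha=0.5):
    """Elastic net objective: (1/2n) * RSS + lam * (alpha * L1 + (1 - alpha)/2 * L2^2)."""
    n = len(y)
    rss = sum((yi - sum(b * xj for b, xj in zip(beta, row))) ** 2
              for row, yi in zip(X, y))
    l1 = sum(abs(b) for b in beta)          # LASSO part: drives coefficients to zero
    l2_sq = sum(b * b for b in beta)        # ridge part: shrinks coefficients smoothly
    return rss / (2 * n) + lam * (alpha * l1 + (1 - alpha) / 2 * l2_sq)

# alpha = 1 recovers the pure LASSO penalty; alpha = 0 recovers ridge regression.
```

Minimizing this function over beta for a grid of lam values is what a library such as glmnet or scikit-learn does internally; the sketch only evaluates the objective.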
As recalled by 32 , whereas the LASSO approach allows one to shrink and select, the EN procedure not only shrinks and selects but also ensures that the selected model minimizes the Kullback–Leibler distance to the true DGP (thus meeting the so-called 'oracle property'). In our implementation, as done in the applied literature in various fields (see 32 or 33 , to quote only a few), we consider a value of 0.5 for α , which amounts to an equally weighted combination of the ridge and LASSO penalties. Finally, the parameter λ controls the strength of the penalty, that is, the number of covariates set exactly to zero (if α > 0 ) or merely shrunk toward zero (if α = 0 ) for minimization purposes. The regression framework is convenient to assess the role of local environment information, as opposed to traditional global environment information, for computing topological metrics and analyzing systemic risk. For the sake of clarity, we formulate three testable hypotheses: under H1, the relevant information is local (community-based); under H2, it is global (system-wide); under H3, the two levels are complementary. If H1 is true, only LT metrics should appear significant in a regression including both local and global topological metrics. If H2 is true, only GT metrics should emerge as significant. If H3 is true, both LT and GT variables should appear significant when included in the same model.

Figure 2. Three levels of analysis: system-wide firm-based metrics, community firm-based metrics, and community-based metrics. The figure displays the labels of the topological metrics along with their corresponding level. We consider three levels. System-wide firm-based metrics are computed for each firm based on information stemming from the whole network. Community firm-based metrics are computed for each firm based on information stemming exclusively from the community (i.e., sub-networks). Community-based metrics are computed at the community level. Each category is associated with a stylized network to illustrate which area is used to compute the metrics. Metrics surmounted by an * correspond to the variables labelled "new" in the regression sections for each relevant category.
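The two vulnerability measures of Step 3 can be sketched as follows. The daily price series is illustrative, and the drawdown is expressed here as a fraction of the peak price, which is one common convention.

```python
import math

def cumulative_return(prices):
    """Sum of daily log-returns over the window (telescopes to log(last / first))."""
    return sum(math.log(p1 / p0) for p0, p1 in zip(prices, prices[1:]))

def max_drawdown(prices):
    """Largest peak-to-trough loss, the trough occurring after the peak."""
    peak, mdd = prices[0], 0.0
    for p in prices:
        peak = max(peak, p)                  # running peak up to the current date
        mdd = max(mdd, (peak - p) / peak)    # loss relative to that running peak
    return mdd

prices = [100.0, 120.0, 90.0, 110.0, 60.0, 80.0]
# max_drawdown(prices) == 0.5: from the peak of 120 down to the trough of 60.
```

Note that the rebound from 60 to 80 at the end does not reduce the drawdown, since the trough is always measured after the peak, as in the definition above.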

Results
Before interpreting the regression results, we briefly discuss the cross-correlations among our selected topological metrics. Exploring the dependence among all our metrics helps us get a better sense of their informational content; for example, a very strong correlation would suggest the absence of specific information in an isolated variable. Next, we estimate a baseline model limited to traditional centrality metrics computed at the network level. Then, we extend the specification to include community-based metrics. Finally, we include all the variables in the model. This stepwise procedure aims to assess the robustness of our findings.
The correlation matrix. We use a heatmap to display the correlation coefficients (see Fig. 4).

First level: global topological metrics. We first report the results for the regular metrics. In total, the model includes 8 variables, all computed at the system-wide level. The list of variables can be found in Fig. 2; we only omit the two temporal variables in this baseline model. The rows in Table 1 report the names, along with the signs of the coefficients, of the variables that appear significant in one of the four columns. Each column corresponds to an alternative definition of the dependent variable. We consider four alternatives depending on the definition of both the crisis period, either 12 or 18 months, and the vulnerability of the firm, computed either as cumulative returns or as the maximum drawdown. Four variables are identified as significant regardless of how the dependent variable is computed, with the exception of the out-degree variable, which is not significant in one model. As expected, the signs for the cumulative returns and the maximum drawdown are opposite: higher vulnerability corresponds to smaller cumulative returns and a larger maximum drawdown. To illustrate our results, we can take the example of the betweenness centrality metric. Betweenness centrality is the number of shortest paths that pass through a node. In this context, each edge represents the existence of spillovers between the asset returns of two financial institutions. A path means that distress in one institution can be channeled to another in the network through indirect connections. The betweenness centrality metric therefore captures whether a financial firm is located in the path of these financial spillovers. It can also be viewed as a measure of the capability of a firm to transmit the crisis between separate groups of firms.
Our results indicate that the higher the number of paths passing through a firm, the greater the losses experienced by this firm during the crisis and, thus, the more vulnerable it appears to be.
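The betweenness interpretation above can be made concrete with Brandes' shortest-path counting on a small directed graph. This is a standard textbook sketch (unweighted, unnormalized), not the exact implementation used in the paper.

```python
from collections import deque

def betweenness(adj):
    """Brandes' betweenness centrality for a directed, unweighted graph."""
    nodes = set(adj) | {v for targets in adj.values() for v in targets}
    bc = dict.fromkeys(nodes, 0.0)
    for s in nodes:
        stack, pred = [], {v: [] for v in nodes}
        sigma = dict.fromkeys(nodes, 0)      # number of shortest paths from s
        sigma[s] = 1
        dist = dict.fromkeys(nodes, -1)
        dist[s] = 0
        queue = deque([s])
        while queue:                         # BFS from source s
            v = queue.popleft()
            stack.append(v)
            for w in adj.get(v, ()):
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    pred[w].append(v)
        delta = dict.fromkeys(nodes, 0.0)    # dependency accumulation
        while stack:
            w = stack.pop()
            for v in pred[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    return bc

# On the chain A -> B -> C -> D, the inner nodes carry all the indirect paths.
bc = betweenness({"A": ["B"], "B": ["C"], "C": ["D"]})
# bc["B"] == 2.0 and bc["C"] == 2.0, while the endpoints score 0.0.
```

In the spillover reading of the text, the inner nodes of the chain are the firms through which distress must transit, which is exactly what their higher scores capture.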

Second level: global and aggregated topological metrics.
In the second estimation, we add all the variables averaged by community (AT) to the baseline model (GT). By doing so, we can assess the stability of the firm-based results along with the community-level analysis. Results are depicted in Table 2. The traditional metrics appear robust when compared to the first-level estimations. Only three changes are noticeable. The out-degree variable is now significant with respect to all four alternative definitions of the dependent variable, whereas it previously appeared significant in three models. The betweenness centrality is no longer significant when we consider the long 18-month crisis period and the maximum drawdown to quantify firm losses. Finally, in-Katz centrality is no longer significant when using the 18-month crisis period and the cumulative returns. Among the 10 variables computed at the community level, three, clustering, 2-reach, and community size, are significant once traditional system-wide metrics are controlled for. This result means, for example, that the larger a community, the more vulnerable a firm.

Table 1. Penalized multivariate regressions for system-wide firm-based metrics (GT). −, + denote the sign of the coefficient for significant variables; non-significant variables are not reported. The model is estimated using Elastic Net regression. Each column corresponds to an alternative definition of the dependent variable. In columns 1 and 2, the dependent variable is computed as cumulative returns over 12 months and 18 months, respectively. In columns 3 and 4, the dependent variable is computed as the maximum drawdown over 12 months and 18 months, respectively.
Third level: complete model. We can now consider the full model. The specification includes the three categories of variables, adding community firm-based metrics (denoted LT in Fig. 2) to the previous model (GT and AT in Fig. 2). We also add our set of alternative metrics (denoted new-GT and new-AT in Fig. 2). As explained in Step 2 of the methodology, these variables exploit the information embedded in the temporal changes in the network, the sectoral diversity of our financial firms, as well as the intra- and extra-community links. The results are reported in Table 3. System-wide metrics are stable for betweenness and clustering. Out-Katz centrality performs better than the out-degree variable, adding up the influence of the neighboring nodes. On the other hand, inverted 2-reach centrality replaces in-Katz centrality in capturing financial vulnerability. The size of the community remains significant, as expected and explained in the previous section.
Turning to the community firm-based (LT) metrics, two variables expressing outgoing and incoming links are the dominant explanatory variables at this level. The first is the out-degree centrality. Interestingly, we previously noticed that this variable was significant when computed at the network level. Now that the two variables computed at the network and the community levels compete in the same model, the one at the community level emerges as significant. The second variable is the inverted 2-reach centrality. We note that in this case, the same metric computed at both the system-wide and the community level is significant when both are included in the same model.

Table 2. Penalized multivariate regressions for system-wide firm-based metrics (GT) and community-based metrics (AT). −, + denote the sign of the coefficient for significant variables; non-significant variables are not reported. The model is estimated using Elastic Net regression. Each column corresponds to an alternative definition of the dependent variable. In columns 1 and 2, the dependent variable is computed as cumulative returns over 12 months and 18 months, respectively. In columns 3 and 4, the dependent variable is computed as the maximum drawdown over 12 months and 18 months, respectively.

Table 3. Penalized multivariate regressions for system-wide firm-based metrics, community-based metrics and community firm-based metrics. −, + denote the sign of the coefficient for significant variables; non-significant variables are not reported. The model is estimated using Elastic Net regression. Each column corresponds to an alternative definition of the dependent variable. In columns 1 and 2, the dependent variable is computed as cumulative returns over 12 months and 18 months, respectively. In columns 3 and 4, the dependent variable is computed as the maximum drawdown over 12 months and 18 months, respectively.
Among the set of alternative metrics, five variables appear significant: inter-intra-out-degree centrality, outdeg-intra-outdeg, sectoral entropy, temporal inverted 2-reach, and temporal 2-reach. Recall that inter-intra-out-degree centrality is defined as the ratio between the community's out-degree toward the other communities in the network and the out-degree within the same community. What we want to capture with such a metric is the influence that one community has over the whole system, normalized by its own intra-dependence. The higher the variable, the more vulnerable the firm. The second metric is the outdeg-intra-outdeg. The creation of this metric was inspired by 26 , where it is defined as the node's commitment to its own community. Here, the metric is defined as the ratio between the node's out-degree and the total out-degree within its community. The sign in the first column, where the dependent variable is constructed as cumulative returns, is positive; the sign is negative when considering the maximum drawdown. This result means that the higher the variable, the less vulnerable the firm. The sectoral entropy captures the diversity of sectors within the community. In the descriptive analysis of 20 , sectoral diversity was suggested to be a determinant of systemic risk when computed as the global sector-interface. Our estimations based on a formal regression analysis confirm this feature. However, it is worth noting that the metric is sensitive to the time length, appearing significant only for the shortest crisis period (12 months).

Non-penalized regression analysis. As a last exercise, we re-estimate the model using ordinary least squares (OLS). Penalization methods such as LASSO or EN offer relevant solutions to select parsimonious sets of predictors when the sample size is small relative to the number of variables.
While regular OLS regression minimizes the sum of squared residuals to find the values of the estimated coefficients, penalized regressions augment the minimization problem with a penalty term. As a result, penalized regression increases the estimation bias by shrinking coefficients, with non-relevant ones becoming (nearly) zero. To check the robustness of our results, we apply OLS to the full set of selected regressors. Under regular conditions (i.e., the Gauss-Markov hypotheses), OLS estimators are unbiased. This two-step procedure is of particular interest to check the sign of the coefficients. However, statistical tests can still suffer from low power, due both to the limited degrees of freedom, despite having dropped some regressors, and to multicollinearity among the remaining ones. Neither the shrinkage approach nor the linear regression is a panacea; however, cross-validating their results allows us to be more confident in our conclusions. The estimated coefficients along with robust standard errors are reported in Table 4. Overall, our findings are confirmed. We do not observe reversals in signs.

Table 4. Non-penalized multivariate regressions for system-wide firm-based metrics, community-based metrics and community firm-based metrics. This table reports the cross-sectional regression of alternative measures of financial firms' vulnerability on the centrality measures selected in Table 3. The dependent variable is (i) the cumulative returns over 12 months in column 2, (ii) the cumulative returns over 18 months in column 3, (iii) the maximum drawdown over 12 months in column 4, and (iv) the maximum drawdown over 18 months in column 5. Estimation is performed by ordinary least squares. Robust standard errors are reported in parentheses. ***, **, * denote statistical significance at the 1%, 5% and 10% levels.
Most variables in each category remain significant predictors of a firm's vulnerability at the 10% significance level. In a few cases, the variables are significant in a smaller number of models. For three variables, inverted 2-reach centrality, sectoral entropy, and temporal inverted 2-reach, we can no longer detect statistical significance at the 10% level across the four specifications.