Extracting the multi-timescale activity patterns of online financial markets

Online financial markets can be represented as complex systems where trading dynamics can be captured and characterized at different resolutions and time scales. In this work, we develop a methodology based on non-negative tensor factorization (NTF) aimed at extracting and revealing the multi-timescale trading dynamics governing online financial systems. We demonstrate the advantage of our strategy first using synthetic data, and then on real-world data capturing all interbank transactions (over a million) that occurred in an Italian online financial market (e-MID) between 2001 and 2015. Our results demonstrate how NTF can uncover hidden activity patterns that characterize groups of banks exhibiting different trading strategies (normal vs. early vs. flash trading, etc.). We further illustrate how our methodology can reveal “crisis modalities” in trading triggered by endogenous and exogenous system shocks: as an example, we reveal and characterize trading anomalies in the midst of the 2008 financial crisis.

Social activity of individuals follows certain rhythms at different time scales. Many of the individual activities, such as sending e-mails and making phone calls, are likely to be done in particular time intervals within a day (i.e., diurnal or circadian cycles), and the total daily activity could heavily depend on the day of the week (i.e., weekly cycles) [1][2][3][4][5][6] . In general, these social activity rhythms emerging at different time scales may be correlated with each other; for instance, it has been shown that face-to-face contacts between classmates in a primary school follow a common diurnal cycle driven by the daily class schedule 7,8 , yet at the same time they would also share activity rhythms at longer scales such as weekly and monthly, reflecting the annual school schedule.
Just as social communications between humans form temporal social networks, financial transactions between banks also shape time-varying networks [9][10][11][12][13] . In the interbank market, for instance, overnight bilateral lending and borrowing between banks organize temporal networks whose structure changes on a daily basis, because the overnight financial contracts last for only one day 11,12 . Thus, financial markets, similarly to social networks, can be interpreted as complex systems where each agent's activity should be captured and characterized at appropriate time scales 11,12 . In fact, many commonalities have recently been found between social communication patterns of humans and financial interaction patterns of banks at particular temporal resolutions 12,14,15 .
In recent years, non-negative tensor factorization (NTF) has been frequently used to extract temporal activity patterns in various social contexts, such as face-to-face contacts 8,16 , Twitter posts 17 and players' matches in online games 18 . Among these studies, Gauvin et al. 8 showed that NTF is highly effective in detecting diurnal rhythms of students' activity in a primary school 7,19 . Characterizing diurnal rhythms led the authors to uncover the multi-timescale community structure formed by classmates: students' temporal activity cycles, rather than the aggregated history of contacts, reveal meaningful patterns that explain the complexity of children's contact networks and communities.
Given the similarity between social and financial temporal dynamics, justified by underlying human factors, and the fact that multiple activity cycles are present at different frequencies in human social activities, our question is whether similar rhythms exist also in financial systems. In this work, we uncover hidden multi-timescale patterns of banks' activities in the Italian online interbank market, e-MID 20 . In previous studies, it has been recognized that there exist activity patterns in banks' financial transactions at a particular time scale, such as inter-day 12 and intra-day scales [21][22][23] . However, it is still unknown whether these patterns coexisting at different time scales are dependent on each other, in which case banks exhibiting a given interday activity pattern are likely to follow a particular intraday pattern.
We employ NTF as a tool to detect multi-timescale patterns, in which banks' activities are captured by a tensor having three dimensions: the list of banks, time of the day, and date. We will show that the PARAFAC decomposition of the 3-way tensor 24,25 indicates that banks' trading activities can be classified into several patterns over the data period of 2001-2015. In particular, the NTF allows us to identify an anomalous pattern around the Lehman Brothers' collapse in September 2008. Our multi-timescale NTF approach reveals not only how banks facing the financial crisis changed their diurnal trading rhythms, but also on what dates such anomalies emerged.
Over the last couple of years, many proposals for measuring and controlling systemic risk have been presented, yet most of them focus on the static nature of financial networks [26][27][28] . However, as we show in this study of the Italian e-MID interbank market, various patterns and anomalies systematically emerge at different time scales, which should be taken into account in the design of financial regulations. Importantly, even in a situation in which bank activity has certain patterns at intra- and inter-day scales, it remains of paramount importance to understand whether the intra-day patterns are independent of the inter-day patterns (i.e., mono-timescale patterns) or whether they are correlated with each other (i.e., multi-timescale patterns). Our framework to extract multi-timescale patterns advances the understanding of when and how banks react to systemic shocks, and in the future it could contribute to controlling the spread of financial systemic risk 9,10,29-31 .

Data
We use the time-stamped data on bilateral interbank transactions that occurred between January 2001 and December 2015 in the Italian online interbank market (e-MID). e-MID is an online platform for financial institutions provided by the Italian company e-MID SIM S.p.A. based in Milan, Italy (henceforth, we refer to financial institutions as "banks" for brevity). Banks in need of lending or borrowing funds may find a counterpart by posting an order on the online platform, which makes e-MID a marketplace in which lenders and borrowers are matched. Each transaction record in this data set indicates when a loan contract agreement was reached between two banks and how much was lent (in million Euros). In addition, the data contains information about the type of each loan agreement, i.e., a lender-proposed transaction or a borrower-proposed transaction. We define a lender-proposed (resp., borrower-proposed) transaction to be a transaction proposed by a bank that lends to (resp., borrows from) its counterpart.
As in other interbank markets, the vast majority of transactions (>86%) in the e-MID market are overnight lending and borrowing (i.e., a loan contract lasts just for one day), while there are other types of transactions with longer maturities such as two weeks, three months, etc. To eliminate the influence of differences in maturity length, we focus on overnight transactions of unsecured Euro deposits labeled as "ON" (i.e., overnight) or "ONL" (i.e., overnight large, namely overnight transactions greater than 100 million Euros). Table 1 summarizes the basic statistics of this dataset.
There are 3,839 business days over the data period between Jan 2, 2001 and Dec 31, 2015, on each of which transactions are made between 8:00 and 18:00. There are 289 banks that conducted at least one transaction over this period: 194 of them are Italian banks while the rest are foreign banks. Regardless of the nationality of banks, we examine the transactions conducted between 8:00-18:00 in Italian standard time. We do not consider a time difference between countries where the headquarters of the banks are located. The total number of transactions (ON and ONL) that we use in the analysis amounts to 1,148,699. It should be noted that the 289 banks are not always active over the data period (cf., Fig. 1a). In fact, the daily average of the number of participating banks is around 95, and the average number of daily transactions is approximately 300 (cf., Fig. 1b). Although the time-series behaviors of the numbers of active banks and transactions are non-stationary, there is a stable scaling relationship between them (cf., Fig. 1c) 12 . The dataset is commercially available from e-MID SIM S.p.A (http://www.e-mid.it/).

Method
As noted in the previous section, banks are not always active in the interbank market. At the daily scale, some banks have transactions only on a certain fraction of business days while having no transactions on the other days 12,21 . If we look at intraday time scales, on the other hand, the frequency of a bank having at least one transaction is not uniformly distributed over the time intervals 21,22 . These heterogeneities in temporal bank activity are illustrated in Fig. S1.
Since the maturity of loan contracts is overnight, it is important to identify the days on which banks are more likely to be active. Conversely, since the participation of banks in the trading activity can change over the course of a day, it is also crucial to understand intraday activity changes. Due to the multi-scale nature of bank trading activity, directly studying the volume of transactions of each bank over time would not allow us to detect inter- and intra-day dynamics. Thus, we need to disentangle each bank's activity across the two dimensions (i.e., time scales): interday and intraday. To this aim, we represent our data in a three-dimensional array (i.e., tensor), whose entries represent the amount of trades each bank performs at a given time of a given day. Given this multi-dimensional representation, our main goal is to extract meaningful correlated multi-timescale activity patterns related to groups of banks sharing a similar amount of transactions at the same time (respectively, intraday and interday). This can be achieved by means of non-negative tensor factorization (NTF), as described in the following.

Non-negative tensor factorization (NTF) at different time scales.
Let us consider a three-dimensional tensor $\chi \in \mathbb{R}^{I \times J \times K}$, where I = N is the number of banks, J = T is the number of time intervals of bank activity in a day (here between 8:00 and 18:00), and K = D is the number of days. The entry $x_{ijk} \in \mathbb{R}_{+}$ of the tensor χ denotes the total amount of trades (in million Euros) conducted by the i-th bank during the j-th time interval of the k-th day. To extract the intra- and inter-day trading patterns characterizing banks with similar amounts of transactions over time, we rely on the NTF, which decomposes the tensor χ into the sum of several rank-one tensors, namely components (cf., Fig. 2).
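As a minimal sketch of this representation, the snippet below bins a few made-up transaction records into a sparse N × T × D structure keyed by (bank, intra-day interval, day). The record layout `(lender, borrower, timestamp, amount)`, the helper name `build_tensor`, and the choice to credit the volume to both counterparties are illustrative assumptions, not the format of the e-MID data.

```python
from collections import defaultdict
from datetime import datetime

OPEN_HOUR = 8      # market opens at 8:00
CLOSE_HOUR = 18    # market closes at 18:00


def build_tensor(records, delta_minutes):
    """Return (tensor, banks, days); tensor maps (i, j, k) -> traded volume.

    records: iterable of (lender, borrower, timestamp, amount) tuples.
    This layout is hypothetical; crediting both counterparties with the
    trade volume is an assumption of this sketch.
    """
    records = list(records)
    banks = sorted({b for lender, borrower, _, _ in records
                    for b in (lender, borrower)})
    days = sorted({ts.date() for _, _, ts, _ in records})
    bank_idx = {b: i for i, b in enumerate(banks)}
    day_idx = {d: k for k, d in enumerate(days)}

    tensor = defaultdict(float)
    for lender, borrower, ts, amount in records:
        minutes = (ts.hour - OPEN_HOUR) * 60 + ts.minute
        if not 0 <= minutes < (CLOSE_HOUR - OPEN_HOUR) * 60:
            continue  # outside trading hours
        j = minutes // delta_minutes      # intra-day interval index
        k = day_idx[ts.date()]            # day index
        for bank in (lender, borrower):
            tensor[(bank_idx[bank], j, k)] += amount
    return dict(tensor), banks, days


# Made-up records, for illustration only.
records = [
    ("A", "B", datetime(2008, 9, 15, 8, 5), 10.0),
    ("A", "C", datetime(2008, 9, 15, 8, 12), 5.0),
    ("B", "C", datetime(2008, 9, 16, 17, 40), 20.0),
]
tensor, banks, days = build_tensor(records, delta_minutes=15)
```

A sparse dictionary is used here only because most (bank, interval, day) cells are empty when banks trade intermittently; a dense array would work equally well for small examples.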
Here, we implement the NTF by applying the well-established method called PARAFAC (parallel factor analysis) or CANDECOMP (canonical decomposition) with non-negativity constraints 24,25,32 . The decomposition can be written as

$$\chi \approx \sum_{r=1}^{R} a_r \circ b_r \circ c_r, \tag{1}$$

or, element-wise,

$$x_{ijk} \approx \sum_{r=1}^{R} a_{ir} b_{jr} c_{kr}, \tag{2}$$

where $R \in \mathbb{N}$ denotes the number of components (i.e., the rank of the tensor), and the operator $\circ$ represents the outer product. Eq. (1) is the so-called canonical polyadic (CP) decomposition of the tensor χ, where $a_r \in \mathbb{R}_{+}^{N}$, $b_r \in \mathbb{R}_{+}^{T}$ and $c_r \in \mathbb{R}_{+}^{D}$ represent the r-th component factors that respectively encode the membership of a bank to the component, the intervals in a day and the days in which the component is active (i.e., $b_r$ and $c_r$ are the intra- and inter-day activity patterns of the groups of banks in $a_r$).
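The following sketch illustrates the rank-one sum in Eq. (1) on toy data and checks that it agrees with the entry-by-entry formulation. The sizes and the random factors are arbitrary assumptions chosen for illustration; they are unrelated to the e-MID tensor.

```python
import numpy as np

rng = np.random.default_rng(0)
N, T, D, R = 4, 5, 6, 3          # toy sizes, not the e-MID dimensions

# Non-negative factor matrices whose r-th columns are a_r, b_r and c_r.
A = rng.random((N, R))
B = rng.random((T, R))
C = rng.random((D, R))

# Sum of R rank-one tensors, each the outer product of a_r, b_r and c_r.
X_outer = np.zeros((N, T, D))
for r in range(R):
    X_outer += np.multiply.outer(np.multiply.outer(A[:, r], B[:, r]), C[:, r])

# Equivalent element-wise form: x_ijk = sum_r a_ir * b_jr * c_kr.
X_elem = np.einsum("ir,jr,kr->ijk", A, B, C)
```

Both constructions yield the same N × T × D array, which is why a component can be read either as a rank-one tensor or as a triple of bank, intra-day and inter-day profiles.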
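Let $A \in \mathbb{R}_{+}^{N \times R}$, $B \in \mathbb{R}_{+}^{T \times R}$ and $C \in \mathbb{R}_{+}^{D \times R}$ denote the factor matrices, whose r-th columns are the vectors $a_r$, $b_r$ and $c_r$, respectively. The factor matrices A, B and C are calculated by solving the following minimization problem with non-negativity constraints:

$$\min_{A, B, C \geq 0} \left\| \chi - [\![A, B, C]\!] \right\|_F^2, \tag{3}$$

where $[\![A, B, C]\!]$ represents the Kruskal form of the tensor decomposition (i.e., the RHS of Eq. (1)), and $\|\cdot\|_F$ denotes the Frobenius norm.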
This minimization problem can be rewritten to perform the minimization with respect to one factor matrix at a time. Let $X_{(n)}$ denote a matrix created by the mode-n matricization (or flattening) of tensor χ, where each row of $X_{(n)}$ consists of a vector corresponding to a given index of mode-n. That is, $X_{(1)}$, $X_{(2)}$ and $X_{(3)}$ are N × TD, T × ND and D × NT matrices, respectively. They can be expressed in terms of factor matrices as follows:

$$X_{(1)} \approx A (C \odot B)^{\top}, \quad X_{(2)} \approx B (C \odot A)^{\top}, \quad X_{(3)} \approx C (B \odot A)^{\top},$$

where $\odot$ denotes the Khatri-Rao product, which is the "matching columnwise" Kronecker product defined as $A \odot B = [a_1 \otimes b_1, \ldots, a_R \otimes b_R]$. The minimization problem for the PARAFAC decomposition, Eq. (3), can now be reformulated as

$$\min_{A \geq 0} \left\| X_{(1)} - A (C \odot B)^{\top} \right\|_F^2, \quad \min_{B \geq 0} \left\| X_{(2)} - B (C \odot A)^{\top} \right\|_F^2, \quad \min_{C \geq 0} \left\| X_{(3)} - C (B \odot A)^{\top} \right\|_F^2.$$

The minimizers A, B and C of this problem are computed using the alternating non-negative least squares (ANLS) method combined with the Block Principal Pivoting (BPP) method developed by 33 . Our implementation is based on the Tensor Toolbox 34,35 and the MATLAB codes available from 36 .
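To make the matricized form concrete, the sketch below builds the three unfoldings of an exact rank-R tensor and checks them against the Khatri-Rao expressions. It is written in NumPy rather than the MATLAB toolboxes used by the authors. The helper names `unfold` and `khatri_rao` are illustrative, and the column ordering (earlier modes varying fastest) is an assumption chosen so that the N × TD, T × ND and D × NT shapes line up.

```python
import numpy as np


def unfold(X, mode):
    """Mode-n matricization; remaining indices are flattened in Fortran order."""
    return np.reshape(np.moveaxis(X, mode, 0), (X.shape[mode], -1), order="F")


def khatri_rao(U, V):
    """Column-wise Kronecker product: r-th column is kron(U[:, r], V[:, r])."""
    return np.einsum("ir,jr->ijr", U, V).reshape(U.shape[0] * V.shape[0], -1)


rng = np.random.default_rng(1)
N, T, D, R = 4, 5, 6, 3                       # toy sizes
A, B, C = rng.random((N, R)), rng.random((T, R)), rng.random((D, R))
X = np.einsum("ir,jr,kr->ijk", A, B, C)       # exact rank-R tensor

X1, X2, X3 = unfold(X, 0), unfold(X, 1), unfold(X, 2)
ok1 = np.allclose(X1, A @ khatri_rao(C, B).T)   # N x TD
ok2 = np.allclose(X2, B @ khatri_rao(C, A).T)   # T x ND
ok3 = np.allclose(X3, C @ khatri_rao(B, A).T)   # D x NT
```

With the other two factors held fixed, each identity turns the update of one factor matrix into an ordinary non-negative least-squares problem, which is the structure that alternating schemes exploit.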

Rank size.
To determine the number of components R (i.e., rank size) used in the NTF model, we rely on the Core Consistency Diagnostic 24 . Given the tensor element $x_{ijk}$, we can rewrite Eq. (2) as

$$x_{ijk} \approx \sum_{n=1}^{R} \sum_{m=1}^{R} \sum_{p=1}^{R} \lambda_{nmp} a_{in} b_{jm} c_{kp},$$

where $\lambda_{nmp}$ denotes the (n, m, p) element of the superdiagonal binary tensor $\Lambda$ (i.e., $\lambda_{nmp} = \delta_{nm} \delta_{mp} \delta_{np}$). Now let $g_{nmp}$ denote the (n, m, p) element of the core tensor $\mathcal{G}$, which is obtained by fitting the data to the Tucker3 model 24 . In the Tucker3 model, the minimization problem reads

$$\min_{G} \left\| X - A G (C \otimes B)^{\top} \right\|_F^2,$$

where X is an N × TD matrix converted from tensor χ, and G is also a matricized version ($R_n \times R_m R_p$) of the core tensor $\mathcal{G}$. With the Tucker3 decomposition, the tensor is written as

$$x_{ijk} \approx \sum_{n=1}^{R_n} \sum_{m=1}^{R_m} \sum_{p=1}^{R_p} g_{nmp} a_{in} b_{jm} c_{kp}.$$

From the lemma of Bro and Kiers 24 , the core tensor $\mathcal{G}$ will be identical to the superdiagonal tensor $\Lambda$ if the Tucker3 model is perfectly fitted and the factors have full column rank. If $\mathcal{G}$ is significantly different from the superdiagonal tensor $\Lambda$, by contrast, it means that there are non-negligible interactions between factors and the PARAFAC model is not appropriate. Thus, we can assess whether the data should be fitted to the PARAFAC model or the Tucker3 model by measuring the distance between $\mathcal{G}$ and $\Lambda$.
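The relationship between the core tensor and the superdiagonal tensor can be checked numerically. The sketch below assumes that the factor matrices are held fixed and that the core is obtained by least squares through pseudo-inverses, which is one standard way to compute a Tucker3 core for given factors. The data are a noiseless toy PARAFAC tensor, so the recovered core should be superdiagonal.

```python
import numpy as np

rng = np.random.default_rng(2)
N, T, D, R = 6, 7, 8, 3                       # toy sizes
A, B, C = rng.random((N, R)), rng.random((T, R)), rng.random((D, R))
X = np.einsum("ir,jr,kr->ijk", A, B, C)       # noiseless PARAFAC tensor

# Least-squares Tucker3 core for fixed factors with full column rank:
# g_nmp = sum_ijk pinv(A)[n, i] * pinv(B)[m, j] * pinv(C)[p, k] * x_ijk
G = np.einsum("ni,mj,pk,ijk->nmp",
              np.linalg.pinv(A), np.linalg.pinv(B), np.linalg.pinv(C), X)

# Superdiagonal binary tensor: ones where n == m == p, zeros elsewhere.
Lam = np.zeros((R, R, R))
Lam[np.arange(R), np.arange(R), np.arange(R)] = 1.0

max_deviation = np.abs(G - Lam).max()
```

On real data the off-superdiagonal entries of the core are generally nonzero, and their size indicates how strongly the factors interact.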
We employ the Core Consistency (CC) value proposed by Bro and Kiers 24 as a measure of the distance between $\mathcal{G}$ and $\Lambda$:

$$CC = 100 \times \left( 1 - \frac{\sum_{n=1}^{R} \sum_{m=1}^{R} \sum_{p=1}^{R} (g_{nmp} - \lambda_{nmp})^2}{R} \right),$$

where we imposed the constraint $R_n = R_m = R_p = R$ in implementing the Tucker3 decomposition. CC equals 100 if the PARAFAC model perfectly fits the data and is less than 100 (possibly negative) if the model does not fit perfectly. Note that a rise in R will reduce fitting errors while increasing the possibility of over-fitting. In general, as R increases, more interactions arise between components, which makes the core tensor $\mathcal{G}$ far from superdiagonal, resulting in a low value of CC. Therefore, it is reasonable to stop increasing R before the symptom of over-fitting emerges. Our determination of R is given as

$$R = \max \{ R' : CC(R') \geq L_{cc} \},$$

where CC(⋅) denotes the core consistency as a function of the number of components, and $L_{cc}$ is a threshold parameter. In line with 16 , we set $L_{cc}$ = 85, and to minimize the randomness created by a PARAFAC decomposition, we implement the PARAFAC decomposition 20 times for a given rank size and use the mean as the value for CC(R′). Our implementation uses the MATLAB code available from 37 , which is based on the alternating nonnegativity-constrained least squares with block principal pivoting 38 .
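A hedged sketch of the core consistency value and of a threshold-based choice of R is given below. The function names are illustrative, the mean CC values passed to `select_rank` are made up, and the selection rule (largest rank whose mean CC stays at or above the threshold) is one reading of the criterion described above, not a transcription of the authors' code.

```python
import numpy as np


def core_consistency(G):
    """CC = 100 * (1 - sum((g - lambda)^2) / R) for an R x R x R core G."""
    R = G.shape[0]
    Lam = np.zeros_like(G, dtype=float)
    Lam[np.arange(R), np.arange(R), np.arange(R)] = 1.0
    return 100.0 * (1.0 - np.sum((G - Lam) ** 2) / R)


def select_rank(mean_cc_by_rank, threshold=85.0):
    """Largest rank whose mean CC is at least the threshold (assumed rule)."""
    admissible = [r for r, cc in mean_cc_by_rank.items() if cc >= threshold]
    return max(admissible)


perfect = np.zeros((3, 3, 3))
perfect[np.arange(3), np.arange(3), np.arange(3)] = 1.0
cc_perfect = core_consistency(perfect)            # superdiagonal core
cc_mixed = core_consistency(np.ones((2, 2, 2)))   # strong factor interactions

# Hypothetical mean CC values over repeated runs, for illustration only.
rank = select_rank({1: 100.0, 2: 97.0, 3: 91.0, 4: 40.0, 5: -120.0})
```

The second example shows how CC can become negative: when every core entry equals one, the squared deviation from the superdiagonal tensor exceeds R.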
It should be noted that if one implements a low-dimensional clustering method, such as non-negative matrix factorization, separately at different time scales, then in general one would obtain different numbers of mono-timescale patterns at different scales. In that case, however, it would not be possible to know whether or not there exist multi-timescale patterns such that banks exhibiting a common intraday pattern also show a common interday pattern.

Results
Synthetic temporal financial markets.
To validate the accuracy of our multi-timescale decomposition, we first implement NTF on a synthetic financial market whose properties are fully known ex ante. Suppose that there are three groups of banks, each having particular intra- and inter-day trading patterns. First, we introduce intraday trading patterns by considering the "fitness" of banks. In the fitness model, the probability that banks i and j trade at time t is determined by the fitness values of the two banks at that time. The fitness profiles of the three groups are shown in Fig. 3a: banks belonging to group 1 are most active in the early morning (blue solid), banks in group 2 exhibit the highest trading activity around noon (red solid), and banks in group 3 are most active at the end of the day (black dotted). More specifically, the fitness value of a bank belonging to group s at time interval t is specified in terms of $f(t; \mu_s, \sigma)$, the p.d.f. of the normal distribution with mean $\mu_s$ and standard deviation σ. We set $(\mu_1, \mu_2, \mu_3) = (0, T/2, T)$ and σ = T/4. Second, the interday trading patterns are captured by variations in a bank's participation probability, which is the probability that a bank participates in the market on a given day, apart from whether or not the bank is able to find a trading partner ex post. For day d = 1, …, D, the participation probability of bank i, denoted by $q_{i,d}$, is specified for each group (Fig. 3b). For simplicity, we assume that trade volumes are identical for all trades so that the (i, j, k) element of the 3-dimensional tensor is equal to the total number of trades that bank i has during time interval j on day k. We will introduce volume heterogeneity when analyzing the empirical data in the next section. We set N = 120, T = 20 (i.e., a resolution of Δ = 30 minutes for the hours between 8:00 and 18:00 in each day) and D = 1000.
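The intra-day part of the synthetic market can be sketched as follows. The group means and the standard deviation follow the values stated above. Two things are assumptions made only for illustration: indexing the intervals as t = 0, …, T − 1, and taking the trade probability as the product of the two banks' fitness values.

```python
import math

T = 20                                   # intra-day intervals (30-minute resolution)
SIGMA = T / 4
MU = {1: 0.0, 2: T / 2, 3: float(T)}     # group-specific means


def normal_pdf(t, mu, sigma):
    """Probability density of N(mu, sigma^2) evaluated at t."""
    z = (t - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def fitness_profile(group):
    """Fitness of a bank in `group` at intervals t = 0, ..., T - 1 (assumed indexing)."""
    return [normal_pdf(t, MU[group], SIGMA) for t in range(T)]


def trade_probability(fit_i, fit_j, t):
    """ASSUMPTION: trade probability taken as the product of the two fitness values."""
    return fit_i[t] * fit_j[t]


profiles = {g: fitness_profile(g) for g in MU}
# Interval at which each group's fitness is highest.
peaks = {g: max(range(T), key=profiles[g].__getitem__) for g in MU}
```

Under these assumptions, group 1 peaks at the first interval, group 2 in the middle of the day and group 3 at the last interval, mirroring the early-morning, noon and end-of-day profiles described above.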
Using the above synthetic tensor, we can now implement our multi-timescale tensor factorization. The factorization of tensor χ allows us to measure the activity of each individual element: a bank, an intra-day time interval, and a day. The activity of intra-day time interval j of component r is given by $b_{jr}$, and analogously the activity of day k of component r is represented by $c_{kr}$. Figure 4 illustrates that the PARAFAC decomposition accurately recovers the true multi-timescale patterns. The core-consistency value strongly suggests R = 3, since CC(4) takes a large negative value (Fig. 5).
Empirical data.
Now we analyze empirical data. Given the accuracy of our method in the synthetic model, we expect the NTF method to extract latent multi-timescale patterns in the real-world financial market, e-MID. Here, we implement the N × T × D tensor factorization described in Method, where we set the intraday time resolution (in minutes) to Δ ∈ {3, 5, 10, 15, 30, 45, 50}, and D is set at the total number of business days between January 2001 and December 2015.

Core-Consistency.
The CC value for a given rank size is illustrated in Fig. 6a for Δ = 15. The CC value becomes lower than 85 for R ≥ 4, suggesting that R = 3 should be selected. In general, imposing different intraday resolutions may lead us to choose different values of R (Fig. 6b). A higher resolution tends to require a larger number of components for the PARAFAC decomposition to be appropriate. In the following, we employ Δ = 15 and thereby R = 3 as the benchmark case, the rationale of which will be discussed in the next subsection.

Intra-day and inter-day activity.
Figure 7a shows intra- and inter-day activities for Δ = 15, in which case there are three different patterns. The intra-day activity of Component 1 is characterized by its bimodal pattern, with two peaks around 10:00 and 17:00. (Since the assignment of component indices is arbitrary in the NTF implementation, we re-assign the indices in ascending order of cumulative intra-day activity between 8:00 and 10:00.) On the other hand, both Components 2 and 3 have a single distinct peak in the early morning, after which their activities are very low. However, the distribution of intraday activity of Component 3 is more skewed than that of Component 2, which makes Components 1 and 3 two polar cases with Component 2 in between. We refer to the trading patterns of Components 1, 2 and 3 as normal trading, early trading, and flash trading, respectively.
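Because the component order returned by a factorization is arbitrary, components are relabeled by their cumulative intra-day activity between 8:00 and 10:00. A minimal sketch of that relabeling is shown below. The function name and the toy factor matrices are illustrative assumptions.

```python
import numpy as np


def reorder_components(A, B, C, n_early_bins):
    """Sort components by ascending cumulative intra-day activity in the first bins."""
    early = B[:n_early_bins, :].sum(axis=0)
    order = np.argsort(early, kind="stable")
    return A[:, order], B[:, order], C[:, order], order


# Toy factors: 2 banks, 4 intra-day bins, 2 days, 3 components (made-up values).
A = np.array([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
B = np.array([[0.9, 0.1, 0.5],
              [0.1, 0.1, 0.3],
              [0.0, 0.4, 0.1],
              [0.0, 0.4, 0.1]])
C = np.array([[1.0, 0.0, 2.0], [0.0, 1.0, 2.0]])

A2, B2, C2, order = reorder_components(A, B, C, n_early_bins=2)
```

With Δ = 15, the window from 8:00 to 10:00 spans the first 8 bins, so `n_early_bins=8` would be the corresponding setting. Permuting the columns of all three factor matrices consistently leaves the reconstructed tensor unchanged.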
The activity patterns for different temporal resolutions are also presented in Fig. S2. For a finer intra-day resolution (e.g., Δ = 5, Fig. S2a), there are more than three components, and the activity of an additional component (i.e., Component 4) turns out to be very similar to the flash-trading pattern we saw in Fig. 7a. On the other hand, when the temporal resolution is low (e.g., Δ = 45, Fig. S2c), there are only two components and the flash-trading pattern is no longer detected. Given these observations, it is natural to set Δ = 15 as a benchmark case, where apparently independent intra-day patterns can be captured. It should be noted that the following results are not sensitive to the temporal resolution level as long as R = 3 is selected (e.g., Δ = 30, Fig. S2b). Figure 7b illustrates the inter-day activity of each component. For Components 1 and 2, the activity decreased sharply during the global financial crisis of 2007-2009, which was initiated by a significant decline in US house prices (the peak date is indicated by the blue dotted line). In contrast, the activity of Component 3 spiked around the collapse of Lehman Brothers in September 2008 (indicated by the red dashed line). Figure 7c shows how the share of a given component evolved over time.
These observations suggest that Component 3 captures the "crisis mode" of bank trading while Components 1 and 2 represent more normal trading patterns. The crisis-mode interpretation is reinforced by the intra-day trading pattern of Component 3, in which banks trade only in the very early morning. In fact, we will show that, although we did not use information about the directionality of trades, this anomaly comes from the fact that banks in need of liquidity tried to obtain loans, rather than to extend credit, in the midst of the financial crisis.
Banks' affiliation to components.
Each element of vector $a_r$ represents the extent to which a bank's trading pattern is captured by Component r. Thus, we can extract a subgroup of banks whose trading patterns are characterized by Component r at least to some extent. Here, the set of banks belonging to component r is given by $I_r = \{i'\}$ for which $a_{i'r}$ is within the 90th percentile of $\{a_{ir}\}_{i=1}^{N}$ (Fig. 8a). It should be noted that a bank may belong to multiple components if its trading pattern has multiple features (Fig. 8b).
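The percentile-based membership rule can be sketched as follows. The snippet reads "within the 90th percentile" as "at or above the 90th-percentile value of the column", which is an assumption, since the opposite reading would flip the comparison. The function name and the toy membership matrix are made up for illustration.

```python
import numpy as np


def component_members(A, q=90.0):
    """For each component r, list banks i with a_ir >= q-th percentile of column r."""
    thresholds = np.percentile(A, q, axis=0)
    return [np.flatnonzero(A[:, r] >= thresholds[r]).tolist()
            for r in range(A.shape[1])]


# Toy membership matrix: 10 banks, 2 components (values are made up).
A = np.array([[0.0, 0.1], [0.1, 0.0], [0.2, 0.3], [0.3, 0.2], [0.4, 0.5],
              [0.5, 0.4], [0.6, 0.7], [0.7, 0.6], [0.8, 0.8], [5.0, 9.0]])
members = component_members(A)
```

In this toy example the last bank exceeds the threshold in both columns, which illustrates how a single bank can be affiliated with more than one component.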
The bank activity patterns emerging at the daily scale, captured by the row average of $a_r c_r^{\top}$, exhibit explicit differences across components (Fig. 8c). While all types of banks have been inactive since 2009, the normal-trading feature of banks belonging to Component 1 represents their behavior until 2008, and the trading [...]

In the e-MID data set, a trading bank is classified into one of the following four types: aggressor lender, quoter lender, aggressor borrower and quoter borrower. In the e-MID online platform, a bank posts a request for loans or a proposal for lending, and a loan contract is made if another bank accepts the posted request. A bank is called an "aggressor lender" when it accepts a request for a loan posted on the e-MID platform, and the bank that posted the request and borrowed the funds is called a "quoter borrower". "Quoter lender" and "aggressor borrower" can be understood analogously. In addition, we can also ask to what extent the nationality of banks can explain the differences in trading patterns.
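The four role labels follow mechanically from who posted the order and who lent. A small sketch is given below. The function and its arguments (`lender`, `borrower`, `quoter`) are hypothetical and do not reflect the field names of the e-MID records.

```python
def classify_roles(lender, borrower, quoter):
    """Return {bank: role} for one transaction; `quoter` is the bank that posted the order."""
    if quoter == borrower:
        # A posted request for a loan was accepted by the lender.
        return {lender: "aggressor lender", borrower: "quoter borrower"}
    if quoter == lender:
        # A posted lending proposal was accepted by the borrower.
        return {lender: "quoter lender", borrower: "aggressor borrower"}
    raise ValueError("quoter must be one of the two counterparties")


roles_1 = classify_roles(lender="A", borrower="B", quoter="B")
roles_2 = classify_roles(lender="A", borrower="B", quoter="A")
```

Counting these labels over the banks affiliated with each component gives the kind of role composition compared across components.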
In fact, the majority of transactions in the e-MID market were made between aggressor lenders and quoter borrowers, a property common to all the component types (Fig. 9a). However, more than half of the banks belonging to Component 3 are quoter borrowers, and the share of aggressor lenders is significantly lower than that. As for nationality, the share of Italian banks among all the banks belonging to Component 1 is significantly low, while the corresponding fraction is significantly high for the banks in Component 2 (Fig. 9b), in both cases at the 90% significance level. This suggests that foreign banks (i.e., banks from countries outside Italy) are likely to exhibit the normal-trading pattern, and Italian banks are more likely to employ the early-trading pattern than foreign banks are.
These observations have important implications for the multi-timescale trading patterns in the financial market. First, flash trading, which represents the intra-day trading pattern of Component 3, can be attributed mostly to quoter borrowers. Recall that flash trading was observed mostly in the period around the Lehman collapse in September 2008, but we do not know ex ante whether such a trading pattern was driven by demand or by supply. The fact that Component 3 is largely attributed to quoter borrowers suggests that the flash-trading pattern during the financial crisis was driven by banks that attempted to obtain liquidity as early as possible by posting quotes for loans. This may be regarded as evidence that some banks in fact faced a serious liquidity shortage at the time of the Lehman collapse 40,41 . Our multi-timescale NTF approach reveals not only how banks reacted to the fear of liquidity shortage (i.e., intra-day pattern), but also the specific dates on which the fear was most evident (i.e., inter-day pattern).
Second, differences in the nationality of banks can lead to a variety of trading patterns. It turns out that foreign banks (Italian banks) are more likely to employ normal trading (early trading) than early trading (normal trading). On the other hand, the source of the crisis modality cannot be explained by the nationality of banks (Fig. 9b), suggesting that banks had a similar chance of being in a crisis mode regardless of their nationality.

Discussion
In this work, we presented an analytic framework based on non-negative tensor factorization (NTF) to extract temporal activity patterns of financial systems. Although previous studies on online financial markets recognized the existence of trading activity patterns at specific time scales (e.g., inter-day 12 , or intra-day 21-23 ), we demonstrated, to the best of our knowledge for the first time, that activity patterns coexist at different time scales and depend upon each other. Our methodology allowed us to uncover the hidden multi-timescale patterns of trading activities in an online interbank market (e-MID 20 ).
Within our framework, banks' activities were represented by means of a tensor of three dimensions, namely the list of banks, time of the day, and date. Leveraging the power of NTF (based on the PARAFAC decomposition of the 3-way tensor 24,25 ), we uncovered the multi-timescale nature of financial trading dynamics in e-MID, suggesting that banks' trading patterns could be classified into subgroups exhibiting significantly different multi-timescale patterns. Our framework also allowed us to attribute roles to banking institutions (aggressor lender/borrower, or quoter lender/borrower) based on their trading patterns, yielding interpretable analytic insights.
By modeling the multi-timescale dynamics that occurred over the period covered by our data (2001-2015), the proposed NTF-based framework allowed us to identify trading anomalies in the midst of the financial crisis. For example, around the time of Lehman Brothers' collapse in September 2008, our approach showed how banks changed their trading rhythms, and on what dates such anomalies emerged. Understanding trading patterns in financial systems is of fundamental importance for many reasons: first, a rigorous characterization of such systems' trading dynamics could help central banks to effectively intervene in the interbank market. In turn, this would contribute to reducing systemic risk in the financial system as a whole. Financial linkages created by bilateral transactions between banks can lead to a global network of interconnected risk, in which a failure of one bank can immediately spread over the entire system 9,29,30 . Extracting multi-timescale patterns of financial markets allows for a better understanding of when and how banks react to systemic shocks.
Of course, the current approach has certain limitations. First, the tensor representation of interbank transactions does not capture information about the network aspect of the interbank market. While the current NTF takes bank ID, time and day as inputs, the interconnectivity between banks is still ignored. Given that many previous studies revealed that interbank networks contain rich information regarding systemic risk 9,29,30 , including structural information in the framework could improve the results.
Second, while we focused on two different time resolutions, namely intra- and inter-day, there could also be other timescales, such as weekly or monthly periodicity, that could contain meaningful information for characterizing the trading behavior of banks. However, the introduction of additional time scales would add more dimensions to the tensor, which would make it more challenging to find explicit patterns using the PARAFAC decomposition.
Third, the current analysis did not fully explore the role of multi-scale activity patterns in propagating systemic risk. It would be interesting to examine to what extent each of the three activity patterns could promote or prevent financial contagion of bank defaults. To the best of our knowledge, the multi-timescale aspect of the source of financial systemic risk has not been investigated so far. This appears as a promising future direction of this line of research.
Finally, while the current work focused on the Italian e-MID market, this is just a small part of the entire financial system. There are many other interbank markets in different countries, and there are also other types of interbank transactions such as trading of government bonds and credit default swaps. An important question to ask would therefore be: are the multi-timescale patterns found in our analysis ubiquitous, or are they specific to markets that obey trading dynamics typical of the e-MID market?
This work lays out the foundations for other researchers to study how patterns and anomalies emerge in financial systems at different temporal resolutions: in the future, such a framework could contribute to controlling the spread of financial systemic risk 9,10,29-31 . It would be interesting to investigate whether our method could be used both in predictive and prescriptive ways to help determine what could happen under certain risk conditions, as well as to provide possible intervention strategies prior to shocks or cascading collapses in financial systems.