Change Point Detection in Correlation Networks

Many systems of interacting elements can be conceptualized as networks, where network nodes represent the elements and network ties represent interactions between the elements. In systems where the underlying network evolves, it is useful to determine the points in time where the network structure changes significantly as these may correspond to functional change points. We propose a method for detecting change points in correlation networks that, unlike previous change point detection methods designed for time series data, requires minimal distributional assumptions. We investigate the difficulty of change point detection near the boundaries of the time series in correlation networks and study the power of our method and competing methods through simulation. We also show the generalizable nature of the method by applying it to stock price data as well as fMRI data.


Introduction
Many systems of scientific and societal interest are composed of a large number of interacting elements, examples ranging from proteins interacting within each living cell to people interacting with one another within and across societies. These and many other systems can be conceptualized as networks, where network nodes represent the elements in a given system and network ties represent interactions between the elements. Network science and network analysis are used to analyze and model the structure of interactions in a network, an approach that is commonly motivated by the premise that network structure is associated with the dynamical behavior exhibited by the network, which in turn is expected to be associated with its function. In many cases, however, network structure is not static but instead evolves in time. This suggests that given a sequence of networks, it would be useful to determine points in time where the structure of the network changes in a non-trivial manner. Determining these points is known as the network change point detection problem. Given the connection between network structure and function, it seems reasonable to conjecture that a structural change in a network would be coupled with a change in its function. Consequently, detecting structural change points for networks could be informative about functional change points as well.
In this paper, we consider the change point detection problem for correlation networks. These networks belong to a class of networks sometimes called similarity networks, which are obtained by defining the edges based on some form of similarity or correlation measure between each pair of nodes [Onnela et al., 2012]. Examples of correlation networks appear in many financial and biological contexts, such as stock market price and gene expression data [Onnela et al., 2004, Mizuno et al., 2006, Bhan et al., 2002, Kose et al., 2001, Mantegna, 1999]. In general, when evaluating correlation networks, the full data is used to estimate the correlation between nodes. When using this approach for longitudinal data, it is sometimes implied that the network structure is the same over time. This assumption may be inaccurate in some cases. For example, in Onnela et al. [2004] a stock market correlation network is created from almost two decades of stock prices. In reality the relationships between the stocks, and therefore the structure of the underlying network, likely change over such a long period of time, an issue that was addressed in Onnela et al. [2004] by dividing the data into shorter time windows. Similarly, in functional magnetic resonance imaging (fMRI) trials it is likely that the brain interacts differently when given different tasks [Keightley et al., 2003], or possibly even during a given task, so it may be inaccurate to assume a constant brain activity correlation network in trials with different tasks.
Suppose that a network is constant, or may be assumed to be so, until a known point in time at which it undergoes a sudden change. In this case the underlying data should be split at the change point into two parts, and two separate correlation networks should be constructed from the two subsets of the data. In reality the location of the change point, or possibly several change points, is not known a priori and must also be inferred from the data. This example belongs to a wider class of change point detection problems, which have been well studied in the field of process control.
When the observed node characteristics are independent and normally distributed, methods exist to detect changes in the multivariate normal mean or covariance [Hawkins and Zamba, 2005, Zamba and Hawkins, 2006, Lowry et al., 1992]. These methods, however, are generic and were not developed in the context of networks. There have been some promising efforts at change point detection for structural networks, where the actual network is observed over time rather than being constructed from correlations of observed node characteristics [Lindquist et al., 2007, 2008, Peel and Clauset, 2014].
In this paper we propose a computational framework for change point detection in correlation networks that is free from distributional assumptions. This framework offers a novel and flexible approach to change point detection. The change point detection method suggested by Zamba [2009] is adapted to our framework, and its power to detect change points is compared with that of competing methods through simulation. We also investigate the general difficulties of change point detection, especially near the boundaries of the data, both analytically and through simulation. Finally, we apply our algorithm to both stock market and fMRI correlation network data and demonstrate its success at detecting functionally relevant change points.

Notation
Assume that the system under investigation consists of a fixed number of n nodes with characteristics observed at T distinct time points. The observed characteristics are collected in an n × T matrix Y whose jth column Y j contains the observations at time point j, where Y j ∼ f(· | Σ j) for an unknown distribution f, with all columns of Y i.i.d. (independent and identically distributed) and where cov(Y j) = Σ j. We also assume that the rows of Y, corresponding to observations at individual nodes, are centered to have temporal mean 0 and scaled to have unit variance. Note that the centering and scaling, resulting in standardized observations for each node, can always be performed.
We define a set of diagonal matrices D(i, j) of dimension T × T for 1 ≤ i < j ≤ T such that the tth diagonal entry of D(i, j) is 1 if i ≤ t ≤ j and 0 otherwise. We define the covariance matrix S(i, j) of dimension n × n on the subset of the data ranging from the ith column to the jth column, i.e., from time point i to time point j (1 ≤ i < j ≤ T), to be S(i, j) = Y D(i, j) Y' / (j − i). In order to detect a change point, we wish to find the value of k in the range 1 + ∆ ≤ k ≤ T − ∆ that maximizes the difference between S(1, k) and S(k + 1, T), where ∆ is picked large enough to avoid ill-conditioned covariance matrices (∆ > n). The rationale for this approach is that if there were a change point in the data, the sample correlation matrices on each side of the change point ought to be different in structure. We choose the squared Frobenius norm as our metric for the distance between two matrices. Let our matrix distance metric be d(k) = ||S(1, k) − S(k + 1, T)||²_F = tr[(S(1, k) − S(k + 1, T))' (S(1, k) − S(k + 1, T))], where tr is the matrix trace operator. We wish to test the hypotheses H 0 : Σ 1 = Σ 2 = · · · = Σ T against H 1 : there exists a k such that Σ 1 = · · · = Σ k ≠ Σ k+1 = · · · = Σ T.
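To make the notation concrete, the distance d(k) can be computed directly from the standardized data matrix. The following sketch is our own illustration (the function and variable names are ours, not the paper's); it estimates the two segment covariance matrices by normalizing each segment's outer product by its length and returns the squared Frobenius norm of their difference:

```python
import numpy as np

def frobenius_distance(Y, k):
    """d(k): squared Frobenius norm between the sample covariance
    matrices estimated before and after candidate change point k.
    Y is an n x T matrix whose rows have been centered and scaled;
    columns 0..k-1 (0-indexed) form the first segment."""
    n, T = Y.shape
    S_left = Y[:, :k] @ Y[:, :k].T / k          # S(1, k)
    S_right = Y[:, k:] @ Y[:, k:].T / (T - k)   # S(k+1, T)
    diff = S_left - S_right
    return float(np.trace(diff.T @ diff))       # tr[A'A] = ||A||_F^2
```

The trace identity in the last line matches the definition of d(k) above; each segment's covariance is normalized by its own length for simplicity.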

Existing methodology for change point detection
Consider for a moment the case where the vector Y j is multivariate normal with expectation µ 1 and variance-covariance matrix Σ 1 before the change point and expectation µ 2 and variance-covariance matrix Σ 2 after it. We denote this Y j ∼ MVN(µ 1, Σ 1) for j ≤ k and Y j ∼ MVN(µ 2, Σ 2) for j > k. A multivariate exponentially weighted moving average (EWMA) model has been developed for detecting when µ 1 changes to µ 2 [Zamba and Hawkins, 2006, Lowry et al., 1992]. A likelihood ratio test for detecting change points in the covariance matrix Σ 1 at a known fixed point k was considered by Zamba [2009] and Anderson [1984]. The likelihood ratio test statistic for detecting a change point at k is Λ k = |S(1, k)|^(k/2) |S(k + 1, T)|^((T−k)/2) / |S(1, T)|^(T/2), (2) where | · | is the matrix determinant operator.
As this approach assumes that the location of the change point is known to be at k, whereas in reality the location of the change point is unknown, the method can be extended to allow an unknown change point location by considering max 1+∆≤k≤T−∆ {Λ k}. When the Y j are normally distributed then, for a fixed k, −2 log(Λ k) follows a chi-square distribution for large k and large T − k. Taking the maximum of Λ k over all possible k results in a less tractable analytic distribution for the test statistic due to the necessity of correcting for multiple testing. For this reason, along with the fact that we do not wish to restrict ourselves to these distributional and asymptotic assumptions, we note that (2) can be easily adapted to the framework developed in Section 2.3 by defining d(k) = Λ k and proceeding as usual. This suggests that different definitions of our matrix distance metric d(k) can lead to substantially different results even in the same general framework. This idea is pursued further in Section 3.2.
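For comparison with the Frobenius-based metric, the statistic above can be sketched in code. This is our own rendering (helper names are ours) of the classical equality-of-covariances form, working with −2 log Λ k via log-determinants for numerical stability:

```python
import numpy as np

def neg2_log_lr(Y, k):
    """-2 log likelihood ratio for a covariance change at column k,
    assuming the classical form
    Lambda_k = |S(1,k)|^(k/2) |S(k+1,T)|^((T-k)/2) / |S(1,T)|^(T/2)."""
    n, T = Y.shape
    S_all = Y @ Y.T / T                        # S(1, T)
    S_left = Y[:, :k] @ Y[:, :k].T / k         # S(1, k)
    S_right = Y[:, k:] @ Y[:, k:].T / (T - k)  # S(k+1, T)
    ld = lambda S: np.linalg.slogdet(S)[1]     # log|S|, stable for large n
    return T * ld(S_all) - k * ld(S_left) - (T - k) * ld(S_right)
```

Because S(1, T) is the length-weighted average of S(1, k) and S(k + 1, T) and the log-determinant is concave, the statistic is non-negative.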

Simulation based change point detection
It is of interest to establish a method of change point detection that does not require any distributional assumptions on Y j, and we develop such a method in this section. Our approach is based on the bootstrap, which offers a computational alternative that can well approximate the distribution of Y j through resampling. Under H 0, the Y j are all independent and come from the same distribution, so a bootstrap sample Y (b) can be formed by resampling the columns of Y with replacement, and the distance d (b)(k) can be computed from Y (b) for each candidate change point k. This is repeated for b ∈ {1, . . . , B}, where B is the total number of bootstrap samples. ∆ is a "buffer" that limits the change point detection from searching too close to the boundaries. We recommend ∆ ≈ n. In the case where a change point location k is closer than ∆ to either 1 or T, the change point will not be detected but, as will be seen in Section 2.4, these cases are near impossible to detect regardless of how small we make ∆. For each k ∈ {1 + ∆, . . . , T − ∆}, the mean d̄(k) and standard deviation s(k) of the bootstrap distances d (b)(k) are computed. A z-score is then calculated for each potential change point k ∈ {1 + ∆, . . . , T − ∆} as z(k) = (d(k) − d̄(k)) / s(k). The change point occurs for the value of k where the z-score is largest, so we let Z (b) = max k {z (b)(k)} for each bootstrap sample b. This is also performed on the observed data, with Z = max k {z(k)}. The p-value is then computed as |{b : Z (b) ≥ Z}| / B, where | · | is the cardinality of the set. If the p-value is significant, i.e., if sufficiently few bootstrap replicates Z (b) exceed Z, then we reject H 0 and declare a change point exists for the value of k with the highest z-score, i.e., at arg max k z(k).
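The steps above can be sketched end to end as a short routine. This is a minimal implementation of our own (B, the buffer ∆, and the significance level are arguments; row standardization is assumed to have been done beforehand):

```python
import numpy as np

def detect_change_point(Y, delta, B=100, alpha=0.05, seed=0):
    """Bootstrap change point test of Section 2.3 (sketch).
    Returns (k_hat, p_value); k_hat is None when not significant."""
    rng = np.random.default_rng(seed)
    n, T = Y.shape
    ks = np.arange(delta, T - delta)           # candidate change points

    def d_profile(M):
        """d(k) for every candidate k."""
        return np.array([np.sum((M[:, :k] @ M[:, :k].T / k
                                 - M[:, k:] @ M[:, k:].T / (T - k)) ** 2)
                         for k in ks])

    # resample columns with replacement, mimicking H0
    boot = np.stack([d_profile(Y[:, rng.integers(0, T, T)])
                     for _ in range(B)])
    mu, sd = boot.mean(axis=0), boot.std(axis=0)
    z = (d_profile(Y) - mu) / sd               # observed z-scores
    Z, Z_boot = z.max(), ((boot - mu) / sd).max(axis=1)
    p = float(np.mean(Z_boot >= Z))            # bootstrap p-value
    return (int(ks[z.argmax()]) if p < alpha else None), p

# toy data: independent nodes, then two nodes strongly correlated
rng = np.random.default_rng(1)
Sigma2 = np.eye(4)
Sigma2[0, 1] = Sigma2[1, 0] = 0.95
Y = np.hstack([rng.standard_normal((4, 100)),
               np.linalg.cholesky(Sigma2) @ rng.standard_normal((4, 100))])
k_hat, p = detect_change_point(Y, delta=10)
```

With a change this strong, the detected location lands near the true change point at column 100 and the bootstrap p-value is small.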
It is also often the case that there exists more than one change point. In this case the data is split into two segments, one before the first change point and the other after it, and the bootstrap procedure is then repeated separately for each of the two segments. If a significant change point is found on a segment, then that segment is split in two again, and this process is repeated until no more statistically significant change points are found. In practice, this procedure terminates after a small number of rounds because each iteration on average halves the amount of data, which greatly reduces the power to detect a change point in each subsequent iteration.

Difficulty of detection near the boundary
We alluded above to the difficulty of detecting change points near the boundaries of the data, and will now investigate this issue in more detail. When a change point k is very close to 1, the empirical covariance matrix S(1, k) is constructed using a very small amount of data and its estimate is unstable with high variance. Similarly, when k is very close to T, S(k + 1, T) suffers from the same problem. This makes change point detection hard: if the empirical covariance matrix is highly variable, the noise from the estimation of the covariance matrices can make any possible differences between Σ 1 and Σ 2 statistically difficult to detect.
In an attempt to quantify just how difficult a problem change point detection is near the boundary, we find the analytic form of the expected distance E[d(k)] under H 0 when the Y j are normally distributed (Theorem 1, Equation (3)). The expectation decomposes into terms that grow without bound as k approaches either boundary, and these combine to give us the result. For the more detailed algebra, see Appendix 6.1.
The implication of Theorem 1 (Equation (3)) is that the expected difference diverges as k approaches 1 or T and is minimized when k = T/2. We expect that the qualitative nature of this result generalizes beyond the normal distribution. The increase in E[d(k)] near the boundaries is confirmed through simulation under H 0 and demonstrated in Figure 1. The implication is that the noise in the estimation of the covariance matrices on both sides of the change point is minimized when both S(1, k) and S(k + 1, T) have sufficient data for their estimation. When k is close to 1, then even though S(k + 1, T) has low variability, the large increase in variability of S(1, k) leads to an overall noisier outcome. This demonstrates that the method is only as strong as its weakest estimate. For the purposes of study design and data collection, if we suspect that a change point occurs at a certain location, perhaps for theoretical reasons or based on past studies, we need to ensure that there is sufficient data collected both before and after the suspected change point if we are to have any hope of detecting it.
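The boundary effect is easy to reproduce numerically. The snippet below (our own toy settings, smaller than those used for Figure 1) averages d(k) under H 0 with independent standard normal data and shows that candidate change points near either edge inflate the expected distance relative to the midpoint:

```python
import numpy as np

rng = np.random.default_rng(0)
n, T, reps = 5, 100, 300
mean_d = {}
for k in (10, 50, 90):                      # near left edge, middle, near right edge
    vals = []
    for _ in range(reps):
        Y = rng.standard_normal((n, T))     # no change point: H0 holds
        L = Y[:, :k] @ Y[:, :k].T / k       # S(1, k)
        R = Y[:, k:] @ Y[:, k:].T / (T - k) # S(k+1, T)
        vals.append(np.sum((L - R) ** 2))   # d(k)
    mean_d[k] = float(np.mean(vals))
# expected ordering: boundary values of k inflate E[d(k)] relative to the midpoint
```

With these settings, mean_d[10] and mean_d[90] come out well above mean_d[50], matching the divergence predicted near the boundaries.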

Simulation

The relationship between T and n for statistical power
Estimation of the covariance matrix requires T to be large relative to n because the empirical covariance matrix has n(n − 1)/2 values that need to be estimated, so there is high variability in many estimates if T is small. If T is too small, then even if a change point exists, the empirical covariance matrix may be so variable that the change point is undetectable. This problem is exacerbated when trying to detect change points near the boundary, as discussed in Section 2.4.
While it is intuitive that T needs to grow as some function of n in order to maintain any reasonable statistical power to detect change points, it is unclear what that function of n is. We investigate here further, using simulation, at what rate the number of longitudinal observations T needs to grow with system size n in order to maintain the same statistical power. We consider the case where a single change point occurs at the midpoint T/2, and Y j ∼ MVN(0, Σ 1) for j ≤ T/2 and Y j ∼ MVN(0, Σ 2) for j > T/2, where ρ = 0.9 and Σ 2 is a block or partitioned matrix with exchangeable correlation ρ within the blocks on the diagonal, and 0s in the off-diagonal blocks. We simulate instances of Y in this fashion 10000 times for each of n = 4, 8, 12.
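One plausible construction of such a Σ 2 is sketched below (our own helper; the specific block sizes shown are an assumption for illustration, not the exact ones used in the simulation):

```python
import numpy as np

def block_exchangeable(n, block_sizes, rho):
    """Block-diagonal matrix with exchangeable correlation rho inside
    each diagonal block, unit diagonal, and 0s between blocks."""
    Sigma = np.zeros((n, n))
    start = 0
    for b in block_sizes:
        Sigma[start:start + b, start:start + b] = rho
        start += b
    np.fill_diagonal(Sigma, 1.0)
    return Sigma

Sigma2 = block_exchangeable(4, [2, 2], rho=0.9)
# draw post-change-point columns Y_j ~ MVN(0, Sigma2)
rng = np.random.default_rng(0)
Y_post = np.linalg.cholesky(Sigma2) @ rng.standard_normal((4, 50))
```

The Cholesky factor turns i.i.d. standard normal columns into draws with the desired covariance, which is how the post-change segment can be simulated.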
In Figure 2 we compare the performance of the method described in Section 2.3, measured by the proportion of the 10000 iterations resulting in a statistically significant change point, for the three different values of n. The asymmetry in Figure 2 around the true change point is caused by having Σ 1 first followed by Σ 2. If the order of Σ 1 and Σ 2 is reversed, then the asymmetry is reversed as well. We find that the probability of detecting the correct change point is the same for all n if we increase T at a quadratic rate in n as T(n) = n(n − 1) + C for a constant C, a functional form we discovered by numerical exploration. In our simulations we considered the α = 0.05 significance level and C = 30. The intuition behind a quadratic rate is that as n increases, the number of entries in the empirical covariance matrix increases quadratically and therefore the noise in the Frobenius norm increases quadratically. Increasing T quadratically with n appears to balance out the added noise from increasing the dimensions of the correlation matrices, and stabilizes the statistical power to detect the change point. We recommend that if one wants to increase n, there be an associated increase in the number of observations that is quadratic in n in order to retain the ability to detect a change point with the same power.
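As a rule of thumb, the empirical scaling can be packaged as a small helper (C = 30 matches the simulations above; other settings may call for a different constant):

```python
def required_T(n, C=30):
    """Observations needed to keep detection power roughly constant
    as the number of nodes grows, per the scaling T(n) = n(n-1) + C."""
    return n * (n - 1) + C

print([required_T(n) for n in (4, 8, 12)])  # [42, 86, 162]
```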

Comparison of different matrix norms
Up until this point our proposed method has dealt with taking the Frobenius norm of the difference of empirical correlation matrices. The choice of the Frobenius norm was made simply for the algebraic simplicity of Theorem 1 (Equation (3)). Though it is simpler than many other matrix norms for such calculations, there is no reason to believe that the Frobenius norm is uniformly the best choice of matrix norm if the objective is to maximize the statistical power of change point detection. There may be some change points that the Frobenius norm is good at detecting, but there may be other change points for which a different matrix norm or distance metric would be more suitable. We investigate this question more closely in this section.
Because the Frobenius norm sums the squared entries of a matrix, it is intuitive to expect that the Frobenius norm would be ideal for detecting change points in systems that demonstrate large-scale, network-wide changes in the correlation patterns. On the other hand, the Frobenius norm likely would not be very powerful in detecting small-scale local changes in network structure. The rationale for this is that by summing over all the changes in the network structure, if there are very few changes relative to the entire network, then the Frobenius norm would be dominated by noise from the largely unchanged matrix elements.
We consider a different matrix norm, the Maximum norm, that has appeal for the case of small-scale, local changes. The Maximum norm of a matrix is simply the largest element of the matrix in absolute value. Intuitively, this norm would be ideal if there was just a single, but very large, change in the covariance matrix. If only one element of the covariance matrix changes, but the change is quite large, the Maximum norm would still be able to detect this change. Here the Frobenius norm would likely fail due to the sum of all the changes being dominated by noise.
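A toy illustration of this contrast (our own construction, with an arbitrary noise scale): one large local change completely determines the Maximum norm of the difference matrix, but contributes only part of the squared Frobenius norm, with the rest coming from the unchanged, noisy entries:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20
noise = rng.normal(scale=0.05, size=(n, n))
noise = (noise + noise.T) / 2            # symmetric estimation noise
diff = noise.copy()
diff[0, 1] = diff[1, 0] = 0.8            # single large local change

max_norm = float(np.max(np.abs(diff)))   # dominated by the one change
frob_sq = float(np.sum(diff ** 2))       # change plus accumulated noise
print(max_norm)                          # 0.8
```

Here the single altered entry is all the Maximum norm sees, while the squared Frobenius norm mixes it with the summed noise of the other entries, diluting the signal as the network grows.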
The likelihood ratio test in Equation (2) is more similar to the Frobenius norm than to the Maximum norm in that it utilizes all entries in the covariance matrix rather than using only one element. As a result, we may expect the likelihood ratio distance metric to be more similar to the Frobenius norm in performance than it is to the Maximum norm.
We compare the Frobenius norm, the Maximum norm, and the likelihood ratio in Equation (2) through simulation, varying the proportion of the network that is altered at the change point. To do this, we generated Y j from a multivariate normal distribution with T = 400 and a single change point occurring at t = 200. Prior to the change point Y j ∼ MVN(0, Σ 1) for j ≤ t, and after the change point Y j ∼ MVN(0, Σ 2) for j > t, where we modify the dimension of the upper-left block of Σ 2 to change the proportion of the network that is altered at the change point. In each case ρ is selected such that the change point is detected with 50% power using the Frobenius norm. This provides a reference for how the Frobenius norm compares with the Maximum norm and the likelihood ratio. The results are displayed in Figure 3, which confirms our intuition. When a small proportion of the network is altered, the Maximum norm is more powerful at change detection than the Frobenius norm and the likelihood ratio metric. When a large proportion of the network is altered at the change point, then the Frobenius norm and likelihood ratio metric are more powerful. While the likelihood ratio metric is more similar to the Frobenius norm than it is to the Maximum norm, it is still less sensitive to widespread subtle network changes than the Frobenius norm.
In practice, some norm must be used even when one is not sure a change point exists. If there is some a priori knowledge of a type of change point, perhaps specific to the problem at hand, then that information could be used to select an appropriate norm. For example, suppose we investigate a network constructed from stock return correlations and the time period under investigation happens to encompass a sudden economic recession. The moment the recession strikes, it is likely that there will be large-scale changes in the underlying network, and therefore the Frobenius norm might be a good choice.
One important issue that deserves emphasis is when the choice of norm should be made. It is very important that the choice of norm is made prior to looking at the data. If the analysis is performed repeatedly with different choices for the matrix norm, and the norm with the "best" results is selected, the procedure would be deeply flawed and would invalidate the interpretation of the p-value. See Gelman and Loken [2013] for an informative discussion of the inflated false positive rates that can result when the choice of the specific statistical procedure to use, in this case the norm, is not made prior to all data analysis.

Detecting multiple change points
It may be the case that more than one change point occurs in the data. In this case, the method described in Section 2.3 can still be applied to search for additional change points by splitting the data into two segments at the first significant change point and then repeating the procedure on each segment separately. This process is repeated recursively on segments split around significant change points until no additional statistically significant change points remain. Each test is performed at the α = 0.05 level (or at another user-specified level). Though multiple comparisons may seem like a potential problem here, in fact there is no problem because further tests are only performed conditional on the previous change points being declared statistically significant. This prevents the false positive rate from being inflated. A sliding window approach can also be used to detect multiple change points, as is done in Hawkins et al. [2003] and Peel and Clauset [2014].
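The recursive splitting reads naturally as a short routine. Here `single_test` stands for any one-change-point procedure returning (k, p) when significant and (None, p) otherwise, such as the bootstrap test of Section 2.3; the interface and names are hypothetical:

```python
import numpy as np

def find_all_change_points(Y, single_test, offset=0, found=None):
    """Recursively split the columns of Y at significant change points;
    further testing happens only on segments that were themselves split."""
    if found is None:
        found = []
    k, p = single_test(Y)
    if k is None:                     # no significant change point here
        return found
    found.append((offset + k, p))
    find_all_change_points(Y[:, :k], single_test, offset, found)
    find_all_change_points(Y[:, k:], single_test, offset + k, found)
    return found

# demonstration with a stand-in test that splits long segments in half
def midpoint_test(Y):
    T = Y.shape[1]
    return (T // 2, 0.01) if T >= 200 else (None, 0.5)

points = find_all_change_points(np.zeros((2, 400)), midpoint_test)
```

With the stand-in test, a 400-column input yields change points at columns 200, 100, and 300, in the order the recursion finds them; conditioning each call on the parent split being significant is what keeps the false positive rate controlled.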
We investigate the performance of our method for detecting multiple change points through simulation. Consider the case where T = 400 time points are observed for n = 10 nodes in a network. Data follows a multivariate normal distribution with mean 0 and covariance Σ 1 for 1 ≤ t ≤ 100 and for 201 ≤ t ≤ 300, but has covariance Σ 2 for 101 ≤ t ≤ 200 and for 301 ≤ t ≤ 400.
We define Σ 1 and Σ 2 as in Section 3.1 with ρ = 0.9, except that the upper left block of Σ 2 is 5 × 5 in this case. The probability that a change point is detected at each particular location is estimated from 10000 iterations and is shown in Figure 4. The middle change point at t = 200 is not the first to be detected. Instead, the change points at t = 100 and t = 300 are picked up first. The reason the change point at t = 100 is easier to detect than the one at t = 300, despite each being equally far from its respective boundary, is that the data is ordered with the first 100 observations generated from Σ 1 and the last 100 from Σ 2. If this is reversed, then t = 300 becomes the change point most likely to be detected. After the first change point is detected, power is reduced for the remaining change points due to the reduction in sample size that occurs from fragmenting the data into segments.

Correlation networks of stock returns
Our first data analysis example deals with networks constructed from correlations of stock returns. Networks constructed from correlations of stock returns have been used in the past to investigate the correlation structure of markets as well as to detect changes in their structure [Mantegna, 1999, Onnela et al., 2003a,b]. Here we use a data set first analyzed in Onnela et al. [2004] and apply our change point detection methods to it.
A total of n = 114 S&P 500 stocks were followed from the beginning of 1982 to the end of 2000, keeping track of the stock price at closing for T = 4786 trading days over that time period.
This data is publicly available and had been gathered for analysis previously, where correlation networks were constructed based on the correlation between log returns in moving time windows [Onnela et al., 2003b]. If the price of the ith stock on the jth day is P ij, then the corresponding log return is R ij = log(P ij) − log(P i,j−1).
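The transformation from prices to log returns is a one-liner (our own helper; P holds closing prices with one row per stock):

```python
import numpy as np

def log_returns(P):
    """R[i, j] = log(P[i, j]) - log(P[i, j-1]) for each stock i."""
    return np.diff(np.log(P), axis=1)

P = np.array([[100.0, 101.0, 99.0],   # stock 1 closing prices
              [50.0, 50.5, 51.0]])    # stock 2 closing prices
R = log_returns(P)                    # one fewer column than P
```

Working with log returns rather than raw prices is also what makes the independence assumption of the bootstrap more defensible, as discussed later.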
Given that the stock market evolves constantly, and given the long time interval in the observed data, it may not be safe to assume that the correlation between the log returns of any two stocks stays fixed over time. If a correlation network were constructed by placing an edge between every pair of stocks with correlation greater than some threshold, and if all of the 19 years of data were used at once, the resulting network would likely be an inaccurate representation of the market if in fact the true underlying network changes with time [Onnela et al., 2006]. A more principled approach would be to first test for change points in the correlation network and then build multiple networks around those change points if necessary.
Following the method proposed in Section 2.3, the stock price data is resampled 500 times assuming the null hypothesis of no change points, and the observed data is compared with these simulations to determine if and where a change point occurs. These simulation results are displayed in Figure 5. There is strong statistical evidence of a change point at the end of the year 1987, evidenced by a p-value < 0.002. We used the Frobenius norm as our focus was on finding events that could lead to large-scale shocks to the correlation network. There were several other significant change points, but we focus on the first and most significant change point here. The stock market crash of October 1987, known as "Black Monday", coincides with the first detected change point [Onnela et al., 2003a]. The stock market crash evidently drastically changed the relationships between many of the stocks, leading to a stark change in the correlation network. For this reason it is advisable to consider the network of stocks before and after the stock market crash separately, as well as splitting the data further around potential additional significant change points, rather than lumping all of the data together to construct a single correlation network.

Correlation networks of fMRI activity
Our second data analysis example deals with networks constructed from correlations in fMRI activity in the human brain. The Center for Cognitive Brain Imaging at Carnegie Mellon University collected fMRI data as part of the star/plus experiment for six individuals as they each completed a set of 40 trials [Mitchell et al., 2004]. Each trial took approximately 27 seconds to complete. The subjects were positioned inside an MRI scanner, and at the start of a trial, each subject was shown a picture for four seconds before it was replaced by a blank screen for another four seconds. Then a sentence making a statement about the picture just shown was displayed, such as "The plus sign is above the star," and the subject had four seconds to press a button, "yes" or "no," depending on whether or not the sentence was in agreement with the picture. After this the subject had an interstimulus period of no activity for 15 seconds until the end of the trial. We avoid referring to this as "resting state" due to the reserved meaning of that label for extended periods of brain inactivity. Trials were repeated with different variations, such as the picture being presented after the sentence, or with the sentence contradicting the picture. MRI images were recorded every 0.5 seconds, for a total of about 54 images over the course of a trial, corresponding to 40 × 54 = 2160 images in total. Each image was partitioned into 4698 voxels of width 3 mm. The study data are publicly available [Just, 2001]. Assuming t = 12 to be a change point, as our analyses suggest, we construct two networks between the ROIs (regions of interest), one from S*(1, 24) corresponding to the first 12 seconds of the trial (recall that i and j in S*(i, j) index time points that are 0.5 seconds apart) and one from S*(25, 54) corresponding to the remaining 15 seconds of inactivity in the trial. The networks are
constructed such that an edge is present between two nodes if and only if their pairwise correlation is greater than 0.5 in absolute value. The two networks are displayed in Figure 7. The correlation threshold used to determine whether an edge is present was selected such that the network after the change point had 20 edges, and the same threshold was used for both the before and after networks. After the change point, there is a clear increase in connectivity in the network. An explanation for this could be that the behavior after the change point corresponds to constant inactivity, whereas before the change point there is a mixture of inactivity, thinking, decisions, and other mental activity. These different mental activities could dampen the observed correlations when averaged together.
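Building a thresholded network from a correlation matrix, with the cutoff chosen to yield a target edge count as done above for the 20 edge network, can be sketched as follows (our own helper functions, applied here to a random symmetric stand-in for a correlation matrix):

```python
import numpy as np

def adjacency_at_threshold(C, thresh):
    """Edge between i and j iff |C[i, j]| > thresh (diagonal excluded)."""
    A = (np.abs(C) > thresh).astype(int)
    np.fill_diagonal(A, 0)
    return A

def threshold_for_edge_count(C, n_edges):
    """Cutoff placed between the n_edges-th and (n_edges+1)-th largest
    off-diagonal |correlation|, so exactly n_edges edges survive
    (assuming no ties)."""
    vals = np.sort(np.abs(C[np.triu_indices_from(C, k=1)]))[::-1]
    return (vals[n_edges - 1] + vals[n_edges]) / 2

rng = np.random.default_rng(0)
M = rng.uniform(-1, 1, (8, 8))
C = (M + M.T) / 2                 # symmetric stand-in correlation matrix
np.fill_diagonal(C, 1.0)
A = adjacency_at_threshold(C, threshold_for_edge_count(C, 5))
```

Fixing the threshold from one network and reusing it for the other, as in the before/after comparison above, makes the two edge counts directly comparable.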

Discussion
In this paper, an existing change point detection method was adapted to correlation networks using a computational framework. Many past treatments of change point detection make a distributional assumption on the observed characteristics, but our framework utilizes the bootstrap in order to avoid this restriction. Traditional methods also assume independence between observations, and at first glance this assumption seems unreasonable. For instance, consider the stock market data. Stock prices are often modeled as a Markov process, which implies strong autocorrelation between consecutive observations. For this reason the stock prices themselves cannot be used as input for our algorithm; the log returns are used instead. Just as the random innovations driving a Markov process are independent, the assumption that the returns are independent is much more reasonable.
We extended our framework to allow for multiple change points. If the first change point is found to be statistically significant, then the data is split into two parts on either side of the change point and the algorithm is repeated for each subset. This process of splitting the data around significant change points continues until there are no more significant change points. The fMRI data analysis in Section 4.2 only found a single significant change point but, due to the multiple changes in stimuli, there are likely multiple points in time where the structure of interaction between regions of interest in the brain changes. The reason only one change point was significant is the loss of power that occurs when splitting the data, which could be remedied by collecting higher temporal frequency imaging data. With each split of the data, T is approximately halved while n remains the same and, as shown in Section 3.1, this greatly lowers the power to detect further change points. However, using higher temporal resolution data, it should be possible to use this framework to detect multiple change points in correlation networks.

Proof of Theorem 1
The proof starts by writing the squared Frobenius norm as a sum over all pairs of entries of the matrix difference and then taking the expectation of each term under normality. The last line of the derivation is the result of Theorem 1 in Equation (3).
Because the true distribution of the columns Y j is unknown, we approximate it with the empirical distribution f(· | Σ), which gives each observed column vector Y j an equal point mass of 1/T. This is equivalent to resampling from the columns of Y with replacement. Let Y (b) be one of the bootstrap resamples from Y, where each column Y (b) j is generated from f(· | Σ).

Figure 1: Difficulty of change point detection near the boundaries of data. With T = 200, n = 20, and Σ = I n (the identity matrix of order n), for each potential change point 1 < k < T, we estimate E[d(k)] by averaging d(k) over 10000 simulations under H 0 and show the location of the expected values with markers. These empirical estimates are contrasted with the theoretical expectation, shown as a solid line, given by Theorem 1, Equation (3).

Figure 2: Change point detection statistical power as a function of n and T. The y axis represents the probability that a change point is detected at a particular time point. The x axis is the distance of a time point in either direction from the true change point. These probabilities are estimated based on 10000 iterations for each n.

Figure 3: Power comparison for different matrix norms. For each point on the x-axis, a value of ρ in the definition of Σ 2 is selected such that the Frobenius norm has 50% power to detect a change point. As the proportion of the network altered at the change point increases, we adjust the value of ρ correspondingly.

Figure 4: Detection of multiple change points. The y axis represents the probability that a change point is detected in a bin, each containing five adjacent time points, based on 10000 simulations in total. The true locations of the change points are marked with vertical blue lines.

Figure 5: Change point detection in stock returns. With n = 116 stocks tracked over T = 4786 days (∼ 19 years), the blue line is the empirical z-score z(k) while the clustered black lines are the z-scores simulated under H 0 using the bootstrap as defined in Section 2.3. A significant change point is detected near the end of the year 1987, corresponding to the well documented crash at the end of that year.

Figure 6: Detecting change points in fMRI data. The z-scores z(k) as defined in Section 2.3 are plotted against time k. The first two seconds and the last two seconds of the trial are ignored, which corresponds to ∆ = 4, given that the images are taken every 0.5 seconds. The trial steps (picture presented, picture removed, sentence presented, sentence removed) and the predicted change point are marked.

Figure 7: The ROI correlation networks constructed before and after the detected change point.