Robust Statistical Detection of Power-Law Cross-Correlation

We show that widely used approaches in statistical physics incorrectly indicate the existence of power-law cross-correlations between financial stock market fluctuations measured over several years and the neuronal activity of the human brain lasting for only a few minutes. While such cross-correlations are nonsensical, no current methodology allows them to be reliably discarded; the risk is greatest when the spurious nature of a cross-correlation is not obvious from the unrelated origins of the time series and must instead be established by careful statistical estimation. Here we propose a theory and method (PLCC-test) that allows us to rigorously and robustly test for power-law cross-correlations, correctly detecting genuine and discarding spurious cross-correlations, thus establishing meaningful relationships between processes in complex physical systems. Our method reveals for the first time the presence of power-law cross-correlations between the amplitudes of the alpha and beta frequency ranges of the human electroencephalogram.


Theory
We assume we are given two independent processes in discrete time, $X_1(t)$ and $X_2(t)$, such that $X_1(t) = X^a_1(t) + X^b_1(t)$ and $X_2(t) = X^a_2(t) + X^b_2(t)$, where $X^a_1(t)$ and $X^a_2(t)$ are linear processes with auto-covariance functions of the form $L_1(t)\,t^{2H-2}$ and $L_2(t)\,t^{2G-2}$. The $L_i$ are slowly varying functions at infinity, i.e. $L_i(t)/L_i(ct) \to 1$ as $t \to \infty$ for any $c > 0$ [1,2], and the time series $X^b_i(t)$ are deterministic trends of a fixed polynomial order $d$. This formulation allows for non-Gaussianity, for confounding non-stationarities, and for a high-frequency component of the power spectrum that departs from a power law at those frequencies. This semi-parametric class includes the class analysed by [3], viz. stationary Gaussian time series with an auto-covariance function proportional to $t^{2H-2}(1 + O(1/t^{\beta}))$, since $1 + O(1/t^{\beta})$ is slowly varying at infinity. Moreover these assumptions rigorously formalize and generalize Equations (1) and (2) of the main paper.
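As a concrete illustration of this class, the sketch below (a minimal example of our own, not code from the paper; all function names are ours) simulates one such process: exact fractional Gaussian noise drawn from its autocovariance by a Cholesky factorization, plus a deterministic linear trend playing the role of $X^b(t)$.

```python
import numpy as np

def fgn_autocov(n, H):
    """Autocovariance gamma(k) of fractional Gaussian noise with Hurst
    exponent H: gamma(k) = 0.5*(|k+1|^{2H} - 2|k|^{2H} + |k-1|^{2H})."""
    k = np.arange(n, dtype=float)
    return 0.5 * ((k + 1) ** (2 * H) - 2 * k ** (2 * H) + np.abs(k - 1) ** (2 * H))

def fgn(n, H, rng):
    """Draw one exact sample path of fractional Gaussian noise by
    factorizing the full n x n Toeplitz covariance matrix (O(n^3),
    adequate for short illustrative series)."""
    gamma = fgn_autocov(n, H)
    idx = np.arange(n)
    cov = gamma[np.abs(idx[:, None] - idx[None, :])]
    return np.linalg.cholesky(cov) @ rng.standard_normal(n)

def trended_process(n, H, slope, intercept, rng):
    """X(t) = X^a(t) + X^b(t): power-law auto-correlated noise plus a
    deterministic polynomial (here linear) trend."""
    t = np.arange(n, dtype=float)
    return fgn(n, H, rng) + slope * t + intercept
```

For $H > 1/2$ the autocovariance is positive and slowly decaying, which is the persistent regime the theory addresses.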
We denote the integrated time series in lower case, $x(t) = \sum_{i=1}^{t} X(i)$, write fractional Gaussian noise time series as $X^G(t)$, and use subscripts to distinguish time series when we consider more than one, e.g. $x_i(t)$, $X_i(t)$, $X^G_i(t)$. The main result of this section is Theorem 1.9, a central limit theorem for the vector of DCCA cross-correlation coefficients; this corresponds to Equation 3 of the main paper. It is proved by first verifying the statement for the special case of fractional Gaussian noise (Proposition 1.5) and then proving that the DCCA coefficients converge to those of fractional Gaussian noise at large time scales (Proposition 1.7).
Fractional Gaussian noise [4] is the unique zero-mean Gaussian process that is power-law auto-correlated with Hurst exponent $H$; its integral, fractional Brownian motion $x^G_i(t)$, has autocovariance function
\[ \mathbb{E}\left[x^G_i(s)\,x^G_i(t)\right] = \tfrac{1}{2}\left(s^{2H} + t^{2H} - |t-s|^{2H}\right). \]
In order to simplify notation we write $F^2_{\mathrm{DCCA}}(n) := F^2_{X_1,X_2}(n)$; thus $F^2_{\mathrm{DFA}}(n) = F^2_{X,X}(n)$. In order to show that the DCCA coefficients are asymptotically normal, we first need to show that the windowed DCCA terms $F^2_{j,X_1,X_2}(n)$ are sufficiently uncorrelated for the application of a central limit result.
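For reference, here is a minimal numerical sketch of the quantities just defined (our own illustrative implementation, not the paper's software): the detrended cross-covariance $F^2_{X_1,X_2}(n)$ computed over non-overlapping windows of size $n$, with $F^2_{\mathrm{DFA}}(n) = F^2_{X,X}(n)$ recovered as the diagonal case.

```python
import numpy as np

def f2_dcca(X1, X2, n, d=1):
    """Detrended cross-covariance F^2_{X1,X2}(n): integrate each series,
    split into non-overlapping windows of length n, remove a degree-d
    least-squares polynomial from each window, and average the residual
    cross-products. Setting X2 = X1 recovers F^2_DFA(n)."""
    x1, x2 = np.cumsum(X1), np.cumsum(X2)
    T = len(x1) - len(x1) % n          # drop the incomplete final window
    t = np.arange(n)
    f2 = []
    for start in range(0, T, n):
        w1, w2 = x1[start:start + n], x2[start:start + n]
        r1 = w1 - np.polyval(np.polyfit(t, w1, d), t)
        r2 = w2 - np.polyval(np.polyfit(t, w2, d), t)
        f2.append(np.mean(r1 * r2))
    return np.mean(f2)

def rho_dcca(X1, X2, n, d=1):
    """DCCA cross-correlation coefficient at scale n."""
    return f2_dcca(X1, X2, n, d) / np.sqrt(
        f2_dcca(X1, X1, n, d) * f2_dcca(X2, X2, n, d))
```

The normalized ratio is the DCCA cross-correlation coefficient $\rho_{\mathrm{DCCA}}(n)$; by the Cauchy–Schwarz inequality applied to the pooled residuals it lies in $[-1, 1]$.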
Proof. Define $Q_d = I - P_d$, the orthogonal projection onto the complement of the polynomials of degree $d$ in $\mathbb{R}^n$. Let $\Sigma_H$ be the univariate covariance matrix of the terms in the first and $j$-th window of size $n$. Computing the Taylor expansion of the relevant integral around $j = \infty$ in MAPLE for $d = 1$, we find it has order $j^{2G+2H-8}$. The extension to higher-order detrending ($d > 1$) is straightforward. When $H = G$, the quantity of interest is the Frobenius norm of $Q_d \Sigma Q_d$, where $Q_d$ projects onto a subspace of the orthogonal complement of the polynomials of degree up to $d$. Using the Cauchy–Schwarz inequality for the inner product $\langle A, B \rangle = \operatorname{trace}(A^{\top}B)$ on matrices, the result follows for $H = G$. Thus, for higher-degree polynomial detrending, the order of the covariance function is less than or equal to that of DCCA(1).
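The detrending operator in this proof can be made concrete. The sketch below (our illustration; the function name is ours) builds $Q_d = I - P_d$ from an orthonormal basis of the polynomial space and lets one verify its defining properties: it annihilates polynomials of degree up to $d$ and is a symmetric idempotent.

```python
import numpy as np

def detrend_projection(n, d):
    """Q_d = I - P_d: the orthogonal projection of R^n onto the complement
    of the space of polynomials of degree <= d, i.e. the windowed
    detrending operator of DFA/DCCA."""
    # Vandermonde columns span the polynomials 1, t, ..., t^d on the grid.
    V = np.vander(np.arange(n, dtype=float), d + 1)
    Q_basis, _ = np.linalg.qr(V)          # orthonormal basis of the poly space
    P = Q_basis @ Q_basis.T               # projection onto polynomials
    return np.eye(n) - P
```

Because $Q_d$ is an orthogonal projection, $Q_d^2 = Q_d = Q_d^{\top}$, and any degree-$d$ trend in a window is mapped exactly to zero.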
Lemma 1.3. For each window size $n$, the sequence of windowed coefficients $F^2_{j,X_1,X_2}(n)$, indexed by the window $j$, is stationary.
Proof. The proof that $F^2_{k,X_1,X_2}(n_i)$ is stationary is identical to the proof for DFA in [3], using Lemma 1.2. No modifications are necessary for polynomial detrending.
Proposition 1.5. For independent fractional Gaussian noises $X^G_i$, the vector of DCCA coefficients converges in distribution, as $T \to \infty$, to a multivariate Gaussian whose covariance matrix does not depend on $n_i$ or $T$.
Proof. The vector $(z^j_1, \ldots, z^j_n, w^j_1, \ldots, w^j_n)$ is a stationary Gaussian vector in $j$. We may then use conditions on functions of a Gaussian vector sufficient for a central limit theorem ([5], Theorem 2). It is possible to show that in the large-$n$ limit the elements of the covariance matrix required by these conditions, which is proportional to $(I - P)\Sigma_{1,j}(I - P)$, decay faster than $1/j^2$, and thus the vector satisfies the conditions (if the decay were slower, the covariance of the DFA coefficients $\operatorname{cov}(F^2_{1,X^G}, F^2_{j,X^G})$ would decay more slowly than $j^{-4}$, which is impossible by an adaptation of Proposition 1.1). The generalization to the multivariate case is straightforward by way of the Cramér–Wold device [6].

Proof. This follows from the delta method [7], leveraging the central limit theorems for DFA [3] and DCCA (our Proposition 1.4).
Proof. See [8].
Proposition 1.7. Assume that the Hölder exponents of $x_1$ and $x_2$ are greater than or equal to the exponents of fractional Brownian motion. Then, as $n \to \infty$ with $T/n \to \infty$, the DCCA coefficients of $X_1$ and $X_2$ converge in distribution to those of fractional Gaussian noise with the same Hurst exponent as the $X_i$.
Proof. Our proof consists of three steps. First, we show, using Theorem 1.6, that the DCCA coefficients of the processes obtained by subsampling $x_1$ and $x_2$ tend to those of fractional Gaussian noise in the subsampling limit. Second, we show that, for a given ratio of subsampling to window size, the DCCA coefficients of the unsubsampled and subsampled versions tend to each other. Third, we show that together these imply that, in the limit of window size, the DCCA coefficients tend in law to the coefficients of fractional Gaussian noise.
Part I: We define the subsampled processes as $L_j(n)^{-1/2} X_j(nt)$. By Theorem 1.6, the DCCA coefficients of the subsampled process tend to those of fractional Gaussian noise at large window sizes.
Part II: Let $x_j = (x_j(1), \ldots, x_j(n))$, let $f^j_n$ be the least-squares polynomial fit of degree $d$ to $x_j$, and let $f^j_{n/k}$ be the corresponding fit to the subsampled series, but with terms repeated:
\[ f^j_{n/k} = (\underbrace{f^j_{n/k}(k), \ldots, f^j_{n/k}(k)}_{\times k},\, f^j_{n/k}(2k), \ldots, f^j_{n/k}(2k),\, \ldots,\, f^j_{n/k}(n)). \]
We then apply the Cauchy–Schwarz inequality. It is possible to show, using Riemann sums, that if $f^*$ is the polynomial of degree $d$ which minimizes the generative mean squared error to $x(t)$, i.e. $f^* = \operatorname{argmin}_f \int_0^1 (f(nt) - x(nt))^2\,dt$, then, since least squares is a convex and differentiable optimization problem, $f_n$ and $f^*$ are close to one another, and so are $f_{n/k}$ and $f^*$. Therefore we can obtain bounds for $f_n - f_{n/k}$. To obtain rates bounding the right-hand expressions we need the parameter of strong convexity of the mean squared error $\int_0^1 (f(nt) - x(nt))^2\,dt$. For linear polynomials, writing $f(t) = at + b$, the Hessian of the mean squared error can be computed explicitly; its smallest eigenvalue $\lambda_1$ is a constant greater than zero. Similarly, for polynomials of degree $d$ we have a non-degenerate, symmetric Hessian whose smallest eigenvalue is likewise greater than zero. Therefore ([9], pages 63-65) we can bound the difference of the two mean squared errors on the interval $[0, n]$, with probability approaching one at a rate depending on $n$ and $H$. The transition from the second to the third line uses Equation (16) and Equation (15); the transition to the last line uses the assumption of the proposition. A similar bound holds, with probability approaching one at a rate depending on $n/k$ and $H$, for the subsampled fit. Moreover, $x_1$ and $x_2$ are almost surely Hölder continuous with exponent $H - \varphi$ for any $\varphi > 0$ [10], using the assumption of the proposition; thus a suitable constant $E$ exists. Combining these bounds shows that, with probability approaching one, the subsampled and unsubsampled fits are close. Now let $0 < \delta < 1$ and $k = n^{\delta}$.
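The strong-convexity step invoked above ([9]) is the standard bound: if the mean squared error $g$ is $m$-strongly convex in the fit coefficients with minimizer $f^*$, then closeness in objective value forces closeness of the fits themselves. In our notation (a standard inequality, not a result specific to this paper):

```latex
g(f) - g(f^*) \;\ge\; \frac{m}{2}\,\lVert f - f^* \rVert^2
\quad\Longrightarrow\quad
\lVert f_n - f^* \rVert^2 \;\le\; \frac{2}{m}\bigl(g(f_n) - g(f^*)\bigr),
\qquad m = \lambda_1 > 0,
```

where $\lambda_1$ is the smallest eigenvalue of the Hessian of $g$; applying the same bound to $f_{n/k}$ and the triangle inequality then controls $\lVert f_n - f_{n/k} \rVert$.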
Since $F^2_{j,X_1,X_2}$ is a stationary sequence (Lemma 1.3), the bound of Part II holds for each window with probability approaching one. Applying Boole's inequality with $\epsilon' = \epsilon/[T/n]$, the bound holds simultaneously across all $[T/n]$ windows with probability at least $1 - \epsilon$. Thus, provided the rate condition of the proposition holds, the convergence in probability follows. Part III: To show convergence in distribution we use the Portmanteau theorem [11]. Let $f$ be a bounded continuous function and define $S_{X_1,X_2}(n) = n^{-(H+G)} F^2_{X_1,X_2}(n)$, with $S_{X^G_1,X^G_2}(n)$ defined analogously. The first and last lines converge by virtue of Part II (convergence in probability implies convergence in distribution), and the second line converges by virtue of Part I.
where $X^G_i$ is a fractional Gaussian noise with the same Hurst exponent as $X_i$.
Proof. The proof is identical to the proof for the DCCA coefficients.

Proof. By Proposition 1.7, to apply the delta method we need that $F^2_{X_1,X_2}(n)$ obeys the central limit theorem. Equation (21) implies the required rate condition, which holds if
\[ T = o\!\left(n^{H+G-\delta/2+1/2}\, L_1(n)\, L_2(n)\right). \quad (25) \]

Covariance Calculation
We now give an explicit form for the covariance matrix of Theorem 1.9. We abbreviate $Q_d$ to $Q$; superscripts, as in $Q^n$, denote the window size.

Test 2: Cross-Correlation between Power-Law Autocorrelated Processes
The second test is a test for cross-correlation, which is not necessarily power-law cross-correlation. Its novelty lies in the fact that it is valid for general power-law auto-correlated time series, potentially contaminated with non-stationary trends, and it maximizes the test power among coefficients derived from our theory. We find a vector of positive weights $w \in \mathbb{R}^r$, summing to one, so that $w^{\top}\rho_{\mathrm{DCCA}}(n)$ has maximum variance:
\[ w = \operatorname{argmax}_{u \in \mathbb{R}^r :\; u_i \ge 0,\ \sum_{i=1}^r u_i = 1} u^{\top}\Sigma u. \]
The choice of $\Sigma$ ensures that $w^{\top}\Sigma(H,G)\,w < w^{\top}\Sigma w$. We proceed as for Test 1 (in the main paper), finding $a$ such that in the large $N, n_i$ limit: $\Pr(w^{\top}\rho_{\mathrm{DCCA}}(n) < a) < \alpha$.

Software
rho_DCCA.m calculates $\rho_{\mathrm{DCCA}}$ correlation coefficients over scales for arbitrary-degree detrending. test_scaling_region calculates p-values for the null hypothesis of no power-law correlation. test_minimum_variance calculates p-values for the correlation coefficient maximizing test power over a specified range of scales.
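Assuming the intended program is the maximization of the quadratic form $u^{\top}\Sigma u$ over the probability simplex (the constraint set stated for Test 2), a useful observation is that for positive semi-definite $\Sigma$ the objective is convex, so its maximum over the compact convex simplex is attained at a vertex $e_i$, where the objective equals $\Sigma_{ii}$. A minimal sketch (our own illustration, independent of the paper's MATLAB code; the function name is ours):

```python
import numpy as np

def max_variance_weights(Sigma):
    """Maximize u^T Sigma u over the simplex {u : u_i >= 0, sum(u) = 1}.
    For positive semi-definite Sigma the objective is convex, so the
    maximum over the simplex is attained at a vertex e_i, where the
    objective value is Sigma[i, i]; hence the optimum puts all weight
    on the coordinate (scale) with the largest variance."""
    Sigma = np.asarray(Sigma, dtype=float)
    i = int(np.argmax(np.diag(Sigma)))
    w = np.zeros(Sigma.shape[0])
    w[i] = 1.0
    return w
```

In practice one would then compare $w^{\top}\Sigma(H,G)\,w$ against $w^{\top}\Sigma w$ and calibrate the threshold $a$ as in the test above.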