Exact results of the limited penetrable horizontal visibility graph associated to random time series and its application

The limited penetrable horizontal visibility algorithm is an analysis tool that maps time series into complex networks and is a further development of the horizontal visibility algorithm. This paper presents exact results on the topological properties of the limited penetrable horizontal visibility graph associated with independent and identically distributed (i.i.d.) random series. We show that an i.i.d. random series maps onto a limited penetrable horizontal visibility graph with an exponential degree distribution, independent of the probability distribution from which the series was generated. We deduce exact expressions for the mean degree and the clustering coefficient, demonstrate the long-distance visibility property of the graph, and perform numerical simulations to test the accuracy of our theoretical results. We then apply the algorithm to several deterministic chaotic series (the logistic map, the Hénon map, the Lorenz system, and an energy price chaotic system) and to a real crude oil price series. Our results show that the limited penetrable horizontal visibility algorithm discriminates chaos from uncorrelated randomness efficiently and can measure the global evolution characteristics of real time series.

Currently there are four ways of converting univariate time series into complex networks. The first is the pseudo-periodic time series transition method [8], which analyzes pseudo-periodic time series. The second is the visibility graph (VG) method, first proposed by Lacasa et al. [9][10]. To facilitate computation, Luque et al. [11][12] proposed a simplified horizontal visibility algorithm (HVG) based on the visibility algorithm. Bezsudnov et al. [13] proposed a parameter visibility method. Gao et al. [14] proposed a limited penetrable visibility method (LPVG) and a multiscale limited penetrable horizontal visibility graph (MLPHVG). The third is the phase space reconstruction method [15][16]. It begins with a phase space reconstruction of the time series, maps fixed-length time series segments into nodes of a network, and then uses the correlation coefficients (or distances) between these nodes to determine whether they are connected. The last is the coarse-graining method [17][18][19][20], by which fluctuations of a time series are transformed into signal sequences. Each fixed-length signal sequence is treated as a network node, nodes are connected in chronological order, and a weighted directed complex network is thus constructed. In recent years, researchers have also used complex network theories to study multivariate time series [21][22][23][24]. These four methods all preserve most of the properties of different types of time series, and they have been successfully used in many different fields [25][26][27][28][29][30].
We deduce the exact mean degree and the clustering coefficient, and we prove that the limited penetrable horizontal visibility graph associated with any i.i.d. random series has a small-world characteristic. To test the theoretical results, we run simulations on several deterministic chaotic series (the logistic map, the Hénon map, the Lorenz system, and the energy price chaos system) and on a real-world crude oil price series; these confirm the accuracy and usability of our exact results.

II. RESULTS
We here supply several exact results for the LPHVG associated with random time series and apply them to several deterministic chaotic series (the logistic map, the Hénon map, the Lorenz system, and the energy price chaos system) and to a real-world crude oil price series.

Degree distribution. Let X(t) be a real-valued bi-infinite time series of independent and identically distributed (i.i.d.) random variables with probability density f(x), x ∈ [a, b], and consider its associated LPHVG with the limited penetrable distance ρ = 1. Then

$P(k) = \frac{1}{5}\left(\frac{4}{5}\right)^{k-4} = \frac{1}{5}\exp\left[-(k-4)\ln\frac{5}{4}\right], \quad k = 4, 5, 6, \ldots$ (1)

To prove this conclusion we first calculate the probability that an arbitrary datum with value x_0 has limited penetrable visibility (with at most one penetration) of exactly k other data. We list all sets of possible configurations for a datum x_0, starting from k = 4 (see the Appendix, up to Eq. (S10)). We then deduce the rules that determine when a given configuration contributes to P(k) (see rules i-iv) and obtain a general expression for P(k) (see Eq. (S12)). The detailed proof of this result is given in Appendix Theorem S1. This is an exact result for a limited penetrable horizontal visibility graph with the limited penetrable distance ρ = 1. We conclude that for every probability distribution f(x), the degree distribution P(k) of the associated LPHVG has the same exponential form.

From this result we can obtain the more general result (Theorem S2 in the Appendix). Let X(t) be a real bi-infinite time series of i.i.d. random variables with probability density f(x), x ∈ [a, b], and consider its associated LPHVG with the limited penetrable distance ρ. Then

$P(k) = \frac{1}{2\rho+3}\exp\left\{-\left[k-2(\rho+1)\right]\ln\frac{2\rho+3}{2\rho+2}\right\}, \quad \rho = 0, 1, 2, \ldots, \quad k = 2(\rho+1), 2(\rho+1)+1, \ldots$ (2)

Note that when ρ = 0, P(k) ∼ exp[−(k − 2)ln(3/2)], which is the result in Ref. [11]. In fact, when ρ = 0 the LPHVG becomes the HVG (see the Methods section). When ρ = 1, the result is Eq. (1). Therefore Eq. (2) is an extension of the previous result [11], and it indicates that the degree distribution P(k) of the LPHVG associated with i.i.d. random time series has a unified exponential form.
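The degree-distribution statement can be explored numerically. The following is a minimal sketch, not the authors' code: it assumes that two points are linked when at most ρ intermediate values reach or exceed the lower of the two heights, and it assumes the normalised exponential form P(k) = q^(k − 2(ρ+1)) / (2ρ + 3) with q = (2ρ + 2)/(2ρ + 3). The function names `lphvg_edges`, `degree_distribution` and `theoretical_p` are illustrative.

```python
import heapq
import random
from collections import Counter


def lphvg_edges(x, rho):
    """Return the LPHVG edge list of series x for penetrable distance rho.

    Assumed rule: i and j are linked when at most rho intermediate
    values reach or exceed min(x[i], x[j]).
    """
    edges = []
    n = len(x)
    for i in range(n - 1):
        top = []  # min-heap holding the rho+1 largest intermediates so far
        for j in range(i + 1, n):
            # top[0] is the (rho+1)-th largest intermediate; if it lies below
            # min(x[i], x[j]) then at most rho intermediates block the view.
            if len(top) <= rho or top[0] < min(x[i], x[j]):
                edges.append((i, j))
            heapq.heappush(top, x[j])
            if len(top) > rho + 1:
                heapq.heappop(top)
            # rho+1 intermediates at or above x[i]: nothing further is visible.
            if len(top) == rho + 1 and top[0] >= x[i]:
                break
    return edges


def degree_distribution(x, rho):
    """Empirical P(k) as a dict {k: fraction of nodes with degree k}."""
    deg = Counter()
    for i, j in lphvg_edges(x, rho):
        deg[i] += 1
        deg[j] += 1
    counts = Counter(deg[i] for i in range(len(x)))
    return {k: c / len(x) for k, c in sorted(counts.items())}


def theoretical_p(k, rho):
    """Assumed normalised exponential form, valid for k >= 2(rho+1)."""
    q = (2 * rho + 2) / (2 * rho + 3)
    return q ** (k - 2 * (rho + 1)) / (2 * rho + 3)


if __name__ == "__main__":
    rng = random.Random(0)
    series = [rng.random() for _ in range(3000)]
    for rho in (0, 1, 2):
        p = degree_distribution(series, rho)
        for k in range(2 * (rho + 1), 2 * (rho + 1) + 4):
            print(rho, k, round(p.get(k, 0.0), 4), round(theoretical_p(k, rho), 4))
```

Replacing `rng.random()` with draws from another continuous distribution is a simple way to probe the claim that the result does not depend on f(x).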
To further check the accuracy of our analytical results, we perform several numerical simulations. We generate random series of 3000 data points from uniform, Gaussian, and power-law distributions and construct their associated limited penetrable horizontal visibility graphs. Figs. 1(a) and 1(b) show the degree distributions of the resulting graphs with penetrable distances ρ = 1 and ρ = 2. Circles indicate a series extracted from a uniform distribution, and squares and diamonds indicate series extracted from Gaussian and power-law distributions, respectively. The solid line indicates the theoretical result of Eq. (2). We find that the theoretical results agree with the numerics. Note that our theoretical results assume an infinitely long time series, i.e., series size N → ∞, so we attribute the deviation of the tail of the degree distribution of the LPHVG associated with i.i.d. random series from the theoretical result to finite-size effects alone. To check the effect of the finite size, we define the relative error E(k) and the mean relative error ME to measure the agreement between the finite-size numerical result and the theoretical result, and we use a cutoff value k_0 to denote the onset of finite-size effects.
Here P_num(k) and P_the(k) denote the degree distributions of the numerical result and the theoretical result, respectively. We generate random series from the uniform distribution with different series sizes N, with 10 realizations for each series size.

Mean degree. Using Eq. (2) we calculate the mean degree <k> of the LPHVG associated with an uncorrelated random series,

$\langle k\rangle = \sum_{k} kP(k) = \sum_{k=2(\rho+1)}^{\infty}\frac{k}{2\rho+3}\left(\frac{2\rho+2}{2\rho+3}\right)^{k-2(\rho+1)} = 4(\rho+1)$

We next deduce the more general expression of the mean degree <k(T)>. We consider an infinite periodic series of period T (with no repeated values in a period) denoted X_t = {..., x_0, x_1, x_2, ..., x_T, x_1, x_2, ...}, where x_0 = x_T. Let ρ ≪ T and consider the subseries X̃_t = {x_0, x_1, x_2, ..., x_T}. Without loss of generality, we assume that x_0 = x_T is the largest value of the subseries and that x_1, ..., x_ρ, x_{T−ρ}, ..., x_{T−1} are the next largest values, down to the (2ρ + 1)th largest. We then construct the LPHVG associated with the subseries X̃_t. Suppose the LPHVG has E links and let x_i be the smallest datum of X̃_t. Because no data repetitions are allowed in X̃_t, the degree of x_i is 2(ρ + 1) by construction (see Fig. S1 for the case ρ = 1). We delete node x_i and its 2(ρ + 1) links from the LPHVG. The resulting graph has E − 2(ρ + 1) links and T nodes. We iterate this process T − (2ρ + 1) times (see Fig. 2 for a graphical illustration of this process in the case ρ = 1, T = 10), so the total number of deleted links is 2(ρ + 1)[T − (2ρ + 1)]. The resulting graph has 2(ρ + 1) nodes, i.e., x_0, x_1, ..., x_ρ, x_{T−ρ}, ..., x_{T−1}, x_T; see Fig. 2(h) for ρ = 1 and T = 10. Because these 2(ρ + 1) nodes are connected by $E_r = \binom{2(\rho+1)}{2} = (\rho+1)(2\rho+1)$ links, the mean degree of a limited penetrable horizontal visibility graph associated with X_t is

$\langle k(T)\rangle = \frac{2E}{T} = \frac{2\left\{2(\rho+1)\left[T-(2\rho+1)\right]+E_r\right\}}{T} = 4(\rho+1)\left(1-\frac{2\rho+1}{2T}\right)$ (5)

Note that Eq. (5) holds for every periodic series and for every aperiodic series (T → ∞), independent of the deterministic process that generates the series. This is the case because the only constraint in its derivation is that data within a period are not repeated.
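As a short consistency check on the mean degree of the random-series graph, the geometric sum can be written out step by step. This is a sketch under the assumption that the exponential degree distribution is the normalised geometric form P(k) = (1 − q) q^{k − 2(ρ+1)} with q = (2ρ + 2)/(2ρ + 3); the symbols q and m are introduced here only for the calculation.

```latex
\begin{aligned}
\langle k\rangle
  &= \sum_{k=2(\rho+1)}^{\infty} k\,(1-q)\,q^{\,k-2(\rho+1)},
     \qquad q=\frac{2\rho+2}{2\rho+3},\quad 1-q=\frac{1}{2\rho+3},\\
  &= 2(\rho+1) + (1-q)\sum_{m=0}^{\infty} m\,q^{m}
   = 2(\rho+1) + \frac{q}{1-q}\\
  &= 2(\rho+1) + (2\rho+2) = 4(\rho+1).
\end{aligned}
```

For ρ = 0 this gives the value 4 known for the horizontal visibility graph, and for ρ = 1 it gives 8, which is also the T → ∞ limit stated for aperiodic series.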
Using Eqs. (2), (6), and (7) we also obtain the local clustering coefficient distributions P(C_min) and P(C_max) [Eqs. (8) and (9)]. For a proof of this result see Theorem S3 in the Appendix. In the accompanying plot, one curve shows the theoretical prediction of P(C_min) [Eq. (8)], and the solid red line shows the theoretical prediction of P(C_max) [Eq. (9)].
Long distance visibility. In a limited penetrable horizontal visibility graph associated with a bi-infinite sequence of i.i.d. random variables extracted from a continuous probability density f(x), the probability P_ρ(n) that two data points separated by n intermediate data points are connected is

$P_\rho(n) = \frac{2\rho(\rho+1)+2}{n(n+1)}$ (10)

Note that P_ρ(n) is again independent of the probability distribution of the random variable X. For a detailed proof of this result see Theorem S4 in the Appendix. Fig. 5(a) shows the adjacency matrix A of the limited penetrable horizontal visibility graph associated with a random series for different limited penetrable distances. When A(i, j) = 1, we plot a marker for ρ = 0 (circle), ρ = 1 (triangle), ρ = 2 (square), or ρ = 3 (diamond) at (i, ρ, j) and (j, ρ, i). Fig. 5(a) shows a typical homogeneous structure in which the adjacency matrix is filled around the main diagonal. In addition, the matrix shows a superposed sparse structure caused by the limited penetrable visibility probability P_ρ(n) = [2ρ(ρ + 1) + 2]/[n(n + 1)], which introduces shortcuts into the limited penetrable horizontal visibility graph. These shortcuts indicate that the limited penetrable horizontal visibility graph exhibits a small-world phenomenon. Fig. 5(b) shows that the theoretical result in Eq. (10) agrees with the numerics.
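For small separations the connection probability can be checked exactly by enumeration, because for i.i.d. continuous data every rank ordering of the values involved is equally likely. The sketch below makes two assumptions that are not stated in the text: n is read as the index separation d = j − i (the reading under which the ρ = 0 formula gives probability 1 for adjacent points), and a link requires at most ρ intermediate values above the lower endpoint. The function names are illustrative.

```python
from fractions import Fraction
from itertools import permutations


def connection_probability(d, rho):
    """Exact link probability for two points at index separation d.

    Enumerates all rank orderings of the d+1 values (two endpoints and
    d-1 intermediates) and counts those with at most rho blockers.
    """
    hits = 0
    total = 0
    for ranks in permutations(range(d + 1)):
        floor = min(ranks[0], ranks[-1])
        blockers = sum(1 for v in ranks[1:-1] if v > floor)
        hits += blockers <= rho
        total += 1
    return Fraction(hits, total)


def closed_form(d, rho):
    """Expression quoted in the text, with n read as the separation d."""
    return Fraction(2 * rho * (rho + 1) + 2, d * (d + 1))


if __name__ == "__main__":
    for rho in (0, 1):
        for d in range(rho + 1, 7):
            print(rho, d, connection_probability(d, rho), closed_form(d, rho))
```

The comparison is made here for ρ = 0 and ρ = 1 with separations d ≥ ρ + 1; other values of ρ can be examined by calling the same two functions.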
These results are exact with regard to the topological properties of the limited penetrable horizontal visibility graphs associated with i.i.d. random series via the limited penetrable horizontal visibility algorithm.
Application to deterministic chaotic time series. These results can be used to discriminate between random and chaotic signals. Because stochastic and chaotic processes share many features, discriminating between them is difficult, and methods of identifying random processes and of discriminating between deterministic chaotic systems and stochastic processes have received extensive study in recent decades [31][32][33][34]. Most previous algorithms have been phenomenological and computationally complicated. Thus new methods that can reliably distinguish stochastic from chaotic time series are needed. Recently Lacasa et al. [11][12] used the horizontal visibility algorithm to characterize and distinguish between stochastic and chaotic processes, and they demonstrated that it could easily distinguish chaotic from random series. Here we use our new theory to distinguish chaotic series from random series and compare it with the horizontal visibility algorithm [11]. We address four deterministic time series, generated by the logistic map [35], x_{t+1} = μ x_t (1 − x_t) with μ = 4, the Hénon map [36], the Lorenz chaotic system [37], and the energy price-supply-economic growth system [27].

Fig. 6 shows results for the limited penetrable horizontal visibility graphs of 3000 data points extracted from the two chaotic maps and the two chaotic systems, with ρ = 0 on the left, ρ = 1 in the middle, and ρ = 2 on the right. We calculate their degree distributions numerically (top panel) and the relationship between degree and clustering coefficient (bottom panel). In every case P(k) deviates from Eq. (2) and C(k) deviates from Eqs. (6) and (7). We also find that the degree distributions of the LPHVGs associated with these chaotic maps and chaotic systems can be approximated by an exponential function in each case, and we conjecture that there is a functional relationship between the random and chaos dimensions [11]. Thus the parameter λ = ln[(2ρ + 3)/(2ρ + 2)] is the frontier between random series and chaotic series and serves to distinguish randomness from chaos.
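The comparison described above can be sketched for the logistic map, the only one of the four systems whose equation is given in the text. This is an illustrative sketch, not the authors' procedure: the initial condition `x0 = 0.3`, the `burn_in` length, the unweighted least-squares fit in `tail_slope`, and the visibility rule (at most ρ intermediates reaching the lower endpoint) are all assumptions made here.

```python
import heapq
import math
from collections import Counter


def logistic_series(n, x0=0.3, mu=4.0, burn_in=1000):
    """Iterate x -> mu*x*(1-x); x0 and burn_in are illustrative choices."""
    x = x0
    out = []
    for step in range(n + burn_in):
        x = mu * x * (1.0 - x)
        if step >= burn_in:
            out.append(x)
    return out


def lphvg_degrees(x, rho):
    """Node degrees of the LPHVG (link if at most rho intermediates block)."""
    deg = [0] * len(x)
    for i in range(len(x) - 1):
        top = []  # min-heap of the rho+1 largest intermediates
        for j in range(i + 1, len(x)):
            if len(top) <= rho or top[0] < min(x[i], x[j]):
                deg[i] += 1
                deg[j] += 1
            heapq.heappush(top, x[j])
            if len(top) > rho + 1:
                heapq.heappop(top)
            if len(top) == rho + 1 and top[0] >= x[i]:
                break
    return deg


def tail_slope(degrees, k_min):
    """Unweighted least-squares slope of ln P(k) against k for k >= k_min."""
    counts = Counter(d for d in degrees if d >= k_min)
    pts = [(k, math.log(c / len(degrees))) for k, c in counts.items()]
    if len(pts) < 2:
        raise ValueError("need at least two distinct degrees to fit a slope")
    mean_k = sum(k for k, _ in pts) / len(pts)
    mean_y = sum(y for _, y in pts) / len(pts)
    num = sum((k - mean_k) * (y - mean_y) for k, y in pts)
    den = sum((k - mean_k) ** 2 for k, _ in pts)
    return num / den


def random_lambda(rho):
    """Decay rate quoted in the text for uncorrelated random series."""
    return math.log((2 * rho + 3) / (2 * rho + 2))


if __name__ == "__main__":
    series = logistic_series(3000)
    for rho in (0, 1, 2):
        slope = tail_slope(lphvg_degrees(series, rho), 2 * (rho + 1))
        print(rho, round(-slope, 4), round(random_lambda(rho), 4))
```

A fitted decay rate that differs clearly from `random_lambda(rho)` is the kind of deviation the text uses to separate chaotic from uncorrelated random series; a weighted fit or a restricted degree range may be preferable when the tail is noisy.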
Application to real crude oil future price series. As a further example, we use data from the U.S. Energy Information Administration on the crude oil future contract 1 (dollars per barrel) from 4 April 1983 to 28 March 1985, and find that they exhibit chaotic behavior and long-range correlations [38][39]. We select 500 sample data points and demonstrate that our method can distinguish chaotic series from random series even when the data sample is small (although the theoretical results assume infinite data). Fig. 7 shows the results for the horizontal visibility graph [Fig. 7(a)] and the limited penetrable horizontal visibility graph [Fig. 7(b)] of 500 data points extracted from crude oil futures. We find that both degree distributions deviate from Eq. (2), which means the crude oil future price sequence is not random but chaotic. Comparing the results of HVG (ρ = 0 in LPHVG) and LPHVG, we find that LPHVG works better than HVG here, because the selected crude oil price series is short and the HVG has fewer links. An advantage in this case is that we can choose a suitable parameter ρ when constructing the LPHVG.

III. DISCUSSION
We have introduced a limited penetrable horizontal visibility algorithm, a generalization of the horizontal visibility algorithm [11][12], which it recovers when the limited penetrable distance is ρ = 0. We obtain exact results on several properties of the limited penetrable horizontal visibility graph associated with a general uncorrelated random series, the reliability of which has been confirmed by numerical simulations. In particular, the degree distribution of the graph has the exponential form P(k) ∼ exp[−λ(k − 2ρ − 2)] with λ = ln[(2ρ + 3)/(2ρ + 2)]. The mean degree expression [Eq. (5)] holds for every periodic series and for every aperiodic series (T → ∞), independent of the deterministic process that generates them. The clustering coefficient C is related to the degree k through C_min(k) and C_max(k) [Eqs. (6) and (7)]. The long-distance visibility probability P_ρ(n) [Eq. (10)] introduces shortcuts into the limited penetrable horizontal visibility graph, which therefore exhibits a small-world phenomenon. Because these results are independent of the distribution from which the series was generated, we conclude that all uncorrelated random series have the same limited penetrable horizontal visibility graph properties and, in particular, the same degree distribution, mean degree, clustering coefficient distribution, and small-world characteristics. This algorithm can thus be used as a simple test for discriminating uncorrelated randomness from chaos. We show that the method can distinguish between random series, which follow the theoretical predictions, and chaotic series, which deviate from them. In addition, we employ the method to measure the global evolution characteristics of time series by using LPHVG, and the empirical results confirm its validity.

The exact results presented here are an extension of previous work [11]. The limited penetrable parameter ρ can be adjusted to the data at hand in order to distinguish chaos from uncorrelated randomness. The method can serve as a preliminary test for locating deterministic fingerprints in time series. If we determine that P(k) has an exponential tail that deviates from Eq. (2), or that C(k) deviates from Eqs. (6) and (7), we can then apply embedding methods to the series. Topics of further research could include whether this algorithm is also able to quantify chaos, its relationship to such standard chaos indicators as Lyapunov exponents and the correlation dimension, how to tune the limited penetrable parameter ρ, how to use the limited penetrable horizontal visibility graph to handle two-dimensional manifolds, the topological properties of the visibility graphs (VG) and limited penetrable visibility graphs (LPVG), and expanded applications of LPHVG.

IV. METHODS
Limited Penetrable Horizontal Visibility Graph (LPHVG). The limited penetrable visibility graph (LPVG) [30] and the multiscale limited penetrable horizontal visibility graph (MLPHVG) [14] are recent extensions of the VG [9] and HVG [11][12] used to analyze nonlinear time series. The limited penetrable horizontal visibility graph (LPHVG) is a geometrically simpler and analytically solvable version of LPVG [30] and MLPHVG [14]. To define it, let {x_i}, i = 1, 2, ..., N, be a time series of N real numbers. If we set the limited penetrable distance to ρ, LPHVG maps the time series into a graph with N nodes and an adjacency matrix A. Nodes i and j are connected through an undirected edge (A_ij = A_ji = 1) when x_i and x_j have limited penetrable horizontal visibility (see Fig. 9), i.e., if at most ρ intermediate data x_n (i < n < j) reach or exceed the lower of the two heights, min(x_i, x_j), and thereby block the horizontal line between them. This mapping is a limited penetrable horizontal visibility graph (LPHVG). When we set the limited penetrable distance ρ = 0, LPHVG degenerates into HVG [11]. When ρ ≠ 0, there are more connections between nodes in LPHVG than in HVG.

Measurement of the Global Evolution Characteristics of Time Series using LPHVG. Let X = {x(t)}, t = 1, 2, ..., N, be a time series. To characterize the evolution of the time series using LPHVG, we divide the whole time series into equal small-scale segments using a sliding window of length L, and we denote by l the step length between successive windows. To ensure that the small-scale segments of the time series are continuous, we require that l < L. This gives T = [(N − L)/l + 1] small-scale time windows, where [...] is the rounding function. For every small-scale time window t, we transform the segment into LPHVG(t) using the limited penetrable horizontal visibility algorithm. The topological structure of the LPHVG changes with time t. To describe this process from a global perspective, we use the Euclidean distance to measure the relationship between LPHVGs. We define the Euclidean distance d_{tm,tn} between LPHVG(t_m) and LPHVG(t_n), collect these distances into the distance matrix D_{T×T}, and assign the threshold value θ = min{d^rand_{tm,tn}}, t_m ≠ t_n, d^rand_{tm,tn} ∈ D^rand_{T×T}.
Here D^rand_{T×T} is the distance matrix associated with an i.i.d. random time series. Using the threshold θ, we define the correlation index γ. Here γ_{tm,tn} is the correlation degree of the LPHVGs at times t_m and t_n, and it can be visualized using a recursive graph ℜ(t_m, t_n) constructed from γ_{tm,tn} with the Heaviside function Θ(x). We use this construction to plot the relationship between LPHVGs in two-dimensional coordinates in which both the abscissa and the ordinate are the time t. In the recursive graph, when the Euclidean distance between LPHVG(t_m) and LPHVG(t_n) is sufficiently small, i.e., when ℜ(t_m, t_n) = 1, we plot a red dot at (t_m, t_n) and (t_n, t_m). Note that on the main diagonal, i.e., at (t_m, t_m) and (t_n, t_n), the red dots are always present. We can use the resulting plot to characterize the global dynamic changes in correlation.
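The windowing and thresholding steps can be sketched as follows. Two points are assumptions made for this sketch rather than statements from the text: the distance between two graphs is taken to be the Euclidean (Frobenius) distance between their adjacency matrices, and the rounding function is taken to be the floor. All function names are illustrative.

```python
import heapq
import math
import random


def lphvg_edge_set(x, rho):
    """Edge set of the LPHVG (link if at most rho intermediates block)."""
    edges = set()
    for i in range(len(x) - 1):
        top = []  # min-heap of the rho+1 largest intermediates
        for j in range(i + 1, len(x)):
            if len(top) <= rho or top[0] < min(x[i], x[j]):
                edges.add((i, j))
            heapq.heappush(top, x[j])
            if len(top) > rho + 1:
                heapq.heappop(top)
            if len(top) == rho + 1 and top[0] >= x[i]:
                break
    return edges


def sliding_windows(x, L, l):
    """Segments of length L taken every l samples: (N - L)//l + 1 of them."""
    return [x[s:s + L] for s in range(0, len(x) - L + 1, l)]


def graph_distance(e1, e2):
    """Assumed distance: Frobenius norm of the adjacency-matrix difference.

    Each undirected edge present in only one graph changes two matrix
    entries by one, hence the factor 2.
    """
    return math.sqrt(2 * len(e1 ^ e2))


def distance_matrix(x, L, l, rho):
    """Pairwise distances between the LPHVGs of all sliding windows."""
    graphs = [lphvg_edge_set(w, rho) for w in sliding_windows(x, L, l)]
    return [[graph_distance(a, b) for b in graphs] for a in graphs]


def random_threshold(N, L, l, rho, seed=0):
    """Threshold theta: smallest off-diagonal distance for an i.i.d. series."""
    rng = random.Random(seed)
    d = distance_matrix([rng.random() for _ in range(N)], L, l, rho)
    size = len(d)
    return min(d[m][n] for m in range(size) for n in range(size) if m != n)


if __name__ == "__main__":
    print(random_threshold(N=500, L=50, l=10, rho=1))
```

Window pairs whose distance falls below the threshold would then be the candidates for dots in the recursive graph; how the correlation index γ enters that comparison is not reproduced here.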
Proof: Using a method similar to that presented in Refs. [10,11], we select a generic datum x_0 to be the seed. We calculate the probability that an arbitrary datum with value x_0 has limited penetrable visibility of exactly k other data. From the definition of LPHVG, when x_0 has penetrable visibility of k data there are at least two penetrable data and two bounding data, i.e., one penetrable and one bounding datum on the right-hand side of x_0 and one of each on the left-hand side, such that the k − 4 remaining visible and penetrable visible data are located between the two bounding data. Note that k = 4 is the minimum possible degree (see Fig. S1). To derive the degree distribution of the associated LPHVG, we first compute some easy terms.

Fig. S1 shows the simplest case, P(k = 4), in which there are two penetrable data (x_{−1}, x_1) and two bounding data (x_{−2}, x_2). To ensure that k = 4, both the penetrable and the bounding data must be at least as high as x_0, i.e., x_{−1} ≥ x_0, x_1 ≥ x_0 and x_{−2} ≥ x_0, x_2 ≥ x_0. Then

$P(k=4) = \mathrm{Prob}(x_{-2}, x_{-1}, x_{1}, x_{2} \ge x_0) = \int_a^b f(x_0)\,dx_0\left[\int_{x_0}^{b} f(x)\,dx\right]^4$ (S1)

To simplify Eq. (S1), we define the cumulative probability distribution function F(x) of any probability density f(x) to be

$F(x) = \int_a^x f(x')\,dx'$ (S2)

where dF(x)/dx = f(x), F(a) = 0, and F(b) = 1. Without loss of generality, we assume a = 0 and b = 1, i.e., F(0) = 0 and F(1) = 1. The following relation between f and F holds:

$f(x)F^{n-1}(x) = \frac{1}{n}\frac{dF^{n}(x)}{dx}$ (S3)

Using Eqs. (S2) and (S3), we rewrite Eq. (S1) as

$P(k=4) = \int_0^1 f(x_0)\left[1-F(x_0)\right]^4 dx_0 = \frac{1}{5}$

For P(k = 5) (see Fig. S2), four configurations emerge. Case 1: C_0^1, in which x_0 has penetrable variables x_{−1} and x_1, bounding variables x_{−2} and x_3, and a right-hand side inner variable x_2. Case 2: C_0^2, in which x_0 has penetrable variables x_{−1} and x_2, bounding variables x_{−2} and x_3, and a right-hand side inner variable x_1. Cases 3 and 4, C_1^1 and C_1^2, are the corresponding configurations with a left-hand side inner variable. Note that an arbitrary number of hidden variables n_j^1, n_j^2, m_j^1, m_j^2 may be located between the inner data and the bounding variables (or between the penetrable data and the bounding data), and this must be taken into account in the probability calculation. The geometrical restrictions for the hidden variables are n_j^1 < x_2 and n_j^2 < x_1, j = 1, 2, ..., r, for C_0^1 and C_0^2, with corresponding restrictions on m_j^1 and m_j^2 for C_1^1 and C_1^2. Because these are independent and identically distributed random variables, p_0^1 can be calculated explicitly, and from Eq. (S3) we obtain its closed form [Eq. (S8)]. Using the same method, we find identical results for p_0^2, p_1^1 and p_1^2, which together give P(k = 5).

We thus conclude that a configuration C_i^j (i = 0, 1, 2, ...; j = 1, 2, ...) contributes to P(k) with a product of integrals determined by four rules, (i)-(iv), the first of which fixes the contribution of the seed variable [S]. When k = 6, however, there are 13 possible configurations of the seed datum x_0, labeled C_0^i, C_1^j, and C_2^r. Similar to P(k = 5), we derive P(k = 6) as the sum of their contributions. Here C_1^j leads to the same expressions as the configurations for k = 5, and thus we can derive p_1^j by applying the four rules. Fig. S3 shows that C_0^i and C_2^r are geometrically different and are formed from a seed x_0 and two penetrable variables. In configurations C_0^4 and C_2^4 there are three penetrable variables, one of which (x_1 in C_0^4 and x_{−1} in C_2^4) is smaller than x_0. When calculating P(k), the role of this smaller penetrable variable is similar to that of an inner variable. Thus without loss of generality we refer to this smaller penetrable variable as an inner variable. There are two bounding and two concatenated inner variables, and the concatenated variables produce concatenated integrals, as is seen when the same formalism as for k = 5 is applied to C_0^i with k = 6. In Eq. (S8), for k = 5, every integral depends only on x_0, and we integrate each term to make this dependence explicit.
Here, however, there are two concatenated inner variables, which create a dependence between the integrals and hence between the probabilities. Thus in the general case the configurations are not equiprobable and do not provide the same contribution to the probability P(k). To weight the effect of these concatenated contributions, we use the definition of p_i.
Since P(k) is formed by k − 3 contributions labeled C_0^i, C_1^j, ..., C_{k−4}^r, in which the subindex denotes the number of inner data present on the left-hand side of the seed x_0, we conclude that in general the k − 4 inner variables make the contributions (a)-(e) to P(k) that are listed at the end of this section. Note that p_m^n is symmetric with respect to the seed and the penetrable variables. Adding this modification to the four rules, we calculate a general expression for P(k), i.e., Eq. (S12).

FIG. 1: (a) Plot of the degree distribution of the resulting graphs with penetrable distance ρ = 1 and ρ = 2, (b) semi-log plot of the degree distribution of the resulting graphs with penetrable distance ρ = 1 and ρ = 2, (c) the test results of the resulting graphs with penetrable distance ρ = 1 (ensemble averaged over 10 realizations), (d) the test results of the resulting graphs with penetrable distance ρ = 2 (ensemble averaged over 10 realizations).

FIG. 2: Graphical illustration of the constructive proof of <k(T)>, considering an LPHVG with ρ = 1 extracted from a periodic series of period T = 10.

FIG. 3: (a) Simplified period-50, period-100, period-200 and period-250 time series of 1000 data points; (b) mean degree of the resulting LPHVGs for different ρ. Circles correspond to the period-50 series, triangles to the period-100 series, squares to the period-200 series and diamonds to the period-250 series; the black, blue, red and green solid lines show the corresponding theoretical results.

FIG. 5: (a) Adjacency matrix of the LPHVG associated with a random series for different ρ; (b) relationship between ρ, n and P_ρ(n). The solid lines correspond to the theoretical result [Eq. (10)]; circles, triangles, squares and diamonds correspond to the numerical simulation results for ρ = 0, ρ = 1, ρ = 2 and ρ = 3, respectively.

FIG. 6: The upper part: semilog plot of the degree distributions of the limited penetrable horizontal visibility graphs associated with series generated by the logistic map, the Hénon map, the Lorenz chaotic system and the energy price-supply-economic growth system.

FIG. 7: Semilog plot of the degree distributions of the LPHVGs associated with the crude oil price series.

FIG. 9: Panel (b) shows the newly established connections (red lines) when the LPHVG with limited penetrable distance ρ = 1 is inferred on the basis of the HVG. Note that the limited penetrable horizontal visibility graph of a given time series has all the properties of its horizontal visibility graph, e.g., it is connected and invariant under all affine transformations of the series data [9,11].
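The two properties mentioned in this caption can be checked on a small example. The sketch below assumes the visibility rule "at most ρ intermediate values reach or exceed the lower endpoint" and tests invariance only for an affine map with positive slope, which preserves the ordering of the data; the function name is illustrative.

```python
import heapq
import random


def lphvg_edge_set(x, rho):
    """Edge set of the LPHVG (link if at most rho intermediates block)."""
    edges = set()
    for i in range(len(x) - 1):
        top = []  # min-heap of the rho+1 largest intermediates
        for j in range(i + 1, len(x)):
            if len(top) <= rho or top[0] < min(x[i], x[j]):
                edges.add((i, j))
            heapq.heappush(top, x[j])
            if len(top) > rho + 1:
                heapq.heappop(top)
            if len(top) == rho + 1 and top[0] >= x[i]:
                break
    return edges


if __name__ == "__main__":
    rng = random.Random(0)
    series = rng.sample(range(1000), 200)  # distinct integer heights
    hvg = lphvg_edge_set(series, 0)
    lphvg = lphvg_edge_set(series, 1)
    # Every HVG link is kept; the extra links are the penetrable ones.
    print(len(hvg), len(lphvg), hvg <= lphvg)
    # Rescaling and shifting with a positive factor leaves the graph unchanged.
    print(lphvg == lphvg_edge_set([3 * v + 7 for v in series], 1))
```

Integer heights are used so that the rescaled series has exactly the same ordering as the original, which avoids floating-point ties in the comparison.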

Fig. S1. Set of possible configurations for a seed datum x_0 with k = 4. The green dots are penetrable data, and the blue dots are bounding data.

Fig. S2. Set of possible configurations for a seed datum x_0 with k = 5. The sign of the subscript in x_i indicates whether the datum is located on the left-hand side of x_0 or on the right-hand side. The subscript in C_i^j indicates the number of inner data located on the left-hand side of x_0, and the superscript indicates the different cases. The symbols n_j^1, n_j^2, m_j^1, m_j^2 denote the hidden data.

Fig. S3. Set of possible configurations for C_0^i and C_2^r with k = 6.

(a) p_0^i has k − 4 concatenated integrals (the right-hand side of seed x_0);
(b) p_1^j has k − 5 concatenated integrals (the right-hand side of seed x_0) and an independent inner data contribution (the left-hand side of seed x_0);
(c) p_2^r has k − 6 concatenated integrals (the right-hand side of seed x_0) and another two independent inner data contributions (the left-hand side of seed x_0);
...
(d) p_{k−5}^j has k − 5 concatenated integrals (the left-hand side of seed x_0) and an independent inner data contribution (the right-hand side of seed x_0); and
(e) p_{k−4}^i has k − 4 concatenated integrals (the left-hand side of seed x_0).