Multiscale limited penetrable horizontal visibility graph for analyzing nonlinear time series

The visibility graph has established itself as a powerful tool for analyzing time series. In this paper we develop a novel multiscale limited penetrable horizontal visibility graph (MLPHVG). We use nonlinear time series from two typical complex systems, i.e., EEG signals and two-phase flow signals, to demonstrate the effectiveness of our method. Combining MLPHVG with a support vector machine, we detect epileptic seizures from EEG signals recorded from healthy subjects and epilepsy patients, and the classification accuracy reaches 100%. In addition, we derive MLPHVGs from oil-water two-phase flow signals and find that the average clustering coefficient at different scales allows three typical oil-water flow patterns to be faithfully identified and characterized. These findings render our MLPHVG method particularly useful for analyzing nonlinear time series from the perspective of multiscale network analysis.

Uncovering complicated behavior from nonlinear time series is a fundamental problem of continuing interest, and it has attracted a great deal of attention from a wide variety of fields. Different methodologies have been developed for this challenging task, e.g., chaotic analysis 1, fractal analysis 2-3, recurrence plots 4, complexity measures 5, multiscale entropy 6, and time-frequency representation 7. In recent years, a new multidisciplinary methodology based on complex networks has emerged for characterizing complex systems 8-14. In particular, the complex network analysis of time series has advanced dramatically, and many efficient methods have been proposed to infer complex networks from univariate or multivariate time series 15-25. Notably, Lacasa et al. proposed the visibility graph 17 and the horizontal visibility graph 18, which map a univariate time series into a complex network. Visibility graph theory has established itself as an efficient tool for probing the dynamics underlying real complex systems from time series 26-40. The (horizontal) visibility graph leads to a natural graph-theoretical description of nonlinear systems with qualities in the spirit of symbolic dynamics. More recently, we extended the visibility graph to develop a limited penetrable visibility graph (LPVG) 41,42 and found that the LPVG shows good anti-noise ability, especially for the analysis of signals polluted by noise. Our LPVG method has been successfully applied to analyze gas-liquid flow signals 42, signals from electromechanical systems in the process industry 43, and EEG signals associated with manual acupuncture 44 and Alzheimer's disease 45.
As a further study, in this paper we develop a novel multiscale limited penetrable horizontal visibility graph (MLPHVG) to analyze nonlinear time series from the perspective of multiscale and complex network analysis. In particular, we use two examples to demonstrate the validity of our method: (a) EEG signals recorded from healthy subjects and epilepsy patients; (b) experimental flow signals from oil-water two-phase flows.
The brain is one of the most complex systems. Epilepsy is a paroxysmal disorder of the brain, characterized by the sudden occurrence of unprovoked seizures. The underlying mechanism of epileptic seizures is still elusive. Since the fluctuations of EEG signals are associated with the occurrence of epileptic seizures, characterizing epileptic seizures from EEG signals is of considerable importance. We apply our method to analyze two sets of EEG data recorded from a number of healthy subjects and epilepsy patients. We combine a support vector machine with network statistical measures, namely the average clustering coefficient and the clustering coefficient entropy, to detect epileptic seizures at different scales. Interestingly, we find that the network statistical measures differ significantly between healthy subjects and epilepsy patients. The classification accuracy is 100% at scale factor 2. These results indicate that our method efficiently classifies and identifies EEG signals recorded from healthy subjects and from epilepsy patients during epileptic seizures.
Liquid-liquid two-phase flows are widely encountered in many industrial processes. The mixed flow of immiscible oil and water can be viewed as a complex system with typical features of instability, transience, and randomness. In recent years, interest in oil-water two-phase flows has greatly increased due to the development of the petroleum industry. Oil and water usually coexist during oil-well production, and these two immiscible fluids can distribute themselves in various temporal-spatial configurations, known as flow patterns. Different flow patterns exhibit distinct local flow behaviors, and identifying and uncovering the underlying dynamics of different flow patterns from experimental measurements remains a challenge of significant importance. We carry out an oil-water two-phase flow experiment to obtain the flow signals and then use our proposed method to identify and characterize different flow patterns from the experimental measurements. The results suggest that our method makes it possible to identify the distinct flow behaviors underlying three typical oil-water flow patterns. The above findings render our MLPHVG method particularly powerful for characterizing the dynamical process underlying a given nonlinear time series from a time-dependent complex system.

MLPHVG analysis of EEG signals.
The EEG data sets analyzed in this paper are from the experiments carried out by Andrzejak et al. 46. We use two EEG data sets (set A and set E), each consisting of 100 single-channel EEG segments of 23.6 s duration. These segments were selected and cut out from continuous multichannel EEG recordings after visual inspection for artifacts, e.g., due to muscle activity or eye movements. Set A consists of segments taken from five healthy subjects through surface electrodes using the international 10-20 electrode placement scheme. Set E, from five epilepsy patients, consists of segments selected from all recording sites exhibiting ictal activity during seizures. All EEG signals were recorded with the same 128-channel amplifier system, using an average common reference. The data were digitized at a sampling rate of 173.61 Hz. Band-pass filter settings were 0.53-40 Hz. We derive 200 MLPHVGs (multiscale limited penetrable horizontal visibility graphs) corresponding to the two sets of EEG signals, with the limited penetrable distance being 1. Then we calculate the average clustering coefficient 47 and the clustering coefficient entropy 24 from the derived networks. We combine the average clustering coefficient and the clustering coefficient entropy to generate two-dimensional feature vectors and then employ an SVM (Support Vector Machine) to classify sets A and E. In particular, we employ leave-one-out cross-validation and 10-fold cross-validation to estimate the classification performance of the features derived from MLPHVGs. Leave-one-out cross-validation 48 consists of removing one sample from the dataset (sets A and E), constructing the decision function on the basis of the remaining samples only, and then testing on the removed sample. This process is repeated 200 times, with a different sample left out for testing each time.
After 200 cross-validations, we obtain the predicted labels for all samples and measure the fraction of correctly predicted samples over the total number of samples in the dataset. In addition, we employ 10-fold cross-validation to estimate the classification accuracy. For one realization of 10-fold cross-validation, the 200 samples from sets A and E are randomly partitioned into ten equal subsets; nine subsets are used for training and the remaining subset is used for testing. This procedure is repeated ten times so that each subset serves once for validation. We thereby obtain predicted labels for all samples from the ten subsets, and the classification accuracy for one realization of 10-fold cross-validation is the fraction of correctly predicted samples over the total number of samples in the dataset. In order to reduce the bias introduced by randomly partitioning the dataset, we carry out the 10-fold cross-validation 10 times independently and estimate the final classification accuracy of sets A and E as the average over the 10 independent realizations. The classification accuracies obtained with leave-one-out cross-validation and 10-fold cross-validation at different scales are presented in Fig. 1. Notably, the classification accuracy is high over different scales and the highest value is 100% at scale 2. In Fig. 2 we show the joint distributions of the average clustering coefficient and the clustering coefficient entropy for sets A and E at scale factor 2.
This EEG database has been recognized as a benchmark for developing seizure detection models, and many researchers have used it to test their proposed methods 49-56. To give a few examples, Nigam et al. 49 proposed a method using a multistage nonlinear pre-processing filter in combination with a diagnostic artificial neural network to classify sets A and E with a classification accuracy of 97.2%. Kaya et al. 50 presented a novel method based on the one-dimensional local binary pattern to classify sets A and E, with a classification accuracy of 99.5%. Kannathal et al. 51 proposed a method based on various entropy measures and an adaptive neuro-fuzzy classifier to classify sets A and E with a classification accuracy of 92.22%. Subasi 52 employed wavelet feature extraction and a mixture of experts model to distinguish sets A and E and obtained a classification accuracy of 94.5%. Polat et al. 53 classified sets A and E with a classification accuracy of 98.72% by using a hybrid system based on a decision tree classifier and the fast Fourier transform. Nicolaou et al. 54 integrated permutation entropy with the support vector machine to classify sets A and E with a classification accuracy of 93.55%. Zhu et al. 55 proposed a weighted horizontal visibility graph to classify sets A and E, with a classification accuracy of 100%. Zamir 56 developed linear least squares-based preprocessing models to classify sets A and E with a classification accuracy of 100%. Many good classification results have been published for this EEG dataset and the above are just a few of them; more results can be found in Ref. 57. Our method thus allows accurate classification of EEG signals recorded from healthy subjects and epilepsy patients.
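The leave-one-out and repeated 10-fold procedures described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation used in this work: the function names are hypothetical, and a simple nearest-centroid rule stands in for the SVM so that the sketch needs only the Python standard library. Any classifier with the same call signature could be substituted.

```python
import random


def nearest_centroid(train_X, train_y, test_X):
    """Stand-in classifier (NOT an SVM): assign each test point to the nearest class mean."""
    groups = {}
    for x, label in zip(train_X, train_y):
        groups.setdefault(label, []).append(x)
    centroids = {
        label: [sum(col) / len(points) for col in zip(*points)]
        for label, points in groups.items()
    }

    def dist2(a, b):
        return sum((u - v) ** 2 for u, v in zip(a, b))

    return [min(centroids, key=lambda lab: dist2(x, centroids[lab])) for x in test_X]


def cv_accuracy(X, y, folds, classify):
    """Hold out each fold once; return correctly predicted samples / total samples."""
    correct = 0
    for test_idx in folds:
        held_out = set(test_idx)
        train_idx = [i for i in range(len(X)) if i not in held_out]
        predictions = classify(
            [X[i] for i in train_idx],
            [y[i] for i in train_idx],
            [X[i] for i in test_idx],
        )
        correct += sum(p == y[i] for p, i in zip(predictions, test_idx))
    return correct / len(X)


def leave_one_out(X, y, classify):
    """Each sample is left out exactly once (200 rounds for 200 samples)."""
    return cv_accuracy(X, y, [[i] for i in range(len(X))], classify)


def repeated_kfold(X, y, classify, k=10, repeats=10, seed=0):
    """Average accuracy over independent random k-fold partitions."""
    rng = random.Random(seed)
    scores = []
    for _ in range(repeats):
        idx = list(range(len(X)))
        rng.shuffle(idx)
        folds = [idx[f::k] for f in range(k)]  # equal sizes when len(X) is divisible by k
        scores.append(cv_accuracy(X, y, folds, classify))
    return sum(scores) / len(scores)
```

Here `X` would hold the two-dimensional feature vectors (average clustering coefficient, clustering coefficient entropy) and `y` the set labels. Averaging over several random partitions, as in `repeated_kfold`, reduces the dependence of the estimate on any single split.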

MLPHVG analysis of experimental flow signals.
The two-phase flow signals are from our oil-water two-phase flow experiment, which was carried out in a vertical upward 20 mm-inner-diameter plexiglass pipe at Tianjin University. The experimental media are tap water and No. 3 white oil. These two immiscible fluids mix and then flow together into the vertical testing pipe. Three oil-in-water flow patterns have been observed, i.e., oil-in-water slug flow, oil-in-water bubble flow, and oil-in-water VFD flow (Very Fine Dispersed bubble flow). The conductance sensor 58 is designed to capture the flow behavior, and the measured flow signals are stored by data acquisition devices. The sampling rate is 4000 Hz. We use a high-speed camera to observe and define oil-water flow patterns. We infer multiscale limited penetrable horizontal visibility graphs from our experimental measurements with the limited penetrable distance being 1. Then we employ the average clustering coefficient to analyze the derived complex networks corresponding to three typical vertical oil-in-water flow patterns. The results are shown in Fig. 3, in which each error bar is calculated from different flow conditions for the same flow pattern at the same scale factor. The multiscale distributions of the average clustering coefficient for different flow patterns exhibit distinct features, which allows the three oil-water flow patterns to be identified. For the oil-in-water slug flow, small numbers of oil droplets follow the cap-shaped oil slugs. Its flow behavior exhibits intermittent oscillation and its flow structure presents a non-homogeneous distribution. The passage of an oil slug through the sensor leads to a large fluctuation in the measured conductance signals.
Consequently, the average clustering coefficients of oil-in-water slug flow exhibit large values at different scales, and the deviation of the average clustering coefficients calculated from different oil-in-water slug flow conditions at the same scale factor is also the largest among the three flow patterns. As the mixture flow rate increases, the turbulent energy is enhanced and the oil slugs are consequently broken into small oil droplets. That is, oil-in-water bubble flow occurs, where the oil phase exists in the form of discrete droplets flowing in a water continuum. In this flow pattern, the intermittent oscillation of oil slugs gradually disappears and the non-homogeneous distribution of the oil phase becomes weak. The fluctuation strength of the measured conductance signals is weakened. Correspondingly, the average clustering coefficient decreases as the flow pattern changes from oil-in-water slug flow to oil-in-water bubble flow, and the deviation also decreases. When the mixture flow rate is high, the oil droplets are dispersed into even smaller oil droplets, i.e., oil-in-water very fine dispersed bubble flow (VFD flow) sets in. The fluctuation strength of the signals from VFD flow is further weakened. The average clustering coefficient and its deviation are the smallest for VFD flow, indicating that the underlying flow behavior becomes stochastic and the distribution of the oil phase becomes homogeneous as the flow pattern evolves from oil-in-water bubble flow to VFD flow. These interesting findings suggest that our method is capable of identifying and characterizing three typical flow patterns arising from vertical oil-water two-phase flow at different scales.
[Figure 2. Axes: clustering coefficient entropy, average clustering coefficient. Legend: Set A recorded from healthy subjects; Set E recorded from epilepsy patients.]

Discussions
In summary, we have articulated a novel MLPHVG strategy (multiscale limited penetrable horizontal visibility graph) for analyzing nonlinear time series. The basic idea of MLPHVG is to define temporal scales in terms of a coarse-graining process and then infer a limited penetrable horizontal visibility graph from the coarse-grained time series at each scale. We choose nonlinear time series from two typical complex systems, i.e., EEG signals and two-phase flow signals, to demonstrate the effectiveness of our method. Combining MLPHVG with a support vector machine, we detect epileptic seizures from two sets of EEG signals recorded from a number of healthy subjects and epilepsy patients. The results suggest that our method allows highly accurate classification of EEG signals recorded from healthy subjects and from epilepsy patients during epileptic seizures. In addition, we use our method to derive multiscale complex networks from oil-water two-phase flow signals and then employ multiscale network statistical measures to characterize the constructed networks. Our results indicate that the average clustering coefficient at different scales faithfully reveals the change of flow behavior underlying different flow patterns. Bridging multiscale analysis and the limited penetrable horizontal visibility graph provides a novel methodology for characterizing the dynamical process underlying a given nonlinear time series from time-dependent complex systems, which exist widely in science and engineering.

Methods
The multiscale limited penetrable horizontal visibility graph (MLPHVG) method can be implemented by the following steps. For a time series of length N, $\{x(i), i = 1, 2, \ldots, N\}$, we first define temporal scales in terms of a coarse-graining process 6 and obtain a coarse-grained time series $\{y_s(j), j = 1, 2, \ldots, N/s\}$ in the following form

$$y_s(j) = \frac{1}{s} \sum_{i=(j-1)s+1}^{js} x(i), \qquad 1 \le j \le N/s,$$

where s represents the scale factor. Next we infer the limited penetrable horizontal visibility graph from the coarse-grained time series $\{y_s(j), j = 1, 2, \ldots, N/s\}$. Fig. 4 shows a schematic diagram of how to infer the limited penetrable horizontal visibility graph from a time series. For a time series of length 10, we display the data as vertical bars in Fig. 4(a) and regard each data point (vertical bar) as a node of a complex network. For the horizontal visibility graph (HVG) 18, two nodes $y_s(i)$ and $y_s(j)$ are connected if one can draw a horizontal line joining $y_s(i)$ and $y_s(j)$ that does not intersect any intermediate data height. That is, a connection between two nodes $y_s(i)$ and $y_s(j)$ exists (black lines in Fig. 4(b)) if the following criterion is fulfilled:

$$y_s(i),\; y_s(j) > y_s(n) \quad \text{for all } n \text{ such that } i < n < j.$$

Our limited penetrable horizontal visibility graph (LPHVG) is a development of the HVG. In particular, if we set the limited penetrable distance to L, a connection between two nodes exists if the number of in-between nodes that block the horizontal line is no more than L. As shown in Fig. 4(a,b), the red lines are the newly established connections when we infer the LPHVG on the basis of the HVG with the limited penetrable distance being 1. Finally, based on the above procedure, we obtain the multiscale limited penetrable horizontal visibility graph (MLPHVG) by deriving the LPHVG from the coarse-grained time series at different scales.
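The steps above can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions rather than the code used in this work: the function names are hypothetical, coarse-graining is taken to be the average over consecutive non-overlapping windows of length s (dropping any incomplete trailing window), and an intermediate point is assumed to block the horizontal line when its height is greater than or equal to the lower of the two end points.

```python
def coarse_grain(x, s):
    """Average consecutive, non-overlapping windows of length s (the scale factor)."""
    n = len(x) // s  # assumption: samples that do not fill a window are dropped
    return [sum(x[j * s:(j + 1) * s]) / s for j in range(n)]


def lphvg_edges(y, L=1):
    """Return the edge set {(i, j), i < j} of the LPHVG of series y.

    Nodes i and j are linked when at most L intermediate values reach or
    exceed min(y[i], y[j]); L = 0 gives the ordinary HVG.
    """
    edges = set()
    n = len(y)
    for i in range(n - 1):
        for j in range(i + 1, n):
            level = min(y[i], y[j])
            blockers = sum(1 for k in range(i + 1, j) if y[k] >= level)
            if blockers <= L:
                edges.add((i, j))
    return edges


def mlphvg(x, scales, L=1):
    """Map each scale factor s to the LPHVG edge set of the coarse-grained series."""
    return {s: lphvg_edges(coarse_grain(x, s), L) for s in scales}
```

For example, `mlphvg(x, scales=range(1, 6), L=1)` would return one edge set per scale factor from 1 to 5. The all-pairs scan is kept deliberately simple; it is cubic in the series length in the worst case, so a long recording would call for a more efficient construction.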
Representing a time series through a multiscale limited penetrable horizontal visibility graph, we can then explore its dynamic behavior from the perspective of multiscale analysis and network analysis, quantified via network statistical measures. In particular, we employ the average clustering coefficient (C) 47 and our recently proposed clustering coefficient entropy ($E_C$) 24 to characterize the topological structure of the inferred networks. These network statistical measures can be calculated as follows

$$C = \frac{1}{N} \sum_{i=1}^{N} \frac{\tau_{i,\Delta}}{\tau_i},$$

$$E_C = -\sum_{i=1}^{N} P_{C,i} \log P_{C,i}, \qquad P_{C,i} = \frac{C_i}{\sum_{j=1}^{N} C_j}, \qquad C_i = \frac{\tau_{i,\Delta}}{\tau_i},$$

where $\tau_{i,\Delta}$ denotes the number of closed triplets centered on node i, $\tau_i$ is the number of triplets centered on node i, and N is the number of nodes of the derived MLPHVG.
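The two network measures can be computed from an adjacency structure as sketched below. This is a minimal sketch, not the authors' implementation, and it rests on several assumptions that are not stated in the text: the clustering coefficient entropy is taken to be the Shannon entropy of the local clustering coefficients normalized to sum to one, the natural logarithm is used, nodes with fewer than two neighbors are given a local coefficient of zero, and zero terms contribute nothing to the entropy. The function names are hypothetical.

```python
import math


def adjacency_from_edges(n, edges):
    """Build {node: set of neighbors} for an undirected graph on nodes 0..n-1."""
    adj = {i: set() for i in range(n)}
    for i, j in edges:
        adj[i].add(j)
        adj[j].add(i)
    return adj


def local_clustering(adj):
    """C_i = closed triplets / all triplets centered on node i (0 if degree < 2)."""
    coeffs = {}
    for i, nbrs in adj.items():
        k = len(nbrs)
        if k < 2:
            coeffs[i] = 0.0
            continue
        nb = list(nbrs)
        closed = sum(
            1 for a in range(k) for b in range(a + 1, k) if nb[b] in adj[nb[a]]
        )
        coeffs[i] = closed / (k * (k - 1) / 2)
    return coeffs


def average_clustering(adj):
    """Mean of the local clustering coefficients over all nodes."""
    coeffs = local_clustering(adj)
    return sum(coeffs.values()) / len(coeffs)


def clustering_entropy(adj):
    """Shannon entropy of the normalized local clustering coefficients (assumed form)."""
    coeffs = local_clustering(adj)
    total = sum(coeffs.values())
    if total == 0:
        return 0.0  # no closed triplets anywhere: entropy taken as zero
    return -sum((c / total) * math.log(c / total) for c in coeffs.values() if c > 0)
```

As a small check, a triangle on nodes 0, 1, 2 with a pendant node 3 attached to node 2 has local coefficients 1, 1, 1/3, and 0, so the average clustering coefficient is 7/12. Evaluating both functions on the graph obtained at each scale factor gives the two-dimensional feature vector used for classification.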
Figure 4. Example of (a) a time series (10 data values) and (b) its corresponding LPHVG with the limited penetrable distance L being 1, where every node corresponds to time series data in the same order. The horizontal visibility lines between data points define the links connecting nodes in the graph.