Discriminating chaotic and stochastic time series using permutation entropy and artificial neural networks

Extracting relevant properties of empirical signals generated by nonlinear, stochastic, and high-dimensional systems is a challenge of complex systems research. Open questions are how to differentiate chaotic signals from stochastic ones, and how to quantify nonlinear and/or high-order temporal correlations. Here we propose a new technique to reliably address both problems. Our approach follows two steps: first, we train an artificial neural network (ANN) with flicker (colored) noise to predict the value of the parameter α that determines the strength of the correlation of the noise. To predict α, the ANN input features are a set of probabilities that are extracted from the time series by using symbolic ordinal analysis. Then, we input to the trained ANN the probabilities extracted from the time series of interest, and analyze the ANN output. We find that the α value returned by the ANN is informative of the temporal correlations present in the time series.
To distinguish between stochastic and chaotic signals, we exploit the fact that the difference between the permutation entropy (PE) of a given time series and the PE of flicker noise with the same α parameter is small when the time series is stochastic, but large when the time series is chaotic. We validate our technique by analysing synthetic and empirical time series whose nature is well established. We also demonstrate the robustness of our approach with respect to the length of the time series and to the level of noise. We expect that our algorithm, which is freely available, will be very useful to the community.


Introduction
Chaotic and stochastic systems have been extensively studied, and the fundamental difference between them is well known: in a chaotic system an initial condition always leads to the same final state, following a fixed rule, while in a stochastic system an initial condition leads to a variety of possible final states, drawn from a probability distribution 1 . However, the signals generated by chaotic and stochastic systems are not always easy to distinguish, and many methods have been proposed to differentiate chaotic from stochastic time series 2-10 . A related important problem is how to appropriately quantify the strength and length of the temporal correlations present in a time series 11-14 . The performance of these methods varies with the characteristics of the time series. As far as we know, no method works well with all data types, because methods have different limitations in terms of the length of the time series, the level of noise, the stationarity or seasonality of the underlying process, the presence of linear or nonlinear correlations, etc. Moreover, any time series analysis method returns at least one number. Therefore, to obtain interpretable results, the values obtained from the analysis of the time series of interest need to be compared with those obtained from "reference" time series, for which we have prior knowledge of the underlying system that generates the data. Here we use as "reference" model a well-known stochastic process: flicker noise (FN).
A FN time series is characterized by a power spectrum P(f) ∝ 1/f^α, with α a quantifier of the correlations present in the signal 15 . Flicker noise has been extensively studied in diverse areas such as electronics 16,17 , biology 18,19 , physics 20,21 , economics 22,23 , meteorology 24 , astrophysics 25 , etc. Furthermore, many methods described in the literature can estimate the correlation quantifier α, or related measures such as the Hurst exponent H 2,11-13,15,21,26 .
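The spectral definition above suggests a simple way to synthesize FN surrogates. The following minimal sketch (our illustration, not the generator used in the paper; the function name is ours) shapes white noise in the Fourier domain so that its power spectrum follows 1/f^α:

```python
import numpy as np

def flicker_noise(n, alpha, rng=None):
    """Generate a time series with power spectrum P(f) ~ 1/f**alpha.

    Spectral synthesis: scale the Fourier amplitudes of white noise by
    f**(-alpha/2) (so the power scales as f**(-alpha)) and invert the
    transform. The output is standardized to zero mean and unit variance.
    """
    rng = np.random.default_rng(rng)
    white = rng.standard_normal(n)
    spec = np.fft.rfft(white)
    freqs = np.fft.rfftfreq(n)
    freqs[0] = freqs[1]                 # avoid division by zero at f = 0
    spec *= freqs ** (-alpha / 2.0)
    x = np.fft.irfft(spec, n)
    return (x - x.mean()) / x.std()

series = flicker_noise(4096, alpha=1.0, rng=42)
```

For alpha = 0 the spectral weights are flat and the output is (standardized) white noise; increasing alpha strengthens the long-range correlations.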
In this paper we propose a new methodology that simultaneously allows us to distinguish chaotic from stochastic time series, and to quantify the strength of the correlations. Our algorithm, based on an Artificial Neural Network (ANN) 27 , is easy to run and freely available 28 . We first train the ANN with flicker noise to predict the value of the α parameter that was used to generate the noise. The input features to the ANN are probabilities extracted from the FN time series using ordinal analysis 29 , a symbolic method widely used to identify patterns and nonlinear correlations in complex time series 30-32 . Each sequence of D data points (consecutive, or with a certain lag between them) is converted into a sequence of D relative values (smallest to largest), which defines an ordinal pattern. Then, the frequencies of occurrence of the different patterns in the time series define the set of ordinal probabilities, which in turn allow us to calculate information-theoretic measures such as the permutation entropy (PE, described in Methods).

Figure 1. (a),(b) Time series of the β x map (Eq. 6 with β = 2) and its PDF. (c),(d) Uniformly distributed white noise and its PDF. The PDF of the deterministic map is identical to the PDF of the noise. (e),(f) PSD of the Schuster map (Eq. 8 with parameter z = 1.5) and of flicker noise with α = 1; the PSD of the Schuster map has a long decay that is very similar to the 1/f^α decay of the noise. (g) PDF of m summed logistic maps (Eq. 7 with r = 4), which approaches a Gaussian as m increases. (h-k) Examples of empirical time series analyzed: (h) stride-to-stride intervals of an adult walking at slow velocity, interpreted as a stochastic process; (i) daily number of sunspots as a function of time (in years), whose fluctuations are interpreted as stochastic; (k) voltage across the capacitor of an inductorless Chua electronic circuit, whose oscillations are known to be chaotic.
The PE has been extensively used in the literature, due to the fact that it is straightforward to calculate and robust to observational noise. Interdisciplinary applications have been discussed in Ref. 33 and, more recently, in a Special Issue 34 .
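The symbolization step described above can be sketched in a few lines of Python (an illustration under the definitions given in the text; function names are ours):

```python
import numpy as np

def ordinal_pattern(window):
    """Rank pattern of D values, e.g. [0, 13, 5, 10] -> (0, 3, 1, 2)."""
    ranks = np.argsort(np.argsort(window))   # double argsort yields the ranks
    return tuple(int(r) for r in ranks)

def ordinal_probabilities(x, D):
    """Relative frequencies of the ordinal patterns (lag 1: D-1 values overlap)."""
    counts = {}
    for i in range(len(x) - D + 1):
        p = ordinal_pattern(x[i:i + D])
        counts[p] = counts.get(p, 0) + 1
    n = len(x) - D + 1
    return {p: c / n for p, c in counts.items()}

probs = ordinal_probabilities(np.sin(np.arange(200) * 0.7), D=4)
```

The mapping [0, 13, 5, 10] → (0, 3, 1, 2) corresponds to the pattern "0312" quoted in the text, and the probabilities sum to one by construction.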
After training the ANN with different FN time series, x_s(α), generated with different values of α, we input to the ANN the ordinal probabilities extracted from the time series of interest, x, and analyze the output of the ANN, α_e. We find that α_e is informative of the temporal correlations present in the time series x. Moreover, by comparing the PE values of x and of x_s(α_e) (a FN time series generated with the value of α returned by the ANN), we can differentiate between chaotic and stochastic signals: the PE values of x and x_s are similar when x is mainly stochastic, but they differ when x is mainly deterministic. Therefore, the difference of the two PE values serves as a quantifier to distinguish between chaotic and stochastic signals. We use several datasets to validate this approach. We also analyze its robustness with respect to the length of the time series and noise contamination. This paper is organized as follows. In the main text we present the results of the analysis of synthetic and empirical time series, which are described in section Data sets. Typical examples of the time series analyzed are presented in Fig. 1. In section Methods we describe the ordinal method and the implementation of the algorithm, schematically represented in Fig. 2.

Analysis of synthetic datasets
The main result is depicted in Fig. 3. Panel (a) shows the normalized permutation entropy S̃(α_e) (Eq. (4)) vs. the time-correlation coefficient α_e. The filled (empty) symbols correspond to different types of stochastic (chaotic) time series, and the solid black line corresponds to FN time series generated with α ∈ [−1, 3], which is accurately evaluated by the ANN. For α_e = 0, FN has a uniform power spectrum, characteristic of an uncorrelated signal (white noise), with equal ordinal probabilities P(i) ≈ 1/D! and, hence, S̃ ≈ 1. In contrast, for α ≠ 0, some ordinal patterns occur in the time series more often than others, the ordinal probabilities are not all equal, and the permutation entropy decreases. These results are consistent with those obtained using different methodologies 7,10,26 .
Figure 2. Schematic representation of the algorithm: 1. Extract the ordinal probabilities of the time series. 2. Input them to the ANN and obtain the correlation coefficient α_e. 3. Generate a flicker noise time series with α = α_e. 4. Compare the permutation entropies of the time series and of the noise. 5. Classify the time series as chaotic or stochastic.

Table 1. Results obtained from stochastic and deterministic time series: mean and standard deviation of the α_e parameter and of Ω(α_e) (Eq. 1), calculated from 1000 time series generated with different initial conditions and noise seeds.

In Fig. 3 we note that fBm signals have larger time correlation (α_e closer to 2, a classic Brownian motion) than the other
three stochastic systems, for which α_e ≈ 0. However, their permutation entropies are very close to those of the FN signals. The key observation is that stochastic time series all fall close to the FN curve, while chaotic ones (namely, the β x map, Lorenz system, logistic map, skew tent map, and Schuster map) do not. The distance to the FN curve thus serves to distinguish stochastic from chaotic time series. This is quantified by

Ω(α_e) = |S̃ − S̃_fn(α_e)|,   (1)

where S̃ is the normalized permutation entropy of the analyzed time series and S̃_fn(α_e) is the PE of a flicker noise time series generated with the value of α returned by the ANN, α_e. The results are presented in Fig. 3, panel (b), where we see that stochastic signals have Ω ≈ 0 and deterministic signals have Ω > 0. To summarize this finding, Table 1 reports α_e and Ω for ten representative systems. Next, we study the applicability of our methodology to noise-contaminated signals. We analyze the signal

Z_t = (1 − η) X_t + η Y_t,   (2)

where X_t is a deterministic (chaotic) signal "contaminated" by uniform white noise, Y_t, and η ∈ [0, 1] controls the stochastic component of Z_t. For η = 0 (1) the signal is fully deterministic (fully stochastic). Figure 4(a) shows Ω as a function of η for different chaotic signals. As expected, for η = 0, Ω is high, but as η grows the level of stochasticity increases and Ω decreases. For η > 0.5 the signal is strongly stochastic, as reflected by Ω ≈ 0. For comparison, in Fig. 4(a) we also present results obtained by shuffling a chaotic time series. As expected, Ω ≈ 0 for all η, because temporal correlations are destroyed by the shuffling process.
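The noise-contamination experiment can be reproduced directly. In the sketch below, the mixture Z_t = (1 − η) X_t + η Y_t is our reading of the contamination scheme described in the text (a convex combination of a logistic-map signal and uniform white noise); all function names are ours:

```python
import numpy as np

def logistic(n, x0=0.4, r=4.0):
    """Iterate the logistic map x_{n+1} = r x_n (1 - x_n)."""
    x = np.empty(n)
    x[0] = x0
    for i in range(1, n):
        x[i] = r * x[i - 1] * (1 - x[i - 1])
    return x

def contaminate(x, eta, rng=None):
    """Z_t = (1 - eta) * X_t + eta * Y_t, with Y_t uniform white noise in [0, 1)."""
    rng = np.random.default_rng(rng)
    y = rng.uniform(0.0, 1.0, len(x))
    return (1.0 - eta) * x + eta * y

z = contaminate(logistic(10_000), eta=0.3, rng=0)
```

Sweeping eta from 0 to 1 and recomputing Ω at each step reproduces the qualitative behavior of Fig. 4(a): the deterministic signature fades as the stochastic component grows.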
We expect that the addition of a sufficiently large number of independent chaotic signals gives a signal that is indistinguishable from a fully stochastic one. This is verified in Fig. 4(b), where the horizontal axis represents the number, m, of independent chaotic signals added. A high value of Ω is observed for m = 1 (a single chaotic signal), but as m increases Ω → 0, since the chaotic nature of the added signals is no longer captured (examples of the PDFs of the time series obtained from the addition of m logistic maps were presented in Fig. 1(g), where we can observe a clear evolution towards a Gaussian shape).

Figure 4. Panel (a): Ω (Eq. 1) is plotted as a function of the level of noise, η, in Eq. (2). As η increases, Ω decreases, but for high values of η it remains different from the value obtained from shuffled data (black pentagons); therefore, small values of Ω can still reveal determinism in noise-contaminated signals. Panel (b) shows the effect of adding several independent chaotic signals: Ω is plotted as a function of the number m of signals added; as m increases, Ω decreases, indicating that the deterministic nature of the signal can no longer be detected.
To further explore the robustness of our methodology, we investigate the role of the length N of the analyzed time series in the evaluation of the Ω quantifier (Eq. (1)). Figure 5 shows Ω as a function of N, where panel (a) depicts stochastic signals and panel (b) chaotic ones. Even for N < 10^2, all stochastic signals in panel (a) have Ω < 0.1, which indicates that we can identify the stochastic nature of short signals. For the chaotic signals in panel (b), Ω > 0.1 for N > 10^2 (except for the β x map), and Ω > 0.2 for all signals when N ≥ 10^3, which demonstrates that our method can also detect determinism in short signals.
As discussed in Sec. Methods, the ANN was trained with flicker noise signals with 2^20 data points. However, it is interesting to analyze how much data the trained ANN needs in order to correctly predict the α value of a flicker noise time series. To address this point, we generate L = 1000 FN time series and analyze the error of the ANN output, α_e, as a function of the length of the time series, N, and of the value of α used to generate the time series. The results are presented in panel (c) of Fig. 5, which displays the mean absolute error, E = (1/L) Σ_{ℓ=1}^{L} |α_e − α|. As N increases, E decreases. The error depends on both α and N, and tends to be larger for high α due to non-stationarity and finite time sampling 10 . For FN time series longer than 10^4 data points, the ANN returns a very accurate value of α.

Analysis of empirical time series
Here we present the analysis of time series recorded under very different experimental conditions, as described in section Data sets. Figure 6 displays the results in the plane (α_e, Ω).
The Ω values obtained for the Chua circuit data and for the laser data confirm their chaotic nature 35,36 (Ω ≈ 0.55 and Ω ≈ 0.20, respectively). For the star light intensity, Ω ≈ 0, confirming the stochastic nature of the signal 37 . For the number of sunspots, a well-known long-memory noisy time series, Ω ≈ 0; in this case the value obtained (α_e ≈ 2) confirms the results of Singh et al. 38 , where a Hurst exponent close to 1 was found. Regarding the five time series of RR intervals of healthy subjects, our algorithm identifies stochasticity (Ω ≈ 0) in all of them, which is consistent with the findings of Ref. 9 .
The last empirical set analyzed reveals the nature of the dynamics of human gait: regardless of the age of the subjects, Ω ≈ 0, confirming the stochastic behavior discussed in 39 . In the inset we show that the α_e value returned by the ANN decreases with age, in line with the results presented in 40 , obtained with Detrended Fluctuation Analysis (see Fig. 6 of Ref. 40 ). The authors interpret this variation as an age-related change in stride-to-stride dynamics, where the gait dynamics of healthy young adults appears to fluctuate randomly, but with less time correlation in comparison to young children 40 .

Discussion
We have proposed a new time series analysis technique that allows us to differentiate stochastic from chaotic signals, and also to quantify temporal correlations. We have demonstrated the methodology using synthetic and empirical signals.
Our method is based on locating a time series in a two-dimensional plane determined by the permutation entropy and the value of a temporal correlation coefficient, α_e, returned by a machine learning algorithm. In this plane, stochastic signals lie very close to a curve defined by flicker noise, while chaotic signals are located far from this curve. We have used this fact to define a quantifier, Ω, the distance to the FN curve. Ω serves to distinguish stochastic from chaotic time series, and it can be used to analyze even very short time series (with time series of 100 data points we found Ω < 0.1 or Ω > 0.1, depending on whether the time series is stochastic or chaotic, respectively; Fig. 5). We also found that small values of Ω can be used to identify underlying determinism in noise-contaminated signals, and in signals that result from the addition of a number of independent chaotic signals (Fig. 4). We have also used our algorithm to analyze six empirical datasets, and obtained results that are consistent with prior knowledge of the data (Fig. 6). Taken together, these results show that the proposed methodology allows answering the questions of how to quantify stochasticity and temporal correlations in a time series.

Figure 6. Deterministic signals are the Chua circuit data (brown triangle) and the laser data (orange 'X' marker), which have Ω > 0. The other signals [the light intensity of a star (yellow dot), the number of sunspots (cyan diamond), the heart rate variability of healthy subjects (magenta thin diamond), and the different groups of human gait dynamics (green, blue, red, and black squares)] are stochastic and have Ω ≈ 0. For the human gait, the inset depicts α_e as a function of the age of each subject. Consistent with 40 , the correlation coefficient α_e decreases with age.
Our algorithm is fast, easy to use, and freely available 28 . Thus, we believe that it will be a valuable tool for the scientific community working on time series analysis. Existing methods have limitations in terms of the characteristics of the data (length of the time series, level of noise, etc.). A limitation of our algorithm lies in the analysis of noise-contaminated periodic signals, because their temporal structure may not be distinguishable from that of a fully stochastic signal with a large α value. Future work will be directed at overcoming this limitation. Here, for a "proof-of-concept" demonstration, we have used a well-known machine learning algorithm (a feed-forward ANN), a rather simple training procedure, and a popular entropy measure (the permutation entropy). We have not tried to optimize the performance of the algorithm. We expect that different machine learning algorithms, training procedures, and entropy measures can give different performances, depending on the characteristics of the data analyzed. Therefore, the methodology proposed here has a high degree of flexibility, which can allow performance to be optimized for the analysis of particular types of data.

Ordinal analysis and permutation entropy
Ordinal analysis allows the identification of patterns and nonlinear correlations in complex time series 29 . For each sequence of D data points in the time series (consecutive, or with a certain lag between them), the values are replaced by their relative amplitudes, ordered from 0 to D − 1. For instance, the sequence {0, 5, 10, 13} transforms into the ordinal pattern "0123", while {0, 13, 5, 10} transforms into "0312". As an example, Fig. 7 shows the ordinal patterns formed with D = 4 consecutive values.
We evaluate the frequency of occurrence of each word, defined as the ordinal probability P(i), with Σ_{i=1}^{D!} P(i) = 1, where i represents each possible word. Then, we evaluate the Shannon entropy of the ordinal probabilities, known as the permutation entropy 29 :

S(D) = − Σ_{i=1}^{D!} P(i) ln P(i).   (3)

The permutation entropy varies from S(D) = 0, if a single state j has P(j) = 1 (while P(i) = 0 ∀ i ≠ j), to S(D) = ln(D!), if all patterns are equally probable, P(i) = 1/D! ∀ i. The normalized permutation entropy used in this work is given by:

S̃ = S(D)/ln(D!).   (4)

To calculate the ordinal patterns, we have used the algorithm proposed by Parlitz and coworkers 32 . We have used D = 6 and no lag, i.e., D − 1 values overlap in the definition of two consecutive ordinal patterns. Therefore, we use as features for the ANN (see below) the D! = 720 probabilities of the ordinal patterns. For a robust estimation of these probabilities, a time series of length N ≫ D! is needed. However, as we show in Fig. 5, the algorithm returns meaningful values even for much shorter time series.
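The permutation entropy and its normalized version translate directly into code. This self-contained sketch (ours, not the Parlitz implementation cited above) computes the normalized permutation entropy with lag 1:

```python
from collections import Counter
from math import factorial, log

import numpy as np

def permutation_entropy(x, D, normalize=True):
    """Shannon entropy of the ordinal-pattern distribution (lag 1).

    When normalize=True the result is divided by ln(D!), so it lies in [0, 1].
    """
    patterns = [tuple(np.argsort(np.argsort(x[i:i + D])))
                for i in range(len(x) - D + 1)]
    n = len(patterns)
    S = -sum((c / n) * log(c / n) for c in Counter(patterns).values())
    return S / log(factorial(D)) if normalize else S

# A monotonic series yields a single pattern (entropy 0); white noise
# approaches the maximum, S~ close to 1.
S_ramp = permutation_entropy(np.arange(100.0), D=4)
S_white = permutation_entropy(np.random.default_rng(1).standard_normal(100_000), D=4)
```

These two limiting cases correspond to the extremes S(D) = 0 and S(D) = ln(D!) discussed above.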

Artificial Neural Network
Deep learning is part of a broader family of machine learning methods based on artificial neural networks (ANNs) 41 . In this work, we use the deep learning framework Keras 42 to compile and train an ANN. Since we want to regress the information of the features into a real value (a classical scalar regression problem 43 ), an appropriate option is a feed-forward ANN. The ANN is trained to evaluate the time-correlation coefficient using as features the 720 probabilities of the ordinal patterns. We connect the input layer to a single dense layer with 64 output units, which is connected to a final layer that regresses all the information of the inputs into a real number. Other combinations were tested, with different numbers of units (16, 512, 1024) and layers; we found that a single layer with 64 units was sufficient to accurately predict the α value. The ANN parameters and the compilation setup are given in Table 2. As explained in the Discussion, we have used the feed-forward ANN as a simple option for a "proof-of-concept" demonstration; other deep learning/machine learning methods, or a different compilation setup, may give better results depending on the type of data analyzed. The training stage of the ANN is performed using a dataset of 50 000 flicker noise time series with N = 2^20 points, where the parameter α of each time series is randomly chosen in −1 ≤ α ≤ 3 (see Section Datasets for details). We separate the dataset into a training dataset (L_train = 40 000) and a test dataset (L_test = 10 000). To quantify the error between the output of the ANN and the target α, we use the mean absolute error E = (1/L) Σ_{ℓ=1}^{L} |α_e − α|, where L is the number of samples. Once the training stage is concluded, the parameters of the ANN are fixed. We then apply the ANN to the test dataset, and the error in the evaluation of α_e with respect to α is E(L_test) ≈ 0.01.
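The architecture described above (720 ordinal-probability inputs, one dense layer of 64 units, and a single linear output) can be sketched with Keras as follows. This is our reconstruction from the text, not the authors' published script; details such as the hidden-layer activation and the optimizer are assumptions:

```python
import numpy as np
from tensorflow import keras

N_FEATURES = 720  # D! ordinal probabilities for D = 6

model = keras.Sequential([
    keras.Input(shape=(N_FEATURES,)),
    keras.layers.Dense(64, activation="relu"),  # hidden layer; activation assumed
    keras.layers.Dense(1),                      # linear output: predicted alpha
])
# Tracking the mean absolute error mirrors the error measure E used in the text.
model.compile(optimizer="adam", loss="mse", metrics=["mae"])
```

Training would then call `model.fit` on the 40 000 ordinal-probability vectors and their α targets; the exact hyperparameters are given in Table 2 of the paper.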
Stochastic processes

We also analyze stochastic time series: fractional Brownian motion (fBm) and fractional Gaussian noise (fGn), whose temporal correlations are characterized by the Hurst exponent H 47 ; if H > 0.5 (H < 0.5) the fGn time series exhibits long-memory (short-memory). The Hurst index is related to the α of flicker noise: for a fBm stochastic process, α = 2H + 1 and 1 < α < 3; for fGn, α = 2H − 1 and −1 < α < 1 47 .
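The conversions between H and α quoted above are one-liners; a small helper (the function name is ours) makes them explicit:

```python
def alpha_from_hurst(H, process):
    """Map the Hurst exponent H (0 < H < 1) to the flicker-noise exponent alpha.

    fBm: alpha = 2H + 1, so 1 < alpha < 3.
    fGn: alpha = 2H - 1, so -1 < alpha < 1.
    """
    if process == "fBm":
        return 2.0 * H + 1.0
    if process == "fGn":
        return 2.0 * H - 1.0
    raise ValueError("process must be 'fBm' or 'fGn'")
```

For example, classic Brownian motion (fBm with H = 0.5) gives alpha = 2, the value the fBm signals cluster around in Fig. 3.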

Chaotic systems
In this paper, we analyze time series generated by five chaotic systems: 1) The generalized Bernoulli chaotic map, also known as the β x map, described by x_{n+1} = β x_n mod 1 (Eq. 6), where β controls the dynamical characteristics of the map. Throughout this paper we use β = 2, which leads to a chaotic signal 1 .
2) The well-known logistic map 1 , x_{n+1} = r x_n (1 − x_n) (Eq. 7); we use r = 4 to obtain a chaotic signal 1 .
3) The Schuster map 48 , which exhibits intermittent signals with a power spectrum P(f) ∼ 1/f^z. It is defined as x_{n+1} = (x_n + x_n^z) mod 1 (Eq. 8), where we use z = 0.5. 4) The skew tent map, defined as x_{n+1} = x_n/ω for 0 ≤ x_n < ω, and x_{n+1} = (1 − x_n)/(1 − ω) for ω ≤ x_n ≤ 1. Here, we use ω = 0.1847 in order to obtain a chaotic signal 3 . 5) The also well-known Lorenz system, defined as ẋ = σ(y − x), ẏ = x(R − z) − y, ż = xy − bz, with parameters σ = 16, R = 45.92, and b = 4, which lead to chaotic motion 49 . We analyze the time series of consecutive maxima of the x variable.
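The four maps can be iterated with a generic helper (a sketch; parameter values follow the text, and the Schuster exponent z is left as a free parameter since the text quotes both z = 0.5 and z = 1.5):

```python
import numpy as np

def iterate_map(f, x0, n, discard=100):
    """Iterate x_{n+1} = f(x_n), discarding an initial transient."""
    x = x0
    for _ in range(discard):
        x = f(x)
    out = np.empty(n)
    for i in range(n):
        out[i] = x
        x = f(x)
    return out

# Note: in double precision the doubling map collapses to 0 after ~50
# iterations, a well-known numerical artifact of the beta-x map with beta = 2.
bernoulli = lambda x: (2.0 * x) % 1.0                       # beta-x map, beta = 2
logistic  = lambda x: 4.0 * x * (1.0 - x)                   # r = 4
schuster  = lambda x, z=1.5: (x + x ** z) % 1.0             # intermittent map
skew_tent = lambda x, w=0.1847: x / w if x < w else (1.0 - x) / (1.0 - w)

traj = iterate_map(logistic, 0.4, 1000)
```

Feeding these trajectories to the ordinal/PE pipeline reproduces the chaotic branch of Fig. 3.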

Empirical datasets
We test our methodology with empirical datasets recorded from diverse chaotic or stochastic systems. Additional information on the datasets can be found in Table 3. These are: Dataset E-I: Data recorded from an inductorless Chua's circuit constructed as in Ref. 50 . The circuit was set up and the data was kindly sent to us by Vandertone Santos Machado 51 . The voltages across the capacitors display chaotic oscillations. To detect this chaoticity we compute the maxima of the voltage across the first capacitor, C_1.
Dataset E-II: Fluctuations of a chaotic laser, approximately described by three coupled nonlinear differential equations 36 . To detect the chaoticity of the laser, we compute the maxima of the time series. The data is available in 52 .
Dataset E-III: Light intensity of a variable white dwarf star (PG1159-035) 36 , with 17 time series (segments). These variations may be caused by an intrinsic change in the emitted light (a superposition of multiple independent spherical harmonics 36 ), or by an object partially blocking the light as seen from Earth. The fluctuations in the intensity of the star have been observed to result in a noisy signal 36 . The data is open and freely available in 53 .
Dataset E-IV: Three time series of sunspot numbers for the period 1976-2013 38 ; the daily sunspot numbers display a noisy "pseudo-sinusoidal" behavior. It is accepted that magnetic cycles in the Sun are generated by a solar dynamo produced through nonlinear interactions between solar plasmas and magnetic fields 54,55 . However, the fluctuations in the period of the cycles are still difficult to understand 56 . This type of data has been analyzed in 38 , where its stochastic fluctuations were found to have a Hurst exponent H ≈ 1, meaning the data carries memory. The data can be found at 57-59 .
Dataset E-V: Five RR-interval time series from healthy subjects. Each time series has ∼ 100 000 RR intervals (the signals were recorded using continuous ambulatory electrocardiograms during 24 hours). It is still debated whether heart rate variability is chaotic or stochastic 9 : while some studies suggest that heart rate variability is a stochastic process 9,60,61 , several chaos-detection analyses have identified it as a chaotic signal 9,62 . The dataset is open and freely available in 63 .
Dataset E-VI: Fractal dynamics of the human gait, and the maturation of the gait dynamics. The stride-interval variability can exhibit random fluctuations with long-range power-law correlations, as observed in 39 . Moreover, this time correlation tends to decrease in older children 39,40 . The analyzed dataset is separated into four groups according to the subjects' ages: Group No. 1 has data for n = 14 subjects 3 to 5 years old; Group No. 2, n = 21 subjects 6 to 8 years old; Group No. 3, n = 15 subjects 10 to 13 years old; Group No. 4, n = 10 subjects 18 to 29 years old 39 .