A simple method for detecting chaos in nature

Chaos, or exponential sensitivity to small perturbations, appears everywhere in nature. Moreover, chaos is predicted to play diverse functional roles in living systems. A method for detecting chaos from empirical measurements should therefore be a key component of the biologist's toolkit. But classic chaos-detection tools are highly sensitive to measurement noise and break down for common edge cases, making it difficult to detect chaos in domains, like biology, where measurements are noisy. However, newer tools promise to overcome these limitations. Here, we combine several such tools into an automated processing pipeline, and show that our pipeline can detect the presence (or absence) of chaos in noisy recordings, even for difficult edge cases. As a first-pass application of our pipeline, we show that heart rate variability is not chaotic, as some have proposed, but instead reflects a stochastic process in both health and disease. Our tool is easy to use and freely available.

Dynamical systems can be broadly categorized as linear or nonlinear, and as deterministic or stochastic. Both chaotic and periodic processes are nonlinear and deterministic, but chaotic processes have a positive largest Lyapunov exponent (meaning that initially similar system states diverge exponentially fast), whereas periodic processes have a negative largest Lyapunov exponent.

Supplementary Figure 5: We found that inclusion of a noise term σ in the modified 0-1 test for chaos (see Eq. 3 in Methods) can lead to inaccurate results for signals with a very low amplitude/standard deviation. To illustrate this effect, we ran the modified 0-1 test on a single simulation, with either 0% or 40% measurement noise, of each deterministic process in Tables 1-2. To vary standard deviation, we multiplied each simulation by a constant, such that its standard deviation was fixed at a given value, out of 100 values logarithmically spaced between 10⁻⁴ and 10¹. We then calculated the K-statistic (i.e., the output of the modified 0-1 test) for both σ = 0 (A-B) and σ = 0.5 (C-D). Here, we plot the mean K-statistic across all chaotic systems (blue), as well as the mean K-statistic across all periodic systems (red), as a function of signal standard deviation. For σ = 0, K was invariant to the standard deviation of the signal, for either 0% measurement noise (A) or 40% measurement noise (B). For σ = 0.5, on the other hand, K fell to zero for both periodic and chaotic signals as their standard deviation approached zero, for both 0% measurement noise (C) and 40% measurement noise (D). This is likely because the inclusion of a non-zero noise term in Eq. 3 of the 0-1 test (see Methods) overwhelms the mean squared displacement of signals with very small standard deviations. Note that for σ = 0.5, K asymptotes at around a standard deviation of 0.5 for both levels of measurement noise. As such, we modified the 0-1 test to normalize the standard deviation of a test signal to 0.5 (see Methods).
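To make this normalization step concrete, here is a minimal Python sketch (our own illustration, not the algorithm's actual Matlab code; the function name `normalize_std` is ours) that rescales a signal so its standard deviation matches the σ = 0.5 noise level before the 0-1 test is applied:

```python
import numpy as np

def normalize_std(x, target_std=0.5):
    """Rescale a signal so its standard deviation equals target_std.

    With a noise term sigma = 0.5 in the modified 0-1 test, the
    K-statistic is stable for signals whose standard deviation is near
    0.5, so the signal is rescaled to that value before testing.
    """
    x = np.asarray(x, dtype=float)
    sd = x.std()
    if sd == 0:
        return x  # constant signal: nothing to rescale
    return x * (target_std / sd)
```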
Supplementary Figure 6: Though all analyses in the main body of the paper were performed on time-series with 10,000 time-points, we also considered how to optimize the algorithm's performance for shorter time-series. In particular, it is known that the K-statistic of the 0-1 test approaches 1 for chaotic systems and approaches 0 for periodic systems as the length of a time-series is increased, but that it can yield intermediate, and therefore ambiguous, results for shorter time-series. As such, we ran the chaos-testing portion of our algorithm's pipeline (which consists of de-noising the signal, normalizing its standard deviation, applying the 0-1 test, and then classifying the system as either periodic or chaotic based on some cutoff of the outputted K-statistic) on sub-samples of all non-oversampled deterministic datasets in Tables 1 and 2, including all levels of measurement noise. We did not include signals classified as oversampled in this analysis, as the downsampling step of our algorithm would shorten the time-series, which would warp the relationship between the optimal K-statistic cutoff and time-series length. We ran the pipeline on samples ranging from 1,000 to 9,000 time-points in length, in intervals of 1,000. For each time-series length, we calculated the F1 score of different K-statistic cutoffs, ranging from K=0.005 to K=0.995 in steps of 0.005. Here, we plot the smoothed vector of optimal K-statistic cutoffs for each time-series length. As expected, given that the K-statistic approaches 1 for chaotic signals as time-series length is increased, the optimal cutoff also increased as a function of time-series length, asymptoting near K=0.985 for longer time-series. Thus, if no cutoff is provided to the algorithm, it will automatically pick a cutoff based on the smoothing spline fit plotted here in red. If the predicted optimal cutoff is greater than 0.99, the algorithm picks a cutoff of 0.99 (since the K-statistic is upper-bounded by 1). With this automated cutoff selection, we have confirmed that the Chaos Decision Tree Algorithm performs at very high accuracy for time-series with 1,000 points (Supplementary Table 16), 5,000 points (Supplementary Table 17), and 10,000 points (Tables 1-3).

Our surrogate-based stochasticity test performed well across the processes analyzed in this paper (Tables 2-3), but other studies emphasize the importance of first testing for stationarity before using such methods 1,2 (due to the inherent stationarity of Fourier-based surrogates). A stationary signal is one whose unconditional joint probability distribution is time-invariant. We here assess the performance of a number of stationarity tests (see Patterson 3,4 for a thorough review of such tests and their relative strengths/weaknesses). Datasets analyzed include all stationary processes in Tables 1-2 (including bounded random walks, autoregressive processes with moving averages, and nonlinear stochastic processes, all of which are difficult edge cases for such tests 3,4), as well as all unit root processes in Table 1.

For all surrogate algorithms, the data are first pre-processed such that the start and end points of the data and their first derivatives are matched as closely as possible. We used permutation entropy as our test statistic, so that a signal was classified as stochastic if its permutation entropy fell within the distribution of permutation entropies calculated from the 500 surrogate time-series. We tested the effect of Schreiber de-noising ("Denoised") vs. no de-noising ("Raw") on test accuracy. We further tested the effect of normality transformation using the Box-Cox method ("Transformed (Box-Cox)"), following the recommendation of Chan and Tong 11, or using a rank-based inverse normal transformation ("Transformed (INT)"), vs. no normality transformation ("Non-transformed"). Moreover, we tested the benefit of excluding signals classified as non-stationary by Breitung's Variance Ratio test.

Here, we tested whether generating 1,000 surrogate time-series from all raw time-series would yield higher performance in discriminating stochastic from deterministic processes. Though minor, we did observe higher F1 scores with a larger number of surrogates, and so the Chaos Decision Tree Algorithm uses 1,000 surrogates in its stochasticity test. We note that it is possible, and perhaps likely, that including even more surrogate signals in the stochasticity test would yield even more accurate results, and so the algorithm also allows the user to specify how many surrogates to generate.
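The overall shape of the stochasticity test can be sketched as a generic surrogate loop. In this minimal Python illustration, `surrogate_fn` and `stat_fn` stand in for the paper's choices (AAFT/CPP surrogates and permutation entropy; see the sketches accompanying the glossary below), and the simple range-based decision rule is our reading of "falls within the distribution", not necessarily the paper's exact criterion:

```python
import numpy as np

def surrogate_stochasticity_test(x, surrogate_fn, stat_fn, n_surr=1000):
    """Generic surrogate test: compute a test statistic for the signal
    and for n_surr surrogates (1,000 by default, matching the
    algorithm's default), and call the signal stochastic if its
    statistic falls within the surrogate distribution rather than
    below it (deterministic signals are expected to be less entropic
    than their stochastic surrogates)."""
    stat_x = stat_fn(x)
    stats_surr = np.array([stat_fn(surrogate_fn(x)) for _ in range(n_surr)])
    is_stochastic = stats_surr.min() <= stat_x  # not below all surrogates
    return is_stochastic, stat_x, stats_surr
```

For example, `surrogate_stochasticity_test(x, aaft_surrogate, permutation_entropy)` would wire in the two sketches given later in this document.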
Supplementary Table 4: To further test the relationship between stationarity and the accuracy of our surrogate-based stochasticity test, we broke down the performance of both Breitung's Variance Ratio test and our surrogate-based stochasticity test (which uses permutation entropy and a combination of Amplitude Adjusted Fourier Transform and Cyclic Phase Permutation surrogates) on all non-stationary processes analyzed in this paper, as well as on stationary processes that are classically difficult to distinguish from non-stationary processes 3,4. These included bounded random walks (which, though globally stationary, have local unit roots) and autoregressive processes with a moving average component. We found that our surrogate-based stochasticity test performed with near-perfect accuracy for linear stochastic non-stationary processes, i.e., random walks, trended random walks, and a cyclostationary autoregressive process, as well as for stationary linear stochastic processes with moving averages (which can be difficult to distinguish from non-stationary processes 3,4). Performance was slightly worse for nonlinear stochastic processes, even though those processes were stationary (these processes were bounded random walks, the Freitas map, and the sine map). This suggests that non-stationarity may not affect the performance of surrogate-based stochasticity tests when such tests include non-Fourier-based surrogates, such as the Cyclic Phase Permutation surrogates used here, though this possibility should be investigated more systematically in future work. In light of this result, and the results reported in Supplementary Tables 2-3, the algorithm does not require that a signal pass a stationarity test before its surrogate-based stochasticity test is applied.

Supplementary Table 5: We tested the performance of three de-noising algorithms: a moving average filter (using Matlab's smooth.m function), Schreiber de-noising 16, and wavelet de-noising using an empirical Bayesian method with a Cauchy prior (using Matlab's wdenoise.m function).
For each system, we created 100 noise-free simulations, and added white noise to those simulations, the amplitude of which was 40% of the standard deviation of the original signals. We then applied the de-noising algorithms to the noise-contaminated signals, and calculated the Pearson correlation between the de-noised signals and the original noise-free signals. Shown: mean Pearson correlations across 100 datasets per system. Schreiber de-noising vastly outperformed the other two approaches, and so the Chaos Decision Tree Algorithm uses Schreiber de-noising automatically; the user can also specify that one of the other two de-noising methods be used.

Shown: mean K-statistics (Figure 4), with standard error, across 100 samples of three oversampled simulations. The oversampling statistic η, whose mean across 100 samples is reported on the left, is the difference between the global maximum and global minimum of the signal divided by the mean absolute difference between consecutive time-points in the data 17. If η > 10, then the downsampling approach simply downsamples the data until η ≤ 10 or until there are fewer than 100 time-points left in the downsampled signal; a code sketch of this rule follows below. The 0-1 test was performed on each dataset without downsampling, with downsampling, and with only the local minima and maxima of the signal (following an alternative approach suggested by Eyébé Fouda and colleagues 18 to improve 0-1 test performance, which a user can select instead of downsampling). Note that for noise-contaminated oversampled data, only Schreiber de-noising followed by downsampling brings the K-statistic within the ranges expected for periodic or chaotic systems (highlighted).

Supplementary Table 14: Spearman correlations between largest Lyapunov exponents and permutation entropy calculated from raw data (i.e., data that were not de-noised and downsampled). While the correlations are still strong and significant for the discrete-time logistic and Hénon maps, performance is very poor for the continuous-time Lorenz system and mean-field cortical model. This is to be expected, because permutation entropy is equivalent to Kolmogorov-Sinai entropy (which is upper-bounded by a system's positive Lyapunov exponents) for discrete systems 24, and downsampling is effectively a discrete mapping of a continuous process. This is the same reason that de-noising and downsampling improve performance of the 0-1 test for continuous systems (Supplementary Table 6). Note that largest Lyapunov exponents in the cortical model are rough approximations (see Methods). *** p<0.001 after Bonferroni-correcting for multiple comparisons to the same set of ground-truth largest Lyapunov exponents.
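The oversampling statistic and downsampling rule described above are fully specified by the text and can be sketched in a few lines of Python. The factor-of-2 decimation below is our assumption; the text specifies only that the data are downsampled until η ≤ 10 or fewer than 100 points remain:

```python
import numpy as np

def oversampling_eta(x):
    """Oversampling statistic: (global max - global min) divided by the
    mean absolute difference between consecutive time-points."""
    x = np.asarray(x, dtype=float)
    return (x.max() - x.min()) / np.mean(np.abs(np.diff(x)))

def downsample_if_oversampled(x, eta_max=10.0, min_len=100):
    """Repeatedly halve the sampling rate (keep every 2nd point) while
    the signal looks oversampled (eta > eta_max) and at least min_len
    points would remain after the next halving."""
    x = np.asarray(x, dtype=float)
    while oversampling_eta(x) > eta_max and len(x) // 2 >= min_len:
        x = x[::2]
    return x
```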

The ARMA results in Table 2 are for random values of the moving average parameter θ. We sought to further test whether the moving average parameter θ had any systematic effect on our algorithm's performance. Here, we set the parameter φ to 0.99, as we did in Supplementary Tables 1-4 and in Table 2, and tested our algorithm on ARMA(1) processes with four different values of θ: -0.5, 0, 0.5, and 0.9. 10,000 time-points were generated for each simulation. Performance was high for all parameters.

Measurement noise. All empirical recordings are contaminated by some level of measurement or "observational" noise, which is noise that is not intrinsic to the system being measured. In other words, a system could be entirely deterministic, but, because of measurement error, a signal recorded from that system could still be noisy. To simulate such measurement error, we added random white noise of varying amplitudes to the datasets in Tables 1-5 and Supplementary Tables 1-4 and 8-19.
Dynamic noise. Some systems have noise built into their dynamics. Such dynamic or "intrinsic" noise can be negligible, in that it is washed out by a system's deterministic components (and the system can therefore be modeled, in theory, by deterministic equations). Other systems, on the other hand, are significantly affected by intrinsic noise. Such systems are considered stochastic: even from identical initial conditions, they will evolve differently over time, because their dynamics have some intrinsic randomness. In single-neuron dynamics, for example, there may be inherent stochasticity because of the probabilistic gating of voltage-dependent ion channels 25, though such stochastic events may be "washed out" by predominantly deterministic processes at larger scales. Several such systems were analyzed in this paper, including the noise-driven sine map, the Freitas map, bounded random walks, random walks, a cyclostationary autoregressive process, an autoregressive moving-average process, a random multivariate autoregressive process, colored noise, the stochastic Lorenz system, the stochastic Rössler system, the North Atlantic Oscillation index, and essential and Parkinson's tremors. In general, it is difficult to distinguish these stochastic (i.e., intrinsically noisy) processes from deterministic processes that are contaminated by measurement noise; it is also difficult to distinguish either case from deterministic chaotic processes (see below for definition).
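The distinction between the two noise types can be illustrated with the logistic map, one of the discrete-time systems analyzed in this paper. In this Python sketch, measurement noise corrupts only the recorded signal, while dynamic noise enters the update rule itself; the specific noise levels and the clipping used to keep the state in [0, 1] are our own choices for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

def logistic_measurement_noise(n, r=3.9, x0=0.4, noise_frac=0.4):
    """Deterministic logistic map; white measurement noise (here at
    noise_frac times the clean signal's standard deviation) is added
    only to the recording, not to the dynamics."""
    x = np.empty(n)
    x[0] = x0
    for t in range(n - 1):
        x[t + 1] = r * x[t] * (1 - x[t])
    return x + rng.normal(0.0, noise_frac * x.std(), n)

def logistic_dynamic_noise(n, r=3.9, x0=0.4, eps=0.01):
    """Logistic map with intrinsic (dynamic) noise: the perturbation
    enters the update rule, so it shapes the future trajectory.
    Clipping to [0, 1] keeps the state bounded (our choice)."""
    x = np.empty(n)
    x[0] = x0
    for t in range(n - 1):
        x[t + 1] = np.clip(r * x[t] * (1 - x[t]) + rng.normal(0.0, eps), 0.0, 1.0)
    return x
```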
Linear. The state of a linear process is directly proportional to its inputs or previous state (e.g., y = ax).

Nonlinear. The state of a nonlinear process is not directly proportional to its inputs or previous state (e.g., y = ax²).
Largest Lyapunov exponent. The largest Lyapunov exponent of a system quantifies the largest rate of divergence of initially infinitesimally close trajectories through phase space (see below for the definition of phase space).
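For a one-dimensional map, the largest Lyapunov exponent can be computed directly as the long-run average of the log absolute derivative along a trajectory. A minimal Python sketch for the logistic map (our example, not the paper's estimation method; for r = 4 the exact value is ln 2 ≈ 0.693, which is positive and hence chaotic):

```python
import numpy as np

def logistic_lle(r=4.0, x0=0.3, n=100_000, burn=1_000):
    """Largest Lyapunov exponent of the logistic map x -> r*x*(1-x),
    computed as the trajectory average of ln|f'(x_t)|, where
    f'(x) = r*(1 - 2x)."""
    x = x0
    for _ in range(burn):          # discard the transient
        x = r * x * (1 - x)
    acc = 0.0
    for _ in range(n):
        acc += np.log(abs(r * (1 - 2 * x)))
        x = r * x * (1 - x)
    return acc / n
```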

Chaotic. A system is chaotic if it is bounded, deterministic, nonlinear, and has a positive largest Lyapunov exponent, meaning that initially similar phase space trajectories diverge exponentially fast.

Periodic. A system is periodic if it is deterministic, nonlinear, and has a negative largest Lyapunov exponent, meaning that initially similar phase space trajectories remain close.
Quasiperiodic. The dynamics of a quasiperiodic system exhibit regular cycles like those of a periodic system; but, unlike a purely periodic system, which stably revisits the same system states, a quasiperiodic system returns to states that are similar but not identical to previous states. Quasiperiodic systems are also not chaotic, because they are not sensitive to initial conditions (i.e., their largest Lyapunov exponent is zero rather than positive).
Period-doubled. In many dynamical systems, modulation of a system parameter can lead to an abrupt change in the system's dynamics, such that it oscillates at twice its original period. These systems are periodic (i.e., they have a negative largest Lyapunov exponent).
Strange non-chaotic. A strange non-chaotic system has a strange (i.e. fractal) phase space attractor like a chaotic system, but a negative or zero largest Lyapunov exponent. It is generally difficult to experimentally distinguish strange non-chaotic systems from chaotic systems 26 .
Hyperchaotic. A hyperchaotic system is a deterministic, nonlinear system with more than one positive Lyapunov exponent. These systems are generally difficult to distinguish from noise 27 .
Colored noise. Colored noise refers to stochastic processes with a non-uniform power spectrum (i.e. different levels of power at different frequencies). It is difficult to distinguish colored noise from chaos 27,28 .
Degree of chaos. The magnitude of a system's largest Lyapunov exponent quantifies its degree of chaos. Higher largest Lyapunov exponents indicate higher degrees of chaos, because they indicate faster rates of divergence in phase space.
Nonlinear stochastic. A nonlinear stochastic system is a nonlinear system with randomness built into its evolution over time, making such systems difficult to distinguish from chaotic systems 27.

Stationarity. A stationary process is one whose joint probability distribution is time-invariant; in other words, for a stationary process, statistical properties like mean and variance do not fluctuate over time.
Attractor. An attractor is the orbit in phase space toward which a deterministic system tends to evolve. The attractors of chaotic systems are called "strange attractors" because they have a fractal structure.
Phase space. A space representing all possible states of a system. A single point in phase space corresponds to a single state of the system. For example, for one particle, a single point in phase space determines that particle's location and momentum. A dynamical system produces, in general, trajectories in its phase space, i.e., the system's state changes with time.
Schreiber de-noising algorithm. Almost two decades ago, Schreiber introduced a simple nonlinear noise-reduction algorithm 16 , which replaces each point in a time-series with the average value of that point's "neighborhood" in phase space (see above for definition of phase space). The algorithm first uses delay coordinate embedding to create a map that is topologically equivalent to a system's ground-truth phase space attractor (a very common procedure in nonlinear time-series analysis 29 ). Each point's neighborhood in phase space is defined by the number of steps k in that point's past and the number of steps l in that point's future that are used to construct embedding vectors, as well as the radius r of that point's neighborhood in phase space. The parameters k and l are set to 1, and the radius r is set to the standard deviation of the time-series.
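A minimal Python sketch in the spirit of this algorithm (our own illustration, not the paper's exact implementation; the brute-force O(N²) neighbour search is for clarity only):

```python
import numpy as np

def schreiber_denoise(x, k=1, l=1, r=None, passes=1):
    """Simple nonlinear noise reduction after Schreiber (1993): embed
    each point with its k past and l future samples, find all embedding
    vectors within (max-norm) radius r, and replace the point with the
    mean of the central coordinates of those neighbours. Defaults
    follow the text: k = l = 1 and r = one standard deviation of the
    series."""
    x = np.asarray(x, dtype=float)
    if r is None:
        r = x.std()
    for _ in range(passes):
        n = len(x)
        idx = np.arange(k, n - l)  # times with a full neighbourhood
        # embedding vectors: columns are offsets -k .. +l around each time
        emb = np.stack([x[idx + j] for j in range(-k, l + 1)], axis=1)
        y = x.copy()
        for i, t in enumerate(idx):
            # neighbours within radius r in the max-norm (self included)
            near = np.max(np.abs(emb - emb[i]), axis=1) < r
            y[t] = emb[near, k].mean()  # column k is the central coordinate
        x = y
    return x
```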
Surrogate testing. A common approach for testing if a given time-series reflects a deterministic process is to create surrogates of that time-series, which share some key features with the original time series, such as its power spectrum and amplitude distribution, but are otherwise stochastic 1, 12 . A "test statistic" is then calculated for both the original time-series and for the set of surrogate datasets, and if the value of the test statistic for the original time-series lies outside the distribution of values for the surrogate datasets, then the original time-series likely reflects a deterministic process 1, 12 . We follow Zunino and Kulp 30 and use permutation entropy as our test statistic (see below). We further tested a range of surrogate data generation algorithms, and picked a combination of Amplitude Adjusted Fourier Transform (AAFT) surrogates 12 and Cyclic Phase Permutation (CPP) surrogates 14 , which led to the highest performance in detecting signal stochasticity (Supplementary Tables 2-3).
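Of the surrogate types mentioned, AAFT surrogates are straightforward to sketch. The Python illustration below follows the standard three-step recipe (gaussianize, phase-randomize, rescale) and only approximately preserves the power spectrum; the Cyclic Phase Permutation surrogates used alongside AAFT in the pipeline are not shown here:

```python
import numpy as np

def aaft_surrogate(x, rng=None):
    """Amplitude Adjusted Fourier Transform (AAFT) surrogate: a
    stochastic series sharing the original's amplitude distribution
    and, approximately, its power spectrum."""
    rng = np.random.default_rng(rng)
    x = np.asarray(x, dtype=float)
    n = len(x)
    ranks_x = np.argsort(np.argsort(x))
    # 1) gaussianize: reorder Gaussian noise to follow x's rank order
    g = np.sort(rng.normal(size=n))[ranks_x]
    # 2) phase-randomize the gaussianized series
    f = np.fft.rfft(g)
    phases = rng.uniform(0.0, 2 * np.pi, len(f))
    phases[0] = 0.0                      # keep the mean (DC bin) untouched
    if n % 2 == 0:
        phases[-1] = 0.0                 # keep the Nyquist bin real
    g_rand = np.fft.irfft(np.abs(f) * np.exp(1j * phases), n)
    # 3) rescale: give x's sorted values the rank order of the new series
    return np.sort(x)[np.argsort(np.argsort(g_rand))]
```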
0-1 test for chaos. Gottwald and Melbourne's 0-1 test for chaos uses a given signal to drive a simple 2-dimensional system, and calculates the growth rate K of the mean square displacement of that system. K will approach 0 for periodic systems and will approach 1 for chaotic systems 26,31-34. See Methods for more details. While the test has been used for some physics and engineering applications, it has seen only very limited application to biology 35.
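A compact Python sketch of the noise-robust ("modified") variant of the test, as published by Gottwald and Melbourne; the exact form of the noise term in the paper's Eq. 3 may differ in detail:

```python
import numpy as np

def zero_one_test(phi, sigma=0.5, n_c=100, rng=None):
    """0-1 test for chaos: the observable phi drives a 2-D system
    (p, q); K is the median, over random frequencies c, of the
    correlation between time and the oscillation-corrected mean
    square displacement. K near 1 suggests chaos, K near 0 suggests
    periodicity."""
    rng = np.random.default_rng(rng)
    phi = np.asarray(phi, dtype=float)
    N = len(phi)
    ncut = N // 10                       # displacements up to N/10 lags
    j = np.arange(1, N + 1)
    t = np.arange(1, ncut + 1)
    Ks = []
    for c in rng.uniform(np.pi / 5, 4 * np.pi / 5, n_c):
        p = np.cumsum(phi * np.cos(j * c))
        q = np.cumsum(phi * np.sin(j * c))
        M = np.array([np.mean((p[n:] - p[:-n]) ** 2 +
                              (q[n:] - q[:-n]) ** 2) for n in t])
        # subtract the oscillatory term, add the noise term sigma*eta
        Vosc = phi.mean() ** 2 * (1 - np.cos(t * c)) / (1 - np.cos(c))
        D = M - Vosc + sigma * (rng.random(ncut) - 0.5)
        Ks.append(np.corrcoef(t, D)[0, 1])
    return np.median(Ks)
```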
Receiver operating characteristic (ROC) curve. An ROC curve assesses the accuracy of a binary classifier by plotting its true positive rate vs. false positive rate for different discrimination thresholds (in the case of the 0-1 test, the threshold in question is the cutoff determining which K-statistic values are classified as chaotic vs. periodic). The more accurate a classifier is across discrimination thresholds, the larger its area under the curve in an ROC plot will be.
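For concreteness, a small Python sketch that traces an ROC curve by sweeping a threshold over K-statistics and accumulating true and false positive rates (a plain illustration, not the paper's evaluation code):

```python
import numpy as np

def roc_curve(scores, labels):
    """Manual ROC: sweep a descending threshold over the scores (here,
    K-statistics), classifying score >= threshold as 'chaotic'.
    Assumes both classes are present; tied scores are not merged,
    which a production implementation would handle. Returns false
    positive rates, true positive rates, and the area under the
    curve (trapezoid rule)."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    order = np.argsort(-scores)                  # descending thresholds
    tpr = np.concatenate(([0.0], np.cumsum(labels[order]) / labels.sum()))
    fpr = np.concatenate(([0.0], np.cumsum(~labels[order]) / (~labels).sum()))
    auc = np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2)
    return fpr, tpr, auc
```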
Permutation entropy. Permutation entropy is an extremely quick-to-compute and noise-robust measure of a signal's complexity 36. Following Zunino and Kulp 30, the Chaos Decision Tree Algorithm uses permutation entropy to test if a time-series is deterministic or stochastic, by comparing the permutation entropy of a signal to the permutation entropies of its surrogates (Figure 1, Methods): if a given time-series reflects a predominantly deterministic process, then it will have a lower permutation entropy than its surrogates, since surrogates are inherently stochastic and therefore of higher entropy than a matching deterministic (even deterministic chaotic) process. It is also for this reason that permutation entropy tracks degree of chaos 36,37, as stronger chaos means less predictability, and therefore more entropy. More formally, we should in general expect a close relationship between permutation entropy and systems' degree of chaos, because permutation entropy is equivalent to Kolmogorov-Sinai entropy for a broad class of discrete-time dynamical systems 24,38-40. Kolmogorov-Sinai entropy is a measure of the rate at which a dynamical system produces information, and is upper-bounded by a system's positive Lyapunov exponents.
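A minimal Python sketch of normalized permutation entropy as defined by Bandt and Pompe; the order and delay defaults below are common choices, not necessarily the settings given in the paper's Methods:

```python
import numpy as np
from math import factorial, log

def permutation_entropy(x, order=3, delay=1, normalize=True):
    """Bandt-Pompe permutation entropy: the Shannon entropy of the
    distribution of ordinal patterns of `order` consecutive (delayed)
    samples, optionally normalized by log(order!) so that white noise
    approaches 1 and perfectly predictable signals approach 0."""
    x = np.asarray(x, dtype=float)
    n = len(x) - (order - 1) * delay
    # rows are delay-embedded windows; argsort gives each ordinal pattern
    windows = np.stack([x[i * delay: i * delay + n] for i in range(order)],
                       axis=1)
    patterns = np.argsort(windows, axis=1)
    _, counts = np.unique(patterns, axis=0, return_counts=True)
    p = counts / counts.sum()
    h = -np.sum(p * np.log(p))
    return h / log(factorial(order)) if normalize else h
```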