Introduction

One of the challenges underpinning the study of many real-world systems is that the mechanisms governing their behaviour are not easily accessible and can only be observed indirectly through their visible dynamics; to draw a parallel with genetics, the phenotype must be used to infer the genotype. One of the most basic questions that can be asked is whether observed time series are generated by stochastic processes or are instead the result of deterministic dynamics. In the latter case, models can be used to forecast the evolution of the system with arbitrary precision, something not possible for a stochastic process, and ultimately to completely describe the system under study. On the other hand, a stochastic dynamics may indicate that the observable under study is not the most appropriate one. Answering such a question can nevertheless be a daunting task. First of all, the extracted time series are usually a combination of both aspects, i.e. the observations we perform on a system are most probably polluted by some kind of observational or systemic noise. Second, real-world systems can only be described by finite, and usually short, time series: the answer may then vary according to the methods and parameters used1,2,3. Third, it has long been known that some deterministic systems, namely chaotic ones, produce time series that are (almost) indistinguishable from noise. Chaotic systems can be defined as a class of dynamical systems displaying three properties: sensitivity to initial conditions, topological transitivity, and the presence of dense periodic orbits4. Finally, noise and chaos are two intermingled concepts, as for instance the former can induce the latter5.

Given a time series, determining its stochastic or deterministic nature is in theory simple and involves measuring two quantities: on one hand, the maximum Lyapunov exponent (λ), describing the sensitivity to initial conditions; on the other, the Kolmogorov–Sinai entropy per unit time (hKS), i.e. the amount of information the system is generating. Yet, these two measures can only be calculated on infinitely long time series; the answer is thus not readily available for real, and hence finite, observations. Not surprisingly, many other methods have been proposed, e.g. based on finite-size Lyapunov exponents6, power spectra7, non-linear forecasts8,9,10, recurrence networks11,12 or visibility graphs13,14.

A promising approach is the one yielded by the concept of permutation patterns, introduced in 2002 by Bandt and Pompe15. Given a time series xt, this is partitioned into (usually, but not always, overlapping) subsets of d consecutive values (xt, xt+τ, …, xt+(d−1)τ); d, i.e. the number of values composing each subset, is called the embedding dimension and is usually set between 3 and 7; τ is the embedding delay, i.e. the time between consecutive points, usually set to 1. A permutation pattern (also called order or ordinal pattern interchangeably in the literature) πt is then associated with each subset, i.e. the permutation that has to be applied to the subset to obtain an increasing set of values—see Fig. 1a. Once this process is repeated for all subsets of values, statistics can be calculated over the resulting permutation pattern distribution, to obtain e.g. entropies15,16,17 or irreversibility18,19,20, or to discriminate the nature of the underlying systems21,22,23,24,25. When compared to other time series metrics, this approach has several major advantages: it is almost parameter-free, in that symbols (i.e. the permutation patterns) naturally emerge from the time series; it has low sensitivity to noise and is invariant under monotonic transformations of the values; it is computationally inexpensive; and, possibly most importantly, it naturally takes into account the causal order of elements. The interested reader will find several reviews on the concept of permutation entropy26,27,28, and many works discussing specific aspects of it29,30,31,32,33,34, e.g. the choice of the best embedding dimension.
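As a minimal illustration of this mapping (the function name and the ranking convention of Fig. 1a are our own choices, not a standard library API):

```python
import numpy as np

def ordinal_patterns(x, d=3, tau=1):
    """Map a time series onto its sequence of ordinal patterns.

    Each subset (x[t], x[t+tau], ..., x[t+(d-1)*tau]) is replaced by the
    ranking of its values, following the convention of Fig. 1a.
    """
    x = np.asarray(x)
    n = len(x) - (d - 1) * tau
    patterns = []
    for t in range(n):
        window = x[t:t + (d - 1) * tau + 1:tau]
        # argsort of argsort yields each value's 0-based rank; +1 gives
        # rankings starting at 1, as in the text
        patterns.append(tuple(np.argsort(np.argsort(window)) + 1))
    return patterns

print(ordinal_patterns([4, 1, 3, 6, 5], d=3, tau=1))
# -> [(3, 1, 2), (1, 2, 3), (1, 3, 2)]
```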

Fig. 1: Mapping time series to ordinal patterns.

a Example illustrating the methodology of mapping a time series into a set of overlapping ordinal patterns, for an embedding dimension d = 3 and an embedding delay τ = 1. Each pattern is obtained by substituting the original values by their respective ranking. For instance, the second value is the smallest of the first three and is thus mapped to number 1 in the first pattern. Note that different symbols have been used to denote the main parameters used in permutation pattern analysis; b reports the symbols used in this work and common variants in the literature.

Given the success of permutation patterns and permutation entropy in characterising time series, a natural question is whether they can be used to assess their chaotic or stochastic nature. Many approaches leveraging this idea have been proposed, both using variants of the permutation entropy alone and by combining it with other time series metrics. In this review, we provide an in-depth analysis of the theoretical foundations and the applicability of such methods to time-discrete systems, organised into three categories: individual metrics based on the permutation pattern idea; combinations of metrics yielding plane representations; and pipelines, i.e. decision-tree-like algorithms35. These metrics are then compared using a set of synthetic time series, to assess both their discriminative performance and their computational cost. We further discuss some preliminaries on the use of such metrics, including their applicability to continuous systems and their associated theoretical limits. This review will therefore serve as a guide, both to the practitioner new to this topic who wants to analyse real-world time series and to the experienced researcher looking for new avenues of investigation.

Some preliminary practical considerations on evaluating permutation patterns

Before digging into the study of the individual algorithms based on permutation patterns, we here discuss three preliminary issues that can cause problems for the researcher new to this topic. Note that we assume the reader to already be familiar with the concept of permutation entropy; otherwise, we invite him/her to refer to the several comprehensive reviews available in the literature26,27,28.

  • Labelling of permutation patterns. There are two ways of naming permutation patterns, which have been used in the literature in an interchangeable way. Let us consider the case of the pattern associated with the values x = {8, 3, 6}. On one hand, the pattern may be defined as the sequence of ranking positions of the individual elements: in ascending order, the first element of the ranking would be 3, followed by 6 and 8; substituting each value by its respective ranking, one gets π = (3, 1, 2). On the other hand, the pattern can be created by combining the indexes of the elements giving the correct ascending order; in the previous example, the smallest element is the second, followed by the third and the first, and the resulting pattern is then π = (2, 3, 1). This difference in labelling does not affect any result, as what matters is the relative frequency of patterns—see the sketch after this list.

  • Naming convention. Parameters used in the calculation of permutation patterns and the corresponding results have been denoted by different symbols in the literature—see Fig. 1b for a list of the most common variants. Special attention should be paid to those cases in which the same symbol represents two different things, e.g. L being used for both the embedding delay36 and the embedding dimension37.

  • Equal values in the time series. When analysing real-world time series, it is not uncommon to find equal values, something especially frequent when dealing with low-resolution time series, e.g. weak biological signals. Note that this is not compatible with the previous definition of permutation patterns, which requires values to be uniquely sorted. The two main solutions were already proposed in the original paper of Bandt and Pompe15. First, one can disregard equal values by e.g. converting the inequality x1 < x2 < x3 to x1 ≤ x2 ≤ x3; second, one can numerically break equalities by adding small random perturbations, e.g. a normally distributed noise of low amplitude. A third solution entails mapping equal values onto the same symbols38; to illustrate, the time series x = {6, 6, 82} is transformed into the permutation pattern π = (1, 1, 3). However, this latter strategy does not lead to a uniform distribution for a totally random sequence, i.e. for white noise. The impact of equal values is still a matter of debate in the scientific community. It has been shown that repeated values can introduce spurious temporal correlations31, but also that these have a small or even negligible impact in real-world time series of sufficient resolution34. In any case, the reader should be aware of this limitation and eventually pre-process the data accordingly.
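The two labelling conventions, together with the random-perturbation strategy for ties, can be sketched as follows (a toy illustration; function names are ours):

```python
import numpy as np

def pattern_by_ranking(window):
    """Substitute each value by its ranking position: {8, 3, 6} -> (3, 1, 2)."""
    return tuple(np.argsort(np.argsort(window)) + 1)

def pattern_by_indexes(window):
    """Indexes of the elements in ascending order: {8, 3, 6} -> (2, 3, 1)."""
    return tuple(np.argsort(window) + 1)

def break_ties(x, scale=1e-10, rng=None):
    """Break equal values with a small normal perturbation, as suggested
    by Bandt and Pompe15."""
    rng = np.random.default_rng() if rng is None else rng
    return np.asarray(x, dtype=float) + rng.normal(0.0, scale, size=len(x))

print(pattern_by_ranking([8, 3, 6]))   # (3, 1, 2)
print(pattern_by_indexes([8, 3, 6]))   # (2, 3, 1)
print(pattern_by_ranking(break_ties([6, 6, 8])))  # (1, 2, 3) or (2, 1, 3)
```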

Finally, as previously introduced, this review is mainly focussed on the analysis of time-discrete systems. Still, the methods here presented have been the foundation for extensions covering irregularly sampled and continuous time series. Regarding the former case, Kulp et al.39 analysed for the first time irregularly sampled time series, i.e. time series in which data are missing or a time jitter is present. In both cases, a low degree of irregularity does not prevent missing patterns (introduced in the section “Individual metrics”) from detecting determinism. The interested reader can also explore subsequent works40,41. On the other hand, several studies have focussed on continuous systems39,42,43,44,45 and on continuous systems with delay46,47. The reader should nevertheless be aware of the (at times hidden) assumptions of these approaches: that the sampling frequency is the correct one to describe the underlying dynamics, something that can be estimated using classical tools like the Nyquist frequency; and that the embedding delay τ is fixed to one, even though multi-scale analyses are possible and beneficial48,49.

Individual metrics

Missing ordinal patterns

The first proposal for a metric able to discriminate stochastic from deterministic dynamics using permutation patterns and entropy was formulated back in 2006 by Amigó and co-workers50. Their starting point was the realisation that, while some chaotic maps are able to produce time series closely resembling random noise, they cannot reproduce all ordinal patterns. To illustrate, consider the case of the well-known logistic map f(x) = 4x(1 − x); it is easy to show that the decreasing pattern π = (3, 2, 1) can never appear, nor can the general pattern π = (\(*,3,*,2,*,1,*\)) (where \(*\) denotes any other position), as the succession f2(x) < f(x) < x can never occur. This, in turn, triggers an avalanche of longer forbidden patterns50. Given a time series, one can then suspect the presence of a deterministic dynamics if at least one permutation pattern never shows up.
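A quick numerical check of this property (our own sketch, reusing the ordinal_patterns helper defined earlier):

```python
import itertools
from collections import Counter
import numpy as np

# reuses ordinal_patterns() from the first sketch

def logistic_series(n, x0=0.3, r=4.0, transient=1000):
    x = x0
    for _ in range(transient):          # discard transient behaviour
        x = r * x * (1 - x)
    out = np.empty(n)
    for i in range(n):
        x = r * x * (1 - x)
        out[i] = x
    return out

counts = Counter(ordinal_patterns(logistic_series(100_000), d=3))
missing = set(itertools.permutations(range(1, 4))) - set(counts)
print(missing)  # {(3, 2, 1)}: the decreasing pattern is forbidden
```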

This approach nevertheless presents a caveat: when analysing short time series, i.e. in many real applications, patterns may be missing just because of a statistical effect, i.e. not enough of the dynamics has been observed to allow all patterns to emerge. These are the so-called false missing patterns. Amigó et al.51 showed that the minimum time series length to avoid such an effect is (d + 1)!, with d being the length of the patterns. Alternatively, one can analyse how the number of forbidden patterns decays as a function of the time series length, with such decay being faster in deterministic maps51.

A more quantitative statistical test was subsequently proposed by the same authors52. Suppose a time series from which permutation patterns of size d are extracted using sliding windows overlapping at a single point (i.e. the last point of a window is the first point of the next one). The total number of resulting subwindows is denoted by K. Next, one counts the number of times a specific permutation pattern j is observed and denotes it νj, with j = {1, 2, …, d!}. The deterministic nature of the time series can be tested through a χ2 test with statistic:

$${\chi }^{2}=\mathop{\sum }\limits_{j=1}^{d!}\frac{{({\nu }_{j}-K/d!)}^{2}}{K/d!}.$$
(1)

In other words, the aim is to compare the observed distribution of pattern frequencies with the one expected in a completely random time series—which, by definition, is uniform. The advantage of this test is that it detects not only cases of missing permutation patterns but also situations in which their relative frequency is unbalanced.
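A minimal sketch of this test (helper names are ours; the rejection threshold follows the standard χ2 distribution with d! − 1 degrees of freedom, as used later in this review):

```python
import itertools
from collections import Counter
from math import factorial
from scipy.stats import chi2

def chi2_uniformity_test(patterns, d, alpha=0.05):
    """Chi-squared statistic of Eq. (1), summed over all d! possible patterns.

    Returns the statistic and whether H0 (patterns are i.i.d., i.e. the
    series is compatible with pure randomness) is rejected at level alpha.
    """
    K = len(patterns)
    expected = K / factorial(d)
    counts = Counter(patterns)
    stat = sum((counts.get(p, 0) - expected) ** 2 / expected
               for p in itertools.permutations(range(1, d + 1)))
    threshold = chi2.ppf(1 - alpha, df=factorial(d) - 1)
    return stat, stat > threshold
```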

Going back to the concept of missing patterns, an additional method based on it was proposed in 2019 by Olivares and co-workers45. The method first divides the original time series into consecutive, non-overlapping windows of size w. The number of unobserved (i.e. missing) ordinal patterns is then counted within each of these windows, and the average number \(\langle N(w,d) \rangle\) is calculated—note that this number is a function of w and of the embedding dimension d. Finally, the decay of the missing patterns is modelled with a stretched exponential function:

$$\langle N(w,d) \rangle =A{e}^{-B{w}^{\gamma }},$$
(2)

where A is a constant, B is the characteristic decay rate and γ is the stretching exponent. The estimated B and γ can then be used to classify the time series; specifically, stochastic time series are characterised by B < 10−2 and γ > 0.6 when using d = 6, while chaotic systems (even when polluted by additive noise) have larger values of B and smaller values of γ.
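A sketch of this fit (window sizes and the initial guess for the fit are illustrative choices of ours, not prescribed by the original work):

```python
import numpy as np
from math import factorial
from scipy.optimize import curve_fit

# reuses ordinal_patterns() from the first sketch

def stretched_exp(w, A, B, gamma):
    """Eq. (2): <N(w, d)> = A * exp(-B * w**gamma)."""
    return A * np.exp(-B * w ** gamma)

def decay_parameters(x, d=6, window_sizes=(1000, 2000, 5000, 10000, 20000)):
    """Fit the average number of missing patterns per non-overlapping
    window of size w with the stretched exponential of Eq. (2)."""
    n_avg = []
    for w in window_sizes:
        windows = [x[i:i + w] for i in range(0, len(x) - w + 1, w)]
        n_avg.append(np.mean([factorial(d) - len(set(ordinal_patterns(win, d)))
                              for win in windows]))
    popt, _ = curve_fit(stretched_exp, np.array(window_sizes, dtype=float),
                        np.array(n_avg), p0=(factorial(d), 0.01, 0.6),
                        maxfev=10_000)
    return dict(zip(("A", "B", "gamma"), popt))
```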

Since the concept of missing or infrequent ordinal patterns is the foundation of all methods presented here, a review of all their real-world applications would exceed the scope of the present work. The interested reader can nevertheless refer to the many reviews on the topic26,27,28,53,54.

Permutation spectrum test

Kulp and Zunino proposed in 2014 a method that can be seen as an evolution of what was previously described, based on the idea of the permutation spectrum test. While the method of Amigó et al.52 was based on reconstructing and studying the shape of the probability distribution created by permutation patterns, Kulp and Zunino55 shifted the focus to the variability of such a distribution.

As usual, let us suppose a time series, which is here divided into non-overlapping windows of length l. The probability distribution of permutation patterns, here called the spectrum55, is then calculated, i.e. how many times each pattern appears in each window. Finally, the standard deviation of the probability associated with each pattern is calculated across windows. The key to distinguishing between different types of dynamics resides in those standard deviations. For instance, for periodic signals all patterns will have no variability at all (i.e. a standard deviation of zero), as the same permutations will appear over and over without any variation. Conversely, stochastic time series will be characterised by a positive variability for all permutation patterns. Finally, chaotic signals will lie in the middle, with some patterns (i.e. the forbidden ones) having zero standard deviation.

This approach has a remarkable advantage: by studying subwindows of the original time series, it is able to detect intermittent dynamics, provided a suitable value of l is chosen. On the other hand, all patterns have to be given the chance to appear, and l must be large enough—using the standard rule proposed by Amigó et al.51, l ≫ (d + 1)!, with d being the embedding dimension; in addition, the time series should globally be much longer than l, in order to have enough windows for the calculation of the standard deviation. All in all, this could be a problem in real-world applications, where the length of the time series is limited.
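A minimal sketch of the test (our own helper, reusing ordinal_patterns from the first sketch):

```python
import itertools
from collections import Counter
import numpy as np

# reuses ordinal_patterns() from the first sketch

def permutation_spectrum_std(x, d=5, l=1000):
    """Standard deviation, across non-overlapping windows of length l,
    of the relative frequency of each of the d! ordinal patterns."""
    all_p = list(itertools.permutations(range(1, d + 1)))
    freqs = []
    for i in range(0, len(x) - l + 1, l):
        c = Counter(ordinal_patterns(x[i:i + l], d))
        total = sum(c.values())
        freqs.append([c.get(p, 0) / total for p in all_p])
    return np.std(np.array(freqs), axis=0)  # one value per pattern

# zero entries flag patterns that never vary (periodic or forbidden)
```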

The permutation spectrum test has been applied to several real-world time series, including the pressure signal acquired from a turbulent combustor with bluff-body and swirler flame-holding devices56, measurements of flame fronts in a lean swirling premixed flame subject to a change in gravitational orientation57, and ocean ambient noise58.

Complementing permutation entropy with cylinder sets

One theoretically simple way of discriminating chaotic from stochastic processes is calculating their KS entropy hKS, which represents the entropy rate required to characterise the evolution of trajectories59. In other words, given an initial value x0, hKS tells us how the precision of our prediction for the value xn decreases with n. As such, it has been proven that there is a relationship between hKS and the Lyapunov exponents of the system60. A test is then easy to define: hKS = 0 for stable systems; chaotic systems are characterised by finite (positive) Lyapunov exponents, and hence finite (yet greater than zero) hKS; finally, for stochastic systems, as the future cannot be forecasted at all, hKS diverges to infinity. This is nevertheless not simple from an applied point of view, as the computation of hKS can only be done in an approximate way61. A solution may come from the permutation entropy. Specifically, it has been shown that eP is equal to hKS for one-dimensional (1D) and 1D-like maps62,63, where eP is the time derivative of the permutation entropy, i.e. eP = [EP(d + 1) − EP(d)]/T, d being the embedding dimension and T the time resolution (or the length of the time steps) of the series. Yet, what about more complex chaotic systems?

Politi proposed an elegant solution64. First of all, he defines σi(j) as the dispersion of xi(j) (with 1 ≤ j ≤ d), i.e. of all the values appearing in the jth position of the ith ordinal pattern. σi(j) thus tells us how much uncertainty we have about the value of the time series, if we know that such value appears in the jth position of the ith ordinal pattern. As the system evolves over time, the largest value of σi(j) will always be found for j = d, and hence the important value is σi(d). This value is then used to define a relative permutation entropy, as \({\tilde{E}}_{p}(d)={E}_{p}(d)+D\langle \log ({\sigma }_{i}(d))\rangle\), where 〈⋅〉 denotes the average and D is the fractal (information) dimension of the system. Finally, he demonstrates that \({\tilde{e}}_{p}(d)=[{\tilde{E}}_{p}(d+1)-{\tilde{E}}_{p}(d)]/T\) yields a reliable estimation of hKS.

A final issue has to be solved: how can one calculate \({\tilde{e}}_{p}(d)\) when D is not known? D can be estimated through the Kaplan–Yorke formula65, but this in turn requires knowing the Lyapunov exponents of the system. The solution entails substituting D with an unknown parameter δ, yielding \({\tilde{e}}_{p}(d,\delta )\)—note that the standard permutation entropy is recovered for δ = 0. When δ < D (δ > D), \({\tilde{e}}_{p}(d,\delta )\) converges to the asymptotic value hKS from above (respectively, from below). Therefore, D can be estimated as the value of δ for which \({\tilde{e}}_{p}(d,\delta )\) is independent of d.

This approach, although theoretically powerful, presents two important drawbacks. First of all, it is theoretically complex and requires a deep understanding of dynamical systems. Second, it requires long time series, as \({\tilde{e}}_{p}(d)\) is given by an asymptotic process for large values of d—which leads back to the condition l ≫ (d + 1)!51. These are probably the reasons why, to the best of our knowledge, this approach has never been applied to the study of real-world time series.

Successions of permutation patterns

While all the previous methodologies were based on analysing the appearance of permutation patterns taken in isolation, another possibility is available: considering how they follow one another. Specifically, let us consider a time series xt, and the corresponding sequence of permutation patterns πt. Transitions between patterns can be evaluated as the number of times (or the probability) that a given permutation pattern is followed by a second one. To illustrate, the time series x = (1, 2, 3, 4) will have the pattern πt=1 = (1, 2, 3) followed by πt=2 = (1, 2, 3). This can then be mapped into an ordinal pattern transition graph Gπ, whose element gi,j is the probability that pattern j follows pattern i.

A few observations have to be made about this mathematical object. First of all, it is associated with an important increase in the dimensionality of the problem, i.e. one moves from having d! patterns (where d is the embedding dimension) to d!2 transitions. This allows for a much richer description of the system’s dynamics but also implies that longer time series are needed in order to achieve reliable results. Second, elements of G may not be independent. To illustrate, let us consider the time series x = (1, 2, 3, y), where y can be any value. Independently of the value of y, the second pattern will never be π = (3, 2, 1), i.e. a monotonically decreasing pattern, as the last part of the previous pattern (i.e. (2, 3)) constrains the initial part of the second one. This can easily be solved through two different strategies: first, by considering non-overlapping patterns, such that, for an embedding dimension d, only patterns πt, πt+d, πt+2d, … are analysed; second, by using an embedding delay τ > 1, to also ensure that consecutive patterns are calculated over different data. On the other hand, the calculation of this transition graph inherits the main advantages of the permutation pattern methodology: its conceptual simplicity, the reduced computational cost, especially when compared to other graph-based techniques13,66,67,68, and its robustness against noisy data.

The concept of transitions was first proposed by Small in 201344 and consisted of identifying those permutation patterns that followed each other. Note that the resulting structure (i.e. a network) was undirected and unweighted, that is, pattern transition probabilities were disregarded. Different dynamical systems, including chaotic maps and stochastic systems, were then identified by calculating the ordinal pattern length that maximised the amount of information in the network.

Transitions were then exploited by Borges et al.48, who specifically focussed on the probability of self-transitions—i.e. the probability that a pattern is followed by the same one. The authors propose a metric assessing the probability of self-transitions, i.e.

$${p}_{{{\rm{st}}}}=\mathop{\sum}\limits_{i}p({\pi }_{i},{\pi }_{i})=\mathop{\sum}\limits_{i}{g}_{i,i}.$$
(3)

pst is trivially equal to 1/d! for completely random time series (provided τ > 1 and the time series is long enough), as a permutation pattern can be followed by any other one with the same probability. pst nevertheless increases with the determinism of the time series, making it possible to detect different types of coloured noise and chaotic systems. This is used to construct a discrimination model, based on applying support vector machines to the parameters of a fit of the evolution of pst as a function of τ, yielding excellent classification scores48.
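A minimal sketch of the transition graph and of pst (our own helper names; g is estimated here as a joint probability, so that the trace reproduces Eq. (3)):

```python
import itertools
import numpy as np

# reuses ordinal_patterns() from the first sketch

def transition_matrix(x, d=3, tau=2):
    """Joint probability g[i, j] that pattern i is immediately followed by
    pattern j (tau > 1 recommended to decorrelate consecutive patterns)."""
    pats = ordinal_patterns(x, d, tau)
    index = {p: k for k, p in
             enumerate(itertools.permutations(range(1, d + 1)))}
    g = np.zeros((len(index), len(index)))
    for a, b in zip(pats[:-1], pats[1:]):
        g[index[a], index[b]] += 1
    return g / g.sum()

def self_transition_probability(g):
    """Eq. (3): total probability of a pattern being followed by itself."""
    return np.trace(g)

rng = np.random.default_rng(0)
g = transition_matrix(rng.normal(size=100_000), d=3, tau=2)
print(self_transition_probability(g))  # ~1/3! ≈ 0.167 for white noise
```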

Leveraging a similar idea, Olivares et al.69 propose looking at the local and global dynamics of pattern transitions through two metrics. The first one is the minimum pattern entropy, where the entropy of each pattern is calculated from the probability distribution of the patterns following it, and the minimum over all patterns is retained. The second metric is the conditional permutation entropy of the whole transition matrix, calculated as the sum of each pattern’s entropy weighted by its stationary probability. Note that the former metric describes the local (i.e. pattern-centric) dynamics, while the latter describes the global one. Additionally, leveraging the similarity with the concept of networks, the two metrics are also called node and network entropies. The final result is a pair of values for each time series, which can then be represented in a bi-dimensional plane—an approach that will largely be exploited in the next section.
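Under the assumption that g is the joint transition matrix built in the previous sketch, the two entropies could be computed as follows (a sketch, not the authors' reference implementation):

```python
import numpy as np

def node_and_network_entropy(g):
    """Minimum pattern (node) entropy and conditional permutation (network)
    entropy, computed from a joint transition matrix g as built above."""
    stationary = g.sum(axis=1)                  # probability of each pattern
    mask = stationary > 0
    cond = g[mask] / stationary[mask, None]     # rows: p(next | current)
    with np.errstate(divide="ignore", invalid="ignore"):
        terms = np.where(cond > 0, cond * np.log2(cond), 0.0)
    row_h = -terms.sum(axis=1)                  # entropy of each pattern
    return row_h.min(), np.sum(stationary[mask] * row_h)
```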

In spite of the conceptual simplicity of this approach, few real-world applications have been proposed. Besides the examples included in the original papers44,48,69, the interested reader can also refer to the use of permutation pattern transitions to characterise electrocardiographic data70.

When permutation patterns meet other metrics

While the entropy and variability of the distribution associated with permutation patterns have proven to be useful, there is also a natural limit to the amount of information that can be conveyed by a single metric. A natural evolution is then to consider multiple (complementary) metrics at the same time and represent them as points in a plane (or a space). In this section, we review the planes that have been proposed in the literature, starting from the celebrated complexity–entropy plane.

The original complexity–entropy plane

The first example of a complexity–entropy plane was proposed by O. Rosso and co-authors71. Time series are characterised by two values, which are used to position them in a bi-dimensional plane. The first one is the classical permutation entropy, normalised in the interval [0, 1]: ES[P] = S[P]/Smax, where P is the distribution created by the permutation patterns, S the classical Shannon entropy, and \({S}_{\max }={\log }_{2}N\), N = d! being the number of possible patterns. (Note that the original work71 uses natural logarithms, as opposed to the base-2 logarithms common in information theory.) The second metric is the intensive statistical complexity measure CJS[P], defined as:

$${C}_{{{{{{\rm{JS}}}}}}}[P]={Q}_{J}[P,{P}_{e}]{E}_{S}[P].$$
(4)

QJ is a measure of disequilibrium, i.e. of how far the distribution created by the permutation patterns is from a uniform distribution Pe, and is calculated through the well-known Jensen–Shannon (JS) divergence72,73:

$${Q}_{J}[P,{P}_{e}]={Q}_{0}\left\{S\left[\frac{P+{P}_{e}}{2}\right]-\frac{S[P]}{2}-\frac{S[{P}_{e}]}{2}\right\},$$
(5)

with Q0 being a normalisation constant. QJ is thus greater than zero if some permutation patterns are “privileged”, in the sense that they appear with a frequency higher than expected. It is important to note that this complexity measure goes beyond a simple entropy, as it assesses both the randomness and the correlation structures within the time series. Thanks to this, multiple values of CJS can be obtained for each value of ES, as previously demonstrated74.
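A sketch of the two coordinates (our own helper; Q0 follows the standard normalisation used in the literature, namely the inverse of the maximum attainable JS divergence, so that QJ lies in [0, 1]):

```python
import itertools
from collections import Counter
from math import factorial
import numpy as np

# reuses ordinal_patterns() from the first sketch

def complexity_entropy(x, d=5, tau=1):
    """Position of a time series in the complexity-entropy plane:
    returns (C_JS, E_S) as in Eq. (4)."""
    N = factorial(d)
    c = Counter(ordinal_patterns(x, d, tau))
    p = np.array([c.get(pat, 0) for pat in
                  itertools.permutations(range(1, d + 1))], dtype=float)
    p /= p.sum()
    pe = np.full(N, 1.0 / N)

    def shannon(q):
        q = q[q > 0]
        return -np.sum(q * np.log2(q))

    H = shannon(p) / np.log2(N)                      # E_S in [0, 1]
    js = shannon((p + pe) / 2) - shannon(p) / 2 - shannon(pe) / 2
    # Q0: inverse of the maximal JS divergence (delta vs uniform)
    q0 = -2 * ((N + 1) / N * np.log2(N + 1)
               - 2 * np.log2(2 * N) + np.log2(N)) ** (-1)
    return js * q0 * H, H
```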

When time series are located in the CJS–ES plane, several patterns are observed. First, chaotic systems have entropies between 0.45 and 0.7, and complexities close to the maximum, owing to their internal correlation structures. Second, time series of white noise have maximum entropy and minimum complexity, as one may expect. Finally, coloured noises position themselves in the middle, with medium complexity and medium–high entropy values. A graphical representation of these regions is reported in Fig. 2, along with two additional dynamics not included in the original work71. It is worth noting that, beyond these prototypical examples, the position of a dynamical system in the plane will depend on the characteristics of the associated time series. To illustrate, let us consider the case of stochastic oscillations: the position will be dictated by the interaction between the sampling frequency and the multiple scales of the dynamics. At high sampling frequencies, what is observed are oscillations, which will be located in the bottom left corner of the plane; on the other hand, for samplings close to the main frequency of the system, the result will appear as noise, located in the bottom right corner. Varying the sampling frequency between these two extrema will give rise to complex movements in the plane42.

Fig. 2: Graphical representation of the complexity–entropy plane.

Graphical representation of the plane as proposed by Rosso et al.71, and of the approximate positions of chaotic systems, and of coloured and white noises. Dynamics in parentheses (i.e. non-linear stochastic and regular oscillations) were not included in the original work.

Besides being the foundation of the other methods described in this section, the complexity–entropy plane has buttressed a huge number of real-world analyses, owing to its theoretical and computational simplicity. These include applications in economics and finance, including stock22,75, bond76,77, cryptocurrency78,79 and commodity market analyses80; the characterisation of laser dynamics46,47,81 and of plasma and solar wind82,83,84,85,86; turbulence phenomena87; stream flows88,89,90; the study of the dynamics of the brain91 and of its cells92,93, and of the dynamics of blood cells94; the characterisation of ocean ambient noise95, flame front dynamics57 and vehicle behaviour96. It has even been applied to problems in image recognition97,98 and art99.

The Fisher-entropy plane

Olivares et al.100 proposed a modified version of the complexity–entropy plane seen above, by substituting the complexity with the discrete and normalised version of Fisher’s Information Measure (FIM)101. The latter is defined as:

$$F[P]={F}_{0}\mathop{\sum }\limits_{i=1}^{N-1}{[{({p}_{i+1})}^{1/2}-{({p}_{i})}^{1/2}]}^{2},$$
(6)

where F0 is a suitable normalisation coefficient. While related to Shannon’s entropy, FIM presents an important difference: it is defined as a function of the difference in probability between consecutive elements of the distribution, as opposed to their raw values—or, in other words, as the distance between contiguous probabilities. FIM is thus zero when all probabilities p are the same, and maximal when one single p equals 1—i.e. essentially the opposite of what would be expected of Shannon’s entropy. When compared to the original complexity–entropy plane, this new plane allows for a better characterisation of the dynamical changes induced by modifications in the system’s parameters, and it is able to distinguish between different types of chaotic dynamics100.
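A sketch of Eq. (6) (the default F0 = 1 is our simplifying assumption; the literature adopts different normalisation conventions depending on where the delta distribution sits):

```python
import numpy as np

def fisher_information(p, f0=1.0):
    """Discrete, normalised Fisher Information Measure of Eq. (6).

    p: probability distribution over the N ordinal patterns. Note that,
    unlike Shannon's entropy, the result depends on the ordering chosen
    for the patterns, since consecutive probabilities are compared."""
    p = np.asarray(p, dtype=float)
    return f0 * np.sum((np.sqrt(p[1:]) - np.sqrt(p[:-1])) ** 2)
```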

The idea introduced by Olivares and co-workers100 has seen real-world applications in several works, spanning from neural networks92 and electroencephalography (EEG) signal analysis102,103, to ecosystem dynamics104, the status of mechanical machines105 and image recognition98. It is also worth noting that the Fisher-entropy plane has inspired the so-called Fisher–DisEn plane106, which uses the dispersion entropy107 instead of the classical Shannon one.

Multi-scale planes

In all the previous methods, the embedding delay τ is considered as a fixed value, and in most applications it is assumed that τ = 1. This assumption is valid provided that the system is temporally discrete (i.e. we observe the system at discrete time moments) and that the observation frequency is correctly tuned to the system dynamics. It is nevertheless difficult to fulfil these conditions for some real and numerical systems. One may consider, for instance, the case of the human brain: while EEG data are usually recorded at 250 or 500 Hz, they present multiple characteristic time scales that are condition dependent108,109. Analysing the raw data with a fixed τ may therefore lead to incomplete, or even misleading, results.

A solution to this problem was proposed through the definition of a multi-scale complexity–entropy plane42. Given a system, the two metrics of complexity CJS and entropy ES of the standard complexity–entropy plane are calculated as a function of τ, which is no longer fixed. The result is a parametric curve in the complexity–entropy plane, characterising the system through the two metrics with the embedding delay as parameter. In most real-world applications, this implies calculating CJS(τ) and ES(τ) for τ = [1, 2, …]; on the other hand, when the system under analysis is time continuous, τ can assume any fractional value greater than zero. A similar analysis was later performed43, fixing τ but varying the sampling time in low-dimensional continuous chaotic systems.

As an example of an application, Zunino et al.42 resorted to the well-known Mackey–Glass equation, a paradigmatic time-delay system110; the embedding delay for which the complexity CJS(τ) is maximal corresponds to the characteristic temporal scale of the equation, i.e. the minimum sampling rate required to capture all the information related to the non-linear correlations. Zunino et al.42 also applied this multi-scale plane to several real-world examples, including laser, river flow, geophysical, climate and financial data. Other examples include turbulence analysis87,111, stream flows89, flame front dynamics57 and laser beam wandering112. Still, one has to note that the line dividing this multi-scale approach from the standard complexity–entropy plane is fuzzy, as many works using the latter include some analysis of the sensitivity to the embedding delay τ.

Finally, it is worth noting that the work of Zunino and colleagues42 inspired a modification of the plane, called the multi-scale weighted complexity–entropy causality plane17,113, in which permutation pattern probabilities are weighted according to the amplitude of the signals generating them.

Additional mono-parametric complexity–entropy curves

What was introduced by Zunino et al.42 and discussed above can be seen as a parametrisation of the complexity and entropy metrics, specifically as a function of the embedding delay τ. Yet, more parametrisations are possible114. Specifically, Ribeiro and co-workers propose substituting the Shannon entropy with a mono-parametric entropy, specifically the Tsallis q-entropy115, defined as:

$${S}_{q}(P)=\mathop{\sum }\limits_{j=1}^{d!}{p}_{j}\,{\ln }_{q}\frac{1}{{p}_{j}}.$$
(7)

In the previous equation, pj is the appearance probability of pattern j, q is a real parameter and \({\ln }_{q}\) is the q-logarithm: \({\ln }_{q}x=\int\nolimits_{1}^{x}{t}^{-q}{\rm{d}}t\). In a similar way, the JS divergence of Rosso et al.71 is substituted by a generalisation of the Kullback–Leibler divergence, defined as \({K}_{q}(P| R)=-\sum {p}_{i}{\ln }_{q}({r}_{i}/{p}_{i})\).
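The q-logarithm has the closed form \({\ln }_{q}x=({x}^{1-q}-1)/(1-q)\) for q ≠ 1, recovering the natural logarithm for q → 1; a sketch of Eq. (7) (helper names are ours):

```python
import numpy as np

def ln_q(x, q):
    """q-logarithm: the integral of t**(-q) from 1 to x."""
    if np.isclose(q, 1.0):
        return np.log(x)          # ordinary natural logarithm for q = 1
    return (x ** (1 - q) - 1) / (1 - q)

def tsallis_entropy(p, q):
    """Eq. (7): S_q(P) = sum_j p_j * ln_q(1 / p_j)."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]                  # zero-probability patterns contribute 0
    return np.sum(p * ln_q(1.0 / p, q))
```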

Instead of a single point, time series are now described by a curve, which displays an interesting feature: for stochastic time series it forms loops, while this does not happen for chaotic time series at any embedding dimension. The absence of a loop is a consequence of the absence of some patterns; therefore, this approach can be seen as a more complete variant of the missing pattern one50. Beyond this, the characterisation of the curves in the plane can also be used to extract other properties of the time series, most notably the Hurst exponent114.

Ribeiro et al.114 applied this approach to several real-world time series, including laser dynamics, crude oil prices, sunspot index, heart dynamics and Earth’s magnetic activity. Other applications include the study of the dynamics of a real estate market116 and the analysis of EEG signals117.

Similarly to the Tsallis entropy case, another generalisation has been proposed118, based on substituting Shannon’s entropy with Rényi’s one. This is defined as:

$${S}_{r}(P)=\frac{1}{1-r}\log \mathop{\sum }\limits_{i=1}^{n}{p}_{i}^{r},$$
(8)

where r is the so-called entropic index119. Due to the monotonic nature of Sr, curves in the Rényi complexity–entropy plane do not form loops. Still, their curvature is able to discriminate between stochastic, chaotic and periodic time series.

It is finally worth noting that both approaches have been merged in the form of three planes combining Tsallis’s and Rényi’s complexity and entropy metrics120. Nevertheless, as the underlying distributions are obtained through the power spectrum of the signal and not through permutation patterns, this approach is not further described here.

The Tarnopolski plane

In 2016, M. Tarnopolski introduced a plane to characterise time series that, while having been developed independently, is still indirectly based on the concept of permutation patterns121. This plane is defined by two metrics:

  1. The Abbe value \({\mathcal{A}}\), i.e. half of the ratio of the mean square successive difference to the variance122,123:

    $${\mathcal{A}}=\frac{n}{2(n-1)}\,\frac{\sum_{i=1}^{n-1}{({x}_{i+1}-{x}_{i})}^{2}}{\sum_{i=1}^{n}{({x}_{i}-\bar{x})}^{2}},$$
    (9)

    with \(\bar{x}\) being the mean of \(\{{x}_{i}\}\). \({\mathcal{A}}\) quantifies the smoothness of the time series, spanning from zero (perfectly smooth) to one (white noise).

  2. The probability \({\mathcal{T}}\) of finding a turning point, i.e. a point where, among three consecutive observations, the middle value xi is lower or higher than both surrounding ones (xi−1 and xi+1). Note that \({\mathcal{T}}\) can easily be interpreted in the light of permutation patterns, as it is equivalent to the combined probability of four of the six possible patterns for d = 3.

While originally only the relation between the position in this \({\mathcal{A}}\)–\({\mathcal{T}}\) plane and the Hurst exponent was studied121, Zunino et al.124 built on this to show that the plane is also capable of discriminating between different types of complex time series. Additionally, the authors of this latter paper introduced a multi-scale version of the \({\mathcal{A}}\)–\({\mathcal{T}}\) plane, based on downsampling the original time series using subwindows and calculating the position on the plane as a function of the subwindow size. In spite of the simplicity of this approach, it has been applied to the study of only a few real-world time series, including the analysis of stock markets, electroencephalography and heart dynamics124, and of stellar variability125.
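Both coordinates are straightforward to compute; a sketch (helper names are ours):

```python
import numpy as np

def abbe_value(x):
    """Eq. (9): half the ratio of the mean square successive difference
    to the variance; ~1 for white noise, ~0 for a smooth series."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    num = np.sum(np.diff(x) ** 2)
    den = np.sum((x - x.mean()) ** 2)
    return n / (2 * (n - 1)) * num / den

def turning_point_probability(x):
    """Probability that the middle of three consecutive values is a local
    extremum; 2/3 is expected for an i.i.d. random series."""
    x = np.asarray(x, dtype=float)
    mid, left, right = x[1:-1], x[:-2], x[2:]
    turning = ((mid > left) & (mid > right)) | ((mid < left) & (mid < right))
    return turning.mean()
```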

Building a full pipeline

All the previously discussed methodologies for detecting chaotic dynamics are based on the assumption that a researcher is supervising the process, and thus that uncommon situations are manually discarded. To illustrate, a periodic dynamics will probably present missing (or less frequent) permutation patterns and may thus be classified as chaotic; yet, a simple graphical representation would discard this option. A final question is therefore: can a fully automated pipeline be constructed, able to discriminate between stochastic, chaotic and periodic dynamics without external intervention?

Toker et al.126 proposed a solution to this question in the form of a decision tree35. As a first step, they propose to test for stochasticity by comparing the permutation entropy of the original time series with that of two sets of surrogates: Amplitude Adjusted Fourier Transform127 and Cyclic Phase Permutation128 surrogates. If the permutation entropy falls within one of these two distributions, the time series is classified as stochastic. If this is not the case, a few additional steps are executed: the signal is denoised through Schreiber’s algorithm129, and downsampled if oversampling is detected. Finally, a modified version of the 0–1 test130,131 is applied, and the time series is classified as either chaotic or periodic. With the exception of a few free parameters, which are calibrated using a large collection of test time series, this pipeline allows classifying signals without any expert judgement.
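A minimal sketch of the first, surrogate-based step only (helper names and the number of surrogates are our choices; the full pipeline additionally includes cyclic phase permutation surrogates, denoising and the 0–1 test):

```python
import numpy as np
from collections import Counter

# reuses ordinal_patterns() from the first sketch

def permutation_entropy(x, d=5, tau=1):
    c = np.array(list(Counter(ordinal_patterns(x, d, tau)).values()), float)
    p = c / c.sum()
    return -np.sum(p * np.log2(p))

def aaft_surrogate(x, rng):
    """One Amplitude Adjusted Fourier Transform surrogate127."""
    x = np.asarray(x, dtype=float)
    # 1. gaussianize: reorder Gaussian noise to follow the ranks of x
    g = np.sort(rng.normal(size=len(x)))[np.argsort(np.argsort(x))]
    # 2. phase-randomize the gaussianized series
    f = np.fft.rfft(g)
    phases = rng.uniform(0, 2 * np.pi, size=len(f))
    phases[0] = 0.0               # preserve the mean
    if len(x) % 2 == 0:
        phases[-1] = 0.0          # keep the Nyquist component real
    pr = np.fft.irfft(np.abs(f) * np.exp(1j * phases), n=len(x))
    # 3. map the original amplitudes onto the ranks of the randomized series
    return np.sort(x)[np.argsort(np.argsort(pr))]

def stochasticity_test(x, n_surr=99, d=5, seed=0):
    """Is the permutation entropy of x within the surrogate distribution?"""
    rng = np.random.default_rng(seed)
    h = permutation_entropy(x, d)
    hs = [permutation_entropy(aaft_surrogate(x, rng), d)
          for _ in range(n_surr)]
    return min(hs) <= h <= max(hs)  # True: compatible with stochasticity
```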

Comparison of performance and cost

To compare the performance of all methodologies addressed in this review, we have considered three discrete chaotic dynamics and a linearly correlated stochastic process (a generation sketch is given after the list):

  • Chaotic logistic map. A paradigmatic example, usually employed as a testing ground to illustrate new concepts in the treatment of dynamical systems. It is a polynomial mapping of degree two132: xn+1 = rxn(1 − xn). We here consider its chaotic regime at r = 4.

  • Linear congruential generator. A 1D map, xn+1 = Axn + B mod(C), normally used to produce pseudo-random integers in the range 0 < xn < C, which makes its classification a challenging task. Here we consider A = 7141, B = 54773 and C = 259200133.

  • Sinai map. A two-dimensional attractor, defined as \({x}_{n+1}={x}_{n}+{y}_{n}+\delta \cos (2\pi {y}_{n})\) mod(1), yn+1 = xn + 2yn mod(1). Here we consider δ = 0.1, for which a fully developed chaotic dynamics is obtained133.

  • Noises with 1/fk power spectrum. Starting from a pseudo-random variable with a Gaussian probability distribution function, the desired power spectrum is obtained through the Fourier filtering method127. Note that the degree of long-term correlation increases with k.
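A sketch generating these benchmark signals (initial conditions and the normalisation of the congruential generator are illustrative choices of ours):

```python
import numpy as np

def logistic(n, x0=0.3, r=4.0):
    x = np.empty(n); x[0] = x0
    for i in range(1, n):
        x[i] = r * x[i - 1] * (1 - x[i - 1])
    return x

def congruential(n, x0=12345, A=7141, B=54773, C=259200):
    x = np.empty(n); x[0] = x0
    for i in range(1, n):
        x[i] = (A * x[i - 1] + B) % C
    return x / C            # normalised to [0, 1) for convenience

def sinai(n, x0=0.1, y0=0.1, delta=0.1):
    x, y = np.empty(n), np.empty(n)
    x[0], y[0] = x0, y0
    for i in range(1, n):
        x[i] = (x[i - 1] + y[i - 1] + delta * np.cos(2 * np.pi * y[i - 1])) % 1
        y[i] = (x[i - 1] + 2 * y[i - 1]) % 1
    return x, y

def k_noise(n, k, rng=None):
    """1/f**k noise via Fourier filtering of white Gaussian noise127."""
    rng = np.random.default_rng() if rng is None else rng
    f = np.fft.rfftfreq(n)
    f[0] = f[1]             # avoid division by zero at the DC component
    spectrum = np.fft.rfft(rng.normal(size=n)) * f ** (-k / 2)
    return np.fft.irfft(spectrum, n=n)
```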

For the systems listed above, we have generated 10 independent realisations of 105 data samples, after discarding the first 105 iterations of the chaotic maps to avoid any transient behaviour. Each realisation started from different random initial conditions. The following analyses report, for each metric, the mean value over the realisations. Additionally, a noisy chaotic environment is generated by adding white Gaussian noise of zero mean to the original noise-free chaotic sequences. Different noise levels σ, defined as the ratio between the standard deviation of the noise and that of the original data, were considered. Ordinal patterns were evaluated with D = 5 and lag τ = 1, except for the estimation of the probability of self-transitions, for which τ > 1 is required (as discussed in the section “Successions of permutation patterns”).

First, when seeking the presence of forbidden ordinal patterns, we found that the logistic chaotic dynamics and the x coordinate of the Sinai map are the only sequences presenting forbidden patterns, on average 89 and 0.7 respectively. In fact, a minimum pattern length D is necessary to detect forbidden patterns in a noise-free deterministic dynamics134. This drawback is the source of misclassifications of the underlying dynamical nature of the congruential map and of the y coordinate of the Sinai map. Furthermore, a distinction between different correlation degrees of the k-noises cannot be achieved, since no forbidden patterns are observed for these stochastic processes. As previously seen, a more quantitative approach involves characterising the decay in the number of unobserved (forbidden or missing) ordinal patterns as a function of the sequence length. The corresponding results are depicted in Fig. 3a–e. As the weight of the noise component increases, all noisy chaotic maps locate on the same curve, approaching the bottom right corner, B ~ 10−2 and γ ~ 1—parameters corresponding to an uncorrelated dynamics. Notably, the decay rate allows identifying determinism even when the dynamics is noisy, with the noisy logistic map and the x-Sinai identified as chaotic up to noise levels 0.5 and 0.2, respectively—see Fig. 3b, d. On the other hand, the noisy congruential map and the y-Sinai share their planar location with the k-noises for all noise levels considered; still, for the y-Sinai sequence, temporal correlations are found for low noise contaminations—see Fig. 3e. Moreover, k-noises with different degrees of long-term correlation are characterised by different planar locations.

Fig. 3: Unobserved and distribution of visible ordinal patterns methodologies.

a Characteristic decay rate B versus the stretched exponent γ for k-noises with k ∈ [0, 3] with step 0.2 (solid brown triangles) and logistic map (solid blue circles), congruential map (solid green squares), x-Sinai (solid magenta diamonds) and y-Sinai (solid cyan stars) with additive noise with σ ∈ [0, 1] with step 0.05. b–e Zoom of individual maps near the stochastic zone (B ~ 10−2 and γ ~ 1). f Distribution N(χ2) of χ2 for 104 independent realisations of all the noisy chaotic maps with additive noise with σ = 0.25, 0.5, 0.75 and 1, and k-noises with k = 0, 0.5, 1, 1.5 and 2. The dashed red line indicates the threshold \({\chi }_{D!-1,\alpha }^{2}\), for D = 5 and α = 0.05. k: correlation degree; σ: intensity of the noise contamination.

Since a deterministic map may have no forbidden ordinal patterns for a given pattern length, another option is to focus on the shape of the distribution of visible ordinal patterns, through the χ2 test with the statistic given by Eq. (1). Figure 3f shows the distribution of χ2 for 104 independent realisations. The rejection threshold of the null hypothesis H0 (ordinal patterns are independent and identically distributed) at level α = 0.05 is \({\chi }_{119,0.05}^{2}=145.46\). It is observed that, for the noisy logistic map and the Sinai map, the χ2 test rejects H0 for all considered noise levels (except for the y-Sinai with σ ~ 1) with a high degree of confidence. On the other hand, for the congruential map, the χ2 test rejects H0 only in the noise-free case. Lastly, for k-noises with k > 0, the χ2 test rejects H0, as can be seen in Fig. 3f. In summary, this approach is able to identify the determinism of all chaotic maps even in a noisy environment, except for the noisy congruential map; yet, it cannot distinguish between chaos and correlated stochastic dynamics.

Leaving the shape of the ordinal pattern distribution behind, it is now time to focus on its variability, by applying the permutation spectrum test. The corresponding results are reported in Fig. 4. Noise-free chaotic maps exhibit substantial variability in the standard deviation of the spectrum. In particular, the logistic map and the x-Sinai have ordinal patterns with zero deviation, due to the presence of forbidden/missing patterns—see Fig. 4a, c. On the contrary, the congruential map and the y-Sinai show some patterns with a higher or lower standard deviation than the average, which demonstrates at least the presence of temporal correlations in the data—although a deterministic nature cannot be guaranteed, as can be seen in Fig. 4b, d. Linearly correlated processes show a symmetric variability with no null values—see the colour arrows in Fig. 4e–h; this symmetry reflects the reversibility of these linear processes. For k = 0, the standard deviation presents no variability, except for the first and last ordinal patterns: all ordinal patterns have the same probability of appearance, leading to a constant standard deviation. For studying a polluted chaotic dynamics, we have focussed on the noisy logistic map, as it yields a clearer spectrum—see Fig. 4i–l. Noise contamination eliminates all forbidden patterns, but the “profile” of the standard deviation remains robust; even for high noise levels, some deviations are higher than the average—see the black arrows in Fig. 4i–l. In summary, the permutation spectrum test, while quite sensitive, is rather qualitative in discriminating chaos from stochasticity, i.e. it requires a visual inspection of the results.

Fig. 4: Standard deviation permutation spectrum.

Standard deviation of the permutation spectrum estimated from 200 windows of size l = 103 and for a the logistic map, b the linear congruential generator, c the x and d the y coordinate of the Sinai map. k-noises for e k = 0, f k = 1, g k = 2 and h k = 3. i–l Noisy chaotic logistic map with additive noise with σ = 0.25, 0.5, 0.75 and 1, respectively. The red dashed line indicates zero standard deviation and arrows point out characteristic ordinal patterns. k: correlation degree; σ: intensity of the noise contamination.

Following the structure of the previous part of the work, transitions between ordinal patterns come next, and specifically self-transitions pst. The evolution of pst has been analysed as a function of τ (τ ∈ [2, 50]); after estimating the fitting parameters β0 and β1, and the goodness of fit R2, as proposed by Borges et al.48, the k-means clustering algorithm of MATLAB© was used to classify the dynamical nature of the noisy chaotic maps and of the k-noises. Figure 5a depicts the clustering obtained from the triplet {β0, β1, R2}. Only two misclassifications, the k-noises with k = 0 and 0.2, are observed, leading to an accuracy of 96.61%. The drawback of this (at first sight) excellent discrimination is that Gaussian white noise is classified as chaotic. Note that the projection onto the β0 versus β1 plane delivers a quantification of the degree of long-term linear correlations of the k-noises. The second option using transitions involves characterising the local and global features of the matrix of all ordinal transitions, by plotting the minimum pattern entropy versus the conditional permutation entropy69. This representation plane places all chaotic maps below the k-noises, with the latter located near the diagonal—see Fig. 5b. As the noise level increases, the positions of the chaotic systems get closer to the linear stochastic ones. This characterisation identifies the presence of forbidden patterns through a null value of the minimum pattern entropy, as in the case of the logistic map. Note that the determinism of the congruential map is appropriately contrasted with linear stochastic dynamics by its location.

Fig. 5: Ordinal transition approaches.

a Clustering representation of the triplet {β0, β1, R2} (β0 and β1 being parameters of the model applied to pst(τ) defined in ref. 48, and R2 standing for the goodness of fit of the model) and b minimum pattern entropy versus conditional permutation entropy for k-noises (k ∈ [0, 3] with step 0.2) and chaotic maps with additive noise with σ ∈ [0, 1] with step 0.05. Results in b were obtained using 106 data points to have a reliable contrast. Noise-free sequences: solid symbols. Noisy sequences: open symbols. k: correlation degree; σ: intensity of the noise contamination.

Global and local features of the stationary ordinal pattern distribution can be characterised by the complexity–entropy and Fisher-entropy planes, respectively—see Fig. 6. In the former plane, the noisy logistic dynamics is distinguished from the k-noises up to a noise level of approximately 0.4, as evidenced in Fig. 6ai–ei. However, the rest of the chaotic maps overlap with the stochastic curve, even for noise-free sequences. When local aspects of the distribution are taken into account through the Fisher information, we found a clearer contrast between the different dynamics, even for high observational noise pollution—see Fig. 6aii–eii. Both planes characterise the congruential map as a slightly correlated dynamics. Another way of incorporating local features is through the q-parametric curves in the q-complexity–entropy plane. Figure 7a–d shows the loops formed by the k-noises, the hallmark of stochasticity. The absence of a loop is only found for the noise-free logistic map and the x-Sinai—see Fig. 7e, g. Since a closed q-curve is based on the existence of forbidden patterns, this methodology fails to detect determinism in chaotic sequences that materialise all ordinal patterns for a given D, as is the case of the congruential map, the y-Sinai and the noisy logistic map, as observed in Fig. 7f, h. Lastly, we found that the Tarnopolski plane delivers quite a good quantification of the long-term correlations of the k-noises, especially for small values of k, as can be seen in Fig. 7i. Nevertheless, the contrast between an uncorrelated dynamics and a noisy chaotic one is rather blurred. This lies in the fact that a pattern length of D = 3 is not long enough to unveil determinism in a sequence.

Fig. 6: Global versus local representation planes.

Localisation of the k-noises (k ∈ [0, 3] with step 0.2) and chaotic maps in ai the permutation complexity–entropy plane and aii the Fisher-entropy plane. Additive noise with σ ∈ [0, 1] with step 0.05 is depicted. Noise-free sequences: solid symbols. Noisy sequences: open symbols. Zoom of individual maps near the stochastic zone: bi, bii logistic map, ci, cii congruential map, di, dii x-Sinai and ei, eii y-Sinai. k: correlation degree; σ: intensity of the noise contamination.

Fig. 7: q-parametrisation of the complexity–entropy plane and Tarnopolski plane.

Entropy–complexity plane with q ∈ [10−4, 1] ∪ [1, 103] with steps 10−3 and 10−2, respectively, for k-noises with a k = 0 and 0.2, b k = 0.4, 0.6, 0.8 and 1, c k = 1.2, 1.4, 1.6, 1.8 and 2, d k = 2.2, 2.4, 2.6, 2.8 and 3, e logistic map with additive noise with σ ∈ [0, 1] with step 0.05, f linear congruential map, g x and h y coordinate of the Sinai map. i Turning point probability \({\mathcal{T}}\) versus the Abbe value \({\mathcal{A}}\) for k-noises (k ∈ [0, 3] with step 0.2) and chaotic maps with additive noise with σ ∈ [0, 1] with step 0.05. The inset plot shows a zoom near the stochastic zone (\({\mathcal{T}}=2/3\) and \({\mathcal{A}}=1\)). Noise-free sequences: solid symbols. Noisy sequences: open symbols. k: correlation degree; σ: intensity of the noise contamination.

As a last point, and with the aim of comparing the computational cost of all methodologies, we have calculated their running times when a totally uncorrelated Gaussian sequence of 106 data points is analysed using D = 3, 4 and 5, and τ = 1, in a MATLAB© implementation. Figure 8 shows the comparison between the running times averaged over 10 independent runs. The evaluation of the q-parametric curve in the entropy–complexity plane has the highest computational cost, since a large range of q values is needed to obtain the curves—see Fig. 7. The fitting procedures required to estimate the parameters of the corresponding models, together with the multi-scale analysis, place the decay of the number of unobserved ordinal patterns and the self-transition probability approach as the second and third most costly algorithms, respectively. This is due to the fact that the former considers several sequence sizes, while the latter covers a wide range of values of τ (τ ∈ [2, 50]). The rest of the algorithms have similar running times.

Fig. 8: Computational cost.

Comparison between the computational cost of each methodology when analysing an uncorrelated Gaussian sequence of 106 data points using D = 3, 4 and 5, and τ = 1 (except for pst, for which τ ∈ [1, 50]). The averages over ten independent realisations are reported.

The limits of permutation patterns

Throughout this review, we have seen that permutation patterns, and the metrics derived from them, can help distinguish chaos from stochasticity in a wide array of conditions. Does this mean that permutation patterns are a universal panacea? Unfortunately, this is not the case, and the researcher should always be cautious about discarding a chaotic behaviour just on the basis of a single test. To illustrate this point, we here show how a simple map can be built ad hoc that is deterministic and sensitive to initial conditions but still has permutation patterns compatible with a stochastic dynamics.

The main idea is to disentangle the sequence of permutation patterns from the values creating them, thus generating a time series with a custom distribution of patterns. To illustrate, let us consider the permutation patterns for d = 3, as shown in Fig. 9a. Any sequence of three values [x(t), x(t + 1), x(t + 2)] will be associated with a permutation pattern, which in turn constrains the next possible pattern. For instance, if the pattern is (1, 2, 3) (i.e. x(t) < x(t + 1) < x(t + 2)), the following one can only be (1, 2, 3), (1, 3, 2) or (3, 1, 2). It is then possible to create a matrix with all the possible transitions between permutation patterns, and a periodic sequence visiting all patterns with equal frequency—see Fig. 9b. Starting from one pattern, values have to be added to the time series according to the limits imposed by the next pattern. Therefore, if x(t) < x(t + 1) and the pattern to be fulfilled is (1, 2, 3), the following value has to be in the range (x(t + 1), lsup], where lsup is the maximum value of the map. As an example, we here use a logistic map y(t + 1) = 4y(t)[1 − y(t)], with y(0) = x(0) the initial condition of the map. Going back to the previous example, the third value used to fulfil the pattern (1, 2, 3) will be x(t + 2) = x(t + 1) + [1 − x(t + 1)]y(t + 1).
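A minimal sketch of this construction (the specific 6-pattern cycle below, written in the ranking convention, is one valid choice compatible with the overlap constraints, not necessarily the one of Fig. 9b; helper names are ours):

```python
import numpy as np

# Periodic sequence of d = 3 patterns in which each pattern is compatible
# with the overlap of the previous one and all six appear equally often
CYCLE = [(1, 2, 3), (1, 3, 2), (3, 2, 1), (3, 1, 2), (2, 3, 1), (2, 1, 3)]

def synthetic_map(n, x0=0.3):
    """Deterministic series in (0, 1) visiting all d = 3 patterns equally
    often; the logistic map supplies the (chaotic) positions inside each
    admissible interval, as described in the text."""
    y = x0                       # auxiliary logistic variable
    x = [0.2, 0.4]               # initial pair with x(0) < x(1)
    for k in range(n - 2):
        y = 4 * y * (1 - y)
        a, b = x[-2], x[-1]
        r3 = CYCLE[k % 6][2]     # required rank of the new value
        lo, hi = sorted((a, b))
        if r3 == 3:              # above both previous values
            new = hi + (1 - hi) * y
        elif r3 == 1:            # below both previous values
            new = lo * y
        else:                    # between the two previous values
            new = lo + (hi - lo) * y
        x.append(new)
    return np.array(x)

x = synthetic_map(60_000)
# all six patterns now appear with frequency ~1/6 despite full determinism
```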

Fig. 9: Synthetic map.
figure 9

Constructing a synthetic chaotic map with a perfect distribution of permutation patterns. a Labelling of the six permutation patterns for d = 3. b Matrix representing the allowed transitions between permutation patterns and example of a possible periodic sequence of patterns. c Example of the time series that can be obtained from the previous sequence of permutation patterns. d Probability distribution of values for the resulting time series. e Sensitivity to initial conditions, calculated as the difference between two values of the resulting time series as a function of the number of iterations separating them. The solid black line indicates the median value, and dark and light grey bands, respectively, one and two standard deviations. The horizontal blue dotted line represents the expected value of the difference in a random time series with the same probability distribution.

The resulting time series x has some interesting properties. First of all, it is a completely deterministic map, as x(t + 1) is fully defined by x(t) and y(t), which are in turn fully defined by x(0), i.e. by the initial condition. Second, the map inherits the strong sensitivity to initial conditions of the logistic map. Finally, and most importantly here, it has a perfectly balanced distribution of permutation patterns, as by construction all of them are visited with equal frequency. This implies that any method based on single permutation patterns (i.e. not on transitions), e.g. on counting missing patterns, will wrongly classify this time series as stochastic. Similarly, any other measure based on patterns, such as irreversibility ones18,19, will be misled. Note that this construction method can easily be extended, for instance, to ensure that pattern transitions have equal probabilities.

Figure 9c reports an example of the resulting time series; Fig. 9d, a histogram of the value appearance probability, calculated over a time series with 106 points; and Fig. 9e, the distribution of the difference between the values of two time series, as a function of the number of iterations, when the initial values differ by 10−3.

In summary, any method designed to discriminate between stochastic and chaotic time series is based on one or more hypotheses; in the case considered here, all methods assume that the chaotic nature of the underlying system manifests as an imbalance in the permutation patterns. While this is a reasonable assumption for most real-world systems, one must be cautious, as the presence of such an imbalance is a sufficient but not necessary condition for a chaotic dynamics.

Conclusions and outlook

As shown in this review, many tests based on permutation patterns have been proposed to assess the chaotic versus stochastic nature of time series; whether one of them is clearly better than the others is nevertheless a complex question, with several aspects influencing the answer. Generally speaking, almost all methods are misled by the presence of observational noise, and this is especially relevant in the case of methods relying on the existence of forbidden ordinal patterns. These latter techniques, including the classical entropy–complexity plane and its q-parameterised version, misclassify noise-free chaotic dynamics that materialise all ordinal patterns for a given value of D. Similarly, the permutation spectrum test yields a debatable characterisation for such kinds of sequences. On the contrary, the decay of unobserved—forbidden or missing—ordinal patterns as a function of the sequence length gives a better characterisation, in the sense that it allows quantifying the degree of linear correlations of stochastic processes and distinguishing them from noisy chaotic sequences. On the other hand, transitions between ordinal patterns unveil a sort of dynamical permutation hallmark that differentiates, up to medium noise levels, linear stochastic processes from chaotic sequences, even those having no forbidden patterns. This is the case of the representation plane defined by the minimum pattern entropy versus the conditional permutation entropy; still, this method also presents the drawback of requiring longer time series to obtain reliable results, which could be a major limitation in real-world applications. Ordinal pattern distribution-based methodologies, such as the χ2 test introduced by Amigó and co-workers, are extremely robust against noise contamination, yet are only able to reject or accept the null hypothesis that the sequence is independent and identically distributed. Lastly, when considering highly polluted chaotic sequences, the Fisher-entropy plane stands out as the best option to contrast them with linearly correlated stochastic systems.

In spite of generally remarkable results, we believe that two barriers are preventing permutation patterns from being regarded as the gold standard in the detection of chaos. First of all, all methods here presented are implicitly based on the hypothesis that chaos is reflected in an imbalance of permutation patterns, which in turn indicates the presence of a non-linearity135. This may nevertheless not be the case, as shown in the section “The limits of permutation patterns”. Most of our knowledge about permutation patterns and derived metrics comes from numerical experiments. Yet, from a stricter theoretical perspective, permutation entropy does not directly measure chaos, nor do permutation patterns represent temporal causal structures; even the relationship between the former and the KS entropy is known only for a limited number of dynamical systems16,62. A more complete theoretical foundation is clearly needed.

Second, the success hitherto achieved by permutation patterns will be the drive behind the development of even more sensitive tests. We speculate that this will be attained through two complementary paths. On one hand, novel tests will shift towards micro-scale analyses, e.g. towards the study of the dynamics of individual patterns and sequences thereof. This will be a prolongation of the historical trend, which has already moved from initial macro-scale approaches (including entropies and missing patterns) to more localised ones (like transition probabilities). Focussing the attention on one or a few permutation patterns will bring several benefits, including a reduction of the computational cost, hence the possibility of analysing structures at longer temporal scales, and eventually the creation of a catalogue of “chaotic fingerprints”. On the other hand, a second path may entail the combination of multiple and heterogeneous approaches into single tests, an option that has only recently started to appear in the literature126,136.