Network representations of attractors for change point detection

A common approach to monitoring the status of physical and biological systems is the regular measurement of various system parameters. Changes in a system's underlying dynamics manifest as changes in the behaviour of the observed time series. For example, the transition from healthy cardiac activity to ventricular fibrillation results in erratic dynamics in measured electrocardiogram (ECG) signals. Identifying these transitions – change point detection – can be valuable in preparing responses to mitigate the effects of undesirable system changes. Here, we present a data-driven method of detecting change points using a phase space approach. Delay embedded trajectories are used to construct an 'attractor network', a discrete Markov-chain representation of the system's


Introduction
The analysis, prediction and detection of change points in dynamical systems is an open problem that has been studied extensively in many diverse contexts such as financial systems [1,2], climate dynamics [3,4], ecological systems [5] and medicine [6,7]. Change points are ubiquitous in many naturally occurring systems and describe points in time at which the behaviour of the system is altered. This difference in behaviour may be attributed to endogenous or exogenous perturbations to the system. In some cases, such change points signal the transition of the system into a different, potentially undesirable regime of dynamics; in some fields, such a change point is also described as a 'tipping point'.
A related open problem within the context of dynamical systems is the automatic prediction or detection of imminent change points from observed time series data. This is particularly useful when an analytical mathematical model of the system is not readily available. In medicine, dynamical systems methods are used for the automated detection of acute pathological problems such as arrhythmia and ventricular fibrillation from ECG signals [8], or for identifying the onset of epileptic seizures in EEG analysis [9,10]. Change detection is also present in non-biological problems such as predicting transitions in paleo-climatic data [11-13] and identifying equipment failure [14]. The generality and universality of this problem has long been recognised and has given rise to the broader term 'change point detection' to describe this commonly occurring challenge.
Change point detection is a class of problems within the domain of time series analysis primarily concerned with detecting changes in the dynamics of an underlying system [15-17]. Typically, the aim is to accurately detect changes in a system by analysing an incoming stream of observed time series [15]. Effective change point detection methods are useful in automating processes for faster cursory analyses. For systems whose components' states vary over time, employing a dynamical approach to detect subtle changes in the underlying dynamics may even provide a method to preemptively diagnose the occurrence of system failure, such as the early detection of ventricular fibrillation.
Change point detection methods can be classified according to two features: (1) offline vs. online, and (2) supervised vs. unsupervised. The first classification concerns the time period during which the detection algorithm is applied. Offline detection methods focus on the analysis of observed time series after the change has occurred (i.e. post-system failure), likened to a postmortem of the observed time series [15], and are useful for labelling tasks and time-insensitive applications. In contrast, online methods analyse incoming streams of observations in real time and flag changes as they occur; this is the approach most used for automated detection. The second classification, supervision, describes whether ground truth labels are known a priori. Supervised methods construct a reference model based on a pre-defined ground truth (e.g. user-provided labels of normal vs. abnormal) [15]. Here, we use the term 'model' loosely to describe any constructed system, set of statistics or parameters that characterises the ground truth state. Deviations from the model are then used to infer system changes. This paper focuses specifically on an online supervised change point detection method.
A common approach to change point detection is the comparison of statistics of recent observations against some reference value. Large deviations with respect to some significance level are then used to flag the existence of change points. Methods used for change point detection include decision trees [18], support vector machines [19,20], and statistical approaches such as Gaussian mixture models [21], Gaussian processes [22] and Bayesian methods [23]. However, the use of statistical measures typically relies on the data adhering to some stationary distribution. This does not account for temporal dependencies often present in dynamical systems, which may be useful in uncovering and characterising changes in the underlying data generating process [24].
An alternative method of identifying change points is to employ a phase space approach to analysing observed time series [25-28]. From dynamical systems theory, the temporal behaviour of a given system can be represented as a trajectory moving through phase space, whose axes are the observed system variables. When operating under the normal regime, the trajectory of a system may settle and move along a well defined region of space, termed an 'attractor'. Changes in the underlying system will inevitably result in changes in the dynamics of the observed time series and the resulting trajectory [24,27,28]. Consequently, this may cause trajectories to move away from the original attractor (representing the normal regime) and settle into a different attractor corresponding to the new system dynamics. Therefore, the problem of change point detection may be re-framed as detecting changes in the attractor of the system. This approach evaluates phase space dissimilarities between two windows of a time series and is similar in spirit to the kernel change detection (KCD) algorithm [19], where changes are flagged when the metric distance between two descriptors of the data exceeds a threshold. Several algorithms based on dynamical systems tools such as cross-recurrence plots [29] and empirical mode decomposition [30] have also been implemented by da Costa et al. in the Massive Online Analysis (MOA) software package [24].
In order to employ a phase space approach, the phase space must first be reconstructed from the observed time series. This can be achieved via embedding methods such as time delay embedding. Takens' theorem and its sequels provide basic dynamical guarantees for extracting the phase space dynamics of a deterministic system from scalar univariate time series using an appropriate time delay embedding [31] (see Fig. 1 and Section 4.1). We note that in most cases, signals from a physical system may be noisy or driven by stochastic forces. However, time delay embedding should still provide a reasonable approximation of the dominant phase space dynamics of the system.

Fig. 1 The Rössler time series (see B6 for equations). (a) Two points from the time series separated by a delay lag τ are used to construct the delay vector. (b) The resulting reconstructed phase space of the system from plotting the trajectory in lagged coordinates (x(t), x(t − τ)).
Whilst the transformation from time series to reconstructed phase space is relatively straightforward (subject to appropriately selected delay lags), achieving a faithful and parsimonious representation of the attractor's structure and dynamics within phase space (reconstructed or otherwise) is challenging, as these attractors are defined continuously in phase space. St. Luce & Sayama [32] proposed a method of achieving a transition network representation of an attractor by discretising phase space into voxels. This greatly simplifies the task of dealing with discrete observations of continuous data. However, such an approach does not scale well with higher dimensional data as the number of voxels increases rapidly. This problem becomes apparent when attempting to extend their approach to analysing high dimensional data such as those resulting from a high dimensional time delay embedding.
Several methods have been proposed to transform time series (and by extension dynamical attractors) into a network representation, with examples including cycle networks [33], recurrence networks [1,34], ordinal partition networks [35,36], k-nearest neighbour networks [37] and visibility graphs [38]. Provided there is sufficient data to estimate state transition probabilities, many of these transformations extend well to higher dimensional data. However, they preserve differing amounts of the spatial and dynamical information contained within the time series, both of which can be crucial in detecting subtle changes in the system's dynamics.
We present in this paper a method for constructing a parsimonious network representation of attractors in phase space, termed the 'attractor network', that encodes both the spatial and dynamical information of the observed time series. The attractor network consists of two components: (1) the spatial network, which encodes positional information in phase space, and (2) the dynamics network, which encodes the transition frequencies and probabilities between points in state space. Functionally, the attractor network behaves as a Markov chain surrogate model of the underlying system, where the adjacency matrix of the dynamics network is the transition matrix. However, attractor networks build upon the Markov chain idea by preserving the spatial positions of each node. We use the term 'surrogate' in this case as predictions from the Markov model will always be stochastic, even if the input data is deterministic.
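Since the dynamics network's adjacency matrix is the transition matrix, its core can be sketched as a row-normalised count matrix over discrete node labels. The snippet below is a minimal illustration only; the paper's full construction, including the spatial network and its hyperparameters, is described in its Section 4.2:

```python
import numpy as np

def transition_matrix(states, n_nodes):
    """Estimate transition counts and probabilities (a Markov chain)
    from a sequence of discrete node labels."""
    counts = np.zeros((n_nodes, n_nodes))
    for a, b in zip(states[:-1], states[1:]):
        counts[a, b] += 1.0
    row_sums = counts.sum(axis=1, keepdims=True)
    # Row-normalise; nodes with no outgoing transitions keep all-zero rows
    probs = np.divide(counts, row_sums, out=np.zeros_like(counts),
                      where=row_sums > 0)
    return counts, probs
```

For a short symbolic orbit such as 0 → 1 → 2 → 0 → ..., each row of the resulting probability matrix holds the empirical distribution over the next node.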
Our proposed attractor network approach is shown to be effective in tackling the challenges of the phase space approach for change point detection. Namely, our method remains parsimonious with increasing dimension whilst maintaining the network approach of discretising continuous phase space. To achieve this, attractor networks that approximately model the observed system's dynamics are constructed (equivalent to parametric model training) using observed data of a given system that is known to be operating within its normal regime. The resulting attractor network is subsequently used to test future incoming data observations for change points/abnormal behaviour in real time. This is done by comparing incoming observed transitions (e.g. ⃗x(t) → ⃗x(t + δt)) against the expected transition probabilities of the system in the normal state. A measure of the surprise S(t) of each observed transition can be calculated using information theory approaches (see Section 4.3). Unexpected transitions, such as those that are infrequently or never observed when constructing the attractor, are given high surprise values (see Fig. 2). A critical threshold S*, taken as the 95% quantile of S(t) from the training data, is used to convert the continuous measure S(t) into a binary time series Ŝ(t). We define change points as the occurrence of successive high surprise values. To detect change points, exponential smoothing is first applied to Ŝ(t) to calculate a normalised measure of surprise E_S(t). Change points then correspond to times where E_S(t) > E*, where E* is a threshold defined from E_S(t) of the training data. A schematic overview of the analysis method is provided in Fig. 3.
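The detection pipeline above can be sketched as follows. We assume the common information-theoretic definition S(t) = −log p of the observed transition's probability (the paper's exact definition is in its Section 4.3), and the smoothing parameter `alpha` is a stand-in for the paper's characteristic time scale:

```python
import numpy as np

def surprise(P, transitions, eps=1e-12):
    """Surprise of each observed transition under transition matrix P.
    Transitions never seen in training (probability ~0) score very high."""
    return np.array([-np.log(P[a, b] + eps) for a, b in transitions])

def detect_change_points(S, S_star, E_star, alpha=0.1):
    """Binarise the surprise at threshold S*, exponentially smooth the
    binary series, and flag change points where the smoothed score
    exceeds E*."""
    S_hat = (S > S_star).astype(float)
    E = np.zeros_like(S_hat)
    for t in range(1, len(S_hat)):
        E[t] = alpha * S_hat[t] + (1 - alpha) * E[t - 1]
    return E > E_star
```

Isolated surprising transitions decay away under the smoothing; only runs of successive high-surprise observations push E_S(t) above E*.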
We demonstrate the effectiveness of attractor networks in a practical context by applying our method to the problem of automatic detection of ventricular fibrillation (VF) from electrocardiogram (ECG) time series. The single-channel ECG data is taken from the publicly available CU Ventricular Tachyarrhythmia data set on PhysioNet [39,40]. We find that the attractor network approach is effective in detecting the onset of VF in 27 of 29 patients within 5 seconds of the annotated onset of VF. In 9 patients, the surprise measure S(t) was able to preempt the occurrence of VF (Fig. A4). We find that this approach outperforms competing statistical methods such as the moving average, standard deviation and permutation entropy [16,41].
To further illustrate the flexibility of our approach, we test the effectiveness of attractor networks in detecting subtle dynamics changes in artificial time series. Specifically, we investigate whether the surprise measures are able to discriminate between normal and surrogate data that possess the same statistical distributions. To do this, we artificially construct a time series consisting of alternating portions of normal and surrogate signals constructed from the Chua chaotic oscillator in the single scroll regime [42]. Surrogate data consists of artificially constructed, randomised data whose statistics and signal characteristics (e.g. power spectrum, mean, standard deviation) match those of the real reference data but are otherwise dynamically unrelated. Such data is often used to test for the existence of nonlinear and/or deterministic dynamics in a given signal. In our analyses, we utilise surrogate data as a way to assess whether our proposed attractor network is able to distinguish between time series with subtle dynamical differences but otherwise similar statistical structure. To do this, surrogate data is generated using the iterated amplitude adjusted Fourier transform (AAFT) algorithm [43,44]. We find that our approach was able to reliably detect transitions between normal and surrogate time series and outperformed moving window approaches using statistical scores such as the moving average, standard deviation and permutation entropy. We also provide results of additional change point detection tasks on other artificial time series in the Appendix. Namely, (1) detecting subtle changes in phase space (phase coherence), and (2) detecting and quantifying gradual transitions and changes in system properties.

Fig. 3 An overview of the proposed method. Following the arrows: input training data representative of the 'normal' dynamics is used to reconstruct the underlying phase space via a delay embedding (delayed components given by x_i(t)). The constructed attractor network consists of a spatial and a dynamics component, which discretises the dynamics of the system into a Markov chain. Hyperparameters ϵ, δ, N_max and K determine the density of nodes and edges in the network (see Sections 4.2.1 and 4.2.2). Future incoming observations (test data) are compared against the expected transition probabilities in the attractor network and the level of surprise S(t) is calculated. This is subsequently used to calculate a normalised score E_S(t), which determines the occurrence of change points.

Automated VF Detection
ECG data are voltage measurements of the heart's electrical activity, typically taken via externally placed electrodes. In the healthy regime, regular contraction of the heart to facilitate adequate blood circulation is driven by electrical impulses from the sinoatrial node. Each beat within a recorded ECG typically consists of three sections, P-QRS-T, corresponding to the movement of electrical wavefronts along the heart [45].
ECG recordings contain complex information pertaining to various physiological dynamics. For example, cardiological irregularities and heart malfunction such as arrhythmia and tachycardia typically manifest as irregularities in an otherwise relatively regular pseudoperiodic signal (see Fig. 4). In this case, pseudoperiodicity can be attributed to minor fluctuations in the approximate frequency and amplitude of oscillations. More serious conditions such as atrial (AF) and ventricular fibrillation (VF) often result in the total degradation of the signal dynamics [27]. However, the interpretation and analysis of ECG data is difficult and relies on trained medical professionals. This has led to a large body of methods that aim to automatically detect, interpret and diagnose physiological conditions from ECG data. These methods utilise analysis techniques from a wide range of disciplines such as statistics [46,47], frequency analysis [48,49], and machine learning [50,51]. For brevity, we refer interested readers to [15] for a more comprehensive discussion of current ECG analysis methods.
From the perspective of dynamical systems theory, healthy-mode ECG can be regarded as adhering closely to some attractor (a region in state space) that describes the healthy-mode dynamics. The onset of VF causes stark changes in the characteristics of the ECG signal (see Fig. 4), which result in deviations from the original attractor. To demonstrate the utility of attractor networks, we apply the proposed method to the task of detecting ventricular fibrillation (VF) from electrocardiogram (ECG) time series.
The CU Ventricular Tachyarrhythmia database [39,40] consists of 8 minute ECG recordings of 35 patients with eventual onset of VF. Each signal was recorded with a sampling rate of 250 Hz and filtered with a 70 Hz low-pass filter. The data set also contains annotations for each beat and the onset of VF. Of the 35 patients, 6 were excluded as there was either no clear annotation provided for the onset of VF, extended irregularities in the recorded ECG, or the onset of VF was too close to the start of the recording. In most recordings, there were occasional missing values, most of which occur towards the end of the recording. These may be attributed to improper sensor placement or data logging. Generally, this does not heavily impact the performance of the detection algorithm as only one-step transitions are required to evaluate each surprise score. However, to prevent spurious false positives due to data dropouts, the surprise for transitions with missing values was set to 0.
A three dimensional non-uniform delay embedding was used to reconstruct the phase space of the ECG dynamics. Delays were individually selected with the SToPS [52] method based on the first 20000 data points corresponding to the healthy dynamics prior to the onset of VF. The attractor network was then constructed using the first half of the data prior to VF onset as training data. This training data was split equally between constructing the spatial and dynamics components of the network. The simultaneous presence of both large and small scale dynamics within ECG results in a small, very high density region when applying delay embedding, which can cause computational challenges. Therefore, a size parameter of ϵ = 0.003 with a maximum of N_max = 6 nodes per observation was used to reduce computation time. The parameters ϵ and N_max are used to control the density of the nodes and edges of the attractor network respectively (see Sections 4.2.1 and 4.2.2). For VF detection, S(t) and E_S(t) of the remainder of each ECG recording, including the VF onset, are calculated. Change points are detected using thresholds S* and E* based on 95% confidence intervals calculated from S(t) and E_S(t) of the training data (see Section 4.3). Exponential smoothing was applied with a characteristic time scale of 250 steps, corresponding to 1 second of activity. Varying the confidence level effectively alters the specificity and sensitivity of the test. In our analyses we select a confidence level of 95% due to its ubiquity, but this selection is arbitrary and may be adjusted depending on the needs of each application.
In order to quantify the method's VF detection performance, we define true positives as a detected change point that occurs within 5 seconds (1250 timesteps) before or after the first annotated onset of VF. Due to the nature of VF, the definition of false positives is not straightforward, as the occurrence of VF can permanently alter the ECG dynamics even after recovery, resulting in extended periods of detected change points well after the annotated VF onset. Any comparison against a threshold from previous 'healthy' mode data is then no longer relevant to the detection problem.
For our analyses, we define two main metrics of performance for VF detection: (1) p ∈ [0, 1], which measures the classification accuracy (healthy or VF) of the method near the onset of VF, and (2) p_H ∈ [0, 1], which is defined similarly to p but is weighted by the number of consecutive identical classifications (i.e. a string of classifications of VF will have a higher score than intermittent classifications) (see Section 4.5.1 for mathematical formulations). We also provide two additional performance metrics in the Appendix (see Section 4.5.2) that aim to quantify the performance of each method in preempting the onset of VF. We also include further discussion on the difficulties of quantifying VF detection performance in ECG data.
In both cases (p and p_H), performance is measured across a time period 5 seconds before and after the first annotated onset of VF. Normalising over a time span of 10 seconds (5 seconds before and after VF), a score of p = 0.5 (or p_H) generally corresponds to a perfect detection typically after the onset of VF. Scores larger than 1 generally correspond to detections occurring up to 5 seconds before the annotated onset of VF (i.e. preemptive detection).
A summary of detection performance for the four tested methods (moving average, moving standard deviation, moving permutation entropy and attractor network surprise) is given in Table 1. Moving statistics were calculated over sliding windows of 100 timesteps. The permutation entropy was calculated by converting the time series into a symbolic sequence, with each symbol corresponding to an ordered sequence of observations ranked by magnitude [16,41]. This conversion utilised 7 uniform lags of size equal to the first lag calculated from SToPS [52] in the delay embedding.
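For reference, a minimal version of the permutation entropy comparison statistic (the Shannon entropy of ordinal patterns, cf. Bandt-Pompe); the paper's variant uses 7 uniform lags sized by SToPS, corresponding here to the `order` and `lag` parameters:

```python
import math
from collections import Counter

def permutation_entropy(x, order=3, lag=1):
    """Shannon entropy of the distribution of ordinal patterns (rank
    orderings) over windows of `order` samples spaced `lag` apart."""
    n = len(x) - (order - 1) * lag
    patterns = Counter(
        tuple(sorted(range(order), key=lambda k: x[i + k * lag]))
        for i in range(n)
    )
    return -sum((c / n) * math.log(c / n) for c in patterns.values())
```

A monotone series yields a single ordinal pattern and hence zero entropy, while fully random data approaches the maximum value log(order!).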
The final performance scores for each method are averaged over all 29 analysed patients. In our analyses, performance is assessed based on the successful detection of the first occurrence of VF. This is because following VF episodes, heart dynamics may persist for an extended period of time in a regime characterised by stress, which can alter the definition of the 'healthy' mode. A more detailed breakdown of scores is given in Figure A4. Detailed results comparing the performance of each method across individual patients are provided in Figures A1, A2, A3 and A4 in the Appendix.
Overall, the attractor network surprise approach was able to more reliably detect and preempt the onset of VF compared to other methods based on moving statistics averaged across a sliding window. This is reflected in higher mean and median scores for p and p_H as well. Focusing on individual cases, VF was detected within 5 seconds of the annotated onset in 25 out of the 29 patients (see Figs. A4 and 5). Of the remaining four, 3 cases (cu05, cu08, cu28) of VF were not detected. The VF in the last case (cu25) was detected but fell outside the 5 second window analysed. Surprisingly, 9 cases yielded detection scores greater than 1, corresponding to the attractor network being able to preempt a plausible onset of VF. For 2 of these cases (cu13, cu15), the detections were made up to 3 minutes before the annotated onset of VF (see Figure 5). This may suggest that for some cases, subtle changes in ECG dynamics precede the onset of VF. For both cases where this was observed, the early detections corresponded to an increasing number and frequency of observed irregularities in the ECG relative to the healthy mode used for training. In almost all cases, false detections were found both within the training and testing portions of the ECG data. This is possibly due to dynamical irregularities in the time series such as cardiac arrhythmia or abnormal voltage amplitudes. However, most of these cases return a detection signal only intermittently and not persistently. This is in contrast to the onset of VF, where the method persistently returns a positive result for detected change points. Similar qualitative results were found for the other statistics-based detection methods. However, these methods slightly outperformed the attractor network approach in 9 of 29 patients. Of these, 3 patients (cu05, cu08, cu28) saw failures of almost all tested methods, with the exception of the moving standard deviation and moving average performing slightly better for cu08 and cu28 respectively.

Amplitude Adjusted Fourier Surrogates
The attractor network approach differs from those that track changes in moving statistics (e.g. CUSUM, the Page-Hinkley test, the moving average test) [24,53] in that detections are made based on observed differences in the dynamics of the system rather than the moving statistics of the observed time series. To investigate the effectiveness of the attractor network approach, we apply it to the task of distinguishing between changes in dynamical behaviour that partially preserve the stationary and non-stationary statistics of the data. This test can be formulated by artificially constructing a signal using conjoined portions of normal and amplitude adjusted Fourier transform (AAFT) surrogate time series [43]. The result is an artificial data set that preserves the mean, standard deviation and power spectrum of the original data but lacks the original system's dynamical features. However, AAFT surrogates are unable to preserve the moving statistics of the signal. In our analyses, we select the Chua system operating in the single scroll regime, which exhibits relatively regular oscillatory dynamics.

Fig. 5 VF detection results from applying the attractor network and surprise metrics S(t), E_S(t) to ECG data for individual patients (IDs given by "cu..."). Annotated onsets of VF are given by black crosses, and flagged detections of VF are given by red dots. Blue regions represent the length of time series used for training and constructing the attractor network. The onset of VF is characterised by a persistent detection result (red).
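The basic (non-iterated) AAFT construction can be sketched as below; the paper uses the iterated refinement [43,44], which further alternates spectrum and amplitude adjustments until convergence:

```python
import numpy as np

def aaft_surrogate(x, seed=0):
    """Basic AAFT surrogate: preserves the amplitude distribution of x
    exactly and its power spectrum approximately, while destroying any
    deterministic temporal structure."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    n = len(x)
    ranks = np.argsort(np.argsort(x))
    # 1. Rescale Gaussian noise to follow the rank order of x
    g = np.sort(rng.standard_normal(n))[ranks]
    # 2. Randomise the Fourier phases of the Gaussianised series
    f = np.fft.rfft(g)
    phases = rng.uniform(0.0, 2.0 * np.pi, len(f))
    phases[0] = 0.0  # keep the mean (DC component) unchanged
    s = np.fft.irfft(np.abs(f) * np.exp(1j * phases), n)
    # 3. Map the sorted original amplitudes onto the rank order of s
    return np.sort(x)[np.argsort(np.argsort(s))]
```

Because the final step only reorders the original samples, the surrogate is an exact permutation of the input values.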
For change point detection, the attractor network was trained on 20000 time steps of the regular single scroll Chua oscillator [42], with an additional 20000 time steps used to calculate threshold values for identifying change points. The test data consists of 7 concatenated portions of time series, each of length 2000 steps, alternating between the normal and surrogate data. The time series was generated by integrating the Chua equations with the RK4 algorithm and time step dt = 0.02. The equations are given by:

ẋ = α(y − x − h(x)),
ẏ = x − y + z,
ż = −βy − γz,
h(x) = bx + (1/2)(a − b)(|x + 1| − |x − 1|),

where (β, γ, a, b) = (53.612186, −0.75087096, 0.03755, −0.84154) and α = 17. Surrogates were generated using the iterated AAFT algorithm applied to the simulated Chua time series. The concatenations between normal and surrogate data ensured that end points were matched with no discontinuities. Direct observation of the scalar time series shows that differences between the surrogate and original data are not easily distinguishable (see Fig. 6). Permutation entropy was calculated with up to 7 lags, each of size equal to the first delay lag calculated using SToPS. From Fig. 7, we find that the attractor network approach outperformed the moving statistics and permutation entropy. With the exception of the moving standard deviation, the remaining measures of moving average and permutation entropy fail (as expected) to detect changes in the time series.
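Assuming the standard piecewise-linear Chua formulation with the quoted parameters (an assumption on our part, as the printed equations in the source are incomplete), the RK4 integration with dt = 0.02 might look like:

```python
import numpy as np

# Parameters quoted in the text; the piecewise-linear nonlinearity h(x)
# below is the standard Chua formulation and is an assumption here.
ALPHA, BETA, GAMMA = 17.0, 53.612186, -0.75087096
A, B = 0.03755, -0.84154

def chua(v):
    """Right-hand side of the (assumed) dimensionless Chua equations."""
    x, y, z = v
    h = B * x + 0.5 * (A - B) * (abs(x + 1.0) - abs(x - 1.0))
    return np.array([ALPHA * (y - x - h), x - y + z, -BETA * y - GAMMA * z])

def rk4_step(f, v, dt):
    """One classical fourth-order Runge-Kutta step."""
    k1 = f(v)
    k2 = f(v + 0.5 * dt * k1)
    k3 = f(v + 0.5 * dt * k2)
    k4 = f(v + dt * k3)
    return v + (dt / 6.0) * (k1 + 2.0 * k2 + 2.0 * k3 + k4)

def integrate(v0, n, dt=0.02):
    """Integrate the Chua system for n steps from initial state v0."""
    traj = [np.asarray(v0, dtype=float)]
    for _ in range(n):
        traj.append(rk4_step(chua, traj[-1], dt))
    return np.array(traj)
```

The scalar series used for embedding would then be a single coordinate of the returned trajectory, e.g. `integrate((0.1, 0.0, 0.0), n=20000)[:, 0]`.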

Sampling Frequency Effects
The attractor network effectively discretises the phase space dynamics into a Markov chain. Therefore, it is expected that the accuracy of the resulting attractor representation will be affected by the sampling frequency of the data. High sampling frequencies should not adversely affect the reconstruction, as the attractor network construction algorithm iteratively simplifies dense clusters of points in phase space. The extent of this simplification is controlled by the size scale hyperparameters ϵ and δ (see Section 4.2.1). However, input data points that are too sparse can limit the resolution of the attractor network and the corresponding transitions. To test the effects of sampling, we construct attractor networks using input time series with varying sampling frequencies, which are then used to identify alternating transitions between the original and AAFT surrogate data with the same test setup as Section 2.2. For this test, the analysed time series consisted of 20 alternations between normal and surrogate data, with each segment lasting approximately 40 oscillations (2000 time steps). Performance is measured by the correct classification of each observed data point as either normal or abnormal. The F1 score and Matthews correlation coefficient (MCC) were used to quantify the classification accuracy (normal vs. surrogate) in each case.
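Both classification scores follow directly from the confusion counts; a minimal sketch:

```python
import math

def f1_and_mcc(y_true, y_pred):
    """F1 score and Matthews correlation coefficient for binary labels."""
    tp = sum(t and p for t, p in zip(y_true, y_pred))
    tn = sum((not t) and (not p) for t, p in zip(y_true, y_pred))
    fp = sum((not t) and p for t, p in zip(y_true, y_pred))
    fn = sum(t and (not p) for t, p in zip(y_true, y_pred))
    f1 = 2 * tp / (2 * tp + fp + fn) if (2 * tp + fp + fn) else 0.0
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return f1, mcc
```

Unlike F1, the MCC is symmetric in the two classes, which is why it is often preferred when the normal and abnormal segments are of unequal length.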
The attractor network approach was found to perform well for sampling frequencies as low as 7 points per period (see Fig. 8). However, there is a decrease in performance for further decreases in sampling frequency. This is expected as time series with very low sampling frequencies effectively decrease the resolution of the phase space discretisation. Additionally, the observed transitions must be inferred across larger jumps in phase space, which can make detection of abnormal dynamics difficult. This is particularly true for chaotic dynamics, where the uncertainty of future states increases exponentially in time with respect to small uncertainties in the initial conditions.

Fig. 7 Results of all change point algorithms for detecting transitions into AAFT surrogates for the Chua system. From top to bottom: original time series, moving average (MA), moving standard deviation (MSTD), moving permutation entropy (MPE), and surprise scores S(t). Sliding window lengths of 100 steps were used for calculating moving averages. Detections (red) are made based on the exponentially smoothed quantity E(t). Real change points are given by vertical orange lines. All comparison methods except MSTD struggle to reveal distinct changes in behaviour for the AAFT surrogates. In contrast, the attractor network approach is able to capture the transitions well.

Conclusion
We present a new network-based change point detection method that aims to quantify deviations from the underlying attractor manifold of the target system. Significant deviations from the underlying attractor are used to infer the presence of a change point in the time series. To achieve this, input time series recorded from the healthy regime are delay embedded to represent the system dynamics in reconstructed phase space. The observed transitions are then used to construct an attractor network consisting of two components: the spatial network and the dynamics network. The resulting attractor network effectively acts as a Markov-chain representation of the dynamics along the system's attractor. Change point detection is achieved by calculating the measures of surprise S(t) and E_S(t) for each new observation with respect to the learned attractor network. Unexpected observations caused by changes in the underlying system's dynamics result in large, persistent values of the surprise measures, which are used to flag change points.

Fig. 8 Boxplot of upper and lower quartiles of prediction accuracy scores (F1 score and Matthews correlation coefficient (MCC)) for the attractor network approach with varying sampling frequencies (points per period). Scores show a large decrease in performance for sampling frequencies below 7 points per period. Outliers lie beyond 1.5 times the interquartile range.
The proposed approach was used to automatically detect the onset of ventricular fibrillation (VF) from recorded patient ECG data. Data was taken from the CU Ventricular Tachyarrhythmia data set that is publicly available on PhysioNet. The attractor network method was found to be sensitive in detecting the onset of VF for 25 out of 29 patients. In two cases, our approach was able to detect the occurrence of irregularities in the ECG well before the annotated onset of VF provided in the data set.
To further illustrate the flexibility of our method, we apply the attractor network approach to the task of identifying subtle changes in the dynamics of a time series. This is done by constructing artificial time series consisting of alternating portions of normal and abnormal behaviour. Specifically, we test the ability of the attractor network to detect transitions between normal and AAFT surrogate data. The attractor network and the relevant metrics S(t) and E(t) were sensitive in distinguishing between normal and surrogate time series. We also find that this performance is maintained even for moderately low sampling frequencies. We provide in the Appendix results of additional change point detection tests, together with comparisons of the performance of the attractor network approach against moving window approaches using statistical scores such as the moving average, moving standard deviation and permutation entropy.

Time Delay Embedding
One of the most commonly employed methods of reconstructing the phase space of a system from an observed scalar time series is time delay embedding. Consider a multivariate dynamical system ⃗x(t) = ⃗F(⃗x) in state space X where only the first component x_1(t) is observed. Takens' theorem generically guarantees that a delay embedding reconstruction ⃗y(t) = (x_1(t), x_1(t − τ_1), ..., x_1(t − τ_n)) defined in a space Y with appropriately chosen delays (τ_1, ..., τ_n) has dynamics that are homeomorphic to the true system dynamics in X; that is, there exists a homeomorphism Φ mapping states in X to states in Y.

Attractor networks are constructed with respect to some ambient phase space. Hence, the first step in analysing an observed time series is constructing an embedding to augment the dimension of the input data. In our analyses, we use a non-uniform delay embedding, where x(t) and ⃗x(t) are the original time series and reconstructed delay vector respectively, and (τ_1, ..., τ_n) are the selected time lags. The non-uniform delay embedding method is a generalisation of the uniform delay embedding often used for phase space reconstruction [52]. The latter method is simpler in its formulation and only requires the selection of two hyperparameters, the delay lag τ and the embedding dimension m. The resulting delay vector is then constructed as ⃗x(t) = (x(t), x(t − τ), x(t − 2τ), ..., x(t − (m − 1)τ)). The lag τ is typically selected using a mutual information criterion [54,55], followed by the selection of m using a false nearest neighbour criterion [56]. However, reconstruction with a single lag is not adequate for systems with multiple disparate time and spatial scales [57]. This may be remedied by selecting a small τ and a large m at the expense of creating an overly high-dimensional phase space, which may computationally hinder later analyses. In contrast, non-uniform embedding allows the selection of multiple time scales without unnecessarily increasing the embedding dimension m. However, this requires solving the non-trivial problem of selecting relevant embedding delays τ_1, ..., τ_n.
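As a concrete sketch, both embeddings described above can be implemented in a few lines of NumPy. The function name `delay_embed` and the sinusoidal test signal are illustrative choices, not part of the original method; uniform embedding is recovered as the special case of equally spaced lags.

```python
import numpy as np

def delay_embed(x, lags):
    """Non-uniform delay embedding: each row is (x(t), x(t - tau_1), ..., x(t - tau_n))."""
    lags = [0] + list(lags)
    tmax = max(lags)
    # align each lagged copy of the series so row i corresponds to time tmax + i
    return np.column_stack([x[tmax - tau : len(x) - tau] for tau in lags])

# Uniform embedding is the special case lags = (tau, 2*tau, ..., (m - 1)*tau).
x = np.sin(2 * np.pi * np.arange(0, 10, 0.01))  # illustrative test signal
Y = delay_embed(x, lags=(5, 10))                # 3-dimensional reconstruction
```

Each row of `Y` is one reconstructed state vector; the first row is (x(t_max), x(t_max − τ_1), ..., x(0)).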
There are multiple proposed algorithms for the selection of non-uniform embedding delays, such as MDOP [58], PECUZAL [59] and the method by Garcia & Almeida [60]. However, the resultant lags do not usually agree between algorithms due to their differing notions of optimal delays. Additionally, the calculated lags typically do not bear a clear, explainable relationship to the time scales within the observed time series.
To address this, we use the SToPS algorithm, a persistent homology based algorithm proposed by Tan et al. [52] that identifies dynamically relevant time scales from the recurrence behaviour of chaotic and non-stationary time series. Instead of outputting a collection of time lags, SToPS calculates the significance of each potential time lag to construct a spectrum of time scales from which individual lag values may be selected to construct a delay vector. This provides a more accurate, explainable and robust method for analysing time series that may contain multiple disparate time scales and magnitudes, such as ECG, by allowing for the construction of well unfolded attractors that more accurately capture the underlying dynamics in phase space.
Such an approach is reminiscent of a Fourier power spectrum, with the added benefit of being applicable to data with chaotic behaviour or non-stationary statistics.

Phase Space Reconstruction to Attractor Networks
Following the time delay embedding of the time series to reconstruct the system's attractor in phase space, the next step is the construction of the attractor network by discretising phase space. This involves the construction of two components: the spatial network and the dynamics network.

Spatial Network
One of the drawbacks of the network-based method for identifying attractors presented by St. Luce et al. [32] is its potentially poor computational scaling with increasing phase space dimension. This is due to the grid based voxel scheme used to discretise the entire phase space, where each voxel is represented by a single node [32]. This results in a prohibitively large network for even modest phase space dimensions, which requires extensive pruning before any further computation can be done.
To address this, we propose instead to use the original positions of the input data to guide the discretisation of phase space. Randomly sampled points from the embedded training data are selected as kernels in phase space and used to construct a sparse point cloud representation of the attractor. Additional sampled points are then progressively added in cycles until the desired density or number of points in the attractor is reached. However, depending on the density distribution of the attractor and the sampling of points, the simple addition of more points may not always increase the detail of the attractor structure captured by the point cloud. This results from the presence of redundant points that are near neighbours in high density regions of the attractor. To address this, the point cloud is trimmed by replacing high density clusters of points in the attractor with a single 'centre-of-mass' point. The algorithm for iteratively constructing the spatial network is given as follows:
1. Let A be the collection of points in the spatial network and ϵ be a predefined size scale.
2. Sample a set of points S from the training input data set and add it to A: S ∪ A → A.
3. Calculate the pairwise distance δ_ij for all point pairs (⃗x_i, ⃗x_j) ∈ A. Define a neighbour adjacency matrix D with entries D_ij = 1 if δ_ij < ϵ and D_ij = 0 otherwise.
4. For every node i in the network represented by D, calculate the local clustering coefficient c_i, where k_i and n_∆i are the degree of and the number of triangles passing through node i respectively.
5. Let C be the collection of nodes whose clustering coefficient c_i > 0.5. Let k_max be the highest degree amongst all the nodes in C. For each ⃗x_i ∈ C with k_max neighbours, calculate the point averaged across all its neighbours, and replace the cluster of points in A with this average point.
6. Repeat steps 4 and 5 until k_max < 3 (i.e. no triangle clusters exist).
7. Repeat steps 2 to 6 until all the input training data has been included.
In the above formulation, the density parameter ϵ prescribes a minimum allowable distance between point pairs in the attractor. This limits the number of redundant points in high density regions, which alleviates the computational burden in later steps. The cutoff threshold for the maximum clustering coefficient may also be adjusted to control the aggressiveness of the trimming procedure.
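The trimming loop (steps 3 to 6) can be sketched as below. This is a simplified illustration, not the authors' implementation: it merges one dense cluster per pass, and the local clustering coefficient is computed with the standard formula c_i = 2 n_∆i / (k_i (k_i − 1)).

```python
import numpy as np

def clustering_coeffs(D):
    # c_i = 2 * n_tri(i) / (k_i * (k_i - 1)), with n_tri(i) = diag(D^3)_i / 2
    k = D.sum(axis=1)
    tri = np.diagonal(D @ D @ D) / 2.0
    with np.errstate(divide="ignore", invalid="ignore"):
        c = np.where(k > 1, 2.0 * tri / (k * (k - 1)), 0.0)
    return c, k

def trim_spatial_network(A, eps, c_thresh=0.5):
    """Merge dense triangle clusters into their centre of mass until no
    node with c_i > c_thresh has degree >= 3 (one cluster per pass)."""
    A = np.asarray(A, dtype=float)
    while True:
        d = np.linalg.norm(A[:, None] - A[None, :], axis=-1)
        D = (d < eps).astype(int)
        np.fill_diagonal(D, 0)
        c, k = clustering_coeffs(D)
        dense = np.where(c > c_thresh)[0]
        if len(dense) == 0 or k[dense].max() < 3:
            return A
        i = dense[np.argmax(k[dense])]                  # highest-degree dense node
        cluster = np.append(np.where(D[i] == 1)[0], i)  # the node and its neighbours
        centre = A[cluster].mean(axis=0)                # centre-of-mass replacement
        keep = np.delete(np.arange(len(A)), cluster)
        A = np.vstack([A[keep], centre[None, :]])

A0 = np.array([[0, 0], [0.01, 0], [0, 0.01], [0.01, 0.01], [5, 5], [10, 10]], float)
A_trim = trim_spatial_network(A0, eps=0.1)   # the tight 4-point cluster collapses
```

Here the four near-coincident points form a complete triangle cluster (c_i = 1, k_i = 3) and are replaced by their centre of mass, while the two distant points are untouched.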

Dynamics Network
The dynamics network aims to describe the vector field dynamics of the system within phase space. However, this first requires a discretisation of the continuous phase space. To do this, the points in A are used as kernels to discretise the continuous phase space into a finite collection of bounded cells with characteristic spatial scale δ. Because A only contains points that are relevant to the attractor of the system, the resulting discretisation automatically excludes regions that are greater than a notional distance δ from the attractor. This removes the need to trim off irrelevant regions of the discretised phase space, previously required in the grid-based voxel approach of St. Luce et al. [32].
In order to encode vector fields and dynamics of trajectories into the attractor network representation, pairs of observed transitions between successive points in phase space are used to create a transition matrix.
1. Let M be an |A| × |A| matrix, B be the set of input training points for learning the dynamics network and A be the collection of points in the spatial network.
2. Consider a point ⃗x_i(t) ∈ B and its observed next position ⃗x_i(t + dt). The pair of points (⃗x_i(t), ⃗x_i(t + dt)) corresponds to a transition between two regions of the discretised state space.
3. Identify up to N_max attractor points each, ⃗a, ⃗b ∈ A, that are within a distance δ of ⃗x_i(t) and ⃗x_i(t + dt) respectively. The hyperparameter N_max controls the edge density of the resultant attractor network. Calculate the corresponding distances δ_a and δ_b between each candidate attractor point and the transition endpoints.
4. Calculate a scaling value α from δ_a, δ_b and δ.
5. Add a corresponding weight to the element M_ab corresponding to the transition between attractor points ⃗a, ⃗b ∈ A, where K is a shape parameter.
6. Repeat steps 2-5 for all points ⃗x_i(t) ∈ B.
Observed transitions in the training data are used to record the frequency of transitions between any two points in the discretised phase space. The characteristic spatial scale δ is taken to be the 99% quantile of all closest neighbour distances from points included in the spatial network. The value α captures the goodness of fit between the endpoints of an observed transition and the available attractor points in the discretised space. Lower values of α result in a contribution closer to 1 to the tally of transitions. For α > 0, inaccuracies in the fit are scaled by K, which governs the penalty applied when calculating the contribution of the observed transition.
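A minimal sketch of the tallying procedure (steps 1 to 6) follows. Since the exact expressions for α and the transition weight are not reproduced in the text, this sketch assumes α = (δ_a + δ_b)/(2δ) and a weight exp(−Kα), which matches the described behaviour: contributions near 1 for small α, penalised by K otherwise.

```python
import numpy as np

def learn_dynamics_network(A, B, delta, K=2.0, n_max=3):
    """Tally observed transitions between attractor points. The forms of
    alpha and the weight kernel below are assumptions, not taken verbatim
    from the paper."""
    A = np.asarray(A, float)
    B = np.asarray(B, float)
    M = np.zeros((len(A), len(A)))
    for t in range(len(B) - 1):
        x0, x1 = B[t], B[t + 1]                 # successive observed states
        da = np.linalg.norm(A - x0, axis=1)
        db = np.linalg.norm(A - x1, axis=1)
        ia = np.argsort(da)[:n_max]
        ia = ia[da[ia] < delta]                 # up to n_max fits for x(t)
        ib = np.argsort(db)[:n_max]
        ib = ib[db[ib] < delta]                 # up to n_max fits for x(t + dt)
        for a in ia:
            for b in ib:
                alpha = (da[a] + db[b]) / (2.0 * delta)  # assumed goodness-of-fit form
                M[a, b] += np.exp(-K * alpha)            # assumed weight, ~1 for small alpha
    return M

A = np.array([[0.0, 0.0], [1.0, 0.0]])
B = np.array([[0.01, 0.0], [0.99, 0.0], [0.02, 0.0]])
M = learn_dynamics_network(A, B, delta=0.5)
```

With this toy data the trajectory bounces between the two attractor points, so only the off-diagonal entries of M receive weight.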
Finally, the matrix M may be converted into a stochastic transition matrix M_f, which we label the flow matrix. Together, the flow matrix M_f and the attractor points A form a discretised network representation of the system dynamics in phase space, where M_f can be likened to the Perron-Frobenius operator acting on the domain A.
One additional consideration is that the directed matrix M will likely have nodes with no outdegree. These correspond to points in A that are infrequently visited, or that are strong sinks or stable nodes. Therefore, calculating M_f requires the iterative removal of all nodes with zero outdegree until none remain.
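The conversion of the tally matrix M into the row-stochastic flow matrix M_f, including the iterative removal of zero-outdegree nodes, might be sketched as:

```python
import numpy as np

def to_flow_matrix(M):
    """Row-normalise M after iteratively deleting zero-outdegree nodes.
    Returns the flow matrix and the indices of the surviving nodes."""
    M = np.array(M, dtype=float)
    keep = np.arange(len(M))
    while len(M):
        alive = M.sum(axis=1) > 0
        if alive.all():
            break
        M = M[np.ix_(alive, alive)]   # dropping sinks may expose new sinks
        keep = keep[alive]
    Mf = M / M.sum(axis=1, keepdims=True) if len(M) else M
    return Mf, keep

M = np.array([[0.0, 1.0, 0.0],
              [1.0, 0.0, 1.0],
              [0.0, 0.0, 0.0]])       # node 2 is a sink
Mf, keep = to_flow_matrix(M)
```

The `keep` array maps rows of M_f back to the original spatial-network points, which is needed because removals reindex the nodes.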

Surprise
The phase space approach to change point detection relies on the argument that changes in the underlying dynamics of the data generating process correspond to trajectories in phase space that deviate from the original attractor of the system when operating in its normal state. In terms of the discretised phase space, this corresponds to the observation of transitions that are either unlikely or not previously encountered within the training set. The attractor network approach to change point detection uses the calculation of a metric S(t), named the 'surprise', which aims to quantify the level of surprise observed in a given transition.
Consider an observed transition at time t between two nodes i → j in the constructed attractor network. Let k_i be the outdegree of node i, and p_ij be the probability of the transition calculated from the stochastic flow matrix M_f. Let N be the number of data points used to train the attractor network. If a transition between nodes i → j was not observed within the N observations, p_ij is assigned a value of 1/2N to avoid singularities. Let H_max(i) be the maximum possible entropy for a given node i in the attractor network based on its outdegree k_i. This corresponds to the case where all outgoing transitions are equally probable, giving H_max(i) = log k_i. Similarly, H(i) is defined as the actual entropy calculated from the probability distribution over all k_i outgoing transitions of node i, H(i) = −Σ_j p_ij log p_ij. The quantification of surprise requires a normalisation with respect to the maximum possible entropy of each source node i to account for variations in the outdegree across different nodes in the attractor network. Uncommon transitions observed at uniformly distributed, high outdegree nodes are less informative, as each outcome there is equally unlikely. Therefore, we define a weighting value η to quantify the meaningfulness of an observation with respect to the expected potential transitions in the attractor network.
The value of surprise S(t) = S(i, j) corresponding to an observed transition between two nodes i, j at time t in the constructed attractor network is then given by a normalised entropy expression combining the weighting η with the transition probability p_ij.
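The surprise calculation can be sketched as below. The exact forms of η and the final normalisation are not reproduced in the text; this sketch assumes η = 1 − H(i)/H_max(i) (with η = 1 for nodes of outdegree one) and S(i, j) = −η log p_ij, consistent with the stated behaviour that near-uniform high-outdegree nodes yield uninformative, low-surprise transitions.

```python
import numpy as np

def surprise(Mf, i, j, N):
    """Surprise of an observed transition i -> j given flow matrix Mf and
    N training points. eta and the final expression are assumed forms."""
    p = Mf[i]
    pos = p[p > 0]
    k = len(pos)                                       # outdegree of node i
    pij = Mf[i, j] if Mf[i, j] > 0 else 1.0 / (2 * N)  # floor for unseen transitions
    if k > 1:
        H = -np.sum(pos * np.log(pos))                 # actual outgoing entropy H(i)
        Hmax = np.log(k)                               # H_max(i): uniform outdegrees
        eta = 1.0 - H / Hmax                           # low for near-uniform nodes
    else:
        eta = 1.0
    return -eta * np.log(pij)

Mf = np.array([[0.9, 0.1, 0.0],
               [0.5, 0.5, 0.0],
               [1.0, 0.0, 0.0]])
s_unseen = surprise(Mf, 0, 2, N=100)   # never-observed transition: large surprise
s_seen = surprise(Mf, 0, 0, N=100)     # common transition: small surprise
```

Note that a node with a uniform outgoing distribution (row 1) gives η = 0, so no transition out of it is ever surprising under this assumed form.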

Change Point Detection
One common approach to change point detection is to identify deviations in a statistical measure between a known healthy mode and incoming observations. Significant deviations in this measure can be used to infer the presence of a change point. For example, the attractor network approach for change point detection corresponds to identifying consecutive transitions with persistently high values of surprise (i.e. trajectories more frequently deviating away from the attractor of the normal state). More generally, given some measured statistic S(t) (such as the attractor network surprise), it is possible to convert S(t) into a binary measure of normal vs. abnormal by filtering according to some interval [S_min, S_max]. Therefore, we can define a new binary measure Ŝ(t), where Ŝ(t) = 1 corresponds to an observation that is abnormal. The values of S_min and S_max can be selected as quantiles of the distribution of observed S(t) from data that is known to be in the healthy regime (i.e. from the training data set). In the case of attractor network surprise, (S_min, S_max) = (0, S*), where S* is selected to be the 95% upper quantile of the data because surprise is always positive. Depending on the statistic, temporary and benign fluctuations in the tracked measure S(t) may result in brief, intermittent classifications of observations as abnormal (i.e. Ŝ(t) = 1). To ensure that an observed change is indeed a genuine change point, incoming observations must be consistently classified as abnormal for an extended period of time τ_E before being flagged as potential change points. This can be done by applying exponential smoothing to the binary series Ŝ(t), E(t) = β Ŝ(t) + (1 − β) E(t − dt), where β = 1/τ_E is a smoothing parameter. For automatic change point detection, change points are defined when E(t) exceeds some threshold k · E*.

The second measure of performance p_H is a modified form of p that accounts for both the proportion of true positive classifications and the persistence of the classification. In this measure, a string of identical classifications is weighted higher than one with intermittent true positives. The score p_H is calculated from p weighted by H, the normalised entropy of the distribution of the ordered observations. The probability p_n is calculated by partitioning the observed string of classifications into groups of successively identical classifications. For example, consider an observed time period with 10 timesteps and a string of classifications given by 0001101111, where a value of 1 corresponds to an observation that is classified as abnormal. This set of observations may be partitioned into 4 groups, resulting in the set of probabilities (p_1, p_2, p_3, p_4) = (0.3, 0.2, 0.1, 0.4). Therefore, the quantity H is a measure of the fragmentation of an observed string of classifications, where larger values of H indicate more fragmented and intermittent classification of observations as either normal or abnormal.
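The thresholding, smoothing and fragmentation entropy described above can be sketched as follows. The smoothing form E(t) = βŜ(t) + (1 − β)E(t − dt), the normalisation of H by the log of the number of runs, and the fixed default threshold `E_star` (in the paper E* comes from a training-data quantile) are assumptions consistent with the text.

```python
import numpy as np

def detect(S, S_max, tau_E, k=1.0, E_star=0.5):
    """Binarise S(t) against S_max, smooth with beta = 1/tau_E, and flag
    change points where E(t) exceeds k * E_star."""
    beta = 1.0 / tau_E
    S_hat = (np.asarray(S, float) > S_max).astype(float)
    E = np.zeros_like(S_hat)
    for t in range(1, len(S_hat)):
        E[t] = beta * S_hat[t] + (1.0 - beta) * E[t - 1]   # assumed smoothing form
    return S_hat, E, E > k * E_star

def fragmentation_entropy(S_hat):
    """Normalised entropy of run lengths: 0001101111 -> runs (3, 2, 1, 4)
    -> probabilities (0.3, 0.2, 0.1, 0.4)."""
    runs = [1]
    for a, b in zip(S_hat[:-1], S_hat[1:]):
        if a == b:
            runs[-1] += 1
        else:
            runs.append(1)
    p = np.array(runs, float) / len(S_hat)
    if len(p) == 1:
        return 0.0
    return float(-np.sum(p * np.log(p)) / np.log(len(p)))

S = np.array([0.0] * 50 + [10.0] * 50)               # abrupt change at t = 50
S_hat, E, flags = detect(S, S_max=1.0, tau_E=5)
h = fragmentation_entropy([0, 0, 0, 1, 1, 0, 1, 1, 1, 1])
```

In the toy series the smoothed measure E(t) rises only after the abnormal classifications persist, so early samples are not flagged while the final samples are.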

Quantifying Preemptive Detection Performance
One of the desirable qualities of online change point detection is the ability to predict the occurrence of a change point for some period of time before the event. This can be particularly useful in mission critical systems, such as VF detection, where early or preemptive detection can inform appropriate intervention measures. In the context of VF detection, it is common for arrhythmias to occur prior to the onset of VF. This occurrence potentially suggests gradual changes in the heart dynamics prior to VF, which may otherwise be difficult to detect by visual inspection of the ECG. The detection of this dynamical change may be used to inform early warning detection of VF.
For our VF detection analyses, we propose two different measures to quantify and compare the ability of each change point detection method to preempt the onset of VF. For both measures, we consider a time interval [t_VF − τ_p, t_VF] of increasing length τ_p prior to the annotated onset of VF. For every given τ_p, we calculate two measures across the time interval: (1) the length of the longest uninterrupted streak of positive detections leading up to the onset of VF, and (2) the proportion of positive detections across the entire time interval. The first measure captures the degree of persistence in the positive detections, where longer positive detection streaks suggest a more confident preemptive detection of an incoming VF episode, whereas the second measure is a naïve measure of the positive detection rate.
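The two measures might be sketched as below, assuming the streak is counted backwards from the annotated onset (one reading of "leading up to the onset"); the function name and toy data are illustrative.

```python
import numpy as np

def preemptive_scores(detections, t_vf, tau_p):
    """(1) longest uninterrupted positive streak ending at the onset, and
    (2) positive detection rate over the window [t_vf - tau_p, t_vf)."""
    w = np.asarray(detections[t_vf - tau_p : t_vf], dtype=bool)
    streak = 0
    for flag in w[::-1]:          # walk backwards from the annotated onset
        if not flag:
            break
        streak += 1
    rate = float(w.mean()) if len(w) else 0.0
    return streak, rate

streak, rate = preemptive_scores([0, 1, 1, 0, 1, 1, 1], t_vf=7, tau_p=7)
```

For the toy detection string the final three positives form the streak, while five of the seven samples in the window are positive.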
Due to the nature of the data and the lack of information regarding the real physiological condition of each patient, it is not possible to determine whether early detections from each method are true or false positives. Therefore, in both cases, we assume that all positives are true positives and limit our analyses to a relatively short maximum window length τ_p = 1250 prior to the annotated onset of VF. Only detections relating to the first annotated VF onset in each patient are used to calculate the results, as physiological changes to the heart following VF may result in permanent changes to ECG dynamics, which may yield an artificially increased number of positive detections.
Scores are calculated for each individual patient. A final performance score for each detection method is then taken as the average score across all 29 analysed patient ECGs (see Figure 9). The attractor network surprise approach was found to provide longer and more persistent streaks of positive detections leading up to the onset of VF compared to the moving average, standard deviation and permutation entropy methods. Positive detection rates were also slightly higher for the attractor network surprise method.
In Figure 9b, it is expected that increasing window lengths will cause gradual decreases in the positive rate as the window begins to include periods of normal healthy behaviour. However, we find that for the attractor network surprise and moving standard deviation methods, there is a peak in the performance rate partway through the profile. Further inspection into the failure modes of each patient reveals that this is the result of the method detecting the occurrence of irregular ECG behaviour for some period before the annotated onset of VF (see Figure 10). At the onset, some of these signals collapse into high frequency oscillatory dynamics. This can correspond to a trajectory that lingers in a high density region of the attractor network, resulting in artificially low surprise values.

Acknowledgements
[…] by the Australian Government. M.S is also supported by the ARC Discovery Grant DP200102961, funded by the Australian Government.

Declarations

Data Availability
Instructions on how to generate the artificial data in this paper have been provided. All experimental data are publicly available on PhysioNet [39,40].

Appendix B Phase Coherence vs. Non-Phase Coherence Detection
Chaotic oscillators such as the Rössler system are distinct from periodic systems in that they do not exhibit an exact frequency [61,62]. Observation of the power spectrum of chaotic signals reveals activations across the whole band of considered frequencies. Phase coherence (PC) within a chaotic oscillator is characterised by the presence of a well defined peak in the power spectrum. An equivalent condition is that the phase of the signal changes monotonically, where the phase can be approximated from the signal using a Hilbert transform [63]. This often corresponds to trajectories in phase space that rotate along a collection of orbits around some central point [62]. Parallel trajectories on these orbits have phases that are similar throughout the whole period of rotation. In contrast, non-phase coherent (NPC) systems do not have a clearly defined phase relationship over time.
The Rössler chaotic oscillator exhibits both PC and NPC dynamics at different bifurcation values (see Fig. B5) [63], with system equations given by ẋ = −y − z, ẏ = x + αy, ż = b + z(x − c), where b = 0.4 and c = 8.5. The phase coherence of the system is controlled by the parameter α, with α = 0.165 and α = 0.265 corresponding to the PC and NPC regimes respectively. This results in two similar attractors with structurally different features. This feature was used to test the attractor network approach in detecting changes in attractor structure in phase space.
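The PC and NPC test signals can be reproduced with a simple fixed-step RK4 integrator; the step size, trajectory length and initial condition here are illustrative choices, not taken from the paper.

```python
import numpy as np

def rossler(alpha, b=0.4, c=8.5, dt=0.05, n=4000, s0=(1.0, 1.0, 0.0)):
    """Fixed-step RK4 integration of the Rossler system
    x' = -y - z, y' = x + alpha*y, z' = b + z*(x - c)."""
    def f(s):
        x, y, z = s
        return np.array([-y - z, x + alpha * y, b + z * (x - c)])
    traj = np.empty((n, 3))
    s = np.array(s0, dtype=float)
    for i in range(n):
        k1 = f(s)
        k2 = f(s + 0.5 * dt * k1)
        k3 = f(s + 0.5 * dt * k2)
        k4 = f(s + dt * k3)
        s = s + (dt / 6.0) * (k1 + 2 * k2 + 2 * k3 + k4)
        traj[i] = s
    return traj

x_pc = rossler(alpha=0.165)[:, 0]    # phase-coherent regime observable
x_npc = rossler(alpha=0.265)[:, 0]   # non-phase-coherent regime observable
```

Concatenating alternating segments of `x_pc` and `x_npc` reproduces the kind of modulated test signal used in this appendix.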
Here, the change point detection task is to identify when the signal changes between the PC and NPC regimes. To do this, an attractor network is trained on data generated from a Rössler system operating in the PC regime. An additional, shorter training data set of length 20000 was used to obtain the 95% cutoff value for S(t) operating on the original system. The test data consisted of the simulated oscillator with a modulating bifurcation parameter α switching between the PC (α = 0.165) and NPC (α = 0.265) regimes every 2000 steps. The resulting S(t) was used to evaluate E(t) and determine transition points.
The results revealed that the attractor network method performed equally well, if not better than, other simpler metrics (moving statistics, permutation entropy) in providing distinct cutoffs for the change point transitions between PC and NPC (see Fig. B6). However, we find that the converse problem of detecting PC transitions using attractor networks trained on NPC data was not successful, whereas the simpler metrics performed consistently. This difference in performance can be attributed to the fact that the PC Rössler attractor is spatially almost a subset of the NPC attractor (see Fig. B5) when discretised with respect to some selected resolution ϵ. Whilst there exist differences between the state space distributions of both attractors, S(t) aims to capture the surprise of a transition at a given point in time. Thus, such a method is limited to the resolution governed by δ and ϵ as defined in Section 4.

Fig. 2 (a) Illustration of a constructed attractor network with edges weighted by approximate transition probabilities. Observed states are given by four red and blue nodes with the resulting trajectories shown as dashed lines. Node observations are coloured according to surprise value S(t) (red is high surprise). Trajectories (dashed) are coloured either red or blue based on the classification of normal or unhealthy. Red observed states and trajectories (i.e. high S(t) and E_S(t)) describe unexpected trajectories that enter into a previously unvisited part of phase space. Persistent high surprise values are indicative of change points. (b) Attractor network constructed from a delay embedded ECG with a trajectory showing the transition from normal to the ventricular fibrillation regime. The identified change point from E_S(t) is indicated by a black cross.

Fig. 4 Time series and phase space reconstruction representations of ECG time series showing the healthy (blue) and VF (red) dynamics. (a) Phase space reconstruction showing the trajectory of the healthy (blue) dynamics lying close to a stable orbit (the attractor). In contrast, the onset of VF (red) results in a structural collapse of the attractor. (b) An extract of the corresponding scalar ECG time series with the onset of VF labelled by a dashed vertical red line. The onset of VF is characterised by a change from regular oscillatory motion (blue) into erratic dynamics (red).

Fig. 6 Extract of the artificial time series containing a single concatenation of the original Chua signal (blue) and an AAFT surrogate (red). The differences in the dynamics are not clearly visible.

Fig. 9 Comparison of performance profiles for each method: (a) proportion of patients with a given length of consecutive positive detections, and (b) positive detection rate for a given window length prior to the annotated onset of VF.

Fig. 10 Examples of two different VF failure modes where signal irregularities occur prior to the onset of VF. (a) Patient cu20 exhibits irregular oscillatory dynamics for an extended period prior to the annotated onset, which is detected by the attractor network. However, the signal collapses into a high frequency oscillation following onset. (b) Patient cu04, where signal irregularities are detected before the onset of VF (shown by the two peaks in the surprise), followed by a similar collapse into high frequency oscillatory dynamics. The second VF episode in the ECG recording is subsequently detected later.

Fig. A1 VF detection results using the moving average as the score for change point detection. Annotated onsets of VF are given by black crosses, and flagged detections of VF are given by red dots. The onset of VF is characterised by a persistent detection result (red).

Fig. A2 VF detection results using the moving standard deviation as the score for change point detection. Annotated onsets of VF are given by black crosses, and flagged detections of VF are given by red dots. The onset of VF is characterised by a persistent detection result (red).

Fig. A3 VF detection results using the moving permutation entropy as the score for change point detection. Annotated onsets of VF are given by black crosses, and flagged detections of VF are given by red dots. The onset of VF is characterised by a persistent detection result (red).
Fig. B5 Time series of the Rössler system operating in the (a) PC (α = 0.165) (blue) and (b) NPC (α = 0.265) (red) regimes. (c) The corresponding phase space reconstruction, where the attractor for the PC system (blue) is almost a subset of the NPC attractor (red).

Fig. B6 Results of all change point algorithms for detecting transitions between NPC and PC Rössler time series. From top to bottom: original time series, moving average, moving standard deviation, moving permutation entropy, and surprise score S(t). Sliding window lengths of 100 steps were used for calculating the moving statistics. Detections (red) are made based on the exponentially smoothed quantity E(t) with respect to a set cutoff E* (see Eq. 14). Real change points are given by vertical orange lines. All methods except the moving average are able to detect change points. However, only the surprise score produces a persistent classification of observations as abnormal.
Fig. C7 (a) The calculated average surprise for each 50 step window corresponding to a given value of α, with (b) containing the corresponding bifurcation diagram of the system. Blue and red represent the two disjoint scroll attractors, which eventually merge at α ≈ 19.05. (c) The two disjoint scroll attractors corresponding to the same bifurcation diagram at α = 17.

Table 1
Summary of detection performance scores for the four methods (moving average, moving standard deviation, moving permutation entropy and attractor network surprise). Scores are averaged across data from 29 patients. Each entry lists, in order, the mean, median and standard deviation.
Change points are flagged when E(t) > k · E*, where E* is the 95th percentile of observed E(t) in the input training data set and k is a multiplier term that determines the strength of the threshold. Because E(t) can be calculated with respect to an arbitrary measure S(t), we use the notation E_MA(t), E_MSTD(t), E_MPE(t) and E_S(t) for the exponentially smoothed series resulting from the moving average, moving standard deviation, moving permutation entropy and attractor network surprise scores respectively. To quantify the accuracy of detecting VF onset, we propose two measures of the true positive rate, p and p_H. A true positive detection is defined as the classification of an observation as abnormal within a time period [t_s, t_e] before or after the annotated onset of VF with a total length of T timesteps. The first measure of performance, p, is calculated as the proportion of timesteps within this time period that are correctly classified as abnormal.