Can we predict the unpredictable?

Golestani, Abbas; Gras, Robin

doi:10.1038/srep06834

Download PDF

Article
Open access
Published: 30 October 2014

Can we predict the unpredictable?

Abbas Golestani¹ &
Robin Gras^1,2,3

Scientific Reports volume 4, Article number: 6834 (2014) Cite this article

17k Accesses
17 Citations
73 Altmetric
Metrics details

Subjects

Abstract

Time series forecasting is of fundamental importance for a variety of domains including the prediction of earthquakes, financial market prediction and the prediction of epileptic seizures. We present an original approach that brings a novel perspective to the field of long-term time series forecasting. Nonlinear properties of a time series are evaluated and used for long-term predictions. We used financial time series, medical time series and climate time series to evaluate our method. The results we obtained show that the long-term prediction of complex nonlinear time series is no longer unrealistic. The new method has the ability to predict the long-term evolutionary trend of stock market time series and it attained an accuracy level with 100% sensitivity and specificity for the prediction of epileptic seizures up to 17 minutes in advance based on data from 21 epileptic patients. Our new method also predicted the trend of increasing global temperature in the last 30 years with a high level of accuracy. Thus, our method for making long-term time series predictions is vastly superior to existing methods. We therefore believe that our proposed method has the potential to be applied to many other domains to generate accurate and useful long-term predictions.

Time-series forecasting through recurrent topology

Article Open access 09 January 2024

A time-reversed model selection approach to time series forecasting

Article Open access 28 June 2022

Deciphering Dynamical Nonlinearities in Short Time Series Using Recurrent Neural Networks

Article Open access 02 October 2019

Introduction

The prediction of future values in a complex time series is a major interest for scientists^{1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17}, with applications to various scientific fields^18,19,20. There are many natural phenomena that require a prediction algorithm for answering important questions, such as estimating future population variations, predicting the orbits of astronomical objects and predicting the occurrence of seismic waves. The prediction of twenty-first century global temperature rise would be a valuable resource for policy makers and planners⁶. Population projections may be used to predict species extinction before they reach a crisis point⁵. Moreover, prediction is an ongoing and pressing problem in the forecasting of economic time series⁸. In the medical sciences, there are also many applications for which an efficient prediction algorithm could save lives. A large number of time series obtained from monitoring the human body can be used as a basis for the decision-making process to treat or prevent grave diseases such as epilepsy or Alzheimer's disease^1,2,3.

It has been shown that data generated by such natural phenomena often behave chaotically¹¹. Although chaotic behaviours are deterministic, their complex properties make it difficult to distinguish them from random behaviour. Chaotic behaviours are known to be strongly dependent on initial conditions; small changes in initial conditions can possibly lead to immense changes in subsequent time steps and are particularly difficult to predict. Because the exact conditions for many natural phenomena are not known and the properties of a chaotic time series are very complex, it is difficult to model these systems.

Most of the existing methods for complex time series prediction are based on modelling the time series to predict future values, although there are other types of methods such as agent-based simulations that model the system generating the time series²¹. Model-based approaches can be classified into two main domains: linear modelling such as in ARIMA (autoregressive integrated moving average)¹⁸ and nonlinear modelling such as in MLP (multi-layer perceptron)²² and GARCH (generalised autoregressive conditional heteroskedasticity)²³ (for details, see Supplementary Information, section 1). However, other studies have concluded that there is no clear evidence in favour of nonlinear models over linear models in terms of forecasting performance⁷. Regardless, there is no robust procedure that can produce an accurate model for chaotic time series. For all of these methods, the prediction error increases dramatically with the number of time points predicted^7,9,21. Therefore, most of the existing methods focus on very short-term predictions to obtain a reasonable level of accuracy. None of the existing methods demonstrate an acceptable level of accuracy for long-term prediction¹². For example, for financial time series predictions, most methods can predict only one step ahead, which is not very helpful for acting against a financial crisis before it occurs^10,12,21. To address this deficiency in existing methods, we propose a novel approach to making long-term time series predictions (see Methods), GenericPred²⁴, with applications to financial time series, medical diagnosis and global temperature prediction.

Results

For the DJIA (Dow Jones Industrial Average) time series between 1993 and 2001, when markets were stable with no major changes and no financial crisis, our proposed GenericPred method predicts the trend with very high accuracy, whereas the predictions based on other methods strongly and rapidly diverge from the real data (see Fig. 1a and Supplementary Table 1). GenericPred is the only method able to discover the decreasing trend corresponding to the financial crisis, while the three other methods predicted a growth in the stock market (see Fig. 1b and Supplementary Table 1). Although the 2009 financial crisis data were used for training in the third time period, when the recession was in the middle of the considered range, the GenericPred method successfully discovered the general trends for the next 500 days, with a particularly high accuracy for the first 300 days, effectively predicting the increase in the stock market (see Fig. 1c and Supplementary Table 1). The other three methods failed to predict the trend.

The GenericPred method's prediction errors were significantly lower than the other methods for all three periods with respect to both short-range and long-range predictions (see Supplementary Table 1). Moreover, the GenericPred predictions were more stable with a constant lower standard deviation, regardless of whether the target data lie before the recession, during the recession, or after the recession. The prediction error for the first 200 time steps is especially smaller than that of the other methods. For the three considered periods, GenericPred also significantly outperformed L-FABS²¹ and the MLP²² model for short-term predictions (see Supplementary Table 1). GenericPred is also stable given that it has a low sensitivity to the starting point of the prediction (see Supplementary Figure 2).

For the EEG time series, the P&H chaoticity values²⁵ were predicted using GenericPred (see Supplementary Figures 2 and 3) on a constant-length (20 minutes) sliding window (the window moves every 20 seconds) of the five EEG time series for all the patients. During a seizure, a peak in the P&H values obtained from the EEG time series appeared. Based on the analysis of all 21 patients, a threshold for the prediction of seizures (P&H value equal to 2.4) was determined based on patient data (see Supplementary Information). Using this threshold, the GenericPred method can predict the epileptic seizure with 100% sensitivity and specificity up to 17 minutes in advance (see Table 1). The same results have been obtained by considering the data of any five electrodes independently. This represents a considerable improvement over one of the best current predictive methods, which only achieves accuracy levels of 73% sensitivity and 67% specificity for 10 patients within a 1–10 minute range²⁶.

Table 1 Sensitivity and specificity of epileptic seizure prediction for 21 patients for different lengths of prediction. For each patient, one positive and 10 negative samples were constructed. The positive sample contains one epileptic seizure event and the 10 negative samples are seizure-free. Therefore, there are 21 positive and 210 negative samples in total that were used to compute the specificity and the sensitivity accuracy levels

Full size table

For the global temperature anomalies prediction, unlike ARIMA, GenericPred accurately predicted the increasing trend in the last 30 years (Fig. 2a). Moreover, most of the successive peaks and depressions were predicted with a high degree of precision. The mean square error for GenericPred is 0.64, whereas for ARIMA, it is 1.6. The GARCH and VAR methods were unable to make predictions due to insufficient data.

The existing dedicated forecasting models for global temperature anomalies predict that, as the world consumes more fossil fuel, greenhouse gas concentrations will continue to increase and the Earth's average surface temperature will continue to rise²⁷. Based on recent predictions, average surface temperatures could rise between 2°C and 6°C by the end of the 21st century^6,28. We predicted the global temperature anomaly until the end of the 21st century (2014–2100) (Fig. 2b). GenericPred predicted an average anomaly of 2.5°C for the years 2085–2100, which is in accordance with the predictions of the dedicated models.

Our approach demonstrates a significant gain in accuracy over traditional methods with respect to predicting different DJIA time series for both short- and long-term predictions. Its ability to predict the evolutionary trend of the stock market is vastly superior to the predictive ability of existing methods. Further, the ability of our GenericPred method to predict epileptic seizures could be a major breakthrough in terms of managing epilepsy, providing, for the first time, a robust method for predicting an epileptic seizure long before its occurrence. Finally, the success of GenericPred's predictions of the trend of increasing global temperature shows that this method has the potential to be a powerful tool for controlling the global warming phenomenon. Another advantage of our approach is that it does not rely on a complex model of the original time series and is therefore highly general and highly computationally efficient.

This method provides a first step towards accurate and comprehensive time series long-term predictions. Although, with respect to long-term predictions, it is impossible to predict exact values, GenericPred's performance shows great potential for predicting time series' trends.

Methods

Several researchers emphasise the potential of market predictions to improve important financial decisions²⁹, from helping businesses make sounder investment decisions to helping governments make more efficient fiscal and monetary policy decisions⁸. These time series are amongst the most complex time series because of the number of parameters involved. Our results are compared with respect to long-term predictions with ARIMA, GARCH and VAR³⁰, which are the most widely used and most efficient methods for making long-term time series predictions. We also compared our results for short-term predictions with those obtained by two existing methods: the Learning Financial Agent Based Simulator (L-FABS)²¹ and the MLP model²².

For the first period, we considered the DJIA (Dow Jones Industrial Average) time series between 1993 and 2001, when markets were stable with no major changes and no financial crisis. In the second period considered, the US stock market peaked in October 2007, but by March 2009, the Dow Jones average had reached its minimum, which reflects the most serious effects of a financial crisis. In the third period (August 2004–August 2012), the recession was in the middle of the considered range.

Another important application of time series predictions is in medical science. Approximately 1% of the world population suffers from epilepsy³¹. Epileptic seizures are the result of unusual and irregular neuronal activity in the brain^32,33. Many recent methods have been proposed for predicting epileptic seizure^26,34,35 but none of them as shown their ability to perform accurate predictions more than 10 minutes in advance on a large number of patients. To evaluate the performance of our new method for predicting epileptic seizures, we examined the EEG time series measured by five electrodes, generating five different time series, for 21 patients. For each EEG time series, the exact time of the seizure is known.

Predicting the monthly records of global temperature anomalies is currently one of the most pressing and controversial environmental concerns³⁶. As a third experiment, we used the global temperature anomaly data from 1880 to 1983 to train for the prediction of global temperatures during 1983–2013. Global temperature anomaly data come from the Global Historical Climatology Network-Monthly (GHCN-M) data set and International Comprehensive Ocean-Atmosphere Data Set (ICOADS), which have data from 1880 to the present. These two datasets are blended into a single product to produce the combined global land and ocean temperature anomalies.

Our new method for complex time series prediction is based on the concepts of chaos theory and an optimisation process. The general idea is to extract a unique characteristic from an existing time series that somehow represents the behaviour of the time series and to subsequently generate successive new values that continue the time series, each value minimising the difference between the characteristic of the new time series and the initial one. The details of the GenericPred method for long-term time series prediction are as follows. We consider a time series S_N:

A nonlinear measure V() is computed on S_N The fractal dimension³⁷ and the Lyapunov exponent³⁸ are examples of such nonlinear measures that return a single value for a time series. A possible mapping may be required, forming a new time series , for different applications as follows:

otherwise, , where 0 < L < N is the size of a sliding window used to compute the local level of chaos measured by V(). Therefore, when the mapping is applied, the new considered time series corresponds to the variation in time of the local non-linear measure in the initial time series S_N.

We consider as a reference value that will be used for predicting the next k values of the time series:

The parameter σ of a normal distribution N(y_i, σ²) is estimated by computing the variation between every two consecutive values (y_i to y_i₊₁) of the time series . This distribution represents the probability distribution P(y_i|y_i₋₁) (see Fig. 3). Several data sets have been considered to determine that a normal distribution is a good approximation of the real distribution. However, the same method has been applied using other distributions without significant degradation is the prediction.

For predicting y_N_+i, a set Pos(y_N_+i) of N_rand random values are generated following the distribution N(y_N_+i−1, σ²) (Fig. 3):

N_rand is a parameter that can impact the quality of the prediction because having more values will increase the chance of finding an optimal value. However, no significant improvement was observed for the data considered when N_rand was greater than 10. For this reason, we chose 10 as the value of N_rand for each experiment. y_N_+i is then computed by selecting the that makes the new nonlinear measure the closest to :

The value is chosen to make as close as possible to .

The important point is that the reference value is always , which is the calculated nonlinear measure from the original time series. Therefore, the GenericPred method uses two basic rules:

R1: Always endeavour to keep the value of a nonlinear measure as steady as possible during prediction (Fig. 3).

R2: The new value must be chosen from a set of potential values generated from a probability distribution.

The prediction has to be pursued one step at a time because the predicted value in the current step is needed for determining the valid range of change for the next step. For those problems for which a binary prediction (‘yes’ or ‘no’) is required, (e.g., the epileptic seizure prediction), a threshold t is computed from the learning data. Whenever the value y_N_+i is greater than the threshold t, the prediction is positive. For example, yes there is an epileptic seizure at time N + i if y_N_+i > t; otherwise, there will be no seizure at time N + i.

Classical model-based prediction approaches consider a unique value for the next step, whereas in the GenericPred method, several points are considered simultaneously. Our method is also able to constantly adjust the information regarding the current time series, whereas classical predictive methods apply the model without taking into account the concordance between the original time series and the predicted ones. Technically, any nonlinear measure could be used for the time series characterisation. However, here, we used the P&H method²⁵ because it has been shown that this method can efficiently discriminate between different types of nonlinear behaviour³⁹.

References

Lehnertz, K. & Elger, C. E. Can epileptic seizures be predicted? evidence from nonlinear time series analyses of brain electrical activity. Phys. Rev. Lett. 80, 5019–5023 (1998).
Article ADS CAS Google Scholar
Jeong, J. Nonlinear dynamics of EEG in Alzheimer's disease. Drug. Dev. Res. 56, 57–66 (2002).
Article CAS Google Scholar
Dauwels, J., Vialatte, F., Latchoumane, C., Jeong, J. & Cichocki, A. EEG synchrony analysis for early diagnosis of Alzheimer's disease: a study with several synchrony measures and EEG data sets. Paper presented at Engineering in Medicine and Biology Society, Minneapolis. US-New York: IEEE, 2224–2227 (2009 Sep).
Newman, W. I., Gabrielov, A. M. & Turcotte, D. L. Nonlinear Dynamics and Predictability of Geophysical Phenomena [Newman, W. I., Gabrielov, A. M. & Turcotte, D. L. (ed.)][15–35] (American Geophysical Union, Washington, 1994).
Sugihara, G. et al. Detecting causality in complex ecosystems. Science 338, 496–500 (2012).
Article ADS CAS Google Scholar
Stott, Peter, A. & Kettleborough, J. A. Origins and estimates of uncertainty in predictions of twenty-first century temperature rise. Nature 416, 723–726 (2002).
Article ADS CAS Google Scholar
Clements, M. P., Franses, P. H. & Swanson, N. R. Forecasting economic and financial time-series with non-linear models. Int. J. Forecast. 20, 169–183 (2004).
Article Google Scholar
Arrow, K. J. et al. The promise of prediction markets. Science 320, 877–878 (2008).
Article CAS Google Scholar
Zhang, G. P. & Kline, D. Quarterly time-series forecasting with neural networks. IEEE. Trans. Neural. Netw. 18, 1800–1814 (2007).
Article Google Scholar
Gooijer, J. D. D. & Hyndman, R. 25 Years of time series forecasting. Int. J. Forecast. 22, 442–473 (2006).
Article Google Scholar
Romanelli, L., Figliola, M. A. & Hirsch, F. A. Deterministic chaos and natural phenomena. J. Stat. Phys. 53, 991–994 (1988).
Article ADS MathSciNet Google Scholar
Zanin, M. Forbidden patterns in financial time series. Chaos 18, 013119 (2008).
Article ADS Google Scholar
Boettiger, C. & Hastings, A. Tipping points: From patterns to predictions. Nature 493, 157–158 (2013).
Article ADS CAS Google Scholar
Scheffer, M. Complex systems: Foreseeing tipping points. Nature 467, 411–412 (2010).
Article ADS CAS Google Scholar
Podobnik, B. et al. Network Risk and Forecasting Power in Phase-Flipping Dynamical Networks. Phys. Rev. E 89, 042807 (2014).
Article ADS CAS Google Scholar
Drake, J. M. & Griffen, B. D. Early warning signals of extinction in deteriorating environments. Nature 467, 456–459 (2010).
Article ADS CAS Google Scholar
Becks, L., Hilker, F. M., Malchow, H., Jurgens, K. & Arndt, H. Experimental demonstration of chaos in a microbial food web. Nature 435, 1226–1229 (2005).
Article ADS CAS Google Scholar
Box, G. E. P., Jenkins, G. M. & Reinsel, G. C. Time Series Analysis: Forecasting and Control (Prentice-Hall, Englewood Cliffs, 1994).
Montgomery, D. C., Johnson, L. A. & Gardiner, J. S. Forecasting and Time Series Analysis (McGraw Hill, New York, 1990).
Gottman, J. M. Time Series Analysis (Cambridge University Press, Cambridge, 1981).
Neri, F. Learning and predicting financial time series by combining natural computation and agent simulation. EvoApplications 6625, 111–119 (2011).
Google Scholar
Zirilli, J. S. Financial prediction using Neural Networks (International Thompson Computer Press, London, 1997).
Bollerslev, T. Generalized autoregressive conditional heteroskedasticity. J. Econometrics. 31,307–327 (1986).
Article MathSciNet Google Scholar
Filed under US Patent Application Serial Number 61/882863 (granted Sep. 26 2013).
Golestani, A., Jahed Motlagh, M. R., Ahmadian, K., Omidvarnia, A. H. & Mozayani, N. A new criterion for distinguish stochastic and deterministic time series with the Poincaré section and fractal dimension,. Chaos 19, 013137 (2009).
Article ADS MathSciNet Google Scholar
Shouyi, W., Chaovalitwongse, W. A. & Wong, S. Online seizure prediction using an adaptive learning approach,. IEEE Trans. Knowledge Data Eng. 25, 2854–2866 (2013).
Article Google Scholar
Seneviratne, S. I., Donat, M. G., Mueller, B. & Alexander, L. V. No pause in the increase of hot temperature extremes. Nature Clim. Change. 4, 161–163 (2014).
Article ADS Google Scholar
Smith, D. M. et al. Improved surface temperature prediction for the coming decade from a global climate model. Science 317, 796–799 (2007).
Article ADS CAS Google Scholar
Hahn, R. & Tetlock, P. Using information markets to improve public decision making. J. Law Pub. Pol. 28, 213 (2005).
Google Scholar
Lütkepohl, H. New Introduction to Multiple Time Series Analysis (Springer-Verlag, Berlin, 2005).
Meinardi, H., Scott, R. A. & Reis, R. The treatment gap in epilepsy: the current situation and ways forward. Epilepsia 42, 136–149 (2001).
Article CAS Google Scholar
Fisher, R. S., Boas, W. V. E., Blume, W., Elger, C. & Genton, P. Epileptic seizures and epilepsy: definitions proposed by the International League Against Epilepsy (ILAE) and the International Bureau for Epilepsy (IBE). Epilepsia 46, 470–472 (2005).
Article Google Scholar
Jirsa, V. K., Stacey, W. C., Quilichini, P. P., Ivanov, A. I. & Bernard, C. On the nature of seizure dynamics. Brain, awu133 (2014).
Park, Y., Luo, L., Parhi, K. K. & Netoff, T. Seizure prediction with spectral power of EEG using cost-sensitive support vector machines. Epilepsia 52, 1761–1770 (2011).
Article Google Scholar
Feldwisch-Drentrup, H. et al. Joining the benefits: Combining epileptic seizure prediction methods. Epilepsia 51, 1598–1606 (2010).
Article Google Scholar
Ye, L. M., Yang, G. X., Van Ranst, E. & Tang, H. J. Time-series modeling and prediction of global monthly absolute temperature for environmental decision making. Adv. Atmos. Sci. 30, 382–396 (2013).
Article Google Scholar
Falconer, K. Fractal Geometry: Mathematical Foundations and Applications (Wiley, New York, 2007).
Rosenstein, M. T., Collins, J. J. & De Luca, C. J. A practical method for calculating largest Lyapunov exponents from small data sets. Phys. D. 65,117–134 (1993).
Article MathSciNet Google Scholar
Golestani, A. & Gras, R. Regularity analysis of an individual-based ecosystem simulation. Chaos 20, 043120 (2010).
Article ADS Google Scholar

Download references

Acknowledgements

We thank R. Scott and B. MacPherson for comments on the manuscript. This work was supported by the NSERC grant ORGPIN 341854, the CRC grant 950-2- 3617 and the CFI grant 203617. All financial data have been acquired from the website of the Federal Reserve Bank of St. Louis at http://research.stlouisfed.org. The EEG datasets of 21 patients were acquired from the Epilepsy Center of the University Hospital of Freiburg. Global temperature anomaly data come from the Global Historical Climatology Network-Monthly (GHCN-M) dataset and the International Comprehensive Ocean-Atmosphere Data Set (ICOADS),

Author information

Authors and Affiliations

School of Computer Science, University of Windsor, ON N9B 3P4, Canada
Abbas Golestani & Robin Gras
Department of Biology, University of Windsor, ON N9B 3P4, Canada
Robin Gras
Great Lakes Institutes for Environmental Research, University of Windsor, ON N9B 3P4, Canada
Robin Gras

Authors

Abbas Golestani
View author publications
You can also search for this author in PubMed Google Scholar
Robin Gras
View author publications
You can also search for this author in PubMed Google Scholar

Contributions

A.G. designed the study and analysed and interpreted the data. R.G. supervised the study. A.G. and R.G. wrote the paper.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Electronic supplementary material

Supplementary Information

Rights and permissions

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License. The images or other third party material in this article are included in the article's Creative Commons license, unless indicated otherwise in the credit line; if the material is not included under the Creative Commons license, users will need to obtain permission from the license holder in order to reproduce the material. To view a copy of this license, visit http://creativecommons.org/licenses/by-nc-sa/4.0/

Reprints and permissions

About this article

Cite this article

Golestani, A., Gras, R. Can we predict the unpredictable?. Sci Rep 4, 6834 (2014). https://doi.org/10.1038/srep06834

Download citation

Received: 08 July 2014
Accepted: 09 October 2014
Published: 30 October 2014
DOI: https://doi.org/10.1038/srep06834

Comments

By submitting a comment you agree to abide by our Terms and Community Guidelines. If you find something abusive or that does not comply with our terms or guidelines please flag it as inappropriate.