Predicting disruptive instabilities in controlled fusion plasmas through deep learning


Nuclear fusion power delivered by magnetic-confinement tokamak reactors holds the promise of sustainable and clean energy (ref. 1). The avoidance of large-scale plasma instabilities called disruptions within these reactors (refs 2,3) is one of the most pressing challenges (refs 4,5), because disruptions can halt power production and damage key components. Disruptions are particularly harmful for large burning-plasma systems such as the multibillion-dollar International Thermonuclear Experimental Reactor (ITER) project (ref. 6) currently under construction, which aims to be the first reactor that produces more power from fusion than is injected to heat the plasma. Here we present a method based on deep learning for forecasting disruptions. Our method considerably extends the capabilities of previous strategies such as first-principles-based (ref. 5) and classical machine-learning (refs 7–11) approaches. In particular, it delivers reliable predictions for machines other than the one on which it was trained, a crucial requirement for future large reactors that cannot afford training disruptions. Our approach takes advantage of high-dimensional training data to boost predictive performance while also engaging supercomputing resources at the largest scale to improve accuracy and speed. Trained on experimental data from the largest tokamaks in the United States (DIII-D; ref. 12) and the world (Joint European Torus, JET; ref. 13), our method can also be applied to specific tasks such as prediction with long warning times: this opens up the possibility of moving from passive disruption prediction to active reactor control and optimization. These initial results illustrate the potential for deep learning to accelerate progress in fusion-energy science and, more generally, in the understanding and prediction of complex physical systems.

Fig. 1: System overview and disruption-prediction workflow.
Fig. 2: Example predictions on real shots from DIII-D and JET.
Fig. 3: High-performance computing results.

Data availability

The data for this study have restricted access, with permission required from the management of EUROfusion and General Atomics. DIII-D data shown in figures in this paper can be obtained in digital format by following the links at

Code availability

The code used in this work is open source and available from ref. 44.


  1. Mote, C. Jr, Dowling, A. & Zhou, J. The power of an idea: the international impacts of the grand challenges for engineering. Engineering 2, 4–7 (2016).

  2. Schuller, F. Disruptions in tokamaks. Plasma Phys. Contr. Fusion 37, A135 (1995).

  3. De Vries, P. et al. Requirements for triggering the ITER disruption mitigation system. Fus. Sci. Technol. 69, 471–484 (2016).

  4. Lehnen, M. et al. Disruptions in ITER and strategies for their control and mitigation. J. Nucl. Mater. 463, 39–48 (2015).

  5. Tang, W. et al. Scientific grand challenges: fusion energy science and the role of computing at the extreme scale (US Department of Energy’s Office of Fusion Energy Sciences, Workshop March 18–20, Washington DC, 2009).

  6. Aymar, R., Barabaschi, P. & Shimomura, Y. The ITER design. Plasma Phys. Contr. Fusion 44, 519 (2002).

  7. Wroblewski, D., Jahns, G. & Leuer, J. Tokamak disruption alarm based on a neural network model of the high-beta limit. Nucl. Fusion 37, 725 (1997).

  8. Cannas, B., Fanni, A., Marongiu, E. & Sonato, P. Disruption forecasting at JET using neural networks. Nucl. Fusion 44, 68 (2004).

  9. Murari, A. et al. Prototype of an adaptive disruption predictor for JET based on fuzzy logic and regression trees. Nucl. Fusion 48, 035010 (2008).

  10. Vega, J. et al. Results of the JET real-time disruption predictor in the ITER-like wall campaigns. Fusion Eng. Des. 88, 1228–1231 (2013).

  11. Windsor, C. et al. A cross-tokamak neural network disruption predictor for the JET and ASDEX upgrade tokamaks. Nucl. Fusion 45, 337 (2005).

  12. Luxon, J. L. A design retrospective of the DIII-D tokamak. Nucl. Fusion 42, 614 (2002).

  13. Matthews, G. et al. JET ITER-like wall—overview and experimental programme. Phys. Scr. 2011, 014001 (2011).

  14. Freidberg, J. P. Plasma Physics and Fusion Energy (Cambridge Univ. Press, 2008).

  15. Taylor, P. et al. Disruption mitigation studies in DIII-D. Phys. Plasmas 6, 1872–1879 (1999).

  16. Tang, W. M. & Chan, V. Advances and challenges in computational plasma science. Plasma Phys. Contr. Fusion 47, R1 (2005).

  17. De Vries, P. et al. Survey of disruption causes at JET. Nucl. Fusion 51, 053018 (2011).

  18. Carleo, G. & Troyer, M. Solving the quantum many-body problem with artificial neural networks. Science 355, 602–606 (2017).

  19. Carrasquilla, J. & Melko, R. G. Machine learning phases of matter. Nat. Phys. 13, 431–434 (2017).

  20. Rattá, G. et al. Feature extraction for improved disruption prediction analysis at JET. Rev. Sci. Instr. 79, 10F328 (2008).

  21. Rattá, G. et al. Improved feature selection based on genetic algorithms for real time disruption prediction on JET. Fusion Eng. Des. 87, 1670–1678 (2012).

  22. LeCun, Y., Bengio, Y. & Hinton, G. Deep learning. Nature 521, 436–444 (2015).

  23. Bradley, A. P. The use of the area under the ROC curve in the evaluation of machine learning algorithms. Pattern Recognit. 30, 1145–1159 (1997).

  24. Liaw, A. et al. Classification and regression by randomForest. R News 2, 18–22 (2002).

  25. Chen, T. & Guestrin, C. XGBoost: a scalable tree boosting system. In Proc. 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining 785–794 (ACM, 2016).

  26. Chollet, F. Deep Learning With Python (Manning Publications, 2018).

  27. Barton, J. E., Wehner, W. P., Schuster, E., Felici, F. & Sauter, O. Simultaneous closed-loop control of the current profile and the electron temperature profile in the TCV tokamak. In American Control Conference (ACC) 3316–3321 (IEEE, 2015).

  28. Tobias, B. et al. Commissioning of electron cyclotron emission imaging instrument on the DIII-D tokamak and first data. Rev. Sci. Instr. 81, 10D928 (2010).

  29. De Vries, P., Johnson, M., Segui, I. & Contributors, J. E. Statistical analysis of disruptions in JET. Nucl. Fusion 49, 055011 (2009).

  30. Goyal, P. et al. Accurate, large minibatch SGD: training ImageNet in 1 hour. Preprint at (2017).

  31. Svyatkovskiy, A., Kates-Harbeck, J. & Tang, W. Training distributed deep recurrent neural networks with mixed precision on GPU clusters. In Proc. Machine Learning on HPC Environments 10 (ACM, 2017).

  32. Top500 supercomputers. Available at (2018/01/11).

  33. Mnih, V. et al. Human-level control through deep reinforcement learning. Nature 518, 529–533 (2015).

  34. Coelho, R. et al. Synthetic diagnostics in the European Union integrated tokamak modelling simulation platform. Fus. Sci. Technol. 63, 1–8 (2013).

  35. Litaudon, X. et al. Overview of the JET results in support to ITER. Nucl. Fusion 57, 102001 (2017).

  36. Deng, J. et al. ImageNet: a large-scale hierarchical image database. In IEEE Conf. on Computer Vision and Pattern Recognition 248–255 (IEEE, 2009).

  37. Zavaryaev, V. et al. in Plasma Diagnostics (eds Kikuchi, M., Lackner, K. & Tran, M. Q.) Ch. 4, 360–534 (International Atomic Energy Agency, 2012).

  38. Ferron, J. et al. Real time equilibrium reconstruction for tokamak discharge control. Nucl. Fusion 38, 1055 (1998).

  39. Alonso, J. et al. Fast visible camera installation and operation in JET. In AIP Conference Proceedings Vol. 988, 185–188 (AIP, 2008).

  40. Zadrozny, B., Langford, J. & Abe, N. Cost-sensitive learning by cost-proportionate example weighting. In Third IEEE International Conference on Data Mining 435–442 (IEEE, 2003).

  41. Moreno, R. et al. Disruption prediction on JET during the ILW experimental campaigns. Fus. Sci. Technol. 69, 485–494 (2016).

  42. Maas, A. L. et al. Learning word vectors for sentiment analysis. In Proc. 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies Vol. 1 142–150 (Association for Computational Linguistics, 2011).

  43. Marcus, M. P., Marcinkiewicz, M. A. & Santorini, B. Building a large annotated corpus of English: the Penn Treebank. Comput. Linguist. 19, 313–330 (1993).

  44. Kates-Harbeck, J. & Svyatkovskiy, A. FRNN Codebase. (2017).

  45. Chollet, F. et al. Keras. (2015).

  46. Abadi, M. et al. TensorFlow: large-scale machine learning on heterogeneous distributed systems. Preprint at (2016).

  47. Graves, A. Generating sequences with recurrent neural networks. Preprint at (2013).

  48. Dean, J. et al. Large scale distributed deep networks. In Proc. 25th International Conference on Neural Information Processing Systems Vol. 1 1223–1231 (2012).

  49. Chetlur, S. et al. cuDNN: efficient primitives for deep learning. Preprint at (2014).

  50. Khomenko, V., Shyshkov, O., Radyvonenko, O. & Bokhan, K. Accelerating recurrent neural network training using sequence bucketing and multi-GPU data parallelization. In IEEE First International Conference on Data Stream Mining & Processing (DSMP) 100–103 (IEEE, 2016).

  51. Ruder, S. An overview of gradient descent optimization algorithms. Preprint at (2016).

  52. Chang, C.-C. & Lin, C.-J. LIBSVM: a library for support vector machines. ACM Trans. Intell. Syst. Technol. 2, 27 (2011).

  53. Pedregosa, F. et al. Scikit-learn: machine learning in Python. J. Mach. Learn. Res. 12, 2825–2830 (2011).

  54. Das, D. et al. Distributed deep learning using synchronous stochastic gradient descent. Preprint at (2016).

  55. Wu, R., Yan, S., Shan, Y., Dang, Q. & Sun, G. Deep image: scaling up image recognition. Preprint at (2015).

  56. Chen, T. et al. MXNet: a flexible and efficient machine learning library for heterogeneous distributed systems. Preprint at (2015).

  57. Titan: advancing the era of accelerated computing. Oak Ridge National Laboratory (accessed 2 April 2018).

  58. Morgan, T. P. Japan keeps accelerating with Tsubame 3.0 AI supercomputer. The Next Platform (accessed 2 April 2018).

  59. Summit: Oak Ridge National Laboratory’s 200 PetaFlop Supercomputer. Oak Ridge National Laboratory (accessed 2 April 2018).

  60. Smith, L. N. Cyclical learning rates for training neural networks. In 2017 IEEE Winter Conf. on Applications of Computer Vision 464–472 (IEEE, 2017).

  61. Kingma, D. & Ba, J. Adam: a method for stochastic optimization. Preprint at (2014).

  62. Gentile, C. & Warmuth, M. K. Linear hinge loss and average margin. In Proc. 1998 Conf. on Advances in Neural Information Processing Systems 225–231 (MIT Press, 1999).



Acknowledgements

We are grateful to E. Feibush from the US Department of Energy (DOE) Princeton Plasma Physics Laboratory (PPPL) and the Princeton Institute for Computational Science and Engineering (PICSciE) for assisting with visualization and data collection; to W. Wichser, C. Hillegas, J. Wells, S. Matsuoka, R. Yokota and T. Gibbs for supporting our supercomputing efforts; to T. Donne for facilitating collaborations with JET; to E. Joffrin, R. Buttery and T. Strait for leading the internal reviews of this work at JET and DIII-D; to A. Murari, J. Vega and the associated JET data analysis team for discussions of their classical machine-learning methods; and to M. Maslov for support with the JET data. We also thank R. Nazikian, N. Logan, M. Parsons and M. Churchill of PPPL; K. Felker of Princeton University; R. Granetz and C. Rea of the Massachusetts Institute of Technology (MIT); and P. DeVries of ITER for support and for discussions. We thank the JET contributors (ref. 35) and management as well as General Atomics (GA) and its DIII-D tokamak project for access to their fusion databases. J.K.-H. was supported by the DOE Computational Science Graduate Fellowship Program of the Office of Science and National Nuclear Security Administration in the DOE under contract DE-FG02-97ER25308. A.S. is supported by PICSciE, and W.T. by PPPL and PICSciE. This work was carried out within the framework of the EUROfusion Consortium, with funding from the Euratom research and training programme 2014–2018 under grant 633053. The views and opinions expressed herein do not necessarily reflect those of the European Commission. This material is based upon work supported by the US DOE, Office of Science, Office of Fusion Energy Sciences, using the DIII-D National Fusion Facility, a DOE Office of Science user facility, under award DE-FC02-04ER54698. Disclaimer: this report was prepared as an account of work sponsored by an agency of the US Government. Neither the US Government nor any agency thereof, nor any of their employees, makes any warranty, express or implied, or assumes any legal liability or responsibility for the accuracy, completeness, or usefulness of any information, apparatus, product, or process disclosed, or represents that its use would not infringe privately owned rights. Reference herein to any specific commercial product, process, or service by trade name, trademark, manufacturer, or otherwise does not necessarily constitute or imply its endorsement, recommendation, or favoring by the US Government or any agency thereof. The views and opinions of authors expressed herein do not necessarily state or reflect those of the US Government or any agency thereof.

Reviewer information

Nature thanks Ned R. Sauthoff and the other anonymous reviewer(s) for their contribution to the peer review of this work.

Author information




J.K.-H. conceived the idea, wrote the code including the HPC and MPI features, curated the datasets, ran and analysed computational experiments, generated the theoretical scaling analysis, produced the figures, and wrote the manuscript. A.S. extended and co-authored the code base and ran computational experiments, including initial deployment of the code on supercomputers. W.T. supervised and supported the implementation of the project at all stages, and initiated collaborations with JET and leading supercomputing facilities. All authors contributed to editing the manuscript.

Corresponding author

Correspondence to Julian Kates-Harbeck.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data figures and tables

Extended Data Fig. 1 ROC curves from the test set for our model and the best classical model, for DIII-D and JET.

a, DIII-D; b, JET. The true positive rate is the fraction of disruptive shots that are labelled disruptive in advance of the 30 ms deadline. The false positive rate is the fraction of nondisruptive shots that are labelled disruptive at any time. The areas under the curves correspond to the values in Table 1. The insets show the fraction of detected disruptions as a function of the time to disruption for an ‘optimal’ threshold value. On the corresponding ROC curve of the same colour, this optimal threshold defines a point that is indicated by a circle (see main text for details). The insets also show the 30 ms detection deadline as a vertical red line. In a, the AUC is slightly higher for the classical method (see Table 1), but FRNN performs equally well in the interesting upper left region of high true positives and low false positives. Moreover, only our approach provides additional detections between 30 ms and 10 ms before the disruption, reacting to the spikes in radiated power that often occur on this timescale before the disruption (see P_rad,core in Fig. 2c). Thus FRNN could provide improved predictive performance if mitigation technology becomes faster in the future. In addition, a threshold value in practice needs to be selected for calling alarms. The best threshold value is estimated by optimizing it on the training set, in the hope that it will still perform well on the unseen testing set. We define the ‘best’ threshold as the value that maximizes the quantity TP − FP, where TP is the true positive rate and FP is the false positive rate. This is equivalent to finding the point on the ROC curve furthest in the ‘northwest’ direction. For FRNN, the threshold generalizes excellently (black and purple circles). For the classical approach, although the overall ROC curve is encouraging, the threshold estimate is poor (orange square) and far from its ideal position (orange circle). For each method, the fraction of detected disruptions is shown in the inset as a function of time until disruption by using the threshold values corresponding to the circle positions, which for the classical method we determine manually with knowledge of the testing set (to give a conservative and maximally favourable estimate of its performance). Median alarm times are about 500–700 ms on DIII-D and around 1,000 ms on JET. Encouragingly, a majority of disruptions are detected with large warning times of hundreds of milliseconds, which is sufficient for disruption mitigation (requiring around 30 ms) and key to possible future preventative plasma control without the need for shutdown.
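The threshold rule in this caption (choose the value that maximizes TP − FP on the training set) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the FRNN code: the function names, the per-shot score lists and the candidate thresholds are hypothetical. Each disruptive shot is assumed to be summarized by its maximum model output before the 30 ms deadline, and each nondisruptive shot by its maximum output over the whole shot, so a shot raises an alarm exactly when that maximum reaches the threshold.

```python
def shot_rates(disruptive_max, nondisruptive_max, threshold):
    """Shot-level true and false positive rates for one alarm threshold.

    disruptive_max: per-shot maximum model output before the 30 ms deadline
        (assumed to be pre-computed; hypothetical input format).
    nondisruptive_max: per-shot maximum model output over the whole shot.
    """
    tp = sum(score >= threshold for score in disruptive_max) / len(disruptive_max)
    fp = sum(score >= threshold for score in nondisruptive_max) / len(nondisruptive_max)
    return tp, fp


def best_threshold(disruptive_max, nondisruptive_max, candidates):
    """Return (threshold, TP, FP) for the candidate that maximizes TP - FP.

    Ties are resolved in favour of the first candidate encountered.
    """
    best = None
    for threshold in candidates:
        tp, fp = shot_rates(disruptive_max, nondisruptive_max, threshold)
        if best is None or tp - fp > best[1] - best[2]:
            best = (threshold, tp, fp)
    return best
```

In the workflow the caption describes, `best_threshold` would be called on training-set scores only, and the returned threshold would then be passed to `shot_rates` with test-set scores to see how well it generalizes.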

Extended Data Fig. 2 Signal-importance studies.

Signals are ordered from top to bottom in decreasing order of importance. Signals are defined in Extended Data Table 1. Models were trained on the DIII-D dataset. a, Test set AUC values achieved by models trained on a single signal at a time. The AUC value is representative of how much information is contained in that single signal. For comparison, we also show the performance for a model trained on all signals (green bar). b, Test AUC values for a model trained on all signals except the labelled one. In this case, the drop in performance compared with the performance of the model trained on all signals (green bar) is a measure of how important the given signal is for the final model. The exact results for both panels are in general stochastic and vary over hyperparameters and for each new training session, so only general trends should be inferred. It appears consistently that the locked-mode, plasma current, radiated power and q95 signals contain a large amount of disruption-relevant information, similar to the results of past studies of signal importance on JET (ref. 21). Both panels (in particular panel a, which measures the information content of a single signal at a time) also confirm that there is a large amount of information in the profile signals. With higher-quality reconstructions, more frequent sampling and better (causal) temporal filtering (to obviate the need to shift the signal in time and thus lose time-sensitive information), they are likely to become even more relevant. This indicates that higher-dimensional data probably contain much useful information that should be considered in the future. Panel b also highlights another benefit of deep learning, which is that almost all additional signals increase performance, or at least do not have a substantial negative impact. Signals can thus generally be used without having to worry about confusing the algorithm and reducing performance, and therefore without having to spend much time on signal selection. For other methods, signal selection (for example, removing correlated, noisy or noninformative signals) is key (ref. 21).
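The two studies in this caption follow a simple pattern: train on one signal at a time (panel a), and train on all signals except one (panel b). The sketch below shows that loop in Python. It is an illustration only: `signal_importance` and `train_and_score` are hypothetical names, and `toy_score` with its `TOY_GAIN` table is a made-up stand-in for an actual training run that returns a test AUC. The numbers in it are placeholders, not results from the paper.

```python
def signal_importance(signals, train_and_score):
    """Run single-signal and leave-one-out studies.

    signals: list of signal names.
    train_and_score: callable taking a list of signal names and returning a
        test-set score such as AUC (hypothetical interface).
    Returns (baseline, single, drop), where `single[s]` is the score of a
    model trained on `s` alone and `drop[s]` is the loss in score when `s`
    is left out of the full set.
    """
    baseline = train_and_score(list(signals))
    single = {s: train_and_score([s]) for s in signals}
    drop = {
        s: baseline - train_and_score([x for x in signals if x != s])
        for s in signals
    }
    return baseline, single, drop


# Placeholder contributions, chosen only so that the example is deterministic.
TOY_GAIN = {
    "locked_mode": 0.25,
    "plasma_current": 0.125,
    "radiated_power": 0.0625,
    "q95": 0.03125,
}


def toy_score(subset):
    """Stand-in for training: chance level (0.5) plus additive toy gains."""
    return 0.5 + sum(TOY_GAIN[s] for s in subset)
```

Because real training is stochastic, as the caption notes, a practical version would repeat each call to `train_and_score` several times and compare averages rather than single runs.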

Extended Data Fig. 3 Snapshot of the training buffer.

The figure illustrates how data are fed to the RNN for batchwise training with a batch size of M. Each horizontal bar represents data from a shot, and different colours indicate different shots. A colour change in a given row means that a new shot starts. At every time step, the leftmost chunk is cut from the buffer and supplied to the training algorithm, and all shots are shifted to the left. When a shot is finished (as the lighter green bar is about to be), a new shot is loaded into the buffer, and the internal state of the RNN at that batch index is reset. See the Methods subsection ‘Mini-batching’ for details.
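The buffer mechanics described in this caption can be sketched as a small Python generator. This is a simplified illustration, not the FRNN implementation: `batch_chunks` and its arguments are hypothetical names, shots are plain lists of time steps, and each shot is assumed to be trimmed from the front to a multiple of the chunk length so that a chunk never straddles two shots. The generator stops as soon as a finished row cannot be refilled.

```python
from collections import deque


def batch_chunks(shots, batch_size, chunk_len):
    """Yield (chunks, resets) pairs for batchwise RNN training.

    chunks: list of `batch_size` lists, each holding `chunk_len` time steps.
    resets: list of booleans, True where that row has just started a new
        shot, so the RNN state at that batch index should be reset.
    """
    queue = deque(shots)

    def next_shot():
        # Assumption: drop leading samples so the length is a multiple of
        # chunk_len; the end of the shot is kept.
        while queue:
            shot = queue.popleft()
            usable = len(shot) - len(shot) % chunk_len
            if usable:
                return list(shot[len(shot) - usable:])
        return None

    rows = []
    for _ in range(batch_size):
        shot = next_shot()
        if shot is None:
            return  # not enough shots to fill the buffer
        rows.append(shot)
    resets = [True] * batch_size

    while True:
        # Cut the leftmost chunk from every row of the buffer.
        yield [row[:chunk_len] for row in rows], list(resets)
        for i in range(batch_size):
            rows[i] = rows[i][chunk_len:]  # shift the row to the left
            resets[i] = False
            if not rows[i]:  # shot finished: load a new one into this row
                shot = next_shot()
                if shot is None:
                    return
                rows[i] = shot
                resets[i] = True
```

A training loop would iterate over the generator, reset the recurrent state for every batch index whose `resets` entry is True, and then feed `chunks` to the network as one mini-batch.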

Extended Data Table 1 Signals considered and availability on the machines
Extended Data Table 2 Datasets used here
Extended Data Table 3 Hyperparameters to be optimized, explanations and well-performing values
Extended Data Table 4 Data from the later JET ILW campaigns
Extended Data Table 5 Prediction results on the late ILW data

Supplementary information

Supplementary Information

This file contains (i) Supplementary Discussion, including further information on disruptions, challenges in cross-machine training, steps for improving cross-machine predictive performance, and extensions and future work; (ii) Supplementary Equations, in particular the derivation of the computational scaling model shown in Fig. 3b; and (iii) references.

Cite this article

Kates-Harbeck, J., Svyatkovskiy, A. & Tang, W. Predicting disruptive instabilities in controlled fusion plasmas through deep learning. Nature 568, 526–531 (2019).
