Predicting disruptive instabilities in controlled fusion plasmas through deep learning

Kates-Harbeck, Julian; Svyatkovskiy, Alexey; Tang, William

doi:10.1038/s41586-019-1116-4

Letter
Published: 17 April 2019

Julian Kates-Harbeck^1,2,3,
Alexey Svyatkovskiy^4,5 &
William Tang^3,4

Nature volume 568, pages 526–531 (2019)

Abstract

Nuclear fusion power delivered by magnetic-confinement tokamak reactors holds the promise of sustainable and clean energy¹. The avoidance of large-scale plasma instabilities called disruptions within these reactors^2,3 is one of the most pressing challenges^4,5, because disruptions can halt power production and damage key components. Disruptions are particularly harmful for large burning-plasma systems such as the multibillion-dollar International Thermonuclear Experimental Reactor (ITER) project⁶ currently under construction, which aims to be the first reactor that produces more power from fusion than is injected to heat the plasma. Here we present a method based on deep learning for forecasting disruptions. Our method extends considerably the capabilities of previous strategies such as first-principles-based⁵ and classical machine-learning^7,8,9,10,11 approaches. In particular, it delivers reliable predictions for machines other than the one on which it was trained—a crucial requirement for future large reactors that cannot afford training disruptions. Our approach takes advantage of high-dimensional training data to boost predictive performance while also engaging supercomputing resources at the largest scale to improve accuracy and speed. Trained on experimental data from the largest tokamaks in the United States (DIII-D¹²) and the world (Joint European Torus, JET¹³), our method can also be applied to specific tasks such as prediction with long warning times: this opens up the possibility of moving from passive disruption prediction to active reactor control and optimization. These initial results illustrate the potential for deep learning to accelerate progress in fusion-energy science and, more generally, in the understanding and prediction of complex physical systems.

**Fig. 1: System overview and disruption-prediction workflow.**

**Fig. 2: Example predictions on real shots from DIII-D and JET.**

**Fig. 3: High-performance computing results.**

Data availability

The data for this study have restricted access, with permission required from the management of EUROfusion and General Atomics. DIII-D data shown in figures in this paper can be obtained in digital format by following the links at https://fusion.gat.com/global/D3D_DMP.

Code availability

The code used in this work is open source and available from ref. ⁴⁴.

References

1. Mote, C. Jr, Dowling, A. & Zhou, J. The power of an idea: the international impacts of the grand challenges for engineering. Engineering 2, 4–7 (2016).
2. Schuller, F. Disruptions in tokamaks. Plasma Phys. Contr. Fusion 37, A135 (1995).
3. De Vries, P. et al. Requirements for triggering the ITER disruption mitigation system. Fus. Sci. Technol. 69, 471–484 (2016).
4. Lehnen, M. et al. Disruptions in ITER and strategies for their control and mitigation. J. Nucl. Mater. 463, 39–48 (2015).
5. Tang, W. et al. Scientific grand challenges: fusion energy science and the role of computing at the extreme scale (US Department of Energy’s Office of Fusion Energy Sciences, Workshop March 18–20, Washington DC, 2009).
6. Aymar, R., Barabaschi, P. & Shimomura, Y. The ITER design. Plasma Phys. Contr. Fusion 44, 519 (2002).
7. Wroblewski, D., Jahns, G. & Leuer, J. Tokamak disruption alarm based on a neural network model of the high-beta limit. Nucl. Fusion 37, 725 (1997).
8. Cannas, B., Fanni, A., Marongiu, E. & Sonato, P. Disruption forecasting at JET using neural networks. Nucl. Fusion 44, 68 (2004).
9. Murari, A. et al. Prototype of an adaptive disruption predictor for JET based on fuzzy logic and regression trees. Nucl. Fusion 48, 035010 (2008).
10. Vega, J. et al. Results of the JET real-time disruption predictor in the ITER-like wall campaigns. Fusion Eng. Des. 88, 1228–1231 (2013).
11. Windsor, C. et al. A cross-tokamak neural network disruption predictor for the JET and ASDEX upgrade tokamaks. Nucl. Fusion 45, 337 (2005).
12. Luxon, J. L. A design retrospective of the DIII-D tokamak. Nucl. Fusion 42, 614 (2002).
13. Matthews, G. et al. JET ITER-like wall—overview and experimental programme. Phys. Scr. 2011, 014001 (2011).
14. Freidberg, J. P. Plasma Physics and Fusion Energy (Cambridge Univ. Press, 2008).
15. Taylor, P. et al. Disruption mitigation studies in DIII-D. Phys. Plasmas 6, 1872–1879 (1999).
16. Tang, W. M. & Chan, V. Advances and challenges in computational plasma science. Plasma Phys. Contr. Fusion 47, R1 (2005).
17. De Vries, P. et al. Survey of disruption causes at JET. Nucl. Fusion 51, 053018 (2011).
18. Carleo, G. & Troyer, M. Solving the quantum many-body problem with artificial neural networks. Science 355, 602–606 (2017).
19. Carrasquilla, J. & Melko, R. G. Machine learning phases of matter. Nat. Phys. 13, 431–434 (2017).
20. Rattá, G. et al. Feature extraction for improved disruption prediction analysis at JET. Rev. Sci. Instr. 79, 10F328 (2008).
21. Rattá, G. et al. Improved feature selection based on genetic algorithms for real time disruption prediction on JET. Fusion Eng. Des. 87, 1670–1678 (2012).
22. LeCun, Y., Bengio, Y. & Hinton, G. Deep learning. Nature 521, 436–444 (2015).
23. Bradley, A. P. The use of the area under the ROC curve in the evaluation of machine learning algorithms. Pattern Recognit. 30, 1145–1159 (1997).
24. Liaw, A. et al. Classification and regression by randomForest. R News 2, 18–22 (2002).
25. Chen, T. & Guestrin, C. XGBoost: a scalable tree boosting system. In Proc. 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining 785–794 (ACM, 2016).
26. Chollet, F. Deep Learning With Python (Manning Publications, 2018).
27. Barton, J. E., Wehner, W. P., Schuster, E., Felici, F. & Sauter, O. Simultaneous closed-loop control of the current profile and the electron temperature profile in the TCV tokamak. In American Control Conference (ACC) 3316–3321 (IEEE, 2015).
28. Tobias, B. et al. Commissioning of electron cyclotron emission imaging instrument on the DIII-D tokamak and first data. Rev. Sci. Instr. 81, 10D928 (2010).
29. De Vries, P., Johnson, M., Segui, I. & JET EFDA Contributors. Statistical analysis of disruptions in JET. Nucl. Fusion 49, 055011 (2009).
30. Goyal, P. et al. Accurate, large minibatch SGD: training ImageNet in 1 hour. Preprint at https://arxiv.org/abs/1706.02677 (2017).
31. Svyatkovskiy, A., Kates-Harbeck, J. & Tang, W. Training distributed deep recurrent neural networks with mixed precision on GPU clusters. In Proc. Machine Learning on HPC Environments 10 (ACM, 2017).
32. Top500 supercomputers. Available at https://www.top500.org/lists/2017/11/ (2018/01/11).
33. Mnih, V. et al. Human-level control through deep reinforcement learning. Nature 518, 529–533 (2015).
34. Coelho, R. et al. Synthetic diagnostics in the European Union integrated tokamak modelling simulation platform. Fus. Sci. Technol. 63, 1–8 (2013).
35. Litaudon, X. et al. Overview of the JET results in support to ITER. Nucl. Fusion 57, 102001 (2017).
36. Deng, J. et al. Imagenet: a large-scale hierarchical image database. In IEEE Conf. on Computer Vision and Pattern Recognition 248–255 (IEEE, 2009).
37. Zavaryaev, V. et al. in Plasma Diagnostics (eds Kikuchi, M., Lackner, K. & Tran, M. Q.) Ch. 4, 360–534 (International Atomic Energy Agency, 2012).
38. Ferron, J. et al. Real time equilibrium reconstruction for tokamak discharge control. Nucl. Fusion 38, 1055 (1998).
39. Alonso, J. et al. Fast visible camera installation and operation in JET. In AIP Conference Proceedings Vol. 988, 185–188 (AIP, 2008).
40. Zadrozny, B., Langford, J. & Abe, N. Cost-sensitive learning by cost-proportionate example weighting. In Third IEEE International Conference on Data Mining 435–442 (IEEE, 2003).
41. Moreno, R. et al. Disruption prediction on JET during the ILW experimental campaigns. Fus. Sci. Technol. 69, 485–494 (2016).
42. Maas, A. L. et al. Learning word vectors for sentiment analysis. In Proc. 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies Vol. 1 142–150 (Association for Computational Linguistics, 2011).
43. Marcus, M. P., Marcinkiewicz, M. A. & Santorini, B. Building a large annotated corpus of English: the Penn Treebank. Comput. Linguist. 19, 313–330 (1993).
44. Kates-Harbeck, J. & Svyatkovskiy, A. FRNN Codebase. https://github.com/PPPLDeepLearning/plasma-python (2017).
45. Chollet, F. et al. Keras. https://github.com/fchollet/keras (2015).
46. Abadi, M. et al. Tensorflow: large-scale machine learning on heterogeneous distributed systems. Preprint at https://arxiv.org/abs/1603.04467 (2016).
47. Graves, A. Generating sequences with recurrent neural networks. Preprint at https://arxiv.org/abs/1308.0850 (2013).
48. Dean, J. et al. Large scale distributed deep networks. In Proc. 25th International Conference on Neural Information Processing Systems, vol. 1 1223–1231 (2012).
49. Chetlur, S. et al. cuDNN: efficient primitives for deep learning. Preprint at https://arxiv.org/abs/1410.0759 (2014).
50. Khomenko, V., Shyshkov, O., Radyvonenko, O. & Bokhan, K. Accelerating recurrent neural network training using sequence bucketing and multi-GPU data parallelization. In IEEE First International Conference on Data Stream Mining & Processing (DSMP) 100–103 (IEEE, 2016).
51. Ruder, S. An overview of gradient descent optimization algorithms. Preprint at https://arxiv.org/abs/1609.04747 (2016).
52. Chang, C.-C. & Lin, C.-J. LIBSVM: a library for support vector machines. ACM Trans. Intell. Syst. Technol. 2, 27 (2011).
53. Pedregosa, F. et al. Scikit-learn: machine learning in Python. J. Mach. Learn. Res. 12, 2825–2830 (2011).
54. Das, D. et al. Distributed deep learning using synchronous stochastic gradient descent. Preprint at https://arxiv.org/abs/1602.06709 (2016).
55. Wu, R., Yan, S., Shan, Y., Dang, Q. & Sun, G. Deep image: scaling up image recognition. Preprint at https://arxiv.org/abs/1501.02876 (2015).
56. Chen, T. et al. MXNet: A flexible and efficient machine learning library for heterogeneous distributed systems. Preprint at https://arxiv.org/abs/1512.01274 (2015).
57. Titan: advancing the era of accelerated computing. Oak Ridge National Laboratory https://www.olcf.ornl.gov/olcf-resources/compute-systems/titan/ (accessed 2 April 2018).
58. Morgan, T. P. Japan keeps accelerating with Tsubame 3.0 AI supercomputer. The Next Platform https://www.nextplatform.com/2017/02/17/japan-keeps-accelerating-tsubame-3-0-ai-supercomputer/ (accessed 2 April 2018).
59. Summit: Oak Ridge National Laboratory’s 200 PetaFlop Supercomputer. Oak Ridge National Laboratory https://www.olcf.ornl.gov/olcf-resources/compute-systems/summit/ (accessed 2 April 2018).
60. Smith, L. N. Cyclical learning rates for training neural networks. In 2017 IEEE Winter Conf. on Applications of Computer Vision 464–472 (IEEE, 2017).
61. Kingma, D. & Ba, J. Adam: a method for stochastic optimization. Preprint at https://arxiv.org/abs/1412.6980 (2014).
62. Gentile, C. & Warmuth, M. K. Linear hinge loss and average margin. In Proc. 1998 Conf. on Advances in Neural Information Processing Systems 225–231 (MIT Press, 1999).

Acknowledgements

We are grateful to E. Feibush from the US Department of Energy (DOE) Princeton Plasma Physics Laboratory (PPPL) and the Princeton Institute for Computational Science and Engineering (PICSciE) for assisting with visualization and data collection; to W. Wichser, C. Hillegas, J. Wells, S. Matsuoka, R. Yokota and T. Gibbs for supporting our supercomputing efforts; to T. Donne for facilitating collaborations with JET; to E. Joffrin, R. Buttery and T. Strait for leading the internal reviews of this work at JET and DIII-D; to A. Murari, J. Vega and the associated JET data analysis team for discussions of their classical machine-learning methods; and to M. Maslov for support with the JET data. We also thank R. Nazikian, N. Logan, M. Parsons and M. Churchill of PPPL; K. Felker of Princeton University; R. Granetz and C. Rea of the Massachusetts Institute of Technology (MIT); and P. DeVries of ITER for support and for discussions. We thank the JET contributors³⁵ and management as well as General Atomics (GA) and its DIII-D tokamak project for access to their fusion databases. J.K.-H. was supported by the DOE Computational Science Graduate Fellowship Program of the Office of Science and National Nuclear Security Administration in the DOE under contract DE-FG02-97ER25308. A.S. is supported by PICSciE, and W.T. by PPPL and PICSciE. This work was carried out within the framework of the EUROfusion Consortium, with funding from the Euratom research and training programme 2014–2018 under grant 633053. The views and opinions expressed herein do not necessarily reflect those of the European Commission. This material is based upon work supported by the US DOE, Office of Science, Office of Fusion Energy Sciences, using the DIII-D National Fusion Facility, a DOE Office of Science user facility, under award DE-FC02-04ER54698. Disclaimer: this report was prepared as an account of work sponsored by an agency of the US Government. 
Neither the US Government nor any agency thereof, nor any of their employees, makes any warranty, express or implied, or assumes any legal liability or responsibility for the accuracy, completeness, or usefulness of any information, apparatus, product, or process disclosed, or represents that its use would not infringe privately owned rights. Reference herein to any specific commercial product, process, or service by trade name, trademark, manufacturer, or otherwise does not necessarily constitute or imply its endorsement, recommendation, or favoring by the US Government or any agency thereof. The views and opinions of authors expressed herein do not necessarily state or reflect those of the US Government or any agency thereof.

Reviewer information

Nature thanks Ned R. Sauthoff and the other anonymous reviewer(s) for their contribution to the peer review of this work.

Author information

Alexey Svyatkovskiy
Present address: Microsoft, One Microsoft Way, Redmond, WA, USA

Authors and Affiliations

Department of Physics, Harvard University, Cambridge, MA, USA
Julian Kates-Harbeck
Program for Evolutionary Dynamics, Harvard University, Cambridge, MA, USA
Julian Kates-Harbeck
Princeton Plasma Physics Laboratory, Princeton, NJ, USA
Julian Kates-Harbeck & William Tang
Princeton Institute for Computational Science and Engineering, Princeton University, Princeton, NJ, USA
Alexey Svyatkovskiy & William Tang

Contributions

J.K.-H. conceived the idea, wrote the code including the HPC and MPI features, curated the datasets, ran and analysed computational experiments, generated the theoretical scaling analysis, produced the figures, and wrote the manuscript. A.S. extended and co-authored the code base and ran computational experiments, including initial deployment of the code on supercomputers. W.T. supervised and supported the implementation of the project at all stages, and initiated collaborations with JET and leading supercomputing facilities. All authors contributed to editing the manuscript.

Corresponding author

Correspondence to Julian Kates-Harbeck.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data figures and tables

Extended Data Fig. 1 ROC curves from the test set for our model and the best classical model, for DIII-D and JET.

a, DIII-D; b, JET. The true positive rate is the fraction of disruptive shots that are labelled disruptive in advance of the 30 ms deadline. The false positive rate is the fraction of nondisruptive shots that are labelled disruptive at any time. The areas under the curves correspond to the values in Table 1. The insets show the fraction of detected disruptions as a function of the time to disruption for an ‘optimal’ threshold value. On the corresponding ROC curve of the same colour, this optimal threshold defines a point that is indicated by a circle (see main text for details). The insets also show the 30 ms detection deadline as a vertical red line. In a, the AUC is slightly higher for the classical method (see Table 1), but FRNN performs equally well in the interesting upper left region of high true positives and low false positives. Moreover, only our approach provides additional detections between 30 ms and 10 ms before the disruption, reacting to the spikes in radiated power that often occur on this timescale before the disruption (see P_rad,core in Fig. 2c). Thus FRNN could provide improved predictive performance if mitigation technology becomes faster in the future. In addition, a threshold value in practice needs to be selected for calling alarms. The best threshold value is estimated by optimizing it on the training set, in the hope that it will still perform well on the unseen testing set. We define the ‘best’ threshold as the value that maximizes the quantity TP − FP, where TP is the true positive rate and FP is the false positive rate. This is equivalent to finding the point on the ROC curve furthest in the ‘northwest’ direction. For FRNN, the threshold generalizes excellently (black and purple circles). For the classical approach, although the overall ROC curve is encouraging, the threshold estimate is poor (orange square) and far from its ideal position (orange circle).
For each method, the fraction of detected disruptions is shown in the inset as a function of time until disruption by using the threshold values corresponding to the circle positions, which for the classical method we determine manually with knowledge of the testing set (to give a conservative and maximally favourable estimate of its performance). Median alarm times are about 500–700 ms on DIII-D and around 1,000 ms on JET. Encouragingly, a majority of disruptions is detected with large warning times of hundreds of milliseconds—sufficient for disruption mitigation (requiring around 30 ms) and key to possible future preventative plasma control without the need for shutdown.
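The threshold rule in this caption (maximize TP − FP on the training set) can be sketched in a few lines of Python. This is a minimal illustration and not the FRNN code: the function names and the toy scores are hypothetical. It assumes each shot has already been reduced to a single number, namely the maximum model output before the 30 ms deadline for a disruptive shot, or over the whole shot for a nondisruptive one, so that a shot raises an alarm exactly when that number reaches the threshold.

```python
from typing import Sequence, Tuple


def shot_rates(dis_scores: Sequence[float],
               nondis_scores: Sequence[float],
               threshold: float) -> Tuple[float, float]:
    """Return (TP rate, FP rate) at shot level for one alarm threshold.

    dis_scores: per disruptive shot, the peak output before the deadline.
    nondis_scores: per nondisruptive shot, the peak output at any time.
    """
    tp = sum(s >= threshold for s in dis_scores) / len(dis_scores)
    fp = sum(s >= threshold for s in nondis_scores) / len(nondis_scores)
    return tp, fp


def best_threshold(dis_scores: Sequence[float],
                   nondis_scores: Sequence[float]) -> Tuple[float, float, float]:
    """Return (threshold, TP rate, FP rate) maximizing TP - FP.

    Only observed scores are tried, because the rates can change only there.
    On ties, the lowest such threshold wins (max keeps the first maximum).
    """
    candidates = sorted(set(dis_scores) | set(nondis_scores))
    scored = []
    for t in candidates:
        tp, fp = shot_rates(dis_scores, nondis_scores, t)
        scored.append((tp - fp, t, tp, fp))
    _, t, tp, fp = max(scored, key=lambda item: item[0])
    return t, tp, fp


# Hypothetical training-set scores, for illustration only.
train_dis = [0.9, 0.8, 0.35, 0.7]
train_nondis = [0.1, 0.3, 0.5, 0.2, 0.05]
threshold, tp_rate, fp_rate = best_threshold(train_dis, train_nondis)
```

The chosen threshold would then be applied unchanged to the test set. The gap between the training-set optimum and the test-set optimum is what the caption describes as how well the threshold generalizes.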

Extended Data Fig. 2 Signal-importance studies.

Signals are ordered from top to bottom in decreasing order of importance. Signals are defined in Extended Data Table 1. Models were trained on the DIII-D dataset. a, Test set AUC values achieved by models trained on a single signal at a time. The AUC value is representative of how much information is contained in that single signal. For comparison, we also show the performance for a model trained on all signals (green bar). b, Test AUC values for a model trained on all signals except the labelled one. In this case, the drop in performance compared with the performance of the model trained on all signals (green bar) is a measure of how important the given signal is for the final model. The exact results for both figures are in general stochastic and vary over hyperparameters and for each new training session, so only general trends should be inferred. It appears consistently that the locked-mode, plasma current, radiated power and q₉₅ signals contain a large amount of disruption-relevant information, similar to the results of past studies of signal importance on JET²¹. Both panels—in particular panel a, which measures the information content of a single signal at a time—also confirm that there is a large amount of information in the profile signals. With higher-quality reconstructions, more frequent sampling and better (causal) temporal filtering (to obviate the need to shift the signal in time and thus lose time-sensitive information), they are likely to become even more relevant. This indicates that higher-dimensional data probably contain much useful information that should be considered in the future. Panel b also highlights another benefit of deep learning, which is that almost all additional signals increase performance, or at least do not have a substantial negative impact. Signals can thus generally be used without having to worry about confusing the algorithm and reducing performance, and therefore without having to spend much time on signal selection. 
For other methods, signal selection (for example, removing correlated, noisy or noninformative signals) is key²¹.
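The two studies in panels a and b follow a simple pattern: train on one signal at a time, then train on all signals but one and record the drop relative to the full model. The sketch below shows that loop in Python under stated assumptions. `train_and_score` is a hypothetical callable standing in for a full training run that returns a test AUC, and the signal names and toy scorer are placeholders rather than data from this work.

```python
from typing import Callable, Dict, List, Sequence


def signal_importance(
    signals: Sequence[str],
    train_and_score: Callable[[List[str]], float],
) -> Dict[str, Dict[str, float]]:
    """Run single-signal and leave-one-out studies over `signals`.

    Returns, per signal, the score of a model trained on that signal alone
    (panel a) and the drop in score when only that signal is removed (panel b).
    """
    full = train_and_score(list(signals))
    report = {}
    for name in signals:
        alone = train_and_score([name])
        without = train_and_score([s for s in signals if s != name])
        report[name] = {"alone": alone, "drop": full - without}
    return report


# Toy stand-in for training: an additive score with made-up contributions.
# A real run would train a model and evaluate AUC on held-out shots.
_toy_info = {"sig_a": 0.3, "sig_b": 0.1, "sig_c": 0.05}


def toy_train_and_score(used: List[str]) -> float:
    return 0.5 + sum(_toy_info[s] for s in used)


report = signal_importance(list(_toy_info), toy_train_and_score)
ranking = sorted(report, key=lambda s: report[s]["drop"], reverse=True)
```

Because real training is stochastic, as the caption notes, each call to `train_and_score` would ideally be repeated over several seeds and averaged before any ranking is read off.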

Extended Data Fig. 3 Snapshot of the training buffer.

The figure illustrates how data are fed to the RNN for batchwise training with a batch size of M. Each horizontal bar represents data from a shot, and different colours indicate different shots. A colour change in a given row means that a new shot starts. At every time step, the leftmost chunk is cut from the buffer and supplied to the training algorithm, and all shots are shifted to the left. When a shot is finished (as the lighter green bar is about to be), a new shot is loaded into the buffer, and the internal state of the RNN at that batch index is reset. See the Methods subsection ‘Mini-batching’ for details.
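The buffer logic described in this caption can be sketched as a small Python generator. This is an illustrative reimplementation, not the code from ref. 44: the function name is hypothetical, each shot is assumed to be a plain sequence of samples, any trailing samples that do not fill a whole chunk are dropped, and iteration simply stops once a row needs a new shot and none is left.

```python
from collections import deque
from typing import Iterable, Iterator, List, Sequence, Tuple


def buffer_batches(
    shots: Iterable[Sequence[float]],
    batch_size: int,
    chunk_len: int,
) -> Iterator[Tuple[List[List[float]], List[bool]]]:
    """Yield (chunks, resets) for stateful batchwise RNN training.

    chunks[i] holds the next chunk_len samples of buffer row i.
    resets[i] is True when row i has just started a new shot, meaning the
    RNN state at batch index i should be reset before this chunk is used.
    """
    queue = deque(s for s in shots if len(s) >= chunk_len)
    rows: List[List[float]] = [[] for _ in range(batch_size)]
    while True:
        resets = []
        for i in range(batch_size):
            need_new = len(rows[i]) < chunk_len
            if need_new:
                if not queue:
                    return  # simplification: stop when shots run out
                shot = queue.popleft()
                usable = len(shot) - len(shot) % chunk_len
                rows[i] = list(shot[:usable])
            resets.append(need_new)
        # Cut the leftmost chunk from every row and shift the rest left.
        chunks = [row[:chunk_len] for row in rows]
        rows = [row[chunk_len:] for row in rows]
        yield chunks, resets


# Four toy shots of different lengths, batch size M = 2, chunks of 2 samples.
toy_shots = [[1, 1, 1, 1], [2, 2], [3, 3, 3, 3], [4, 4]]
batches = list(buffer_batches(toy_shots, batch_size=2, chunk_len=2))
```

In a training loop, the `resets` flags would be used to zero the recurrent state for the flagged batch indices only, so that rows still in the middle of a shot keep their state from the previous chunk.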

Extended Data Table 1 Signals considered and availability on the machines

Extended Data Table 2 Datasets used here

Extended Data Table 3 Hyperparameters to be optimized, explanations and well-performing values

Extended Data Table 4 Data from the later JET ILW campaigns

Extended Data Table 5 Prediction results on the late ILW data

Supplementary information

Supplementary Information

This file contains (i) Supplementary Discussion, including further information on disruptions, challenges in cross-machine training, steps for improving cross-machine predictive performance, and extensions and future work; (ii) Supplementary Equations, in particular the derivation of the computational scaling model shown in Fig. 3b; and (iii) references.

About this article

Cite this article

Kates-Harbeck, J., Svyatkovskiy, A. & Tang, W. Predicting disruptive instabilities in controlled fusion plasmas through deep learning. Nature 568, 526–531 (2019). https://doi.org/10.1038/s41586-019-1116-4

Received: 16 May 2018
Accepted: 30 January 2019
Published: 17 April 2019
Issue Date: 25 April 2019
DOI: https://doi.org/10.1038/s41586-019-1116-4
