Nuclear fusion power delivered by magnetic-confinement tokamak reactors holds the promise of sustainable and clean energy (ref. 1). The avoidance of large-scale plasma instabilities called disruptions within these reactors (refs 2, 3) is one of the most pressing challenges (refs 4, 5), because disruptions can halt power production and damage key components. Disruptions are particularly harmful for large burning-plasma systems such as the multibillion-dollar International Thermonuclear Experimental Reactor (ITER) project (ref. 6) currently under construction, which aims to be the first reactor that produces more power from fusion than is injected to heat the plasma. Here we present a method based on deep learning for forecasting disruptions. Our method considerably extends the capabilities of previous strategies such as first-principles-based (ref. 5) and classical machine-learning (refs 7–11) approaches. In particular, it delivers reliable predictions for machines other than the one on which it was trained, a crucial requirement for future large reactors that cannot afford training disruptions. Our approach takes advantage of high-dimensional training data to boost predictive performance while also engaging supercomputing resources at the largest scale to improve accuracy and speed. Trained on experimental data from the largest tokamaks in the United States (DIII-D; ref. 12) and the world (Joint European Torus, JET; ref. 13), our method can also be applied to specific tasks such as prediction with long warning times: this opens up the possibility of moving from passive disruption prediction to active reactor control and optimization. These initial results illustrate the potential for deep learning to accelerate progress in fusion-energy science and, more generally, in the understanding and prediction of complex physical systems.
The data for this study have restricted access, with permission required from the management of EUROfusion and General Atomics. DIII-D data shown in figures in this paper can be obtained in digital format by following the links at https://fusion.gat.com/global/D3D_DMP.
The code used in this work is open source and available from ref. 44.
Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Mote, C. Jr, Dowling, A. & Zhou, J. The power of an idea: the international impacts of the grand challenges for engineering. Engineering 2, 4–7 (2016).
Schuller, F. Disruptions in tokamaks. Plasma Phys. Contr. Fusion 37, A135 (1995).
De Vries, P. et al. Requirements for triggering the ITER disruption mitigation system. Fus. Sci. Technol. 69, 471–484 (2016).
Lehnen, M. et al. Disruptions in ITER and strategies for their control and mitigation. J. Nucl. Mater. 463, 39–48 (2015).
Tang, W. et al. Scientific grand challenges: fusion energy science and the role of computing at the extreme scale (US Department of Energy’s Office of Fusion Energy Sciences, Workshop March 18–20, Washington DC, 2009).
Aymar, R., Barabaschi, P. & Shimomura, Y. The ITER design. Plasma Phys. Contr. Fusion 44, 519 (2002).
Wroblewski, D., Jahns, G. & Leuer, J. Tokamak disruption alarm based on a neural network model of the high-beta limit. Nucl. Fusion 37, 725 (1997).
Cannas, B., Fanni, A., Marongiu, E. & Sonato, P. Disruption forecasting at JET using neural networks. Nucl. Fusion 44, 68 (2004).
Murari, A. et al. Prototype of an adaptive disruption predictor for JET based on fuzzy logic and regression trees. Nucl. Fusion 48, 035010 (2008).
Vega, J. et al. Results of the JET real-time disruption predictor in the ITER-like wall campaigns. Fusion Eng. Des. 88, 1228–1231 (2013).
Windsor, C. et al. A cross-tokamak neural network disruption predictor for the JET and ASDEX upgrade tokamaks. Nucl. Fusion 45, 337 (2005).
Luxon, J. L. A design retrospective of the DIII-D tokamak. Nucl. Fusion 42, 614 (2002).
Matthews, G. et al. JET ITER-like wall—overview and experimental programme. Phys. Scr. 2011, 014001 (2011).
Freidberg, J. P. Plasma Physics and Fusion Energy (Cambridge Univ. Press, 2008).
Taylor, P. et al. Disruption mitigation studies in DIII-D. Phys. Plasmas 6, 1872–1879 (1999).
Tang, W. M. & Chan, V. Advances and challenges in computational plasma science. Plasma Phys. Contr. Fusion 47, R1 (2005).
De Vries, P. et al. Survey of disruption causes at JET. Nucl. Fusion 51, 053018 (2011).
Carleo, G. & Troyer, M. Solving the quantum many-body problem with artificial neural networks. Science 355, 602–606 (2017).
Carrasquilla, J. & Melko, R. G. Machine learning phases of matter. Nat. Phys. 13, 431–434 (2017).
Rattá, G. et al. Feature extraction for improved disruption prediction analysis at JET. Rev. Sci. Instr. 79, 10F328 (2008).
Rattá, G. et al. Improved feature selection based on genetic algorithms for real time disruption prediction on JET. Fusion Eng. Des. 87, 1670–1678 (2012).
LeCun, Y., Bengio, Y. & Hinton, G. Deep learning. Nature 521, 436–444 (2015).
Bradley, A. P. The use of the area under the ROC curve in the evaluation of machine learning algorithms. Pattern Recognit. 30, 1145–1159 (1997).
Liaw, A. et al. Classification and regression by randomForest. R News 2, 18–22 (2002).
Chen, T. & Guestrin, C. XGBoost: a scalable tree boosting system. In Proc. 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining 785–794 (ACM, 2016).
Chollet, F. Deep Learning With Python (Manning Publications, 2018).
Barton, J. E., Wehner, W. P., Schuster, E., Felici, F. & Sauter, O. Simultaneous closed-loop control of the current profile and the electron temperature profile in the TCV tokamak. In American Control Conference (ACC) 3316–3321 (IEEE, 2015).
Tobias, B. et al. Commissioning of electron cyclotron emission imaging instrument on the DIII-D tokamak and first data. Rev. Sci. Instr. 81, 10D928 (2010).
De Vries, P., Johnson, M., Segui, I. & JET EFDA Contributors. Statistical analysis of disruptions in JET. Nucl. Fusion 49, 055011 (2009).
Goyal, P. et al. Accurate, large minibatch SGD: training ImageNet in 1 hour. Preprint at https://arxiv.org/abs/1706.02677 (2017).
Svyatkovskiy, A., Kates-Harbeck, J. & Tang, W. Training distributed deep recurrent neural networks with mixed precision on GPU clusters. In Proc. Machine Learning on HPC Environments 10 (ACM, 2017).
Top500 supercomputers. Available at https://www.top500.org/lists/2017/11/ (2018/01/11).
Mnih, V. et al. Human-level control through deep reinforcement learning. Nature 518, 529–533 (2015).
Coelho, R. et al. Synthetic diagnostics in the European Union integrated tokamak modelling simulation platform. Fus. Sci. Technol. 63, 1–8 (2013).
Litaudon, X. et al. Overview of the JET results in support to ITER. Nucl. Fusion 57, 102001 (2017).
Deng, J. et al. Imagenet: a large-scale hierarchical image database. In IEEE Conf. on Computer Vision and Pattern Recognition 248–255 (IEEE, 2009).
Zavaryaev, V. et al. in Plasma Diagnostics (eds Kikuchi, M., Lackner, K. & Tran, M. Q.) Ch. 4, 360–534 (International Atomic Energy Agency, 2012).
Ferron, J. et al. Real time equilibrium reconstruction for tokamak discharge control. Nucl. Fusion 38, 1055 (1998).
Alonso, J. et al. Fast visible camera installation and operation in JET. In AIP Conference Proceedings Vol. 988, 185–188 (AIP, 2008).
Zadrozny, B., Langford, J. & Abe, N. Cost-sensitive learning by cost-proportionate example weighting. In Third IEEE International Conference on Data Mining 435–442 (IEEE, 2003).
Moreno, R. et al. Disruption prediction on JET during the ILW experimental campaigns. Fus. Sci. Technol. 69, 485–494 (2016).
Maas, A. L. et al. Learning word vectors for sentiment analysis. In Proc. 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies Vol. 1 142–150 (Association for Computational Linguistics, 2011).
Marcus, M. P., Marcinkiewicz, M. A. & Santorini, B. Building a large annotated corpus of English: the Penn Treebank. Comput. Linguist. 19, 313–330 (1993).
Kates-Harbeck, J. & Svyatkovskiy, A. FRNN Codebase. https://github.com/PPPLDeepLearning/plasma-python (2017).
Chollet, F. et al. Keras. https://github.com/fchollet/keras (2015).
Abadi, M. et al. Tensorflow: large-scale machine learning on heterogeneous distributed systems. Preprint at https://arxiv.org/abs/1603.04467 (2016).
Graves, A. Generating sequences with recurrent neural networks. Preprint at https://arxiv.org/abs/1308.0850 (2013).
Dean, J. et al. Large scale distributed deep networks. In Proc. 25th International Conference on Neural Information Processing Systems, vol. 1 1223–1231 (2012).
Chetlur, S. et al. cuDNN: efficient primitives for deep learning. Preprint at https://arxiv.org/abs/1410.0759 (2014).
Khomenko, V., Shyshkov, O., Radyvonenko, O. & Bokhan, K. Accelerating recurrent neural network training using sequence bucketing and multi-GPU data parallelization. In IEEE First International Conference on Data Stream Mining & Processing (DSMP) 100–103 (IEEE, 2016).
Ruder, S. An overview of gradient descent optimization algorithms. Preprint at https://arxiv.org/abs/1609.04747 (2016).
Chang, C.-C. & Lin, C.-J. LIBSVM: a library for support vector machines. ACM Trans. Intell. Syst. Technol. 2, 27 (2011).
Pedregosa, F. et al. Scikit-learn: machine learning in Python. J. Mach. Learn. Res. 12, 2825–2830 (2011).
Das, D. et al. Distributed deep learning using synchronous stochastic gradient descent. Preprint at https://arxiv.org/abs/1602.06709 (2016).
Wu, R., Yan, S., Shan, Y., Dang, Q. & Sun, G. Deep image: scaling up image recognition. Preprint at https://arxiv.org/abs/1501.02876 (2015).
Chen, T. et al. MXNet: A flexible and efficient machine learning library for heterogeneous distributed systems. Preprint at https://arxiv.org/abs/1512.01274 (2015).
Titan: advancing the era of accelerated computing. Oak Ridge National Laboratory https://www.olcf.ornl.gov/olcf-resources/compute-systems/titan/ (accessed 2 April 2018).
Morgan, T. P. Japan keeps accelerating with Tsubame 3.0 AI supercomputer. The Next Platform https://www.nextplatform.com/2017/02/17/japan-keeps-accelerating-tsubame-3-0-ai-supercomputer/ (accessed 2 April 2018).
Summit: Oak Ridge National Laboratory’s 200 PetaFlop Supercomputer. Oak Ridge National Laboratory https://www.olcf.ornl.gov/olcf-resources/compute-systems/summit/ (accessed 2 April 2018).
Smith, L. N. Cyclical learning rates for training neural networks. In 2017 IEEE Winter Conf. on Applications of Computer Vision 464–472 (IEEE, 2017).
Kingma, D. & Ba, J. Adam: a method for stochastic optimization. Preprint at https://arxiv.org/abs/1412.6980 (2014).
Gentile, C. & Warmuth, M. K. Linear hinge loss and average margin. In Proc. 1998 Conf. on Advances in Neural Information Processing Systems 225–231 (MIT Press, 1999).
We are grateful to E. Feibush from the US Department of Energy (DOE) Princeton Plasma Physics Laboratory (PPPL) and the Princeton Institute for Computational Science and Engineering (PICSciE) for assisting with visualization and data collection; to W. Wichser, C. Hillegas, J. Wells, S. Matsuoka, R. Yokota and T. Gibbs for supporting our supercomputing efforts; to T. Donne for facilitating collaborations with JET; to E. Joffrin, R. Buttery and T. Strait for leading the internal reviews of this work at JET and DIII-D; to A. Murari, J. Vega and the associated JET data analysis team for discussions of their classical machine-learning methods; and to M. Maslov for support with the JET data. We also thank R. Nazikian, N. Logan, M. Parsons and M. Churchill of PPPL; K. Felker of Princeton University; R. Granetz and C. Rea of the Massachusetts Institute of Technology (MIT); and P. DeVries of ITER for support and for discussions. We thank the JET contributors (ref. 35) and management as well as General Atomics (GA) and its DIII-D tokamak project for access to their fusion databases. J.K.-H. was supported by the DOE Computational Science Graduate Fellowship Program of the Office of Science and National Nuclear Security Administration in the DOE under contract DE-FG02-97ER25308. A.S. is supported by PICSciE, and W.T. by PPPL and PICSciE. This work was carried out within the framework of the EUROfusion Consortium, with funding from the Euratom research and training programme 2014–2018 under grant 633053. The views and opinions expressed herein do not necessarily reflect those of the European Commission. This material is based upon work supported by the US DOE, Office of Science, Office of Fusion Energy Sciences, using the DIII-D National Fusion Facility, a DOE Office of Science user facility, under award DE-FC02-04ER54698.
Disclaimer: this report was prepared as an account of work sponsored by an agency of the US Government. Neither the US Government nor any agency thereof, nor any of their employees, makes any warranty, express or implied, or assumes any legal liability or responsibility for the accuracy, completeness, or usefulness of any information, apparatus, product, or process disclosed, or represents that its use would not infringe privately owned rights. Reference herein to any specific commercial product, process, or service by trade name, trademark, manufacturer, or otherwise does not necessarily constitute or imply its endorsement, recommendation, or favoring by the US Government or any agency thereof. The views and opinions of authors expressed herein do not necessarily state or reflect those of the US Government or any agency thereof.
Nature thanks Ned R. Sauthoff and the other anonymous reviewer(s) for their contribution to the peer review of this work.
Extended data figures and tables
Extended Data Fig. 1 ROC curves from the test set for our model and the best classical model, for DIII-D and JET.
a, DIII-D; b, JET. The true positive rate is the fraction of disruptive shots that are labelled disruptive in advance of the 30 ms deadline. The false positive rate is the fraction of nondisruptive shots that are labelled disruptive at any time. The areas under the curves correspond to the values in Table 1. The insets show the fraction of detected disruptions as a function of the time to disruption for an ‘optimal’ threshold value. On the corresponding ROC curve of the same colour, this optimal threshold defines a point that is indicated by a circle (see main text for details). The inset also shows the 30 ms detection deadline as a vertical red line. In a, the AUC is slightly higher for the classical method (see Table 1), but FRNN performs equally well in the interesting upper left region of high true positives and low false positives. Moreover, only our approach provides additional detections between 30 ms and 10 ms to the disruption, reacting to the spikes in radiated power that often occur on this timescale before the disruption (see Prad,core in Fig. 2c). Thus FRNN could provide improved predictive performance if mitigation technology becomes faster in the future. In addition, a threshold value in practice needs to be selected for calling alarms. The best threshold value is estimated by optimizing it on the training set, in the hope that it will still perform well on the unseen testing set. We define the ‘best’ threshold as the value that maximizes the quantity TP − FP, where TP is the true positive rate and FP is the false positive rate. This is equivalent to finding the point on the ROC curve furthest in the ‘northwest’ direction. For FRNN, the threshold generalizes excellently (black and purple circles). For the classical approach, although the overall ROC curve is encouraging, the threshold estimate is poor (orange square) and far from its ideal position (orange circle). For each method, the fraction of detected disruptions is shown in the inset as a function of time until disruption by using the threshold values corresponding to the circle positions, which for the classical method we determine manually with knowledge of the testing set (to give a conservative and maximally favourable estimate of its performance). Median alarm times are about 500–700 ms on DIII-D and around 1,000 ms on JET. Encouragingly, a majority of disruptions is detected with large warning times of hundreds of milliseconds, which is sufficient for disruption mitigation (requiring around 30 ms) and key to possible future preventative plasma control without the need for shutdown.
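The threshold rule described in this caption can be sketched in a few lines of Python. This is a minimal illustration, not the FRNN implementation: the shot representation, the function names `alarm_raised`, `rates` and `best_threshold`, and the toy scores are all assumptions made for the example. Only the 30 ms deadline and the TP − FP criterion come from the caption.

```python
from typing import List, Optional, Sequence, Tuple

# Hypothetical shot format: (times_ms, alarm_scores, disruption_time_ms).
# disruption_time_ms is None for a nondisruptive shot.
Shot = Tuple[Sequence[float], Sequence[float], Optional[float]]

DEADLINE_MS = 30.0  # detection deadline before the disruption (from the caption)


def alarm_raised(shot: Shot, threshold: float) -> bool:
    """Return True if the score crosses the threshold at a time that counts."""
    times, scores, t_disrupt = shot
    for t, score in zip(times, scores):
        if score > threshold:
            # Nondisruptive shot: an alarm at any time counts (false positive).
            # Disruptive shot: the alarm must come before the 30 ms deadline.
            if t_disrupt is None or t <= t_disrupt - DEADLINE_MS:
                return True
    return False


def rates(shots: List[Shot], threshold: float) -> Tuple[float, float]:
    """Shot-level true positive rate and false positive rate."""
    disruptive = [s for s in shots if s[2] is not None]
    nondisruptive = [s for s in shots if s[2] is None]
    tp = sum(alarm_raised(s, threshold) for s in disruptive) / len(disruptive)
    fp = sum(alarm_raised(s, threshold) for s in nondisruptive) / len(nondisruptive)
    return tp, fp


def best_threshold(shots: List[Shot], candidates: Sequence[float]) -> float:
    """Pick the candidate that maximizes TP - FP on the given (training) shots."""
    def margin(th: float) -> float:
        tp, fp = rates(shots, th)
        return tp - fp
    return max(candidates, key=margin)


# Toy training set with made-up scores: three disruptive shots (disruption at
# 300 ms) and two nondisruptive shots.
toy_shots: List[Shot] = [
    ([0, 100, 200, 300], [0.1, 0.2, 0.8, 0.9], 300.0),
    ([0, 100, 200, 300], [0.1, 0.1, 0.6, 0.9], 300.0),
    ([0, 100, 200, 300], [0.1, 0.1, 0.2, 0.9], 300.0),  # alarm only after deadline
    ([0, 100, 200], [0.1, 0.3, 0.2], None),
    ([0, 100, 200], [0.05, 0.1, 0.1], None),
]
chosen = best_threshold(toy_shots, [0.05, 0.25, 0.5, 0.95])
tp, fp = rates(toy_shots, chosen)
print(chosen, tp, fp)
```

In this toy set the third disruptive shot only crosses the threshold at the moment of disruption, so it is counted as missed. In practice the threshold would be chosen on training shots and then applied unchanged to the test shots, which is where the generalization gap discussed in the caption appears.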
Signals are ordered from top to bottom in decreasing order of importance. Signals are defined in Extended Data Table 1. Models were trained on the DIII-D dataset. a, Test set AUC values achieved by models trained on a single signal at a time. The AUC value is representative of how much information is contained in that single signal. For comparison, we also show the performance for a model trained on all signals (green bar). b, Test AUC values for a model trained on all signals except the labelled one. In this case, the drop in performance compared with the performance of the model trained on all signals (green bar) is a measure of how important the given signal is for the final model. The exact results for both panels are in general stochastic and vary over hyperparameters and for each new training session, so only general trends should be inferred. It appears consistently that the locked-mode, plasma current, radiated power and q95 signals contain a large amount of disruption-relevant information, similar to the results of past studies of signal importance on JET (ref. 21). Both panels (in particular panel a, which measures the information content of a single signal at a time) also confirm that there is a large amount of information in the profile signals. With higher-quality reconstructions, more frequent sampling and better (causal) temporal filtering (to obviate the need to shift the signal in time and thus lose time-sensitive information), they are likely to become even more relevant. This indicates that higher-dimensional data probably contain much useful information that should be considered in the future. Panel b also highlights another benefit of deep learning, which is that almost all additional signals increase performance, or at least do not have a substantial negative impact. Signals can thus generally be used without having to worry about confusing the algorithm and reducing performance, and therefore without having to spend much time on signal selection. For other methods, signal selection (for example, removing correlated, noisy or noninformative signals) is key (ref. 21).
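The two importance measures in this caption (single-signal AUC and leave-one-out AUC) can be sketched as below. This is a minimal illustration under stated assumptions: `evaluate` is a hypothetical callback standing in for "train a model on these signals and return its test AUC", and the toy version simply sums made-up per-shot values instead of training anything. The signal names and numbers are illustrative only.

```python
from typing import Callable, Dict, List, Sequence, Tuple


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via pairwise comparison (ties count as 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def signal_importance(
    signals: List[str],
    evaluate: Callable[[List[str]], float],
) -> Tuple[float, Dict[str, float], Dict[str, float]]:
    """Return (AUC with all signals, single-signal AUCs, leave-one-out AUCs)."""
    all_auc = evaluate(list(signals))
    # Panel a: one model per signal, trained on that signal alone.
    single = {s: evaluate([s]) for s in signals}
    # Panel b: one model per signal, trained on everything except that signal.
    leave_one_out = {s: evaluate([x for x in signals if x != s]) for s in signals}
    return all_auc, single, leave_one_out


# Toy stand-in data: one made-up summary value per shot and signal.
LABELS = [1, 1, 1, 0, 0, 0]  # 1 = disruptive shot, 0 = nondisruptive shot
TOY = {
    "locked_mode": [0.9, 0.8, 0.7, 0.1, 0.2, 0.3],
    "plasma_current": [0.6, 0.4, 0.7, 0.5, 0.3, 0.2],
    "noise": [0.5, 0.1, 0.3, 0.4, 0.2, 0.6],
}


def toy_evaluate(selected: List[str]) -> float:
    """Hypothetical scorer: sums the selected signals instead of training."""
    scores = [sum(TOY[name][i] for name in selected) for i in range(len(LABELS))]
    return auc(LABELS, scores)


all_auc, single, leave_one_out = signal_importance(list(TOY), toy_evaluate)
for name in TOY:
    drop = all_auc - leave_one_out[name]
    print(f"{name}: alone={single[name]:.2f} drop_when_removed={drop:.2f}")
```

Because each call to `evaluate` corresponds to a full training run in the real setting, the cost grows linearly with the number of signals, and repeated runs would be needed to average out the stochastic variation the caption mentions.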
The figure illustrates how data are fed to the RNN for batchwise training with a batch size of M. Each horizontal bar represents data from a shot, and different colours indicate different shots. A colour change in a given row means that a new shot starts. At every time step, the leftmost chunk is cut from the buffer and supplied to the training algorithm, and all shots are shifted to the left. When a shot is finished (as the lighter green bar is about to be), a new shot is loaded into the buffer, and the internal state of the RNN at that batch index is reset. See the Methods subsection ‘Mini-batching’ for details.
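The buffering scheme described in this caption can be sketched as a small generator. This is an illustrative simplification, not the code from the FRNN codebase: the name `stream_batches`, the `chunk_len` parameter and the reset flags are assumptions, trailing samples shorter than one chunk are dropped, and the stream stops as soon as a row cannot be refilled.

```python
from collections import deque
from typing import Iterator, List, Sequence, Tuple


def stream_batches(
    shots: Sequence[Sequence[float]],
    batch_size: int,
    chunk_len: int,
) -> Iterator[Tuple[List[List[float]], List[bool]]]:
    """Yield (chunks, resets) pairs for stateful batchwise RNN training.

    chunks[i] is the next chunk_len samples for batch row i; resets[i] is True
    when row i has just started a new shot, so the consumer should reset the
    RNN state at that batch index before processing the chunk.
    """
    queue = deque(shots)
    rows: List[List[float]] = [[] for _ in range(batch_size)]
    while True:
        resets = [False] * batch_size
        for i in range(batch_size):
            # Row exhausted (or leftover too short): load the next shot.
            while len(rows[i]) < chunk_len:
                if not queue:
                    return  # simplification: stop when any row cannot be refilled
                rows[i] = list(queue.popleft())
                resets[i] = True
        # Cut the leftmost chunk from every row and shift the rest to the left.
        chunks = [row[:chunk_len] for row in rows]
        rows = [row[chunk_len:] for row in rows]
        yield chunks, resets


# Toy shots of different lengths; the values just make the rows easy to follow.
toy = [list(range(0, 4)), list(range(10, 16)), list(range(20, 24)), list(range(30, 32))]
batches = list(stream_batches(toy, batch_size=2, chunk_len=2))
for chunks, resets in batches:
    print(chunks, resets)
```

Keeping each shot in a fixed row lets the recurrent state carry over between consecutive chunks of the same shot, while the per-row reset flag marks the points where a new shot begins and carried-over state would be wrong.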
This file contains (i) Supplementary Discussion, including further information on disruptions, challenges in cross-machine training, steps for improving cross-machine predictive performance, and extensions and future work; (ii) Supplemental Equations, in particular the derivation of the computational scaling model shown in Fig. 3b; and (iii) references.