An adaptive approach to machine learning for compact particle accelerators

Machine learning (ML) tools are able to learn relationships between the inputs and outputs of large complex systems directly from data. However, for time-varying systems, the predictive capabilities of ML tools degrade if the systems are no longer accurately represented by the data with which the ML models were trained. For complex systems, re-training is only possible if the changes are slow relative to the rate at which large numbers of new input-output training data can be non-invasively recorded. In this work, we present an approach to deep learning for time-varying systems that does not require re-training, but uses instead an adaptive feedback in the architecture of deep convolutional neural networks (CNN). The feedback is based only on available system output measurements and is applied in the encoded low-dimensional dense layers of the encoder-decoder CNNs. First, we develop an inverse model of a complex accelerator system to map output beam measurements to input beam distributions, while both the accelerator components and the unknown input beam distribution vary rapidly with time. We then demonstrate our method on experimental measurements of the input and output beam distributions of the HiRES ultra-fast electron diffraction (UED) beam line at Lawrence Berkeley National Laboratory, and showcase its ability for automatic tracking of the time varying photocathode quantum efficiency map. Our method can be successfully used to aid both physics and ML-based surrogate online models to provide non-invasive beam diagnostics.

Machine learning (ML) methods, such as deep neural networks, can learn relationships between the components of and predict the outputs of complex physical systems such as phases of matter 1 and extreme events in complex systems 2 . Consider an n-dimensional complex physical system whose evolution is described by a system of dynamic equations where x = (x 1 , . . . , x n ) is a vector of physical quantities within the system such as atomic positions or energies, p = (p 1 , . . . , p m ) is a vector of controlled system parameters such as voltages or chemical concentrations, and F represents the nonlinear dynamics governing the physical system which may include partial derivatives with respect to the components of both x and p . Associated with a system such as (1) there is usually some physicalmeaningful measurement, M x(p, t), p , which depends on the parameter settings p and on the entire resulting time-history of x(p, t) . For example, such a measurement may be material properties such as strength of a complex alloy or the results of a high energy physics experiment.
There are cases where the system (1) is so large and complex that an accurate analytical representation of F is unknown. There are also cases in which an analytic physics-based model of F exists, but is so computationally expensive that it cannot be numerically simulated over large length and time scales and requires lengthy computations with high performance computing resources. In either of those two cases, ML methods can be useful for learning a representation of the system that quickly maps parameter settings to measurements. Tool such as recurrent neural networks can handle entire time-series trajectories and reinforcement learning approaches can model analytically unknown reward functions and their corresponding optimal controllers for analytically unknown systems. Such ML-based approaches can learn these representations directly from raw data, whether the data comes from measurements or simulations, by using large collections of input-output pairs, where p h (t) is an unknown time-varying parameter and p is an input parameter. By hidden parameters we mean parameters whose settings are not directly observable. Note that in this simple example (4) if there is no feedback ( p ≡ 0 ) the system is unstable and exponentially diverges. If the unknown time-varying function p h (t) repeatedly changes sign, such as where the frequency of oscillation is itself also an uncertain time-varying system, it is a major challenge for data-based methods because the input p to output x(t) relationship changes repeatedly in an unpredictable way, resulting in the smallest error for a large data set being an average approximation of p h (t) ≡ 0 . Such systems are also challenging for standard feedback controls such as proportional integral derivative (PID) feedback which must assume a known sign of p h (t) and fails spectacularly, destabilizing the system even more whenever p h (t) changes sign. Although many model-independent adaptive feedback control methods exist 32 , the problem of designing a stabilizing feedback controller for unknown time-varying systems with unknown control directions such as (4) with analytical proof of convergence was not accomplished until 2013 by utilizing nonlinear timevarying adaptive feedback and Lyapunov stability analysis methods 33 . Another form of hidden time-varying characteristics is when the initial conditions of a system are themselves time-varying, not easily measurable, and influence the dynamics of the system in a non-trivial way.
In both of the above cases, if either the dynamics-to-parameters relationships or the initial conditions vary with time, then the accuracy of data-based predictions will start to degrade as the system changes to such an extent that it is no longer accurately represented by the data that was originally used to train the ML model. Although powerful ML methods such as domain transfer and re-training are available, for large quickly changing systems with many parameters they may become infeasible. Furthermore, for many systems even if it was possible to quickly collect new data to re-train an ML model for new conditions, sometimes such measurements are highly invasive and cannot be accomplished in real time without interrupting regular operations. Including time as an extra input to the ML tool and attempting to learn the time dependence of the system from data is only possible for simple cases in which the changes are predictable, such as monotonic or periodic changes. Although ML methods for time-series data exist, such as LSTM-based neural networks, those also need to be trained with consistent data and if the dynamics that govern that data change then the predicted time-series will be wrong. Model-independent adaptive feedback methods which respond in real time to un-modeled disturbances and changes face their own major limitations. Although adaptive methods are able to handle time-varying systems they are usually local in nature and can require lengthy tuning or get stuck in local minima especially for large complex systems with many parameters.
In this paper, we present an approach to deep learning for time-varying systems which combines the ability of data-based ML methods to learn global complex relationships directly from data with the robustness of model-independent adaptive feedback to handle unknown time-varying conditions. In particular we focus on encoder-decoder type convolutional neural networks (CNN) which encode high dimensional inputs to a low where as before the w , b are network weights and biases that are trained from data sets, p are known set parameters, and the p h (t) are time-varying tunable parameters that are adjusted dynamically according to an adaptive algorithm whose dynamics are represented as f h . Although we write p h (t) to represent the fact that these adjustable parameters are supposed to track the unknown hidden parameters p h (t) , they can also include more abstract quantities such as additional neural network inputs, weights, or biases. The adjustable parameters are adaptively tuned based on feedback that depends on some function of the prediction, C(t) = µ M (t) , where we are only showing the time arguments of the functions to emphasize that they may be drifting and changing. For example, for the application described below, our ML model's goal is to predict the time-varying input beam distribution I i (t) of an accelerator based on the measurement of an output beam distribution I o (t) at another location further down the beam line. This is done because the input beam distribution is not easily directly measurable and therefore our adaptive tuning adjusts both the neural network's estimate of I i (t) as well as a hidden parameter which represents a time-varying magnet in the accelerator lattice. The adaptive feedback is performed based on a comparison between the measured accelerator output beam I o (t) and that of an online model whose input is the ML-based approximated distribution and whose time-varying magnet setting is also adaptively adjusted. This allows us to make adjustments in real time purely based on available diagnostics without having to re-train the ML model. Our approach of combining external feedbacks with neural networks is analogous to a natural phenomenon in biological systems in which networks of neurons interact with each other and are controlled by external feedback loops and other networks 34 . For example recent studies have shown that the challenging task of synchronization between neuron networks can be achieved by feedback controls in the presence of signal delay, noise, and external disturbances 35,36 , and that resonance can be excited and controlled in complex neuron systems by using external feedback signals including chaotic resonances 37,38 . Some initial simplified adaptive ML studies have also begun on coupling the outputs of CNNs to adaptive feedback for real-time accelerator phase space control 39 and for predicting 3D electron density distributions for 3D coherent diffraction imaging 40 .
In the remainder of this paper we focus on ML-based inverse physics models for time-varying systems, an overview of the approach is shown in Fig. 2, which is a specialized form of the general setup in Fig. 1. This approach is motivated by the fact that very sophisticated and accurate physics models as well as ML-based surrogate models have been developed for many complex systems, which could benefit from accurate estimates of the time-varying initial distributions which are used as their inputs. In particular we will demonstrate our method for particle accelerators and their charged particle beams for which our inverse modeling approach is motivated by sophisticated beam dynamics models which have the potential to serve as non-invasive beam diagnostics if their parameters and input beam distributions could be matched to actual accelerators.
Particle accelerators are powerful tools for scientific research such as imaging of nuclear motion by ultrafast electron diffraction (UED) 42,43 reaching femtosecond temporal resolution 44 allowing for the visualization of lattice dynamics 45 and monitoring the spatiotemporal dynamics of excited atoms or molecules 46 , imaging dynamic phenomenon at free electron lasers (FEL) 47 , and generating GV/m accelerating fields in plasma wakefield accelerators (PWFA) 48 . Major challenges faced by accelerators are the ability to quickly and automatically control the phase space distributions of accelerated beams such as custom current profiles, and switching between different experiments with order of magnitude changes in beam properties, which can require hours of hands on tuning due to limited non-invasive diagnostics, time variation of accelerator parameters, and complex time varying beams. www.nature.com/scientificreports/ Intense bunches of 10 5 −10 10 particles are complex objects whose 6D phase space (x, y, z, p x , p y , p z ) dynamics are coupled through collective effects such as space charge forces and coherent synchrotron radiation 49 . Accelerators generate extremely short bunches: 30 fs bunches at the SwissFEL X-ray FEL 50 , sub 100 fs bunches for UEDs 51 , and picosecond bunch trains for UEDs and multicolor XFELs 52 . The High Repetition-rate Electron Scattering apparatus (HiRES, shown in Fig. 3) at Lawrence Berkeley National Laboratory (LBNL), accelerates pC-class, sub-picosecond long electron bunches up to one million times a second (MHz), providing some of the most dense 6D phase space among accelerators at unique repetition rates, making it an ideal test bed for advanced algorithm development 41,53 .
Advanced diagnostics are important for particle accelerator applications because the dynamics of intense charged particle beams are dominated by complex nonlinear collective effects such as space charge forces and coherent synchrotron radiation in a 6 dimensional (6D) phase space (x, y, z, p x , p y , p z ) . Non-invasive real-time phase space diagnostics would be of great benefit to all particle accelerators as they would provide information which could guide adaptive feedback mechanisms. Automatic fast feedback could then be used for real-time beam tuning, such as quickly changing between various experiments at free electron lasers by modifying the energy vs time phase space of the beam, for creating custom tailored current profiles of intense bunches in plasma wakefield acceleration experiments, and for controlling the transverse bunch shapes for ultra fast electron diffraction (UED) microscopy beam lines.
Although physics and machine learning-based surrogate models can potentially serve as non-invasive beam diagnostics, the main challenges they face are uncertain and drifting accelerator parameters and uncertain initial beam distributions at the entrance of an accelerator to be used for initial conditions. Such distributions drift with time requiring lengthy measurements that interrupt accelerator operations. For example, it has been demonstrated that an online model can be adaptively tuned in order to provide a virtual diagnostic for the timevarying longitudinal phase space (LPS) of an electron beam at the Facility for Advanced Accelerator Experimental Tests (FACET) at SLAC National Accelerator Laboratory by comparing the model's predictions to a non-invasive energy spread spectrum measurement 54 . Although the adaptive model tuning approach at FACET was able to track the beam's LPS in real time its accuracy was limited by the fact that there was no access to a good estimate of the time varying input beam distribution. Another set of virtual LPS diagnostics at FACET and at the LCLS free electron laser (FEL) were then developed which utilized neural networks to instantly map accelerator parameters to LPS measurements 55 . A third approach to virtual LPS diagnostics was recently developed at the LCLS which used neural networks with spectral inputs as well as parameter settings resulting in higher accuracy   58 . All such ML methods could also greatly benefit from adding an estimate of the time-varying initial beam distribution as an input via the approach proposed in this paper or from adding adaptive tuning to some of their dense layers to make them more robust to time-varying initial beam distributions.

Methods
In the present work we utilized an adaptive CNN-based inverse model for mapping output beam measurements measurements directly to the learned principal components that represent beam inputs at the High Repetitionrate Electron Scattering apparatus (HiRES, shown in Fig. 3) beamline at Lawrence Berkeley National Laboratory (LBNL), which accelerates pC-class, sub-picosecond long electron bunches up to one million times a second (MHz), providing some of the most dense 6D phase space among accelerators at unique repetition rates, making it an ideal test bed for advanced algorithm development 41,53 . The first step was to collect data to learn a basis with which to represent a distribution of interest. We were interested in representing the time-varying input beam distribution I i (t)(x, y) at the photocathode of the electron gun of HiRES. We started by collecting laser spot images on the HiRES photocathode over several days. The transverse electron beam density distribution ρ(x, y) created by a laser pulse of intensity I(x, y) is given by ρ(x, y) = I(x, y) × QE(x, y) . For our experiments we used a laser spot of small transverse size of roughly 200 × 200 μm 2 , an area small enough so that no significant quantum efficiency variations were expected and we could relate laser intensity directly to an initial electron bunch charge distribution.
In order to increase the generality of our approach the laser images were then each rotated through 360° at steps of 1° generating a total of 1800 52 × 52 pixel images. These images were then stretched out as 1 × 52 2 dimensional vectors and stacked in a 2704 × 1800 data matrix D. Considering this as a collection of 1800 vectors from a 2705 dimensional space, we then carried out a principal component analysis (PCA) 59 , a method closely related to singular value decomposition for finding lower dimensional representations of higher dimensional spaces, such as the method for finding the most important components of a large complex beam line 60 . PCA was performed by subtracting the mean of each column of D and then factorizing the data matrix D via singular value decomposition to calculate the score matrix T according to whose components can be used in linear combinations as a basis with which to represent the original images.
We then generated input beam distributions I i as linear combinations of N pca PCA components as Although hundreds of PCA components were required to represent > 95% of the variance in our data, we wanted to significantly reduce the dimensionality of our problem. The first 30 principal components (PC 1 , . . . , PC 30 ) are shown in Fig. 4 and qualitatively the most dominant modes are the first 15. To quantify the accuracy of reconstruction as a function of the number of PCA components used, we defined the error  www.nature.com/scientificreports/ We then represented the 5 measured input beam distributions according to (9) and found that the error quickly dropped and leveled off at ∼ 8% for N pca = 15 after which it decayed very slowly so we used the first 15 images shown in Fig. 4 as the basis vectors for representing input beam distributions for HiRES. This limits the lowest error we can achieve for real distributions at 8%, but greatly speeds up real time adaptive tuning. Next we calibrated a General Particle Tracer (GPT) model using a generative neural network surrogate model (SM) to quickly map parameter settings to output beam distributions 61,62 by a high dimensional parameter search (including average beam energy (x p , y p , z, E) ) to match predictions to measured beam data. The calibrated GPT model was then used to generate 51 thousand pairs (X i , y i ) , with each y i = (α i,1 , . . . , α i,15 ) representing 15 PCA coefficients sampled from a normal distribution to generate a random input beam distribution Î i as in (9), and X i being the 51 × 51 image of the (x, y) output beam distribution at the first view screen of HiRES, generated by the GPT model using Î i as input.
We then trained a CNN to map output beam distributions to the accelerator's input beam distribution as shown in Fig. 5A. For the generative half of the CNN, instead of allowing the CNN to build in an arbitrary latent space by a standard approach such as transposed convolution layers, our approach guided the CNN to produce physically significant quantities which were the PCA components representing the basis out of which to build our input beam distributions.
The CNN was tested on 1000 unseen input-output pairs and had an average error as defined in (10) of 1.88% with a standard deviation of 0.56%. Next, we used an experimentally measured output beam distribution as the input to the CNN and were able to predict the corresponding measured input beam distribution with an accuracy of 14.05%. Finally, the CNN's prediction was fine tuned using an iterative adaptive feedback approach with each new input distribution fed into GPT giving a new output distribution, as shown in Fig. 5B-E, resulting in prediction accuracy of 9.69%. A detailed view of adaptive CNN predictions is shown in Fig. 6.
The input beam distributions generated by the CNN were then fed into the GPT model to produce output beam estimates, which were compared to measurements. The error between predicted and measured beam outputs was used to drive an adaptive feedback loop which was connected back into the dense layer of the CNN to adaptively tune the CNN's generative layer's predictions. The adaptive method used was extremum seeking (ES) for high dimensional dynamic systems of the form where f is an analytically unknown time-varying nonlinear function, x are measurable system values, p are adjustable parameters, and ŷ is a noisy measurement of an analytically unknown function y that can be maximized by adjusting p according to where ω j = ω i for i = j . The term α > 0 is the dithering amplitude and can be increased to escape local minima and k > 0 is feedback gain. For large ω i , the dynamics of (12) are, on average dp/dt = −kα∇ p y , a gradient descent (for k > 0 ) or ascent (for k < 0 ), with respect to p , of the actual, analytically unknown function y although feedback is based only on the noisy measurements ŷ(t) , for details and proofs see 33,63,64 . , t), p, t),ŷ = y(x(p, t), p, t) + n(t), (12) dp j dt = 2αω j cos(ω j t + kŷ),

Results
Demonstration at HiRES UED. In our approach the dynamic system of interest was the HiRES beamline and its GPT-based simulation. Our inverse model was designed to map HiRES output beam distributions to their associated input distributions and the CNN parameters being tuned, p , were dense layers which created the PCA coefficients defining the input beam distribution, as well as a hidden parameter within the HiRES simulation represented by the solenoid magnet setting. Our adaptive feedback was guided by maximizing the structural similarity index (SSIM) between a measured output beam distribution I o (t) = ρ m (x, y, t) , and the GPT simulation-based output beam distribution Î 0 (t) = ρ s (x, y, t) , both of which were smoothed with a Gaussian filter of variance 2 and normalized to a range of [0, 1]. SSIM is defined as where µ m and µ s are mean values of the measured and simulated distributions, σ 2 m and σ 2 s are their variance, σ ms is the covariance, and c 1 = c 2 = 10 −465 . The SSIM value lies within the range [−1.0, 1.0] with a value of 1.0 for images that are exactly the same.
To demonstrate the robust adaptive capability of our approach to simultaneously track both changing accelerator parameters and time-varying initial beam distributions, we performed a study running three sets of GPT simulations. In the first simulation, representing HiRES, an input beam was defined and its PCA components as well as the current of solenoid S1 were changed over time, resulting in a large variation of both the input and output beam. Based on that output beam of the first simulation, the CNN alone was used to predict the PCA components which generated an input to a second simulation to predict the output beam. This second simulation quickly diverged from the first because the system changed from the one that had been used to train the CNN. For a third simulation, the adaptive feedback algorithm (12) was also applied with y as in (13), which adaptively tuned both the PCA components and the current of solenoid S1 in order to track the time-varying output distribution of the first simulation, as shown in Fig. 5B-F. Figure 7A shows the error of the PCA component predictions and demonstrates that the CNN + ES setup was able to simultaneously track both the time-varying solenoid current and the PCA components. Figure 7B compares the SSIM of the output beams with and without ES for tracking, (C) shows the percent error (10), and Fig. 8 compares the target output beam and input beam with and without using ES. Using the adaptively tracked input distribution and S1 setting we simulated a 1.6 pC bunch with 3D space charge and compared the 6D phase space to that which would have been generated by the exact correct values as well as to the CNN alone in Fig. 9, showing an almost exact match of the 6D phase space. www.nature.com/scientificreports/ Note that we chose solenoid S1 as the hidden time-varying parameter to be tracked only for the purpose of demonstrating this adaptive technique. In general many such parameters can be adaptively tuned simultaneously such as multiple quadrupole magnets which suffer from hysteresis.
Finally, to demonstrate the ability of the approach in Fig. 7 to track time-varying beamline paramters, we generated an artificial quantum efficiency map QE(x, y), and used the measured laser intensity I(x, y) to construct a beam distribution ρ = I × QE which was used as input to the GPT model. The output was fed into the CNN whose initial guess of ρ was then adaptively fine tuned. The QE of the cathode was then reconstructed as QE(x, y) = ρ(x, y)/I(x, y) , as shown in Fig. 10.

Discussion
We have proposed an approach for adaptively tuning the parameters of both an online physics model and encoder-decoder CNN-based inverse physics model by using an extremum seeking model-independent feedback method. This approach results in an overall adaptive ML setup which automatically compensates for unknown time-varying changes to accelerator components (such as magnets, and to unknown changes in the accelerator's input beam distribution), by tuning inputs to the low dimensional dense layers preceding the decoder section 10    Although demonstrated here for a particle accelerator, such method can be applied to a wide range of complex time-varying systems with limited diagnostics. We have demonstrated our method for adaptively tracking the time varying input beam distribution of the HiRES UED particle accelerator and the associated quantum efficiency (QE) when a measurement of the laser transverse profile at the cathode is available, for example through implementation of virtual cathode diagnostic. Our general approach can be applied at various particle accelerators to enable physics models to accurately map distributions between various locations to enable adaptive tuning. For example transverse deflecting radio frequency resonant cavity (TCAV)-based LPS measurements can be used to reconstruct the initial (z, E) LPS. Such model-based non-invasive diagnostics are possible because the rich physics in models are strong constraints on the dynamic maps between accelerator distributions. Given sufficiently rich measurements we are very unlikely to achieve a close match at one section of an accelerator without uniquely reproducing the correct input distribution.
Tracking non-stationary systems requires adaptive algorithms to be able to map measured outputs to unknown time-varying input(s). Isolating the correct changing input(s), together with its magnitude and time-scale, is an extremely complex problem, including multi-parameter optimization of the inverse physics problem. While this could be achieved using particle tracking codes (such as GPT), it would be incredibly computationally expensive and time consuming, and would practically prevent the method to be used in beamlines for real-time feedbacks. Powerful ML tools, such as convolutional neural networks, speed up operations by order of magnitudes (a single prediction as shown in Fig. 6F takes less than a millisecond), and can therefore efficiently be used for fine-tuned by the adaptive feedback and tracked over time, as shown in Figs. 6G and 7. We demonstrated that our approach is simultaneously tracking both the unknown input beam distribution and time-varying accelerator parameters.
Lastly, it is worth pointing out the limitations to the re-training free approach presented here. First, if the magnitude of the input beam distribution change it is such that the new input no longer falls within the span of the PCA-based basis vectors which we are using, the error between the real and predicted input will diverge. Second, if something in the system changes that is not part of the model or is not being adaptively tuned in the model, such as adding or removing a magnet from the beamline, then a new online model and a new experimental data set must be collected for this fundamentally different system.
In future work we plan to expand on our results by using several diagnostics simultaneously, including TCAV based LPS measurements and magnet tuning-based transverse phase space measurements. Our method can also be extended to 3D distributions ρ(r) using 3D PCA components, spherical harmonics defining star-convex distribution boundaries ∂ρ(θ, ϕ) as where a convolutional neural network can be used to generate the spherical harmonics coefficients α m l which are then adaptively tuned. A more general approach is to generate distributions from combinations of radial basis functions (RBF) defined as where the neural network predicts the RBF amplitudes α n , centers r c,n = (x c,n , y c,n , z c,n ) , and decay rates σ n , all of which are adaptively tuned.