Accurate and confident prediction of electron beam longitudinal properties using spectral virtual diagnostics

Longitudinal phase space (LPS) provides critical information about electron beam dynamics for various scientific applications. For example, it can give insight into the high-brightness X-ray radiation from a free electron laser. Existing diagnostics are invasive and often cannot operate at the required resolution. In this work we present a machine learning-based Virtual Diagnostic (VD) tool to accurately predict the LPS for every shot using spectral information collected non-destructively from the radiation of a relativistic electron beam. We demonstrate the tool’s accuracy for three different case studies with experimental or simulated data. For each case, we introduce a method to increase the confidence in the VD tool. We anticipate that the spectral VD will improve the setup and understanding of experimental configurations at DOE’s user facilities, as well as data sorting and analysis. The spectral VD can provide confident knowledge of the longitudinal bunch properties at the next generation of high-repetition rate linear accelerators while reducing the load on data storage, readout and streaming requirements.

www.nature.com/scientificreports/ method is susceptible to prediction errors if there is a failure in one of the read-back linac controls. As a result, the scalar VD has limited prediction accuracy of the LPS, which may be exacerbated in more complicated accelerator operation modes such as two-bunch configurations 21 . In addition, for given linac control settings there are inherent shot-to-shot fluctuations in the temporal pulse length and beam density, due e.g. to MBI [22][23][24] , which are not captured by scalar (integrated) diagnostic signals. As a result, the scalar VD will be insensitive to such variations. To transition such VD tools from initial proof-of-concept demonstrations to single-shot diagnostics used in regular operation, it is therefore essential to increase the robustness, accuracy and confidence of their diagnostic predictions.
In this paper, we present a solution that improves the confidence and accuracy of VD predictions by using a direct measurement of the electron beam radiation spectrum to recover the LPS on a single-shot basis; we refer to this approach as the spectral VD. We train the virtual diagnostic model using spectral information, which can be obtained non-destructively from diffraction or bend radiation and may be measured by a mid-IR 25 or THz spectrometer 26 .
To demonstrate our method we use three case studies from separate facilities: the Linac Coherent Light Source (LCLS) normal-conducting accelerator 6 , the superconducting LCLS-II linac 12 and the FACET-II accelerator 21 . These examples illustrate different advantages of the spectral VD, namely its additional accuracy, its ability to confidently resolve shot-to-shot features that the scalar VD cannot (e.g. MBI, which is important for LCLS-II), and its use in improving confidence in predictions beyond the ground truth measurement (e.g. high current shots in FACET-II). The ML methods presented here, including quantifying uncertainty and increasing confidence in the predictions, are useful for other applications as well.

Results and discussion
In what follows, we trained a feed-forward neural network (NN) on thousands of pairwise instances: 1D longitudinal current profiles as the ground truth outputs, and their matching spectra as inputs. When applicable, we compared the results to the scalar VD with the same NN architecture. Metrics used for comparison can be found in the "Methods" section. Next, we repeated this training process for full LPS images of the electron beam. We first discuss the accuracy and advantages of the spectral VD method based on LCLS experimental data. We then extend our method to predict current profiles from the LCLS-II and FACET-II facilities. This illustrates the versatility of the method as applied to a high repetition-rate machine (LCLS-II) or a high-current, ultra-short bunch facility (FACET-II). For each case study, we present a different method to increase the confidence in the prediction, since in operation the VD will be used in place of the XTCAV measurement. Such a method would indicate, for example, when the VD has moved outside of its range of reliability and the predictions should not be trusted.
LCLS: improved accuracy over scalar VD with experimental data. For this case study, we use experimental data from LCLS to demonstrate the improved accuracy of the spectral VD prediction over the scalar VD for 1D current profiles as well as 2D LPS images. By comparing the predictions of both VDs we are able to flag low-confidence predictions.
The prediction of the current profile for four test shots (i.e. not used in the NN training) is shown in Fig. 2a-d. The input to the spectral VD is shown in the bottom left panel of Fig. 2. There is excellent agreement (total normalized mean squared error (NMSE) for the entire test set of 0.28 ± 0.007%) between the NN prediction trained on the spectrum (dashed green) and the measured current profile (blue). There are cases where the scalar VD predictions (dashed red) suffer from numerical artifacts, which for some shots result in undershoot or overshoot (see Fig. 2a or b, and Fig. 2c, respectively). The undershoot in the scalar VD prediction of Fig. 2a occurred because the read-back of the linac 02 peak current (2 kA) differed from the peak current measured on the XTCAV (6 kA). This can happen, for example, due to a malfunction of the linac 02 current monitor diagnostic or a misfiring of the XTCAV. This case demonstrates the advantage of the spectral VD, which is an indirect measurement of the beam itself and is thus more tolerant to control input errors. Correspondingly, the standard deviation of the spectral VD prediction is smaller. We find that our network architecture consistently improved (∼ 15%) over Ref. 19 for the scalar VD, both in terms of overall error (NMSE = 0.88%) and in predicting high peak current shots as in Fig. 2a.
Combining the predictions of two separate NNs trained on different inputs may increase the confidence of the prediction. It could be used to flag shots in which the discrepancy between two independent predictions exceeds a threshold. For example, after removing shots for which the mean squared error (MSE) between the scalar and spectral VD predictions is greater than 0.0005, the total MSE for the entire test set decreased by 0.24% and 0.01% for the scalar and spectral VD, respectively. The example shots in Fig. 2a-d cover various types of current profiles, as shown in the bottom right panel of Fig. 2. This panel shows the peak current and its full width at half maximum (FWHM) for all test shots for the spectral VD. Notably, the spectral VD prediction degrades for higher current shots. This is understandable since there were fewer training examples with current > 3.5 kA.
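The flagging step described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the function name, the toy arrays, and the choice of a per-shot mean squared difference are assumptions; only the 0.0005 threshold comes from the text.

```python
import numpy as np


def flag_disagreeing_shots(pred_scalar, pred_spectral, threshold=5e-4):
    """Return a boolean mask, True where the two VD predictions disagree.

    Both inputs have shape (n_shots, n_bins). Using the per-shot mean
    squared difference is an assumed reading of "MSE of the difference".
    """
    pred_scalar = np.asarray(pred_scalar, dtype=float)
    pred_spectral = np.asarray(pred_spectral, dtype=float)
    disagreement = np.mean((pred_scalar - pred_spectral) ** 2, axis=1)
    return disagreement > threshold


# Toy data (hypothetical): three shots, four time bins each.
spectral_pred = np.array([[0.1, 0.5, 0.3, 0.1],
                          [0.2, 0.4, 0.3, 0.1],
                          [0.0, 0.6, 0.3, 0.1]])
offsets = np.array([0.0, 0.01, 0.1])[:, None]  # per-shot offset of the scalar VD
scalar_pred = spectral_pred + offsets

flagged = flag_disagreeing_shots(scalar_pred, spectral_pred)
trusted_spectral = spectral_pred[~flagged]  # shots kept for further analysis
```

In this toy case only the third shot, where the two predictions differ by 0.1 in every bin, exceeds the threshold. The threshold is meaningful only for profiles normalized the same way as in the training data.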
We used the same network architectures to train VDs for 2D LPS images. The prediction of the LPS for three test shots is shown in Fig. 3. The spectral VD had slightly better performance (MSE = 0.054, structural similarity index measure (SSIM) of 0.97) than the scalar VD (MSE = 0.079, SSIM = 0.96). Some of the scalar VD predictions suffer from a smearing effect, shown in the top panel of Fig. 3. The bottom panel is an example of XTCAV misfiring, for which the spectral VD predicted noise but the scalar VD predicted a real LPS. This false positive arises because the scalar inputs are an integrated characterization of the beam, whereas the spectrum is an indirect measurement of the beam properties.
Finally, we investigated the learned input weights, i.e. the parameters that transform the input data within the network's first hidden layer. These weights determine how much influence an input frequency has on the output. We found that all the frequencies participated in the reconstruction (there were no zero weights), and that frequencies above 15 THz were given higher weights. This is interpretable, since differences in the form factor result in larger differences in the emitted radiation at higher frequencies.
LCLS-II: shot-to-shot prediction of fine features via ensembling. For this case study, we use simulated data of the LCLS-II superconducting soft X-ray linac to show an example where prediction on a shot-to-shot basis is only available using the spectral VD. LCLS-II has a 1 km bypass line between the linac and the undulator, so that MBI is especially pronounced. We train an ensemble of neural networks to produce a confidence interval that is then used as a threshold to veto bad predictions, thus increasing the confidence in the diagnostic.
There are cases in which a neural network trained using scalar inputs is insensitive to certain features of the LPS. One such example is trying to use a neural network to resolve details of the microbunching structure of an electron beam. The MBI in linac-driven FELs results from the amplification of microscopic density modulations during the transport of an electron beam from the electron source to the undulator. During the transport, shot-to-shot amplitude fluctuations starting from noise can lead to macroscopic fluctuations of the LPS, the current profile and the electron beam bunching factor $b(\lambda) = \frac{1}{N} \sum_{n=1}^{N} \exp(i 2\pi c t_n / \lambda)$. Here $\lambda$ is the wavelength, and the sum is over the $N$ electrons in the bunch, with $t_n$ the relative time delay of each. These in turn can seed the growth of unwanted radiation modes in the FEL and/or reduce the FEL peak power. Suppressing the MBI has been the subject of extensive research (see e.g. 27 and references therein). As an example, the LCLS-II superconducting linac will drive a soft X-ray FEL for which the MBI is being studied carefully in relation to FEL performance 9,23,24 . The LPS may change on a shot-to-shot basis due to the MBI even though the accelerator set-points remain unchanged. Thus, we need a diagnostic that can predict the amplitude of microbunching fluctuations on a single-shot basis to aid in the interpretation of experimental results. In this case we can only use spectral information to train the neural network to make these predictions, because the coherent radiation spectrum is directly sensitive to single-shot fluctuations of the current profile, in contrast to integrated scalar diagnostics.
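As an illustration of the bunching factor defined above, the following sketch evaluates the sum over electron arrival times directly. The function name and the toy bunch are assumptions made for the example; the paper itself evaluates the bunching factor from simulated and predicted current profiles rather than from individual particles.

```python
import numpy as np

C = 299_792_458.0  # speed of light in m/s


def bunching_factor(arrival_times_s, wavelengths_m):
    """Evaluate b(lambda) = (1/N) * sum_n exp(i * 2*pi * c * t_n / lambda).

    arrival_times_s: relative time delays t_n of the N electrons, in seconds.
    wavelengths_m:   wavelengths at which to evaluate b, in metres.
    Returns a complex array with one value per wavelength.
    """
    t = np.asarray(arrival_times_s, dtype=float)
    lam = np.asarray(wavelengths_m, dtype=float)
    phase = 2.0 * np.pi * C * t[None, :] / lam[:, None]
    return np.exp(1j * phase).mean(axis=1)


# Toy bunch (hypothetical): 100 electrons spaced by exactly one period of
# a 1 um modulation, i.e. perfectly bunched at that wavelength.
lam0 = 1e-6
times = np.arange(100) * lam0 / C
b = bunching_factor(times, [lam0, 2.0 * lam0])
```

For this idealized bunch, |b| is 1 at the modulation wavelength and essentially 0 at twice that wavelength, where consecutive electrons contribute with opposite phase.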
The spectrum and current profile of the electron beam were then used to train a neural network-based virtual diagnostic as above. In this case we used a wider neural network to capture the rich structure in the MBI data. The overall predicted current error for test shots was NMSE = 1.1%. Figure 4b1,c1 shows the current profile prediction for two test shots with MSE of 1.3e−5 ± 7.7e−6 and 5.2e−3 ± 6.9e−4 for (b1) and (c1), respectively. The corresponding bunching factors, calculated from the simulated and predicted currents on the time interval [25,120] fs, are shown in Fig. 4b2,c2, respectively.
Finally, in Fig. 4d we show that the maximum predicted standard deviation (std) is correlated with the NMSE of the current prediction. There are two distinct clusters. Good predictions fall into the purple cluster (∼ 87% of the shots); these have low std and low NMSE, implying that the predictions were accurate. In contrast, bad predictions fall into the yellow cluster (∼ 13%); these have high std and high NMSE, implying that the predictions were inaccurate. This result indicates that when deployed on the machine, where the ground truth will not be available for calculating the MSE, we can flag bad shots by setting a threshold, shown as a dashed line: if the predicted std is greater than 5e−4, we classify the shot as 'bad'. In addition, the NMSE of the predicted bunching factor, averaged over the wavelength range from 0 to 10 µm, is shown as the colorbar. Notably, accurate prediction of the current translates to accurate prediction of the bunching factor.
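The ensemble-based veto can be sketched as follows, assuming the predictions of the individual ensemble members are already available as an array. The function names, the array layout and the toy numbers are illustrative assumptions; only the 5e−4 threshold on the maximum std comes from the text.

```python
import numpy as np


def ensemble_summary(member_predictions):
    """Combine predictions from an ensemble of independently trained NNs.

    member_predictions: array of shape (n_members, n_shots, n_bins).
    Returns the ensemble-mean profile per shot and, per shot, the maximum
    over time bins of the member-to-member standard deviation.
    """
    preds = np.asarray(member_predictions, dtype=float)
    mean_profile = preds.mean(axis=0)
    max_std = preds.std(axis=0).max(axis=1)
    return mean_profile, max_std


def flag_bad_shots(max_std, threshold=5e-4):
    """True for shots whose ensemble spread exceeds the threshold."""
    return np.asarray(max_std) > threshold


# Toy ensemble (hypothetical): 3 members, 2 shots, 4 time bins.
members = np.ones((3, 2, 4))
# The members agree on shot 0 but disagree by +/- 0.01 on shot 1.
members[:, 1, :] += np.array([-0.01, 0.0, 0.01])[:, None]

mean_profile, max_std = ensemble_summary(members)
bad = flag_bad_shots(max_std)
```

Because the veto uses only the spread between ensemble members, it does not need the ground truth and can therefore run on every shot during operation.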
FACET-II: flagging high peak current shots beyond diagnostic resolution. For this case study, we use simulated data of the FACET-II two-bunch mode with high peak current to show an example where accurate prediction is limited to the XTCAV current resolution of $I < I_{\max} \sim 35$ kA 28 . By using the spectral information not only for the network prediction, but also for correlating an integrated spectral intensity with the predicted peak current, we are able to flag suspect shots with peak current $I > I_{\max}$ beyond the XTCAV resolution. This approach is crucial for building confidence in the virtual diagnostic prediction, which may be used online to facilitate the interpretation of experimental data and the tuning of machine settings. Reliability of the virtual diagnostic tool is critical for operations. As shown for the LCLS case study, one way to increase the confidence in the prediction is through the redundancy of two separate NNs trained on different inputs, flagging suspect shots for which there is a significant discrepancy between the two NN predictions. However, there are cases where the scalar VD is not applicable, as in the LCLS-II case study; in those cases the confidence in the prediction could come from ensemble methods, e.g. averaging randomly initialized NNs. Nevertheless, those predictions would be limited by the XTCAV ground truth. Thus, there is a need for a method that increases the confidence in the prediction by resolving features beyond the limited XTCAV resolution, such as high peak current shots ($I_{\mathrm{peak}} > I_{\max}$) or short bunches (≤ 4.5 µm).
The spectral VD is able to resolve discrepancies between the predicted current profiles and the actual current at the interaction point (IP), beyond the limited resolution of the XTCAV. For example, the FACET-II accelerator operating in the two-bunch configuration will generate very short bunches (µm rms size) with high peak current ($I_{\mathrm{peak}} > I_{\max}$). The XTCAV will underestimate the peak current of bunches with $I_{\mathrm{peak,IP}} > I_{\max}$ (see Fig. 5a). Such a measurement would be smeared out on the XTCAV (referred to as a 'bad' shot), and would look similar to a 'good' shot with $I_{\mathrm{peak,IP}} \simeq I_{\max}$ (shown in Fig. 5b). Therefore the XTCAV alone cannot distinguish between 'bad' and 'good' shots, and the scalar VD would not allow us to distinguish those shots either. However, very short bunches with high peak current will radiate strong coherent radiation at high frequencies (THz range), so the spectra of the two kinds of shots differ (see Fig. 5c). Figure 5d shows the simulated and predicted maximum current with the prediction MSE as a colorbar. There is good agreement between the simulated and predicted profiles (total NMAE = 3.2 ± 0.5%). We then used the spectrum again to veto 'bad' shots based on the integrated signal over a frequency band (using a pyro and filters). The integrated spectral intensity value is used to determine which shots fall outside the XTCAV resolution window. We optimized the frequency band to maximize the difference between 'good' and 'bad' shots. Shots that are within the XTCAV resolution should show a correlation between the peak current at the IP and the value measured by the XTCAV. These shots are mostly in the region where $I_{\mathrm{peak,IP}} < I_{\max}$. Shots that are not in this region would be flagged as 'bad' shots. Determining on a shot-to-shot basis whether a shot's spectral intensity is in the good region will be complementary to the spectral VD, and will provide assurance that the predicted current profiles from the XTCAV map to the IP current profiles.
Figure 5e shows the maximum predicted current on the XTCAV and the corresponding spectral intensity integrated over the interval [5,200] THz. The maximum IP current is shown as the colorbar. All shots with spectral intensity smaller than the cutoff (shown as a black line) have an IP current smaller than 35 kA. This means that all predictions in this high-confidence region can be trusted (46% of the shots). Shots in the gray region are flagged as 'bad', since the IP current was too high to be resolved by the XTCAV.
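A minimal sketch of this spectral veto is given below, assuming each shot's spectrum is sampled on a common frequency grid. The function names, the toy spectra and the cutoff value are hypothetical; the [5,200] THz band is taken from the text, while the actual cutoff is read from the simulated data in Fig. 5e.

```python
import numpy as np


def integrated_intensity(freq_thz, spectra, band=(5.0, 200.0)):
    """Integrate each shot's spectrum over a frequency band (trapezoidal rule).

    freq_thz: 1D array of frequencies in THz, increasing.
    spectra:  array of shape (n_shots, n_freq).
    """
    freq_thz = np.asarray(freq_thz, dtype=float)
    spectra = np.asarray(spectra, dtype=float)
    in_band = (freq_thz >= band[0]) & (freq_thz <= band[1])
    f = freq_thz[in_band]
    s = spectra[:, in_band]
    return np.sum(0.5 * (s[:, 1:] + s[:, :-1]) * np.diff(f), axis=1)


def high_confidence_mask(intensity, cutoff):
    """True for shots whose integrated intensity is below the cutoff."""
    return np.asarray(intensity) < cutoff


# Toy spectra (hypothetical): flat spectra on a 1 THz grid for two shots.
freq = np.linspace(0.0, 250.0, 251)
spectra = np.vstack([np.full(freq.size, 1.0),   # weaker emitter
                     np.full(freq.size, 3.0)])  # stronger emitter
intensity = integrated_intensity(freq, spectra)
CUTOFF = 300.0  # hypothetical value, in the same arbitrary units
trusted = high_confidence_mask(intensity, CUTOFF)
```

Shots outside the trusted region are not necessarily mispredicted; they are shots for which the XTCAV ground truth itself cannot be relied on, so the prediction should be treated with caution.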
Lastly, we would like to discuss three additional points: (1) VDs would require re-training in several cases, for example when the machine configuration changes, to account for long-term phenomena such as drift, or when the instrumentation changes (the spectrum will be slightly different when using another spectrometer). VD model re-training can be done using transfer learning, i.e. the VD model developed for one configuration/machine is reused as the starting point for a model on a second configuration/machine. This would speed up the re-training, since, for example, a VD model trained on one instrument can be used as a starting point to re-train and calibrate the model for a new instrument. (2) There are cases where the resolution of the TCAV is insufficient to estimate the slice energy spread due to the Panofsky-Wenzel effect. If it is desired to go beyond the TCAV resolution in the energy dimension, we can consider additional inputs to the virtual diagnostic model. For example, we can include the energy spread projection, which can be measured non-destructively, e.g. using synchrotron radiation in a dispersive region 29 . (3) In addition to their use as predictive tools, VDs can be combined with optimization algorithms to tailor electron beam properties to match desired characteristics 28 . Knowledge of the LPS and the ability to generate desired LPS distributions will increase the physics understanding of experiments at FACET-II and LCLS-II.

Conclusions and outlook
We present a virtual diagnostic tool to predict the 2D longitudinal phase space (LPS) and the 1D current profile from a non-invasive spectral measurement of the electron beam's diffraction or bend radiation. We demonstrated our method on three separate facilities as case studies: the Linac Coherent Light Source (LCLS) normal-conducting accelerator, the superconducting LCLS-II linac and the FACET-II accelerator. Each example illustrates a different advantage of the spectral VD. For the LCLS case, the spectral VD provided more accurate predictions than the previously demonstrated scalar VD, which predicts the LPS from non-invasive accelerator control scalars. The confidence in the prediction would come from flagging shots for which there is a significant discrepancy between the two neural network (NN) predictions. For the LCLS-II case, the spectral VD was able to resolve shot-to-shot features relevant to microbunching, for which the scalar VD is not applicable at all. The confidence in the prediction would come from ensembling, namely averaging several randomly initialized NNs. For the FACET-II case, the spectral VD is used not only to accurately predict the current profile, but also to distinguish between ∼ 35 kA peak current shots and higher peak current shots that would appear similar to the former due to the limited XTCAV resolution. The confidence in the prediction for high current shots would come from correlating the std of the current prediction with its integrated spectral intensity, as high peak current shots would have more spectral information at higher frequencies.
Increasing the reliability and robustness of virtual diagnostic tools is critical for deployment and operations, even beyond the limited resolution of the routinely used XTCAV. We are able to extract robust and meaningful information from complex LPS measurements by combining the spectral VD's accurate prediction with various methods to increase the confidence in the prediction. The spectral VD has the potential to maximize the scientific output of accelerators, and to bring the concept of autonomous control of accelerators one step further.

Methods
Spectral virtual diagnostics. Typically, the longitudinal 2D phase space (LPS) is measured at the XTCAV, and the longitudinal 1D beam profile, or current, is derived from the LPS. An IR spectrometer can be used to measure electron beam radiation before the electron beam is manipulated for various applications (see Fig. 1). Given the electron beam radiation spectrum, numerical analysis techniques such as constrained deconvolution 30,31 , iterative phase retrieval 26,32,33 , or analytic phase computation by the Kramers-Kronig dispersion relation 34 could be applied to calculate the 1D longitudinal beam profile 25,35,36 . However, the signal reconstructed using those techniques is not unique 26,33,37 . Prior work has explored using conventional feedback with non-destructive radiation diagnostics to infer the electron beam profile 29 . However, those predictions were not as accurate for such a rapidly time-varying system. In addition, the full 2D longitudinal phase space cannot be reconstructed using the above techniques. Therefore, we propose the spectral VD, in which a neural network is trained to predict the LPS or current profile from a non-destructive spectral measurement.
Virtual diagnostic architecture. For
Metrics for evaluation. As a quantitative relative measure of the error between the prediction $\hat{y}$ and the measurement (or simulation) $y$, we used the mean squared error $\mathrm{MSE}(y,\hat{y}) = \sum_{i=0}^{N-1} (y_i - \hat{y}_i)^2$ and the normalized mean squared error $\mathrm{NMSE}(y,\hat{y}) = \mathrm{MSE}(y,\hat{y}) / \sum_{i=0}^{N-1} y_i^2$. As a quantitative measure of the 2D LPS prediction's accuracy, we compute the mean structural similarity index measure (SSIM) between two images 40 .
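The two 1D error metrics can be written compactly as below. This sketch follows the expressions as printed here, i.e. a plain sum over samples with no 1/N prefactor in the MSE; the function names are assumptions. If a 1/N prefactor is intended, replacing `np.sum` with `np.mean` in `mse` is the only change needed.

```python
import numpy as np


def mse(y, y_hat):
    """Squared error summed over samples, as printed in the text."""
    y = np.asarray(y, dtype=float)
    y_hat = np.asarray(y_hat, dtype=float)
    return np.sum((y - y_hat) ** 2)


def nmse(y, y_hat):
    """MSE normalized by the summed squared ground truth."""
    y = np.asarray(y, dtype=float)
    return mse(y, y_hat) / np.sum(y ** 2)


# Toy example (hypothetical values): one mismatched sample out of three.
y_true = [1.0, 2.0, 3.0]
y_pred = [1.0, 2.0, 4.0]
err = mse(y_true, y_pred)        # squared error of the last sample only
rel_err = nmse(y_true, y_pred)   # err divided by 1 + 4 + 9
```

The NMSE is dimensionless, which is what allows it to be quoted as a percentage across the three facilities.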
Data set. We used three different datasets for the experiments shown. All datasets were randomly shuffled and partitioned into 80% for training (of which 10% was held out for validation) and 20% for testing.
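A shuffle-and-split of this kind might look like the following sketch. It assumes that the validation set is 10% of the training portion (rather than 10% of the full dataset), and the function name and random seed are illustrative.

```python
import numpy as np


def split_indices(n_samples, test_frac=0.2, val_frac_of_train=0.1, seed=0):
    """Shuffle sample indices and split them into train/validation/test.

    The validation fraction is applied to the non-test portion, which is
    an assumed reading of "80% for training, of which 10% for validation".
    """
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n_samples)
    n_test = int(round(test_frac * n_samples))
    n_val = int(round(val_frac_of_train * (n_samples - n_test)))
    test = idx[:n_test]
    val = idx[n_test:n_test + n_val]
    train = idx[n_test + n_val:]
    return train, val, test


# Example with roughly the size of the LCLS data set (about 4000 shots).
train_idx, val_idx, test_idx = split_indices(4000)
```

With 4000 shots this gives 2880 training, 320 validation and 800 test shots, with no index shared between the three sets.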
LCLS data set. Training data for a feed-forward neural network were acquired from thousands of measurements (∼ 4000) at a nominal operating energy of 13.4 GeV and 180 pC charge. In order to generate a large variety of LPS profiles, we scanned the LPS distribution over a wide range of values for the phase of linac 01 and the peak current after linac 02. Images of the LPS were recorded by the XTCAV (resolution of ∼ 1.2 µm and 0.92 MeV/pixel 10 ) at the accelerator exit. We adopt the same pre-processing as in Ref. 19, where the images were cropped and centered before the NN was trained. The current profile dataset is calculated from the 2D LPS by integrating over the energy axis and normalizing by the nominal operational charge.
The input for the spectral VD included the spectrum of each current profile. The spectral information can be collected non-destructively by using coherent diffraction radiation (CDR) 41 , for example. Here, since we did not have access to simultaneous spectrometer measurements at the time, we derived the spectrum by calculating the square of the Fourier transform of the current profile up to 60 THz and down-sampling it to 0.6 THz resolution. We then multiplied the spectrum by weights with a weak frequency dependence that account for the radiation process and were derived from simulations (see the "Simulated spectra" subsection below). These weights represent a decrease in the CDR emitted from a washer, at both low and high frequencies, relative to the square of the form factor. This is because the inner radius limits emission at high frequencies, while the outer radius limits emission at low frequencies. Lastly, we added 10% random noise to match the current state-of-the-art IR spectrometers 26,42 . The input for the scalar VD included five accelerator control read-backs: the amplitude and phase of linac 01, the amplitude of L1x (the X-band linearizer upstream of the first bunch compressor), and non-destructive current measurements (using coherent radiation monitors 43,44 ) before and after linac 02.
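The derivation of a synthetic spectrum from a current profile can be sketched as follows. Several details are assumptions rather than the authors' procedure: down-sampling is done by linear interpolation onto a 0.6 THz grid, the noise is multiplicative Gaussian with 10% relative amplitude, and the CDR weights are left as an optional user-supplied array because their values are not given here.

```python
import numpy as np


def spectrum_from_current(current, dt_fs, f_max_thz=60.0, df_thz=0.6,
                          weights=None, noise_level=0.1, seed=0):
    """Squared Fourier transform of a current profile on a coarse THz grid.

    current: 1D current profile sampled every dt_fs femtoseconds.
    weights: optional array (same length as the output grid) standing in
             for the simulated CDR weights; None applies no weighting.
    """
    current = np.asarray(current, dtype=float)
    dt_s = dt_fs * 1e-15
    # Zero-pad so the FFT grid is about 4x finer than the target resolution.
    n_fft = max(current.size, 4 * int(np.ceil(1.0 / (dt_s * df_thz * 1e12))))
    fft_freq_thz = np.fft.rfftfreq(n_fft, d=dt_s) / 1e12
    power = np.abs(np.fft.rfft(current, n=n_fft)) ** 2
    # Down-sample by interpolating onto the coarse grid (assumed method).
    grid_thz = np.arange(0.0, f_max_thz + 0.5 * df_thz, df_thz)
    spectrum = np.interp(grid_thz, fft_freq_thz, power)
    if weights is not None:
        spectrum = spectrum * np.asarray(weights, dtype=float)
    # Multiplicative Gaussian noise (assumed noise model).
    rng = np.random.default_rng(seed)
    spectrum = spectrum * (1.0 + noise_level * rng.standard_normal(spectrum.shape))
    return grid_thz, spectrum


# Toy current profile (hypothetical): a Gaussian bunch with 10 fs rms length.
t_fs = np.arange(-200.0, 200.0, 1.0)
profile = np.exp(-0.5 * (t_fs / 10.0) ** 2)
freq_grid, noisy_spectrum = spectrum_from_current(profile, dt_fs=1.0)
```

The time step must be short enough that the Nyquist frequency exceeds 60 THz (a 1 fs step gives 500 THz); otherwise the interpolation would simply repeat the last available value at high frequencies.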
LCLS-II data set. We generated thousands of simulated examples using ELEGANT 45 . While the simulation's controls remained unchanged, we varied the random seed of the noise, so each simulation results in a different LPS (shown in Fig. 4a). We down-sampled the LPS to match the resolution of the XTCAV, and calculated the current profile. We derived the corresponding spectrum using 'simulated spectra' as explained in the subsection below. Because the VD is trained on simulated XTCAV current profiles as the output, the temporal resolution of the predicted current profile is set by the XTCAV. Alternatively, better temporal resolution can be achieved by training the VD on reconstructed current profiles as in Ref. 46 . This recent work showed that a THz spectrometer can be used to measure the amplitude of the MBI, including suppression of the MBI using a laser heater. By using spectral reconstruction methods like Kramers-Kronig, a resolution of ∼ 1 fs can be achieved. However, since such a reconstruction is not unique 37 , this approach is beyond the scope of our paper.
FACET-II data set. We generated ∼ 3000 LUCRETIA 47 simulations of the current profile of the two-bunch configuration, quantifying the expected jitter of the FACET-II linac based on the parameters from Ref. 48 . We derived the spectrum using 'simulated spectra' as explained in the subsection below.
Simulated spectra. Coherent diffraction radiation is calculated with an in-house SLAC code, which follows the method of Ref. 49 . For each frequency component, the beam's magnetic field induces a surface current on each element of the radiating foil, which is treated as a perfect conductor. We used a foil with an inner radius of 3 mm and an outer radius of 20.6 mm. The code can compute this magnetic field for four transverse beam distributions: a zero-diameter (line) beam, a flat-top, a circular Gaussian, and an elliptical Gaussian. The line beam is used in these simulations, since it is faster to compute and since the small transverse beam sizes at LCLS do not require including the full transverse distribution. The surface current is the source of the electric field radiated near the direction of specular reflection. A virtual 'camera' is placed normal to this direction at a suitable distance from the foil, in this case 1 m (in the near field, since the high beam energy results in a large formation length). At each pixel of the camera, integration over the foil surface gives the complex electric field, from which the radiated power and power spectral density can be computed. As an input to the spectral VD, we used the frequency range up to 60 THz and added 10% random noise to match the current state-of-the-art IR spectrometers 26,42 . The transverse form factor will also contribute to the spectrum, especially at wavelengths shorter than the transverse spot size, and we anticipate accounting for this effect by calibrating the training data during data processing for the VD training.