Towards machine learning aided real-time range imaging in proton therapy

Compton imaging represents a promising technique for range verification in proton therapy treatments. In this work, we report on the advantageous aspects of the i-TED detector for proton-range monitoring, based on the results of the first Monte Carlo study of its applicability to this field. i-TED is an array of Compton cameras, that have been specifically designed for neutron-capture nuclear physics experiments, which are characterized by \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\gamma $$\end{document}γ-ray energies spanning up to 5–6 MeV, rather low \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\gamma $$\end{document}γ-ray emission yields and very intense neutron induced \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\gamma $$\end{document}γ-ray backgrounds. Our developments to cope with these three aspects are concomitant with those required in the field of hadron therapy, especially in terms of high efficiency for real-time monitoring, low sensitivity to neutron backgrounds and reliable performance at the high \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\gamma $$\end{document}γ-ray energies. We find that signal-to-background ratios can be appreciably improved with i-TED thanks to its light-weight design and the low neutron-capture cross sections of its LaCl\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$_{3}$$\end{document}3 crystals, when compared to other similar systems based on LYSO, CdZnTe or LaBr\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$_{3}$$\end{document}3. Its high time-resolution (CRT \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\sim $$\end{document}∼ 500 ps) represents an additional advantage for background suppression when operated in pulsed HT mode. Each i-TED Compton module features two detection planes of very large LaCl\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$_{3}$$\end{document}3 monolithic crystals, thereby achieving a high efficiency in coincidence of 0.2% for a point-like 1 MeV \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\gamma $$\end{document}γ-ray source at 5 cm distance. This leads to sufficient statistics for reliable image reconstruction with an array of four i-TED detectors assuming clinical intensities of 10\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$^{8}$$\end{document}8 protons per treatment point. The use of a two-plane design instead of three-planes has been preferred owing to the higher attainable efficiency for double time-coincidences than for threefold events. The loss of full-energy events for high energy \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\gamma $$\end{document}γ-rays is compensated by means of machine-learning based algorithms, which allow one to enhance the signal-to-total ratio up to a factor of 2.

Proton therapy in comparison to conventional radiation therapy is able to target the tumor thanks to the maximum dose deposition at the end of the trajectory of the protons (Bragg peak) and its finite penetration in matter. As the dose deposit beyond this distal edge is very low, proton therapy minimizes the damage to neighbouring tissues compared to photon therapy and is hence especially well-suited for tumors close to sensitive organs and in pediatric cases because the lower dose received by healthy tissues reduces the long-term secondary effects 1 . However, inherent range uncertainties associated to anatomical changes, patient setup errors and range errors from uncertainties in particle stopping power, and imaging reconstruction artifacts 2 lead to the application of conservative safety margins. Indeed, up to 1 cm of margin is considered nowadays for a prescribed range of 30 cm 3 , limiting significantly the potential benefits of protons over photons.
In this context, several experimental methods to verify the proton beam range have been developed in recent years, mainly based on the monitoring of the secondary γ-rays, neutrons or positron emitters produced in nuclear reactions along the proton trajectory. Prompt gamma (PG) monitoring allows a proper assessment of the distal dose falloff of the proton beam during the treatment 4 . Unlike the produced positron emitters 5 or neutrons 6 , the spatial distribution of the emitted prompt gamma rays shows a very close correlation with the proton dose distribution at the end of the beam 7 . Moreover, these gamma rays are prompt, which means that they are emitted within 1 ns after the collision, which is key for the online verification of the proton range.
In the last decade, several research groups have designed and tested PG monitoring systems based in prompt gamma timing (PGT) 8,9 , γ-ray spectroscopy 10,11 , or prompt gamma imaging (PGI) methods. The latter are based on either passive collimation, such as knife-edge-slit cameras [12][13][14] , or in active collimation, where most efforts are focused in the development of Compton cameras 15  www.nature.com/scientificreports/ which represent a promising solution for the imaging of gamma-rays of a few MeV. These systems, in contrast to passive collimated cameras, present higher efficiency and expanded field-of-views which allow reconstructing two-(2D) or even three-dimensional (3D) images instead of one-dimensional (1D) profiles 4,16 . However, there are several challenges that need to be addressed for a reliable implementation of this methodology in the clinical case 17 . Indeed, in-vivo range monitoring remains still an issue for most of the Compton cameras under development 16,[18][19][20][21][22][23][24][25][26][27][28][29] . These limitations are related to the limited coincidence efficiency of some of the detectors 20,21,25 , the high counting rates in clinical conditions 16,24,30 , the spatial resolution 22 , the signal-to-background ratio that is challenged by contaminant reactions 18,31 , and the CPU processing-time required by the corresponding image-reconstruction algorithm 26 .
There are remarkable similarities between most of these challenging effects of PGI in proton therapy, and those encountered in other nuclear physics fields, such as neutron-capture cross-section measurements employing the time-of-flight (TOF) technique 32 . These similarities are discussed in the following with some detail. In a TOF neutron capture experiment a pulsed beam of neutrons is shot against a small sample, which typically has a small mass of grams or even milligrams. The reaction of interest, radiative capture, leads to the formation of a compound nucleus, which de-excites emitting a prompt cascade of γ-rays. Thus, common γ-ray energies typically span from a few keV up to 5-6 MeV, which is similar to the range of γ-ray energies from proton-induced inelastic reactions in the carbon, oxygen and nitrogen atoms of the human tissues. On the other hand, in neutroncapture TOF experiments the elastic scattering channel dominates versus the radiative channel of interest. These stray neutrons can be captured in the detector itself or in the surrounding materials, thereby enhancing the γ -ray background level and further obscuring the observation of the channel of interest. For the latter reason, an effort is made to design detection systems of high intrinsic γ-ray efficiency and as transparent as possible to neutrons [33][34][35] . This reduces the intrinsic background level and enhances the signal-to-background ratio. Similarly, PGI is also challenged by the background induced by neutrons originating from nuclear reactions of the primary proton beam 4,6,36 . In the case of carbon-ion beams, neutron production is even more pronounced and their discrimination against prompt γ-rays is an issue 4 . Therefore, optimization of the detection system in terms of neutron sensitivity becomes also an aspect of interest for hadron therapy, although not much attention has been put into this aspect in recent optimization studies 31,37 .
In order to overcome the experimental difficulties discussed above for TOF nuclear experiments, we have developed a total-energy detector with gamma-ray imaging capability, called i-TED 38,39 . i-TED is an array of four individual Compton imaging modules, each of them consisting of two position-sensitive detection layers based on large monolithic LaCl 3 (Ce) crystals. A detailed description of the first such i-TED module can be found in Ref. 40 . i-TED features an excellent time resolution for enabling the TOF technique 39 and it has been especially designed to attain a high detection efficiency, a low sensitivity to scattered-neutron backgrounds and a high image resolving power. Its performance can be further enhanced by using innovative methods based on machine learning (ML) techniques 34,39 , which are very powerful thanks to the multi-dimensional structure of the data acquired with i-TED operated in S&A coincidence, including the energy deposition, position of interaction and time difference of the γ-ray interaction in the two detection planes.
While the work presented here is entirely based on Monte-Carlo (MC) simulations, the i-TED detector has been already developed 34,38,[40][41][42] and employed for TOF experiments at CERN n_TOF 39 . Therefore, intrinsic performance parameters such as energy resolution 41 , 3D intrinsic position resolution 34,42 and efficiency 40 have been experimentally validated and are realistically included in the present study. Based on our previous extensive experience in MC simulations of detectors 38,39 and neutron production and transport 43 , in the present simulation work we have been able to account for both geometrical and physical effects to a great level of detail.
Although most of the work carried out so far with i-TED in terms of neutron-capture experiments has been based on an adaptation of the back-projection (BP) method 44 , for the present study we have implemented two additional imaging algorithms which are the stochastic origin ensemble (SOE) method 45 , and the analytical algorithm (AA) of Tomitani et al. 46 . This has allowed us to better illustrate the benefits of the aforediscussed aspects.
Finally, two other aspects are worth mentioning. On one hand, we have been able to implement a novel event classification algorithm, that allows one to significantly improve the quality of the results by filtering out highenergy γ-ray events with incomplete energy deposition. To the best of our knowledge, this is the first time that such a machine learning (ML) technique is successfully applied to this aim, thereby enhancing the signal-tobackground ratio by up to a factor of two. On the other hand, we have been able to boost the time-performance of the image reconstruction algorithms by means of an advanced graphical-processing unit (GPU) implementation, which led to reconstruction times of the order tenths of seconds for the most complex of the implemented algorithms. As discussed below, the latter two aspects in conjunction with the high intrinsic efficiency of the i-TED design, turn out of great interest when aiming at real-time ion-range monitoring.

Results
To demonstrate the performance of a detector like i-TED for PGI, a series of MC calculations were carried out. In the simulation, a pencil-beam of 120 MeV protons with a spatial spread of σ = 3 mm and a total intensity of 2 × 10 10 protons impinges on water and PMMA phantoms with a size of 10 × 10 × 20 cm 3 . The energy of the protons, similar to that of previous works 4,37,47,48 , was chosen to match the proton range in water and PMMA (10.7 cm and 9.3 cm, respectively) with the center of the phantom, where the detectors are pointing to. Figure 1 shows a schematic view of the simulated geometry. The phantom is surrounded by four i-TED detectors at 50 mm distance from each lateral surface. Each module consists of two layers of LaCl 3 crystals. The separation between both layers (F d in Fig. 1) can be adjusted for a trade-off between efficiency and resolution 40  www.nature.com/scientificreports/ crystals, each one with a size of 50 × 50 × 25 mm 3 . Detector housing, photosensors and other necessary elements have been modeled according to the existing i-TED detector and more details can be found below and in Ref. 40 . Apart from the geometry itself, an effort was made to implement the most suitable physics libraries for a realistic description of the nuclear reactions both for charged particles and neutrons, and the delivered prompt γ -ray distributions from each isotope. The simulations were carried out with the Geant4 toolkit 49 (v10.6) and the officially released QGSP_INCLXX_HP Physics List (PL) was chosen. More details on the impact of the choice of PL are given below.
All the secondary particles generated in the phantom, mainly γ-rays and neutrons, were registered. The largest production yield is found for prompt γ-rays generated in nuclear reactions induced by the 120 MeV proton beam. According to our MC simulations, on average, 0.081 and 0.065 prompt γ-rays ( E > 1 MeV) are produced in Water and PMMA, respectively, by each incident proton. These results are in agreement with the values given in previous works 8,31 but might overestimate the actual production 48 .
The PG spectra generated by the 120 MeV protons in water and PMMA phantoms, presented in Fig More details on the reactions originating these lines can be found elsewhere 10 . The depth distribution of the γ-rays emitted in the irradiation of a water phantom is shown in the right panel Fig. 2. Different emission patterns along the proton track can be identified for each of the lines in the spectrum, associated to the different energy dependence  www.nature.com/scientificreports/ of the underlying nuclear cross sections 47 . Some of the PG lines (for instance 4.4 MeV and 6.1 MeV), show a sharp production maximum in the vicinity of the Bragg Peak, which makes them specially suited for a direct assessment of the proton range. Considering the similarity in terms of PG yield, energy spectra, and proton range in water and PMMA, only the simulations of the water phantom are considered for the results presented hereafter.
To simulate the response of our detection system, all the secondary particles generated inside the phantom are tracked through the geometry model of Fig. 1. For those γ-rays and neutrons interacting with the i-TED detectors, the deposited energy, interaction position and time of all the hits in the S-and A-layers are registered for the subsequent image reconstruction and background assessment. The particle type and energy of the incoming particle were also stored for the analysis described in the following. Experimental effects such as the low energy threshold of 100 keV in each crystal, and the resolution on γ-ray hit position and deposited energy were included in the simulations to account for their impact on the imaging resolution (see Ref. 40 ). These experimental resolutions include a 4.5% fwhm energy spread at 500 keV 41 and 1.5 mm fwhm spatial uncertainty in all three space coordinates for the reconstruction of the γ-ray hit location in each scintillation block 34,42 . ML-aided prompt γ-ray imaging. The four main PG lines of Fig. 2, corresponding to de-excitation transitions in 14 N, 12 C and 15,16 O, were selected for image reconstruction. For each γ-ray line, scatter and absorber (S&A) coincidence events were selected and the energy deposited by the γ-ray in the two planes was summed to create the so-called add-back spectrum. A energy window of 150-300 keV around the peak energies was selected. In order to reduce the delayed gamma and neutron background associated to the moderation and partial capture of neutrons in the phantom, only events firing the S-layer within 10 ns after the proton pulse were selected. This choice of time-of-flight (TOF) selection and its relation with the neutron sensitivity are discussed in more detail in the following. The applicability of such TOF selection in clinical conditions would depend on the specific time structure of the proton accelerator 4 .
A crucial aspect for the application of PGI to proton range verification is the attainable spatial resolution in the distal edge of the PG depth distribution (see Fig. 2). Aiming at achieving a good balance between imaging resolution and reconstruction time, three different algorithms have been implemented and tested for the reconstruction of 2D PG images: BP, SOE and AA. More details on the implementation of these algorithms are given in Methods. Figure 3 shows the Compton images obtained with the three algorithms after selecting in the addback spectra the four main peaks corresponding to the most intense PG lines. These images have been obtained by combining the statistics of the four i-TED detectors and using the full simulated statistics ( 2 × 10 10 protons) to highlight the attainable spatial resolution and the differences between imaging algorithms.
If one compares the three images in the upper row of Fig. 3, obtained using only gamma-ray events with fullenergy deposition, the limited resolution and signal-to-background achieved with the simple BP algorithm is only slightly improved with the SOE algorithm (1000 iterations). On the contrary, the analytical approach leads to a clear upgrade in terms of imaging resolution (see Fig. 3). For a quantitative comparison of the resolution provided by the different algorithms, the projections of the 2D image along the proton beam (Y) axis are presented for different PG energy selections in Fig. 4. For the case of the 12 C peak, the three algorithms reproduce the position of the PG emission maximum within 5 mm. In both cases ( 12 C and 4 main PG lines), the profiles extracted from BP and SOE algorithms start to deviate from the actual fall-off profile below the 80% of the maximum. On the contrary, the AA method is able to reproduce in a remarkable manner the sharp PG fall-off distribution. Moreover, the AA yields also the best reproduction of the PG depth distribution at shallow depths. The precision in the reconstructed PG emission profiles for the number of protons of clinical interest is discussed later in this work.
Besides the finite resolution of the detector and the performance of the imaging algorithm, the hard spectra of the PG lines of interest represents an additional challenge for the satisfactory reconstruction of the PG depth distributions. In 2-plane Compton cameras (CC) like i-TED, the large fraction of γ-ray events with an incomplete energy deposition leads to a deteriorated image reconstruction, as displayed in the central row of Fig. 3. This limitation has led to the development of 3-plane and multistage CC for applications dealing with high energy gamma-ray transitions 20,22,25,26 . These systems enable a more reliable determination of the γ-ray energy by means of threefold time-coincidences. Nevertheless, such an approach has a significant cost in detection efficiency, which hinders the feasibility of real-time range verification.
Aiming at keeping the advantage of the high efficiency in i-TED while selecting only full-energy events, an innovative approach based on machine learning (ML) identification of full-energy events has been developed in this work. This ML solution was coded in the TensorFlow deep-learning API 50 and it has allowed us to enhance the fraction of full-energy events in a factor ranging from 1.5-to-2 in the energy range of interest for PGI (see Fig. 13 in Methods for the details). As a consequence, to a large extent one can recover the resolution and signal-to-background ratio of an ideal Camera with only full-energy events (see Fig. 3).
The impact of the ML-aided image reconstruction is shown for the 1D profiles reconstructed with the analytical algorithm in Fig. 5. This figure shows that the reproduction of the position, width and fall-off of the PG emission profile are improved to some extent thanks to the ML solution (green) with respect to the original reconstruction (red) and getting closer to the ideal reconstruction with only full-energy Compton events (blue). In particular, the deviation between the reconstructed and the true (MC) position of the 4.4 MeV PG fall-off curve (left panel in Fig. 5), quantified at the 80% of the maximum, F 80% , is reduced from the 3.2 mm in the original reconstruction to only 1.3 mm after the ML selection is applied. As for the depth profile of the 4 main lines combined (right panel), the accuracy improvement in the reconstruction of the PG fall-off ( F 80% ) is more subtle, changing from 3 mm with all the events to 2.5 mm with the ML method.
The impact of the ML selection in the Compton images is more sizable for the images reconstructed with the faster SOE and BP algorithms, which do not reach the resolution of the aforediscussed AA. The 1D profiles obtained from the BP and SOE images of Fig. 3 are compared in Fig. 6  www.nature.com/scientificreports/  www.nature.com/scientificreports/ one can appreciate the noticeable enhancement in peak-to-background ratio, especially for the BP algorithm, related to the application of the ML-based full-energy selection (blue) when compared to the initially reconstructed profile (red).

Impact of neutron-induced backgrounds.
Neutrons are produced along with prompt gamma-rays in nuclear reactions during the proton therapy treatment. According to our MC simulations, the prompt neutron yield generated by 120 MeV protons in water is of 0.064 per incident proton, just 20% smaller than that of γ-rays in the energy region of interest (1-7 MeV). These neutrons are partially moderated before escaping the water phantom and may be captured in the detector sensitive volume or surrounding structural materials, leading to an important source of background.
The impact of the detector neutron sensitivity was studied by means of simulations of the same set-up and only 10 8 protons, in which the LaCl 3 crystals of i-TED were replaced by other inorganic scintillators or semiconductor detectors commonly used in existing or foreseen CC designs. These active detection materials are LYSO 16,31 , BGO 16,28,30 , CdZnTe 19,20,26,37 and LaBr 3 18,23,25,51 crystals. Ce:GAGG 22,24 has not been included in the study but its neutron sensitivity could be even higher due to the presence of Gd, featuring one of the largest known thermal cross sections 52 .  www.nature.com/scientificreports/ Figure 8 shows the time distribution of neutron events in the detector for the different active materials compared to the distribution of γ-ray events. LaCl 3 shows the smallest efficiency for the detection of prompt neutrons reaching the detector within few ns after the proton bunch, simultaneously with the prompt gamma-rays arising from proton interactions in the water phantom. On the other hand, BGO is the least sensitive material to slow neutrons reaching the detector 1-100µ s after the proton pulse.
The figure of merit to be studied is the fraction of neutron-and gamma-induced coincidences, displayed in the right panel of Figure 8. LYSO, a very promising crystal in terms of efficiency and time resolution 16 , shows  www.nature.com/scientificreports/ the highest sensitivity to neutrons, which represent 42% of the total counting rate. On the other side, apart from its limited energy resolution for Compton imaging, BGO seems the best solution due to its low sensitivity to neutrons. If the clinical accelerator pulse structure allows to set TOF selections of a few ns, as proposed in recent works 28,53 , the contribution of neutron background would be clearly suppressed for all the studied crystals. Our results indicate that after a TOF selection of e.g. 10 ns, LaCl 3 and BGO would have the lowest fraction (14%) of neutron coincidences or neutron sensitivity. A complete TOF separation of prompt gammas and neutrons is not feasible if one aims to maximize the γ-ray detection efficiency with a small detector-to-phantom distance. Hence, detectors with a low intrinsic sensitivity to neutrons, such as the LaCl 3 based i-TED modules, may represent a valuable solution.
Efficiency and in-vivo real-time PG imaging. The spatial resolution and low sensitivity to backgrounds attainable with the ML-aided i-TED detector are crucial for its applicability to PGI. Aiming at in-vivo real-time range verification, several additional aspects are of special relevance. The first key feature is the detection efficiency, which has to be high enough to reconstruct PG images with sufficient resolution in clinical conditions. Average clinical beam intensities are of the order of 1-2 nA (i.e. 6-12×10 9 p/s) 4 and the relevant clinical scenarios for a single pencil beam in beam-scanning proton RT correspond to the delivery of 10 8 to 10 9 protons 12,16,26 . Table 1 summarizes the detection efficiency for the proposed setup composed of four i-TED modules at 100 mm from the proton beam axis displayed in Fig. 1. The resulting efficiency, defined as the number of S&A coincidences per number of incident protons (as in Ref. 37 ), is compared for several energy ranges in add-back which comprise single or multiple prompt gamma lines. The impact of the distance between the S-and A-planes, which can be remotely adjusted for an optimum trade-off between efficiency and resolution has been studied. The latter approach refers to the electronic-dynamic collimation implemented in i-TED, which is described in detail in Ref. 40 . The proposed setup of four detectors at 10 cm may not be feasible in some clinical scenarios. In a general case, the efficiency of our imaging system would scale proportional to the number of Compton modules and inversely proportional to the square of the distance.
According to the results shown in Table 1, only 6% of the total coincident events are selected with an energy window around the 4.4 MeV peak. If the four main PG lines are combined, as in Fig. 3, this fraction increases to 16% of the total coincidences. Hence, in order to achieve sufficient statistics ( ∼10 4 events per image) for invivo range verification (i.e. <10 9 protons), the combination of several PG lines becomes mandatory. Figure 9 compares the 1D profiles of the images obtained with the AA algorithm and the ML-based full-energy selection corresponding to different number of protons and two different energy windows. On one hand, selecting the Table 1. Detection efficiency for S&A in time coincidence per incident proton combining the four i-TED modules of Fig 1. Each row corresponds to a different selection in deposited energy and each column shows the result for a different distance between the S-and A-planes. The uncertainties due to counting statistics are below 0.5%.  Table 2. The values in this table correspond to depth in water taking into account that the reference frame of the Compton images in this work is centered at a depth of 100 mm. The position of the reconstructed maxima (Max) has been determined from the center of the bin and the positions along the fall-off curve (F 90% , F 80% and F 50% ) have been linearly interpolated from the histograms in Fig. 9. The final reconstructed positions in Table 2 have been calculated as the average of two independent sets of simulated data and the uncertainties correspond to the standard deviation between the two independent results. According to the results of Table 2, deviations between the reconstructed positions and the actual PG distribution are in the range 0.5 mm to 5 mm for the profile selecting the energy window from 1 to 7 MeV. For the case of the 4 main peaks the agreement is in most cases better than 3 mm. As the number of protons decreases, the uncertainties tend to be slightly enhanced. The best precision is obtained for the reconstructed position of the 50% fall-off (F 50% ), which agrees with the real (MC) distribution within less than 1 mm in most cases (see Table. 2). On the other hand, the position of the PG maxima are systematically underestimated by at least 3 mm and have a lower accuracy related to the 2.5 mm bin size.
The computing-time performance of the relatively complex Compton imaging algorithms is also critical for the in-vivo range verification via PGI. In this work, three different 2D image reconstruction algorithms have been tested. The reconstruction times with the BP and SOE methods are perfectly compatible with real-time imaging (see Table 3). For the latter, good quality images containing a minimum of 2 ×10 4 coincident events are reconstructed in few seconds using a single-thread CPU calculation. As for the AA method, which yields the best image resolution (see Figs. 3 and 4), we used a GPU-accelerated CUDA 54 implementation of the code, which allows reconstructing an image in few tenths of seconds, 120 times faster than with a conventional single-thread CPU based approach. A similar acceleration has been reported for 3D position reconstruction algorithms in our previous work 34 .

Discussion and conclusion
i-TED is a Compton camera array that has been specifically designed for neutron-capture nuclear physics experiments. Several design aspects of i-TED, such as its high time resolution, high efficiency and relatively low neutron sensitivity may become of interest in order to address some of the current challenges in Prompt Gamma Imaging for range verification in proton therapy treatments. The results presented in this work show the prospects of i-TED concerning detection efficiency, sensitivity to the neutron background and spatial resolution attainable with a combination of ML-based full-energy selection method and state-of-the-art reconstruction algorithms.
This work has presented a MC study on the applicability of i-TED to range verification, where the attainable image resolution is one of the critical issues. The BP and SOE algorithms yielded a good reproduction of the PG maximum emission point in spite of their limited resolution. Much higher resolution images can be obtained with i-TED using the analytical approach of Ref. 46 , which provided the best reproduction of the distal fall-off for several selections of PG lines with an accuracy ranging from 1 to 3 mm. To achieve these results, a ML classifier, which improved the fraction of correct Compton events up to a factor 2, has been developed to partially compensate the deterioration of the images due to partial-energy events.
In comparison, the recent MC study by Yao et al. 37 , obtained accuracies within 2 mm for the same F 80% and F 50% magnitudes using also a combination of γ-ray lines. Moreover, our results are also at the level of previous works on MC simulations of Compton cameras applied to PGI 18,26,51 , and better than the 5 mm obtained in Ref. 20 or the 7 mm reported in Ref. 22 . Table 2. PG maximum (Max) and 90% (F 90% ), 80% (F 80% ) and 50% (F 50% ) fall-off positions corresponding to the profiles of Fig. 4 compared to the real PG emission profile in the MC simulations. The values are given in mm. The values in parentheses correspond to the standard deviation of independent data sets. www.nature.com/scientificreports/ The i-TED detector was developed for neutron-capture time-of-flight experiments, where a low sensitivity to neutrons is a crucial aspect. Neutrons are also among the main contributors to the secondary patient dose and to the background in PGI systems. Indeed, previous works have discussed the possible rejection of the neutron background in PGI systems through the application of time-of-flight selections 28,36,53 . However, aiming at a maximum detection efficiency, the detector-to-phantom distance has to be minimized and a satisfactory TOF separation of the prompt gammas and neutrons is not feasible (see Fig. 8). In this scenario, the use of scintillators with reduced neutron sensitivity, such as LaCl 3 or BGO, is the best solution to minimize the neutron-induced background. Still, LaCl 3 shows the advantage of the high energy resolution for Compton imaging.
The maximum attainable efficiency of the full i-TED setup (see Table 1) in the energy range of interest for PGI (1)(2)(3)(4)(5)(6)(7) is at the level of the most efficient existing or designed Compton cameras for PGI 37 , which reports an efficiency of 4.1×10 −4 per proton for deposited energies above 1 MeV. The absolute efficiency per emitted prompt γ-ray for the i-TED array, calculated as the average of the four main PG transitions, is in the range 0.9-1.6×10 −3 for the focal distances of Table 1. This value outperforms most of the CC for PGI developed to date, with a range of efficiencies between 10 −4 and 10 −8 per emitted PG 4 . The high efficiency for the proposed setup over the whole energy range of interest (1-7 MeV) allows to determine the position of the PG distribution falloff with an accuracy better than 3 mm for proton intensities as low as 10 8 , in the range of clinical interest 12,16,26 .
Following the promising prospects for range verification in proton therapy with i-TED presented in this work, we aim at testing this detection system in proton beam facilities, such as the 18 MeV cyclotron at CNA (Sevilla) and clinical beams. These tests will provide an experimental validation for the methods and results presented in this work and they will allow us to explore possible additional aspects that can be optimized for the present ion-range monitor application.

Methods
The i-TED Compton imager. This work has presented a MC study based on the i-TED array consisting of four high-efficiency Compton cameras 38,39 . This novel detection system is under development at Instituto de Física Corpuscular (IFIC) within the HYMNS-ERC project 55 . The first demonstrator has been already assembled, fully characterized 40 and applied to high-resolution neutron TOF experiments 39 .
Each of the i-TED Compton modules uses 5 position-sensitive detectors (PSDs) distributed in two parallel detection planes, Scatter (S) and Absorber (A), as shown in Fig. 10. Each PSD contains a LaCl 3 (Ce) monolithic crystal with a square-cuboid shape and a base surface of 50×50 mm 2 . The LaCl 3 is hygroscopic and thus it is encapsulated in an aluminum housing. Each crystal base is coupled to a 2 mm thick quartz window, which is optically joined to a silicon photomultiplier (SiPM) from SensL (ArrayJ-60035-64P-PCB). The photosensor features 8 × 8 pixels over a surface of 50×50 mm 2 . A 15 mm thick crystal is used for the PSD in the S-plane. Four 25 mm thick crystals are utilized for the PSDs placed in the A-plane (see Fig. 10). In total, 320 SiPM channels are biased and readout by means of front-end and processing PETsys TOFPET2 ASIC electronics 56 . In order to minimize gain shifts due to changes in the temperature of the experimental hall, every ASIC is thermally coupled to a refrigeration system composed by a Peltier cell, a heat-sink and a small-size fan (see Ref. 40 for further details).
The i-TED Compton modules embed the so-called dynamic electronic collimation technique 40 . This is accomplished by means of a linear positioning stage, that allows one to remotely vary the distance between the A-and S-planes, thereby optimizing performance for each specific application. Finally, the successful implementation of www.nature.com/scientificreports/ ASIC-based TOF-PET readout electronics for Compton imaging has led to a rather compact and cost-effective system, when compared to other Compton imagers 57,58 .

MC simulations of the proton beam and i-TED. The applicability of i-TED for range verification has
been studied by means of MC simulations using the Geant4 toolkit. The detailed geometry model of each of the i-TED detectors as implemented in Geant4 is shown in Figure 10. More details can be found in our previous work 39 .
The modelling in Geant4 of the physics processes occurring during the irradiation of a phantom with protons can be carried out with different models, so-called Physics Lists (PL) 59 . In this work we have tested several officially released PLs, which combine the Quark-Gluon-String model (QGSP) for the inelastic scattering of protons above ∼10 GeV, not relevant for the present study, with three different cascade models covering the energy below 10 GeV: the Liége Intranuclear Cascade model INCL++, the Bertini (BERT) or the Binary Cascade model (BIC). The resulting prompt gamma yields for the different PLs, presented in Fig. 11, indicate that INCL++ and BIC agree within 2% in the absolute PG yield while BERT leads to a 2.3 times higher production, in agreement with the clear overestimation reported in Ref. 48 . Moreover, the BERT model generates a continuum of γ -ray energies instead of the expected discrete spectra (see Fig. 11), as it has been reported in previous works 47 . Last, the QGSP_BIC_AllHP Physics List was tested. This PL includes the G4ParticleHP package, still under development, which uses nuclear data libraries instead of the default models for the transport of light charged particles 60 . In particular, the inelastic interaction of protons with 12 C and 16 0 up to 150 MeV are simulated using the ENDF/B-VII.1 61 cross section. This PL leads to a 32% smaller PG yield, a result which is in line with the reduction suggested by the benchmark of Pinto et al. 48 . However, the latter was discarded since it failed to produce individual γ-ray lines in the proton-induced inelastic reaction (see Fig. 11).
As for the neutron yields in the irradiated phantoms, all the studied PLs agree within 24%. For an accurate transport of neutrons below 20 MeV, neutron-induced reactions within Geant4 were simulated by means of the G4NeutronHP package 62 , using the G4NDL-4.6 data library (based on the JEFF-3.3 63 evaluated data file). This high-accuracy neutron transport package, not included in some of the previous works 20, 37 , has a significant impact in the slowing-down and partial capture of neutrons within the phantoms, and in the response of the detectors to neutrons. Indeed, the number of slow neutron events in the detectors would have been underestimated in up to a factor 5 if this package had not been included.
The final choice of PL was QGSP_INCLXX_HP since it includes an accurate modelling of neutron interactions and leads to the smallest PG yield among the PLs, which correctly generate discrete γ-ray transitions. In addition, the INCL++ model has shown the best reproduction of the neutron and γ-ray yields in the simulation of proton-induced reactions in other applications 43,64 . The right panel of Figure 11 shows the depth profile for the secondary emission of γ-rays and neutrons in water resulting from our MC simulations of the 120 MeV proton beam. In the case of the γ-rays, the largest production is due to proton-induced reactions (dotted line), which show a clear maximum at the end of the proton range. The much smaller neutron-induced production of γ-rays is shown with the a dotted-dashed line in the same figure. For the PG images obtained in this work, several energy cuts have been applied to select either the whole spectrum of Fig 2 between 1-7 MeV or the main single transitions.
A particular emphasis has been placed on the physical origin and time distributions of the γ-rays and neutrons escaping the phantom and reaching the i-TED detectors. This is of particular relevance for our study of the neutron sensitivity and the applicability of time-of-flight cuts. The left panel of Fig. 12 shows the time of arrival of γ -rays and neutrons to i-TED after being produced in a water phantom by 120 MeV protons. Three time structures can be identified in this figure, whose limits are indicated with dotted lines. The prompt particles, γ-rays in their majority, reach i-TED within the first 10 ns. A second component, which extends up to 1 ms after the proton bunch, contains both neutrons moderated in the phantom and γ-rays originated in neutron capture reactions, Compton imaging algorithms. Effective prompt gamma ray monitoring requires images with spatial resolutions down to a few mm and reconstruction times of the order of a few seconds, at most. Compton imaging uses as a fundamental ingredient the Compton scattering law and thus, most Compton cameras employ two layers, or more, of position sensitive detection planes. The direction of the scattered γ-ray is determined from the interaction position in both detection planes. In case that the scattered γ-ray deposits all its energy in the second detection plane, the incident γ-ray energy can be determined by adding-up the deposited energy in the first and second detection planes. The Compton imaging algorithms used in this work are based on this assumption, whose validity is extended by means of a supplementary machine learning classifier, that will be discussed in the following section. There exist many different imaging reconstruction algorithms, each of them with advantages and drawbacks. For instance, algorithms based on Maximum Likelihood Expectation Maximization 65,66 have shown excellent position reconstruction accuracy, but they require previous computation of the system response matrix. Since the aim of this work is to study the applicability of an imager like i-TED for clinical purposes, we have chosen algorithms which do not require any previous system response calculation. Those algorithms are listed as follow: • Fast back-projection (BP) 67 : This is the simplest approach, with rather limited spatial resolution, but also with the fastest reconstruction performance. Developed by Wilderman in 1998, the image of the individual γ-rays is obtained from the intersection of the back-projected Compton cones along the image plane, where the γ-ray source is located. The general quadratic curve from this intersection leads to a set of possible positions for the γ-ray source origin in the image plane. The final Compton image is made by the superposition of all the individual γ-ray images. While the algorithm is very fast, the spatial resolution is quite poor, when compared to other algorithms (see for example Fig 3). • Stochastic origin ensemble (SOE) 45 : The SOE algorithm for Compton imaging was developed originally by Andreyev in 2016. It is a Monte Carlo Markov chain Method based on the Metropolis-Hasting algorithm and does not require any forward or backward projections. The drawback of this algorithm is related to its iterative nature. The initial image, obtained from all the statistics available, is generated by random sampling of the possible positions of the γ-ray source within the intersection of Compton conical surfaces and the image plane. Once this first image is created, the iterative process begins by randomly choosing a γ-ray event.
A new position for this event is sorted in the image space constrained to its Compton conical surface. Then, this new position is either accepted or rejected with an acceptance A based on the ratio between the local density of γ-ray events in the new ( ′ i ) and old ( i ) positions: This is repeated for a number of times equivalent to the available statistics. This process corresponds to an iteration and it is repeated until a stationary situation is reached. The number of iterations required depends on the specific problem, such as the available statistics and the binning of the Compton image.
(1) A = min 1, where t is a unit vector into the projection space, ω min and ω max are the minimum and maximum Compton scattering angles included in the calculation, g( � t; cosω) is the projection data in the image space and k −1 ( � t, � p; cos(ω)) is the inversion kernel described as with H n given by N max is the maximum number of terms involved in the calculation, P n is the Legendre polynomial of order n and σ (cosω) is the Klein-Nishina Compton differential cross-section 68 . The drawback of this algorithm is the large computational cost required to reconstruct a Compton image. Because of this reason, H n was precomputed for a wide range of γ-ray energies and Compton scattering angles corresponding to the minimum and maximum values can be experimentally registered by the detection system. Thus, H n values were saved in a table format for its posterior use, saving a crucial time for quasi-real time imaging. The reconstruction time depends remarkably on the complexity of the involved algorithms and the optimization of their parameters such as the number of iterations for the SOE algorithm or the number of terms involved for AA reconstruction. Additionally, those parameters will impact the quality of the Compton image, introducing a mismatch between the reconstructed and the experimental spatial image resolution of the Compton imager; a limited number of iterations or terms in these imaging algorithms would lead to a poor spatial resolution, while an excessive number would introduce artifacts on the final reconstructed image.
In order to evaluate the best suitable algorithm for PG monitoring, a comparison benchmark using 20000 γ -ray events leading to interactions in both layers were used. The image plane of 200×200 mm 2 was pixelated into 200×200 pixels for all three algorithms. The results for the computing time obtained in this benchmark study are displayed in Table 3.
The critical parameters for each algorithm used in the benchmark are indicated in the third column of the same table. Their values correspond to the best compromise between resolution and computation speed found by trial and error. As the results indicate, the fastest algorithm is BP followed by SOE and AA, being a factor 100 slower than BP. However, given the superior spatial resolution of AA, both multi-threading and CUDA 54 implementations were used to make this algorithm time competitive with the others. Compared to the single-thread version, the multi-thread implementation, with a total of eight threads, has a speed factor of about 7.6. Despite the improvement, the time required to get the image is still not competitive for PG imaging with a factor between 17 and 30 times slower than the other algorithms. On the contrary, the CUDA version, with a speed factor of about 121 with respect to the singled-threaded version, reaches reconstruction times of 15 s, comparable to the SOE algorithm (14 s). It is worth to mention that the CUDA implementation was executed in modern NVIDIA GPUs: a NVIDIA GeForce RTX 2060 and NVIDIA GTX 1080 Ti with compatible execution times.
Machine-learning aided full-energy event selection. The i-TED detector, as any two-plane Compton camera, bases its working principle on a γ-ray totally depositing its energy in the absorber (A) plane after a Compton interaction in the scatter (S) layer. However, only a fraction of the coincidences, ranging from 47% for 1 MeV γ-rays to just 5% at energies around 7 MeV, satisfy this condition. The remaining fraction leads to a wrong reconstruction of the Compton angle and, as a consequence, an increase in the image background and a degradation of the spatial resolution. Aiming at improving the fraction of true Compton events, we have implemented a Machine-Learning algorithm for the identification of full-energy events.
( www.nature.com/scientificreports/ To train and test the ML algorithms in the discrimination of good Compton events where the γ-ray deposits all its energy between the S and A planes (full-energy), and those with partial energy escape (no-full-energy), we performed dedicated simulations of the response of the i-TED detectors to 5 ×10 10 γ-rays of energies homogeneously distributed between 200 keV and 7 MeV and spatially originated in a random position within a 20× 20×20 cm air cube separated 50 mm from the detector face, replicating the position of the phantom. For these ancillary simulations, we used the same physics models and distance between the S-and A-planes of i-TED as before. Each MC event (S&A coincidence) contains the same eight features determined with the detector in a real measurement: 3D coordinates of the γ-ray interactions in the two PSDs (6), energy deposited in the S-and A-planes (E s , E a ) (2). The energy and position resolutions of the detector were included as described before. Additionally, the Compton angle, calculated from the deposited energies, and its probability according to the Klein-Nishina formula 68 for a γ-ray energy E γ =E a +E s , were also included in the training to improve performance. The MC output was split into 14 energy intervals of add-back deposited energy between 200 keV and 7 MeV and the same number ( ∼10 6 ) of either kind of events were selected from the MC output for each energy interval. For each energy range we trained the same algorithm independently, aiming for the best accuracy and stability along the entire deposited energy range detected in add-back.
In this work, the performance of several state-of-the-art ML algorithms included in Scikit-learn Python module 69 was evaluated. The Scikit-learn algorithms evaluated were k-nearest neighbors, logistic regression, support vector classifier, Gaussian naive Bayes, random forest, AdaBoost and quadratic discriminant analysis. However, the best classification results in terms of accuracy were obtained for other two ML algorithms: Boosted decision trees implemented in XGBoost 70 and artificial neural networks implemented in Tensorflow 50,71 .
The XGBoost classifier was optimized using the Scikit-learn GridSearchCV method 69 . The best results were obtained using 140 trees with a depth of 4 and a gamma parameter of 0.1. The rest of parameters were set to default because of the minimal impact on the accuracy results. The classifier output is given by a probability number between 0 (no-full-energy event) and 1 (full-energy event). The discrimination threshold between classes was set to 0.5. The Tensorflow classifier model consists of a stack of four fully connected layers. In the first three layers, 256 units per layer using rectified linear activation functions. In the last layer, a single neuron with a sigmoid activation function was used with the purpose to provide a probability number between 0 and 1. The architecture and activation function was chosen based on the best performance in terms of accuracy and stability along the entire range of deposited energy in add-back. The loss function used for the minimization was binary cross-entropy, commonly used in this type of classification problem because of its best performance. As in the case of XGBoost, the identification threshold was set to 0.5.
XGboost and Tensorflow models have almost the same performance, being the latter slightly more stable for all the different deposited energy ranges where the training and test was performed. For this reason, only the model based on Tensorflow was used for the final results of this work. The accuracy of this algorithm was quantified on i-TED events from the simulations of the proton beam prior to its application to the image reconstruction. As shown in Fig. 13, the ML algorithm is able to correctly recognize 65 to 73% of the full-energy events among the prompt gamma events registered in i-TED. On the other hand, 35 to 40% of the non-peak γ -ray events are wrongly predicted as full-energy. As a combination of both results, this algorithm enhances the peak-to-background ratio by a factor of 1.5-2.1 (see right panel of Fig. 13). This classifier is also able to reject about 50% of the neutron events. The final peak-to-total gain factor, after the neutron events are included, ranges from 1.4 to 1.9 (see Fig. 13).
Received: 2 June 2021; Accepted: 20 January 2022 Figure 13. Left: Fraction of correctly identified full-energy events (dashed) and wrongly classified no-fullenergy (solid) as a function of the add-back deposited energy. The curve with squares correspond to the fraction of neutron events which are identified by the ML-classifier as good (full-energy) events. Right: gain factor of full-over no-full-energy events as a function of the add-back deposited energy. In both panels, the black and red lines represent, respectively, the results with only γ-ray events and after including the neutron events. www.nature.com/scientificreports/