## Introduction

The ability of extremely intense and brief femtosecond X-ray free-electron laser (XFEL) pulses to outrun radiation damage avoids the need to freeze (and thus immobilize) biological samples to minimize damage, as required in conventional protein crystallography1 or cryogenic electron microscopy (cryo-EM). For single particles, this enables the study of protein dynamics under near-physiological conditions at room temperature. The principle of outrunning damage by collecting diffraction data before the onset of the damaging photoelectron cascade was first established experimentally at the Free-Electron Laser in Hamburg (FLASH) facility in 20062 and is now routine in serial femtosecond crystallography3,4,5. Since the first aerosol single-particle imaging experiments at the FLASH6, the method of flash X-ray imaging has been applied to image living cells7, cell organelles8, and viruses9,10, in particular, the giant Mimivirus in two-dimensional (2D) projections11, as well as in full 3D12. Despite continual improvements in reconstruction algorithms, the number of reconstructed resolution elements across the sample remains at about a dozen voxels13,14,15. The main reasons for this limitation are the large dynamic range spanned by the diffracted intensities, going beyond the technical limits of current detector technology, as well as the weakness of the diffraction signal and the shot-to-shot variations in imaging conditions due to lateral distance between the sample and the X-ray focus (the impact parameter), background scattering, and detector response. Averaging over a very large number of single-particle snapshots is required to obtain sufficient information at high-resolution regions in diffraction space. This is necessary even for strongly scattering samples. Until now, this has been hampered by the low hit probabilities and the relatively low 120 Hz pulse repetition rate at XFEL facilities available to date.

The European XFEL (EuXFEL) introduces an era of high-intensity, high repetition-rate, and high data-rate XFELs by taking advantage of a superconducting linear accelerator16. The high repetition rate poses new challenges for sample injectors and X-ray detectors. Whenever the XFEL pulse hits a sample, it rapidly transforms it into a plasma. To fully exploit the high repetition rate, this plasma must not interfere with the delivery of the next particle, thereby ensuring that different pulses correspond to independent measurements from undamaged, intact objects. For serial crystallography at the EuXFEL, this has recently been shown to be possible17,18,19.

The first single-particle experiments at the EuXFEL were performed in December 2017 using the Single Particles, Clusters, and Biomolecules & Serial Femtosecond Crystallography (SPB/SFX) instrument20 with microfocus optics. The main goal of the experiment was to demonstrate single-particle imaging at the high intrabunch repetition rate of the EuXFEL with the Adaptive Gain Integrated Pixel Detector (AGIPD)21.

In this article, we present the results of this experiment. We start by characterizing the background inherent to the instrument, which is a critical parameter for determining the maximum achievable resolution, as well as the signal-to-noise ratio (SNR) of the recorded patterns, instrumental stability, and the incident photon flux. We then size the particles corresponding to the patterns recorded while injecting viruses into the beam, confirming that a substantial fraction of the patterns corresponded to the expected particle size. Finally, we searched for any correlation or dependence among diffraction patterns obtained from the same pulse train. Overall, we show that single-particle imaging experiments can be performed at the megahertz intrabunch repetition rate of the EuXFEL.

## Results

### Overview of data collection

The experiment (EuXFEL proposal 2013) was performed over five 12-h shifts in December 2017. The X-ray beam, with a photon energy of 9.2 keV, was focused to a spot of 15 × 15 μm². Data were recorded for 300 pulses per second, at an intra-train repetition rate of 1.1 MHz, during 376 experimental runs. Each run contained 30,000 pulses, corresponding to one thousand bunch trains of 30 pulses each. In total, 11,255,800 frames were recorded with the MHz camera AGIPD, out of which 557,675 patterns were identified as hits, that is, diffraction patterns from the target samples. The overall statistics of the measured data are summarized in Table 1.

A heavy-metal salt solution was used to align the beam and the injector. When a salt solution is aerosolized and focused by a gas dynamic virtual nozzle (GDVN) (see “Methods”), it forms a single-file stream of droplets. Water quickly evaporates from the droplets in a vacuum, resulting in amorphous salt spheres. In aerosol imaging experiments, a salt solution is convenient for detecting the X-ray beam since each droplet gives rise to a salt particle, thus leading to a high hit rate. This contrasts with colloidal particles dispersed in a volatile medium, where many droplets may not contain particles or form any upon injection, leading to a low hit rate.

Diffraction from these spheres was simulated to determine the effect of experimental parameters such as the incident fluence, particle size, and alignment on the diffraction patterns. A scattering model for spherical particles22 was fitted to the diffraction patterns for the iridium(III) chloride (IrCl3) samples (see Fig. 1a–c) captured in the third and fourth shifts, as described in “Methods”. We assumed that the density of amorphous IrCl3 particles formed in vacuum was close to its solid-state density of 5.3 g/cm³. Also, we assumed that, on average, each IrCl3 molecule is hydrated by three water molecules, resulting in a molar mass of 352.6 g/mol and a scattering factor of 149.6 electrons. We further assumed that radiation damage from the X-rays had a negligible effect on the low-angle scattering we fitted. Particle sizes and incident beam fluences were obtained as described in the “Methods” and are shown in Fig. 2a–d.

The 2D distributions of particle sizes indicate that the particle size ranges from 80 to 800 nm in diameter (Fig. 2c, d) and show an upper limit of the fluence of the incident photons, independent of particle size (see Fig. 2a, b, green dashed line). This limit is the value of the fluence at the focus of the beam (Im), where it reaches a maximum. The lack of events in the upper-right corner of the distribution results from the small number of large particles in the measured set. Thus, we can only approximately estimate the upper limit of the fluence at about 2.8 × 10⁹ photons/μm² during the third shift, and about 1.3 × 10⁹ photons/μm² during the fourth shift.

The lower fluence limit (Fig. 2a, b, red dashed line) depends on the particle size and corresponds to the sensitivity limit (Is) below which it was impossible to fit a spherical scattering model. The slope of the lower bound is −3 on the log-scale, matching the scaling of the signal for a given particle volume

$$I_s = I_m\left( {R_0^3/R^3} \right).$$

The line showing the limit of sensitivity crosses the line for the upper limit of the fluence Im at a particle size R0. This value indicates the theoretical size limit of particles that can be distinguished for a given sample and set-up. These were 52 and 73 nm in the third and fourth shifts, respectively.
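To make the scaling concrete, the relation above can be evaluated numerically. The following Python sketch uses the third-shift values quoted in the text (Im of about 2.8 × 10⁹ photons/μm² and R0 = 52 nm); the function and variable names are illustrative and are not taken from the original analysis.

```python
def sensitivity_limit(size_nm, max_fluence, r0_nm):
    """Lowest fluence I_s at which a sphere of the given size can still be fitted.

    Implements I_s = I_m * (R0 / R)**3 from the text.
    """
    return max_fluence * (r0_nm / size_nm) ** 3


I_M = 2.8e9   # photons/um^2, approximate third-shift maximum quoted in the text
R0_NM = 52.0  # nm, third-shift crossing point quoted in the text

for size_nm in (52.0, 104.0, 208.0, 416.0):
    limit = sensitivity_limit(size_nm, I_M, R0_NM)
    print(f"R = {size_nm:5.0f} nm -> I_s = {limit:.2e} photons/um^2")
```

Doubling the particle size lowers the required fluence by a factor of eight, which corresponds to the slope of −3 on the log-log plot.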

### Background characterization

The instrument background was measured in the third shift, comprising 4000 images taken with an average pulse energy of 1.135 mJ, as measured by the X-ray gas monitor detector23, and in the fourth shift, comprising 120,000 images with an average pulse energy of 1.477 mJ.

In addition to the instrument background, we measured the background including any contributions from the gas used for sample delivery itself, known as injection background, by using the frames classified as nonhits, as described in “Methods”. We calculated the average injection background for each shift, except for the third shift when the detector was moved. As a result, we calculated two separate background profiles.

The injection background, shown as a function of $$S = \frac{2}{\lambda }{\rm{sin}}\, \theta$$ (with θ half the scattering angle) in Fig. 3a, b, was averaged over 569,274 and 471,072 patterns with an average pulse energy of 1.276 and 1.539 mJ, respectively. The injection background barely exceeds the instrument background at low diffraction angles. The median background for all pixels of the detector was about 4 × 10⁻⁴ photons per pixel in both shifts.

The background fades rapidly, falling to 10⁻³ photons per pixel for S > 0.02 nm⁻¹. The value of 10⁻³ photons per pixel is the limit of the statistical accuracy of background estimation, given the calibration of the AGIPD detector as available in this experiment (see “Methods”). At higher S, only stochastic fluctuations are observed.

### Variations in the position of diffraction pattern centers

The position of the diffraction pattern centers varies from pulse to pulse since each particle collides with the X-ray beam at a random point relative to the beam axis24. At these different interaction points, the beam has different phase shift values that define the shift of the zero wavevector of the diffraction. The 2D histograms of the reconstructed centers of diffraction patterns scattered from spherical IrCl3 particles are shown in Fig. 4a, b. The diffraction pattern centers are given in horizontal (γh) and vertical (γv) angles of the beam deviation from the mean beam direction when measured from the interaction point.

The distribution during the third shift had an interquartile range (IQR) of 18 μrad along the horizontal axis and 20 μrad in the vertical direction. Overall, 90% of the diffraction pattern centers lie in the range of 50 and 59 μrad in the horizontal and vertical directions, respectively. During the fourth shift, the corresponding values of IQR were 18 and 22 μrad, and the corresponding ranges for 90% of the centers were 47 and 55 μrad. The fraction of centers inside the central pixel (see Fig. 4a, b, square shown in black dashed lines) is 91% and 94% for the third and fourth shifts, respectively.

### Signal versus background

The assembled and cropped diffraction pattern from a single hit of an IrCl3 particle is shown in Fig. 5a. The particle has an estimated diameter of 439 nm, which is close to the size of Mimivirus. The estimated incident photon fluence was 6.8 × 10⁸ photons/μm².

The measured pattern corresponds to the spherical model at small diffraction angles (see Fig. 5b). At scattering vectors above 0.054 nm⁻¹ (red dashed line), the noise in one frame exceeds the amplitude of the spherical model, and fringes are not distinguishable, although the background, when averaged across a large number of frames, is still an order of magnitude lower than the expected signal. The model approaches the injection background level at scattering vectors above 0.079 nm⁻¹ (purple dashed line).

The radial average of the scattering intensities above the background (Fig. 6), when averaged across the different samples, also shows the signal disappearing around 0.08 nm⁻¹.

### Filtering virus images by the particle size

Scattering from Mimivirus particles was recorded in 154 runs, which produced a total of four million frames. A pixel where the signal was above one photon was considered to have detected photons, hereafter called a lit pixel. Frames in which the number of lit pixels was more than three standard deviations above the mean were classified as hits and the rest as misses. This resulted in a set of 44,905 hit diffraction patterns, which were further processed.

The next step was to identify diffraction patterns produced by a single Mimivirus particle. In this work, we were only interested in single hit diffraction patterns as they can be immediately used to reconstruct the 3D Fourier space volume of the sample. To identify single hit diffraction patterns, we estimated the size of injected particles. A continuous wavelet transform (CWT)-based procedure was used, as described in the “Methods.” The distribution of images by the diameter of the particle is presented in Fig. 7a.

The particle diameter distribution (Fig. 7) is bimodal, with a maximum at the lower end of the detection range, which likely corresponds to aggregates of impurities25, and another one at around 500 nm, which coincides with the diameter of Mimivirus particles measured by cryo-EM26. In the case of multiple hits, this size is significantly larger, and for nonvirus particles, the size varies widely but is in general smaller than that of a Mimivirus.

In the distribution shown in Fig. 7a, we further selected the region of diameters from 400 to 600 nm (hatched area in Fig. 7a) and fitted it with a Gaussian distribution. We then discarded all images outside a one-sigma range and obtained a smaller subset of 11,308 diffraction patterns (see Fig. 7b). Relying on the fact that for these images we know the approximate particle size, we could use the last step of our CWT-based procedure (as described in “Methods”) to recalculate that size more precisely (see Fig. 7c). We applied the one standard deviation criterion again, producing the final set of 4335 images.

We randomly selected 1000 images from the initial set of 44,905 hits, and manually identified single hits among them to estimate the efficiency of our filtering. In total, 393 images were marked as single hits. Out of the selected 1000 images, 260 were part of the second set of 11,308 images with 185 of them having been marked as single hits. For the final set of 4335 images, these numbers are 86 and 76, respectively. From these numbers, we can estimate the 95% confidence intervals for the ratio of single hits to all hits for each set (see Table 2), using the normal approximation. For the initial set this ratio is 39 ± 3%, after the first step of filtering it becomes 71 ± 5%, and in our final set of about 4000 images 88 ± 7% are single hits.
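The interval estimates above follow from the standard normal approximation to a binomial proportion. The sketch below recomputes them from the counts given in the text; the function name and the choice of z = 1.96 for a 95% interval are assumptions of this illustration, and the results agree with the quoted values to within rounding.

```python
import math


def single_hit_ratio(single_hits, total, z=1.96):
    """Return (ratio, half_width) of the normal-approximation confidence interval."""
    p = single_hits / total
    half_width = z * math.sqrt(p * (1.0 - p) / total)
    return p, half_width


# (label, manually identified single hits, images of the random sample in that set)
samples = [
    ("initial set", 393, 1000),
    ("after first size filter", 185, 260),
    ("final set", 76, 86),
]

for label, singles, total in samples:
    ratio, hw = single_hit_ratio(singles, total)
    print(f"{label}: {100 * ratio:.1f} +/- {100 * hw:.1f} %")
```

The half-width grows as the sample shrinks, which is why the final set has the widest interval even though it has the highest single-hit ratio.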

### Independence of the pulses within one train

The small time interval between consecutive pulses of only around 1 µs in this experiment might have caused interference between adjacent pulses, e.g. due to the debris resulting from the preceding pulse. We investigated the distribution of incident photon fluences and particle sizes derived from spherical particles of IrCl3 for specific pulses within the trains (see Fig. 8a–d).

The distribution of particle sizes was different in the two shifts but remained stable over the pulses within a train. The incident photon fluences increased slightly throughout the first few pulses (up to five pulses), but then also remained stable up to the end of a train (Fig. 8c–d). This increase at the start of the pulse train agrees with the observed total pulse energy, as measured by the X-ray gas monitor detector of the instrument. The distributions of particle sizes for different pulses cannot be distinguished after taking into account the different incoming pulse energy (Fig. 8a–b). Therefore, we conclude that there was no correlation between pulse position in the train and particle size or incident fluence.

We also investigated the distribution of the number of patterns in one train, which could be fitted with the scattering model for spherical particles, hereafter called the number of fits. The details about when a fit was regarded as successful are described in the “Methods.” The frequency of fits is about the same for every pulse position in the train (see Fig. 8e).

For independent pulses, the distribution of trains by the number of fits in them should follow a mixture of binomial distributions, with the probability of a “fit” event in each run estimated as the fraction of successful fits in that run (the fit ratio),

$$G\left( k \right) = N\sum\limits_i B\left( k,30,\frac{M_i}{N} \right),\quad B\left( k,n,p \right) = \frac{n!}{k!\left( n - k \right)!}p^k\left( 1 - p \right)^{n - k},$$

where N is the number of frames in each run, Mi is the number of fits in the run i, k is the number of fits in a train, and i goes over runs.

A comparison of the expected distribution G(k) and the observed distribution is presented in Fig. 8f. The two distributions agree very well, which is consistent with the independence of pulses in a train.
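The expected distribution for independent pulses can be sketched as a sum of per-run binomial terms. In the Python illustration below, the per-run fit counts are hypothetical, and the prefactor is taken as the number of trains per run (frames per run divided by 30) so that the expected counts sum to the total number of trains; that normalization is an assumption of this sketch.

```python
from math import comb

PULSES_PER_TRAIN = 30  # from the text


def binom_pmf(k, n, p):
    """Binomial probability B(k, n, p) of k successes in n trials."""
    return comb(n, k) * p**k * (1.0 - p) ** (n - k)


def expected_train_counts(fits_per_run, frames_per_run, n=PULSES_PER_TRAIN):
    """Expected number of trains containing k fits, for k = 0..n.

    Each run i contributes a binomial with success probability M_i / N.
    """
    trains_per_run = frames_per_run // n
    counts = []
    for k in range(n + 1):
        mixture = sum(binom_pmf(k, n, m / frames_per_run) for m in fits_per_run)
        counts.append(trains_per_run * mixture)
    return counts


# Hypothetical fit counts M_i for three runs of N = 30,000 frames each.
expected = expected_train_counts([300, 1500, 3000], 30_000)
for k in range(6):
    print(k, round(expected[k], 1))
```

Comparing such expected counts with the observed histogram of fits per train is one way to check for pulse-to-pulse dependence: correlated pulses would produce an excess of trains with many fits or with none.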

To additionally confirm the hypothesis of pulse independence, we computed the correlation coefficients of the derived spherical model parameters for all pairs of successive pulses and found no significant correlations between any of them.

## Discussion

Coherent diffractive imaging requires a low-noise measurement of diffracted intensities from a sample. Even with the strong pulses available at XFELs, the number of diffracted photons from a single particle is relatively low due to the small scattering cross section of X-rays. However, the high repetition rate of the EuXFEL allows the collection of very large datasets that can be used to improve the SNR by averaging information from many diffraction patterns. An accurate estimate of the number of patterns required for a given resolution is not trivial given the large number of parameters that influence it, such as background noise and sample heterogeneity. Purely theoretical calculations fail to take these factors into account and give excessively optimistic numbers. Until now the resolution obtained from 3D reconstructions13,14,15 has not substantially surpassed that from single-shot imaging11, showing the need to obtain larger datasets to extend the resolution.

In this experiment, the selected photon energy (9.2 keV) was the only one available for the first round of experiments at the EuXFEL. The same applies to the focal spot, which was the smallest that could be achieved at that time. Ideally for this type of experiment one would prefer a lower energy, e.g. around 4 keV, to increase the number of scattered photons while still making it possible to achieve high resolution. The focal spot should also be smaller, more closely matching the sample size, increasing the photon flux leading to a stronger signal.

Background noise is an important determinant of the maximum resolution that can be achieved. The number of background photons per pixel in the first EuXFEL single-particle experiment compares favorably with previous experiments27 at the Coherent X-ray Imaging (CXI) instrument of the Linac Coherent Light Source (LCLS), although a quantitative comparison is difficult due to different experimental geometries. The detector is another critical component to achieve a low background, as it must be able to distinguish between electronic noise and real photons. The AGIPD detector demonstrated admirable performance, achieving an SNR of 7 and being able to record data at an intrabunch repetition rate of 1.1 MHz. Any instabilities in the instrument can lead to changes in the background, making its removal much more difficult. Our measurements of the variation of the center of the diffraction patterns show an order of magnitude lower instability than similar measurements at the LCLS Atomic, Molecular and Optical Science instrument24 as well as at the CXI instrument27, and much smaller than one Shannon pixel. Such a small center variation, even if it cannot be corrected, can be safely ignored as it will not lead to any appreciable blurring of the assembled intensities.

The incident fluence on the sample is a key parameter for the success of single-particle imaging experiments. From the fits of the spherical patterns, we obtained a maximum beam fluence of about 2.8 × 10⁹ photons/μm². This number is consistent with what one would expect from our experimental conditions: a 1 mJ, 9.2 keV pulse focused to a 15 × 15 μm² focal spot results in 3.1 × 10⁹ photons/μm², assuming perfect transmission. The relatively low maximum intensity, when compared with other XFEL experiments8,27, is due to the initial larger temporary focus, which has since been upgraded.
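The consistency check in this paragraph is a short piece of arithmetic. The sketch below reproduces it under the simplifying assumption of a uniformly illuminated 15 × 15 μm² square spot with perfect transmission; the function names are illustrative only.

```python
EV_TO_JOULE = 1.602176634e-19  # exact SI conversion factor


def photons_per_pulse(pulse_energy_j, photon_energy_ev):
    """Number of photons in a pulse of the given energy."""
    return pulse_energy_j / (photon_energy_ev * EV_TO_JOULE)


def mean_fluence(pulse_energy_j, photon_energy_ev, spot_x_um, spot_y_um):
    """Photons per square micrometre for a flat-top rectangular spot (assumption)."""
    return photons_per_pulse(pulse_energy_j, photon_energy_ev) / (spot_x_um * spot_y_um)


n_photons = photons_per_pulse(1e-3, 9200.0)       # 1 mJ pulse at 9.2 keV
fluence = mean_fluence(1e-3, 9200.0, 15.0, 15.0)  # 15 x 15 um^2 spot
print(f"{n_photons:.2e} photons per pulse, {fluence:.2e} photons/um^2")
```

With these inputs the estimate comes out at roughly 3 × 10⁹ photons/μm², close to the value quoted above and to the fitted maximum of about 2.8 × 10⁹ photons/μm².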

The size estimates of the Mimivirus patterns show a peak around 500 nm (see Fig. 7), corresponding to the virus particles, and another one at the lower end of the detection range, below 200 nm. This second peak may be caused by contaminants in the solution which, combined with the large droplets created by the GDVN, can give rise to large aggregates25. The width of the peak around 500 nm is not due to the intrinsic variation of the viral particles size, but more likely due to contaminations deposited around the viral particles, as well as measurement errors. Using electrospray instead of GDVN for the formation of the aerosol could solve the contamination problems, due to the dramatically smaller droplet size it generates, and provide better quality diffraction data, enabling the technique to achieve higher resolution.

Statistical analysis shows that there are no correlations between pulses in the same train. The hit probability is also independent of the position of the pulse in the train or other hits in the same train. This clearly shows that any debris resulting from a hit leaves the interaction region before the next pulse arrives. It has been previously shown that for aerodynamic lenses, the main sample delivery instrument for X-ray single-particle imaging experiments, the particle speed increases with decreasing sample size28. This makes it likely that even at the maximum repetition rate of the EuXFEL, of 4.5 MHz, sub-100-nm particles should be able to vacate the interaction region in less than the minimum pulse spacing of 220 ns16, making the maximum rate usable for most samples of interest.

We presented an analysis of the first single-particle imaging experiment at the EuXFEL, performed when some of the functions planned for the SPB/SFX instrument were not yet available. However, the instrument proved to be very stable, and the measured background was low, which bodes well for future experiments. The measured photon flux in the interaction region matches what could be expected by taking into account the experimental conditions. The reduced focal spots achieved by the two Kirkpatrick-Baez mirror pairs, which have since been installed at the instrument29, should greatly improve the maximum flux, making future experiments with much smaller samples feasible. Measurements of smaller samples, however, would require changing injection from GDVN to electrospray, to avoid contamination due to the large droplets6,25,30.

Despite the limitations in the available experimental parameters, in particular the focal spot and wavelength, we were able to conclusively demonstrate that it is possible to perform single-particle imaging at a megahertz repetition rate without any measurable difference between isolated and consecutive hits. This paves the path for high repetition-rate and high data-rate single-particle imaging at XFELs.

## Methods

### Sample preparation

An iridium(III) chloride hydrate (Sigma-Aldrich, purity 99.9%) solution at volume concentrations of 0.1% was used for the first five runs, and at a concentration of 1% for the remaining runs. A solution of cesium iodide (Sigma-Aldrich, purity 99.9%) at a volume concentration of 1% was used for all respective runs. Melbourne and Mimivirus were both prepared following the protocol described in ref. 31, after which they were ultracentrifuged in sucrose gradient supplemented with 2.5% (v/v) glutaraldehyde to fixate them to fulfill the biosafety requirements of the EuXFEL. The fixed viruses were dialyzed five times in 250 mM ammonium acetate, pH 7.5 to remove the sucrose as completely as possible. Melbourne virus was used at a concentration of 10¹⁰ particles/ml in shift 4 and 2 × 10¹⁰ particles/ml in the final shift. Mimivirus was used at a concentration of 10¹¹ particles/ml in the first 11 runs of shift 3, at 3 × 10¹¹ particles/ml for the rest of shift 3 and the first three runs of shift 4, at 10¹² particles/ml for the following eight runs in shift 4, at 2 × 10¹¹ particles/ml for the next 42 runs in shift 4, and at 10¹¹ particles/ml for the final 47 runs in shift 4.

### Sample delivery

The samples were aerosolized using a GDVN and focused on the interaction region as described in ref. 28.

### Experimental set-up at the SPB/SFX instrument

The data were collected at the SPB/SFX instrument of the EuXFEL in December 2017, under the proposal p2013. The accelerator produced ten evenly spaced bunch trains per second with 30 X-ray pulses per bunch train at an intra-train repetition rate of 1.125 MHz, giving a separation between pulses of about 0.89 μs. The photon energy was 9.2 keV and the pulse energy, as measured by the gas monitor detector upstream, was around 1.5 mJ, corresponding to about 10¹² photons. The beam was focused by beryllium compound refractive lenses (CRL) and the focus size was estimated to be 15 μm in diameter. The AGIPD 1M detector21,32,33 was placed 5.465 m downstream from the interaction region. Online data analysis was done with Hummingbird34, through the Karabo bridge35.

Beamline background on AGIPD was minimized using a three-slit collimation system as described in ref. 36. Beam-defining “power” slits made out of B4C were positioned close to the CRL on the downstream side. Further downstream, a set of antiscattering slits, made from a tantalum–tungsten alloy, was used to clean up the stray light from the upstream optics. Finally, a set of germanium guard slits was positioned far downstream, close to the sample position, in order to remove the secondary scattering produced by the antiscattering slits. For all three slits, the gap was carefully adjusted, with micrometer accuracy, such that the slits received no direct beam while still maximizing the stray light reduction.

### Detector characterization

The AGIPD 1M detector32,33 contains 16 panels, each containing 64k pixels. The detector can record a signal from individual pulses in the bunch train, storing the data from each pulse into a separate memory cell on the chip. This results in variations of the detector response not only from one pixel to another but also between different memory cells of the same pixel.

The detector allows single-photon counting at 9.2 keV photon energy. We analyzed intensity histograms for each pixel and memory cell over all of the collected experimental data (see Fig. 9a–c). These histograms showed that the one-photon peak (located at μ1) was well separated from the zero-photon peak (located at baseline μ0). The baseline (μ0) and noise (σ0) for each memory cell of each pixel were calculated as the mean and standard deviation of the dark signal. The gain (μ1 − μ0) was determined from the difference between the first two peaks of the pixel–cell intensity histogram.

A 2D histogram of the data by gain and noise is shown in Fig. 9c, and it shows a linear dependence between these parameters. The slope of the linear regression is equal to 7 and corresponds to the average SNR of the detector. The distribution of all SNR values is shown in Fig. 9b and has an IQR of 0.6.
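As an illustration of how the quantities above relate, the following sketch computes a per-cell SNR as gain divided by noise and estimates the average SNR as the slope of gain against noise. Fitting a line through the origin is an assumption of this sketch (the text does not state how the regression was set up), and the gain and noise values are hypothetical.

```python
def cell_snr(baseline, one_photon_peak, noise):
    """SNR of one pixel-cell: gain (mu1 - mu0) divided by dark noise sigma0."""
    return (one_photon_peak - baseline) / noise


def snr_slope(gains, noises):
    """Least-squares slope of gain versus noise for a line through the origin."""
    return sum(g * s for g, s in zip(gains, noises)) / sum(s * s for s in noises)


# Hypothetical per-cell values in analog-to-digital units.
noises = [8.0, 9.0, 10.0]
gains = [56.0, 63.0, 70.0]

print(snr_slope(gains, noises))      # average SNR estimated from the slope
print(cell_snr(1000.0, 1063.0, 9.0))  # SNR of a single cell
```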

Only a small fraction of pixels had statistics sufficient to determine the one-photon peak (at least about 100 events at the one-photon peak). For the remaining pixels, to improve statistics we used histograms built using all memory cells of the same pixel. If the histogram-based grouping by the memory cells was still insufficient, we binned together blocks of 8 × 8 pixels to build a common histogram.

In cases when a single gain parameter (g′) was determined for a group of memory cells or pixels from the combined histograms, the individual cell–pixel gain parameters were determined by multiplying g′ by $$\sigma _i/\left( {\mathop {\sum}\nolimits_i {\sigma _i^2} } \right)^{1/2}$$, where the summation is carried out over cell–pixels in the group.

Pixels with the noise (σ) or the baseline (μ0) values outside of a 3.5 standard deviation interval and with the gain (μ1 − μ0) outside of a 4 standard deviation interval in the distributions of corresponding values over the detector panels were marked as bad pixels.

### Hit/nonhit images classification

We used a lit pixel counter8 to split frames into two classes: nonhits were frames with background scattering, and hits were frames with scattering from a sample.

In each frame, we calculated the number of lit pixels that record a signal of more than 45 analog-to-digital units above the baseline (~0.7 of the one-photon signal). For each run, the histogram of lit pixel counts was fitted with a Gaussian function. The value equal to 2.5 standard deviations above the mean of the fitted Gaussian was set as a threshold for the hits in this particular run. Frames with the number of lit pixels below the threshold were classified as nonhits. If we had a true Gaussian distribution of lit pixels in the set of frames only with background scattering, then we would expect about 150 (~0.5%) false positive hits per run using this value of the threshold.
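The classification described above can be sketched in a few lines of Python. The 45 ADU lit-pixel threshold and the 2.5 standard deviation cut come from the text; the Gaussian fit to the histogram is replaced here by a robust stand-in (median and scaled median absolute deviation), which is an assumption of this sketch and not the procedure used in the analysis. All names and the example counts are hypothetical.

```python
import statistics

LIT_THRESHOLD_ADU = 45.0  # from the text, about 0.7 of the one-photon signal
N_SIGMA = 2.5             # from the text


def count_lit_pixels(frame, baseline, threshold=LIT_THRESHOLD_ADU):
    """Count pixels whose signal exceeds the baseline by more than the threshold."""
    return sum(1 for value, base in zip(frame, baseline) if value - base > threshold)


def hit_threshold(lit_counts, n_sigma=N_SIGMA):
    """Threshold on the lit-pixel count for one run.

    Uses the median and 1.4826 * MAD as robust estimates of the Gaussian
    mean and standard deviation (stand-in for the histogram fit).
    """
    center = statistics.median(lit_counts)
    mad = statistics.median([abs(c - center) for c in lit_counts])
    return center + n_sigma * 1.4826 * mad


def classify_hits(lit_counts):
    """Return True for frames classified as hits, False for nonhits."""
    threshold = hit_threshold(lit_counts)
    return [count > threshold for count in lit_counts]


# Hypothetical lit-pixel counts for ten frames of one run.
counts = [10, 12, 11, 9, 10, 13, 11, 10, 12, 200]
print(classify_hits(counts))
```

A robust estimator is used because the hits themselves would otherwise inflate the sample standard deviation; a fit to the central Gaussian peak of the histogram serves the same purpose.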

### Model of scattering from spheres

The scattered intensity from a sphere of diameter R, placed in the beam with incident photon fluence I0 at the scattering vector S is given by

$$I\left( {S,R,I^0} \right) = I^0\left( {r_{\mathrm{e}}\frac{{\pi R^3}}{6}n} \right)^2{\it{\Delta \Phi }}\left[ {3\frac{{j_1\left( {\pi SR} \right)}}{{\pi SR}}} \right]^2,$$

where n is the density of electrons, re is the classical electron radius, ΔΦ is the solid angle and j1 is the spherical Bessel function of the first kind of order one.

The length of the scattering vector Si related to the i-th pixel with coordinates (xi, yi) on the detector at the distance L from the scattering point is

$$S_i = \frac{2}{\lambda }{\mathrm{sin}}\, \theta _i = \frac{{\sqrt {2 - 2c_i} }}{\lambda },\;c_i = {\mathrm{cos}}\, 2\theta _i = \frac{L}{{\sqrt {L^2 + r_i^2} }},\;r_i = \sqrt {\left( {x_i - x} \right)^2\, +\, \left( {y_i - y} \right)^2} ,$$

where x, y are the coordinates of the diffraction pattern center, λ is the wavelength, 2θi is the angle between the beam direction and the direction to the pixel i.

The solid angle of i-th pixel is

$${\it{\Delta \Phi }}_i = \frac{A}{{L^2}}c_i^3,$$

where A is an area of a pixel.

The measured diffraction νi at pixel i is a result of the combination of Poisson and Gaussian statistics

$$v_i = P\left( {I_i + b_i} \right) + N\left( {0,\sigma _i^2} \right),$$

where P denotes a Poisson-distributed and N a normally distributed random variable, σi is the instrumental error at the pixel i, estimated by the processing of the dark run, and bi is the averaged background scattering.

One diffraction pattern consists of N pixels with successfully measured diffraction

$$X = \left\{ {x_i,y_i,v_i,\sigma _i,b_i} \right\},\;i = 1 \ldots N.$$

### Fitting the sphere scattering model to experimental patterns

The following procedure was used for model-based interpretation of the experimental diffraction pattern X. First, we found a rough estimate of the center (x, y) of the diffraction pattern, averaged over several of the strongest patterns, using the Hough transform37,38. Then we made a rough estimate of the diameter R of the particle and the incident photon fluence I0 by a least-squares fit of the scattering from the spherical model to the measured radially averaged diffraction intensity. We then selected the interpretable images according to the χ² value of the fit. Finally, all parameters (x, y, R, I0) were refined using maximum likelihood given the measured intensities (νi). In contrast to the initial rough estimate of R and I0, here we also refine the center of the diffraction pattern.

### Refinement of parameters with likelihood maximization

Here, we approximate the Poisson distribution with the Normal distribution. Then the likelihood may be written as

$${\cal{L}}\left( {\theta |X} \right) = \mathop {\prod}\limits_{i = 1}^N {\frac{1}{{\sqrt {2\pi \left( {I_i + \sigma _i^2} \right)} }}{\mathrm{exp}}\left( { - \frac{{\left( {I_i + b_i - v_i} \right)^2}}{{2\left( {I_i + \sigma _i^2} \right)}}} \right)} .$$

Taking the negative logarithm and normalizing by the number of pixels N gives

$$\begin{aligned} l\left( {\theta |X} \right) &= - \frac{1}{N}{\mathrm{log}}\, {\cal{L}}\left( {\theta |X} \right) \\ &= \frac{{{\mathrm{log}}\left( {2\pi } \right)}}{2} + \frac{1}{{2N}}\sum\limits_{i = 1}^N {\mathrm{log}}\left( {I_i + \sigma _i^2} \right) + \frac{1}{{2N}}\sum\limits_{i = 1}^N {\frac{{\left( {I_i + b_i - v_i} \right)^2}}{{I_i + \sigma _i^2}}}. \end{aligned}$$

The optimal parameters correspond to the minimum of l

$$\theta = \left( {R,I^0,x,y} \right) = \arg\,\min l\left( {\theta |X} \right).$$

The goodness of fit was estimated as

$$\chi ^2 = \frac{1}{N}\mathop {\sum}\limits_{i = 1}^N {\frac{{\left( {I_i + b_i - v_i} \right)^2}}{{I_i + \sigma _i^2}}} .$$

The fitting was regarded as successful if the first- and the second-order optimality conditions were met and the goodness of fit (χ2) was less than a predefined tolerance

$$\left\| {\frac{{\partial l}}{{\partial \theta }}} \right\|\, <\, \varepsilon ,\;H = \frac{{\partial ^2l}}{{\partial \theta \partial \theta^{\prime} }}\;{\mathrm{is}}\;{\mathrm{positive}}\;{\mathrm{definite}},\;\chi ^2\, <\, \zeta ,$$

where ε and ζ are predefined tolerances. We used ε = 10⁻⁶ and ζ = 1.1.
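The objective and the goodness-of-fit measure can be written compactly in code. The sketch below is a plain-Python illustration with hypothetical function names; it evaluates the normalized negative log-likelihood and a reduced χ² for given model intensities, and checks only the χ² part of the acceptance criterion. Including the background term in the χ² residual mirrors the likelihood and is a choice of this sketch. The minimization itself and the gradient and Hessian checks are not shown.

```python
import math


def neg_log_likelihood(model, background, measured, sigma):
    """Normalized negative log-likelihood l(theta | X) for one pattern.

    model, background, measured and sigma are equal-length sequences holding
    I_i, b_i, v_i and sigma_i for the N usable pixels.
    """
    n = len(measured)
    total = 0.0
    for i_mod, b, v, s in zip(model, background, measured, sigma):
        variance = i_mod + s * s
        total += math.log(variance) + (i_mod + b - v) ** 2 / variance
    return 0.5 * math.log(2.0 * math.pi) + total / (2.0 * n)


def chi_square(model, background, measured, sigma):
    """Mean squared residual weighted by the model variance."""
    n = len(measured)
    return sum(
        (i_mod + b - v) ** 2 / (i_mod + s * s)
        for i_mod, b, v, s in zip(model, background, measured, sigma)
    ) / n


def passes_chi_square(chi2, zeta=1.1):
    """Chi-square part of the acceptance test (zeta = 1.1 in the text)."""
    return chi2 < zeta


# Tiny hypothetical example with two pixels.
model = [4.0, 9.0]
background = [1.0, 1.0]
sigma = [0.0, 0.0]
measured = [7.0, 13.0]  # each pixel is one standard deviation above the prediction
print(chi_square(model, background, measured, sigma))
```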

### Fast determination of particle size by the CWT

To estimate the size of the scattering particle for each diffraction pattern we used the spherical particle model. A centered diffraction pattern is converted to its radial average which is then compared to the diffraction pattern of a uniform sphere. To account for an unknown background signal present in experimental data, the experimental and theoretical spherical diffraction functions were only compared at the positions of their maxima.

To find peaks in the noisy experimental radial average, we used a CWT-based peak detection algorithm39. As our wavelet, we used the scaled and translated second peak of the spherical form factor, which produced better results than the commonly used Ricker wavelet.

To estimate the diameter of the particle, we used three passes of this CWT procedure. The first pass was tuned to identify images for which the diameter was too small (<300 nm); these images were discarded. The second pass was used to estimate the diameter of larger particles with a diameter between 300 and 800 nm. In both cases, we estimated the diameter using the average distance between neighboring maxima, relying on the fact that for spherical form factor this distance is very close to π/r.

The third pass was used to refine the initially determined approximate value of the particle diameter. We used the positions of the first three peaks in the spherical scattering function to refine the particle size by a least-squares fit of the measured peak positions to the model

$$\frac{{X_i}}{r} + c,$$

where Xi is the position of the i-th order maximum of the spherical form factor for a sphere of 1 nm radius, r is the particle radius, and c is an arbitrary constant shift introduced to account for imprecise determination of the center of the diffraction and for the fact that experimental particles are not perfectly spherical. In this way, in addition to the particle diameter, we obtain two more values: the shift of the beam center and the mean square error of the fit. Both these values are used to estimate the reliability of the obtained parameters.
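Because the model is linear in 1/r and c, the refinement reduces to an ordinary straight-line fit of measured peak positions against the reference positions Xi. The sketch below illustrates this under several assumptions: peak positions are expressed as momentum transfer q = 2πS in nm⁻¹, the "first three peaks" are taken to be the first three secondary maxima of the sphere form factor (the zeros of the spherical Bessel function j2, quoted to four decimals), and the function name is hypothetical.

```python
# Positions of the first three secondary maxima of [3*j1(x)/x]**2 for a sphere
# of unit radius, i.e. the first zeros of j2(x) (assumed reference values).
UNIT_SPHERE_MAXIMA = (5.7635, 9.0950, 12.3229)


def fit_radius(peak_positions, unit_maxima=UNIT_SPHERE_MAXIMA):
    """Least-squares fit of peak_i = X_i / r + c.

    Returns (radius, shift c, mean square error of the fit).
    """
    n = len(peak_positions)
    mean_x = sum(unit_maxima) / n
    mean_y = sum(peak_positions) / n
    sxx = sum((x - mean_x) ** 2 for x in unit_maxima)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(unit_maxima, peak_positions))
    slope = sxy / sxx  # equals 1 / r
    shift = mean_y - slope * mean_x
    mse = sum((slope * x + shift - y) ** 2 for x, y in zip(unit_maxima, peak_positions)) / n
    return 1.0 / slope, shift, mse


# Synthetic peaks for a hypothetical 250 nm radius sphere with a small offset.
true_radius_nm, true_shift = 250.0, 0.001
peaks = [x / true_radius_nm + true_shift for x in UNIT_SPHERE_MAXIMA]
radius_nm, shift, mse = fit_radius(peaks)
print(f"radius = {radius_nm:.1f} nm, shift = {shift:.4f} 1/nm, mse = {mse:.2e}")
```

A large fitted shift or a large mean square error flags patterns whose center was poorly determined or whose particle deviates from a sphere, which is how the two extra values serve as reliability indicators.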