## Background & Summary

Fluctuation X-ray scattering (FXS) studies extend traditional small-angle X-ray scattering (SAXS) methods by using X-ray snapshots with exposure times so short that the ensemble of illuminated particles can be well approximated as frozen in time and space. The resulting scattering patterns are no longer angularly isotropic, but instead exhibit small intensity fluctuations around the mean SAXS intensity1,2. Angular correlations of these intensity fluctuations can be directly related to the underlying molecular structure of the sample, providing much more information than traditional 1D SAXS curves3. The correlation function ${C}_{2}\left(q,q\text{'},\mathrm{\Delta }\varphi \right)$ is defined as

$\begin{array}{}\text{(1)}& {C}_{2}\left(q,q\prime ,\mathrm{\Delta }\varphi \right)=\frac{1}{2\pi N}\sum _{j=1}^{N}{\int }_{0}^{2\pi }{I}_{j}\left(q,\varphi \right){I}_{j}\left(q\text{'},\varphi +\mathrm{\Delta }\varphi \right)d\varphi ,\end{array}$

where N is the total number of diffraction patterns, q and q' are the magnitudes of the scattering vectors (in inverse resolution), and ϕ and ϕ + Δϕ are the corresponding angular coordinates describing the intensity Ij(q, ϕ) of the jth scattering pattern recorded on the detector. The correlation function ${C}_{2}\left(q,q\text{'},\mathrm{\Delta }\varphi \right)$ can be written as a Legendre series:

$\begin{array}{}\text{(2)}& {C}_{2}\left(q,q\text{'},\mathrm{\Delta }\varphi \right)=\sum _{l=0,2,\dots }^{\infty }{k}_{l}{B}_{l}\left(q,q\text{'}\right){P}_{l}\left(\mathrm{cos}{\theta }_{q}\mathrm{cos}{\theta }_{q\text{'}}+\mathrm{sin}{\theta }_{q}\mathrm{sin}{\theta }_{q\text{'}}\mathrm{cos}\mathrm{\Delta }\varphi \right),\end{array}$

where Pl is the Legendre polynomial of order l, ${\theta }_{q}=\mathrm{arccos}\left(q\lambda /4\pi \right)$, λ is the wavelength of the incident X-rays, and kl is a scale factor equal to the number of particles in the beam for l > 0, and equal to its square for l = 0. The expansion coefficients Bl(q, q′) are in turn related to the spherical harmonic expansion coefficients Ilm(q) of the 3D intensity scattering volume I(q) of the scattering particle, where q = (q, θ, ϕ):

$\begin{array}{}\text{(3)}& {B}_{l}\left(q,q\prime \right)=\sum _{m=-l}^{l}{I}_{lm}^{*}\left(q\right){I}_{lm}\left(q\prime \right),\end{array}$

where, in polar coordinates,

$\begin{array}{}\text{(4)}& I\left(q,\theta ,\varphi \right)=\sum _{l=0}^{\infty }\sum _{m=-l}^{l}{I}_{lm}\left(q\right){Y}_{lm}\left(\theta ,\varphi \right).\end{array}$

The intensity function is equal to the square of the modulus of the Fourier transform of the real-space object ρ(r) under investigation:

$\begin{array}{}\text{(5)}& I\left(\mathbit{q}\right)={|\mathrm{ℱ}\left[\rho \left(\mathbit{r}\right)\right]|}^{2},\end{array}$

where ℱ denotes the Fourier transform1.
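Equation (1) can be evaluated efficiently with fast Fourier transforms once each pattern has been resampled onto a regular polar (q, ϕ) grid. The following is a minimal Python sketch under that assumption. The function name `angular_correlation` and the array layout are illustrative and are not taken from the original analysis; detector gaps, masks and background subtraction are ignored here.

```python
import numpy as np

def angular_correlation(polar_images):
    """Discrete version of Eq. (1) for patterns on a polar grid.

    polar_images: array of shape (N, n_q, n_phi), where the last axis
    samples phi uniformly over [0, 2*pi). This layout is an assumption.
    Returns C2 with shape (n_q, n_q, n_phi), indexed as [q, q', delta_phi].
    """
    imgs = np.asarray(polar_images, dtype=float)
    n_img, n_q, n_phi = imgs.shape

    # Fourier transform along the angular coordinate.
    ft = np.fft.rfft(imgs, axis=-1)

    # Cross-spectrum for every (q, q') pair, averaged over the N patterns.
    # conj(F[I(q)]) * F[I(q')] is the transform of
    # sum_phi I(q, phi) * I(q', phi + delta_phi).
    cross = np.einsum("jak,jbk->abk", ft.conj(), ft) / n_img

    # Back-transform; dividing by n_phi turns the angular sum into the
    # (1 / 2*pi) * integral over phi that appears in Eq. (1).
    return np.fft.irfft(cross, n=n_phi, axis=-1) / n_phi
```

For example, `angular_correlation(stack)[i, k]` holds the correlation between the ith and kth q-rings as a function of Δϕ. The FFT route avoids an explicit loop over Δϕ, which matters when tens of thousands of patterns are averaged.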

Prior work in fluctuation scattering from biological samples has demonstrated that high-quality correlation data can be obtained from single-particle diffraction data4 and can be used for ab initio structure determination using the multi-tiered iterative phasing algorithm5. Although previous work has shown that the signal-to-noise ratio (SNR) of such data is independent of the number of particles per shot5 when the particles are in a vacuum, the relationship between the number of particles and the SNR in the presence of large buffer and detector backgrounds has not yet been studied. In this communication, we describe unprocessed, experimental multi-particle scattering data from which an FXS correlation data set can be derived. The data, obtained at the Atomic, Molecular and Optical (AMO) instrument at the Linac Coherent Light Source6,7, consist of close to 60 000 high-quality scattering images of the Paramecium bursaria Chlorella virus 1 (PBCV-1, ~190 nm in diameter8) and 30 000 scattering images of the sample buffer. The images presented here provide the community with experimental data on which algorithms for processing fluctuation scattering data and for structure solution can be tested. The data are deposited at the CXIDB9 in the form of hdf5 and xtc files.

## Methods

### Sample Preparation, Sample Delivery and Data collection

A batch of Paramecium bursaria Chlorella virus 1 sample was prepared as described previously10, except that here we used 1% Triton instead of Nonidet and centrifuged the virus sample at 20,000 rpm in an ultracentrifuge. The pure virus sample was dialyzed against 50 mM 4-methylmorpholine11. The quality of the FXS data was gauged by comparing the derived SAXS data against a reference small-angle scattering curve obtained at the cSAXS beamline at the Swiss Light Source, at an energy of 11 keV. The sample used for FXS data collection was diluted with buffer to a concentration of approximately $5\times {10}^{11}$ particles per ml.

The multi-particle FXS scattering data were collected at the Atomic, Molecular and Optical (AMO) instrument at the LCLS6,7. The experiment was performed in the CFEL-ASG Multi-Purpose chamber (CAMP)12. The PBCV-1 solution described above was injected into the XFEL interaction region as a microjet of approximately 5 μm diameter, using a gas dynamic virtual nozzle (GDVN) injector13,14 at a flow rate of ~20 μl/min (Fig. 1). The diffraction data were collected in the water window, using a photon energy of 514 eV, an electron bunch length (pulse length) of 100 fs and a repetition rate of 120 Hz. The average number of photons per pulse was ${10}^{12}$. The focus size was approximately 25 μm² (FWHM). Diffraction patterns were collected on two pairs of p-n junction charge-coupled device (pnCCD) detectors15 read out at 120 Hz. The front and back panels of the detector consisted of two pairs of 1024 × 512 arrays of square 75 μm × 75 μm pixels. The front panels were placed 224 mm from the interaction region, separated by a horizontal gap of 23 mm. The back detector was placed 741 mm from the sample/XFEL interaction zone, with a horizontal gap of 1.73 mm (Table 1). The maximum resolution achievable under these conditions is 14.3 nm (8.9 nm) at the edge (corner) of the front detector and 46.8 nm (32.7 nm) at the edge (corner) of the back detector. The X-ray scattering patterns and associated metadata were stored as xtc files. These diffraction images were pre-processed using the CFEL-ASG Software Suite (CASS)16. The images were corrected for dark current, and pixels systematically producing outlying intensity values were flagged. The resulting data were cast into larger arrays to include the detector gaps. The detector halves were placed roughly symmetrically around the X-ray beam. The back detector, with gaps included, is contained in a 1024 by 1047 array, with the mean beam center located at (506, 526). The front detector, with gaps included, is contained in a 1331 by 1031 array, with a mean beam center of (657, 552). The back detector was at gain mode 1, corresponding to 1250 ADU per 1 keV photon. The gain mode of the front pnCCD was 4, corresponding to 78 ADU per 1 keV photon. The resulting arrays are stored in an hdf5 file and are deposited in the CXIDB with accession number 79 (Data Citation 1). A Globus end-point is available for high-speed data transfer.
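The quoted resolution limits follow from the photon energy and the detector geometry via d = λ / (2 sin θ), where 2θ is the scattering angle. The sketch below is an illustration rather than the calculation used by the authors. It assumes a perfectly centred beam, takes the nearest edge to lie 512 pixels from the beam, and takes the corner to lie 512 pixels plus half the gap away in the other direction. The helper names are hypothetical. Under these assumptions the results land within a few tenths of a nanometre of the values given above.

```python
import math

HC_EV_NM = 1239.84193  # photon energy (eV) times wavelength (nm)
PIXEL_MM = 0.075       # 75 um pixels
HALF_PANEL_PX = 512    # assumed distance from beam to nearest edge, in pixels

def resolution_nm(energy_ev, distance_mm, radius_mm):
    """Real-space resolution d = lambda / (2 sin(theta)) at a detector radius."""
    wavelength_nm = HC_EV_NM / energy_ev
    two_theta = math.atan2(radius_mm, distance_mm)
    return wavelength_nm / (2.0 * math.sin(two_theta / 2.0))

def edge_and_corner_resolution_nm(energy_ev, distance_mm, gap_mm):
    """Resolution at the nearest edge and at the corner of a two-panel detector.

    Assumes an idealised geometry with the beam at the centre of the gap.
    """
    half_edge_mm = HALF_PANEL_PX * PIXEL_MM     # 38.4 mm to the edge
    half_gap_side_mm = half_edge_mm + gap_mm / 2.0
    corner_mm = math.hypot(half_edge_mm, half_gap_side_mm)
    return (resolution_nm(energy_ev, distance_mm, half_edge_mm),
            resolution_nm(energy_ev, distance_mm, corner_mm))

front_edge, front_corner = edge_and_corner_resolution_nm(514.0, 224.0, 23.0)
back_edge, back_corner = edge_and_corner_resolution_nm(514.0, 741.0, 1.73)
print(f"front: {front_edge:.1f} nm (edge), {front_corner:.1f} nm (corner)")
print(f"back:  {back_edge:.1f} nm (edge), {back_corner:.1f} nm (corner)")
```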

From the concentration, the jet diameter and the focus size, the particle count per exposure is estimated to be approximately 60 particles per shot. Given that the focussed X-ray beam has extended tails beyond the focal limit, there is an uncertainty that likely places the true particle count somewhat higher. A conservative estimate of the bounds of the particle count of 50 to 200 is proposed.
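The order of magnitude of this estimate can be checked with a short calculation. The sketch below assumes a simple box-shaped interaction volume, namely the focal area multiplied by the jet diameter; the text does not state the exact model used, so this is an illustration only, and the function name is hypothetical.

```python
UM3_TO_ML = 1e-12  # 1 cubic micrometre = 1e-12 millilitre

def particles_per_shot(concentration_per_ml, focus_area_um2, jet_diameter_um):
    """Estimate the number of illuminated particles.

    Assumes the illuminated volume is focus area x jet thickness,
    ignoring the beam tails and the round cross-section of the jet.
    """
    volume_ml = focus_area_um2 * jet_diameter_um * UM3_TO_ML
    return concentration_per_ml * volume_ml

# Values quoted in the text: 5e11 particles/ml, 25 um^2 focus, 5 um jet.
estimate = particles_per_shot(5e11, 25.0, 5.0)
print(f"approximately {estimate:.0f} particles per shot")
```

With the quoted values the illuminated volume is 125 μm³, which gives roughly 60 particles, in line with the estimate above.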

### Pattern Selection

Due to liquid jet and X-ray beam instabilities, not all scattering patterns collected are of sufficient quality for correlation analyses. Besides multi-particle hits, the set of diffraction patterns collected from the sample contains blanks, where no virus was intersected by the XFEL beam, as well as images characterized by a very high total scattered intensity and extensive intensity streaks, where the X-rays hit the edge of the jet or part of the liquid-jet nozzle (Fig. 2). Similar observations were made for the buffer run. Patterns were selected on the basis of the total integrated intensity on the back panel. A histogram analysis of the integrated intensity reveals a bimodal distribution, with high-quality patterns occurring around the most-populated mode of the distribution.
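A selection of this kind can be sketched as follows. The bin count and the width of the acceptance window around the mode are hypothetical parameters chosen for illustration; the thresholds actually used for the deposited data are not specified here.

```python
import numpy as np

def select_patterns(total_intensity, n_bins=100, half_width_bins=5):
    """Boolean mask of patterns near the most-populated intensity mode.

    total_intensity: one integrated back-panel intensity per pattern.
    n_bins and half_width_bins are illustrative choices, not values
    taken from the original analysis.
    """
    totals = np.asarray(total_intensity, dtype=float)
    counts, edges = np.histogram(totals, bins=n_bins)

    # Locate the most-populated bin and build a window around it.
    mode_bin = int(np.argmax(counts))
    lower = edges[max(mode_bin - half_width_bins, 0)]
    upper = edges[min(mode_bin + half_width_bins + 1, n_bins)]

    return (totals >= lower) & (totals <= upper)
```

The returned mask can then be used to index the stack of patterns, for example `good = patterns[select_patterns(patterns.sum(axis=(1, 2)))]` for an array of shape (N, rows, columns).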

### Code Availability

CASS is publicly available on GitLab (https://gitlab.gwdg.de/p.lfoucar/cass). The reading of xtc files is supported by the psana libraries distributed by the LCLS (https://stanford.io/2lhTEwT).

## Data Records

Four individual datasets have been deposited on the CXIDB website (Data Citation 1). The deposited data consist of the raw xtc files of the experimental PBCV-1 and buffer scattering data. Selected patterns of PBCV-1 and buffer, pre-processed with CASS (dark-current subtraction and common-mode corrections), have also been deposited in separate hdf5 files for the back and front pnCCD detectors. The xtc files contain close to 100 000 scattering patterns, whereas the selected pre-processed files contain close to 60 000 patterns for PBCV-1 and 30 000 patterns for the buffer. A reference buffer-subtracted SAXS data set from PBCV-1 at the same concentration, collected at 11 keV using the cSAXS beamline at the Swiss Light Source, has been deposited as well. The data records are summarized in Table 2.

## Technical Validation

The quality of the data can be assessed by the mean intensity as a function of resolution (Fig. 3). The SAXS curve was obtained by angular integration of the images after masking out the strong jet scattering streaks seen in the diffraction patterns (Fig. 2c). A mask covering the jet streak is included in the deposited hdf5 files. The experimental SAXS data were fitted in the low-q region (up to 0.015 Å⁻¹) with the theoretical scattering curve of a hard sphere with a diameter of 174 nm. Given that the diameter of an icosahedron is 17% larger than the sphere that touches the midpoint of each vertex17, the hard sphere model derived here would correspond to a maximum particle dimension of a little over 200 nm, consistent with the available model8. The reference data collected at the Swiss Light Source can be modelled (at low q) with a hard sphere with a diameter of 168 nm, corresponding to an icosahedron with a diameter of 197 nm. The difference in estimated size between the reference data and the curve obtained from AMO can be ascribed to changes in relative contrast at lower X-ray energies, the effects of radiation damage on the sample at synchrotron sources, variations in sample preparation, or concentration effects on the shape of the low-q data.
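For reference, the hard-sphere model and the 17% sphere-to-icosahedron conversion used above can be written out as a short sketch. The normalised form factor below is the standard expression for a homogeneous sphere; the function names are illustrative, and no fitting procedure is implied.

```python
import math

def hard_sphere_intensity(q, diameter):
    """Normalised scattering intensity of a homogeneous sphere.

    q and diameter must use reciprocal units (e.g. 1/Angstrom and Angstrom).
    I(q) = [3 (sin x - x cos x) / x^3]^2 with x = q * radius, and I(0) = 1.
    """
    x = q * diameter / 2.0
    if abs(x) < 1e-3:
        # Small-x series expansion avoids loss of precision near q = 0.
        return 1.0 - x * x / 5.0
    amplitude = 3.0 * (math.sin(x) - x * math.cos(x)) / x**3
    return amplitude * amplitude

def icosahedron_diameter(sphere_diameter, factor=1.17):
    """Apply the 17% enlargement quoted in the text."""
    return factor * sphere_diameter

# Model curve for a 174 nm (1740 Angstrom) sphere up to q = 0.015 1/Angstrom.
q_values = [i * 0.015 / 200 for i in range(201)]
curve = [hard_sphere_intensity(q, 1740.0) for q in q_values]

print(f"{icosahedron_diameter(174.0):.0f} nm")  # FXS-derived sphere
print(f"{icosahedron_diameter(168.0):.0f} nm")  # reference SAXS sphere
```

The first minimum of this model lies at q·R ≈ 4.49, so for a 174 nm sphere several minima fall inside the fitted range, which is what constrains the diameter.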