Free-electron laser data for multiple-particle fluctuation scattering analysis

Fluctuation X-ray scattering (FXS) is an emerging experimental technique in which solution scattering data are collected using X-ray exposures below rotational diffusion times, resulting in angularly anisotropic X-ray snapshots that provide several orders of magnitude more information than traditional solution scattering data. Such experiments can be performed using the ultrashort X-ray pulses provided by a free-electron laser source, allowing one to collect a large number of diffraction patterns in a relatively short time. Here, we describe a test data set for FXS, obtained at the Linac Coherent Light Source, consisting of close to 100 000 multi-particle diffraction patterns originating from approximately 50 to 200 Paramecium Bursaria Chlorella virus particles per snapshot. In addition to the raw data, a selection of high-quality pre-processed diffraction patterns and a reference SAXS profile are provided.


Background & Summary
Fluctuation X-ray scattering (FXS) studies extend traditional small angle X-ray scattering (SAXS) methods by using X-ray snapshot with exposure times so short that the ensemble of illuminated particles can be well approximated as frozen in time and space. The resulting scattering patterns are no longer angularly isotropic, but instead exhibit small intensity fluctuations around the mean SAXS intensity 1,2 . Angular correlations of these intensity fluctuations can be directly related to the underlying molecular structure of the sample, providing much more information than traditional 1D SAXS curves 3 . The correlation function C 2 q; q 0 ; Δϕ ð Þis defined as where N is the total number of diffraction patterns, q and q' are the magnitudes of the scattering vectors (in inverse resolution), and ϕ and ϕ + Δϕ are the corresponding angular coordinates describing the intensity I j (q, ϕ) of the j th scattering pattern recorded on the detector. The correlation function C 2 q; q 0 ; ΔΔϕ ð Þcan be written as a Legendre series: where P l is the Legendre polynomial of order l and θ q ¼ arccos qλ=4π ð Þ, λ is the wavelength of the incident X-rays, and k i is a scale factor equal to the number of particles in the beam for l > 0, and equal to its square for l = 0. The expansion coefficients B l (q, q′) are in turn related to the spherical harmonic expansion coefficients I lm (q) of the 3D intensity scattering volume I(q) of the scattering particle, where q = (q, θ, ϕ): where, in polar coordinates, The intensity function is equal to the square of the modulus of the Fourier transform of the real-space object ρ(r) under investigation: where F denotes the Fourier transform 1 . Prior work in fluctuation scattering from biological samples has demonstrated that high-quality correlation data can be obtained from single-particle diffraction data 4 and can be used for ab initio structure determination using the multi-tiered iterative phasing algorithm 5 . Although previous work has shown that the signal to noise ratio (SNR) of such data is independent of the number of particles per shot 5 when the particles are in a vacuum, the relationship between number of particles and SNR in the presence of large buffer and detector backgrounds has not yet been studied. In this communication, we describe unprocessed, experimental multi-particle scattering data from which an FXS correlation data set can be derived. The data, obtained at the Atomic, Molecular and Optical (AMO) instrument at the Linac Coherent Light Source 6,7 , consist of close to 60 000 high quality scattering images of the Paramecium bursaria Chlorella virus 1 (PBCV-1,~190 nm in diameter 8 ) and 30 000 scattering images of the sample buffer. The images presented here provide the community with experimental data on which algorithms for processing fluctuation scattering data and structure solution can be tested. The data are deposited at the CXIDB 9 in the form of hdf5 and xtc files.

Sample Preparation, Sample Delivery and Data collection
A batch of Paramecium Bursaria Chlorella virus 1 sample was prepared as described previously 10 . Here we used 1% triton instead of Nonidet and centrifuged the virus sample at 20,000 rpm in an ultracentrifuge. The pure virus sample was dialyzed against 50 mM 4-methylmorpholine 11 . The quality of the FXS data was gauged by comparing the derived SAXS data against a reference Small Angle Scattering curve obtained at the cSAXS beamline at the Swiss Light Source, at an energy of 11 keV. The sample used for FXS data collection was diluted with buffer to a concentration of approximately 5 × 10 11 particles per ml.
The multi-particle FXS scattering data were collected at the Atomic, Molecular and Optical (AMO) instrument at the LCLS 6,7 . The experiment was performed in the CFEL-ASG Multi-Purpose chamber (CAMP) 12 . The PBCV-1 solution described above was injected into the XFEL interaction region as a microjet of approximately 5 μm diameter, using a gas dynamic virtual nozzle (GDVN) injector 13,14 at a flow rate of~20 μl/min (Fig. 1). The diffraction data were collected in the water window, using a photon energy of 514 eV, an electron bunch length (pulse length) of 100 fs and a repetition rate of 120 Hz. The average number of photon per pulse was 10 12 . The focus size was approximately 25 μm 2 (FWHM). Diffraction patterns were collected on two pairs of p-n junction charge-couple device (pnCCD) detectors 15 read out at 120 Hz. The front and back panels of the detector consisted of two pairs of 1024 × 512 arrays of square 75 μm × 75 μm pixels. The front panels were placed 224 mm from the interaction region, separated by a horizontal gap of 23 mm. The back detector was placed 741 mm from the sample/XFEL interaction zone, with a horizontal gap of 1.73 mm ( Table 1). The maximum resolution achievable under these conditions is 14.3 nm (8.9 nm) at the edge (corner) of the front and 46.8 nm (32.7 nm) at the edge (corner) of the back detector. The X-ray scattering patterns and associated metadata were stored as xtc files. These diffraction images were pre-processed using the CFEL-ASG Software Suite (CASS) 16 . The images were corrected for dark current; pixels systematically producing outlying intensity values were flagged. The resulting data was cast in larger arrays to include the detector gaps. The detector halves were placed roughly symmetrically around the X-ray beam. The back detector, with gaps included, is contained in an array of 1024 by 1047, with the mean beam center located at (506, 526). The front detector (with gaps included) is contained in an 1331 by 1031 array, with a mean beam center of (657, 552). The back detector was at gain mode 1, corresponding to 1250 ADU per 1keV photon. The gain mode of the front pnCCD was 4, corresponding to 78 ADU per 1keV photon. The resulting arrays are stored in an hdf5 file and are deposited in the CXIDB with accession number 79 (Data Citation 1). A Globus end-point is available for high-speed data-transfer.
From the concentration, the jet diameter and the focus size the particle count per exposure is estimated to 60 particles per shot. Given that the focussed X-ray beam has extended tails beyond the focal limit, there is an uncertainty that likely places the true particle count somewhat higher. A conservative estimate of the bounds of the particle count of 50 to 200 is proposed.

Pattern Selection
Due to liquid jet and X-ray beam instabilities, not all scattering patterns collected are of sufficient quality for correlation analyses. The set of diffractions patterns of the sample collected contain, besides multi-  particle hits, a set of blanks, where no virus was intersected by the XFEL beam, as well as images characterized by a very high total scattered intensity and extensive intensity streaks where the X-rays hit the edge of the jet or part of the liquid-jet nozzle, Fig. 2. Similar observations were made for the buffer run. A selection of patterns was made on the basis of the total integrated intensity on the back panel. A histogram analysis of the integrated intensity reveals a bimodal distribution, with high quality patterns occurring around the most-populated mode of the distribution.

Data Records
Four individual datasets have been deposited on the CXIDB website (Data Citation 1). The deposited data consists of the raw xtc file of the experimental PBCV-1 and buffer scattering data. Selected patterns, preprocessed with CASS, involving dark-current subtraction and common mode corrections, of PBCV-1 and buffer have been deposited also in separate hdf5 files for the back and front pnCCD detectors. The xtc files contain close to 100 000 scattering patterns, whereas the selected pre-processed files contain close to 60 000 patterns for PBCV-1 and 30 000 patterns for the buffer. A reference buffer subtracted SAXS data set from PBCV-1 at the same concentration, collected at 11 keV using the CSAXS beamline at the Swiss Light Source has been deposited as well. The data records are summarized in Table 2.

Technical Validation
The quality of the data can be assessed by the mean intensity as a function of resolution, Fig. 3. The SAXS curve was obtained by angular integration of the images after masking out the strong jet scattering streaks seen in the diffraction patterns, Fig. 2c. A mask covering the jet streak included in the deposited hdf5 files.
The experimental SAXS data were fitted in the low-q region (up to 0.015 Å − 1 ) with the theoretical scattering curve of a hard sphere with a diameter of 174 nm. Given that the diameter of an icosahedron is 17% larger than the sphere that touches the midpoint of each vertex 17 , the hard sphere model derived here would correspond to a maximum particle dimension of little over 200 nm, consistent with the available model 8 . The analyses of the reference data collected at the Swiss Light Source can be modelled (at low q) with a hard sphere with a radius of 168 nm, corresponding to an icosahedron with diameter of 197 nm. The difference in estimated size between the reference data and the curve obtained from AMO can be ascribed to changes in relative contrast at lower X-ray energies, the effects of radiation damage on the sample at synchrotron sources, variations in sample preparations or concentration effects on the shape of the low q data. The hdf5 path '/data/data' contains the data. The '/data/mask' field contains a mask covering the overloaded jet streak. 'data/ timestamp_str' contains timestamps of the images.
CXIDB-79: SAXS Data A reference SAXS curve of PBCV-1 collected at the cSAXS beamline at the Swiss Light Source, collected at an energy of 11keV.
A SAXS curve in plain ASCII. Each line contains q (in inverse Ångström) and the corresponding mean intensity. Table 2. Data Records. Figure 3. The experimental SAXS data derived from the selected snapshots on the back detector display a characteristic oscillatory behaviour consistent with spherical-like particles. This SAXS curve is close to a reference curve obtained from PBCV-1 at the Swiss Light Source's CSAXS beamline at 11000 eV. Minor discrepancies between the soft X-ray and hard X-ray curve are likely due to contrast differences in the water window. A hard sphere model of the AMO data and the SLS data suggest a diameter of 174 nm and 168 nm, which would correspond to an icosahedron with diameter of 204 and 197 nm respectively.