Coherent diffraction of single Rice Dwarf virus particles using hard X-rays at the Linac Coherent Light Source

Single particle diffractive imaging data from Rice Dwarf Virus (RDV) were recorded using the Coherent X-ray Imaging (CXI) instrument at the Linac Coherent Light Source (LCLS). RDV was chosen as it is a well-characterized model system, useful for proof-of-principle experiments, system optimization and algorithm development. RDV, an icosahedral virus of about 70 nm in diameter, was aerosolized and injected into the approximately 0.1 μm diameter focused hard X-ray beam at the CXI instrument of LCLS. Diffraction patterns from RDV with signal to 5.9 Ångström were recorded. The diffraction data are available through the Coherent X-ray Imaging Data Bank (CXIDB) as a resource for algorithm development, the contents of which are described here.

Injection testing. The sample was injected using a setup identical to that used in the LCLS experiment (see Sample injection at LCLS) to investigate the ability of the particles to aerosolize and their resistance to the injection procedure. By placing a microscope glass slide covered by a gel piece (Gel-Pak) beneath the outlet of the aerodynamic lens (at the same position as the interaction region with the X-ray beam in the subsequent LCLS experiments), the deposited particle dust could be observed through an objective lens mounted below. In a second set of experiments, a formvar/carbon grid (#01754-F, F/C 400 mesh Cu, Ted Pella Inc.) was substituted for the glass slide to capture RDV particles that had traversed the injector. These samples were examined further using an environmental scanning electron microscope (ESEM) (Quanta FEG 650, FEI). The pressure in the vacuum chamber was kept at approximately 10⁻⁵ mbar.
Sample size and monodispersity in the liquid phase. The size distribution of the RDV sample in solution (250 mM ammonium acetate buffer, pH 7.5) was measured using both dynamic light scattering (DLS; w130i, AvidNano Ltd. and Spectrolight 600, Molecular Dimensions) and nanoparticle tracking analysis (NTA; NanoSight, model LM10, Malvern Instruments Ltd.). For these DLS and NTA measurements, the sample was diluted to 10⁹ particles ml⁻¹ and 10⁸ particles ml⁻¹, respectively. The measured size distribution is shown in Fig. 2a,b.
Figure 1. Experimental design. (1) As the first step in the experiment, an analysis of candidate samples was carried out and a primary target (Rice Dwarf Virus, RDV) was selected. RDV was purified from grasshopper nymphs, which were fed infected rice plants as described in the text. (2) Purified virus particles were then injected into the X-ray beam of the LCLS and diffraction patterns were recorded on the front and back detectors of the CXI instrument 20 . (3) The raw data were pre-processed using psana 30 and converted to XTC files. 175 frames of strong hits were selected and converted into the CXI file format.
Sample size and monodispersity in the gas phase. The size distribution of the RDV sample in the gas phase was measured by means of electrophoretic differential mobility analysis (DMA). RDV was aerosolized with a nano-electrospray ionization (ESI) source (TSI model 3480) and passed through an electrostatic classifier (TSI model 3480) whose size selection window was continuously scanned. Transmitted particles were counted with a condensation particle counter (CPC, TSI model 3786). The size distribution is shown in Fig. 2c.

Sample injection at LCLS
The experiment was carried out at the CXI instrument at the LCLS 19,20 . An aerosol injector (described in ref. 8) was used to introduce the particles into the X-ray beam. Purified RDV were transferred to a volatile buffer (250 mM ammonium acetate, pH 7.5) at a concentration of 10¹² particles ml⁻¹ and introduced to the injector via a gas dynamic virtual nozzle (GDVN) 28 at a flow rate of 1-2 μl min⁻¹. The aerosol passed through a skimmer and a relaxation chamber and was focused into a narrow particle beam by an aerodynamic lens. By regulating gas, liquid flow and skimmer pressure, the quality of the particle beam could be optimized. Injected particles intersected the X-ray beam in random orientations.
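As a rough worked example of the injection figures above, the following sketch converts the stated concentration and flow rate into a particle delivery rate. It only describes particles entering the injector; transmission losses and the resulting hit rate are not given in the text and are not modeled. The function name is hypothetical, and the 120 Hz value is the LCLS repetition rate quoted in the data collection section.

```python
def particles_per_second(concentration_per_ml, flow_ul_per_min):
    """Particles delivered to the injector per second (illustrative helper)."""
    ml_per_second = flow_ul_per_min * 1e-3 / 60.0  # ul/min -> ml/s
    return concentration_per_ml * ml_per_second


# Values quoted in the text: 10^12 particles per ml at 1-2 ul per min.
rate_low = particles_per_second(1e12, 1.0)
rate_high = particles_per_second(1e12, 2.0)

# Particles entering the injector per X-ray pulse interval at 120 Hz.
per_pulse_low = rate_low / 120.0
per_pulse_high = rate_high / 120.0

print(f"{rate_low:.3g} to {rate_high:.3g} particles per second")
print(f"{per_pulse_low:.3g} to {per_pulse_high:.3g} particles per pulse interval")
```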

Experimental setup and data collection
Data were collected at the CXI instrument at LCLS. LCLS was tuned to a photon energy of 7 keV and produced pulses with approximately 4 mJ pulse energy and <50 fs duration. The selection of the photon energy, within the CXI operation range of 5 to 11 keV, was driven by the competition between the sample's scattering cross section, which tends to be high at lower energies, and the ability to discriminate single-photon events in the detector, which increases at the high end of the range. Additionally, photon energies below the iron K-alpha excitation edge at 7.1 keV avoid an isotropic fluorescence signal from the steel walls of the vacuum chamber, which in turn would complicate the identification of photons scattered by the sample.
X-rays were focused using a pair of Kirkpatrick-Baez (KB) mirrors to a nominal size of 0.1 × 0.1 μm. The focused beam passed through a set of beam-defining apertures to reduce X-ray scattering from imperfections in the optical system, as the FEL beam overfills the KB entrance. Additional cleanup slits and apertures, including a post-sample aperture, were used to limit background scatter. The post-sample aperture is a small circular aperture, 3 mm in diameter, positioned just downstream of the interaction region; it prevents X-rays scattered from locations other than the sample position from reaching the sensitive surface of the detector, and it limited the collection angle to one commensurate with 5.9 Å resolution. Small-angle diffraction patterns were recorded with a 0.14 Mpix CSPAD detector (also referred to as the back detector), located 2.4 m downstream of the interaction point, and high-angle scattering was captured on a 2.3 Mpix CSPAD detector (also referred to as the front detector), located 217.4 mm downstream of the interaction point, in a tandem arrangement as shown in the center panel of Fig. 1 (refs 23,24). All data events were recorded and synchronized with the LCLS repetition rate of 120 Hz. The back detector was offset with respect to the optical axis of the focusing optics, and extended to a maximal resolution of c. 15.2 nm and c. 11.6 nm on the edge and in the corner, respectively. A semitransparent beam-stop, consisting of 25 μm Ti and 100 μm Zn, was used so that very low-q scattering could be collected and to provide a monitor for the direct beam. Data were analyzed onsite using Hummingbird, a fast online analysis tool developed for single particle imaging 29 , and Cheetah, a software package for high-throughput reduction and analysis of serial femtosecond X-ray diffraction data 30 .
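The relation between detector geometry and resolution can be sketched as follows. This is a minimal illustration using Bragg's law, d = λ / (2 sin θ), with the scattering angle 2θ taken from the radius on a flat detector; it is not the authors' code. The 110 μm pixel size is an assumption for the front detector (the text quotes it only for the back detector simulation), and the function names are hypothetical. With these inputs, the 265-pixel radius quoted in the validation section gives a full-period value of about 13.3 Å, roughly twice the quoted 6.67 Å, which would be consistent with the text quoting half-period resolution, although the convention is not stated in the source.

```python
import math

HC_KEV_ANGSTROM = 12.398419843  # h*c in keV * Angstrom


def wavelength_angstrom(photon_energy_kev):
    """Photon wavelength in Angstrom."""
    return HC_KEV_ANGSTROM / photon_energy_kev


def full_period_resolution(radius_mm, distance_mm, photon_energy_kev):
    """Bragg spacing d = lambda / (2 sin(theta)) at a given detector radius."""
    two_theta = math.atan2(radius_mm, distance_mm)
    return wavelength_angstrom(photon_energy_kev) / (2.0 * math.sin(two_theta / 2.0))


def radius_for_resolution(d_angstrom, distance_mm, photon_energy_kev):
    """Detector radius (mm) at which a given Bragg spacing is recorded."""
    theta = math.asin(wavelength_angstrom(photon_energy_kev) / (2.0 * d_angstrom))
    return distance_mm * math.tan(2.0 * theta)


# Front detector: 217.4 mm from the interaction point, 7 keV photons.
PIXEL_MM = 0.110  # assumed pixel size
d_full = full_period_resolution(265 * PIXEL_MM, 217.4, 7.0)
d_half = d_full / 2.0
print(f"full period: {d_full:.2f} A, half period: {d_half:.2f} A")
```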

Data processing steps used
To provide interpretable data in addition to the raw XTC files, the native format of the LCLS data stream, we selected a small subset of diffraction events (175 frames) and converted them into a CXI file using psana 31 . For both back and front detectors, data were calibrated using psana's ImgAlgos.NDArrCalib module, with pedestal subtraction (do_peds), common-mode correction (do_cmod), statistical correction (do_stat) and gain correction (do_gain) turned on. Pixel gains were calculated by generating per-pixel histograms from a flat-field run and fitting a bimodal distribution to the noise peak and the single-photon peak. Using psana's CSPadPixCoords.CSPadImageProducer and CSPadPixCoords.CSPad2x2ImageProducer, the detector panels were assembled to form real-space images. For a list of provided files and data entries, see section Data Records. The 175 patterns were selected manually by identifying strong diffraction signals that resembled simulations. See Technical Validation for more details.
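The per-pixel gain estimate described above can be illustrated with a small sketch. It assumes that the bimodal distribution is a two-component Gaussian mixture fitted by expectation-maximization, which is one possible choice and not necessarily the fitting procedure used by the authors. The function name and the synthetic data are hypothetical; the 33 ADU per photon value is borrowed from the back detector simulation purely for illustration.

```python
import numpy as np


def fit_two_gaussians(adu, n_iter=200):
    """EM fit of a two-component 1D Gaussian mixture (noise peak and photon peak)."""
    adu = np.asarray(adu, dtype=float)
    # Initial guess: noise peak near the median, photon peak in the upper tail.
    mu = np.array([np.median(adu), np.percentile(adu, 99)])
    sigma = np.array([adu.std(), adu.std()]) + 1e-6
    weight = np.array([0.9, 0.1])
    for _ in range(n_iter):
        # E-step: responsibility of each component for each sample.
        pdf = np.exp(-0.5 * ((adu[:, None] - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))
        resp = weight * pdf
        resp /= resp.sum(axis=1, keepdims=True) + 1e-300
        # M-step: update means, widths and mixture weights.
        nk = resp.sum(axis=0)
        mu = (resp * adu[:, None]).sum(axis=0) / nk
        sigma = np.sqrt((resp * (adu[:, None] - mu) ** 2).sum(axis=0) / nk) + 1e-6
        weight = nk / adu.size
    return mu, sigma, weight


# Synthetic flat-field samples for one pixel: noise around 0 ADU,
# and about 10% of the frames containing a single photon at 33 ADU.
rng = np.random.default_rng(1)
has_photon = rng.random(20000) < 0.1
samples = rng.normal(0.0, 3.0, 20000) + 33.0 * has_photon

mu, sigma, weight = fit_two_gaussians(samples)
gain_adu_per_photon = mu[1] - mu[0]  # distance between the two peaks
print(f"estimated gain: {gain_adu_per_photon:.1f} ADU per photon")
```

In a real calibration this fit would be repeated for every pixel, and pixels whose two peaks cannot be separated would need to be masked.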

Data Records
Data citation 1-Sample size and monodispersity
The data are available at Figshare (Data Citation 1) and contain an Excel file with raw data from the DLS, NTA and DMA measurements.

Data citation 2-Coherent diffractive imaging data
The data are deposited in the CXIDB 25 (Data Citation 2) and stored in the CXIDB data format, which is based on the HDF5 format. HDF5 files are readable in many computing environments, including Python using the h5py module and MATLAB using e.g., the h5read function. Convenient functions for accessing the CXIDB data file exist in the libspimage package for C and Python 32 . For visualizing data in the CXIDB format, the Owl software is convenient (https://github.com/FilipeMaia/owl/). In addition to the CXI file, the conversion script (create_dataset.py) and additional metadata files (selection.h5, psana.cfg) are provided along with usage instructions. Detector panel calibration files mapping data to real space are also provided. Configuration files for Hummingbird, psana and Cheetah are provided for completeness of describing processing performed on the deposited data.
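Because the CXI file is an HDF5 file, its layout can be inspected without assuming any particular internal path. The sketch below uses h5py to list every dataset with its shape and data type. The helper name and the file name in the usage comment are hypothetical, and nothing here is specific to the deposited file.

```python
import h5py


def list_datasets(path):
    """Return {dataset path: (shape, dtype string)} for every dataset in an HDF5 file."""
    found = {}

    def visit(name, obj):
        # visititems calls this for every group and dataset in the file.
        if isinstance(obj, h5py.Dataset):
            found[name] = (obj.shape, str(obj.dtype))

    with h5py.File(path, "r") as handle:
        handle.visititems(visit)
    return found


# Usage (hypothetical file name):
# for name, (shape, dtype) in list_datasets("rdv_hits.cxi").items():
#     print(name, shape, dtype)
```

Listing the datasets first avoids hard-coding a path; individual frames can then be read by slicing the dataset of interest, which loads only the requested frames into memory.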

Technical Validation
Background scattering and direct beam scatter
A background scattering pattern was derived by averaging 1000 frames that did not include any hits or dark frames. This background, as well as suggested masks for non-responsive pixels and beamstops, are shown in Fig. 3.
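The averaging step can be sketched in a few lines of NumPy. This is an illustration only: the function names are hypothetical, and the mask convention (True marks a usable pixel) is an assumption rather than the convention of the deposited mask files.

```python
import numpy as np


def average_background(frames):
    """Pixel-wise mean over a stack of non-hit frames with shape (n_frames, ny, nx)."""
    return np.mean(np.asarray(frames, dtype=float), axis=0)


def subtract_background(frame, background, good_pixel_mask):
    """Subtract the background and zero all masked (unusable) pixels."""
    corrected = np.asarray(frame, dtype=float) - background
    return np.where(good_pixel_mask, corrected, 0.0)


# Tiny synthetic example: three blank frames and one frame with extra signal.
blank_frames = np.array([[[1.0, 2.0], [3.0, 4.0]]] * 3)
background = average_background(blank_frames)
mask = np.array([[True, True], [True, False]])  # bottom-right pixel is masked
hit = np.array([[6.0, 2.0], [3.0, 9.0]])
corrected = subtract_background(hit, background, mask)
```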
Additionally, for the back detector data, manifold-embedding methods were used to detect and identify the nature and origin of stochastic changes, and to quantify the necessary corrections to the background. The manifold of raw RDV single-particle snapshots is shown in Fig. 4, where each point represents a diffraction pattern 33-35 . The parabolic nature of this manifold reveals that a single parameter dominates the changes from snapshot to snapshot, namely fluctuations in pulse intensity, consistent with the self-amplified spontaneous emission process of the FEL. This can be corrected by appropriate normalization procedures 36 . In addition to a monotonic intensity change along the parabola, a prominent deviation is evident. This is caused by a shift of about one pixel of the center of intensity along the lateral direction of the beam. The cause of this shift is a drift in pointing of the offset mirrors on the beam-defining aperture of the KB mirrors for the CXI beamline.

Simulated diffraction data of expected size
In Fig. 5, two diffraction patterns from different single particle hits are shown in comparison to simulated diffraction from homogeneous spheres of size 71 nm. In the simulation, a photon energy of 7 keV and an assumed mass density for RDV of 1.381 g cm⁻³ were used. The back detector was simulated using a detector distance of 2.4 m, a pixel size of 110 μm and a signal conversion rate of 33 ADUs per photon.
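The shape of such a simulated pattern can be sketched with the standard form factor of a homogeneous sphere, I(q) ∝ [3(sin qR − qR cos qR)/(qR)³]², evaluated at the back detector geometry quoted above. This is a simplified illustration and not the simulation used for Fig. 5: the intensity is normalized to 1 at q = 0, so the mass density and the ADU conversion are not used, the detector is treated as flat and centered on the beam, and all function names are hypothetical.

```python
import numpy as np


def sphere_intensity(q, radius):
    """Normalized scattered intensity of a homogeneous sphere (equals 1 at q = 0)."""
    x = np.asarray(q, dtype=float) * radius
    out = np.ones_like(x)
    nonzero = x > 1e-8
    xs = x[nonzero]
    out[nonzero] = (3.0 * (np.sin(xs) - xs * np.cos(xs)) / xs ** 3) ** 2
    return out


def q_at_pixel_radius(r_pix, pixel_mm, distance_mm, wavelength_nm):
    """Momentum transfer q = 4 pi sin(theta) / lambda in nm^-1 at a pixel radius."""
    two_theta = np.arctan(np.asarray(r_pix, dtype=float) * pixel_mm / distance_mm)
    return 4.0 * np.pi * np.sin(two_theta / 2.0) / wavelength_nm


wavelength_nm = 1.2398419843 / 7.0  # h*c = 1.2398 keV nm, 7 keV photons
r_pix = np.arange(0, 200)
q = q_at_pixel_radius(r_pix, 0.110, 2400.0, wavelength_nm)
profile = sphere_intensity(q, 71.0 / 2.0)  # 71 nm diameter sphere

# Pixel radius of the first diffraction minimum in the radial profile.
is_min = (profile[1:-1] < profile[:-2]) & (profile[1:-1] < profile[2:])
first_min_pixel = int(r_pix[1:-1][is_min][0])
print("first minimum near pixel radius", first_min_pixel)
```

Fitting the positions of these minima against the sphere radius is one simple way to estimate particle size from back detector data.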

Signal above background on front detector
The front detector was located 217.4 mm downstream of the sample interaction region and collected diffraction data extending to a resolution of 5.9 Å. Although hits are immediately apparent on the back detector, and this signal is used for hit finding, it is not apparent from any individual image whether the front detector records useful signal from the sample above background levels. A radial average of the sum of frames determined by Cheetah to be hits shows that there is indeed consistently elevated signal above background when sample is detected in the beam based on the back detector (Fig. 6). This intensity distribution falls off with the expected q-dependence, and stops at a resolution of 5.9 Å. This resolution limit is set by the angular acceptance of the post-sample aperture. Beyond this resolution both radial sums are identical, further supporting the notion that signal up to 5.9 Å resolution comes from individual particles. This validates the potential usefulness of signal on the front detector for image analysis. Cheetah processing scripts are included in the archive.

Validation that scattering comes from RDV
The method of least surprise (described below) was used to determine whether signal on the front detector corresponded to the expected signal from RDV. Data were converted from Analog-to-Digital Units (ADU) measured by the detector into photon counts using a relation between $k_i$, $A_i$ and $\gamma$, where $k_i$ is the photon count at pixel $i$, $A_i$ is the dark-, common-mode- and gain-corrected ADU value measured at pixel $i$, and $\gamma$ is the average ADUs per photon for the detector, which was calculated from a flat-field run. In this analysis, we use front detector data up to 6.67 Å resolution, which corresponds to a radius of 265 pixels.
Assuming Poisson statistics, we define the surprise function as the negative log-likelihood

$$S(K \mid \Omega_j, \Phi) = -\sum_i \log P(n_i; k_i), \qquad P(n; k) = \frac{n^k e^{-n}}{k!},$$

where $K$ denotes the dependence on data, with $k_i$ being the measured photon count at pixel $i$, $n_i$ is the average photon number at pixel $i$ when the fluence is $\Phi$ and the RDV particle has orientation $\Omega_j$, and the summation runs over all the pixels. Minimizing the surprise function, or maximizing the log-likelihood, across different orientations and fluence values, we assign each data frame the orientation and fluence at which it was most likely recorded. To help us assess the quality of these assignments, we further 'normalize' the surprise function. Given the RDV model (PDB 1UF2) 19 with an estimate of the particle orientation and fluence, we calculate the mean $\langle \log P(n_i; k) \rangle$ … The data are inconsistent with the model when the absolute value of the z-score is much greater than unity: a z-score much greater than unity is consistent with the data being 'surprising' given the assumed model.
Figure 6. Radial average of signal on the front detector from blank frames compared to radial average from frames determined to be hits. Elevated photon counts from the sample are visible up to an angle commensurate with 5.9 Å resolution, this being the resolution limit set by the angular acceptance of the post-sample aperture. 'All hits' is the average of all hits and includes all particles independent of size (including clusters of particles), while 'single particle' is the average of hits that are of the appropriate size to be isolated single particles. 42 frames within ± 1
Figure 7. Front detector normalized surprise (z-score) versus back detector particle size fits. The dashed red line indicates the diameter (70.8 nm) of the RDV model. The normalized surprise function, or its z-score, measures the agreement of the data with a known model: the data are inconsistent with the model when the absolute value of the z-score is much greater than unity; a z-score much greater than unity is consistent with the data being 'surprising' given the assumed model.
The z-scores of all the selected frames versus particle size are shown in Fig. 7. The particle sizes were determined by fitting back detector data to a homogeneous sphere model with adjustable size. Frames with particle size close to the diameter (70.8 nm) of the RDV model generally have smaller z-scores, though some still manifest inconsistency with the model. The source of this could be the presence of a water layer on the particle surface. This model-based surprise function calculation may potentially be useful for hit-finding, especially when the signal is as weak as in the front detector data.
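A minimal sketch of a Poisson surprise calculation is given below. The surprise itself follows the negative log-likelihood described in the text. The normalization is an assumption: the z-score is formed by subtracting the mean of the surprise expected under the model and dividing by its expected standard deviation, treating pixels as independent, and the source does not spell out this formula. The fluence helper uses the fact that, if the model is n_i = Φ m_i for a fixed pattern m_i, setting the derivative of the log-likelihood with respect to Φ to zero gives Φ = Σ k_i / Σ m_i. All function names and the synthetic data are hypothetical.

```python
import numpy as np


def log_factorial_table(kmax):
    """log(k!) for k = 0..kmax."""
    return np.concatenate(([0.0], np.cumsum(np.log(np.arange(1, kmax + 1)))))


def surprise(k, n):
    """Negative Poisson log-likelihood of counts k given expected counts n."""
    k = np.asarray(k, dtype=int).ravel()
    n = np.clip(np.asarray(n, dtype=float).ravel(), 1e-12, None)
    log_fact = log_factorial_table(int(k.max()))
    return float(-np.sum(k * np.log(n) - n - log_fact[k]))


def ml_fluence(k, pattern):
    """Fluence that maximizes the Poisson likelihood when n_i = fluence * pattern_i."""
    return float(np.sum(k) / np.sum(pattern))


def z_score(k, n, kmax=60):
    """Surprise normalized by its model expectation (assumed normalization).

    kmax truncates the sum over possible counts and must be well above max(n).
    """
    n = np.clip(np.asarray(n, dtype=float).ravel(), 1e-12, None)
    ks = np.arange(kmax + 1)
    log_p = ks[None, :] * np.log(n)[:, None] - n[:, None] - log_factorial_table(kmax)[None, :]
    p = np.exp(log_p)
    mean_i = -(p * log_p).sum(axis=1)                   # expected surprise per pixel
    var_i = (p * log_p ** 2).sum(axis=1) - mean_i ** 2  # its variance per pixel
    return float((surprise(k, n) - mean_i.sum()) / np.sqrt(var_i.sum()))


# Synthetic weak-signal example: 10,000 pixels with 0.05 expected photons each.
rng = np.random.default_rng(0)
model = np.full(10000, 0.05)
k_consistent = rng.poisson(model)        # data drawn from the model
k_too_bright = rng.poisson(3.0 * model)  # data three times brighter than the model

z_consistent = z_score(k_consistent, model)
z_too_bright = z_score(k_too_bright, model)
fluence_scale = ml_fluence(k_too_bright, model)
```

In practice the z-score would be evaluated after the orientation and fluence have been fitted, so that a large value points to a mismatch between data and model rather than to a poor fluence estimate.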

Usage Notes
The dataset (CXIDB ID 36) contains the full data stream recorded during the experiment in .xtc format. The dataset also contains a set of pre-selected hits as a CXI file (as described above) from both CSPAD detectors plus instrument metadata. XTC files are the native format of LCLS and can be read using analysis frameworks provided by the LCLS (see https://confluence.slac.stanford.edu/display/PSDM/LCLS+Data+Analysis).

36. Hosseinizadeh, A., Dashti, A., Schwander, P., Fung, R. & Ourmazd, A. Single-particle structure determination by X-ray free electron lasers: possibilities and challenges.