Coherent soft X-ray diffraction imaging of coliphage PR772 at the Linac coherent light source

Single-particle diffraction from X-ray Free Electron Lasers offers the potential for molecular structure determination without the need for crystallization. In an effort to further develop the technique, we present a dataset of coherent soft X-ray diffraction images of Coliphage PR772 virus, collected at the Atomic Molecular Optics (AMO) beamline with pnCCD detectors in the LAMP instrument at the Linac Coherent Light Source. The diameter of PR772 ranges from 65–70 nm, which is considerably smaller than the previously reported ~600 nm diameter Mimivirus. This reflects continued progress in XFEL-based single-particle imaging towards the single molecular imaging regime. The data set contains significantly more single particle hits than collected in previous experiments, enabling the development of improved statistical analysis, reconstruction algorithms, and quantitative metrics to determine resolution and self-consistency.


Background & Summary
Theoretical studies predict that X-ray Free Electron Lasers (XFELs) can potentially image biomolecules and hence determine their structures without crystallization 1. Realizing this in practice has proved a significant experimental challenge. The Single Particle Imaging (SPI) Initiative 2 was formed as a collaborative effort to identify and solve the experimental challenges to achieving high-resolution imaging of single molecules with X-rays.
Coliphage PR772 is a virus of approximately 70 nm in diameter, which infects Escherichia coli (E. coli). It was selected as the sample for this experiment due to its high structural homogeneity, uniform size distribution, suitable particle concentration in solution, having a known structure, and the ability to be aerosolized by Gas Dynamic Virtual Nozzle (GDVN) 3 for injection into the XFEL beam using an aerosol injector. Having a known structure of PR772 (unpublished) enables validation of any subsequent data analysis steps.
For the data presented here, Coliphage PR772 was aerosolized and delivered into the focus of the LAMP instrument at the Atomic Molecular Optics (AMO) beamline 4 of the Linac Coherent Light Source (LCLS) X-ray laser 5,6. The data include clear diffraction snapshots from single PR772 virus particles.

Sample preparation
Bacteriophage PR772 (ATCC BAA-769-B1) infects E. coli K12 J53-1 (ATCC BAA-769). PR772 was cultured on agar by the overlay method. Tryptic Soy Agar (Difco Tryptic Soy Agar) was prepared and poured into petri dishes to form a hard agar support layer. The overlay medium was soft agar prepared with Tryptic Soy Broth (Bacto Tryptic Soy Broth) along with 0.5% Agar-Agar (Microbiology, Merck). E. coli was cultured in Tryptic Soy Broth to reach an OD₆₀₀ of 0.7. Soft agar was melted and cooled to 45°C. E. coli and viral stock (10⁸ pfu ml⁻¹) were added to the soft agar in a volume ratio of 10:1 and mixed well. The mixture was immediately poured onto the hard agar in the petri dishes and incubated overnight at 37°C.
The soft agar layer from the overnight culture was scraped off and collected in a sterile container, diluted with 100 ml of sterile storage buffer (TRIS 50 mM, NaCl 100 mM, MgSO₄ 1 mM, EDTA 1 mM, pH 8.0) and mixed overnight at 4°C. The mixture was centrifuged at 8,000 g for 30 min to remove agar and cell debris. The supernatant was collected and filtered through a 0.45 μm filter (Filtropur S 0.45, Sarstedt). The viral particles were separated from the permeate solution by PEG precipitation (9% w/v PEG 8000 and 5.8% w/v NaCl). After mixing overnight at 4°C, the precipitate was centrifuged for 90 min at 8,000 g. The supernatant was discarded and the viral pellet was suspended in 1 ml storage buffer. The viral suspension was then applied to a Capto-Q anion exchange column. The sample was eluted by varying the concentration of NaCl (100 mM to 1.5 M). The fractions representing the elution profile peak were collected and examined by electron microscopy (EM) (Quanta FEG 650, FEI) to confirm the presence of intact viral particles (Figure 1). Before sample injection, the PR772 was transferred from storage buffer into volatile ammonium acetate buffer (250 mM, pH 7.5) using PD10 desalting columns (GE Healthcare). Hence, most assays were performed with sample in ammonium acetate buffer instead of storage buffer.

Sample characterization
Infectivity. Virus titers were measured by plaque assay following purification 7. Serial 10-fold dilutions of purified virus were plated on a mat of E. coli and incubated overnight at 37°C. Plaque forming units per ml (pfu ml⁻¹) were calculated using the formula: average number of plaques / (volume plated × dilution).
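The titer formula can be illustrated with a short calculation. This is a minimal sketch, assuming that the dilution is expressed as the fraction of the original stock in the plated sample (for example 10⁻⁶), so that it belongs in the denominator. The function name and the plaque counts are hypothetical and are not taken from the experiment.

```python
from statistics import mean


def titer_pfu_per_ml(plaque_counts, volume_plated_ml, dilution):
    """Return the titer in pfu/ml from replicate plaque counts.

    Assumption: `dilution` is the fraction of the original stock present
    in the plated sample (e.g. 1e-6 for the sixth 10-fold dilution).
    """
    if volume_plated_ml <= 0:
        raise ValueError("volume_plated_ml must be positive")
    if not 0 < dilution <= 1:
        raise ValueError("dilution must be a fraction in (0, 1]")
    return mean(plaque_counts) / (volume_plated_ml * dilution)


# Hypothetical replicate counts from plates of a 1e-6 dilution, 0.1 ml each.
example_titer = titer_pfu_per_ml([42, 38, 40], volume_plated_ml=0.1, dilution=1e-6)
print(f"{example_titer:.2e} pfu/ml")
```

If the dilution is instead recorded as a dilution factor (for example 10⁶), it multiplies the plaque count per volume rather than dividing it; the two conventions give the same titer.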
Liquid phase. The size and monodispersity of the PR772 in ammonium acetate buffer were measured using Nanoparticle Tracking Analysis (NTA) (NanoSight LM10, Malvern Instruments Ltd.) (Figure 2) and Dynamic Light Scattering (DLS) (w130i, AvidNano Ltd.) (Figure 2). The sample was diluted to the required concentrations for measurement. For NTA, 10⁸ particles per ml were used to limit the number of tracks to 200. For DLS, about 10¹⁰ particles per ml were used to reach the recommended count rate (counts per s).
Gaseous phase. The size distribution in the gas phase was measured using Electrophoretic DMA (Figure 2). PR772 was aerosolized with a nano-electrospray ionization (ESI) source (TSI model 3480) and passed through an electrostatic classifier (TSI model 3480) to continuously classify particles from 10 to 500 nm. Classified particles were counted with a condensation particle counter (CPC, TSI model 3786).
Injection testing. Injection tests on PR772 were performed with a setup similar to the one subsequently used at the LCLS experiment to investigate the aerosolization characteristics of the samples and to study sample behavior during the injection procedure 8 . A glass microscope slide covered by a transparent sticky gel piece (GelPak) was positioned beneath the exit point of the aerodynamic lens (at a position similar to the interaction region with the X-ray beam at LCLS). Particle dusting spots were observed through an objective lens mounted below the glass slide. The focused particles from the injector were also collected on formvar/carbon grids (#01754F, F/C 400 mesh Cu, Ted Pella Inc.). These samples were examined by EM without any further modification.

Sample injection at LCLS
Samples were delivered into the X-ray beam using the aerodynamic lens stack system used in previous experiments 9. Purified PR772 was transferred to a volatile ammonium acetate buffer at a concentration of 10¹² particles ml⁻¹ and introduced to the injector via a GDVN 3 at a flow rate of 1-2 μl min⁻¹. The aerosol continued through a gas skimmer and a relaxation chamber, forming a fine beam of focused particles by the aerodynamic lens stack 10. The focus of the particle beam and multiplicity of the particles could be optimized by regulating the flow of sample and gas along with the skimmer pressure.
Figure 2 caption: We observed a single large peak at 75 nm with a concentration of 10⁸ particles/ml, which is similar to the dilution used for measurement. The diameter peak from the DMA measurement (c) was approximately 60 nm. The actual gas flow during the DMA measurement was lower than the set value, which resulted in a diameter low by about 10%. The gas flow was disregarded since the purpose of the DMA measurement was to assess sample purity and not size. DLS measurements of bacteriophage PR772 in ammonium acetate buffer (250 mM, pH 7.5). The curve in red shows the average of 18 correlation data sets collected over time for the sample (a). Curves in black and blue show the size of the particles based on mass (d) and intensity (e) distribution, respectively. The mean diameter of the particle was 75.99 nm with a polydispersity index (PDI) of 0.012.

Experimental setup and data collection
Coherent diffraction snapshots using single soft X-ray pulses were recorded at the AMO beamline of the LCLS XFEL using the LAMP endstation 4 . The configuration was similar to that used in previous coherent diffraction experiments 11 , but with a shorter distance from the sample interaction region to the detector due to the gate valve between detector and sample chambers being removed for the present experiment.
Measurements were performed using LCLS tuned to a photon energy of 1.6 keV, delivering 4 mJ in a 70 fs pulse at the end of the undulators. The beam was focused into a nominal 1.5 μm² FWHM region using a pair of Kirkpatrick-Baez (KB) mirrors, giving a nominal power density of 3.8 × 10¹⁸ W cm⁻² or 10¹³ photons per pulse, assuming no beamline losses. Comparison of measured and calculated scattering from known scattering objects indicates that the actual power density may differ from this estimate by a factor of 10. This is due to the combined effect of three factors: overfilling of the focusing optics, carbon contamination on the KB mirrors increasing mirror roughness (reducing reflectivity), and a less-than-perfect focal spot due to carbon contamination distorting the KB mirrors (increasing the effective focal spot size).
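The nominal figures quoted above follow directly from the pulse parameters. The short calculation below is a sketch of that arithmetic, assuming a uniform 1.5 μm² focus and no beamline losses, as the text does. The variable and function names are illustrative only.

```python
ELECTRON_VOLT_J = 1.602176634e-19  # joules per electronvolt (exact SI value)

pulse_energy_j = 4e-3       # 4 mJ at the end of the undulators
pulse_duration_s = 70e-15   # 70 fs
focus_area_cm2 = 1.5e-8     # 1.5 um^2 = 1.5e-12 m^2 = 1.5e-8 cm^2
photon_energy_ev = 1600.0   # 1.6 keV


def nominal_power_density_w_cm2(energy_j, duration_s, area_cm2):
    """Peak power (energy / duration) spread uniformly over the focus area."""
    return energy_j / duration_s / area_cm2


def photons_per_pulse(energy_j, photon_ev):
    """Pulse energy divided by the energy of a single photon."""
    return energy_j / (photon_ev * ELECTRON_VOLT_J)


power_density = nominal_power_density_w_cm2(
    pulse_energy_j, pulse_duration_s, focus_area_cm2
)
n_photons = photons_per_pulse(pulse_energy_j, photon_energy_ev)

print(f"nominal power density: {power_density:.2e} W/cm^2")
print(f"photons per pulse:     {n_photons:.2e}")
```

The result is about 3.8 × 10¹⁸ W cm⁻² and about 1.6 × 10¹³ photons per pulse, consistent with the order of magnitude quoted in the text.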
The photon energy was chosen to maximize the achievable resolution while remaining below the silicon K absorption edge at 1.8 keV. At photon energies above this edge, fluorescence from the silicon beam-conditioning apertures increases the background. A series of silicon apertures, including a post-sample aperture, was used to limit the amount of background scatter incident on the detectors from the beamline. Three 1 mm × 1 mm Si₃N₄ apertures were used to reduce low q scatter. The first two apertures were positioned laterally to the beam to form a small rectangle, with the third aperture used to clean up diffraction from the first two. The first two apertures were separated by 5 cm, with the third aperture separated by 10 cm. Additionally, the third aperture was 10 cm away from the focus. A large round aluminum post-sample aperture was used to block high q scatter. It was located 2.4 cm downstream of the sample and had a diameter of 4 cm.
Diffraction snapshots were recorded using two pairs of pnCCD detectors 12 at the LCLS pulse repetition rate of 120 Hz. One pair of pnCCD detectors was located 10 cm downstream of the interaction region, and a second pair was located further downstream, at 58.1 cm from the interaction region. The pixel size is 75 μm and each detector in a pair has 512 × 1,024 pixels, giving the full set-up two detector planes of 1,024 × 1,024 pixels each. The experimental configuration was similar to that . The data were analyzed online using Hummingbird 14 and psana 15.
Figure 3. Diffusion map analysis of all the hits, reduced to the two dominant dimensions Φ₁ and Φ₂ as described in the text. Clustering helps identify the single-particle diffraction patterns in the dataset. In addition to single hits (a) and multiple hits (b), the spherical particles (c) classified as 'water' may also contain contaminant residue from the sample and buffer. Additionally, frames with slightly larger than average background scattering (d) and frames with a defective readout (e) are also shown in the diffusion map.
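As a rough cross-check of the detector geometry given in this section, the full-period resolution at the edge of each detector plane can be estimated from d = λ / (2 sin θ), where 2θ is the scattering angle. This sketch assumes that the beam passes through the center of each 1,024 × 1,024 plane and ignores the gap between the two detector halves, any detector offsets, and the later down-sampling, so the numbers are indicative only. The function names are hypothetical.

```python
import math

HC_EV_NM = 1239.841984  # Planck constant times speed of light, in eV nm


def wavelength_nm(photon_energy_ev):
    """Photon wavelength in nm for a photon energy in eV."""
    return HC_EV_NM / photon_energy_ev


def edge_resolution_nm(photon_energy_ev, distance_mm, pixel_size_mm, pixels_to_edge):
    """Full-period resolution d = lambda / (2 sin(theta)) at a given radius.

    Assumption: the beam hits the detector plane at its center, so the
    edge lies `pixels_to_edge` pixels from the beam position.
    """
    radius_mm = pixels_to_edge * pixel_size_mm
    two_theta = math.atan2(radius_mm, distance_mm)  # scattering angle
    return wavelength_nm(photon_energy_ev) / (2.0 * math.sin(two_theta / 2.0))


# 1.6 keV photons, 75 um pixels, 512 pixels from center to edge.
front_edge_nm = edge_resolution_nm(1600.0, 100.0, 0.075, 512)  # plane at 10 cm
back_edge_nm = edge_resolution_nm(1600.0, 581.0, 0.075, 512)   # plane at 58.1 cm

print(f"front plane edge: {front_edge_nm:.1f} nm")
print(f"back plane edge:  {back_edge_nm:.1f} nm")
```

Under these assumptions the back plane covers resolutions down to roughly 12 nm at its edge and the front plane extends to roughly 2 nm, which indicates how the two planes divide the low and high scattering angles.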

Data processing
Data saved in the native XTC format used at the LCLS were analyzed and converted to the CXI file format using the LCLS data analysis framework psana 15. The raw pnCCD pixels contain analog to digital units (ADUs), to which various corrections must be made in order to obtain photon counts. As a first step, dark calibration and row-by-row common mode correction were performed on the pnCCD detector images by the LCLS software environment, psana. Data were calibrated using psana's ImgAlgos.NDArrCalib module, with pedestal subtraction (do_peds), common-mode correction (do_cmod), statistical correction (do_stat) and gain corrections (do_gain) turned on. A flat-field data set using silicon K edge fluorescence (1.7 keV) was used to calibrate the gain on a per-pixel basis. These gain-corrected ADU values were then thresholded based on an average of 128 ADUs/photon in order to obtain photon counts. Not all detector frames (or events) contained diffraction from single particles. Since the average number of particles per X-ray focus volume is much less than one, most of the pulses do not hit any particles. Hits (events in which a particle was intercepted by the X-ray beam) were identified using a chi-squared metric adapted from ref. 16.
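The final conversion from gain-corrected ADUs to photon counts can be sketched as follows. The text gives only the average gain of 128 ADUs per photon; the rounding rule used here (round to the nearest whole photon, so that anything below half a photon counts as zero) is an assumption made for illustration and is not necessarily the thresholding applied to the deposited data. The function name and the sample values are hypothetical.

```python
ADU_PER_PHOTON = 128.0  # average gain quoted in the text


def adu_to_photons(adu_value, adu_per_photon=ADU_PER_PHOTON):
    """Convert one gain-corrected ADU value to a whole number of photons.

    Assumption: round to the nearest photon, which places the detection
    threshold at half a photon (64 ADU for the default gain).
    """
    if adu_value <= 0:
        return 0
    return int(adu_value / adu_per_photon + 0.5)


# Hypothetical gain-corrected pixel values from one detector row.
adu_row = [-3.2, 40.0, 70.0, 128.0, 250.0, 395.0]
photon_row = [adu_to_photons(value) for value in adu_row]
print(photon_row)  # [0, 0, 1, 1, 2, 3]
```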
Chi-squared for the jth image I_j was calculated by subtracting a running median background B and normalizing by the variance of B. The chi-squared value was calculated within an annular area between radii of 150 and 400 pixels from the beam center, and hits were defined as frames with a chi-squared value above 10. Hits were saved to file with detector corrections applied (i.e., pedestal, common mode and gain corrected) and then down-sampled by a factor of 4.
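The hit-finding criterion can be sketched in a few lines. The version below is one plausible reading of the description, not the implementation used for the deposited data: it assumes that B is a per-pixel median over a buffer of recent frames, that the variance is taken per pixel over the same buffer, and that chi-squared is averaged over the pixels in the annulus. The toy frame is 9 × 9 pixels with an annulus of radii 2 to 4, standing in for the 150 to 400 pixel annulus; all names and numbers are illustrative.

```python
import math
import random
from collections import deque
from statistics import median, pvariance

HIT_THRESHOLD = 10.0  # chi-squared cut quoted in the text


def annulus_indices(shape, center, r_min, r_max):
    """Flat indices of pixels whose distance from `center` is in [r_min, r_max]."""
    rows, cols = shape
    cy, cx = center
    return [r * cols + c
            for r in range(rows) for c in range(cols)
            if r_min <= math.hypot(r - cy, c - cx) <= r_max]


def background_model(buffer):
    """Per-pixel median and variance over the buffered frames (assumed form of B)."""
    per_pixel = list(zip(*buffer))
    return [median(px) for px in per_pixel], [pvariance(px) for px in per_pixel]


def chi_squared(frame, background, variance, indices, floor=1e-12):
    """Mean over the annulus of (I - B)^2 / var(B); `floor` avoids division by zero."""
    total = sum((frame[i] - background[i]) ** 2 / max(variance[i], floor)
                for i in indices)
    return total / len(indices)


# Synthetic demonstration with a fixed seed.
rng = random.Random(0)
shape, center = (9, 9), (4, 4)
n_pixels = shape[0] * shape[1]
ring = annulus_indices(shape, center, r_min=2, r_max=4)

buffer = deque(maxlen=20)  # running window of recent background frames
for _ in range(20):
    buffer.append([rng.gauss(10.0, 1.0) for _ in range(n_pixels)])
background, variance = background_model(buffer)

miss = [rng.gauss(10.0, 1.0) for _ in range(n_pixels)]  # background only
hit = list(miss)
for i in ring:
    hit[i] += 20.0  # extra scattered signal inside the annulus

chi2_miss = chi_squared(miss, background, variance, ring)
chi2_hit = chi_squared(hit, background, variance, ring)
is_hit = [chi2 > HIT_THRESHOLD for chi2 in (chi2_miss, chi2_hit)]
print(is_hit)
```

In a running scheme, frames classified as hits would normally be kept out of the background buffer so that bright events do not inflate the median or the variance.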

Data Records
The data are deposited in the Coherent X-ray Imaging Data Bank (CXIDB) 20 in the CXIDB data format, which is based on the HDF5 format (Data Citation 1). Convenient functions for accessing the CXIDB data file exist in the libspimage package for C and Python (Maia 2010), as well as in many computing environments, including Python using the h5py module and MATLAB using, e.g., the h5read function. The Owl software is convenient for visualizing data in the CXIDB format. In addition to the CXI file, the conversion script (create_dataset.py) and additional metadata files (selection.h5, psana.cfg) are provided along with usage instructions. Detector panel calibration files mapping data to real space are also provided. Configuration files for Hummingbird and psana are provided to fully document the processing performed on the deposited data.

Technical Validation
Single-particle diffraction patterns were identified by analyzing all the hits in a reduced set of dimensions using diffusion map embedding, similar to the method described in ref. 17 (Figure 3). The normalized graph Laplacian is used to calculate the likelihood of diffusion from the center of the single-particle cluster at (Φ₁, Φ₂, Φ₃) = (−0.75, 0, 0). A frame with a likelihood value of 0.725 or higher is considered to be inside the single-particle cluster. A total of 12,678 out of 14,772 images were identified as single-particle hits. An alternative manifold-based data analytical approach yielded a larger dataset consisting of 37,550 single-particle snapshots, whose indices are provided. This approach reveals and corrects intensity variations and a range of other imaging artifacts 18. Another method for data sorting and classification is based on principal component analysis (PCA) 19. With this approach we filtered 21,733 images as hits by intensity thresholding (all images above 200k photons). A total of 7,992 images were identified by the PCA technique as single hits after outliers were removed by radial intensity filtering (indices also provided). A comparison of the sums of the single-particle hits, which form pseudo small-angle X-ray scattering patterns, is shown in Figure 4 and is given per XTC run number in Table 1. Manual selection was not employed due to the large number of hits and the potential for user bias. Because identification of single-particle hits depends on the input parameters for clustering, we have deposited all data frames and identified potential single-particle hits as a list of events. This enables the testing and comparison of different hit sorting algorithms.
Raw XTC files are included in the data deposition for anyone wishing to repeat the analysis from scratch.