Background & Summary

Cysteinyl leukotrienes, produced from arachidonic acid via the 5-lipooxygenase pathway, are pro-inflammatory mediators that modulate vascular permeability and immune response; hence, they are involved in multiple disorders including asthma, cardiovascular diseases and cancer1. Cysteinyl leukotrienes elicit their action through two G protein-coupled receptors (GPCRs), CysLT1R and CysLT2R, that share 38% sequence identity1. CysLT1R is mostly expressed in the lungs and immune cells, and its stimulation leads to allergic symptoms in the airways2. CysLT2R is found additionally in cardiovascular and brain tissues, with demonstrated involvement in ischemia and acute brain injuries3,4, however, the role of this receptor remains controversial and poorly understood. Both CysLT1R and CysLT2R have been implicated in progression of various cancers5,6,7,8, while the mutated form of CysLT2R with L129Q substitution has been associated with uveal melanoma9,10. Thus, CysLTRs are important pharmaceutical targets11, what inspired us to determine their high-resolution structures in complex with antiasthmatic drugs and other prospective antagonists.

Over the last few years, small-wedge synchrotron crystallography (SWSX) and serial femtosecond crystallography (SFX) have developed into powerful techniques, enabling high-resolution structure determination of many difficult to crystallize targets12,13. Several approaches to data processing have been developed for both SFX13 and SWSX14,15,16,17,18,19,20, and several papers reported deposition of raw serial crystallography data for challenging targets21,22,23,24,25. Many datasets can be found online on SBGrid (data.sbgrid.org)26, Zenodo (zenodo.org) or CXIDB (cxidb.org)27; the latter is used for SFX and other XFEL-related data deposition, whereas SBGrid and Zenodo host SWSX among other types of data.

Recently, we have determined crystal structures of CysLT1R28 (PDB codes 6RZ4, 6RZ5) and CysLT2R29 (PDB codes 6RZ6, 6RZ7, 6RZ8, 6RZ9). Here, we present fully-annotated SWSX and SFX datasets for these structures, as well as unpublished SFX data of a new crystal form of CysLT1R. The raw diffraction data, consisting of five SWSX and two SFX datasets, represent a wide range of resolutions (2.4–3.5 Å), SWSX miniset30 wedge sizes (3–180°), and space groups (6 different space groups). We carefully document crystallization conditions and harvesting details for each dataset, allowing one to investigate crystal non-isomorphism. Finally, we describe all data processing steps, provide supporting code and intermediate results, aiming for reproducibility of deposited data processing.

Methods

The preparation of CysLT1R and CysLT2R samples, data collection, and processing have been described previously28,29. Here, we provide a summary for each sample.

Construct engineering, expression, purification, and crystallization of CysLT1R and CysLT2R

The human CysLT1R gene (UniProt ID Q9Y271) was codon-optimized for expression in Spodoptera frugiperda (Sf9) insect cell line and modified for crystallization by a C-terminal truncation at K311 and by the insertion of a fusion protein BRIL31 (thermostabilized apocytochrome b562 from Escherichia coli with mutations M7W, H102I, and R106L) in the third intracellular loop (ICL3) between K222 and K223 using the S and SG linkers on each side, respectively (Fig. 1a). For CysLT2R, the human WT gene (UniProt ID Q9NS75) was modified by truncating amino acids 1–16 from the N-terminus and 323–346 from the C-terminus and inserting BRIL into ICL3 between residues E232 and V240. Three point mutations, W511.45V, D842.50N, and F1373.51Y (superscripts refer to the generic Ballesteros-Weinstein numbering of residues in Class A GPCR32), were further introduced to improve receptor surface expression as well as its stability and yield (Fig. 1b).

Fig. 1
figure 1

Construct design and crystallization of CysLT1R and CysLT2R. (a,b) Amino acid sequence snake-plot of the CysLT1R (a) and CysLT2R (b) crystallization constructs. Protein modifications are shown in red, red background shows stabilising point mutations, red font amino acids – linkers for BRIL insertion. Initial figure drawn using GPCRdb49. The N-terminal peptide fragment (a) was added to both constructs. (ch) microphotographs of typical crystals grown in lipidic cubic phase (LCP) for CysLT1R-zafirlukast (c), CysLT1R-pranlukast (d), and CysLT2R with antagonists: 11a (C2221 space group) (e), 11a (F222 space group) (f), 11b (g), and 11c (h) complexes51.

Each gene of interest was cloned into a modified pFastBac1 plasmid, containing a cleavable influenza hemagglutinin signal sequence (HA), a Flag tag, a AKLQTM linker, a 10 × His tag, and a Tobacco Etch Virus (TEV) protease site followed by KpnI restriction site on the N-terminal side of the inserted gene (Fig. 1c). The plasmid was then transfected into Sf9 insect cells using the bac-to-bac expression system (Invitrogen). High-titer recombinant baculovirus (>3 × 108 viral particles per ml) was obtained and used to infect Sf9 cells at a density of (2-3) × 106 cells per ml culture and a multiplicity of infection of 5–10 in the presence of a ligand: 8 µM zafirlukast (Cayman Chemical) for CysLT1R or 3 µM BayCysLT2 (Cayman Chemical) for CysLT2R. The protein surface expression and the virus titer were measured using flow cytometry. Cells were harvested 48–50 hours post infection by centrifugation at 2,000 × g and stored at −80 °C until use.

Protein purification was conducted at 4 °C. For each protein-ligand complex, the relevant ligand was added during purification. Cells were thawed and lysed by repetitive homogenization with a glass douncer followed by ultracentrifugation (30 min at 220,000 × g), 2 times in hypotonic buffer (10 mM HEPES pH 7.5, 20 mM KCl, and 10 mM MgCl2) and 3 times in high osmotic buffer (10 mM HEPES pH 7.5, 20 mM KCl, 10 mM MgCl2, and 1 M NaCl) with the addition of a protease inhibitor cocktail (500 µM 4-(2-aminoethyl)benzenesulfonyl fluoride hydrochloride (Gold Biotechnology), 1 µM E-64 (Cayman Chemical), 1 µM Leupeptin (Cayman Chemical), 150 nM Aprotinin (A.G. Scientific)).

Membranes were then incubated for 30 min in 10 mM HEPES pH 7.5, 20 mM KCl, 10 mM MgCl2, 2 mg ml−1 iodacetomide, protease inhibitor cocktail, and 25 µM ligand. Then receptors were solubilized by the addition of an equal volume of solubilisation buffer (300 mM NaCl, 2% (w/v) n-dodecyl-β-D-maltopyranoside (DDM; Avanti Polar Lipids) 0.4% (w/v) cholesteryl hemisuccinate (CHS; Sigma), 10% glycerol) and incubation for 3.5 hours. After 1-hour centrifugation (650,000 × g) to remove insolubilized material, the supernatant was incubated with a TALON IMAC resin (Clontech) overnight in the presence of 10/20 mM imidazole, 100 мМ HEPES pH 7.5 for CysLT1/2R with NaCl concentration increased to 800 mM.

The resin was then washed with 10 column volumes (CV) of wash buffer I (8 mM ATP, 100 mM HEPES pH 7.5, 10 mM MgCl2, 500 mM NaCl, 15 mM imidazole, 10 μM ligand, 10% glycerol, 0.1% DDM, 0.02% CHS), then with 5 CV of wash buffer II (25 mM HEPES pH 7.5, 250/500 mM NaCl for CysLT1/2R, 30 mM imidazole, 10 μM ligand, 10% glycerol, 0.015% DDM, 0.003% CHS), then buffer was exchanged into buffer III (25 mM HEPES pH 7.5, 250/500 mM NaCl for CysLT1/2R, 10 mM imidazole, 10 μM ligand, 10% glycerol, 0.05% DDM, 0.01% CHS) and the protein-containing resin was treated with PNGase F (Sigma) for 5 hours. Resin was further washed with 5 CV of wash buffer III and eluted with (25 mM HEPES pH 7.5, 250/500 mM NaCl for CysLT1/2R, 300 mM imidazole, 10 μM ligand, 10% glycerol, 0.05% DDM, 0.01% CHS) in several fractions. After removing imidazole using a PD10 desalting column (GE Healthcare), the protein was incubated with 50 µM ligand and a His-tagged home-made TEV protease overnight to remove the N-terminal tags. Reverse IMAC was performed on the following day and the protein was concentrated up to 50–70 mg ml−1 using a 100 kDa molecular weight cut-off centrifugal concentrator (Millipore). The protein purity was checked by SDS-PAGE, and the protein yield and monodispersity were estimated by analytical size exclusion chromatography (aSEC).

Crystals for SWSX were grown using high-throughput nanovolume LCP crystallization. The purified and concentrated protein solution was combined with a lipid mixture: 90% monoolein (Sigma), 10% cholesterol (Affymetrix) in the ratio of 2:3 v/v and homogenized using a lipid syringe mixer until a transparent gel-like LCP formed33. Crystallisation was set up in 96-well glass sandwich LCP plates (Marienfeld), with 40 nL LCP drops and 800 nL precipitant drops, which were pipetted using an NT8-LCP robot (Formulatrix). All LCP manipulations were performed at room temperature (20–23 °C), and plates were incubated and imaged at 22 °C using an automated incubator/imager RockImager 1000 (Formulatrix).

CysLT1R-pranlukast crystals had a needle shape (Fig. 1d) and gained their full size after 3-4 weeks; however, the best diffraction was obtained from samples incubated for 2 months. CysLT2R crystals grew to their full size within 1–3 weeks. Crystals of CysLT2R in complex with ligands 11a and 11b had a shape of an elongated plate with a maximal size up to 150 µm (Fig. 1e–g). In case of CysLT2R-11c complex, crystals grew as flat parallelepipeds as long as 30–50 µm in diagonal (Fig. 1h). For the full list of crystallization conditions for crystals used in the data collection see Table 1.

Table 1 Summary of crystallization conditions for SWSX datasets.

Microcrystals of the CysLT1R-zafirlukast complex for SFX were grown in 100 µl gas-tight Hamilton syringes as previously described34,35. Briefly, approximately 5 µl of protein-laden LCP was transferred through a coupler (Formulatrix) into a syringe, containing 50 µl of precipitant, so that LCP extends towards the plunger as a straight filament. For experiments conducted in 2016 (dataset CysLT1R_zafirlukast-P21), zafirlukast was added at 50 μM prior to the protein concentration. Crystals grew in the following precipitant conditions: 100 mM ammonium phosphate, 31–34% v/v PEG400, 100 mM HEPES pH 7.0, 1 µM zafirlukast. For experiments conducted in 2017 (dataset CysLT1R_6RZ5), zafirlukast was added at 200 μM prior to the protein concentration. Crystals grew in the following precipitant conditions: 120–200 mM sodium/potassium phosphate, 31–34% v/v PEG400, 100 mM HEPES pH 7, no zafirlukast added. Crystals grew for 1-2 weeks, reaching an average crystal size of 5 × 2 × 2 µm (Fig. 1c).

Synchrotron data collection

Crystal harvesting

Crystals were harvested directly from LCP using 50–200 µm dual thickness MicroMounts or 400–700 µm MicroMesh loops (MiTeGen) with various hole sizes and flash frozen in liquid nitrogen, as described36.

Full sets data collection

Single-crystal datasets (for CysLT1R_6RZ4 and CysLT2R_6RZ8) were collected using the following procedure. First, the best diffracting position was found using automatic X-ray centring37 with a microfocus beam, followed by characterization37 and dose estimation using BEST38 software, and further data collection as proposed by BEST. This resulted in over 90% complete datasets, however, with a relatively low resolution (>3 Å).

Partial sets data collection

To improve resolution, SWSX partial datasets (minisets, as introduced by Basu et al.30) were collected using an updated version of the raster-scanning approach39. The process is illustrated in Fig. 2a. Each loop was first visually aligned and oriented with its plane perpendicular to the X-ray beam. Then, the whole loop was scanned with the beam to identify locations with diffracting crystals (shutterless mode was used on the ID29 and ID30b beamlines). Raster scans were performed using a minimal dose per image, which allowed for visual detection of diffraction spots, but was less than 1% of the total dose per dataset. The grid spacing was set around \(\sqrt{{\bf{2}}}{\boldsymbol{R}}\), where R is the beam profile radius (HWHM). The overlap between adjacent beam spots was introduced to improve accuracy in location of the best diffracting positions and to maximize the grid coverage by HWHM profiles. The grid cells showing diffraction spots were ranked by the DOZOR score37 and then manually selected for further data collection. In the case of large single crystals spanning through several grid cells, minisets were collected starting from the highest ranked location and then moving to the next best location along the crystal but skipping grid cells if they had a common edge with the cells already used for data collection to avoid collecting data from previously exposed parts of the crystal. Consecutive minisets from the same crystal were collected by ensuring 1-2° overlaps in the goniometer rotation ω angle. When the goniometer rotation angle exceeded 10° from its original orientation, a new line raster scan was performed to re-align the crystal with the beam. Each miniset was collected restricting an estimated dose per diffraction location within 20 MGy and using 0.1–0.2° oscillation and 3–20° total wedge size. The wedge size and the corresponding exposure time were selected based on the total number of harvested crystals from the particular condition and were adjusted by decreasing the wedge size and increasing the exposure time when preliminary data processing indicated that a complete dataset had been already collected, or in case of a weak diffraction. The beam size was chosen to match the smallest crystal dimension. A summary of miniset parameters for each SWSX entry is given in Table 2.

Fig. 2
figure 2

Synchrotron and XFEL data collection setups. (a) Schematics of the SWSX data collection process. The bar colour indicates the DOZOR score (from red – the best diffraction, to yellow – the worst). For minisets collected from the same crystal, as judged by the diffraction patterns and DOZOR score heatmap, an overlap of δω = 1–2° is introduced between consecutive sets. When the rotation angle ω exceeds ~10° from the initial orientation, as for the point 1d, an additional line scan is performed to re-align the crystal. The orientations of two different crystals 1 and 2 in the loop are assumed independent, and thus minisets from them are collected within the same ω range. (b) LCP-SFX data collection scheme. Microcrystals embedded in LCP are injected inside a vacuum chamber into the XFEL beam focus region using a viscous media microextrusion injector. A stream of sheath gas (nitrogen or helium) is used to keep the LCP stream straight. Microcrystals intersect with the XFEL beam in random orientations and diffraction patterns are collected by a CSPAD detector.

Table 2 Summary of SWSX datasets.

XFEL data collection

Loading crystals into injector

Precipitant solutions were slowly withdrawn from 3 syringes containing microcrystals of appropriate size and density through a 22 s gauge Hamilton needle. The remaining samples of LCP with microcrystals embedded in it were consolidated from these 3 syringes into one syringe using a syringe coupler (Formulatrix). An aliquot of ~10% of 7.9 MAG lipid was added to the sample to absorb the excess of the precipitant and to avoid LCP freezing upon extrusion in the vacuum chamber40. A total sample volume of 15–20 µl was loaded into an LCP injector as described40.

LCLS data collection: 2016

An overall scheme of the data collection setup is shown in Fig. 2b. SFX data of CysLT1R_Zafirlukast-P21 were collected in August 2016 at the CXI instrument of the Linac Coherent Light Source (LCLS) at the SLAC National Accelerator Laboratory, Menlo Park, California. LCLS was operated at a wavelength of 1.305 Å (9.50 keV) delivering individual X-ray pulses of 40 fs duration and 2.6 × 1010 photons per pulse focused into a spot size of ~1.5 µm in diameter using a pair of Kirkpatrick-Baez mirrors. LCP with protein microcrystals was extruded at room temperature and at a flow rate of 0.3 μl min−1 inside a vacuum chamber into the beam focus region using an LCP injector40 with a 50-μm diameter capillary. The XFEL beam was attenuated at transmission levels of 6.1% to avoid disruptions of the LCP stream. Diffraction images were collected at an XFEL pulse repetition rate of 120 Hz using a 2.3 Megapixel Cornell-SLAC Pixel Array Detector41 (CSPAD).

A total number of 900,173 detector images were collected, of which 22,047 (2% of total) were identified as potential crystal hits with more than 15 Bragg peaks with SNR = 6.0, threshold 100 and min-pix-count 3.0 using peakfinder8 algorithm as implemented in Cheetah42. The overall time of data collection from a sample with a total volume of 27 μl was about 2 h 6 min.

LCLS data collection: 2017

SFX data of CysLT1R_6RZ5 were collected in August 2017 at the CXI instrument. LCLS was operated at a wavelength of 1.302 Å (9.52 keV) delivering individual X-ray pulses of 43 fs duration and 1.9 × 1010 photons per pulse focused into a spot size of ~1.5 µm in diameter using a pair of Kirkpatrick-Baez mirrors. LCP with protein microcrystals was extruded at room temperature and at a flow rate of 0.3 μl min−1 inside a vacuum chamber into the beam focus region using an LCP injector40 with a 50-μm diameter capillary. The XFEL beam was attenuated at transmission levels of 6.3–10% to avoid disruptions of the LCP stream. Diffraction images were collected at an XFEL pulse repetition rate of 120 Hz using a 2.3 Megapixel Cornell-SLAC Pixel Array Detector43 (CSPAD).

A total number of 390,442 detector images were collected, of which 43,417 (11% of total) were identified as potential crystal hits with more than 20 Bragg peaks with SNR = 4.0, threshold 200 and min-pix-count 3.0 using peakfinder8 algorithm as implemented in Cheetah42. The overall time of data collection from a sample with a total volume of 15 μl was about 54 min.

Data processing

All datasets, except for the SFX dataset CysLT1R_Zafirlukast-P21 (P21 space group), have been previously indexed, integrated, sorted, and merged to solve the structures of the corresponding receptor complexes by molecular replacement, as described28,29. Re-processing of the data with the same or better processing statistics as in the original manuscripts is described in the Technical validation section.

Data Records

SWSX data44,45,46,47,48,49 have been deposited to Zenodo under accession numbers provided in Table 3. Each SWSX dataset folder contains subfolders, representing each miniset collected, regardless of the angular range for data collection. Each miniset subfolder is named as XXX_YY_ZZ_NN, where XXX is the sequential number of the miniset, YY is the crystallization condition ID, ZZ is the serial number of the harvesting loop within each crystallization condition, NN – the serial number of the miniset within each loop. Each miniset subfolder contains a subfolder ‘images‘ with all diffraction images in either cbf or HDF5 format. It also contains an XDS parameter file XDS.INP with the keyword NAME_TEMPLATE_OF_DATA_FRAMES pointing to files in ‘images‘ subfolder, and other parameters as used during reprocessing (see keywords for express.py below). Also, each miniset subfolder contains all XDS-related files (including geometry correction x_geo_corr.cbf and y_geo_corr.cbf for cbf files) for this miniset (everything up to CORRECT.LP and XDS_ASCII.HKL for successfully integrated datasets, and only COLSPOT.LP for non-successful ones). A summary of all SWSX entries is shown in Table 2, and a summary of all SWSX entries crystallization conditions is present in Table 1, with a full description provided in Supplementary Table 1.

Table 3 Data availability on Internet. Github gist, associated with the publication, has the ‘download_all.sh‘ script for Linux to download all data entries described in this publication.

SFX data have been deposited to CXIDB as ID10650 (CysLT1R_6RZ5) and ID10751 (CysLT1R_Zafirlukast-P21). Only those images identified as crystal hits by Cheetah are included in the deposited dataset. Each SFX dataset folder contains a subfolder ‘raw_data’ with all runs as written by Cheetah, their respective cheetah.ini files and cxi files with images. Also, each SFX dataset folder contains a file ‘initial.geom’ that was used during reprocessing. A summary for all SFX entries is given in Table 4.

Table 4 Summary of SFX datasets.

Technical Validation

Data processing

During the preparation of this manuscript, all data were re-processed in a consistent manner. Here we present a pipeline for data processing that results in similar or better resolution values and figures of merit compared to those reported in the original papers. Data processing statistics for all datasets is shown in Table 5.

Table 5 Crystallographic data collection statistics.

SWSX data

For SWSX data, the processing algorithm works as following (note that the treatment of both full datasets and minisets is the same). For each dataset, initial indexing and integration are performed by XDS within the resolution range of 40–2.5 Å using the beamline-provided XDS.INP file, without specifying the unit cell parameters and the space group identity (for Dectris images, the “neggia” library was used, as described here (https://strucbio.biologie.uni-konstanz.de/xdswiki/index.php/Eiger). Each integration runs first with the keywords “JOB = XYCORR INIT COLSPOT IDXREF DEFPIX INTEGRATE CORRECT”, then the integration parameters are updated using the output of the CORRECT step as described in the section “Final polishing: Re-INTEGRATEing with the correct spacegroup, refined geometry and fine-slicing of profiles” (https://strucbio.biologie.uni-konstanz.de/xdswiki/index.php/Optimisation#Re-INTEGRATEing_with_the_correct_spacegroup.2C_refined_geometry_and_fine-slicing_of_profiles), and integration is re-run using the keywords “JOB = DEFPIX INTEGRATE CORRECT”. After that, the algorithm attempts to scale all obtained XDS_ASCII.HKL files using XSCALE, and runs several rounds of ΔCC1/2 rejection of non-isomorphous minisets using xdscc12 subprogram as described17 (until there are no rejected minisets at the subsequent iteration). For most datasets, the miniset rejection was applied in two steps: first, using the low-resolution range (e.g. 30.0–10.0 Å, ΔCC1/2 threshold 0–2), and then using the high-resolution range (e.g. 10.0-2.5 Å, ΔCC1/2 threshold 1–5). All processing parameters are summarized in Table 6. The obtained dataset is merged and used as a REFERENCE_DATA_SET during the 2nd integration attempt of all minisets (including those rejected during the previous integration attempts). If CC1/2 in the highest resolution shell exceeds 0.15, the RESOLUTION_RANGE is increased manually for the 3rd integration. Next, another round of ΔCC1/2 rejection of non-isomorphous minisets is performed followed by merging to produce a final dataset. Improvements in the figures of merit for a dataset as a result of ΔCC1/2 rejection are shown in Fig. 3. For the CysLT2R_6RZ8 dataset (space group I4), there is an indexing ambiguity with two indexing options available for each miniset, thereby some minisets have to be re-indexed using ‘REIDX_ISET = 0 -1 0 0 -1 0 0 0 0 0 -1 0‘ keyword in XSCALE. This is done by following an iterative procedure: first, two largest minisets are merged together using two possible indexing options for the second set, and the indexing option resulting in a smaller Rmeas is chosen. Then, all other minisets are added one by one, using the indexing choice that producess smaller Rmeas for the merged dataset. For the final merged dataset, phenix.xtriage reports no significant twinning.

Table 6 SWSX data processing parameters. For CysLT2R_6RZ6 and CysLT2R_6RZ7, only full resolution range rejection was performed.
Fig. 3
figure 3

Improvements in data merging statistics for SWSX datasets during ΔCC1/2 rejection process. (ae) Plots of redundancy, I/σ, Rmeas and CC1/2 vs resolution for each data processing iteration are shown for the following datasets: CysLT1R_6RZ4 (a), CysLT2R_6RZ6 (b), CysLT2R_6RZ7 (c), CysLT2R_6RZ8 (d), CysLT2R_6RZ9 (e). Darker curves represent latter stages of ΔCC1/2 miniset rejection.

SFX data

CysLT1R_6RZ5. Previously published data28 were processed using CrystFEL (v. 0.6.3 + 23ea03c7). For peak finding, peakfinder8 with min-snr = 4.5, threshold = 210 was used. For indexing, the following indexers were employed: felix, dirax, asdf, taketwo, mosflm-nolatt-cell, mosflm-nocell-latt, and xds (in that order), with–multi option enabled. Data were merged using process_hkl, with push-res = 1.8 and max-adu = 14,000. For reprocessing, the same parameters in CrystFEL (v. 0.8.0) were used. The final reprocessed dataset included 28,900 indexed lattices (67% of the frames selected by Cheetah). Among indexers, felix was the most successful one, providing 16,717 indexed lattices (57.8% of all indexed lattices). Improvements of Rsplit, CC* and I/sigma are shown in Fig. 4a–d.

Fig. 4
figure 4

Improvements in SFX data processing statistics with increasing the number of crystals. (ad) Plots of I/σ (a), Rsplit (b), CC* (c) and redundancy (d) in three major resolution shells (low resolution, high resolution and overall) vs the number of crystals used for merging for the dataset CysLT1R_6RZ5. (eh) The same as panels (ad) but for the dataset CysLT1R_Zafirlukast_P21.

CysLT1R_Zafirlukast-P21

Data (previously unpublished) were processed using CrystFEL (v. 0.8.0). For peak finding, peakfinder8 with highres = 3.0, min-snr = 4.4, threshold = 20, max-res = 300 and min-res = 80 was used. For indexing, indexers dirax, taketwo, mosflm, xds, and asdf (in that order) were used. Data were merged using process_hkl, with mincc = 0.3 and push-res = 5.0. The final dataset resulted in 17,193 indexed lattices (79% of the frames selected by Cheetah). Among indexers, dirax was the most successful one, providing 14,457 indexed lattices. Improvements of Rpim, CC* and I/sigma are shown in Fig. 4e–h.

Usage Notes

Downloading data

The information about downloading data is shown in Table 3. A Linux script ‘download_all.sh‘, fetching all data using curl utility is provided on the Github gist, associated with the publication. Folder with each entry is archived in a single tar.gz file for more convenient fetching.

Data processing assistance scripts

Here, a brief description of scripts is given. Please, find a more detailed description in the github gist (https://gist.github.com/marinegor/96102c9b7ce87509a0832649d11ba927), associated with the publication.

  1. 1.

    create_xscale.inp.py — a simple script to include all existing XDS_ASCIIs to XSCALE.INP

    Given the structure of folders as in data deposited in this publication, creates an input file for express.py in the csv format

  2. 2.

    express.py — the SWSX integration pipeline

    Given a list of folders with XDS.INP and a path to the respective data sets, the script runs XDS for all data sets in the list, optionally adding UNIT_CELL_CONSTANTS, SPACE_GROUP_NUMBER, INCLUDE_RESOLUTION_RANGE, setting SPOT_RANGE same as DATA_RANGE, and setting REFERENCE_DATA_SET. Adds MAXIMUM_NUMBER_OF_PROCESSORS and MAXIMUM_NUMBER_OF_JOBS for processing on large clusters. Runs xscale_par afterwards.

  3. 3.

    xdscc.py — parsing of XDSCC.LP logfile of xdscc12 utility for rejection of minisets based on ΔCC1/2.

    Analyses the output of xdscc12 utility together with the last XSCALE.INP used, providing the list of datasets with their ΔCC1/2 values. Saves list of those which have ΔCC1/2 higher than the input threshold value.

  4. 4.

    reject.sh — iterative “until no dataset with negative ΔCC1/2 are left” dataset rejection script

    Scales XDS_ASCII.HKL files in all subfolders of the current folder. Then iteratively runs ΔCC1/2 rejection with the given resolution range and the number of cycles. Saves all intermediate XSCALE.INP-s and XSCALE.LP-s.

  5. 5.

    run_crystfel.sh

    A wrapper for the indexamajig routine, which i) arranges all CrystFEL-related files into subfolders, ii) automatically assigns the date and time for each generated stream and respective log file, iii) links the last created stream to ‘laststream‘ link, and shuffles the input file list, so that one could quickly and reliably check the indexing rate before the indexing finishes.

  6. 6.

    analysis.sh

A wrapper for process_hkl, partialator, check_hkl, and compare_hkl routines, which produces an XSCALE.LP-like statistics table, counts images indexed with different indexers, produces a command-line visible histogram of the image resolution (for a simple estimation of the push-res parameter), and writes logs.