Small-wedge synchrotron and serial XFEL datasets for Cysteinyl leukotriene GPCRs

Structural studies of challenging targets such as G protein-coupled receptors (GPCRs) have accelerated during the last several years due to the development of new approaches, including small-wedge and serial crystallography. Here, we describe the deposition of seven datasets consisting of X-ray diffraction images acquired from lipidic cubic phase (LCP) grown microcrystals of two human GPCRs, Cysteinyl leukotriene receptors 1 and 2 (CysLT1R and CysLT2R), in complex with various antagonists. Five datasets were collected using small-wedge synchrotron crystallography (SWSX) at the European Synchrotron Radiation Facility with multiple crystals under cryo-conditions. Two datasets were collected using X-ray free electron laser (XFEL) serial femtosecond crystallography (SFX) at the Linac Coherent Light Source, with microcrystals delivered at room temperature into the beam within an LCP matrix by a viscous media microextrusion injector. All seven datasets have been deposited in the open-access databases Zenodo and CXIDB. We describe sample preparation and annotate crystallization conditions for each partial and full dataset. We also document full processing pipelines and provide wrapper scripts for SWSX and SFX data processing.

Recently, we have determined crystal structures of CysLT1R 28 (PDB codes 6RZ4, 6RZ5) and CysLT2R 29 (PDB codes 6RZ6, 6RZ7, 6RZ8, 6RZ9). Here, we present fully annotated SWSX and SFX datasets for these structures, as well as unpublished SFX data of a new crystal form of CysLT1R. The raw diffraction data, consisting of five SWSX and two SFX datasets, represent a wide range of resolutions (2.4-3.5 Å), SWSX miniset 30 wedge sizes (3-180°), and space groups (six in total). We carefully document crystallization conditions and harvesting details for each dataset, allowing one to investigate crystal non-isomorphism. Finally, we describe all data processing steps and provide supporting code and intermediate results, so that the processing of the deposited data can be reproduced.

Methods
The preparation of CysLT1R and CysLT2R samples, data collection, and processing have been described previously 28,29. Here, we provide a summary for each sample.
Construct engineering, expression, purification, and crystallization of CysLT1R and CysLT2R.
The human CysLT1R gene (UniProt ID Q9Y271) was codon-optimized for expression in Spodoptera frugiperda (Sf9) insect cell line and modified for crystallization by a C-terminal truncation at K311 and by the insertion of a fusion protein BRIL 31 (thermostabilized apocytochrome b562 from Escherichia coli with mutations M7W, H102I, and R106L) in the third intracellular loop (ICL3) between K222 and K223 using the S and SG linkers on each side, respectively (Fig. 1a). For CysLT2R, the human WT gene (UniProt ID Q9NS75) was modified by truncating amino acids 1-16 from the N-terminus and 323-346 from the C-terminus and inserting BRIL into ICL3 between residues E232 and V240. Three point mutations, W51^{1.45}V, D84^{2.50}N, and F137^{3.51}Y (superscripts refer to the generic Ballesteros-Weinstein numbering of residues in Class A GPCRs 32), were further introduced to improve receptor surface expression as well as its stability and yield (Fig. 1b).
Each gene of interest was cloned into a modified pFastBac1 plasmid, containing a cleavable influenza hemagglutinin signal sequence (HA), a Flag tag, an AKLQTM linker, a 10× His tag, and a Tobacco Etch Virus (TEV) protease site followed by a KpnI restriction site on the N-terminal side of the inserted gene (Fig. 1c). The plasmid was then transfected into Sf9 insect cells using the bac-to-bac expression system (Invitrogen). High-titer recombinant baculovirus (>3 × 10⁸ viral particles per ml) was obtained and used to infect Sf9 cells at a density of (2-3) × 10⁶ cells per ml culture and a multiplicity of infection of 5-10 in the presence of a ligand: 8 µM zafirlukast (Cayman Chemical) for CysLT1R or 3 µM BayCysLT2 (Cayman Chemical) for CysLT2R. The protein surface expression and the virus titer were measured using flow cytometry. Cells were harvested 48-50 hours post infection by centrifugation at 2,000 × g and stored at −80 °C until use.
Crystals for SWSX were grown using high-throughput nanovolume LCP crystallization. The purified and concentrated protein solution was combined with a lipid mixture (90% monoolein (Sigma), 10% cholesterol (Affymetrix)) in the ratio of 2:3 v/v and homogenized using a lipid syringe mixer until a transparent gel-like LCP formed 33. Crystallization was set up in 96-well glass sandwich LCP plates (Marienfeld), with 40 nL LCP drops and 800 nL precipitant drops, which were pipetted using an NT8-LCP robot (Formulatrix). All LCP manipulations were performed at room temperature (20-23 °C), and plates were incubated and imaged at 22 °C using an automated incubator/imager RockImager 1000 (Formulatrix).
www.nature.com/scientificdata
CysLT1R-pranlukast crystals had a needle shape (Fig. 1d) and reached their full size after 3-4 weeks; however, the best diffraction was obtained from samples incubated for 2 months. CysLT2R crystals grew to their full size within 1-3 weeks. Crystals of CysLT2R in complex with ligands 11a and 11b were shaped as elongated plates with a maximal size of up to 150 µm (Fig. 1e-g). In the case of the CysLT2R-11c complex, crystals grew as flat parallelepipeds measuring 30-50 µm along the diagonal (Fig. 1h). For the full list of crystallization conditions for crystals used in data collection, see Table 1.
Synchrotron data collection. Crystal harvesting. Crystals were harvested directly from LCP using 50-200 µm dual thickness MicroMounts or 400-700 µm MicroMesh loops (MiTeGen) with various hole sizes and flash frozen in liquid nitrogen, as described 36 .
Full sets data collection. Single-crystal datasets (for CysLT1R_6RZ4 and CysLT2R_6RZ8) were collected using the following procedure. First, the best diffracting position was found using automatic X-ray centring 37 with a microfocus beam, followed by characterization 37 and dose estimation using the BEST 38 software, and then data were collected following the strategy proposed by BEST. This yielded datasets that were over 90% complete but had a relatively low resolution (>3 Å).
Partial sets data collection. To improve resolution, SWSX partial datasets (minisets, as introduced by Basu et al. 30) were collected using an updated version of the raster-scanning approach 39. The process is illustrated in Fig. 2a. Each loop was first visually aligned and oriented with its plane perpendicular to the X-ray beam. Then, the whole loop was scanned with the beam to identify locations with diffracting crystals (shutterless mode was used on the ID29 and ID30b beamlines). Raster scans were performed using a minimal dose per image, which allowed for visual detection of diffraction spots but was less than 1% of the total dose per dataset. The grid spacing was set to approximately R√2, where R is the beam profile radius (HWHM). The resulting overlap between adjacent beam spots was introduced to improve the accuracy of locating the best diffracting positions and to maximize the grid coverage by HWHM profiles. The grid cells showing diffraction spots were ranked by the DOZOR score 37 and then manually selected for further data collection. In the case of large single crystals spanning several grid cells, minisets were collected starting from the highest ranked location and then moving to the next best location along the crystal, skipping grid cells that shared an edge with cells already used for data collection, to avoid collecting data from previously exposed parts of the crystal. Consecutive minisets from the same crystal were collected with 1-2° overlaps in the goniometer rotation angle ω. When the goniometer rotation angle deviated by more than 10° from its original orientation, a new line raster scan was performed to re-align the crystal with the beam. Each miniset was collected with the estimated dose per diffraction location limited to ∼20 MGy, using 0.1-0.2° oscillation and 3-20° total wedge size.
The wedge size and the corresponding exposure time were selected based on the total number of crystals harvested from the particular condition. They were adjusted by decreasing the wedge size and increasing the exposure time when preliminary data processing indicated that a complete dataset had already been collected, or in the case of weak diffraction. The beam size was chosen to match the smallest crystal dimension. A summary of miniset parameters for each SWSX entry is given in Table 2.
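The edge-skipping rule described above for large crystals can be illustrated with a minimal Python sketch. It is not the procedure used at the beamline, where grid cells were selected manually; it only shows, under the assumption that each grid cell is identified by hypothetical (row, column) indices and carries a DOZOR-like score, how a ranked list of cells can be traversed while skipping cells that share an edge with cells already used. The function name and the `min_score` cutoff are introduced here for illustration only.

```python
def select_cells(scores, min_score=0.0):
    """Pick grid cells in order of decreasing score, skipping edge neighbours.

    scores: dict mapping (row, col) -> diffraction score (DOZOR-like; assumed).
    min_score: cells with a score at or below this value are ignored (assumed cutoff).
    Returns the list of picked cells, best first.
    """
    picked = []
    # Rank all cells from the best to the worst score.
    for cell in sorted(scores, key=scores.get, reverse=True):
        if scores[cell] <= min_score:
            break  # all remaining cells are at or below the cutoff
        row, col = cell
        # The four cells that share an edge with the candidate cell.
        edge_neighbours = {
            (row - 1, col), (row + 1, col), (row, col - 1), (row, col + 1),
        }
        # Use the cell only if none of its edge neighbours was already exposed.
        if edge_neighbours.isdisjoint(picked):
            picked.append(cell)
    return picked


if __name__ == "__main__":
    # Toy raster scan: a crystal spanning the top row plus one cell below it.
    toy_scores = {(0, 0): 5.0, (0, 1): 4.0, (0, 2): 3.0, (1, 1): 2.0, (2, 2): 0.0}
    print(select_cells(toy_scores))
```

Cells that touch only at a corner are still allowed in this sketch, since the text excludes only cells with a common edge.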

XFEL data collection. Loading crystals into injector. Precipitant solutions were slowly withdrawn from 3 syringes containing microcrystals of appropriate size and density through a 22s gauge Hamilton needle. The remaining samples of LCP with embedded microcrystals were consolidated from these 3 syringes into one syringe using a syringe coupler (Formulatrix). An aliquot of ~10% of 7.9 MAG lipid was added to the sample to absorb the excess precipitant and to avoid LCP freezing upon extrusion in the vacuum chamber 40. A total sample volume of 15-20 µl was loaded into an LCP injector as described 40.
LCLS data collection: 2016. An overall scheme of the data collection setup is shown in Fig. 2b. SFX data of CysLT1R_Zafirlukast-P21 were collected in August 2016 at the CXI instrument of the Linac Coherent Light Source (LCLS) at the SLAC National Accelerator Laboratory, Menlo Park, California. LCLS was operated at a wavelength of 1.305 Å (9.50 keV), delivering individual X-ray pulses of 40 fs duration and 2.6 × 10¹⁰ photons per pulse focused into a spot of ~1.5 µm in diameter using a pair of Kirkpatrick-Baez mirrors. LCP with protein microcrystals was extruded at room temperature and at a flow rate of 0.3 μl min⁻¹ inside a vacuum chamber into the beam focus region using an LCP injector 40 with a 50-μm diameter capillary. The XFEL beam was attenuated to a transmission level of 6.1% to avoid disruptions of the LCP stream. Diffraction images were collected at an XFEL pulse repetition rate of 120 Hz using a 2.3 Megapixel Cornell-SLAC Pixel Array Detector 41 (CSPAD).
A total of 900,173 detector images were collected, of which 22,047 (2% of total) were identified as potential crystal hits with more than 15 Bragg peaks with SNR = 6.0, threshold 100 and min-pix-count 3.0 using the peakfinder8 algorithm as implemented in Cheetah 42. The overall time of data collection from a sample with a total volume of 27 μl was about 2 h 6 min.
LCLS was operated at a wavelength of 1.302 Å (9.52 keV), delivering individual X-ray pulses of 43 fs duration and 1.9 × 10¹⁰ photons per pulse focused into a spot of ~1.5 µm in diameter using a pair of Kirkpatrick-Baez mirrors. LCP with protein microcrystals was extruded at room temperature and at a flow rate of 0.3 μl min⁻¹ inside a vacuum chamber into the beam focus region using an LCP injector 40 with a 50-μm diameter capillary. The XFEL beam was attenuated to transmission levels of 6.3-10% to avoid disruptions of the LCP stream. Diffraction images were collected at an XFEL pulse repetition rate of 120 Hz using a 2.3 Megapixel Cornell-SLAC Pixel Array Detector 43 (CSPAD).
A total of 390,442 detector images were collected, of which 43,417 (11% of total) were identified as potential crystal hits with more than 20 Bragg peaks with SNR = 4.0, threshold 200 and min-pix-count 3.0 using the peakfinder8 algorithm as implemented in Cheetah 42. The overall time of data collection from a sample with a total volume of 15 μl was about 54 min.
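As a quick consistency check of the numbers reported for the two SFX runs, the hit rate and the pure acquisition time can be recomputed from the image counts. The sketch below assumes that one detector image was recorded per XFEL pulse at 120 Hz with no pauses, so the computed time is a lower bound on the actual beamtime; the function name and run labels are illustrative.

```python
def run_summary(n_images, n_hits, rep_rate_hz=120.0):
    """Return (hit rate in %, acquisition time in minutes) for one SFX run.

    Assumes one image per pulse at a constant repetition rate, i.e. no
    interruptions between runs (a simplification).
    """
    hit_rate_percent = 100.0 * n_hits / n_images
    minutes = n_images / rep_rate_hz / 60.0
    return hit_rate_percent, minutes


if __name__ == "__main__":
    runs = [
        ("2016 run (CysLT1R_Zafirlukast-P21)", 900_173, 22_047),
        ("second SFX run", 390_442, 43_417),
    ]
    for label, n_images, n_hits in runs:
        hit_rate, minutes = run_summary(n_images, n_hits)
        print(f"{label}: hit rate {hit_rate:.1f}%, about {minutes:.0f} min of exposure")
```

Under these assumptions, the computed acquisition times (about 125 min and 54 min) agree with the reported overall collection times of about 2 h 6 min and 54 min.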
Data processing. All datasets, except for the SFX dataset CysLT1R_Zafirlukast-P21 (P21 space group), have been previously indexed, integrated, sorted, and merged to solve the structures of the corresponding receptor complexes by molecular replacement, as described 28,29 . Re-processing of the data with the same or better processing statistics as in the original manuscripts is described in the Technical validation section.

Data Records
SWSX data 44-49 have been deposited to Zenodo under accession numbers provided in Table 3. Each SWSX dataset folder contains subfolders, one per collected miniset, regardless of the angular range of data collection. Each miniset subfolder is named XXX_YY_ZZ_NN, where XXX is the sequential number of the miniset, YY is the crystallization condition ID, ZZ is the serial number of the harvesting loop within each crystallization condition, and NN is the serial number of the miniset within each loop. Each miniset subfolder contains a subfolder 'images' with all diffraction images in either cbf or HDF5 format. It also contains an XDS parameter file XDS.INP with the keyword NAME_TEMPLATE_OF_DATA_FRAMES pointing to files in the 'images' subfolder, and other parameters as used during reprocessing (see keywords for express.py below). In addition, each miniset subfolder contains all XDS-related files for this miniset (including the geometry correction files x_geo_corr.cbf and y_geo_corr.cbf for cbf files): everything up to CORRECT.LP and XDS_ASCII.HKL for successfully integrated datasets, and only COLSPOT.LP for unsuccessful ones. A summary of all SWSX entries is shown in Table 2, and a summary of crystallization conditions for all SWSX entries is given in Table 1.
SFX data have been deposited to CXIDB as ID106 50 (CysLT1R_6RZ5) and ID107 51 (CysLT1R_Zafirlukast-P21). Only those images identified as crystal hits by Cheetah are included in the deposited dataset. Each SFX dataset folder contains a subfolder 'raw_data' with all runs as written by Cheetah, their respective cheetah.ini files and cxi files with images. Each SFX dataset folder also contains a file 'initial.geom' that was used during reprocessing. A summary of all SFX entries is given in Table 4.
Fig. 2 (caption, partial). The bar colour indicates the DOZOR score (from red, the best diffraction, to yellow, the worst). For minisets collected from the same crystal, as judged by the diffraction patterns and DOZOR score heatmap, an overlap of δω = 1-2° is introduced between consecutive sets. When the rotation angle ω deviates by more than ~10° from the initial orientation, as for the point 1d, an additional line scan is performed to re-align the crystal. The orientations of two different crystals 1 and 2 in the loop are assumed independent, and thus minisets from them are collected within the same ω range. (b) LCP-SFX data collection scheme. Microcrystals embedded in LCP are injected inside a vacuum chamber into the XFEL beam focus region using a viscous media microextrusion injector. A stream of sheath gas (nitrogen or helium) is used to keep the LCP stream straight. Microcrystals intersect with the XFEL beam in random orientations and diffraction patterns are collected by a CSPAD detector.
Table 3. Data availability on the Internet. The GitHub gist associated with the publication has the 'download_all.sh' script for Linux to download all data entries described in this publication.
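The XXX_YY_ZZ_NN naming scheme of the miniset subfolders described in Data Records lends itself to simple programmatic grouping, for example to collect all minisets from one harvesting loop. The following Python sketch is an illustration only: the field widths and whether the fields are numeric are not specified in the text, so each field is treated as an arbitrary underscore-free string, and the example folder names are made up.

```python
import re
from collections import defaultdict
from typing import NamedTuple


class MinisetName(NamedTuple):
    miniset: str        # XXX: sequential number of the miniset
    condition: str      # YY: crystallization condition ID
    loop: str           # ZZ: harvesting loop within the condition
    index_in_loop: str  # NN: miniset number within the loop


# Four underscore-separated fields; field widths are an assumption-free guess.
_NAME_RE = re.compile(r"^([^_]+)_([^_]+)_([^_]+)_([^_]+)$")


def parse_miniset_name(name):
    """Split a miniset folder name of the form XXX_YY_ZZ_NN into its fields."""
    match = _NAME_RE.match(name)
    if match is None:
        raise ValueError(f"not a XXX_YY_ZZ_NN miniset name: {name!r}")
    return MinisetName(*match.groups())


def group_by_loop(names):
    """Map (condition, loop) -> list of miniset folder names from that loop."""
    groups = defaultdict(list)
    for name in names:
        parsed = parse_miniset_name(name)
        groups[(parsed.condition, parsed.loop)].append(name)
    return dict(groups)


if __name__ == "__main__":
    # Hypothetical folder names, used only to exercise the parser.
    folders = ["001_03_02_01", "002_03_02_02", "003_05_01_01"]
    for key, members in group_by_loop(folders).items():
        print(key, members)
```

Grouping by (condition, loop) mirrors the annotation in Table 1 and can serve as a starting point for investigating non-isomorphism between crystallization conditions.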
Technical Validation
Data processing. During the preparation of this manuscript, all data were re-processed in a consistent manner. Here we present a pipeline for data processing that results in similar or better resolution values and figures of merit compared to those reported in the original papers. Data processing statistics for all datasets are shown in Table 5.
SWSX data. For SWSX data, the processing algorithm works as follows (note that full datasets and minisets are treated in the same way). For each dataset, initial indexing and integration are performed by XDS within the resolution range of 40-2.5 Å using the beamline-provided XDS.INP file, without specifying the unit cell parameters or the space group (for Dectris images, the "neggia" library was used, as described at https://strucbio.biologie.uni-konstanz.de/xdswiki/index.php/Eiger). Each integration is first run with the keywords "JOB = XYCORR INIT COLSPOT IDXREF DEFPIX INTEGRATE CORRECT"; then the integration parameters are updated using the output of the CORRECT step, as described in the section "Final polishing: Re-INTEGRATEing with the correct spacegroup, refined geometry and fine-slicing of profiles" (https://strucbio.biologie.uni-konstanz.de/xdswiki/index.php/Optimisation#Re-INTEGRATEing_with_the_correct_spacegroup.2C_refined_geometry_and_fine-slicing_of_profiles), and integration is re-run using the keywords "JOB = DEFPIX INTEGRATE CORRECT". After that, the algorithm attempts to scale all obtained XDS_ASCII.HKL files using XSCALE, and runs several rounds of ΔCC1/2 rejection of non-isomorphous minisets using the xdscc12 subprogram, as described 17. For some datasets, the miniset rejection was applied in two steps: first, using the low-resolution range (e.g. 30.0-10.0 Å, ΔCC1/2 threshold 0-2), and then using the high-resolution range (e.g. 10.0-2.5 Å, ΔCC1/2 threshold 1-5). All processing parameters are summarized in Table 6. The obtained dataset is merged and used as a REFERENCE_DATA_SET during the 2nd integration attempt of all minisets (including those rejected during the previous integration attempts). If CC1/2 in the highest resolution shell exceeds 0.15, the RESOLUTION_RANGE is extended manually for the 3rd integration. Next, another round of ΔCC1/2 rejection of non-isomorphous minisets is performed, followed by merging to produce a final dataset. Improvements in the figures of merit for a dataset as a result of ΔCC1/2 rejection are shown in Fig. 3.
Table 6. SWSX data processing parameters. For CysLT2R_6RZ6 and CysLT2R_6RZ7, only full resolution range rejection was performed.
Fig. 3 Improvements in data merging statistics for SWSX datasets during the ΔCC1/2 rejection process. (a-e) Plots of redundancy, I/σ, Rmeas and CC1/2 vs resolution for each data processing iteration are shown for the following datasets: CysLT1R_6RZ4 (a), CysLT2R_6RZ6 (b), CysLT2R_6RZ7 (c), CysLT2R_6RZ8 (d), CysLT2R_6RZ9 (e). Darker curves represent later stages of ΔCC1/2 miniset rejection.
(2020) 7:388 | https://doi.org/10.1038/s41597-020-00729-2
For the CysLT2R_6RZ8 dataset (space group I4), there is an indexing ambiguity with two indexing options available for each miniset, so some minisets have to be re-indexed using the 'REIDX_ISET = 0 -1 0 0 -1 0 0 0 0 0 -1 0' keyword in XSCALE. This is done by following an iterative procedure: first, the two largest minisets are merged together using the two possible indexing options for the second set, and the indexing option resulting in the smaller Rmeas is chosen. Then, all other minisets are added one by one, using the indexing choice that produces the smaller Rmeas for the merged dataset.
For the final merged dataset, phenix.xtriage reports no significant twinning. SFX data. CysLT1R_6RZ5. Previously published data 28 were processed using CrystFEL (v. 0.6.3 + 23ea03c7).
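The iterative resolution of the indexing ambiguity described above for CysLT2R_6RZ8 can be sketched in Python on toy data. This is a simplified illustration, not the authors' procedure, which relies on XSCALE for scaling and merging: here observations are plain (hkl, intensity) pairs, no scaling or reduction to the asymmetric unit is done, the "size" of a miniset is taken as its number of observations, and the REIDX_ISET matrix is read as the operator (h, k, l) → (−k, −h, −l). All function names are hypothetical.

```python
import math
from collections import defaultdict


def reindex(observations):
    """Apply (h, k, l) -> (-k, -h, -l) to a list of ((h, k, l), intensity)."""
    return [((-k, -h, -l), intensity) for (h, k, l), intensity in observations]


def r_meas(observations):
    """Redundancy-independent merging R factor over multiply measured hkl."""
    groups = defaultdict(list)
    for hkl, intensity in observations:
        groups[hkl].append(intensity)
    numerator = denominator = 0.0
    for values in groups.values():
        n = len(values)
        if n < 2:
            continue  # singly measured reflections do not contribute
        mean = sum(values) / n
        numerator += math.sqrt(n / (n - 1)) * sum(abs(v - mean) for v in values)
        denominator += sum(values)
    return numerator / denominator if denominator else float("inf")


def resolve_ambiguity(minisets):
    """Greedy choice of indexing mode for each miniset.

    minisets: dict name -> list of ((h, k, l), intensity).
    Returns dict name -> True if the miniset should be re-indexed.
    """
    # Start from the largest miniset, kept in its original indexing.
    order = sorted(minisets, key=lambda name: len(minisets[name]), reverse=True)
    merged = list(minisets[order[0]])
    choice = {order[0]: False}
    for name in order[1:]:
        as_is = minisets[name]
        flipped = reindex(as_is)
        # Keep the indexing option that gives the smaller R_meas after merging.
        use_flipped = r_meas(merged + flipped) < r_meas(merged + as_is)
        merged += flipped if use_flipped else as_is
        choice[name] = use_flipped
    return choice
```

In a real pipeline the merged R_meas would be taken from the XSCALE output rather than computed as above, and the resulting choices would be written as REIDX_ISET lines for the affected minisets.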

Usage Notes
Downloading data. Information on downloading the data is given in Table 3. A Linux script 'download_all.sh', which fetches all data using the curl utility, is provided in the GitHub gist associated with the publication. The folder for each entry is archived in a single tar.gz file for more convenient fetching.
Data processing assistance scripts. Here, a brief description of the scripts is given. A more detailed description can be found in the GitHub gist associated with the publication (https://gist.github.com/marinegor/96102c9b7ce87509a0832649d11ba927).
A wrapper for the indexamajig routine, which i) arranges all CrystFEL-related files into subfolders, ii) automatically assigns the date and time to each generated stream and its respective log file, iii) links the last created stream to the 'laststream' link, and iv) shuffles the input file list, so that one can quickly and reliably check the indexing rate before the indexing finishes.
6. analysis.sh A wrapper for the process_hkl, partialator, check_hkl, and compare_hkl routines, which produces an XSCALE.LP-like statistics table, counts images indexed with different indexers, produces a command-line histogram of the image resolution (for a simple estimation of the push-res parameter), and writes logs.
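The bookkeeping performed by the indexamajig wrapper described above (timestamped stream names, a 'laststream' link, and a shuffled input list) can be illustrated with a short Python sketch. It is not the deposited script and does not call indexamajig; the function name, the 'streams' subfolder, and the file naming pattern are assumptions made for illustration. Shuffling matters because, with a randomly ordered list, the indexing rate of the first processed images is a fair estimate for the whole dataset.

```python
import random
import time
from pathlib import Path


def prepare_indexing_run(file_list, workdir, seed=None):
    """Prepare files for one indexing run (illustrative sketch only).

    file_list: text file with one image (or cxi) path per line.
    workdir:   directory where run files are organised.
    Returns (shuffled_list_path, stream_path).
    """
    workdir = Path(workdir)
    stream_dir = workdir / "streams"  # hypothetical subfolder layout
    stream_dir.mkdir(parents=True, exist_ok=True)

    # Date and time make every stream and list name unique and sortable.
    stamp = time.strftime("%Y-%m-%d_%H-%M-%S")
    stream_path = stream_dir / f"{stamp}.stream"

    # Shuffle the input list so that a partially processed run is already
    # a representative sample of the whole dataset.
    lines = [ln for ln in Path(file_list).read_text().splitlines() if ln.strip()]
    random.Random(seed).shuffle(lines)
    shuffled_path = workdir / f"{stamp}.lst"
    shuffled_path.write_text("\n".join(lines) + "\n")

    # Point 'laststream' at the newest stream (the link may dangle until
    # the indexing program actually creates the stream file).
    link = workdir / "laststream"
    if link.is_symlink() or link.exists():
        link.unlink()
    link.symlink_to(stream_path)
    return shuffled_path, stream_path
```

The returned paths would then be passed to the indexing program as its input list and output stream; the exact command line is left to the deposited wrapper.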