Introduction

Solving the phase problem is an essential step in the determination of any structure by X-ray crystallography1. Several methods have been developed to approach the phase problem for biological macromolecules. If good homology models exist, phasing is often straightforward by applying molecular replacement (MR)2,3,4. With an increasing number of structural models deposited in the Protein Data Bank, the impact of MR for phasing is steadily increasing. However, for the determination of new protein folds or distant homologues, MR cannot be applied. In these cases, a de novo phasing strategy is required. In general, five techniques can be distinguished: single-wavelength anomalous diffraction (SAD)5,6,7, multi-wavelength anomalous diffraction (MAD)8,9, macromolecular ab initio phasing10, multiple isomorphous replacement (MIR)11, and single isomorphous replacement (SIR)11,12. The latter two can be combined with anomalous scattering (MIRAS, SIRAS)13,14. Nowadays the most commonly used de novo phasing method is SAD15. SAD generally requires the presence of ordered heavy atoms, which can act as anomalous scatterers. It exploits the differences in the intensity of the Bijvoet pairs caused by the imaginary f'' component of the anomalous scattering. From these differences, the positions of the anomalous scatterers can be derived and this substructure can then be used to solve the phase problem. A commonly used SAD method is based on the substitution of methionine with seleno-methionine during protein expression16,17. This method, however, requires a suitable expression system and the presence of sufficient methionines in the macromolecule. In addition, the natively present sulfur in cysteine and methionine can be exploited for SAD (S-SAD)18,19, but this approach is also limited to proteins containing a sufficient number of cysteines and methionines. Furthermore, high resolution is required as the anomalous signal of sulfur is quite small20.
These restrictions call for alternative approaches that mildly introduce anomalous scatterers which can be exploited for phasing.

Here, we investigated the introduction of an anomalous scatterer at different stages of the crystallization process, i.e. during protein purification, protein crystallization, and crystal cryo-protection. This harbours the potential for a step-wise procedure. The first and simplest procedure would be a “quick-soak” strategy, using already existing crystals21. In case this strategy does not yield the means for solving the phase problem, an anomalous scatterer could be introduced at an earlier stage, i.e. during protein purification or crystallization. This approach requires the identification of a suitable compound that is compatible with each of these stages. Commonly used components of protein buffers are sodium chloride (NaCl) and potassium chloride (KCl). These salts are also used in crystallization solutions, serving as precipitant or additive to aid crystallization22,23. Na and K are both alkali metals; the heavier members of this group are rubidium (Rb), cesium (Cs), and francium (Fr). Since elements from the same group of the periodic table usually display similar chemical properties, we chose Cs as a possible candidate. Cs is the heaviest non-radioactive member of this group and has very potent anomalous scattering properties (Fig. 1). Indeed, Cs has been utilized to overcome the phase problem in prior studies24,25, and has been proposed as a general phasing strategy in RNA crystallography26. To validate the approach, we pursued our analysis with two different proteins: first, hen egg white lysozyme (HEWL), to validate the general applicability; second, a construct of the pleckstrin homology (PH) domain of the TFIIH subunit p62 from Chaetomium thermophilum (p62 PH), encompassing the first 109 residues. The structure of this target has so far not been determined by other means.

Figure 1

Anomalous scattering of cesium plotted against energy. The plot was generated with data from ref. 27.

Results

HEWL crystallization and phasing

As a proof of concept for the feasibility of our strategy, we initiated our analysis with different approaches to HEWL crystallization. As outlined above, we introduced CsCl at different stages in the crystallization process. Starting with the substitution of standard buffer components like KCl or NaCl, we supplemented HEWL with 0.25 M CsCl, a salt concentration that is commonly used in protein buffers for crystallization. Additionally, we supplemented CsCl in the crystallization buffer and/or the cryo-protectant solution (see Table 1). In the case of HEWL, the addition of CsCl did not affect crystal growth in any of the evaluated approaches. Neither crystal morphology nor the space group was affected, indicating no major impact on protein quality or crystallization behaviour (Fig. 2). Subsequently, data sets were collected from the crystals obtained from these different approaches to compare the feasibility and success rate. All approaches led to crystals that could be phased using the anomalous signal as described in the methods section. All cesium sites that were identified during the different approaches have been numbered and are depicted in Fig. 3.

Table 1 Crystallization and cryo-protectant conditions of HEWL crystals.
Figure 2

Crystallization of HEWL. (a) HEWL crystallized in the presence of NaCl. (b) HEWL crystallized in the presence of CsCl.

Figure 3

Overview of all observed cesium sites in HEWL. Numbering is consistent with Table 5. HEWL is represented as surface, cesium ions as spheres. (a) Front view. (b) Side view left. (c) Back view. (d) Side view right.

The cesium substructure, after supplementing with CsCl at the different stages towards crystallization, is depicted in Fig. 4. The substructure of CsCl only present in the protein buffer is not shown, as no bound cesium ions could be observed. The data collection and refinement statistics and the statistics for the different steps in the structure solution process provided by the Crank2 pipeline are shown in Tables 2, 3 and 4, respectively. The occurrence and occupancies of the cesium sites for all HEWL datasets are summarized in Table 5.

Figure 4

Cesium substructure of HEWL after supplementing with CsCl. HEWL is represented as grey surface, cesium ions are represented as spheres. Orange meshes display the anomalous density contoured at 3 σ. (a) Dataset #2: Protein dissolved in a solution containing 0.25 M CsCl and supplemented with 0.25 M CsCl in the cryo-protectant solution. (b) Dataset #3: Protein dissolved in H2O and supplemented with 1.71 M CsCl in the cryo-protectant solution. (c) Dataset #4: Protein dissolved in a solution containing 0.25 M CsCl and supplemented with 1.71 M CsCl in the cryo-protectant solution.

Table 2 Data collection and refinement statistics of HEWL datasets.
Table 3 Phasing procedure for HEWL datasets.
Table 4 Phasing and structure solution data for HEWL.

Table 5 shows that cesium sites could only be observed when CsCl was present at least in the cryo-protectant, and not when it was used in the protein buffer alone. The anomalous signal also improves when CsCl is present in the experiment, supporting that Cs is incorporated at stable positions (Table 2). In line with this, HEWL dataset #1, in which no Cs site could be detected, shows the smallest anomalous signal as indicated by the anomalous RCR (RMS correlation ratio). However, this dataset could still be solved using the standardized approach, indicating that the sulfur signal was picked up for successful phasing. The phasing success rate and the statistics given at each step for the Crank2 pipeline (Table 4) support the observation that Cs incorporation is beneficial for phasing HEWL. We observe a clear step- and concentration-dependent effect of CsCl on the figure of merit (FOM) derived from the initial phases and after initial density modification (Tables 1 and 4), further supporting that HEWL phasing has benefited from the described procedure and that phasing statistics have improved compared to dataset #1. In addition, the number of sites shows a clear concentration-dependent effect (Table 5). When comparing datasets #3 and #4, one additional site and a higher overall occupancy sum could be observed for the latter. Both #3 and #4 are superior to dataset #2 with respect to sites and occupancy. Incorporating CsCl at high concentrations in the crystallization condition and the cryo-protectant yields a high number of sites with the highest occupancy, as indicated in dataset #5. In summary, our data indicate that CsCl is a feasible phasing option and is easily incorporated into protein structures using different approaches. More importantly, we could observe that CsCl is readily interchangeable with commonly used salts like KCl and NaCl.
However, one caveat in this experimental approach was that the structure of HEWL could also be solved using the standardised SAD procedure for Cs phasing in the absence of Cs sites. This indicates that the anomalous signal derived from the sulfur sites also contributes to the phasing procedure for Cs, thus impairing a final judgement on the feasibility of this strategy.

Table 5 Occurrence, occupancy and B factor of cesium sites in HEWL for the different datasets.

Purification and crystallization of p62 PH in the presence of CsCl

To overcome the aforementioned problem and to allow an unbiased assessment of the de novo phasing approach, we applied our strategy to a novel protein that had not yet been characterized by X-ray crystallography. As a target, we chose a subdomain of the p62 protein from the TFIIH complex of the eukaryote Chaetomium thermophilum. We cloned the pleckstrin homology domain of p62 (p62 PH) and overexpressed it in Escherichia coli (see methods section for details). The His-tagged protein was first purified via affinity chromatography. The subsequent size exclusion chromatography (SEC) was performed either in NaCl-buffer or in CsCl-buffer to assess whether CsCl has an effect on protein quality or oligomerisation. The elution profiles are virtually identical, revealing no significant effect when cesium was utilized instead of sodium in this step of the purification process (Fig. 5). Furthermore, the presence of CsCl in the crystallization solution did not impact crystallization, as depicted in Fig. 6. Taken together, these results further support that CsCl may be highly compatible with purification and crystallization of macromolecules. The different approaches are summarised in Table 6. The final data collection and refinement statistics are provided in Table 7.

Figure 5

SEC elution profiles of the p62 PH domain in NaCl-buffer (red) and CsCl-buffer (black).

Figure 6

Crystallization of p62 PH. (a) p62 PH crystallized in KCl. (b) p62 PH crystallized in CsCl.

Table 6 Crystallization and cryo-protectant conditions of p62 PH crystals.
Table 7 Data collection and refinement statistics of p62 PH datasets.

Phasing and structure solution of p62 PH

Using the strategy described above, we were able to solve and build the complete p62 PH protein model. We succeeded with 4 of the 6 employed approaches for phasing (Tables 6, 7, 8 and 9). The same phasing strategy as for HEWL was employed to obtain comparable results. The phasing statistics for the p62 PH domain improved with the stepwise addition of CsCl, indicating incorporation of Cs that can be harnessed during the phasing procedure. This is again reflected by the anomalous signal of the datasets representing the different approaches. The first two datasets, which could not be phased experimentally, show no or only a very small anomalous signal (RCR anomalous, Table 8). With the stepwise increase in CsCl concentration in the experiment, the anomalous signal increased to values between 1.5 and 1.9, as defined by the overall anomalous RMS correlation ratio given by Aimless. However, the FOM derived from the initial phases did not permit a clear distinction regarding the success rate, since only the last two datasets, which contained the highest concentrations of CsCl in the experimental approach, showed better FOM values than the other datasets. After initial density modification, the FOMs improved and ultimately led to successful automated structure solution. Datasets #3 and #4 only led to a significant solution after the automated model building routine was employed, suggesting that the signal that can be derived from Cs was very weak but could be utilized. To obtain an overview of all the approaches, unsuccessful de novo phasing cases were phased via rigid body refinement against models from solved datasets. All cesium sites that were observed during the different approaches (Tables 6 and 10) have been numbered and are depicted in Fig. 7.
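The anomalous RMS correlation ratio referred to above can be illustrated with a short sketch. It assumes the commonly described construction, in which anomalous differences from two random half-datasets are compared and the RMS spread along the diagonal of the scatter plot is divided by the RMS spread perpendicular to it. The function name and the toy values are hypothetical, and this is not the Aimless implementation.

```python
import math

def rms_correlation_ratio(delta_half1, delta_half2):
    """Ratio of the RMS spread along the diagonal to the RMS spread across it.

    delta_half1, delta_half2: paired anomalous differences (I+ minus I-) for
    the same reflections, measured in two random half-datasets.
    """
    if len(delta_half1) != len(delta_half2) or not delta_half1:
        raise ValueError("need two non-empty lists of equal length")
    pairs = list(zip(delta_half1, delta_half2))
    # Rotate each point by 45 degrees: one axis along the diagonal, one across.
    along = [(a + b) / math.sqrt(2.0) for a, b in pairs]
    across = [(a - b) / math.sqrt(2.0) for a, b in pairs]
    rms_along = math.sqrt(sum(x * x for x in along) / len(along))
    rms_across = math.sqrt(sum(x * x for x in across) / len(across))
    if rms_across == 0.0:
        raise ValueError("half-datasets are identical; ratio is undefined")
    return rms_along / rms_across

# Toy values (hypothetical, not taken from the tables of this study).
correlated = rms_correlation_ratio([3.0, -3.0, 1.0, -1.0], [1.0, -1.0, 3.0, -3.0])
uncorrelated = rms_correlation_ratio([1.0, -1.0, 1.0, -1.0], [1.0, 1.0, -1.0, -1.0])
```

In this picture, a ratio close to 1 corresponds to anomalous differences that do not reproduce between the half-datasets, whereas larger values indicate a reproducible anomalous signal.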

Figure 7

Overview of all observed cesium sites in the p62 PH domain. Numbering is consistent with Table 10. P62 PH is represented as grey surface, cesium ions as yellow spheres. (a) Front view. (b) Side view left. (c) Back view. (d) Side view right.

The cesium substructure after supplementing with CsCl at different stages of the purification and crystallization process is depicted in Fig. 8. For treatment with CsCl only during SEC (crystal #2), a low anomalous peak was observed at site 4. Compared to p62 PH without CsCl treatment (crystal #1), the anomalous peak at this site is higher for dataset #2 (Fig. 9). Thus, this site was modelled as potassium in dataset #1 and as cesium in dataset #2 (Fig. 9) and the following datasets. However, the resolution of dataset #1 was lower than that of dataset #2 (Table 7).

Figure 8

Cesium substructure of the p62 PH domain after supplementing with CsCl at different stages of the purification and crystallization process. P62 PH is represented as grey surface, cesium ions are represented as spheres. Orange meshes display the anomalous density contoured at 3 σ. (a) Dataset #2: Protein purified in CsCl-buffer. (b) Dataset #3: Protein purified in CsCl-buffer and supplemented with 0.25 M CsCl in the cryo-protectant solution. (c) Dataset #4: Protein purified in NaCl-buffer and supplemented with 0.75 M CsCl in the cryo-protectant solution. (d) Dataset #5: Protein purified in CsCl-buffer and supplemented with 0.75 M CsCl in the cryo-protectant solution.

Figure 9

Comparison of cesium site 4 of p62 PH supplemented with CsCl only during SEC to the corresponding potassium site in p62 PH without CsCl treatment. Orange meshes display the anomalous density contoured at 3 σ. (a) Dataset #2: P62 PH supplemented with CsCl. (b) Dataset #1: P62 PH without CsCl supplement.

The occurrence and occupancies of the final individual cesium sites for all datasets are listed in Table 10, along with the overall occupancy sum and the average occupancy per site. Cesium site 1 is a particular case as it lies on a special position, i.e. a crystallographic two-fold axis (Fig. 10). In this case, the doubled occupancy is given. Crystals #3 and #4 display slightly different unit cell parameters (Table 7), which coincides with a disordered loop region in these datasets (Fig. 11a). As cesium site 3 is coordinated by this loop (Fig. 11b), this site is absent in crystals #3 and #4.
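The occupancy bookkeeping described above can be sketched as follows. The sketch assumes that a site on a two-fold axis is refined with an occupancy that is halved relative to a general position, so that doubling recovers the value that is reported. The function names and the example occupancies are hypothetical and are not values from Table 10.

```python
def reported_occupancy(refined_occupancy, on_twofold_axis=False):
    """Return the occupancy to report; sites on a two-fold axis are doubled."""
    factor = 2.0 if on_twofold_axis else 1.0
    return refined_occupancy * factor

def summarize_sites(sites):
    """Return (occupancy sum, average occupancy per site).

    sites: dict mapping site number -> (refined occupancy, on_twofold_axis).
    """
    values = [reported_occupancy(occ, special) for occ, special in sites.values()]
    total = sum(values)
    return total, total / len(values)

# Hypothetical example: site 1 on a two-fold axis, site 2 on a general position.
example_sites = {1: (0.30, True), 2: (0.50, False)}
occupancy_sum, average_occupancy = summarize_sites(example_sites)
```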

Figure 10

Cesium site 1 in p62 PH occupies a special position. (a) Cartoon representation of two p62 PH molecules related by a crystallographic two-fold axis perpendicular to the paper plane. The cesium ion located on this axis is represented as a sphere. (b) Detailed view of cesium site 1. The orange mesh displays the anomalous density contoured at 3 σ.

Figure 11

Superposition of p62 PH models from crystals with different unit cell sizes. (a) Surface and cartoon representation in grey correspond to dataset #5. Cesium sites are displayed as spheres. The cartoon in red corresponds to dataset #3. The loop region at the top is not present in the red model, due to disorder. (b) Detailed view of the loop region with cesium site 3. The orange mesh displays the anomalous density contoured at 3 σ.

The analysis of supplementing with CsCl during the purification, crystallization, or cryo-protection process (Table 10) indicates an additive effect with respect to bound ions and overall occupancy, which is in line with the observations during the automated phasing procedure employed by the Crank2 pipeline. The comparison of datasets #4 and #5 reveals three additional cesium sites and a higher overall occupancy sum for the latter. A beneficial application of this result can be observed in dataset #3, where the CsCl supplement during SEC was combined with a lowered CsCl concentration in the cryo-protection step. As expected, this approach resulted in fewer occupied sites and a lower overall occupancy, yet this procedure was still sufficiently powerful to overcome the phase problem by means of SAD (Tables 8 and 9). Importantly, in contrast to HEWL, phasing for p62 PH was only possible with the CsCl approach, whereas neither S-SAD alone nor MR was successful. For MR, an NMR model of the human PH domain was available as search model (PDB code: 1PFJ). Comparison of this search model with our p62 PH structure yielded an RMSD of about 2.4 Å, indicating significant deviations between the two models. In particular, region 110–120 of p62 PH differs from the search model. These findings strongly emphasize the benefit of our approach for a de novo phasing strategy that is highly compatible with the purification and crystallization workflow.
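For orientation, the RMSD quoted above is the root-mean-square distance between equivalent atoms of two models. The following minimal sketch assumes that the atoms have already been paired and the models superposed; the superposition step itself is not shown, and the function name and coordinates are hypothetical.

```python
import math

def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between paired, pre-superposed coordinates.

    coords_a, coords_b: lists of (x, y, z) tuples in Å, in matching order.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    squared_sum = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        squared_sum += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(squared_sum / len(coords_a))

# Hypothetical two-atom example: every atom is displaced by 2 Å along z.
model_a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
model_b = [(0.0, 0.0, 2.0), (1.0, 0.0, 2.0)]
example_rmsd = rmsd(model_a, model_b)
```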

Discussion

CsCl was introduced during all three major steps of sample treatment in crystallography: purification, crystallization, and cryo-protection. No detrimental effects during SEC (Fig. 5), crystallization (Figs. 2 and 6), or cryo-protection could be observed. Ultimately, de novo structure solution by means of SAD was successful employing our strategy for p62 PH (Tables 6 and 9), whereas the S-SAD approaches failed. Remarkably, even low incorporation, as shown for datasets #3 and #4, supports structure solution. The anomalous scattering expected from Cs compared with S at the employed wavelength (1.7712 Å) should lead to a signal for Cs that is approx. 12 times higher than for S27, thus permitting successful phasing with one Cs site that is only partially occupied in the case of p62 PH. The expected higher signal of Cs is reduced in all cases that we analysed, since the sites were only partially occupied. However, beneficial effects for phasing can still be observed due to the much higher expected signal. The high compatibility with all three steps in protein handling renders CsCl a highly versatile compound for experimental phasing and enables a flexible adjustment of heavy atom introduction, depending on the specific needs of a particular project.
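The relation between wavelength, f'' and occupancy discussed above can be made concrete with a rough sketch. It converts the employed wavelength into a photon energy and uses a widely quoted estimate of the expected Bijvoet ratio (often attributed to Hendrickson and Teeter), with the occupancy treated as a simple scale factor on f''. The function names are hypothetical, f'' is expressed relative to sulfur using the approx. 12-fold ratio stated above, and the number of protein atoms is an arbitrary placeholder that cancels in the comparison.

```python
import math

HC_KEV_ANGSTROM = 12.39842  # Planck constant times speed of light, in keV * Å

def wavelength_to_energy_kev(wavelength_angstrom):
    """Convert an X-ray wavelength in Å into a photon energy in keV."""
    return HC_KEV_ANGSTROM / wavelength_angstrom

def bijvoet_ratio(n_sites, f_double_prime, n_protein_atoms,
                  occupancy=1.0, z_eff=6.7):
    """Estimate <|dF|>/<|F|> = sqrt(2) * sqrt(N_A) * q * f'' / (sqrt(N_P) * Z_eff).

    q is the site occupancy; Z_eff = 6.7 is the usual effective number of
    electrons per non-hydrogen protein atom.
    """
    numerator = math.sqrt(2.0) * math.sqrt(n_sites) * occupancy * f_double_prime
    return numerator / (math.sqrt(n_protein_atoms) * z_eff)

energy_kev = wavelength_to_energy_kev(1.7712)

# Relative comparison only: f'' is given in units of the sulfur f'', so the
# absolute values are meaningless and only the ratio of the two is used.
N_ATOMS_PLACEHOLDER = 1000  # hypothetical, cancels in the ratio
half_occupied_cs = bijvoet_ratio(1, 12.0, N_ATOMS_PLACEHOLDER, occupancy=0.5)
fully_occupied_s = bijvoet_ratio(1, 1.0, N_ATOMS_PLACEHOLDER)
relative_signal = half_occupied_cs / fully_occupied_s
```

Under these assumptions, a single half-occupied Cs site would still contribute about six times the anomalous signal of one fully occupied sulfur atom, which is consistent with the argument that partially occupied Cs sites can remain useful for phasing.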

Due to its compatibility with purification, crystallization, and cryo-protection, CsCl can be used in various ways. First, usage in cryo-soaks (“quick-soaks”) as described for halides by Dauter et al.21 is possible, as demonstrated for p62 PH dataset #4. Cryo-soaking with cesium provides a good alternative to soaking with halides, especially when crystals suffer from halide treatment or no bound halides can be obtained due to an unfavourable surface charge of the target protein. Here, the opposite charge of cesium can be beneficial. Second, supplementing with CsCl at an early step of protein handling, i.e. during SEC, can be combined with cryo-soaks. For both proteins tested in this study, additive effects with respect to bound ions and overall occupancy could be observed. The boosted anomalous signal might be beneficial for difficult borderline cases. Third, this additive effect can be exploited to reduce the CsCl concentration in the cryo-protection step. This approach might be beneficial for proteins that can only tolerate limited amounts of CsCl, as this procedure would provide much milder soaking conditions. Fourth, if NaCl or KCl is present in the crystallization condition, co-crystallization with CsCl can be conducted. Substitution of NaCl or KCl with CsCl has been successfully pursued for HEWL and p62 PH, respectively.

We therefore suggest introducing CsCl into the workflow at the earliest possible stage, i.e. at the SEC step if applicable. It remains to be investigated whether our protocol can be applied successfully to cases where significantly larger proteins need to be phased with CsCl. However, given the strong anomalous signal provided by Cs at energies that can be readily accessed at most synchrotron beamlines and the high compatibility with current protein purification and crystallization strategies, application to larger proteins seems highly feasible.

The usage of CsCl provides an elegant, easy-to-use, and low-cost phasing strategy. No special equipment is needed and the procedure can be seamlessly integrated into the common procedure of sample treatment. CsCl is widely commercially available and much cheaper than, for example, seleno-methionine, and its potent anomalous scattering properties make cesium a very powerful agent for phasing. The phasing procedure with CsCl permits a flexible adjustment to the specific needs of a particular project and can be performed in a step-wise procedure.

Methods

Reagents, buffers, and protein preparation

The compositions of the purification buffers for the p62 PH domain are as follows (IMAC: immobilized metal affinity chromatography; SEC: size exclusion chromatography).

Lysis buffer: 50 mM CHES pH 9.0, 0.3 M NaCl, 5 mM Imidazole.

Elution buffer (IMAC): 50 mM CHES pH 9.0, 0.3 M NaCl, 0.25 M Imidazole.

NaCl-buffer (SEC): 20 mM CHES pH 9.0, 0.25 M NaCl.

CsCl-buffer (SEC): 20 mM CHES pH 9.0, 0.25 M CsCl.

The DNA sequence encoding the p62 PH domain from Chaetomium thermophilum was cloned into a pBADM-11 vector (EMBL) with an N-terminal 6 × His-tag and a TEV cleavage site. P62 PH was expressed in Arctic Express (DE3) RIL cells (Agilent). After cell harvest, the pellet was resuspended and lysed in Lysis buffer, and purified in two steps. First, IMAC was performed using Ni-TED beads (Macherey–Nagel) and bound protein was eluted with Elution buffer. Second, SEC was performed using a HiLoad 16/600 Superdex 200 pg column (Cytiva) with either NaCl-buffer or CsCl-buffer. Peak fractions were pooled and concentrated with centrifugal filter units (Merck Millipore) to 11–13 mg/ml.

HEWL was purchased as dry powder (Carl Roth) and dissolved to reach a concentration of 50 mg/ml in deionized water with 0.1 M sodium acetate pH 4.5, or 0.25 M CsCl. No further purification steps were applied.

Crystallization

Crystallization experiments were performed using the vapor diffusion method. All solutions used for crystallization were filtered through 0.2 µm filters (Sartorius Stedim Biotech) prior to use.

Crystallization of HEWL was pursued via the hanging drop method in 24-well plates (Crystalgen). 3 µl of protein solution at a concentration of 50 mg/ml was mixed with 3 µl of precipitant solution and equilibrated against 1 ml of the precipitant solution. Crystals appeared within 1 or 2 days with edge lengths mostly between 200 and 500 µm. Crystallization and cryo-protectant conditions are listed in Table 1.

Crystallization trays of p62 PH were set up via the hanging drop method in 24-well plates. 1 µl of protein solution at a concentration of 11–13 mg/ml was mixed with 1 µl of precipitant solution and equilibrated against 1 ml of the precipitant solution. Plate-like crystals appeared within 1 or 2 days with edge lengths mostly between 200 and 600 µm, and a thickness of 20–30 µm. Crystallization and cryo-protectant conditions are listed in Table 6.
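The starting concentrations in a freshly mixed drop follow from simple volume-weighted mixing. The sketch below uses the HEWL volumes and concentrations given above. It assumes, purely for illustration, that the precipitant solution contains neither protein nor CsCl, which does not hold for every condition in Table 1; the function name is hypothetical.

```python
def mixed_concentration(conc_a, vol_a, conc_b, vol_b):
    """Concentration after mixing two solutions (any consistent units)."""
    return (conc_a * vol_a + conc_b * vol_b) / (vol_a + vol_b)

# HEWL drop: 3 µl of protein at 50 mg/ml plus 3 µl of precipitant.
hewl_in_drop_mg_per_ml = mixed_concentration(50.0, 3.0, 0.0, 3.0)

# CsCl carried over from a protein solution containing 0.25 M CsCl,
# assuming a CsCl-free precipitant (illustrative assumption).
cscl_in_drop_molar = mixed_concentration(0.25, 3.0, 0.0, 3.0)
```

These are initial values only; during vapor diffusion the drop equilibrates against the reservoir and the concentrations change accordingly.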

Crystals were harvested with cryo-loops (Hampton Research) and flash frozen in liquid nitrogen.

Data collection and processing

Data were collected via the rotation method and datasets were indexed, integrated, and scaled with XDS28. One dataset per crystal was collected comprising a full rotation of 360°, except for crystals #3 and #4 of p62 PH. For these, two datasets (2 full rotations of 360°) from one crystal were collected, combined, and brought to a common scale with XSCALE. Data were merged with Aimless29. The HEWL data for the cesium approach were collected to resolutions similar to that of the p62 PH data to obtain more comparable data for the analysis. Data collection and processing statistics are given in Tables 2 and 7 for HEWL and p62 PH, respectively.

Structure solution and refinement

Structure solution was performed using a unified, unbiased approach applying the Crank2 pipeline30 that is part of the current CCP4 software package. We deliberately used the default workflow without any modifications, except for the number of SHELXD trials, which was raised from 2,000 to 10,000. We used the SAD pipeline that comprises the following setup: 1) substructure detection with SHELXC and SHELXD31, 2) substructure phasing using refmac532, 3) hand determination using solomon and multicomb, 4) density modification with parrot and refmac5, 5) automated model building with buccaneer33, refmac5, and parrot, and 6) model refinement using refmac5. The resolution cutoff for substructure detection that is suggested by default by SHELXC was used in all cases and ranged between 3.2 and 2.4 Å for all datasets. Phasing was performed using all data. We used 10 initial sites as the estimate for the substructure search for all datasets. The structure solution procedure for each dataset is given in Tables 3 and 8 for HEWL and p62 PH, respectively. The main indicators for the quality of each step in the phasing procedure are listed in Tables 4 and 9 for HEWL and p62 PH, respectively. The structures were completed and corrected with Coot34. Structures were refined directly against the SAD data with refmac5. The substructure occupancy was refined as well. Model stereochemistry was analysed via the MolProbity server35. Refinement and model statistics are given in Tables 2 and 7 for HEWL and p62 PH, respectively. Final statistics for the Cs atoms for HEWL and p62 PH are given in Tables 5 and 10, respectively. (Tables 8, 9, 10).

Table 8 Phasing procedure for p62 PH datasets.
Table 9 Phasing and structure solution data for p62 PH.
Table 10 Occurrence, occupancy and B factors of cesium sites in p62 PH for the different datasets.

Anomalous difference maps and final ion assignment

Anomalous difference maps were generated by directly refining against the SAD data and were used as guidance for final ion placement. Anomalous peak heights of sulfur from cysteines/methionines were used as reference to distinguish cesium from other ions. Peaks clearly exceeding the sulfur peak heights were attributed to cesium. Chloride ions were placed based on the comparison with datasets without cesium. Potassium and chloride ions were distinguished by consideration of bonding distances36,37.
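The peak-height criterion described above can be sketched as a simple comparison against the sulfur reference peaks. The text does not quantify "clearly exceeding", so the margin factor below is a hypothetical parameter chosen for illustration, as are the function name and the example peak heights (in σ).

```python
def is_cesium_candidate(peak_height, sulfur_peak_heights, margin=1.5):
    """Return True if a peak clearly exceeds the strongest sulfur peak.

    margin is a hypothetical factor expressing "clearly exceeding"; it is
    not a value taken from the study.
    """
    if not sulfur_peak_heights:
        raise ValueError("at least one sulfur reference peak is required")
    reference = max(sulfur_peak_heights)
    return peak_height > margin * reference

# Hypothetical anomalous peak heights in sigma, not values from this study.
sulfur_peaks = [5.0, 6.0, 4.5]
strong_peak_is_cs = is_cesium_candidate(15.0, sulfur_peaks)
weak_peak_is_cs = is_cesium_candidate(7.0, sulfur_peaks)
```

Peaks that fail this test are not automatically assigned to another ion; as described above, chloride and potassium were assigned by comparison with cesium-free datasets and by bonding distances.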