Abstract
The breathing motions of proteins are thought to play a critical role in function. However, current techniques to study key collective motions are limited to spectroscopy and computation. We present a highresolution experimental approach based on the total scattering from protein crystals at room temperature (TS/RTMX) that captures both structure and collective motions. To reveal the scattering signal from protein motions, we present a general workflow that enables robust subtraction of lattice disorder. The workflow introduces two methods: GOODVIBES, a detailed and refinable lattice disorder model based on the rigidbody vibrations of a crystalline elastic network; and DISCOBALL, an independent method of validation that estimates the displacement covariance between proteins in the lattice in real space. Here, we demonstrate the robustness of this workflow and further demonstrate how it can be interfaced with MD simulations towards obtaining highresolution insight into functionally important protein motions.
Similar content being viewed by others
Introduction
Structural biology has seen remarkable advances in recent years with cryoelectron microscopy^{1} and structure prediction^{2,3}. However, protein crystallography remains the goto method for obtaining structures at atomic resolution^{4}. The technique is widely accessible and is still the dominant source of depositions in the Protein Data Bank. Crystallography also provides subangstrom coordinate precision and is therefore essential for benchmarking computational methods^{5}, such as simulations and structure prediction. Moreover, crystallography produces diffuse scattering^{6}, an untapped source of information on subtle protein motions that underlie processes such as allostery, catalysis, and signaling^{7,8}.
Crystal structures are often thought to represent static snapshots, but in fact, protein motions occur within the watery environment of crystals^{9}. Diffuse scattering is a direct consequence of this motion, appearing as a structured, continuous signal in the background of diffraction images^{10}. By studying the total scattering of a protein crystal (combining Bragg diffraction with diffuse scattering), crystallography has the potential to simultaneously provide a highresolution average structure and information on correlated atomic displacements^{5}. Until recently, diffuse scattering analysis was largely considered intractable, but with advances in roomtemperature data collection^{11}, the widespread availability of direct Xray detectors^{12}, and new data processing software^{13}, it has now become feasible to routinely measure highly accurate diffuse scattering maps. However, although extensive efforts have been made in understanding protein diffuse scattering^{7}, a general workflow for utilizing this information has not yet been realized. In order to fulfill the promise of diffuse scattering in structural biology, it is essential to establish robust workflows for data processing, set standards for modeldata agreement, and provide benchmark examples in the form of simulations and highquality experimental data.
Our recent study of lysozyme in the triclinic (P1) space group showed that it is possible to account for the total scattering from a crystal entirely and selfconsistently using physicallymotivated atomistic models^{13}, and thus it can serve as a potential roadmap for establishing a standard workflow. A key advance from this study was the characterization of intense halolike scattering around the Bragg peaks, which arise from correlated displacements of protein chains in different unit cells. Supercell simulations demonstrated that these correlations are longranged and consistent with phononlike lattice vibrations, where proteins fluctuate about their average positions. To isolate internal motions of proteins from these external motions, it was first necessary to explain the halo features in the diffuse scattering map, which we achieved by fitting halos with a crystalline elastic network model treating the protein chain as a rigid body. Once the contribution of lattice disorder was accounted for, we were able to show that the remaining diffuse scattering signal and the Bfactors from Bragg refinement were consistent with internal protein motions. In developing a workflow, the next step is to establish the accuracy and generality of lattice disorder models and develop tools for model validation. Ultimately, diffuse scattering analysis must provide new insight into biochemically relevant questions, and thus, it is also important that the workflow outputs data in a form that can be directly compared with atomistic modeling such as molecular dynamics (MD)^{5}.
Here, we introduce computational tools and demonstrate a robust workflow to isolate the internal motion signal from total scattering data (Fig. 1). To model lattice disorder, we present GOODVIBES, a general crystalline elastic network model and optimization routine. This method allows for multiple rigid bodies per unit cell, as found in highsymmetry space groups, and accounts for symmetry in its parameterization of the elastic network. To validate the lattice disorder model, we present an independent method called DISCOBALL, which estimates rigidbody displacement covariances for pairs of protein chains in the crystal by deconvolution of the 3D pair distribution function or 3DΔPDF (the Fourier transform of the diffuse scattering intensities). Using simulated data and experimental datasets from three lysozyme polymorphs, we show that GOODVIBES and DISCOBALL in combination can be used to accurately model lattice disorder scattering. Finally, we show that the signal from internal protein motion can be recovered and compared quantitatively with crystalline MD simulations. The demonstration of a general workflow for diffuse scattering analysis lays the groundwork towards obtaining atomistic insight into correlated protein motions from experimental data.
Results
A robust workflow to isolate internal motion signal from total scattering data
An overview of a general workflow for the analysis of total scattering from protein crystals is shown in Fig. 1. The first step begins with data collection and reduction, the procedures for which we previously demonstrated^{13}. Briefly, diffraction images are acquired from protein crystals at room temperature as well as from the background scattering using a roomtemperature macromolecular Xray crystallography (RTMX) setup. Data reduction can then be performed with the mdx software library, which we introduced previously^{13}. Measurement of the background scattering and a careful scaling procedure (such as that implemented in mdxlib) allow for a highquality reconstruction of the total scattering (TS) on an absolute scale (electron units), which includes Bragg peak intensities and a threedimensional diffuse scattering map. The timeaveraged electron density of the unit cell is then determined by conventional structure refinement of the Bragg data, along with the mean atomic coordinates and atomic displacement parameters (ADPs) or Bfactors. The ADPs represent the motion of each atom, while the diffuse scattering map contains information on how these motions are correlated.
In our previous work^{13}, we showed that correlated motions arise from two sources in protein crystals: the motion of atoms within a protein (internal motion) and deviations from the ideal arrangement of proteins in the crystal (external motion or lattice disorder). Lattice disorder tends to be longranged, and therefore it produces characteristic halos around the Bragg peaks, while correlations from internal motion are mostly shortranged and produce smoothly varying, cloudy patterns. Although the two types of signals have distinct appearances, they cannot be simply separated in reciprocal space because the halo features are remarkably broad and overlap significantly with the much weaker and more nuanced cloudy pattern. Therefore, we aimed to develop additional tools based on a physical model of lattice disorder to subtract its contribution from the diffuse scattering map and the ADPs.
First, it was necessary to derive a general physical model of lattice disorder that can be refined to fit the diffuse halos (Fig. 1, green box). Based on the success of parameterized crystalline elastic network models in the case of triclinic lysozyme^{13}, we chose to extend those techniques to arbitrary space groups with multiple rigid bodies per unit cell. We call the parameterization and refinement method GOODVIBES for General Optimization Of Diffuse halos from VIBrational Elastic network Simulations. The GOODVIBES model for lattice disorder represents proteins as rigid bodies arranged in a supercell with periodic boundary conditions (Fig. 2a). The size of the supercell is chosen to be large enough to account for the longranged correlations implied by halo features in the experimental map. Rigidbody motion is enforced by adopting a generalized coordinate system for small displacements, which is a reasonable assumption for the subtle structural fluctuations we expect. The coordinate system, which is identical to that used in translation, libration, screwaxis (TLS) refinement, assigns each rigid body six degrees of freedom representing small translations and rotations. The close contacts between proteins in the lattice are modeled using a network of intermolecular springs, whose functional form and strength can be tuned. The dynamics of the system are computed using Newtonian mechanics and equipartition of thermal energy (see Methods).
A key advantage of the GOODVIBES model is that both the displacement covariance matrix and the predicted diffuse scattering can be calculated analytically with minimal computational cost (see Methods for details). This allows for a closed refinement loop, where the parameters in the model are optimized to improve the fit between simulated and measured diffuse scattering (Fig. 2b). We demonstrated this approach previously with triclinic lysozyme^{13}. Here, we extend the modeling to unit cells with more than one protein chain with a parameterization method that accounts for space group symmetry by forcing springs related to a symmetry operator to have the same parameters.
Second, we required an independent method to test whether the model for lattice disorder is sufficiently accurate (Fig. 1, magenta box). We developed a modelfree approach, DISCOBALL, for DISplacement COVariance Between ALl Lattice neighbors. DISCOBALL was inspired by features observed in the 3DΔPDF of triclinic lysozyme^{13}. While the shortrange contributions of internal protein motions are focused near the origin of the 3DΔPDF, sharp peaks are present at the lattice nodes as a direct consequence of halo features centered at reciprocal lattice nodes in the diffuse map (the Fourier transform of a lattice is another lattice). These sharp 3DΔPDF peaks have a simple physical interpretation: they arise from correlated displacements of atoms separated by a lattice translation. The position of the peak corresponds to a vector between unit cells in the crystal, and all atoms offset by that vector contribute to the peak if their motions are correlated.
The mathematical form of the 3DΔPDF peaks can then be derived by considering the diffuse scattering arising from correlated motions of rigid bodies that are separated by unitcell vectors and then taking the Fourier transform (see Methods). For simplicity, we may assume that translational motions of unit cells are the dominant contribution as rotational motions are only likely to be correlated between nearest neighbors. This can then be generalized to the case where there are multiple rigid bodies in a unit cell that are related by space group symmetry (see Methods). The mathematical form of the 3DΔPDF peaks can be interpreted as a filtered version of the origin peak in the Patterson map (the autocorrelation of the electron density), where the filtering function depends on the displacement covariance, or jointADP, of pairs of proteins (see Methods). From a signal processing point of view, the 3DΔPDF peaks are exactly a convolution of the Patterson origin peak with the filtering function. Using the known Patterson function (e.g., from Fourier transform of the Bragg intensities), the jointADPs can be recovered by deconvolution (Fig. 2c).
To test the DISCOBALL method on a realistic dataset, we simulated the diffuse scattering from a GOODVIBES model for triclinic lysozyme, added Gaussian random noise comparable to the experimental map, and computed the 3DΔPDF. We found that DISCOBALL was able to recover both the magnitude and shape of the jointADPs (Supplementary Fig. 1). For a more quantitative test, we derived two statistical metrics for the similarity between jointADPs: the total covariance (trace of the jointADPs, Eq. (22) in Methods) and the anisotropic covariances (elements of the jointADP with isotropic part subtracted, Eq. (23) in Methods). By comparing these metrics with corresponding ground truth values, we find that DISCOBALL is also able to recover jointADP magnitudes and shapes with high fidelity (Pearson correlation of 0.999 and 0.988, for isotropic and anisotropic metrics respectively).
After lattice dynamics are adequately modeled, the final step in the workflow is to subtract its contribution and obtain only the signal from internal protein motion. Previously, a major obstacle to this approach was the potential for errors in the refined structure model to manifest in the diffuse scattering simulation, which would then produce even more significant errors in the subtracted signal^{13}. Here, we have addressed this concern by using experimental structure factor amplitudes in the GOODVIBES simulation. This approach succeeds because measurement precision is usually much better than modeldata agreement in protein crystallography^{14} (for examples studied here, R_{pim} ~1% vs. R_{work} ~10%). When investigating the accuracy of our Bragg data, we unexpectedly found that intensities were systematically suppressed in an intensity and resolutiondependent manner. Interestingly, this behavior is reminiscent of dynamical scattering (extinction), and we corrected for it using a method developed for smallmolecule crystallography (Supplementary Fig. 2). We also subtract the ADPs according to GOODVIBES for each of the atoms in the refined structure. These residual ADPs more accurately represent the internal motions of the protein.
Our workflow provides the opportunity to compare experimental data with atomistic simulations. For greatest accuracy, allatom MD is needed to reproduce both the correlations internal to the protein as well as its surrounding solvent^{7}. Allatom MD can also account for effects of crystal packing by using a supercell with periodic boundary conditions^{15}. Previously, such simulations in the crystalline state have been found to reproduce diffuse scattering data with reasonable accuracy^{16} including halos if increasingly large supercells are used^{13}. Here, we asked whether smallscale simulations, of one or a few unit cells, might be compared with the subtracted maps in order to maximize the proportion of the scattering signal from internal motion. A potential issue with such comparisons is that smallscale simulations still include external protein motion to some extent. Since GOODVIBES also involves a periodic supercell, we reasoned it could estimate the amount of external motion that ought to occur in an MD simulation of arbitrary size. This allowed us to create a target diffuse map for direct comparison with any particular crystalline MD simulation.
The procedure for creating target diffuse maps is described in Methods. Briefly, we first fit a GOODVIBES model and subtract its scattering contribution from the experimental data, as described above. The subtracted map is on the same Miller index grid as the experimental data. To create the target map, it is first necessary to interpolate the subtracted map onto the Miller index grid of the MD simulation (e.g. for a single unit cell, points with integer Miller indices). The subtracted maps are noisy, both from photon counting statistics as well as errors in the simulated intensities, and they also contain gaps at Bragg peak locations. To account for these errors and gaps during interpolation, we take advantage of the expected smoothness of diffuse scattering from internal motions. This smoothness comes about because interatomic correlations are restricted to a maximum distance defined by the protein envelope, i.e. the diffuse scattering is a bandlimited function. For a smoothing interpolant we chose an errorweighted SavitzkyGolay filter^{17} with a kernel corresponding to the reciprocal dimensions of the protein. The effect is similar to a lowpass filter, but it also handles outliers, missing data, and error propagation (see Methods for details). Finally, to produce the target map, the diffuse scattering from external motions is simulated using GOODVIBES with the MD supercell and added to the subtracted, interpolated map. The target map can then be compared directly with the MD simulation on an absolute scale.
Application of GOODVIBES and DISCOBALL to experimental datasets from lysozyme polymorphs
To test the workflow on experimental data, we applied it to TS/RTMX datasets from three lysozyme polymorphs: triclinic lysozyme reported previously^{13,18,19} and new datasets from orthorhombic (P2_{1}2_{1}2_{1}) and tetragonal (P4_{3}2_{1}2) crystals (Fig. 3a, top row). Because protein crystals are highly susceptible to radiation damage at room temperature, the total dose was distributed by collecting narrow wedges from multiple locations on each crystal and from multiple crystals when necessary to obtain a complete dataset (Fig. 3a, middle row and Supplementary Table 1). The maximum tolerable dose of ~65 kGy^{13} was set conservatively by monitoring the decay of the Wilson Bfactor as well as evidence for disulfide bond reduction in the electron density maps (Supplementary Fig. 3). Crystals from each polymorph diffracted to atomic resolution allowing anisotropic ADPs to be refined for each atom, and stereochemically highquality structures with low Rfactors were obtained (Supplementary Table 2).
Next, the maps of diffuse scattering from orthorhombic and tetragonal lysozyme were reconstructed from the diffraction images (Fig. 3a, bottom row, and Supplementary Table 3). Low crystal mosaicity and fine phi slicing of 0.1 degree allowed for sampling on a fine grid, comparable to the triclinic map reported previously^{13,19}. All three datasets contain strong halo features at the Bragg peak locations (Fig. 3b, bottom row). To model these halo features, a subset of 400 halos was selected for each dataset (blue rectangles in Fig. 3b), and GOODVIBES was used to parameterize and optimize the spring constants of a crystalline elastic network model (Fig. 3b, top row). After refinement, the diffuse map was simulated throughout reciprocal space. For each of the three polymorphs, GOODVIBES simulations show remarkable agreement with the observed diffuse patterns (Fig. 3b, middle row). The simulations reproduce the standard deviation of intensity in all resolution bins (Fig. 3c, upper axes), and the correlation coefficient in each resolution shell is close to the theoretical limit allowed by signaltonoise of the data (CC vs. CC*, Fig. 3c, lower axes).
To quantify how well the GOODVIBES models explain lattice disorder in each lysozyme polymorph, we performed a complementary DISCOBALL analysis. The diffuse maps were preprocessed to remove the isotropic scattering and fill in missing voxels (such as at Bragg peak locations), and the 3DΔPDFs were computed using a Fourier transform as described in Methods. The 3DΔPDFs displayed sharp peaks at each lattice point that become weaker with distance from the origin peak (Supplementary Fig. 4). The jointADPs between neighboring chains were obtained by deconvolution of the Patterson origin peak as described in Methods. The number of unique jointADPs that can be determined by DISCOBALL analysis depends on the number of peaks in the ASU of the 3DΔPDF, which is a function of space group symmetry, unit cell size, and reciprocal space sampling (Supplementary Table 3). The GOODVIBES model, in contrast, predicts displacement covariances for each pair of proteins in the crystal (the number is comparable for the three polymorphs). Thus, for the triclinic case, DISCOBALL can crosscheck all of the jointADPs predicted by GOODVIBES model, while for orthorhombic and tetragonal only a subset can be compared.
JointADPs from DISCOBALL and GOODVIBES analyses are compared in Fig. 4a. Overall, the total displacement covariance shows a similar decay profile for all GOODVIBES models (Fig. 4a, green symbols). A similar overall behavior is expected since the models have the same essential physics (lattice vibrations). The total covariances are negative at large distances. This can be understood as a consequence of the finite supercell size. Since the supercell’s center of mass is fixed in space, correlated motion at short distance must be canceled by anticorrelated motion at large distance. The covariances from DISCOBALL analysis show a similar overall decay (Fig. 4a, blue symbols), however the decay approaches closer to zero than the GOODVIBES predictions. In DISCOBALL analysis, the extra long distance information comes from the interpolation of the diffuse map at the Bragg peak locations. For a quantitative comparison, the different asymptotic behaviors at large distances were taken into account by fitting a straight line to the scatter plot of DISCOBALL vs. GOODVIBES total covariances (Fig. 4b). As expected, the line does not pass through the origin (it falls below \(y=x\)). The agreement between the two methods was quantified using the Pearson correlation coefficient (\(r\)) corresponding to the linear fit. Based on this quantitative analysis, the GOODVIBES models do an excellent job describing the total covariance, with \(r\) values in excess of 0.995 (Fig. 4b).
We also tested the ability of GOODVIBES models to account for the anisotropic components of the jointADPs. In all three polymorphs, the GOODVIBES models and DISCOBALL analysis show a strong correlation (Fig. 4c). Interestingly, orthorhombic and tetragonal lysozyme have similar magnitudes of the total correlation (Fig. 4b, middle vs. bottom panels), but the spread of anisotropic components for orthorhombic is approximately double that for tetragonal (Fig. 4c, middle vs. bottom panels).
To better understand how crystal mechanics give rise to such strong anisotropy in the case of the orthorhombic crystal, we overlaid a diagram of the crystal lattice with the jointADPs shown as isosurface ellipsoids (Fig. 4d). The strength and directionality of the mechanical couplings between proteins can be visualized by the size and shape of the ellipsoids (which represent the magnitude and anisotropy of their jointADPs). The couplings relative to the asymmetric unit (Fig. 4d, yellow protein) are strong in the ±a, ±b, and +c directions (Fig. 4d, double arrows). In comparison, couplings are weak in the −c direction across the continuous solvent channels running parallel to the a axis (Fig. 4d, purple regions). In addition, the jointADPs are elongated relative to the proteinprotein vector in the ±b directions, where the sequence of direct interactions is unbroken by solvent channels. Thus, the anisotropy of jointADPs can be understood as a consequence of the topology of solvent channels and crystal contacts.
After validating the model of lattice disorder, the next step is to subtract the lattice disorder component and examine the remainder. First, we examined the lattice contributions to the total ADPs. Although GOODVIBES models are refined to diffuse scattering data, they directly predict TLS matrices (Supplementary Table 3), which in turn can be converted into anisotropic ADPs for each atom (see Methods). We compared these model ADPs directly with the individual anisotropic ADPs obtained from structure refinement. For visual clarity, we grouped backbone atoms by residue and averaged their equivalent isotropic Bfactors (Eq. (6) in Methods). These perresidue Bfactors are plotted for each lysozyme polymorph in Fig. 5a. A distinct pattern of variation is observed in each polymorph (Fig. 5a, black line and points). The median backbone Bfactor for triclinic lysozyme is 8.24 Å^{2}, approximately half of the median backbone Bfactors of orthorhombic and tetragonal (17.66 and 17.74 Å^{2} respectively).
The lattice disorder ADPs obtained from GOODVIBES include both overall translational motion, which is the same for all atoms (Fig. 5a, light blue region), and rotational motion, which causes the ADP to vary depending on atomic position (Fig. 5a, dark blue region). The translational lattice motion varies between polymorphs, with equivalent Bfactors of 3.06, 8.08, and 8.84 Å^{2} for triclinic, orthorhombic, and tetragonal, respectively (Fig. 5a, light blue region). As noted above, the orthorhombic and tetragonal forms have median Bfactors that are approximately twice that of triclinic. The same trend is observed in the lattice components, suggesting that lattice disorder is largely responsible for the Bfactors of the most ordered atoms in a protein crystal.
The contribution of internal protein motion to diffuse scattering can be visualized readily in the 3DΔPDF, since the signal from shortranged correlations is concentrated near the origin. We computed the 3DΔPDF for the experimental diffuse maps and GOODVIBES simulations (Fig. 5b). In all polymorphs, variations in the measured 3DΔPDFs agree with the GOODVIBES simulations for \(d > 10\) Å, but the variations are stronger in the region \(d < 10\) Å where internal motions are expected to contribute^{13}. The residual 3DΔPDF (total minus simulation) decays rapidly with distance from the origin in a similar manner for each polymorph (Fig. 5c). These features are consistent with the hypothesis that the residual signal derives from internal protein motion.
Application of GOODVIBES to match experimental data to MD simulations
To demonstrate the final step in our workflow, a single unitcell MD simulation of tetragonal lysozyme was performed (Supplementary Fig. 5), and the diffuse scattering was calculated from the snapshots. Because the simulation had periodic boundary conditions, the diffuse scattering is defined only at the points in reciprocal space with integer Miller indices. For comparison, we reprocessed the tetragonal dataset to produce a target map, which includes external motion of the proteins that are compatible with the MD supercell (Supplementary Fig. 6a) using the method described earlier. Overall, the MD simulation and the target maps have a remarkably similar appearance (Fig. 6a). They also show quantitative agreement (Fig. 6b). Both include a strong isotropic ring of scattering at ~3 Å resolution (Fig. 6b, top panel). The MD simulation also reproduces the magnitude of scattering variations in each resolution shell, including a distinctive shoulder at ~2 Å resolution (Fig. 6b, middle panel). A comparison of MD and the target map’s internal and external components provides compelling evidence that external protein motions alone cannot explain diffuse scattering (Supplementary Fig. 6b) and furthermore, that the atomic fluctuations are captured well by MD.
Previous comparisons between MD and experiment have relied on correlation coefficients computed within resolution shells^{13,16,20}. We applied a similar approach using the target diffuse map (Fig. 6b, bottom panel). MD agrees best at low resolution (CC ~0.8 at 10 Å), and the agreement is somewhat reduced at higher resolution (CC ~0.5 at 2 Å). Although far from perfect, these are some of the best correlations reported to date for a single unitcell simulation^{7}. Despite only being a single unitcell simulation, MD is able to describe all aspects of the target map (mean, standard deviation, and precise pattern) remarkably well (Fig. 6b). Most importantly, we demonstrate a workflow for direct comparison between experimental diffuse scattering and a smallscale MD simulation. With this development, we can envision a way to experimentally validate atomistic simulations of protein motions.
Discussion
Protein motions involved in key functions, such as catalysis, are thought to occur at the subangstrom to angstrom scale^{21,22,23,24,25}. However, few experimental techniques are sensitive to the subtle intrinsic motions of proteins. Diffuse scattering from protein crystals has long been proposed as a promising approach as it contains information on the correlation of atomic displacements, i.e., how atoms move with respect to each other. Although the idea of using diffuse scattering was conceptualized decades ago^{7,26}, major bottlenecks stood in the way of realizing this as a general technique. With the recent introduction of direct Xray detectors^{12} and data collection and processing strategies^{13}, diffuse scattering data can now be measured with high accuracy. Before we can interpret the diffuse scattering signal from proteins, however, the more dominant signal arising from lattice disorder must be accounted for.
In this work, we developed a robust workflow for total scattering (TS) analysis that enables the complete characterization of correlated protein motion from RTMX experiments. We demonstrated two applications with an immediate impact on Xray crystallography. First, we are able to partition the ADPs into parts arising from internal and external motion (i.e. protein motion vs. lattice disorder). This enables a more nuanced and unambiguous interpretation of structural heterogeneity, which is needed for making wellsupported inferences about the role of dynamics in protein function. Likewise, we are able to partition the diffuse scattering into internal and external contributions. Consequently, correlated motion can now be compared between different crystalline environments, as we demonstrated with three lysozyme polymorphs, or between experiments and MD simulations of a single unit cell (Fig. 1).
It is remarkable how well the GOODVIBES lattice vibration model is able to reproduce experimental diffuse scattering features. Before this study, few examples of protein diffuse scattering of sufficient quality were available to test models of lattice disorder. Although the elastic vibrations model performed well in our previous work on triclinic lysozyme^{13}, the triclinic crystals have unusual properties, including a small unit cell, low solvent content, and low Bfactors. However, tetragonal and orthorhombic lysozyme are more representative of a typical protein crystal according to statistics from the Protein Data Bank (PDB)^{27}. Thus, the fact that elastic vibrations describe diffuse scattering from all three lysozyme polymorphs gives confidence that it can be applied generally. We note that there is nothing in our procedure that requires highresolution Bragg data, however, the insights that can be derived from Xray data depend on having high confidence in the average structure (e.g. as judged by Rfactors). High resolution is already a precondition for detailed studies of biochemical mechanisms, and therefore we expect those experimental systems to be ideal candidates for TS/RTMX.
When lattice disorder is adequately modeled, as judged using DISCOBALL, we have shown that its scattering contribution can be subtracted directly. The remaining signal from internal motion can then be interpreted using a model. Several parsimonious models have been proposed including elastic networks, normal modes, discrete ensembles, and empirical correlation functions^{7,8}. Until now, it has been difficult to evaluate these models experimentally because the signal from lattice disorder was present in the data^{28,29}. Our TS/RTMX workflow presents the possibility of directly refining a parsimonious model to the residual diffuse scattering. However, here we have chosen to highlight MD simulation as the method of choice going forward. It is the only method currently available that is able to simulate the entire diffuse signal; it excels at modeling hydration water^{30} and disordered solvent^{31} and it has the potential to explain biochemically relevant protein motions in detail^{7}. MD also adds the time dimension, which is missing from our experiments but can be corroborated by spectroscopic probes such as solidstate NMR^{32}. Here, we have laid the groundwork for greater synergy between MD simulation and total scattering by developing a procedure to produce target diffuse maps that can be compared directly with MD simulations of one or more unit cells with periodic boundary conditions.
Our application of MD simulation to the tetragonal lysozyme dataset demonstrates the potential for this approach. We found remarkable agreement on an absolute intensity scale for both the isotropic scattering component and the magnitude of scattering fluctuations, and reasonable agreement in the detailed fluctuation pattern (correlation coefficients vs. resolution of 0.5–0.8), including for features at 2 Å resolution attributed to internal motion. However, the discrepancies for diffuse maps between MD and experiment remain much larger than the experimental error. Thus, the accuracy of MD can potentially be improved using TS/RTMX data as a target^{5}. For instance, target diffuse maps could be used to derive external forces or to bias the sampling by MonteCarlo approaches. Our benchmark experimental datasets from lysozyme polymorphs will be instrumental in developing such methods.
Finally, by developing a robust TS/RTMX workflow, we have laid the groundwork to study the role of correlated motion in enzyme function. In the context of protein allostery, correlated motions are implicated in propagating a signal from one site of a protein to another. Additionally, subtle motions in the active site of enzymes are thought to affect the rates of the reactions they catalyze. To examine these and other biochemical properties of proteins, highresolution insight is needed. With the development of a general TS/RTMX workflow, we can envision that this ultimate goal is within reach.
Methods
Crystallization, Xray data collection, and structure refinement
Lyophilized hen egg white lysozyme from Gallus gallus (Sigma) was dissolved in 20 mM sodium acetate (NaOAc) pH 4.6 at a concentration of 100 mg/mL, passed through a 0.2 µm filter and subsequently diluted using the same buffer to the working concentrations (specified below). The tetragonal form was grown by hanging drop vapor diffusion (24well VDX plate, Hampton Research). Drops containing 2 µL each of 40 mg/mL protein and reservoir solution were equilibrated over a reservoir of 1.2 mL at ambient temperature. Crystals were harvested from a well containing 1.1 M NaCl and 0.1 M NaOAc pH 4.8. The orthorhombic form was grown by sitting drop vapor diffusion (Cryschem plate, Hampton Research). Drops consisted of 10 µL of 90–100 mg/mL protein and 20 µL reservoir solution. The sealed trays were incubated at 45 °C until crystals appeared, and then returned to ambient temperature. Crystals were harvested from drops equilibrated against a reservoir of 1.0–1.1 M NaCl and 0.1 M NaOAc pH 4.6.
Diffraction data were collected at the CHESS F1 beamline at ambient temperature^{13}. Crystals were held in Kapton loops (MiTeGen MicroRT) and surrounded by a PET capillary containing 10 µL of well solution in the tip to maintain hydration. The Xray beam incident on the sample had a circular crosssection of 100 µm diameter, wavelength of 0.9768 Å, bandwidth \(\varDelta E/E\sim 5\times {10}^{4}\), and flux of \(\sim 2\times {10}^{11}\) photons per second. Diffraction images were recorded continuously using a pixel array detector (Pilatus3 6M, Dectris) at 10 frames per second while rotating the crystal about the phi axis at 1 degree per second (0.1 degree per frame). To avoid excessive radiation damage, the total exposure for each spot was limited to 50 seconds (50 degrees of oscillation over 500 frames). The tetragonal dataset consisted of 4000 frames acquired from 8 separate locations on a single large crystal. The orthorhombic dataset consisted of 4000 frames from 8 separate locations from two large crystals (4 locations on each). Background images were also collected for each crystal by translating the crystal out of the Xray beam along the spindle axis, and acquiring images through the capillary at 1 frame per second while rotating at 1 degree per second (1 degree per frame).
Bragg peaks were indexed and integrated using XDS^{33} and imported into ccp4i2^{34} for further processing and structure refinement using the ccp4 suite of programs^{35}. Scaling and merging were performed using Aimless^{36}, and amplitudes were estimated using ctruncate. Initial phases were obtained by Phaser^{37} using molecular replacement with search models from PDB IDs 193L (tetragonal) and 1WTM, and the structures were iteratively built and refined in Coot^{38} and refmac^{39}. Water, ions, and alternative conformers were added where supported by the electron density. Anisotropic ADPs were added in final rounds of refinement. Stereochemical quality was assessed using MolProbity^{40} (Supplementary Table 2).
Reconstruction of diffuse scattering maps from diffraction images
The data reduction procedure was described previously for the triclinic lysozyme dataset^{13}. Here we briefly describe the procedure and modifications for the tetragonal and orthorhombic datasets. Data reduction was performed in MATLAB using the macromolecular diffuse scattering library mdxlib^{13,41}. Each pixel on the detector was mapped to a fractional Miller index in reciprocal space using refined geometric parameters from XDS (INTEGRATE.LP). Photon counts were accumulated on a threedimensional reciprocal space voxel grid. The datasets were processed in two passes as described below: first to generate a series of coarse maps for scaling, and second, to generate the fine, merged and scaled map of elastic scattering on an absolute scale (electron units per unit cell).
In the first pass, a coarse reciprocal space grid was used for computational efficiency. The grids were centered at the reciprocal lattice points (RLPs) and subdivided by an odd number of times along each reciprocal lattice direction. Odd numbered subdivisions were chosen so that each voxel is associated with an RLP (even subdivisions result in voxels midway between RLPs). The number of subdivisions in each direction was chosen to make the voxels approximately cubic (reciprocal cell dimensions are given in Supplementary Table 3). For tetragonal, a voxel size of (a*/3, b*/3, c*/5) was used for the coarse map. For orthorhombic, a voxel of (a*/5, b*/3, c*/3) was chosen. A count histogram was generated for each voxel, and a sensitive filter based on Poisson statistics was applied to detect and mask out nonsmooth features, such as Bragg peaks. After masking, the grid was rebinned to one sample per RLP. The photon count rate was converted to intensity by applying geometric corrections for polarization, solid angle, detector quantum efficiency, and air absorption. Additionally, the voxels containing Bragg peaks were integrated, background subtracted, and corrected for the geometric effects above plus the Lorentz effect.
A scaling model was refined to account for sample variations and artifacts^{13}. The scaling model contained four terms: absorption, which depends on detector position and rotation angle; scale, which depends on rotation angle; detector chip gain, which applies globally to each of the 960 CMOS ASIC chips; and offset, which is a positive (excess) isotropic scattering correction that depends on rotation angle and scattering vector magnitude (or scattering angle). The continuous parameters were represented by linear interpolation of a regular grid of control points. Regularization was used to enforce smoothness and to minimize the contribution of the offset term, which would otherwise be underdetermined. The model parameters were refined to the diffuse scattering data to minimize the leastsquares error between the observed and merged intensities of symmetryrelated voxels, plus regularization terms weighted by Lagrange multipliers. The Lagrange multipliers were adjusted manually to obtain a physically reasonable and wellconverged solution. The scaling models for orthorhombic and tetragonal lysozyme are shown in Supplementary Figs. 7 and 8. The scaling model was applied to both the Bragg and diffuse intensities, and each were merged separately. The Bragg intensities integrated using mdxlib were then replaced by the more accurate values reported by XDS/Aimless after applying an appropriate scale factor.
The merged diffuse map includes Compton (inelastic) scattering in addition to the coherent (elastic) signal of interest. To place the data on an absolute intensity scale and subtract the Compton scattering, the integrated total scattering was compared with a theoretical value calculated from the unit cell’s molecular inventory, which includes the disordered solvent. The number of waters and ions were estimated by equilibrating an MD simulation of a single unit cell, as described below, to achieve ~1 atm of pressure and to neutralize the charge. This resulted in 1890 water molecules and 32 Cl ions for orthorhombic, and 3304 water molecules and 56 Cl ions for tetragonal. For tetragonal lysozyme, the number of waters and ions differs slightly from the production simulation described below because of different assumptions about the lysozyme charge (the effect on scaling is negligible). The theoretically integrated intensity was computed for the model and data out to a maximum scattering vector of 0.7 Å^{1}. The theoretical intensity included elastic scattering factors, incoherent scattering factors, and intramolecular interference terms due to covalent bonds as described previously^{13}. A scale factor was found to bring the data into agreement with the model, and the Compton scattering was subtracted from the diffuse map.
Finally, in the second phase of data processing, we reconstructed maps of diffuse scattering on a fine grid by reintegrating the diffraction images and applying the scaling model determined from the first stage. The grids were subdivided an odd number of times per RLP, with subdivisions chosen to make the voxels approximately cubic, as described above for the coarse map, and the voxel volume comparable to the fine map reported previously for triclinic lysozyme (Supplementary Table 3). The diffraction images were reintegrated on the fine voxel grid, photon counts were corrected and then merged to the reciprocal space asymmetric unit according to the Laue group symmetry (Supplementary Table 3) using the geometric corrections and scaling model as described above. Voxels containing Bragg peaks were excluded. Finally, the data were placed on an absolute scale and the Compton scattering was subtracted. For visualization, the diffuse scattering was separated into isotropic and variational components as described previously^{13}. Missing voxels in the variational scattering map (such as at Bragg peak locations) were filled using the mean value of the six nearestneighbor voxels.
GOODVIBES model
Lattice disorder was modeled as thermallyexcited vibrations of an elastic network, where each protein was described by a rigid body and the interprotein forces were modeled as harmonic springs. For each space group of lysozyme, a network of springs was generated from the atomic coordinates excluding hydrogen and solvent. A list of external contacts between each atom in the asymmetric unit and those of symmetryrelated neighbors was computed using a KD treebased search, where contact was defined using a hard distance cutoff of 4 Å. Symmetry operators were applied in order to expand this list to the entire unit cell. The contacts were modeled as generalized springs with a harmonic restoring force that depends on the relative displacement of the endpoints (see Model Parameterization and Refinement, below).
The motion of each rigid group (indexed by \(\kappa\)) in a particular unit cell (indexed by \(l\)) was described by a set of six generalized coordinates \({{{{{{\bf{w}}}}}}}_{\kappa l}\) that describe rotational and translational degrees of freedom. Rotations are defined with respect to a fixed origin for each rigid group \({{{{{{\bf{o}}}}}}}_{\kappa }\) (here, the center of mass of the rigid group). Let \({{{{{{\bf{r}}}}}}}_{j\kappa l}\) be the position of an atom j in group κ and unit cell l relative to the unit cell’s origin \({{{{{{\bf{R}}}}}}}_{l}\). The atom’s instantaneous displacement is defined as \({{{{{{\bf{u}}}}}}}_{j\kappa l}={{{{{{\bf{r}}}}}}}_{j\kappa l}{\bar{{{{{{\bf{r}}}}}}}}_{j\kappa }\), where \(\bar{{{{{{\bf{r}}}}}}}\) is the unperturbed position. The mapping from generalized coordinates to Cartesian displacements u can be approximated as a linear operator for small rotations (as in TLS refinement)^{42} such that u_{jκl} = A_{jκ} w_{κl} where
and
The thermallyexcited vibrational motions of the crystalline elastic network were solved using the Born/VonKarman (B/VK) method^{43,44} for a periodic supercell as described previously^{13}. For notational convenience, the displacements of all \(K\) groups in a particular unit cell are described by a \(6K\)dimensional vector \({{{{{{\bf{w}}}}}}}_{l}\) formed by placing \({{{{{{\bf{w}}}}}}}_{\kappa l}\) endtoend. The B/VK method transforms the equations of motion into the form of an eigenvalue problem; the socalled dynamical matrix \({{{{{{\bf{D}}}}}}}_{({{{{{\bf{k}}}}}})}\) is computed for each wavevector k and diagonalized to give \(6K\) eigenvectors (normal modes) and corresponding eigenvalues (squared vibrational frequencies). The modes are then populated according to the classical (high temperature) equipartition theorem, resulting in a simple expression for the covariance of generalized coordinates:
where \(N\) is the number of unit cells in the supercell, k runs over the allowed wavevectors in the first Brillouin zone, \({k}_{B}\) is the Boltzmann factor, \(T\) is the temperature, L is a lower triangular matrix that depends on the masses and rotational moments of each group (defined previously^{13}) and \({{{{{{\bf{D}}}}}}}_{({{{{{\bf{k}}}}}})}^{+}\) is the pseudoinverse of the dynamical matrix. The pseudoinverse is identical to the inverse except when \({{{{{\bf{k}}}}}}=0\): \({{{{{{\bf{D}}}}}}}_{0}\) is not invertible because it includes three modes with zero frequency corresponding to rigid translations of the entire supercell. In the pseudoinverse, these three modes are removed by setting their reciprocal eigenvalues to zero.
The covariance matrix (Eq. 3) includes 6 ×6 diagonal blocks (where \(l=l^\prime \) and \(\kappa=\kappa^\prime\)) composed of the 3 ×3 matrices T, L, and S (as defined in TLS refinement):^{42}
where the l index has been dropped because of translational symmetry. The atomic displacement parameters (ADPs) are found by projection onto Cartesian coordinates:
Equivalent isotropic Bfactors were computed from the ADPs as follows:
GOODVIBES diffuse scattering simulation
Diffuse scattering from rigidbody motion was computed using the onephonon approximation. In this approximation, which is valid for small displacements, lattice vibrational modes with wavevector k scatter independently and therefore contribute only at discrete points in reciprocal space (essentially, as satellite reflections at points \({{{{{{\bf{g}}}}}}}_{{{{{{\bf{h}}}}}}}\pm {{{{{\bf{k}}}}}}\), where g_{h} points to an RLP with integer Miller indices \({{{{{\bf{h}}}}}}=(h,k,l)\)). This enables rapid calculation of the diffuse halos when the Born/VonKarman formalism is used to decompose the lattice dynamics in terms of vibrational modes, as described above. The onephonon diffuse scattering, \({I}^{(1)}({{{{{\bf{q}}}}}})\), where \({{{{{\bf{q}}}}}}={{{{{{\bf{g}}}}}}}_{{{{{{\bf{h}}}}}}}{{{{{\bf{k}}}}}}\), is computed as follows:
where \({{{{{\bf{G}}}}}}({{{{{\bf{q}}}}}})\) is a vectorvalued onephonon structure factor with 6 components per rigid group. The components associated with a particular group κ can be computed as follows:
where \(\bar{{\rho }_{\kappa }}({{{{{\bf{r}}}}}})\) is the average electron density of group κ, r is the position relative to the unit cell origin, and \({{{{{{\mathscr{F}}}}}}}^{1}\) is the inverse Fourier transform.
In ref. ^{13} the electron densities appearing in the onephonon structure factor were computed from the refined atomic coordinates and ADPs. A potential disadvantage of this approach is that errors in the structural model propagate to errors in the simulated diffuse intensities. Here, we take a different approach, and compute the electron densities using model phases and experimentally determined amplitudes from the observed Bragg peak intensities. First, the unit cell electron density was computed by Fourier synthesis using model phases and experimental amplitudes. Then, the mean electron density in the bulk solvent region was subtracted from the map (by Babinet’s principle, subtracting a constant does not alter the intensity except at \(q=0\)). Next, the unit cell was partitioned into regions associated with each rigid group, creating a series of hard masks equal to 1 inside the rigid group and 0 outside. The hard masks were then blurred by convolving with a Gaussian function (Bfactor of 50 Å^{2}) creating soft masks. The electron density of each rigid group was defined as the product of its soft mask and the unit cell electron density.
In order to compute the Fourier transforms efficiently for an arbitrary set of q points, an oversampling and Fourier interpolation scheme was used. The electron density of the rigid group (κ) was first shifted to the unit cell origin, and the grid was cropped or zeropadded to four times the maximum dimension of the group in each direction. Next, an inverse Fast Fourier Transform (IFFT) was performed to generate an oversampled molecular transform, \({F}_{\kappa }({{{{{{\bf{q}}}}}}})\). The real and imaginary components of \({F}_{\kappa }({{{{{{\bf{q}}}}}}})\) were then interpolated at the desired values of q, and a phase factor was applied to undo the origin shift. A similar calculation was performed for the coordinateweighted electron densities (one for each Cartesian coordinate). Finally, the components of \({{{{{{\bf{G}}}}}}}_{\kappa }\) were computed by taking the appropriate products between structure factors and Cartesian components of q.
GOODVIBES model parameterization and refinement
Elastic network models were optimized to fit the threedimensional diffuse halos by allowing each of the spring constants to vary. Symmetryequivalent springs were forced to have identical spring constants, and additional restraints were applied during refinement to prevent overfitting. The functional form of the harmonic potential allows for different restoring forces for displacements perpendicular or parallel to the interatomic vector. Two special cases are defined: Gaussian, or equal restoring forces for any direction; and parallel, in which components of displacement perpendicular to the interatomic vector have zero restoring force. The general spring constant is parameterized using a linear combination of Gaussian and parallel forces (called hybrid springs).
A set of 400 halos was chosen from each experimental dataset corresponding to the most intense Bragg peaks in the 2.0 to 2.5 Å resolution range. The measured intensities on an absolute scale were extracted from the fine diffuse maps, along with experimental uncertainties propagated from photoncounting statistics. Spring constants were refined in four stages with progressively fewer restraints: (1) a single Gaussian spring constant was fit globally; (2) a Gaussian spring constant was fit for each unique proteinprotein interface; (3) a hybrid spring constant was fit for each unique proteinprotein interface; and (4) a hybrid spring constant was fit for each unique residueresidue interaction. The refinement program minimized the leastsquares error between measured and simulated intensity using inversesigma weights derived from the experimental uncertainty. Since the one phonon intensity does not include the isotropic scattering, an additional linear background term was fit simultaneously for each halo (4 parameters per halo are added for the background term).
Derivation of the 3DΔPDF peak function used by DISCOBALL
Here we derive an approximate function for the 3DΔPDF lattice peaks in the case of a crystal with space group symmetry. First we assume that the peaks arise mainly from correlated translational motions of rigid groups. We also assume that the displacement covariances are small so that the harmonic and onephonon scattering approximations are valid. With these approximations, the diffuse scattering per unit cell is:
where the summation runs over unit cells in the crystal (labeled by \(n\)) and pairs of rigid groups (labeled by \(\kappa\) and \({\kappa}^\prime\)), \({{{{{{\bf{R}}}}}}}_{n}\) is the unit cell origin, \({F}_{\kappa }({{{{{\bf{q}}}}}})\) is the mean stucture factor of a group with local origin at \({{{{{{\bf{r}}}}}}}_{\kappa }\), and \({{{{{{\bf{V}}}}}}}_{n\kappa {\kappa^\prime}}\) is the (symmetrized) 3 × 3 covariance matrix for displacements of groups \(\kappa\) and \(\kappa^\prime\) in unit cells separated by \({{{{{{\bf{R}}}}}}}_{n}\) (or jointADP). The 3DΔPDF is the Fourier transform of \({I}_{D}({{{{{\bf{q}}}}}})\), as follows:
where
and
The function \({P}_{\kappa \kappa^\prime}({{{{{\bf{r}}}}}})\) is the crosscorrelation of mean electron density for pairs of groups in the unit cell, and \({P}_{{\kappa} {\kappa }}({{{{{\bf{r}}}}}})\) (when \(\kappa={\kappa }^\prime\)) is the autocorrelation. The autocorrelation has a strong peak at \({{{{{\bf{r}}}}}}=0\), however the crosscorrelations do not have strong peaks in general because the groups are oriented differently in space (except in special cases such as translational noncrystallographic symmetry). Mathematically, the peaks at \({{{{{{\bf{R}}}}}}}_{n}\) are:
When there is only one rigid body in the unit cell (P1 space group), \({P}_{\kappa \kappa }({{{{{\bf{r}}}}}})\) is approximately equal to the Patterson function \(P({{{{{\bf{r}}}}}})\) for small \({{{{{\bf{r}}}}}}\), and thus we can write:
Thus, in the case of P1, each peak is simply a convolution of the Patterson function with another function that depends on the jointADP. Since the Patterson function is known from the Bragg data, deconvolution can be applied to recover the jointADPs without additional modeling.
In a symmetric space group, the deconvolution is more difficult because the contributions of each rigid group to the peak are mixed. We note that the 3DΔPDF likely contains sufficient information to perform this deconvolution if a sufficiently large region around each peak is modeled. However if too large a region is chosen, the contributions from the \(\kappa\ne\kappa^\prime\) terms also become significant. Thus, here we restrict our fits to a limited region around the peak (small r).
The \({P}_{\kappa \kappa }({{{{{\bf{r}}}}}})\) are identical except for a rotation operator, and at small r, they contain a strong peak that is dominated by individual atom contributions (Supplementary Fig. 9a, left panel). This peak region is expected to have approximate rotational symmetry except in special cases (such as when the resolution is highly anisotropic). Thus, in the general case, the peak does not contain enough information to deconvolve the contributions of each rigid group individually. Instead, we assume that the peak can be approximated by its symmetry average, or equivalently the Patterson peak for the unit cell (Supplementary Fig. 9a, middle and right panels). Then, the peak shape is described by Eq. (14) with an effective jointADP that is equal to the average over groups, as follows:
Modeling the symmetric crystal as though it were P1 involves assumptions that are not obviously valid. However, lattice dynamics simulations show that the error incurred is small enough to be neglected in the cases studied here (Supplementary Fig. 9b).
Finally, we note that the DISCOBALL formula (Eq. 14) resembles the liquidlike model (LLM) for diffuse scattering^{7,26,45}. The LLM is derived by assuming that joint ADPs depend only on the interatomic vector. We do not make that assumption explicitly when deriving DISCOBALL, however doing so has certain advantages; it would ensure that Eq. (14) applies regardless of space group symmetry, and that Eq. (14) remains valid for the entire 3DΔPDF, not just the peak regions. Our derivation, though more complex, uses a more limited set of assumptions so that joint ADPs are directly comparable with those derived from atomistic models such as GOODVIBES.
DISCOBALL deconvolution method
Given a 3DΔPDF and Patterson function, we can fit the effective jointADPs as follows. First, crop out a spherical region (radius d) of the 3DΔPDF centered on \({{{{{{\bf{R}}}}}}}_{n}\) and identify this region with \({P}_{n}({{{{{\bf{r}}}}}})\):
where \({{{{{{\bf{1}}}}}}}_{{{{{{\rm{}}}}}}{{{{{\bf{r}}}}}}{{{{{\rm{}}}}}} < d}\) is an indicator function representing a spherical mask with radius d. By applying the inverse Fourier transform to both sides of this equation, we can transform the deconvolution problem into ordinary least squares. Let
and
be the inverse Fourier transforms of the masked peaks and density autocorrelation, respectively. The six unique components of the jointADP can then be found by least squares minimization:
The domain of integration D depends on the resolution of the data. In our implementation, \(P({{{{{\bf{r}}}}}})\) and \({P}_{n}({{{{{\bf{r}}}}}})\) are represented on a discrete realspace grid, and thus the integral becomes a sum over a set of samples in reciprocal space.
DISCOBALL analysis of experimental datasets
To calculate each 3DΔPDF, the variational component of the measured diffuse scattering was truncated to a resolution of 1.6 Å, missing values were filled with zeros, and the 3DΔPDF was calculated using a fast Fourier transform (FFT). Peaks regions were extracted from the 3DΔPDF and a spherical mask was applied to each with a maximum radius of d = 4 Å. The regular Patterson map was computed by taking the FFT of the Bragg intensities. The central peak was interpolated onto the same real space grid as the 3DΔPDF peaks and the same mask was applied.
For each 3DΔPDF peak, the 6 unique values of \({{{{{{\bf{V}}}}}}}_{{{{{{\bf{n}}}}}}}\) were found by least squares fitting. We observed that the very low resolution points deviated from the expected behavior, likely because the constant offsets in the 3DΔPDF or Patterson map are not exactly known. Thus, points with dspacing greater than 5 Å were excluded from the fit. Minimization was performed for each peak using the lsqnonlin function in MATLAB.
DISCOBALL validation of GOODVIBES models
For each GOODVIBES simulation, the centerofmass covariances were calculated for each pair of rigid bodies as follows:
\({{{{{{\bf{A}}}}}}}_{\kappa }({{{{{\bf{0}}}}}})\) projects from the generalized coordinates onto the Cartesian displacement at the center of mass of rigid body κ (see Eq. 1). The jointADP for a pair of rigid bodies is defined as the symmetrized covariance matrix:
where the unit cell index \(n\) refers to the relative displacement of unit cells \(l\) and \(l^\prime\), \({{{{{{\bf{R}}}}}}}_{n}={{{{{{\bf{R}}}}}}}_{l}{{{{{{\bf{R}}}}}}}_{{l}}\).
For crossvalidation using DISCOBALL, the effective jointADPs were computed from the simulation using Eq. (15). To compare jointADPs between DISCOBALL and model calculations, Pearson correlation coefficients were calculated for isotropic and anisotropic components of the jointADPs. The isotropic part (or total covariance) was defined as
and the anisotropic (residual) components were
where \({{{{{{\bf{I}}}}}}}_{3}\) is a 3 × 3 identity matrix.
MD simulations
A simulation of a single unit cell (8 protein chains) of tetragonal lysozyme was performed, in a manner similar to the triclinic lysozyme simulations described previously^{13,15}. We used Amber18, with the ff14SB force field for the protein^{46,47} and the OPC model for water^{48}. The starting structure was taken from PDB entry 5L9J^{49}, using the A alternate conformer. We added 32 chloride ions to neutralize the charge, and 2540 additional water molecules to fill in space. The number of water molecules was manually adjusted in order to achieve ~1 atm pressure at 295 K, resulting in 3288 total waters (411 per protein chain). The resulting density was 1.23 g/cm^{3}, which is the same as a measured density for crystals grown from an NaCl soak density of 1.05 g/cm^{3}^{50}.
The simulations were equilibrated under NVT conditions for 0.6 µs and continued for an additional 5 µs, saving coordinates every 0.4 ns. A time step of 4 fs was used, where nonwater hydrogen masses are set to 3 amu, with a corresponding decrease in the mass of its bonded atom^{51}. Diffuse scattering was computed as described earlier^{13}.
Since Xrays scatter from individual electrons, it is better to account for conformational disorder and flexibilty by averaging the electron density rather than the structures themselves. We took the average structure factors from the simulation (truncated to 1.5 Å, which is the experimental resolution limit for 5L9J), and reset the symmetry to P4_{3}2_{1}2. Using the deposited 5L9J pdb model as a starting structure, we ran 15 cycles of phenix.refine with default parameters. The backbone RMS deviation between this structure and the 5L9J crystallographic model was 0.44 Å.
Procedure to create target diffuse maps for comparison with MD
A target map matching the one unit cell MD of tetragonal lysozyme was created from the experimental data in several processing steps (Supplementary Fig. 6a). A map of scattering from internal motion was found by subtracting the GOODVIBES supercell scattering simulation from the experimental diffuse map and then applying a filter. This filter served two purposes: first to resample the map on an appropriate reciprocal space grid for comparison with MD, and second to mitigate the effect of noise and outliers.
We implemented a robust, errorweighted SavitskyGolay filter^{17} as follows. First, a target grid was chosen with sufficient sampling to describe the molecular transform. For tetragonal lysozyme, a target grid with voxel dimensions of (a*, b*, c*/2) was used. Then, for each voxel in the target grid, a spherical neighborhood of voxels in the parent grid was defined using a cutoff radius of \((3{v}_{t}/{(4\pi )})^{1/3}\) where \({v}_{t}\) is the volume of the target voxel. For each neighborhood, a secondorder polynomial was fit using weighted leastsquares^{17}. Then, the value of the polynomial was computed at the target grid location, and the corresponding uncertainty was propagated from the experimental errors. To render the method robust to outliers, the fit was performed a second time with adjusted weights. The reweighting factors were chosen using the bisquare method^{52} which is based on residuals from the first pass. To compute the residual at each point in the parent grid, we first performed cubic interpolation of the intensities from the target grid back to the original grid. Then, the bisquare reweighting factor was computed for each parent voxel (i) as follows^{52}:
where m is the median absolute deviation of the inverse sigmaweighted residuals \({r}_{i}\). Finally, the bisquare weights were applied and fits were repeated, yielding the filtered internal map on a coarse grid.
After obtaining the internal motion map, scattering from external motion compatible with the MD supercell was simulated and added back. To match the single unit cell MD simulation of tetragonal lysozyme, the reciprocal lattice points with integer Miller indices were selected from the internal map. Next, the scattering from GOODVIBES simulation corresponding to the MD supercell was added. For a one unit cell simulation, these are the \({{\mbox{k}}}=0\) modes (optical phonons in which protein molecules move out of phase with each other such that the center of mass is stationary). Note that a new GOODVIBES simulation was not required in this case, because scattering from \({{\mbox{k}}}\) = 0 modes was included in the previous supercell simulation (at integer Miller index points).
Finally, to compare MD simulation and target diffuse maps, the mean intensity, standard deviation (SD), and correlation coefficient (CC) were computed within resolution bins of width 0.02 Å^{1}. Since the isotropic scattering changes significantly across certain bins, it potentially contributes to the SD and CC if not subtracted first^{20}. Therefore, for each diffuse map, we fit a smooth onedimensional interpolant to the intensity vs. resolution, and subtracted this isotropic signal before computing SD and CC.
Reporting summary
Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.
Data availability
The raw diffraction data generated in this study have been deposited in the SBGrid Data Bank under accession codes 957 (tetragonal lysozyme)^{53} and 958 (orthorhombic lysozyme)^{54}. The atomic coordinates and structure factors have been deposited in the Protein Data Bank (PDB) with accession codes 8dyz (tetragonal lysozyme) and 8dz7 (orthorhombic lysozyme). For the previously published triclinic lysozyme dataset^{13}, raw diffraction images are available from the SBGrid Data Bank under accession code 747^{18}, processed maps are available from the Coherent Xray Imaging Data Bank (CXIDB) under accession code 128^{19}, and atomic coordinates and structure factors are available from the PDB with accession codes 6o2h. The source data underlying Fig. 6 are provided as a Source Data file. Starting coordinates for molecular replacement and MD simulations were obtained from the PDB entries and 193l, 1wtm, and 5l9j. Source data are provided with this paper.
Code availability
Structure determination was performed using software curated by SBGrid^{55}. Subsequent data reduction, model fitting, and analysis was performed using MATLAB with mdxlib. The software is available on GitHub at https://github.com/andolab/mdxlib and version 1.2.0 used in this study has been deposited in Zenodo^{41}. Scripts required to reproduce the analysis using MATLAB are available in a separate GitHub repository https://github.com/andolab/mdxexamples, and version 0.1.1 used in this study has been deposited in Zenodo^{56}. The script md2diffuse.sh used to compute diffuse scattering from MD trajectories is distributed with AmberTools (http://ambermd.org).
References
Nogales, E. The development of cryoEM into a mainstream structural biology technique. Nat. Methods 13, 24–27 (2016).
Jumper, J. et al. Highly accurate protein structure prediction with AlphaFold. Nature 596, 583–589 (2021).
Baek, M. et al. Accurate prediction of protein structures and interactions using a threetrack neural network. Science 373, 871–876 (2021).
Shoemaker, S. C. & Ando, N. Xrays in the cryoelectron microscopy era: structural biology’s dynamic future. Biochemistry 57, 277–285 (2018).
Xu, D., Meisburger, S. P. & Ando, N. Correlated motions in structural biology. Biochemistry 60, 2331–2340 (2021).
Welberry, T. R. & Weber, T. One hundred years of diffuse scattering. Crystallogr. Rev. 22, 2–78 (2016).
Meisburger, S. P., Thomas, W. C., Watkins, M. B. & Ando, N. Xray scattering studies of protein structural dynamics. Chem. Rev. 117, 7615–7672 (2017).
Wall, M. E., Wolff, A. M. & Fraser, J. S. Bringing diffuse xray scattering into focus. Curr. Opin. Struct. Biol. 50, 109–116 (2018).
Woldeyes, R. A., Sivak, D. A. & Fraser, J. S. E pluribus unum, no more: From one crystal, many conformations. Curr. Opin. Struct. Biol. 28, 56–62 (2014).
Helliwell, J. R. Macromolecular crystallography with synchrotron radiation. (Cambridge University Press, 2005).
Fischer, M. Macromolecular room temperature crystallography. Q. Rev. Biophys. 54, e1 (2021).
Meisburger, S. P. & Ando, N. Correlated motions from crystallography beyond diffraction. Acc. Chem. Res. 50, 580–583 (2017).
Meisburger, S. P., Case, D. A. & Ando, N. Diffuse Xray scattering from correlated motions in a protein crystal. Nat. Commun. 11, 1271 (2020).
Holton, J. M., Classen, S., Frankel, K. A. & Tainer, J. A. The Rfactor gap in macromolecular crystallography: An untapped potential for insights on accurate structures. FEBS J. 281, 4046–4060 (2014).
Janowski, P. A., Liu, C., Deckman, J. & Case, D. A. Molecular dynamics simulation of triclinic lysozyme in a crystal lattice. Protein Sci. 25, 87–102 (2016).
Wall, M. E. Internal protein motions in moleculardynamics simulations of Bragg and diffuse Xray scattering. IUCrJ 5, 172–181 (2018).
Press, W. H., Teukolsky, S. A., Vetterling, W. T. & Flannery, B. P. Numerical Recipes 3rd Edition: The Art of Scientific Computing. (Cambridge University Press, 2007).
Meisburger, S. & Ando, N. Xray diffraction data for: Hen lysozyme. PDB code 6O2H. https://doi.org/10.15785/SBGRID/747 (2020).
Meisburger, S. P., Case, D. A. & Ando, N. Diffuse xray scattering from correlated motions in a protein crystal (CXIDB ID 128) https://doi.org/10.11577/1601281 (2020).
Wall, M. E. et al. Conformational dynamics of a crystalline protein from microsecondscale molecular dynamics simulations and diffuse Xray scattering. Proc. Natl Acad. Sci. USA 111, 17887–17892 (2014).
Agarwal, P. K., Billeter, S. R., Rajagopalan, P. R., Benkovic, S. J. & HammesSchiffer, S. Network of coupled promoting motions in enzyme catalysis. Proc. Natl Acad. Sci. USA 99, 2794–2799 (2002).
Wang, L., Goodey, N. M., Benkovic, S. J. & Kohen, A. Coordinated effects of distal mutations on environmentally coupled tunneling in dihydrofolate reductase. Proc. Natl Acad. Sci. USA 103, 15753–15758 (2006).
Boehr, D. D., McElheny, D., Dyson, H. J. & Wright, P. E. The dynamic energy landscape of dihydrofolate reductase catalysis. Science 313, 1638–1642 (2006).
Bhabha, G. et al. A dynamic knockout reveals that conformational fluctuations influence the chemical step of enzyme catalysis. Science 332, 234–238 (2011).
Stojković, V., Perissinotti, L. L., Willmer, D., Benkovic, S. J. & Kohen, A. Effects of the donor–acceptor distance and dynamics on hydride tunneling in the dihydrofolate reductase catalyzed reaction. J. Am. Chem. Soc. 134, 1738–1745 (2012).
Caspar, D., Clarage, J., Salunke, D. & Clarage, M. Liquidlike movements in crystalline insulin. Nature 332, 659–662 (1988).
Rose, Y. et al. RCSB protein data bank: architectural advances towards integrated searching and efficient access to macromolecular structure data from the PDB archive. J. Mol. Biol. 433, 166704 (2021).
De Klijn, T., Schreurs, A. & KroonBatenburg, L. Rigidbody motion is the main source of diffuse scattering in protein crystallography. IUCrJ 6, 277–289 (2019).
Peck, A., Poitevin, F. & Lane, T. J. Intermolecular correlations are necessary to explain diffuse scattering from protein crystals. IUCrJ 5, 211–222 (2018).
Wych, D. C. et al. Moleculardynamics simulation methods for macromolecular crystallography. Acta Crystallogr. D. Struct. Biol. 79, 50–65 (2023).
Hura, G. et al. Water structure as a function of temperature from xray scattering experiments and ab initio molecular dynamics. Phys. Chem. Chem. Phys. 5, 1981–1991 (2003).
Schanda, P. & Ernst, M. Studying dynamics by magicangle spinning solidstate NMR spectroscopy: Principles and applications to biomolecules. Prog. Nucl. Magn. Reson. Spectrosc. 96, 1–46 (2016).
Kabsch, W. XDS. Acta Crystallogr., Sect. D: Biol. Crystallogr. 66, 125–132 (2010).
Potterton, L. et al. CCP4i2: The new graphical user interface to the CCP4 program suite. Acta Crystallogr., Sect. D: Biol. Crystallogr. 74, 68–84 (2018).
Winn, M. D. et al. Overview of the CCP4 suite and current developments. Acta Crystallogr., Sect. D: Biol. Crystallogr. 67, 235–242 (2011).
Evans, P. R. & Murshudov, G. N. How good are my data and what is the resolution? Acta Crystallogr., Sect. D: Biol. Crystallogr. 69, 1204–1214 (2013).
McCoy, A. J. et al. Phaser crystallographic software. J. Appl. Crystallogr 40, 658–674 (2007).
Emsley, P., Lohkamp, B., Scott, W. G. & Cowtan, K. Features and development of coot. Acta Crystallogr., Sect. D: Biol. Crystallogr. 66, 486–501 (2010).
Murshudov, G. N. et al. REFMAC5 for the refinement of macromolecular crystal structures. Acta Crystallogr., Sect. D: Biol. Crystallogr. 67, 355–367 (2011).
Chen, V. B. et al. MolProbity: allatom structure validation for macromolecular crystallography. Acta Crystallogr., Sect. D: Biol. Crystallogr. 66, 12–21 (2010).
Meisburger, S. & Ando, N. Andolab/mdxlib: v1.2.0. (Zenodo, 2023). https://doi.org/10.5281/zenodo.7578017 (2023).
Schomaker, V. & Trueblood, K. N. On the rigidbody motion of molecules in crystals. Acta Crystallogr., Sect. B: Struct. Crystallogr. Cryst. Chem. 24, 63–76 (1968).
Willis, B. T. M. Thermal diffuse scattering of Xrays and neutrons. in International Tables for Crystallography Volume B: Reciprocal space (ed. Shmueli, U.) 400–406 (Springer Netherlands, 2001). https://doi.org/10.1107/97809553602060000563 (2001).
Born, M. & Huang, K. Dynamical theory of crystal lattices. (Clarendon Press, 1954).
Wall, M. E., Clarage, J. B. & Phillips, G. N. Motions of calmodulin characterized using both bragg and diffuse xray scattering. Structure 5, 1599–1612 (1997).
Maier, J. A. et al. ff14SB: improving the accuracy of protein side chain and backbone parameters from ff99SB. J. Chem. Theory Comput. 11, 3696–3713 (2015).
Hornak, V. et al. Comparison of multiple Amber force fields and development of improved protein backbone parameters. Proteins: Struct., Funct., Bioinf 65, 712–725 (2006).
Izadi, S., Anandakrishnan, R. & Onufriev, A. V. Building water models: a different approach. J. Phys. Chem. Lett. 5, 3863–3871 (2014).
Crosas, E. et al. Uridine as a new scavenger for synchrotronbased structural biology techniques. J. Synchrotron Radiat. 24, 53–62 (2017).
Moreau, D. W. Ice formation and solvent nanoconfinement in protein crystallography. (Cornell University, 2020).
Hopkins, C. W., Le Grand, S., Walker, R. C. & Roitberg, A. E. Longtimestep molecular dynamics through hydrogen mass repartitioning. J. Chem. Theory Comput. 11, 1864–1874 (2015).
Croarkin, C. 2.6.5.2.2. Bisquare weighting. NIST/SEMATECH eHandbook of Statistical Methods https://doi.org/10.18434/M32189 (2012).
Meisburger, S. & Ando, N. Xray diffraction data for: Hen lysozyme. PDB code 8DYZ. https://doi.org/10.15785/SBGRID/957 (2022).
Meisburger, S., Imran, SM. S. & Ando, N. Xray diffraction data for: Hen lysozyme. PDB code 8DZ7. https://doi.org/10.15785/SBGRID/958 (2022).
Morin, A. et al. Cutting edge: Collaboration gets the most out of software. eLife 2, e01456 (2013).
Meisburger, S. & Ando, N. Andolab/mdxexamples: v0.1.1. (Zenodo, 2023). https://doi.org/10.5281/zenodo.7578043 (2023).
Karplus, P. A. & Diederichs, K. Linking crystallographic model and data quality. Science 336, 1030–1033 (2012).
Acknowledgements
The authors thank Syed Muhammad Saad Imran for growing the crystals that yielded the orthorhombic data used in this study. We are grateful to Dr. Veronica Pilar and Dr. Will Thomas for assistance with data collection, and Dr. Xiaokun Pei, Haoyue Wang, and Neti Bhatt for critical feedback on the manuscript. We are especially grateful to Mike Wall for his feedback on our preprint and for suggesting an improved method for calculating correlation coefficients with MD. Experiments were performed at the Center for High Energy Xray Sciences (CHEXS), which is supported by the National Science Foundation (BIO, ENG and MPS Directorates) under award DMR1829070, and the Macromolecular Diffraction at CHESS (MacCHESS) facility, which is supported by award 1P30GM12416601A1 from the National Institute of General Medical Sciences, National Institutes of Health (NIH), and by New York State’s Empire State Development Corporation (NYSTAR). This work was supported by NIH grants F32GM117757 (to S.P.M.), GM122086 (to D.A.C.), and GM124847 (to N.A.) and startup funds from Cornell University (to N.A.).
Author information
Authors and Affiliations
Contributions
S.P.M. performed the experiments, developed the GOODVIBES and DISCOBALL software, and analyzed the data. D.A.C. performed and analyzed the MD simulations. N.A. conceived of and supervised the research. All authors contributed to writing the paper.
Corresponding author
Ethics declarations
Competing interests
The authors declare no competing interests.
Peer review
Peer review information
Nature Communications thanks James Holton, Arkadiy Simonov and Michael Wall for their contribution to the peer review of this work. Peer reviewer reports are available.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary information
Source data
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Meisburger, S.P., Case, D.A. & Ando, N. Robust total Xray scattering workflow to study correlated motion of proteins in crystals. Nat Commun 14, 1228 (2023). https://doi.org/10.1038/s41467023367343
Received:
Accepted:
Published:
DOI: https://doi.org/10.1038/s41467023367343
This article is cited by

Functional protein dynamics in a crystal
Nature Communications (2024)
Comments
By submitting a comment you agree to abide by our Terms and Community Guidelines. If you find something abusive or that does not comply with our terms or guidelines please flag it as inappropriate.