Abstract
The challenges involved in determining the structures of molecules to atomic resolution in non-crystalline samples using X-ray free-electron laser pulses are formidable^{1}. Proposals to determine biomolecular structures from diffraction experiments using femtosecond X-ray free-electron laser pulses involve a conflict between the incident brightness required to achieve diffraction-limited atomic resolution and the electronic and structural damage induced by the illumination. Significant advances have already been made, however, in the design and preparation of experiments using fourth-generation sources^{2} and the corresponding structural analysis of diffraction data. Here we show that previous estimates of the conditions under which biomolecular structures may be obtained in this manner are unduly restrictive, because they are based on a coherent diffraction model that is not appropriate to the proposed interaction conditions. A more detailed imaging model derived from optical coherence theory and quantum electrodynamics is shown to be far more tolerant of electronic damage. The nuclear density is employed as the principal descriptor of molecular structure. The foundations of the approach may also be used to characterize electrodynamical processes by carrying out scattering experiments on complex molecules of known structure.
Main
The difficulties posed by diffraction data collected from single molecules dropped with random orientations into the path of a freely propagating X-ray free-electron laser (XFEL) pulse have led to the development of schemes by which two-dimensional diffraction patterns may be assembled into complete three-dimensional diffraction sets^{3,4}. Such approaches enable the accumulation of diffraction data corresponding to identical targets, increasing the signal-to-noise ratio, particularly in the large-angle scattering sector of the data that carries the high-resolution information about molecular structure. The inevitable Coulomb explosion of molecules subjected to such pulses has also been addressed by detailed computational modelling of the interaction dynamics^{1,5,6}. This has led to proposals for the incorporation of sacrificial tampers to delay the Coulomb explosion of the scattering target^{7} or temporal gating of data acquisition to reduce the degradation of the structural information due to the molecular disintegration^{8}.
The feasibility of successful molecular structure determination from diffraction data using XFEL sources has been assessed in each of these studies by calculating some variant of the crystallographic R-factor,
in which I_{real}(u) is the simulated intensity at spatial frequency u including the effects of damage to the scattering target, I_{ideal}(u) is the corresponding intensity distribution in the absence of damage, and the summations over u include all discrete samples included in the three-dimensional set of diffraction data.
The measure of data quality may be further refined, but its principal role is to provide a quantitative estimate of how closely the crystallographic model matches the experimental data. The requirement that R≤0.15 has been suggested^{1} as a ‘rule of thumb’ by which diffraction data obtained from femtosecond XFEL experiments on isolated molecules should be regarded as possessing sufficient information to obtain a molecular structure with a spatial resolution determined by the maximum measured scattering angle. The R-factor has guided all computational studies regarding the pulse requirements for X-ray diffraction imaging of single biological molecules^{1,5}.
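As a minimal sketch of how such a figure of merit might be evaluated, the snippet below assumes the amplitude-based form R = Σ_u |√I_{real}(u) − √I_{ideal}(u)| / Σ_u √I_{ideal}(u). That form, the function name `r_factor` and the toy intensities are assumptions made for illustration; other variants (for example, with an overall scale factor) are in use.

```python
import numpy as np

def r_factor(i_real, i_ideal):
    """Amplitude-based R-factor between a damaged and an undamaged pattern.

    Assumed form (not quoted from the text):
        R = sum_u |sqrt(I_real(u)) - sqrt(I_ideal(u))| / sum_u sqrt(I_ideal(u))
    """
    a_real = np.sqrt(np.asarray(i_real, dtype=float))
    a_ideal = np.sqrt(np.asarray(i_ideal, dtype=float))
    return float(np.abs(a_real - a_ideal).sum() / a_ideal.sum())

# Hypothetical toy pattern in which every amplitude is reduced by 10%.
i_ideal = np.array([100.0, 25.0, 4.0, 1.0])
i_real = 0.81 * i_ideal            # sqrt(0.81) = 0.9
print(r_factor(i_real, i_ideal))   # approximately 0.1
```

With this definition, a uniform 10% loss of amplitude already gives R close to 0.1, which indicates how quickly the R≤0.15 rule of thumb is reached.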
Implicit in the R-factor are the assumptions that the incident illumination and the diffracted wave possess full spatial and temporal coherence, independent of the nature of the matter–radiation interaction; these are implicit components of the crystallographic model against which the data are assessed. The validity of these assumptions is not supported, however, by a detailed consideration of the interaction physics that describes an encounter between a molecule and an intense XFEL pulse. This critical assessment pertains to the proposed experiments to determine molecular structures, even if the illumination exhibits full spatial and temporal coherence or if the scattering interaction takes place over so short a time that the nuclear framework is completely unaffected. A key component of the formulation in this Letter is the observation that the time-dependent evolution of the electron density imparts on the scattered wave the statistical characteristics of a partially coherent wavefield, which we incorporate explicitly in our analysis.
Two critical assumptions are made in our approach. The first is that the collection of diffraction data originates from a matter–radiation interaction such that we adopt the Born–Oppenheimer approximation and the nuclei are fixed in space throughout the encounter. Detailed simulations^{1,6} put an upper limit on this period of 5 fs, which we assume is to be achieved either by pulse shaping or data gating. The second assumption is that a three-dimensional diffraction set has been assembled^{3,4}. Each distinct molecular orientation must be associated with a sufficient number of two-dimensional molecular projections that the stochastic distribution of molecular electronic vacancies created by photoionization and Auger emission may be represented by on-site statistical atomic averages. This is also an implicit assumption in existing simulations^{6} on which we have based our electrodynamical model.
As we are concerned with reproducing detailed molecular scattering properties rather than dynamical averages, we depart from the earlier treatment^{8} by using a model atomic basis of spherically averaged orbital electron densities, ρ_{Z γ}(r−R_{m_{Z}}^{Z}), located at nuclear positions R_{m_{Z}}^{Z}, where Z labels the atomic species and γ labels the orbital shell. Note that this restates the problem of recovering structure as one of determining the nuclear positions. The impact of damage, then, only plays a role in the manner in which the desired information is transferred from the sample to the detector plane. We show here that this viewpoint offers considerable advantages. In our formalism, the coefficients a_{Z γ}(t) define the time variation in the occupancy of ρ_{Z γ}(r−R_{m_{Z}}^{Z}), averaged over all equivalent sites. The time-dependent electron density is represented in the form
where m_{Z} denotes an atom of atomic number Z located at R_{m_{Z}}^{Z}. This representation is augmented by a continuum approximation for the density of electrons trapped by the residual ionic charge of the molecule, which is treated as a separate species. The essence of optical coherence theory is to recognize that the field may vary on timescales well within the observation time. Equation (1) suggests that, even in the case of illuminating the sample with a coherent field, dynamical effects within the sample will result in a scattered field that is best described using the theory of partial coherence. Here, we make this connection explicit.
The coherence properties of the scattered wavefield are incorporated in a matrix, A, the elements of which are defined by the statistical averages
where I(t) represents the time-dependent intensity of the incident pulse. The integrated intensity, I(q), corresponding to momentum transfer, q, may be written in the form
where
and f_{Z γ}(q) is the spherically symmetric elastic X-ray scattering factor for ρ_{Z γ}(r). We note that T_{Z}(q) is readily generalized to include thermal effects on nuclear motion by convolving it with Gaussian ellipsoids. One can foresee further elaborations in which the effects of molecular dissociation precipitated by interaction with an XFEL pulse are incorporated by a similar modification of T_{Z}(q). As foreshadowed, in this model, all of the structural information is contained within the vector T_{Z}(q) that is specified purely by nuclear positions, and all electrodynamical information is contained within A_{Z,Z′}(q); one may determine one of these quantities from measurements of I(q) if the other is known to sufficient accuracy. Our attention here is restricted to the determination of molecular structures, but we note that A_{Z,Z′}(q) is a smooth, continuous function of q, so that electronic damage may also be characterized from measurements of I(q) in systems for which T_{Z}(q) is known. This feature may facilitate the incorporation of recently reported experimental information on the femtosecond electronic response of atoms as part of an integrated imaging strategy^{9}.
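For orientation, one set of expressions consistent with the definitions given in the text can be written as follows. This is a reconstruction under stated assumptions, not a transcription of the original displays: the sign convention in the exponent of T_{Z}(q), the normalization of the statistical averages and the way the shell sums are folded into A_{Z,Z′}(q) are all assumed.

```latex
\begin{align}
\rho(\mathbf{r},t) &= \sum_{Z}\sum_{\gamma} a_{Z\gamma}(t)
    \sum_{m_Z} \rho_{Z\gamma}\!\left(\mathbf{r}-\mathbf{R}^{Z}_{m_Z}\right), \\
A_{Z\gamma,Z'\gamma'} &= \int I(t)\, a_{Z\gamma}(t)\, a_{Z'\gamma'}(t)\, \mathrm{d}t, \\
I(\mathbf{q}) &= \sum_{Z,Z'} T_{Z}(\mathbf{q})\, A_{Z,Z'}(\mathbf{q})\, T^{*}_{Z'}(\mathbf{q}), \\
A_{Z,Z'}(\mathbf{q}) &= \sum_{\gamma,\gamma'} f_{Z\gamma}(\mathbf{q})\,
    A_{Z\gamma,Z'\gamma'}\, f_{Z'\gamma'}(\mathbf{q}), \\
T_{Z}(\mathbf{q}) &= \sum_{m_Z} \exp\!\left(i\,\mathbf{q}\cdot\mathbf{R}^{Z}_{m_Z}\right).
\end{align}
```

Under this reading, the intensity is the pulse-weighted time integral of the squared modulus of the instantaneous scattering amplitude, which is why the nuclear positions enter only through T_{Z}(q) and the damage only through A.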
Under the interaction conditions of interest, A_{Z γ,Z′γ′} is poorly approximated by a simple product of two fixed constants, one associated with the index pair (Z, γ) and the other with (Z′, γ′). As a consequence, I(q) possesses the statistical characteristics of a partially coherent diffraction pattern and the R-factor provides an inappropriate measure of its information content.
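The failure of the product approximation can be illustrated numerically. The sketch below is a toy model under assumed conditions: two shells with exponentially decaying occupancies, a flat-top 5 fs pulse normalized to unit fluence, and made-up complex amplitudes `g` standing in for the products f_{Z γ}(q)T_{Z}(q) at a single q. None of these numbers comes from the simulations described in the text.

```python
import numpy as np

# Time grid over a hypothetical 5 fs flat-top pulse, with trapezoidal weights.
t = np.linspace(0.0, 5.0, 501)
dt = t[1] - t[0]
w = np.full(t.size, dt)
w[0] = w[-1] = dt / 2.0
pulse = np.ones_like(t) / (t[-1] - t[0])       # I(t), unit integrated fluence

# Assumed shell occupancies a_i(t): one depleted quickly, one slowly.
occ = np.vstack([np.exp(-t / 2.0), np.exp(-t / 20.0)])

# Coherence matrix A_ij = integral of I(t) a_i(t) a_j(t) dt.
A = (occ * (pulse * w)) @ occ.T

# Product ("fully coherent") approximation built from the mean occupancies.
a_bar = occ @ (pulse * w)
A_coherent = np.outer(a_bar, a_bar)

# Hypothetical values of f*T for the two shells at one momentum transfer.
g = np.array([0.7 + 0.2j, -0.3 + 0.9j])

# Intensity computed directly in time and via the matrix form agree.
i_direct = float(np.sum(pulse * w * np.abs(g @ occ) ** 2))
i_matrix = float(np.real(g @ A @ np.conj(g)))

eigenvalues = np.linalg.eigvalsh(A)            # ascending order
print(i_direct, i_matrix)
print("eigenvalues of A:", eigenvalues)
print("max |A - A_coherent|:", np.abs(A - A_coherent).max())
```

A rank-one A would correspond to a fully coherent scattered field. Here the second eigenvalue is clearly non-zero, so no single pair of constants reproduces A.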
It has recently been demonstrated^{10} that explicit incorporation of models of partial coherence into the solution of inverse problems may markedly improve the quality of reconstructions using iterative, propagation-based techniques. To relate the optical properties of the molecule with a far-field scattered intensity, it proves convenient to write I(q) as the modal expansion^{11}
$$I(q) = \sum_{k} \eta_{k} |\psi_{k}(q)|^{2} \qquad (3)$$
where the real, non-negative parameters, η_{k}, and the corresponding modes, ψ_{k}(q), are solutions of an M-dimensional integral equation defined by the mutual optical intensity of optical coherence theory^{11}, making explicit the connection foreshadowed after equation (1). Each mode assumes the general form
where c_{Z γ}^{k} represents the effective occupancy of shell γ in atom of type Z in mode ψ_{k}(q). A scheme to determine η_{k} and c_{Z γ}^{k} under typical experimental conditions appears in the Methods section. Without any detailed calculation, however, one may immediately deduce important qualitative features of the charge distribution from the structure of ψ_{k}(q) that lead to the partially coherent character of I(q). Each coefficient c_{Z γ}^{k} is necessarily real, but may be either positive or negative, because the elements of the set ψ_{k}(q) are orthonormal. The partially coherent character of I(q), which is electrodynamical in origin, is reproduced precisely in this approach by a small number of electrostatic charge distributions that represent multicentre shell polarizations of the various atomic types. Each of these modal density distributions, ρ_{k}(r), is obtained as the inverse Fourier transform of the corresponding optical mode, ψ_{k}(q).
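A form of the mode consistent with this description, offered as an assumption and not as the authors' displayed expression, expands each mode in the products f_{Z γ}(q)T_{Z}(q); its inverse Fourier transform is then a weighted superposition of the atomic shell densities:

```latex
\begin{align}
\psi_{k}(\mathbf{q}) &= \sum_{Z}\sum_{\gamma} c^{k}_{Z\gamma}\,
    f_{Z\gamma}(\mathbf{q})\, T_{Z}(\mathbf{q}), \\
\rho_{k}(\mathbf{r}) &= \sum_{Z}\sum_{\gamma} c^{k}_{Z\gamma}
    \sum_{m_Z} \rho_{Z\gamma}\!\left(\mathbf{r}-\mathbf{R}^{Z}_{m_Z}\right).
\end{align}
```

On this reading, a mode in which some c_{Z γ}^{k} are negative describes a density that is depleted in some shells and enhanced in others, which is the sense in which the modes represent shell polarizations.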
The explicit inclusion of the electronic properties in the diffraction process has the interesting advantage that one is able to remove the intermediate step of recovering the electronic density distribution^{12}. We are, instead, able to formulate the problem in such a manner that the nuclear structure is revealed directly. In common with a recent study of diffractive imaging using partially coherent X-ray sources^{13}, we here recast the partially coherent scattering information within an equivalent single-mode model inverse problem that carries the required information. It is clear from equation (4) that any single optical mode, ψ_{k}(q), carries all of this information, because the unknown parameters, R_{m_{Z}}^{Z}, appear within T_{Z}(q). The inverse problem to be solved is of the form T(q)T*(q)=I′(q), where T(q) and I′(q) are defined in the Methods section. The Fourier transform of T(q) is subject to the constraint that it must represent a nuclear distribution function of known spatial extent; the solution reveals the positions and identities of the constituent atoms without constructing an intermediate electron density.
Three-dimensional diffraction data for bacteriorhodopsin were generated using the electrodynamical model^{6} described in the Methods section, assuming that 10^{12} photons of energy 10 keV were incident on the target in each 5 fs pulse. The number of photons in the pulse determines the damage levels in the simulations, but we should make clear that we do not include the statistical uncertainties that would be associated with an experimental diffraction pattern. Compared with a similar calculation in the absence of any electronic damage, this resulted in an R-factor of 0.17. Figure 1 shows samples of the undamaged diffraction pattern (Fig. 1a) and damaged pattern (Fig. 1b). These data display the reduction in the high-q scattering resulting from depletion of the core electrons through the pulse. Figure 1c shows the ratio of these two patterns and illustrates that the electronic damage imposes an uneven and structure-dependent modification on the diffraction data, indicating that a simple rescaling of the pattern^{12}, although appealing, is not adequate but may be usefully regarded as a first approximation to a full solution.
A projection through the reconstruction is shown in Fig. 2, where it can be seen that the atomic number and location of each atomic species are recovered essentially perfectly. Indeed, each atomic position may be located beyond the formal resolution of the diffraction pattern using a centroiding process on the fringes around each nucleus; if the nucleus is centrally located in a pixel then these fringes are absent, and otherwise their relative amplitudes encode the location of the nucleus. The concept of super-resolving crystallographic data has recently emerged in a rather different but related context^{14}. There is no need to construct a representation of ρ(r,0), or to determine R_{m_{Z}}^{Z} from it by crystallographic model-building. This appears to be the first reconstruction of its type; the molecular structure of a complex biomolecule has been recovered directly under interaction conditions in which the electron density is so comprehensively damaged that the usual working rules of protein crystallography are no longer valid. To underline this point, Fig. 2c shows a detail from a slice through the reconstruction, demonstrating that the atomic species information is automatically recovered using this algorithm.
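The idea of locating a peak to better than one pixel by centroiding can be shown in one dimension. This is a simplified illustration only: it uses a smooth, positive Gaussian peak with an assumed width, not the signed fringes that surround a nucleus in the actual reconstruction, and the names `centroid_1d` and `true_centre` are hypothetical.

```python
import numpy as np

def centroid_1d(values, positions):
    """Intensity-weighted mean position of a sampled, non-negative peak."""
    values = np.asarray(values, dtype=float)
    positions = np.asarray(positions, dtype=float)
    return float((values * positions).sum() / values.sum())

pixels = np.arange(10.0)                  # pixel centres 0, 1, ..., 9
true_centre = 4.3                         # deliberately off the pixel grid
peak = np.exp(-0.5 * ((pixels - true_centre) / 0.8) ** 2)

estimate = centroid_1d(peak, pixels)
print(estimate)                           # close to 4.3, although pixels are 1 unit apart
```

The brightest pixel alone would place the peak at 4; the weighted average of the neighbouring samples recovers the sub-pixel offset.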
The application of the XFEL to the determination of molecular structure is an extremely exciting endeavour. However, it is also one that offers very considerable scientific and technical challenges. A very fundamental challenge is the need to understand the role of the interaction of the molecule with the incident field. This Letter demonstrates that recovery of molecular structure in the presence of damage is most readily achieved by adopting a model that reflects the detailed interaction physics, rather than the usual crystallographic assumption of full coherence. Our approach is predicated on the need for the damage mechanisms to be well characterized, either by kinetic modelling, or by inversion of diffraction data obtained from scattering targets of known structure. As such, we submit that an important obstacle in the realization of single-molecule structural biology using XFEL sources has been removed.
Methods
Data simulation. The three-dimensional intensity distribution of X-ray photons scattered from a molecule of bacteriorhodopsin, I(q), was simulated using methods previously published^{6}.
The modal distributions defined by equation (3) are determined in the following manner. A nonorthogonal basis of orbital densities with which to expand the modes is
so that , and c_{Z γ}^{k} is an associated expansion coefficient that, in the present context, must be a real quantity. These coefficients and the corresponding real and nonnegative eigenvalues, η_{k}, that appear in equation (3) are determined as the solutions of a generalized matrix eigenvalue equation of the form JC=η SC. The diagonal matrix, η, contains the eigenvalues, the columns of C contain the eigenvectors and the matrix elements of J and S are defined by
where N_{Z} is the number of atoms of nuclear charge Z, δ_{Z Z′} is the Kronecker delta and
For consistency with the diffraction simulations, the orbital density integrals were evaluated using a spherical-atom model derived from Slater-type orbitals^{15} within the tight-binding approximation. We note, however, that the formalism is readily extended to more detailed numerical treatments of the electron density simply by interpreting γ to be a label identifying N-electron electronic state functions including the effects of electronic relaxation, correlation and relativistic corrections. The three most significant modes contribute in the ratio 0.93:0.05:0.01 in the present model of bacteriorhodopsin.
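A generalized eigenvalue problem of the form JC=η SC with a non-orthogonal basis can be solved with a Cholesky reduction. The sketch below is not the authors' implementation. It assumes that S is the overlap matrix of the basis functions and that J = S A S, where A is the matrix of occupancy averages; this follows if the mutual optical intensity is expanded in the same basis with coefficient matrix A, but the actual matrix elements of J and S are not given here. The 2×2 matrices are made-up illustrative values.

```python
import numpy as np

def coherent_modes(A, S):
    """Solve (S A S) C = S C eta with the normalization C.T @ S @ C = I.

    A: real symmetric, positive semi-definite coefficient matrix (assumed).
    S: real symmetric, positive definite overlap matrix (assumed).
    Returns the eigenvalues in descending order and the matching columns of C.
    """
    L = np.linalg.cholesky(S)             # S = L @ L.T
    M = L.T @ A @ L                       # equivalent standard symmetric problem
    eta, Y = np.linalg.eigh(M)
    C = np.linalg.solve(L.T, Y)           # C = inverse(L.T) @ Y
    order = np.argsort(eta)[::-1]
    return eta[order], C[:, order]

# Hypothetical two-function example.
A = np.array([[0.20, 0.34],
              [0.34, 0.79]])
S = np.array([[1.0, 0.4],
              [0.4, 1.0]])

eta, C = coherent_modes(A, S)
weights = eta / eta.sum()                 # relative modal contributions
print("modal weights:", weights)
print("coefficients:\n", C)
```

With this normalization the modes are orthonormal with respect to S and A = C diag(η) Cᵀ, so the weighted sum of squared modes reproduces the original intensity. The normalized weights play the role of the modal ratios quoted above.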
Structure recovery. We have introduced the modes ψ_{k}(q) through equation (4). The coherent mode formulation of optical coherence theory^{11} then writes the coherence function, known in this context as the mutual optical intensity, in the form
$$J(q,q') = \sum_{k} \eta_{k} \psi_{k}^{*}(q) \psi_{k}(q') \qquad (5)$$
where the measured intensity is given by I(q)=J(q,q), which is equation (3). If we substitute equation (4) into equation (5) and reorganize a little, it adopts the form
We introduce the functions and . The sum in the function μ(q,Z) is weighted by the inverse of the atomic number so that it is approximately independent of the atomic number,
allowing us to write the mutual optical intensity in the approximate form
We may now use the expression for the intensity to conclude that to reconstruct molecular structure from diffraction data including electronic damage, we initially investigate the existence of a solution, T(q), of the equation
$$I(q) = B(q)\, T(q) T^{*}(q) \qquad (9)$$
for the function B(q) defined in our theory by . The relatively simple modification suggested by equation (9) is a remarkable result and shows that the effects of electronic damage may be incorporated into a scheme designed to solve directly for the nuclear positions, removing all reference to the electronic coordinates. As shown further on, the assumption that leads to equations (7) and (8) can be removed at the last stage of the solution, but we have found it to be unnecessary; the approximation is excellent for a complex biomolecule.
The left-hand side of equation (9), which represents the measured intensities, I(q), and the product T(q)T*(q) are both non-negative definite functions. As we consider I(q) to be constructed from modes that carry the structural information, any solution of the inverse problem T(q)T*(q)=I(q)/B(q)=I′(q) must satisfy the nuclear distribution constraints.
To a good approximation, we note that , where is a smooth, slowly varying function of q, for which . Defining B(q) by
we see that for q0, B(q)≃1.
Our focus on biomolecules suggests the use of a particularly simple model to specify a functional form for B(q) that possesses the required characteristics, including in the neighbourhood of the zeros of I(q). The discrete nuclear positions in equation (2) are replaced by continuous spherical uniform nuclear charge distributions of finite radius, R, so that equation (2) is readily evaluated using a standard integral, leading to the approximation
where N_{Z} is the number of atoms of charge Z. For q→0, this function has the limit N_{Z}N_{Z′} and vanishes for because no two atoms of different charge may share the same spatial position. In the case Z=Z′, we require that the function should possess the limit N_{Z} for as there are N_{Z} occurrences of unity appearing in the summation that defines T_{Z}(q)T_{Z′}*(q) that are not present in the continuum approximation leading to equation (10). These terms are restored by adding a correction of the form N_{Z}[1−exp(−κ q^{2})]. The selection of the value of κ proves to be not very critical provided that the function rises rapidly to the value N_{Z} for small q; we use κ≃0.1R^{2}.
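The limits described here can be checked with a small numerical sketch. It assumes that the "standard integral" is the Fourier transform of a uniform sphere, 3[sin(qR) − qR cos(qR)]/(qR)^{3}, and that the continuum estimate of T_{Z}(q)T_{Z′}*(q) is N_{Z}N_{Z′} times the square of that factor. Both are assumptions, since equation (10) itself is not reproduced, and `sphere_form_factor` and `pair_term` are hypothetical names.

```python
import numpy as np

def sphere_form_factor(q, radius):
    """Normalized Fourier transform of a uniform sphere (equal to 1 at q = 0)."""
    x = np.asarray(q, dtype=float) * radius
    small = np.abs(x) < 1e-4
    safe = np.where(small, 1.0, x)                    # avoid 0/0 at the origin
    exact = 3.0 * (np.sin(safe) - safe * np.cos(safe)) / safe**3
    return np.where(small, 1.0 - x**2 / 10.0, exact)  # series used near q = 0

def pair_term(q, n_z, n_zp, same_species, radius, kappa=None):
    """Assumed continuum estimate of T_Z(q) T_Z'*(q), with the Z = Z' correction."""
    if kappa is None:
        kappa = 0.1 * radius**2                       # kappa ~ 0.1 R^2, as in the text
    q = np.asarray(q, dtype=float)
    value = n_z * n_zp * sphere_form_factor(q, radius) ** 2
    if same_species:
        # Restores the N_Z self-terms that the continuum model omits.
        value = value + n_z * (1.0 - np.exp(-kappa * q**2))
    return value

radius = 30.0                                         # angstroms, illustrative
q = np.array([0.0, 0.05, 0.5, 5.0])                   # inverse angstroms
print(pair_term(q, 100, 100, True, radius))
print(pair_term(q, 100, 40, False, radius))
```

At q = 0 the function returns N_{Z}N_{Z′}; at large q it tends to N_{Z} for like species and to zero for unlike species, matching the limits stated in the paragraph.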
Having specified B(q), the solution of the resulting inverse problem for T(q) is essentially conventional, and employs the iterative hybrid input–output and error-reduction algorithms^{16}, supplemented by the judicious use of ‘charge-flipping’^{17} in the early stages of the structure recovery to establish the effective exterior support surface of the molecule. The iterations are initialized by a uniform charge distribution with a radius of 30 Å, and typically converge within 1,000 iterations.
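The two iterative updates named above can be sketched in a few lines. This is a generic toy illustration of hybrid input–output and error reduction with a support and positivity constraint, not the procedure used for bacteriorhodopsin: the 32×32 grid, the three point-like "nuclei", the square support, the random starting guess and the value of `beta` are all assumptions, and charge flipping is omitted.

```python
import numpy as np

def fourier_projection(g, magnitude):
    """Impose the measured Fourier magnitudes while keeping the current phases."""
    G = np.fft.fftn(g, norm="ortho")
    projected = magnitude * np.exp(1j * np.angle(G))
    return np.fft.ifftn(projected, norm="ortho").real

def error_reduction_step(g, magnitude, support):
    """Keep the projected values that are positive and inside the support."""
    gp = fourier_projection(g, magnitude)
    return np.where(support & (gp > 0.0), gp, 0.0)

def hio_step(g, magnitude, support, beta=0.9):
    """Hybrid input-output: apply negative feedback where the constraints fail."""
    gp = fourier_projection(g, magnitude)
    return np.where(support & (gp > 0.0), gp, g - beta * gp)

def fourier_error(g, magnitude):
    """Norm of the mismatch between |FFT(g)| and the measured magnitudes."""
    G = np.fft.fftn(g, norm="ortho")
    return float(np.linalg.norm(np.abs(G) - magnitude))

# Hypothetical target: three point scatterers of different weight.
truth = np.zeros((32, 32))
truth[12, 13], truth[15, 18], truth[18, 14] = 6.0, 8.0, 7.0
support = np.zeros((32, 32), dtype=bool)
support[10:22, 10:22] = True
magnitude = np.abs(np.fft.fftn(truth, norm="ortho"))

rng = np.random.default_rng(0)
g = rng.random((32, 32)) * support        # random start inside the support
for _ in range(200):
    g = hio_step(g, magnitude, support)
for _ in range(50):
    g = error_reduction_step(g, magnitude, support)
print("final Fourier-domain error:", fourier_error(g, magnitude))
```

Error reduction never increases the Fourier-domain error but can stagnate, which is why it is usually interleaved with hybrid input–output steps. Any reconstruction obtained this way is defined only up to translation and inversion of the object.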
In view of the approximations involved in the specification of B(q), we note that a final refinement of the structure may be carried out directly from the measured intensities, I_{0}(q). This may be achieved by using the structure derived from T(q) to initialize a set of trial parameters using a conventional, gradient-driven error-minimization scheme, such as is commonly used in crystallography, to fit I_{0}(q), using the modal amplitudes to propagate the trial intensity, I(q). All modal amplitudes and their gradients with respect to the nuclear coordinates are readily evaluated using elementary analytical methods. We have confirmed that such an error-minimization scheme recovers the target structure if, but only if, the procedure is initialized sufficiently close to the solution that the global minimum is located; in practice this means that each nuclear position in the trial solution should lie within the Wigner–Seitz sphere centred at the target position. Using simulated noiseless data, this subsequent refinement proves to be unnecessary, but it may provide a measure of additional robustness against the effects of measurement noise in experimental data, when they become available.
References
1. Neutze, R., Wouts, R., van der Spoel, D., Weckert, E. & Hajdu, J. Potential for biomolecular imaging with femtosecond X-ray pulses. Nature 406, 752–757 (2000).
2. Chapman, H. N. et al. Femtosecond diffractive imaging with a soft-X-ray free-electron laser. Nature Phys. 2, 839–843 (2006).
3. Fung, R., Shneerson, V., Saldin, D. K. & Ourmazd, A. Structure from fleeting illumination of faint spinning objects in flight. Nature Phys. 5, 64–67 (2009).
4. Elser, V. Noise limits on reconstructing diffraction signals from random tomographs. IEEE Trans. Inf. Theor. 55, 4715–4722 (2009).
5. Hau-Riege, S. P., London, R. A., Huldt, G. & Chapman, H. N. Pulse requirements for X-ray diffraction imaging of single biological molecules. Phys. Rev. E 71, 061919 (2005).
6. Hau-Riege, S. P., London, R. A. & Szoke, A. Dynamics of biological molecules irradiated by short X-ray pulses. Phys. Rev. E 69, 051906 (2004).
7. Jurek, Z. & Faigel, G. The effect of tamper layer on the explosion dynamics of atom clusters. Eur. Phys. J. D 50, 35–43 (2008).
8. Jurek, Z., Faigel, G. & Tegze, M. Dynamics in a cluster under the influence of intense femtosecond hard X-ray pulses. Eur. Phys. J. D 29, 217–229 (2004).
9. Young, L. et al. Femtosecond electronic response of atoms to ultra-intense X-rays. Nature 466, 56–61 (2010).
10. Whitehead, L. W. et al. Diffractive imaging using partially coherent X rays. Phys. Rev. Lett. 103, 243902 (2009).
11. Wolf, E. New theory of partial coherence in the space-frequency domain. 1. Spectra and cross spectra of steady-state sources. J. Opt. Soc. Am. 72, 343–351 (1982).
12. Hau-Riege, S. P., London, R. A., Chapman, H. N., Szoke, A. & Timneanu, N. Encapsulation and diffraction-pattern-correction methods to reduce the effect of damage in X-ray diffraction imaging of single biological molecules. Phys. Rev. Lett. 98, 198302 (2007).
13. Dilanian, R. A. et al. Diffractive imaging using a polychromatic high-harmonic generation soft-X-ray source. J. Appl. Phys. 106, 023110 (2009).
14. Schroder, G. F., Levitt, M. & Brunger, A. T. Super-resolution biomolecular crystallography with low-resolution data. Nature 464, 1218–1222 (2010).
15. Slater, J. C. Atomic shielding constants. Phys. Rev. 36, 57–64 (1930).
16. Fienup, J. R. Phase retrieval algorithms—a comparison. Appl. Opt. 21, 2758–2769 (1982).
17. Oszlanyi, G. & Suto, A. The charge flipping algorithm. Acta Crystallogr. A 64, 123–134 (2008).
Acknowledgements
This research was supported by the Australian Research Council through its Centres of Excellence and Federation Fellowships programmes.
Contributions
H.M.Q. carried out most of the theoretical analysis and computational implementation. Both authors contributed to the conceptual formulation of the research and to the writing of the Letter.
Ethics declarations
Competing interests
The authors declare no competing financial interests.
Cite this article
Quiney, H., Nugent, K. Biomolecular imaging and electronic damage using X-ray free-electron lasers. Nature Phys 7, 142–146 (2011). https://doi.org/10.1038/nphys1859
DOI: https://doi.org/10.1038/nphys1859