Abstract
Scattering experiments with femtosecond high-intensity free-electron laser pulses provide a new route to macromolecular structure determination. While currently limited to nanocrystals or virus particles, the ultimate goal is scattering on single biomolecules. The main challenges in these experiments are the extremely low signal-to-noise ratio due to the very low expected photon count per scattering image, often well below 100, as well as the random orientation of the molecule in each shot. Here we present a de novo correlation-based approach and show that three coherently scattered photons per image suffice for structure determination. Using synthetic scattering data of a small protein, we demonstrate near-atomic resolution of 3.3 Å using 3.3 × 10^{10} coherently scattered photons from 3.3 × 10^{9} images, which is within experimental reach. Further, our three-photon correlation approach is robust to additional noise from incoherent scattering; the number of disordered solvent molecules attached to the macromolecular surface should be kept small.
Introduction
First proposed by Neutze et al.^{1}, single-particle scattering experiments with high-intensity X-ray free-electron lasers (XFELs) hold the promise to solve the three-dimensional atomic structure of biological macromolecules such as proteins without the need for crystallization^{2,3,4,5}. High-repetition femtosecond X-ray pulses are used to outrun the severe radiation damage due to Auger decay and Coulomb explosion and thus allow for extremely high peak brilliance pulses to the point where single molecules can be imaged. Indeed, the first proof-of-principle experiments^{6,7} determined the 3D structure of single mimivirus particles to a resolution of 125 nm and Hosseinizadeh et al. recently demonstrated the structure determination of a coliphage virus with 9 nm resolution^{8}. In these experiments, more than 10^{7} photons per X-ray pulse were scattered by the virus and recorded on a pixel detector (Fig. 1a). In contrast, for a medium-sized molecule and an expected XFEL fluence of 1.3 × 10^{6} photons nm^{−2} (10^{12} photons) at a 1 μm focus diameter^{9}, only about 10–50 coherently scattered photons per scattering image are expected at a beam energy of 5 keV (2.5 Å wavelength)^{9,10,11}.
The high statistical noise in this extreme Poisson regime poses considerable methodological challenges, and hence XFEL structure determination attempts almost exclusively focus on nanocrystals^{12,13,14,15,16,17,18}. A particular challenge is to determine the orientation of the molecule for each image to assemble all recorded images in 3D Fourier space for subsequent electron density determination. For macroscopic 2D objects and 3D objects rotated around a single axis, Philipp et al. showed structure recovery from only 2.5 photons per image on average^{19,20,21}, but the method was not extended or applied to three-dimensional objects or molecules with unknown orientation. For single-molecule scattering experiments, several orientation determination methods were developed^{22,23,24,25,26,27,28,29}, which, however, require at least 100 photons per image. Alternatively, manifold reconstruction algorithms (manifold embedding)^{30,31,32,33} forego the explicit assembly in Fourier space and instead use the similarity between scattering images to determine the manifold of orientations. For these methods, too, successful structure determination was only reported for much more than 100 photons per image.
In fluorescence microscopy or cryo-electron microscopy, time-integrated and time-correlated single-photon counting is used at extremely low signal-to-noise ratios^{34}. In the context of single-molecule X-ray scattering, two-photon correlations were successfully used to determine the molecular shape of symmetric particles^{35,36} and the structure of particles randomly oriented around one axis^{37,38}. However, two photons are not sufficient to retrieve the structure de novo.
Based on early analytic work on degenerate three-photon correlations^{39}, structure determination of mesoscopic cylindrical particles^{40} and of a highly symmetric icosahedral virus^{41,42} was demonstrated. However, this approach uses only a small fraction of the recorded correlations and has so far not been applied to de novo single-molecule structure determination.
Here, we use the full three-photon correlation as an orientation-independent representation of the scattering images. We demonstrate that only three coherently scattered photons per image are required for de novo structure determination, such that near-atomic resolution for single biomolecules should in principle be possible even at extremely low photon counts.
Results
Structure determination
As in X-ray crystallography, the photon distribution of each scattering image follows the intersection between the Ewald sphere and the 3D intensity, \(I(\mathbf{k}) \propto \left| \mathcal{FT}\left[ \rho(\mathbf{x}) \right] \right|^2\), which is proportional to the absolute square of the Fourier-transformed electron density ρ(x). The orientation of the Ewald sphere depends on the molecular orientation and so does the scattering image. In contrast to X-ray crystallography, I(k) is continuous for single-molecule scattering, rendering the phase problem accessible to established methods^{43,44,45,46}. Because the orientation of the molecule is unknown, here I(k) is determined via the three-photon correlation function t(k_{1}, k_{2}, k_{3}, α, β), which is accumulated from all photon triplets in the recorded scattering images as illustrated in Fig. 1b.
To recover I(k), an analytic expression of the full three-photon correlation as a function of the 3D intensity I(k) was derived using shell-wise spherical harmonics (SH) expansions^{47} for \(I(\mathbf{k}) = \sum_{lm} A_{lm}(|\mathbf{k}|)\, Y_{lm}(\theta ,\varphi)\) (Methods and Supplementary Notes 1–3). This choice allows for adapting the number K(L^{2} + 3L + 2)/2 of SH basis functions to the target resolution via the largest considered wave number k_{cut}, the number K of used shells between 0 and k_{cut}, and the expansion order L. We were unable to invert the analytic expression of the three-photon correlation, and the number of unknowns (e.g., 4940 for K = 26, L = 18) is too large for a straightforward numeric solution. To circumvent this problem, we used a probabilistic approach and solved for those SH coefficients {A_{lm}(k)} that maximize the probability, \(p\left( \left\{ k_1^i,k_2^i,k_3^i,\alpha^i,\beta^i \right\} \mid \left\{ A_{lm}(k) \right\} \right) = \prod_{i = 1}^{T} \tilde t\left( k_1^i,k_2^i,k_3^i,\alpha^i,\beta^i \right)\big|_{\{ A_{lm}(k) \}}\) (Bayesian with uniform prior), of observing all T recorded triplets (Methods and Supplementary Notes 4 and 5). Due to their statistical independence, p is the product of the probabilities of observing each recorded photon triplet, each of which is given by the normalized three-photon correlation \(\tilde t\left( k_1,k_2,k_3,\alpha ,\beta \right)\big|_{\{ A_{lm}(k) \}}\). The search space was further reduced by utilizing the analytic inversion of the two-photon correlation^{39} (Methods and Supplementary Note 6), rendering the problem accessible to Monte Carlo simulated annealing^{48}.
We found that independent Monte Carlo runs converged to similar intensities (Pearson correlation of 0.99), suggesting that the solution of the inversion of the three-photon correlation is unique.
Contrary to intuition, smaller molecules are more demanding than larger ones^{24}. We therefore challenged our approach by using the 46-residue protein Crambin, which is known to 0.8 Å resolution^{49} (Fig. 2e). We estimated an average of 14 coherently scattered photons per Crambin shot, a number which is achieved, e.g., at the XFEL at DESY using an X-ray intensity of 10^{12} photons per pulse at 5 keV and a 1 μm beam diameter. The estimates were calculated with the Condor package by Hantke et al.^{10} using a flat-top beam profile. An independent calculation using the SimEx simulation framework for imaging single particles at the European XFEL by Fortmann-Grote et al.^{9,11} using a realistic beam profile yielded similar numbers.
As a conservative test case, and to challenge our method, we generated up to 3.3 × 10^{9} synthetic scattering images with only 10 photons on average, totaling up to 3.3 × 10^{10} recorded photons (Methods). With an expected XFEL repetition rate of up to 27 kHz^{50}, and assuming a hit rate of 10%, we expect this data to be collected within a few days (Fig. 3d). As discussed in Supplementary Note 8, the data acquisition time substantially decreases to, e.g., approx. 30 min when on average 100 photons per image are recorded (e.g., by shrinking the beam diameter by a factor of 3 to approx. 300 nm), reducing the total number of required photons by a factor of 100 to 3.3 × 10^{8} (and reducing the number of images by a factor of 1000 to 3.3 × 10^{6}). Even for a lower hit rate such as 1%, 300 min would suffice in this case.
From the synthetic scattering images, we performed 20 independent structure determination runs (Methods and Supplementary Fig. 7). For all runs we used an expansion order L = 18, K = 26 shells and a cutoff k_{cut} = 2.15 Å^{−1} (Supplementary Note 7 discusses the optimal parameters), thus setting the maximum achievable resolution to 2π/k_{cut} ≈ 2.9 Å. Fig. 2a–c compares the average intensity obtained from these 20 runs (green) with the reference intensity derived from the known X-ray structure (blue). Overall, the shape of the intensity is recovered very well and only minor deviations in the outer shells, where fewer photons are recorded, are present.
To assess the achievable resolution of the determined Fourier intensities, we calculated 20 real-space electron density maps using an iterative phase retrieval algorithm^{45}. Figure 2d, e compares the average of the 20 retrieved densities (d, green shaded structure) with the reference electron density (e, blue shaded structure), which has been calculated from the Fourier density (including phases) with the same cutoff k_{cut} as (d). The cross-correlation between the two densities is 0.9. The Fourier shell correlation (FSC) between the known reference electron density of Crambin and the retrieved averaged electron density was calculated as a function of the wave number k^{51} (note that we use k_{in} = 2π/λ for all wave number calculations). Similar to single-particle electron microscopy^{51}, the wave number k_{res} at which FSC(k_{res}) = 0.5 was used to estimate the achieved radial resolution Δr = 2π/k_{res}. Here, a near-atomic resolution of 3.3 Å was achieved.
Resolution as function of number of recorded images
Next we explored how the achieved resolution depends on the number of observed photons (and triplets, respectively), and hence the number of recorded images. To this end, electron densities were calculated and averaged as above from 1.3 × 10^{7} up to 3.3 × 10^{10} photons gathered from images with 10 photons on average (4.7 × 10^{8} up to 1.2 × 10^{12} triplets). Figure 3a depicts the respective FSC curves for different photon counts along with the 0.5 cutoff (vertical dashed line) and the corresponding resolutions (inset).
As mentioned before, for 3.3 × 10^{10} photons a near-atomic resolution of 3.3 Å was achieved. Decreasing the number of photons by a factor of 10 decreased the resolution only slightly by 0.4 Å to 3.7 Å, which indicates that very likely fewer than 3.3 × 10^{10} photons suffice to achieve near-atomic resolution. If far fewer photons are recorded, e.g. 1.3 × 10^{7} (4.4 × 10^{8} triplets), the resolution decreased markedly to 14 Å. To address the question of how much further the resolution can be improved, we mimicked an experiment with an infinite number of photons by determining the intensity from the analytically calculated three-photon correlation using Eq. (3) from the Methods section. As can be seen in Fig. 3a (purple line), the resolution only slightly improved by 0.1 Å to about 3.2 Å, indicating that at this point either the expansion order L or insufficient convergence of the Monte Carlo-based structure search became resolution limiting. To distinguish between these two possible causes, we phased the electron density directly from the reference intensity, using the same expansion order L = 18 as in the other experiments. The reference intensity is free from convergence issues of the Monte Carlo structure determination and the resulting electron density only includes the phasing errors introduced by the limited angular resolution of the SH expansion in Fourier space. The FSC curve of the optimal phasing (gray dashed) shows only a minor increase in resolution to 3.1 Å, indicating that the Monte Carlo search decreases the resolution by 0.1 Å. The remaining 0.2 Å difference to the optimal resolution of 2.9 Å at the given k_{cut} (not shown) is attributed to the finite expansion order L and the corresponding phasing errors.
We have also independently assessed the overall phasing error by calculating the intensity shell correlation (ISC) between the intensities of the phased electron densities \(I_{\mathrm{phased}} = \left| \mathcal{FT}\left[ \rho_{\mathrm{retrieved}} \right] \right|^2\) and the intensities before phasing I_{retrieved} (Methods and Supplementary Fig. 8). As discussed in the Methods section, the phasing method does not markedly deteriorate our structures.
Because a large expansion order L requires a larger number of shells K, and, therefore, much larger numbers of unknowns (Supplementary Note 7), the question remains at which point overfitting occurs. To quantify this effect for our sets of images, we calculated the achieved resolution as a function of expansion order L for four different total photon counts 5.1 × 10^{7}, 2.0 × 10^{8}, 8.2 × 10^{8}, and 3.3 × 10^{10} (1.8 × 10^{9}, 7.1 × 10^{9}, 2.8 × 10^{10}, and 1.2 × 10^{12} triplets, respectively) at a fixed number of shells K = 26. Indeed, as shown in Fig. 3b, for up to 2.0 × 10^{8} photons, the obtained three-photon correlation is too noisy to yield an improved resolution when increasing the model detail: for larger L, the probability p of the intensity model still increases whereas the resolution decreases again, indicating overfitting. In contrast, for larger photon counts (>8.2 × 10^{8}), the resolution improves even up to the expansion order L = 18 and no overfitting is expected here. However, due to the large parameter space, convergence of the simulated annealing becomes computationally demanding (Supplementary Notes 4, 5, and 7).
Robustness to noise
We finally assessed how robust our approach is in the presence of additional experimental noise due to, e.g., incoherent scattering, background radiation, detector noise, or scattering at the unstructured fraction of water molecules that may adhere to the surface of the macromolecules^{1}. Since only very few single-molecule scattering experiments have been carried out so far, quantitative noise models are available only for incoherent scattering, for which a noise level of ca. γ = 25%^{52} is expected. Here we modeled the noise as a Gaussian distribution, G(k, σ) = γ(2πσ^{2})^{−1/2}exp(−k^{2}/2σ^{2}). Depending on the width σ, different signal-to-noise ratios are expected in the low-resolution and high-resolution regions of the image, respectively. For incoherent scattering (indicated as gray background) a width of σ = 2.5 Å^{−1} was assumed^{53} (Supplementary Note 9), which corresponds to a relatively uniform noise distribution. Figure 3c (black line) shows a moderate decrease in resolution to approx. 3.5 Å when this noise is included within our synthetic experiments (as described in Supplementary Note 9). Additional noise with a uniform distribution from, e.g., background radiation or detector noise, slightly decreased the resolution to 3.8 Å at 50% noise level.
For scattering from disordered water molecules that are attached to the macromolecular surface, a narrower intensity distribution is expected (Supplementary Fig. 4). To also investigate this effect and the effect of other potential noise sources with non-uniform distribution, in Fig. 3c, we considered noise with widths of σ = [0.5, 0.75, 1.125] Å^{−1} and noise levels γ between 10 and 50%, the latter corresponding, e.g., to up to 100 disordered water molecules per Crambin molecule. The resolution remained better than 5 Å up to the 25% noise level but decreased markedly to 9 Å with γ = 50%, in particular for narrow noise widths of σ = [0.5, 0.75] Å^{−1}.
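As a rough illustration of why σ = 2.5 Å^{−1} acts as a nearly uniform background while σ = 0.5 Å^{−1} concentrates the noise at low resolution, the following minimal Python sketch evaluates the Gaussian noise model G(k, σ) given above at the center and at the cutoff k_{cut} = 2.15 Å^{−1}. The function names are hypothetical and are not taken from the authors' code.

```python
import math


def gaussian_noise(k, sigma, gamma):
    """Noise model from the text: G(k, sigma) = gamma (2 pi sigma^2)^(-1/2) exp(-k^2 / (2 sigma^2))."""
    return gamma / math.sqrt(2.0 * math.pi * sigma**2) * math.exp(-k**2 / (2.0 * sigma**2))


def relative_falloff(k, sigma):
    """Ratio G(k) / G(0); independent of the noise level gamma."""
    return gaussian_noise(k, sigma, 1.0) / gaussian_noise(0.0, sigma, 1.0)


K_CUT = 2.15  # cutoff wave number in 1/Angstrom, as used in the text

for sigma in (2.5, 1.125, 0.75, 0.5):
    print(f"sigma = {sigma:5.3f} 1/A -> G(k_cut)/G(0) = {relative_falloff(K_CUT, sigma):.2e}")
```

For the wide distribution the noise at the cutoff is still a large fraction of its central value, whereas for the narrow widths it is orders of magnitude smaller there, so the noise contribution is concentrated in the inner shells.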
Sample application to experimental data
To test if our method is also robust against noise in real experimental data, we have determined the structure of the coliphage PR772 virus from the Reddy et al. data set^{54} (Supplementary Note 10), albeit at much higher photon counts than our method is targeted for. As described in ref. ^{54}, this image set has been obtained by filtering the raw images for single-molecule hits with diffusion map embedding. Therefore, to mimic low photon counts, we downsampled the images, which contain over 400,000 photons per image, and generated 3 × 10^{12} triplets using the same rejection sampling method that we used to generate the Crambin images, and subsequently applied the same reconstruction procedure (Supplementary Fig. 5). A resolution of 11.7 nm was achieved, as evidenced from the FSC between two independently determined structures (Supplementary Fig. 6). This resolution is slightly lower than the 9 nm obtained by Hosseinizadeh et al.^{55}, which may be due to the fact that we used fewer photons, implying additional Poisson noise. Also, in contrast to Hosseinizadeh et al., we have not implicitly imposed any icosahedral symmetry in our reconstructions.
Discussion
The presented method demonstrates de novo structure determination from as few as three photons per XFEL scattering image at near-atomic resolution. Our synthetic scattering experiments with subsequent structure determination have shown that, for the most challenging case of small biomolecules, a resolution better than 3.3 Å should be achievable with available technology at realistic beam times; specifically, our conservative estimate rests on a beam fluence of 5.0 × 10^{11} photons per pulse. Assuming a 10% hit rate, our method requires only ca. 10^{10} molecules, which is, compared to nanocrystallography, smaller by a factor of 10 (10^{5} nanocrystals with 10^{6} nm^{3} volume)^{13}.
Even higher resolutions are conceivable for larger molecules due to the larger scattering signal^{24}, albeit computational resources may become a limiting factor when determining larger structures at the same resolution of around 3 Å. However, as shown for the structure determination of the much larger coliphage virus in Supplementary Note 10, the computational complexity only depends on the ratio between the size of the molecule and the desired resolution. For a given resolution, the computational complexity scales slightly faster than the molecular weight cubed.
Given that currently available de novo refinement methods require at least 100 photons per image, we consider our finding that only three photons per image suffice quite unexpected. Further, in this extreme Poisson regime, our three-photon correlation approach, in contrast to previous structure determination methods, allows fewer photons per image P to be compensated for by acquiring more images I. In particular, because two photons per image do not uniquely determine the structure^{39}, here we have reached the fundamental limit.
Our analysis also suggests that the method is robust against noise from incoherent scattering, and that removing as much disordered water (or other contaminants) as possible from the molecule in the experiment is crucial. Further, fluctuations of the beam intensity (both in time and due to beam-particle impact parameter fluctuations), which are a limiting factor for image-wise orientation-based methods, should not deteriorate the resolution in our approach, as the correlations are insensitive to such fluctuations. Clearly, further experimental data and improved noise models are required to study the effect of these and other potential noise sources such as background radiation from the evaporated water and detector noise. Structural fluctuations and inhomogeneities of the sample increasingly become a limiting factor for all current structure determination methods, particularly at high resolutions. Notably, for mixtures of several structures, single-particle scattering implies that the three-photon correlation on which our method rests is a linear superposition of the three-photon correlations of the individual structures. Hence, our approach should be generalizable in a straightforward way to refine such mixtures, albeit at the cost of more required images, larger computational effort, and more severe convergence issues. Further, due to the averaging properties of the three-photon correlations, our method should be more robust than methods that rely on an accurate orientation of individual scattering images.
We have tested our approach for a conservative estimate of 10 coherently scattered photons. Should the number of coherently scattered photons per shot be larger, e.g., by reducing the size of the beam focus, our method might even bring single-molecule structure determination within reach of less bright free-electron lasers or even table-top setups^{56}.
Overall, our results suggest that near-atomic structure determination by single-molecule X-ray scattering is within experimental reach. We would like to point out that our correlation-based method can also determine structures from images containing more than one particle, which may further reduce the data acquisition time and facilitate sample delivery (Supplementary Note 11 discusses how the two-photon and three-photon correlations of single molecules are calculated from multi-particle correlations). The method is potentially also useful to extract as much information as possible from other types of scattering experiments, in particular when 3D structures are inferred from noisy two-dimensional projections, such as cryo-EM^{57,58}, X-ray microscopy, sub-diffractive optical microscopy^{59,60}, and from fluctuations in correlated X-ray scattering.
Methods
Three-photon correlations expressed in SH
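The three-photon correlation t(k_{1}, k_{2}, k_{3}, α, β) is the orientational average 〈〉_{ω} of the product between three intensities I(k), taken in orientation ω (denoted I_{ω}), that lie on the intersection between the Ewald sphere and the 3D Fourier density (see Supplementary Note 12),
\[t(k_1,k_2,k_3,\alpha,\beta) = \left\langle I_\omega(\mathbf{k}_1^\star)\, I_\omega(\mathbf{k}_2^\star)\, I_\omega(\mathbf{k}_3^\star) \right\rangle_\omega. \quad (1)\]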
Here, without loss of generality, the three vectors \(\mathbf{k}_1^\star\), \(\mathbf{k}_2^\star\), and \(\mathbf{k}_3^\star\) are the projections onto the Ewald sphere of the three photons \(\mathbf{k}_1 = (k_1, 0, 0)\), \(\mathbf{k}_2 = k_2(\cos\alpha, \sin\alpha, 0)\), and \(\mathbf{k}_3 = k_3(\cos\beta, \sin\beta, 0)\) in the detector plane. Using a shell-wise SH decomposition of the intensity^{47},
\[I(\mathbf{k}) = \sum_{lm} A_{lm}(k)\, Y_{lm}(\theta,\varphi), \quad (2)\]
with the coefficients A_{lm}(k) describing the intensity function on the respective shells, the three-photon correlation is expressed in sums of products of SH coefficients together with known Wigner-3j symbols and SH basis functions Y_{lm}(θ, φ),
See Supplementary Note 1 for the full derivation of Eq. (3).
Synthetic data generation
We validated our structure determination approach using synthetic scattering experiments on the structure of the 46-residue protein Crambin (PDB descriptor: 3U7T)^{49}, which has been determined to 0.8 Å resolution. To this end, we approximated the 3D electron density ρ(x) by a sum of Gaussian functions centered at the atomic positions with height γ and variance σ depending on the atom type. The absolute square of the electron density's Fourier transform \(I(\mathbf{k}) = \left| \mathcal{FT}[\rho(\mathbf{x})] \right|^2\) was used to generate synthetic scattering images. In each synthetic scattering experiment, the molecule, and thus also I(k), was randomly oriented. On average P photons per image were generated each shot, according to the distribution given by the randomly oriented Ewald slice of the intensity I_{ω}(K).
To generate the distributions numerically, first, a random set of N_{pos} positions {K_{i}} in the k_{x}k_{y}-plane was generated according to a 2D Gaussian distribution G(K) with width σ = 1.05 Å^{−1}. Given a random 3D rotation U (see Supplementary Note 4 for uniform sampling of SO(3)), a rejection sampling method was used to accept or reject each position according to ξ < I_{ω}(U ⋅ K_{i})/(M ⋅ G(K_{i})), using a uniformly distributed random number ξ ∈ [0, 1] for each position. Here, the constant M was chosen as I_{max} ⋅ max(G(K)) such that the ratio I_{ω}(U ⋅ K_{i})/(M ⋅ G(K_{i})) is below 1 for all K. In accordance with our most conservative estimate discussed in the main text, the number of positions N_{pos} was chosen such that on average 10 scattered photons were generated. For assessing the dependency of the resolution on the number of scattered photons, additional image sets with 25, 50, or 100 scattered photons were also generated (Supplementary Note 8).
For technical reasons, we used an SH expansion of the intensity with a high expansion order L = 35 as a sufficiently accurate approximation for I(k) to generate the images. The accuracy of the intensity model was cross-checked with the intensity calculated on a cubic grid (grid size of 150) using the Fast Fourier Transform, resulting in a 0.9999 correlation, thus establishing sufficient accuracy. Altogether, up to 3.3 × 10^{9} images were generated using a high degree of parallelism.
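The rejection sampling step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: `toy_intensity` is a made-up anisotropic Gaussian standing in for the SH model of I(k), `ewald_lift` uses one common parametrization of the Ewald sphere, and the envelope constant M is chosen as I_max/G(0), which keeps the acceptance ratio at or below 1 only because this toy intensity decays faster than the proposal G(K). All function names are hypothetical.

```python
import numpy as np


def random_rotation(rng):
    """Draw a rotation matrix uniformly from SO(3) via QR of a Gaussian matrix."""
    q, r = np.linalg.qr(rng.normal(size=(3, 3)))
    q = q * np.sign(np.diag(r))   # fix column signs so q is uniformly distributed on O(3)
    if np.linalg.det(q) < 0:      # flip one column to map reflections into SO(3)
        q[:, 0] = -q[:, 0]
    return q


def toy_intensity(k):
    """Hypothetical 3D intensity with maximum 1 at k = 0 (stand-in for |FT[rho]|^2)."""
    return np.exp(-(k[:, 0] ** 2 / 0.5 + k[:, 1] ** 2 / 1.0 + k[:, 2] ** 2 / 2.0))


def ewald_lift(K, k_in):
    """Lift detector-plane positions (kx, ky) onto an Ewald sphere of radius k_in (assumed geometry)."""
    kz = k_in - np.sqrt(k_in**2 - np.sum(K**2, axis=1))
    return np.column_stack([K, kz])


def sample_image(rng, n_pos, sigma=1.05, wavelength=2.5):
    """Return the accepted photon positions of one randomly oriented synthetic image."""
    k_in = 2.0 * np.pi / wavelength
    U = random_rotation(rng)
    K = rng.normal(scale=sigma, size=(n_pos, 2))         # proposals drawn from G(K)
    K = K[np.sum(K**2, axis=1) < k_in**2]                # keep positions that can be lifted
    G = np.exp(-np.sum(K**2, axis=1) / (2.0 * sigma**2)) / (2.0 * np.pi * sigma**2)
    I = toy_intensity(ewald_lift(K, k_in) @ U.T)         # intensity at the rotated points
    M = 1.0 * (2.0 * np.pi * sigma**2)                   # I_max / G(0); valid for this toy case only
    xi = rng.random(len(K))                              # uniform random numbers in [0, 1)
    return K[xi < I / (M * G)]


rng = np.random.default_rng(0)
photons = sample_image(rng, n_pos=200)
print(f"{len(photons)} photons accepted out of 200 proposed positions")
```

In this sketch the mean photon count per image is controlled by `n_pos`, mirroring the role of N_{pos} in the text.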
Probability of observing a set of triplets
Because we were not able to derive an analytic inversion for Eq. (3), we chose a probabilistic approach and asked which intensity I(k) is most likely to have generated the complete set of measured scattering images and triplets, respectively. To this end, we considered the probability p that a given intensity I(k), expressed in SH by {A_{lm}(k)}, generated the set of triplets, \(\left\{ k_1^i,k_2^i,k_3^i,\alpha^i,\beta^i \right\}_{i = 1...T}\),
\[p\left( \left\{ k_1^i,k_2^i,k_3^i,\alpha^i,\beta^i \right\} \mid \left\{ A_{lm}(k) \right\} \right) = \prod_{i = 1}^{T} \tilde t\left( k_1^i,k_2^i,k_3^i,\alpha^i,\beta^i \right)\big|_{\{ A_{lm}(k) \}}. \quad (4)\]
Due to the statistical independence of the triplets, this probability p is a product over the probabilities \(\tilde t(k_1^i,k_2^i,k_3^i,\alpha^i,\beta^i)\) of observing the individual triplets i, each of which is given by the normalized three-photon correlation \(\tilde t\left( k_1,k_2,k_3,\alpha ,\beta \right)\). Here, \(\tilde t\left( k_1,k_2,k_3,\alpha ,\beta \right)\) was calculated using Eq. (3) for varying intensity coefficients {A_{lm}(k)}, and the coefficients that maximized \(p\left( \left\{ k_1^i,k_2^i,k_3^i,\alpha^i,\beta^i \right\} \right)\) were determined using a Monte Carlo scheme.
In contrast to the direct inversion, the probabilistic approach has the benefit of fully accounting for the Poissonian shot noise implied by the limited number of photon triplets that are extracted from the given scattering images. We note that this approach also circumvents the limitation faced by Kam^{39}, where only triplets with two photons recorded at the same position could be considered. Because all other triplets had to be discarded, Kam’s approach is limited to very high beam intensities, and cannot be applied in the present extreme Poisson regime.
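To make the product over triplets concrete, the following minimal Python sketch evaluates the logarithm of such a product for triplets that have been collected in a histogram. It assumes (as an illustration, not as a description of the authors' code) that the observed triplets are binned into counts n_b and that a model correlation t_b is available on the same bins, so that log p = Σ_b n_b log t̃_b with t̃ the normalized model. The function name and the three-bin example data are hypothetical.

```python
import numpy as np


def log_likelihood(counts, model_correlation):
    """Log-probability of histogrammed triplets given a model correlation on the same bins."""
    counts = np.asarray(counts, dtype=float)
    t = np.asarray(model_correlation, dtype=float)
    t_norm = t / t.sum()                      # normalize so the bin probabilities sum to 1
    observed = counts > 0                     # empty bins do not contribute to the product
    if np.any(t_norm[observed] <= 0.0):
        return -np.inf                        # model assigns zero probability to observed data
    return float(np.sum(counts[observed] * np.log(t_norm[observed])))


# Made-up example: 100 triplets distributed over three bins.
counts = [50, 30, 20]
matching_model = [5.0, 3.0, 2.0]   # proportional to the observed frequencies
uniform_model = [1.0, 1.0, 1.0]

print(log_likelihood(counts, matching_model))
print(log_likelihood(counts, uniform_model))
```

A model whose normalized correlation matches the observed frequencies yields the larger log-probability, which is the quantity a search over the coefficients would try to increase.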
Reduction of the search space using two-photon correlations
In our approach, we used the structural information contained within the two-photon correlation to reduce the high-dimensional search space. In analogy to the three-photon correlation, the two-photon correlation is expressed as a sum over products of SH coefficients A_{lm}(k) weighted with Legendre polynomials P_{l}^{35,39},
Please note that the angle α seen on the detector differs from the angle \(\alpha^\star = \cos^{-1}\left( \sin(\theta_1)\sin(\theta_2)\cos(\alpha) + \cos(\theta_1)\cos(\theta_2) \right)\) between the two points in 3D intensity space due to the Ewald curvature \(\left( \theta = \cos^{-1}\left( k\lambda /4\pi \right) \right)\).
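The conversion from the detector angle α to the angle α* in 3D intensity space can be written down directly from the two expressions above. The short Python sketch below does so; the function names are hypothetical, and the clipping of the arccos arguments is added only to guard against round-off.

```python
import numpy as np


def ewald_polar_angle(k, wavelength):
    """theta = arccos(k * lambda / (4 pi)), as given in the text."""
    return np.arccos(np.clip(k * wavelength / (4.0 * np.pi), -1.0, 1.0))


def angle_in_intensity_space(k1, k2, alpha, wavelength):
    """alpha* between two photons at radii k1, k2 that appear at detector angle alpha."""
    theta1 = ewald_polar_angle(k1, wavelength)
    theta2 = ewald_polar_angle(k2, wavelength)
    cos_alpha_star = (np.sin(theta1) * np.sin(theta2) * np.cos(alpha)
                      + np.cos(theta1) * np.cos(theta2))
    return np.arccos(np.clip(cos_alpha_star, -1.0, 1.0))


# Example at 2.5 Angstrom wavelength: the correction grows with the wave number.
for k in (0.1, 1.0, 2.0):
    a_star = angle_in_intensity_space(k, k, np.pi / 2, wavelength=2.5)
    print(f"k = {k:.1f} 1/A: alpha = 90.0 deg -> alpha* = {np.degrees(a_star):.1f} deg")
```

For small wave numbers θ approaches π/2 and α* approaches α, so the Ewald curvature matters mainly for the outer shells.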
The inversion of Eq. (5) yields coefficient vectors \(\mathbf{A}_l^0(k) = \left( A_{l,-m}^0, \ldots, A_{l,m}^0 \right)\) for all l ≤ L ≤ K_{max}/2 and −l ≤ m ≤ l, as first demonstrated by Kam^{39}. However, all rotations in the (2l + 1)-dimensional coefficient eigenspaces of \(\mathbf{A}_l^0(k)\) by U_{l} are also solutions,
\[\mathbf{A}_l(k) = \mathbf{U}_l\, \mathbf{A}_l^0(k). \quad (6)\]
The result implies that the inversion only gives a degenerate solution for the coefficients and the intensity cannot be determined solely from two photons. Here, we used Eq. (6) to search for the optimal rotations U_{ l } instead of optimal coefficients \(A_{{\mathrm{lm}}}^{{\mathrm{all}}}\left( k \right)\), which reduced the size of the search space from \(\left( {\frac{1}{2}L^2 + \frac{3}{2}L + 1} \right) \cdot K\) to \(\frac{1}{3}\left( {L^3 + \frac{{15}}{4}L^2 + \frac{7}{2}L} \right)\) unknowns (e.g., reducing the number of unknowns from 4940 coefficients to 2370 rotation angles for L = 18 and K = 26). See Supplementary Note 6 for more details.
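The two counts quoted above can be checked with a few lines of Python. The closed-form expressions are those given in the text; the two summation variants additionally assume that only even l contribute (2l + 1 coefficients per shell and l(2l + 1) angles, the dimension of SO(2l + 1), per even l). That assumption is not stated in this section, but it reproduces both numbers. All function names are hypothetical.

```python
def n_coefficients(L: int, K: int) -> int:
    """Closed form from the text: (L^2/2 + 3L/2 + 1) * K SH coefficients."""
    return K * (L * L + 3 * L + 2) // 2


def n_rotation_angles(L: int) -> int:
    """Closed form from the text: (L^3 + 15/4 L^2 + 7/2 L) / 3, written over a common denominator."""
    return (4 * L**3 + 15 * L**2 + 14 * L) // 12


def n_coefficients_even_l(L: int, K: int) -> int:
    """Assumed interpretation: (2l + 1) coefficients per shell for every even l up to L."""
    return K * sum(2 * l + 1 for l in range(0, L + 1, 2))


def n_rotation_angles_even_l(L: int) -> int:
    """Assumed interpretation: dim SO(2l + 1) = l (2l + 1) angles for every even l up to L."""
    return sum(l * (2 * l + 1) for l in range(0, L + 1, 2))


print(n_coefficients(18, 26), n_rotation_angles(18))                   # closed forms
print(n_coefficients_even_l(18, 26), n_rotation_angles_even_l(18))     # explicit sums
```

Note that the number of rotation angles does not depend on the number of shells K, because one rotation U_{l} per l acts on all shells simultaneously.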
Monte Carlo simulated annealing
The probability p from Eq. (4) was maximized by a Monte Carlo/simulated annealing approach on the energy function:
\[E = -\log p = -\sum_{i = 1}^{T} \log \tilde t\left( k_1^i,k_2^i,k_3^i,\alpha^i,\beta^i \right) \quad (7)\]
in the space of all rotations U_{l} given by the inversion of the two-photon correlation. Each Monte Carlo run was initialized with a random set of rotations {U_{l}} and the set of unaligned coefficients \(\left\{ \mathbf{A}_l^0 \right\}\). In each Monte Carlo step j, all rotations \(\mathbf{U}_l^j\) were varied by small random rotations Δ_{l}(β_{l}) such that the updated rotations for each l (l ≤ L) read \(\mathbf{U}_l^{j + 1} = \mathbf{\Delta}_l(\beta_l) \cdot \mathbf{U}_l^j\) using step sizes β_{l}. In order to escape local minima, simulated annealing was performed using an exponentially decaying temperature protocol, T(j) = T_{init}exp(−j/τ). Steps with an increased energy were also accepted according to the Boltzmann factor exp(−ΔE/T). We further used adaptive step sizes such that all β_{l} were increased or decreased by a factor μ when accepting or rejecting the proposed steps, respectively. Convergence was improved by using a hierarchical approach in which the intensity was first determined with low angular resolution and further increased to high resolution. To this end, the variations of low-resolution features were frozen out faster than the variations of high-resolution features. See Supplementary Note 4 on how to generate random rotations in SO(n) and how the parameters of the Monte Carlo search were determined.
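The annealing loop described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it anneals a plain parameter vector with Gaussian proposals instead of rotations U_{l} in SO(2l + 1), uses a single adaptive step size rather than one per l, omits the hierarchical freezing of low-resolution features, and minimizes a made-up rugged toy energy. All function names and default parameter values are assumptions.

```python
import math
import random


def toy_energy(x):
    """Hypothetical rugged energy with its global minimum (E = 0) at x_i = 1."""
    return sum((xi - 1.0) ** 2 + 2.0 * (1.0 - math.cos(5.0 * (xi - 1.0))) for xi in x)


def simulated_annealing(energy, x0, t_init=1.0, tau=500.0, n_steps=5000,
                        step=0.5, mu=1.05, seed=0):
    """Metropolis search with exponentially decaying temperature and adaptive step size."""
    rng = random.Random(seed)
    x, e = list(x0), energy(x0)
    best_x, best_e = list(x), e
    for j in range(n_steps):
        temperature = t_init * math.exp(-j / tau)        # T(j) = T_init exp(-j / tau)
        proposal = [xi + rng.gauss(0.0, step) for xi in x]
        delta = energy(proposal) - e
        # Accept downhill moves always, uphill moves with the Boltzmann factor.
        if delta <= 0.0 or rng.random() < math.exp(-delta / temperature):
            x, e = proposal, e + delta
            step *= mu                                   # accepted: widen the proposals
            if e < best_e:
                best_x, best_e = list(x), e
        else:
            step /= mu                                   # rejected: narrow the proposals
    return best_x, best_e


start = [3.0, -2.0]
best_x, best_e = simulated_annealing(toy_energy, start)
print(f"start energy {toy_energy(start):.3f} -> best energy found {best_e:.3f}")
```

Scaling the step size up on acceptance and down on rejection by the same factor μ drives the acceptance rate toward roughly one half, which is one simple way to realize the adaptive step sizes mentioned in the text.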
Calculation of real space electron densities and resolutions
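Supplementary Fig. 7 summarizes the calculation of the electron densities as carried out in this work. All intensities were obtained up to an arbitrary Euler rotation (θ, ϕ, ψ) and were therefore rotationally fit to the known reference intensity for subsequent comparison. The phases of the aligned intensities were calculated using the relaxed averaged alternating reflections (RAAR) method by Luke^{45}. The resolution of the electron densities was characterized by the FSC between the Fourier transforms F_{1} and F_{2} of the two densities being compared,
\[\mathrm{FSC}(k) = \frac{\sum_{|\mathbf{k}_i| = k} F_1(\mathbf{k}_i)\, F_2^{*}(\mathbf{k}_i)}{\sqrt{\sum_{|\mathbf{k}_i| = k} \left| F_1(\mathbf{k}_i) \right|^2 \, \sum_{|\mathbf{k}_i| = k} \left| F_2(\mathbf{k}_i) \right|^2}}.\]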
In analogy to cryo-EM^{51}, the resolution is defined via the wave number k_{res} at which FSC(k) = 0.5, yielding a radial resolution Δr = 2π/k_{res}.
Starting from an individual set of doublet and triplet histograms (Supplementary Fig. 1), 20 independent intensity determination runs were carried out to assess and improve convergence of the Monte Carlo simulated annealing runs. To reduce the phasing error, the phase retrieval of one intensity was carried out eight times and the resulting eight electron densities were averaged. The final electron density, for which the resolution is given, is the average of those 20 individual densities and the resolution error was estimated from the standard deviation of the resolution of the 20 individual electron densities. We chose to average in real space instead of Fourier space before phasing because we found that this sequence yielded more accurate electron densities.
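A shell-wise correlation of this kind and the resulting resolution estimate Δr = 2π/k_{res} can be sketched with NumPy as follows. The sketch assumes the standard FSC definition (normalized cross-correlation of the two Fourier transforms within shells of equal |k|), densities sampled on the same cubic grid, and equal-width shells; the function names, the binning, and the linear interpolation at the threshold are illustrative choices, not the authors' implementation.

```python
import numpy as np


def fourier_shell_correlation(rho1, rho2, voxel_size, n_shells=8):
    """Return shell centers k (rad per length unit) and the FSC of two real-space maps."""
    F1, F2 = np.fft.fftn(rho1), np.fft.fftn(rho2)
    axes = [2.0 * np.pi * np.fft.fftfreq(n, d=voxel_size) for n in rho1.shape]
    kx, ky, kz = np.meshgrid(*axes, indexing="ij")
    k = np.sqrt(kx**2 + ky**2 + kz**2).ravel()
    edges = np.linspace(0.0, k.max(), n_shells + 1)
    shell = np.digitize(k, edges[1:-1])                  # shell index 0 .. n_shells - 1
    num = np.bincount(shell, weights=(F1 * np.conj(F2)).real.ravel(), minlength=n_shells)
    p1 = np.bincount(shell, weights=(np.abs(F1) ** 2).ravel(), minlength=n_shells)
    p2 = np.bincount(shell, weights=(np.abs(F2) ** 2).ravel(), minlength=n_shells)
    den = np.sqrt(p1 * p2)
    fsc = np.divide(num, den, out=np.zeros_like(num), where=den > 0)
    return 0.5 * (edges[:-1] + edges[1:]), fsc


def resolution_at_threshold(k, fsc, threshold=0.5):
    """Radial resolution 2 pi / k_res at the first crossing of the threshold (None if no crossing)."""
    below = np.nonzero(fsc < threshold)[0]
    if len(below) == 0 or below[0] == 0:
        return None
    i = below[0]
    k_res = k[i - 1] + (fsc[i - 1] - threshold) * (k[i] - k[i - 1]) / (fsc[i - 1] - fsc[i])
    return 2.0 * np.pi / k_res


# With the cutoff used in the text, the best attainable resolution is 2 pi / k_cut.
print(f"2 pi / 2.15 = {2.0 * np.pi / 2.15:.2f} Angstrom")
```

As a usage example, calling `fourier_shell_correlation(rho, rho, voxel_size)` on two identical maps should give an FSC of 1 in every populated shell, which is a convenient sanity check before comparing a retrieved density with a reference.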
Evaluation of phasing errors
To assess the phasing error, we compared the intensities of the phased electron densities \(I_{\mathrm{phased}} = \left| \mathcal{FT}\left[ \rho_{\mathrm{retrieved}} \right] \right|^2\) with the intensities I_{retrieved} before phasing. To this end, the ISC between the two was calculated.
In analogy to the FSC, we considered ISC(k) = 0.5 as a resolution measure. As can be seen in Supplementary Fig. 8, the phasing shifted this crossover from approx. 2.8 to 3.1 Å but did not distort the shapes and relative heights of the ISC curves. Assuming that the phasing error can be estimated from the shift of this crossover, a decrease in resolution of ca. 0.3 Å is expected to be due to phasing for our high-resolution density result with 3.3 Å resolution (retrieved from 3.3 × 10^{10} photons).
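By analogy with the FSC, a shell-wise normalized correlation of the two intensities could take the form below. This form is an assumption made for illustration: the notation \(\mathbf{k}_i \in k\) (grid points in the shell of radius k) and the normalization are not taken from the text.

```latex
\mathrm{ISC}(k) =
  \frac{\sum_{\mathbf{k}_i \in k} I_{\mathrm{phased}}(\mathbf{k}_i)\, I_{\mathrm{retrieved}}(\mathbf{k}_i)}
       {\sqrt{\sum_{\mathbf{k}_i \in k} I_{\mathrm{phased}}(\mathbf{k}_i)^2 \;
              \sum_{\mathbf{k}_i \in k} I_{\mathrm{retrieved}}(\mathbf{k}_i)^2}},
\qquad
\Delta r_{\mathrm{phasing}} \approx 3.1\,\text{\AA} - 2.8\,\text{\AA} = 0.3\,\text{\AA}
```

The second expression restates the crossover shift quoted above as the estimated loss of resolution attributable to phasing.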
Data availability
All relevant data are available from the authors.
Code availability
The code is available at https://github.com/h4rm/ThreePhotons.jl. The data analysis was done using IJulia notebooks, which are available at https://github.com/h4rm/ThreePhotonsNotebook. For more information, please visit http://www.mpibpc.mpg.de/grubmueller/threephotons.
References
1. Neutze, R., Wouts, R., van der Spoel, D., Weckert, E. & Hajdu, J. Potential for biomolecular imaging with femtosecond X-ray pulses. Nature 406, 752–757 (2000).
2. Hajdu, J. Single-molecule X-ray diffraction. Curr. Opin. Struct. Biol. 10, 569–573 (2000).
3. Huldt, G., Szoke, A. & Hajdu, J. Diffraction imaging of single particles and biomolecules. J. Struct. Biol. 144, 219–227 (2003).
4. Gaffney, K. J. & Chapman, H. N. Imaging atomic structure and dynamics with ultrafast X-ray scattering. Science 316, 1444–1449 (2007).
5. Miao, J., Ishikawa, T., Robinson, I. K. & Murnane, M. M. Beyond crystallography: diffractive imaging using coherent x-ray light sources. Science 348, 530–535 (2015).
6. Seibert, M. M. et al. Single mimivirus particles intercepted and imaged with an X-ray laser. Nature 470, 78–81 (2011).
7. Ekeberg, T. et al. Three-dimensional reconstruction of the giant mimivirus particle with an X-ray free-electron laser. Phys. Rev. Lett. 114, 98102 (2015).
8. Hosseinizadeh, A. et al. High-resolution structure of viruses from random diffraction snapshots. Philos. Trans. R. Soc. Lond. B Biol. Sci. 369, 20130326 (2014).
9. Yoon, C. H. et al. A comprehensive simulation framework for imaging single particles and biomolecules at the European X-ray free-electron laser. Sci. Rep. 6, 24791 (2016).
10. Hantke, M. F., Ekeberg, T. & Maia, F. R. Condor: a simulation tool for flash X-ray imaging. J. Appl. Crystallogr. 49, 1356–1362 (2016).
11. Fortmann-Grote, C. et al. in Proceedings of SPIE—The International Society for Optical Engineering (eds Tschentscher, T. & Patthey, L.) Vol. 10237, 102370S (International Society for Optics and Photonics, Bellingham, 2017).
12. Chapman, H. N. et al. Femtosecond diffractive imaging with a soft-X-ray free-electron laser. Nat. Phys. 2, 839–843 (2006).
13. Chapman, H. N. et al. Femtosecond X-ray protein nanocrystallography. Nature 470, 73–77 (2011).
14. Boutet, S. et al. High-resolution protein structure determination by serial femtosecond crystallography. Science 337, 362–364 (2012).
15. Fromme, P. & Spence, J. C. H. Femtosecond nanocrystallography using X-ray lasers for membrane protein structure determination. Curr. Opin. Struct. Biol. 21, 509–516 (2011).
16. Kirian, R. A. et al. Femtosecond protein nanocrystallography—data analysis methods. Opt. Express 18, 5713–5723 (2010).
17. Schlichting, I. Serial femtosecond crystallography: the first five years. IUCrJ 2, 246–255 (2015).
18. Roedig, P. et al. High-speed fixed-target serial virus crystallography. Nat. Methods 14, 805–810 (2017).
19. Philipp, H. T., Ayyer, K., Tate, M. W., Elser, V. & Gruner, S. M. Solving structure with sparse, randomly-oriented x-ray data. Opt. Express 20, 13129 (2012).
20. Philipp, H. T., Ayyer, K., Tate, M. W., Elser, V. & Gruner, S. M. Recovering structure from many low-information 2D images of randomly-oriented samples. J. Phys. Conf. Ser. 425, 192016 (2013).
21. Ayyer, K., Philipp, H. T., Tate, M. W., Elser, V. & Gruner, S. M. Real-space X-ray tomographic reconstruction of randomly oriented objects with sparse data frames. Opt. Express 22, 2403 (2014).
22. Shneerson, V. L., Ourmazd, A. & Saldin, D. K. Crystallography without crystals. I. The common-line method for assembling a three-dimensional diffraction volume from single-particle scattering. Acta Crystallogr. A Found. Crystallogr. 64, 303–315 (2008).
23. Loh, N. T. D. & Elser, V. Reconstruction algorithm for single-particle diffraction imaging experiments. Phys. Rev. E Stat. Nonlin. Soft Matter Phys. 80, 26705 (2009).
24. Walczak, M. & Grubmüller, H. Bayesian orientation estimate and structure information from sparse single-molecule x-ray diffraction images. Phys. Rev. E Stat. Nonlin. Soft Matter Phys. 90, 22714 (2014).
25. Flamant, J., Bihan, N. L., Martin, A. V. & Manton, J. H. Expansion-maximization-compression algorithm with spherical harmonics for single particle imaging with X-ray lasers. Phys. Rev. E 93, 053302 (2016).
26. Kassemeyer, S. et al. Optimal mapping of x-ray laser diffraction patterns into three dimensions using routing algorithms. Phys. Rev. E 88, 042710 (2013).
27. Elser, V. Three-dimensional structure from intensity correlations. New J. Phys. 13, 123014 (2011).
28. Donatelli, J. J., Sethian, J. A. & Zwart, P. H. Reconstruction from limited single-particle diffraction data via simultaneous determination of state, orientation, intensity, and phase. Proc. Natl Acad. Sci. USA 114, 7222–7227 (2017).
29. Tegze, M. & Bortel, G. Atomic structure of a single large biomolecule from diffraction patterns of random orientations. J. Struct. Biol. 179, 41–45 (2012).
30. Fung, R., Shneerson, V., Saldin, D. K. & Ourmazd, A. Structure from fleeting illumination of faint spinning objects in flight. Nat. Phys. 5, 64–67 (2008).
31. Moths, B. & Ourmazd, A. Bayesian algorithms for recovering structure from single-particle diffraction snapshots of unknown orientation: a comparison. Acta Crystallogr. A Found. Crystallogr. 67, 481–486 (2011).
32. Schwander, P., Giannakis, D., Yoon, C. H. & Ourmazd, A. The symmetries of image formation by scattering. II. Applications. Opt. Express 20, 12827–12849 (2012).
33. Giannakis, D., Schwander, P. & Ourmazd, A. The symmetries of image formation by scattering. I. Theoretical framework. Opt. Express 20, 12799–12826 (2012).
34. Enderlein, J. Maximum-likelihood criterion and single-molecule detection. Appl. Opt. 34, 514 (1995).
35. Saldin, D. K., Shneerson, V. L., Fung, R. & Ourmazd, A. Structure of isolated biomolecules obtained from ultrashort x-ray pulses: exploiting the symmetry of random orientations. J. Phys. Condens. Matter. 21, 134014 (2009).
36. Saldin, D. K. et al. Beyond small-angle x-ray scattering: exploiting angular correlations. Phys. Rev. B 81, 1–6 (2010).
37. Saldin, D. K. et al. Structure of a single particle from scattering by many particles randomly oriented about an axis: a new route to structure determination? New J. Phys. 12, 35014 (2010).
38. Saldin, D. K. et al. New light on disordered ensembles: ab initio structure determination of one particle from scattering fluctuations of many copies. Phys. Rev. Lett. 106, 115501 (2011).
39. Kam, Z. The reconstruction of structure from electron micrographs of randomly oriented particles. J. Theor. Biol. 82, 15–39 (1980).
40. Starodub, D. et al. Single-particle structure determination by correlations of snapshot X-ray diffraction patterns. Nat. Commun. 3, 1276 (2012).
41. Saldin, D., Poon, H.-C., Schwander, P., Uddin, M. & Schmidt, M. Reconstructing an icosahedral virus from single-particle diffraction experiments. Opt. Express 19, 18 (2011).
42. Poon, H. C. & Saldin, D. K. Use of triple correlations for the sign determinations of expansion coefficients of symmetric approximations to the diffraction volumes of regular viruses. Struct. Dyn. 2, 041716 (2015).
43. Fienup, J. R. Phase retrieval algorithms: a comparison. Appl. Opt. 21, 2758–2769 (1982).
44. Fienup, J. & Wackerman, C. C. Phase retrieval stagnation problems and solutions. J. Opt. Soc. Am. A 3, 1897–1907 (1986).
45. Luke, D. R. Relaxed averaged alternating reflections for diffraction imaging. Inverse Probl. 37, 13 (2004).
46. Shechtman, Y. et al. Phase retrieval with application to optical imaging: a contemporary overview. IEEE Signal Process. Mag. 32, 87–109 (2015).
47. Baddour, N. Operational and convolution properties of three-dimensional Fourier transforms in spherical polar coordinates. J. Opt. Soc. Am. A 27, 2144 (2010).
48. Kirkpatrick, S., Gelatt, C. D. & Vecchi, M. P. Optimization by simulated annealing. Science 220, 671–680 (1983).
49. Chen, J. C. H. et al. Room-temperature ultrahigh-resolution time-of-flight neutron and X-ray diffraction studies of H/D-exchanged crambin. Acta Crystallogr. F Struct. Biol. Cryst. Commun. 68, 119–123 (2012).
50. Barty, A., Küpper, J. & Chapman, H. N. Molecular imaging using X-ray free-electron lasers. Annu. Rev. Phys. Chem. 64, 415–435 (2013).
51. Van Heel, M. & Schatz, M. Fourier shell correlation threshold criteria. J. Struct. Biol. 151, 250–262 (2005).
52. Hubbell, J. H. et al. Atomic form factors, incoherent scattering functions, and photon scattering cross sections. J. Phys. Chem. Ref. Data 4, 471–538 (1975).
53. Klein, O. & Nishina, T. Über die Streuung von Strahlung durch freie Elektronen nach der neuen relativistischen Quantendynamik von Dirac. Z. für Phys. 52, 853–868 (1929).
54. Reddy, H. K. N. et al. Coherent soft X-ray diffraction imaging of coliphage PR772 at the Linac Coherent Light Source. Sci. Data 4, 170079 (2017).
55. Hosseinizadeh, A. et al. Conformational landscape of a virus by single-particle X-ray scattering. Nat. Methods 14, 877–881 (2017).
56. Grüner, F. et al. Design considerations for table-top, laser-based VUV and X-ray free electron lasers. Appl. Phys. B Lasers Opt. 86, 431–435 (2007).
57. Ischenko, A. A., Weber, P. M. & Dwayne Miller, R. J. Capturing chemistry in action with electrons: realization of atomically resolved reaction dynamics. Chem. Rev. 117, 11066–11124 (2017).
58. Miller, R. J. D. Ultrafast imaging of photochemical dynamics: roadmap to a new conceptual basis for chemistry. Faraday Discuss. 194, 777–828 (2016).
59. Hell, S. W. & Wichmann, J. Breaking the diffraction resolution limit by stimulated emission: stimulated-emission-depletion fluorescence microscopy. Opt. Lett. 19, 780 (1994).
60. Balzarotti, F. et al. Nanometer resolution imaging and tracking of fluorescent molecules with minimal photon fluxes. Science 355, 606–612 (2016).
Acknowledgements
Financial support from the Deutsche Forschungsgemeinschaft (DFG) Grant No. SFB 755.B4 and helpful discussions with Russel Luke are gratefully acknowledged.
Contributions
B.v.A., M.M. and H.G. conceived the research; B.v.A. carried out the research; B.v.A., M.M. and H.G. wrote the paper.
Ethics declarations
Competing interests
The authors declare no competing interests.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
von Ardenne, B., Mechelke, M. & Grubmüller, H. Structure determination from single molecule X-ray scattering with three photons per image. Nat Commun 9, 2375 (2018). https://doi.org/10.1038/s41467-018-04830-4