## Introduction

Recent advances in coherent diffraction imaging (CDI) and its variants have increased the spatial resolution of hard X-ray microscopy to the range of several tens to sub-ten nanometres, even for samples that are opaque to electron beams, thereby providing insight into the structural basis of material properties1,2,3 and the functioning of biological cells4,5,6. CDI is a lensless imaging technique that reconstructs the complex refractive index or electron density of an object, projected along the incident beam, from oversampled Fraunhofer diffraction patterns obtained under spatially coherent illumination. Iterative phase retrieval (PR) algorithms7,8,9 are used for this reconstruction. In full-field CDI10,11, the well-defined shape of the sample object (termed the support) is used as a constraint in the PR12,13; this constraint is usually not satisfied for an extended object. Thus, a region of interest (ROI) of an extended object is investigated by ptychography14,15, a scanning variant of CDI in which a set of diffraction patterns is collected such that the illumination areas (typically of micron or sub-micron size) overlap sufficiently16 to constrain the image reconstruction. The three-dimensional structure of an object can be reconstructed by combining CDI with tomography2,3,4,5,6.

In-situ/operando microscopy provides a deeper understanding of material functions. In principle, ptychography is poorly suited to the visualisation of dynamic processes because the object must remain consistent across the overlapped illumination areas. Furthermore, the temporal resolution of the scanning measurement is limited by motion control, detector readout, etc., and typically amounts to 0.1–0.2 s17. Thus, the in-situ/operando ptychography of extended objects reported in the literature has been limited to visualisation after the objects reach a near-static state18,19,20. Most recently, fly-scan ptychography17,21 using highly sophisticated instrumentation with a fast control unit and large GPU computation system22 achieved an imaging time of 1.2 s for a 3 × 3 μm² field-of-view (FOV), which is equivalent to the upper limit of the detector frame rate at 3000 Hz. An alternative route to faster dynamic imaging is the development of full-field CDI techniques for extended objects. One obstacle is the weak diffraction fringes that are widely distributed around the illumination beam, which break the finite-support requirement of the full-field PR. A simple extension of the conventional full-field CDI to the local imaging of extended objects is the apodised-illumination CDI23, which employs a near-Gaussian beam generated with two pairs of Kirkpatrick–Baez mirrors and an apodising slit placed at the focal plane between the mirror pairs for local illumination. Another approach is keyhole-CDI24,25, also referred to as Fresnel-CDI, where an ROI is illuminated with a divergent beam after focusing, and a Fresnel diffraction pattern, including the in-line hologram, is collected using a detector with a small pixel size. A method employing dual-beam illumination, where one beam illuminates a dynamic ROI and the other a static region used as a reference, was also proposed and demonstrated in the optical regime26.

In this article, we propose a full-field CDI method, which we call multiple-shot CDI, utilising projection illumination optics for fast visualisation of a dynamic ROI in an extended object. Projection illumination optics can illuminate an adjustable area with a nearly top-hat intensity distribution, yielding well-defined support for the PR constraint and uniform imaging quality across the FOV. For the PR of a dynamic ROI, we introduced the smoothness of the structural variation as a constraint, which dramatically improved the reconstructions. We constructed the proposed CDI system at BL24XU of SPring-8 and conducted both ptychographic imaging with a wider FOV and full-field imaging of a dynamic ROI. The numerical simulations and proof-of-concept experiment demonstrate the possibility of CDI of extended, time-evolving systems at a frame rate higher than that of ptychography: a spatio-temporal resolution beyond the current instrumentation limitations.

## Results

### Strategy

We assume that the structural change of an imaging target is triggered by external stimuli, including exposure to gases or solutions2, heating19, and voltage27. The strategy employed to visualise such dynamic structural changes, utilising both ptychography and multiple-shot CDI, is illustrated in Fig. 1. In each imaging scheme, the exit wave fields ψ of a target object are reconstructed from a set of diffraction patterns and then separated into an illumination probe P and an object transmission function O, i.e. absorption and phase images.

Because the current implementation of PR for multiple-shot CDI requires an accurate initial guess of the probe wave field, the probe function is first measured by ptychographic imaging of the static object. Candidate ROIs in the sample can also be searched by ptychography. Subsequently, the temporal evolution of the sample is initiated with an external stimulus, after which the sample is imaged by multiple-shot CDI, where multiple-frame acquisition is performed with full-field CDI at a fixed illumination area. The frame rate must be sufficiently high to capture structural changes in the sample, such that each frame is approximated to be static. In this scenario, it is reasonable to assume that reconstructed images vary smoothly frame-by-frame. Therefore, we apply the smoothness constraint along the temporal direction to the PR of multiple-shot CDI. The details of the PR algorithms are described later.

### Coherent projection illumination optics for finite area illumination

Because the PR of full-field CDI depends heavily on the oversampling requirement, which is satisfied by virtue of the finite support12, it is important to produce an illumination probe with a well-defined illumination area23. Rather than a Gaussian or apodised probe, a top-hat probe is promising: its steep edge defines the support, and its almost flat intensity yields a uniform image quality, in terms of the signal-to-noise ratio, across the FOV. To generate a spatially coherent top-hat probe, we developed coherent projection illumination optics.

Figure 2a, b shows geometrical illustrations of the proposed projection illumination and conventional optics, respectively. The respective illumination probes, calculated by the wave optical simulation (Methods and Table 1) for each configuration, are compared in Fig. 2c, d. With the proposed optics (Fig. 2a), a spatially coherent top-hat probe is produced by forming a real image of the beam-defining aperture (BDA) onto the sample with demagnification. We realised this illumination optics by employing a Fresnel zone plate (FZP) in the off-axis configuration to separate the +1st-order image from the others28. The BDA and the sample are placed at the conjugate points for the +1st-order diffraction of the FZP according to the thin lens formula. Other orders of diffraction from the FZP and parasitic scattering from the upstream are blocked with the order-sorting aperture (OSA) placed just before the sample. It should be noted that, in the wave optical view, spatially coherent X-rays spread across the entire FZP area due to diffraction after passing through the BDA, and thus the numerical aperture of the FZP does not decrease. The wave optical simulation at a photon energy of 8 keV shown in Fig. 2c demonstrates that the proposed optics generates an approximate top-hat probe with a full-width at half-maximum (FWHM) intensity of 1.60 μm at a demagnification of ~12.
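The conjugate-point condition described above can be checked numerically with the thin lens formula. The following is a minimal sketch only: the focal length is a hypothetical placeholder (the actual optical parameters are those of Table 1), and the 20 μm BDA diameter is the value quoted for the experiment, assumed here for illustration.

```python
def conjugate_distances(focal_length, demagnification):
    """Return (object_distance, image_distance) that satisfy the thin lens
    formula 1/a + 1/b = 1/f with a / b equal to the demagnification."""
    image_distance = focal_length * (1.0 + 1.0 / demagnification)
    object_distance = demagnification * image_distance
    return object_distance, image_distance


f_mm = 100.0  # hypothetical focal length of the +1st order, in mm
a, b = conjugate_distances(f_mm, 12.0)  # BDA-to-FZP and FZP-to-sample distances

# Geometric image size of an assumed 20 um BDA at a demagnification of 12.
geometric_probe_um = 20.0 / 12.0
```

Under these assumptions, the geometric image of the aperture is about 1.67 μm wide, which is of the same order as the simulated FWHM of 1.60 μm.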

In contrast to the proposed optics, conventional illumination optics projects a light source at infinity from the FZP onto the focal point with infinite demagnification (Fig. 2b). Owing to the diffraction effect, the intensity of the main lobe decreases gradually away from the centre, and the side lobes spread beyond 3 μm, despite a similar FWHM intensity of 1.85 μm (Fig. 2d, e).

In principle, the off-axis configuration of the imaging lens induces quadratic phase curvature on the image plane29, resulting in the inclination of the wavefront. The phase inclination is corrected by shifting the origin of the Fourier plane from the centre of the first-order diffraction of FZP to the zero-angle diffraction of the BDA prior to supplying the diffraction patterns to the PR calculation (Supplementary Fig. 1). This procedure, i.e. the translation of Fourier components, corresponds to the correction of the linear phase offset in real space according to the Fourier shift theorem.
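The Fourier shift theorem invoked here can be verified with a discrete example. The sketch below uses a random complex field and a hypothetical shift of five pixels; it illustrates the theorem only and is not the correction code used for the measured data.

```python
import numpy as np

n = 64
rng = np.random.default_rng(0)
field = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))

shift = 5  # hypothetical translation of the Fourier-plane origin, in pixels

# Route 1: translate the Fourier components, then return to real space.
spectrum = np.fft.fft2(field)
field_from_shift = np.fft.ifft2(np.roll(spectrum, shift, axis=0))

# Route 2: multiply the real-space field by the equivalent linear phase ramp.
rows = np.arange(n)[:, None]
ramp = np.exp(2j * np.pi * shift * rows / n)
field_from_ramp = field * ramp
```

Both routes give the same field to numerical precision, so re-centring the diffraction pattern removes a linear phase term in real space without touching the amplitude.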

### PR in multiple-shot CDI of dynamic objects

To facilitate the convergence of the PR in the multiple-shot CDI of dynamic objects, we derived a PR algorithm with both a support constraint in the spatial domain and a smoothness constraint in the temporal domain, which we termed the multiple-frame ptychographic iterative engine (mf-PIE), based on the extended PIE8 (ePIE) for the mixed-state model30. In a multiple-shot CDI measurement of a dynamic object with a total frame length of N, the diffraction intensity measured at the n-th frame, $$I_n$$, is the temporal integral of the time-evolving diffraction intensity $$I\left( t \right) = \left| {{\cal{F}}\left[ {PO\left( t \right)} \right]} \right|^2$$ within the dwell time, written as

$$I_n = \int_{t_{n - 1}}^{t_n} I \left( t \right)dt = \int _{t_{n - 1}}^{t_n} \left| {{\cal{F}}\left[ {PO\left( t \right)} \right]} \right|^{2} dt$$
(1)

where P and $$O\left( t \right)$$ are the time-invariant illumination probe function and the time-evolving object transmission function at time t, respectively, and their product $${\Psi}\left( t \right) \equiv PO\left( t \right)$$ represents the exit wave. $${\cal{F}}$$ denotes the Fourier transform operator. Assuming that the structural change in the object during the acquisition is small relative to the spatial resolution, $$I_n$$ is approximated by an incoherent sum of a small number of subframe diffraction intensities $$I_{n,l}$$:

$$I_{n} \approx \mathop {\sum }\limits_{l = - L}^{L} I_{n,l} = \mathop {\sum }\limits_{l = - L}^{L} \left| {{\cal{F}}\left[ {PO_{m + l}} \right]} \right|^{2}$$
(2)

where $$O_m$$ is the m-th frame of $$O\left( t \right)$$ discretised to a total frame length of M in the temporal domain. This leads to the following optimisation equation for the object function against the n-th diffraction measurement, which is of the same form as the equation derived for the ePIE of the mixed-state model30,31:

$$O_{m + l}^\prime = \mathop {{{\mathrm{argmin}}}}\limits_{O_{m + l}} \frac{1}{2}\mathop {\sum }\limits_{l = - L}^L \| PO_{m + l} - {\Psi}^\prime _n \| ^2$$
(3)

$${\Psi}^{\prime} _n$$ denotes a revised exit wave after applying the constraint of the measured diffraction intensity.
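The incoherent-sum model of Eq. (2) can be sketched as a forward calculation. The array shapes, the disc-shaped probe, and the function names below are illustrative assumptions, not the authors' implementation.

```python
import numpy as np


def subframe_intensities(probe, object_subframes):
    """|F[P O_{m+l}]|^2 for each subframe; object_subframes has shape (2L+1, ny, nx)."""
    exit_waves = probe[None, :, :] * object_subframes
    return np.abs(np.fft.fft2(exit_waves, norm="ortho")) ** 2


def frame_intensity(probe, object_subframes):
    """Incoherent sum over the subframes belonging to one measured frame, as in Eq. (2)."""
    return subframe_intensities(probe, object_subframes).sum(axis=0)


# Illustrative data: a disc-shaped (top-hat-like) probe and 2L + 1 = 3 subframes.
ny = nx = 32
yy, xx = np.indices((ny, nx))
probe = (np.hypot(yy - ny / 2, xx - nx / 2) < 8).astype(complex)
rng = np.random.default_rng(1)
objects = np.exp(1j * 0.1 * rng.normal(size=(3, ny, nx)))
intensity = frame_intensity(probe, objects)
```

With the orthonormal FFT convention chosen here, the summed intensity equals the total exit-wave energy of the subframes, which is a convenient sanity check.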

From the assumption of slow structural change relative to the acquisition frame rate, it is reasonable to assume that several adjacent subframes $$I_{n,l}$$ are almost the same. Thus, we introduced a virtual temporal overlap, illustrated in Fig. 3, as the temporal smoothness constraint. Here, m is set as $$m = \left( {L + 1} \right)n$$, and the object subframes $$O_{m - L}, ... ,O_{m - 1}$$, and $$O_{m + 1}, ... ,O_{m + L}$$ belonging to the intensity measurement $$I_n$$ are also subjected to the measured intensity constraint with regard to the adjacent intensity frames $$I_{n - 1}$$ and $$I_{n + 1}$$, respectively, such that the optimisations against the measured $$I_n$$ and $$I_{n - 1}$$ share the same object frames.
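The index bookkeeping behind the virtual temporal overlap can be made explicit. This small sketch (the function name is hypothetical) lists the object subframes attached to each measured frame for $$m = (L + 1)n$$ and shows which are shared by neighbouring frames.

```python
def subframe_indices(n, L):
    """Indices of the object subframes O_{m-L}, ..., O_{m+L} for m = (L + 1) * n."""
    m = (L + 1) * n
    return list(range(m - L, m + L + 1))


L = 2
frame_3 = subframe_indices(3, L)  # [7, 8, 9, 10, 11]
frame_4 = subframe_indices(4, L)  # [10, 11, 12, 13, 14]
shared = sorted(set(frame_3) & set(frame_4))  # [10, 11]
```

Each pair of adjacent intensity frames therefore shares L object subframes, which is what couples their updates.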

Incorporating the above consideration to the mixed-state ePIE, the update equation of the object function in mf-PIE was derived by the steepest descent of the objective function in Eq. (3):

$$O_{m + l}^\prime = O_{m + l}^{\,} + \alpha \frac{{\bar P}}{{\left| P \right|_{{\mathrm{max}}}^2}}\left( {{\Psi}^{\prime} _{m + l} - PO_{m + l}^{\,}} \right)$$
(4)

$${\Psi}^{\prime} _{m + l} = {\cal{F}}^{ - 1}\left( {\frac{{\sqrt {I_n} }}{{\sqrt {\mathop {\sum }\nolimits_{l = - L}^L \left| {{\cal{F}}\left[ {PO_{m + l}} \right]} \right|^2} }}{\cal{F}}\left[ {PO_{m + l}} \right]} \right)$$
(5)

$$m = \left( {L + 1} \right)n$$
(6)

where $$\alpha$$ is a feedback parameter, and the overbar (as in $$\bar P$$) denotes the complex conjugate. The amount of the object update in the second term of Eq. (4) is weighted with the normalised illumination intensity $$\left| P \right|^2/\left| P \right|_{{\mathrm{max}}}^2$$31. In the case of top-hat illumination, this is expected to lead to an effect similar to that of the support constraint for the full-field CDI7,12, as the illumination intensity outside the top-hat region decreases by more than three orders of magnitude (Fig. 2e). Thus, the object inside the top-hat region becomes the main contributor to the update. The virtual temporal overlap constraint enters through the denominator in Eq. (5), where the Fourier amplitude of the exit wave at a virtual subframe is shared in the update against the adjacent intensity frames $$I_{n - 1}$$ or $$I_{n + 1}$$. When the structural changes in the object are slow, the lower spatial frequency part of the Fourier amplitudes is almost the same between adjacent subframes. This is analogous to the spatial overlap constraint in ptychography, where the overlapped part of the n-th object function $$O_n$$ is shared in the update of the others. However, the virtual temporal overlap shares only part of the object information, i.e. the Fourier amplitude, and thus the convergence of the mf-PIE is expected to be weaker than that of the ptychographic PR.
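One object update of Eqs. (4)–(6) against a single measured frame can be sketched as follows. This is an illustrative reading of the equations, not the authors' code: the function name, the small `eps` guard against division by zero, and the unnormalised FFT convention are assumptions, and the subframe indices are assumed to lie inside the object stack.

```python
import numpy as np


def mf_pie_object_update(probe, objects, intensity, n, L, alpha=1.0, eps=1e-12):
    """Update the subframes O_{m-L..m+L}, m = (L + 1) n, against the frame I_n."""
    m = (L + 1) * n                                   # Eq. (6)
    idx = np.arange(m - L, m + L + 1)
    exit_waves = probe[None] * objects[idx]
    far_field = np.fft.fft2(exit_waves)
    # Eq. (5): rescale so that the summed subframe intensity matches I_n.
    model_amplitude = np.sqrt((np.abs(far_field) ** 2).sum(axis=0))
    scale = np.sqrt(intensity) / (model_amplitude + eps)
    revised = np.fft.ifft2(scale[None] * far_field)
    # Eq. (4): probe-weighted step towards the revised exit waves.
    weight = np.conj(probe) / (np.abs(probe) ** 2).max()
    updated = objects.copy()
    updated[idx] = objects[idx] + alpha * weight[None] * (revised - exit_waves)
    return updated


rng = np.random.default_rng(0)
L, n = 1, 1
objects = rng.normal(size=(5, 16, 16)) + 1j * rng.normal(size=(5, 16, 16))
probe = np.ones((16, 16), dtype=complex)        # flat probe, for illustration only
target = rng.uniform(1.0, 2.0, size=(16, 16))   # hypothetical measured intensity
updated = mf_pie_object_update(probe, objects, target, n, L)
```

With a flat unit probe and $$\alpha = 1$$, a single step makes the summed model intensity match the target. With a top-hat probe, the weight suppresses the update outside the illuminated region, as discussed in the text.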

To increase the convergence of mf-PIE, we heuristically introduced a spatio-temporal smoothness constraint by applying a total-variation (TV) optimisation32,33 to the temporal stack of $$O_m^\prime$$, written as $$O^\prime$$, after the update for all measured diffraction frames. The spatio-temporal smoothness constraint solves the following minimisation problem:

$$O^{\prime\prime} = {\mathrm{prox}}_{\lambda ,{\mathrm{TV}}}\left( {O^\prime } \right) = \mathop {{{\mathrm{argmin}}}}\limits_{O^{\prime\prime} } \frac{1}{2}\|O^{\prime\prime} - O^\prime\|^2 + \lambda \| O^{\prime\prime}\| _{{\mathrm{TV}}}$$
(7)

where $$\|\ \|_{{\mathrm{TV}}}$$ denotes the TV norm in three-dimensions, i.e. two spatial and one temporal dimensions, and $$\lambda$$ is a regularisation parameter. We refer to this variant as mf-PIE-TV.
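The proximal step in Eq. (7) can be approximated with a standard solver. The text does not specify the solver or how complex values are handled, so the sketch below makes two labelled assumptions: it uses Chambolle's projection algorithm for the isotropic TV norm, and it treats the real and imaginary parts of the object stack separately.

```python
import numpy as np


def grad3(u):
    """Forward differences along (t, y, x); the last sample on each axis is zero."""
    g = np.zeros((3,) + u.shape)
    g[0, :-1] = u[1:] - u[:-1]
    g[1, :, :-1] = u[:, 1:] - u[:, :-1]
    g[2, :, :, :-1] = u[:, :, 1:] - u[:, :, :-1]
    return g


def div3(p):
    """Discrete divergence, defined as the negative adjoint of grad3."""
    d = np.zeros(p.shape[1:])
    d[:-1] += p[0, :-1]
    d[1:] -= p[0, :-1]
    d[:, :-1] += p[1, :, :-1]
    d[:, 1:] -= p[1, :, :-1]
    d[:, :, :-1] += p[2, :, :, :-1]
    d[:, :, 1:] -= p[2, :, :, :-1]
    return d


def tv3(u):
    """Isotropic TV norm over two spatial and one temporal dimension."""
    return np.sqrt((grad3(u) ** 2).sum(axis=0)).sum()


def prox_tv3(stack, lam, n_iter=50, tau=1.0 / 12.0):
    """Approximate argmin_u 0.5 * ||u - stack||^2 + lam * TV(u) for a real stack."""
    p = np.zeros((3,) + stack.shape)
    for _ in range(n_iter):
        g = grad3(div3(p) - stack / lam)
        p = (p + tau * g) / (1.0 + tau * np.sqrt((g ** 2).sum(axis=0)))
    return stack - lam * div3(p)


def prox_tv3_complex(stack, lam, **kwargs):
    """Assumption: real and imaginary parts are regularised independently."""
    return prox_tv3(stack.real, lam, **kwargs) + 1j * prox_tv3(stack.imag, lam, **kwargs)


rng = np.random.default_rng(0)
noisy = rng.normal(size=(8, 16, 16))
smooth = prox_tv3(noisy, lam=0.5)
```

The step size of 1/12 follows from the bound on the three-dimensional divergence operator. A larger $$\lambda$$ gives a smoother stack at the cost of fidelity to $$O^\prime$$.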

Although the mf-PIE and mf-PIE-TV seem to have the potential to visualise object structures at the subframe level, there is ambiguity regarding the time steps between each subframe. Thus, in this study, we averaged the reconstructed object subframes $$O_n = \mathop {\sum }\nolimits_{l = - L}^L O_{\left( {L + 1} \right)n + l}/(2L + 1)$$ to visualise the final reconstruction (Fig. 3).
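The subframe averaging used for the final visualisation amounts to a mean over the 2L + 1 subframes centred on $$m = (L + 1)n$$. A minimal sketch, with a hypothetical function name and toy data:

```python
import numpy as np


def average_subframes(objects, n, L):
    """O_n = sum_{l=-L..L} O_{(L+1)n + l} / (2L + 1) for a stack of shape (M, ny, nx)."""
    m = (L + 1) * n
    return objects[m - L : m + L + 1].mean(axis=0)


# Toy stack in which subframe k is filled with the constant value k.
stack = np.arange(5, dtype=float)[:, None, None] * np.ones((5, 4, 4))
frame_1 = average_subframes(stack, n=1, L=1)  # mean of subframes 1, 2, 3
```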

### Numerical demonstration of multiple-shot CDI with Brownian motion systems

The feasibility of the proposed scheme was examined using a numerical simulation with Brownian motion systems. For example, the dynamics of colloidal particles are often used to indirectly probe the polymer dynamics related to the glass transition through X-ray photon correlation spectroscopy (XPCS)34. The XPCS is a powerful tool to investigate the dynamics including spatial heterogeneity35 up to the millisecond time scale. However, the derived information is a dynamic property averaged over an ensemble of particles in an illumination area, and it is difficult to directly investigate the distribution of the local mobility contributing to material properties at a high-spatial resolution. Thus, the complementary use of the proposed method with XPCS is beneficial for a seamless description of the spatio-temporally hierarchical structures of the materials.

Herein, as the model objects, colloidal gold particles with a diameter of 400 nm dispersed in aqueous glycerine solutions shown in Table 2 were prepared at a pixel size of 40 nm. The diffusion coefficients of the models were controlled using the glycerine concentration and temperature. A top-hat illumination beam with a diameter of ~4 μm, a photon energy of 8 keV, and a photon flux of ~8 × 10⁹ photons s⁻¹ at the sample was produced using the proposed optics applying the configuration shown in Table 1, and sets of time-evolving diffraction patterns were calculated for each model at a frame rate of 100 Hz with a total time of 10 s (1000 frames). The blurring effect on the diffraction patterns owing to the motion of particles during each exposure was considered (Methods).
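A Brownian model of this kind can be sketched with the Stokes–Einstein relation and Gaussian random steps. The sketch is an illustration under stated assumptions, not the simulation code of the Methods: the displacement is treated as two-dimensional (mean square displacement of 4DΔt per frame), and the viscosity and temperature values are hypothetical.

```python
import numpy as np

K_B = 1.380649e-23  # Boltzmann constant, J/K


def stokes_einstein(temperature_k, viscosity_pa_s, radius_m):
    """Diffusion coefficient of a sphere in a viscous fluid, in m^2/s."""
    return K_B * temperature_k / (6.0 * np.pi * viscosity_pa_s * radius_m)


def rmsd_per_frame_2d(diffusion_m2_s, frame_time_s):
    """Root mean square displacement in two dimensions: sqrt(4 D dt)."""
    return np.sqrt(4.0 * diffusion_m2_s * frame_time_s)


def brownian_tracks(n_particles, n_frames, rmsd_pixels, rng):
    """Particle positions (n_frames, n_particles, 2) with the given 2D RMSD per frame."""
    sigma = rmsd_pixels / np.sqrt(2.0)  # per-axis standard deviation
    steps = rng.normal(0.0, sigma, size=(n_frames, n_particles, 2))
    return np.cumsum(steps, axis=0)


# Hypothetical solvent: viscosity and temperature are placeholders, not Table 2 values.
D = stokes_einstein(temperature_k=293.0, viscosity_pa_s=1.0, radius_m=200e-9)
rmsd_nm = rmsd_per_frame_2d(D, frame_time_s=0.01) * 1e9

rng = np.random.default_rng(0)
tracks = brownian_tracks(n_particles=200, n_frames=1000, rmsd_pixels=0.16, rng=rng)
```

Raising the viscosity (a higher glycerine concentration or a lower temperature) lowers D and hence the per-frame displacement, which is the control knob described in the text.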

We examined the effectiveness of the localised illumination and the proposed PR using the prepared model with a root mean square displacement (RMSD) of 0.16 pixels per frame (Fig. 4a). The localisation of the illumination was examined through image reconstruction from a single diffraction intensity frame by applying a conventional full-field PR for isolated objects consisting of hybrid input-output7 (HIO) and shrink-wrap13 (SW) algorithms. At the edge of the diffraction pattern of the model (Fig. 4b), where the spatial frequency is 12.5 μm⁻¹ (a full-period spatial resolution of 80 nm), ~10 or fewer photons were observed in each pixel. The best reconstruction and the average of the 50 highest-ranked reconstructions out of 1000 independent trials are shown in Fig. 4c, d. Whereas the single reconstruction does not provide clear particle images, the shape, position, and overlap of each particle in the model are reproduced well in the phase image of the averaged reconstruction, which suggests that the support constraint worked as expected. The absorption image is poor owing to the weak absorption of the particles. The Fourier ring correlation (FRC)36 of the averaged reconstruction and the original model indicates an effective full-period spatial resolution of 362 nm (Fig. 4e). The deterioration of the reconstruction is probably due to the difficulty in the PR of complex-valued objects using the HIO with the support constraint alone37,38.
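For orientation, a Fourier ring correlation between two images can be computed as below. This is a generic sketch with integer-radius rings; the ring binning and the absence of a resolution threshold are assumptions, and the criterion used to read off the resolution in Fig. 4e is not reproduced here.

```python
import numpy as np


def fourier_ring_correlation(img_a, img_b):
    """Normalised cross-correlation of two images on rings of constant spatial frequency."""
    fa = np.fft.fftshift(np.fft.fft2(img_a))
    fb = np.fft.fftshift(np.fft.fft2(img_b))
    ny, nx = img_a.shape
    y, x = np.indices((ny, nx))
    radius = np.hypot(y - ny // 2, x - nx // 2).astype(int)
    n_rings = min(ny, nx) // 2
    frc = np.zeros(n_rings)
    for r in range(n_rings):
        ring = radius == r
        num = np.real(np.sum(fa[ring] * np.conj(fb[ring])))
        den = np.sqrt(np.sum(np.abs(fa[ring]) ** 2) * np.sum(np.abs(fb[ring]) ** 2))
        frc[r] = num / den if den > 0 else 0.0
    return frc


rng = np.random.default_rng(0)
image = rng.normal(size=(64, 64))
unrelated = rng.normal(size=(64, 64))
frc_same = fourier_ring_correlation(image, image)
frc_diff = fourier_ring_correlation(image, unrelated)
```

Identical images give a correlation of one on every ring, whereas unrelated images fluctuate around zero. The spatial frequency at which the curve falls below a chosen threshold defines the effective resolution.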

The time-series of the object image frames (Fig. 5a) was reconstructed using the mf-PIE and mf-PIE-TV described above as well as the conventional ePIE algorithm, which was included for comparison. Among the reconstructions using the three PR algorithms, the best one was achieved by mf-PIE-TV, as shown in Fig. 5b and Supplementary Movie 1. Both the phase and absorption images finely reproduce the random motion in the original model, although several frames include low-frequency wavy features. According to the temporal-averaged FRC shown in Fig. 5e, the effective full-period spatial resolution was improved to higher than 80 nm, which was the maximum resolution attainable with the diffraction data supplied. The mf-PIE also provided the time-evolving phase images similar to or slightly clearer than those by the HIO-SW even without averaging independent trials, as demonstrated by visual inspection and FRC analysis (Fig. 5c, e). These results indicate an improvement in convergence by the temporal smoothness constraint in the mf-PIE and mf-PIE-TV. The reconstruction by ePIE (Fig. 5d) makes it difficult to identify each particle because of the strong modulation.

### Tolerable amount of structural variation for multiple-shot CDI and ptychography

We further investigated the effect of the amount of structural variation on the image reconstruction of multiple-shot CDI using mf-PIE-TV and compared it with ptychography using a conventional ePIE. The Brownian motion systems listed in Table 2 were used as object models. Multiple-shot CDI datasets collected at a frame rate of 100 Hz (10 ms exposure per frame) were prepared as described in the previous section and in the Methods section. The RMSDs of the colloidal gold particles in each dataset ranged from 0.16 to 3.22 pixels per frame, where the largest RMSD was ~30% of the particle size (Table 2). The ptychographic datasets of each model in the middle of the time-series were prepared through a 2 × 2 raster grid scan with an overlap ratio16 of 87.5% and 2 ms exposure per point, corresponding to an 8-ms exposure per frame. The overhead time for the raster scan was ignored.

The accumulated diffraction patterns and averaged real-space images over each frame time of the multiple-shot CDI are shown in Fig. 6a–d. The images averaged in a real space are those expected to be reconstructed. In the diffraction patterns with RMSDs larger than 0.64 pixels per frame, the blurring effect owing to a structural variation becomes clear. Nevertheless, the phase and absorption images reconstructed using the mf-PIE-TV show good agreement with those of the models up to RMSDs of 1.52 pixels per frame (Fig. 6b–d and Supplementary Movie 1). The shape of the particles in the reconstruction with an RMSD of 3.22 pixels per frame is blurred; however, the positions and contrasts of the particles owing to the overlap are mostly reproduced. The temporal-averaged FRCs between each reconstruction and model (Fig. 6e) indicate that the deterioration of the reconstructions owing to the increase in the amount of the motions is relatively smaller than that of the ptychography shown below.

The model and reconstruction results of the ptychography are shown in Fig. 7. In contrast to multiple-shot CDI, the diffraction patterns in the ptychography (Fig. 7a) are less blurred, although their signal-to-noise ratio is lower because of the fivefold shorter exposure time for each scan point in the ptychography, which was used to achieve a frame rate similar to that of multiple-shot CDI. The reconstructed phase images with RMSDs of up to 0.64 pixels per frame (Fig. 7b) reproduced the original model, but with more blurring than those of the multiple-shot CDI. The temporally averaged FRCs also show a decrease in the spatial resolution (Fig. 7e). The arrangement of several particles in the reconstructed phase images with larger RMSDs differs from the original (Fig. 7c), and the images with an RMSD of 3.22 pixels per frame were no longer interpretable (Fig. 7d). It was difficult to identify the particles in the amplitude images.

The results described above indicate that multiple-shot CDI is more robust than ptychography regarding the large structural variation of the objects, where the spatial overlap constraint in the ptychography is broken. In addition, multiple-shot CDI has the capability to provide dynamic images with a natural contrast, similar to those of real-space imaging, despite the temporal accumulation of data being applied with the squared amplitude in a reciprocal space.

### Effect of illumination uniformity on PR convergence in single- and multiple-shot CDI

Considering the real-space update in the single- and multiple-shot PR, the uniform, well-defined local illumination produced by the projection illumination optics is expected to give better convergence than Gaussian or apodised beams because the dim peripheries of those beams yield ambiguous illumination boundaries. To assess the effect of illumination uniformity on the convergence of single- and multiple-shot PR, we performed a numerical simulation of a multiple-shot CDI of the Brownian motion system with apodised illumination. The Brownian motion model with the smallest RMSD (Table 2) shown in Figs. 4 and 5 was used. The apodised beam was generated by cropping the Airy disk of the conventional focused beam shown in Fig. 2d, following the method proposed in the literature23. The diameter of the cropped Airy disk was comparable to that of the top-hat illumination used in the above numerical simulations.

A typical diffraction pattern of the model calculated with the apodised beam is shown in Fig. 8a. Because the effective numerical aperture of the FZP was reduced with the BDA to generate the wide probe, the bright-field region on the pattern was smaller than that of the projection illumination optics. The complex object functions were reconstructed by both the HIO-SW single-shot PR (Fig. 8b–c) and mf-PIE-TV multiple-shot PR (Fig. 8d). The single-shot PR yielded a reconstruction similar to that of the top-hat probe, but the spatial resolution was further reduced to 495 nm (Fig. 8e) because of the difficulty in estimating the correct support for a gradual boundary. The reconstruction by the multiple-shot PR also reproduced the expected arrangement of the particles, but the contrast was significantly decreased. In particular, the periphery of the FOV was blurred compared to the centre, which was expected from the object update Eq. (4); the equation was designed to make the amount of the object update in the dim peripheral region smaller than that in the bright region. Further iterations could not improve the reconstruction, which suggests that the non-uniformity of the amount of the object update prevents convergence in mf-PIE-TV. These results demonstrate the superior properties of the top-hat probe in both single- and multiple-shot PR. Therefore, the projection illumination optics is indispensable for the proposed dynamic imaging scheme.

### Experimental demonstration of ptychography with projection illumination optics

The illumination probe produced by the proposed optics and the ptychographic imaging capability employing this probe were experimentally examined at BL24XU of SPring-839,40 with the experimental setup described in Methods and Table 1. A tantalum X-ray resolution chart with a thickness of 500 nm was used as the sample. The experiment was performed at a photon energy of 8.000 keV. The BDA with a diameter of 20 μm extracted spatially coherent X-rays, which were then projected onto the sample at a demagnification of ~12 by the +1st-order diffraction of the FZP. Fraunhofer diffraction patterns of the sample were collected 3.2 m downstream from the sample.

Figure 9a shows the bright-field region of a diffraction pattern obtained without a sample, which corresponds to the diffraction pattern of the BDA cropped by the aperture of the FZP. The region spreads across the spatial frequencies of 5.4 μm⁻¹ with an offset of 2.6 μm⁻¹ along the vertical (Sz) direction in the reciprocal space and displays a wide dynamic range of $$10^{5.8}$$ photons pixel⁻¹ s⁻¹. Because the count rate was near the upper limit of the detection linearity of the detector, the illumination photon flux at the sample was decreased to ~3 × 10⁷ photons s⁻¹ (Methods). In future studies, the effective dynamic range will be extended by placing a semi-transparent beam stopper41 to cover the intense Airy disc of the BDA. The width of the diffraction fringes spanned four or five pixels, reflecting the size of the beam at the image plane, i.e. the “just-focus” plane of the projection illumination optics. When the sample was placed at the just-focus plane, coherent diffraction patterns with interference speckles of sizes similar to the diffraction fringes of the BDA were observed. A typical diffraction pattern after the focus adjustment (Methods) is shown in Fig. 9b, which was collected by a ptychographic measurement with an exposure time of 1 s.

The ptychographic dataset collected as shown in Table 1, including the diffraction pattern shown in Fig. 9b, was subjected to the ptychographic PR to reconstruct the object image and the probe wave field. Spatial frequencies up to 25 μm⁻¹, which correspond to a real-space pixel size of 20 nm, were used for the PR. In the object image shown in Fig. 9c, the finest feature of 50-nm-wide lines and spaces is clearly resolved with quantitative contrast. The phase retrieval transfer function (PRTF) analysis42, which provides a measure of phase convergence, also indicates an effective full-period resolution of 46 nm. The reconstructed illumination probe in Fig. 9d displays a top-hat amplitude profile and a slightly concave wavefront, as found in the wave optical simulation (Fig. 2c), with a FWHM intensity of 1.5 μm, which is in good agreement with the expectation. At the same spatial resolution as the object image, all probe features are larger than the resolving power of the employed FZP. The intensity outside the top-hat region was suppressed to 3–4 orders of magnitude lower than the maximum intensity (Fig. 9e). These results indicate the production of an almost ideal probe with the proposed projection illumination optics.

### Proof-of-concept experiment of multiple-shot CDI with a moving object

The aim of the development of the full-field CDI technique in this study is the in-situ visualisation of structural dynamics, rather than the observation of static objects. As a proof-of-concept experiment, we demonstrated the imaging of a moving object using multiple-shot CDI (Fig. 10). The resolution chart shown in Fig. 9c was used as the imaging target sample, which was continuously moved at a speed of 125 nm s⁻¹ against the beam. During the movement, diffraction patterns from the illuminated ROI were continuously recorded at a 10 Hz frame rate for 200 s (2000 frames). Representative diffraction patterns of the dynamic dataset are shown in Fig. 10a. In every diffraction pattern, at a 0.1 s exposure, at least approximately five photons were detected up to spatial frequencies of ~10 μm⁻¹. Thus, spatial frequencies up to 11.8 μm⁻¹ were employed for the PR, which corresponds to a real-space pixel size of 43 nm. The moving speed of the object was 12.5 nm per frame, or 0.3 pixels per frame, corresponding to ~25% of the finest feature of the object. A decrease in the visibility of the speckles in the diffraction patterns was not evident.
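The numbers quoted in this paragraph can be cross-checked with a few lines of arithmetic. The sketch assumes, consistently with the values stated elsewhere in the text (12.5 μm⁻¹ for 40 nm pixels, 25 μm⁻¹ for 20 nm pixels), that the real-space pixel size is half the full-period resolution at the highest spatial frequency used; the helper name is hypothetical.

```python
def pixel_size_nm(max_spatial_frequency_per_um):
    """Real-space pixel size: half the full period at the highest spatial frequency."""
    return 1000.0 / (2.0 * max_spatial_frequency_per_um)


speed_nm_per_s = 125.0
frame_rate_hz = 10.0
finest_feature_nm = 50.0

shift_nm = speed_nm_per_s / frame_rate_hz           # displacement per frame
pixel_nm = pixel_size_nm(11.8)                      # pixel size of the dynamic dataset
shift_pixels = shift_nm / pixel_nm                  # displacement per frame in pixels
fraction_of_feature = shift_nm / finest_feature_nm  # relative to the 50 nm lines
```

These values reproduce the stated 12.5 nm per frame, roughly 0.3 pixels per frame, and 25% of the finest feature.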

The time-series of the object image frames was reconstructed using mf-PIE-TV. The average of 100 independent reconstructions is shown in Fig. 10b, c, and Supplementary Movie 2, where each object frame reproduces well the object image reconstructed through the ptychography. Most of the object frames could also be reconstructed by the single-shot PR (see Supplementary Fig. 2), which supports the interpretation that the sample was effectively illuminated by the localised probe. However, the frames of the single-shot reconstruction are of a lower quality, as in the numerical simulation. The FOV of the object frames had a diameter of ~2 μm, which represented an illumination area above ~0.1% of the maximum intensity (Fig. 9e). In the object frames shown in Fig. 10b, the fabricated structures with a width of 50–100 nm could be partially resolved, although the contrast in some object frames was decreased and modulated. The smoothness of the reconstructed object along the temporal direction can be examined using the spatio-temporal images shown in Fig. 10c. In most parts of the image, the finest features and sharp edges of the wider lines were also resolved. However, the phase images exhibited a frame-by-frame phase offset and shadow-like artefacts around the fabricated lines. In the object frames, a shadow-like artefact appeared at a twofold symmetrical position of the real image and appeared to move in the opposite direction (Supplementary Movie 2). This suggests that the shadow-like artefact originated from a virtual image arising in each frame, which has an opposite phase shift and twofold symmetry to the real image. The phase offset, in turn, is attributed to the intrinsic ambiguity of the phase origin in the phase problem.

In both object frames and spatio-temporal images, the phase and absorption images displayed a contrast similar to that of the ptychographic images, which indicates the possibility of quantitative dynamic imaging. We conducted an FRC analysis between the 0th to 180th object frames (at 0.0–18.0 s) in the multiple-shot CDI and the corresponding area in the high-spatial-resolution object image reconstructed through the ptychography shown in Fig. 9c. The temporal-averaged FRC (Fig. 11) indicates the consistency of the reconstructions up to a full-period spatial resolution of 158 nm, supporting the visualisation of ~79 nm line-and-space pairs. Because the rest of the object frames in the multiple-shot CDI were out of the FOV of the ptychographic image, we also calculated the PRTF profile to assess the convergence of the PR (Fig. 11), yielding a similar spatial resolution of 128 nm.

## Discussion

We proposed CDI with projection illumination optics and experimentally demonstrated two imaging schemes in the hard X-ray regime, namely, the ptychography of static objects and multiple-shot CDI of a dynamic object at a frame rate of 10 Hz. The numerical simulation at a frame rate of 100 Hz demonstrated a more robust capability of multiple-shot CDI for visualisation of fast structural variation than ptychography. Apart from dynamic processes that are too fast to image by ptychography, multiple-shot CDI would be applicable to in-situ/operando observations, for example, in a heating or fluid environment, where a notable drift of objects often occurs. Another possible application is the qualitative imaging of chemical state changes in batteries or exhaust gas catalysts. This can be achieved by setting the photon energy of the illumination X-ray to the absorption edge of a specific element. In the latter application, the reconstruction of a high-quality absorption image is indispensable, despite the low contrast in the hard X-ray regime. The PR employing the smoothness constraints introduced in this study is promising for this purpose, as indicated.

The unique features of the proposed optics are as follows: (1) the top-hat beam realises nearly uniform illumination across the FOV, which improves both the convergence of the single- and multiple-shot PR and the contrast at the periphery of the reconstructed images, leading to a superior performance to that obtained using a Gaussian or apodised beam. (2) The proposed optics allows a longer working distance in comparison to simple pinhole optics, as there is no need to place the BDA almost in contact with a sample. This is beneficial for the setup of devices to control the sample environment. (3) The illumination size can be adjusted by varying the demagnification ratio, which is achieved by rearranging the BDA, FZP, and the sample according to the thin lens formula (Fig. 2a). The upper size of the illumination, i.e. the FOV of the multiple-shot CDI, is limited by the oversampling requirement for diffraction patterns. The demagnification optics is beneficial because we can employ a BDA larger than the upper limit of the illumination size for the efficient use of the coherent flux without breaking the oversampling requirement. It is probably possible to extend the area with spatially coherent illumination using the proposed optics in a magnification setup, even though the flux density at the position of the sample decreases; this will be achieved using future light sources. In this case, the instability of the illumination beam due to drifts in the optical system over time will also be emphasised; thus, significant efforts will be required to stabilise the measurement system and compensate for the illumination position drift22,43,44. (4) As in lithography, the illumination pattern can be designed by employing an absorption or phase mask instead of the BDA. 
For example, the use of a dual pinhole as a mask allows the implementation of in situ CDI26 and holographic techniques, such as Fourier transform holography45 and double-blind holography46, which will aid in the phasing of multiple-shot CDI. The demagnification optics will also mitigate the difficulty of fabricating such masks.

The multiple-shot CDI with the proposed optics experimentally demonstrated the dynamic imaging of the extended object at 10 Hz. A possible reason for the weak convergence of the experimental demonstration is the low signal-to-noise ratio of the diffraction dataset owing to the short exposure time (Table 1). Errors in the probe supplied to the PR owing to the ptychographic reconstruction and/or temporal variation of the probe after the ptychographic measurement may also affect the results. Improvements in the PR for multiple-shot CDI regarding the implementation of the noise model following the Poisson photon-counting statistics and the simultaneous recovery of the illumination probe will further enhance the imaging quality and spatial resolution. The use of the apodised zone plate47 will probably diminish the weak fringes remaining around the probe (Fig. 9), which improves the strictness of the support constraint. However, it is known that PR with only the support constraint is unstable because of its inherent ambiguity regarding defocus and twin-image artefacts37,38,48. The robustness of ptychographic PR originates from the consistency constraint in the spatial domain. Similarly, it was demonstrated that the consistency constraint in the time domain, i.e., the existence of a time-invariant structure in the FOV, improves the convergence of the full-field PR26. Although this is difficult to apply to targets in general, metal structures fabricated on a sample substrate or directly on a target11 may be used as such a constraint. Structures with large scattering cross-sections will also contribute to enhancing the diffraction signals of weakly scattering or radiation-sensitive targets49,50,51. More generally, it is reasonable to assume that the target structure will change smoothly, frame-by-frame, as in the present study. 
Although the temporal smoothness constraint does not seem to be as strong as the spatial overlap constraint in ptychography, we expect that advanced, well-designed algorithms based on modern optimisation theory will make multiple-shot CDI more robust and reliable. For example, ptychographic PR based on the alternating direction method of multipliers achieved the blind removal of a common structured background with simultaneous object and probe reconstruction52. Furthermore, the effectiveness of such modern optimisation algorithms implementing TV regularisation has been demonstrated in a wide variety of imaging applications, including tomographic reconstruction from imperfect datasets53,54.

Upcoming next-generation synchrotron radiation sources will provide coherent X-ray probes with photon fluxes that are orders of magnitude higher55, which will further extend the application of CDI to time-evolving systems. We expect the proposed approach to contribute to the understanding of local nanodynamics in heterogeneous systems.

## Methods

### Wave optical simulation

The wave optical simulation of the illumination optics was performed using the angular spectrum method29,56. A plane wave light source at a photon energy of 8.000 keV (wavelength of 0.155 nm) was assumed. The BDA and OSA both with a diameter of 20 μm, negative FZP, and sample plane were arranged according to the geometry shown in Fig. 2 and Table 1. The FZP specification is as follows: it is made of 700-nm-thick tantalum and has an outer diameter of 416 μm; the number of zones is 1200, with the outermost zone having a width of 86 nm. The complex transmission function of the FZP was calculated from a complex refractive index taken from an open database57. The incident two-dimensional complex wave field, sampled with 32,768 × 32,768 points at an interval of 21 nm, was successively propagated from the BDA to the sample plane using the angular spectrum method. The sampling and interval were determined to avoid the aliasing effect.
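
As a minimal sketch of the propagation step, the angular spectrum method multiplies the Fourier transform of the sampled field by the free-space transfer function and transforms back. The function name, the 512 × 512 demo grid, the 5-μm demo aperture, and the 1-mm propagation distance below are illustrative assumptions rather than the geometry of Table 1 or the code used in this study; only the wavelength and the 21-nm sampling interval follow the text.

```python
import numpy as np


def angular_spectrum_propagate(field, wavelength, pixel_size, distance):
    """Propagate a sampled 2D complex field by `distance` (lengths in metres)."""
    ny, nx = field.shape
    fx = np.fft.fftfreq(nx, d=pixel_size)
    fy = np.fft.fftfreq(ny, d=pixel_size)
    fxx, fyy = np.meshgrid(fx, fy)
    # Squared longitudinal spatial frequency; non-positive values are evanescent.
    fz_sq = 1.0 / wavelength**2 - fxx**2 - fyy**2
    propagating = fz_sq > 0
    phase = 2.0 * np.pi * distance * np.sqrt(np.where(propagating, fz_sq, 0.0))
    transfer = np.where(propagating, np.exp(1j * phase), 0.0)
    return np.fft.ifft2(np.fft.fft2(field) * transfer)


wavelength = 0.155e-9   # m, 8.000 keV (from the text)
pixel_size = 21e-9      # m, sampling interval (from the text)
n = 512                 # demo grid only; the text uses 32,768 x 32,768 points

coords = (np.arange(n) - n // 2) * pixel_size
xx, yy = np.meshgrid(coords, coords)
# Hypothetical 5-um circular aperture so that it fits inside the small demo grid.
aperture = (np.hypot(xx, yy) <= 2.5e-6).astype(complex)
downstream = angular_spectrum_propagate(aperture, wavelength, pixel_size, 1e-3)
```

Because the FFT implies periodic boundaries, a small grid such as this one wraps diffracted light around its edges at longer distances, which is one reason why a large, finely sampled array is needed in practice.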

### Numerical simulation of Brownian motion system

As Brownian motion model systems, we simulated suspensions of spherical gold particles in an aqueous glycerine solution with a thickness of 2 μm. The diameter $$d$$ and volume ratio of the particles were 400 nm and 5% v/v, respectively. The mass concentrations of glycerine $$C$$ and temperatures $$T$$ of the systems were adjusted as shown in Table 2 to control the RMSDs of the particles. The RMSD is given as $$\sqrt {2D{\Delta}t}$$, where $$D = k_{\mathrm{B}}T/\left[ {3\pi \eta \left( {C,T} \right)d} \right]$$ is the diffusion coefficient, $$k_{\mathrm{B}}$$ is the Boltzmann constant, $$\eta \left( {C,T} \right)$$ is the viscosity, and $${\Delta}t$$ is the time step of the time-evolving calculations58. The value of $$\eta \left( {C,T} \right)$$ for each model was calculated using an empirical formula in the literature59. $${\Delta}t$$ was set to one-tenth of the frame interval of the imaging, i.e. $${\Delta}t$$ = 1 ms. The time evolution of the objects was calculated by translating each particle using shift-vectors whose components follow a normal distribution with a standard deviation equal to the RMSD. The complex transmission functions of the simulated objects were calculated using complex refractive indices from the open database57. The diffraction intensity datasets of the multiple-shot CDI and ptychography with dwell times of 10 and 2 ms, respectively, were then calculated according to Eq. (1) in the Results section using a probe function with a 4-μm diameter simulated using the geometry shown in Table 1. In the ptychography, 2 (H) × 2 (V) points on the time-evolving objects were raster-scanned with a scan step of 500 nm. Photon-counting noise following Poisson statistics was added to the datasets. The simulated diffraction data of the multiple-shot CDI for the model at a glycerine concentration and temperature of 90% w/v and 283 K are provided in Supplementary Data 1.
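
The particle dynamics described above can be sketched as follows. This is an illustration under stated assumptions, not the simulation code of this study: the viscosity value is a hypothetical placeholder (the empirical formula of ref. 59 is not reproduced here), the function names are invented for the example, and particle collisions and cell boundaries are ignored.

```python
import math

import numpy as np

K_B = 1.380649e-23  # Boltzmann constant (J/K)


def diffusion_coefficient(temperature, viscosity, diameter):
    """Stokes-Einstein relation D = k_B T / (3 pi eta d), SI units."""
    return K_B * temperature / (3.0 * math.pi * viscosity * diameter)


def simulate_tracks(n_particles, n_steps, rmsd, rng):
    """Cumulative 2D displacements; each component is N(0, rmsd^2) per step."""
    steps = rng.normal(0.0, rmsd, size=(n_steps, n_particles, 2))
    return np.cumsum(steps, axis=0)


viscosity = 1.0   # Pa s, HYPOTHETICAL placeholder, not a value from Table 2
d_coeff = diffusion_coefficient(temperature=283.0, viscosity=viscosity,
                                diameter=400e-9)
dt = 1e-3                                # s, time step from the text
rmsd = math.sqrt(2.0 * d_coeff * dt)     # per-component RMSD per time step

rng = np.random.default_rng(0)
tracks = simulate_tracks(n_particles=50, n_steps=1000, rmsd=rmsd, rng=rng)
# Positions sampled every tenth step, i.e. once per 10-ms imaging frame.
frame_positions = tracks[9::10]
```

Replacing the placeholder viscosity with the value of $$\eta \left( {C,T} \right)$$ for a given concentration and temperature would give the RMSD of the corresponding model.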

### CDI system with projection illumination optics

For the experimental demonstration, the CDI system with projection illumination optics was constructed at an imaging station of the Hyogo ID beamline BL24XU at SPring-839,40 based on a previously reported system60.

The fundamental harmonic at a photon energy of 8.000 keV (wavelength of 0.155 nm) from a figure-8 undulator was extracted using a liquid-nitrogen-cooled silicon (111) double-crystal monochromator (DCM) immediately after transport channel (TC) slit 1 with a 500 μm (H) × 500 μm (W) opening. The third harmonic was eliminated by both the slight detuning of the parallelism of the two DCM crystals and the optical filtering property of the FZP61 equipped in the illumination optics. The opening of TC slit 2, placed at 44.1 m from the light source, was adjusted to ~50 μm (H) × 40 μm (W) to increase spatial coherence along the horizontal direction and also to reduce the photon flux below the linearity threshold of the detector.

The projection illumination optics unit shown in Fig. 2a was placed such that the BDA was at a distance of 20.9 m from TC slit 2. The BDA and the OSA, both with a diameter of 20 μm, made of 50-μm-thick tungsten and the FZP with the same specifications as in the wave optical simulation (NTT-AT, Japan), were arranged according to the parameters listed in Table 1. The samples were mounted on an open-air computed tomography (CT) stage60 placed at the just-focus plane of the FZP. The CT stage was composed of stepping motor stages with thermal insulation (Kohzu Precision Co. Ltd, Japan) to reduce thermal drift during measurement. Fraunhofer diffraction patterns of the samples were collected with a photon-counting pixel-array detector EIGER X 1M (Dectris Ltd, Switzerland) placed 3.29 m downstream from the samples. Flight tubes filled with helium were placed in the X-ray path between the BDA and the FZP and between the sample and the detector to reduce X-ray scattering and attenuation by air. A thin polyimide film with a thickness of 8 μm and a hole with a diameter of 3 mm at the centre was used for the window of the flight tube facing the detector. The direct beam passed through the hole to prevent intense background scattering from the film. Silicon nitride windows (Norcada Inc., Canada) were used for the other windows to further reduce background scattering. The photon flux at the sample position was adjusted to ~$$3 \times 10^{7}$$ photons s$$^{-1}$$, as described above.
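
As a rough consistency check of this geometry against the oversampling requirement, the linear oversampling ratio can be estimated as the speckle size on the detector, $$\lambda z/a$$, divided by the detector pixel pitch. The sketch below assumes a pixel pitch of 75 μm (the nominal value for this detector family, which is not stated in the text) and takes the ~2 μm FOV diameter quoted in the Results as the object size $$a$$; weak fringes outside that diameter are ignored.

```python
wavelength = 0.155e-9   # m, from the text
distance = 3.29         # m, sample-to-detector distance from the text
fov_diameter = 2e-6     # m, approximate FOV diameter quoted in the Results
pixel_pitch = 75e-6     # m, ASSUMED nominal pixel pitch (not given in the text)

speckle_size = wavelength * distance / fov_diameter   # m, on the detector
oversampling = speckle_size / pixel_pitch             # linear oversampling ratio

print(f"speckle size: {speckle_size * 1e6:.0f} um")
print(f"linear oversampling ratio: {oversampling:.1f}")
```

Under these assumptions the ratio is about 3.4, which is above the minimum of 2 usually required per dimension.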

### Ptychography and multiple-shot CDI experiments

A tantalum X-ray resolution chart with a thickness of 500 nm (NTT-AT, Japan) was used as the sample in the CDI experiment. It was mounted on the CT stage and positioned at the just-focus plane. The just-focus plane was confirmed via the numerical wave propagation of an illumination wave field, which was measured by ptychographic imaging of the mounted sample. In the ptychographic measurements, sets of diffraction patterns were collected from 10 (H) × 10 (V) grid points on the ROIs of the sample by raster-scanning the sample against the beam with a step width of 250 nm and an exposure time of 1 s. In the multiple-shot CDI measurement, the sample was moved horizontally in the just-focus plane at a speed of 125 nm s$$^{-1}$$, and diffraction patterns were continuously acquired at a frame rate of 10 Hz during the movement. The read-out time for each frame was 7.5 μs and was thus negligible.
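
The scan parameters above imply the following per-frame quantities. This is plain arithmetic on the stated values; the ~2 μm FOV diameter is taken from the Results, and the variable names are illustrative.

```python
# Multiple-shot CDI (continuous motion)
speed_nm_per_s = 125.0
frame_rate_hz = 10.0
readout_s = 7.5e-6
fov_nm = 2000.0          # approximate FOV diameter from the Results

frame_interval_s = 1.0 / frame_rate_hz
shift_per_frame_nm = speed_nm_per_s / frame_rate_hz     # sample travel per frame
linear_overlap = 1.0 - shift_per_frame_nm / fov_nm      # between adjacent frames
dead_time_fraction = readout_s / frame_interval_s

# Ptychography (step scan)
grid_points, step_nm, exposure_s = 10, 250.0, 1.0
scan_span_nm = (grid_points - 1) * step_nm              # extent along each axis
total_exposure_s = grid_points**2 * exposure_s

print(shift_per_frame_nm, linear_overlap, dead_time_fraction)
print(scan_span_nm, total_exposure_s)
```

The sample therefore moves 12.5 nm per frame, far less than the FOV, so consecutive frames view almost the same region; this is what makes a temporal smoothness constraint plausible.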

### Phase retrieval

Image reconstruction from the diffraction datasets was performed using the HIO-SW-ER software, which implements the HIO algorithm, and the “Ptycho no Tatsujin” software, which implements PIE-based PR algorithms, coded by Y.T. in Python 3. For the TV optimisation in mf-PIE-TV, Chambolle’s algorithm33 implemented in the scikit-image library63 was used. The simplified code of the mf-PIE-TV is provided as Supplementary Software 1.
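
To illustrate what the TV step does, the following is a plain NumPy sketch of Chambolle's projection algorithm for a real-valued 2D image. It is not the scikit-image routine used in this study, and its `weight` parameter is not guaranteed to follow the same scaling as the library's regularisation weight. How the TV step is applied to the complex, time-resolved object in mf-PIE-TV is not shown here; the demo image and all function names are assumptions.

```python
import numpy as np


def _grad(u):
    """Forward differences with zero gradient at the far boundary."""
    gx = np.zeros_like(u)
    gy = np.zeros_like(u)
    gx[:-1, :] = u[1:, :] - u[:-1, :]
    gy[:, :-1] = u[:, 1:] - u[:, :-1]
    return gx, gy


def _div(px, py):
    """Discrete divergence, the negative adjoint of `_grad`."""
    dx = np.zeros_like(px)
    dy = np.zeros_like(py)
    dx[0, :] = px[0, :]
    dx[1:-1, :] = px[1:-1, :] - px[:-2, :]
    dx[-1, :] = -px[-2, :]
    dy[:, 0] = py[:, 0]
    dy[:, 1:-1] = py[:, 1:-1] - py[:, :-2]
    dy[:, -1] = -py[:, -2]
    return dx + dy


def total_variation(u):
    gx, gy = _grad(np.asarray(u, dtype=float))
    return float(np.sum(np.sqrt(gx**2 + gy**2)))


def tv_denoise_chambolle(image, weight, n_iter=100, tau=0.125):
    """Approximately minimise TV(u) + ||u - image||^2 / (2 * weight)."""
    g = np.asarray(image, dtype=float)
    px = np.zeros_like(g)
    py = np.zeros_like(g)
    for _ in range(n_iter):
        gx, gy = _grad(_div(px, py) - g / weight)
        norm = np.sqrt(gx**2 + gy**2)
        px = (px + tau * gx) / (1.0 + tau * norm)
        py = (py + tau * gy) / (1.0 + tau * norm)
    return g - weight * _div(px, py)


rng = np.random.default_rng(0)
clean = np.zeros((32, 32))
clean[8:24, 8:24] = 1.0                       # hypothetical test pattern
noisy = clean + 0.1 * rng.standard_normal(clean.shape)
denoised = tv_denoise_chambolle(noisy, weight=0.1)
```

The step size `tau = 0.125` is the largest value for which the iteration is proven to converge. For a complex object, one simple option would be to apply the routine to the real and imaginary parts separately, but that choice is an assumption rather than a description of the code used here.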

The PR of the single diffraction pattern was performed using a combination of the HIO PR7 and the SW shape estimation algorithm13, followed by the error-reduction algorithm7 for better convergence51,60. In a single trial, 100 HIO iterations followed by one SW calculation were repeated 100 times (10,000 HIO iterations in total), and 1000 ER iterations were applied. We conducted 1000 independent trials starting from a complex array of random numbers, and the 50 most probable reconstructions were selected according to the literature62 based on the Manhattan distances between every reconstruction pair. Using a principal component analysis, we confirmed that these reconstructions were distributed around the centroid of the largest cluster in the solution space62,64. The most probable reconstructions were averaged to produce a reliable exit wave, which was then divided by the ground truth of the probe function to produce the reconstructed object function.
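
The HIO and ER updates used in each trial can be sketched as follows. This is a simplified illustration and not the HIO-SW-ER software: the support is fixed rather than re-estimated by the SW algorithm, the HIO feedback value of 0.9 is an assumed typical choice (the text does not state it), and the random test object and function names are invented for the example.

```python
import numpy as np


def fourier_projection(wave, amplitude):
    """Replace the Fourier modulus with the measured amplitude, keep the phase."""
    spectrum = np.fft.fft2(wave)
    return np.fft.ifft2(amplitude * np.exp(1j * np.angle(spectrum)))


def hio_step(wave, amplitude, support, beta=0.9):
    """Hybrid input-output update; `beta` is an assumed feedback value."""
    projected = fourier_projection(wave, amplitude)
    return np.where(support, projected, wave - beta * projected)


def er_step(wave, amplitude, support):
    """Error-reduction update: zero everything outside the support."""
    return np.where(support, fourier_projection(wave, amplitude), 0.0)


def fourier_error(wave, amplitude):
    """Normalised mismatch between model and measured Fourier amplitudes."""
    model = np.abs(np.fft.fft2(wave))
    return np.linalg.norm(model - amplitude) / np.linalg.norm(amplitude)


rng = np.random.default_rng(1)
support = np.zeros((64, 64), dtype=bool)
support[24:40, 24:40] = True
truth = np.where(support,
                 rng.random((64, 64)) * np.exp(1j * rng.random((64, 64))), 0.0)
amplitude = np.abs(np.fft.fft2(truth))

# One short trial from a random start: HIO iterations followed by ER iterations.
wave = rng.random((64, 64)) + 1j * rng.random((64, 64))
for _ in range(100):
    wave = hio_step(wave, amplitude, support)
er_errors = []
for _ in range(50):
    wave = er_step(wave, amplitude, support)
    er_errors.append(fourier_error(wave, amplitude))
```

The ER stage never increases the Fourier error, which is why it is useful as a final refinement after HIO has escaped stagnation. Repeating such trials from different random starts gives the pool of reconstructions from which the most probable ones are selected.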

The ptychographic images were reconstructed using the ePIE algorithm incorporating the illumination position correction. The ePIE iteration was repeated 2000 times in the numerical simulation and 1000 times in the experimental demonstration, where a complex array of random numbers was used as the initial guess of the object function, and a circular top-hat function with a diameter equal to a designated value was used for the probe. The feedback parameter for the object function $$\alpha$$ was fixed to 0.9, whereas that for the probe function $$\beta$$ at the $$k$$-th iteration decreased according to $$\beta ^k = \beta ^0\sqrt {\left( {K - k + 1} \right)/K}$$, where $$\beta ^0$$ was set to 0.09, and K denotes the total number of iterations65. In the numerical simulation, the probe function was fixed to the ground truth. The pre-registered illumination positions were supplied to the PR of the experimental demonstration, and at each iteration were iteratively corrected with a sub-pixel precision by the conjugate gradient method66, by employing an adaptive step size proposed in the literature67.
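
A minimal sketch of a single ePIE update with the decaying probe feedback described above is given below. It follows the standard ePIE update rule and omits the illumination position correction; it is not the “Ptycho no Tatsujin” implementation, and the function names are illustrative.

```python
import numpy as np


def probe_feedback(k, total, beta0=0.09):
    """beta^k = beta^0 * sqrt((K - k + 1) / K) for iteration k = 1..K."""
    return beta0 * np.sqrt((total - k + 1) / total)


def epie_update(obj_patch, probe, measured_amplitude, alpha=0.9, beta=0.09):
    """Standard ePIE update of one object patch and the probe (same array shape)."""
    exit_wave = probe * obj_patch
    spectrum = np.fft.fft2(exit_wave)
    # Enforce the measured Fourier amplitude while keeping the current phase.
    revised = np.fft.ifft2(measured_amplitude * np.exp(1j * np.angle(spectrum)))
    diff = revised - exit_wave
    new_obj = obj_patch + alpha * np.conj(probe) / np.max(np.abs(probe) ** 2) * diff
    new_probe = probe + beta * np.conj(obj_patch) / np.max(np.abs(obj_patch) ** 2) * diff
    return new_obj, new_probe


rng = np.random.default_rng(2)
probe = rng.random((16, 16)) + 1j * rng.random((16, 16))
obj_patch = rng.random((16, 16)) + 1j * rng.random((16, 16))
measured = np.abs(np.fft.fft2(probe * obj_patch))

total_iterations = 1000
beta_first = probe_feedback(1, total_iterations)
beta_last = probe_feedback(total_iterations, total_iterations)
new_obj, new_probe = epie_update(obj_patch, probe, measured, alpha=0.9, beta=beta_first)
```

Passing `beta=0.0` keeps the probe fixed, which corresponds to the numerical simulation in which the probe was held at the ground truth.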

The PR of the multiple-shot CDI was performed using mf-PIE and mf-PIE-TV, as described in the Results section, and ePIE was used for comparison. A complex array of random numbers was used as the initial guess of the object function to prevent stagnation around the initial conditions. The probe function was fixed to the illumination probe reconstructed by ptychography to ensure convergence. In a single mf-PIE iteration, the update for each measured diffraction frame was performed in a random sequence, and the virtually overlapping object frame $$O_{m + l}^\prime$$ was used to update the other frame $$O_{m^\prime + l^\prime }$$ as the temporal smoothness constraint. The object feedback parameter $$\alpha$$ was set to 0.9 for the non-overlapped frames and 0.45 for the virtually overlapped frames (Fig. 3) to account for the update frequency. The number of overlapping frames in the temporal domain $$L$$ was set to one. A single trial of mf-PIE or ePIE comprised 2000 iterations. In mf-PIE-TV, the first 50 iterations were performed without the TV optimisation for preconditioning; the TV optimisation was then applied for the following 300 iterations in the numerical simulation, or 150 in the experimental demonstration, with a regularisation weight $$\lambda$$ of 0.01, followed by a further 50 iterations without the TV optimisation for convergence. In the experimental demonstration, the object functions of 100 independent trials were averaged to calculate the PRTF. In mf-PIE and mf-PIE-TV, the final reconstruction was further subjected to subframe averaging, as described in the Results section and Fig. 3.

The consistency of the reconstructions with the observed diffraction datasets during PR iterations was monitored with the $$R_F$$ factor, as defined below68:

$$R_F^k = \frac{{\mathop {\sum }\nolimits_n \mathop {\sum }\nolimits_{\boldsymbol{S}} \left| {\left| {{\cal{F}}\left[ {{\Psi}_n^k\left( {\boldsymbol{r}} \right)} \right]} \right| - \sqrt {I_n\left( {\boldsymbol{S}} \right)} } \right|}}{{\mathop {\sum }\nolimits_n \mathop {\sum }\nolimits_{\boldsymbol{S}} \left| {\sqrt {I_n\left( {\boldsymbol{S}} \right)} } \right|}}$$
(8)
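
Equation (8) can be evaluated directly from a stack of exit waves and the measured intensities, as in the following sketch. It assumes that the intensities are expressed in the same normalisation as NumPy's unnormalised FFT; the function name and the random test data are illustrative.

```python
import numpy as np


def r_factor(exit_waves, intensities):
    """R_F of Eq. (8) for arrays of shape (n_frames, ny, nx)."""
    model_amplitude = np.abs(np.fft.fft2(exit_waves, axes=(-2, -1)))
    measured_amplitude = np.sqrt(intensities)
    return (np.sum(np.abs(model_amplitude - measured_amplitude))
            / np.sum(measured_amplitude))


rng = np.random.default_rng(3)
exit_waves = rng.random((3, 16, 16)) + 1j * rng.random((3, 16, 16))
intensities = np.abs(np.fft.fft2(exit_waves, axes=(-2, -1))) ** 2

r_consistent = r_factor(exit_waves, intensities)              # data match the model
r_empty = r_factor(np.zeros_like(exit_waves), intensities)    # null reconstruction
```

A reconstruction that is fully consistent with the data gives a value of zero, whereas an all-zero reconstruction gives one, so lower values indicate better agreement.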

The effective spatial resolution of the final reconstructions was estimated by an FRC analysis36 against the ground truths and defined as the inverse of the spatial frequency at which the FRC dropped below 0.5. When the ground truth was unavailable, the effective spatial resolution was instead estimated from the convergence of the PR through a PRTF analysis42,64. The $$R_F$$ factors and effective spatial resolutions of the final reconstructions are summarised in Tables 1 and 2.
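
The FRC criterion can be sketched as follows for two images of equal size. The ring binning (integer part of the radial frequency index), the function names, and the example values are assumptions made for illustration; the analysis code used in this study may differ in such details.

```python
import numpy as np


def fourier_ring_correlation(img1, img2):
    """FRC per integer frequency ring, from DC up to the Nyquist ring."""
    f1 = np.fft.fftshift(np.fft.fft2(img1))
    f2 = np.fft.fftshift(np.fft.fft2(img2))
    ny, nx = img1.shape
    y, x = np.indices((ny, nx))
    ring = np.hypot(y - ny // 2, x - nx // 2).astype(int).ravel()
    n_rings = min(ny, nx) // 2
    # The real part is taken so that complex-valued objects are also handled.
    cross = np.bincount(ring, weights=(f1 * np.conj(f2)).real.ravel())[:n_rings]
    power1 = np.bincount(ring, weights=(np.abs(f1) ** 2).ravel())[:n_rings]
    power2 = np.bincount(ring, weights=(np.abs(f2) ** 2).ravel())[:n_rings]
    return cross / np.sqrt(power1 * power2)


def full_period_resolution(frc, pixel_size, n_pixels, threshold=0.5):
    """Inverse of the first spatial frequency at which the FRC is below threshold."""
    below = np.flatnonzero(np.asarray(frc)[1:] < threshold)
    if below.size == 0:
        return None  # the FRC never drops below the threshold
    ring_index = below[0] + 1
    return n_pixels * pixel_size / ring_index


rng = np.random.default_rng(4)
image = rng.random((64, 64))
frc_self = fourier_ring_correlation(image, image)
example = full_period_resolution([1.0, 0.9, 0.6, 0.4, 0.2],
                                 pixel_size=10e-9, n_pixels=64)
```

An image compared with itself gives an FRC of one in every ring; comparing a reconstruction with its ground truth instead yields a curve that decays towards high frequencies, and the threshold crossing sets the full-period resolution.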