Three-modal X-Ray Computed Tomography (CT) delivers volumetric images in three (largely complementary) contrast channels: attenuation, phase (refraction), and small-angle scattering. Compared to conventional CT, which produces attenuation images only, the availability of the additional two channels is a key strength: phase (refraction) can show weakly attenuating details with a higher contrast-to-noise ratio (CNR) than the attenuation channel1, and scattering is sensitive to sample inhomogeneities below the imaging system’s resolution2, providing information that would otherwise be inaccessible. Owing to these advantages, three-modal CT is showing great promise for use in, e.g., biomedicine3,4,5 and materials science6.

Different technical solutions for three-modal CT exist7,8,9. Many of them use a modulator in the x-ray beam, typically placed upstream of the sample, to create a reference pattern that is then perturbed by the sample’s attenuation, refraction, and scattering. Some techniques also employ some form of analyzer (a crystal, grating, or mask), placed downstream of the sample, to detect those disturbances. Generally, the three contrast channels are not measured directly but must be retrieved from the raw images10 before further processing, e.g., CT reconstruction, can take place. When an analyzer is used, the retrieval requires a minimum of three images (frames) to be acquired with the analyzer in different positions, to obtain three pixelwise equations that can be solved for the three unknowns (attenuation, refraction, scattering)11,12,13,14,15. In CT, the need to acquire n ≥ 3 frames per rotation angle leads to extended scan times, as the necessary repositioning of the analyzer enforces a step-and-shoot acquisition which incurs overheads. In other words, flyscans, characterised by a continuous sample rotation, cannot be implemented. This is problematic because the overheads impose a scan time bottleneck that cannot be overcome even when reducing the exposure time, and therefore it limits the throughput capabilities of three-modal CT as well as options for time-resolved imaging.

More recently, methods have emerged that allow retrieving three-modal x-ray images from a single frame16,17,18,19,20,21,22,23,24,25. In these, the analyzer is replaced by a high-resolution detector, featuring pixels small enough to resolve the reference pattern created by the modulator, and therefore allowing to extract the changes caused by the sample’s attenuation, refraction, and scattering without acquiring and processing multiple frames. One of these methods is “beam tracking”, an x-ray implementation of the Hartmann mask26, in which the modulator is a mask with alternating absorbing and transmitting septa, generating beamlets with a spatial separation sufficient to make their mutual interference negligible. In this scenario, the sample’s attenuation, refraction, and scattering lead to an intensity reduction, shift, and broadening of the beamlets, respectively. If a mask with long, narrow slits is used22,23, beam tracking is sensitive to phase effects (refraction, scattering) in the direction orthogonal to the slits, while two-dimensional phase sensitivity is achieved with a mask with square or round apertures (“holes”)24,25.

A drawback of beam tracking is that the parts of the sample covered by the absorbing areas of the mask cannot contribute to the image. Previously, this was handled by applying a “dithering” procedure, by which the sample is scanned in steps smaller than the mask period until all its parts have been illuminated by the x-ray beam, a frame is acquired at each step, and all frames are combined into an up-sampled image. When using a mask with slits, a one-directional scan is needed (in the direction orthogonal to the slits); when using a mask with holes, the sample must be scanned along two directions to obtain a fully sampled image. Although effective, this process is not compatible with flyscans when applied in CT, as dithering must be applied at each angle, forcing corresponding interruptions in the sample’s rotation. On the other hand, if no dithering is applied (i.e., by only rotating the sample in the beamlet array), data are heavily under-sampled, which has a detrimental effect on overall image quality and spatial resolution, with the latter then being driven by the mask period rather than by the apertures.

We have previously proposed cycloidal CT as an alternative to dithering for a setup where the modulator is a mask with 1D long, narrow slits27,28. In a cycloidal scan, the sample is translated in the direction orthogonal to the rotation axis, continuously and simultaneously with being rotated. This was shown to provide a resolution comparable to dithering, while also being flyscan compatible, as it involves the acquisition of only a single frame per angle (contrary to dithering, where multiple frames are acquired). So far, cycloidal CT had only been used with “edge illumination”29, a phase imaging technique that also uses a mask as the modulator but requires a geometrically magnified replica of the mask as an analyzer, placed in front of a detector with pixels much larger than the beamlets. The need to acquire three frames per angle, each with a different offset between the masks, to retrieve attenuation, refraction, and scattering signals with that technique30 has so far prevented us from performing three-modal flyscans, despite the cycloidal approach’s flyscan compatibility. Rather, the flyscan CT data we have acquired up to this point show a single “hybrid” contrast channel (a merge of attenuation and refraction), enabled by a retrieval method that relies on approximations and assumptions on the sample31.

With this article, we build on the introduction of cycloidal CT by proposing a “cycloidal-spiral” acquisition scheme, by which the sample is translated along two directions, continuously and simultaneously with its rotation. To facilitate the acquisition of three-modal CT images through flyscans, we have combined this scheme with beam tracking, rather than edge illumination. This is implemented using a mask with holes, which provides the added benefit of two-dimensional phase sensitivity. In the following, we report on the development and optimization of the approach on a phantom (425–500 μm diameter polyethylene spheres in a 3 mm diameter plastic straw) and provide a proof-of-concept demonstration of its flyscan compatibility on a complex biological sample (a piglet esophagus).

Results and discussion

Beam tracking and sampling schemes

The beam tracking setup is shown in Fig. 1. It consists of the x-ray source, the mask (placed upstream to the sample), the sample, and the detector (see “Methods” section). As mentioned, such a setup would normally be used in conjunction with dithering, although simple rotation-only scans are also possible. The former provides fully sampled data but is slow as the sample must be scanned in sub-period steps, Δxopt along the horizontal and Δyopt along the vertical direction, at each angle (the dithering steps typically match the size of the apertures and define the pixel size, as well as the slice thickness, in the reconstructed images); the latter are fast and flyscan compatible, but image quality is compromised and resolution driven by the mask period. Figure 2a,b show the sampling grids for the two approaches, depicted for one mask period (but repeating itself for adjacent periods in either direction). The filled squares indicate the sub-period “slots” of the 3D dataset for which data are acquired (i.e., where a beamlet passes through the sample), while the empty squares indicate points where no data are available (i.e., corresponding to sample areas covered by the absorbing regions of the mask).

Figure 1
figure 1

Schematic (not to scale) of the experimental setup.

Figure 2
figure 2

Sampling grids (shown only for a single mask period in the horizontal, px, and vertical, py, direction) for a dithered scan (a), a rotation-only scan (b), and a cycloidal-spiral scan (c). In the former case, the sample is scanned in horizontal and vertical steps of Δxopt = px/8 and Δyopt = py/8, respectively, at each rotation angle. In the latter case, the sample is displaced by horizontal and vertical (sub-period) distances of dx = Δxopt and dy = 4Δyopt, simultaneously with being rotated to the next angle.

The proposed cycloidal-spiral scan shares similarities with a rotation-only scan in that only a single frame is acquired per angle. However, with each rotation, the sample is displaced by horizontal and vertical distances of dx and dy, hence a beamlet probes the sample in a different sub-period position, which creates an interlaced sampling pattern like the one shown in Fig. 2c. The schematic shows an example case where dx = Δxopt and dy = 4Δyopt. More generally, dx and dy can be flexibly chosen, so long as they are fractions of the mask period.

Cycloidal-spiral optimization

To determine which values of sample displacement per rotation angle (dx and dy) would deliver the best results, we initially acquired a dithered dataset of the sphere phantom (scan duration of 7.5 h) and sub-sampled it according to cycloidal-spiral patterns by assuming different combinations of dx and dy, as per Table 1; see “Methods” section. Note that the parameters dx = dy = 0 represent a rotation-only scan, which is therefore inherently part of the investigation.

Table 1 Spatial resolution estimates (mean ± one standard deviation) along the x, y, and z direction extracted from the phase images of the sphere phantom, obtained through sub-sampling the dithered dataset using the values of dx and dy given in the left most columns.

The outcomes (reconstructed axial, sagittal, and coronal slices, see “Methods” section for details on how the images were obtained) of the sub-sampling are shown in the Supplementary Material S1; the images were compared in terms of their visual quality as well as their spatial resolution along the three axes of the reconstructed volume. Since the sphere phantom consisted of a weakly attenuating, homogeneous material (polyethylene), which meant there was little contrast in the attenuation and scattering channels (images not shown), the analysis was conducted on the phase channel. Resolution estimates were obtained by fitting error functions to five edges of spheres along each direction (x, y, and z axes) (one of the five edges is indicated in the corresponding Supplementary Material figure for each direction), computing their derivatives to obtain line spread functions (LSF), and extracting their resulting full width at half maxima (FWHM). Such an LSF based estimation should, in principle, be applied to an edge without curvature or inclination over the slice width, which, strictly speaking, is not the case for a sphere. However, due to the size of the spheres in the phantom used and the thickness of the reconstructed CT slices, those spheres that were intercepted by a CT slice at their centre were considered as having a negligible curvature. For this reason, only the largest spheres in each reconstructed plane were considered for estimating the spatial resolution. The mean and standard deviation of the extracted spatial resolution values at each direction were then computed; these are reported in Table 1. Among the different combinations of dx and dy, the best results were obtained for dx = dy = 20 μm. In this case, resolution was estimated to be 35 μm ± 9 μm, 39 μm ± 7 μm, 35 μm ± 3 μm along x, y, z, respectively, which, given the extent of uncertainty in these estimates, may be considered in line with our isotropy target. For dx = dy = 0 μm (equivalent to a rotation-only scan), the estimates were 44 μm ± 7 μm, 94 μm ± 9 μm, 48 μm ± 8 μm, which not only is a worse outcome, but it is also much less isotropic (note the relatively large uncertainties may be attributed to a distinct noise pattern in the rotation-only and cycloidal-spiral images). For reference, resolution in the fully sampled images was estimated at 12 μm ± 1 μm, 10 μm ± 3 μm and 13 μm ± 2 μm, respectively. Figure 3a–l shows the results for dx = dy = 20 µm, dx = dy = 0 µm, and the fully sampled (dithered) case. The size of each CT slice was 434 × 434 pixels and a total of 436 CT slices were reconstructed; both the pixel size and slice thickness are defined by the dithering step in the fully sampled scan, which was equal to 10 μm. It should be noted that a slight drop in resolution towards the sample periphery can be seen in the cycloidal images, which we attribute to the fact that the challenge of interpolation (needed for the reconstruction from under-sampled data, see “Methods” section) is more severe for areas further away from the centre of rotation, due to the outer sample areas in CT being sampled less densely than the inner ones.

Figure 3
figure 3

Reconstructed axial (left), sagittal (central left), and coronal (central right) planes of the sphere phantom, and the corresponding line spread functions (right) extracted from the sphere edges indicated with corresponding colours along the x, y and z axes. Results are shown for a cycloidal-spiral sub-sampling with dx = dy = 20 μm (ad); a rotation-only sub-sampling, i.e., dx = dy = 0 μm (eh); the fully sampled (dithered) case (scan duration of 7.5 h) (il); a cycloidal-spiral flyscan performed with dx = dy = 20 μm (scan duration of 200 s) (mp).

Three-modal flyscans

Informed by the above results, we then performed true three-modal flyscans with dx = dy = 20 μm (see “Methods” section); the scan duration was 200 s. Flyscan images (phase channel only) of the polyethylene spheres are shown in Fig. 3m–o. The overall quality of these images appears to be in line with their step-and-shoot counterparts, with no obvious artefacts arising from the continuous scanning. However, the effective field of view during the flyscan was smaller c.f. that during the fully sampled scan (see “Methods” section for explanation); here, the size of each CT slice was 338 × 338 pixels and a total of 255 CT slices were reconstructed. Resolution was estimated to be 31 μm ± 5 μm, 34 μm ± 3 μm, 27 μm ± 3 μm in x, y and z, respectively, implying that (near) isotropy was preserved; this is shown in Fig. 3p.

Using the same parameters and procedures, we then scanned a more complex sample: a decellularised piglet esophagus (a so-called “scaffold”), which had been prepared in the context of tissue engineering research (Savvidis et al.32), to demonstrate the ability to extract three-modal images from a single flyscan (Fig. 4). Also in this case, spatial resolution appears to be isotropic in all three contrast channels. Of note is the greater CNR in the phase images, shown in Fig. 4d–f, relative to the attenuation and scattering ones.

Figure 4
figure 4

Reconstructed (ac) attenuation, (df) phase, and (gi) scattering, (a,d,g) axial, (b,e,h) sagittal, and (c,f,i) coronal slices.


In summary, we reported three-modal flyscan x-ray CT images, obtained by combining a beam tracking setup with two-dimensional phase sensitivity with a cycloidal-spiral acquisition scheme. Previously, similar setups required the implementation of step-and-shoot acquisitions, at least if high, isotropic resolutions were to be achieved. This, however, incurred significant overheads and imposed a scan time bottleneck that cannot be overcome by decreasing the exposure time. By contrast, in a flyscan, the overall scan time is determined by exposure time alone, as dead times are typically negligible.

The key outcome of the work is the reduction in scan time: we demonstrated that the cycloidal-spiral flyscan can be completed in a 135 × faster time than a fully sampled (dithered) scan (scan duration of 200 s for the cycloidal-spiral flyscan c.f. 7.5 h for the fully sampled scan), while maintaining the ability to reconstruct three contrast channels with (near) isotropic resolution. Although the flyscan results had a lower resolution than their dithered counterpart (31 μm ± 5 μm, 34 μm ± 3 μm, 27 μm ± 3 μm in x, y and z, respectively for the cycloidal-spiral flyscan c.f. 12 μm ± 1 μm, 10 μm ± 3 μm, and 13 μm ± 2 μm in x, y and z, respectively for the dithered step-and-shoot scan), it must be considered that the former were reconstructed from 1/64th of the frames (and therefore overall amount of data) of the latter, hence some degree of degradation is to be expected. At the same time, the cycloidal-spiral approach was shown to be advantageous over a rotation-only scan. While both schemes can be implemented as flyscans (thus there is no difference in the overall scan time), the simple addition of sample motion within the plane parallel to the modulator during the rotation appears to increase resolution and enable resolution isotropy. This is likely because the added translation “spreads” the acquired data more evenly across the 3D sinogram, thus improving the de facto sampling, while in a rotation-only scan data are densely packed along θ but spread coarsely along x and y.

For the experimental setup and scan parameters considered here, the optimal “spreading” occurred at dx = dy = 20 μm. More generally, a link between the horizontal and vertical displacement per projection and the angular step (i.e., number of projections for a full rotation) can be anticipated; this is a topic for future investigation. To fully understand the effect of, and interaction between, the different scan parameters, complementary metrics such as SNR and CNR will also need to be considered. Moreover, for this proof-of-concept demonstration, the cycloidal-spiral and rotation-only datasets were completed using simple interpolation; more advanced mathematical data-recovery techniques to fill the missing entries in the acquired 3D datasets could ultimately lead to a better performance and will be investigated in future work.

Looking forward, we believe that the cycloidal-spiral approach could open new applications for three-modal CT that are currently unattainable due to constraints on scan time. The flyscans presented in this paper took 200 s, but it is reasonable to assume that this can be brought down to tens of seconds or below by reducing the exposure time per frame, at least at synchrotrons where there is an abundance of x-ray flux. Scan time would then be at a level where some degree of time-resolved imaging may become feasible, a goal that will we seek to attain in the future. Furthermore, the cycloidal-spiral approach has positive implications for three-modal CT performed with lab sources (to which beam tracking can be adapted to23), where the much lower flux available exacerbates the problem of extended scan times, with scans currently lasting hours. By implementing flyscans, the acquisition of three-modal CT images with a few tens of µm resolution may become possible within minutes. Indeed, the translation of our method from the synchrotron to a laboratory environment will be another key focus of our future work. As part of that, we will also investigate the compatibility of cycloidal-spiral scanning with cone beam geometries.


Experimental setup

The experimental setup is shown in Fig. 1. All scans were performed at Diamond Light Source, Beamline I13-2, using a pink beam with a spectrum in the 8 keV to 30 keV range and centered around 27 keV. The filters present in the beam were a 0.28 mm graphite filer, a 1.06 mm graphite filter, and a 3.2 mm Al filter; the photon flux at the field of view was estimated to be 5 × 108 photons/s at energies < 15 keV, 2.4 × 1011 photons/s at energies between 15 and 25 keV, and 2.1 × 1011 photons/s at energies > 25 keV. The sample was placed at 210 m from the source33, and the mask 1 cm upstream of the sample. The mask consists of a 200 ± 20 μm thick Au layer on a 1 mm thick graphite substrate (Microworks, Germany). It has 12 μm circular apertures with a period of 80 μm in both directions; every other aperture row had an offset of half the horizontal period (i.e., 40 μm, see Fig. 1). The experimental setup further consisted of a pco.edge 5.5 camera coupled to a scintillator-objective combination leading to an effective pixel size of 2.6 × 2.6 μm2; the number of pixels was 2560 × 2160 (h × v), leading to a 6.7 mm × 5.6 mm (h × v) field of view. The beamlets remained separated for a wide range of sample-to-detector distances, due to the beam geometry being parallel and the mask period (80 μm) being relatively large compared to the apertures (12 μm). A sample-to-detector distance of 37 cm was chosen; this value has been previously shown to provide sufficiently strong refraction signals at the same beamline34 and is a value close to the sample-detector distances that would ultimately be implemented in a laboratory translation35.

Data acquisition

The dithered dataset consisted of 400 projections taken by rotating the sample in steps of 0.9° over 360°. At each angle, the rotation was interrupted, and the sample scanned in steps (dithering steps) of Δxopt = Δyopt = 10 μm across one mask period, p, in each direction (step-and-shoot scan). This led to the acquisition of 64 frames per angle, with an exposure time of 0.5 s each. In total, 25,600 frames were acquired during this scan, which took 7.5 h (including overheads arising from the scan’s step-and-shoot nature).

The cycloidal-spiral flyscans also consisted of 400 projections (i.e.,  = 0.9 degrees), but now only a single frame was acquired per angle, as the sample was translated in the x–y plane while being rotated; the sample displacement per rotation angle was 20 μm along each axis (horizontal and vertical), as defined during the optimization process. The exposure time per frame also remained unchanged (0.5 s). The horizontal and the vertical translators moved at a nominal speed of 40 μm/s, defined by the exposure time and the required displacement per rotation angle along each corresponding axis. This speed was easily achievable with the motors used (HUBER stages), but more generally it should be noted that the speed requirements may be more challenging for different experimental parameters, e.g. when working with shorter exposure times. In our specific case, we had to implement “back-and-forth” translations along both directions to prevent the 3 mm wide sample from exiting the 6.7 mm × 5.6 mm field of view. This involved moving the sample by 2 mm along x and y, reversing the translators’ direction, moving the sample by 2 mm in the opposite directions, reversing translators reversed again and so forth, until all 400 projections had been acquired. This “back-and-forth” trajectory resulted in field of view that was 2 mm smaller in either direction than that for the fully-sampled step-and-shoot scans, which was defined by the detector size.

The continuous acquisition allowed for negligible overheads to be incurred, meaning that the entire scan was completed in 200 s, which is 135 × faster than the dithered scan.

Data processing

Before the dithered dataset was sub-sampled to mimic rotation-only and cycloidal-spiral scans, attenuation, two-dimensional refraction (along x and y), and two-dimensional scattering (along x and y) signals were retrieved from the raw frames by tracking each beamlet’s profile and quantifying the changes induced by the sample. Specifically, attenuation was obtained from the beamlets’ intensity reduction, and refraction from their horizontal and vertical displacements (through subpixel image registration based on cross correlation36), respectively. The scattering signal, here defined as the variance of the sample’s scattering function along the horizontal, \({\upsigma }_{x}^{2}\), and vertical, \({\upsigma }_{y}^{2}\), direction was obtained from the variance of the beamlets’ second moments25. The magnitude of the directional scattering function, assumed to be a Gaussian, was then obtained by calculating the eigenvalue of the covariance matrix of the scattering function; the covariance matrix included the differences in variance and covariance of each beamlet with and without the sample in the beam25.

The sub-sampling, which was applied individually to the retrieved contrast channels, involved discarding all but one of the 64 frames acquired per angle. At angle j, and for the corresponding mask period, p, in each direction, we only kept the frame, i, with the horizontal (h) and vertical (v) indices:

$${i}_{h}=\left(\frac{\left(\left(j-1\right)dx\right)\mathrm{ mod }p}{{dx}_{opt}}\right)+1$$
$${i}_{v}=\left(\frac{\left(\left(j-1\right)dy\right)\mathrm{ mod }p}{{dy}_{opt}}\right)+1,$$

where j = 1,…,400, as this created the desired cycloidal-spiral patterns. Consequently, the sub-sampled dataset were 3D matrices with “gaps”, containing only 1/64th ≈ 1.56% of the entries of their fully sampled counterparts. The missing 63/64th ≈ 98.44% of entries were filled through interpolation, for which three schemes were initially considered. The first one was a full 3D natural neighbor interpolation; natural neighbor was selected over the other two methods (linear and nearest) available in MATLAB function, griddata, as initial tests revealed it was the best performing one of all of them. The second one consisted of applying 1D cubic spline interpolation along the y direction of the sub-sampled sinograms, followed by 2D cubic interpolation in the x-θ planes (1D-2D approach). The third one consisted of applying 2D cubic interpolation in the y-θ planes, followed by 2D cubic interpolation in the x-θ planes (2D-2D approach). The three schemes were compared through visual inspection of the respective results, as well as based on resolution estimates obtained via the above described LSF approach (see Supplementary Material S1, where results are shown for cycloidal-spiral sub-sampling with dx = dy = 20 μm). It was found that the second scheme, the 1D-2D approach, provided the best image quality in the reconstructed axial planes, while the first (full 3D) and third (2D-2D) schemes performed better, and approximately equally well, in the coronal and sagittal planes. Therefore, and taking into account that the 3D approach required much longer computation times than the 2D-2D one, we decided to apply a combined interpolation approach to recover the missing entries in the under-sampled 3D datasets; that is, the axial planes presented in this article resulted from interpolation with the 1D-2D scheme, while the coronal and sagittal planes resulted from use of the 2D-2D scheme. It should be noted, however, that this combined approach is restrictive in terms of data visualization; it only allows for viewing axial, coronal and sagittal slices. Following interpolation, the sub-sampled and fully sampled sinograms contained the same number of entries.

In all cases, CT reconstruction was performed with filtered back projection (FBP), applied on a slice-by-slice basis. Prior to reconstruction, the refraction images in x and y were converted into integrated phase images through a Fourier space method37.

Flyscan images were reconstructed in the same manner, but the extracted attenuation, refraction, and scattering signals from the individual frames must be placed into their sub-period “slots” in the 3D sinogram before interpolation can be applied. In principle, this can be done according to the indices calculated via Eqs. (1a and 1b). However, this approach cannot account for any discrepancy between the actual and nominal speeds of the translation stages, which would cause an inaccurate allocation. Moreover, in our specific case, we had to implement the above mentioned “back-and-forth” translations. Owing to this, we found that there was indeed a discrepancy between actual and nominal sample trajectories, which became more pronounced with every turn of the sample, due to the motors’ deceleration and acceleration. To take this into account, we calculated the indices as:

$${k}_{h}=\frac{\left({\mathrm{CP}}_{h}-{\mathrm{IP}}_{h}\right) \mathrm{mod} p}{{\Delta x}_{opt}}$$
$${k}_{v}=\frac{\left({\mathrm{CP}}_{v}-{\mathrm{IP}}_{v}\right) \mathrm{mod} p}{{\Delta y}_{opt}},$$

where CP and IP refer to the sample’s current and initial position in mm, i.e., its position at the time of acquiring a particular frame and at the beginning of the scan, respectively. It should be noted, however, that CP cannot be inferred from the acquired frames directly, as the horizontal and vertical translations of the sample are entangled with its rotation. Motor encoder readout synchronized to the detector acquisition can output CP with a high-level of accuracy but, if this is not available, the sample’s translation must be tracked or measured explicitly. In our case, we repeated the flyscan on a reference sample (a tungsten sphere, diameter 300 μm) without the modulator and without the rotation, but with otherwise identical scan parameters. The horizontal and vertical positions of the tungsten sphere during that scan were tracked by applying subpixel image registration36 to each pair of consecutive frames, and subsequently used as CP in Eqs. (2a, 2b). Once the attenuation, refraction and scattering signals extracted from the flyscan data had been correctly placed into the 3D sinograms, interpolation and CT reconstruction were applied as described above.