Motion guided Spatiotemporal Sparsity for high quality 4D-CBCT reconstruction

Conventional cone-beam computed tomography is often deteriorated by respiratory motion blur, which negatively affects target delineation. On the other side, the four dimensional cone-beam computed tomography (4D-CBCT) can be considered to describe tumor and organ motion. But for current on-board CBCT imaging system, the slow rotation speed limits the projection number at each phase, and the associated reconstructions are contaminated by noise and streak artifacts using the conventional algorithm. To address the problem, we propose a novel framework to reconstruct 4D-CBCT from the under-sampled measurements—Motion guided Spatiotemporal Sparsity (MgSS). In this algorithm, we try to divide the CBCT images at each phase into cubes (3D blocks) and track the cubes with estimated motion field vectors through phase, then apply regional spatiotemporal sparsity on the tracked cubes. Specifically, we recast the tracked cubes into four-dimensional matrix, and use the higher order singular value decomposition (HOSVD) technique to analyze the regional spatiotemporal sparsity. Subsequently, the blocky spatiotemporal sparsity is incorporated into a cost function for the image reconstruction. The phantom simulation and real patient data are used to evaluate this algorithm. Results show that the MgSS algorithm achieved improved 4D-CBCT image quality with less noise and artifacts compared to the conventional algorithms.

are only a trade-off and achieve sub-optimal temporal resolution, of which the residual motion could still be found in the reconstructed 4D-CBCT images 26 . On the other hand, algorithms that only use the projections assigned to the current phase to reconstruct the final image can fully achieve the temporal resolution, such as the total variation minimization based algorithm. For these algorithm, when the projection number is very low or there exist small size and low-contrast objects, tiny structure may be usually erased with the piecewise smooth constraint.
In recent years, the image restoration or denoising algorithms based on block processing have shown increasing vitality. The earliest concept of 'patch' was proposed in Haralick's study on textural features for image classification 27 . In 1980, JS Lee pioneeringly used this concept for image enhancement 28 . Latterly, algorithms such like the dictionary learning based image reconstruction/restoration approaches 29 , the nonlocal means filter 30 and the BM3D algorithm 31 , et al. [32][33][34] , have been successfully developed. Local patches often experience much less distortion than the global image and therefore it becomes easier to define the similarity between local patches. Low-rank and sparsity can be better reflected in patch based processing.
In this work, we propose to reconstruct 4D-CBCT volumes from the subset projections of current phase and incorporate the image domain phase-correlated information into the iterative procedure. Motived by the success of block based image restoration, we propose a Motion guided Spatiotemporal Sparsity (MgSS) to formulate the regularization for 4D-CBCT reconstruction. In this scheme, CBCT images of different phases are divided into small cubes (three-dimensional blocks) and the cubes are tracked with estimated motion vector fields through time (phase). After then, regional spatiotemporal sparsity is applied on the tracked cubes. Specifically, we recast the tracked cubes into four-dimensional matrix, and use the higher order singular value decomposition (HOSVD) 35 technique to analyze the regional spatiotemporal sparsity. Finally, a cost function is formulated with embedding the block based spatiotemporal sparsity. One simple but effective optimization algorithm was used for the cost function solution.
This paper is structured as follows: In the method part, we first present the flow chart of the proposed MgSS scheme and then give detailed introduction of each step. After this, we formulate the reconstruction framework that incorporates the MgSS scheme. In the experiment part, we exhibit the results of the NCAT phantom simulation data, 4DCT based simulation data and real patient data by using FDK, SART-TV and the MgSS algorithms. Lastly, in the discussion part, we simply discuss the superiority and limitation of the proposed algorithm.

Methods
Flow chart of the proposed MgSS algorithm. In this work, the proposed Motion guided Spatiotemporal Sparsity (MgSS) applying to 4D-CBCT reconstruction comprises the following steps; (1) Estimate the motion maps for each voxel between adjacent phases of the 4D-CBCT images.
(2) Divide the CBCT sequence images into cubes (3D blocks), and track the cubes through time using the three-dimensional motion maps obtained in step 1. (3) Stack the tracked cubes and apply regional spatiotemporal sparsity on the tracked cubes by framing the 4D-CBCT reconstruction problem to be a constrained optimization problem, and then optimize the problem to get the reconstructions.
In the following sections, we will introduce these steps in detail.

Motion Maps Estimation.
In this study, we are not devoted to develop new algorithm to achieve the three-dimensional motion vector fields (3D-MVFs) between CBCT images of adjacent phases, but rather we use the Real-Time Image-based Tracker (RTIT) 36 toolbox to estimate the three-dimensional motion maps. The RTIT toolbox is an available open source with implementation of the optical flow based registration algorithms 37,38 . Assume that we obtain the initial/current 4D-CBCT estimation, with the RTIT, we can achieve the pixel based motion vector fields consisting of the changes in space coordinates that describe the distribution of the apparent motion velocities of intensity patterns in the sequences of images.
Cubes Tracking with MVFs. For the task of cubes tracking, we define the voxel space-time position as , where x, y, z and t represent the spatial position of the voxel in the CBCT volumetric image and the temporal phase index, respectively. In MgSS, we fetch the cubes (3D blocks) from the images of the first phase and use the 3D-MVFs to track structurally similar cubes in the other phases. Specifically, the first phase image was initiated with highly overlapping cubes. In the extreme case, all the voxels in the first phase image can be defined as the central voxel of one cube, but this may result in huge computation burden. Thus in this study, rather than sliding by one voxel to every next, we use a step of N step = 2 voxels to define the cubes. For the motion tracking, considering one cube ∈ indicates the central pixel position of the cubes with spatial position to be x y ( , , z ) 1 1 1 and temporal phase index to be t 1 . By using Δu to denote the displacement of the central voxel in the cube between t 1 and t 2 phases, we can use u 1 + Δu to track the center pixel location for cube in the t 2 phase. Considering u 1 + Δu might be non-integral, the next block center voxel was taken as u 2 = {u 1 + Δu 2 }, where "{}" is a rounding operation. Extracting the voxels centered from u 2 , we can constitute a new cube B(u 2 ) with the same block size. Following this schedule, B(u n ) can be tracked with u n = {u 1 + Δu n } using the above block-center-tracking method. A block cluster can be ulteriorly constructed: t . Based on the fact that not all the chest is moving during the CBCT scan, the tracked cubes for the static parts should be noisy blocks with same anatomy structures. As shown in Fig. 1, compared to the blocks of the same spatial position in the original phases, the tracked blocks exhibit more mutual similarity.
ScIentIfIc RepoRtS | 7: 17461 | DOI:10.1038/s41598-017-17668-5 Regional Spatiotemporal Sparsity. The technology about matrix rank sparsity has been successfully investigated in the field of dynamic image reconstruction, such as the k-t SLR method 39 . In some previous studies 40 , matrix rank sparsity was often used to dispose the entire image. But the recent studies, such like Low-dimensional-structure self-learning and thresholding (LOST) 41 and compartment-based k-t principal component analysis 42 have indicated that the spatiotemporal sparsity and reconstruction quality could be further promoted by separating the entire image into blocks. Under the patch-based processing theory, decomposition of the tracked regions of dynamic datasets using the singular value decomposition (SVD) algorithm has been reported 43 . In Chen's work, blocks in the clusters were vectorized and each cluster Θ was first rearranged into a 2-D matrix The SVD technique was then adopted to decompose the cluster: If the cluster is truly spatiotemporal sparsity, a least number of significant values will be found in the singular matrix S svd . For the standard SVD, the image blocks are manually vectorized and then the image structural properties in the spatial domain are ignored. To address the problem, the higher order singular value decomposition (HOSVD), which is able to directly decompose dynamic datasets into a multidimensional singular matrix rather than unfolding the blocks into column vectors, has been reported 35 . Also, in this work, we use the HOSVD to decompose the clusters due to its naturality and flexibility. By using the HOSVD technique, a four-dimensional tensor Θ can be decomposed as: where U (1) , U (2) , U (3) , U (4) are orthogonal matrices that contain the orthonormal vectors spanning the column space of the matrix Θ. Here, the symbol × n stands for the n-th mode tensor product. The core tensor S hosvd is not necessarily diagonal matrix, which implies that each dimension of the tensor can have a different rank. Generally, noise and artifacts can be attenuated by approximating the rank of the core tensor. However, the computation of the best rank approximation requires an iterative alternated least-square (ALS) algorithm and is quite time consuming 44 . Moreover, the approximation obtained from simple truncation has been proved to be in most cases quite similar to the optimal approximation 45 . Base on this, in our study, we apply a soft-thresholding strategy on the HOSVD coefficients to reduce noise and artifacts. The thresholding of the rank coefficients can be represented as follows: where H τ denotes the soft-thresholding operator with threshold τ. Then a new tensor can be synthesized by the inverse HOSVD transformation with truncated coefficients Ŝ hosvd and the orthogonal matrices U (1) , U (2) , U (3) , U (4) : The above operation is repeated for each reference cube, thus provides multiple estimates at the same coordinate. For this reason, the final estimates are aggregated by weighted averaging all the obtained block-wise estimates that overlapped at each voxel. MgSS based 4D-CBCT reconstruction. To incorporate the motion tracking induced blocky spatiotemporal sparsity into the CBCT reconstruction framework, in this section, we formulate the following minimization scheme: where f represents the estimated dynamic images, A is the CBCT imaging system matrix with elements of a ij , y denotes the projection measurements. Operator Φ is the cube based sparsity penalties, while ⋅ T f u denotes the dataset that includes a sequence of 4D matrix of the tracked cubes. Operator T u includes two steps: (1) Track three-dimensional blocks with the motion vector, and (2) Rearrange the three-dimensional blocks into 4D matrix.
To optimize the problem in (4), in this work, inspired by the techniques used by Pan et al. 46 , we present an efficient way to solve (4), which can be summarized as: (1) SART 47 step: Here λ is an over-relaxed factor.

MgSS n
( 1/2) denotes the restored block-based clusters based on f n ( 1/2) + by using the HOSVD approach W is a weighted averaging operator indicates how the cubes are merged back into the image. The elements of W is the reciprocal of the times which one pixel was overlapped by different cubes.
In our implementation, the MgSS step is carried out after 10 th iteration of SART step until f f

Results
Data acquisition of digital simulation. NCAT phantom based simulation. In this work, the 4D NURBSbased Cardiac-Torso (NCAT) 48 phantom, which is capable of providing the realistic model of human anatomy     and simulating cardiac and respiratory motion simultaneously, was used for data simulation. A dynamic phantom with ten respiratory phases and breathing period set to be 5 s was generated. The maximum diaphragm motion and the maximum chest anterior-posterior motion is 20 mm and 5 mm. We manually added several tumors with different sizes, contrast and shapes in the right lung field of phantom to test the robustness of the MgSS algorithm. The diameters of the spherical tumors were: 6 mm, 10 mm, 16 i i e 0 2 σ = − + Here I 0 and σ e 2 represent the incident x-ray intensity and the background noise, respectively. I 0 is set to be 2 × 10 6 and σ e 2 is chosen to be 10. Evaluation metrics. To quantitative evaluate the performance of the proposed algorithm, we calculate the relative root mean square error (rRMSE) between the phantom images and the reconstructions. The rRMSE is defined as: Here, f denotes the target image, f p denotes the phantom image, and m is the voxel index. The universal quality index 50 (UQI) index was utilized to conduct region of interest (ROI) based analysis by evaluating the degree of similarity between the reconstructed and the reference images. We select ROIs including the tumor and lung details within the reconstructed and reference images, the mean, variance and covariance of intensities in the ROIs can be respectively calculated as:    UQI measures the intensity similarity between the two images, and its value ranges from zero to one. A UQI value closer to one suggests better similarity to the reference image.

4DCT based simulation.
Digital NCAT phantom study. Visual inspection. Figure 2 shows the results of the NCAT phantom reconstructed by using different methods at transverse, coronal, and sagittal planes for phase #1 with the projection number set to be 21. The columns one to three show the transverse, coronal, and sagittal images, respectively. First row in Fig. 2 shows the designed digital phantom images. The second row shows the results which were reconstructed by using the FDK algorithm from all projections. As we can see, the motion blurring artifacts are   Quantitative evaluation. Table 1 presents the results of rRMSE measures with the projection views range from 21 to 51 for each phase. The rRMSEs of ten phases reconstructions by using the proposed MgSS reduced by an average of 43% compared to those of FDK reconstructions. It can be obviously seen that the MgSS algorithm can achieve smaller rRMSE values compared to the other algorithms, which suggests the promising performance of the proposed MgSS approach. Figure 5 illustrates the horizontal profiles of the transverse plane images in Fig. 2. It can be observed that the profiles obtained from the MgSS reconstructions match much better than the others, which suggest that the present MgSS approach achieves more noticeable gains than other approaches.
Motion trajectory accuracy. Because of the various applications of IGRT is interested in the tumor position information, we are devoted to extract the motion trajectories from different reconstruction schemes: FDK, SART-TV and the proposed MgSS algorithms. We define the motion trajectories which extracted from the NCAT phantom with high-contrast tumor as a reference. In this paper, we extract the movement information from the center of tumor. The motion trajectories have been showed in Fig. 6. As we can see, the motion information of tumor extracted from MgSS algorithm matches well with the reference trajectory, while the trajectories extracted from the FDK diverges the reference trajectory.

Low contrast lesion detection study.
To test the robustness of the MgSS algorithm in reproducing the low contrast lesion, Figs 7 and 8 present the reconstructions of NCAT phantom with different tumor sizes and shapes at the transverse and coronal planes. The first row shows the designed phantom image used for visual comparison. The second to fourth rows show 4D-CBCT at the begin-expiration phase reconstructed by using FDK, SART-TV and proposed MgSS algorithms, respectively. It can be seen, the low contrast tumors in the NCAT phantom can't be completely rebuilt by the FDK and SART-TV algorithms. The morphology of tumors is partly destroyed. The phenomenon is particularly evident for tumors with small size. On the contrast, the MgSS algorithm can mostly recover the tumors with relatively complete morphological structures. Furthermore, Fig. 9 presents the UQI test on the ROIs shown in Fig. 8. All the results suggest that, with the introduction of phase-correlated information, the MgSS algorithm has better lesion detection ability than algorithms that only using the single phase information.
Parameter selection. In our method, there are two parameters need to be tuned: (1) the tracked cube size; (2) the soft-thresholding coefficients for the HOSVD processing. The cube size plays an important role in our work as it not only directly affects the accuracy of the results, but also affects the computation time. Traditionally, for the block-based techniques, the size of the cube is an empirical parameter specified by the user. A reasonable cube size can help us attend to the local structure characteristics while removing undesirable distortions.
To study the influence of the cube size on the proposed algorithm, we experimentally change the size of cube to get the associated reconstructions and calculate the rRMSEs between the reconstructions and the designed phantom image. We reconstruct the NCAT phantom from -21 projections with block size set from 3 × 3 × 3 to 23 × 23 × 23. Figure 10 shows the reconstructions of cube size to be 5 × 5 × 5, 7 × 7 × 7, 9 × 9 × 9, 13 × 13 × 13, 17 × 17 × 17, 23 × 23 × 23. As shown in Fig. 10, on one hand, boundary distortion could be observed when the cube size is too small. On the other hand, the image blur increases with the increase of the cube size. This phenomenon is due to the large cube contains too much structural information, the movement of central voxel in Figure 17. The zoomed tumor areas in the reconstructions of ten phases. The first to the third rows show the tumor image which were reconstructed by FDK, SART-TV, and MgSS, respectively. the cube is not enough to describe the whole cube. Thus, to generate high quality 4D-CBCT image, appropriate size of the cube is needed. Moreover, as shown in Fig. 11, the averaged rRMSEs of ten phases reconstructions indicate that with the cube size range from 7 × 7 × 7 to 11 × 11 × 11, we can get relative smaller rRMSEs. In order to balance the reconstruction quality and computational time, in our other experiments, the cube size was fixed to 9 × 9 × 9.
For the parameter of thresholding coefficients for the HOSVD processing, we choose with a threshold of σ p 2log 2 to manipulate the coefficients of core tensor. This is also an experiential selection as used in other works 51 . Here, σ is defined as the standard deviation of a uniform region in the intermediary images during the iteration and p is the cube size.
Algorithm convergence. To validate and analyze the convergence of the present MgSS method, the − − f f n n ( ) ( 1) (absolute value of differences between two adjacent estimations) measuring on the entire to-be-reconstructed NCAT phantom image were performed. Figure 12 shows the f f n n ( ) ( 1) − − measures with respect to the number of iterations. Results show that the present MgSS algorithm can yield a steadily convergence solution.
Influence of motion tracking. To demonstrate the effect of the estimated motion fields on the reconstruction, we plot the rRMSE measures as a function of the number of iterations for the MgSS with and without motion tracking. Figure 13 shows the benefits of motion guidance, as MgSS with motion tracking can obtain reduced rRMSE as compared to MgSS without motion tracking. Figure 13 also illustrates the convergence of the proposed MgSS algorithm.
Realistic 4D CT based digital phantom study. Figure 14 shows the results of the realistic digital phantom reconstructed by using different methods at transverse, coronal, and sagittal planes for phase #1 with the projection number set to be 21. First column in Fig. 14 shows the 4DCT images using as the golden standard for comparison. The second column shows 4D-CBCT images reconstructed by the FDK algorithm. It can be seen that the FDK reconstructions are seriously contaminated by noise and artifacts, and some anatomical structures can't be clearly seen. For the SART-TV reconstruction, some fine structures have been erased though most of the view aliasing artifacts are suppressed. Compared to FDK and SART-TV algorithms, the proposed MgSS approach can yield images with superior quality. In addition, the UQI test between the reconstructions and golden standard image were calculated. Figure 15 shows the test results of then phases at the transverse, sagittal and coronal planes, separately. For then phases, the UQI values of MgSS reconstructions are always higher than 0.95, which suggests the promising performance of the proposed MgSS algorithm.
Patient study. Figure 16 shows the representative reconstructions of patient data by using the FDK, SART-TV and MgSS methods. Each row of Fig. 16 shows the image of 4D-CBCT at different phase in respiratory cycle: 20-30%, 40-50%, 60-70%, 80-90%. The first column in Fig. 16 shows the sagittal view of 4D-CBCT images reconstructed by conventional FDK. Because of the limited number of projections at each phase, severe noise and artifacts present in the FDK reconstruction. The second column of Fig. 16 shows the results of SART-TV algorithm. Although noise and artifacts have been remarkably reduced, details within the lung area and edges of bony structure can't be clearly seen. Last column of Fig. 16 shows the images reconstructed by MgSS method, from which we can observe noise is suppressed. The boundaries of bony structures as well as fine structures inside the lung are well preserved. To further illustrate the performance of our algorithm, zoomed images of the tumor areas in the then phasesof the reconstructions by using different reconstruction are presented (Fig. 17).

Discussion
In this work, we developed a MgSS algorithm to improve the image quality of 4D-CBCT. This algorithm is developed on the assumption that there exists high structural similarity between the images of neighbored phases, which is similar with Chen's work on dynamic MR reconstruction 43 . In Chen's work, two dimensional patches are tracked with estimated motion maps and then SVD was used to decompose the tacked cluster. For the standard SVD, the image blocks are manually vectorized and then the image structural properties in the spatial domain are ignored. For the proposed MgSS algorithm, the 3D-MVFs between phases were utilized to track the three dimensional cubes and then form the cluster Θ MgSS with the size of To preserve the structural properties, in this work, the higher order singular value decomposition which is able to directly decompose dynamic datasets into a multidimensional singular matrix rather than unfolding the cubes into column vectors, was used to process the four dimensional cluster Θ MgSS . The MgSS algorithm can reveal the fine details shared by grouped blocks and preserve the essential unique features of each individual block. Results of digital phantom and patient data demonstrate the proposed approach can significantly suppress the view aliasing artifacts and noise.
One important step in the proposed MgSS algorithm is: cube based motion tracking. We utilize the Real-Time Image-based Tracker (RTIT) 36 toolbox to obtain the voxel-by-voxel displacement maps between 3D images of different phases. Specifically, we rely mainly on the displacement of the center voxel of cube in the current phase to determine the cube in the next phase. We assume the displacements of region are changing smoothly. Thus, the central voxel is sufficient to describe the cube motion. And in this case, the accuracy of 3D-MVFs estimation would play an important role in finding cubes with similar structures. In this work, the initial 3D-MVFs are generated after the first SART iteration, which may not so accurate. Notably, as the iteration goes on, the 3D-MVFs would be updated with intermediate reconstructions, and therefore both the image quality and the accuracy of the 3D-MVFs will be improved. Other approaches such like the 3D-MVFs initialized with those got from the 4D planning CT of the same patient can be considered. There is a very important parameter need to be tuned for the proposed MgSS approach: the tracked cube size. A large size cube may include more details and local geometry, but on the other side, if the cube size is too large, the central voxel will be insufficient to describe the whole cube motion and also result in increasing computation time. Thus, the proper selection of the cube size is critical. In our studies, we change the block size from 3 × 3 × 3 to 23 × 23 × 23 and generate the reconstructions as shown in Fig. 10 and rRMSEs between the reconstructions and the phantom images. For visual inspection, too small or large cube size would lead to image blur or distortion. The rRMSE results suggest that when the cube size set to be 9 × 9 × 9, we can get relative higher quality image with smaller rRMSE for the NCAT phantom. Also as illustrated in other studies of HOSVD technique, the appropriate selection of the cube size should base on the structural complexity of the target image and may be different for different images. But overall, in our work for lung CT imaging, we found the cube size set to be 9 × 9 × 9 is robust enough to balance the structural information and computation time. Another issue of the proposed MgSS algorithm is the heavy computation burden. It takes about 30 minutes on a desktop computer (3.60 GHz Intel(R) i7 CPU with 8GB RAM) to run one iteration. On one hand, as mentioned above, we can short the computation time with a relatively small cube size as well as an elegant initialization. On the other hand, we can use a step of N step pixels in transverse, coronal, and sagittal directions, respectively. Hence, the number of overlapping cubes is decreased by 1/N step 3 . But with large N step or large motion, gaps between moved blocks may appear, then a mask of the uncovered areas (gaps) on the n-th frame after block motion tracking can be detected and non-motion tracking is performed for the gap blocks to avoid potential additional gaps. Other techniques including using the graphics processing unit (GPU) can be also considered to short the computation time.
In summary, we have developed a MgSS algorithm to improve the image quality of 4D-CBCT. This method effectively utilizes the correlated information from other phases to reconstruct any particular phase of 4D-CBCT. By enforcing the regional spatiotemporal sparsity on the tracked cubes, noise and artifacts can be suppressed and the image quality of 4D-CBCT can be substantially improved.