Motion Corrected Multishot MRI Reconstruction Using Generative Networks with Sensitivity Encoding



I. INTRODUCTION
MAGNETIC Resonance Imaging (MRI) is a safe, non-ionizing, and non-invasive imaging modality that provides high resolution and excellent contrast of soft tissues. It has emerged as a powerful and effective technique for early diagnosis of many common but potentially treatable diseases including stroke, cancer and ischemic heart disease. Despite these advantages, the long data acquisition time of MRI causes many difficulties in its clinical as well as research applications. Numerous approaches have been proposed in the literature to expedite the data acquisition process, including single-shot echo planar imaging (EPI) [1], parallel imaging (PI) [2], and compressed sensing (CS) [3].
In single-shot echo-planar imaging (EPI), all the k-space data necessary to reconstruct the final MR image is acquired after a single excitation pulse. EPI significantly shortens the data acquisition time and minimizes the possibility of motion artifacts [4]. However, single-shot EPI suffers from low resolution and susceptibility artifacts. Its stringent hardware requirements also limit its application. To overcome these limitations, segmented or multishot MRI is used [5], which is an excellent compromise between echo-planar and standard spin echo imaging. It significantly reduces the demands on gradient performance and allows the in-plane spatial resolution to be improved to a level comparable to that of standard pulse sequences [6]. However, high-resolution volumetric imaging requires the acquisition of k-space data with a large number of shots at different time instances. As a result, the image may be severely degraded due to subject motion between consecutive shots. This makes multishot sequences very sensitive to shot-to-shot variability caused by motion. Therefore, motion compensation techniques are essential to improve the quality of the final MR image in multishot MRI [7].
Based on its source, motion in MRI is classified into two categories. Rigid motion is caused when a rigid part of the body, such as the head, moves, while non-rigid motion arises from non-rigid parts of the body, e.g., arterial pulsation or cardiac motion, or from any other source in the field of view (FOV) (e.g., eyeball motion) [8]. Image degradation in MR examination is mostly caused by rigid motion [9], and artifacts associated with rigid motion may cause suboptimal image quality. Consequently, it may negatively impact radiologic interpretation [10], which affects patient safety and increases the medicolegal risks related to the interpretation of motion-degraded images. Therefore, motion correction techniques are considered an essential part of MRI reconstruction. Previously, the problem of motion correction has been solved mostly in an iterative manner [11], which is time-consuming as well as computationally expensive. Researchers are now increasingly interested in leveraging recent advances in machine learning (ML) and deep learning (DL) to improve the state of the art in MRI motion correction. In particular, generative adversarial networks (GANs) [12] are interesting due to their capability of generating data without explicitly modeling the probability density function and their robustness to over-fitting. The adversarial loss brought by the discriminator in GANs provides a clever way of incorporating unlabeled samples into training and imposing higher-order consistency, which can be useful for motion correction in MRI.
In this paper, we propose a GAN-enhanced framework to correct rigid motion in multishot MRI during brain structural scans, owing to the high clinical significance of this setting [13]. This work is an extension of our preliminary work [14], where we empirically showed the suitability of GANs for motion correction in multishot MRI. In particular, we propose a GAN-based conjugate gradient (CG) SENSE [15] reconstruction model to correct motion in multishot MRI. The proposed technique uses CG-SENSE to reconstruct the motion-corrupted multishot k-space data, and the result is then fed to a GAN to produce an artifact-free image. The proposed technique effectively reduces motion artifacts in significantly less time, which is essential for clinical applications. Most importantly, we have validated our method on publicly available data by varying parameters of multishot MRI such as the amount of motion, the number of shots, and the encoding trajectories. Results show that the proposed framework performs consistently well across these parameters and produces a motion-free image in significantly less reconstruction time than traditional iterative techniques.

II. BACKGROUND AND RELATED WORK
MRI is highly sensitive to subject motion during k-space data acquisition, which can reduce image quality by inducing motion artifacts. Artifacts due to rigid motion are widely observed in multishot MR images during clinical examination [13]; therefore, motion correction techniques must be applied during or after the reconstruction process to obtain an artifact-free image. Retrospective motion correction (RMC) techniques are applied for rigid motion correction [16], [17]. They acquire the k-space data without considering potential motion, and object motion is then estimated from the acquired k-space data [8]. Many researchers have proposed RMC-based methods for rigid motion correction. For instance, Bydder et al. [18] studied the inconsistencies in k-space caused by subject motion using a parallel imaging (PI) technique. The inconsistent data is discarded and replaced with consistent data generated by parallel imaging to compensate for the motion artifacts. This method produces an image with fewer motion artifacts, albeit with a lower signal-to-noise ratio (SNR).
Loktyushin et al. [19] proposed a joint reconstruction and motion correction technique that iteratively searches for the motion trajectory. A gradient-based optimization approach was adopted to efficiently explore the search space. The same authors extended their work in [20] by dividing the image into small windows that contain local rigid motion, and used their own forward model to construct an objective function that optimizes the unknown motion parameters. Similarly, Cordero et al. [21] proposed the use of a forward model to correct motion artifacts. However, this technique utilises the full reconstruction inverse to integrate information from multiple coils for the estimation and correction of motion. In another study [22], the authors extended their framework to correct three-dimensional motion (i.e., in-plane and through-plane motion). Through-plane motion is corrected by sampling the slices in an overlapped manner.
The conventional techniques mentioned above estimate the motion iteratively, which makes them computationally expensive and time-consuming. Such constraints hinder their use in the time-critical environment of medical facilities. Recently, deep learning has been extensively applied in various other fields including audio [23], speech [24], [25], and vision [26], but very few attempts have been made at motion correction in MRI. Loktyushin et al. [27] studied the performance of a convolutional neural network (CNN) for retrospective motion correction in MR images. They trained the model to learn a mapping from motion-corrupted data to motion-free images. The study indicated the potential of deep neural networks (DNNs) to solve the motion problem in MRI; however, it lacks a detailed investigation of the technique and a quantitative presentation of the results. Similarly, Duffy et al. [28] used a CNN to correct motion-corrupted MR images. The work was compared with traditional Gaussian smoothing [29] and reported significant improvement, but a comparison with advanced state-of-the-art iterative motion correction techniques was not included. Most importantly, studies on motion correction using deep learning have not yet exploited GANs, despite their success in biomedical image analysis [30] and the modeling of natural images [31], [32], and even though they have proved very powerful for MRI reconstruction [33], [34]. In our previous work [14], we proposed the use of a GAN [12] for multishot MRI motion correction. That work presented preliminary results on motion correction with greatly reduced computational time. However, it lacked a detailed investigation of the proposed framework against the various parameters of multishot MRI, such as the number of shots and the encoding trajectories. We extend that work and propose an adversarial CG-SENSE reconstruction framework for motion correction. A detailed analysis of the proposed framework is presented with respect to different parameters of multishot imaging, such as the amount of motion, the number of shots, and the encoding trajectories.

III. METHODOLOGY
In our proposed method, reconstruction and motion correction are performed independently. In the first stage, standard CG-SENSE is employed to reconstruct the k-space data, which yields a motion-corrupted image in the spatial domain. In the second stage, the motion-corrupted images are given to the GAN to remove the motion artifacts. Fig. 1 shows the overall proposed architecture.

A. Motion Model for Multishot MRI
In multishot MRI, k-space data is acquired in multiple shots (e.g., 2, 4 or 8 shots) in order to cover the whole k-space.
The MRI scanner captures Fourier coefficients along encoding trajectories, which are determined by the gradient shapes of the MRI sequence. To generate motion-corrupted data, we adopt the same model as [27], [21], originally proposed in [16]. In this model, motion $M_s$ is applied to a motion-free image $x$ for each shot $s$. Subsequently, the Fourier transform $F$ and the sampling matrix $A$ are applied to obtain the k-space representation. Finally, the segment $u_s$ of k-space is extracted for each shot, and all the segments are combined to obtain the full k-space data. Mathematically, it can be written as:
$$y = \sum_{s=1}^{N} u_s A F M_s x \qquad (1)$$
where $N$ represents the number of shots, $M_s$ the translational as well as rotational motion for the $s$-th shot, and $y$ the motion-corrupted k-space data. Fig. 2 shows the forward motion model for a single coil and two shots.
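The forward model described above can be illustrated with a minimal NumPy sketch for a single coil. It is an illustration under stated assumptions, not the simulation code used for the experiments: the function names `shot_masks` and `simulate_multishot` are hypothetical, motion is restricted to integer translations via `np.roll` (rotation would additionally need an interpolating resampler), an interleaved line-to-shot assignment is assumed, and the sampling matrix A is taken to be the identity.

```python
import numpy as np

def shot_masks(n_lines, n_shots):
    """Boolean mask per shot; interleaved line assignment is an assumption."""
    masks = np.zeros((n_shots, n_lines), dtype=bool)
    for s in range(n_shots):
        masks[s, s::n_shots] = True
    return masks

def simulate_multishot(x, shifts, n_shots):
    """Combine k-space segments taken from differently moved copies of x.

    x      : 2-D motion-free image
    shifts : per-shot (row, col) integer translations standing in for M_s
    """
    masks = shot_masks(x.shape[0], n_shots)
    y = np.zeros(x.shape, dtype=complex)
    for s in range(n_shots):
        moved = np.roll(x, shift=tuple(shifts[s]), axis=(0, 1))  # M_s x
        k = np.fft.fft2(moved)                                   # F M_s x
        y[masks[s], :] = k[masks[s], :]                          # u_s keeps this shot's lines
    return y
```

With zero shifts for every shot the result reduces to the ordinary 2-D Fourier transform of the image, which is a convenient sanity check before adding motion.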

B. Conjugate Gradient SENSE (CG-SENSE) Reconstruction
In our proposed technique, we employ CG-SENSE to reconstruct the motion-corrupted k-space data. It uses the conjugate gradient (CG) algorithm [35] to efficiently solve the SENSE equations [36], which relate the gradient encoding, coil sensitivities and aliased images to unaliased ones. CG-SENSE relates the object to be imaged $x_m$, the encoding matrix $E$ and the acquired k-space data $y$ as follows:
$$E x_m = y \qquad (2)$$
The acquired data $y$ has size $n_c n_k$, where $n_c$ and $n_k$ are the number of coils and the number of sampled positions in k-space, respectively. The reconstructed image $x_m$ has size $N^2$, where $N$ is the matrix size of the image. The encoding matrix $E$ represents the spatial encoding information of the gradients and coil sensitivities.
To solve equation (2), $E$ has to be inverted, which is difficult due to its large size. Therefore, the CG algorithm is used to iteratively solve equation (2) for the unaliased image, due to its fast convergence compared to other methods [37].
To facilitate the formulation of the CG-SENSE reconstruction, another matrix $Z$ is introduced to invert the encoding as follows:
$$Z E = I_d \qquad (3)$$
where $Z$ and $I_d$ represent the reconstruction matrix and the identity matrix, respectively. Multiplying both sides of equation (2) by $Z$ yields the unaliased image:
$$x_m = Z y \qquad (4)$$
The reconstruction matrix $Z$ can be computed by Moore-Penrose inversion:
$$Z = (E^H E)^{-1} E^H \qquad (5)$$
The resulting set of equations, $(E^H E)\, x_m = E^H y$, can now be solved with the CG algorithm without explicitly inverting $E$. To perform the CG-SENSE reconstruction efficiently, preconditioning is used to obtain a better initial estimate of $x_m$ [37].
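As a rough illustration of solving the SENSE equations with CG instead of inverting E, the sketch below applies plain conjugate gradients to the normal equations $E^H E\, x = E^H y$ for Cartesian sampling. It is a simplified sketch, not the reconstruction used in the paper: the names `make_ops` and `cg_sense` are hypothetical, E is assumed to be "coil weighting, orthonormal FFT, k-space mask", and the preconditioning mentioned in the text is omitted.

```python
import numpy as np

def make_ops(sens, mask):
    """Return E and its adjoint E^H for coil maps (nc, N, N) and a k-space mask (N, N)."""
    def E(x):
        return mask * np.fft.fft2(sens * x, norm="ortho")
    def EH(y):
        return np.sum(np.conj(sens) * np.fft.ifft2(mask * y, norm="ortho"), axis=0)
    return E, EH

def cg_sense(y, sens, mask, n_iter=30, tol=1e-10):
    """Solve (E^H E) x = E^H y by conjugate gradients, starting from x = 0."""
    E, EH = make_ops(sens, mask)
    b = EH(y)
    x = np.zeros_like(b)
    r = b.copy()                      # residual b - (E^H E) x for x = 0
    p = r.copy()                      # first search direction
    rs = np.vdot(r, r).real
    for _ in range(n_iter):
        if rs < tol:                  # stop once the squared residual is negligible
            break
        Ap = EH(E(p))
        alpha = rs / np.vdot(p, Ap).real
        x = x + alpha * p
        r = r - alpha * Ap
        rs_new = np.vdot(r, r).real
        p = r + (rs_new / rs) * p
        rs = rs_new
    return x
```

Only products with E and its adjoint are needed, so the encoding matrix is never formed or inverted explicitly, which is the practical reason CG is attractive for large images.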

C. Generative Adversarial Framework
GANs [12] are latent variable generative models that learn, via an adversarial process, to produce realistic samples from a latent variable code. A GAN includes a generator G and a discriminator D, which play the following two-player min-max game:
$$\min_G \max_D \; \mathbb{E}_{x \sim p_{data}}[\log D(x)] + \mathbb{E}_{z \sim p_z}[\log(1 - D(G(z)))] \qquad (6)$$
In a simple vanilla GAN, the generator G maps latent vectors $z$ drawn from a known prior $p_z$ (a simple distribution, e.g., Gaussian) to the sample space. The discriminator D is tasked with differentiating between generated samples $G(z)$ (fake) and data samples (real).
Here, we use a conditional GAN where, instead of random samples, G is fed corrupted MRI images $x_m$ and is trained to produce the motion-corrected image $x_c$. The generator is trained with an adversarial loss $L_{adv}$. To aid the generator, in addition to the adversarial loss, we also incorporate a data mismatch term $L_{data}$.
Adversarial training encourages the network to produce sharp images, which is of crucial importance in MRI, whereas the data mismatch loss forces the network to correctly map degraded images to the original ones. Thus, the final loss for the generator G is a weighted sum of $L_{data}$ and $L_{adv}$, where λ is a hyper-parameter that controls the relative weight of the loss terms. As training progresses, G and D are updated alternately.
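For illustration only, one common instantiation of such a weighted generator objective is written out below. Every specific choice here is an assumption rather than the paper's definition: the non-saturating form of the adversarial term, the L1 norm in the data mismatch term, and the placement of λ on the adversarial term. Here $x$ denotes the motion-free reference image and $G(x_m)$ the generator output $x_c$.

```latex
\begin{aligned}
L_{adv}  &= -\,\mathbb{E}_{x_m}\big[\log D(G(x_m))\big],\\
L_{data} &= \mathbb{E}_{x_m,\,x}\big[\lVert x - G(x_m)\rVert_1\big],\\
L_{G}    &= L_{data} + \lambda\, L_{adv}.
\end{aligned}
```

Under this assumed form, a small λ lets the data mismatch term dominate, so that the adversarial term acts mainly as a sharpening regulariser.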

A. Dataset
For the evaluation of the proposed method, publicly available data is used. T2 FLAIR images from the Brain Tumor Image Segmentation (BraTS) Challenge 2015 [38] dataset are used, which contains 274 scans with 255 slices each. The scans are split 70%/30% into training and testing data, respectively. Images of the BraTS dataset are considered motion-free, and motion is introduced using the model described in Section III-A. The same perturbation technique has been employed in [22], [27]. As BraTS contains spatial-domain images, we used a reference scan to estimate the coil sensitivity maps using [39]. For our work, we produce data with varying degrees of angular motion, numbers of shots, and trajectories to validate the robustness of our proposed technique.

B. Model Architecture
We adopt a U-Net-like architecture (shown in Fig. 3) because of its recent success in image restoration tasks [40], [3]. It consists of an encoder and a decoder. Due to the bottleneck in this hourglass structure, the encoder learns to compress relevant information from the corrupted MRI scan while discarding the corruption, such that the decoder is able to restore a clean, uncorrupted counterpart. The encoder consists of convolutional blocks, where each block consists of convolutional layers followed by a non-linear activation; decoder blocks are composed of transposed convolution layers.
This hourglass structure of U-Net includes symmetric skip connections from the encoder blocks to the decoder blocks. These are necessary to recover fine details for better image restoration: the encoder learns to compress the image into the high-level features necessary for image restoration but may remove fine details along with the corruption, whereas the skip connections transfer low-level features from the encoding path to the decoding path to recover the details of the image. In addition to these skip connections, we employ residual connections inside each encoder and decoder block. These residual connections, along with the skip connections, allow efficient gradient back-propagation, which helps alleviate issues such as vanishing gradients and slow convergence.
The high-level model architecture is described in Fig. 3. Each encoder block consists of 5 convolution layers, each with n feature maps except for the middle layer, which has n/2 feature maps. Padding is employed to keep the dimensions of the feature maps the same inside each block. We set the stride to 1 for all layers except the first one, where it is 2. This stride-2 convolution serves to down-sample the feature maps using a learned kernel. Inside each encoder block, a residual connection is used between the first layer and the last layer. A decoder block has the same structure as an encoder block, except that all the convolutional layers are replaced with transposed convolutions and the stride of 2 is used at the last layer instead of the first. Here, the stride-2 transposed convolution serves to up-sample the feature maps along the U-Net architecture. The discriminator has exactly the same structure as the encoder part of the generator.
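The effect of the stride-2 layers on the feature-map size can be checked with a small shape calculation. The sketch below uses the standard output-size formulas for convolution and transposed convolution; the kernel size of 3, the input size of 240, the four encoder blocks and the `output_padding` of 1 are assumptions chosen for illustration, since the text does not state them, and the helper names are hypothetical.

```python
def conv_out(size, kernel, stride, padding):
    """Spatial output size of a convolution (standard formula)."""
    return (size + 2 * padding - kernel) // stride + 1

def deconv_out(size, kernel, stride, padding, output_padding=0):
    """Spatial output size of a transposed convolution (standard formula)."""
    return (size - 1) * stride - 2 * padding + kernel + output_padding

def encoder_sizes(size, n_blocks, kernel=3):
    """Feature-map size after each encoder block: one stride-2 layer, then four stride-1 layers."""
    pad = kernel // 2
    sizes = []
    for _ in range(n_blocks):
        size = conv_out(size, kernel, 2, pad)       # first layer down-samples
        for _ in range(4):
            size = conv_out(size, kernel, 1, pad)   # padded stride-1 layers keep the size
        sizes.append(size)
    return sizes

print(encoder_sizes(240, 4))        # [120, 60, 30, 15]
print(deconv_out(15, 3, 2, 1, 1))   # 30
```

Each encoder block halves the spatial size and each decoder block doubles it again, which is what allows the symmetric skip connections to join feature maps of matching size.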

C. Model Training
We train our network on the synthetically generated dataset using the RMSProp optimizer with a learning rate of $1 \times 10^{-4}$ and a batch size of 16 until convergence. For each update of G, we update D twice. We pre-train the generator G using the Adam optimizer with the same learning rate and batch size, which allows the training of G to converge faster. We choose λ to be 0.01.
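The alternating schedule (two discriminator updates per generator update) can be sketched as a framework-agnostic loop. The function `train` and the callbacks `update_d` and `update_g` are hypothetical placeholders; in practice each callback would draw a batch of 16 images and take one optimizer step.

```python
def train(n_steps, update_d, update_g, d_updates_per_g=2):
    """Alternate discriminator and generator updates."""
    for step in range(n_steps):
        for _ in range(d_updates_per_g):
            update_d(step)      # one optimizer step on the discriminator loss
        update_g(step)          # one optimizer step on the generator loss


# Usage with counting stubs in place of real optimizer steps.
counts = {"d": 0, "g": 0}
train(
    n_steps=5,
    update_d=lambda step: counts.__setitem__("d", counts["d"] + 1),
    update_g=lambda step: counts.__setitem__("g", counts["g"] + 1),
)
print(counts)   # {'d': 10, 'g': 5}
```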

V. RESULTS AND DISCUSSION
In this section, we present a detailed investigation of our proposed technique for the reconstruction of motion-free images in the presence of varying amounts of motion, numbers of shots, and encoding trajectories. For validation, we used peak signal-to-noise ratio (PSNR), structural similarity index (SSIM), and artifact power (AP) as quantitative metrics.
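Two of these metrics are simple enough to sketch directly. The definitions below are the commonly used ones and are assumptions here, since the exact formulas are not stated in the text: PSNR uses the peak magnitude of the reference image, and AP is the squared error normalised by the energy of the reference. SSIM is omitted because it requires a windowed implementation.

```python
import numpy as np

def psnr(ref, rec):
    """Peak signal-to-noise ratio in dB, with the peak taken from the reference image."""
    mse = np.mean(np.abs(ref - rec) ** 2)
    peak = np.max(np.abs(ref))
    return 10.0 * np.log10(peak ** 2 / mse)

def artifact_power(ref, rec):
    """Squared reconstruction error relative to the energy of the reference image."""
    return np.sum(np.abs(ref - rec) ** 2) / np.sum(np.abs(ref) ** 2)
```

Higher PSNR and lower AP both indicate a reconstruction closer to the motion-free reference.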

A. Effect of the amount of motion
To evaluate the effect of motion, different amounts of rotational motion were introduced into motion-free images with 16 shots and a random trajectory. The motion-corrupted k-space data was reconstructed using CG-SENSE (without motion correction) and then fed to the adversarial network, which is tasked with generating the motion-free images. Table I summarizes the average results obtained on the test data for varying degrees of rotational motion (∆θ = {2°, 5°, 8°, 10°, 12°, 14°}). It can be noted from Table I that the proposed framework shows excellent performance for small amounts of motion by capturing the underlying statistical properties of MR images, and recovers sharp images. However, as the amount of motion increases, a smooth decay in the performance of the model is observed, as expected, because at higher degrees of motion (i.e., 14°) the MRI scans are severely degraded and it becomes very difficult to recover the motion-free image. Moreover, our technique performs better than the previous state-of-the-art iterative technique [21] for higher amounts of motion (i.e., ∆θ = 14°) (see Fig. 4). For small amounts of motion, the approach of Cordero et al. [21] performs slightly better in terms of AP; however, its long computational time limits its practicality.

B. Influence of the Number of Shots
In this experiment, we investigate the performance of the proposed framework for different numbers of shots. We generated motion-corrupted data for various numbers of shots (i.e., S = {2, 4, 8, 16, 32, 64, 128}) with five degrees of motion and the random trajectory. We trained our model individually for each number of shots and evaluated its performance. The results are summarized in Table II, which reports the mean values obtained over all the test scans. It can be seen from Table II that the network is able to learn the artifact pattern and provides promising results for all numbers of shots. Encouragingly, our network produces sharp images with high PSNR and SSIM values even for a higher number of shots. In contrast, the state-of-the-art iterative technique [21] was only able to correct the motion effectively for lower numbers of shots (see Fig. 6(a)). For higher numbers of shots (S ≥ 32), the convergence of such iterative techniques [27], [21] becomes very difficult. In our case, the motion is corrected in the spatial domain after the full reconstruction of the motion-corrupted image, which enables the adversarial network to correct the motion artifacts in the image domain without encountering such convergence challenges.

C. Influence of the Encoding Trajectory
From the vast range of possible trajectories, we restricted ourselves to four trajectories (shown in Fig. 5) to validate the performance of the proposed framework. The motion-corrupted data for each encoding trajectory is generated with eight shots (S = 8), and a relative rotation of ∆θ = 5° is assumed between shots. We first performed the full reconstruction of the motion-corrupted k-space data for each encoding trajectory and then trained the GAN on the resulting motion-artifact-corrupted images, individually for each trajectory. Table III reports the mean results of our proposed framework for each encoding trajectory. The results show that our approach performs well for all the encoding trajectories. However, close observation shows that the performance of the proposed technique is slightly better for the random trajectory, since the random trajectory is least affected by the motion. The same reasoning explains the slightly degraded performance for the Cartesian sequential trajectory, as this trajectory is most affected by motion artifacts. On the other hand, the performance of the iterative technique [21] varies strongly across encoding trajectories (see Fig. 6(b)). For the Cartesian sequential trajectory, this technique takes an extraordinarily large number of iterations to converge, whereas the proposed technique is generally applicable and can be employed with any encoding trajectory.
Fig. 6: Comparison of our framework with the previous iterative technique [21] in terms of (a) number of shots and (b) encoding trajectories, for fifty randomly selected test images.
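To make the notion of an encoding trajectory concrete, the sketch below assigns each Cartesian phase-encode line to a shot under three orderings. It is a hypothetical illustration: the text names only the random and Cartesian sequential trajectories, so the interleaved variant, the helper name `line_to_shot`, and the exact way lines are grouped are assumptions.

```python
import numpy as np

def line_to_shot(n_lines, n_shots, kind, seed=0):
    """Return an array giving the shot index of every phase-encode line."""
    if kind == "sequential":      # contiguous blocks of lines per shot
        return np.arange(n_lines) * n_shots // n_lines
    if kind == "interleaved":     # every n_shots-th line belongs to the same shot
        return np.arange(n_lines) % n_shots
    if kind == "random":          # equal-sized shots, lines assigned at random
        rng = np.random.default_rng(seed)
        return rng.permutation(np.arange(n_lines) % n_shots)
    raise ValueError(f"unknown trajectory: {kind}")

print(line_to_shot(8, 2, "sequential"))    # [0 0 0 0 1 1 1 1]
print(line_to_shot(8, 2, "interleaved"))   # [0 1 0 1 0 1 0 1]
```

In the sequential ordering a moved shot corrupts one contiguous band of k-space, whereas the random ordering spreads the inconsistency across all spatial frequencies.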

D. Computational time analysis
In this section, we performed a series of experiments to evaluate the efficiency of the proposed technique in terms of computational time. We compared the computational time of our technique with that of the previous state-of-the-art iterative technique [21]. For a fair analysis, we performed the motion correction of the same motion-corrupted k-space data on the same hardware with both techniques. An Intel® Core™ i3-2120 CPU at 3.5 GHz with 16 GB of memory and an NVIDIA® Quadro M5000 graphics processing unit (GPU) with 8 GB of GDDR5 memory were used for our experiments. The proposed technique involves two steps, i.e., CG-SENSE reconstruction and motion correction. Therefore, to calculate the total computational time, we added the reconstruction and motion correction times. Table V summarizes the computational time of our technique and that of Cordero et al. [21] for varying numbers of shots on fifty randomly selected test images. It can be seen that our technique is several times faster than the previous iterative approach [21]. The previous technique is an iterative method that first iteratively estimates the motion and then corrects for that motion, which requires extra computational time. As the number of shots increases, it becomes difficult to estimate the motion between two consecutive shots, which further increases the time required to correct the motion for higher numbers of shots. Moreover, changing the encoding trajectory also significantly affects the computational performance of the conventional iterative technique [21]. In contrast, in our proposed technique, motion correction is independent of the reconstruction process and is performed after the full reconstruction of the k-space data. Therefore, the motion correction takes the same computational time for any number of shots. However, the CG-SENSE reconstruction takes more time for higher numbers of shots, which slightly increases the overall motion-corrected reconstruction time (see Table V). In Table IV, we summarize the computational time of our technique and of the iterative technique [21] for different amounts of motion. The time required to correct for motion in our technique does not depend on the amount of motion; therefore, it remains the same for all amounts of motion. In contrast, the conventional technique takes longer to estimate larger amounts of motion and thus takes more time to correct such motion.

VI. CONCLUSIONS
We introduced a flexible yet robust retrospective motion correction technique that employs generative adversarial networks (GANs) to correct motion artifacts in multishot Magnetic Resonance Imaging (MRI). This work is an extension of our preliminary work, where we empirically showed the suitability of GANs for motion correction in multishot MRI. The proposed technique first performs the full reconstruction of the motion-corrupted k-space data, and the resulting artifact-affected image is then fed into a deep generative network that learns the mapping from motion-artifact-affected images to artifact-free images. In contrast to previous iterative methods, our GAN-based framework removes the motion artifacts without any prior estimation of motion during the data acquisition or reconstruction process. Such a parameter-free technique can be applied to any encoding scheme without modifying the acquisition sequence. To validate our method, we carried out comprehensive experiments by varying different parameters of multishot MRI, such as the level of motion, the number of shots, and the encoding scheme. Based on the results, we demonstrated that the proposed technique is more robust against these parameters and significantly reduces the computational time compared with state-of-the-art techniques. Future plans include extending the framework to perform end-to-end learning with a generative network, from motion-corrupted k-space data to artifact-free images.