Deep learning-based diffusion tensor cardiac magnetic resonance reconstruction: a comparison study

In vivo cardiac diffusion tensor imaging (cDTI) is a promising Magnetic Resonance Imaging (MRI) technique for evaluating the microstructure of myocardial tissue in living hearts, providing insights into cardiac function and enabling the development of innovative therapeutic strategies. However, the integration of cDTI into routine clinical practice remains challenging due to the technical obstacles involved in the acquisition, such as low signal-to-noise ratio and prolonged scanning times. In this study, we investigated and implemented three different types of deep learning-based MRI reconstruction models for cDTI reconstruction. We evaluated the performance of these models based on the reconstruction quality assessment, the diffusion tensor parameter assessment as well as the computational cost assessment. Our results indicate that the models discussed in this study can be applied for clinical use at an acceleration factor (AF) of ×2 and ×4, with the D5C5 model showing superior fidelity for reconstruction and the SwinMR model providing higher perceptual scores.
There is no statistical difference from the reference for all diffusion tensor parameters at AF ×2 or most DT parameters at AF ×4, and the quality of most diffusion tensor parameter maps is visually acceptable. SwinMR is recommended as the optimal approach for reconstruction at AF ×2 and AF ×4. However, we believe that the models discussed in this study are not yet ready for clinical use at a higher AF. At AF ×8, the performance of all models discussed remains limited, with only half of the diffusion tensor parameters being recovered to a level with no statistical difference from the reference. Some diffusion tensor parameter maps even provide wrong and misleading information.


Introduction
In vivo cardiac diffusion tensor (DT) imaging (cDTI) is an emerging Magnetic Resonance Imaging (MRI) technique that has the potential to describe the micro-structure of myocardial tissue in the living heart. The diffusion of water molecules occurs anisotropically due to the restrictions imposed by the micro-structure of the myocardium, which can be approximated by fitting three-dimensional (3D) tensors with a specific shape and orientation in cDTI. Various parameters can be derived from the DT, including mean diffusivity (MD) and fractional anisotropy (FA), which are crucial indices that can indicate the structural integrity of myocardial tissues. The helix angle (HA) signifies local cell orientations, while the second eigenvector (E2A) represents the average sheetlet orientation 1. The development of cDTI provides insights into the myocardial micro-structure and offers new perspectives on the elusive connection between cellular contraction and macroscopic cardiac function 1,2. Furthermore, it presents opportunities for novel assessments of the myocardial micro-structure and cardiac function, as well as the development and evaluation of innovative therapeutic strategies 3.
Despite the numerous advantages, there are still significant technical obstacles that must be overcome to integrate cDTI into routine clinical practice. For the calculation of the DT, diffusion-weighted images (DWIs) with diffusion encoding in at least six distinct directions need to be collected. Due to the motion caused by the heartbeat and breathing, in vivo cDTI exploits single-shot encoding acquisition for repetitive fast scanning, e.g., single-shot echo planar imaging (SS-EPI) or spiral diffusion-weighted imaging 4. These single-shot encoding acquisitions lead to low signal-to-noise ratio (SNR) images and usually require multiple repetitions to enhance the accuracy of the DT estimation 5,6. Each repetition necessitates an undersampled k-space measurements. Teh et al. 25 introduced a directed TV-based method for DWI reconstruction, exploiting information on the position and orientation of edges in the reference image.
In addition to these major technical routes, Liu et al. 26 explored deep learning-based image synthesis for the generation of inter-directional DWIs. The true b0 and 6 DWIs were concatenated with the generated data and passed to the CNN-based tensor fitting network.

Deep Learning-Based Reconstruction
The aim of MRI reconstruction is to recover the ground truth image x from the undersampled k-space measurement y, which is mathematically described as an inverse problem:

$$\min_{x} \frac{1}{2}\|Ax - y\|_2^2 + \lambda R(x), \tag{1}$$

in which the degradation matrix A can be further presented as the combination of the undersampling trajectory M, Fourier transform F and coil sensitivity maps S. λ is the coefficient that balances the regularisation term R(x).
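To make the degradation model concrete, the following is a minimal NumPy sketch of a single-coil version of A, in which the coil sensitivity maps S are omitted (an assumption made here for brevity). The function names `forward_operator` and `zero_filled` are illustrative and do not come from any of the cited implementations.

```python
import numpy as np

def forward_operator(x, mask):
    """Apply A = M F to a 2D image x (single coil, so S is omitted)."""
    k_full = np.fft.fftshift(np.fft.fft2(np.fft.ifftshift(x), norm="ortho"))
    return mask * k_full  # keep only the sampled k-space locations

def zero_filled(y):
    """Inverse FFT of the zero-filled k-space y (the ZF reconstruction)."""
    return np.fft.fftshift(np.fft.ifft2(np.fft.ifftshift(y), norm="ortho"))

rng = np.random.default_rng(0)
x = rng.random((8, 8))            # toy "ground truth" image
mask = np.zeros((8, 8))
mask[:, ::2] = 1.0                # keep every second line along axis 1
y = forward_operator(x, mask)     # undersampled measurement
x_zf = zero_filled(y)             # aliased zero-filled image
```

With a mask of all ones the two functions are exact inverses, so any difference between `x_zf` and `x` in this toy example comes from the discarded k-space lines alone.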
Deep learning techniques have been widely used for MRI reconstruction. Based on their association with traditional iterative CS algorithms, deep learning-based MRI reconstruction methods can be categorised into 1) unrolling-based models 11,27,28 and 2) non-unrolling-based models 12,13.
Unrolling-based models usually integrate neural networks with traditional CS algorithms, simulating the iterative reconstruction algorithms through learnable iterative blocks 9. Yang et al. 27 reformulated an Alternating Direction Method of Multipliers (ADMM) algorithm as a multi-stage deep architecture, namely Deep-ADMM-Net, for MRI reconstruction, of which each stage corresponds to an iteration in the traditional ADMM algorithm. Some unrolling-based models improved Eq (1) with a deep learning-based regulariser 11,28, which can be formulated as:

$$\min_{x} \frac{1}{2}\|Ax - y\|_2^2 + \lambda \|x - f_\theta(x_u)\|_2^2, \tag{2}$$

in which $f_\theta(\cdot)$ is a deep neural network and $x_u$ denotes the undersampled zero-filled (ZF) images. Schlemper et al. 11 designed a deep cascade of CNNs for cardiac cine reconstruction, in which spatio-temporal correlations can also be efficiently learned via the data sharing approach. Aggarwal et al. 28 proposed a model-based deep learning method, namely MoDL, which exploited a CNN-based regularisation prior for MRI reconstruction.
Non-unrolling-based models usually train a deep learning-based function $f_\theta(\cdot)$ that maps the undersampled k-space measurement y or zero-filled images $x_u$ to an estimate of the fully-sampled images, $\hat{x}_u$, or its residual, in an end-to-end manner, which can be formulated as $\hat{x}_u = f_\theta(x_u)$ or $\hat{x}_u = f_\theta(x_u) + x_u$. Yang et al. 12 proposed a de-aliasing Generative Adversarial Network for MRI reconstruction, in which the U-Net-based generator produced the estimated fully-sampled MRI images in an end-to-end manner. Feng et al. 29 exploited a novel task-specific cross-attention and designed an end-to-end Transformer-based model for joint MRI reconstruction and super-resolution. Huang et al. 13 proposed a Swin Transformer-based model, namely SwinMR, for end-to-end MRI reconstruction, and they further explored the combination of Swin Transformer and GAN for edge and texture preservation in MRI reconstruction 30.
The deep learning community constantly provides a wide range of novel and powerful network structures for both kinds of MRI reconstruction methods, including CNNs 11,12, Recurrent Neural Networks 31,32, Graph Neural Networks 33, the recently thriving Transformers 13,29,30,34,35, etc. These rapidly evolving deep learning-based networks enable advances in MRI reconstruction.

Methodology
In this study, we implement three deep learning-based MRI reconstruction methods, namely, DAGAN 12, D5C5 11 and SwinMR 13, and assess their performance on a cDTI dataset. The overall data flow is depicted in Figure 1.

Data Acquisition
All data used in this study were approved by the National Research Ethics Service. Written informed consent was obtained from all subjects.
The retrospectively collected cDTI data were acquired using a Siemens Skyra 3T MRI scanner and a Siemens Vida 3T MRI scanner (Siemens AG, Erlangen, Germany). A diffusion-weighted stimulated echo acquisition mode (STEAM) SS-EPI sequence with reduced phase field-of-view and fat saturation was used. Some MR sequence parameters are listed: TR = 2 RR intervals; TE = 23 ms; SENSE or GRAPPA with AF = 2; echo train duration = 13 ms; spatial resolution = 2.8 × 2.8 × 8.0 mm³. Diffusion-weighted images were encoded in six directions with diffusion weightings of b = 150 and 600 sec/mm² (namely b150 and b600) in a short-axis mid-ventricular slice. Reference images, namely b0, were also acquired with a minor diffusion weighting. We used 481 cDTI cases including 2 cardiac phases, i.e., diastole (n = 232) and systole (n = 249), for the experiments. The dataset contains 241 healthy cases, 31 amyloidosis (AMYLOID) cases, 47 dilated cardiomyopathy (DCM) cases, 35 in-recovery DCM (rDCM) cases, 39 hypertrophic cardiomyopathy (HCM) cases, 48 HCM genotype-positive-phenotype-negative (HCM G+P-) cases, and 40 acute myocardial infarction (MI) cases. The overall data distribution of our dataset is shown in Table 1. The detailed data distribution per cohort and cardiac phase can be found in Table S1 in Supplementary.

This work discusses the reconstruction of systole and diastole cases separately. For each deep learning-based method, two sets of network weights were trained, one for systole and one for diastole reconstruction. In the training stage, we applied a 5-fold cross-validation strategy, using 169 diastole cases (TrainVal-D) or 183 systole cases (TrainVal-S). In the testing stage, four testing sets were utilised, including a mixed ordinary testing set with diastole cases (Test-D) or systole cases (Test-S) and an out-of-distribution MI testing set with diastole cases (Test-MI-D) or systole cases (Test-MI-S). According to Table S1, Test-D and Test-S include healthy, AMYLOID, rDCM, DCM, HCM and HCM G+P- cases, cohorts which are also included in TrainVal. To further examine the model robustness and ability to handle out-of-distribution data, the Test-MI dataset includes only MI cases, which are 'invisible' to the models during the training stage.
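The cross-validation split can be sketched as follows. This is only an illustration of a 5-fold partition over case indices: the shuffling, the seed and the helper name `k_fold_splits` are assumptions, and the actual assignment of cases to folds in the study is not specified in the text.

```python
import random

def k_fold_splits(n_cases, k=5, seed=0):
    """Yield (train_indices, val_indices) pairs for k-fold cross-validation."""
    indices = list(range(n_cases))
    random.Random(seed).shuffle(indices)       # reproducible shuffle
    folds = [indices[i::k] for i in range(k)]  # k nearly equal folds
    for i in range(k):
        val = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, val

# Example with the 169 diastole cases of TrainVal-D.
for fold_id, (train_idx, val_idx) in enumerate(k_fold_splits(169)):
    print(fold_id, len(train_idx), len(val_idx))
```

Each case appears in exactly one validation fold, so every case is used for validation once and for training four times.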

Data Pre-Processing
In the data pre-processing stage, all DWIs (b0, b150 and b600) were processed following the same protocol. The pixel intensity ranges of DWIs vary considerably across different b-values. To address this, we normalised all DWIs in the dataset to a pixel intensity range of 0 to 1 using the min-max method, while the maximum and minimum pixel values of all DWIs were recorded for the pixel intensity range recovery at the beginning of the data post-processing stage.
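A minimal sketch of this normalisation and its later inversion is given below. It assumes the extrema are recorded per image, which the text does not state explicitly, and the function names are illustrative.

```python
import numpy as np

def minmax_normalise(img):
    """Scale an image to [0, 1] and return the recorded (min, max)."""
    lo, hi = float(img.min()), float(img.max())
    scale = hi - lo if hi > lo else 1.0   # guard against constant images
    return (img - lo) / scale, (lo, hi)

def minmax_recover(img_norm, bounds):
    """Map a normalised image back to its original intensity range."""
    lo, hi = bounds
    scale = hi - lo if hi > lo else 1.0
    return img_norm * scale + lo

dwi = np.array([[10.0, 50.0], [30.0, 90.0]])
dwi_norm, bounds = minmax_normalise(dwi)
dwi_back = minmax_recover(dwi_norm, bounds)
```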
In our dataset, the majority of DWIs have a resolution of 256 × 96, while a small subset of 2D slices exhibit a resolution of 256 × 88. In order to standardise the resolution, we zero-padded the edges of the images with a resolution of 256 × 88 to achieve a resolution of 256 × 96.
In this study, GRAPPA-like Cartesian k-space undersampling masks with AF ×2, ×4 and ×8, generated by the official protocol of the fastMRI dataset 10, were applied to simulate the k-space undersampling process. Since all the 2D slices have been reconstructed with a zero-padding factor of two, the phase encoding (PE) dimension of our undersampling masks was set to 48 instead of 96, for a more realistic simulation. The undersampling masks were then zero-padded from 128 × 48 to 256 × 96 as shown in Figure 1. More details regarding the undersampling masks can be found in Figure S1 in Supplementary.
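The mask construction can be sketched as below. This is not the fastMRI implementation: it is a simplified equispaced mask with a fully sampled centre, and the centre fraction of 0.08 and the helper names are hypothetical values chosen only for illustration (the actual centre factors are given in Figure S1).

```python
import numpy as np

def equispaced_mask(n_pe, accel, centre_fraction):
    """1D PE mask: every `accel`-th line plus a fully sampled centre band."""
    mask = np.zeros(n_pe, dtype=bool)
    mask[::accel] = True
    n_centre = max(1, int(round(n_pe * centre_fraction)))
    start = (n_pe - n_centre) // 2
    mask[start:start + n_centre] = True
    return mask

def build_2d_mask(n_fe=128, n_pe=48, accel=4, centre_fraction=0.08,
                  out_shape=(256, 96)):
    """Repeat the 1D mask along the readout axis, then zero-pad in k-space."""
    line = equispaced_mask(n_pe, accel, centre_fraction)
    small = np.tile(line, (n_fe, 1))              # shape (n_fe, n_pe)
    pad_fe = (out_shape[0] - n_fe) // 2
    pad_pe = (out_shape[1] - n_pe) // 2
    # Outer k-space stays unsampled, mirroring the zero-padded acquisition.
    return np.pad(small, ((pad_fe, pad_fe), (pad_pe, pad_pe)))

mask = build_2d_mask()
```

Padding the mask with zeros reflects that the outer k-space of the interpolated 256 × 96 images carries no acquired data, so only the central 128 × 48 region is ever sampled.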
For DAGAN and SwinMR, DWIs were further cropped to 96 × 96, as both models only support square-shaped input images.
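The padding and cropping steps, together with the inverse 'paste back' operation needed after reconstruction, could look like the sketch below. Symmetric padding and a central crop along the first axis are assumptions, and all function names are illustrative.

```python
import numpy as np

def pad_to_width(img, width=96):
    """Zero-pad the second axis symmetrically, e.g. 256 x 88 -> 256 x 96."""
    extra = width - img.shape[1]
    left = extra // 2
    return np.pad(img, ((0, 0), (left, extra - left)))

def centre_crop(img, size=96):
    """Crop the central `size` rows; return the patch and its row offset."""
    top = (img.shape[0] - size) // 2
    return img[top:top + size, :], top

def paste_back(canvas, patch, top):
    """Write a reconstructed patch back into a copy of the full-size image."""
    out = canvas.copy()
    out[top:top + patch.shape[0], :] = patch
    return out

img = np.arange(256 * 88, dtype=float).reshape(256, 88)
padded = pad_to_width(img)                 # 256 x 96
patch, top = centre_crop(padded)           # 96 x 96 network input
restored = paste_back(padded, patch, top)  # identical to `padded` here
```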

Deep Learning-Based Cardiac Diffusion Tensor Imaging Reconstruction
In this stage, deep learning-based models took the k-space undersampled data as the input and produced the reconstructed MR images. We implemented and evaluated three deep learning-based models, namely DAGAN 12, D5C5 11 and SwinMR 13, in this stage.

DAGAN
DAGAN 12 is a conditional GAN-based and CNN-based model designed for general MRI reconstruction, of which the model structure is presented in Figure 2. DAGAN comprises two components: a generator and a discriminator, which are trained in an adversarial manner as a two-player game.
The generator is a modified CNN-based U-Net 36 with a residual connection 37, which takes the k-space zero-filled MR images as input and aims to produce reconstructed MR images as close as possible to the ground truth images. The discriminator is a standard CNN-based classifier that attempts to distinguish the 'fake' reconstructed MR images generated by the generator from the ground truth MR images.
During the inference stage, only the generator is applied, which takes the ZF MR images as input and outputs the reconstructed images.
DAGAN is trained with a hybrid loss function including an image space l2 loss, a frequency space l2 loss, a perceptual l2 loss based on a pre-trained VGG 38, as well as an adversarial loss 39. More implementation details can be found in the original paper 12.

D5C5
D5C5 takes the undersampled k-space measurement as well as ZF MR images as the input and outputs the reconstructed MR images. It is composed of multiple stages, each comprising a CNN block and a data consistency (DC) layer. The CNN block contains a cascade of convolutional layers with Rectified Linear Units (ReLU) for feature extraction, an optional data sharing (DS) layer for learning spatio-temporal features, as well as a residual connection 37. The DC layer takes a linear combination of the output of the CNN block and the undersampled k-space data, enforcing the consistency between the prediction of the CNNs and the original k-space measurements. D5C5 has five stages, with five convolution layers in each CNN block, and no DS layer is applied for our 2D MRI reconstruction task.
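The effect of a data consistency step can be illustrated with the single-coil sketch below. The weighted form with coefficient `lam` follows the commonly described formulation of DC layers and is an assumption here; the exact layer used in D5C5 is specified in the original paper 11. The function name and the toy data are illustrative.

```python
import numpy as np

def data_consistency(x_cnn, y, mask, lam=None):
    """Combine a CNN estimate with measured k-space samples.

    x_cnn : image estimate from the CNN block
    y     : measured k-space (zeros where not sampled, unshifted FFT layout)
    mask  : 1 at sampled k-space locations, 0 elsewhere
    lam   : None replaces sampled entries by y; otherwise a weighted average
    """
    k_cnn = np.fft.fft2(x_cnn, norm="ortho")
    if lam is None:
        k_dc = (1 - mask) * k_cnn + mask * y
    else:
        k_dc = (1 - mask) * k_cnn + mask * (k_cnn + lam * y) / (1 + lam)
    return np.fft.ifft2(k_dc, norm="ortho")

rng = np.random.default_rng(0)
x_true = rng.random((8, 8))
mask = np.zeros((8, 8))
mask[:, ::2] = 1.0
y = mask * np.fft.fft2(x_true, norm="ortho")
x_cnn = x_true + 0.1 * rng.standard_normal((8, 8))  # imperfect CNN output
x_dc = data_consistency(x_cnn, y, mask)
```

After the step, the sampled k-space entries of `x_dc` agree with the measurement `y`, while the unsampled entries are still those predicted by the CNN. The fewer lines are sampled, the smaller the share of the output that is fixed by the measurement.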

D5C5 is trained end-to-end using an image space l2 loss function. Further implementation details can be found in the original paper 11.

SwinMR
SwinMR is composed of a CNN-based input module and output module for projecting between the image space and the latent space, a cascade of residual Swin Transformer blocks (RSTBs), and a convolution layer with a residual connection for feature extraction. A patch embedding and a patch unembedding layer are placed at the beginning and end of each RSTB, facilitating the inter-conversion of feature maps and sequences, since the computation of Transformers is based on sequences. Multiple standard Swin Transformer layers (STLs) 40 and a single convolutional layer are applied between the patch embedding and unembedding layers.
SwinMR is trained end-to-end with a hybrid loss function consisting of an image space l1 loss, a frequency space l1 loss and a perceptual l1 loss based on a pre-trained VGG 38. More implementation details can be found in the original paper 13.

Data Post-Processing
We applied our in-house developed software (MATLAB 2021b, MathWorks, Natick, MA) for cDTI post-processing, following the protocol described in 1,14. The post-processing procedure for reference data includes: 1) manual removal of low-quality DWIs; 2) DWI registration; 3) semi-manual segmentation of the left ventricle (LV) myocardium; 4) DT calculation via the LLS fit; 5) DT parameter calculation including FA, MD, HA and E2A. The initial post-processing of reference data was performed by either Z.K. For the post-processing of deep learning-based reconstruction results, the outputs (96 × 96) of DAGAN and SwinMR were 'pasted' back to the corresponding zero-filled images (256 × 96) at their original position. (This process does not affect the final post-processing results since the ROI is set in the central 96 × 96 area.) All the DWIs were 'anti-normalised' (pixel value range recovery) to their original pixel intensity range using the maximum and minimum values recorded in the pre-processing stage.
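Steps 4) and 5) can be illustrated for a single voxel with the sketch below. It is not the in-house MATLAB software: it assumes an ideal reference signal without diffusion weighting, the mono-exponential model S = S0 exp(−b gᵀDg), and a hypothetical set of six encoding directions. MD and FA use their standard eigenvalue definitions.

```python
import numpy as np

def fit_tensor_lls(signals, s0, bvals, bvecs):
    """Linear least-squares fit of a 3x3 diffusion tensor for one voxel."""
    g = np.asarray(bvecs, dtype=float)
    b = np.asarray(bvals, dtype=float)[:, None]
    # Each row maps (Dxx, Dyy, Dzz, Dxy, Dxz, Dyz) to ln(S / S0).
    design = -b * np.column_stack([
        g[:, 0] ** 2, g[:, 1] ** 2, g[:, 2] ** 2,
        2 * g[:, 0] * g[:, 1], 2 * g[:, 0] * g[:, 2], 2 * g[:, 1] * g[:, 2],
    ])
    log_ratio = np.log(np.asarray(signals, dtype=float) / s0)
    d, *_ = np.linalg.lstsq(design, log_ratio, rcond=None)
    return np.array([[d[0], d[3], d[4]],
                     [d[3], d[1], d[5]],
                     [d[4], d[5], d[2]]])

def md_fa(tensor):
    """Mean diffusivity and fractional anisotropy from the eigenvalues."""
    evals = np.linalg.eigvalsh(tensor)
    md = evals.mean()
    fa = np.sqrt(1.5 * np.sum((evals - md) ** 2) / np.sum(evals ** 2))
    return md, fa

# Synthetic voxel: known tensor (mm^2/s), six directions, b = 600 s/mm^2.
d_true = np.diag([1.5e-3, 1.0e-3, 0.5e-3])
bvecs = np.array([[1, 1, 0], [1, -1, 0], [1, 0, 1],
                  [1, 0, -1], [0, 1, 1], [0, 1, -1]]) / np.sqrt(2)
bvals = np.full(6, 600.0)
signals = np.exp(-bvals * np.einsum("ij,jk,ik->i", bvecs, d_true, bvecs))
d_fit = fit_tensor_lls(signals, 1.0, bvals, bvecs)
md, fa = md_fa(d_fit)
```

With noise-free synthetic signals the fit recovers the tensor exactly. With real low-SNR DWIs the repetitions mentioned earlier would add rows to the same linear system.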
The reconstruction results were arranged to construct a new reconstruction dataset with the same structure as the reference dataset. The reconstructed dataset was then automatically post-processed following the configuration of reference data (e.g.,

Experiments and Results
In this section, the experimental results are presented from two perspectives: 1) the quality of the DWI reconstruction and 2) the quality of the DT parameter estimations.

Reconstruction Quality Assessment
In this study, four metrics were considered to assess the reconstruction quality. Peak Signal-to-Noise Ratio (PSNR) is a simple and commonly used metric for measuring the reconstruction quality, which measures the ratio of the maximum possible power of the signal to the power of the noise. A higher PSNR value indicates a better reconstruction quality. Structural Similarity Index (SSIM) is a perceptual-based metric that measures the similarity between two images by comparing their structural information. A higher SSIM value indicates a better reconstruction quality. Learned Perceptual Image Patch Similarity (LPIPS) 41 is a learned metric that measures the perceptual similarity between two images by computing the distance in the latent space using a pre-trained deep neural network. LPIPS has shown a high correlation with human perceptual judgements of image similarity. A lower LPIPS value indicates a better quality of the generated images. Fréchet Inception Distance (FID) 42 is a learned metric that measures the similarity between two sets of images by comparing their feature statistics, using a pre-trained deep neural network. FID has also been shown to have a high correlation with human perceptual experience. A lower FID value indicates a better quality of the generated images.
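As a small worked example of the simplest of these metrics, PSNR can be computed from the mean squared error as below. The `data_range` of 1.0 assumes images normalised to [0, 1]; the function name is illustrative.

```python
import numpy as np

def psnr(reference, reconstruction, data_range=1.0):
    """PSNR in dB: 10 * log10(data_range^2 / MSE)."""
    ref = np.asarray(reference, dtype=float)
    rec = np.asarray(reconstruction, dtype=float)
    mse = np.mean((ref - rec) ** 2)
    if mse == 0:
        return float("inf")   # identical images
    return 10.0 * np.log10(data_range ** 2 / mse)

reference = np.zeros((4, 4))
reconstruction = np.full((4, 4), 0.1)     # uniform error of 0.1
value = psnr(reference, reconstruction)   # MSE = 0.01, i.e. about 20 dB
```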
Quantitative reconstruction results on the Test-S and Test-D datasets are presented in Table 2. The two-sample t-test was applied for the statistical analysis, and a marker in Table 2 indicates that the specific result distribution is significantly different (p < 0.05) from the best result distribution. Among the evaluated models, D5C5 demonstrates superior fidelity in the reconstruction, while SwinMR provides results with higher perceptual scores.
Visualised samples of the reconstruction results on Test-S and Test-D datasets are shown in Figure 5.

Diffusion Tensor Parameter Quality Assessment
We further evaluated the quality of DT parameters, including FA, MD, E2A and HA, after post-processing. The differences in DT parameter global mean values between the reference and the reconstruction results are reported in Table 3 and Table S2. The mean absolute error for FA and MD, and the mean absolute angular error for the HA gradient (HA Slope) and E2A, were employed to quantify the difference. The Mann-Whitney test was utilised for the statistical analysis, and a marker in Table 3 and Table S2 indicates that the specific error distribution is significantly different (p < 0.05) from the best results distribution. A data point with a green background indicates that the specific distribution of the corresponding DT parameter global mean values is NOT significantly different (p > 0.05) from the reference distribution according to the Mann-Whitney test.
Overall, SwinMR can achieve better or comparable (not significantly different) MD, HA Slope and E2A results on all testing sets. DAGAN can achieve better or comparable (not significantly different) FA results on all testing sets. D5C5 has provided better results only on Test-S at AF ×2, but it is not significantly better than SwinMR (on MD, HA Slope and E2A) and DAGAN (on FA). Some cases of visualised DT parameter maps are presented in this study, including FA, MD, HA and the absolute value of E2A (|E2A|). The DT parameter maps of a systole healthy case from Test-S with different AFs are shown in Figure 6

Discussion
In this study, we have investigated the performance of deep learning-based methods in the context of cDTI reconstruction. We have implemented three deep learning-based MRI reconstruction methods, namely DAGAN, D5C5 and SwinMR, on our cDTI dataset. Experimental results have been reported from the perspective of reconstruction quality assessment and DT estimation quality assessment.
According to Table 2, for the reconstruction tasks with undersampling masks of AF ×2 and AF ×4, D5C5 has achieved superior PSNR and SSIM, while SwinMR has achieved better deep learning-based perceptual scores, i.e., LPIPS and FID. For the reconstruction tasks at AF ×8, SwinMR has outperformed other methods across all the metrics applied.
According to Figure 5, for the reconstruction task at AF ×2, all three methods have produced fairly good visual reconstruction results. For the reconstruction task at AF ×4, all three methods have successfully recovered the overall structural information, whereas they have behaved differently in the recovery of the high-uncertainty area. For example, in the experiment on Test-S at AF ×4 (Row 3-4, Col 1-5, Figure 5), the red arrows indicate the high-uncertainty area on the LV myocardium due to signal loss. DAGAN has provided a noisy estimation while SwinMR has clearly preserved this part of the information. However, the results of D5C5 have missed the information in this area. For the reconstruction task at AF ×8, none of the three methods is able to produce visually satisfactory reconstruction results. For example, in the experiment on Test-S at AF ×8 (Row 5-6, Col 1-5, Figure 5), a large amount of visible aliasing artefacts along the PE direction has remained in the reconstruction results of both D5C5 and SwinMR, with D5C5 performing relatively worse than SwinMR. DAGAN, to some extent, has eliminated the aliasing artefacts at the expense of increased noise, leading to a low-SNR reconstruction. Regarding the recovery of the high-uncertainty area, both DAGAN and D5C5 have failed to preserve the information in this area. SwinMR can retain most of the information in this area, but it has also produced a 'fake' estimation (green arrow).
For the fidelity of the reconstruction, D5C5 has shown superiority at a relatively low AF, whereas this superiority disappears at a relatively high AF. This phenomenon is caused by the DC module in D5C5 (Figure 3), which combines the k-space measurement information with the CNN estimation to maintain consistency. According to Figure S1, in the reconstruction task at a relatively low AF, a large proportion of the information in the final output of D5C5 is provided by the DC module, whereas this proportion is significantly decreased in a reconstruction task at a relatively high AF (AF ×8). Therefore, this kind of unrolling-based method with a DC module is more suitable for reconstruction at a relatively low AF.
For the perceptual score of the reconstructions, experiments have shown that SwinMR outperforms D5C5 and DAGAN on the LPIPS and FID metrics. However, even though the perceptual score has a high correlation with human observation, it is not always equivalent to a better reconstruction quality 30. According to Figure 5 (green arrow), SwinMR has learnt to estimate a 'fake' reconstruction detail for a higher perceptual score, which is totally unacceptable and dangerous for clinical use. We believe this phenomenon is caused by the nature of the Transformer applied in SwinMR, which is powerful enough to estimate and generate details that do not exist originally. In addition, the perceptual VGG-based loss drives SwinMR to produce perceptually similar reconstructions rather than pixel-wise-similar reconstructions.
In general, the differences in tensor parameter global mean values between the reference and reconstruction results tend to increase as the AF rises. Concerning the global mean values of FA, DAGAN has demonstrated superiority on Test-S and Test-MI-S, with its superiority growing as the AF increases. On Test-D and Test-MI-D, the three methods have yielded similar results, with no statistically significant difference observed. Regarding the global mean values of MD, D5C5 and SwinMR have outperformed DAGAN across all the testing sets. Specifically, D5C5 has delivered better results on Test-S, while SwinMR has excelled on Test-MI-S. On Test-D and Test-MI-D, SwinMR and D5C5 have achieved similar results with no statistical difference at AF ×2 and AF ×4, while SwinMR has surpassed D5C5 at a higher AF (AF ×8). For the global mean values of HA Slope, it is clear that SwinMR has outperformed DAGAN and D5C5 on all testing sets, with its superiority being statistically significant on Test-S and Test-D. In terms of the global mean values of E2A, SwinMR has generally achieved better or comparable results among the three methods, but the differences are typically not statistically significant.
Generally, the quality of the DT parameter maps decreases as the AF increases. We believe that at AF ×2 and AF ×4, the DT parameter maps calculated by these three methods can achieve a level similar to the reference. For the MI cases from the out-of-distribution testing set Test-MI, these three methods can successfully preserve the information in the lesion area for clinical use. For example, at AF ×2, all three methods have provided DT parameter maps visually similar to the reference (Figure 6 and Figure S5). At AF ×4, all three methods can recover most of the information in the DT parameter maps. DAGAN tends to produce noisier DT parameter maps, while SwinMR and D5C5 tend to produce smoother DT parameter maps, which matches the results from the reconstruction quality assessment. We can observe from the MD map and its corresponding error map that the aliasing along the vertical (PE) direction has affected the DT parameter maps (Figure 7, red arrows). The intensity of the MI area in the MD map of DAGAN tends to decrease, while D5C5 and SwinMR have clearly preserved it (Figure S6, red arrows).
However, at AF ×8, the quality of the DT parameter maps has become significantly worse, which also matches the results from the reconstruction quality assessment. For the FA map, a band of higher FA is expected to be observed in the mesocardium for a healthy heart 43. However, DAGAN and D5C5 have failed to recover the band of higher FA: DAGAN has produced a very noisy FA map, while D5C5 has over-smoothed the FA map and wrongly estimated a highlighted area (Figure S7, blue arrows). For the MD map, the effect of the aliasing that has been observed at AF ×4 has become more severe. In the healthy case, a highlighted area has wrongly appeared in the MD maps from all three methods, which is unacceptable for clinical use and may lead to misdiagnosis (Figure 8, red arrows). In the MI case, the MI lesion area tends to decrease for all the methods, especially for the results of DAGAN, where the lesion area has nearly disappeared (Figure S7, red arrows). For the HA map, it has been observed that SwinMR can produce a relatively smooth HA map, while DAGAN can only reconstruct a very noisy one. However, the direction of HA has been wrongly estimated in the epicardium of the healthy case (Figure 8, green arrows). This is not acceptable for clinical use and is likely to lead to misdiagnosis such as MI. For the |E2A| map, DAGAN tends to reconstruct a noisy map, while SwinMR tends to produce a smooth map. All three methods can reconstruct results similar to the reference even at AF ×8.
Through our experiments, we have demonstrated that the models discussed in this paper can be effectively applied for clinical use at AF ×2 and AF ×4. However, at AF ×8, the performance of these three models, including the best-performing SwinMR, has still remained limited.
We hope that this study will serve as a baseline for future cDTI reconstruction model development. Our findings have indicated that there are still limitations when directly applying these general MRI reconstruction methods for cDTI reconstruction.
There is an absence of restrictions on diffusion. The loss functions utilised in the three models discussed in this study all rely on an image domain loss, with DAGAN and SwinMR additionally incorporating the frequency domain loss and the perceptual loss. In other words, there is no diffusion information restriction implemented during the model training stage. For further work, the diffusion tensor or parameter maps can be jointly considered in the loss function. Moreover, physical constraints on diffusion can also be incorporated into the training stage.
There is a trade-off between perceptual performance and quantitative performance. Cardiac diffusion tensor MRI is a quantitative technique, which places greater emphasis on the accuracy of contrast, pixel intensity range, and pixel-wise fidelity, also referred to as pixel-wise distance. However, the models discussed in this study were originally designed for structural MRI, and they tend to pay more attention to the 'perceptual-similarity', which can be regarded as the latent space distance. A trade-off exists between pixel-wise fidelity and perceptual-similarity 44. For example, blurred images generally exhibit better pixel-wise fidelity, while images with clear but 'fake' details tend to have better perceptual-similarity 30. Such 'fake' details can sometimes be harmful for clinical use. Consequently, for further work, more effort should be made to consider how to improve the pixel-wise fidelity rather than the perceptual-similarity, or how to prevent the appearance of the 'fake' information.
There is a gap between current DT evaluation methods and the true quality of cDTI reconstruction. This study has revealed that the global mean value of diffusion parameters is not always accurate or sensitive enough to evaluate the diffusion tensor quality. For example, Table 3 indicates no statistically significant difference in MD between reconstruction results (even including ZF) and the reference on Test-S, whereas Figure 8 shows that the MD maps are entirely unacceptable. This discrepancy arises because the MD value increases and decreases in different parts of the MD map, while the global mean value remains relatively consistent, rendering the global mean MD ineffective in reflecting the quality of the final DT estimation. For future work, apart from the visual assessment, we will apply a down-stream task assessment, e.g., utilising a pre-trained pathology classification or detection model to evaluate the reconstruction quality. Theoretically, better classification or detection accuracy corresponds to improved reconstruction results.
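The insensitivity of the global mean can be reproduced with a toy example. The map below is synthetic and in arbitrary units; it only illustrates how regional over- and underestimation can cancel in the mean while the pixel-wise error stays large.

```python
import numpy as np

reference = np.full((4, 4), 1.0)   # flat synthetic "MD map"
corrupted = reference.copy()
corrupted[:, :2] += 0.5            # MD overestimated in one half
corrupted[:, 2:] -= 0.5            # MD underestimated in the other half

global_mean_error = abs(corrupted.mean() - reference.mean())
pixelwise_mae = np.abs(corrupted - reference).mean()
print(global_mean_error, pixelwise_mae)
```

Here the global mean error is zero although every pixel is wrong by 0.5, which is the kind of discrepancy described above between Table 3 and Figure 8.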
There are still limitations for this study. 1) The size of the testing sets is not sufficiently large. The relatively small testing sets increase the randomness of the experimental results and reduce the reliability of the statistical tests. In future studies, we will expand our dataset and provide more accurate results. 2) Our simulation experiment is based on retrospective k-space undersampling of single-channel DWIs that have been reconstructed by the MR scanner. The retrospective undersampling step itself has removed a large amount of noise, leading to unrealistic post-processing results. In future studies, we will conduct our experiment on prospectively acquired raw k-space data.

Conclusion
In conclusion, we have investigated the application of deep learning-based methods for accelerating cDTI reconstruction, which has significant potential for improving the integration of cDTI into routine clinical practice. Our study focuses on three different models, namely D5C5, DAGAN, and SwinMR, which have been evaluated on cDTI datasets at AF ×2, ×4, and ×8. The results have demonstrated that the examined models can be effectively utilised for clinical use at AF ×2 and AF ×4, with SwinMR being the recommended optimal approach. However, at AF ×8, the performance of all models has remained limited, and further research is required to improve their performance at a relatively high AF.

Figure 1. The data flow of our implementation for cardiac diffusion tensor imaging data. The whole procedure consists of (A) data acquisition, (B) data pre-processing, (C) deep learning-based reconstruction and (D) data post-processing. It is noted that D5C5 does not require the cropping and pasting step and additionally takes the undersampled k-space data and the corresponding undersampling mask as input.

Figure 2. The model architecture of DAGAN. (A) The generator of DAGAN is a modified Convolutional Neural Network (CNN)-based U-Net with a residual connection; (B) the discriminator of DAGAN is a standard CNN-based classifier. Conv2D: 2D convolution layer; Recon: reconstructed MR images; GT: ground truth MR images.

Figure 3. (A) The model architecture of D5C5. D5C5 has five stages, each comprising a Convolutional Neural Network block (CNN Block) and a data consistency layer (DC). (B) The structure of the CNN Block. One optional data sharing module (DS) and five convolutional layers (Conv Layers) are included in the CNN Block. (C) The structure of the DC. $M$ denotes the undersampling mask, and $\bar{M} = I - M$. $F$ and $F^{-1}$ denote the Fourier and inverse Fourier transform. λ is an adjustable coefficient controlling the level of DC.

Figure S1. Three Cartesian k-space undersampling masks applied in this work. AF: acceleration factor; CF: centre factor.

Table 1. The overview of the dataset.

Table 2. The quantitative reconstruction results on the testing sets Test-S and Test-D with undersampling masks of acceleration factor (AF) ×2, ×4 and ×8. SSIM, PSNR and LPIPS results are quoted as 'mean (standard deviation)'. A marker indicates that the specific distribution is significantly different (p < 0.05) from the best results distribution by the two-sample t-test.

Table 3. Differences of diffusion tensor parameter global mean values between the reference and reconstruction results (undersampled k-space zero-filled images ZF included), on systole testing sets Test-S and Test-MI-S. Mean absolute error is applied for fractional anisotropy (FA) and mean diffusivity (MD), and mean absolute angular error is applied for helix angle gradient (HA Slope) and second eigenvector (E2A). The results are quoted as 'median [interquartile range]'. A marker indicates that the specific error distribution is significantly different from the best results distribution by the Mann-Whitney test (p < 0.05). A data point with a green background indicates that the specific distribution of the corresponding diffusion tensor parameter global mean values is NOT significantly different from the reference distribution by the Mann-Whitney test (p > 0.05). Units: FA unitless; MD 10⁻³ mm²/sec; HA Slope degrees/mm; E2A degrees.