A generalized dual-domain generative framework with hierarchical consistency for medical image reconstruction and synthesis

Medical image reconstruction and synthesis are critical for imaging quality, disease diagnosis and treatment. Most of the existing generative models ignore the fact that medical imaging usually occurs in the acquisition domain, which is different from, but associated with, the image domain. Such methods exploit either single-domain or dual-domain information and suffer from inefficient information coupling across domains. Moreover, these models are usually designed specifically and not general enough for different tasks. Here we present a generalized dual-domain generative framework to facilitate the connections within and across domains by elaborately-designed hierarchical consistency constraints. A multi-stage learning strategy is proposed to construct hierarchical constraints effectively and stably. We conducted experiments for representative generative tasks including low-dose PET/CT reconstruction, CT metal artifact reduction, fast MRI reconstruction, and PET/CT synthesis. All these tasks share the same framework and achieve better performance, which validates the effectiveness of our framework. This technology is expected to be applied in clinical imaging to increase diagnosis efficiency and accuracy.

The authors proposed a generalized dual-domain generative framework with hierarchical consistency for medical image reconstruction and synthesis. The performance was demonstrated by low-dose PET/CT reconstruction, metal artifact reduction, fast MRI reconstruction, and PET-CT synthesis. Overall, the paper is well organized, and it can be further improved in the following aspects.

1. In the recent two years, many dual-domain deep-learning-based networks were reported for medical image reconstruction and synthesis. The authors may need a deep survey on this topic in the introduction to enhance the background of this paper.

2. For all the experimental results, in Tables 1-8, both PSNR and NRMSE are used. However, PSNR is identical to NRMSE up to a log operation and a constant bias. The authors may consider keeping one of them to remove the redundancy.

3. Regarding the ablation study, in Tables 2, 4, 6 and 8, to demonstrate that all the components are necessary, experiments should be performed for all the settings without the testing component. Hence, the authors may consider the settings "S2+S3" and "S1+S3".

4. In all the tables, the values of SSIM are close to 1.0, and almost all the values are greater than 0.95. The differences between the proposed method and the competing methods are very small. Note that the default parameters in the SSIM function were designed for natural images with a pixel value range of 0-255. Because the original medical image pixel values have different ranges, the default parameters in SSIM cannot be directly applied to medical images. What parameters are employed to compute SSIM for each of the experiments?

5. For a deep-learning-based method, network training is time-consuming. For all the deep-learning-based methods in each experiment, it is better to clarify the computational cost to train the proposed network framework and the related competing networks.

6. Regarding the application of low-dose PET/CT reconstruction, all the competing methods listed in Table 1 only use image-domain information. Since the proposed method is dual-domain, at least one state-of-the-art dual-domain image reconstruction approach should be compared.

7. Regarding the application of metal artifact reduction, at least one state-of-the-art dual-domain approach should be compared, for example, references 18 and 22 cited in this paper.

8. Regarding the application of fast MRI reconstruction, at least one state-of-the-art dual-domain approach should be compared, for example, references 19 and 21 cited in this paper.

Reviewer #3 (Remarks to the Author): This paper presents a novel generalized dual-domain generative framework with hierarchical consistency for medical image reconstruction and synthesis. Extensive experimental results demonstrate the effectiveness of the proposed method.

Strengths:
The motivation is clear. The paper is well-written and easy to follow. Extensive experiments are conducted to support the effectiveness of the method. As the authors claimed, code will be published and some of the source datasets will also be released, which will facilitate the research in this area.

Weakness:
Just curious, in PET/CT reconstruction and PET-CT synthesis a fully connected network is employed for G_s^A and G_t^A, but in MAR, RU-Net is employed, and in fast MRI, E2EVarNet is employed. Are there any suggestions for the deployment of different architectures for different tasks? As a generalized framework, the method should work robustly no matter which network architectures are employed. The authors are encouraged to conduct experiments on one of the tasks by employing different network architectures as baseline networks.

It seems that the same metric is used in all the experimental results. A more thorough or clinically related metric may be considered for further evaluation on one of the datasets, for example SUVs for PET [Ref1, Ref2] or a radiologist report for the synthesized images.

Ref1. Low-count whole-body PET/MRI restoration: an evaluation of dose reduction spectrum and five state-of-the-art artificial intelligence models
Ref2. Deep learning-assisted ultra-fast/low-dose whole-body PET/CT imaging

Dear Reviewer,
Thank you very much for your affirmation and constructive comments on our paper. Your suggestions have helped us a lot to improve the quality of our manuscript. Based on your comments and suggestions, we have made the following revisions.

[Comment 1]:
Dual-domain learning is a very common strategy for reconstruction, synthesis, MAR, etc. I just wonder about the novelty of this study; it seems that this study mainly focuses on the application of the dual-domain learning method.

[Author's answer and modification]:
Thanks for your comment. We acknowledge that there are many dual-domain learning works for reconstruction tasks. However, most of these methods use dual-domain images as inputs to cascaded dual-domain networks, which do involve dual-domain knowledge but cannot guarantee consistency across the two domains. To address this issue, we propose a dual-domain cycle-consistent generative framework built on dual-domain hierarchical consistency, which is intended to better regularize the potential solution space of medical image reconstruction or synthesis. Compared with other dual-domain works, to the best of our knowledge, the proposed framework is the first that explicitly considers hierarchical inter- and intra-domain consistency constraints for medical image synthesis and reconstruction. These constraints can better explore the latent physical relationships in medical images and hence achieve superior reconstruction performance.

[Comment 2]:
With respect to the proposed method, including cycle consistency, inter-domain consistency and intra-domain consistency, the fundamental principle is very similar to CycleGAN. What is the difference between these two methods?

[Author's answer and modification]:
Thanks for your comment. It is true that the cycle-consistency concept was proposed by CycleGAN. Building on this initial idea, we further extend the concept in our dual-domain framework. We not only apply the cycle-consistency constraint in the image-domain cycle but, more importantly, we propose to apply a hierarchical cycle-consistency constraint in the cross-domain cycle. To the best of our knowledge, it is the first work to explore dual-domain consistency in a unified framework, which is the main difference between the two methods.

[Comment 3]:
In terms of demonstrating the advantages of the proposed method with reconstruction results, only the error relative to the ground truth is shown, and that is not sufficient. This reviewer suggests that the authors provide a noise power spectrum comparison.

[Author's answer and modification]:
Thanks for your constructive comment. As the reviewer suggested, in the revised version, we have added plots of the noise power spectrum (NPS) for all the studied medical applications, as shown in Fig. 3 and Fig. 5. The error map between the synthesized image and the ground-truth (GT) image is used as the noise map for the NPS calculation. Following Dobbins III et al. [1], we use non-overlapping regions of interest with a size of 64×64 for the NPS calculation for low-dose PET/CT, fast MRI reconstruction, and PET-CT synthesis. The numbers of patches are 4, 64, 25, and 4, respectively. The window size for metal artifact reduction is 80×80 and the number of patches is 25.
As we can see, our method exhibits the lowest noise power across the spatial frequencies, indicating the superiority of our proposed algorithm, which is in line with the quantitative evaluation reported in Tables 1, 3, 5, and 7.
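
The NPS procedure described in this reply (error map as noise map, non-overlapping square ROIs, averaged power spectra) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `noise_power_spectrum`, the `pixel_spacing` argument, the mean subtraction per ROI, and the normalization by pixel area over ROI size are assumptions based on the usual ensemble-averaged 2D NPS definition.

```python
import numpy as np

def noise_power_spectrum(error_map, roi=64, pixel_spacing=1.0):
    """Estimate a 2D NPS from non-overlapping roi x roi patches of an error map.

    Assumptions (not taken from the manuscript): each patch is mean-subtracted,
    and the averaged |DFT|^2 is scaled by pixel_area / number_of_pixels.
    """
    error_map = np.asarray(error_map, dtype=np.float64)
    height, width = error_map.shape
    spectra = []
    # Step through the image in strides of `roi` so that patches do not overlap.
    for r in range(0, height - roi + 1, roi):
        for c in range(0, width - roi + 1, roi):
            patch = error_map[r:r + roi, c:c + roi]
            patch = patch - patch.mean()  # remove the DC offset of the patch
            spectra.append(np.abs(np.fft.fft2(patch)) ** 2)
    if not spectra:
        raise ValueError("error_map is smaller than one ROI")
    nps = np.mean(spectra, axis=0) * pixel_spacing ** 2 / (roi * roi)
    # Shift the zero frequency to the center for plotting or radial averaging.
    return np.fft.fftshift(nps), len(spectra)

# Example: a 128 x 128 synthetic error map yields four 64 x 64 patches.
rng = np.random.default_rng(0)
demo_error = rng.standard_normal((128, 128))
demo_nps, demo_count = noise_power_spectrum(demo_error, roi=64)
```

With this normalization, integrating the NPS over spatial frequency recovers the average patch variance, which is a convenient sanity check before comparing curves across methods.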

[Comment 4]:
The comparison methods for reconstruction and MAR are too old. The authors should add advanced reconstruction methods to highlight the performance of the proposed algorithm.

[Author's answer and modification]:
Thank you for your comment. In the revised version, we have added the more recent MAR method DuDoNet to the comparison, as the reviewer suggested. We summarize the quantitative performance in Table 3. Our proposed network, based on the ResUNet-32 backbone, outperforms the other methods significantly in terms of both PSNR and SSIM.

[Comment 5]:
In Table 1, the metrics PSNR and RMSE have the same meaning; the authors can give one of them.

[Author's answer and modification]:
Thanks for your comment. Following your suggestion, we have removed the RMSE metric in the revised version to avoid redundancy.

[Comment 6]:
With respect to Table 2, the authors try to highlight the advantages of dual-domain over single-domain learning; in fact, some great works have already investigated this. Therefore, this reviewer is still concerned about the originality of this paper.

[Author's answer and modification]:
We apologize for the lack of clarity in our previous draft. In the revised version, we have clarified the difference between our proposed framework and the existing methods in the Discussion Section, as highlighted in blue. Our framework is built on dual domains, which originates from the imaging mechanism of representative imaging systems such as MRI, CT, and PET. Unlike other methods, which usually adopt sequentially cascaded or parallel connected sub-networks for processing the individual domain patterns, we explicitly impose hierarchical consistency, including intra-domain consistency, inter-domain consistency, and cycle consistency, which are applied in three stages during the training phase. The stepwise consistency constraint achieves a stabilized and structured similarity match and hence an improved network performance.
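
The staged use of the three consistency terms can be illustrated with a toy loss. This is only a sketch under loudly stated assumptions: the reply does not define the individual terms or say which term is introduced at which stage, so the definitions below (L1 distances, an inverse FFT as the acquisition-to-image transform, and a cumulative schedule of intra-domain, then inter-domain, then cycle consistency) and all names are hypothetical.

```python
import numpy as np

def l1(a, b):
    """Mean absolute difference; also works for complex-valued arrays."""
    return float(np.mean(np.abs(a - b)))

def acq_to_image(acq):
    # Stand-in physical transform (assumption): inverse 2D FFT, as in MRI.
    return np.fft.ifft2(acq)

def hierarchical_loss(stage, acq_pred, img_pred, acq_cycled,
                      acq_src, acq_tgt, img_tgt):
    """Toy staged loss. The stage-to-term mapping is an assumption.

    stage 1: intra-domain terms only
    stage 2: adds the inter-domain term
    stage 3: adds the cycle term
    """
    if stage not in (1, 2, 3):
        raise ValueError("stage must be 1, 2, or 3")
    # Intra-domain: each output is compared with the target in its own domain.
    intra = l1(acq_pred, acq_tgt) + l1(img_pred, img_tgt)
    # Inter-domain: the transformed acquisition output should match the image output.
    inter = l1(acq_to_image(acq_pred), img_pred)
    # Cycle: mapping the prediction back should reproduce the source measurement.
    cycle = l1(acq_cycled, acq_src)
    return sum((intra, inter, cycle)[:stage])
```

Adding the terms one stage at a time mirrors the stepwise idea in the reply: earlier stages fit each domain separately, and later stages tie the domains together once the individual mappings are reasonably stable.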
Thank you very much for your affirmation and constructive comments on our paper. Your suggestions have helped us a lot to improve the quality of our manuscript. Based on your comments and suggestions, we have made the following revisions.

[Comment 1]:
In the recent two years, many dual-domain deep-learning-based networks were reported for medical image reconstruction and synthesis. The authors may need a deep survey on this topic in the introduction to enhance the background of this paper.

[Author's answer and modification]:
We appreciate your constructive comment. In the revised manuscript, we have added an additional literature survey covering medical image reconstruction and synthesis in the Introduction Section, as highlighted in blue. We copy the added contents below:
"Different from the conventional CycleGAN framework, where cycle consistency is performed between the source and target images using unsupervised learning scheme in a single modality, e.g., the image domain, our proposed dual-domain based generative framework adopts the principle of hierarchical consistency in dual domains based on supervised learning."
"More importantly, unlike most of the existing dual-domain based generative methods which either adopt sequentially cascaded or parallel connected sub-networks for processing the individual domain patterns, we explicitly impose hierarchical consistency including intra-domain consistency, inter-domain consistency, and cycle consistency."

[Comment 2]:
For all the experimental results, in Tables 1-8, both PSNR and NRMSE are used. However, PSNR is identical to NRMSE up to a log operation and a constant bias. The authors may consider keeping one of them to remove the redundancy.

[Author's answer and modification]:
Thanks for your comment. In our revised version, as the reviewer suggested, we have removed the RMSE metric to avoid redundancy.
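
The redundancy the reviewer points out follows directly from the definitions, and a short numerical check makes it concrete. The sketch below assumes NRMSE is the RMSE divided by the same data range that PSNR uses as its peak value, in which case PSNR = -20·log10(NRMSE) exactly. Other NRMSE normalizations (for example by the reference mean or norm) add an offset that is constant for a given reference image. The function names are illustrative and are not taken from the manuscript.

```python
import numpy as np

def rmse(reference, estimate):
    """Root-mean-square error between two arrays."""
    diff = np.asarray(reference, dtype=np.float64) - np.asarray(estimate, dtype=np.float64)
    return float(np.sqrt(np.mean(diff ** 2)))

def psnr(reference, estimate, data_range):
    """PSNR in dB, using data_range as the peak signal value."""
    return 20.0 * np.log10(data_range / rmse(reference, estimate))

def nrmse(reference, estimate, data_range):
    """RMSE normalized by the data range (one of several conventions)."""
    return rmse(reference, estimate) / data_range

# Synthetic example: PSNR and -20*log10(NRMSE) agree to floating-point precision.
rng = np.random.default_rng(0)
reference = rng.random((32, 32))
estimate = reference + 0.05 * rng.standard_normal((32, 32))
psnr_db = psnr(reference, estimate, data_range=1.0)
nrmse_value = nrmse(reference, estimate, data_range=1.0)
print(psnr_db, -20.0 * np.log10(nrmse_value))
```

Because one metric is a monotone function of the other, the two always rank methods identically on the same image, which is why reporting both adds no information.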

[Comment 3]:
Regarding the ablation study, in Tables 2, 4, 6 and 8, to demonstrate that all the components are necessary, experiments should be performed for all the settings without the testing component. Hence, the authors may consider the settings "S2+S3" and "S1+S3".

[Author's answer and modification]:
Thank you for your comments. In the revised manuscript, we have added the additional experimental settings suggested by the reviewer to the ablation study for all the investigated applications, including low-dose PET/CT reconstruction, metal artifact reduction, MRI reconstruction, and PET-to-CT/CT-to-PET synthesis, in Tables 2, 4, 6, and 8, respectively. With this more comprehensive analysis, we better demonstrate the effectiveness of our proposed training strategy.

[Comment 4]:
In all the tables, the values of SSIM are close to 1.0, and almost all the values are greater than 0.95. The differences between the proposed method and the competing methods are very small. Note that the default parameters in the SSIM function were designed for natural images with a pixel value range of 0-255. Because the original medical image pixel values have different ranges, the default parameters in SSIM cannot be directly applied to medical images. What parameters are employed to compute SSIM for each of the experiments?

[Author's answer and modification]:
Thank you for your comments regarding our evaluation of image quality. In our study, we use the parameter settings for the SSIM metric specified in the paper by Wang et al. [1], which are widely used in many representative works on medical image reconstruction or synthesis [2,3,4]. Since the established parameter setting (K1 = 0.01, K2 = 0.03) has been widely adopted in the medical image processing community, by adhering to the parameters defined in [1] we aim to maintain consistency and ensure comparability with the existing literature. Although we acknowledge that the current SSIM parameter setting may not describe the performance differences between methods in the best way, this recommended setting is widely used and considered "standard" in our field. Using a customized SSIM parameter setting could introduce bias in our evaluation and may also make it difficult for others to compare directly with our results on public datasets such as the DeepLesion dataset for MAR.
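
The interaction between K1, K2 and the pixel value range raised in this exchange can be made concrete. In the SSIM definition of Wang et al., the stabilizing constants are C1 = (K1·L)^2 and C2 = (K2·L)^2, where L is the dynamic range of the data. The sketch below uses a simplified single-window (global statistics) SSIM rather than the usual sliding Gaussian window, so its values are not comparable to those in the tables; it only illustrates how the assumed range L changes the score. The function names are illustrative.

```python
import numpy as np

def ssim_constants(data_range, k1=0.01, k2=0.03):
    """Stabilizing constants C1 and C2 for a given dynamic range L."""
    c1 = (k1 * data_range) ** 2
    c2 = (k2 * data_range) ** 2
    return c1, c2

def global_ssim(x, y, data_range, k1=0.01, k2=0.03):
    """Simplified SSIM computed from whole-image statistics (no sliding window)."""
    x = np.asarray(x, dtype=np.float64)
    y = np.asarray(y, dtype=np.float64)
    c1, c2 = ssim_constants(data_range, k1, k2)
    mu_x, mu_y = x.mean(), y.mean()
    var_x, var_y = x.var(), y.var()
    cov_xy = ((x - mu_x) * (y - mu_y)).mean()
    numerator = (2.0 * mu_x * mu_y + c1) * (2.0 * cov_xy + c2)
    denominator = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return float(numerator / denominator)

# An image stored in [0, 1] scored with the correct range and with L = 255.
rng = np.random.default_rng(0)
clean = rng.random((64, 64))
noisy = clean + 0.1 * rng.standard_normal((64, 64))
ssim_matched = global_ssim(clean, noisy, data_range=1.0)
ssim_mismatched = global_ssim(clean, noisy, data_range=255.0)
print(ssim_matched, ssim_mismatched)
```

When L is much larger than the actual intensity range, C1 and C2 dominate both numerator and denominator and the score is pushed toward 1, which is one way that SSIM values across methods can end up tightly clustered.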

[Comment 5]:
For a deep-learning-based method, network training is time-consuming. For all the deep-learning-based methods in each experiment, it is better to clarify the computational cost to train the proposed network framework and the related competing networks.

[Author's answer and modification]:
Thank you for your comments. In the revised manuscript, we have detailed the required training time of all the investigated methods for the different applications, as marked in blue in the Implementation Details Section.

[Comment 6]:
Regarding the application of low-dose PET/CT reconstruction, all the competing methods listed in Table 1 only use image-domain information. Since the proposed method is dual-domain, at least one state-of-the-art dual-domain image reconstruction approach should be compared.

[Author's answer and modification]:
Thanks for your comment. In the revised manuscript, we have added an additional experiment to compare our method with the recent dual-domain reconstruction framework iBP-Net [1], as the reviewer suggested. We summarize the quantitative evaluation in Table 1. Although iBP-Net exploits both domains and obtains better reconstruction performance than the other comparative methods, it cannot explicitly guarantee dual-domain consistency. In contrast, our method uses elaborately designed consistency constraints and a three-stage training scheme, and hence outperforms the investigated methods significantly.

[Comment 7]:
Regarding the application of metal artifact reduction, at least one state-of-the-art dual-domain approach should be compared, for example, references 18 and 22 cited in this paper.

[Author's answer and modification]:
Thanks for your suggestion. In the revised manuscript, we have added additional experiments using the more recent method DuDoNet, as the reviewer suggested. We summarize the quantitative evaluation on both private and public datasets in Table 3. Our method also outperforms DuDoNet in terms of both PSNR and SSIM.

[Comment 8]:
Regarding the application of fast MRI reconstruction, at least one state-of-the-art dual-domain approach should be compared, for example, references 19 and 21 cited in this paper.

[Author's answer and modification]:
Thanks for your suggestion. In the revised manuscript, we have added additional experiments using reference 19 (DuDoRNet) according to the reviewer's comment. Experimental results are summarized in Table 5. Our method outperforms DuDoRNet on both datasets and at both subsampling rates in terms of SSIM and PSNR.
Reply to Reviewer 3
Paper No: COMMSENG-23-0013
By: Jiadong Zhang, Kaicong Sun, Junwei Yang, Yan Hu, Yuning Gu, Zhiming Cui, Xiaopeng Zong, Fei Gao, Dinggang Shen.

Dear Reviewer,
Thank you very much for your affirmation and constructive comments on our paper. Your suggestions have helped us a lot to improve the quality of our manuscript. Based on your comments and suggestions, we have made the following revisions.

[Comment 1]:
Just curious, in PET/CT reconstruction and PET-CT synthesis a fully connected network is employed for G_s^A and G_t^A, but in MAR, RU-Net is employed, and in fast MRI, E2EVarNet is employed. Are there any suggestions for the deployment of different architectures for different tasks? As a generalized framework, the method should work robustly no matter which network architectures are employed. The authors are encouraged to conduct experiments on one of the tasks by employing different network architectures as baseline networks.

[Author's answer and modification]:
Thank you for your constructive comments. In the revised manuscript, we have added some recommendations on the network backbone for different tasks, based on our experimental experience, in the Discussion Section (marked in blue). One can always use a UNet-shaped network as the baseline backbone for the individual generative functions G, and individually designed network structures for different applications can further improve the model performance.
In the MRI reconstruction task, as the reviewer suggested, we have used two backbones, i.e., UNet and VarNet, and we report the results in Table 6. The experimental results show that using UNet as the backbone performs better than using VarNet.

[Comment 2]:
It seems that the same metric is used in all the experimental results. A more thorough or clinically related metric may be considered for further evaluation on one of the datasets, for example SUVs for PET [Ref1, Ref2] or a radiologist report for the synthesized images.

[Author's answer and modification]:
Thanks for your comment. In the revised version, we have calculated the SUV mean bias and SUV max bias, as the reviewer suggested, to evaluate the synthesized PET images in both the low-dose PET reconstruction and CT-to-PET synthesis tasks. The SUV bias is calculated from the mean or max SUV of the ground-truth PET images and the mean or max SUV of the synthesized PET images. We report the performance of the different methods in Table 1 and Table 7. Our method has the lowest SUV bias among the compared methods, indicating its advantage.
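
For orientation, a commonly used form of relative SUV bias is given below. This is an assumption and not the authors' equation: the symbols SUV_gt and SUV_syn are placeholders for the mean or max SUV of the ground-truth and synthesized PET images, and the manuscript's definition may differ, for example by using a signed difference or omitting the percentage.

```latex
\[
\mathrm{Bias}_{\mathrm{SUV}}
  = \frac{\left| \mathrm{SUV}_{\mathrm{syn}} - \mathrm{SUV}_{\mathrm{gt}} \right|}
         {\mathrm{SUV}_{\mathrm{gt}}} \times 100\%
\]
```

Under this form, a bias of 0% means the synthesized image reproduces the ground-truth SUV statistic exactly, and larger values indicate larger quantitative deviation in the measured region.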