# Assessing the importance of magnetic resonance contrasts using collaborative generative adversarial networks

## Abstract

A unique advantage of magnetic resonance imaging (MRI) is its mechanism for generating various image contrasts depending on tissue-specific parameters, which provides useful clinical information. Unfortunately, a complete set of MR contrasts is often difficult to obtain in a real clinical environment. Recently, there have been claims that generative models such as generative adversarial networks (GANs) can synthesize MR contrasts that are not acquired. However, the poor scalability of existing GAN-based image synthesis poses a fundamental challenge to understanding the nature of MR contrasts: which contrasts matter, and which cannot be synthesized by generative models? Here, we show that these questions can be addressed systematically by learning the joint manifold of multiple MR contrasts using collaborative generative adversarial networks. Our experimental results show that the exogenous contrast provided by contrast agents is not replaceable, but endogenous contrasts such as T1 and T2 can be synthesized from other contrasts. These findings provide important guidance for the acquisition-protocol design of MR in clinical environments.

A preprint version of the article is available at ArXiv.

## Main

One of the most important advantages of magnetic resonance imaging (MRI) over other imaging modalities is its unique image contrast mechanism. Signal contrasts in MRI arise from various biophysical parameters depending on the tissue microenvironment, and careful design of the imaging pulse sequence allows a specific contrast to be emphasized while other contrasts are minimized. Since different combinations of various MR contrast images convey different information about pathologies, multiple MR contrasts such as T1 weighted (T1), gadolinium contrast enhanced T1 weighted (T1Gd), T2 weighted (T2) and T2-FLAIR (fluid-attenuated inversion recovery) (T2F), where T1 is the spin–lattice relaxation time and T2 the spin–spin relaxation time, are often acquired for accurate diagnosis and segmentation of the cancer margin and radiomic studies1,2,3. Accordingly, MRI is one of the most valuable medical imaging modalities for cancer diagnosis and therapeutic indication.

Unfortunately, a complete set of MR contrast images is often difficult to obtain due to the different acquisition protocols at different institutions for multicentre studies, and long acquisition times. Moreover, it is impossible to use gadolinium contrast agents for some patients with kidney failure or allergic responses. Even with the acquisition of a complete MR contrast data set, some of the data cannot be used due to operator errors and patient motion during the data acquisition. Without a complete set of different contrasts, the subsequent analysis can be prone to substantial biases and errors that may reduce the statistical efficiency of the analysis4, and accurate segmentation of the whole tumour, tumour core and effective tumour core may not be possible.

Moreover, in some situations, although multiple-contrast images are available, some of the images suffer from systematic errors. For example, a synthetic MRI technique such as Magnetic Resonance Image Compilation (MAGiC, GE Healthcare)5 has become popular by enabling the generation of the various contrast MR images using a multidynamic multiecho (MDME) scan. While MAGiC can provide clinically useful synthetic MR images with various contrasts, such as T1, T2 and T2-FLAIR, some of the synthetic contrasts have readily recognizable artefacts5,6,7. In particular, the characteristic granulated hyperintense artefacts apparent in the margins along the cerebrospinal fluid–tissue boundaries with MAGiC FLAIR can be mistaken for true pathological conditions such as meningeal disease or subarachnoid haemorrhage in clinical practice. Furthermore, flow and/or noise artefacts are more frequent in MAGiC FLAIR than in conventional FLAIR. This often leads to an additional MR acquisition to confirm the diagnosis, which increases scan time and cost and inconveniences the patient.

In clinical research dealing with non-imaging (numerical or text) big data sets, it is common to fill missing data with substituted data, instead of reacquiring all the data as a complete set in a similar situation. Statistically, this process, called missing data imputation, allows us to use the standard statistical analysis for the complete data set because the missing data can be derived from the collected data. Thanks to the enormous success of deep neural networks, the problem of missing data imputation has now become relevant in the field of imaging data8,9,10,11,12. Typically, the problem of missing image imputation can be formulated as an image translation problem from one domain to another13,14, and its performance has been greatly improved with the advance of generative adversarial networks (GANs)15. The main purpose of the GAN architecture is to generate realistic images. A typical GAN consists of two neural networks: a generator and a discriminator. The discriminator attempts to find features that distinguish fake images from real ones, while the generator learns to synthesize images so that it is difficult for the discriminator to judge images as real or fake. After training both neural networks, the generator produces realistic outputs that cannot be distinguished as fake samples by the discriminator. Since the introduction of the original GAN15, many ingenious extensions have appeared. For example, for translation between two domains A and B, CycleGAN13 constructs two generators, GA→B and GB→A, and two discriminators, DA and DB, so that the images between the two domains can be successfully translated by cycle-consistency loss16. In another variation, to handle more than two domains, Choi et al. proposed StarGAN14, which used shared feature learning using a single generator and a single discriminator. 
Using the concatenated input image with a target domain vector, the generator produces a fake image, which is classified as the target domain by the discriminator.

Inspired by the success of GAN-based image translation techniques, there have been many attempts to generate new or additional MR contrasts. For example, Dar et al. proposed MR contrast synthesis with a conditional GAN and an additional perceptual loss17. Specifically, they used Pix2pix18 and CycleGAN to translate the MR contrast images between T1- and T2-weighted images. Welander et al. compared the performance of CycleGAN and UNIT19 in the task of translation between T1- and T2-weighted images20. Furthermore, there have been several studies that translated the images between MR and computed tomography using a similar cycle-consistency loss16,21,22. Meanwhile, Hagiwara et al. proposed conditional GAN-based frameworks to generate desired FLAIR images from a MAGiC data set by a two-step approach23. Since MAGiC FLAIR gave rise to synthetic artefacts, they tried to remove these and to improve the quality of the synthetic MRI by utilizing Pix2pix after the MAGiC.

Despite this success, handling multiple inputs is still challenging for existing image-to-image translation approaches. For example, for translating among a number N of domains using CycleGAN, it is necessary to train N(N − 1) generators for each pair of domains, and N discriminators for each domain. Therefore, CycleGAN requires a large number of neural networks in a multidomain setting, since it is trained without feature sharing among the multiple domains. Although StarGAN14 addresses the multiple domain mapping, it cannot exploit the redundancies across MR contrast images to reconstruct the output contrast, since StarGAN is designed to utilize only one input.
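The counting argument above can be made concrete with a few lines of arithmetic. This is only an illustrative sketch: the function name is hypothetical, and the counts follow the formula quoted in the text (N(N − 1) generators and N discriminators).

```python
def cyclegan_network_count(n_domains):
    """Networks needed for pairwise CycleGAN translation among n domains,
    using the counts quoted in the text."""
    generators = n_domains * (n_domains - 1)  # one generator per ordered pair of domains
    discriminators = n_domains                # one discriminator per domain
    return generators, discriminators


# Number of networks for 2, 4 and 8 domains.
counts = {n: cyclegan_network_count(n) for n in (2, 4, 8)}
```

For the four contrasts considered here (N = 4), this gives 12 generators and 4 discriminators, which illustrates why pairwise translation scales poorly as contrasts are added.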

In fact, this poor scalability of existing GAN-based image translation approaches poses a fundamental challenge for us to understand the nature of the MR contrast imputation problem: that is, which contrast matters? With the eye-catching success of generative models, clinicians have realized that it is crucial to understand clearly which MR contrast is genuinely indispensable, and cannot be synthesized using a generative model. This is especially important for adopting artificial intelligence successfully in clinical decision making, since there have been many hyped claims that any MR contrast can be successfully synthesized from a very limited set of data (limited MR contrasts). These claims are based on the belief that the success of MR contrast is due to redundancies in different contrasts. Although some MR contrasts represent specific biological features, most of them represent mixed biological information and provide redundant information. Therefore, to understand the nature of MR contrast imputation, we need to know the degree of redundancy across many different MR contrasts to determine what kinds of information can or cannot be generated. However, such an analysis is not trivial, since the understanding of the redundancies across multiple MR contrasts requires complete knowledge of joint image manifolds, which is considered a complicated machine-learning task.

To address general image imputation problems in computer vision and image processing, we recently developed an image imputation method called the collaborative generative adversarial network (CollaGAN)24, which reconstructs missing images by learning the redundancies across many image pairs. In CollaGAN, a set of images from the entire range of domains is treated as a complete set, and the network is trained to estimate missing images by synergistically combining the information from the multiple inputs. The power of this method has been successfully demonstrated to generate facial expressions, lighting conditions and so forth24.

Here, we show that CollaGAN can be used to systematically understand the importance of each specific MR contrast, and to reveal which contrasts are indispensable and cannot be reproduced by the generative model. For example, Fig. 1 illustrates how CollaGAN can be used for the case of MR contrast imputation problems, where a missing contrast in any order can be estimated from the remaining contrasts. The collaborative processing of the multiple-domain input images is very important in MR contrast imputation problems, since it is impossible to find the accurate pixel intensity without understanding the image manifold across different contrasts. This may appear similar to MAGiC, which calculates the voxel intensity from multicontrast MR images from a MDME scan. However, in contrast to MAGiC, the collaborative learning with CollaGAN also utilizes the semantic and/or structural information beyond the pixel-wise relationship, so more systematic studies on the MR contrast can be performed. Moreover, unlike CycleGAN, CollaGAN utilizes a single discriminator and a single generator to reconstruct the image of all the domains so that the generator can effectively exploit the multiple-domain redundancy by learning the high-dimensional manifold structure across images. Specifically, by estimating a specific contrast from the rest, we can understand the joint manifold structure across multiple contrasts to determine which contrast is most essential and cannot be generated effectively. This is very important in a clinical environment, since we can reduce the unnecessary examinations while retaining the most essential ones.

## Synthesizing MR contrasts using CollaGAN

To validate the use of CollaGAN in understanding the essential MR contrast, we first perform a quantitative study comparing the segmentation performance by replacing one real contrast with a synthesized contrast. Here, we utilized the multimodal brain tumour image segmentation benchmark (BraTS, 2015)2,3. All the scans from BraTS consist of T1, T1Gd, T2, T2F and the ground-truth segmentation labels for brain tumours. The segmentation performances were evaluated using five different BraTS data sets: Original, T1Colla, T1GdColla, T2Colla and T2FColla. The data sets with subscript ‘Colla’ represent the data sets with the substitution of a specific contrast by the reconstructed contrast from CollaGAN. Here, for brain-tumour segmentation, we used the state-of-the-art segmentation network known as the convolutional neural network with variational autoencoder regularization25 with some minor modifications.

Figure 2 shows the segmentation results for the five different BraTS data sets. As shown in Fig. 2, the segmentation network performs well to find the whole-tumour, the tumour-core and the enhancing-tumour-core maps on the original BraTS data set. The segmentation maps from the synthetic BraTS data sets (T1Colla, T2Colla, T2FColla and T1GdColla) produced similar results to the ground-truth and the result maps from the original BraTS. For quantitative evaluation, we measured the segmentation performance in terms of the Dice similarity score26 between the prediction map, Ypred, and the ground truth, Ygt:

$${\rm{Dice}}\left( {Y_{\rm{gt}},Y_{\rm{pred}}} \right) = \frac{2|Y_{\rm{gt}} \cap Y_{\rm{pred}}|}{|Y_{\rm{gt}}| + |Y_{\rm{pred}}|}$$

where | · | denotes the cardinality of a set (the number of elements it contains). The segmentation network achieves 0.8531 ± 0.0869/0.7368 ± 0.1850/0.7066 ± 0.2717 (mean ± s.d., N = 28) Dice scores for whole tumour/tumour core/enhancing tumour core, respectively, with the original BraTS data sets. When the original T1-weighted images are replaced with the images reconstructed by CollaGAN (T1Colla), the Dice scores reach 0.8567 ± 0.0882/0.7342 ± 0.1857/0.6979 ± 0.2718 for whole tumour/tumour core/enhancing tumour core, respectively, without any additional training or fine-tuning process. The segmentation performances for the original data set and for the T1Colla, T2Colla and T2FColla data sets are very similar, as shown in Fig. 3. The results validate that the contrast images reconstructed by CollaGAN for the data sets T1Colla, T2Colla and T2FColla are very similar to the original contrast images from the original BraTS data set.
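As a minimal sketch of the Dice similarity score defined above, the following computes it for two binary masks stored as flat lists. The function name and the handling of two empty masks are assumptions for illustration and are not taken from the evaluation code used in this study.

```python
def dice_score(y_gt, y_pred):
    """Dice similarity between two binary masks given as flat sequences of 0/1."""
    if len(y_gt) != len(y_pred):
        raise ValueError("masks must have the same number of voxels")
    # |Y_gt ∩ Y_pred|: voxels labelled positive in both masks.
    intersection = sum(1 for g, p in zip(y_gt, y_pred) if g and p)
    # |Y_gt| + |Y_pred|: total number of positive voxels in each mask.
    total = sum(1 for g in y_gt if g) + sum(1 for p in y_pred if p)
    # Assumption: two empty masks are treated as a perfect match.
    if total == 0:
        return 1.0
    return 2.0 * intersection / total


gt = [1, 1, 1, 0, 0, 0]
pred = [1, 1, 0, 1, 0, 0]
score = dice_score(gt, pred)  # 2 * 2 / (3 + 3)
```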

However, the injection of the gadolinium contrast agent provides additional tissue information, so the postcontrast T1-weighted (T1Gd) images show an important role in the segmentation of tumour core and enhancing tumour core, as shown in the performance drop of the segmentation results using T1GdColla. While the performance drop from CollaGAN-reconstructed T1Gd images using the other contrasts is relatively small for whole tumour and tumour core in Fig. 3, the performance drop in enhancing tumour core is statistically significant. This experiment provides a systematic understanding that the information of contrast injection is still indispensable unless an additional diagnostic evaluation is performed. This is as expected given the wide use of the MR contrast agent and the clearly different biologic features of this contrast (vascularity of the tumour) as compared with the other MR contrasts.

## CollaGAN can accurately reconstruct endogenous MR contrasts

Although the previous experiment showed that the exogenous contrast of external origin from the intravenous gadolinium injection could not be synthesized accurately by the generative model, it also provided promising results indicating that endogenous MR contrasts that originate from the intrinsic properties within tissue and cells may be estimated from the remaining contrasts. Thus, we investigated whether collaborative learning can essentially overcome the limitation of MAGiC images. As shown in Fig. 4, accurate contrast was generated using CollaGAN by synergistically utilizing the redundancies across the remaining contrasts. In contrast to CycleGAN and StarGAN, which utilize a single input MR image, accurate reconstructions of the voxel intensity are only possible by synergistically combining multiple-contrast information via CollaGAN (Fig. 5). To verify the clinical efficacy of the method, the reconstructed MR contrast images underwent radiological evaluation. CollaGAN performs very well not only for the brain MR images from the normal subjects, but also for the brain scans from the subjects with lesions (Fig. 6a,b). The hyperintensity signal of the cerebrospinal fluid space (circled in yellow in Fig. 6a) compared with the other hemisphere is well reconstructed with both MAGiC T2-FLAIR and T2-FLAIR. Here, MAGiC T2-FLAIR and T2-FLAIR refer to the synthetic T2-FLAIR from MAGiC and the true T2-FLAIR contrast from additional acquisition, respectively. The cortical and sulcal abnormality (circled in yellow in Fig. 6b) is also visible for the reconstructed MAGiC T2-FLAIR and T2-FLAIR. The lesions of the subjects are better reconstructed than on the original scans. On the other hand, even if there exists a systemic artefact with synthetic MAGiC T2-FLAIR, CollaGAN still reconstructs the artefact-free T2-FLAIR results with the help of the collaborative input (Fig. 6c,d). The focal sulcal hyperintensity (yellow arrow in Fig. 6c) is only visible for T2-FLAIR (both original and reconstructed) and not visible in MAGiC T2-FLAIR images. Since the synthetic images (T1-FLAIR, T2 weighted, MAGiC T2-FLAIR) from MAGiC cannot capture the aforementioned hyperintensity, it is usual to acquire an additional scan of T2-FLAIR to detect the lesion. However, the hyperintensity lesion was detected on the reconstructed T2-FLAIR by CollaGAN. Moreover, in the reconstructed MAGiC T2-FLAIR, there is a pseudolesion (yellow arrow in Fig. 6d), which is not visible in either the original or the reconstructed T2-FLAIR. The radiologist concludes that the reconstructed conventional T2-FLAIR contrast from CollaGAN not only reflects the original contrast well, but also removes the systemic artefacts from MAGiC well. If the CollaGAN-based approach were implemented in the synthetic MRI by reconstructing the specific desired contrast MR images without any artefacts, we could save the scan time by avoiding the additional scan for accurate clinical diagnosis.

## Synthesized images pass a ‘visual Turing test’ by human experts

Given that the synthesized images can fool the discriminator of CollaGAN and the segmentation network, we also performed a visual evaluation of the image quality to see whether generated images could fool real radiologists. Specifically, we additionally performed visual Turing tests27,28 using reconstructed MR images as follows. We randomly sampled 116 images from the real data and fake (synthesized) data in four different contrasts of BraTS data sets (total 928 images). We performed a blind test to check whether the presented images were seen as real or fake. Specifically, the samples were examined by a board-certified neuroradiologist and a board-certified oncologist in a blind manner. Most of the synthesized MR images (70.1%) were evaluated as real, while 69.1% of the real images were checked as real. See Supplementary Table 1a for the details. We also performed another Turing test using 272 MR images of the MAGiC data sets. As shown in Supplementary Table 1b, similar results were obtained. With these results of the Turing tests by the human experts, we confirmed that most of the images generated by CollaGAN not only deceived the neural networks, but also had a good visual quality in the human rating.

## Conclusion

We employed a recently developed architecture, CollaGAN, to systematically investigate the essential MR contrast for imaging studies and to test which contrasts could and could not be reproduced by generative models. This was made possible by CollaGAN, which can impute missing images by synergistically learning the joint image manifold of multiple MR contrasts. Our experimental results using a BraTS segmentation task revealed that the gadolinium contrast agent was indispensable and the resulting contrast images could not be completely reproduced by generative models. For the case of intrinsic contrasts such as T2-FLAIR, we demonstrated that CollaGAN reconstructed the specific contrast MR images without any artefacts, such that scanning time is saved by avoiding additional scans for accurate clinical diagnosis. Our proposed CollaGAN model can be utilized for other types of imaging study to investigate which contrasts are essential and which contrasts are redundant.

## Methods

### Background theory for CollaGAN

Here, we explain our CollaGAN framework, which handles multiple inputs to generate more realistic output for image imputation. For ease of explanation, we assume that there are four types (N = 4) of domain: a, b, c and d. To handle the multiple inputs using a single generator, we train the generator to synthesize the output image in the target domain, $$\widehat {x_{\rm{a}}}$$, using a collaborative mapping from the set of images in the other domains, $$\{x_{\rm{a}}\}^{\rm{C}} = \{x_{\rm{b}}, x_{\rm{c}}, x_{\rm{d}}\}$$, where the superscript C denotes the complementary set. This mapping is formally described by

$$\widehat{x_k} = {\it{G}}\left( {\left\{ {x_k } \right\}^{\rm{C}};k } \right)$$
(1)

where $$k \in \{{\rm{a}}, {\rm{b}}, {\rm{c}}, {\rm{d}}\}$$ denotes the target domain index that guides the generation of the output for the proper target domain, k. As there are N combinations of a single output and its corresponding complementary set of multiple inputs, we randomly choose among these combinations during the training so that the generator learns the various mappings to the multiple target domains.
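The random choice of target domain during training can be sketched as follows. This is an illustration only: the function name, the dictionary layout and the placeholder image strings are hypothetical and do not come from the CollaGAN implementation.

```python
import random

DOMAINS = ("a", "b", "c", "d")


def sample_training_case(images, rng):
    """Pick a target domain k at random and split the complete set into
    (complementary inputs {x_k}^C, target index k, target image x_k)."""
    k = rng.choice(DOMAINS)
    inputs = {d: img for d, img in images.items() if d != k}
    return inputs, k, images[k]


rng = random.Random(0)  # fixed seed only to make the sketch reproducible
images = {d: f"image_{d}" for d in DOMAINS}  # placeholders standing in for real images
inputs, k, target = sample_training_case(images, rng)
```

Each call yields one of the N possible input and output combinations, so over many iterations the single generator sees every target domain.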

One of the key concepts for the proposed method is multiple cycle consistency. Since the original cycle-consistency loss cannot be defined for the multiple inputs, the cyclic loss should be redefined. Suppose that the fake output from the forward cycle for the generator, G, is $$\widehat {x_{\rm{a}}}$$. Then, we can generate N − 1 new input sets by combining the fake output, $$\widehat {x_{\rm{a}}}$$, with the inputs $$x_{\rm{b}}$$, $$x_{\rm{c}}$$ and $$x_{\rm{d}}$$. Using these new input combinations, the generator synthesizes the reconstructed outputs, $$\tilde x_{ \cdot |{\rm{a}}}$$, for the backward flow of the cycle. For example, when N = 4, there are three combinations of multiple inputs and single output, so we can reconstruct the three images of original domains using a backward flow of the generator as

$$\begin{array}{l}\tilde x_{{\rm{b}}|{\rm{a}}} = {\it{G}}\left( {\left\{ {\widehat{x_{\rm{a}}},x_{\rm{c}},x_{\rm{d}}} \right\};\,{\rm{b}}} \right)\\ \tilde x_{{\rm{c}}|{\rm{a}}} = {\it{G}}\left( {\left\{ {\widehat{x_{\rm{a}}},x_{\rm{b}},x_{\rm{d}}} \right\};\,{\rm{c}}} \right)\\ \tilde x_{{\rm{d}}|{\rm{a}}} = {\it{G}}\left( {\left\{ {\widehat{x_{\rm{a}}},x_{\rm{b}},x_{\rm{c}}} \right\};\,{\rm{d}}} \right)\end{array}$$

Then, the associated multiple-cycle-consistency loss can be defined as follows:

$${\it{L}}_{\rm{mcc,a}} = ||x_{\rm{b}} - \tilde x_{{\rm{b}}|{\rm{a}}}||_1 + ||x_{\rm{c}} - \tilde x_{{\rm{c}}|{\rm{a}}}||_1 + ||x_{\rm{d}} - \tilde x_{{\rm{d}}|{\rm{a}}}||_1$$

where $$|| \cdot ||_1$$ is the L1 norm. In general, the multiple-cycle-consistency loss for a target domain k can be written as

$${\it{L}}_{{\rm{mcc}},k} = \mathop {\sum}\nolimits_{k\prime \ne k} ||x_{k\prime } - \tilde x_{k\prime |k}||_1$$
(2)

where

$$\tilde x_{k\prime |k} = G\left( {\left\{ {\widehat {x_k}} \right\}^{\it{C}};k^\prime } \right)$$
(3)
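The forward and backward flows behind Eqs. (2) and (3) can be sketched as below. The generator here is a deliberately trivial, hypothetical stand-in (a voxel-wise mean of its inputs) so that the example runs without a trained network; images are flat lists, and all names are illustrative assumptions.

```python
DOMAINS = ("a", "b", "c", "d")


def toy_generator(inputs, target):
    """Hypothetical stand-in for G: voxel-wise mean of the inputs.
    It ignores the target index, unlike a trained generator."""
    imgs = list(inputs.values())
    return [sum(v) / len(imgs) for v in zip(*imgs)]


def l1(x, y):
    """L1 distance between two flat images."""
    return sum(abs(a - b) for a, b in zip(x, y))


def multiple_cycle_consistency(images, k, generator):
    """L_mcc,k: one forward pass to domain k, then N - 1 backward passes
    that reuse the fake output in place of the real x_k."""
    # Forward cycle: estimate x_k from its complementary set.
    fake_k = generator({d: x for d, x in images.items() if d != k}, k)
    loss = 0.0
    for k_prime in DOMAINS:
        if k_prime == k:
            continue
        # Backward cycle: swap in the fake x_k and leave out the domain being reconstructed.
        backward_inputs = {
            d: (fake_k if d == k else x) for d, x in images.items() if d != k_prime
        }
        recon = generator(backward_inputs, k_prime)
        loss += l1(images[k_prime], recon)
    return loss


images = {"a": [9.0], "b": [0.0], "c": [3.0], "d": [6.0]}
loss_a = multiple_cycle_consistency(images, "a", toy_generator)
```

With the toy generator the loss is non-zero whenever the contrasts differ; a trained generator would be optimized to drive it towards zero.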

To use a single generator, we need a mask vector to guide the generator to the target domain. The mask vector is a one-hot encoding vector that represents the target domain. When it is fed into the encoder part of G (Fig. 5 left), it is enlarged to the same spatial dimensions as the input images so that it can be easily concatenated with them. The mask vector has N class channel dimensions to represent the target domain as one-hot encoding along the channel dimension. This is a simplified version of the mask vector that was originally introduced in StarGAN14.
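A minimal sketch of how such a mask could be built and concatenated is given below, assuming images are stored as lists of two-dimensional channels. The function names and data layout are illustrative assumptions rather than the layout used in the actual network.

```python
DOMAINS = ("a", "b", "c", "d")


def mask_channels(target, height, width):
    """One-hot target vector expanded to N constant channels of size height x width."""
    one_hot = [1.0 if d == target else 0.0 for d in DOMAINS]
    return [[[value] * width for _ in range(height)] for value in one_hot]


def concat_channels(image_channels, mask):
    """Channel-wise concatenation of an image (list of 2D channels) with the mask channels."""
    return image_channels + mask


image = [[[0.5, 0.2], [0.1, 0.9]]]  # one channel, 2 x 2
encoder_input = concat_channels(image, mask_channels("c", 2, 2))
```

Here the encoder input has one image channel followed by N = 4 mask channels, of which only the channel for target domain c is filled with ones.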

### Discriminator loss

As mentioned before, the discriminator has two roles: one is to classify the source, which is real or fake, and the other is to classify the type of domain, which is class a, b, c or d. Therefore, the discriminator loss consists of two parts: adversarial loss and domain-classification loss. This can be realized using the two subpaths Dgan and Dclsf in a single discriminator that shares the same neural network weights for feature extraction except the last layers for subpaths.

Specifically, the adversarial loss is necessary to make the generated images as realistic as possible. The regular GAN loss may lead to the vanishing gradients problem during the learning process29,30. To overcome this problem and improve the robustness of the training, the adversarial loss of least-square GAN29 was utilized instead of the original GAN loss. In particular, for the optimization of the discriminator, Dgan, the following loss is minimized:

$${\it{L}}_{\rm{gan}}^{\rm{dsc}}\left( {D_{\rm{gan}}} \right) = {\Bbb E}_{x_k}\left[ {\left( {D_{\rm{gan}}\left( {x_k} \right) - 1} \right)^2} \right] + {\Bbb E}_{\tilde x_{k|k}}\left[ {\left( {D_{\rm{gan}}\left( {\tilde x_{k|k}} \right)} \right)^2} \right]$$

whereas the generator is optimized by minimizing the loss

$${\it{L}}_{\mathrm{gan}}^{{\mathrm{gen}}}\left( G \right) = {\Bbb E}_{\tilde x_{k|k}}\left[ {\left( {D_{\mathrm{gan}}\left( {\tilde x_{k|k}} \right) - 1} \right)^2} \right]$$

where $$\tilde x_{k|k}$$ is defined in (3).
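The two least-squares objectives can be sketched on plain lists of discriminator outputs as follows. The expectations are replaced by batch means, and the function names are hypothetical.

```python
def mean(values):
    return sum(values) / len(values)


def lsgan_discriminator_loss(d_real, d_fake):
    """Least-squares discriminator loss: push outputs on real images towards 1
    and outputs on generated images towards 0."""
    return mean([(r - 1.0) ** 2 for r in d_real]) + mean([f ** 2 for f in d_fake])


def lsgan_generator_loss(d_fake):
    """Least-squares generator loss: push discriminator outputs on generated images towards 1."""
    return mean([(f - 1.0) ** 2 for f in d_fake])


# A discriminator that separates real from fake perfectly.
d_real = [1.0, 1.0]
d_fake = [0.0, 0.0]
disc_loss = lsgan_discriminator_loss(d_real, d_fake)
gen_loss = lsgan_generator_loss(d_fake)
```

In this extreme case the discriminator loss is zero while the generator loss is at its largest for outputs in [0, 1], which reflects the opposing goals of the two networks.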

Next, the domain-classification loss consists of two parts: $${\it{L}}_{{\mathrm{clsf}}}^{{\mathrm{real}}}$$ and $${\it{L}}_{{\mathrm{clsf}}}^{{\mathrm{fake}}}$$. They are the cross-entropy losses for domain classification from the real images and the fake images, respectively. Recall that the goal of training G is to generate the image properly classified to the target domain. Thus, we first need a best classifier Dclsf that should be trained only with the real data to guide the generator properly. Accordingly, we first minimize the loss $${\it{L}}_{{\mathrm{clsf}}}^{{\mathrm{real}}}$$ to train the classifier Dclsf, then $$L_{\mathrm{clsf}}^{\mathrm{fake}}$$ is minimized by training G with fixed Dclsf so that the generator can be trained to generate samples that can be classified correctly.

Specifically, to optimize Dclsf, the following $${\it{L}}_{\mathrm{clsf}}^{\mathrm{real}}$$ should be minimized with respect to Dclsf:

$${\it{L}}_{{\mathrm{clsf}}}^{{\mathrm{real}}}\left( {D_{\mathrm{clsf}}} \right) = {\Bbb E}_{x_k}\left[ { - {\mathrm{log}}\left( {D_{\mathrm{clsf}}\left( {k;x_k} \right)} \right)} \right]$$
(4)

where Dclsf(k; xk) can be interpreted as the probability of correctly classifying the real input xk as the class k. On the other hand, the generator G should be trained to generate fake samples which are properly classified by the Dclsf. Thus, the following loss should be minimized with respect to G:

$${\it{L}}_{{\mathrm{clsf}}}^{{\mathrm{fake}}}\left( {\it{G}} \right) = {\Bbb E}_{\hat x_{k|k}}\left[ { - {\mathrm{log}}\left( {D_{\mathrm{clsf}}\left( {k;\hat x_{k|k}} \right)} \right)} \right]$$
(5)
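Both classification losses in Eqs. (4) and (5) are cross-entropy terms of the same form; they differ only in whether real or generated images are fed to the classifier and in which network is updated. A minimal sketch, assuming the classifier output is already a probability vector over the N domains (the function name is hypothetical):

```python
import math


def domain_classification_loss(probabilities, target_indices):
    """Mean cross-entropy -log D_clsf(k; x) over a batch.

    probabilities: one probability vector over the domains per image.
    target_indices: index of the true (or intended) domain k per image.
    """
    losses = [-math.log(p[k]) for p, k in zip(probabilities, target_indices)]
    return sum(losses) / len(losses)


probs = [[0.7, 0.1, 0.1, 0.1], [0.25, 0.25, 0.25, 0.25]]
targets = [0, 2]
loss = domain_classification_loss(probs, targets)
```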

### SSIM loss

SSIM is one of the state-of-the-art metrics to measure image quality31. The L2 loss, which is widely used for image restoration tasks, has been reported to cause blurring artefacts in the results32,33,34. SSIM is one of the perceptual metrics and it is also differentiable, so it can be backpropagated34. The SSIM for pixel p is defined as

$${\mathrm{SSIM}}\left( p \right) = \frac{{2\mu _X\mu _Y + {\it{C}}_1}}{{\mu _X^2 + \mu _Y^2 + {\it{C}}_1}} \frac{{2\sigma _{XY} + {\it{C}}_2}}{{\sigma _X^2 + \sigma _Y^2 + {\it{C}}_2}}$$
(6)

where $$\mu_X$$ is the average of X, $$\sigma_X^2$$ is the variance of X and $$\sigma_{XY}$$ is the covariance of X and Y; $$\mu_Y$$ and $$\sigma_Y^2$$ are defined analogously for Y.

There are two variables to stabilize the division: $$C_1 = (k_1L)^2$$ and $$C_2 = (k_2L)^2$$. L is the dynamic range of the pixel intensities. $$k_1$$ and $$k_2$$ are constants: by default, $$k_1 = 0.01$$ and $$k_2 = 0.03$$. Since the SSIM is defined between 0 and 1, the loss function for the SSIM can be written as

$${\it{L}}_{{\mathrm{SSIM}}}\left( {{\it{X,Y}}} \right) = - {\mathrm{log}}\left( {\frac{1}{{2|P|}}\mathop {\sum }\limits_{p \in P(X,Y)} \left( {1 + {\mathrm{SSIM}}\left( p \right)} \right)} \right)$$
(7)

where P denotes the pixel location set and |P| is its cardinality. The SSIM loss was applied as an additional multiple-cycle-consistency loss as follows:

$${\it{L}}_{{\mathrm{mcc}{{-}\mathrm{SSIM}},k}} = \mathop {\sum}\nolimits_{k\prime \ne k} {{\it{L}}_{{\mathrm{SSIM}}}\left( {x_{k^\prime },\tilde x_{k\prime |k}} \right)}$$
(8)
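Equations (6) and (7) can be sketched as below. As a simplifying assumption, the statistics are computed over a whole patch rather than over a local window around each pixel p, and the function names are illustrative.

```python
import math


def ssim(x, y, dynamic_range=1.0, k1=0.01, k2=0.03):
    """SSIM of two equally sized patches (flat lists), following Eq. (6).

    Assumption: means, variances and covariance are taken over the whole patch.
    """
    n = len(x)
    mu_x, mu_y = sum(x) / n, sum(y) / n
    var_x = sum((a - mu_x) ** 2 for a in x) / n
    var_y = sum((b - mu_y) ** 2 for b in y) / n
    cov_xy = sum((a - mu_x) * (b - mu_y) for a, b in zip(x, y)) / n
    c1, c2 = (k1 * dynamic_range) ** 2, (k2 * dynamic_range) ** 2
    luminance = (2 * mu_x * mu_y + c1) / (mu_x ** 2 + mu_y ** 2 + c1)
    structure = (2 * cov_xy + c2) / (var_x + var_y + c2)
    return luminance * structure


def ssim_loss(ssim_values):
    """Eq. (7): negative log of the mean of (1 + SSIM(p)) / 2 over pixel locations."""
    return -math.log(sum(1.0 + s for s in ssim_values) / (2 * len(ssim_values)))


patch = [0.1, 0.4, 0.6, 0.9]
same = ssim(patch, patch)                       # identical patches
different = ssim(patch, [0.2, 0.4, 0.6, 0.8])   # similar but not identical
```

Identical patches give an SSIM of 1 (up to rounding) and hence a loss of zero, while any mismatch lowers the SSIM and raises the loss.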

### Generator

CollaGAN consists of a single pair of networks: a generator, G, and a discriminator, D. For the generator, we redesigned the U-net35 structure with the following three modifications: the CCNL unit, the multibranched encoder and the channel attention, as shown in Fig. 5.

First, the modified U-net basically consists of the CCNL unit instead of the CBR unit (series of convolution, batch normalization and ReLU layer) in the original U-net architecture. Similar to the multiresolution approach of GoogLeNet36, the CCNL unit has two branched inputs: the 1 × 1 convolution layer and the 3 × 3 convolution layer. The two convolution layers are concatenated and pass through the leaky-ReLU layer as shown in Fig. 5. It is important to utilize the 1 × 1 convolution layer since the voxel-wise synthesis of the reconstruction is necessary as well as the 3 × 3 convolution feature extraction for a large receptive field. Thus, two branches of feature information are processed in parallel in CCNL units.

Second, we designed a multibranched encoder for individual feature extraction for each input image (Fig. 5 left). The generator consists of two parts: the encoder and the decoder. In the encoding step, each image is encoded separately by four branches. Here, the mask vector is concatenated to every input image to extract the proper features for the target domain. Then, the encoded features are concatenated at the end of the encoder and the concatenated features are fed into the decoder with the contracting paths between the encoder and the decoder. Since the inputs are not simply mixed in the first layer, the separated features for each contrast image are extracted with the help of the multibranched encoder.

Third, the CCAM37 is applied to the decoder part of the generator with the following modifications. The CCAM was originally designed for image translation to a mixed domain using the sym-parameterized generative network37. The CCAM selectively excludes channels and reduces the influence of unnecessary channels to generate images in a mixed domain conditioned by sym-parameters. Here, we applied channel attention in the decoder part of the generator by CCAM modules using the one-dimensional mask vector as a sym-parameter. The input mask and the average pooled input features are concatenated and pass through the attention multilayer perceptron (MLP). The channel attentions are calculated in the form of scaling weights for each channel of the input feature:

$${\mathrm{CCAM}}\left( {X,\,m} \right) = X \cdot \sigma \left( {{\mathrm{MLP}}\left( {\left[ {P_{\mathrm{avg}}\left( X \right),m} \right]} \right)} \right)$$

where X and m represent the input features and the one-dimensional input mask vector for the target domain, respectively. Pavg, σ and · are the average pooling, the sigmoid operation and element-wise multiplication, respectively. The refined features are calculated by the element-wise multiplication between the input features and the scaling weights. The CCAM module chooses the channels with the calculated attention according to the target domain and the input features.
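The channel-attention formula can be sketched as follows. To keep the example short, the attention MLP is reduced to a single linear layer with hand-set weights, which is an assumption for illustration only; the names and the feature layout (a list of two-dimensional channels) are likewise hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def ccam(features, mask, weights, biases):
    """Scale each channel of X by sigmoid(MLP([P_avg(X), m])).

    Assumption: the MLP is a single linear layer (weights, biases).
    """
    # Global average pooling: one scalar per channel.
    pooled = [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in features]
    # Concatenate the pooled features with the one-hot target mask.
    descriptor = pooled + list(mask)
    # One scaling weight in (0, 1) per channel.
    scales = [
        sigmoid(sum(w * d for w, d in zip(w_row, descriptor)) + b)
        for w_row, b in zip(weights, biases)
    ]
    # Element-wise multiplication of every channel with its scaling weight.
    refined = [[[v * s for v in row] for row in ch] for ch, s in zip(features, scales)]
    return refined, scales


features = [[[1.0, 1.0], [1.0, 1.0]], [[2.0, 2.0], [2.0, 2.0]]]  # 2 channels, 2 x 2
mask = [0.0, 1.0, 0.0, 0.0]                                      # one-hot target domain
weights = [[0.0] * 6, [0.0] * 6]                                 # hypothetical untrained weights
biases = [0.0, 0.0]
refined, scales = ccam(features, mask, weights, biases)
```

With all-zero weights every channel is scaled by 0.5; after training, the scaling would depend on both the pooled features and the target domain encoded in the mask.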

### Discriminator

To classify the contrast of the MR images, the feature extraction by the multiresolution processing is important. This kind of multiscale approach is reported to work well in the classification of MR contrasts38. The discriminator has three branches that have different scales of resolution. Specifically, the first branch extracts the features at the original scale of resolution and then reduces the size of the feature domain. Another branch processes the feature extraction on the quarter resolution scales (height/4, width/4). The last branch sequentially reduces the scales by two to extract features. These three branches are concatenated to gather the features in a multiscale manner. After this, the discriminator consists of three series of convolutions with stride two and leaky-ReLU. At the end of the discriminator, there are two output headers: one is the source classification header for real or fake images and the other is the domain classification header. PatchGAN13,18 was utilized on the source classification header to classify whether local image patches are real or fake. We also found that dropout39,40 was very effective in preventing the overfitting of the discriminator.

### Brain tumour segmentation data sets

For quantitative analysis of the reconstruction performance, the BraTS data set (available at https://www.smir.ch/BraTS/Start2015)2,3 was used. BraTS supplies routine clinically acquired 3-T multimodal MRI scans and ground-truth labels for brain tumour segmentation. The ground-truth labels were manually revised by expert board-certified neuroradiologists. The routine MRI scans consist of four different contrasts, namely T1, T1Gd, T2 and T2F volumes, and were acquired with different clinical protocols and various scanners from multiple institutions. The data set was divided into 218/28/28 subjects for the training/validation/test sets, respectively. The networks were trained using randomly ordered samples from the training set and evaluated with the images in the validation set. The results evaluated on the test set are presented as the final results. A total of five different experiments were performed for each MR contrast.
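A subject-level split of this kind keeps all images of one subject in a single set. The sketch below shows one way to produce a 218/28/28 partition; the random assignment, the seed and the subject identifiers are hypothetical, since the text does not state how subjects were allocated.

```python
import random

def split_subjects(subject_ids, n_train, n_val, n_test, seed=0):
    """Partition subjects into disjoint training, validation and test sets."""
    ids = sorted(subject_ids)
    if n_train + n_val + n_test != len(ids):
        raise ValueError("split sizes must add up to the number of subjects")
    random.Random(seed).shuffle(ids)        # reproducible shuffle (assumed)
    train = ids[:n_train]
    val = ids[n_train:n_train + n_val]
    test = ids[n_train + n_val:]
    return train, val, test

# Hypothetical identifiers for the 218 + 28 + 28 = 274 subjects
subjects = [f"subject_{i:03d}" for i in range(274)]
train, val, test = split_subjects(subjects, 218, 28, 28)
```

Splitting by subject rather than by slice avoids neighbouring slices of the same brain appearing in both the training and the test sets.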

### Tumour segmentation algorithm

A semantic segmentation network for brain tumour segmentation from three-dimensional MRIs using autoencoder regularization25 achieved the top performance score in the BraTS 2018 challenge. We implemented the segmentation network with some modifications to handle memory efficiently.

The segmentation network consists of a shared encoder part and two decoder branches. The encoder has an asymmetrically larger convolutional neural network architecture than the decoder, to extract the features from the inputs. To fit within the graphics processing unit memory, we replaced the three-dimensional convolution layers with two-dimensional convolution layers and performed 2.5-dimensional segmentation instead of three-dimensional segmentation, using multiple neighbouring slices of the MR images to predict a single segmentation label map. We chose five slices (the centre slice and the two adjacent slices on each of the dorsal and ventral sides) as input to find the tumour segmentation map of the centre slice. The encoder used blocks that each consisted of two convolutions with group normalization41 and ReLU, followed by an additive identity skip connection. After the two unit blocks at each spatial level, the image dimensions were progressively downsized by a factor of two using strided convolutions, and the feature size was simultaneously increased by a factor of two.
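The five-slice input used for 2.5-dimensional segmentation can be assembled as in the following sketch. The function name and the handling of slices near the volume boundary (repeating the outermost slice) are assumptions; the text does not say how edge slices were treated.

```python
import numpy as np

def stack_neighbours(volume, centre, half_width=2):
    """Return the centre slice and its neighbours as a (2*half_width+1, H, W) stack.

    volume: array of shape (D, H, W), slices ordered along the first axis.
    Indices outside the volume are clamped to the first or last slice (assumed).
    """
    depth = volume.shape[0]
    idx = np.arange(centre - half_width, centre + half_width + 1)
    idx = np.clip(idx, 0, depth - 1)
    return volume[idx]

# Toy volume with 10 slices of 4 x 4 voxels
volume = np.arange(10 * 4 * 4, dtype=np.float32).reshape(10, 4, 4)
stack = stack_neighbours(volume, centre=5)   # slices 3, 4, 5, 6, 7
edge = stack_neighbours(volume, centre=0)    # slices 0, 0, 0, 1, 2
```

With several MR contrasts as input, the same stacking would be applied to each contrast and the results concatenated along the channel axis before the two-dimensional convolutions.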

One branch of the decoder produces the segmentation map. It reconstructs a segmentation map for each of the following three tumour subregions: the whole tumour, the tumour core and the enhanced tumour core. This branch uses the same blocks as the encoder, but with a single block at each spatial level. The other branch of the decoder is for regularization. This additional variational autoencoder branch reconstructs the input image itself to regularize the shared encoder during the training phase. The variational autoencoder branch is attached to the encoder endpoint and is similar to an autoencoder architecture, providing additional guidance and regularization for the encoder part.

### Synthetic MR data sets

We prepared four types of contrast for 280 axial brain images from 10 subjects. The subjects were scanned with the MDME sequence and an additional T2-FLAIR sequence. Synthetic T1-FLAIR (T1F), T2 and MAGiC T2F images were generated by MAGiC5 from the MDME scans. The MR scan parameters were as follows. T1-FLAIR: repetition time TR 2,500 ms, echo time TE 10 ms, inversion time TI 1,050 ms, flip angle FA 90°. T2 weighted: TR 3,500 ms, TE 128 ms, FA 90°. MAGiC T2-FLAIR: TR 9,000 ms, TE 95 ms, TI 2,408 ms, FA 90°. The additional T2-FLAIR (T2F) scans were acquired with different scan parameters, namely TR 9,000 ms, TE 93 ms, TI 2,471 ms, FA 160°. The following parameters were common to the four scans: field of view 220 × 220 mm², 320 × 224 acquisition matrix, 4.0 mm slice thickness. The MR images were divided by subject into the training (224 images), validation (28 images) and test (28 images) sets. The networks were trained with randomly ordered samples from the training set and evaluated with the images in the validation set. The results evaluated on the test set are presented. A total of four different experiments were performed for each MR contrast.

### Data preprocessing and implementation details

The MR images were normalized to have unit standard deviation on the basis of the non-zero voxels only. For data augmentation, we applied a random scale (0.9–1.1) and a random flip in the lateral-to-lateral direction with a probability of 0.5. For the experiments on the MAGiC data set, the learning rate started from 0.00001 and decayed exponentially every 400 steps during 1,000 epochs. Weight decay (L2 regularization) was applied only to the generator, with a regularization parameter of 0.01, and was not applied to the discriminator. Dropout with a rate of 0.5 was applied to the last layers of the discriminator. The weights of the losses were as follows: the SSIM cycle-consistency loss, the adversarial loss and the classification losses of the generator and the discriminator were weighted by 1, while the multiple-cycle-consistency loss was weighted by 10. The generator was trained every epoch, while the discriminator was trained once every even epoch. For the experiments on the BraTS data set, the learning rate started from 0.000001 and decayed exponentially every 10,000 steps. L2 regularization was applied to the whole network with a parameter of 0.00001. The loss term consists of the sum of the L2-loss term and the Kullback–Leibler divergence term with a scale of 0.1. All the details of the hyperparameters and the training procedure can also be found at https://github.com/jongcye/CollaGAN_MRI.
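The normalization, the flip augmentation and the learning-rate schedule can be sketched as follows. This is an illustration under stated assumptions, not the released training code: the decay factor (`decay_rate=0.96`), the staircase form of the schedule, the choice of the last array axis as the lateral axis and all function names are hypothetical, and the random scaling step is omitted.

```python
import numpy as np

def normalize_nonzero(image, eps=1e-8):
    """Scale an image so that its non-zero voxels have unit standard deviation."""
    nonzero = image[image != 0]
    if nonzero.size == 0:
        return image.astype(np.float64)
    return image / (nonzero.std() + eps)

def random_lateral_flip(image, rng, p=0.5, axis=-1):
    """Flip along the (assumed) lateral axis with probability p."""
    if rng.random() < p:
        return np.flip(image, axis=axis)
    return image

def learning_rate(step, base_lr=1e-5, decay_steps=400, decay_rate=0.96):
    """Exponential decay applied once every decay_steps steps.

    base_lr and decay_steps follow the MAGiC settings in the text;
    decay_rate is a hypothetical value, as the text does not report it.
    """
    return base_lr * decay_rate ** (step // decay_steps)

# Toy image: zero background with a non-zero 4 x 4 region
rng = np.random.default_rng(0)
image = np.zeros((8, 8))
image[2:6, 2:6] = rng.uniform(1.0, 5.0, size=(4, 4))
normed = normalize_nonzero(image)
flipped = random_lateral_flip(normed, rng, p=1.0)   # p=1.0 forces the flip
```

Restricting the statistics to non-zero voxels keeps the large zero-valued background from shrinking the estimated standard deviation.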

### Reporting Summary

Further information on research design is available in the Nature Research Reporting Summary linked to this article.

## Data availability

BraTS data are available at https://www.smir.ch/BraTS/Start2015. The MAGiC data sets are available at https://github.com/jongcye/CollaGAN_MRI.

## Code availability

The CollaGAN code, together with the hyperparameters and the training procedure, is available at https://doi.org/10.5281/zenodo.3567003.

## References

1. Drevelegas, A. & Papanikolaou, N. in Imaging of Brain Tumors with Histological Correlations (ed. Drevelegas, A.) 13–33 (Springer, 2011).
2. Menze, B. H. et al. The Multimodal Brain Tumor Image Segmentation Benchmark (BRATS). IEEE Trans. Med. Imaging 34, 1993–2024 (2015).
3. Bakas, S. et al. Advancing The Cancer Genome Atlas glioma MRI collections with expert segmentation labels and radiomic features. Sci. Data 4, 170117 (2017).
4. Baraldi, A. N. & Enders, C. K. An introduction to modern missing data analyses. J. School Psychol. 48, 5–37 (2010).
5. Tanenbaum, L. N. et al. Synthetic MRI for clinical neuroimaging: results of the Magnetic Resonance Image Compilation (MAGiC) prospective, multicenter, multireader trial. Am. J. Neuroradiol. 38, 1103–1110 (2017).
6. Hagiwara, A. et al. Synthetic MRI in the detection of multiple sclerosis plaques. Am. J. Neuroradiol. 38, 257–263 (2017).
7. Hagiwara, A. et al. SyMRI of the brain: rapid quantification of relaxation rates and proton density, with synthetic MRI, automatic brain segmentation, and myelin measurement. Invest. Radiol. 52, 647 (2017).
8. Krizhevsky, A., Sutskever, I. & Hinton, G. E. Imagenet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems 25 (NIPS 2012) (eds Pereira, F. et al.) 1097–1105 (Neural Information Processing Systems Foundation, 2012).
9. Zhang, K., Zuo, W., Chen, Y., Meng, D. & Zhang, L. Beyond a Gaussian denoiser: residual learning of deep CNN for image denoising. IEEE Trans. Image Process. 26, 3142–3155 (2017).
10. Dong, C., Loy, C. C., He, K. & Tang, X. Image super-resolution using deep convolutional networks. IEEE Trans. Pattern Anal. Mach. Intell. 38, 295–307 (2016).
11. Xie, J., Xu, L. & Chen, E. Image denoising and inpainting with deep neural networks. In Advances in Neural Information Processing Systems 25 (NIPS 2012) (eds Pereira, F. et al.) 341–349 (Neural Information Processing Systems Foundation, 2012).
12. Deng, J. et al. ImageNet: a large-scale hierarchical image database. In 2009 IEEE Conference on Computer Vision and Pattern Recognition 248–255 (IEEE, 2009).
13. Zhu, J.-Y., Park, T., Isola, P. & Efros, A. A. Unpaired image-to-image translation using cycle-consistent adversarial networks. In 2017 IEEE International Conference on Computer Vision 2223–2232 (IEEE, 2017).
14. Choi, Y. et al. StarGAN: Unified generative adversarial networks for multi-domain image-to-image translation. In 2018 IEEE Conference on Computer Vision and Pattern Recognition 8789–8797 (IEEE, 2018).
15. Goodfellow, I. J. et al. Generative adversarial nets. In Advances in Neural Information Processing Systems 27 (NIPS 2014) (eds Ghahramani, Z. et al.) 2672–2680 (Neural Information Processing Systems Foundation, 2014).
16. Wolterink, J. M. et al. Deep MR to CT synthesis using unpaired data. In International Workshop on Simulation and Synthesis in Medical Imaging (eds Tsaftaris, S. et al.) 14–23 (Springer, 2017).
17. Dar, S. U. et al. Image synthesis in multicontrast MRI with conditional generative adversarial networks. IEEE Trans. Med. Imaging 38, 2375–2388 (2019).
18. Isola, P., Zhu, J.-Y., Zhou, T. & Efros, A. A. Image-to-image translation with conditional adversarial networks. In 2017 IEEE Conference on Computer Vision and Pattern Recognition 1125–1134 (IEEE, 2017).
19. Liu, M.-Y., Breuel, T. & Kautz, J. Unsupervised image-to-image translation networks. In Advances in Neural Information Processing Systems 30 (NIPS 2017) (eds Guyon, I. et al.) 700–708 (Neural Information Processing Systems Foundation, 2017).
20. Welander, P., Karlsson, S. & Eklund, A. Generative adversarial networks for image-to-image translation on multicontrast MR images—a comparison of CycleGAN and UNIT. Preprint at https://arxiv.org/abs/1806.07777 (2018).
21. Yang, H. et al. Unpaired brain MR-to-CT synthesis using a structure-constrained CycleGAN. In Deep Learning in Medical Image Analysis and Multimodal Learning for Clinical Decision Support. DLMIA 2018, ML-CDS 2018 (Lecture Notes in Computer Science Vol. 11045) 174–182 (Springer, 2018).
22. Hiasa, Y. et al. Cross-modality image synthesis from unpaired data using CycleGAN. In International Workshop on Simulation and Synthesis in Medical Imaging (eds Gooya, A. et al.) 31–41 (Springer, 2018).
23. Hagiwara, A. et al. Improving the quality of synthetic FLAIR images with deep learning using a conditional generative adversarial network for pixel-by-pixel image translation. Am. J. Neuroradiol. 40, 224–230 (2019).
24. Lee, D., Kim, J., Moon, W.-J. & Ye, J. C. CollaGAN: collaborative GAN for missing image data imputation. In 2019 IEEE Conference on Computer Vision and Pattern Recognition 2487–2496 (IEEE, 2019).
25. Myronenko, A. 3D MRI brain tumor segmentation using autoencoder regularization. In International Conference on Medical Image Computing and Computer-Assisted Intervention Brainlesion Workshop (eds Crimi, A. et al.) 311–320 (Springer, 2018).
26. Dice, L. R. Measures of the amount of ecologic association between species. Ecology 26, 297–302 (1945).
27. Salimans, T. et al. Improved techniques for training GANs. In Advances in Neural Information Processing Systems 29 (NIPS 2016) (eds Lee, D. D. et al.) 2234–2242 (Neural Information Processing Systems Foundation, 2016).
28. Shrivastava, A. et al. Learning from simulated and unsupervised images through adversarial training. In 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 2107–2116 (IEEE, 2017).
29. Mao, X. et al. Least squares generative adversarial networks. In 2017 IEEE International Conference on Computer Vision (ICCV) 2813–2821 (IEEE, 2017).
30. Arjovsky, M., Chintala, S. & Bottou, L. Wasserstein GAN. Preprint at https://arxiv.org/abs/1701.07875 (2017).
31. Wang, Z., Bovik, A. C., Sheikh, H. R. & Simoncelli, E. P. Image quality assessment: from error visibility to structural similarity. IEEE Trans. Image Process. 13, 600–612 (2004).
32. Ledig, C. et al. Photo-realistic single image super-resolution using a generative adversarial network. In 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) Vol. 2, 4 (IEEE, 2017).
33. Mathieu, M., Couprie, C. & LeCun, Y. Deep multi-scale video prediction beyond mean square error. Preprint at https://arxiv.org/abs/1511.05440 (2015).
34. Zhao, H., Gallo, O., Frosio, I. & Kautz, J. Loss functions for image restoration with neural networks. IEEE Trans. Comput. Imaging 3, 47–57 (2017).
35. Ronneberger, O., Fischer, P. & Brox, T. U-net: convolutional networks for biomedical image segmentation. In International Conference on Medical Image Computing and Computer-Assisted Intervention (eds Navab, N. et al.) 234–241 (Lecture Notes in Computer Science Vol. 9351, Springer, 2015).
36. Szegedy, C. et al. Going deeper with convolutions. In 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 1–9 (IEEE, 2015).
37. Chang, S., Park, S., Yang, J. & Kwak, N. Image translation to mixed-domain using sym-parameterized generative network. Preprint at https://arxiv.org/abs/1811.12362 (2018).
38. Remedios, S., Pham, D. L., Butman, J. A. & Roy, S. Classifying magnetic resonance image modalities with convolutional neural networks. Proc. SPIE 10575, 105752I (2018).
39. Hinton, G. E., Srivastava, N., Krizhevsky, A., Sutskever, I. & Salakhutdinov, R. R. Improving neural networks by preventing co-adaptation of feature detectors. Preprint at https://arxiv.org/abs/1207.0580 (2012).
40. Srivastava, N., Hinton, G., Krizhevsky, A., Sutskever, I. & Salakhutdinov, R. Dropout: a simple way to prevent neural networks from overfitting. J. Mach. Learn. Res. 15, 1929–1958 (2014).
41. Wu, Y. & He, K. Group normalization. In 2018 IEEE European Conference on Computer Vision (ECCV) 3–19 (IEEE, 2018).

## Acknowledgement

This research was supported by the National Research Foundation (NRF) of Korea grant NRF-2016R1A2B3008104.

## Author information

J.C.Y. supervised the project in conception and discussion. D.L. and J.C.Y. designed the experiments and analysis. D.L. performed all experiments and analysis. W.-J.M. prepared the MAGiC MRI databases and performed the qualitative assessment of the results. D.L., W.-J.M. and J.C.Y. wrote the manuscript.

Correspondence to Jong Chul Ye.

## Ethics declarations

### Competing interests

The authors declare no competing interests.

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
