Conditional generative learning for medical image imputation

Image imputation refers to the task of generating one type of medical image given images of another type. This task becomes challenging when the difference between the available images and the image to be imputed is large. In this manuscript, one such application is considered. It is derived from dynamic contrast-enhanced computed tomography (CECT) imaging of the kidneys: given an incomplete sequence of three CECT images, we are required to impute the missing image. This task is posed as one of probabilistic inference, and a generative algorithm that produces samples of the imputed image, conditioned on the available images, is developed, trained, and tested. The outputs of this algorithm are the “best guess” of the imputed image and a pixel-wise image of the variance in the imputation. It is demonstrated that this best guess is more accurate than those generated by other, deterministic deep-learning-based algorithms, including ones that utilize additional information and more complex loss terms. It is also shown that the pixel-wise variance image, which quantifies the confidence in the reconstruction, can be used to determine whether the result of the imputation meets a specified accuracy threshold and is therefore appropriate for a downstream task.
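As an illustration of how these two outputs can be obtained from a trained conditional generator, the sketch below computes the best guess as the pixel-wise sample mean and the confidence map as the pixel-wise sample variance over a set of conditional samples. These estimator choices, and the names summarize_imputation, generator, available_images, and latent_dim, are illustrative assumptions rather than the paper's exact implementation.

```python
import torch

def summarize_imputation(samples: torch.Tensor):
    """Summarize N conditional samples of the imputed image (shape: N x H x W).

    Assumption for illustration: the "best guess" is the pixel-wise sample
    mean and the confidence map is the pixel-wise sample variance.
    """
    best_guess = samples.mean(dim=0)    # pixel-wise mean over the N samples
    variance_map = samples.var(dim=0)   # pixel-wise variance over the N samples
    return best_guess, variance_map

# Hypothetical usage with a trained conditional generator (names are placeholders):
# samples = torch.stack([generator(available_images, torch.randn(latent_dim))
#                        for _ in range(128)])
# best_guess, variance_map = summarize_imputation(samples)
```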

where α : R^{N_z} → R^{C′} and β : R^{N_z} → R^{C′} are learnable convolution layers that take z as input, and ⊙ and ⊕ denote the following operations: ⊙ represents element-wise multiplication, in which each element of one tensor is multiplied by the corresponding element of the other tensor, and ⊕ represents summation in the channel direction, in which the values in each channel of the tensors are summed together. The operators µ(w_j) and σ(w_j) compute the mean and the fluctuations, respectively, along the spatial directions. Each feature of the intermediate tensor is therefore decomposed into a mean component and a fluctuating component, both of which are stochastic through their dependence on z via the learned layers α and β.
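For concreteness, the following is a minimal PyTorch sketch of this conditional instance normalization step. Realizing α and β as 1 × 1 convolutions applied to a reshaped z, and the small constant added to σ for numerical stability, are assumptions made for illustration.

```python
import torch
import torch.nn as nn

class ConditionalInstanceNorm(nn.Module):
    """Sketch of conditional instance normalization (CIN).

    alpha, beta : R^{N_z} -> R^{C'} are realized here as 1x1 convolutions
    (an assumption; the paper's exact layer configuration may differ).
    """
    def __init__(self, n_z: int, n_channels: int, eps: float = 1e-5):
        super().__init__()
        self.alpha = nn.Conv2d(n_z, n_channels, kernel_size=1)
        self.beta = nn.Conv2d(n_z, n_channels, kernel_size=1)
        self.eps = eps  # avoids division by zero; not part of the stated formula

    def forward(self, w: torch.Tensor, z: torch.Tensor) -> torch.Tensor:
        # w: (B, C', H, W) intermediate tensor; z: (B, N_z) latent variable
        z = z.view(z.size(0), -1, 1, 1)           # reshape z so the 1x1 convolutions apply
        mu = w.mean(dim=(2, 3), keepdim=True)     # spatial mean of each feature
        sigma = w.std(dim=(2, 3), keepdim=True)   # spatial fluctuation of each feature
        w_hat = (w - mu) / (sigma + self.eps)     # normalized (fluctuating) component
        # element-wise scaling by alpha(z) and channel-wise shift by beta(z)
        return self.alpha(z) * w_hat + self.beta(z)
```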

Dense Block
DenseBlocks are a type of building block used in convolutional neural networks for image classification tasks. They were introduced in the DenseNet architecture, a deep neural network that connects each layer to every other layer in a feed-forward fashion. We use the notation DenseBlock(k, n), where k represents the number of filters of size 3 and stride 1 in the 2D convolution, and n represents the number of sub-blocks. An example of a DenseBlock with n equal to 3 is shown in Figure S1. When the DenseBlock appears in the generator, it receives two inputs: an intermediate tensor w and a latent variable z. The latent variable is incorporated into the block using conditional instance normalization (CIN). However, when the DenseBlock appears in the critic, it only receives the intermediate tensor w as input; in this case, CIN is replaced with layer normalization.
In a DenseBlock, each layer receives the feature maps of all preceding layers as input, concatenated channel-wise. This means that the output of each layer in the DenseBlock is fed as input to all subsequent layers, allowing for highly efficient information flow throughout the network. By densely connecting the layers, DenseBlocks aim to reduce the vanishing-gradient problem and improve gradient flow, which can lead to faster convergence during training.
Overall, DenseBlocks are a powerful tool for building efficient and accurate convolutional neural networks, especially for image classification tasks with limited data. They preserve the feed-forward nature of the network: each layer obtains additional inputs from all preceding layers and passes its own feature maps to all subsequent layers.
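As a concrete reference, below is a minimal PyTorch sketch of DenseBlock(k, n) in the form used by the critic, i.e. operating on the intermediate tensor w alone. The activation and padding are assumptions, and normalization (CIN in the generator, layer normalization in the critic) is omitted for brevity.

```python
import torch
import torch.nn as nn

class DenseBlock(nn.Module):
    """Sketch of DenseBlock(k, n): n sub-blocks, each a 3x3, stride-1
    convolution with k filters. Each sub-block receives the channel-wise
    concatenation of the block input and all preceding sub-block outputs.
    The LeakyReLU activation and padding are assumptions; normalization
    is omitted.
    """
    def __init__(self, in_channels: int, k: int, n: int):
        super().__init__()
        self.convs = nn.ModuleList()
        channels = in_channels
        for _ in range(n):
            self.convs.append(nn.Sequential(
                nn.Conv2d(channels, k, kernel_size=3, stride=1, padding=1),
                nn.LeakyReLU(0.2),
            ))
            channels += k  # subsequent sub-blocks also see this output

    def forward(self, w: torch.Tensor) -> torch.Tensor:
        features = [w]
        for conv in self.convs:
            # dense connectivity: concatenate all preceding feature maps
            features.append(conv(torch.cat(features, dim=1)))
        return torch.cat(features, dim=1)  # block output: input + all sub-block outputs
```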

Down-Sampling Block
The down-sampling block (Figure S2) is designed to decrease the spatial resolution while simultaneously increasing the number of channels. This block comprises a convolution layer that produces an output with twice the number of channels as the input (q equal to 2), a 2D average pooling layer that decreases the spatial dimensions by a factor p of two, and a DenseBlock. The latent variable z is also an input when this block appears in the cGAN generator.
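A corresponding sketch of the down-sampling block is given below. It reuses the DenseBlock class from the previous sketch and again omits the latent-variable input used in the generator; the convolution kernel size and the DenseBlock parameters k and n are assumptions.

```python
import torch.nn as nn

class DownSamplingBlock(nn.Module):
    """Sketch of the down-sampling block: a convolution that doubles the
    number of channels (q = 2), 2D average pooling that halves the spatial
    dimensions (p = 2), followed by a DenseBlock (class from the sketch above).
    """
    def __init__(self, in_channels: int, k: int, n: int, q: int = 2, p: int = 2):
        super().__init__()
        self.conv = nn.Conv2d(in_channels, q * in_channels, kernel_size=3, padding=1)
        self.pool = nn.AvgPool2d(kernel_size=p)              # halves H and W
        self.dense = DenseBlock(q * in_channels, k=k, n=n)   # DenseBlock sketch from above

    def forward(self, w):
        return self.dense(self.pool(self.conv(w)))
```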

Up-Sampling Block
The up-sampling block (Figure S3) takes in the output of the previous block along with the output of the down-sampling block of the same spatial size, delivered via a skip connection. These two tensors are concatenated along the channel dimension. Next, the up-sampling block carries out a convolution operation that divides the channel size by q equal to 2, followed by a 2D up-sampling operation that doubles the spatial dimension; here, a 2D nearest-neighbour interpolation increases the spatial resolution by a factor p of 2. Finally, the signal is passed through a DenseBlock.
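Finally, a sketch of the up-sampling block is shown below, under the same assumptions as the previous two sketches: the DenseBlock class is reused from above, the latent-variable path is omitted, and the convolution kernel size is assumed.

```python
import torch
import torch.nn as nn

class UpSamplingBlock(nn.Module):
    """Sketch of the up-sampling block: the previous block's output and the
    skip-connected down-sampling output are concatenated channel-wise, a
    convolution divides the channel count by q = 2, nearest-neighbour
    interpolation doubles the spatial dimensions (p = 2), and a DenseBlock
    (class from the sketch above) is applied.
    """
    def __init__(self, in_channels: int, k: int, n: int, q: int = 2, p: int = 2):
        super().__init__()
        # in_channels is the channel count after concatenating the two inputs
        self.conv = nn.Conv2d(in_channels, in_channels // q, kernel_size=3, padding=1)
        self.upsample = nn.Upsample(scale_factor=p, mode="nearest")
        self.dense = DenseBlock(in_channels // q, k=k, n=n)

    def forward(self, w_prev, w_skip):
        w = torch.cat([w_prev, w_skip], dim=1)   # channel-wise concatenation via skip connection
        return self.dense(self.upsample(self.conv(w)))
```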