Introduction

Rapid magnetic resonance (MR) imaging (MRI) techniques, such as parallel imaging (PI)1,2,3 and compressed sensing (CS)4, have significantly enhanced the cost-efficiency and expanded the range of applications of MRI. In subsequent advancements, researchers have formulated PI as a nonlinear inversion process rooted in CS principles5,6,7,8,9,10,11,12. These approaches leverage regularization techniques to jointly optimize both the anatomical image and the coil sensitivity profiles. More recently, supervised deep learning (DL) has been used with PI to facilitate MRI reconstructions from highly accelerated acquisitions13,14. One of the first supervised DL-based MRI reconstruction methods was based on a variational network (VarNet)13, in which all the free regularization parameters of the CS iterative gradient descent scheme were learned from data instead of being set empirically. In particular, the regularizer used in the VarNet was the fields of experts (FoE) model15, and the gradient descent was unrolled, yielding a deep neural network. In the more recent end-to-end VarNet (E2E VarNet)16, the gradient of the FoE was replaced with a UNET17 in each iteration of the gradient descent, resulting in improved performance18,19,20. Moreover, the E2E VarNet incorporated an additional UNET to estimate the coil sensitivity maps needed for PI from the auto-calibration signal (ACS) k-space lines.

Several other approaches21,22,23 have integrated supervised DL into the image reconstruction pipeline. Among these, the Model-Based Deep Learning (MoDL) network used a convolutional neural network-based regularization prior while enforcing data consistency through numerical conjugate-gradient optimization blocks. As in the E2E VarNet, MoDL unrolls the iteration steps to yield a deep network. The densely interconnected network (DIRCN)24 adapted the E2E VarNet using input-level dense connections to improve gradient and information flow, as in25,26. DIRCN also used long-range skip-connections to directly connect the UNETs in each gradient descent step. The Recurrent VarNet27 is another adaptation of the E2E VarNet, which replaces the traditionally used UNET with a recurrent unit. In this approach, a hidden state that stores information from the previous steps is provided as an additional input to each gradient descent step. However, the use of recurrent networks increases the memory demand of the network due to the need to accumulate more gradients in memory. Methods like Deep J-SENSE28 and Joint-ICNet29 also follow the unrolled optimization scheme of the VarNet but refine both the image and the coil sensitivity maps through an alternating optimization model, which enables the use of a small number of ACS lines to estimate accurate coil sensitivity maps. Learned DC30 learns the data likelihood model in a dynamic MRI setting to better approximate the noise distribution in k-space. CTFNet31 exploits spatiotemporal correlations simultaneously from both the frequency and the time domain. Additionally, studies on dynamic MR imaging such as CINENet32 and L+S-Net33 are able to operate on complex-valued data directly, avoiding the potential information loss associated with treating real and imaginary components as separate channels. Although networks using complex-valued data can reduce the number of parameters and accelerate convergence, they require complex arithmetic, which can lead to greater computational and memory demands, making such models more challenging to develop and optimize. More recently, score-based approaches and self-supervised or unsupervised DL methods34,35,36,37 have been introduced. For example, SURE-Score38 combines a denoiser and a score function using only noisy training data, offering a cost-effective alternative to supervised DL-based methods. Self-Score39 introduced a fully-sampled-data-free score-based diffusion model that learns the MR image prior in a self-supervised manner using Bayesian deep learning.

Cross-domain learning methods, such as the KIKI-net40, incorporate learning in both the image-space and k-space to improve MR image reconstruction. Another example of cross-domain learning is provided by41, where dynamic image reconstruction is performed by iterating across the frequency-time domain and the image domain. However, this framework does not utilize an unrolled gradient descent scheme, so it does not directly preserve the physics of parallel imaging. DCT-net42 and the method presented in43 also perform cross-domain learning. In particular, DCT-net reconstructs both the image and the undersampled k-space with two networks running in parallel, joined with transformer blocks. To our knowledge, neither DCT-net nor the method in43 has yet been applied to multi-coil parallel imaging reconstructions. DIIK-Net44 interleaves the image and the k-space in a cross-domain interaction block in each refinement module, thus performing cross-domain learning in each gradient descent step. DIIK-Net reports a slightly lower PSNR score than the XPDNet model45, which ranked below the E2E VarNet in the fastMRI public leaderboard19. Finally, IKWI-net46 performs learning using the image, k-space, and a wavelet domain. However, this approach relies on magnitude DICOM images to simulate fully-sampled raw data, which leads to unrealistic results47,48.

DL approaches for image reconstruction are nowadays incorporated into most commercial products. At the time of writing, self-supervised methods achieve competitive reconstruction performance compared to supervised methods, although the latter still maintain an edge. Notably, the E2E VarNet model and its extension, the DIRCN model, have secured the third and second positions, respectively, on the fastMRI leaderboard for 4\(\times\) accelerated reconstructions (Supporting Fig. S1). In the case of 8\(\times\) accelerations, the E2E VarNet dropped to the fourth position, whereas the Iterative Refinement with Fourier-Based Restormer reached third place49, and DIRCN maintained second place (Supporting Fig. S2). These results show that there is still room to improve image reconstruction at large undersampling factors.

The aim of this work was to improve the original E2E VarNet by implementing and evaluating three modifications. First, we adapted the network’s architecture to perform training in a feature-space instead of image-space, which preserved high-level features between the iterations of gradient descent. In our approach, the feature-space data-consistency term decodes the feature space to k-space, performs data consistency, and finally encodes back to feature space. Second, we leveraged the feature-space representation of the MR image and employed a transformer50,51. In particular, the attention mechanism in our transformer utilizes knowledge of what aliasing artifacts look like in the case of Cartesian undersampling and attenuates them in the reconstructed images. Finally, we combined our proposed feature-space approach with the image-space representation of the E2E VarNet to build a Feature-Image (FI) VarNet, in an attempt to boost performance by merging a comprehensive CNN model (E2E VarNet) and a CNN model augmented with attention mechanisms (Feature VarNet). The models were compared, and the best one was evaluated by three neuroradiologists. A preliminary version of this work was presented at the 2023 Annual Meeting of the International Society for Magnetic Resonance in Medicine52,53.

Methods

MR image reconstruction

The MR signal \(\textbf{k}_{i}\) received by the i-th coil is related to the MR image \(\textbf{x}\) by the forward problem:

$$\begin{aligned} \textbf{k}_{i} = \textbf{m} \odot \textbf{F} \left( \textbf{c}_{i} \odot \textbf{x} \right) , \quad i = 1, \ldots , N. \end{aligned}$$
(1)

Here, N is the number of receive coils, \(\textbf{c}_{i}\) are the receive coil sensitivity profiles, \(\textbf{F}\) is the discretized Fourier transform operating on a vector with multichannel images concatenated, and \(\textbf{m}\) is the predefined undersampling mask.
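
For concreteness, a minimal PyTorch sketch of the forward model in (1) could read as follows; the centered, orthonormal FFT convention and the tensor shapes are our assumptions for illustration, not the exact implementation used here.

```python
import torch

def fft2c(x):
    """Centered, orthonormal 2D FFT (convention assumed for illustration)."""
    return torch.fft.fftshift(
        torch.fft.fft2(torch.fft.ifftshift(x, dim=(-2, -1)), norm="ortho"),
        dim=(-2, -1))

def ifft2c(k):
    """Centered, orthonormal 2D inverse FFT."""
    return torch.fft.fftshift(
        torch.fft.ifft2(torch.fft.ifftshift(k, dim=(-2, -1)), norm="ortho"),
        dim=(-2, -1))

def forward_model(x, coil_sens, mask):
    """Eq. (1): k_i = m * F(c_i * x).

    x:         complex image, shape (H, W)
    coil_sens: complex sensitivity maps c_i, shape (N, H, W)
    mask:      binary undersampling mask m, broadcastable to (N, H, W)
    """
    return mask * fft2c(coil_sens * x)  # broadcasting applies each c_i coil-wise
```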

One can solve for \(\textbf{x}\) by precomputing \(\textbf{c}\) and inverting (1) through a regularized optimization routine based on CS4,54. In particular, we can express the optimization problem as:

$$\begin{aligned} \tilde{\textbf{x}} = \mathop {\textrm{argmin}}\limits _{\textbf{x}} \frac{1}{2} \sum \limits _{i=1}^{N} \left\Vert \textbf{m} \odot \textbf{F} \left( \textbf{c}_{i} \odot \textbf{x} \right) - \tilde{\textbf{k}}_{i}\right\Vert _2^2 + \lambda \varvec{\mathcal {Q}}\left\{ \textbf{x}\right\} , \end{aligned}$$
(2)

where \(\varvec{\mathcal {Q}}\) is the regularizer, \(\lambda\) is its weighting factor, and \(\tilde{\textbf{k}}_{i} = \textbf{m} \odot \textbf{k}_{i}\) is the undersampled measured k-space (signal) data. The solution of (2) for a fully sampled k-space and \(\lambda = 0\) is the inverse Fourier transform. If \(\varvec{\mathcal {Q}}\) is differentiable, the inverse problem can be solved with a few gradient descent iteration steps as:

$$\begin{aligned} \textbf{k}^{j+1} = \textbf{k}^{j} - \eta ^{j} \textbf{m} \odot \left( \textbf{k}^{j}-\tilde{\textbf{k}} \right) + \lambda \textbf{F}\mathcal {E}\frac{\partial \varvec{\mathcal {Q}}\left( \mathcal {R}\left( \textbf{F}^{-1} \textbf{k}^{j}\right) \right) }{\partial {\textbf{k}}}, \end{aligned}$$
(3a)
$$\begin{aligned} \mathcal {E}\left\{ z\right\} = \left[ \textbf{c}_1 \odot z, \dots , \textbf{c}_N \odot z \right] , \end{aligned}$$
(3b)
$$\begin{aligned} \mathcal {R}\left\{ z_1, \dots , z_N \right\} = \sum \limits _{i=1}^N \textbf{c}^*_i \odot z_i . \end{aligned}$$
(3c)

Here, \(\textbf{F}\) is the discretized Fourier transform operating on a vector of single-channel images, and \(\textbf{k}\) is the vector containing the k-space data from all individual coils. \(\mathcal {E}\) is the expand operator, which multiplies the coil-combined image element-wise with each \(\textbf{c}_{i}\) to form the individual coil images. \(\mathcal {R}\) is the reduce operator, which multiplies element-wise the conjugate of each \(\textbf{c}_{i}\) with the corresponding coil image and sums over the coil channels. \(\eta ^{j}\) is the learning rate of the gradient descent and j is the iteration number. Finally, the individual coil images are given by \(\textbf{x}_{i} = |\textbf{F}^{-1}\textbf{k}_{i}|\) and the coil-combined image is the root sum of squares of the individual coil images.
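
As an illustration, the expand and reduce operators of (3b) and (3c) map directly onto a few tensor operations; this sketch reuses the helpers above and assumes the same shapes.

```python
def expand(z, coil_sens):
    """E{z} of Eq. (3b): weight the coil-combined image z (H, W) by each
    sensitivity profile, returning individual coil images (N, H, W)."""
    return coil_sens * z.unsqueeze(0)

def reduce(coil_images, coil_sens):
    """R{z_1, ..., z_N} of Eq. (3c): conjugate-weighted sum over coils,
    mapping (N, H, W) coil images back to a single (H, W) image."""
    return (coil_sens.conj() * coil_images).sum(dim=0)
```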

Variational network

The above-described inverse problem remains inherently ill-posed at high undersampling rates6. This happens because the regularization techniques normally employed in CS rely on hand-crafted parameters that may not be suitable to reconstruct the complex details of the image55. In addition, a poor choice of these priors might result in excessively smooth images or under-regularized noisy images. Motivated by these limitations, the VarNet13 embedded CS into a deep learning framework, where the gradient of the regularizer of (2) is learned from data, resulting in a physics-based reconstruction network. In particular, in the VarNet, \(\varvec{\mathcal {Q}}^{j}\) is an FoE model15 (a generalization of total variation) for the j-th gradient descent iteration, where all its parameters, including \(\lambda\), are learned from data. The network is trained using an unrolled gradient descent scheme26,56,57,58 where the neural network weights in \(\varvec{\mathcal {Q}}^{j}\) are updated at each step. In the original VarNet, the coil sensitivities \(\textbf{c}\) are computed with the ESPIRiT method9 and passed as an additional input to the network along with the undersampled k-space.

The E2E VarNet16 addressed the limited expressive power of the FoE by substituting the gradients of \(\varvec{\mathcal {Q}}^{j}\) with a UNET17, due to the UNET’s capacity to learn complex representations and its capability to model objects at different scales. In addition, \(\textbf{c}\) is also learned in parallel with the gradient of \(\varvec{\mathcal {Q}}^{j}\) during the training process of the E2E VarNet. This is done with an additional UNET that takes the low-resolution image generated using the ACS lines of k-space as an input and outputs the sensitivity maps. Each of the \(J_\mathrm{ima}\) unrolled gradient descent steps (cascade25) of the E2E VarNet is

$$\begin{aligned} \textbf{k}^{j+1} = \textbf{k}^{j} - \eta ^{j} \textbf{m}\odot \left( \textbf{k}^{j} - \tilde{\textbf{k}}\right) + \textbf{F} \mathcal {E}\left\{ \varvec{\mathcal {N}}^{j}\left( \mathcal {R}\left\{ \textbf{F}^{-1} \textbf{k}^{j} \right\} \right) \right\} , \end{aligned}$$
(4)

where \(\tilde{\textbf{k}}\) is the undersampled measured k-space signal concatenated as a vector over all coil channels, \(\varvec{\mathcal {N}}^{j}\) is a convolutional neural network, and \(j = 1, \ldots , J_\mathrm{ima}\) with \(J_\mathrm{ima}\) being the number of the gradient descent steps. Individual coil images are reconstructed using the inverse Fourier transform from the fully-sampled k-space obtained at the last gradient descent step and combined using the root sum of squares to obtain the final image. The parameters of all \(\varvec{\mathcal {N}}^{j}\), the learning rate of the gradient descent \(\eta ^{j}\), and the sensitivities \(\textbf{c}\) are learned by minimizing a cost-function between the combined image and the ground-truth image \(\hat{\textbf{x}}\). The metric used in the cost-function can be any metric of choice, such as the mean and normalized mean squared error (MSE, NMSE), the peak signal-to-noise ratio (PSNR)59, or the structural similarity index measure (SSIM)60, among others.
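
A single unrolled step of (4) can be written compactly; the sketch below reuses the `fft2c`/`ifft2c`, `expand`, and `reduce` helpers defined earlier and, for brevity, assumes the CNN acts directly on the complex coil-combined image (in practice, the real and imaginary parts are split into two channels).

```python
def e2e_cascade(k, k_tilde, mask, coil_sens, eta, cnn):
    """One E2E VarNet cascade, Eq. (4).

    k:       current multi-coil k-space estimate, shape (N, H, W)
    k_tilde: measured undersampled k-space, shape (N, H, W)
    eta:     learned step size (a scalar nn.Parameter in practice)
    cnn:     the network N^j acting on the coil-combined image
    """
    dc = eta * mask * (k - k_tilde)              # soft data-consistency term
    image = reduce(ifft2c(k), coil_sens)         # R{F^-1 k^j}
    reg = fft2c(expand(cnn(image), coil_sens))   # F E{N^j(...)}
    return k - dc + reg
```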

VarNet architecture modifications

Feature-space encoding

In the E2E VarNet architecture and other unrolled optimization-based models61, most of the high-level features are discarded in the last convolutional layers of each cascade to obtain the update of the image or k-space in order to perform data consistency (4). In particular, the number of \(\varvec{\mathcal {N}}^{j}\) output channels decreases from a high number (usually set to 32) to 2 (to represent the real and imaginary part of the image-space update). Nevertheless, the remaining 30 feature channels could contain useful information for the reconstruction. Here, we propose a different approach, dubbed Feature VarNet, where we use the unrolled gradient descent algorithm as in the E2E VarNet (4), but we perform the updates of the gradient descent in a feature-space (\(\textbf{f}\)), instead of the k-space (\(\textbf{k}\)) or image-space. This approach allows us to maintain a high number of feature channels across the network’s cascades. In particular, we introduce an encoder (\(\mathcal {A}\)) neural network that maps \(\textbf{k}\) to \(\textbf{f}\), and a decoder (\(\mathcal {B}\)) neural network that maps \(\textbf{f}\) to \(\textbf{k}\). By substituting \(\textbf{f} = \mathcal {A}\left( \textbf{k}\right)\) and \(\textbf{k} = \mathcal {B}\left( \textbf{f}\right)\), the gradient descent step of (4) can be updated as:

$$\begin{aligned} \textbf{f}^{j+1} = \textbf{f}^{j} - \eta ^{j} \mathcal {A}\left( \mathcal {R}\left\{ \textbf{F}^{-1} \textbf{m} \odot \left( \textbf{F}\mathcal {E}\left\{ \mathcal {B}\left( \textbf{f}^{j}\right) \right\} - \tilde{\textbf{k}} \right) \right\} \right) - \varvec{\mathcal {N}}^{j}\left( \textbf{f}^j\right) , \end{aligned}$$
(5)

where \(j = 1, \ldots , J_\mathrm{fea}\) with \(J_\mathrm{fea}\) being the number of the gradient descent steps (Fig. 1 top). In this feature-space representation, each \(\varvec{\mathcal {N}}^{j}\) network directly produces a feature tensor with a high number of channels (32), \(\mathcal {B}\) decodes it to k-space, and data consistency is performed. The updated k-space is encoded into a feature tensor (32 channels) with \(\mathcal {A}\) and is passed to \(\varvec{\mathcal {N}}^{j+1}\). As a result, the tensor maintains its high number of channels throughout all \(\varvec{\mathcal {N}}\), avoiding the information bottleneck that occurs when the channels are reduced to 2.
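
A hedged sketch of the feature-space update (5) is given below, again reusing the helpers above; the decoder \(\mathcal {B}\) outputs the coil-combined image that is expanded and Fourier transformed to k-space, and the data-consistency residual is encoded back to feature space. The complex-to-channel handling of the encoder and decoder is omitted for brevity.

```python
def feature_cascade(f, k_tilde, mask, coil_sens, eta, cnn, enc, dec):
    """One Feature VarNet cascade, Eq. (5); f has shape (q, H, W)."""
    k = fft2c(expand(dec(f), coil_sens))              # F E{B(f^j)}
    residual = mask * (k - k_tilde)                   # m (F E{B(f^j)} - k~)
    dc_f = enc(reduce(ifft2c(residual), coil_sens))   # A(R{F^-1 ...})
    return f - eta * dc_f - cnn(f)                    # feature-space update
```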

Figure 1
figure 1

Neural network architectures. In Feature VarNet (a), the coil sensitivities are estimated as in the E2E VarNet with a UNET and are passed to the Feature Cascades (b). The k-space is encoded in feature-space, and the resulting feature maps are processed using the update rule of (5) in each cascade (\(j = 1, \dots , J_\mathrm{fea}\)). The attention module (red square block) precedes the neural network in the Feature VarNet cascades. The attention module is incorporated by first reshaping the feature maps (tensors) to blocks, then applying attention, and finally reshaping the output feature maps back to their original dimensions, before they are processed by the neural network in the cascade. A representative example of our reshaping approach is shown in (c) for a feature tensor with 1 channel and width and height equal to 12. The acceleration factor is equal to 4. Due to Cartesian undersampling, the aliasing artifacts appear at regular intervals of \(N = W/R = 3\) pixels. In the Feature VarNet, the output feature map of the last cascade is decoded into k-space, which is then inverse Fourier transformed to obtain the individual coil images. In the FI VarNet architecture (d), the feature map of the last cascade is decoded into k-space, which is further processed by the update rule of (4) in the \(j = 1, \dots , J_\mathrm{ima}\) image cascades. The output k-space of the last image cascade is inverse Fourier transformed to obtain the individual coil images. For both networks, the final reconstructed image is obtained as the root sum of squares combination of the individual coil images. Both Feature and FI VarNets are trained end-to-end.

We can either enforce consistency by using the same encoder and decoder to and from feature-space throughout all cascades, or use independent encoders and decoders across the network. For most of this work we used a consistent encoder and decoder, except in the model ablations (Model Ablations), where we experimented with distinct encoders and decoders. We used single convolutional layers with a kernel size of 5 and padding of 2 to represent the encoder and the decoder. Specifically, the encoder mapped the 2 input channels, corresponding to the real and imaginary part of the image, to a predefined number of feature channels \(q=32\), and the decoder mapped back from \(q=32\) to 2 channels. Both the encoder and decoder were used without an activation function. The parameters of all \(\varvec{\mathcal {N}}^{j}\), the encoder, and the decoder were learned from the training data.
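
In PyTorch terms, the encoder and decoder described above reduce to two lines; this sketch follows the stated hyperparameters (kernel size 5, padding 2, no activation), while the variable names are ours.

```python
import torch.nn as nn

q = 32  # number of feature channels

# Single-layer encoder/decoder: 2 channels (real/imaginary) <-> q features
encoder = nn.Conv2d(in_channels=2, out_channels=q, kernel_size=5, padding=2)
decoder = nn.Conv2d(in_channels=q, out_channels=2, kernel_size=5, padding=2)
```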

Block-wise attention

In the proposed Feature VarNet, all \(\varvec{\mathcal {N}}^{j}\) can be UNETs, as in the E2E VarNet. In this work we propose to precede each UNET with a self-attention layer51. First, we added a positional encoding to the input features to provide spatial information to the attention mechanism. Next, we processed the input with dilated convolutions to compute the query, key, and value embeddings used to calculate the attention weights. The attention weights are then used to attend to the value embeddings and produce the output features. Finally, the output features are projected back to the shape of the original input features using a convolution with a \(1 \times 1\) kernel, and the two are added together to produce the final output.
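
Organizationally, the layer might look as follows; the dilation rate, embedding size, and form of the positional encoding are our assumptions, and `block_attention` (the block-wise product described next) is sketched after the reshaping paragraph below.

```python
import torch
import torch.nn as nn

class ArtifactAttention(nn.Module):
    """Sketch of the self-attention layer: positional encoding, dilated
    convolutions for Q/K/V, block-wise attention, 1x1 projection, residual."""

    def __init__(self, channels: int, dilation: int = 2):
        super().__init__()
        self.qkv = nn.Conv2d(channels, 3 * channels, kernel_size=3,
                             padding=dilation, dilation=dilation)
        self.proj = nn.Conv2d(channels, channels, kernel_size=1)

    def forward(self, x, pos_enc, R):
        h = x + pos_enc                            # add positional encoding
        q, k, v = self.qkv(h).chunk(3, dim=1)      # dilated-conv embeddings
        # batch size 1, as used during training; drop the batch dimension
        out = block_attention(q[0], k[0], v[0], R).unsqueeze(0)
        return x + self.proj(out)                  # 1x1 projection + skip
```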

The matrix multiplications in the attention mechanism were performed in blocks to reduce computational complexity. In particular, we reshaped the feature tensor into a block-based representation that helps the model identify the spatial location of the aliasing artifacts caused by the Cartesian undersampling. For example, consider a feature tensor of dimensions \(C \times H \times W\) (channels, height, width) and acceleration rate R. First, we collect the elements of the tensor that are \(N = W/R\) elements apart along the width into R tall matrices of size \(C\times (H \cdot N)\), since the aliasing artifacts appear at regular intervals of N voxels along the phase-encoding direction. If the width is not divisible by the acceleration factor, the tensor can be padded before the reshape. The resulting tall matrices are then concatenated to form a feature tensor of dimensions \((H \cdot N) \times C \times R\), and a batch matrix multiplication follows51, so that each spatial position attends across its R aliasing replicas. Note that N and R adapt depending on the acceleration factor. The reshaping process is depicted in Fig. 1 (bottom) for a toy example with \(C \times H \times W = 1 \times 12 \times 12\) and \(R=4\).
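
A sketch of this reshape and the batched attention product is given below, assuming W is divisible by R; the function names and the softmax scaling are our assumptions.

```python
def to_blocks(x, R):
    """Reshape (C, H, W) features into R blocks of N = W/R columns; matching
    positions across the R blocks are exactly N pixels apart, i.e. they hold
    each other's aliasing replicas. Returns shape (H*N, C, R)."""
    C, H, W = x.shape
    N = W // R                            # pad W beforehand if W % R != 0
    blocks = x.reshape(C, H, R, N)        # split the width into R blocks
    return blocks.permute(1, 3, 0, 2).reshape(H * N, C, R)

def block_attention(q, k, v, R):
    """Batched attention across the R aliasing replicas at every position."""
    C, H, W = q.shape
    qb, kb, vb = (to_blocks(t, R).transpose(1, 2) for t in (q, k, v))  # (H*N, R, C)
    weights = torch.softmax(qb @ kb.transpose(1, 2) / C ** 0.5, dim=-1)  # (H*N, R, R)
    out = weights @ vb                                                   # (H*N, R, C)
    N = W // R                            # invert the reshape back to (C, H, W)
    return out.reshape(H, N, R, C).permute(3, 0, 2, 1).reshape(C, H, W)
```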

Feature-image variational network

Given the superior performance demonstrated by cross-domain convolutional neural networks62, such as the KIKI-net40, we combined the Feature VarNet (with attention) and the E2E VarNet into a single network. The new network, dubbed FI VarNet, combines feature-space and image-space based reconstructions to improve performance.

Figure 1 (middle) presents the FI VarNet architecture. First, the coil sensitivity maps are estimated as in the E2E and Feature VarNet approaches, and the \(J_\mathrm{fea}\) gradient descent steps of equation (5) are performed (feature cascades). The resulting feature-space representation \(\textbf{f}^{J_\mathrm{fea}}\) is then decoded into a k-space representation, which is passed as the initial value \(\textbf{k}^{1}\) to (4). Equation (4) is solved for \(J_\mathrm{ima}\) gradient descent steps (image cascades) and the final image is reconstructed. We dubbed the E2E VarNet’s cascades image cascades for simplicity, even though they involve operations that bridge both k-space and image-space.

Model training

Datasets

The datasets used in the current study were obtained from the fastMRI public database (fastmri.med.nyu.edu)18,63. The fastMRI dataset includes both the raw k-space data and the ground-truth MRI images presented in this work. We combined the training (4469 volumes) and validation (1378 volumes) brain fastMRI datasets for training. We used the validation brain fastMRI dataset for validation. For the performance assessment (Performance Assessment), all models were tested on the entire brain fastMRI test dataset, which consisted of 558 volumes. Of these, 49 volumes were scanned with fluid-attenuated inversion recovery (FLAIR), 187 were T1-weighted and post-contrast T1-weighted, and 322 were T2-weighted. These ratios reflect the contrast distribution of the training and validation datasets. For the comparative study with the public fastMRI leaderboard (Leaderboard Comparison), a subset of the brain fastMRI test dataset (standard leaderboard test dataset) was used (281 volumes to evaluate 4\(\times\) acceleration and 277 volumes to evaluate 8\(\times\) acceleration). The clinical study (Clinical Evaluation) was implemented using a subset of the leaderboard test dataset, as in64, consisting of 20 cases (4 FLAIR, 5 T1-weighted, and 11 T2-weighted volumes) with abnormalities (clinical dataset). The abnormalities included postsurgical complications, vascular-related conditions, masses and tumors, and fluid-related conditions. We also used the knee fastMRI dataset to determine the generalizability of our models. Since the ground-truth images for the knee fastMRI testing dataset are not publicly available, we used the knee fastMRI validation dataset (199 volumes) for testing and the knee fastMRI training dataset (973 volumes) for training. We skipped the validation process for the knee and applied the same training hyperparameters that were previously used for the brain (see “Optimization and network configurations”).

Undersampling

We used Cartesian undersampling, retaining either \(8\%\), \(7\%\), or \(4\%\) of the central k-space lines as ACS lines and uniformly sampling the rest of k-space to achieve an acceleration factor (R) of 4, 5, or 8, respectively. All models were trained and tested using the same undersampling mask.
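
For illustration, a minimal sketch of such a mask generator is shown below; the exact line spacing needed to reach the net acceleration once the ACS block is included, and the matrix width used in the example, are assumptions.

```python
import numpy as np

def cartesian_mask(width, R, acs_fraction):
    """Equispaced Cartesian mask with a fully sampled central ACS block."""
    mask = np.zeros(width, dtype=bool)
    mask[::R] = True                          # equispaced phase-encoding lines
    n_acs = int(round(acs_fraction * width))
    start = (width - n_acs) // 2
    mask[start:start + n_acs] = True          # central ACS lines
    return mask

# e.g., R = 4 with 8% ACS lines, as described above (width is illustrative)
mask = cartesian_mask(width=368, R=4, acs_fraction=0.08)
```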

Optimization and network configurations

The optimization model used to train the neural networks in this study was based on a combination of the AdamW optimizer65 with a learning rate of 0.0003 and a learning rate scheduler to adjust the learning rate during training. We trained the networks for 210k iteration steps and used a custom step function for the learning rate scheduler. In particular, the step function gradually increased the learning rate from 0 to 0.0003 over a period of 7.5k steps, after which the learning rate remained constant for 140k steps. For the remaining steps, the learning rate switched to a cosine annealing schedule. The cosine annealing schedule was used to gradually decrease the learning rate from its maximum value down to a small value of \(10^{-8}\), ensuring good convergence without oscillations.
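
The schedule can be expressed as a single function of the step index; the sketch below matches the stated phases (7.5k warm-up steps, 140k constant steps, then cosine annealing to \(10^{-8}\) by step 210k), with the linear shape of the warm-up being an assumption.

```python
import math

def learning_rate(step, total=210_000, warmup=7_500, plateau=140_000,
                  lr_max=3e-4, lr_min=1e-8):
    """Warm-up, constant plateau, then cosine annealing."""
    if step < warmup:
        return lr_max * step / warmup          # linear warm-up from 0
    if step < warmup + plateau:
        return lr_max                          # constant phase
    t = (step - warmup - plateau) / (total - warmup - plateau)
    return lr_min + 0.5 * (lr_max - lr_min) * (1 + math.cos(math.pi * t))
```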

The UNETs, serving as the backbone for both the E2E VarNet and the proposed VarNet models, shared a similar architecture. Specifically, we employed four layers of average pooling and transpose convolutions, each with a kernel size, stride, and padding of 2, 2, and 0, respectively. The convolution layers used a kernel size of 3 with both padding and stride set to 1. Leaky ReLU activation functions were used with a negative slope of 0.2. The complex-valued k-space data were split into a two-channel real-valued representation. The input tensors to the UNETs, whether representing k-space data or features, were normalized so that each channel had a mean of 0 and a standard deviation of 1. Finally, all networks were trained with a batch size of 1.
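
Both conventions are one-liners in PyTorch; the epsilon guard below is an assumption added for numerical safety.

```python
import torch

def complex_to_channels(k):
    """Complex (H, W) k-space -> 2-channel real-valued (2, H, W) tensor."""
    return torch.view_as_real(k).movedim(-1, 0)

def normalize(x, eps=1e-11):
    """Per-channel zero-mean, unit-std normalization of a (C, H, W) tensor."""
    mean = x.mean(dim=(-2, -1), keepdim=True)
    std = x.std(dim=(-2, -1), keepdim=True)
    return (x - mean) / (std + eps), mean, std   # keep stats to undo later
```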

Model size

In this study, we compared the original E2E VarNet against the three proposed variations: Feature VarNet, Feature VarNet with attention, and FI VarNet. Despite differences in their architectures, for a fair comparison, all models were designed to have a similar number of parameters. In particular, we used 12 cascades and 32 feature channels in the UNETs for the E2E and Feature VarNets for all studied cases. In this setting the E2E VarNet required 93.6 million parameters, while Feature VarNet required an additional 0.3 million parameters for its encoder, decoder, and convolutions in the attention layer. Finally, FI VarNet required 93.8 million parameters for a 6 feature-6 image cascade architecture and 187 million parameters for a 12 feature-12 image cascade architecture.

Evaluation strategy

Quantitative evaluation

To assess the performance of all VarNet models, we conducted a quantitative evaluation using three metrics: SSIM, PSNR, and NMSE. All models were trained on 2D input-output pairs representing individual slices of the training set’s volumes. The metrics were measured by comparing the entire 3D reconstructed volume of each sample with the corresponding ground truth and computing the volume average over the entire dataset. During training, 1-SSIM, which ranges between 0 and 1 with lower values indicating better similarity, was used as the loss function to optimize the network parameters. SSIM was also used as the primary evaluation metric due to its ability to capture both structural and perceptual similarities between the predicted and ground-truth volumes66. PSNR and NMSE were also computed to provide additional quantitative measures of performance.
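
As an illustration of the training objective, the loss can be written as follows; we use the torchmetrics SSIM implementation purely for the sketch, and the data-range convention is an assumption.

```python
from torchmetrics.functional import structural_similarity_index_measure as ssim

def ssim_loss(pred, target):
    """1 - SSIM loss between magnitude images of shape (B, 1, H, W)."""
    return 1.0 - ssim(pred, target, data_range=float(target.max()))
```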

Clinical evaluation

Three neuroradiologists with 23 (Reader C), 4 (Reader B), and 2 (Reader A) years of clinical experience assessed the image quality of FI VarNet, which was found to be the best model in the quantitative evaluation among the three proposed approaches (see “Leaderboard comparison”). Each radiologist was assigned the ground-truth fully-sampled image and two undersampled reconstructions: one with the FI VarNet (12 feature-12 image cascades) and one with the pretrained E2E VarNet model16. All radiologists had knowledge of the ground-truth, but were blinded to the particular reconstruction model and performed their reviews independently. The readers were tasked to label the images as “FI VarNet”, “E2E VarNet”, or “cannot tell”, based on their overall quality. Additionally, three Likert-like grading scales were used to assess artifacts, sharpness, and contrast-to-noise ratio (CNR), in comparison to the ground-truth, similar to what was done in a previous study67. For the artifacts scale, a score of 1 indicated no artifacts present, while 2 indicated minimum artifacts that do not affect diagnostic quality. In the sharpness scale, a score of 1 indicated that the sharpness for structures and findings matched the ground-truth, while 2 indicated differences. Lastly, in the CNR scale, a score of 1 indicated equal conspicuity for structures and findings as the ground-truth, while 2 indicated differences.

Results

All models were trained on a high-performance cluster using four NVIDIA A100 Tensor Core GPUs, each equipped with 80 GB of memory.

Performance assessment

Table 1 (top) compares the average SSIM, PSNR, and NMSE between the E2E VarNet, Feature VarNet with and without attention, and FI VarNet. FI VarNet used 6 feature and 6 image cascades to ensure that differences in performance with respect to the Feature VarNet (12 cascades) and the E2E VarNet (12 cascades) were due to their architectural variations rather than their sizes. Feature VarNet outperformed the E2E VarNet in SSIM by 0.0002 and 0.0007 for \(4 \times\) and \(5 \times\) acceleration, respectively, due to the preservation of high-level features in each cascade. When the block-wise attention was incorporated, the SSIM improvement increased to 0.0004 and 0.0009 for four- and five-fold acceleration, respectively. The FI VarNet also outperformed the E2E VarNet in SSIM, by 0.0009 and 0.0011 for four- and five-fold acceleration, respectively, showing the superiority of cross-domain convolutional neural networks. Finally, the results are statistically significant, as indicated by a paired t-test68 at a \(5\%\) significance level.

Figure 2
figure 2

(top) Error during validation for the SSIM (left) and PSNR (right) for the FI, Feature (with and without attention) and E2E VarNets with respect to the ground-truth. The error is averaged for all brain volumes in the validation dataset. (middle and bottom) Histograms of differences in SSIM (left) and PSNR (right) for the FI VarNet and the Feature (with attention) VarNet with respect to the E2E VarNet for all brain volumes in the test dataset.

Table 1 Average SSIM, PSNR, and NMSE comparison on all test data for brain scans using the E2E, Feature, and FI VarNet architectures and four and five fold accelerations.

Figure 2 (left) presents the convergence of the validation error for PSNR and SSIM for both \(4 \times\) and \(5 \times\) acceleration factors. The large gain towards the end of the training is due to the cosine annealing in the optimization process. FI VarNet always maintains smaller SSIM errors than all other models at both accelerations. For PSNR, FI VarNet is similar to the Feature VarNet (with and without attention) during training and marginally outperforms them towards the end of training, except at \(5 \times\) acceleration. The Feature-based models outperform the E2E VarNet in all cases during training.

Figure 2 (middle: four-fold acceleration, bottom: five-fold acceleration) compares the SSIM and PSNR score differences of the FI VarNet and Feature VarNet (with attention) with respect to the E2E VarNet for the entire testing dataset. Both the Feature VarNet and the FI VarNet obtained larger PSNR values than the E2E VarNet for all testing cases (except a few outliers). The FI VarNet had higher SSIM values than the E2E VarNet for almost the entire testing dataset, while the Feature VarNet had lower SSIM scores than the E2E VarNet for a few cases. The latter can be attributed to training with the SSIM metric, as both models were explicitly optimized to excel in SSIM performance. The E2E VarNet’s slightly higher SSIM scores than the Feature VarNet’s in a few cases may be due to potential overfitting to specific patterns in the training data, leading to improved SSIM performance on some examples but reduced PSNR performance.

Figure 3
figure 3

Performance of the E2E, Feature (with and without attention), and FI VarNet (x axis) for the three different contrasts (FLAIR, T1-weighted, and T2-weighted) in the brain fastMRI test dataset. The average SSIM (y axis, left) and average PSNR (y axis, right) are presented for four-fold and five-fold accelerations.

Figure 3 compares the performance of the E2E VarNet, Feature VarNet (with and without attention), and FI VarNet for the three different image contrasts in the fastMRI test dataset and for both four-fold and five-fold accelerated reconstructions. The higher average SSIM and PSNR performance for both the T1-weighted and T2-weighted image datasets was expected, as the network was trained mostly on these types of contrasts. The results for the individual contrasts are in good agreement with the average results from all contrast-weighted images, except for FLAIR images with \(4\times\) acceleration, where the E2E VarNet outperformed the Feature VarNet (with and without attention) in terms of SSIM. This can explain the higher SSIM scores of the E2E VarNet over the Feature VarNet that were observed for a few cases in the top-left histogram of Fig. 2.

Figure 4
figure 4

Comparison of representative image reconstructions at four-(top) and five-(bottom) fold accelerations using different VarNet architectures with matching train-test conditions. The fully-sampled ground truth reconstructions are shown on the left column. A zoomed section of interest, indicated by the white bounding box, is shown at even rows. The E2E VarNet exhibits an artifact in the zoomed area (yellow arrow), which varies with the two different acceleration factors. The artifact is absent in the Feature VarNet and the FI VarNet reconstructions. In addition, at four-fold acceleration, E2E VarNet causes blurring of the blood vessel (clearly visible on the ground-truth image), while neither the Feature VarNet nor the FI VarNet reconstructions suffer from the same artifact (red arrow). At five-fold acceleration, all networks cause a similar smoothing of the blood vessel.

Figure 4 shows a representative reconstruction, in which the E2E VarNet resulted in an artifact in the zoomed area (yellow arrow), which varied with the two different acceleration factors. In contrast, the Feature (with attention) VarNet and FI VarNet reconstructions did not exhibit this artifact. Additionally, at \(4 \times\) acceleration, the E2E VarNet caused blurring of the blood vessel visible on the left side of the panel, while the blood vessel remained visible in both the Feature VarNet and the FI VarNet reconstructions (red arrows). At \(5 \times\) acceleration, all models resulted in a similar smoothing of the vessel. These results suggest that both the Feature VarNet and FI VarNet architectures could be more robust to acceleration artifacts than the E2E VarNet architecture. Figure 5 presents \(4\times\) and \(5\times\) accelerated reconstructions for another representative case. The E2E VarNet blurs a blood vessel next to the lesion in the zoomed area (yellow arrow) at \(4\times\) acceleration and misses it at \(5 \times\). The vessel is visible with the Feature VarNet w/o attention at \(4\times\) acceleration, but it is missed at \(5\times\). The Feature VarNet w/ attention reconstruction is able to preserve the vessel at \(4\times\) acceleration and blurs it at \(5\times\).

Figure 5
figure 5

Comparison of representative image reconstructions at four- (top) and five-fold (bottom) accelerations using different VarNet architectures with matching train-test conditions. The fully-sampled ground truth reconstructions are shown in the left column. A zoomed section of interest, indicated by the white bounding box, is shown in the even rows. The E2E VarNet blurs or misses the blood vessel next to the lesion in the zoomed area (yellow arrow) at \(4\times\) and \(5\times\) acceleration factors, respectively. The vessel is visible with the Feature VarNet w/o attention at \(4\times\) acceleration, but it is missed at \(5\times\). The Feature VarNet w/ attention reconstruction preserves the vessel at \(4\times\) acceleration and blurs it at \(5\times\).

Model ablations

We trained a Feature VarNet model without attention (12 cascades) and with distinct encoders and decoders at each cascade. We again used single convolutional layers to represent them, but this time their weights were not shared between different cascades. The network was tested on the fastMRI brain test dataset and yielded a slight enhancement in both SSIM and PSNR (0.9591 and 41.41) compared to using a consistent encoder and decoder (0.9589 and 41.39). This ablation of the Feature VarNet introduced 12 unique feature spaces (as many as the gradient descent iterations), which provided increased flexibility during training.

We also explored the performance of an image-feature (IF) VarNet with 6+6 cascades and four-fold accelerated brain MRI reconstructions. This model achieved SSIM and PSNR scores of 0.9596 and 41.35, respectively, on the fastMRI test dataset. The SSIM was equal to the one obtained with the FI VarNet of the same size, while the PSNR was lower by 0.1.

Leaderboard comparison

We evaluated our FI VarNet model, which yielded the highest performance in this study, against the leading models on the fastMRI public leaderboard18,67, including the pretrained E2E VarNet model16. To ensure a fair comparison, we tested our model on the same leaderboard test dataset used by the other models (Datasets). To enhance our model’s performance, we increased the number of cascades from 6 to 12 in both its feature and image sub-network components. This does not compromise the fairness of the comparison, since the memory and computational complexity vary across all networks submitted to the public leaderboard67. Table 2 includes the comparison results for four-fold and eight-fold accelerations. For four-fold accelerations, our FI VarNet model outperformed the DIRCN model (which is based on densely interconnected networks and the E2E VarNet architecture) by 0.0006 and 0.2 in terms of SSIM and PSNR, respectively. For eight-fold accelerations, FI VarNet was marginally outperformed by DIRCN in SSIM by 0.0002, but its PSNR was larger by 0.02. These results position our model in second place (4\(\times\)) and third place (8\(\times\)) on the leaderboard, just below AIRS-Net, which is a closed-source model from AIRS Medical (Seoul, South Korea). Two screenshots of the public leaderboard are provided in the supplementary information for reference.

Table 2 Average SSIM and PSNR on the leaderboard test dataset for the top six models in the fastMRI public leaderboard.

Clinical evaluation

Table 3 includes the preference of each reader in terms of quality of the reconstructed images. On average, the readers preferred the FI VarNet in \(\sim 62 \%\) of the 20 cases. For \(7\%\) of the cases (2 cases for Reader A and 2 cases for Reader B), the readers rated no major differences in the overall quality. Overall, the differences between the readers were not significant based on the Wilcoxon signed-rank test69. The p-values were 0.1, 0.16, and 0.39 for the findings of reader A vs. C, reader B vs. C, and reader A vs. B, respectively.

Table 3 Comparison of the three readers’ scores in terms of image quality preference (as number of cases), artifacts, sharpness, and contrast-to-noise ratio (CNR) for four-fold accelerated reconstructions with the FI VarNet and E2E VarNet models.

Table 3 also reports the comparison among the three readers in terms of reconstruction artifacts, image sharpness, and CNR. Reader A gave similar scores to the FI VarNet and E2E VarNet in terms of reconstruction artifacts and CNR, but scored the FI VarNet higher than the E2E model in terms of image sharpness. Reader B returned the same scores for artifacts for both models, and higher sharpness and CNR scores for the FI VarNet. Reader C returned the same scores for artifacts and sharpness for both networks and a higher CNR score for the FI VarNet.

Figure 6
figure 6

Comparison against the fully-sampled ground-truth of image reconstructions obtained using the FI VarNet and E2E VarNet models for four-fold undersampling. The data are from a representative T2-weighted brain MRI from the clinical dataset. The inset panels in the first and third row correspond to two zoomed ROIs indicated by the white bounding boxes in the images. The ground truth image was obtained as the root sum of squares of the individual coil images, using the fully sampled k-space. In ROI 1 the choroid plexus exhibits an artifact and the detail in the thalamus is blurred with the E2E VarNet reconstruction, as indicated by the two yellow arrows. The FI VarNet preserves the anatomy of the ground-truth. In ROI 2, the E2E VarNet misses the blood vessel pointed by the yellow arrow in the back of the brain, which instead remains visible in the FI VarNet reconstruction.

Figure 6 compares a representative T2-weighted brain image from the clinical dataset at four-fold acceleration, reconstructed using the FI VarNet and the E2E VarNet. Zoomed regions of interest (ROI) are shown to qualitatively compare the performance of the two models in capturing intricate details. In ROI 1, the E2E VarNet reconstruction exhibits a blurred representation of the thalamus and an artifact in the choroid plexus, which is highlighted by two yellow arrows. On the other hand, the FI VarNet model successfully preserves the anatomical features present in the ground truth. In ROI 2, the E2E VarNet reconstruction fails to capture a blood vessel indicated by the yellow arrow. The FI VarNet instead accurately retains the blood vessel.

Performance on the knee

We trained and evaluated our best FI VarNet model (12 feature and 12 image cascades) on the knee fastMRI dataset18,67, and compared the reconstructions against the E2E VarNet model16. Table 1 (bottom) shows a comparison of the average SSIM, PSNR, and NMSE between the FI VarNet and E2E VarNet for the knee test dataset. The FI VarNet model outperformed the E2E model by 0.0049, 0.15, and 0.00013 in terms of SSIM, PSNR, and NMSE, respectively. These results show good generalizability of FI VarNet to other body regions. Figure 7 qualitatively shows that the FI VarNet can reduce noise compared to the E2E VarNet for the case of a four-fold under-sampled knee image reconstruction obtained with a fat-saturation sequence.

Figure 7
figure 7

Comparison of representative knee image reconstructions at four-fold accelerations using the E2E VarNet (12 cascades) and the FI VarNet (12+12 cascades). The fully-sampled ground truth reconstruction is shown on the left column. The E2E VarNet exhibits higher levels of noise in the reconstruction. The noise is reduced in the FI VarNet reconstructions.

Discussion

The E2E VarNet has established itself as a formidable tool in image reconstruction, standing out as one of the top open-source models on the fastMRI leaderboard19,20. Therefore, our aim in this work was not to replace the model but to refine it, introducing subtle architectural modifications that neither escalate its memory demands nor prolong training or inference time. Our proposed Feature VarNet and FI VarNet architectures outperformed the E2E VarNet architecture in terms of image quality metrics such as SSIM, PSNR, and NMSE for four-, five-, and eight-fold undersampling rates. Unlike other feature-representation-based networks21,70, our Feature VarNet’s feature space, defined by a single convolution layer, facilitates the incorporation of the block-wise attention transformer at a pixel-resolution feature-space representation of the image. We note that a direct application of attention to raw image pixels would be suboptimal50. This single-layer convolutional encoding provides a direct representation of the image in feature space, with aliasing artifacts approximately mirroring their locations in the image domain (Supplementary Fig. S3). As a result, attention allows the network to attend directly to the aliasing artifacts along the phase-encoding direction caused by the Cartesian undersampling (a key factor for its improved performance) and performs better denoising of the reconstruction through the network’s training (Fig. 2). Finally, the marginal improvement observed in the ablated Feature VarNet, where encoders and decoders with non-shared weights are used, suggests that different architectures (other CNNs or transformers71) could be employed for the encoders and decoders to potentially enhance performance further.

Similar to KIKI-Net-based models40, the cross-domain architecture of the FI VarNet brings together the advantages of image-space (a purely comprehensive CNN model) and feature-space (a CNN model augmented with attention mechanisms) networks, improving the overall reconstruction performance. FI VarNet outperformed other cross-domain learning-based networks that do not rely on unrolled optimization schemes. For example, CDF-Net72 reported 0.9003 SSIM and 36.77 PSNR scores for 4\(\times\) accelerated knee image reconstructions, whereas FI VarNet achieved 0.9236 SSIM and 40.08 PSNR when trained on the same dataset (we note that FI VarNet was tested on the entire fastMRI knee validation dataset, while CDF-Net was tested on only half of that dataset). The histogram in Fig. 2 (middle, right) and the average values per contrast in Fig. 3 show high consistency of these results across SSIM and PSNR. Although the average performance improvements are not large, the representative examples in Figs. 4 and 5 demonstrate that both the Feature (with attention) and FI VarNet architectures are more robust to acceleration artifacts than the E2E VarNet architecture.

Our FI VarNet model reached second place (4\(\times\)) and third place (8\(\times\)) on the fastMRI public leaderboard, behind the closed-source AIRS-Net, which performs additional data standardization and a multi-slice training process73. Such data standardization methods appear to be key to achieving clinically good reconstructions for the entirety of the fastMRI dataset at higher acceleration factors and are left for future work64. Our comparison with the leaderboard models demonstrates the effectiveness of FI VarNet in MRI image reconstruction tasks and its potential for further development. While the increase in SSIM and PSNR achieved by our models was quantitatively small (Table 2), it nevertheless resulted in improved image quality and clinical scores (Table 3). This is because these metrics can correlate poorly with radiologists’ evaluations. In fact, the appearance of subtle pathologies can be substantially altered in MR images without a major change in SSIM; therefore, small changes in SSIM could be significant for pathology detection if associated with localized improvements in image quality19,67,74,75.

Our best model, the FI VarNet with \(12+12\) cascades, outperformed the pretrained E2E VarNet16 according to three expert neuroradiologists. The results of the clinical evaluation were not statistically significant, which was expected due to the small number of cases (20). Minor differences in scoring were anticipated due to the binary structure of the employed Likert scales, the different years of experience among the readers, and the fact that the reconstruction quality of both networks was clinically acceptable for a four-fold acceleration factor. The FI VarNet excelled in preserving anatomical details, including small blood vessels, whereas the E2E VarNet discarded or blurred them in a few cases (Fig. 6). These findings suggest that the FI VarNet could enable reconstructions with diagnostic quality at five-fold or six-fold accelerations in cases where the E2E VarNet falls short64.

Our FI VarNet model also outperformed the E2E VarNet model in four-fold accelerated knee reconstructions. The achieved 0.0049 SSIM improvement shows that FI VarNet can effectively handle different anatomies using a small training dataset, which underscores its potential to learn accelerated image reconstruction from datasets with a limited number of cases76.

The Feature and FI VarNet architectures are designed to accommodate Cartesian sampling patterns. For random or learned77 undersampling patterns, special care must be taken with the attention layers, as we designed them to identify the location of aliasing artifacts arising only from Cartesian undersampling. For non-Cartesian sampling, our method, like most unrolled optimization networks, would be slow, since the FFTs throughout the network must be replaced with the slower non-uniform FFT78.

We also explored several alternative approaches that yielded either comparable or suboptimal results compared to the final models reported in this manuscript. For example, in the Feature VarNet, we attempted to improve the attention mechanism by incorporating two or three sequential attention layers. However, this was challenging during training due to the instability of the attention gradients. For this reason, future work will focus on alternative attention frameworks79 to improve stability during training. For the FI VarNet, we experimented with an image-feature representation with six cascades for each space, a feature-image-feature-image representation with four cascades for each space, and a feature-image-k-space representation with six cascades for each space. However, we observed that these representations led to degraded or similar reconstructions compared to the feature-image model with six or twelve cascades per space. Although these alternative approaches did not yield the desired improvements, they provided valuable insights into the behavior of the VarNet models and highlighted the potential for further optimization, for example, by changing the network layers used for feature encoding and decoding to more complex architectures.

Conclusion

We introduced three architectural modifications to the E2E VarNet model, namely feature-space training, block-wise attention layers based on the spatial position of the aliasing artifacts, and cross-domain learning between a CNN and a CNN augmented with attention. We have demonstrated the advantages of integrating these changes into the E2E VarNet model, showing improved reconstruction performance both quantitatively and qualitatively. The proposed approaches could enable clinically acceptable reconstructions at higher acceleration factors than currently possible.