PhysVENeT: a physiologically-informed deep learning-based framework for the synthesis of 3D hyperpolarized gas MRI ventilation

Functional lung imaging modalities such as hyperpolarized gas MRI ventilation enable visualization and quantification of regional lung ventilation; however, these techniques require specialized equipment and exogenous contrast, limiting clinical adoption. Physiologically-informed techniques to map proton (1H)-MRI ventilation have been proposed. These approaches have demonstrated moderate correlation with hyperpolarized gas MRI. Recently, deep learning (DL) has been used for image synthesis applications, including functional lung image synthesis. Here, we propose a 3D multi-channel convolutional neural network that employs physiologically-informed ventilation mapping and multi-inflation structural 1H-MRI to synthesize 3D ventilation surrogates (PhysVENeT). The dataset comprised paired inspiratory and expiratory 1H-MRI scans and corresponding hyperpolarized gas MRI scans from 170 participants with various pulmonary pathologies. We performed fivefold cross-validation on 150 of these participants and used 20 participants with a previously unseen pathology (post COVID-19) for external validation. Synthetic ventilation surrogates were evaluated using voxel-wise correlation and structural similarity metrics; the proposed PhysVENeT framework significantly outperformed conventional 1H-MRI ventilation mapping and other DL approaches which did not utilize structural imaging and ventilation mapping. PhysVENeT can accurately reflect ventilation defects and exhibits minimal overfitting on external validation data compared to DL approaches that do not integrate physiologically-informed mapping.

Pulmonary imaging constitutes a primary component of the clinical workflow of patients with respiratory diseases; various modalities can provide anatomical or functional information that aids in their diagnosis, monitoring, and treatment. Thoracic computed tomography (CT) and proton MRI ( 1 H-MRI) are used to ascertain anatomical lung information. However, the relationship between parenchymal destruction and regional function is only somewhat understood. Therefore, functional lung imaging modalities such as single-photon emission CT (SPECT) 1,2 , positron emission tomography (PET) 3,4 and hyperpolarized gas MRI 5,6 can be used to glean functional insights. These techniques have shown efficacy in several lung disease applications, including diagnosis, treatment planning and treatment response mapping [7][8][9] . Hyperpolarized gas MRI is a specialized functional lung imaging modality which has excellent sensitivity to abnormal lung function and allows for the visualization of regional ventilation 10,11 . Hyperpolarized gas MRI can be acquired using either Helium-3 ( 3 He) or Xenon-129 ( 129 Xe); recently, 129 Xe has been preferred due to the increased cost and paucity of 3

Materials and methods
Dataset. The dataset comprised 3D isotropic 1 H-MRI scans acquired at approximately total lung capacity (TLC) and residual volume (RV), and hyperpolarized 129 Xe-MRI ventilation scans acquired at functional residual capacity (FRC) + bag (for any given participant, the bag volume was titrated based on standing height with a range of 400 mL-1 L) from 170 healthy participants or patients with various pulmonary pathologies. A summary of participant demographics, stratified by pathology, is provided in Table 1. Imaging data was collected retrospectively from several prospective clinical studies and patients referred for clinical imaging. Data use was approved by the Institutional Review Boards at the University of Sheffield and the National Research Ethics Committee. All data was anonymized and all investigations were conducted in accordance with the relevant guidelines and regulations with participants (or their guardians) providing informed written consent. Appropriate consent and permissions were granted by the Sponsors to utilize this data for retrospective purposes.
Image acquisition. All

Image registration. RV and TLC 1 H-MRI scans were aligned using deformable image registration and subsequently registered to the spatial domain and resolution of 129 Xe-MRI via a corresponding anatomical 1 H-MRI scan acquired at a similar inflation to 129 Xe-MRI 12,39 . The registration pipeline consisted of rigid, affine and diffeomorphic steps using the advanced normalization tools (ANTs) registration framework 40 , based on parameters optimized in previous work 41 . The registration pipeline is further described in Fig. 1.

1 H-MRI SV maps assume that differences in signal intensities of co-registered voxels in multi-inflation 1 H-MRI reflect naturally occurring density variations in the lungs during breathing 20,23 . SV is a unitless quantity that aims to model the proportion of inhaled air entering the lungs during normal breathing 24 and is approximated as follows: where SI RV

We assessed the effect of providing a physiologically-based 1 H-MRI SV map, alongside structural TLC and RV 1 H-MRI scans, as inputs to a CNN (approach 1). This approach, which we call "PhysVENeT", is compared to a network which is not physiologically-informed (approach 2) and a network which does not integrate structural multi-inflation 1 H-MRI (approach 3).
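The SV mapping step can be illustrated with a minimal sketch. The expression for SV is not reproduced in the text above, so the ratio used here, (SI_RV − SI_TLC) / SI_TLC, is an assumption chosen only to show the voxel-wise calculation on co-registered images; the function name, the flat-list representation of voxels and the `eps` threshold are likewise hypothetical.

```python
def specific_ventilation(si_rv, si_tlc, eps=1e-6):
    """Voxel-wise SV surrogate from co-registered RV and TLC signal intensities.

    ASSUMPTION: SV = (SI_RV - SI_TLC) / SI_TLC. The source equation is not
    available here, so this ratio is illustrative only.
    """
    sv = []
    for rv, tlc in zip(si_rv, si_tlc):
        if tlc <= eps:
            # Background or masked voxel: avoid dividing by (near) zero.
            sv.append(0.0)
        else:
            sv.append((rv - tlc) / tlc)
    return sv


# Three co-registered voxels; the last one lies outside the lung mask.
sv_map = specific_ventilation([1.2, 0.5, 0.3], [1.0, 0.5, 0.0])
```

Because the signal difference is taken voxel by voxel, any misalignment between the RV and TLC scans propagates directly into the map, which is why the registration step precedes it.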
For each configuration, input scans with varying dimensions were read by the network using patch-based sampling with patches of 192 × 192 × 48 voxels 42 . The VNet CNN allows for non-isotropic patch sizes, in line with the anisotropic nature of 129 Xe-MRI. We modified the VNet CNN architecture 43 to learn functional representations from 3D input scans by outputting a 3D continuous representation of regional ventilation. The CNN contained 16, 32, 64, 128 and 256 feature channels, where convolution operations are employed at each layer both to learn residual features and to reduce the resolution of the feature stack, analogous to commonly employed pooling operations. The input layer employs a convolution operation with a 5 × 5 × 5 kernel and stride of 1; two identical convolutions are employed at the second layer and three at the subsequent layers. After each 5 × 5 × 5 convolution, a subsequent 2 × 2 × 2 kernel with stride of 2 was utilized, generating non-overlapping patches; hence, the resolution of the image is halved along each dimension. This is repeated at each layer, so the 192 × 192 × 48 input is halved four times, resulting in a minimum resolution of 12 × 12 × 3 in the final convolution step. The structure of the network is replicated in deconvolution steps, bar the output layer. Each convolution operation employed a PReLU non-linear activation function with valid padding. As indicated by Milletari et al. 43 , the CNN learns residual fine-grained features at each step, which inform the corresponding deconvolution operations in the upsampling side of the network.

The VNet CNN architecture is modified to contain a regression output layer, allowing the network to generate continuous intensity maps in three dimensions. Furthermore, we employ a Huber loss function, where the Huber loss (H_Loss) is defined as:

$$H_{Loss}(a) = \begin{cases} \frac{1}{2}a^{2}, & |a| \le \delta \\ \delta\left(|a| - \frac{1}{2}\delta\right), & \text{otherwise} \end{cases}$$

where a represents the difference between given co-registered voxels in the ground truth and predicted outputs and δ is defined as 0.1.
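As a minimal sketch of the Huber loss with δ = 0.1, the function below uses the standard piecewise form: quadratic for small differences and linear for large ones. The function names and the flat-list representation of voxels are illustrative assumptions, not the training code used in this work.

```python
def huber_loss(a, delta=0.1):
    """Standard Huber loss for a single voxel-wise difference `a`."""
    abs_a = abs(a)
    if abs_a <= delta:
        # Quadratic (MSE-like) regime: sensitive to small errors.
        return 0.5 * a * a
    # Linear (absolute-error-like) regime: robust to outliers.
    return delta * (abs_a - 0.5 * delta)


def mean_huber(predicted, target, delta=0.1):
    """Average Huber loss over co-registered voxels."""
    losses = [huber_loss(p - t, delta) for p, t in zip(predicted, target)]
    return sum(losses) / len(losses)


small_error = huber_loss(0.05)   # falls in the quadratic regime
large_error = huber_loss(-0.5)   # falls in the linear regime
```

The two branches meet at |a| = δ with the same value and slope, so the loss is continuous and differentiable at the switch point.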
The Huber loss behaves like the mean square error (MSE) for differences smaller than δ and like the absolute value function for differences larger than δ. It therefore combines the minimum-variance estimator of the MSE loss with the median-unbiased estimator of the absolute value loss, providing the sensitivity of the MSE and the robustness of the absolute loss. This loss was utilized for synthetic ventilation generation to minimize the impact of outliers in the first stages of training and to improve sensitivity once the loss has significantly reduced. For DL approaches 1 and 2, which utilize multiple input images, weight sharing was not employed, resulting in input dimensions of 192 × 192 × 48 × 3 or 192 × 192 × 48 × 2 for the PhysVENeT and other DL configurations, respectively, similar to Kläser et al. 44 and Jahangir et al. 41 . This method combines the feature maps from spatially aligned TLC and RV 1 H-MRI alongside the 1 H-MRI SV map. Therefore, the network can leverage concurrent information distributed across multiple input feature maps 45 . The PhysVENeT architecture (approach 1) is detailed in Fig. 2.

CNN training parameters. All warped and masked TLC and RV 1 H-MRI scans and 129 Xe-MRI ventilation scans underwent pre-processing before they were fed into the network; scans were normalized to image intensities in the range [0, 1]. Training data was augmented to reduce overfitting whilst still maintaining physiological plausibility. We used an augmentation method in which the number of scans in the training set remained constant; however, each set of input images is deformed using a random rotation and scaling factor between [− 10°, 10°] and [− 10%, 10%], respectively. Different rotation and scaling factors are randomly selected within these limits each time the feature map is provided to the network.
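The on-the-fly augmentation can be sketched as drawing a fresh rotation and scaling factor every time a set of input images is presented. This is a simplified illustration: uniform sampling, a single rotation angle and the function name are assumptions, and applying the resulting transform to the image volumes is left out.

```python
import random


def sample_augmentation(rng, max_rotation_deg=10.0, max_scale_change=0.10):
    """Draw one (rotation, scale) pair within the stated limits.

    ASSUMPTION: both parameters are sampled uniformly; the source only
    gives the limits of [-10 deg, 10 deg] and [-10%, 10%].
    """
    angle_deg = rng.uniform(-max_rotation_deg, max_rotation_deg)
    scale = 1.0 + rng.uniform(-max_scale_change, max_scale_change)
    return angle_deg, scale


rng = random.Random(42)  # seeded only to make the sketch reproducible
# A new pair is drawn each time the same participant is presented, so the
# training set keeps its size while the deformations differ between epochs.
per_epoch_params = [sample_augmentation(rng) for _ in range(3)]
```

The same transform would need to be applied to every input channel and to the 129 Xe-MRI target so that the voxels stay co-registered.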
Thus, the network can be trained for an increased number of epochs as it is highly unlikely to be exposed to the exact same deformations in each epoch. Consequently, we train our network for 900 epochs. Batch normalization was applied at each layer using a mini-batch

Data split. The dataset contained scans from 170 participants. 150 participants were used for fivefold cross-validation, resulting in randomly selected training and testing sets of 120 and 30 participants, respectively, for each fold. The remaining 20 participants were used for external validation; these scans were from participants who had previously been hospitalized for COVID-19 approximately three to six months before imaging, a pathology not contained within the cross-validation dataset. A visual display of the data split, including the cross-validation and external validation procedure, is contained in Supplementary Fig. S1.
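The fivefold split of the 150 cross-validation participants can be sketched as follows. The participant identifiers, the seed and the function name are hypothetical; the sketch only reproduces the 120/30 partitioning described above.

```python
import random


def five_fold_split(participant_ids, n_folds=5, seed=0):
    """Return (train, test) participant lists for each fold.

    Assumes the number of participants is divisible by n_folds,
    as is the case for 150 participants and five folds.
    """
    ids = list(participant_ids)
    random.Random(seed).shuffle(ids)
    fold_size = len(ids) // n_folds
    folds = []
    for k in range(n_folds):
        test = ids[k * fold_size:(k + 1) * fold_size]
        held_out = set(test)
        train = [pid for pid in ids if pid not in held_out]
        folds.append((train, test))
    return folds


folds = five_fold_split(range(150))
```

Each participant appears in exactly one test set, so every cross-validation scan is predicted by a network that never saw it during training. The 20 external validation participants are kept entirely outside this procedure.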
Quantitative evaluation. Surrogates of ventilation were quantitatively evaluated using two common voxel-wise image synthesis metrics, namely, the voxel-wise Spearman's correlation (rs) and SSIM. The Spearman's rs was the primary evaluation metric in the CT ventilation imaging grand challenge, VAMPIRE 16 . In a recent review of DL in pulmonary imaging, SSIM was used for evaluation in several image synthesis investigations 26 . Further details of Spearman's rs and SSIM calculations are given in the following sections.
Spearman's correlation. Spearman's correlation between synthetic ventilation surrogates and corresponding 129 Xe-MRI scans was assessed at full resolution using Spearman's rs. The correlation was calculated on all voxels within the lung cavity region as defined by the lung volume in a 1 H-MRI scan acquired at the same inflation as 129 Xe-MRI. The voxel-wise Spearman's rs quantifies the degree of monotonicity between any two ventilation images within a range of [− 1, 1].
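A minimal sketch of the voxel-wise Spearman's rs restricted to a lung mask is given below. It ranks the voxels inside the mask (averaging ranks for ties) and computes the Pearson correlation of the ranks. The flat-list inputs and function names are illustrative assumptions rather than the evaluation code used in this work.

```python
def average_ranks(values):
    """Rank values from 1..n, assigning tied values their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def masked_spearman(synthetic, reference, lung_mask):
    """Spearman's rs over voxels where lung_mask is truthy."""
    xs = [s for s, m in zip(synthetic, lung_mask) if m]
    ys = [r for r, m in zip(reference, lung_mask) if m]
    return pearson(average_ranks(xs), average_ranks(ys))


# The fourth voxel lies outside the mask and is ignored.
rs = masked_spearman([0.1, 0.4, 0.9, 5.0], [1.0, 2.0, 3.0, -7.0], [1, 1, 1, 0])
```

Because only ranks are compared, the metric is insensitive to any monotonic rescaling of intensities between the synthetic surrogate and the 129 Xe-MRI scan.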
SSIM. SSIM is an image quality measure that encompasses similarity information in three domains, namely, the luminance, contrast and structure of the image. SSIM is calculated between non-zero voxels in the reference 129 Xe-MRI scan (Xe) and the synthetic ventilation surrogate (SVS) within the lung cavity region, as defined by the lung volume in a 1 H-MRI scan acquired at the same inflation as 129 Xe-MRI, as follows:

$$\mathrm{SSIM}(Xe, SVS) = \frac{\left(2\mu_{Xe}\mu_{SVS} + c_1\right)\left(2\sigma_{Xe,SVS} + c_2\right)}{\left(\mu_{Xe}^{2} + \mu_{SVS}^{2} + c_1\right)\left(\sigma_{Xe}^{2} + \sigma_{SVS}^{2} + c_2\right)}$$

where μ_Xe and μ_SVS are the average intensities of Xe and SVS, respectively, and σ²_Xe and σ²_SVS are the variances of Xe and SVS, respectively. σ_Xe,SVS is the covariance of Xe and SVS. c_1 and c_2 are defined in Eq. (4).

Figure 2. PhysVENeT architecture and training strategy.

Statistical analysis. We initially determined whether the data was normally distributed via Shapiro-Wilk tests; if normality was not satisfied, non-parametric tests were conducted. Friedman tests with Bonferroni correction for post-hoc multiple comparisons were used to assess significances of differences between DL approaches. For each metric, paired t-tests were used to assess significances of differences between the DL approaches and the 1 H-MRI SV map. Wilcoxon tests were used to assess differences between folds on external validation data and differences in performance between the 1 H-MRI SV map and each fold on the external validation cohort. Statistical analyses were performed using GraphPad Prism 9 (GraphPad, San Diego, CA, USA). In this work, a p-value of < 0.05 was considered statistically significant.
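The SSIM calculation from the SSIM subsection can be sketched as a single global computation over the voxels retained by the lung mask. Several choices here are assumptions: the values k1 = 0.01 and k2 = 0.03 and a dynamic range L = 1 (matching intensities normalized to [0, 1]) are common defaults rather than values stated in the text, and the source does not say whether SSIM was computed globally or in local windows.

```python
def global_ssim(xe, svs, k1=0.01, k2=0.03, dynamic_range=1.0):
    """SSIM from global means, variances and covariance of two voxel lists.

    ASSUMPTIONS: k1, k2 and dynamic_range are common defaults, and a single
    global window is used; neither is confirmed by the source text.
    Callers are expected to pass only voxels inside the lung mask.
    """
    n = len(xe)
    mu_xe = sum(xe) / n
    mu_svs = sum(svs) / n
    var_xe = sum((v - mu_xe) ** 2 for v in xe) / n
    var_svs = sum((v - mu_svs) ** 2 for v in svs) / n
    cov = sum((a - mu_xe) * (b - mu_svs) for a, b in zip(xe, svs)) / n

    # Stabilizing constants, c = (k * L)^2.
    c1 = (k1 * dynamic_range) ** 2
    c2 = (k2 * dynamic_range) ** 2

    numerator = (2 * mu_xe * mu_svs + c1) * (2 * cov + c2)
    denominator = (mu_xe ** 2 + mu_svs ** 2 + c1) * (var_xe + var_svs + c2)
    return numerator / denominator


reference = [0.2, 0.4, 0.6, 0.8]
identical_score = global_ssim(reference, reference)
reversed_score = global_ssim(reference, reference[::-1])
```

Identical inputs give a score of 1, and the score drops as the means, variances or covariance of the two images diverge.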

Results
Qualitative evaluation.  Table 2. The distribution of Spearman's rs and SSIM for each method across all images within the cross-validation dataset is displayed in Fig. 4; significant p-values are provided. The PhysVENeT significantly outperformed all other DL approaches and 1 H-MRI SV mapping in terms of Spearman's rs. In addition, both the PhysVENeT and DL (TLC + RV) approaches significantly outperformed the DL (SV map) and 1 H-MRI SV map using the SSIM metric. No significant difference was observed between the PhysVENeT and DL (TLC + RV) networks using the SSIM metric (p = 0.14). For four participants, the PhysVENeT produced Spearman's rs below 0.2. Figure 5 displays Spearman's rs and SSIM values stratified by disease to assess differences in performance across pathologies for the PhysVENeT. It indicates that the framework generated more accurate synthetic ventilation scans for healthy participants and participants with asthma whilst synthetic ventilation scans were least correlated with 129 Xe-MRI in lung cancer participants for both metrics used.
$$c_1 = (k_1 L)^{2}, \quad c_2 = (k_2 L)^{2} \tag{4}$$

The proposed PhysVENeT showed minimal reduction in performance on external validation data, whereas DL approaches that were not physiologically-informed, or did not integrate structural imaging directly, showed larger reductions in both Spearman's rs and SSIM; results for DL approaches are given in Table 3.
Significant differences in performance of the PhysVENeT between networks trained on each cross-validation fold and tested on external validation data were observed; however, the ranges of average Spearman's rs and SSIM values across all folds were narrower than those of other approaches, with a Spearman's rs range of 0.60-0.63 and SSIM range of 0.57-0.60 indicated in Table 3. Significant p-values between the five trained models generated by each fold in the cross-validation process are shown in Fig. 6.

Discussion
In this work, we propose a framework for the generation of synthetic ventilation surrogates from multi-inflation structural 1 H-MRI and a physiologically-based SV map. The PhysVENeT approach integrates SV mapping and DL to produce physiologically-informed 3D surrogates of lung ventilation. These synthetic ventilation images correlate with 129 Xe-MRI ventilation in a voxel-wise manner and can mimic gross ventilation defects across a range of pathologies. Generating 3D synthetic ventilation surrogates from structural imaging modalities, without the requirement of specialized equipment or exogenous contrast, can reduce barriers to the widespread adoption of cutting-edge functional lung imaging modalities, such as hyperpolarized gas MRI. Synthetic ventilation surrogates generated by the PhysVENeT framework significantly outperformed 1 H-MRI SV maps. This was demonstrated using the voxel-wise Spearman's rs and SSIM metrics calculated across the whole-lung region, where the PhysVENeT achieved a Spearman's rs of 0.68 and an SSIM of 0.56 on the cross-validation dataset. Furthermore, the PhysVENeT significantly outperformed other DL approaches which did not leverage structural 1 H-MRI or physiologically-based 1 H-MRI SV mapping, using Spearman's rs. When inference was conducted on external validation data, the PhysVENeT exhibited higher performance than other DL approaches, achieving a Spearman's rs of 0.62 and an SSIM of 0.58. The inclusion of both structural 1 H-MRI and 1 H-MRI-based SV maps provides PhysVENeT with the ability to generalize effectively to participants with a previously unseen disease. The increase in generalizability on external validation data, in conjunction with significant increases in correlations on cross-validation data, indicates the benefit of using a physiologically-informed framework.
We used a large dataset that contained 170 participants with numerous pulmonary pathologies and varying degrees of lung function, as measured by the ventilation defect percentage (VDP) (Table 1). 150 of these participants were used for fivefold cross-validation, leading to five separately trained networks. The remaining 20 participants were used for external validation, whereby each of the five separately trained networks was used to generate ventilation surrogates for these 20 participants. The physiologically-informed PhysVENeT framework performed similarly on both the cross-validation and external validation datasets. In addition, the range of SSIM and Spearman's rs metrics on the external validation data is much narrower than that of the other DL approaches. Therefore, by leveraging structural 1 H-MRI and physiologically-informed mapping, the PhysVENeT framework exhibits minimal overfitting and is largely generalizable to scans outside the cross-validation dataset.
The framework uses a VNet CNN backbone previously developed for 3D segmentation tasks 43 . We adapted the VNet with a Huber loss function to output 3D continuous ventilation distributions with the integration of a multi-channel input configuration. The CNN architecture makes use of additional convolution operations to reduce the dimensionality of the image instead of traditional pooling methods. This limits the footprint of the network, reducing the memory consumption 50 . In turn, this facilitates the use of large anisotropic 3D patch sizes. An additional feature of the network architecture is the ability to use anisotropic input dimensions; 129 Xe-MRI scans have an anisotropic resolution with an in-plane resolution of ~ 4 × 4 mm 2 and a slice thickness of 10 mm. Thus, we make use of the anisotropic input capabilities of the VNet architecture in contrast to other architectures which require isotropic spatial windowing, such as the nn-UNet 51 .
Previous approaches have utilized DL to generate synthetic ventilation images in 2D. Capaldi et al. 30 used a 2D UNet CNN with a MAE loss function to generate ventilation images of a single 2D coronal section from free-breathing 1 H-MRI, limiting volumetric coverage 30 . Moreover, the 2D intensity images cannot contextualize the volumetric nature and spatial clustering of ventilation defects 52 . This can lead to discontinuities between slices which reduces the plausibility of ventilation defect patterns in DL-based ventilation surrogates. Here, we generate fully-volumetric synthetic ventilation surrogates in three dimensions which allows the proposed CNN to learn features which occur over multiple slices.
Levin et al. 53 have indicated that the resolution of functional lung images need not be higher than the smallest pulmonary gas exchange unit, namely, the acinus. The acinus is approximately 10 × 10 × 10 mm 3 in adult humans. They also report that a sufficient resolution for ventilation scans can be as coarse as 20 × 20 × 20 mm 3 due to the spatial clustering of many ventilation defects 53 . Consequently, we apply 3 × 3 × 1 median filtering as a post-processing step to 129 Xe-MRI, 1 H-MRI SV maps, and DL-based synthetic ventilation scans before evaluation. With an in-plane voxel size of ~ 4 × 4 mm 2 and a slice thickness of 10 mm, this coarsens the effective resolution to approximately 12 × 12 × 10 mm 3 , in line with the appropriate resolutions proposed by Levin et al. 53 .
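The 3 × 3 × 1 median filtering step and its effect on the effective resolution can be sketched as below. Because the kernel spans a single slice, each slice is filtered independently. The nested-list image representation, the truncated window at image borders and the function names are assumptions made for illustration.

```python
from statistics import median


def median_filter_inplane(slice_2d):
    """Apply a 3 x 3 median filter to one slice (rows x columns).

    ASSUMPTION: windows are truncated at the image border rather than padded.
    """
    rows, cols = len(slice_2d), len(slice_2d[0])
    filtered = [row[:] for row in slice_2d]
    for r in range(rows):
        for c in range(cols):
            window = [
                slice_2d[rr][cc]
                for rr in range(max(r - 1, 0), min(r + 2, rows))
                for cc in range(max(c - 1, 0), min(c + 2, cols))
            ]
            filtered[r][c] = median(window)
    return filtered


def effective_resolution(voxel_size_mm, kernel_size):
    """Approximate footprint of the filter kernel in millimetres."""
    return tuple(v * k for v, k in zip(voxel_size_mm, kernel_size))


# An isolated bright voxel is suppressed by the in-plane median.
noisy_slice = [[0, 0, 0], [0, 9, 0], [0, 0, 0]]
smoothed_slice = median_filter_inplane(noisy_slice)

# ~4 x 4 x 10 mm voxels with a 3 x 3 x 1 kernel.
footprint_mm = effective_resolution((4, 4, 10), (3, 3, 1))
```

Applying the same filter to the reference scan, the SV map and the DL outputs keeps the comparison between them at a common spatial scale.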
Contrast-based functional lung imaging modalities such as hyperpolarized gas MRI require specialized equipment and exogenous contrast, which limit their clinical adoption. In addition, functional lung imaging techniques such as CTVI and SPECT expose patients to ionizing radiation and have demonstrated large variability in performance 16 . Furthermore, SPECT has a lower spatial and temporal resolution than hyperpolarized gas MRI and is susceptible to aerosol deposition artifacts. Therefore, the ability to synthesize hyperpolarized gas MRI ventilation scans in three dimensions from structural non-contrast 1 H-MRI scans has wide-reaching implications for functional lung imaging, including the potential to be used for functional lung avoidance radiotherapy 7,8 and treatment response mapping 9 . Kida et al. 17 have previously demonstrated that a Spearman's rs of ~ 0.4 between CTVI and SPECT images produces clinically indistinguishable radiotherapy plans. Therefore, the Spearman's rs of 0.68 reported in this work between 129 Xe-MRI and the proposed PhysVENeT indicates its potential clinical utility for functional lung avoidance radiotherapy applications. In addition, ventilation surrogates generated in this work can potentially be used in a triaging capacity for instances where contrast-based functional lung imaging is unavailable.
Limitations. Despite significant improvements in Spearman's rs and SSIM when compared to 1 H-MRI SV mapping, the PhysVENeT framework generated only moderate correlations with 129 Xe-MRI. Synthetic ventilation surrogates were unable to accurately replicate all subtle ventilation defects and, in some cases, exhibited minimal correlation. As 129 Xe-MRI is a direct measure of gas distribution, it can accurately quantify regional ventilation; this characteristic is diminished in synthetic ventilation surrogates, where the ability to accurately discern between ventilated and non-ventilated lung regions is reduced. There is a wide range of Spearman's rs values produced by the PhysVENeT framework, ranging between 0.13 and 0.85. Four cases produced Spearman's rs values below 0.2; two of these cases are from lung cancer participants, of whom there are only five in the dataset as a whole, potentially limiting the ability of a DL-based approach to generalize to ventilation distributions exhibited in lung cancer participants. Increasing the number of lung cancer participants in the dataset could improve performance for these cases. The low correlations may also be due to the large VDP values present in this cohort, which often lead to increased domain-shift between structural and functional imaging 39 . Additionally, the other two underperforming cases had 1 H-MRI SV maps that yielded Spearman's rs values below 0.1. The PhysVENeT framework utilizes the 1 H-MRI SV map as an input and, therefore, if the 1 H-MRI SV map exhibits poor correlation, it has the potential to impact the performance of the PhysVENeT framework. In future work, it may be appropriate to remove the 1 H-MRI SV map as an input in cases where its performance is below a certain threshold value. The repeatability of the proposed approach was not assessed in this work. Nevertheless, the repeatability of ventilation imaging has been previously assessed by our group 54,55 .
We employ a robust and standardized protocol for acquiring scans at specific inflation levels. Hughes et al. investigated the repeatability of 3 He hyperpolarized gas MRI ventilation in healthy participants by repeat scanning participants at the lung inflation volumes employed in this study, namely, TLC, RV and FRC + bag 54 . Voxel-wise mean ± SD Spearman's correlations of 0.93 ± 0.02 for TLC, 0.92 ± 0.03 for RV, and 0.95 ± 0.03 for FRC + bag were achieved; these very high correlation values indicate that there is a high level of repeatability between lung volumes and regional ventilation when using a robust acquisition protocol. In addition, Smith et al. previously demonstrated that, after repeat 129 Xe-MRI, there was no significant difference in VDP between scans and demonstrated good repeatability with a Bland-Altman bias of 0.2% (LoA = − 1.4 to 1.8%) 55 . Furthermore, the within-session correlation for VDP was calculated as 0.99, demonstrating the high repeatability of key clinically significant ventilation biomarkers.
Accurate registration is also required for the generation of ventilation surrogates and, therefore, the quality of these registrations significantly impacts the performance of the proposed approach. In future work, an approach independent of registration could be considered. Other DL approaches that utilize generative adversarial networks (GANs) or vision transformers (ViTs) have been used for image synthesis applications 56,57 . The proposed framework used a fully convolutional network that lacks the unsupervised learning benefits of GANs and the long-range feature extraction of ViTs. Future investigations could assess whether utilizing these methods over traditional CNNs leads to improved performance.
The dataset used in this work, whilst varied in pathologies and demographics, is limited in MRI acquisition parameters; all scans were acquired on the same scanner at the same field strength from a single center. Thus, the conclusions of this work cannot be appropriately extended to a dataset of differing sequence or field strength without further investigation. Further expansions of the dataset should therefore focus on the inclusion of a diverse range of MRI acquisition parameters to increase generalizability.

Conclusion
In this study, we propose a multi-channel CNN to synthesize 3D surrogates of pulmonary ventilation from multi-inflation 1 H-MRI. These structural scans are combined with an SV map to enhance the physiological plausibility of the synthetic ventilation scans. The PhysVENeT framework produces ventilation surrogates which correlate with 129 Xe-MRI, reflecting ventilation defects observed in the real scans.

Data availability
The imaging datasets generated and/or analyzed during the current study are not publicly available as they were generated as part of an industrial collaborative study that is still underway. Requests for data should be addressed to J.M.W.