2D-to-3D image translation of complex nanoporous volumes using generative networks

Image-based characterization offers a powerful approach to studying geological porous media at the nanoscale and images are critical to understanding reactive transport mechanisms in reservoirs relevant to energy and sustainability technologies such as carbon sequestration, subsurface hydrogen storage, and natural gas recovery. Nanoimaging presents a trade off, however, between higher-contrast sample-destructive and lower-contrast sample-preserving imaging modalities. Furthermore, high-contrast imaging modalities often acquire only 2D images, while 3D volumes are needed to characterize fully a source rock sample. In this work, we present deep learning image translation models to predict high-contrast focused ion beam-scanning electron microscopy (FIB-SEM) image volumes from transmission X-ray microscopy (TXM) images when only 2D paired training data is available. We introduce a regularization method for improving 3D volume generation from 2D-to-2D deep learning image models and apply this approach to translate 3D TXM volumes to FIB-SEM fidelity. We then segment a predicted FIB-SEM volume into a flow simulation domain and calculate the sample apparent permeability using a lattice Boltzmann method (LBM) technique. Results show that our image translation approach produces simulation domains suitable for flow visualization and allows for accurate characterization of petrophysical properties from non-destructive imaging data.


Results
Two-dimensional image reconstruction. We first examine predicting 2D FIB-SEM image patches from TXM image patches. Results for the 2D-to-2D image prediction models are summarized in Table 1 and example images shown in Fig. 2b. For unregularized models, we test several different configurations of network architectures, training objectives, and upsampling factors (for the SISR models). We also train the pix2pix with Wasserstein GAN (WGAN) loss and SRGAN 4x upsampling with vanilla GAN loss with the proposed Jacobian www.nature.com/scientificreports/ regularization term to improve volume generation from 2D-to-2D models (Fig. 2a). These two models were chosen because they showed the strongest performance on peak signal-to-noise ratio (PSNR), structural similarity index metric (SSIM), and structural texture similarity index metric (STSIM). Across all metrics, both the image translation and super-resolution models offer improvement over the unprocessed TXM images. STSIM and low density region segmentation show the largest improvement, while the high density segmentation showing minimal improvement over the raw TXM images. The WGAN models tend to outperform their vanilla GAN counterparts and GAN-based models perform a bit more strongly than the feed forward models in terms of PSNR and SSIM. The strongest performing GAN models are the WGAN models, particularly the pix2pix WGAN and SRGAN 2x WGAN models. The z-regularization results show that a Jacobian regularization term does not significantly impact the 2D-to-2D image translation results except for the low-density region segmentation. A comparison of image patches for original and regularized models (Fig. 3) shows that the regularization results in predicted FIB-SEM images with sparser x and y gradients and less noise. Furthermore, the z-regularization term causes the models to synthesize fewer low-density regions, resulting in reduced performance for low-density region segmentation.  37 and the SRCNN and SRGAN models use a 9-block super-resolution ResNet 34 . The pix2pix (WGAN) model performs the most strongly in terms of PSNR, SSIM, and highdensity region segmentation. The SRGAN 4x (Vanilla) model performs the most strongly in terms of STSIM (perceptual similarity) and second most strongly for low-density region segmentation. The z-regularization slightly reduces performance for all metrics except high-density region segmentation. www.nature.com/scientificreports/  www.nature.com/scientificreports/ Three-dimensional volume prediction. We next synthesize image volumes using the trained models.
The volumes are generated by passing each x-y slice independently through the 2D translation network and stacking along the z-axis to form a synthesized volume. Figure 4 shows image volumes generated with the pix-2pix (WGAN) and SRGAN 4x (Vanilla) models for the original models, regularized models, and regularized models post-processed with median filtering. As shown in the image volumes, the z-regularization improves z-direction continuity, and the median filtering reduces "jittering" between image slices in the z-direction. We also see that the regularized models contain fewer low density regions, but as shown in Fig. 5, the low density regions are more continuous between slices than for the original models. We evaluate the quantitative similarity metrics for 3D image volume generation by synthesizing 5 128 3 volumes from the test set TXM volume slices and computing the metrics for the x-y direction (the image generation plane) and the x-z and y-z directions (orthogonal to the image generation plane). We compute four image descriptors for each image in the synthesized volumes and test set: the area A(X), perimeter P(X), and mean curvature χ(X) for the low-density regions, and the joint pixel value distribution p(T ij , S ij ) . We then calculate the Kullback-Leibler divergence (KL divergence) between the synthesized and test set image descriptor distributions. The distribution of these descriptors should be similar between the test set images and the synthesized images along all image planes. Therefore, a smaller value of KL divergence indicates better model performance. Table 2 summarizes the results for the unpaired image prediction metrics. The results for the gray level cooccurrence probability distribution show similar results regardless of the model used. The models with z-regularization show a significant improvement in the perimeter and Euler characteristic metrics. However, the area metric showed worse performance for the regularized models, likely because regularized models produce fewer low-density regions. Results also show that post-processing with a median filter produces worse results for the pix2pix (WGAN) model but improves the perimeter and Euler characteristic results for the SRGAN 4x (Vanilla) model.

Flow in reconstructed volumes.
To demonstrate the application of the presented volume generation approach for evaluating petrophysical properties from nondestructive image data, we simulate flow of methane through volumes generated using the SRGAN 4x (Vanilla) model with and without regularization. This workflow is shown in  www.nature.com/scientificreports/ boundary treatment captures slip velocities in complex pore geometries accurately via the inclusion of local Knudsen numbers 47 . Results from the pressure computations are shown in Table 3 and visualizations of the pressure field and streamlines are shown in Fig. 6. We observe that the choice of model may substantially affect the morphology of the low density regions and therefore the Knudsen number and apparent permeability. The apparent permeability values for both models are reasonable for a shale sample at the rock fabric scale, showing that the proposed image processing method enables both flow visualization and accurate computation of flow properties from non-destructive image data.

Discussion
Both super-resolution and image translation models are shown to predict FIB-SEM image patches effectively from TXM input images. We also observed that for the GAN models, using the WGAN loss function tended to improve performance in terms of PSNR, SSIM, and STSIM, at the expense of synthesizing fewer low-density regions and reduced performance for low-density region segmentation. This presents a significant challenge when analyzing or visualizing flow through shale volumes, because most of the flow is assumed to take place in the kerogen and low-density mineral regions.
The z-regularization approach introduced is promising for improving 3D reconstruction. The image prediction results show that the z-regularization term has a minimal effect on the 2D quantitative error metrics while improving performance for some 3D image similarity metrics. In particular, z-regularization improves the Euler characteristic results that correlate with flow properties in the porous medium. Therefore this regularization is promising for replicating 3D structures in flow simulation domains. While z-regularization is shown to carry many advantages for 3D volume reconstruction, there remains room for improving particularly the quantitative performance of these models and applying this approach in related domains such as medical imaging. An Table 2. Image similarity metrics for volume generation. Metric represents the KL divergence between the distribution of image features computed for the test set image patches and slices from each image plane for synthesized volumes. Lesser is better for all metrics shown. Statistics are computed over 5 total 128 3 volumes. For the area and perimeter, the z regularization and post-processing decrease performance for the x-y plane and produce similar results for the orthogonal x-z and y-z planes. Both z-regularization and post-processing show a significant improvement for the Euler characteristic results.  www.nature.com/scientificreports/ additional significant drawback is that these models take much longer to train due to the extra backpropagation steps required to compute the Jacobian-vector products used in the regularization term. The simulation results demonstrate the practical application of our models to visualize and characterize flow in a shale rock volume using only non-destructive imaging data. The choice of model, however, may impact the computed apparent permeability and flow paths in a given volume. LBM simulations predict slightly smaller apparent permeability and Knudsen numbers for the regularized model, likely due to the complex pore space connectivity in the computational domain. Removal of active cells and creating disconnected flow paths reduces the ease of flow through the domain, which manifests in simulations as a reduced overall velocity and a less permeable medium.
We assume here that the active cells do not contain organic matter, e.g. kerogen, and their entire volume is available to flow. Although the computed apparent permeabilities are reasonable for a shale sample at the rock fabric-scale, the results may represent an upper bound as a result of this assumption. In future work, we will incorporate the effects of organic content and low permeability regions in the simulation models. The approach presented here could be used to reconstruct low-density regions at the O (10 1 nm) scale, then a generated pore structure obtained from high-resolution images (e.g. transmission electron microscopy images 15 ) super-imposed on the low-density regions to obtain a simulation domain at the O (10 0 nm) scale. Similar approaches have been applied to other multiscale source rocks 48 , and presents a promising future direction for nondestructive imagebased characterization of geological porous media.

Methods
Multimodal shale image dataset. The dataset consists of paired TXM and FIB-SEM images and is described in detail elsewhere 19 . The TXM images were acquired at beamline 6-2c of the SLAC National Accelerator Laboratory (SLAC) using X-rays at 8 keV. The TXM projections were reconstructed with TXM-Wizard 49 to produce an image volume of isotropic resolution of 31.2 nm/px. The FIB-SEM images were acquired on a FEI XL 835 DualBeam FIB/SEM instrument that milled and collected cross-sectional images perpendicular to the main axis of the cylinder. Instrument energy was 4kV, and the resolution was 33.6 nm/px in the x-y plane and variable z direction from 30 up to 50 nm. The TXM images were rescaled to be identical to FIB-SEM pixel resolution and aligned to achieve a paired image dataset. The final dataset contains 149 aligned 2D TXM and FIB-SEM grayscale image slices and a 3D TXM image volume consisting of 419 image slices. In both modalities, darker areas denote low density material or features that include: fractures, pores, organic matter and kerogen. Lighter areas denote high density minerals such as carbonates, silicates, barite, and pyrite.  www.nature.com/scientificreports/ Image prediction models. We use image translation [37][38][39] and SISR 33,34 models to predict FIB-SEM images from TXM images. These models are divided into two main families referred to as feedforward CNNs and CGANs. Feedforward CNNs map the input image I in to the output image I out using a CNN G(I in ) =Î out . These models are trained according to the objective: where T i is the input TXM image and S i is the ground truth FIB-SEM image. We train using L 1 loss because this has been shown to produce sharper images than L 2 loss 38 . This model is effectively a nonlinear regression model where the CNN models the mapping between the input data and response values. The architecture for the neural network G(·) can take many forms. In the feedforward CNN model, we use a 9-block ResNet architecture 37 . In the SRCNN model, the T i image is downsampled by 2x or 4x and G(·) is the SR-ResNet architecture 34 .
We also use the pix2pix 38,39 and SRGAN 34 CGAN models. These models consist of a generator network G(·) that predicts the FIB-SEM image and a discriminator network D(·) that evaluates the quality of an output image or pair of input/output images as being real or synthetic. These models are trained according to the objective as We experiment with the vanilla GAN loss 40 as and the Wasserstein GAN (WGAN) loss with gradient penalty 50,51 as All models are implemented in Pytorch 52 , trained using a framework based on code for the pix2pix and CycleGAN 38,39 models, optimized using the Adam optimizer 53 , and trained for 75 epochs total with a batch size of 10,000 randomly sampled 128 × 128 image patches. We split the dataset of 149 paired image slices as 101 training, 12 validation, and 36 test slices. The large test set was chosen to ensure that the 36 test set 2D slices spanned 128 3 voxel subvolumes of the 3D TXM volume. Image patches for training and testing are chosen to contain at least 75% non-zero pixels and 95% non-artifact pixels. We use a learning rate of 2 × 10 −4 for the first 50 epochs and anneal the learning rate by factor of 10 for the remaining epochs, as this was found to produce stable training and to train all models to convergence. GP = 10 is used for all WGAN models per results from previous work on WGANs 51 . z-regularization approach. We propose to reconstruct 3D image volumes with 2D-to-2D image models by enforcing continuity between input image slices, and thereby improve continuity in the z-direction of the predicted image volume. Our approach draws inspiration from existing work on robust learning 54 . Robust learning seeks to reduce the sensitivity of the class logits to the input data of the network; in our approach, we seek to reduce the sensitivity of the generator network G(·) to perturbations in the input image.
Our approach assumes that ∇ z S is sparse for any SEM image S. Therefore, we enforce continuity in the predicted image volumes by regularizing with ||∇ z S|| 1 . This term, however, is not easily computed. For Ŝ = G(T) , where Ŝ is the predicted SEM image and T is the input nano-CT image, we observe that The squared Frobenius norm of the Jacobian can be computed efficiently. During training, we add this term to the original objective function to train a regularized model as Paired image similarity metrics. We evaluate paired image similarity using four similarity metrics: peak signal to noise ratio (PSNR), structural similarity index metric (SSIM), structural texture similarity index metric (STSIM) and a segmentation-based similarity metric.
PSNR: measures the similarity between images in decibels (dB) and is proportional to the inverse of meansquared error (MSE). For images with pixel values normalized to have I ij ∈ [0, 1] , PSNR is computed as SSIM: measures the structural similarity between images. SSIM is based on the similarity of spatial statistics of the image and is calculated as www.nature.com/scientificreports/ C 1 , C 2 , and C 3 are small smoothing factors, µ X and σ X are respectively the mean and standard deviation of the pixel values in image patch X, and usually α = β = γ = 1. STSIM: measures textural similarity between images and is based in part on SSIM 55 . STSIM is designed to measure perceptual similarity between images and is computed as where ρ X computes the correlation coefficient of the image pixels in image X with offset k and ℓ in the x and y directions, respectively. The additional terms have been shown 55 to improve image retrieval by measuring perceptual similarities rather than structural.
Segmentation-based Image Similarity: similar inputs should produce FIB-SEM images that are segmentable into the same rock phases 19 . Hence, we evaluate the outputs of the GAN model by segmenting the outputs and comparing this to the segmentation of the ground truth image 38 . Let Seg k (·) be the segmentation classifier for class k that maps input image I to a binary mask of the pixels selected for class k. We compute the image similarity metric as the Dice score of the segmentation as where ε is a small smoothing factor to account for empty classes. Here we segment the images into five regions: background, low density, medium density, high density, and surface charging artifacts. The classifier is implemented in Ilastik 56 and uses a random forest classifier to segment images based on pre-computed image features.
Unpaired image similarity metrics. The lack of ground truth volumetric data precludes evaluation synthetized image volumes using paired image metrics. We propose to evaluate the 3D reconstruction by measuring the similarity between the distribution of structural features for generated and ground truth images. We compute this metric by using a function that maps an image I (either single modality or paired multimodal image) to a scalar value, then compute the Kullback-Leibler Divergence (KL divergence) between the distribution for synthetic images q(f (Î)) and ground truth images p(f (I gt )) . KL divergence is computed as KL divergence takes values on the range [0, +∞) , so a smaller value indicates a closer match between the probability distributions q(x) and p(x).
We compute this metric for features in the x-y plane (image generation plan) and the x-z and y-z planes (orthogonal to the image generation plane). This method assumes that structural features in porous media at the nanoscale are approximately isotropic and therefore the distribution of structural features in all image planes should match the distribution from ground truth test set images. We compute this metric for Minkowski functionals and the pixel value joint distribution.
Minkowski functionals distributions: Minkowski functionals measure topological features of images and provide structural descriptors of a solid volume and are a common metric for characterizing porous materials 57 . In two dimensions, there are three Minkowski functionals where κ(x) = 1 R(x) is the inverse of the principal radius. We measure structural similarity of the predicted FIB-SEM images by computing the Minkowski functionals 58 for the low-density regions segmented with the Ilastik classifier used for the paired similarity metrics.
Computation of petrophysical properties. We estimate petrophysical properties of source rocks by processing predicted FIB-SEM image volumes into simulation domains then applying digital rock physics techniques to simulate flow through the rock volume. The active cells for the simulation domain are the segmented lower density voxels. We identify and remove disconnected active cells to improve the stability and convergence behavior of the numerical simulations. Using results from LBM flow simulations, we calculate apparent gas permeability as 47 where µ is the gas dynamic viscosity, L is the domain length along the main direction of flow, q o is the average fluid velocity at the outlet, and p i and p o are the inlet and outlet pressures, respectively. A shortcoming of this model, with respect to the permeability values, is the underlying assumption that the simulation domain is open pore space. In reality, the low-density regions are mixes of kerogen and pore space. Therefore, this approach provides an upper bound on the permeability rather than an exact number. Nevertheless, the models allow us to visualize possible flow paths through the sample volume and evaluate the impact of different volume reconstruction methods on the computed petrophysical properties.