Automatic segmentation tool for 3D digital rocks by deep learning

Obtaining an accurate segmentation of images obtained by computed microtomography (micro-CT) techniques is a non-trivial process due to the wide range of noise types and artifacts present in these images. Current methodologies are often time-consuming, sensitive to noise and artifacts, and require skilled people to give accurate results. Motivated by the rapid advancement of deep learning-based segmentation techniques in recent years, we have developed a tool that aims to fully automate the segmentation process in one step, without the need for any extra image processing steps such as noise filtering or artifact removal. To get a general model, we train our network using a dataset made of high-quality three-dimensional micro-CT images from different scanners, rock types, and resolutions. In addition, we use a domain-specific augmented training pipeline with various types of noise, synthetic artifacts, and image transformation/distortion. For validation, we use a synthetic dataset to measure accuracy and analyze noise/artifact sensitivity. The results show a robust and accurate segmentation performance for the most common types of noises present in real micro-CT images. We also compared the segmentation of our method and five expert users, using commercial and open software packages on real rock images. We found that most of the current tools fail to reduce the impact of local and global noises and artifacts. We quantified the variation on human-assisted segmentation results in terms of physical properties and observed a large variation. In comparison, the new method is more robust to local noises and artifacts, outperforming the human segmentation and giving consistent results. Finally, we compared the porosity of our model segmented images with experimental porosity measured in the laboratory for ten different untrained samples, finding very encouraging results.

Model architecture. The Neural Network architecture used in this work is inspired by the architecture used by many top teams in the TGS Salt Identification Challenge on Kaggle 22,23 . It is based on an improved version of the popular U-net architecture 12 with SE-ResNeXt-50 encoder 24 initialized with pre-trained parameters. A general view of all the components in this architecture is presented in Fig. 1. We used 2D convolutional blocks, that takes a stack of consecutive slices where each slice is treated as an input channel. In this way, using a 2D convolution allows us to decrease the computational cost and redirect the resources to train a more complex and deeper architecture. Moreover, we maximize the field of view on the X-Y plane (i.e. rotation plane), which is beneficial to deal with global noises and artifacts (e.g. ring artifacts, beam hardening) due to the nature of image acquisition. It also means that we can use pre-trained blocks trained with large datasets, such as the ImageNet dataset 25 . The common challenge of using 2D convolutional based network to segment 3D images is to preserve the depth information/connectivity of the orthogonal direction (Z direction in our case). To segment a 3D image, our model iterates over every slice of the image and takes a stack of slices containing the center slice and 7 slices in each direction, where 3 of them are consecutive and 4 are skip slices. Finally, our model produces a probability map for each phase, which is build as an average of 2D probability maps for each layer. The final segmentation is generated by selecting the phase with the highest probability for each voxel. To improve the model's  Training dataset. Our training dataset consists of 10 3D Xray micro-CT images of different rock types. The properties of the images are listed in Table 1. These images were acquired using different procedures: scanners (e.g. Xradia, Heliscan), filters, exposure times, cleaning/cropping, reconstruction algorithms, etc. The main idea of using a diverse training dataset is to give our model the ability to generalize as much as possible. The ground truth segmentation for each of these images was obtained using different open and commercial software packages for filtering and segmenting (Avizo, Pergeos, ImageJ, Mango). In this way, we compensate for any systematic error in the acquisition, filtering, and segmentation procedures for our training data. All the selected images have low noise level and high-quality, which makes the segmentation step relatively easy. We have experienced that using segmentations from noisy images deteriorates the network's ability to generalize and produces good results on untrained datasets. For the Reservoir carbonates and Savonnieres samples, the utilized segmented images were obtained from dry/wet based porosity maps 28,29 , which allow us to have a proper estimation of open and micro-phase porosities.
In addition, we have used a high degree of data augmentation to improve the robustness and generalization capabilities of the model. Besides the typical data augmentation techniques like rotating, random cropping, adding Gaussian noise and/or Salt and Pepper noise, we have also used more domain-specific noise in our training pipelines to match the real problems such as ring artifacts, stripe artifacts, intensity variations, and local blurring.
Validation workflow. In order to evaluate the performance of our segmentation tool, we have used synthetic generated data from known ground truth images. We followed an existing workflow 32 for generating synthetic gray-scale images using the Astra-toolbox 33 . Figure 2 shows the generation/validation workflow, where the ground truth segmented data get projected and reconstructed with filtered back projection in a parallel-beam geometry using a Ram-Lak filter. In addition, we can perform a proper noise sensitivity analysis since this workflow allows us to add different types of noise during the projection and reconstruction processes, reflecting the nature of noises/artifacts in real images. Figure 3 shows four examples of reconstructions from sinograms with different types of noise/artifacts.

Results
Comparison to traditional methods. We have generated several synthetic gray-scale images from real segmented ground truth images by adding different types of noise in the generation process, as described in the previous section. Figure 4 shows an example of four synthetic reconstruction cases: (a) clean reconstruction without noise, (b) reconstruction with Gaussian noise pre-projection, (c) reconstruction with Gaussian noise post-projection, (d) reconstruction with ring artifacts. For these four cases, we compare the results from our method to conventional segmentation methods: multi-Otsu thresholding and Marker-controlled Watershed. In this figure we show the middle slice of the 3D segmented images. The calculated accuracy and geometrical properties for all these cases are presented in Table 2. For each phase we have measured the total voxel-wise accuracy, IOU (Intersection Over Union). We have also calculated the Euler number (or Euler characteristic), a topological www.nature.com/scientificreports/ invariant that describes an object in a topological way regardless of the way it bent, and the connected components count, i.e the number of connected regions, using a 26-neighborhood connection algorithm 34 . All methods are able to produce a correct segmentation on the noise-free case, Fig. 4a. However, it is interesting noticing that our method already produced closer Euler and connected counts to the ground truth. The micro-phase region is more disconnected in the case of multi-Otsu and Watershed segmentation. In the next case, Fig. 4b (i.e. Gaussian noise pre-projection), the results show a significant loss in accuracy for Otsu (58.43%) and watershed (55.66%) methods. These methods are very sensitive to Gaussian noise since multi-Otsu is based on global thresholding and watershed is based on neighbor gray-scale values. On the other hand, our method has shown a minor reduction in terms of voxel-wise accuracy (98.75%) and IOU, mostly due to the blurred border between the different phases. In the third case, Fig. 4 (i.e. Gaussian noise post-projection), the results show a significant variation in the fraction of each phase for conventional methods, even for open porosity which was relatively stable in the previous case. Despite the significant increment in the level of noise, our model showed 2% accuracy loss in terms of global accuracy, mostly due to the contrast reduction of interfaces, especially for micro-phase regions. This leads to a significant loss in IOU for the micro-phase, going down to 56%. However, as observed visually, this is a small error due to the small amount of micro-phase in the image (less than 1%). Finally in Fig. 4d, we show the last case (i.e. ring artefacts). The results are similar to the previous cases where multi-Otsu and watershed are really sensitive to the introduced noise performing very badly. As expected, the IOU of micro-phase for our method is lower than for the clean image (49.35%). As observer in this figure, the big   Noise sensitivity analysis. In this section, we analyse the accuracy of our model to different types of synthetic noise and artifacts for four different rock samples: a bentheimer sandstone, a reservoir sandstone, an estaillades carbonate, and a middle eastern carbonate. In Fig. 5, we can see a synthetic reconstruction of each of these samples without noise. Additionally, we present pore, micro-phase and solid fractions in Table 3. For each rock image and each noise type we have generated around 50 synthetic new images with different noise levels.
We have then processed all of them with our segmentation model and measured the accuracy respect to the ground truth. In the Figs. 6, 7 and 8 we show the Reservoir Sandstone gray-scale image and its corresponding segmentation at four different noise levels for Gaussian noise pre-projection, Gaussian noise post-projection, and ring artifact, correspondingly. As shown, in all the cases the model results do not break, even when the level of noise goes far beyond realistic noise levels, as shown in Fig. 7d. To the authors knowledge there is no segmentation model or algorithm in the literature reporting this level of robustness to noise.
To quantify the level of noise introduced in each image we have used PSNR (Peak signal to noise ratio) measured in 8 bits. The PSNR between a reference image f and a test image g of size (X, Y, Z) is defined as:   Figure 5. The four clean reconstructed synthetic images used for the noise sensitivity analysis. www.nature.com/scientificreports/ so PSNR is an inverse logarithmic scale measured in decibels. The accuracy (IOU) vs. noise level (PSNR) curves for all these cases are shown in Fig. 9. As expected, the accuracy of the segmentation decreases monotonically for increasing levels of noise (lower PSNR). For the sandstone cases, with well-defined grains and pores, we observed that micro-phase is the most sensitive to noise and artifacts, while the accuracy of solid and pore are relatively stable even for extreme noise cases. So as expected, the larger the specific surface of a phase the more sensitive to noise this phase is. Accordingly, for the synthetic carbonate cases, without well-defined grains, the results show  www.nature.com/scientificreports/ a slightly lower accuracy for pore than micro-phase regions. We observed that in these cases, the pores regions are more sensitive to noise than micro-phase regions. In general, for all types of noise we observe three different behaviours: no-influence, correlated-noise and model-breakage. For example, for Gaussian noise pre-projection, the inflection point between no-influence occurs at around PSNR = 35 dB and the model-breakage starts at around PSNR = 8 dB. In the first region it is hard to visually perceive any difference between segmented image and ground truth. In the second region, correlated-noise, we observe differences mainly located at the interfaces, starting from single voxels and small clusters. In the third region, model-breakage, one of the phases is affected significantly due to the added noise, most of the information defining the interfaces is missing. For this last region, small increases on the noise level produce significant accuracy loss. As observed in Fig. 7d,h, even for these extreme noise levels the solution maintains its coherence with the ground truth. When it comes to dealing with ring artifacts, Fig. 9c, it is important noticing that PSNR is less sensitive to these non-local artifacts. This means that for segmentations with similar IOUs, ring artifacts give a bigger visual effect than Gaussian noise. In these cases, as observed in Fig. 8, most of the error is concentrated at the center of the image where the information of local structure is completely missed. In general terms, these results show that when trained appropriately this type of methods are extremely robust dealing with different types of noise and artifacts. In the next sections, we test these capabilities by evaluating the model for several types of real rock images and by comparing its behaviour to state-of-the-art human-assisted segmented images.

Results on real images. Comparison to human-assisted segmentation.
Due to the difficulty of defining an objective ground truth for real images, in this section we have directed our focus on comparing the performance of our model with respect to several human-assisted segmentations using traditional processing workflows. In addition, we investigate the variation of human-assisted segmentations and evaluate its effect on derived properties. We have used two images corresponding to Bentheimer and Berea sandstones. These images are challenging due to local variations in the gray-scale values. However, these images are representative of the general type and level of noises found in micro-CT imaging. We have provided the raw images (as produced by the scanners) to 5 different expert users. Each user chose the tools and workflows they considered more relevant to solve the assigned task. All the users have performed some filtering steps before segmenting the images. They have used conventional filters such as anisotropic diffusion 35 , beam hardening correction, median filter 36 , non-local means filter 37 Figure 10 shows the middle slice of the Bentheimer image and the corresponding segmentations from our model and the five users. In general terms, the two more severe problems in dealing with this image are the typical micro-CT stripes in Z direction (local gray-scale variations) and the high-density mineral artifacts. As observed, all the human-assisted segmentations seem to be very sensitive to these types of noise. This is reflected in a non-neglectful over-estimation of the micro-phase regions, which evidently due to the stochastic nature of In addition, user 3 seems to have over-segmented the micro-phase in the interfaces between solid and pore. Figure 11 shows a detailed region of this image that has been heavily affected by highdensity mineral artifact (local beam hardening). In this particular case, all the human-assisted segmentations reflect the second main problem, mentioned above. For all of the human-assisted segmentation cases we observe a significant over-estimation of the micro-phase around the high-density mineral regions due to a sharp shift in local brightness. In contrast, the segmentation of the model seems to handle these types of noise very well, even if not trained with this type of specific noise. Figure 12 shows a slide of the gray-scale Berea image and the corresponding segmentation images. In terms of quality, this image has less noise and no significant artifact compared to the Bentheimer case. Figure 13 shows a more detailed crop where we can see a non-neglectful variation in the micro-phase volume for each   Table 4, we summarized the measured and calculated properties on these segmented images for each of the users and our model. The calculated properties (i.e. permeability, formation factor and tortuosity) are calculated using a pore network model technique 2,38 . As we can see, the differences in the segmentations are reflected in all the properties. A significant variation of the results between the users is observed, most notably for the micro-porosity fraction and permeability, which is a common problem when working with real images as reported in the past 39 .
In the case of the Bentheimer sandstone, our method produced less micro-phase volume compared to the users, which is mainly due to its robustness to noise and artifacts as mentioned above. Our segmentation also has larger permeability compared to the users. This can be due to the larger open-pores (less micro-porosity) but also by the stripes of noise observed for most of the users, spanning the XY plane of the image. In this case, the users micro-phase volume covers a range of 3.51%. For Berea, the results from the users do not show a clear  www.nature.com/scientificreports/ trend as previously (over-segmenting micro-phase). However, the variations of micro-phase volume are still quite significant ranging from 4.61 to 10.71%. Assuming a micro-phase volume as the users average, these variations translate to 75% variation for Bentheimer and a 86% variation for Berea (these numbers are worse assuming lower volumes). However, it is important to see that the model not only is more robust to noise but, additionally, it can help to reduce the human-usage variations since the model will perform equally independent of who is using it, since it does not require parameters tuning to do the segmentation work.
Comparison to experimental results. In this section, we compare the results of our model to porosities measured experimentally. We used the dataset presented in Shah et al. 40 , where the authors provided micro-CT images at different resolutions for 10 sandstone and carbonate samples and the corresponding experimental porosity, which was measured through the bulk volume by saturating the plugs with water. It is important to remark that these measurements were done on the whole cylindrical sample volume and the scanned region corresponds to a crop in the middle of the cylinder ( ∼ 50%). However, as these rocks are relatively homogene-  www.nature.com/scientificreports/ ous it is expected that porosity values are in the same range. We used the images with the highest resolution of around 4 µm. These images are all 575 cubic voxels. Figures 14 and 15 show the middle slice of each image and the corresponding segmentation by our model. Table 5 shows the measured experimental porosity and the open and micro-phase fractions from our segmentation. For the sandstone cases, assuming a 50% porosity for microphase regions there is less than 1% difference compared to the experimental total porosity. For the carbonates, it is harder to assign a given porosity for the micro-phase since carbonates, in general, have a wider porosity distribution. For example, the Ketton segmentation, shown in Fig. 15f illustrates that most of the micro-phase regions have very close gray-scale value to the solid region and, thus, a lower porosity. Another clear example is the case of Middle Eastern 3 carbonate shown in Fig. 15g, where the micro-phase regions are covering the whole image (96%). In such cases, it would be necessary to use other types of imaging techniques, such as dry/wet based porosity map 28,29 , instead of a three-phase segmentation. For the rest of the carbonate cases, the segmentation performed by our model seems visually correct and the porosities are in the same range as the experiments.

Conclusion
We have developed a deep learning based three-phase segmentation model and trained it on multiple 3D micro-CT rock images with a wide range of domain-specific augmentation steps. We then studied our model's performance on synthetic and real images in terms of accuracy and physical properties. Based on these results, it is clear that our segmentation model is capable of producing high-quality segmentations even when given noisy and low-quality input images. We have further validated our model's performance with experimental results from other studies on multiple types of rocks. In addition, we compared the performance of our segmentation model Table 4. Comparison of petrophysical properties between our method and human-assisted segmentations.  www.nature.com/scientificreports/ to five different expert users. The segmentation results strongly indicated that the model is better at dealing with typical image noises compared to the expert users. As a summary, the presented results demonstrate the potential capacity of a fully automated segmentation workflow, which would lead to a significant improvement on the image processing step compared to current processing workflows. It is worth mentioning that even if this study focuses on three-phase segmentation, it is relatively simple to extend the number of phases handled by the tool, given the availability of the necessary training data.