Introduction

Fluid flow and transport in heterogeneous porous media are of fundamental importance to the working of a wide variety of systems of scientific interest, as well as applications1,2. Examples of such porous media include catalysts, membranes, filters, adsorbents, print paper, wood, nanostructured materials, and biological tissues, as well as soil and pavement, and oil, gas, and geothermal reservoirs. Such porous media are typically heterogeneous, with the heterogeneity manifesting itself in the shape, size, connectivity, and surface structure of the pores at small length scales, and in the spatial variations of the porosity, permeability, and elastic moduli at large length scales. Thus, any attempt to model flow and transport in porous media entails the ability to handle the big data3 used in computing the spatial distributions of the pressure, fluid velocity, and other properties throughout the pore space. The heterogeneity and the associated big data contained in the high-resolution two- or three-dimensional (2D or 3D) images of porous media used in their modeling imply that the calculations are computationally intensive. Thus, developing efficient predictive algorithms has always been an active area of research.

Significant advances have been made over the last decade in deep-learning approaches, which have contributed considerably to progress in image analysis, a task that is also important to studying fluid flow and transport in porous media4,5,6, as well as other fields. Traditional neural networks, developed to handle large datasets, are powerful function approximators. Feed-forward neural networks (FFNNs), which represent supervised learning methods, identify the relationship between the input and output iteratively by minimizing a cost function. To alleviate the computational burden associated with the minimization, advanced optimization methods have been developed7. There is, however, no systematic approach for increasing the accuracy of fully connected FFNNs, as they rely on correlations between the data and the properties to be predicted, hence requiring a large amount of training data for acceptable accuracy. Thus, it is most desirable to develop alternatives.

Very recently, progress has been made in developing such alternatives, commonly referred to as physics-informed machine learning (ML), in which the network is trained partly on the fundamental equations that govern the physics of the process under study. By incorporating the equations in the cost function, one speeds up the convergence and produces accurate predictions, while the network requires much less training data. The idea of using the equations that govern the physics of fluid flow and transport in porous media in the cost function and optimization was first proposed by Sahimi and colleagues8,9. Development of such techniques based on ML methods was first reported for solving ordinary10 and partial differential equations11,12,13,14,15,16, and has recently been proposed for analyzing hydrodynamic systems17.

In this study, we aim to fill the gap between ML methods and direct numerical simulation of fluid flow in a porous medium by combining the advantages of both approaches to obtain accurate solutions for the pressure and fluid velocity fields efficiently. Current ML methods are accurate estimators only when they have been trained with a large number of sufficiently varied datasets. Multiple studies have used physics-informed FFNNs for a variety of purposes. However, since our goal is to make predictions for fluid flow in images of porous media, fully connected neural networks are impractical, owing to the complexity of the images and the low efficiency of such networks in discovering complex patterns.

To address the issue, we propose an approach based on an ML method that incorporates in its training (in the cost function) the governing equations for fluid flow in porous media, i.e., the mass conservation (MC) and Navier–Stokes (NS) equations. The result is a highly efficient method for predicting the flow properties. Our multiphysics approach integrates the MC and NS equations, and digital images of the heterogeneous pore space, with the training of the ML algorithm. As the input data, such as the images of the pore space, are complex and large, one needs a network that decreases the computation time. Despite increasing applications of ML methods, there have been only a limited number of attempts to address problems associated with porous materials4,5,6,19,20,21,22,23,24,25,26,27,28.

Results and discussion

We divide our results into two parts. In the first part, we present and discuss the calculations for a polymeric membrane. In the second part, we demonstrate that the same physics-informed recurrent encoder–decoder (PIRED) network can provide accurate predictions for a completely different porous medium.

Effect of incorporating the governing equations in the learning of the algorithm

Using the proposed PIRED network, we reconstructed the velocity and pressure fields in new images from only a small number of training images, without specifying any boundary conditions. Figure 1a, b present, respectively, the change in the cost function σ2 during the training and testing of the network. σ2 decreases for both the training and testing data, indicating convergence toward the true solutions for both the pressure and fluid velocity fields.

Fig. 1: Computational efficiency of the PIRED.

Comparison of cost function σ2 for the training and testing of the PIRED (a, b) and the DDML networks (c, d).

To demonstrate the accuracy and efficiency of the PIRED, we also computed σ2 using a data-driven ML (DDML) algorithm, one in which the governing equations were not incorporated in the cost function. The results for σ2 are presented in Fig. 1c, d for both the training and testing. As can be seen, compared with Fig. 1a, b, the DDML network performs considerably worse.

To better illustrate the improvement produced by incorporating the governing equations in the cost function, the PIRED and DDML networks are compared in Fig. 2a, b in terms of the number of data points used for training, based on the coefficient of determination R2 as well as σ2. Clearly, there is a significant difference between the two methods, particularly when the number of samples is small. The discrepancy shrinks, however, when a larger number of datasets are used.

Fig. 2: Data size reduction of training of PIRED.

Effect of data size on (a) the R2 score and (b) σ2 for PIRED and DDML networks.

PIRED-predicted pressure and velocity fields

Figure 3 compares the predicted spatial distribution of the pressure \(\hat{P}\) at four (dimensionless) times t1 = 15 < t2 < t3 < t4 = 185 with the actual P in one of the randomly selected 2D images not used in the training, with the results for all other slices being just as accurate (see below). Figure 4 compares the corresponding results for the magnitude v of the fluid velocity. As the spatial distributions of the differences \(P-\hat{P}\) and \(| {{{\bf{v}}}}| -| \hat{{{{\bf{v}}}}}|\) indicate, the predictions agree very closely with the actual data. Therefore, not only are the distributions of P and v reproduced accurately, but the correlations between their values over time are also honored.

Fig. 3: Prediction of the pressure field by the trained PIRED.

Comparison of the predicted pressure \(\hat{P}\) with the numerically calculated P at four (dimensionless) times.

Fig. 4: Prediction of the fluid velocity field by the trained PIRED.

Comparison of the predicted fluid velocity \(| \hat{{{{\bf{v}}}}}|\) with the numerically calculated values at four (dimensionless) times.

Another quantitative comparison is based on selecting at random a vertical line (a plane in 3D) in one of the 300 testing images and comparing the PIRED-predicted P and v along that line with their actual values. One example is shown in Fig. 5a, b for the pressure and velocity, which indicate very good agreement between the predictions and the actual data. The same accuracy was obtained for all other slices.

Fig. 5: Quantitative comparison of the PIRED predictions with the direct numerical results.

Comparison of predicted (a) pressures and (b) fluid velocities with the numerical simulations in a randomly selected 2D image along a line perpendicular to the macroscopic direction of flow.

Next, we compared the ensemble-averaged maps of the PIRED-predicted v and P over the 300 testing images with the actual averages. The results are presented in Fig. 6, where Fig. 6a, c show the averages of the actual (numerical) results, and Fig. 6b, d show the averages of the PIRED predictions for the velocity and pressure. The results in Fig. 6 indicate excellent agreement.

Fig. 6: Overall prediction accuracy of the PIRED.

Comparison of the actual ensemble averages of (a) the fluid velocity \(\bar{{{{\rm{v}}}}}\) and (c) the fluid pressure \(\bar{{{{\rm{P}}}}}\) with the PIRED predictions (b) \(\bar{\hat{v}}\) and (d) \(\bar{\hat{P}}\) in the polymeric membrane at four (dimensionless) times.

PIRED predictions for another porous material

We define an effective permeability K by K = μLq/(SΔP), where q, S, and ΔP are, respectively, the steady-state volumetric flow rate, the surface area perpendicular to the macroscopic direction of flow, and the pressure drop. K was computed numerically for all 300 testing slices, and was predicted by the PIRED network as well. The comparison is shown in Fig. 7a.
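For concreteness, a minimal Python sketch of this calculation, together with the min–max normalization used in Fig. 7 below (all variable names are ours):

```python
import numpy as np

def effective_permeability(q, mu, L, S, dP):
    """Effective permeability from K = mu * L * q / (S * dP)."""
    return mu * L * q / (S * dP)

def minmax_normalize(K):
    """Normalization used in Fig. 7: (K - K_min) / (K_max - K_min)."""
    K = np.asarray(K, dtype=float)
    return (K - K.min()) / (K.max() - K.min())
```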

Fig. 7: PIRED efficiency in predicting permeability of another porous medium, a sandstone.

Comparison of the actual and predicted permeabilities K (K is normalized according to \((K-{K}_{\min })/({K}_{\max }-{K}_{\min })\)) for (a) 300 2D images of the membrane, whose morphology is shown in (c), and (b) for 100 images of a sandstone, the morphology of which is shown in (d). In (c) and (d), black and white represent the solid matrix and the pores, respectively.

A most stringent test of the PIRED network, however, is whether it can predict the properties of a completely different porous medium without using any data associated with it. Thus, we used the image of a Fontainebleau sandstone18 with a porosity of 0.14. As the sandstone’s morphology is completely different from the membrane’s, we used a slightly larger number of 2D slices from the membrane (not the sandstone) to better train the PIRED network. Figure 7b compares the effective permeabilities of 100 2D slices of the sandstone with the predictions of the PIRED network. The images of the two types of porous media are also shown in Fig. 7c, d.

Figure 8 compares the predicted spatial distribution of the pressure \(\hat{P}\) at four (dimensionless) times t1 < t2 < t3 < t4 with the actual P field in one of the randomly selected 2D images of the sandstone, with the results for all other slices being just as accurate. Figure 9 presents the corresponding results for the velocity field in the sandstone. As the spatial distributions of the differences \(P-\hat{P}\) and \(| {{{\bf{v}}}}| -| \hat{{{{\bf{v}}}}}|\) in Figs. 8 and 9 indicate, in both cases the predictions agree very closely with the actual fields.

Fig. 8: PIRED predictions for the fluid pressure in the sandstone.

Comparison of the predicted pressure \(\hat{P}\) with the numerically calculated P at four (dimensionless) times in a randomly selected 2D cut of the sandstone.

Fig. 9: PIRED predictions for the fluid velocity in the sandstone.

Comparison of the predicted fluid velocities \(| \hat{{{{\bf{v}}}}}|\) with the numerically calculated values at four (dimensionless) times.

Summarizing, we presented a physics-informed recurrent encoder–decoder algorithm, the PIRED network, which incorporates the MC and NS equations in its learning process in order to predict fluid flow in a complex porous medium. The network provides highly accurate predictions for the fluid velocity and pressure fields at every point of images of the medium that were not used in the training, as well as for its effective permeability. Not only does the PIRED network require a significantly smaller amount of data, and therefore much less computation, to make accurate predictions, it also provides accurate predictions for other types of porous media without using their data.

Methods

We divide this section into four parts, explaining the structure of the PIRED network and the details of the computations.

Computational details

If the input and output are both in the form of images, as is the case in this study, an autoencoder network produces more accurate predictions. When, in addition, the input or output is represented by sequences of spatial images or time series, which is the case when one solves the MC and NS equations, recurrent neural networks (RNNs) connect the data series. The RNN’s output is used as the input to the decoder section to generate the output data. Therefore, we couple an RNN to an encoder–decoder network, resulting in the PIRED network, in which the cost function is defined partly based on the solutions of the MC and NS equations.
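This coupling can be sketched schematically in PyTorch as follows. This is only an illustration under our own assumptions (an abstract recurrent cell, e.g., a convolutional LSTM, and unspecified encoder/decoder modules), not the exact implementation:

```python
import torch.nn as nn

class RecurrentEncoderDecoder(nn.Module):
    """Sketch of the encoder -> recurrent latent -> decoder coupling.

    The encoder compresses each input image to a latent feature map, a
    recurrent cell propagates the latent state across time steps, and the
    decoder reconstructs the output fields at each step. The interface of
    `recurrent_cell` (returning the new latent and state) is assumed.
    """
    def __init__(self, encoder, recurrent_cell, decoder):
        super().__init__()
        self.encoder = encoder            # e.g., stacked convolutional blocks
        self.recurrent = recurrent_cell   # e.g., a convolutional LSTM cell
        self.decoder = decoder            # e.g., transposed-convolution blocks

    def forward(self, x, n_steps):
        z = self.encoder(x)               # latent representation of the image
        state = None
        outputs = []
        for _ in range(n_steps):          # unroll over the time epochs
            z, state = self.recurrent(z, state)
            outputs.append(self.decoder(z))   # predicted fields at this step
        return outputs
```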

We use the reverse Kullback–Leibler (KL) divergence (relative entropy)29 in minimizing the cost function. Suppose that p(x) is the true probability distribution of the input/output data, whereas q(x) is an approximation to it. The reverse KL divergence from q to p is a measure of the difference between p(x) and q(x), and the aim is to ensure that q(x) represents p(x) accurately enough to minimize the reverse KL divergence DKL(q∥p), given by

$${D}_{{{{\rm{KL}}}}}[q(x)\parallel p(x)]=\mathop{\sum}\limits_{x\in X}q(x){{\mathrm{log}}}\,\left[\frac{q(x)}{p(x)}\right],$$
(1)

where X is the space over which p(x) and q(x) are defined. DKL = 0 if q(x) matches p(x) perfectly. In general, Eq. (1) may be rewritten as

$${D}_{{{{\rm{KL}}}}}[q\parallel p]={E}_{x \sim q}[-{{\mathrm{log}}}\,p(x)]-H[q(x)],$$
(2)

where \(H[q(x)]={E}_{x \sim q}[-{{\mathrm{log}}}\,q(x)]\) is the entropy of q(x), with E denoting the expected value operator and, thus, \({E}_{x \sim q}[-{{\mathrm{log}}}\,p(x)]\) is the cross-entropy between q and p. Optimization of DKL with respect to q is defined by

$$\arg \mathop{\min }\limits_{q}{D}_{{{{\rm{KL}}}}}[q\parallel p]=\arg \mathop{\min }\limits_{q}\left\{{E}_{x \sim q}[-{{\mathrm{log}}}\,p(x)]-H[q(x)]\right\}=\arg \mathop{\max }\limits_{q}\left\{{E}_{x \sim q}[{{\mathrm{log}}}\,p(x)]+H[q(x)]\right\}.$$
(3)

Thus, according to Eq. (3), one samples points from q(x) in such a way that they have the maximum probability of belonging to p(x). The entropy term in Eq. (3) “encourages” q(x) to be as broad as possible. In this way, the autoencoder identifies a distribution q(x) that best approximates p(x).
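For discrete distributions, Eq. (1) is straightforward to evaluate; a minimal sketch (the small constant eps, added for numerical stability, is our choice):

```python
import numpy as np

def reverse_kl(q, p, eps=1e-12):
    """Reverse KL divergence D_KL(q || p) of Eq. (1) for discrete
    distributions q and p defined on the same support X; eps guards
    against log(0)."""
    q = np.asarray(q, dtype=float)
    p = np.asarray(p, dtype=float)
    return float(np.sum(q * np.log((q + eps) / (p + eps))))

p = np.array([0.5, 0.3, 0.2])
print(reverse_kl(p, p))                           # 0.0: q matches p perfectly
print(reverse_kl(np.array([0.6, 0.3, 0.1]), p))   # > 0: imperfect match
```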

Network architecture

Since the input and output images in the present problem are distinct, the PIRED network, a supervised one, consists of an encoder and a decoder, known as the U-Net and the residual U-Net (see Fig. 10). The encoder has four blocks, each containing the standard convolutional (CL) and activation layers (AL), as well as the pooling and batch-normalization layers (PL and BNL). The PL compresses the input to its most important characteristics, eliminating unnecessary features, and stores them in the latent layer, which itself consists of the AL, CL, and BNL. The BNL not only allows the use of higher learning rates by reducing internal covariate shift, but also acts as a regularizer that reduces overfitting30. The mean 〈x〉 and variance Var[x] of batches of data x are computed in the BNL, and a new normalized variable y is defined by

$$y=\gamma \frac{x-\langle x\rangle }{\sqrt{{{{\rm{Var}}}}[x]+\epsilon }}+\beta.$$
(4)

Here, γ and β are learnable parameter vectors of the same size as the input data, and ϵ is set to 10−5. During training, the layer keeps running estimates of the computed mean and variance, and uses them for normalization during evaluation. The variance is calculated with the biased estimator.
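Equation (4) is what a standard batch-normalization layer computes. As an illustration, the following PyTorch snippet reproduces it (ϵ = 10−5 and the biased variance estimator are also PyTorch’s defaults; the tensor sizes are arbitrary):

```python
import torch
import torch.nn as nn

bn = nn.BatchNorm2d(num_features=128, eps=1e-5)   # gamma, beta are learnable

x = torch.randn(16, 128, 64, 64)   # a batch of feature maps
y = bn(x)                          # Eq. (4), applied per channel

# Manual check of Eq. (4) for channel 0, using the biased variance:
c = 0
mean = x[:, c].mean()
var = x[:, c].var(unbiased=False)
y_manual = bn.weight[c] * (x[:, c] - mean) / torch.sqrt(var + bn.eps) + bn.bias[c]
print(torch.allclose(y[:, c], y_manual, atol=1e-5))   # True
```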

Fig. 10: Schematic of the proposed PIRED network.

Network architecture, with Ei and Di indicating the encoder and decoder blocks; σ2 is the cost function, xi is the input, and the pressure Pj and fluid velocity vj are the output.

The decoder also consists of four blocks, with each block containing the CL, AL, and BNL, as well as a transposed CL (TCL), which is similar to a deconvolutional layer: if, for example, the first encoder block has a size of 128 × 64 × 64 (i.e., 128 feature maps of size 64 × 64), then the corresponding decoder block has a similar size. The TCL uses the features extracted by the PL to reconstruct the output, namely the pressure P and fluid velocity v fields at each specified time epoch. The latent layer of the RNN that we use is in the form of residual blocks, i.e., layers that, instead of having only one connection, are also connected to earlier layers; this improves the RNN’s performance and significantly speeds up the overall network’s computations.
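As an illustration, one decoder block might be sketched as follows; the kernel sizes, channel counts, and the choice of activation are our assumptions, since they are not specified above:

```python
import torch.nn as nn

class DecoderBlock(nn.Module):
    """One decoder block: TCL -> CL -> BNL -> AL, with a skip connection.

    The kernel sizes, channel counts, and ReLU activation are illustrative
    assumptions only.
    """
    def __init__(self, in_ch, out_ch):
        super().__init__()
        # The transposed convolution doubles the spatial resolution,
        # e.g., 128 x 64 x 64 -> 64 x 128 x 128.
        self.tcl = nn.ConvTranspose2d(in_ch, out_ch, kernel_size=2, stride=2)
        self.conv = nn.Conv2d(out_ch, out_ch, kernel_size=3, padding=1)
        self.bn = nn.BatchNorm2d(out_ch)
        self.act = nn.ReLU()

    def forward(self, x, skip):
        x = self.tcl(x)    # upsample the latent features
        x = x + skip       # connection to an earlier (encoder) layer
        return self.act(self.bn(self.conv(x)))
```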

The input and output data

We used a high-resolution 3D image of a polymeric membrane of size 500 × 500 × 1000 voxels. Its porosity, thickness, permeability, and mean pore size are, respectively, 0.77, 60 μm, 10−12 m2, and 8 μm. An image of a 2D slice of the membrane is shown in Fig. 7c. We selected at random 700 2D slices of the image, each of size 175 × 175 pixels, for the fluid-flow calculations and training of the PIRED network, and another 300 slices for testing its accuracy; the slicing step is sketched below. The 700 images were inserted in the PIRED network’s first layer, whereas the last layer contained the output, the distributions of P and v. The fluid density and viscosity were set to 0.997 g/cm3 and 1.89 × 10−3 g/(cm s), with the fluid injection velocity being 1 cm/s.
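A sketch of the random slicing (the file name, the slicing axis, and the patch-extraction strategy are our assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)

# Binary 3D image of the membrane, shape (500, 500, 1000); the file
# name is hypothetical.
volume = np.load("membrane.npy")

def random_slices(volume, n, size=175):
    """Extract n random size x size patches from randomly chosen z-slices."""
    patches = []
    for _ in range(n):
        z = rng.integers(volume.shape[2])
        i = rng.integers(volume.shape[0] - size)
        j = rng.integers(volume.shape[1] - size)
        patches.append(volume[i:i + size, j:j + size, z])
    return np.stack(patches)

train_images = random_slices(volume, 700)   # for training the PIRED network
test_images = random_slices(volume, 300)    # for testing its accuracy
```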

We computed the P and v fields at four times and represented them by images. It is noteworthy that the amount of data needed for computing P and v is significantly smaller than what would be required by standard ML methods.

Training PIRED

If L and v0 represent the characteristic length scale and fluid velocity in the medium, we introduce dimensionless variables, x* = x/L, y* = y/L, v* = v/v0, t* = tv0/L, and P* = PL/(μv0). Dropping the superscript * for convenience, the MC equation, \(\nabla \cdot {{{\bf{v}}}}=\partial {v}_{x}/\partial x+\partial {v}_{y}/\partial y=0\), remains unchanged. The NS equation becomes

$$\frac{D{{{\bf{v}}}}}{Dt}=\frac{\partial {{{\bf{v}}}}}{\partial t}+{{{\bf{v}}}}\cdot {{\mbox{}}}\nabla {{\mbox{}}}{{{\bf{v}}}}={{{{\rm{Re}}}}}^{-1}\left(-{{\mbox{}}}\nabla {{\mbox{}}}P+{\nabla }^{2}{{{\bf{v}}}}\right),$$
(5)

where \({{{\rm{Re}}}}=\rho {v}_{0}L/\mu\) is the Reynolds number. We define three functions, \({\xi }_{1}=\nabla \cdot {{{\bf{v}}}}\), \({\xi }_{2}=D{v}_{x}/Dt-{{{{\rm{Re}}}}}^{-1}\left(-\partial P/\partial x+{\nabla }^{2}{v}_{x}\right)\), and \({\xi }_{3}=D{v}_{y}/Dt-{{{{\rm{Re}}}}}^{-1}\left(-\partial P/\partial y+{\nabla }^{2}{v}_{y}\right)\), and incorporate them in the cost function σ2 minimized by the PIRED network, instead of naively minimizing only the squared error between the data and the predicted values of v and P. For exact convergence to the actual values (calculated numerically by solving the MC and NS equations), we must have ξi = 0 for i = 1, 2, 3. Thus, the PIRED network learns that the mapping between the input and output must comply with ξi = 0, which not only enriches its training but also accelerates convergence to the actual values. σ2 is defined by

$${\sigma }^{2}=\frac{1}{n}\mathop{\sum }\limits_{i=1}^{n}\left[{({P}_{i}-{\hat{P}}_{i})}^{2}+{(| {v}_{i}| -| {\hat{v}}_{i}| )}^{2}\right]+\mathop{\sum }\limits_{i=1}^{3}\mathop{\sum }\limits_{j=1}^{n}{\left[{\xi }_{i}({x}_{j},{y}_{j},{t}_{j})\right]}^{2},$$
(6)

where n is the number of data points used in the training, and Pi and vi are the actual pressure and magnitude of the velocity at point (xi, yi) at time ti, with superscript \(\hat{}\) denoting the predictions by the PIRED network.
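In code, the physics-informed part of the training amounts to penalizing the squared residuals ξi alongside the data mismatch. A minimal PyTorch sketch of Eq. (6), with the derivative operators left abstract (they are estimated with the Sobel operator, described next):

```python
import torch

def continuity_residual(vx, vy, ddx, ddy):
    """xi_1 = dvx/dx + dvy/dy, the residual of the MC equation."""
    return ddx(vx) + ddy(vy)

def momentum_residual_x(vx, vy, vx_t, P, Re, ddx, ddy, lap):
    """xi_2 = Dvx/Dt - Re^{-1} (-dP/dx + lap(vx)); xi_3 is analogous."""
    advection = vx * ddx(vx) + vy * ddy(vx)
    return vx_t + advection - (1.0 / Re) * (-ddx(P) + lap(vx))

def pired_cost(P, P_hat, v, v_hat, residuals):
    """Eq. (6): mean squared data mismatch plus summed squared residuals."""
    data_term = torch.mean((P - P_hat) ** 2 + (v - v_hat) ** 2)
    physics_term = sum((xi ** 2).sum() for xi in residuals)
    return data_term + physics_term
```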

The derivatives in ξi were estimated using the Sobel operator31, an inexpensive and effective way of computing gradients that is used commonly in image processing. It may be thought of as a smoothed finite-difference operator consisting of two 3 × 3 convolution kernels for the horizontal (H) and vertical (V) directions, which are convolved with the image I in order to estimate the H and V derivatives. The gradient estimates are given by Gx = M*I and Gy = MT*I, where T and * represent the transpose and convolution operations, and

$${{{\bf{M}}}}=\left[\begin{array}{lll}1&0&-1\\ 2&0&-2\\ 1&0&-1\end{array}\right].$$
(7)
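A sketch of the Sobel gradient estimate of Eq. (7), using SciPy’s n-dimensional convolution:

```python
import numpy as np
from scipy.ndimage import convolve

# The Sobel kernel M of Eq. (7); its transpose gives the other direction.
M = np.array([[1.0, 0.0, -1.0],
              [2.0, 0.0, -2.0],
              [1.0, 0.0, -1.0]])

def sobel_gradients(I):
    """Estimate the horizontal and vertical derivatives of image I via
    G_x = M * I and G_y = M^T * I, with * the convolution."""
    Gx = convolve(I.astype(float), M)
    Gy = convolve(I.astype(float), M.T)
    return Gx, Gy
```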

To further quantify the accuracy of the results, we also computed R2, a measure of the closeness of the predictions to the actual data; for a very accurate model, one should have R2 ≈ 1. R2 is defined by

$${R}^{2}=1-\frac{{\sum }_{i=1}^{n}{\Vert {\psi }_{i}-{\hat{\psi }}_{i}\Vert }_{2}^{2}}{{\sum }_{i=1}^{n}{\Vert {\psi }_{i}-(1/n){\sum }_{j=1}^{n}{\psi }_{j}\Vert }_{2}^{2}},$$
(8)

where \({\hat{\psi }}_{i}\) is the network’s prediction, ψi is the actual value, and n is the number of the samples.
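Eq. (8) in code, as a minimal sketch for scalar-valued samples:

```python
import numpy as np

def r2_score(psi, psi_hat):
    """Coefficient of determination R^2 of Eq. (8); R^2 ~ 1 for an
    accurate model."""
    psi = np.asarray(psi, dtype=float)
    psi_hat = np.asarray(psi_hat, dtype=float)
    ss_res = np.sum((psi - psi_hat) ** 2)
    ss_tot = np.sum((psi - psi.mean()) ** 2)
    return 1.0 - ss_res / ss_tot
```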

We solved the MC and NS equations using the open-source package OpenFOAM. The fluid was injected at one side, and a fixed pressure was applied to the opposite side. The other two boundaries were assumed to be impermeable.

Solving the MC and NS equations in each 2D image took about 6 central processing unit (CPU) minutes. Training the PIRED network on an Nvidia Tesla V100 graphics processing unit (GPU) took about 2 GPU hours. Once trained, the tests took less than a second.