Uncertainty propagation in pore water chemical composition calculation using surrogate models

Sochala, Pierre; Chiaberge, Christophe; Claret, Francis; Tournassat, Christophe

doi:10.1038/s41598-022-18411-5


Published: 05 September 2022


Pierre Sochala^1,2,
Christophe Chiaberge²,
Francis Claret² &
…
Christophe Tournassat^3,4

Scientific Reports volume 12, Article number: 15077 (2022)

Abstract

Performance assessment of deep geological nuclear waste repository systems requires extensive knowledge of the pore water chemical conditions prevailing in host-rock formations. In the last two decades, important progress has been made in the experimental characterization and thermodynamic modeling of pore water speciation, but the influence of experimental artifacts and of uncertainties in thermodynamic input parameters is seldom evaluated. In this respect, we conducted an uncertainty propagation study in a reference geochemical model describing the pore water chemistry of the Callovian-Oxfordian clay formation. Nineteen model input parameters were perturbed, including those associated with experimental characterization (leached anions, exchanged cations, cation exchange selectivity coefficients) and those associated with generic thermodynamic databases (solubilities). A set of 13 quantities of interest was studied using polynomial chaos expansions built non-intrusively with a least-squares forward stepwise regression approach. Training and validation sets of simulations were carried out using the geochemical speciation code PHREEQC. The statistical analysis explored the marginal distribution of each quantity of interest, their bivariate correlations, and their global sensitivity indices. The influence of the assumed distributions for input parameter uncertainties was evaluated by considering two parametric domain sizes.

Introduction

Knowledge of pore water chemical composition is crucial for building nuclear waste repository performance assessments¹. First, pore water chemical composition controls radionuclide solubility and adsorption properties on geological and engineered materials. Second, it influences the transport and mechanical properties of clayey materials, which are essential constituents of existing multi-barrier concepts². Third, it dictates the nature and kinetics of chemical alteration processes of repository exogenous materials, such as concrete and nuclear glass³. However, pore water chemical composition models contain a significant number of input parameters, exhibit strong nonlinearities, and have tightly coupled output results. For these reasons, it is difficult to estimate the uncertainty on each input parameter and to evaluate the uncertainties of the model outputs using, e.g., error propagation methods. Direct sampling of clay pore water that retains the main characteristics representative of in situ conditions is particularly complex because of a range of side reactions taking place during sampling procedures⁴. Consequently, confidence in the knowledge of pore water chemical composition must be built on a consistent combination of several factors, including in situ seepage water collection and characterization, experimental water-rock interaction results, and geochemical modeling results linking observations about solid material composition and reactivity with quantitative thermo-kinetic concepts⁵. While considerable effort with this coupled experimental and modeling approach has produced predictive models for claystone pore water chemical composition that are consistent with experimental characterization, little attention has been paid to the model output uncertainties induced by input parameter uncertainties.
Indeed, although the treatment of uncertainties in the performance assessment of geologic high-level radioactive waste repositories has been recognized as an important topic for more than three decades⁶, most studies focus on uncertainties related to retention processes⁷. Using Monte Carlo methods, the effects of database parameter uncertainty on geochemical equilibrium calculations have been demonstrated, but only for very simple (as stated by the authors) modeling scenarios⁸. The goal of the present study is to implement a methodology based on surrogate models designed to propagate parametric uncertainties through a pore water chemical composition model with a moderate number of input parameters (around twenty).

Propagation of uncertainty has gained wide popularity in many geoscience disciplines^9,10,11,12. Its principle consists in perturbing a set of input parameters and then estimating the ensuing effects on the output quantities. The benefit of such a statistical framework is that it produces richer and more useful information than a single deterministic simulation can deliver. Parametric uncertainty analyses in geochemistry are motivated by different sources of uncertainty such as reaction kinetic rate constants, thermodynamic constants (e.g. solubility and aqueous complex formation constants), initial and boundary conditions, and transport properties^13,14,15,16. Among available approaches, surrogate models have the advantage of providing a fast approximation everywhere in the parametric domain from a small ensemble of simulations, whereas Monte-Carlo techniques evaluate the direct model for a finite number of samples and require a large ensemble to achieve convergence of the statistical estimators.

In this study, we use a surrogate model approach to propagate uncertainty through a pore water chemical composition model of the Callovian-Oxfordian (COx) clay formation in the Paris Basin (France), which has been the target of many studies investigating the feasibility of a deep nuclear waste repository¹⁷. First, we briefly summarize the geochemical model and the parametric domain on which statistical approximations of the different quantities of interest (QoI) were built. Second, we describe the construction and validation of the surrogate models, with a Polynomial Chaos (PC) method and an orthogonal matching pursuit procedure, which are particularly efficient if the QoI vary smoothly with the uncertain inputs. Finally, we focus the discussion on moments, marginal distributions, correlations and joint distributions, as well as on global sensitivity indices, which quantify the influence of the input parameter distributions on the variance of the QoIs.

Framework

Pore water composition model

The estimation of pore water chemical composition in the COx claystone relies on a geochemical model, a complete description of which can be found in⁴. The model is briefly presented and made available in the form of a PHREEQC¹⁸ input file and its associated database (THERMOCHIMIE v9b¹⁹) in the supplementary information file. The complete list of input parameters of the pore water chemical composition model is: ${\mathrm{Cl}}^{-}$ and ${\mathrm{SO}}_{4}^{2-}$ total concentrations obtained from core sample leaching measurements; measured sodium ${\mathrm{Na}}^{+}$, potassium ${\mathrm{K}}^{+}$, calcium ${\mathrm{Ca}}^{2+}$, magnesium ${\mathrm{Mg}}^{2+}$, and strontium ${\mathrm{Sr}}^{2+}$ exchangeable concentrations; related ${\mathrm{Na}}^{+}/{\mathrm{K}}^{+}$, ${\mathrm{Na}}^{+}/{\mathrm{Ca}}^{2+}$, ${\mathrm{Na}}^{+}/{\mathrm{Mg}}^{2+}$, ${\mathrm{Na}}^{+}/{\mathrm{Sr}}^{2+}$ cation exchange selectivity coefficients; and solubilities of Celestite, Calcite, Dolomite, Goethite, Quartz, Pyrite, Ripidolite, and Illite (corresponding to illite$\_$Imt-2 of the database). The reference values of these $N=19$ parameters are reported in Table 1.

Table 1 List of the 19 uncertain input parameters with their reference values $\mu$ (unperturbed state of the geochemical model).

Uncertainty model

Once the uncertain input parameters have been identified, the next step is to determine their statistical distributions. For a scalar parameter, it consists of specifying a range (or support) and an associated probability density function. The N uncertain inputs were collected into a random vector ${\varvec{\xi }}=(\xi _1,\dots ,\xi _N)\in {\varvec{\Xi }}\subset \mathbb {R}^N$ whose components $\xi _i$ were assumed to be independent and uniformly distributed over the range $[\xi _i^{-},\xi _i^{+}]$, namely

$$\begin{aligned} \xi _i \sim \mathscr {U}\left( \xi _i^{-},\xi _i^{+}\right) ,\quad \xi _i\perp \xi _j\quad \text {if}\quad i\ne j. \end{aligned}$$

(1)

The assumption of independence implies that the joint distribution $p_{{\varvec{\xi }}}$ of the vector ${\varvec{\xi }}$, and therefore its range ${\varvec{\Xi }}$, factorize as

$$p_{\boldsymbol{\xi}} (\boldsymbol{\xi}) = \prod_{i=1}^N p_{\xi_i}(\xi_i;\xi_i^{-},\xi_{i}^{+})\quad\text{and}\quad {\boldsymbol{\Xi}}=\prod_{i=1}^N \left[\xi_i^{-},\xi_i^{+}\right].$$

(2)

In case of a uniform distribution, the probability density function of each parameter $\xi _i$ is defined as

$$\begin{aligned} p_{\xi _i}(\xi _ i;\xi _i^{-},\xi _i^{+}):= {\left\{ \begin{array}{ll} 1/(\xi _i^{+}-\xi _i^{-}), &{} \xi _i\in \left[ \xi _i^{-},\xi _i^{+}\right] , \\ 0, &{} \text {otherwise}.\\ \end{array}\right. } \end{aligned}$$

(3)

The extreme values $\xi _i^{-}$ and $\xi _i^{+}$ of the ith parameter are defined as

$$\begin{aligned} \xi _i^{-}=\mu _i-\sqrt{3}\sigma _i\quad \text {and}\quad \xi _i^{+}=\mu _i+\sqrt{3}\sigma _i, \end{aligned}$$

(4)

where the mean $\mu _i=\mathbb {E}(\xi _i)$ corresponds here to the reference value indicated in Table 1 and the standard deviation $\sigma _i=\sqrt{\mathbb {V}(\xi _i)}$ is reported in Table 2. Recall that the mean $\mathbb {E}(\cdot )$ and the variance $\mathbb {V}(\cdot )$ of a random variable u are defined as

$$\begin{aligned} \mathbb {E}(u)&:=\int _{\Xi }u({\varvec{\xi }})p_{{\varvec{\xi }}}({\varvec{\xi }})d{\varvec{\xi }}, \end{aligned}$$

(5)

$$\begin{aligned} \mathbb {V}(u)&:=\mathbb {E}\left[ (u-\mathbb {E}(u))^2\right] . \end{aligned}$$

(6)

We chose the uniform distribution because it is the maximum entropy distribution^20,21 among all continuous distributions supported on a given finite range. The maximum entropy distribution is often preferred because it represents the least informative distribution, but other types of distributions can be adopted. Two cases were considered in order to investigate the effect of the amplitude of perturbations around the reference values on the uncertainty of the QoIs. Hereafter, these cases are referred to as the “small range case” and the “large range case”, respectively. Each parameter range of the latter case is twice the range of the former case, implying from Eq. (2) that the measure (or volume) of the parametric domain ${\varvec{\Xi }}$ in the large range case is $2^{N}\simeq 5\cdot 10^5$ times larger than in the small range case.
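As an illustration of Eqs. (1) and (4), the following minimal sketch draws independent uniform samples from prescribed means and standard deviations. The function names and the three placeholder parameters are assumptions made for this example; they are not the values of Tables 1 and 2.

```python
import numpy as np

def uniform_bounds(mu, sigma):
    """Eq. (4): bounds of a uniform law with mean mu and standard deviation sigma."""
    mu = np.asarray(mu, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    half_width = np.sqrt(3.0) * sigma
    return mu - half_width, mu + half_width

def sample_inputs(mu, sigma, n_samples, rng):
    """Draw independent uniform samples (Eq. 1), one column per parameter."""
    lo, hi = uniform_bounds(mu, sigma)
    return rng.uniform(lo, hi, size=(n_samples, lo.size))

# Placeholder reference values and standard deviations (hypothetical,
# NOT the values reported in Tables 1 and 2).
mu = np.array([1.0, 10.0, -3.0])
sigma = np.array([0.1, 2.0, 0.5])

rng = np.random.default_rng(0)
xi_small = sample_inputs(mu, sigma, 200_000, rng)        # "small range" analogue
xi_large = sample_inputs(mu, 2.0 * sigma, 200_000, rng)  # every range doubled

# Doubling each of the N ranges multiplies the domain volume by 2**N.
volume_ratio_19 = 2 ** 19  # 524288, i.e. about 5e5 for N = 19
```

The factor $\sqrt{3}$ follows from the variance of a uniform law on $[a,b]$, which is $(b-a)^2/12$.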

Table 2 Standard deviation $\sigma$, minimal value $\xi ^{-}$ and maximal value $\xi ^{+}$ of the 19 uncertain input parameters for the small range case and the large range case.

Quantities of interest

We are interested in $\mathrm{pH}$, $\mathrm{pe}+\mathrm{pH}$ where $\mathrm{pe}$ is the redox potential, total aqueous concentrations of sodium (${\mathrm{Na}}^{+}$), potassium (${\mathrm{K}}^{+}$), calcium (${\mathrm{Ca}}^{2+}$), magnesium (${\mathrm{Mg}}^{2+}$), strontium (${\mathrm{Sr}}^{2+}$), iron ($\mathrm{Fe}$), silicon ($\mathrm{Si}$), aluminum ($\mathrm{Al}$), sulfate ($\mathrm{S(VI)}$) and sulfide ($\mathrm{S(-II)}$), as well as the $\log _{10}$ of the ${\mathrm{CO}}_{2}$ partial pressure, ${\mathrm{log}}_{10}\,{\mathrm{p}}_{{\mathrm{CO}}_2}$. The set of these $\mathscr {N}=13$ QoI is denoted $\mathbb {U}$,

$$\begin{aligned} \mathbb {U}:=\left\{ \mathrm{pH}, \mathrm{pe}+\mathrm{pH}, {\mathrm{Na}}^{+}, {\mathrm{K}}^{+}, {\mathrm{Ca}}^{2+}, {\mathrm{Mg}}^{2+}, {\mathrm{Sr}}^{2+}, \mathrm{Fe}, \mathrm{Si}, \mathrm{Al}, \mathrm{S(VI)}, \mathrm{S(-II)},{\mathrm{log}}_{10}\,{\mathrm{p}}_{{\mathrm{CO}}_2}\right\} . \end{aligned}$$

(7)

Surrogate model

The non-intrusive construction of a surrogate model relies on a training set $\mathscr {X}:=\{{\varvec{\xi }}^{(m)}\}$ that samples the parametric domain. The corresponding outputs were computed using PHREEQC, and we obtained $\mathscr {U}:=\{u^{(m)}:=u({\varvec{\xi }}^{(m)})\}$ for each $u\in \mathbb {U}$. The input-output relations ${\varvec{\xi }}^{(m)}\rightarrow u^{(m)}$ were then exploited to build an approximation of $u$ over the whole parametric domain. Several families of methods have been developed over the past decades to construct surrogate models, including Gaussian processes²² and (possibly deep) neural networks²³. In this study, in which 19 input parameters were perturbed, we chose polynomial chaos surrogates^24,25 for their relatively low construction cost in moderate-dimensional cases. In this section, after a brief description of PC expansions and a short reminder on the least squares method, we present the orthogonal matching pursuit procedure as well as the validation of the surrogate models.

Polynomial chaos

Any random variable u with finite variance can be approximated by a spectral expansion^26,27 of the form

$$\begin{aligned} u^{\mathscr {K}}({\varvec{\xi }}) = \sum _{{\varvec{k}}\in \mathscr {K}}u_{{\varvec{k}}} \phi _{{\varvec{k}}}({\varvec{\xi }}), \end{aligned}$$

(8)

where $\{u_{{\varvec{k}}}\}$ is the set of spectral coefficients of $u^{\mathscr {K}}$ and $\{\phi _{{\varvec{k}}}({\varvec{\xi }})\}$ is a complete orthogonal set constituting a basis of $L_2({\varvec{\Xi }},p_{{\varvec{\xi }}})$. For uniform distributions, as is the case here, the $\phi _{{\varvec{k}}}({\varvec{\xi }})$ are $N$-variate Legendre polynomials. Each multivariate polynomial is defined by an integer-valued multi-index ${\varvec{k}}=(k_1,\ldots ,k_N)\in \mathbb {N}^N$ where $k_i$ is the polynomial degree associated with the $i$th variable $\xi _i$. The truncated PC expansion (8) is then defined using a finite set $\mathscr {K}$ of multi-indices, and we denote by $N_{\mathrm{b}}:=\left| \mathscr {K}\right|$ the PC basis dimension. Sets of multi-indices are often chosen by prescribing a maximal degree $d^{\circ }$, leading to

$$\begin{aligned} {\mathscr {K}}(d^{\circ }) = \left\{ {\varvec{k}}\in \mathbb {N}^N , \left\| {\varvec{k}}\right\| _1 \le d^{\circ }\right\} \quad \text {and}\quad N_{\mathrm{b}}(d^{\circ })=\frac{(N+d^{\circ })!}{N!d^{\circ }!}. \end{aligned}$$

(9)

The least squares method is an efficient approach for estimating the spectral coefficients, but it cannot be applied if the sample size $M$ is much lower than the PC basis dimension $N_{\mathrm{b}}$. In this case, more advanced methods are used to produce sparse PC expansions.
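The growth of the basis dimension in Eq. (9) can be checked with a few lines of Python. This is a minimal sketch: the helper names are assumptions made for this example, and the brute-force enumeration is only practical for a small number of inputs.

```python
from itertools import product
from math import comb

def basis_size(n_inputs, max_degree):
    """Eq. (9): number of multi-indices k with |k|_1 <= max_degree."""
    return comb(n_inputs + max_degree, max_degree)

def total_degree_multi_indices(n_inputs, max_degree):
    """Brute-force enumeration of the set K(d); only for small n_inputs."""
    return [k for k in product(range(max_degree + 1), repeat=n_inputs)
            if sum(k) <= max_degree]

# Basis dimension for N = 19 inputs and maximal degrees 1 to 5.
sizes = {d: basis_size(19, d) for d in range(1, 6)}

# Consistency check of the closed form against the enumeration (N = 3, d = 2).
small_set = total_degree_multi_indices(3, 2)
```

For $N=19$ the closed form gives 210, 1540, 8855 and 42504 basis functions for degrees 2 to 5, which shows how quickly a full total-degree basis outgrows a training set of about $10^4$ samples.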

Ordinary least squares

A first way of estimating the spectral coefficients of a PC expansion is to use the Ordinary Least Squares (OLS) method that consists of minimizing the squared norm of the residual,

$$\begin{aligned} \min _{{\varvec{u}}}\Vert A {\varvec{u}}-\mathbbm {u}\Vert _2^2, \end{aligned}$$

(10)

where $A\in \mathbb {R}^{M,N_{\mathrm{b}}}$ is the matrix of basis functions $\phi _{{\varvec{k}}}({\varvec{\xi }}^{(m)})$, ${\varvec{u}}\in \mathbb {R}^{N_{\mathrm{b}}}$ collects the spectral coefficients $u_{{\varvec{k}}}$ and $\mathbbm {u}\in \mathbbm {R}^{M}$ is the vector of model output $u({\varvec{\xi }}^{(m)})$. The solution of the minimization problem (10) satisfies the system of normal equations

$$\begin{aligned} A^\top A {\varvec{u}}= A^\top \mathbbm {u}, \end{aligned}$$

(11)

provided that the matrix $A^\top A$ is invertible.
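The OLS estimate of Eqs. (10) and (11) can be sketched for a toy problem as follows. The two-input polynomial model stands in for PHREEQC, the inputs are assumed to be already rescaled to $[-1,1]$, and all function names are assumptions made for this example. The sketch uses `numpy.linalg.lstsq`, which minimizes the same residual as the normal equations but is numerically more robust than forming $A^\top A$ explicitly.

```python
import numpy as np
from itertools import product
from numpy.polynomial.legendre import legval

def legendre_1d(x, degree):
    """Legendre polynomial scaled so that E[phi^2] = 1 under U(-1, 1)."""
    coeffs = np.zeros(degree + 1)
    coeffs[degree] = 1.0
    return np.sqrt(2 * degree + 1) * legval(x, coeffs)

def design_matrix(x, multi_indices):
    """A[m, j] = prod_i phi_{k_i}(x[m, i]) for the j-th multi-index k."""
    columns = []
    for k in multi_indices:
        col = np.ones(x.shape[0])
        for i, degree in enumerate(k):
            if degree > 0:
                col = col * legendre_1d(x[:, i], degree)
        columns.append(col)
    return np.column_stack(columns)

# Toy model (hypothetical stand-in for the geochemical code).
rng = np.random.default_rng(1)
x = rng.uniform(-1.0, 1.0, size=(500, 2))
u = 1.0 + 0.5 * x[:, 0] + 0.25 * x[:, 0] * x[:, 1]

# Total-degree basis with N = 2 inputs and maximal degree 2 (6 functions).
multi_indices = [k for k in product(range(3), repeat=2) if sum(k) <= 2]
A = design_matrix(x, multi_indices)

# Least-squares solution of min ||A c - u||_2^2, Eq. (10).
coef, *_ = np.linalg.lstsq(A, u, rcond=None)
residual_norm = np.linalg.norm(A @ coef - u)
```

Because the toy model lies in the span of the basis, the residual vanishes up to round-off and the coefficients can be read directly in the normalized Legendre basis.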

Orthogonal matching pursuit

For high-dimensional cases, sparse approximation theory provides tools for finding solutions to underdetermined linear systems under a sparsity constraint. Such parsimonious solutions can be justified by the sparsity-of-effects principle, which states that most models are dominated by main effects and low-order interactions²⁸. In PC, this principle translates into sparse expansions in which most of the coefficients are zero.

Numerous algorithms have been developed recently for the computation of sparse PC expansions (see²⁹ for a review of the existing methods). We relied here on the Orthogonal Matching Pursuit (OMP) method, a classical greedy algorithm that selects a set of active basis functions among a large set (or dictionary) of functions. Initially developed in signal processing³⁰, the matching pursuit algorithm starts with an empty approximation and sequentially adds the basis function that is most correlated with the current residual. The index $\gamma ^{k}$ of the new basis function satisfies (for $k\ge 1$),

$$\begin{aligned} \gamma ^{k} = {\mathop {\mathrm{arg\,max}}\limits _j}\left( \left| \mathbbm {d}_j^{\top }\mathbbm {r}^{k-1}\right| \right) , \end{aligned}$$

(12)

where $\mathbbm {d}_j\in \mathbbm {R}^M$ is the j-th column of the dictionary D and $\mathbbm {r}^{k-1}\in \mathbbm {R}^M$ the current residual. The orthogonal version of the method³¹ computes the coefficients of the approximation to ensure that the residual is orthogonal to the span of the active functions,

$$\begin{aligned} \left( A^{k}\right) ^\top A^{k} {\varvec{u}}^{k} = \left( A^{k}\right) ^\top \mathbbm {u}, \end{aligned}$$

(13)

where $A^{k}\in \mathbb {R}^{M,k}$ is the matrix of the active basis functions at iteration $k$. The OMP method is a least-squares forward stepwise regression approach that can be easily implemented (see Supplementary material for the detailed algorithm). Several stopping criteria are possible, such as the residual norm or cross-validation errors. Here, we computed every ten iterations (up to 1500) the Mean Squared Error (MSE) using a validation set $\mathscr {X}_{*}$ of $M_{*}$ realizations, $\mathrm{MSE} := \sum _{{\varvec{\xi }}\in \mathscr {X}_{*}}\big (u\big ({\varvec{\xi }}\big )-u^{\mathscr {K}}\big ({\varvec{\xi }}\big )\big )^2/M_{*},$ and then selected the number of active functions that minimizes the MSE.
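The selection rule of Eq. (12), the orthogonal update of Eq. (13) and the validation-based stopping rule can be sketched as follows. This is not the algorithm of the Supplementary material: the function names are assumptions, the dictionary is a random Gaussian matrix rather than a Legendre basis, and the validation MSE is evaluated at every iteration instead of every ten.

```python
import numpy as np

def omp_path(D, u, max_terms):
    """Yield (active indices, coefficients) after each OMP iteration."""
    available = np.ones(D.shape[1], dtype=bool)
    active = []
    residual = np.asarray(u, dtype=float).copy()
    for _ in range(max_terms):
        # Eq. (12): pick the unused column most correlated with the residual.
        scores = np.where(available, np.abs(D.T @ residual), -np.inf)
        j = int(np.argmax(scores))
        available[j] = False
        active.append(j)
        # Eq. (13): least-squares refit on all active columns.
        coef, *_ = np.linalg.lstsq(D[:, active], u, rcond=None)
        residual = u - D[:, active] @ coef
        yield list(active), coef

def select_by_validation(D, u, D_val, u_val, max_terms):
    """Keep the iterate that minimizes the validation mean squared error."""
    best_mse, best_active, best_coef = np.inf, [], np.zeros(0)
    for active, coef in omp_path(D, u, max_terms):
        mse = np.mean((u_val - D_val[:, active] @ coef) ** 2)
        if mse < best_mse:
            best_mse, best_active, best_coef = mse, active, coef
    return best_active, best_coef, best_mse

# Synthetic sparse problem (hypothetical data, for illustration only).
rng = np.random.default_rng(3)
D = rng.standard_normal((200, 50))
D_val = rng.standard_normal((200, 50))
true_coef = np.zeros(50)
true_coef[[3, 17, 42]] = [2.0, -1.5, 0.7]
u = D @ true_coef + 0.01 * rng.standard_normal(200)
u_val = D_val @ true_coef + 0.01 * rng.standard_normal(200)

active, coef, mse = select_by_validation(D, u, D_val, u_val, max_terms=10)
```

Refitting all active coefficients at each step is what distinguishes the orthogonal variant from plain matching pursuit, which only updates the coefficient of the newly selected function.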

Validation

We assessed and compared the PC expansions computed using either the OLS or OMP methods. Except for $\mathrm{pH}$, $\mathrm{pe}+\mathrm{pH},$ $\mathrm{S(VI)}$ and ${\mathrm{log}}_{10}\,{\mathrm{p}}_{{\mathrm{CO}}_2}$, a logarithmic transformation improved the surrogate approximations. Indeed, the log variables exhibited smoother dependences with respect to the uncertain input parameters than the original ones, and their use reduced the approximation errors of the original variables. In practice, the change of variable is trivial and consists of (i) building a PC expansion $v^{\mathscr {K}}({\varvec{\xi }})$ of $v({\varvec{\xi }}):=\log (u({\varvec{\xi }}))$ using the set $\mathscr {V}$ of logarithmically transformed outputs

$$\begin{aligned} \mathscr {V}:=\big \{v^{(m)}:=\log \big (u^{(m)}\big )\big \}, \end{aligned}$$

(14)

and (ii) applying the backward transformation to retrieve the original variables

$$\begin{aligned} u^{\mathscr {K}}({\varvec{\xi }}) :=\exp \left( v^{\mathscr {K}}({\varvec{\xi }})\right) . \end{aligned}$$

(15)
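The benefit of the change of variable in Eqs. (14) and (15) can be illustrated on a toy quantity whose logarithm is linear in the input. The model below is a hypothetical example, not one of the QoI of this study, and the transformation requires strictly positive outputs.

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.uniform(-1.0, 1.0, size=400)
u = np.exp(-2.0 + 1.5 * x)  # toy positive quantity (hypothetical)

# Degree-1 polynomial basis in the single input x.
A = np.column_stack([np.ones_like(x), x])

# Direct fit of u.
c_direct, *_ = np.linalg.lstsq(A, u, rcond=None)
u_direct = A @ c_direct

# Fit of v = log(u) (Eq. 14), then backward transformation (Eq. 15).
c_log, *_ = np.linalg.lstsq(A, np.log(u), rcond=None)
u_log = np.exp(A @ c_log)

rmse_direct = np.sqrt(np.mean((u - u_direct) ** 2))
rmse_log = np.sqrt(np.mean((u - u_log) ** 2))
```

In this idealized case the log-transformed fit is exact up to round-off, whereas the direct fit of the same degree leaves a visible error.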

The PC expansions were built with a training set $\mathscr {X}$ of $M=10^{4}$ Monte-Carlo realizations and their errors were estimated using an independent validation set $\mathscr {X}_{*}$ of $M_{*}=10^4$ Monte-Carlo realizations. The accuracy of four PC expansions was compared: three obtained with the OLS method using the maximal degrees $d^{\circ }=1, 2, 3$, and one obtained with the OMP method using a dictionary of $N_{\mathrm{b}}(5)=42504$ functions. From Eq. (9), the number of PC basis functions for the OLS method is $N_{\mathrm{b}}(1)=20$, $N_{\mathrm{b}}(2)=210$, $N_{\mathrm{b}}(3)=1540$, while the number of active functions retained in the OMP method depends on the QoI and is reported in Table 3.

Table 3 Number of terms retained in the OMP method for each QoI.

Two error metrics were used to estimate the accuracy of the approximations (Fig. 1): the root mean squared error normalized by the empirical variance $\widehat{\mathbb {V}}_{\mathscr {X}_{*}}(\cdot )$ of the QoI,

$$\begin{aligned} e_1 := \left[ \frac{1}{M_{*}}\sum _{{\varvec{\xi }}\in \mathscr {X}_{*}} \frac{\left( u\big ({\varvec{\xi }}\big )-u^{\mathscr {K}}\big ({\varvec{\xi }}\big )\right) ^2}{\widehat{\mathbb {V}}_{\mathscr {X}_{*}}(u)}\right] ^{1/2}, \end{aligned}$$

(16)

and the root mean squared relative error,

$$\begin{aligned} e_2 := \left[ \frac{1}{M_{*}} \sum _{{\varvec{\xi }}\in \mathscr {X}_{*}} \left( \frac{u\big ({\varvec{\xi }}\big )-u^{\mathscr {K}}\big ({\varvec{\xi }}\big )}{u\big ({\varvec{\xi }}\big )}\right) ^2\right] ^{1/2}. \end{aligned}$$

(17)

The global normalization of error $e_1$ expresses the approximation error of the QoI relative to its uncertainty level, whereas the local normalization of error $e_2$ is suitable when the approximation error and/or the QoI have different magnitudes across the parametric domain. The error levels obtained for the large range case were higher than for the small range case (by roughly one order of magnitude) because large variations of input parameters induced more complex dependencies in geochemical reactions. As expected, the errors associated with the OLS method decreased when the maximal degree increased, since the addition of higher order terms improved the approximation of the stochastic nonlinearities. A further increase of the maximal degree was not an option to reduce the error because the number of basis functions $N_{\mathrm{b}}(4)=8855$ was too close to the sample size $M=10^4$, thereby producing an ill-conditioned matrix $A^\top A$ in (11). By contrast, the PC expansions obtained by the OMP method exhibited a higher accuracy and a lower number of terms (Table 3). Therefore, in subsequent analyses, we used the OMP surrogate models, for which the error level was at most $1\%$ for $e_1$ and $0.5\%$ for $e_2$ in the small range case, and $10\%$ for $e_1$ and $7\%$ for $e_2$ in the large range case. Lastly, we note that the input parameter distributions can be changed retroactively on a subset of the parametric domain provided that the surrogate model error is sufficiently low over this subset.
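The two error metrics of Eqs. (16) and (17) can be computed from paired validation outputs as in the following minimal sketch. The function name and the four-point data set are assumptions made for this example, and the unbiased estimator (`ddof=1`) is assumed for the empirical variance.

```python
import numpy as np

def error_metrics(u_true, u_surrogate):
    """Return (e1, e2) of Eqs. (16)-(17) on a validation set."""
    u_true = np.asarray(u_true, dtype=float)
    u_surrogate = np.asarray(u_surrogate, dtype=float)
    err = u_true - u_surrogate
    # e1: squared error normalized globally by the empirical variance
    # (ddof=1 is an assumption of this sketch).
    e1 = np.sqrt(np.mean(err ** 2) / np.var(u_true, ddof=1))
    # e2: error normalized locally by the reference value (requires u != 0).
    e2 = np.sqrt(np.mean((err / u_true) ** 2))
    return e1, e2

# Hypothetical reference outputs and surrogate predictions.
u_ref = np.array([1.0, 2.0, 3.0, 4.0])
u_hat = np.array([1.1, 1.9, 3.1, 3.9])
e1, e2 = error_metrics(u_ref, u_hat)
```

With a constant absolute error, as here, $e_2$ weights the points with small reference values more heavily than $e_1$ does.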

Results and discussion

Direct exploitation of the PC coefficients was not feasible because of the logarithmic transformation. Statistical information was therefore derived from extensive sampling of the surrogate models, which is computationally cheap. We processed each QoI individually by studying its moments and marginal distribution. We then computed the correlations and plotted the joint distributions of the most correlated pairs of QoI. Finally, a global sensitivity analysis was carried out in order to rank the contributions of the uncertain input parameters to the variance of each QoI.

Moments

The empirical estimators of the mean $\mu$, the standard deviation $\sigma$ and the coefficient of variation $c_\mathrm{v}=\sigma /\mu$ of each QoI (Table 4) were obtained from a set $\mathscr {Y}$ of $N=10^6$ Monte-Carlo realizations of the surrogate models (in this subsection, $N$ denotes the sample size rather than the number of input parameters),

$$\begin{aligned} \widehat{\mu } = \widehat{\mathbb {E}}(u^{\mathscr {K}}) := \frac{1}{N} \sum _{{\varvec{\xi }}\in \mathscr {Y}} u^{\mathscr {K}}\left( {\varvec{\xi }}\right) \quad \text {and}\quad \widehat{\sigma }:= \left[ \frac{1}{N-1} \sum _{{\varvec{\xi }}\in \mathscr {Y}}\left( u^{\mathscr {K}}\left( {\varvec{\xi }}\right) - \widehat{\mu } \right) ^2\right] ^{1/2}. \end{aligned}$$

(18)

For most QoI, mean values and standard deviations behaved like those of the uncertain input parameters between the small and large range cases, i.e. the means were roughly identical while the standard deviations were multiplied by a factor of 2. Iron and aluminum concentrations were exceptions: their means were respectively 4 (Fe) and 1.7 (Al) times higher for the large range case than for the small range case, and the corresponding ratios were 13 (Fe) and 3.5 (Al) for their standard deviations. Except for Al in the small range case, their standard deviations were larger than their mean values, indicating a strong dependence of $\mathrm{Al}$ and $\mathrm{Fe}$ concentrations on variations of the model input parameters. The low concentrations of these two elements and the tight coupling of solubility controls exerted by two mineral phases, Ripidolite and Illite, for which the chosen uncertainties on solubility products were the highest (Table 2), explain these findings. By contrast, pH values were remarkably stable despite the complex coupled control on this parameter exerted by many phases in the system⁵, thus showing a strong thermodynamic buffering of this parameter by the mineralogical assemblage.

Table 4 Empirical mean $\widehat{\mu }$, standard deviation $\widehat{\sigma }$, skewness $\widehat{s}$, kurtosis $\widehat{k}$ and coefficient of variation $\widehat{c_\mathrm{v}}$ estimated with $10^6$ realizations of the surrogate models.

Marginal distributions

Small range case results exhibited three types of empirical marginal distribution profiles (Fig. 2): bell-shaped distributions for $\mathrm{pH}$, $\mathrm{pe}+\mathrm{pH}$ (not shown), ${\mathrm{Na}}^{+}$, ${\mathrm{K}}^{+}$, ${\mathrm{Ca}}^{2+}$, ${\mathrm{Mg}}^{2+}$, ${\mathrm{Sr}}^{2+}$, $\mathrm{S(VI)}$, and ${\mathrm{log}}_{10}\,{\mathrm{p}}_{{\mathrm{CO}}_2}$; right-skewed distributions for $\mathrm{Al}$, $\mathrm{S(-II)}$, and $\mathrm{Fe}$; and a piecewise linear distribution for $\mathrm{Si}$. Large range case results led to a flattening of the distributions (except for $\mathrm{Fe}$), which was consistent with the increase in variance. The shape of a distribution can be described by its skewness $s$ and kurtosis $k$, which are defined as the third and fourth standardized moments, respectively. The empirical estimators of $s$ and $k$, indicated in Table 4, are

$$\begin{aligned} \widehat{s} := \frac{\widehat{\mathbb {E}}\left[ \left( u^{\mathscr {K}}\left( {\varvec{\xi }}\right) - \widehat{\mu } \right) ^3\right] }{\widehat{\sigma }^3} \quad \text {and}\quad \widehat{k} := \frac{\widehat{\mathbb {E}}\left[ \left( u^{\mathscr {K}}\left( {\varvec{\xi }}\right) - \widehat{\mu } \right) ^4\right] }{\widehat{\sigma }^4}. \end{aligned}$$

(19)

The skewness of a distribution measures its asymmetry, and a distribution is commonly said to be fairly symmetrical if $|s|\le 1/2$, moderately skewed if $1/2<|s|<1$ and highly skewed if $|s|\ge 1$. In the small range case, we observed a slight asymmetry for ${\mathrm{Ca}}^{2+}$ and ${\mathrm{Mg}}^{2+}$ and a high asymmetry for $\mathrm{Al}$, $\mathrm{S(-II)}$, and $\mathrm{Fe}$. In the large range case, the asymmetry became important for all the QoIs except for $\mathrm{pH}$, $\mathrm{pe}+\mathrm{pH}$, $\mathrm{Si}$ and ${\mathrm{log}}_{10}\,{\mathrm{p}}_{{\mathrm{CO}}_2}$. The kurtosis of a distribution measures the combined weight of the tails relative to the rest of the distribution. It is common to compare the kurtosis to 3, which is the kurtosis of a normal distribution; a high kurtosis ($k>3$) indicates heavy tails while a low kurtosis ($k<3$) denotes light tails. In the small range case, the kurtosis was between 2 and 4 except for $\mathrm{Fe}$ and $\mathrm{S(-II)}$, which had strongly heavy-tailed distributions, and for $\mathrm{Si}$ due to its piecewise linear distribution. In the large range case, the kurtosis of each distribution increased substantially (except for $\mathrm{pH}$, $\mathrm{pe}+\mathrm{pH}$, $\mathrm{Si}$ and ${\mathrm{log}}_{10}\,{\mathrm{p}}_{{\mathrm{CO}}_2}$), meaning that the tails became heavier. We noted that $\mathrm{Fe}$ was the only quantity whose distribution was more peaked for the larger parametric domain; the mean values obtained in the two cases were significantly different but the medians were very close (Fig. 2).
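The estimators of Eqs. (18) and (19) can be evaluated on a sample of surrogate outputs as in the following minimal sketch. The function name is an assumption, and the normal and log-normal samples are hypothetical stand-ins for surrogate realizations, chosen because their shape parameters are known.

```python
import numpy as np

def summary_statistics(samples):
    """Empirical mean, std, coefficient of variation, skewness and kurtosis."""
    y = np.asarray(samples, dtype=float)
    mean = y.mean()
    std = y.std(ddof=1)  # 1/(N-1) normalization, as in Eq. (18)
    centered = y - mean
    skewness = np.mean(centered ** 3) / std ** 3  # Eq. (19), third moment
    kurtosis = np.mean(centered ** 4) / std ** 4  # Eq. (19), fourth moment
    return {"mean": mean, "std": std, "cv": std / mean,
            "skewness": skewness, "kurtosis": kurtosis}

rng = np.random.default_rng(4)

# Symmetric reference case: skewness close to 0 and kurtosis close to 3.
stats_normal = summary_statistics(rng.normal(5.0, 2.0, size=10**6))

# Right-skewed, heavy-tailed case: skewness above 1 and kurtosis above 3.
stats_lognormal = summary_statistics(np.exp(rng.normal(0.0, 1.0, size=10**6)))
```

Note that the kurtosis computed here is not the excess kurtosis, so the reference value for a normal distribution is 3, in line with the convention used in the text.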

Linear correlations

The linear correlation between two random variables $u$ and $v$ was measured with Pearson’s correlation coefficient $r(u,v)\in [-1,1]$, defined as follows

$$\begin{aligned} r(u,v):=\frac{{\mathbb C}\mathrm{ov}(u,v)}{\sigma (u)\sigma (v)}, \end{aligned}$$

(20)

where ${\mathbb C}\mathrm{ov}(u,v):=\mathbb {E}\left[ (u-\mathbb {E}(u))(v-\mathbb {E}(v))\right]$ is the covariance between $u$ and $v$. The square of Pearson’s coefficient is the coefficient of determination $R^2(u,v):=r(u,v)^2\in [0,1]$, which represents the fraction of the variance of $u$ that is explained by a linear dependence on $v$.
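Equation (20) and the coefficient of determination can be estimated from paired samples as in the following minimal sketch. The function name and the synthetic pair of variables are assumptions made for this example; the result is compared with NumPy's `corrcoef` as a sanity check.

```python
import numpy as np

def pearson_r(u, v):
    """Empirical Pearson correlation coefficient, Eq. (20)."""
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    du, dv = u - u.mean(), v - v.mean()
    return float(np.sum(du * dv) / np.sqrt(np.sum(du ** 2) * np.sum(dv ** 2)))

# Hypothetical negatively correlated pair: v = -2 u + noise.
rng = np.random.default_rng(5)
u = rng.standard_normal(100_000)
v = -2.0 * u + 0.5 * rng.standard_normal(100_000)

r = pearson_r(u, v)
r_squared = r ** 2                    # coefficient of determination
r_numpy = np.corrcoef(u, v)[0, 1]     # reference value from NumPy
```

For this synthetic pair the theoretical value is $r=-2/\sqrt{4.25}\approx -0.97$; the sign of $r$ is lost in $R^2$, which is why the sign of each correlation is reported separately in the text.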

Empirical estimates of $r(u,v)$ and $R^2(u,v)$ are plotted in Fig. 3, in which the lower (resp. upper) parts of the matrices correspond to the small (resp. large) range case. Three pairs presented a particularly strong correlation regardless of the parametric domain size: the pairs $(\mathrm{pH},\mathrm{pe}+\mathrm{pH})$ and $(\mathrm{pH},{\mathrm{log}}_{10}\,{\mathrm{p}}_{{\mathrm{CO}}_2})$ were negatively correlated with $R^2=88\%$ and $R^2=97\%$ respectively, whereas the pair $(\mathrm{pe}+\mathrm{pH},{\mathrm{log}}_{10}\,{\mathrm{p}}_{{\mathrm{CO}}_2})$ was positively correlated with $R^2=80\%$. The $(\mathrm{pH},{\mathrm{log}}_{10}\,{\mathrm{p}}_{{\mathrm{CO}}_2})$ pair correlation can be understood by noting that the standard deviation of the ${\mathrm{Ca}}^{2+}$ concentration (Table 2) was small compared to its mean value (Table 1) and that the ${\mathrm{log}}_{10}\,{\mathrm{p}}_{{\mathrm{CO}}_2}$ value is directly related to $\mathrm{pH}$ by the Calcite equilibrium reaction. The correlation in the pair $(\mathrm{pH},\mathrm{pe}+\mathrm{pH})$ cannot be explained by the known negative correlation of the pair $(\mathrm{pe},\mathrm{pH})$ at constant dioxygen or dihydrogen fugacity through the corresponding Nernst equation, which results in a $-1$ slope in the $\mathrm{pe}-\mathrm{pH}$ diagram representation: the QoI transformation from $\mathrm{pe}$ to $\mathrm{pe}+\mathrm{pH}$ was indeed meant to suppress this correlation. Consequently, the observed negative correlation must be attributed to particular equilibrium reactions. Goethite equilibrium, the reaction of which results in a $-3$ slope in a $\mathrm{pe}-\mathrm{pH}$ diagram, may explain the observed correlation.

Four pairs had a moderate correlation for the small range case that significantly decreased for the large range case. The pairs $(\mathrm{pH},\mathrm{Fe})$ and $(\mathrm{pe}+\mathrm{pH},\mathrm{S(-II)})$ were negatively correlated with $R^2=57\%$ and $R^2=70\%$, respectively. This correlation is well explained by the sensitivity of $\mathrm{S(-II)}$ and $\mathrm{Fe}$ concentrations to redox conditions. The pairs $(\mathrm{Fe},{\mathrm{log}}_{10}\,{\mathrm{p}}_{{\mathrm{CO}}_2})$ and $(\mathrm{pe}+\mathrm{pH},\mathrm{Fe})$ were positively correlated with $R^2=53\%$ and $R^2=60\%$, respectively. Conversely, the correlation of some pairs involving $\mathrm{S(VI)}$ was higher for the large range case: $(\mathrm{Fe},\mathrm{S(VI)})$, $({\mathrm{Ca}}^{2+},\mathrm{S(VI)})$, and $({\mathrm{Mg}}^{2+},\mathrm{S(VI)})$ with $R^2=46\%$, $R^2=42\%$, and $R^2=40\%$, respectively (instead of $0.5\%$, $35\%$, and $19\%$ for the small range case). These observations can be related to the charge balance requirement in aqueous solution during the calculation. In the model, ${\mathrm{Na}}^{+}$ and ${\mathrm{Cl}}^{-}$ total concentrations (aqueous + exchange) are stabilized at their final values before the reaction step with minerals. Mineral phases exert no further control on their concentrations. Hence, a variation of the concentration of $\mathrm{S(VI)}$, which is the second major anion in solution, must be compensated by an equivalent variation of cation concentrations to maintain solution electroneutrality. This compensation is mostly achieved by ${\mathrm{Ca}}^{2+}$, ${\mathrm{Mg}}^{2+}$, and $\mathrm{Fe}$ because the ${\mathrm{Na}}^{+}$ total concentration is fixed by the amount available on the cation exchanger, and because the ${\mathrm{Sr}}^{2+}$ concentration is controlled by Celestite solubility, which is itself linked to the $\mathrm{S(VI)}$ concentration.

Bivariate distributions

The shapes of the isolines of the most correlated pairs of QoI ($R^2>50\%$) were clearly consistent with the sign of the correlation coefficient (Fig. 4), namely negative for the pairs $(\mathrm{pH},{\mathrm{log}}_{10}\,{\mathrm{p}}_{{\mathrm{CO}}_2})$, $(\mathrm{pH},\mathrm{Fe})$, $(\mathrm{pe}+\mathrm{pH},\mathrm{S(-II)})$ and positive for the pairs $(\mathrm{pe}+\mathrm{pH},{\mathrm{log}}_{10}\,{\mathrm{p}}_{{\mathrm{CO}}_2})$, $(\mathrm{Fe},{\mathrm{log}}_{10}\,{\mathrm{p}}_{{\mathrm{CO}}_2})$, $(\mathrm{pe}+\mathrm{pH},\mathrm{Fe})$. In addition, the pairs $(\mathrm{pH},{\mathrm{log}}_{10}\,{\mathrm{p}}_{{\mathrm{CO}}_2})$ and $(\mathrm{pe}+\mathrm{pH},{\mathrm{log}}_{10}\,{\mathrm{p}}_{{\mathrm{CO}}_2})$ followed a bivariate normal distribution, whereas the other pairs exhibited more complex asymmetrical distributions. Isolines of the pair $(\mathrm{pH},\mathrm{pe}+\mathrm{pH})$ had the same pattern as those of the pair $(\mathrm{pH},{\mathrm{log}}_{10}\,{\mathrm{p}}_{{\mathrm{CO}}_2})$ (not shown).
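
This section does not state how the isolines were computed. One common option, shown here only as an assumption, is a Gaussian kernel density estimate evaluated on a grid. The sketch below uses NumPy only, a product kernel with Scott's rule for the bandwidths, and synthetic negatively correlated samples; the function name `kde2d` and all numerical values are hypothetical.

```python
import numpy as np


def kde2d(x, y, grid_size=40):
    """Gaussian product-kernel density estimate on a regular grid.

    Returns (gx, gy, density) with density[i, j] evaluated at (gx[i], gy[j]).
    """
    n = x.size
    # Scott's rule in dimension 2: bandwidth = std * n**(-1/6)
    hx = x.std(ddof=1) * n ** (-1.0 / 6.0)
    hy = y.std(ddof=1) * n ** (-1.0 / 6.0)
    gx = np.linspace(x.min() - 3 * hx, x.max() + 3 * hx, grid_size)
    gy = np.linspace(y.min() - 3 * hy, y.max() + 3 * hy, grid_size)
    # One-dimensional kernels, shape (grid_size, n)
    kx = np.exp(-0.5 * ((gx[:, None] - x[None, :]) / hx) ** 2) / (hx * np.sqrt(2 * np.pi))
    ky = np.exp(-0.5 * ((gy[:, None] - y[None, :]) / hy) ** 2) / (hy * np.sqrt(2 * np.pi))
    # Average over the samples of the product of the two kernels
    density = kx @ ky.T / n
    return gx, gy, density


# Synthetic pair with correlation close to -0.9 (hypothetical data).
rng = np.random.default_rng(0)
u = rng.standard_normal(2000)
v = -0.9 * u + np.sqrt(1.0 - 0.9**2) * rng.standard_normal(2000)
gx, gy, density = kde2d(u, v)
```

Isolines can then be drawn by passing `gx`, `gy` and `density.T` to a contouring routine such as matplotlib's `contour`; for this synthetic pair they are ellipses tilted along a descending direction, as expected for a negative correlation.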

Global sensitivity analysis

An essential aspect of uncertainty propagation is global sensitivity analysis³³,³⁴, which quantifies the relative contribution of each uncertain input parameter (or group of input parameters) to the variance of the QoI. This analysis across the whole parametric domain should not be confused with local sensitivity analysis³⁵, which estimates the effect of small perturbations around specific input values by means of the partial derivatives of the model. The global sensitivity analysis was based on the decomposition of the total variance³⁶ into $2^{N}-1$ terms ($N=19$ in this study), as follows:

$$\begin{aligned} \mathbb {V}(u)= \sum _{i=1}^N \mathbb {V}_i + \sum _{i<j}\mathbb {V}_{ij}+\cdots +\mathbb {V}_{1\ldots N}, \end{aligned}$$

(21)

where $\{\mathbb {V}_i\}$ are the first-order terms, $\{\mathbb {V}_{ij}\}$ the second-order interaction terms, and so on. Of particular interest are the $\mathbb {V}_i$, which measure the individual effect of each input parameter $\xi _i$ on the output variance. These effects are typically normalized by the total variance, which defines the first-order sensitivity indices $S_i$ as

$$\begin{aligned} S_i:=\frac{\mathbb {V}_i}{\mathbb {V}}. \end{aligned}$$

(22)
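
As a small worked illustration of Eqs. (21) and (22), the sketch below first counts the terms of the decomposition for $N=19$, then evaluates the first-order indices of a toy model in closed form. The toy model $u = a_1\xi_1 + a_2\xi_2 + b\,\xi_1\xi_2$, with independent inputs of zero mean and unit variance, and the coefficient values are assumptions made for this example; they are unrelated to the geochemical model.

```python
from math import comb

# Number of terms in the variance decomposition for N = 19 inputs.
N = 19
terms_per_order = {k: comb(N, k) for k in range(1, N + 1)}
n_terms = sum(terms_per_order.values())  # equals 2**N - 1
print(terms_per_order[1], terms_per_order[2], n_terms)

# Toy model (hypothetical): u = a1*xi1 + a2*xi2 + b*xi1*xi2,
# with independent inputs of zero mean and unit variance.
a1, a2, b = 1.0, 2.0, 0.5
V1, V2, V12 = a1**2, a2**2, b**2   # partial variances
V = V1 + V2 + V12                  # total variance, Eq. (21) with N = 2
S1, S2 = V1 / V, V2 / V            # first-order indices, Eq. (22)
interaction_share = 1.0 - (S1 + S2)
print(round(S1, 3), round(S2, 3), round(interaction_share, 3))
```

In this toy case the first-order indices do not sum to one, and the remainder is exactly the share of variance carried by the interaction term $\mathbb {V}_{12}$.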

The first-order sensitivity indices were estimated with the Monte-Carlo pick-freeze algorithm³³,³⁷, which requires a sample of size M of the input variables (Fig. 5). For a given case, the number of surrogate model evaluations was $M(N+1)=2\cdot 10^7$ for each QoI, i.e. $M=10^6$ with $N=19$. A sum of the first-order indices close to one indicates weak interactions between parameters and an essentially additive model. Interaction effects were minor for the small range case (except for $\mathrm{Fe}$ and $\mathrm{S(-II)}$), but increased significantly for the large range case. The first-order sensitivity indices of eight quantities, $\mathrm{pH}$, $\mathrm{pe}+\mathrm{pH}$, ${\mathrm{Mg}}^{2+}$, $\mathrm{Fe}$, $\mathrm{Si}$, $\mathrm{Al}$, $\mathrm{S(-II)}$, and ${\mathrm{log}}_{10}\,{\mathrm{p}}_{{\mathrm{CO}}_2}$, were mainly governed by solubilities, while four other quantities, ${\mathrm{Na}}^{+}$, ${\mathrm{Ca}}^{2+}$, ${\mathrm{Sr}}^{2+}$, and $\mathrm{S(VI)}$, depended on the four input categories. In addition, three QoIs were strongly dependent on a single parameter: $\log K_{\mathrm{ex}}^{{\mathrm{Na}}^{+}/{\mathrm{K}}^{+}}$ for ${\mathrm{K}}^{+}$, consistent with the known control of ${\mathrm{K}}^{+}$ concentration by cation exchange reactions in systems rich in clay minerals³⁸; quartz for $\mathrm{Si}$, consistent with the negligible variation of quartz solubility product and of $\mathrm{Si}$ aqueous speciation in the explored range of pH variations; and Illite for $\mathrm{Al}$, consistent with the fact that only Illite and Ripidolite react with $\mathrm{Al}$.
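
A minimal sketch of a pick-freeze estimator is given below to make the evaluation count $M(N+1)$ concrete: one base sample costs $M$ evaluations, and each of the $N$ inputs requires $M$ further evaluations in which that input is kept ("frozen") while all the others are redrawn. The exact estimator variant used in the study is not specified in this section; the covariance-based formula, the standard normal inputs and the three-input toy model are assumptions chosen for illustration.

```python
import numpy as np


def pick_freeze_first_order(model, n_inputs, m, rng):
    """Estimate first-order Sobol indices with m * (n_inputs + 1) evaluations.

    Inputs are assumed independent standard normal (illustrative choice).
    """
    x = rng.standard_normal((m, n_inputs))      # base sample
    x_new = rng.standard_normal((m, n_inputs))  # independent resample
    y = model(x)                                # m evaluations
    indices = np.empty(n_inputs)
    for i in range(n_inputs):
        x_i = x_new.copy()
        x_i[:, i] = x[:, i]                     # freeze input i, redraw the others
        y_i = model(x_i)                        # m evaluations per input
        cov = np.mean(y * y_i) - y.mean() * y_i.mean()
        indices[i] = cov / y.var()
    return indices


def toy_model(x):
    # Hypothetical model: additive part plus one interaction term.
    return x[:, 0] + 2.0 * x[:, 1] + 0.5 * x[:, 0] * x[:, 2]


rng = np.random.default_rng(1)
S = pick_freeze_first_order(toy_model, n_inputs=3, m=200_000, rng=rng)
print(np.round(S, 3), "sum =", round(float(S.sum()), 3))
```

For this toy model the exact values are $S_1 = 1/5.25$, $S_2 = 4/5.25$ and $S_3 = 0$, so the sum stays below one and reveals the interaction between the first and third inputs, in the same way as the sums discussed above.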

Conclusion

Our uncertainty propagation study using surrogate models proved successful in analyzing the sensitivity of a reference pore water geochemical model to its various input parameters. The results, and their validation with direct Monte-Carlo simulations, show that sparse polynomial chaos expansions are well adapted to approximate the quantities of interest. Most of the significant correlations and anti-correlations could be traced back to geochemical constraints, giving confidence in the overall analysis. The method makes it possible not only to quantify the uncertainties of the quantities of interest for future performance evaluation calculations, but also to identify the most influential input parameters. This latter information is particularly valuable to guide further research efforts aimed at reducing uncertainties on specific aspects of performance assessment analyses. Because pore water chemistry influences many important processes, such as radionuclide transport and retardation by adsorption and precipitation, uncertainty analyses of reactive transport modeling outcomes would certainly benefit from a coupling with our surrogate models to decipher uncertainties in adsorption model predictions and to speed up calculations in fully coupled approaches.

Data availability

All data generated or analysed during this study are included in this published article and its supplementary information files.

References

1. Altmann, S. 'Geo'chemical research: A key building block for nuclear waste disposal safety cases. J. Contam. Hydrol. 102, 174–179 (2008).
2. Tournassat, C. & Steefel, C. I. Reactive transport modeling of coupled processes in nanoporous media. Rev. Mineral Geochem. 85, 75–110 (2019).
3. Claret, F., Marty, N. & Tournassat, C. Modeling the Long-term Stability of Multi-barrier Systems for Nuclear Waste Disposal in Geological Clay Formations, chap. 8, 395–451 (Wiley, 2018). https://doi.org/10.1002/9781119060031.ch8.
4. Tournassat, C., Vinsot, A., Gaucher, E. C. & Altmann, S. Chapter 3—Chemical conditions in clay-rocks. In Tournassat, C., Steefel, C. I., Bourg, I. C. & Bergaya, F. (eds.) Natural and Engineered Clay Barriers, vol. 6 of Developments in Clay Science, 71–100 (Elsevier, 2015).
5. Gaucher, E. et al. A robust model for pore-water chemistry of clayrock. Geochim. Cosmochim. Acta. 73, 6470–6487 (2009).
6. Bonano, E. J. & Cranwell, R. M. Treatment of uncertainties in the performance assessment of geologic high-level radioactive waste repositories. Math. Geol. 20, 543–565. https://doi.org/10.1007/BF00890336 (1988).
7. Ayoub, A., Pfingsten, W., Podofillini, L. & Sansavini, G. Uncertainty and sensitivity analysis of the chemistry of cesium sorption in deep geological repositories. Appl. Geochem. 117, 104607. https://doi.org/10.1016/j.apgeochem.2020.104607 (2020).
8. Denison, F. H. & Garnier-Laplace, J. The effects of database parameter uncertainty on uranium (vi) equilibrium calculations. Geochim. Cosmochim. Acta. 69, 2183–2191. https://doi.org/10.1016/j.gca.2004.09.033 (2005).
9. Sochala, P. & Le Maître, O. Polynomial Chaos expansion for subsurface flows with uncertain soil parameters. Adv. Water. Resour. 62, 139–154. https://doi.org/10.1016/j.advwatres.2013.10.003 (2013).
10. Li, G. et al. Quantifying initial and wind forcing uncertainties in the Gulf of Mexico. Comput. Geosci. 20, 1133–1153. https://doi.org/10.1007/s10596-016-9581-4 (2016).
11. Sochala, P., De Martin, F. & Le Maître, O. Model reduction for large-scale earthquake simulation in an uncertain 3d medium. Int. J. Uncertain. Quantif. 10, 101–127 (2020).
12. Snelling, B., Neethling, S., Horsburgh, K., Collins, G. & Piggott, M. Uncertainty quantification of landslide generated waves using Gaussian process emulation and variance-based sensitivity analysis. Water 12, 416 (2020).
13. Phenix, B. D. et al. Incorporation of parametric uncertainty into complex kinetic mechanisms: Application to hydrogen oxidation in supercritical water. Combust. Flame 112, 132–146 (1998).
14. Reagan, M. T., Najm, H. M., Ghanem, R. G. & Knio, O. M. Uncertainty quantification in reacting-flow simulations through non-intrusive spectral projection. Combust. Flame 132, 545–555. https://doi.org/10.1016/S0010-2180(02)00503-5 (2003).
15. Alexanderian, A., Le Maître, O., Najm, H., Iskandarani, M. & Knio, O. Multiscale stochastic preconditioners in non-intrusive spectral projection. SIAM J. Sci. Comp. 50, 306–340. https://doi.org/10.1007/s10915-011-9486-2 (2012).
16. Srinivasan, G., Tartakovsky, D. M., Robinson, B. A. & Aceves, A. B. Quantification of uncertainty in geochemical reactions. Water Resour. Res. https://doi.org/10.1029/2007WR006003 (2007).
17. Delay, J. et al. Three decades of underground research laboratories: What have we learned? Geol. Soc. Spec. Publ. 400, SP400-1 (2014).
18. Parkhurst, D. L. & Appelo, C. A. J. Description of input and examples for PHREEQC Version 3—a computer program for speciation, batch-reaction, one-dimensional transport, and inverse geochemical calculations, U.S. Geological Survey Techniques and Methods, book 6, chap. A43, http://pubs.usgs.gov/tm/06/a43/ (2013).
19. Giffaut, E. et al. Andra thermodynamic database for performance assessment: ThermoChimie. Appl. Geochem. 49, 225–236 (2014).
20. Shannon, C. A mathematical theory of communication. Bell Syst. Tech. J. 27, 379–423. https://doi.org/10.1002/j.1538-7305.1948.tb01338.x (1948).
21. Jaynes, E. Information theory and statistical mechanics. Phys. Rev. 106, 620–630. https://doi.org/10.1103/PhysRev.106.620 (1957).
22. Rasmussen, C. E. & Williams, C. K. I. Gaussian Processes for Machine Learning (The MIT Press, 2005).
23. Aleksander, I. & Morton, H. An Introduction to Neural Computing (Chapman and Hall, 1990).
24. Ghanem, R. G. & Spanos, S. D. Stochastic Finite Elements: A Spectral Approach (Springer, 1991).
25. Le Maître, O. P. & Knio, O. M. Spectral Methods for Uncertainty Quantification. Scientific Computation (Springer, 2010).
26. Cameron, R. & Martin, W. The orthogonal development of nonlinear functionals in series of Fourier-Hermite functionals. Ann. Math. 48, 385–392 (1947).
27. Ernst, O. G., Mugler, A., Starkloff, H.-J. & Ullmann, E. On the convergence of generalized polynomial chaos expansions. Esaim Math. Model. Numer. Anal. 46, 317–339. https://doi.org/10.1051/m2an/2011045 (2012).
28. Montgomery, D. Design and Analysis of Experiments. Student Solutions Manual (Wiley, 2004).
29. Lüthen, N., Marelli, S. & Sudret, B. Sparse polynomial chaos expansions: Literature survey and benchmark. SIAM-ASA J. Uncertain. 9, 593–649. https://doi.org/10.1137/20M1315774 (2021).
30. Mallat, S. & Zhang, Z. Matching pursuits with time-frequency dictionaries. IEEE Trans. Signal Process. 41, 3397–3415 (1993).
31. Pati, Y., Rezaiifar, R. & Krishnaprasad, P. Orthogonal matching pursuit: recursive function approximation with applications to wavelet decomposition. In Proceedings of 27th Asilomar Conference on Signals, Systems and Computers, Vol. 1, 40–44, https://doi.org/10.1109/ACSSC.1993.342465 (1993).
32. Parzen, E. On estimation of a probability density function and mode. Ann. Math. Stat. 33, 1065–1076. https://doi.org/10.1214/aoms/1177704472 (1962).
33. Sobol, I. M. Sensitivity estimates for nonlinear mathematical models. Math. Model. Comput. Exp. 1, 407–414 (1993).
34. Homma, T. & Saltelli, A. Importance measures in global sensitivity analysis of nonlinear models. Reliab. Eng. Syst. Saf. 52, 1–17. https://doi.org/10.1016/0951-8320(96)00002-6 (1996).
35. Cacuci, D. G. Sensitivity theory for nonlinear systems. I. Nonlinear functional analysis approach. J. Math. Phys. 22, 2794–2802. https://doi.org/10.1063/1.525186 (1981).
36. Hoeffding, W. A class of statistics with asymptotically normal distribution. Ann. Math. Stat. 19, 293–325 (1948).
37. Sobol, I. M. Global sensitivity indices for nonlinear mathematical models and their Monte Carlo estimates. Math. Comput. Simul. 55, 271–280 (2001).
38. Tremosa, J. et al. Geochemical characterization and modelling of the Toarcian/Domerian porewater at the Tournemire underground research laboratory. Appl. Geochem. 27, 1417–1431 (2012).

Acknowledgements

The work of P. S., C. C. and F. C. has been supported by the European project DONUT. C. T. would like to acknowledge the funding support by a grant overseen by the French National Research Agency (ANR) as part of the “Investissements d’Avenir” Program LabEx VOLTAIRE, 10-LABX-0100. The authors would like to thank the two anonymous reviewers for taking the time necessary to assess the manuscript and for their thoughtful comments and constructive suggestions.

Author information

Authors and Affiliations

CEA, DAM, DIF, 91297, Arpajon, France
Pierre Sochala
BRGM, 3 avenue Claude Guillemin, 45060, Orléans, France
Pierre Sochala, Christophe Chiaberge & Francis Claret
ISTO, Université d’Orléans-CNRS-BRGM, Orléans, France
Christophe Tournassat
Lawrence Berkeley National Laboratory, Berkeley, CA, USA
Christophe Tournassat

Contributions

C.C., F.C. and C.T. conceived the uncertainty models, C.T. launched the PHREEQC ensemble simulations, P.S. built the surrogate models and analysed the results together with F.C. and C.T. All authors reviewed the manuscript.

Corresponding author

Correspondence to Pierre Sochala.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher's note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Supplementary Information 1.

Supplementary Information 2.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Sochala, P., Chiaberge, C., Claret, F. et al. Uncertainty propagation in pore water chemical composition calculation using surrogate models. Sci Rep 12, 15077 (2022). https://doi.org/10.1038/s41598-022-18411-5

Received: 16 May 2022
Accepted: 10 August 2022
Published: 05 September 2022
DOI: https://doi.org/10.1038/s41598-022-18411-5
