Introduction

With the growth in computational capabilities, statistical models of increasing complexity are being used to make predictions under various design conditions. These models often contain uncertain parameters that must be estimated using data obtained from controlled experiments. While methods for parameter estimation have matured significantly, notable challenges remain before a statistical model and its estimated parameters can be considered reliable. One such challenge is the practical identifiability of the model parameters, defined as the possibility of estimating each parameter with high confidence when different forms of uncertainty, such as parameter, model-form, or measurement uncertainty, are present1,2,3. Low practical identifiability of the statistical model can lead to an ill-posed estimation problem, which becomes a critical issue when the parameters have a physical interpretation and decisions are to be made using their estimated values4, 5. Further, such an identifiability deficit can lead to unreliable model predictions, rendering such statistical models unsuitable for practical applications6, 7. Hence, for a reliable parameter estimation process and model prediction, it is of significant interest that practical identifiability be evaluated before any controlled experiment or parameter estimation study is conducted8, 9.

In frequentist statistics, the problem of practical identifiability is to examine the possibility of unique estimation of the model parameters \(\theta\)8. Under such considerations, methods examining identifiability are broadly classified into local and global identifiability methods. While the former examines the possibility that \(\theta =\theta ^k\) is a unique parameter estimate within its neighborhood \(N(\theta ^k)\) in the parameter space, the latter is concerned with the uniqueness of \(\theta ^k\) when considering the entire parameter space. Local sensitivity analysis has been widely used to find parameters that produce large variability in the model response10,11,12,13. In such an analysis, parameters resulting in large variability are considered relevant and therefore assumed to be identifiable for parameter estimation. However, parameters associated with large model sensitivities can still have poor identifiability characteristics14. Another class of frequentist identification methods is based on the analysis of the properties of the Fisher information matrix (FIM). Staley et al.15 proposed that positive definiteness of the FIM is a necessary and sufficient condition for the parameters to be considered practically identifiable. Similarly, Rothenberg16 showed that identifiability of the parameters is equivalent to non-singularity of the FIM. However, subsequent findings17, 18 have reported that models with a singular FIM can also be identifiable. Weijers et al.19 extended the classical FIM analysis and showed that even if an individual parameter has low identifiability, it can belong to an identifiable subset, such that the subset is practically identifiable. Parameters within such subsets have functional relationships with each other, thus resulting in a combined effect on the model response. It has been shown that such identifiable subsets can be found by examining the condition number (E-criterion) and determinant (D-criterion), and selecting parameter pairs with the smallest condition number and largest determinant. Similarly, Machado et al.20 considered the D to E ratio to examine practical identifiability and to find the identifiable subsets. Another popular identification technique is likelihood profiling4, 21,22,23. The method is based on finding the likelihood profile of a parameter by maximizing the likelihood with respect to the rest of the parameters. Parameters whose likelihood profile is shallow are deemed to have low practical identifiability. In addition to evaluating practical identifiability, likelihood profiling can also be used to find functional relationships between parameters, which is helpful for model reparameterization24, 25. However, due to the several re-optimizations required to obtain the likelihood profiles, the method does not scale well with the dimension of the parameter space and can quickly become computationally intractable. While methods based on the FIM or likelihood profiling have gained significant popularity, they only examine local identifiability. This means that the estimate of practical identifiability depends on the \(\theta ^k\) for which the analysis is conducted and is only valid within its neighborhood \(N(\theta ^k)\). To overcome the limitations of local identifiability, global identifiability methods using the Kullback-Leibler divergence26 and identifying functions27 have been proposed. However, such methods are computationally complex and not suitable for practical problems.
Moreover, since such methods are based on frequentist statistics, they are unable to account for parametric uncertainty and therefore unable to provide an honest representation of global practical identifiability.

There have been few studies examining global practical identifiability in a Bayesian framework. Early attempts were based on global sensitivity analysis (GSA), which apportions the variability (either by derivatives or variance) of the model output due to the uncertainty in each parameter6, 28,29,30. Unlike local sensitivity analysis, GSA-based methods simultaneously vary model parameters according to their distributions, thus providing a measure of global sensitivity that is independent of a particular parameter realization. However, global parameter sensitivity does not guarantee global practical identifiability31. Pant et al.32 and Capellari et al.33 formulated the problem of practical identifiability as gaining sufficient information about each model parameter from data. An information-theoretic approach was used to quantify the information gained, such that a larger information gain would mean higher practical identifiability. However, assumptions about the structure of the parameter-data joint distribution were made when developing the estimator. A similar approach was used by Ebrahimian et al.34, where the change in parameter uncertainty from the prior distribution to the estimated posterior distribution was used to quantify the information gained. Pant35 proposed information sensitivity functions by combining information theory and sensitivity analysis to quantify information gain. However, the joint distribution between the parameters and the data was assumed to be Gaussian.

Framed in a Bayesian setting, the information-theoretic approach to identifiability provides a natural extension to include the different forms of uncertainty that are present in practical problems. In this work, a novel estimator is developed from an information-theoretic perspective to examine the practical identifiability of a statistical model. The expected information gained from the data about each model parameter is used as a metric to quantify practical identifiability. In contrast to the aforementioned methods based on information theory, the proposed approach has the following novel advantages: first, the estimator for information gain can be used for an a priori analysis, that is, no data is required to evaluate practical identifiability; second, the framework can account for different forms of uncertainty, such as model-form, parameter, and measurement uncertainty; third, the framework does not make assumptions about the joint distribution between the data and parameters, as in previous methods; fourth, the identifiability analysis is global, rather than being dependent on a particular realization of the model parameters. Another contribution of this work is an information-theoretic estimator that highlights, in an a priori manner, dependencies between parameter pairs that emerge a posteriori. By combining the information gained about each parameter with the parameter dependencies obtained using the proposed approach, it is possible to find parameter subsets that can be estimated with high posterior certainty before any controlled experiment is performed. Broadly, this can dramatically reduce the cost of parameter estimation, inform model-form selection or refinement, and associate a degree of reliability with the parameter estimation.

The manuscript is organized as follows. In "Bayesian parameter inference" the Bayesian paradigm for parameter estimation is presented. In "Quantifying information gain" differential entropy and mutual information are presented as information-theoretic tools to quantify the uncertainty associated with random variables and the information gain, respectively. In "Estimating practical identifiability" an a priori estimator is developed to quantify global practical identifiability in a Bayesian construct. In "Estimating parameter dependence" the problem of estimating parameter dependencies is addressed, and an a priori estimator is developed to quantify parameter dependencies that emerge a posteriori. The practical identifiability framework is applied to a linear Gaussian statistical model and a methane-air reduced kinetics model; results are presented in "Numerical experiments". Concluding remarks are presented in "Concluding remarks and perspectives".

Quantifying practical identifiability in a Bayesian setting

In this section, we first present the Bayesian framework for parameter estimation. Next, we utilize the concepts of differential entropy and mutual information from information theory to quantify information contained in the data about uncertain parameters of the statistical model. Thereafter, we extend the idea of mutual information to develop an a priori estimator to quantify practical identifiability in a Bayesian setting. While in most statistical models low practical identifiability is due to insufficient information about model parameters, it may often be the case that identifiable subsets exist. Parameters within such subsets have functional relations and exhibit a combined effect on the statistical model, such that the subset is practically identifiable. To find such identifiable subsets, we develop an estimator to highlight dependencies between parameter pairs that emerge a posteriori.

Bayesian parameter inference

Consider the observation/data \(y\in \mathcal {Y}\) of a physical system, which is a realization of a random variable \(\text{Y}:\Omega \rightarrow {{\mathbb {R}}^{n}}\) distributed as p(y), where \(\mathcal {Y}\) is the set of all possible realizations of the random variable. Herein, we will use the same lower-case, upper-case, and calligraphic notation to represent a realization, a random variable, and the set of all possible realizations, respectively. Consider another real-valued random variable \(\Theta :\Omega \rightarrow {{\mathbb {R}}^{m}}\) distributed as \(p(\theta ): {{\mathbb {R}}^{m}}\rightarrow {{\mathbb {R}}^{+}}\) which denotes the uncertain parameters of the model. The data is assumed to be generated by the statistical model given as

$$\begin{aligned} y\triangleq \mathscr {F}(\theta , d)+\xi , \end{aligned}$$
(1)

where \(\mathscr {F}(\theta , d): {{\mathbb {R}}^{m}}\times {{\mathbb {R}}^{\ell }}\rightarrow {{\mathbb {R}}^{n}}\) is the forward model which maps the parameters and model inputs \(d\in {{\mathbb {R}}^{\ell }}\) to the prediction space. For simplicity, the model input d is considered known. The random variable \(\xi\) represents the additive measurement noise, that is, the uncertainty in the measurement. Once the observations are collected using controlled experiments, the prior belief about the parameter distribution \(p(\theta | d)\) can be updated to obtain the posterior distribution \(p(\theta | y, d)\) via Bayes’ rule

$$\begin{aligned} p(\theta | y, d) = \frac{p(y\mid \theta , d)p(\theta \mid d)}{p(y\mid d)}, \end{aligned}$$
(2)

where \(p(y\mid \theta , d)\) is called the model likelihood and \(p(y\mid d)\) is called the evidence.
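As a concrete illustration of (1) and (2), the short Python sketch below evaluates the log-posterior up to the evidence \(p(y\mid d)\), which is simply the normalizing constant; the polynomial forward model, Gaussian prior, and Gaussian additive noise are arbitrary stand-ins chosen purely for demonstration.

```python
import numpy as np
from scipy import stats

def forward_model(theta, d):
    # Stand-in forward model F(theta, d); any simulator could be used here
    return theta[0] * d + theta[1] * d**2

d = np.linspace(-1.0, 1.0, 20)                  # known model inputs
sigma_xi = 0.3                                  # measurement noise std. dev.
prior = stats.multivariate_normal(mean=np.zeros(2), cov=np.eye(2))

rng = np.random.default_rng(0)
theta_true = np.array([1.0, -0.5])
y = forward_model(theta_true, d) + sigma_xi * rng.normal(size=d.size)   # data generated as in (1)

def log_unnormalized_posterior(theta):
    # log p(y | theta, d) + log p(theta | d): Bayes' rule (2) up to the evidence p(y | d)
    log_likelihood = stats.norm.logpdf(y, loc=forward_model(theta, d), scale=sigma_xi).sum()
    return log_likelihood + prior.logpdf(theta)

print(log_unnormalized_posterior(theta_true))
```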

Quantifying information gain

Updating parameter belief from the prior to posterior in (2) is associated with a gain in information from the data. This gain can be quantified as the change in the uncertainty of the parameters \(\Theta\). As an example, consider a 1D Gaussian prior and posterior distribution such that the information gain can be quantified as a change in variance (a measure of uncertainty) of the parameter distribution. A greater reduction in parameter uncertainty is a consequence of more information gained from the data.

In general, the change in parameter uncertainty between the prior and posterior distributions for a given input of the model \(d\in \mathcal {D}\) is defined as

$$\begin{aligned} \Delta {\mathscr {U}}(d, y) \triangleq {\mathscr {U}}(p(\theta \mid d)) - {\mathscr {U}}(p(\theta \mid y, d)), \end{aligned}$$
(3)

where \({\mathscr {U}}\) is an operator quantifying the amount of uncertainty or the lack of information for a given probability distribution. Thus, the expected information gained about the parameters is defined as

$$\begin{aligned} \Delta {\mathscr {U}}(d) \triangleq {\mathscr {U}}(p(\theta \mid d)) - \int \limits _{\mathcal {Y}}{\mathscr {U}}(p(\theta \mid y, d))\, p(y\mid d) \,\text{d}y. \end{aligned}$$
(4)

One popular choice for the operator \({\mathscr {U}}\) is the differential entropy32, 33, 36 which is defined as the average Shannon information37 for a given probability distribution. Mathematically, for a continuous random variable \(Z:\Omega \rightarrow {{\mathbb {R}}^{t}}\) with distribution \(p(z): {{\mathbb {R}}^{t}}\rightarrow {{\mathbb {R}}^{+}}\) and support \(\mathcal {Z}\), the differential entropy is defined as

$$\begin{aligned} H(p(z)) = H(Z) \triangleq -\int \limits _{\mathcal {Z}} p(z) \log p(z) \,\text{d}z. \end{aligned}$$
(5)
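As a quick numerical check of (5), the sketch below estimates the differential entropy of a univariate Gaussian by Monte Carlo and compares it with the closed form \(\tfrac{1}{2}\log (2\pi e\sigma ^2)\); the variance and sample size are arbitrary choices.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
sigma = 2.0
z = rng.normal(0.0, sigma, size=200_000)

# H(Z) = -E[log p(Z)] estimated by Monte Carlo for Z ~ N(0, sigma^2)
h_mc = -np.mean(stats.norm.logpdf(z, scale=sigma))
h_exact = 0.5 * np.log(2.0 * np.pi * np.e * sigma**2)
print(h_mc, h_exact)        # both approximately 2.11 nats
```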

Using differential entropy to quantify the uncertainty of a probability distribution, the change in uncertainty (or expected information gain) of \(\Theta\) can be evaluated as

$$\begin{aligned} \Delta {\mathscr {U}}(d)&= H(\Theta \mid d) - H(\Theta \mid Y,d), \end{aligned}$$
(6a)
$$\begin{aligned}{}&= H(\Theta \mid d) + H(Y\mid d) - H(\Theta , Y \mid d) , \end{aligned}$$
(6b)
$$\begin{aligned}{}&= -\int \limits _{{\Theta }}p(\theta \mid d)\log p(\theta \mid d) \,\text{d}\theta + \int \limits _{{\Theta }, \mathcal {Y}} p(\theta , y\mid d) \log p(\theta \mid y, d) \,\text{d}\theta \,\text{d}y,\end{aligned}$$
(6c)
$$\begin{aligned}{}&= \int \limits _{{\Theta }, \mathcal {Y}} p(\theta , y\mid d) \log \frac{p(\theta , y\mid d)}{p(\theta \mid d)p(y\mid d)} \,\text{d}\theta \,\text{d}y,\end{aligned}$$
(6d)
$$\begin{aligned}{}&\triangleq I(\Theta ;Y\mid d). \end{aligned}$$
(6e)

The quantity \(I(\Theta ;Y \mid d)\) is called the mutual information between the random variables \(\Theta\) and Y given the model inputs \(\mathcal {D}=d\)38. Mutual information is measured in bits when the logarithm is taken to base two, and in nats when the natural logarithm is used, as is the case for the continuous random variables considered here.
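Similarly, (6e) can be checked on a jointly Gaussian pair \((\Theta , Y)\) with correlation \(\rho\), for which the closed form is \(I(\Theta ;Y) = -\tfrac{1}{2}\log (1-\rho ^2)\); the correlation value in the sketch below is arbitrary.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
rho = 0.8
cov = np.array([[1.0, rho], [rho, 1.0]])
samples = rng.multivariate_normal(np.zeros(2), cov, size=200_000)

joint = stats.multivariate_normal(np.zeros(2), cov)
# I(Theta; Y) = E[log p(theta, y) - log p(theta) - log p(y)]
mi_mc = np.mean(joint.logpdf(samples)
                - stats.norm.logpdf(samples[:, 0])
                - stats.norm.logpdf(samples[:, 1]))
print(mi_mc, -0.5 * np.log(1.0 - rho**2))     # both approximately 0.51 nats
```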

Remark 1

The mutual information \(I(\Theta ; Y\mid d)\) is always non-negative38. This means that, on average over the data, updating the parameter belief from the prior to the posterior cannot increase parameter uncertainty.

Estimating practical identifiability

In a Bayesian framework where the parameters are treated as random variables, practical identifiability can be determined by examining information gained about each model parameter32, 35. Parameters for which the data is uninformative cannot be estimated with a high degree of confidence and therefore are practically unidentifiable. While mutual information in (6e) is a useful quantity to study information gained from data about the entire parameter set, it does not apportion information gained about each parameter. Therefore, to examine practical identifiability, we define a conditional mutual information

$$\begin{aligned} I(\Theta _{i};Y\mid \Theta _{\sim i}, d)\triangleq & {} \, {\mathbb {E}}_{\Theta _{\sim i}}[I(\Theta _i;Y\mid \Theta _{\sim i}= \theta _{\sim i}, d)], \end{aligned}$$
(7)

where \(\Theta _{\sim i}\) denotes all parameters except \(\Theta _i\) and \({\mathbb {E}}_{\Theta _{\sim i}}[\cdot ]\) denotes the expectation over \(p(\theta _{\sim i} \mid d)\). Using this conditional mutual information to assess practical identifiability is based on the intuition that, on average, a high information gain about \(\Theta _i\) implies high practical identifiability. We can thus present the following definitions for identifiability in a Bayesian setting.

Definition 1

(Local identifiability) Given a statistical model with parameters \(\Theta\), a parameter \(\Theta _i\in \Theta\) is said to be locally identifiable if sufficient information is gained about it for a particular realization \(\theta _{\sim i}\) of \(\Theta _{\sim i}\).

Definition 2

(Global identifiability) Given a statistical model with parameters \(\Theta\), a parameter \(\Theta _i\in \Theta\) is said to be globally identifiable if sufficient information is gained about it on average with respect to the distribution \(p(\theta _{\sim i}{\mid d})\).

The expectation over possible realizations of \(\Theta _{\sim i}\) in (7) therefore provides a statistical measure of global practical identifiability8. By contrast, evaluating (7) at a fixed \(\theta _{\sim i}\) will result in a local identifiability measure, which means that the information gained about \(\Theta _i\) will implicitly depend on \(\theta _{\sim i}\).

Typically, (7) does not have a closed-form expression and must be estimated numerically. Using the definition of differential entropy in (5) the conditional mutual information can be written as

$$\begin{aligned} I(\Theta _i;Y\mid \Theta _{\sim i}, d)&= \int \limits _{{\Theta _{i},\Theta _{\sim i},} \mathcal {Y}} p(\theta _i,\theta _{\sim i}, y\mid d) \log \frac{p(\theta _i, y \mid \theta _{\sim i}, d)}{p(\theta _i\mid \theta _{\sim i}, d)p(y\mid \theta _{\sim i}, d)} \,\text{d}\theta _i \,\text{d}\theta _{\sim i} \,\text{d}y, \end{aligned}$$
(8a)
$$\begin{aligned}{}&= \int \limits _{{{\Theta _{i}},{\Theta _{\sim i}},} \mathcal {Y}} p({\theta _i},{\theta _{\sim i}}, y\mid d) \log \frac{p(y\mid {\theta _i}, {\theta _{\sim i}}, d)p({\theta _i}\mid {\theta _{\sim i}}, d)}{p({\theta _i}\mid {\theta _{\sim i}}, d)p(y\mid {\theta _{\sim i}}, d)} \,\text{d}{\theta _i} \,\text{d}{\theta _{\sim i}} \,\text{d}y, \end{aligned}$$
(8b)
$$\begin{aligned}{}&= \int \limits _{{{\Theta _{i}},{\Theta _{\sim i}},} \mathcal {Y}} p({\theta _i,}{\theta _{\sim i}}, y\mid d) \log \frac{p(y\mid {\theta _i}, {\theta _{\sim i}}, d)}{p(y\mid {\theta _{\sim i}}, d)} \,\text{d}{\theta _i} \,\text{d}{\theta _{\sim i}} \,\text{d}y. \end{aligned}$$
(8c)

Remark 2

In terms of differential entropy, the conditional mutual information in (7) can be defined as

$$\begin{aligned} I(\Theta _{i};Y\mid \Theta _{\sim i}, d)&\triangleq H(\Theta _i\mid \Theta _{\sim i}, d) - H(\Theta _i\mid \Theta _{\sim i}, Y, d), \end{aligned}$$
(9a)
$$\begin{aligned}{}&= H(\Theta _i, \Theta _{\sim i}\mid d) + H(\Theta _{\sim i}, Y\mid d)- H(\Theta _{\sim i}\mid d) - H(\Theta _i, \Theta _{\sim i}, Y\mid d). \end{aligned}$$
(9b)

If the parameters are mutually independent a priori, so that \(H(\Theta _i\mid \Theta _{\sim i}, d) = H(\Theta _i\mid d)\), then

$$\begin{aligned} I(\Theta _i;Y\mid \Theta _{\sim i}, d) = H(\Theta _i\mid d) + H(\Theta _{\sim i}, Y\mid d) - H(\Theta _i, \Theta _{\sim i}, Y\mid d). \end{aligned}$$
(10)

While this formulation does not involve any conditional distributions involving the parameters or data, it requires joint distributions, namely, \(p(\theta _i, \theta _{\sim i}\mid d)\), \(p(\theta _{\sim i}, y\mid d)\), \(p(\theta _i, \theta _{\sim i}, y\mid d)\). Typically, such joint distributions do not have a closed-form expression and must be approximated.

In the special case where \(\Theta _i\) is perfectly correlated with \(\Theta _{\sim i}\), such that the realization \(\theta _{\sim i}\) provides complete information about \(\theta _i\), the term inside the logarithm in (8c) becomes identically unity. For such a case, the data is not informative about \(\Theta _i\) and the effective parameter dimensionality \(m_{\text {eff}}\) becomes less than m. For the more general case, Monte-Carlo integration can be used to approximate the high-dimensional integral as

$$\begin{aligned} I(\Theta _{i};Y\mid \Theta _{\sim i}, d) \approx {\hat{I}}(\Theta _{i};Y\mid \Theta _{\sim i}, d) = \frac{1}{n_{\text {outer}}}\sum _{k=1}^{n_{\text {outer}}}\log \frac{p(y^{k}\mid \theta _{i}^{k}, \theta _{\sim i}^{k}, d)}{p(y^{k}\mid \theta _{\sim i}^{k}, d)}, \end{aligned}$$
(11)

where \((\theta _{i}^{k},\theta _{\sim i}^{k})\) is drawn from the distribution \(p(\theta _i, \theta _{\sim i}\mid d)\); \(y^{k}\) is drawn from the likelihood distribution \(p(y\mid \theta _{i}^k, \theta _{\sim i}^k, d)\); and \(n_{\text {outer}}\) is the number of Monte-Carlo samples. Typically, conditional evidence \(p(y\mid \theta _{\sim i}, d)\) does not have a closed-form expression, and therefore \(p(y^k\mid \theta _{\sim i}^k, d)\) must be numerically approximated. One approach is to rewrite the conditional evidence \(p(y^k\mid \theta _{\sim i}^k, d)\) by means of marginalization as

$$\begin{aligned} p(y^{k}\mid \theta _{\sim i}^{k}, d) \triangleq \int \limits _{{\Theta _{i}}} p(y^{k},\theta _{i}\mid \theta _{\sim i}^{k}, d) \,\text{d}\theta _{i} = \int \limits _{{\Theta _{i}}} p(y^{k}\mid \theta _{i}, \theta _{\sim i}^{k}, d)p(\theta _i\mid \theta _{\sim i}^k, d) \,\text{d}\theta _{i}. \end{aligned}$$
(12)

For simplicity, assume that the parameters are uncorrelated prior to observing the data, and are also independent of the model inputs d. As a result, (12) can be re-written as

$$\begin{aligned} p(y^{k}\mid \theta _{\sim i}^{k}, d) = \int \limits _{{\Theta _{i}}} p(y^{k}\mid \theta _{i}, \theta _{\sim i}^{k}, d)p(\theta _i) \,\text{d}\theta _{i}. \end{aligned}$$
(13)

This results in a low-dimensional integral over a univariate prior distribution \(p(\theta _i)\). Evaluating (13) using the classical Monte-Carlo integration can dramatically increase the overall cost of estimating the conditional mutual information in (11), especially if the likelihood evaluation is computationally expensive. In the special case where the priors are normally distributed, this cost can be reduced by considering a \(\zeta\)-point Gaussian quadrature rule. Using the quadrature approximation in (13) gives

$$\begin{aligned} p(y^{k}\mid \theta _{\sim i}^{k}, d) \approx {\hat{p}}(y^{k}\mid \theta _{\sim i}^{k}, d) = \sum _{\zeta =1}^{\zeta =n_{\text {inner}}} \Big [p(y^{k}\mid \theta _{i}^{\zeta }, \theta _{\sim i}^{k}, d)\Big ]\gamma ^{\zeta }, \end{aligned}$$
(14)

where \(\theta _{i}^{\zeta }\) and \(\gamma ^{\zeta }\) are the \(\zeta ^{th}\) quadrature point and weight, respectively; \(n_{\text {inner}}\) is the number of quadrature points. Here, we use the Gauss-Hermite quadrature rule, which uses the \(t^{th}\) order Hermite polynomial and will be exact for polynomials up to order \(2t-1\)39. In a much more general case where the prior distributions can be non-Gaussian (however, can still be evaluated), the cost of estimating (13) can be reduced by using importance sampling with a proposal distribution \(q(\theta _i)\). Using importance sampling we can rewrite (13) as

$$\begin{aligned} p(y^{k}\mid \theta _{\sim i}^{k}, d) = \int \limits _{{\Theta _{i}}} \Big [p(y^{k}\mid \theta _{i}, \theta _{\sim i}^{k}, d)w(\theta _i)\Big ] q(\theta _i) \,\text{d}\theta _{i}, \end{aligned}$$
(15)

where \(w(\theta _i) = p(\theta _i)/q(\theta _i)\) are the importance sampling weights. In the case where the proposal distribution \(q(\theta _i)\) is Gaussian, the quadrature rule can be applied to (15) as

$$\begin{aligned} p(y^{k}\mid \theta _{\sim i}^{k}, d) \approx {\hat{p}}(y^{k}\mid \theta _{\sim i}^{k}, d) = \sum _{\zeta =1}^{\zeta =n_{\text {inner}}} \Big [p(y^{k}\mid \theta _{i}^{\zeta }, \theta _{\sim i}^{k}, d)w(\theta _i^{\zeta })\Big ]\gamma ^{\zeta }. \end{aligned}$$
(16)

Combining the estimator for conditional evidence ((14) or (16)) with (11) results in a biased estimator for conditional mutual information40, 41. While the variance is controlled by the numerical accuracy of estimating the high-dimensional integral in (11), the bias is governed by the accuracy of approximating the conditional evidence in (12). This means that the variance is controlled by \(n_{\text {outer}}\) Monte-Carlo samples and bias by \(n_{\text {inner}}\) quadrature points.
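To make the estimator concrete, the following sketch implements (11) together with the Gauss-Hermite approximation (14) for a small, illustrative linear Gaussian model with independent standard-normal priors and additive Gaussian noise; the feature matrix, noise level, and sample sizes are arbitrary choices, and the numpy/scipy routines are one possible implementation rather than the one used in this work.

```python
import numpy as np
from scipy import stats
from scipy.special import logsumexp

rng = np.random.default_rng(0)

# Illustrative statistical model: y = A theta + xi, with independent N(0, 1) priors
d = np.linspace(-1.0, 1.0, 25)
A = np.column_stack([d, d**2, d**3])
m, sigma_xi = A.shape[1], 0.3

def log_likelihood(y, theta):
    return stats.norm.logpdf(y, loc=A @ theta, scale=sigma_xi).sum()

# Gauss-Hermite nodes/weights, rescaled for a standard-normal prior on theta_i
gh_x, gh_w = np.polynomial.hermite.hermgauss(10)
gh_nodes = np.sqrt(2.0) * gh_x

def conditional_mutual_information(i, n_outer=2000):
    """Estimate I(Theta_i; Y | Theta_{~i}, d) as in (11), with (14) for the conditional evidence."""
    total = 0.0
    for _ in range(n_outer):
        theta = rng.normal(size=m)                              # (theta_i^k, theta_{~i}^k) ~ prior
        y = A @ theta + sigma_xi * rng.normal(size=d.size)      # y^k ~ likelihood
        log_num = log_likelihood(y, theta)                      # p(y^k | theta_i^k, theta_{~i}^k, d)
        # Conditional evidence p(y^k | theta_{~i}^k, d): marginalize theta_i by quadrature
        log_terms = []
        for node in gh_nodes:
            t = theta.copy()
            t[i] = node
            log_terms.append(log_likelihood(y, t))
        log_den = logsumexp(log_terms, b=gh_w) - 0.5 * np.log(np.pi)
        total += log_num - log_den
    return total / n_outer

for i in range(m):
    print(f"I(Theta_{i+1}; Y | Theta_~{i+1}, d) ~ {conditional_mutual_information(i):.3f} nats")
```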

In practice, estimating conditional evidence can become computationally expensive, especially when the variability in the output of the forward model is high with respect to \(\Theta _i\) given \(\Theta _{\sim i} = \theta _{\sim i}\), that is, large \(\nabla _{\theta _{i}} {\mathscr {F}}(\theta , d)|_{\Theta _{\sim i}=\theta _{\sim i}}\). For such statistical models, conditional evidence can become near zero such that numerical approximation by means of vanilla Monte-Carlo integration or Gaussian quadrature in (14) can be challenging41. Using an estimator based on importance sampling for conditional evidence as shown in (16) can alleviate this problem by carefully choosing the density of the proposal \(q(\theta _i)\). As an example, consider the case where the additive measurement noise \(\xi\) is normally distributed as \(\mathcal {N}(0, \Gamma )\) such that the likelihood of the model is distributed as \(p(y\mid \theta )=\mathcal {N}({\mathscr {F}}(\theta , d), \Gamma )\), and \(y^{k}\) is sampled according to \(\mathcal {N}({\mathscr {F}}(\theta _{i}^{k}, \theta _{\sim i}^{k}, d), \Gamma )\). In the case where model predictions have large variability with respect to the parameter \(\Theta _i\) for a given \(\Theta _{\sim i}=\theta _{\sim i}\) the model likelihood can become small. For such a case, the importance-sampling-based estimator given in (16) can be used by constructing a proposal around the sample \(\theta _{i}^{k}\), such as \(q(\theta _{i})=\mathcal {N}(\theta _{i}^{k}, {\sigma ^2_{\text {proposal}}})\) where \({\sigma ^2_{\text {proposal}}}\) is the variance of the proposal distribution. This results in a robust estimation of conditional evidence and prevents infinite values for conditional mutual information. Here, we consider (16) to estimate conditional evidence.
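A sketch of the importance-sampling variant (16) with a Gaussian proposal centered at the outer sample \(\theta _{i}^{k}\) follows; the proposal variance, the standard-normal prior, and the helper signature `log_likelihood_i` are illustrative assumptions, and the same Gauss-Hermite change of variables as above is applied to the proposal rather than to the prior.

```python
import numpy as np
from scipy import stats
from scipy.special import logsumexp

def conditional_evidence_is(y_k, log_likelihood_i, theta_i_k,
                            prior=stats.norm(0.0, 1.0),
                            sigma_proposal=0.5, n_inner=10):
    """Approximate p(y^k | theta_{~i}^k, d) as in (16): Gauss-Hermite quadrature under the
    proposal q(theta_i) = N(theta_i^k, sigma_proposal^2), with weights w = p(theta_i)/q(theta_i).
    `log_likelihood_i(theta_i)` must return log p(y^k | theta_i, theta_{~i}^k, d)."""
    x, w = np.polynomial.hermite.hermgauss(n_inner)
    nodes = theta_i_k + np.sqrt(2.0) * sigma_proposal * x          # proposal-centered nodes
    q = stats.norm(theta_i_k, sigma_proposal)
    log_terms = [log_likelihood_i(t) + prior.logpdf(t) - q.logpdf(t) for t in nodes]
    return np.exp(logsumexp(log_terms, b=w) - 0.5 * np.log(np.pi))

# Toy check: y = theta_i + noise, so the conditional evidence should be N(y; 0, 1 + 0.25)
sigma_xi = 0.5
log_like = lambda t: stats.norm.logpdf(0.7, loc=t, scale=sigma_xi)
print(conditional_evidence_is(0.7, log_like, theta_i_k=0.6),
      stats.norm.pdf(0.7, scale=np.sqrt(1.0 + sigma_xi**2)))       # both values should be close
```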

Remark 3

Assessing the practical identifiability in a Bayesian framework is dependent on the prior distribution. Although the framework presented in this article is entirely an a priori analysis of practical identifiability, prior selection can affect estimated identifiability. Prior selection in itself is an extensive area of research and is not considered a part of this work.

Physical interpretation of identifiability in an information-theoretic framework

Assessing practical identifiability using the conditional mutual information described in (7) provides a relative measure of how many bits (or nats) of information are gained about a particular parameter. In practical applications where this information gain can vary on disparate scales, it is useful to associate a physical interpretation with identifiability. Following Pant et al.32, consider a hypothetical direct observation statistical model given as \(\psi \triangleq \theta _{i} + \Lambda\), where \(\Lambda \sim \mathcal {N}(0, \sigma ^2_{\Lambda })\) is the additive measurement noise. Given this observation model, we can define an information gain equivalent variance \(\mathscr {C}(\Theta _i)\) as the measurement variance \(\sigma ^2_{\Lambda }\) of the direct observation model for which \(I(\Theta _i;\Psi ) = {\hat{I}}(\Theta _i;Y\mid \Theta _{\sim i}, d)\). A large \(\mathscr {C}(\Theta _i)\) means that the information gained about \(\Theta _i\) (using (7)) for the statistical model (1) is equivalent to observing the parameter directly under high measurement uncertainty, that is, the data is only weakly informative about \(\Theta _i\).

If the prior distribution \(p(\theta _i)\) can be approximated by means of an equivalent normal distribution \(\mathcal {N}(\mu _{e}, \sigma ^2_{e})\) then \(I(\Theta _i;\Psi )\) is given as

$$\begin{aligned} I(\Theta _i;\Psi ) \triangleq \frac{1}{2}\log \Big ( 1 + \frac{\sigma ^2_{e}}{\sigma ^2_{\Lambda }}\Big ), \end{aligned}$$
(17)

such that

$$\begin{aligned} \mathscr {C}(\Theta _i) \triangleq \sigma ^2_{\Lambda } = \sigma ^2_{e} (\exp \{2{\hat{I}}(\Theta _i;Y\mid \Theta _{\sim i}, d)\} - 1)^{-1}. \end{aligned}$$
(18)

This information gain equivalent variance depends only on the information gained about the model parameter and its (equivalent Gaussian) prior variance, and can thus be used as a metric to compare different model parameters on a common, physically interpretable scale.
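The mapping from an estimated information gain to the equivalent direct-observation noise variance in (18) is a one-liner; the numbers below are placeholders chosen only to exercise the formula.

```python
import numpy as np

def info_gain_equivalent_variance(info_gain_nats, sigma_e_sq):
    """C(Theta_i) from (18): the noise variance of a direct observation model
    that would yield the same mutual information."""
    return sigma_e_sq / (np.exp(2.0 * info_gain_nats) - 1.0)

# e.g. a parameter with a unit-variance (equivalent Gaussian) prior and 1.2 nats of information gain
print(info_gain_equivalent_variance(info_gain_nats=1.2, sigma_e_sq=1.0))   # ~0.10
```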

Estimating parameter dependence

In most statistical models, unknown functional relationships or dependencies may be present between parameters such that multiple parameters have a combined effect on the statistical model. Such parameters can form an identifiable subset in which an individual parameter exhibits low identifiability; however, the subset is collectively identifiable. This means that the data is uninformative or weakly informative about an individual parameter within the subset, whereas it is informative about the entire subset. As an example, consider the statistical model \(y = \theta _1\theta _2 d + \xi\), for which individually identifying \(\Theta _1\) or \(\Theta _2\) is not possible as they have a combined effect on the statistical model. However, it is clear that \(\Theta _1\) and \(\Theta _2\) belong to an identifiable subset such that the pair \((\Theta _1, \Theta _2)\) is identifiable. Thus, the statistical model given by \(y = \theta _3 d + \xi\), where \(\theta _3=\theta _1\theta _2\), will have better identifiability characteristics. For such statistical models, the traditional method of examining correlations between parameters is often insufficient, as it only reveals linear functional relations between random variables.
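The \(\theta _1\theta _2\) example can be made concrete with a brute-force grid posterior (a sketch with arbitrary prior, noise, and grid settings): the posterior concentrates along the hyperbola \(\theta _1\theta _2 \approx \text{const}\) rather than at a point, which is the signature of an identifiable subset.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
d = np.linspace(0.1, 1.0, 30)
sigma_xi = 0.2
y = 1.5 * 2.0 * d + sigma_xi * rng.normal(size=d.size)      # data from theta_1^* = 1.5, theta_2^* = 2.0

# Brute-force log-posterior on a grid, with N(0, 2^2) priors on both parameters
t1, t2 = np.meshgrid(np.linspace(0.1, 4.0, 200), np.linspace(0.1, 4.0, 200))
log_post = (stats.norm.logpdf(y, loc=t1[..., None] * t2[..., None] * d, scale=sigma_xi).sum(axis=-1)
            + stats.norm.logpdf(t1, scale=2.0) + stats.norm.logpdf(t2, scale=2.0))

# The posterior mass lies along the ridge theta_1 * theta_2 ~ 3: the product is well determined
# even though neither parameter is individually pinned down (a contour plot makes this visible).
i = np.unravel_index(np.argmax(log_post), log_post.shape)
print(t1[i] * t2[i])
```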

To highlight the parameter dependencies, consider the statistical model given in (1) such that we are interested in examining the relations between \(\Theta _i\in \Theta\) and \(\Theta _j\in \Theta\) that emerge a posteriori. While the conditional mutual information presented in "Estimating practical identifiability" provides information on the practical identifiability of an individual parameter, it does not provide information about dependencies developed between pairs of parameters. To quantify such dependencies, we define a conditional mutual information between parameter pairs

$$\begin{aligned} I(\Theta _i;\Theta _j\mid Y, \Theta _{\sim i, j}, d) \triangleq {\mathbb {E}}_{\Theta _{\sim i, j}}[{\mathbb {E}}_{Y}[I(\Theta _i;\Theta _j\mid Y=y, \Theta _{\sim i, j}=\theta _{\sim i, j}, d)]], \end{aligned}$$
(19)

which evaluates the average information between the variables \(\Theta _i\) and \(\Theta _j\) that is obtained a posteriori. Here, \(\Theta _{\sim i, j}\) is defined as all the parameters of the statistical model except \(\Theta _i\) and \(\Theta _j\).

A closed-form expression for (19) is typically not available, and therefore a numerical approximation is required. In integral form, (19) is given as

$$\begin{aligned} \begin{aligned} I(\Theta _{i};\Theta _{j}\mid Y, \Theta _{\sim i, j}, d) \triangleq \int \limits _{{\Theta _{i}, \Theta _{j}, \Theta _{\sim i, j}}, \mathcal {Y}} p(\theta _{i}, \theta _{j}, \theta _{\sim i, j}, y\mid d) \Big [\log \big [ p(\theta _{i}, \theta _j \mid y, \theta _{\sim i, j}, d)\big ]- \\ \log \big [p(\theta _{i}\mid y, \theta _{\sim i, j}, d)p(\theta _j\mid y, \theta _{\sim i, j}, d)\big ]\Big ] \,\text{d}\theta _{i} \,\text{d}\theta _{j} \,\text{d}\theta _{\sim i, j} \,\text{d}y, \end{aligned} \end{aligned}$$
(20)

where

$$\begin{aligned} p(\theta _i, \theta _j\mid y, \theta _{\sim i, j}, d)&\triangleq \frac{p(y\mid \theta _i, \theta _j, \theta _{\sim i, j}, d)p(\theta _i, \theta _j, \mid \theta _{\sim i, j}, d)}{p(y\mid \theta _{\sim i, j}, d)},\end{aligned}$$
(21a)
$$\begin{aligned} p(\theta _i \mid y, \theta _{\sim i, j}, d)&\triangleq \frac{p(y\mid \theta _i,\theta _{\sim i, j}, d)p(\theta _i\mid \theta _{\sim i, j}, d)}{p(y\mid \theta _{\sim i, j}, d)}, \end{aligned}$$
(21b)
$$\begin{aligned} p(\theta _j \mid y, \theta _{\sim i, j}, d)&\triangleq \frac{p(y\mid \theta _j,\theta _{\sim i, j}, d)p(\theta _j\mid \theta _{\sim i, j}, d)}{p(y\mid \theta _{\sim i, j}, d)}, \end{aligned}$$
(21c)

via Bayes’ theorem.

Remark 4

In terms of differential entropy, the conditional mutual information in (19) can be defined as

$$\begin{aligned} I(\Theta _i;\Theta _j\mid Y, \Theta _{\sim i, j}, d)&\triangleq H(\Theta _i\mid Y, \Theta _{\sim i, j}, d) - H(\Theta _i\mid \Theta _j, Y, \Theta _{\sim i, j}, d), \end{aligned}$$
(22a)
$$\begin{aligned}{}&= H({\Theta _i}, {\Theta _{\sim i, j}}, Y\mid d) + H({\Theta _j}, {\Theta _{\sim i, j}}, Y\mid d) \\&\qquad -H({\Theta _{\sim i,j}}, Y\mid d) - H({\Theta _i}, {\Theta _j},{\Theta _{\sim i, j}}, Y\mid d). \end{aligned}$$
(22b)

Such a formulation requires evaluating joint distributions, namely, \(p(\theta _i, \theta _{\sim i, j}, y\mid d)\), \(p(\theta _j, \theta _{\sim i, j}, y\mid d)\), \(p(\theta _{\sim i, j}, y\mid d)\), and \(p(\theta _i, \theta _j, \theta _{\sim i, j}, y\mid d)\). Typically, such joint distributions do not have a closed-form expression and must be approximated.

For the sake of illustration, assume that the parameters are mutually independent prior to observing the data. As a consequence of this assumption, any relations developed between \(\Theta _i\) and \(\Theta _j\) are discovered solely from the data. Furthermore, it is also reasonable to assume that the prior knowledge of the parameters is independent of the input of the model d. Substituting (21a) through (21c) into (20) we obtain

$$\begin{aligned} \begin{aligned} I(\Theta _{i};\Theta _{j}\mid Y, \Theta _{\sim i, j}, d) = \int \limits _{{\Theta _{i}, \Theta _{j}, \Theta _{\sim i, j}}, \mathcal {Y}}&p(\theta _{i}, \theta _{j}, \theta _{\sim i, j}, y\mid d) \Big [\log \big [p(y\mid \theta _{i}, \theta _j,\theta _{\sim i, j}, d)\big ] \\&+ \log \big [p(y\mid \theta _{\sim i, j}, d)\big ]\\&- \log \big [p(y\mid \theta _{i}, \theta _{\sim i, j}, d)\big ]\\&- \log \big [p(y\mid \theta _j,\theta _{\sim i,j}, d)\big ]\Big ] \,\text{d}\theta _{i} \,\text{d}\theta _{j} \,\text{d}\theta _{\sim i, j} \,\text{d}y. \end{aligned} \end{aligned}$$
(23)

Similar to "Estimating practical identifiability" we can estimate the conditional mutual information in (23) using Monte-Carlo integration as \({\hat{I}}(\Theta _i;\Theta _j\mid Y, \Theta _{\sim i, j}, d) \approx I(\Theta _i;\Theta _j\mid Y, \Theta _{\sim i, j}, d)\) where

$$\begin{aligned} \begin{aligned} {\hat{I}}(\Theta _i;\Theta _j\mid Y, \Theta _{\sim i, j}, d) = \frac{1}{n_{\text {outer}}}\sum _{k=1}^{k=n_{\text {outer}}}\log \frac{p(y^{k}\mid \theta _{i}^{k}, \theta _j^{k},\theta _{\sim i, j}^{k}, d)p(y^{k}\mid \theta _{\sim i, j}^{k}, d)}{p(y^{k}\mid \theta _{i}^{k}, \theta _{\sim i, j}^{k}, d)p(y^{k}\mid \theta _j^{k},\theta _{\sim i,j}^{k}, d)}, \end{aligned} \end{aligned}$$
(24)

where \(\theta _i^{k}\), \(\theta _j^{k}\), and \(\theta _{\sim i, j}^{k}\) are drawn from the prior distributions \(p(\theta _i)\), \(p(\theta _j)\), and \(p(\theta _{\sim i, j})\), respectively; \(y^{k}\) is drawn from the likelihood distribution \(p(y\mid \theta _i^k, \theta _j^k, \theta _{\sim i, j}^k, d)\). The conditional evidence in (24) can be obtained by means of marginalization

$$\begin{aligned} p(y^{k}\mid \theta _{\sim i, j}^{k}, d)&\triangleq \int \limits _{{\Theta _i, \Theta _j}} p(y^{k}\mid \theta _i, \theta _j, \theta _{\sim i, j}^{k}, d)p(\theta _i, \theta _j) \,\text{d}\theta _i \,\text{d}\theta _j, \end{aligned}$$
(25a)
$$\begin{aligned} p(y^{k}\mid \theta _{i}^{k}, \theta _{\sim i, j}^{k}, d)&\triangleq \int \limits _{{\Theta _j}} p(y^{k}\mid \theta _j, \theta _{i}^{k}, \theta _{\sim i, j}^{k}, d)p(\theta _j) \,\text{d}\theta _j,\end{aligned}$$
(25b)
$$\begin{aligned} p(y^{k}\mid \theta _{j}^{k}, \theta _{\sim i, j}^{k}, d)&\triangleq \int \limits _{{\Theta _i}} p(y^{k}\mid \theta _i, \theta _{j}^{k}, \theta _{\sim i, j}^{k}, d)p(\theta _i) \,\text{d}\theta _i. \end{aligned}$$
(25c)

Similar to "Estimating practical identifiability" the conditional evidence in (25a) through (25c) can be efficiently estimated using importance sampling along with Gaussian quadrature rules. However, it should be noted that (25a) is an integral over a two-dimensional space, and therefore requires \(n_{\text {inner}}^{2}\) quadrature points.

Numerical experiments

This section presents numerical experiments to validate the information-theoretic approach for examining practical identifiability. The estimate obtained for global identifiability is compared with variance-based global sensitivity analysis by means of first-order Sobol indices computed using \(\texttt{SALib}\)42, 43 (see S1 in the supplementary material). First, a linear Gaussian statistical model is considered for which practical identifiability can be analytically examined through the proposed information-theoretic approach. This model is computationally efficient and is therefore ideal for conducting estimator convergence studies. Next, the practical identifiability of a reduced kinetics model for methane-air combustion is considered. Reduced kinetics models are widely used in the numerical analysis of chemically reactive flows, since embedding the detailed chemistry of combustion is often infeasible. Such reduced kinetics models are often parameterized, and constructing parameterizations with practically identifiable parameters is desirable to improve confidence in the model prediction.
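For reference, first- and second-order Sobol indices can be computed with \(\texttt{SALib}\) roughly as sketched below, assuming its Saltelli sampler and Sobol analyzer interface with normal input distributions specified through the "dists" key; the stand-in scalar quantity of interest (the model output at a single input) is chosen purely for illustration and does not reproduce the indices reported in the figures.

```python
import numpy as np
from SALib.sample import saltelli
from SALib.analyze import sobol

problem = {
    "num_vars": 3,
    "names": ["theta_1", "theta_2", "theta_3"],
    "dists": ["norm"] * 3,          # for "norm", bounds are interpreted as (mean, std)
    "bounds": [[0.0, 1.0]] * 3,     # standard-normal parameter distributions
}

def model(theta, d=0.5):
    # Stand-in scalar quantity of interest: the forward-model output at a single input d
    return theta[0] * d + theta[1] * d**2 + theta[2] * d**3

X = saltelli.sample(problem, 1024)              # N * (2 * num_vars + 2) parameter samples
Y = np.apply_along_axis(model, 1, X)
Si = sobol.analyze(problem, Y)
print(Si["S1"])                                  # first-order indices
print(Si["S2"])                                  # second-order (pairwise) indices
```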

Application to a linear Gaussian model

The identifiability framework is now applied to a linear Gaussian problem for which closed-form expressions are available for the conditional mutual information in (7) and (19) (see S2 in the supplementary material). Consider the statistical model

$$\begin{aligned} y = {\mathscr {F}}(\theta , d)+ \xi \quad ;\xi \sim \mathcal {N}(0, \Gamma ), \end{aligned}$$
(26)

where \({\mathscr {F}}(\theta , d) = {\textbf{A}}\theta\) and \({\textbf{A}}\in {{\mathbb {R}}^{n\times m}}\) is called the feature matrix. The prior distribution is given by \(p(\theta ) = \mathcal {N}(\mu _{\Theta }, \Sigma _{\Theta })\) where \(\mu _{\Theta } \in {{\mathbb {R}}^{m}}\) and \(\Sigma _{\Theta }\in {{\mathbb {R}}^{m\times m}}\). Model likelihood is therefore given by \(p(y\mid \theta ) = \mathcal {N}({\textbf{A}}\theta , \Gamma )\) where \(\Gamma \in {{\mathbb {R}}^{n\times n}}\). Here, \(\mu _{\Theta }, \Sigma _{\Theta }\), and \(\Gamma\) are all constants and are considered known. The evidence distribution for this model is given by \(p(y) \triangleq \mathcal {N}(\mu _{Y}, \Sigma _{Y}) = \mathcal {N}({\textbf{A}}\mu _{\Theta }, {\textbf{A}}\Sigma _{\Theta }{\textbf{A}}^{T}+\Gamma )\), such that no model-form error exists. Consider a feature matrix

$$\begin{aligned} {\textbf{A}} = \begin{pmatrix} d_{1} & d_{1}^{2} & \dots & d_{1}^{m} \\ d_{2} & d_{2}^{2} & \dots & d_{2}^{m} \\ \vdots & \vdots & \ddots & \vdots \\ d_{n} & d_{n}^{2} & \dots & d_{n}^{m} \end{pmatrix}, \end{aligned}$$
(27)

where \(d_{i\mid _{i=1}^n}\) are n linearly-spaced points in an interval \([-1, 1]\), and \(m=3\), which means that the statistical model has 3 uncertain parameters. Assume an uncorrelated measurement noise \(\Gamma =\sigma _{\xi }^2{\mathbb {I}}\) with \(\sigma _{\xi }^2 = 0.1\). For the purpose of parameter estimation, synthetic data is generated using (26) assuming \(\theta ^{*}=[1, 2, 3]^T\) and \(n=100\).
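The setup in (26) and (27) can be reproduced directly; in the sketch below the random seed is arbitrary, and the evidence covariance is written for the standard-normal prior used in the identifiability study that follows.

```python
import numpy as np

rng = np.random.default_rng(3)
n, m, sigma_xi2 = 100, 3, 0.1
d = np.linspace(-1.0, 1.0, n)
A = np.column_stack([d**p for p in range(1, m + 1)])                 # feature matrix (27)

theta_star = np.array([1.0, 2.0, 3.0])
y = A @ theta_star + rng.normal(scale=np.sqrt(sigma_xi2), size=n)    # synthetic data from (26)

# Evidence covariance Sigma_Y = A Sigma_Theta A^T + Gamma (here with Sigma_Theta = I)
Sigma_Y = A @ A.T + sigma_xi2 * np.eye(n)
```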

Figure 1

Convergence of the variance in practical identifiability estimator of a linear Gaussian statistical model (left); the number of quadrature points \(n_{\text {inner}} = 50\) is considered and the number of Monte-Carlo integration samples \(n_{\text {outer}}\) is varied. Bias convergence for practical identifiability estimator in case of a linear Gaussian statistical model (right); \(n_{\text {outer}} = 10^{4}\) is considered and \(n_{\text {inner}}\) is varied. For a given \(n_{\text {inner}}\) increasing \(n_{\text {outer}}\) reduces the variance, whereas increasing \(n_{\text {inner}}\) for a given \(n_{\text {outer}}\) decreases the bias in the estimate.

Figure 2

Convergence of the variance in practical identifiability for estimating parameter dependencies of a linear Gaussian statistical model (left); the number of quadrature points \(n_{\text {inner}} = 50\) is considered and the number of Monte-Carlo integration samples \(n_{\text {outer}}\) is varied. Bias convergence for estimating parameter dependencies in the case of a linear Gaussian statistical model (right); \(n_{\text {outer}} = 10^{4}\) is considered and \(n_{\text {inner}}\) is varied. For a given \(n_{\text {inner}}\) increasing \(n_{\text {outer}}\) reduces the variance, whereas increasing \(n_{\text {inner}}\) for a given \(n_{\text {outer}}\) decreases the bias in estimating the parameter dependencies.

Parameter identifiability

The goal of the framework developed in "Estimating practical identifiability" is to assess the practical identifiability of the statistical model in (26) before any controlled experiment is conducted. Consider \(\mu _{\Theta } = {\textbf{0}}\) and \(\Sigma _{\Theta } = {\mathbb {I}}\). Using such an uncorrelated prior distribution for the identifiability study ensures that the information obtained is only due to the observation of the data (as discussed in "Estimating parameter dependence"). Using historical parameter estimates can improve the prior (Remark 3), which can affect the identifiability analysis. However, we have not considered any such prior refinement.

Figure 1 illustrates the convergence of error in estimating information gain for each parameter using the estimator developed in "Estimating practical identifiability". As expected, for a fixed number of quadrature points, increasing the number of Monte-Carlo integration points decreases the variance in estimation. However, for a fixed \(n_{\text {outer}}\) increasing the number of quadrature points reduces the bias in the estimate. Figure 2 illustrates the variance and bias convergence of error in estimating parameter dependencies as described in "Estimating parameter dependence". As expected and observed, the variance in error is controlled by the accuracy of Monte-Carlo integration, that is, by \(n_{\text {outer}}\), and the bias is controlled by the quadrature approximation, that is, through \(n_{\text {inner}}\).

Figure 3

First-order Sobol indices (left), information gain (center), and information gain equivalent variance \(\mathscr {C}(\Theta _i)\) (right) for linear Gaussian model. Sobol indices show that the output of the statistical model has the largest variability due to uncertainty in \(\Theta _1\), followed by \(\Theta _2\) and \(\Theta _3\). Variable \(\Theta _{1}\) exhibits the largest gain in information and therefore the highest practical identifiability, followed by \(\Theta _2\) and then \(\Theta _3\). For a direct observation model, the variable \(\Theta _1\) has the lowest measurement uncertainty, followed by \(\Theta _2\) and \(\Theta _3\).

Figure 4

First-order Sobol indices (left) and estimated information gain (right) vs. measurement noise variance \(\sigma _{\xi }^2\) for linear Gaussian model. Increasing measurement noise covariance does not affect the variability of the output with respect to the parameters and therefore the first-order Sobol indices remain unchanged. However, the information gain decreases with increasing measurement noise covariance.

The first-order Sobol indices, estimated information gain, and information gain equivalent variance \(\mathscr {C}(\Theta _i)\) are shown in Figure 3. The estimated first-order Sobol indices (see S3 in the supplementary material for convergence study) show that the considered linear Gaussian forward model has the largest output variability due to uncertainty in \(\Theta _1\), followed by \(\Theta _2\) and \(\Theta _3\). This implies that the forward model is most sensitive to the parameter \(\theta _1\), followed by \(\theta _2\) and then \(\theta _3\). This is not surprising since \(d_{i\mid _{i=1}^n}\) are points in the interval \([-1, 1]\), over which higher powers of the input have smaller magnitude. Thus, according to the first-order Sobol indices, the relevance of the parameters follows the order: \(\theta _1\), \(\theta _2\), and \(\theta _3\). The estimated information gain agrees well with the true value. Further, the obtained trend suggests that the data is most informative about \(\Theta _1\), followed by \(\Theta _2\), and then \(\Theta _3\). As discussed in "Estimating practical identifiability", practical identifiability follows the same trend. Furthermore, as reported in previous work31, it can be seen that parameters with good identifiability characteristics also exhibit high model sensitivity. Using the hypothetical direct observation model described in "Physical interpretation of identifiability in an information-theoretic framework", the smallest measurement uncertainty is obtained for the variable \(\Theta _1\), followed by \(\Theta _2\) and \(\Theta _3\). That is, parameters with high practical identifiability are associated with low measurement uncertainty in a direct observation model.

Figure 4 illustrates the variability of the first-order Sobol indices and the estimated information gain with measurement noise variance \(\sigma _{\xi }^2\). The first-order Sobol indices only account for the parameter uncertainty, and therefore remain unchanged with an increase in measurement noise. However, the estimated information gain and thereby the practical identifiability decreases with measurement noise. This observation corroborates the intuition that large measurement uncertainty will lead to large uncertainty in the parameter estimation.

Figure 5 shows the second-order Sobol indices and the true and estimated dependencies between the parameter pairs for the linear Gaussian model. Examining the second-order Sobol indices (see S3 in the supplementary material for convergence study) shows that there are negligible interactions between parameter pairs. Estimated parameter dependencies agree well with the truth; the trend is preserved. The bias observed is due to the error in approximating the conditional evidence, as shown in Figure 2. It can be clearly seen that the parameters \(\Theta _1\) and \(\Theta _3\) have high dependencies. This means that these parameters compensate for one another such that they will have a combined effect on the output of the statistical model. These parameters are associated with the features \(d_{i\mid _{i=1}^n}\) and \(d_{i\mid _{i=1}^n}^3\) which, in fact, have a similar effect on the statistical model for \(d_{i\mid _{i=1}^n}\in [-1, 1]\). This observation also shows that the low practical identifiability of \(\Theta _3\) is mainly due to the underlying dependency with \(\Theta _1\) such that the pair \((\Theta _1, \Theta _3)\) has a combined effect in the statistical model.

Figure 5

Second-order Sobol indices (left), true parameter dependencies (center), and estimated parameter dependencies (right) for linear Gaussian model. The second-order Sobol indices show negligible interactions between parameter pairs. The obtained estimate of parameter dependency agrees well with their true values; the trend is preserved. \(\Theta _1\) and \(\Theta _3\) have the largest dependency on one another, and therefore are expected to have a combined effect on the output of the statistical model.

Figure 6

Correlation plot for samples obtained from the true posterior distribution (left) and the obtained aggregate posterior prediction (right) for linear Gaussian model. A negative correlation is observed between \(\Theta _1\) and \(\Theta _3\), whereas \(\Theta _2\) is uncorrelated from other parameters. Aggregate posterior prediction agrees well with the data and exhibits high certainty.

Parameter estimation

For the linear Gaussian model, the joint distribution \(p(\theta , y)\) can be written as

$$\begin{aligned} p(\theta , y) \triangleq \mathcal {N}(\mu _{\Theta , Y}, \Sigma _{\Theta , Y}) = \mathcal {N} \begin{pmatrix} \begin{bmatrix}\mu _{\Theta } \\ \mu _{Y}\end{bmatrix}, \begin{bmatrix}\Sigma _{\Theta } & \Sigma _{\Theta }{\textbf{A}}^{T} \\ {\textbf{A}}\Sigma _{\Theta } & \Sigma _{Y} \end{bmatrix} \end{pmatrix}, \end{aligned}$$
(28)

such that the analytical posterior distribution is given as \(p(\theta \mid y) = \mathcal {N}(\mu _{\Theta _{post}}, \Sigma _{\Theta _{post}})\), where \(\mu _{\Theta _{post}} = \mu _{\Theta } + \Sigma _{\Theta }{\textbf{A}}^{T}\Sigma _{Y}^{-1}(y-\mu _Y)\) and \(\Sigma _{\Theta _{post}} = \Sigma _{\Theta } - \Sigma _{\Theta }{\textbf{A}}^{T}\Sigma _{Y}^{-1}{\textbf{A}}\Sigma _{\Theta }\) using Gaussian conditioning.
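A sketch of this Gaussian conditioning follows (with the same standard-normal prior and \(\sigma _{\xi }^2 = 0.1\) as above; posterior samples are drawn only to expose the correlation structure discussed below).

```python
import numpy as np

rng = np.random.default_rng(4)
n, sigma_xi2 = 100, 0.1
d = np.linspace(-1.0, 1.0, n)
A = np.column_stack([d, d**2, d**3])
mu_T, Sigma_T = np.zeros(3), np.eye(3)                  # prior N(0, I)

y = A @ np.array([1.0, 2.0, 3.0]) + rng.normal(scale=np.sqrt(sigma_xi2), size=n)

Sigma_Y = A @ Sigma_T @ A.T + sigma_xi2 * np.eye(n)
K = Sigma_T @ A.T @ np.linalg.inv(Sigma_Y)              # cross-covariance times evidence inverse
mu_post = mu_T + K @ (y - A @ mu_T)
Sigma_post = Sigma_T - K @ A @ Sigma_T

print(mu_post)                                           # close to theta^* = [1, 2, 3]
theta_samples = rng.multivariate_normal(mu_post, Sigma_post, size=5000)
print(np.corrcoef(theta_samples, rowvar=False))          # negative Theta_1-Theta_3 correlation
```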

Samples from the posterior distribution and the aggregate posterior prediction are shown in Figure 6. Variables \(\Theta _1\) and \(\Theta _3\) have a negative correlation, whereas \(\Theta _2\) is uncorrelated with other parameters. This means that the parameter variables \(\Theta _1\) and \(\Theta _3\) have (linear) dependencies on each other, and \(\Theta _2\) does not have such dependencies. These dependencies were suggested during the a priori analysis conducted on the statistical model as illustrated in Figure 5. Aggregate posterior prediction agrees well with the data and exhibits high certainty.

Figure 7

Change in parameter variance \(\Delta (\sigma ^2_{\Theta _i})\) vs. measurement noise covariance \(\sigma _{\xi }^2\) for linear Gaussian model. Increasing measurement noise results in a smaller change in parameter variance from the prior to the posterior. Largest reduction in variance is observed for \(\Theta _2\), followed by \(\Theta _{1}\) and \(\Theta _{3}\).

Figure 7 illustrates the change in variance of the parameter \(\Theta _i\) defined as \(\Delta (\sigma ^2_{\Theta _i}) \triangleq \sigma ^2_{\Theta _i} - \sigma ^2_{\Theta _{i, post}}\) versus \(\sigma _{\xi }^2\). Parameter \(\Theta _2\) exhibits the smallest posterior uncertainty, followed by \(\Theta _1\) and \(\Theta _3\) for all \(\sigma _{\xi }^2\). While \(\Theta _1\) has the largest estimated information gain (Figure 3(center)), it exhibits dependencies with \(\Theta _3\) (Figure 5(right)), thereby resulting in larger posterior uncertainty in comparison to \(\Theta _2\). In practical applications, where model selection or parameter selection is critical, examining the information gain and parameter dependencies can therefore aid in finding parameters that can be estimated with high certainty. Increasing the measurement noise results in a smaller change in parameter variance, that is, the parameters exhibit larger posterior uncertainty. This is also shown by the variation of estimated information gain with measurement noise (Figure 4(right)). On the contrary, the first-order Sobol indices remain unchanged with measurement noise (Figure 4(left)).

Application to methane chemical kinetics

Accurate characterization of chemical kinetics is critical in the numerical prediction of reacting flows. Although there have been significant advancements in computational architectures and numerical methods, embedding the full chemical kinetics in numerical simulations is almost always infeasible. This is primarily because of the high-dimensional coupled ordinary differential equations that have to be solved to obtain concentrations of the large number of involved species. As a result, significant efforts have been made to develop reduced chemical kinetics models that seek to capture features such as ignition delay, adiabatic flame temperature, or flame speed observed using the true chemical kinetics44,45,46,47. These reduced mechanisms are typically formulated using a combination of theory and intuition, leaving some of the chemistry unresolved and resulting in uncertainties in the relevant rate parameters48. Selecting a functional form of the modeled reaction rate terms that leads to reliable parameter estimation is highly desirable7, 49. This means that for high confidence in parameter estimation, and thereby in model prediction, the underlying parameterization of the reaction rate terms must exhibit high practical identifiability.

Shock tube ignition is a canonical experiment used to develop and validate combustion reaction mechanisms50. In such experiments, the reactant mixture behind the reflected shock experiences elevated temperature and pressure, followed by mixture combustion. An important quantity of interest in such experiments is the time difference between the onset of the reflected shock and the ignition of the reactant mixture, defined as the ignition delay \(t_{ign}\)51. The ignition delay is characterized as the time of maximum heat release or steepest change in reactant temperature, and is therefore a key physico-chemical property for combustion systems.

Figure 8

Temperature evolution for stoichiometric methane-air combustion at \(T_o = 1500\,\text{K}\), \(P_o = 100\,\text{kPa}\) using the 2-step mechanism52 in comparison with GRI-Mech 3.0. The ignition delay time \(t_{ign}\), at which the mixture releases maximum heat, is under-predicted by nearly an order of magnitude by the 2-step mechanism.

To illustrate the practical identifiability framework we will consider stoichiometric methane-air combustion in a shock tube under an adiabatic, ideal-gas constant pressure ignition assumption. Typically, the chemical kinetics capturing detailed chemistry of methane-air ignition is computationally expensive due to hundreds of associated reactions. To model the reaction chemistry, consider the classical 2-step mechanism proposed by Westbrook et al.52 that accounts for the incomplete oxidation of methane. This reduced mechanism consists of a total of 6 species (5 reacting and 1 inert species, namely, \(\mathrm {N_2}\)) and 2 reactions (1 reversible), thus drastically reducing the cost of evaluating the chemical kinetics. The reactions involved in this reduced chemical kinetics model are

$$\begin{aligned} {\text{CH}_4} + \frac{3}{2} {\text{O}_2} \xrightarrow []{k_{1}} \mathrm{{CO}} + 2 {\text{H}}_{2}{\text{O}}, \end{aligned}$$
(29)
$$\begin{aligned} {\text{CO}} + \frac{1}{2} {\text{O}}_2 \mathop {\rightleftharpoons }\limits _{k_{2b}}^{ k_{2f} } {\text{CO}}_2, \end{aligned}$$
(30)

where the overall reaction rates are temperature-dependent and are modeled using the Arrhenius rate equation as

$$\begin{aligned} k_1\triangleq & {} Ae^{ \frac{-48400}{RT} }[{\text{CH}_4}]^{-0.3}[{\text{O}}_2]^{1.3}, \end{aligned}$$
(31)
$$\begin{aligned} k_{2f}\triangleq & {} 3.98\times 10^{14}e^{ \frac{-40000}{RT} }[{\text{CO}}][{\text{H}_2O}]^{0.5}[{\text{O}}_2]^{0.25},\end{aligned}$$
(32)
$$\begin{aligned} k_{2b}\triangleq & {} 5\times 10^{8}e^{ \frac{-40000}{RT} }[{\text{CO}_2}], \end{aligned}$$
(33)

where \(A=2.8\times 10^9\) is the pre-exponential factor, R is the ideal gas constant, and T is the temperature in Kelvin. To solve the resulting reaction equations, CANTERA v2.6.053 is used. Figure 8 illustrates the temperature evolution using the 2-step mechanism and GRI-Mech 3.054 for an initial temperature \(T_o = 1500\,\text{K}\), initial pressure \(P_o = 100\,\text{kPa}\), and equivalence ratio \(\phi =1\) (stoichiometric). The GRI-Mech 3.0 mechanism consists of detailed chemical kinetics with 53 species and 325 reactions. As seen, the 2-step mechanism under-predicts the ignition delay by nearly an order of magnitude. To improve the predictive capabilities of the 2-step mechanism, a functional dependency for the pre-exponential factor can be introduced as \(\log A = \mathscr {G} (T_o, \phi )\), where

$$\begin{aligned} \mathscr {G} (T_o, \phi ) \triangleq 18 + \theta _1 + \tanh {(\theta _2 + \theta _3\phi )\frac{T_o}{1000}}. \end{aligned}$$
(34)

Here, \(\theta _1, \theta _2\), and \(\theta _3\) are the uncertain model parameters. A similar parameterization has been used for n-dodecane reduced chemical kinetics48. It should be noted that while a more expressive functional form for the pre-exponential factor could be chosen in (34), the goal of the framework is to ascertain practical identifiability. For parameter estimation, consider the detailed GRI-Mech 3.0 to be the ‘exact solution’ to the combustion problem, which can then be used to generate the data. Consider the logarithm of the ignition delay time at \(T_{o} = 1100, 1400, 1700\), and \(2000\,\text{K}\), with \(\phi =1.0\) and \(P_o = 100\,\text{kPa}\), as the available data for model calibration. Assume an uncorrelated measurement noise \(\Gamma =\sigma _{\xi }^2{\mathbb {I}}\) with \(\sigma _{\xi }^2 = 0.1\).
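The reference data can be generated with CANTERA roughly as sketched below, using the gri30.yaml mechanism distributed with Cantera, a constant-pressure ideal-gas reactor, and the steepest-temperature-rise definition of \(t_{ign}\); the integration horizon is an arbitrary assumption and may need to be extended if ignition has not occurred at the lowest initial temperature.

```python
import numpy as np
import cantera as ct

def ignition_delay(T0, P0=1.0e5, phi=1.0, mechanism="gri30.yaml", t_end=1.0):
    """Constant-pressure ignition delay, defined as the time of steepest temperature rise.
    t_end is an assumed upper bound on the integration time; adjust as needed."""
    gas = ct.Solution(mechanism)
    gas.set_equivalence_ratio(phi, "CH4", "O2:1.0, N2:3.76")    # methane-air
    gas.TP = T0, P0
    reactor = ct.IdealGasConstPressureReactor(gas)
    sim = ct.ReactorNet([reactor])

    times, temps = [], []
    while sim.time < t_end:
        sim.step()
        times.append(sim.time)
        temps.append(reactor.T)
    times, temps = np.array(times), np.array(temps)
    return times[np.argmax(np.gradient(temps, times))]

# Logarithm of the reference ignition delay at the four calibration conditions
for T0 in (1100.0, 1400.0, 1700.0, 2000.0):
    print(T0, np.log(ignition_delay(T0)))
```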

Parameter identifiability

The practical identifiability framework is now applied to the methane-air combustion problem to examine the identifiability of the model parameters in (34) before any controlled experiments are conducted. Consider an uncorrelated prior distribution for the model parameters as \(\theta _1\sim \mathcal {N} {(0, 1)}; \theta _2\sim \mathcal {N} (0, 1); \theta _3\sim \mathcal {N} (0, 1)\). Such priors result in pre-exponential factors of an order similar to those previously reported52, and are therefore considered suitable for the study. Similar to "Application to a linear Gaussian model", historical estimates of the model parameters are not considered for examining identifiability.

Figure 9

First-order Sobol indices (left), information gain (center), and information gain equivalent variance \(\mathscr {C}(\Theta _i)\) (right) for the methane-air combustion model. The Sobol indices show that the largest variability in the output of the statistical model is due to uncertainty in \(\Theta _1\), while \(\Theta _2\) and \(\Theta _3\) exhibit similar variabilities. Variable \(\Theta _{1}\) exhibits the largest information gain and therefore the highest practical identifiability; \(\Theta _2\) and \(\Theta _3\) have similar information gain. Variable \(\Theta _1\) exhibits the lowest measurement uncertainty for the direct observation model, followed by similar uncertainty for \(\Theta _2\) and \(\Theta _3\).

The first-order Sobol indices, estimated information gain, and information gain equivalent variance \(\mathscr {C}(\Theta _i)\) are shown in Figure 9. The information gain is estimated using \(n_{\text {outer}}=12000\) Monte-Carlo samples and \(n_{\text {inner}}=5\) quadrature points. Examining the first-order Sobol indices (see S3 in the supplementary material for a convergence study), the output of the forward model exhibits the largest variability due to uncertainty in the variable \(\Theta _1\), followed by similar variability with respect to \(\Theta _2\) and \(\Theta _3\). The largest information gain is observed for the variable \(\Theta _1\), followed by similar gains for \(\Theta _2\) and \(\Theta _3\). This means that \(\Theta _1\) will have the highest practical identifiability, followed by a much lower identifiability for \(\Theta _2\) and \(\Theta _3\). Using the hypothetical direct observation model described in "Estimating practical identifiability", the variable \(\Theta _1\) with the largest practical identifiability exhibits the lowest measurement uncertainty, followed by similar uncertainty for \(\Theta _2\) and \(\Theta _3\).
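For reference, one way to realize a nested estimator of this form, with outer Monte-Carlo samples and inner quadrature over the parameter of interest, is sketched below. It assumes standard-normal priors, a Gaussian likelihood with covariance \(\sigma _{\xi }^2{\mathbb {I}}\), and a generic forward model g mapping the parameters to the vector of observables (here, hypothetically, the log ignition delays); it illustrates the nested structure only and is not the exact estimator derived in "Estimating practical identifiability".

```python
import numpy as np
from scipy.special import logsumexp

def information_gain(i, g, sigma2, d=3, n_outer=2000, n_inner=5, seed=0):
    """Nested estimate of the information gained about parameter i from data
    y = g(theta) + noise: outer Monte-Carlo over (theta, y), inner
    Gauss-Hermite quadrature over theta_i under its N(0, 1) prior."""
    rng = np.random.default_rng(seed)
    nodes, weights = np.polynomial.hermite.hermgauss(n_inner)
    nodes, log_w = np.sqrt(2.0) * nodes, np.log(weights / np.sqrt(np.pi))

    def log_like(y, theta):
        r = y - np.atleast_1d(g(theta))
        return -0.5 * np.sum(r ** 2) / sigma2   # additive constants cancel below

    total = 0.0
    for _ in range(n_outer):
        theta = rng.standard_normal(d)
        mean = np.atleast_1d(g(theta))
        y = mean + np.sqrt(sigma2) * rng.standard_normal(mean.shape)
        # likelihood marginalized over theta_i, the remaining parameters fixed
        lls = []
        for x in nodes:
            t = theta.copy()
            t[i] = x
            lls.append(log_like(y, t))
        total += log_like(y, theta) - logsumexp(np.array(lls) + log_w)
    return total / n_outer

# Usage (with a user-supplied forward model, e.g. built on the Cantera sketch):
# gain_theta1 = information_gain(0, forward_model, sigma2=0.1)
```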

Figure 10

Second-order Sobol indices (left) and estimated parameter dependencies (right) for the methane-air combustion model. The trend \(S_{2, 3} > S_{1, 2} \approx S_{1, 3}\) suggests that there are underlying interactions between the parameters \(\Theta _2\) and \(\Theta _3\). Pairs (\(\Theta _1\), \(\Theta _2\)) and (\(\Theta _1\), \(\Theta _3\)) have nearly the same dependencies on one another. The pair (\(\Theta _2\), \(\Theta _3\)) exhibits low dependencies.

Figure 10 shows the second-order Sobol indices and estimated parameter dependencies. The second-order Sobol indices (see S3 in the supplementary material for a convergence study) follow the trend \(S_{2, 3} > S_{1, 2} \approx S_{1, 3}\), suggesting that there are underlying interactions between the parameters \(\Theta _2\) and \(\Theta _3\). As observed, the low identifiability of \(\Theta _2\) and \(\Theta _3\) suggested in Figure 9 is primarily due to the underlying dependencies between the pairs \((\Theta _1, \Theta _2)\) and \((\Theta _1, \Theta _3)\). To estimate the parameter dependencies, \(n_{\text {outer}}=12000\) Monte-Carlo samples are used, with \(n_{\text {inner}}=5\) and 10 quadrature points for the one- and two-dimensional integration spaces, respectively. The similar magnitude of the parameter dependencies obtained for the pairs \((\Theta _1, \Theta _2)\) and \((\Theta _1, \Theta _3)\), together with the similar information gain for \(\Theta _2\) and \(\Theta _3\), also suggests an underlying symmetry with respect to \(\Theta _1\). This means that interchanging \(\Theta _2\) and \(\Theta _3\) will not affect the output of the statistical model, which can be clearly seen in (34) for \(\phi =1\). This is also evident from the second-order Sobol indices, which suggest a combined effect on the output of the statistical model due to interactions between \(\Theta _2\) and \(\Theta _3\).
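For the variance-based portion of this analysis, the first- and second-order Sobol indices reported in Figures 9 and 10 can be reproduced in spirit with a standard GSA library such as SALib. The snippet below is a sketch under stated assumptions: the standard-normal priors are passed through SALib's 'dists' mechanism (with 'bounds' interpreted as mean and standard deviation), and a placeholder scalar quantity of interest, the Eq. (34) parameterization at a single condition, stands in for the full forward model used in the paper.

```python
import numpy as np
from SALib.sample import saltelli
from SALib.analyze import sobol

# Standard-normal priors on (theta1, theta2, theta3).
problem = {
    'num_vars': 3,
    'names': ['theta1', 'theta2', 'theta3'],
    'dists': ['norm', 'norm', 'norm'],
    'bounds': [[0.0, 1.0], [0.0, 1.0], [0.0, 1.0]],   # (mean, std) per parameter
}

def forward_model(theta, T0=1500.0, phi=1.0):
    """Placeholder scalar QoI: the Eq. (34) log pre-exponential factor. In the
    actual study this would be the output of the 2-step kinetics model."""
    t1, t2, t3 = theta
    return 18.0 + t1 + np.tanh((t2 + t3 * phi) * T0 / 1000.0)

X = saltelli.sample(problem, 1024, calc_second_order=True)
Y = np.array([forward_model(x) for x in X])
Si = sobol.analyze(problem, Y, calc_second_order=True)
print(Si['S1'])   # first-order indices (cf. Figure 9, left)
print(Si['S2'])   # second-order indices (cf. Figure 10, left)
```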

Figure 11

Correlation plot for samples obtained from the posterior distribution (left) and the obtained aggregate posterior prediction (right) for the methane-air combustion model. Correlation plots do not reveal any relations among variables. Aggregate posterior prediction agrees well with the data and exhibits high certainty.

Parameter estimation

Now consider the parameter estimation problem, which seeks the posterior distribution \(p(\theta \mid y)\). Typically, a closed-form expression for the posterior distribution is not available due to non-linearities in the forward model or the chosen family of prior distributions. As an alternative, sampling-based methods such as Markov chain Monte Carlo (MCMC), which draw samples from the unnormalized posterior, have gained significant attention. These methods construct Markov chains whose stationary distribution is the posterior distribution. The Metropolis-Hastings algorithm is an MCMC method that can be used to generate a sequence of samples from any given probability distribution55. The adaptive Metropolis algorithm is a powerful modification of the Metropolis-Hastings algorithm and is used here to sample from the posterior distribution56.
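A minimal sketch of the adaptive Metropolis algorithm is given below for a generic log-posterior. The tuning constants (initial covariance, adaptation schedule, regularization) and the placeholder log-posterior are illustrative assumptions rather than the exact settings used in this work.

```python
import numpy as np

def adaptive_metropolis(log_post, theta0, n_steps=50000, adapt_start=1000,
                        adapt_every=100, eps=1e-6, seed=0):
    """Adaptive Metropolis: Gaussian random-walk proposal whose covariance is
    periodically re-estimated from the chain history and scaled by 2.4^2 / d."""
    theta = np.asarray(theta0, dtype=float)
    d = theta.size
    sd = 2.4 ** 2 / d
    cov = 0.1 * np.eye(d)                       # initial proposal covariance
    rng = np.random.default_rng(seed)
    samples = np.empty((n_steps, d))
    lp = log_post(theta)
    for k in range(n_steps):
        if k >= adapt_start and k % adapt_every == 0:
            cov = sd * (np.cov(samples[:k].T) + eps * np.eye(d))
        prop = rng.multivariate_normal(theta, cov)
        lp_prop = log_post(prop)
        if np.log(rng.uniform()) < lp_prop - lp:    # Metropolis accept/reject
            theta, lp = prop, lp_prop
        samples[k] = theta
    return samples

# Placeholder log-posterior (standard normal); in the actual problem this would
# combine the N(0, 1) priors with the Gaussian ignition-delay likelihood.
chain = adaptive_metropolis(lambda t: -0.5 * np.sum(t ** 2), np.zeros(3),
                            n_steps=5000)
```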

Figure 11 shows the correlation plot for the posterior samples obtained using the adaptive Metropolis algorithm, together with the aggregate posterior prediction for the ignition delay time. No (linear) correlation is observed between the variables; however, the joint distributions of the pairs \((\Theta _1, \Theta _2)\) and \((\Theta _1, \Theta _3)\) show similarities. These similarities were also observed during the a priori analysis quantifying parameter dependencies, as shown in Figure 10.

Figure 12

Aggregate temperature evolution for methane-air combustion model. Aggregate prediction agrees well with the GRI-Mech 3.0 detailed mechanism.

Figure 13

Aggregate species concentration evolution for methane-air combustion model. Aggregate prediction agrees well with the GRI-Mech 3.0 detailed mechanism.

The obtained aggregate prediction shows a dramatic improvement over the original 2-step mechanism in predicting the ignition delay time over a wide range of temperatures. Using a functional form such as (34) for the pre-exponential factor also improves the predicted evolution of the mixture temperature, as shown in Figure 12. However, the adiabatic flame temperature, defined as the mixture temperature upon reaching equilibrium, is still over-predicted. An improvement in the predicted evolution of the species concentrations over time is also observed, as shown in Figure 13.

Concluding remarks and perspectives

Examining the practical identifiability of statistical models is useful in many applications, such as parameter estimation, model-form development, and model selection. Estimating practical identifiability prior to conducting controlled experiments or parameter estimation studies can assist in choosing a parameterization associated with a high degree of posterior certainty, thus improving confidence in the estimation and in the model prediction.

In this work, a novel information-theoretic approach based on conditional mutual information is presented to assess the global practical identifiability of a statistical model in a Bayesian framework. The proposed framework examines the expected information gain for each parameter from the data before controlled experiments are performed. Parameters with higher information gain are characterized by higher posterior certainty and thereby have higher practical identifiability. The adopted viewpoint is that the practical identifiability of a parameter does not have a binary answer; rather, it is the relative practical identifiability among parameters that is useful in practice.

In contrast to previous numerical approaches used to study practical identifiability, the proposed approach has the following notable advantages: first, no controlled experiment or data is required to conduct the practical identifiability analysis; second, different forms of uncertainties, such as model-form, parameter, or measurement uncertainty, can be taken into account; third, the framework does not make assumptions about the distribution of the data and parameters, as previous methods do; fourth, the estimator provides knowledge about global identifiability and is therefore not dependent on a particular realization of the parameters. To provide a physical interpretation of practical identifiability in the context of examining the information gain for each parameter, an information gain equivalent variance for a direct observation model is also presented. The practical identifiability framework is then extended to examine dependencies among parameter pairs. Even if an individual parameter exhibits poor practical identifiability characteristics, it can belong to an identifiable subset such that parameters within the subset have functional relationships with one another. Parameters within such an identifiable subset have a combined effect on the statistical model and can be collectively identified. To find such subsets, a novel a priori estimator is proposed to quantify the expected dependencies between parameter pairs that emerge a posteriori.

To illustrate the framework, two statistical models are considered: (a) a linear Gaussian model and (b) a non-linear methane-air reduced kinetics model. For the linear Gaussian model, it is shown that parameters with large information gain and low parameter dependencies can be estimated with high confidence. The variance-based global sensitivity analysis (GSA) also illustrates that parameter sensitivity is necessary for identifiability. However, as conclusively shown, the inability of variance-based GSA to capture different forms of uncertainties can lead to unreliable estimates of practical identifiability. The information gain equivalent variance obtained using a direct observation model shows that parameters with high practical identifiability would be associated with low measurement uncertainty if observed directly. In the case of the methane-air reduced kinetics model, it is shown that parameters with large dependencies can have low information gain and therefore low practical identifiability. Further, the proposed estimator can capture non-linear dependencies and reveal structures within the parameter space before controlled experiments are performed. Such non-linear dependencies cannot be observed by considering a posteriori parameter correlations, as correlations only capture linear relationships.