Estimating global identifiability using conditional mutual information in a Bayesian framework

A novel information-theoretic approach is proposed to assess the global practical identifiability of Bayesian statistical models. Based on the concept of conditional mutual information, an estimate of information gained for each model parameter is used to quantify the identifiability with practical considerations. No assumptions are made about the structure of the statistical model or the prior distribution while constructing the estimator. The estimator has the following notable advantages: first, no controlled experiment or data is required to conduct the practical identifiability analysis; second, unlike popular variance-based global sensitivity analysis methods, different forms of uncertainties, such as model-form, parameter, or measurement can be taken into account; third, the identifiability analysis is global, and therefore independent of a realization of the parameters. If an individual parameter has low identifiability, it can belong to an identifiable subset such that parameters within the subset have a functional relationship and thus have a combined effect on the statistical model. The practical identifiability framework is extended to highlight the dependencies between parameter pairs that emerge a posteriori to find identifiable parameter subsets. The applicability of the proposed approach is demonstrated using a linear Gaussian model and a non-linear methane-air reduced kinetics model. It is shown that by examining the information gained for each model parameter along with its dependencies with other parameters, a subset of parameters that can be estimated with high posterior certainty can be found.


Introduction
With the growth in computational capabilities, statistical models are becoming increasingly complex to make predictions under various design conditions. These models often contain uncertain parameters which must be estimated using data obtained from controlled experiments. While methods for parameter estimation have matured significantly, there remain notable challenges for a statistical model and estimated parameters to be considered reliable. One such challenge is the practical identifiability of model parameters, which is defined as the possibility of estimating each parameter with high confidence when different forms of uncertainty, such as parameter, model-form, or measurement uncertainty, are present [1][2][3]. Low practical identifiability of the statistical model can lead to an ill-posed estimation problem, which becomes a critical issue when the parameters have a physical interpretation and decisions are to be made using their estimated values [4,5]. Further, such an identifiability deficit can also lead to unreliable model predictions, making such statistical models unsuitable for practical applications [6,7]. Therefore, for a reliable parameter estimation process and model prediction, it is of significant interest that practical identifiability is evaluated before any controlled experiment or parameter estimation study is conducted [8,9].
In frequentist statistics, the problem of practical identifiability is to examine the possibility of unique estimation of model parameters $\theta$ [8]. Under such considerations, methods examining identifiability are broadly classified into local and global identifiability methods. While the former examines the possibility that $\theta = \theta_k$ is a unique parameter estimate within its neighborhood $N(\theta_k)$ in the parameter space, the latter is concerned with the uniqueness of $\theta_k$ when considering the entire parameter space. Local sensitivity analysis has been widely used to find parameters that produce large variability in the model response [10][11][12][13]. In such an analysis, parameters resulting in large variability are considered relevant and therefore assumed to be identifiable for parameter estimation. However, the parameters associated with large model sensitivities could still have poor identifiability characteristics [14]. Another class of frequentist identification methods is based on the analysis of the properties of the Fisher information matrix (FIM). Staley and Yue [15] proposed that the positive definiteness of the FIM is the necessary and sufficient condition for the parameters to be considered practically identifiable. Similarly, Rothenberg [16] showed that the identifiability of the parameters is equivalent to the non-singularity of the FIM. Later, Stoica and Söderström [17] and Petersen et al. [18] showed that models with singular FIM could also be identifiable. Weijers and Vanrolleghem [19] extended the classical FIM analysis and showed that even if an individual parameter has low identifiability, it can belong to an identifiable subset, such that the subset is practically identifiable. Parameters within such subsets have functional relationships with each other, thus resulting in a combined effect on the model response. It has been shown that such identifiable subsets can be found by examining the condition number (E-criterion) and determinant (D-criterion), and selecting parameter pairs with the smallest condition number and largest determinant. Following Weijers and Vanrolleghem [19], Machado et al. [20] considered the D to E ratio to examine practical identifiability and to find the identifiable subsets. Another popular identification technique is likelihood profiling [4][21][22][23]. The method is based on finding the likelihood profile of a parameter by maximizing the likelihood with respect to the rest of the parameters. Parameters whose likelihood profile is shallow are deemed to have low practical identifiability. In addition to evaluating practical identifiability, likelihood profiling could also be used to find functional relationships between parameters, which is helpful for model reparameterization [24,25]. However, due to the several re-optimizations required to obtain the likelihood profiles, the method does not scale well with the dimension of the parameter space and could quickly become computationally intractable. While methods based on the FIM or likelihood profiling have gained significant popularity, they only examine local identifiability. This means that the estimate of practical identifiability is dependent on the $\theta_k$ for which the analysis is conducted and is only valid within its neighborhood $N(\theta_k)$. To overcome the limitations of local identifiability, global identifiability methods using Kullback-Leibler divergence [26] and identifying functions [27] have been proposed. However, such methods are computationally complex and not suitable for practical problems. Moreover, since such methods are based on frequentist statistics, they are unable to account for parametric uncertainty and therefore unable to provide an honest representation of global practical identifiability.
There have been few studies examining global practical identifiability in a Bayesian framework. Early attempts were based on global sensitivity analysis (GSA) that apportions the variability (either by derivatives or variance) of the model output due to the uncertainty in each parameter [6][28][29][30]. Unlike local sensitivity analysis, GSA-based methods simultaneously vary model parameters according to their distributions, thus providing a measure of global sensitivity that is independent of a particular parameter realization. However, global parameter sensitivity does not guarantee global practical identifiability [31]. Pant and Lombardi [32] and Capellari et al. [33] formulated the problem of practical identifiability as gaining sufficient information about each model parameter from data. An information-theoretic approach was used to quantify the information gained, such that larger information gain would mean larger practical identifiability. However, assumptions about the structure of the parameter-data joint distribution were made when developing the estimator. A similar approach was used by Ebrahimian et al. [34], where the change in parameter uncertainty moving from the prior distribution to the estimated posterior distribution was used to quantify information gained. Pant [35] proposed information sensitivity functions by combining information theory and sensitivity analysis to quantify information gain. However, the joint distribution between the parameters and the data was assumed to be Gaussian.
Framed in a Bayesian setting, the information-theoretic approach to identifiability provides a natural extension to include different forms of uncertainties that are present in practical problems. In this work, a novel estimator is developed from an information-theoretic perspective to examine the practical identifiability of a statistical model. The expected information gained from the data for each model parameter is used as a metric to quantify practical identifiability. In contrast to the aforementioned methods based on information theory, the proposed approach has the following novel advantages: first, the estimator for information gain can be used for an a priori analysis, that is, no data is required to evaluate practical identifiability; second, the framework can account for different forms of uncertainty, such as model-form, parameter, and measurement; third, the framework does not make assumptions about the joint distribution between the data and parameters as in the previous methods; fourth, the identifiability analysis is global, rather than being dependent on a particular realization of model parameters. Another contribution of this work is an information-theoretic estimator that highlights, in an a priori manner, the dependencies between parameter pairs that emerge a posteriori. Combining the knowledge about information gained about each parameter and parameter dependencies using the proposed approach, it is possible to find parameter subsets that can be estimated with high posterior certainty before any controlled experiment is performed. Broadly, this can dramatically reduce the cost of parameter estimation, inform model-form selection or refinement, and associate a degree of reliability to the parameter estimation.
The manuscript is organized as follows. In §2.1 the Bayesian paradigm for parameter estimation is presented. In §2.2 differential entropy and mutual information are presented as information-theoretic tools to quantify the uncertainty associated with random variables and information gain, respectively. In §2.3 an a priori estimator is developed to quantify global practical identifiability in a Bayesian construct. In §2.4 the problem of estimating parameter dependencies is addressed. An a priori estimator is developed to quantify parameter dependencies developed a posteriori. The practical identifiability framework is applied to a linear Gaussian statistical model and a methane-air reduced kinetics model; results are presented in §3. Concluding remarks are presented in §4.

Quantifying practical identifiability in a Bayesian setting
In this section, we first present the Bayesian framework for parameter estimation. Next, we utilize the concepts of differential entropy and mutual information from information theory to quantify information contained in the data about uncertain parameters of the statistical model. Thereafter, we extend the idea of mutual information to develop an a priori estimator to quantify practical identifiability in a Bayesian setting. While in most statistical models low practical identifiability is due to insufficient information about model parameters, it may often be the case that identifiable subsets exist. Parameters within such subsets have functional relations and exhibit a combined effect on the statistical model, such that the subset is practically identifiable.
To find such identifiable subsets, we develop an estimator to highlight dependencies between parameter pairs that emerge a posteriori.

Bayesian parameter inference
Consider the observation/data $y \in \mathcal{Y}$ of a physical system, which is a realization of a random variable $Y : \Omega \to \mathbb{R}^n$ distributed as $p(y)$, where $\mathcal{Y}$ is the set of all possible realizations of the random variable. Herein, we will use the same letter in lower case, upper case, and calligraphic script to represent a realization, the random variable, and the set of all possible realizations, respectively. Consider another real-valued random variable $\Theta : \Omega \to \mathbb{R}^m$ distributed as $p(\theta) : \mathbb{R}^m \to \mathbb{R}_+$, which denotes the uncertain parameters of the model. The data is assumed to be generated by the statistical model given as
$$ y = F(\theta, d) + \xi, \tag{1} $$
where $F(\theta, d) : \mathbb{R}^m \times \mathbb{R}^\ell \to \mathbb{R}^n$ is the forward model which maps the parameters and model inputs $d \in \mathbb{R}^\ell$ to the prediction space. For simplicity, consider the input of the model $d$ as known. The random variable $\xi$ is the additive measurement noise or uncertainty in our measurement. Once the observations are collected using controlled experiments, the prior belief of the parameter distribution $p(\theta \mid d)$ can be updated to obtain the posterior distribution $p(\theta \mid y, d)$ via Bayes' rule
$$ p(\theta \mid y, d) = \frac{p(y \mid \theta, d)\, p(\theta \mid d)}{p(y \mid d)}, \tag{2} $$
where $p(y \mid \theta, d)$ is called the model likelihood and $p(y \mid d)$ is called the evidence.

Quantifying information gain
Updating parameter belief from the prior to posterior in (2) is associated with a gain in information from the data. This gain can be quantified as the change in the uncertainty of the parameters $\Theta$. As an example, consider a 1D Gaussian prior and posterior distribution such that the information gain can be quantified as a change in variance (a measure of uncertainty) of the parameter distribution. A greater reduction in parameter uncertainty is a consequence of more information gained from the data.
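In general, the change in parameter uncertainty between the prior and posterior distributions for a given input of the model $d \in \mathcal{D}$ is defined as
$$ \Delta U(d, y) \triangleq U\big(p(\theta \mid d)\big) - U\big(p(\theta \mid y, d)\big), \tag{3} $$
where $U$ is an operator quantifying the amount of uncertainty or the lack of information for a given probability distribution. Thus, the expected information gained about the parameters is defined as
$$ \mathbb{E}_Y\big[\Delta U(d, y)\big] \triangleq \int \Delta U(d, y)\, p(y \mid d)\, dy. \tag{4} $$
One popular choice for the operator $U$ is the differential entropy [32,33,36], which is defined as the average Shannon information [37] for a given probability distribution. Mathematically, for a continuous random variable $Z : \Omega \to \mathbb{R}^t$ with distribution $p(z) : \mathbb{R}^t \to \mathbb{R}_+$ and support $\mathcal{Z}$, the differential entropy is defined as
$$ H\big(p(z)\big) \triangleq -\int_{\mathcal{Z}} p(z) \log p(z)\, dz. \tag{5} $$
Using differential entropy to quantify the uncertainty of a probability distribution, the change in uncertainty (or expected information gain) of $\Theta$ can be evaluated as
$$ \begin{aligned} I(\Theta; Y \mid d) &\triangleq \mathbb{E}_Y\Big[ H\big(p(\theta \mid d)\big) - H\big(p(\theta \mid y, d)\big) \Big] && \text{(6a)} \\ &= -\int p(\theta \mid d) \log p(\theta \mid d)\, d\theta + \int p(y \mid d) \int p(\theta \mid y, d) \log p(\theta \mid y, d)\, d\theta\, dy && \text{(6b)} \\ &= -\int\!\!\int p(\theta, y \mid d) \log p(\theta \mid d)\, d\theta\, dy + \int\!\!\int p(\theta, y \mid d) \log p(\theta \mid y, d)\, d\theta\, dy && \text{(6c)} \\ &= \int\!\!\int p(\theta, y \mid d) \log \frac{p(\theta \mid y, d)}{p(\theta \mid d)}\, d\theta\, dy && \text{(6d)} \\ &= \int\!\!\int p(\theta, y \mid d) \log \frac{p(\theta, y \mid d)}{p(\theta \mid d)\, p(y \mid d)}\, d\theta\, dy. && \text{(6e)} \end{aligned} $$
The quantity $I(\Theta; Y \mid d)$ is called the mutual information between the random variables $\Theta$ and $Y$ given the model inputs $D = d$ [38]. The mutual information is measured in bits when the logarithm is taken to base 2 and in nats when the natural logarithm is used.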
Remark 1. The mutual information $I(\Theta; Y \mid d)$ is always non-negative [38]. This means that updating the parameter belief from the prior to the posterior cannot, on average over the data, increase parameter uncertainty.
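As a worked illustration of Remark 1, consider a sketch under assumptions that are not part of the text above: a scalar parameter with prior $\mathcal{N}(\mu_0, \sigma_0^2)$ that is observed directly through $y = \theta + \xi$ with $\xi \sim \mathcal{N}(0, \sigma_\xi^2)$. The posterior is then Gaussian with a variance that does not depend on $y$, and the differential entropy of a Gaussian with variance $\sigma^2$ is $\tfrac{1}{2}\log(2\pi e\,\sigma^2)$, so the mutual information reduces to a log ratio of prior to posterior variance:

```latex
\begin{align}
\sigma_{\text{post}}^2 &= \left(\frac{1}{\sigma_0^2} + \frac{1}{\sigma_\xi^2}\right)^{-1}, \\
I(\Theta; Y) &= \tfrac{1}{2}\log\!\left(2\pi e\,\sigma_0^2\right)
              - \tfrac{1}{2}\log\!\left(2\pi e\,\sigma_{\text{post}}^2\right)
             = \tfrac{1}{2}\log\!\left(1 + \frac{\sigma_0^2}{\sigma_\xi^2}\right) \ge 0 .
\end{align}
```

For $\sigma_0^2 = 1$ and $\sigma_\xi^2 = 0.1$ this gives $\tfrac{1}{2}\log 11 \approx 1.20$ nats, and the gain shrinks towards zero as the measurement noise grows.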

Estimating practical identifiability
In a Bayesian framework where the parameters are treated as random variables, practical identifiability can be determined by examining information gained about each model parameter [32,35]. Parameters for which the data is uninformative cannot be estimated with a high degree of confidence and therefore are practically unidentifiable. While mutual information in (6e) is a useful quantity to study information gained from data about the entire parameter set, it does not apportion information gained about each parameter. Therefore, to examine practical identifiability, we define a conditional mutual information
$$ I(\Theta_i; Y \mid \Theta_{\sim i}, d) \triangleq \mathbb{E}_{\Theta_{\sim i}}\big[ I(\Theta_i; Y \mid \Theta_{\sim i} = \theta_{\sim i}, d) \big], \tag{7} $$
where $\Theta_{\sim i}$ are all parameters except $\Theta_i$ and $\mathbb{E}_{\Theta_{\sim i}}[\cdot]$ denotes the expectation over $p(\theta_{\sim i} \mid d)$. Using such conditional mutual information for practical identifiability is based on the intuition that, on average, high information gained about $\Theta_i$ means high practical identifiability. We can thus present the following definitions for identifiability in a Bayesian setting.
Definition 1 (Local Identifiability). Given a statistical model with parameters $\Theta$, a parameter $\Theta_i \in \Theta$ is said to be locally identifiable if sufficient information is gained about it for a particular realization $\theta_{\sim i}$ of $\Theta_{\sim i}$.
Definition 2 (Global Identifiability). Given a statistical model with parameters $\Theta$, a parameter $\Theta_i \in \Theta$ is said to be globally identifiable if sufficient information is gained about it on average with respect to the distribution $p(\theta_{\sim i} \mid d)$.
The expectation over possible realizations of $\Theta_{\sim i}$ in (7) therefore provides a statistical measure of global practical identifiability [8]. On the contrary, evaluating (7) at a fixed $\theta_{\sim i}$ will result in a local identifiability measure, which means that the information gained about $\Theta_i$ will implicitly depend on $\theta_{\sim i}$.
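Typically, (7) does not have a closed-form expression and must be estimated numerically. Using the definition of differential entropy in (5), the conditional mutual information can be written as
$$ \begin{aligned} I(\Theta_i; Y \mid \Theta_{\sim i}, d) &= \int p(\theta_{\sim i} \mid d) \int\!\!\int p(\theta_i, y \mid \theta_{\sim i}, d) \log \frac{p(\theta_i, y \mid \theta_{\sim i}, d)}{p(\theta_i \mid \theta_{\sim i}, d)\, p(y \mid \theta_{\sim i}, d)}\, d\theta_i\, dy\, d\theta_{\sim i} && \text{(8a)} \\ &= \int p(\theta_i, \theta_{\sim i}, y \mid d) \log \frac{p(\theta_i, y \mid \theta_{\sim i}, d)}{p(\theta_i \mid \theta_{\sim i}, d)\, p(y \mid \theta_{\sim i}, d)}\, d\theta_i\, d\theta_{\sim i}\, dy && \text{(8b)} \\ &= \int p(\theta_i, \theta_{\sim i}, y \mid d) \log \frac{p(y \mid \theta_i, \theta_{\sim i}, d)}{p(y \mid \theta_{\sim i}, d)}\, d\theta_i\, d\theta_{\sim i}\, dy. && \text{(8c)} \end{aligned} $$
Remark 2. In terms of differential entropy, the conditional mutual information in (7) can be defined as
$$ I(\Theta_i; Y \mid \Theta_{\sim i}, d) = H\big(p(\theta_i, \theta_{\sim i} \mid d)\big) + H\big(p(\theta_{\sim i}, y \mid d)\big) - H\big(p(\theta_{\sim i} \mid d)\big) - H\big(p(\theta_i, \theta_{\sim i}, y \mid d)\big). \tag{9} $$
In case the parameters are uncorrelated,
$$ I(\Theta_i; Y \mid \Theta_{\sim i}, d) = H\big(p(\theta_i \mid d)\big) + H\big(p(\theta_{\sim i}, y \mid d)\big) - H\big(p(\theta_i, \theta_{\sim i}, y \mid d)\big). \tag{10} $$
While this formulation does not involve any conditional distributions involving the parameters or data, it requires joint distributions, namely, $p(\theta_{\sim i}, y \mid d)$ and $p(\theta_i, \theta_{\sim i}, y \mid d)$. Typically, such joint distributions do not have a closed-form expression and must be approximated.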
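In the special case where $\Theta_i$ perfectly correlates with $\Theta_{\sim i}$ such that the realization of $\theta_{\sim i}$ provides sufficient information about $\theta_i$, the term inside the logarithm in (8c) becomes identically unity. For such a case, the data is not informative about $\Theta_i$ and the effective parameter dimensionality $m_{\text{eff}}$ becomes less than $m$. For a more general case, Monte-Carlo integration can be used to approximate the high-dimensional integral as
$$ \hat{I}(\Theta_i; Y \mid \Theta_{\sim i}, d) \approx \frac{1}{n_{\text{outer}}} \sum_{k=1}^{n_{\text{outer}}} \log \frac{p(y^k \mid \theta_i^k, \theta_{\sim i}^k, d)}{p(y^k \mid \theta_{\sim i}^k, d)}, \tag{11} $$
where $(\theta_i^k, \theta_{\sim i}^k)$ is drawn from the prior distribution $p(\theta_i, \theta_{\sim i} \mid d)$, $y^k$ is drawn from the likelihood distribution $p(y \mid \theta_i^k, \theta_{\sim i}^k, d)$, and $n_{\text{outer}}$ is the number of Monte-Carlo samples. Typically, the conditional evidence $p(y \mid \theta_{\sim i}, d)$ does not have a closed-form expression, and therefore $p(y^k \mid \theta_{\sim i}^k, d)$ must be numerically approximated. One approach is to rewrite the conditional evidence $p(y^k \mid \theta_{\sim i}^k, d)$ by means of marginalization as
$$ p(y^k \mid \theta_{\sim i}^k, d) = \int p(y^k \mid \theta_i, \theta_{\sim i}^k, d)\, p(\theta_i \mid \theta_{\sim i}^k, d)\, d\theta_i. \tag{12} $$
For simplicity, assume that the parameters are uncorrelated prior to observing the data, and are also independent of the model inputs $d$. As a result, (12) can be rewritten as
$$ p(y^k \mid \theta_{\sim i}^k, d) = \int p(y^k \mid \theta_i, \theta_{\sim i}^k, d)\, p(\theta_i)\, d\theta_i. \tag{13} $$
This results in a low-dimensional integral over a univariate prior distribution $p(\theta_i)$. Evaluating (13) using classical Monte-Carlo integration can dramatically increase the overall cost of estimating the conditional mutual information in (11), especially if the likelihood evaluation is computationally expensive. In the special case where the priors are normally distributed, this cost can be reduced by considering a Gaussian quadrature rule. Using the quadrature approximation in (13) gives
$$ p(y^k \mid \theta_{\sim i}^k, d) \approx \sum_{\zeta=1}^{n_{\text{inner}}} p(y^k \mid \theta_i^\zeta, \theta_{\sim i}^k, d)\, \gamma^\zeta, \tag{14} $$
where $\theta_i^\zeta$ and $\gamma^\zeta$ are the $\zeta$th quadrature point and weight, respectively; $n_{\text{inner}}$ is the number of quadrature points. Here, we use the Gauss-Hermite quadrature rule, which uses the $t$th order Hermite polynomial and will be exact for polynomials up to order $2t - 1$ [39]. In a much more general case where the prior distributions can be non-Gaussian (however, can still be evaluated), the cost of estimating (13) can be reduced by using importance sampling with a proposal distribution $q(\theta_i)$. Using importance sampling we can rewrite (13) as
$$ p(y^k \mid \theta_{\sim i}^k, d) = \int p(y^k \mid \theta_i, \theta_{\sim i}^k, d)\, w(\theta_i)\, q(\theta_i)\, d\theta_i, \tag{15} $$
where $w(\theta_i) = p(\theta_i)/q(\theta_i)$ are the importance sampling weights. In the case where the proposal distribution $q(\theta_i)$ is Gaussian, the quadrature rule can be applied to (15) as
$$ p(y^k \mid \theta_{\sim i}^k, d) \approx \sum_{\zeta=1}^{n_{\text{inner}}} p(y^k \mid \theta_i^\zeta, \theta_{\sim i}^k, d)\, w(\theta_i^\zeta)\, \gamma^\zeta. \tag{16} $$
Combining the estimator for conditional evidence ((14) or (16)) with (11) results in a biased estimator for conditional mutual information [40,41]. While the variance is controlled by the numerical accuracy of estimating the high-dimensional integral in (11), the bias is governed by the accuracy of approximating the conditional evidence in (12). This means that the variance is controlled by the $n_{\text{outer}}$ Monte-Carlo samples and the bias by the $n_{\text{inner}}$ quadrature points.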
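The nested structure of (11), with the Gauss-Hermite rule used for the conditional evidence, can be sketched as follows. This is a minimal illustration rather than the authors' implementation: it assumes a linear forward model $F(\theta, d) = A\theta$, zero-mean independent Gaussian priors, isotropic Gaussian noise, and a hypothetical 5-point design, and all function names are illustrative. For this model the exact value $\tfrac{1}{2}\log(1 + \sigma_i^2 a_i^T a_i / \sigma_\xi^2)$, with $a_i$ the $i$th column of $A$, follows from standard Gaussian identities and is used only as a check.

```python
import numpy as np

def gauss_hermite_normal(mean, std, n_inner):
    """Nodes/weights such that sum(w * f(x)) approximates E[f(X)], X ~ N(mean, std^2)."""
    x, w = np.polynomial.hermite.hermgauss(n_inner)
    return mean + np.sqrt(2.0) * std * x, w / np.sqrt(np.pi)

def log_likelihood(y, theta, A, noise_var):
    """log N(y; A @ theta, noise_var * I); theta has shape (m,) or (k, m)."""
    resid = y - theta @ A.T
    n = A.shape[0]
    return (-0.5 * np.sum(resid**2, axis=-1) / noise_var
            - 0.5 * n * np.log(2.0 * np.pi * noise_var))

def conditional_mi(i, A, noise_var, prior_std, n_outer, n_inner, rng):
    """Nested estimate of I(Theta_i; Y | Theta_~i): outer Monte-Carlo, inner quadrature."""
    n, m = A.shape
    nodes, weights = gauss_hermite_normal(0.0, prior_std[i], n_inner)
    total = 0.0
    for _ in range(n_outer):
        theta = prior_std * rng.standard_normal(m)                    # theta^k ~ prior
        y = A @ theta + np.sqrt(noise_var) * rng.standard_normal(n)   # y^k ~ likelihood
        log_lik = log_likelihood(y, theta, A, noise_var)
        # Inner quadrature over theta_i with theta_~i held at the sampled values.
        grid = np.tile(theta, (n_inner, 1))
        grid[:, i] = nodes
        ll = log_likelihood(y, grid, A, noise_var)
        shift = ll.max()                                              # log-sum-exp for stability
        log_evidence = shift + np.log(np.sum(weights * np.exp(ll - shift)))
        total += log_lik - log_evidence
    return total / n_outer

# Hypothetical small problem (not the configuration used in the paper).
d = np.linspace(-1.0, 1.0, 5)
A = np.column_stack([d, d**2, d**3])
noise_var = 1.0
prior_std = np.ones(3)
rng = np.random.default_rng(0)
estimate = conditional_mi(0, A, noise_var, prior_std, n_outer=4000, n_inner=40, rng=rng)
exact = 0.5 * np.log1p(prior_std[0]**2 * (A[:, 0] @ A[:, 0]) / noise_var)
print(f"estimate = {estimate:.3f}, closed form = {exact:.3f}")
```

The wide noise level is chosen deliberately: when the likelihood is much narrower than the prior, a quadrature rule built on the prior places too few nodes under the likelihood peak and the bias grows.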
In practice, estimating conditional evidence can become computationally expensive, especially when the variability in the output of the forward model is high with respect to $\Theta_i$ given $\Theta_{\sim i} = \theta_{\sim i}$, that is, when $\nabla_{\theta_i} F(\theta, d)\,|_{\Theta_{\sim i} = \theta_{\sim i}}$ is large. For such statistical models, conditional evidence can become near zero such that numerical approximation by means of vanilla Monte-Carlo integration or Gaussian quadrature in (14) can be challenging [41]. Using an estimator based on importance sampling for conditional evidence as shown in (16) can alleviate this problem by carefully choosing the density of the proposal $q(\theta_i)$. As an example, consider the case where the additive measurement noise $\xi$ is normally distributed as $\mathcal{N}(0, \Gamma)$ such that the likelihood of the model is distributed as $p(y \mid \theta) = \mathcal{N}(F(\theta, d), \Gamma)$, and $y^k$ is sampled according to $\mathcal{N}(F(\theta_i^k, \theta_{\sim i}^k, d), \Gamma)$. In the case where model predictions have large variability with respect to the parameter $\Theta_i$ for a given $\Theta_{\sim i} = \theta_{\sim i}$, the model likelihood can become small. For such a case, the importance-sampling-based estimator given in (16) can be used by constructing a proposal around the sample $\theta_i^k$, such as $q(\theta_i) = \mathcal{N}(\theta_i^k, \sigma_{\text{proposal}}^2)$, where $\sigma_{\text{proposal}}^2$ is the variance of the proposal distribution. This results in a robust estimation of conditional evidence and prevents infinite values for conditional mutual information. Here, we consider (16) to estimate conditional evidence.
Remark 3. Assessing the practical identifiability in a Bayesian framework is dependent on the prior distribution. Although the framework presented in this article is entirely an a priori analysis of practical identifiability, prior selection can affect estimated identifiability. Prior selection in itself is an extensive area of research and is not considered a part of this work.
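The importance-sampling form of the conditional evidence in (16) can be sketched as below. This is an illustration under stated assumptions, not the authors' code: a Gaussian proposal centred at the sampled $\theta_i^k$, a standard normal prior, a hypothetical single-parameter model $y_n = d_n \theta + \xi_n$ with 100 observations, and a hand-picked proposal standard deviation of 0.1. The function and variable names are illustrative. The exact Gaussian evidence is computed only to check the approximation.

```python
import numpy as np

def log_normal_pdf(x, mean, std):
    """Log density of N(mean, std^2) evaluated at x."""
    return -0.5 * ((x - mean) / std) ** 2 - np.log(std * np.sqrt(2.0 * np.pi))

def log_conditional_evidence_is(log_lik_fn, theta_i_k, prior_mean, prior_std,
                                proposal_std, n_inner):
    """log of sum_z p(y | theta_z) w(theta_z) gamma_z, with Gauss-Hermite nodes
    placed on the proposal q = N(theta_i_k, proposal_std^2)."""
    x, w = np.polynomial.hermite.hermgauss(n_inner)
    nodes = theta_i_k + np.sqrt(2.0) * proposal_std * x
    gamma = w / np.sqrt(np.pi)                       # quadrature weights for E_q[.]
    log_w = (log_normal_pdf(nodes, prior_mean, prior_std)
             - log_normal_pdf(nodes, theta_i_k, proposal_std))   # log p(theta)/q(theta)
    terms = log_lik_fn(nodes) + log_w
    shift = terms.max()                              # log-sum-exp for stability
    return shift + np.log(np.sum(gamma * np.exp(terms - shift)))

# Hypothetical sharply peaked likelihood: many observations, small noise.
rng = np.random.default_rng(0)
n, noise_var = 100, 0.1
d = np.linspace(-1.0, 1.0, n)
theta_k = rng.standard_normal()
y = d * theta_k + np.sqrt(noise_var) * rng.standard_normal(n)

def log_lik_fn(theta_i):
    """Gaussian log-likelihood evaluated at an array of theta_i values."""
    resid = y[None, :] - np.outer(theta_i, d)
    return (-0.5 * np.sum(resid**2, axis=1) / noise_var
            - 0.5 * n * np.log(2.0 * np.pi * noise_var))

approx = log_conditional_evidence_is(log_lik_fn, theta_k, 0.0, 1.0,
                                     proposal_std=0.1, n_inner=40)

# Exact log evidence for this linear Gaussian case: y ~ N(0, noise_var*I + d d^T).
cov = noise_var * np.eye(n) + np.outer(d, d)
_, logdet = np.linalg.slogdet(cov)
exact = -0.5 * (y @ np.linalg.solve(cov, y) + logdet + n * np.log(2.0 * np.pi))
print(f"importance-sampled = {approx:.4f}, exact = {exact:.4f}")
```

The proposal width matters: it should be comparable to, and not much narrower than, the width of the product of likelihood and prior in $\theta_i$, otherwise the weighted integrand grows in the tails and the quadrature degrades.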

Physical interpretation of identifiability in an information-theoretic framework
Assessing practical identifiability using the conditional mutual information described in (7) provides a relative measure of how many bits (or nats) of information are gained for a particular parameter. In practical applications where this information gain can vary on disparate scales, it is useful to associate a physical interpretation to identifiability. Following [32], consider a hypothetical direct observation statistical model given as $\psi \triangleq \theta_i + \Lambda$, where $\Lambda \sim \mathcal{N}(0, \sigma_\Lambda^2)$ is the additive measurement noise. Given this observation model, we can define an information gain equivalent variance $C(\Theta_i)$ as the measurement uncertainty in the direct observation model for which the information gained matches that of the statistical model, given as
$$ C(\Theta_i) \triangleq \sigma_\Lambda^2 \quad \text{such that} \quad I(\Theta_i; \Psi) = I(\Theta_i; Y \mid \Theta_{\sim i}, d). $$
A large $C(\Theta_i)$ means that the information gained about $\Theta_i$ (using (7)) for the statistical model (1) corresponds to a high measurement uncertainty if the parameter were observed directly.
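If the prior distribution $p(\theta_i)$ can be approximated by means of an equivalent normal distribution $\mathcal{N}(\mu_e, \sigma_e^2)$, then $I(\Theta_i; \Psi)$ is given as
$$ I(\Theta_i; \Psi) = \frac{1}{2} \log\left(1 + \frac{\sigma_e^2}{\sigma_\Lambda^2}\right), $$
such that
$$ C(\Theta_i) = \frac{\sigma_e^2}{\exp\big(2\, I(\Theta_i; Y \mid \Theta_{\sim i}, d)\big) - 1}. $$
This information gain equivalent variance only depends on the information gained for the model parameter, and thus can be used as a metric to compare different model parameters.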
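The mapping from an estimated information gain to the information gain equivalent variance can be sketched as a pair of small helper functions. The sketch assumes the Gaussian approximation $\mathcal{N}(\mu_e, \sigma_e^2)$ of the prior and information measured in nats; under that assumption the direct observation model $\psi = \theta_i + \Lambda$ has the standard closed-form mutual information $\tfrac{1}{2}\log(1 + \sigma_e^2/\sigma_\Lambda^2)$, which is inverted for $\sigma_\Lambda^2$. The function names are illustrative.

```python
import numpy as np

def direct_observation_info(sigma_e2, sigma_lambda2):
    """Mutual information (nats) between theta_i ~ N(mu_e, sigma_e2) and
    psi = theta_i + Lambda, with Lambda ~ N(0, sigma_lambda2)."""
    return 0.5 * np.log1p(sigma_e2 / sigma_lambda2)

def info_gain_equivalent_variance(info_gain_nats, sigma_e2):
    """Noise variance of the direct observation model that yields the given
    information gain; expm1 keeps precision for small gains."""
    return sigma_e2 / np.expm1(2.0 * info_gain_nats)

# Round trip: a direct observation with noise variance 0.25 maps back to 0.25.
gain = direct_observation_info(1.0, 0.25)
recovered = info_gain_equivalent_variance(gain, 1.0)
print(gain, recovered)
```

A larger information gain maps to a smaller equivalent variance, and a gain approaching zero sends the equivalent variance to infinity, which matches the reading that an uninformed parameter behaves like one observed through unbounded noise.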

Estimating parameter dependence
In most statistical models, unknown functional relationships or dependencies may be present between parameters such that multiple parameters have a combined effect on the statistical model. Such parameters can form an identifiable subset in which an individual parameter exhibits low identifiability while the subset is collectively identifiable. This means that the data is uninformative or weakly informative about an individual parameter within the subset, whereas it is informative about the entire subset. As an example, consider the statistical model $y = \theta_1 \theta_2 d + \xi$, for which individually identifying $\Theta_1$ or $\Theta_2$ is not possible as they have a combined effect on the statistical model. However, it is clear that $\Theta_1$ and $\Theta_2$ belong to an identifiable subset such that the pair $(\Theta_1, \Theta_2)$ is identifiable. Thus, the reparameterized statistical model $y = \theta_3 d + \xi$, where $\theta_3 = \theta_1 \theta_2$, will have better identifiability characteristics.
For such statistical models, the traditional method of examining correlations between parameters is often insufficient, as it only reveals linear functional relations between random variables.
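The product example above can be checked numerically. The following sketch assumes Gaussian noise with variance 0.1, a hypothetical 20-point design, and illustrative parameter values; it shows that the likelihood depends on $\theta_1$ and $\theta_2$ only through their product, so the data cannot separate the two.

```python
import numpy as np

def log_likelihood(theta1, theta2, d, y, noise_var):
    """Gaussian log-likelihood of y = theta1 * theta2 * d + xi."""
    resid = y - theta1 * theta2 * d
    return (-0.5 * np.sum(resid**2) / noise_var
            - 0.5 * y.size * np.log(2.0 * np.pi * noise_var))

rng = np.random.default_rng(1)
d = np.linspace(-1.0, 1.0, 20)
noise_var = 0.1
y = 2.0 * 3.0 * d + np.sqrt(noise_var) * rng.standard_normal(d.size)  # data with theta1*theta2 = 6

ll_true = log_likelihood(2.0, 3.0, d, y, noise_var)
ll_same_product = log_likelihood(4.0, 1.5, d, y, noise_var)   # different pair, same product
ll_other_product = log_likelihood(2.0, 2.0, d, y, noise_var)  # product changed to 4
print(ll_true, ll_same_product, ll_other_product)
```

The first two values coincide while the third is much lower, which is the signature of a ridge of equally likely $(\theta_1, \theta_2)$ pairs along $\theta_1 \theta_2 = \text{const}$.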
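To highlight the parameter dependencies, consider the statistical model given in (1) such that we are interested in examining the relations between $\Theta_i \in \Theta$ and $\Theta_j \in \Theta$ that emerge a posteriori. While the conditional mutual information presented in §2.3 provides information on the practical identifiability of an individual parameter, it does not provide information about dependencies developed between pairs of parameters. To quantify such dependencies, we define a conditional mutual information between parameter pairs
$$ I(\Theta_i; \Theta_j \mid Y, \Theta_{\sim i,j}, d) \triangleq \mathbb{E}_{Y, \Theta_{\sim i,j}}\big[ I(\Theta_i; \Theta_j \mid Y = y, \Theta_{\sim i,j} = \theta_{\sim i,j}, d) \big], \tag{19} $$
which evaluates the average information between the variables $\Theta_i$ and $\Theta_j$ that is obtained a posteriori. Here, $\Theta_{\sim i,j}$ is defined as all the parameters of the statistical model except $\Theta_i$ and $\Theta_j$. A closed-form expression for (19) is typically not available, and therefore a numerical approximation is required. In integral form, (19) is given as
$$ I(\Theta_i; \Theta_j \mid Y, \Theta_{\sim i,j}, d) = \int p(\theta_i, \theta_j, \theta_{\sim i,j}, y \mid d) \log \frac{p(\theta_i, \theta_j \mid \theta_{\sim i,j}, y, d)}{p(\theta_i \mid \theta_{\sim i,j}, y, d)\, p(\theta_j \mid \theta_{\sim i,j}, y, d)}\, d\theta_i\, d\theta_j\, d\theta_{\sim i,j}\, dy, \tag{20} $$
where
$$ \begin{aligned} p(\theta_i, \theta_j \mid \theta_{\sim i,j}, y, d) &= \frac{p(y \mid \theta_i, \theta_j, \theta_{\sim i,j}, d)\, p(\theta_i, \theta_j \mid \theta_{\sim i,j}, d)}{p(y \mid \theta_{\sim i,j}, d)}, && \text{(21a)} \\ p(\theta_i \mid \theta_{\sim i,j}, y, d) &= \frac{p(y \mid \theta_i, \theta_{\sim i,j}, d)\, p(\theta_i \mid \theta_{\sim i,j}, d)}{p(y \mid \theta_{\sim i,j}, d)}, && \text{(21b)} \\ p(\theta_j \mid \theta_{\sim i,j}, y, d) &= \frac{p(y \mid \theta_j, \theta_{\sim i,j}, d)\, p(\theta_j \mid \theta_{\sim i,j}, d)}{p(y \mid \theta_{\sim i,j}, d)}, && \text{(21c)} \end{aligned} $$
via Bayes' theorem.
Remark 4. In terms of differential entropy, the conditional mutual information in (19) can be defined as
$$ I(\Theta_i; \Theta_j \mid Y, \Theta_{\sim i,j}, d) = H\big(p(\theta_i, \theta_{\sim i,j}, y \mid d)\big) + H\big(p(\theta_j, \theta_{\sim i,j}, y \mid d)\big) - H\big(p(\theta_{\sim i,j}, y \mid d)\big) - H\big(p(\theta_i, \theta_j, \theta_{\sim i,j}, y \mid d)\big). \tag{22} $$
Such a formulation requires evaluating joint distributions, namely, $p(\theta_i, \theta_{\sim i,j}, y \mid d)$, $p(\theta_j, \theta_{\sim i,j}, y \mid d)$, $p(\theta_{\sim i,j}, y \mid d)$, and $p(\theta_i, \theta_j, \theta_{\sim i,j}, y \mid d)$. Typically, such joint distributions do not have a closed-form expression and must be approximated.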
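For the sake of illustration, assume that the parameters are uncorrelated with each other prior to observing the data. As a consequence of this assumption, any relations developed between $\Theta_i$ and $\Theta_j$ are discovered solely from data. Furthermore, it is also reasonable to assume that prior knowledge of the parameters is independent of the input of the model $d$. Substituting (21a) through (21c) into (20) we obtain
$$ I(\Theta_i; \Theta_j \mid Y, \Theta_{\sim i,j}, d) = \int p(\theta_i, \theta_j, \theta_{\sim i,j}, y \mid d) \Big[ \log p(y \mid \theta_i, \theta_j, \theta_{\sim i,j}, d) + \log p(y \mid \theta_{\sim i,j}, d) - \log p(y \mid \theta_i, \theta_{\sim i,j}, d) - \log p(y \mid \theta_j, \theta_{\sim i,j}, d) \Big]\, d\theta_i\, d\theta_j\, d\theta_{\sim i,j}\, dy. \tag{23} $$
Similar to §2.3, we can estimate the conditional mutual information in (23) using Monte-Carlo integration as
$$ \hat{I}(\Theta_i; \Theta_j \mid Y, \Theta_{\sim i,j}, d) \approx \frac{1}{n_{\text{outer}}} \sum_{k=1}^{n_{\text{outer}}} \Big[ \log p(y^k \mid \theta_i^k, \theta_j^k, \theta_{\sim i,j}^k, d) + \log p(y^k \mid \theta_{\sim i,j}^k, d) - \log p(y^k \mid \theta_i^k, \theta_{\sim i,j}^k, d) - \log p(y^k \mid \theta_j^k, \theta_{\sim i,j}^k, d) \Big], \tag{24} $$
where $\theta_i^k$, $\theta_j^k$, and $\theta_{\sim i,j}^k$ are drawn from the prior distributions $p(\theta_i)$, $p(\theta_j)$, and $p(\theta_{\sim i,j})$, respectively; $y^k$ is drawn from the likelihood distribution $p(y \mid \theta_i^k, \theta_j^k, \theta_{\sim i,j}^k, d)$. The conditional evidence in (24) can be obtained by means of marginalization
$$ \begin{aligned} p(y^k \mid \theta_{\sim i,j}^k, d) &= \int\!\!\int p(y^k \mid \theta_i, \theta_j, \theta_{\sim i,j}^k, d)\, p(\theta_i)\, p(\theta_j)\, d\theta_i\, d\theta_j, && \text{(25a)} \\ p(y^k \mid \theta_i^k, \theta_{\sim i,j}^k, d) &= \int p(y^k \mid \theta_i^k, \theta_j, \theta_{\sim i,j}^k, d)\, p(\theta_j)\, d\theta_j, && \text{(25b)} \\ p(y^k \mid \theta_j^k, \theta_{\sim i,j}^k, d) &= \int p(y^k \mid \theta_i, \theta_j^k, \theta_{\sim i,j}^k, d)\, p(\theta_i)\, d\theta_i. && \text{(25c)} \end{aligned} $$
Similar to §2.3, the conditional evidence in (25a) through (25c) can be efficiently estimated using importance sampling along with Gaussian quadrature rules. However, it should be noted that (25a) is an integral over a two-dimensional space, and therefore requires $n_{\text{inner}}^2$ quadrature points.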
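The pairwise estimator in (24), with the marginalizations in (25a) through (25c) evaluated by tensor-product Gauss-Hermite quadrature on the priors, can be sketched as follows. As before this is an illustration under assumptions, not the authors' implementation: a linear forward model $A\theta$, zero-mean independent Gaussian priors, isotropic noise, and a hypothetical 5-point design; names are illustrative, and plain prior quadrature is used in place of importance sampling to keep the sketch short. For this Gaussian case the dependency equals $-\tfrac{1}{2}\log(1 - \rho^2)$, with $\rho$ the posterior correlation of the pair, which is used as a check.

```python
import numpy as np

def gh_normal(std, n_inner):
    """Gauss-Hermite nodes/weights for an expectation under N(0, std^2)."""
    x, w = np.polynomial.hermite.hermgauss(n_inner)
    return np.sqrt(2.0) * std * x, w / np.sqrt(np.pi)

def log_lik(y, theta, A, noise_var):
    """log N(y; A @ theta, noise_var * I); theta has shape (m,) or (k, m)."""
    resid = y - theta @ A.T
    return (-0.5 * np.sum(resid**2, axis=-1) / noise_var
            - 0.5 * A.shape[0] * np.log(2.0 * np.pi * noise_var))

def log_quad(y, grid, log_w, A, noise_var):
    """log of sum_z w_z p(y | theta_z), evaluated stably in log space."""
    t = log_lik(y, grid, A, noise_var) + log_w
    return t.max() + np.log(np.sum(np.exp(t - t.max())))

def pair_dependency(i, j, A, noise_var, prior_std, n_outer, n_inner, rng):
    """Nested estimate of I(Theta_i; Theta_j | Y, Theta_~ij)."""
    n, m = A.shape
    xi, wi = gh_normal(prior_std[i], n_inner)
    xj, wj = gh_normal(prior_std[j], n_inner)
    gi, gj = np.meshgrid(xi, xj, indexing="ij")      # n_inner^2 tensor grid
    log_w2 = np.log(np.outer(wi, wj)).ravel()
    total = 0.0
    for _ in range(n_outer):
        theta = prior_std * rng.standard_normal(m)
        y = A @ theta + np.sqrt(noise_var) * rng.standard_normal(n)
        grid_i = np.tile(theta, (n_inner, 1)); grid_i[:, i] = xi     # theta_i integrated out
        grid_j = np.tile(theta, (n_inner, 1)); grid_j[:, j] = xj     # theta_j integrated out
        grid_ij = np.tile(theta, (n_inner**2, 1))
        grid_ij[:, i] = gi.ravel(); grid_ij[:, j] = gj.ravel()       # both integrated out
        total += (log_lik(y, theta, A, noise_var)
                  + log_quad(y, grid_ij, log_w2, A, noise_var)       # p(y | theta_~ij)
                  - log_quad(y, grid_j, np.log(wj), A, noise_var)    # p(y | theta_i, theta_~ij)
                  - log_quad(y, grid_i, np.log(wi), A, noise_var))   # p(y | theta_j, theta_~ij)
    return total / n_outer

# Hypothetical small problem (not the configuration used in the paper).
d = np.linspace(-1.0, 1.0, 5)
A = np.column_stack([d, d**2, d**3])
prior_std = np.ones(3)
rng = np.random.default_rng(0)
estimate = pair_dependency(0, 2, A, 1.0, prior_std, n_outer=3000, n_inner=40, rng=rng)

B = A[:, [0, 2]]
P = np.eye(2) + B.T @ B / 1.0                        # posterior precision of the pair
exact = -0.5 * np.log1p(-P[0, 1]**2 / (P[0, 0] * P[1, 1]))
print(f"estimate = {estimate:.3f}, closed form = {exact:.3f}")
```

The two-dimensional grid dominates the cost: each outer sample needs $n_{\text{inner}}^2$ likelihood evaluations for the first marginalization against $2\,n_{\text{inner}}$ for the other two.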

Numerical experiments
This section presents numerical experiments to validate the information-theoretic approach to examine practical identifiability. The estimate obtained for global identifiability is compared with the variance-based global sensitivity analysis by means of first-order Sobol indices (see §A in the Appendix). First, a linear Gaussian statistical model is considered for which practical identifiability can be analytically examined through the proposed information-theoretic approach. This model is computationally efficient and is therefore ideal for conducting estimator convergence studies. Next, the practical identifiability of a reduced kinetics model for methane-air combustion is considered. Reduced kinetics models are widely used in the numerical analysis of chemically reactive flows since embedding detailed chemistry of combustion is often infeasible. Such reduced kinetic models are often parameterized such that constructing models with practically identifiable parameters is desirable to improve confidence in the model prediction.

Application to a linear Gaussian model
The identifiability framework is now applied to a linear Gaussian problem for which closed-form expressions are available for the conditional mutual information in (7) and (19) (see §B in the Appendix). Consider the statistical model
$$ y = F(\theta, d) + \xi, \tag{26} $$
where $F(\theta, d) = A\theta$ and $A \in \mathbb{R}^{n \times m}$ is called the feature matrix. The prior distribution is given by $p(\theta) = \mathcal{N}(\mu_\Theta, \Sigma_\Theta)$, where $\mu_\Theta \in \mathbb{R}^m$ and $\Sigma_\Theta \in \mathbb{R}^{m \times m}$. Model likelihood is therefore given by $p(y \mid \theta) = \mathcal{N}(A\theta, \Gamma)$, where $\Gamma \in \mathbb{R}^{n \times n}$. Here, $\mu_\Theta$, $\Sigma_\Theta$, and $\Gamma$ are all constants and are considered known. The evidence distribution for this model is given by $p(y) \triangleq \mathcal{N}(\mu_Y, \Sigma_Y) = \mathcal{N}(A\mu_\Theta, A\Sigma_\Theta A^T + \Gamma)$, such that no model-form error exists. Consider a feature matrix
$$ A = \begin{bmatrix} d_1 & d_1^2 & d_1^3 \\ \vdots & \vdots & \vdots \\ d_n & d_n^2 & d_n^3 \end{bmatrix}, $$
where $\{d_i\}_{i=1}^n$ are $n$ linearly spaced points in the interval $[-1, 1]$, and $m = 3$, which means that the statistical model has 3 uncertain parameters. Assume an uncorrelated measurement noise $\Gamma = \sigma_\xi^2 I$ with $\sigma_\xi^2 = 0.1$. For the purpose of parameter estimation, synthetic data is generated using (26) assuming $\theta^* = [1, 2, 3]^T$ and $n = 100$.
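For orientation, the reference values for this problem can be computed directly from Gaussian identities. The sketch below is not copied from Appendix B: it assumes a feature matrix with columns $d$, $d^2$, $d^3$ (inferred from the later discussion of the features associated with $\Theta_1$ and $\Theta_3$), an independent standard normal prior, and $\sigma_\xi^2 = 0.1$. Under these assumptions the information gain for $\Theta_i$ is $\tfrac{1}{2}\log(1 + a_i^T a_i / \sigma_\xi^2)$ and the pairwise dependency is $-\tfrac{1}{2}\log(1 - \rho_{ij}^2)$, with $\rho_{ij}$ the posterior correlation of the pair given the remaining parameter.

```python
import numpy as np

n, noise_var = 100, 0.1
d = np.linspace(-1.0, 1.0, n)
A = np.column_stack([d, d**2, d**3])       # assumed feature matrix
prior_var = np.ones(3)                     # independent N(0, 1) priors

def info_gain(i):
    """I(Theta_i; Y | Theta_~i) in nats for the linear Gaussian model."""
    return 0.5 * np.log1p(prior_var[i] * (A[:, i] @ A[:, i]) / noise_var)

def pair_dependency(i, j):
    """I(Theta_i; Theta_j | Y, Theta_~ij) from the posterior precision of the pair."""
    B = A[:, [i, j]]
    P = np.diag(1.0 / prior_var[[i, j]]) + B.T @ B / noise_var
    rho2 = P[0, 1] ** 2 / (P[0, 0] * P[1, 1])   # squared posterior correlation
    return -0.5 * np.log1p(-rho2)

gains = [info_gain(i) for i in range(3)]
deps = {(i, j): pair_dependency(i, j) for i in range(3) for j in range(i + 1, 3)}
print("information gain:", np.round(gains, 3))
print("pair dependencies:", {k: round(float(v), 3) for k, v in deps.items()})
```

With these assumptions the gains decrease from $\Theta_1$ to $\Theta_3$, and only the pair $(\Theta_1, \Theta_3)$ shows a non-negligible dependency, because the odd features $d$ and $d^3$ overlap on a symmetric design while both are orthogonal to $d^2$.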
Fig. 1 Convergence of the variance in the practical identifiability estimator of a linear Gaussian statistical model (left); the number of quadrature points $n_{\text{inner}} = 50$ is considered and the number of Monte-Carlo integration samples $n_{\text{outer}}$ is varied. Bias convergence for the practical identifiability estimator in the case of a linear Gaussian statistical model (right); $n_{\text{outer}} = 10^4$ is considered and $n_{\text{inner}}$ is varied. For a given $n_{\text{inner}}$, increasing $n_{\text{outer}}$ reduces the variance, whereas increasing $n_{\text{inner}}$ for a given $n_{\text{outer}}$ decreases the bias in the estimate.

Parameter identifiability
The goal of the framework developed in §2.3 is to assess the practical identifiability of the statistical model in (26) before any controlled experiment is conducted. Consider $\mu_\Theta = 0$ and $\Sigma_\Theta = I$. Using such an uncorrelated prior distribution for the identifiability study ensures that the information obtained is only due to the observation of the data (as discussed in §2.4). Using historical parameter estimates can improve the prior (Remark 3), which can affect the identifiability analysis. However, we have not considered any such prior refinement.
Figure 1 illustrates the convergence of error in estimating information gain for each parameter using the estimator developed in §2.3. As expected, for a fixed number of quadrature points, increasing the number of Monte-Carlo integration points decreases the variance in estimation. However, for a fixed $n_{\text{outer}}$, increasing the number of quadrature points reduces the bias in the estimate. Figure 2 illustrates the variance and bias convergence of error in estimating parameter dependencies as described in §2.4. As expected and observed, the variance in error is controlled by the accuracy of Monte-Carlo integration, that is, by $n_{\text{outer}}$, and the bias is controlled by the quadrature approximation, that is, through $n_{\text{inner}}$.
Fig. 2 Convergence of the variance in practical identifiability for estimating parameter dependencies of a linear Gaussian statistical model (left); the number of quadrature points $n_{\text{inner}} = 50$ is considered and the number of Monte-Carlo integration samples $n_{\text{outer}}$ is varied. Bias convergence for estimating parameter dependencies in the case of a linear Gaussian statistical model (right); $n_{\text{outer}} = 10^4$ is considered and $n_{\text{inner}}$ is varied. For a given $n_{\text{inner}}$, increasing $n_{\text{outer}}$ reduces the variance, whereas increasing $n_{\text{inner}}$ for a given $n_{\text{outer}}$ decreases the bias in estimating the parameter dependencies.
Fig. 3 First-order Sobol indices (left), information gain (center), and information gain equivalent variance $C(\Theta_i)$ (right) for the linear Gaussian model. Sobol indices show that the output of the statistical model has the largest variability due to uncertainty in $\Theta_1$, followed by $\Theta_2$ and $\Theta_3$. Variable $\Theta_1$ exhibits the largest gain in information and therefore the highest practical identifiability, followed by $\Theta_2$ and then $\Theta_3$. For a direct observation model, the variable $\Theta_1$ has the lowest measurement uncertainty, followed by $\Theta_2$ and $\Theta_3$.
The first-order Sobol indices, estimated information gain, and information gain equivalent variance $C(\Theta_i)$ are shown in figure 3. The estimated first-order Sobol indices (see figure C1 in the Appendix for convergence study) show that the considered linear Gaussian forward model has the largest output variability due to uncertainty in $\Theta_1$, followed by $\Theta_2$ and $\Theta_3$. This implies that the forward model is most sensitive to the parameter $\theta_1$, followed by $\theta_2$ and then $\theta_3$. This is not surprising since $d_i|_{i=1}^{n}$ are points in the interval $[-1, 1]$. Thus, according to the first-order Sobol indices, the relevance of the parameters follows the order: $\theta_1$, $\theta_2$, and $\theta_3$. The estimated information gain agrees well with the truth. Further, the obtained trend suggests that the data is most informative about $\Theta_1$, followed by $\Theta_2$, and then $\Theta_3$. As discussed in §2.3, practical identifiability follows the same trend. Furthermore, as suggested by Wu et al. [31], it can be seen that parameters with good identifiability characteristics also exhibit high model sensitivity. Using the hypothetical direct observation model described in §2.3.1, the smallest measurement uncertainty is obtained for the variable $\Theta_1$, followed by $\Theta_2$ and $\Theta_3$. That is, parameters with high practical identifiability are associated with low measurement uncertainty in a direct observation model.
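For readers unfamiliar with how first-order Sobol indices are obtained numerically, the following is a minimal sketch of a pick-freeze (Saltelli-type) Monte-Carlo estimator. It is not the implementation used for figure 3: the model `toy_model`, its coefficients, and the standard normal inputs are assumptions chosen so that the exact indices are known ($a_i^2 / \sum_j a_j^2$ for a linear model with independent unit-variance inputs).

```python
import numpy as np

def first_order_sobol(model, dim, n_samples, seed=0):
    """Pick-freeze Monte-Carlo estimate of first-order Sobol indices for a
    model with independent standard normal inputs."""
    rng = np.random.default_rng(seed)
    a = rng.standard_normal((n_samples, dim))
    b = rng.standard_normal((n_samples, dim))
    f_a, f_b = model(a), model(b)
    total_var = np.var(np.concatenate([f_a, f_b]))
    indices = np.empty(dim)
    for i in range(dim):
        ab_i = a.copy()
        ab_i[:, i] = b[:, i]  # take column i from B, all other columns from A
        indices[i] = np.mean(f_b * (model(ab_i) - f_a)) / total_var
    return indices

def toy_model(theta):
    """Hypothetical linear forward model with coefficients (3, 2, 1)."""
    return theta @ np.array([3.0, 2.0, 1.0])

if __name__ == "__main__":
    print(first_order_sobol(toy_model, dim=3, n_samples=100_000))
```

Note that the estimator only ever evaluates the forward model on parameter samples; measurement noise never enters, which is why such indices cannot respond to a change in noise level.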
Figure 4 illustrates the variability of the first-order Sobol indices and the estimated information gain with measurement noise variance $\sigma_\xi^2$. The first-order Sobol indices only account for the parameter uncertainty, and therefore remain unchanged with an increase in measurement noise. However, the estimated information gain, and thereby the practical identifiability, decreases with measurement noise. This observation corroborates the intuition that large measurement uncertainty will lead to large uncertainty in the parameter estimation.
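The contrast between the two measures can be made concrete with a single-parameter worked example. The setup is an assumption made purely for illustration and is not the three-parameter model analysed above: a scalar model $y = \theta d + \xi$ with $\theta \sim N(0, \sigma_\theta^2)$ and $\xi \sim N(0, \sigma_\xi^2)$, where the symbols $d$ and $\sigma_\theta^2$ are introduced here.

```latex
\begin{align}
  \operatorname{Var}_{\theta}\!\left[\theta d\right]
    &= d^{2}\sigma_{\theta}^{2}, \\
  I(\Theta; Y)
    &= \frac{1}{2}\ln\!\left(1 + \frac{d^{2}\sigma_{\theta}^{2}}{\sigma_{\xi}^{2}}\right).
\end{align}
```

The output variance of the forward model, on which variance-based indices are built, does not contain $\sigma_\xi^2$, whereas the information gain decreases monotonically towards zero as $\sigma_\xi^2$ grows.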
Figure 5 shows the second-order Sobol indices and the true and estimated dependencies between the parameter pairs for the linear Gaussian model. Examining the second-order Sobol indices (see figure C1 in the Appendix for convergence study) shows that there are negligible interactions between parameter pairs. Estimated parameter dependencies agree well with the truth; the trend is preserved. The bias observed is due to the error in approximating the conditional evidence, as shown in figure 2. It can be clearly seen that the parameters $\Theta_1$ and $\Theta_3$ have high dependencies. This means that these parameters compensate for one another such that they have a combined effect on the output of the statistical model. These parameters are associated with the features $d_i|_{i=1}^{n}$ and $d^3$ which, in fact, have a similar effect on the statistical model for points in the interval $[-1, 1]$. This observation also shows that the low practical identifiability of $\Theta_3$ is mainly due to the underlying dependency with $\Theta_1$ such that the pair $(\Theta_1, \Theta_3)$ has a combined effect in the statistical model.

Parameter estimation
For the linear Gaussian model, the joint distribution $p(\theta, y)$ can be written in closed form, such that the posterior distribution is available analytically. Samples from the posterior distribution and the aggregate posterior prediction are shown in figure 6. Variables $\Theta_1$ and $\Theta_3$ have a negative correlation, whereas $\Theta_2$ is uncorrelated with the other parameters. This means that the parameter variables $\Theta_1$ and $\Theta_3$ have (linear) dependencies on each other, and $\Theta_2$ does not have such dependencies. These dependencies were suggested during the a priori analysis of the statistical model, illustrated in figure 5. Aggregate posterior prediction agrees well with the data and exhibits high certainty.
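The closed-form posterior of a linear Gaussian model can be sketched in a few lines. The example below uses the standard conjugate result for $y = D\theta + \xi$ with a zero-mean Gaussian prior; the polynomial features $(d, d^2, d^3)$, the 21 design points, and the unit prior covariance are assumptions for illustration and are not taken from the model definition of this paper.

```python
import numpy as np

def linear_gaussian_posterior(design, y, sigma_xi, prior_cov):
    """Posterior N(mean, cov) for y = design @ theta + xi with a zero-mean
    Gaussian prior N(0, prior_cov) and noise xi ~ N(0, sigma_xi**2 * I)."""
    prior_prec = np.linalg.inv(prior_cov)
    post_prec = prior_prec + design.T @ design / sigma_xi ** 2
    post_cov = np.linalg.inv(post_prec)
    post_mean = post_cov @ design.T @ y / sigma_xi ** 2
    return post_mean, post_cov

def posterior_summary(sigma_xi, n_points=21, seed=0):
    """Posterior correlation matrix and prior-to-posterior variance
    reduction for a hypothetical cubic-feature model on [-1, 1]."""
    rng = np.random.default_rng(seed)
    d = np.linspace(-1.0, 1.0, n_points)
    design = np.column_stack([d, d ** 2, d ** 3])  # assumed features
    prior_cov = np.eye(3)
    theta_true = rng.standard_normal(3)
    y = design @ theta_true + sigma_xi * rng.standard_normal(n_points)
    _, post_cov = linear_gaussian_posterior(design, y, sigma_xi, prior_cov)
    std = np.sqrt(np.diag(post_cov))
    corr = post_cov / np.outer(std, std)
    variance_reduction = np.diag(prior_cov) - np.diag(post_cov)
    return corr, variance_reduction

if __name__ == "__main__":
    for sigma_xi in (0.1, 1.0):
        corr, reduction = posterior_summary(sigma_xi)
        print(sigma_xi, corr[0, 2], corr[0, 1], reduction)
```

In this assumed setup the odd features $d$ and $d^3$ overlap on the symmetric design, so the first and third parameters come out negatively correlated while the second is uncorrelated with both, and the variance reduction shrinks as the noise level increases.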

Application to methane chemical kinetics
Accurate characterization of chemical kinetics is critical in the numerical prediction of reacting flows. Although there have been significant advancements in computational architectures and numerical methods, embedding the full chemical kinetics in numerical simulations is almost always infeasible. This is primarily because of the high-dimensional coupled ordinary differential equations that have to be solved to obtain concentrations of a large number of involved species. As a result, significant efforts have been made to develop reduced chemical kinetics models that seek to capture features such as ignition delay, adiabatic flame temperature, or flame speed observed using the true chemical kinetics [42][43][44][45]. These reduced mechanisms are typically formulated using a combination of theory and intuition, leaving unresolved chemistry, resulting in uncertainties in the relevant rate parameters [46]. Selecting a functional form of the modeled reaction rate terms that leads to reliable parameter estimation is highly desirable [7,47]. This means that for high confidence in parameter estimation and thereby model prediction, the underlying parameterization of reaction rate terms must exhibit high practical identifiability.
Shock tube ignition is a canonical experiment used to develop and validate combustion reaction mechanisms [48]. In such experiments, the reactant mixture behind the reflected shock experiences elevated temperature and pressure, followed by mixture combustion. An important quantity of interest in such experiments is the time difference between the onset of the reflected shock and the ignition of the reactant mixture, defined as the ignition delay $t_{ign}$ [49]. Ignition delay is characterized as the time of maximum heat release or steepest change in reactant temperature and is therefore a key physico-chemical property for combustion systems. To illustrate the practical identifiability framework, we consider stoichiometric methane-air combustion in a shock tube under an adiabatic, ideal-gas, constant-pressure ignition assumption. Typically, the chemical kinetics capturing the detailed chemistry of methane-air ignition is computationally expensive due to hundreds of associated reactions. To model the reaction chemistry, consider the classical 2-step mechanism proposed by Westbrook and Dryer [50] that accounts for the incomplete oxidation of methane. This reduced mechanism consists of a total of 6 species (5 reacting and 1 inert species, namely, N$_2$) and 2 reactions (1 reversible), thus drastically reducing the cost of evaluating the chemical kinetics. The overall reaction rates of this reduced chemical kinetics model are temperature-dependent and are modeled using the Arrhenius rate equation, where $A = 2.8 \times 10^9$ is the pre-exponential factor, $R$ is the ideal gas constant, and $T$ is the temperature in Kelvin. To solve the resulting reaction equations, CANTERA v2.6.0 [51] is used. Figure 8 illustrates the temperature evolution using the 2-step mechanism and GRI-Mech 3.0 [52] for an initial temperature $T_o = 1500$ K, initial pressure $P_o = 100$ kPa, and at a stoichiometric ratio $\phi = 1$. The GRI-Mech 3.0 mechanism consists of detailed chemical kinetics with 53 species and 325 reactions. As noticed, the 2-step mechanism under-predicts the ignition delay by nearly an order of magnitude. To improve the predictive capabilities of the 2-step mechanism, a functional dependency for the pre-exponential factor can be introduced as $\log A = G(T_o, \phi)$, where $\theta_1$, $\theta_2$, and $\theta_3$ are the uncertain model parameters entering through (34). Similar parameterization was used by Hakim et al. [46] for n-dodecane reduced chemical kinetics. It should be noted that while a more expressive functional form for the pre-exponential factor can be chosen in (34), the goal of the framework is to ascertain practical identifiability. For parameter estimation, consider the detailed GRI-Mech 3.0 to be the 'exact solution' to the combustion problem, which can then be used to generate the data. Consider the logarithm of the ignition delay at $T_o = 1100$, 1400, 1700, and 2000 K at $\phi = 1.0$ and $P_o = 100$ kPa as the available data for model calibration. Assume an uncorrelated measurement noise $\Gamma = \sigma_\xi^2 I$ with $\sigma_\xi^2 = 0.1$.
Fig. 8 Temperature evolution for stoichiometric methane-air combustion at $T_o = 1500$ K, $P_o =$ …
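The characterization of ignition delay as the time of steepest temperature change translates directly into a post-processing step on a simulated temperature trace. The sketch below is an assumption about how such a step might look; it does not call CANTERA, and the sigmoid-shaped trace with its ignition time `t0` and width `w` is synthetic, invented only to exercise the function.

```python
import numpy as np

def ignition_delay(time, temperature):
    """Return the time at which dT/dt is largest, used here as a proxy for
    the ignition delay (steepest change in mixture temperature)."""
    rate = np.gradient(temperature, time)  # finite-difference dT/dt
    return float(time[np.argmax(rate)])

def synthetic_trace(t0=1.0e-3, w=2.0e-5, n=3001):
    """Hypothetical temperature history: smooth jump from 1500 K to 2500 K
    centred at t0 (seconds). Not the output of any kinetics solver."""
    time = np.linspace(0.0, 3.0e-3, n)
    temperature = 1500.0 + 1000.0 / (1.0 + np.exp(-(time - t0) / w))
    return time, temperature

if __name__ == "__main__":
    t, temp = synthetic_trace()
    print(f"estimated ignition delay: {ignition_delay(t, temp):.3e} s")
```

In a calibration loop, the logarithm of this quantity would be compared against the corresponding value from the detailed mechanism at each initial temperature.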

Parameter Identifiability
The practical identifiability framework is now applied to the methane-air combustion problem to examine the identifiability of the model parameters in (34) before any controlled experiments are conducted. Consider an uncorrelated prior distribution for the model parameters as $\theta_1 \sim N(0, 1)$; $\theta_2 \sim N(0, 1)$; $\theta_3 \sim N(0, 1)$. Such priors result in pre-exponential factors of an order similar to those reported by Westbrook and Dryer [50], and are therefore considered suitable for the study. Similar to §3.1, historical estimates of the model parameters are not considered for examining identifiability. The first-order Sobol indices, estimated information gain, and information gain equivalent variance $C(\Theta_i)$ are shown in figure 9. The information gain is estimated using $n_{outer} = 12000$ Monte-Carlo samples and $n_{inner} = 5$ quadrature points. Examining the first-order Sobol indices (see figure C2 in the Appendix for convergence study), the output of the forward model exhibits the largest variability due to uncertainty in the variable $\Theta_1$, followed by similar variability in the model output with respect to $\Theta_2$ and $\Theta_3$. The largest information gain is observed for the variable $\Theta_1$, followed by similar gains for $\Theta_2$ and $\Theta_3$. This means that $\Theta_1$ will have the highest practical identifiability, followed by a much lower identifiability for $\Theta_2$ and $\Theta_3$. Using the hypothetical direct observation model as described in §2.3.1, the variable $\Theta_1$ with the largest practical identifiability exhibits the lowest measurement uncertainty, followed by similar uncertainty for $\Theta_2$ and $\Theta_3$. Figure 10 shows the second-order Sobol indices and estimated parameter dependencies. The second-order Sobol indices (see figure C2 in the Appendix for convergence study) follow the trend $S_{2,3} > S_{1,2} \approx S_{1,3}$, suggesting that there are underlying interactions between the parameters $\Theta_2$ and $\Theta_3$. As observed, the low identifiability of $\Theta_2$ and $\Theta_3$ suggested in figure 9 is primarily due to the underlying dependencies between pairs $(\Theta_1, \Theta_2)$ and $(\Theta_1, \Theta_3)$. To estimate the parameter dependencies, $n_{outer} = 12000$ Monte-Carlo samples are used, with $n_{inner} = 5$ and 10 quadrature points for the single and two-dimensional integration spaces, respectively. The similar magnitude of parameter dependencies obtained for the pairs $(\Theta_1, \Theta_2)$ and $(\Theta_1, \Theta_3)$, in addition to the similar information gain for $\Theta_2$ and $\Theta_3$, also suggests an underlying symmetry with respect to $\Theta_1$. This means that the interchange of $\Theta_2$ and $\Theta_3$ will not affect the output of the statistical model, which can be clearly seen in (34) for $\phi = 1$. This is also evident from the second-order Sobol indices, which suggest that there is a combined effect on the output of the statistical model due to interactions between $\Theta_2$ and $\Theta_3$.
Fig. 9 First-order Sobol effect indices (left), information gain (center), and information gain equivalent variance $C(\Theta_i)$ (right) for methane-air combustion model. Sobol indices show that the largest variability in the output of the statistical model is due to uncertainty in $\Theta_1$; $\Theta_2$ and $\Theta_3$ exhibit similar variabilities. Variable $\Theta_1$ exhibits the largest information gain and therefore the highest practical identifiability; $\Theta_2$ and $\Theta_3$ have similar information gain. Variable $\Theta_1$ exhibits the lowest measurement uncertainty for the direct observation model, followed by similar uncertainty for $\Theta_2$ and $\Theta_3$.

Parameter Estimation
Now, let us consider the parameter estimation problem which seeks $p(\theta \mid y)$, that is, the posterior distribution. Typically, a closed-form expression for the posterior distribution is not available due to the non-linearities in the forward model or the chosen family of the prior distribution. As an alternative, sampling-based methods such as Markov Chain Monte Carlo (MCMC) that seek samples from an unnormalized posterior have gained significant attention. These methods construct Markov chains for which the stationary distribution is the posterior distribution. The Metropolis-Hastings algorithm is an MCMC method that can be used to generate a sequence of samples from any given probability distribution [53]. The adaptive Metropolis algorithm is a powerful modification to the Metropolis-Hastings algorithm and is used here to sample from the posterior distribution [54].
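A minimal sketch of an adaptive Metropolis sampler is given below to make the idea concrete. It is a simplified variant written for illustration, not the sampler configuration used in this study: the proposal covariance is re-estimated from the chain history every 50 steps after a fixed warm-up, and the target is a hypothetical correlated two-dimensional Gaussian standing in for an unnormalized log-posterior. The tuning values (`adapt_start`, `eps`, the initial proposal scale) are assumptions.

```python
import numpy as np

def adaptive_metropolis(log_post, x0, n_steps, adapt_start=500, eps=1e-6, seed=0):
    """Random-walk Metropolis whose Gaussian proposal covariance is adapted
    to the empirical covariance of the chain history."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    dim = x.size
    scale = 2.4 ** 2 / dim             # commonly used scaling factor
    cov = 0.1 * np.eye(dim)            # initial (non-adapted) proposal
    chain = np.empty((n_steps, dim))
    lp = log_post(x)
    for i in range(n_steps):
        if i >= adapt_start and i % 50 == 0:
            # eps * I keeps the proposal covariance positive definite
            cov = scale * (np.cov(chain[:i].T) + eps * np.eye(dim))
        proposal = rng.multivariate_normal(x, cov)
        lp_prop = log_post(proposal)
        # Symmetric proposal: accept with probability min(1, ratio)
        if np.log(rng.uniform()) < lp_prop - lp:
            x, lp = proposal, lp_prop
        chain[i] = x
    return chain

# Hypothetical target: correlated Gaussian with known mean and covariance.
TARGET_MEAN = np.array([1.0, -1.0])
TARGET_PREC = np.linalg.inv(np.array([[1.0, 0.8], [0.8, 1.0]]))

def toy_log_post(x):
    r = x - TARGET_MEAN
    return -0.5 * r @ TARGET_PREC @ r

if __name__ == "__main__":
    samples = adaptive_metropolis(toy_log_post, [0.0, 0.0], 20000)[5000:]
    print(samples.mean(axis=0), np.corrcoef(samples.T)[0, 1])
```

Only the log-posterior up to an additive constant is needed, which is why the evidence never has to be computed during sampling.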
Figure 11 illustrates the correlation between samples obtained using the adaptive Metropolis algorithm and the obtained aggregate posterior prediction for ignition delay time. No (linear) correlation is observed between the variables; however, the joint distributions of the pairs $(\Theta_1, \Theta_2)$ and $(\Theta_1, \Theta_3)$ show similarities. These similarities were also observed during the a priori analysis quantifying parameter dependencies, as shown in figure 10. The obtained aggregate prediction shows a dramatic improvement over the 2-step mechanism in predicting ignition delay time over a wide range of temperatures. Using a functional form as in (34) for the pre-exponential factor also improved the mixture temperature evolution, as shown in figure 12. However, the adiabatic flame temperature, which is defined as the mixture temperature upon reaching equilibrium, is still over-predicted. An improvement in the prediction of the evolution of species concentration over time is also noticed, as shown in figure 13.

Concluding remarks and perspectives
Examining the practical identifiability of statistical models is useful in many applications, such as parameter estimation, model-form development, and model selection.
Estimating practical identifiability prior to conducting controlled experiments or parameter estimation studies can assist in choosing a parametrization that is associated with a high degree of posterior certainty, thus improving confidence in estimation and model prediction.
In this work, a novel information-theoretic approach based on conditional mutual information is presented to assess global practical identifiability of a statistical model in a Bayesian framework. The proposed framework examines the expected information gain for each parameter from the data before performing controlled experiments. Parameters with higher information gain are characterized by having higher posterior certainty, and thereby have higher practical identifiability. The adopted viewpoint is that the practical identifiability of a parameter does not have a binary answer; rather, it is the relative practical identifiability among parameters that is useful in practice. In contrast to previous numerical approaches used to study practical identifiability, the proposed approach has the following notable advantages: first, no controlled experiment or data is required to conduct the practical identifiability analysis; second, different forms of uncertainties, such as model-form, parameter, or measurement can be taken into account; third, the framework does not make assumptions about the distribution of the data and parameters as in the previous methods; fourth, the estimator provides knowledge about global identifiability and is therefore not dependent on a particular realization of the parameters. To provide a physical interpretation to practical identifiability in the context of examining information gain for each parameter, an information gain equivalent variance for a direct observation model is also presented. The practical identifiability framework is then extended to examine dependencies among parameter pairs. Even if an individual parameter exhibits poor practical identifiability characteristics, it can belong to an identifiable subset such that parameters within the subset have functional relationships with one another. Parameters within such an identifiable subset have a combined effect on the statistical model and can be collectively identified. To find such subsets, a novel a priori estimator is proposed to quantify the expected dependencies between parameter pairs that emerge a posteriori.
To illustrate the framework, two statistical models are considered: (a) a linear Gaussian model and (b) a non-linear methane-air reduced kinetics model. For the linear Gaussian model, it is shown that parameters with large information gain and low parameter dependencies can be estimated with high confidence. The variance-based global sensitivity analysis (GSA) also illustrates that parameter sensitivity is necessary for identifiability. However, as conclusively shown, the inability of variance-based GSA to capture different forms of uncertainties can lead to unreliable estimates for practical identifiability. The information gain equivalent variance obtained using a direct observation model shows that parameters with high practical identifiability will be associated with low measurement uncertainty if observed directly. In the case of the methane-air reduced kinetics model, it is shown that parameters with large dependencies can have low information gain and therefore low practical identifiability. Further, the proposed estimator can capture non-linear dependencies and reveal structures within the parameter space before performing controlled experiments. Such non-linear dependencies cannot be observed when considering a posteriori parameter correlations, as only linear relations can be well understood.
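The limitation of correlation-based diagnostics can be illustrated with a small numerical experiment. The sketch below is not part of the proposed estimator: it uses a hypothetical pair of variables with a purely quadratic relationship and a simple histogram-based mutual information estimate (the bin count is an arbitrary choice) to show that a near-zero linear correlation can coexist with strong dependence.

```python
import numpy as np

def histogram_mutual_information(x, y, bins=30):
    """Plug-in mutual information estimate (in nats) from a 2-D histogram."""
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    p_xy = joint / joint.sum()
    p_x = p_xy.sum(axis=1, keepdims=True)   # marginal of x, shape (bins, 1)
    p_y = p_xy.sum(axis=0, keepdims=True)   # marginal of y, shape (1, bins)
    mask = p_xy > 0                          # skip empty cells
    return float(np.sum(p_xy[mask] * np.log(p_xy[mask] / (p_x @ p_y)[mask])))

def quadratic_pair(n_samples=200_000, seed=0):
    """Hypothetical dependent pair: y = x**2 plus small Gaussian noise."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(n_samples)
    y = x ** 2 + 0.1 * rng.standard_normal(n_samples)
    return x, y

if __name__ == "__main__":
    x, y = quadratic_pair()
    print("linear correlation :", np.corrcoef(x, y)[0, 1])
    print("mutual information :", histogram_mutual_information(x, y))
```

An information-based measure registers the dependence that the correlation coefficient misses, which is the motivation for quantifying parameter dependencies through mutual information rather than posterior correlations.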

Fig. 4 First-order Sobol indices (left) and estimated information gain (right) vs. measurement noise variance $\sigma_\xi^2$ for linear Gaussian model. Increasing measurement noise covariance does not affect the variability of the output with respect to the parameters and therefore the first-order Sobol indices remain unchanged. However, the information gain decreases with increasing measurement noise covariance.

Fig. 5 Second-order Sobol indices (left), true parameter dependencies (center), and estimated parameter dependencies (right) for linear Gaussian model. The second-order Sobol indices show negligible interactions between parameter pairs. The obtained estimates of parameter dependency agree well with their true values; the trend is preserved. $\Theta_1$ and $\Theta_3$ have the largest dependency on one another, and therefore are expected to have a combined effect on the output of the statistical model.

Fig. 6 Correlation plot for samples obtained from the true posterior distribution (left) and the obtained aggregate posterior prediction (right) for linear Gaussian model. A negative correlation is observed between $\Theta_1$ and $\Theta_3$, whereas $\Theta_2$ is uncorrelated with the other parameters. Aggregate posterior prediction agrees well with the data and exhibits high certainty.

Fig. 7 Change in parameter variance $\Delta(\sigma_{\Theta_i}^2)$ vs. measurement noise covariance $\sigma_\xi^2$ for linear Gaussian model. Increasing measurement noise results in a smaller change in parameter variance from the prior to the posterior. The largest reduction in variance is observed for $\Theta_2$, followed by $\Theta_1$ and $\Theta_3$.

Figure 7 illustrates the change in variance of the parameter $\Theta_i$, defined as $\Delta(\sigma_{\Theta_i}^2) \triangleq \sigma_{\Theta_i}^2 - \sigma_{\Theta_i,\mathrm{post}}^2$, versus $\sigma_\xi^2$. Parameter $\Theta_2$ exhibits the smallest posterior uncertainty, followed by $\Theta_1$ and $\Theta_3$ for all $\sigma_\xi^2$. While $\Theta_1$ has the largest estimated information gain (figure 3 (center)), it exhibits dependencies with $\Theta_3$ (figure 5 (right)), thereby resulting in larger posterior uncertainty in comparison to $\Theta_2$. In practical applications, where model selection or parameter selection is critical, examining the information gain and parameter dependencies can therefore aid in finding parameters that can be estimated with high certainty. Increasing the measurement noise results in a smaller change in parameter variance, that is, the parameters exhibit larger posterior uncertainty. This is also shown by the variation of estimated information gain with measurement noise (figure 4 (right)). On the contrary, the first-order Sobol indices remain unchanged with measurement noise (figure 4 (left)).

Fig. 11 Correlation plot for samples obtained from the posterior distribution (left) and the obtained aggregate posterior prediction (right) for the methane-air combustion model. Correlation plots do not reveal any relations among variables. Aggregate posterior prediction agrees well with the data and exhibits high certainty.

Fig. 12 Aggregate temperature evolution for methane-air combustion model. Aggregate prediction agrees well with the GRI-Mech 3.0 detailed mechanism.

Fig. 13 Aggregate species concentration evolution for methane-air combustion model. Aggregate prediction agrees well with the GRI-Mech 3.0 detailed mechanism.