Beyond the Michaelis-Menten equation: Accurate and efficient estimation of enzyme kinetic parameters

Examining enzyme kinetics is critical for understanding cellular systems and for using enzymes in industry. The Michaelis-Menten equation has been widely used for over a century to estimate the enzyme kinetic parameters from reaction progress curves of substrates, which is known as the progress curve assay. However, this canonical approach works in limited conditions, such as when there is a large excess of substrate over enzyme. Even when this condition is satisfied, the identifiability of parameters is not always guaranteed, and often not verifiable in practice. To overcome such limitations of the canonical approach for the progress curve assay, here we propose a Bayesian approach based on an equation derived with the total quasi-steady-state approximation. In contrast to the canonical approach, estimates obtained with this proposed approach exhibit little bias for any combination of enzyme and substrate concentrations. Importantly, unlike the canonical approach, an optimal experiment to identify parameters with certainty can be easily designed without any prior information. Indeed, with this proposed design, the kinetic parameters of diverse enzymes with disparate catalytic efficiencies, such as chymotrypsin, fumarase, and urease, can be accurately and precisely estimated from a minimal amount of timecourse data. A publicly accessible computational package performing such accurate and efficient Bayesian inference for enzyme kinetics is provided.

. Note that both assays require prior knowledge of K M , which gives rise to the conundrum that, in order to estimate K M , the approximate value of K M needs to be known.
To overcome such limits on the inference using the model based on the MM equation, which is referred to as the sQ (standard QSSA) model, here we propose an alternative approach. In our approach, we use a different approximate model that is derived with the total QSSA and is referred to as the tQ (total QSSA) model [26][27][28][29] . By applying the Bayesian inference based on either the sQ or the tQ model to the product progress curve, we found that the estimates obtained with the sQ model were considerably biased when the enzyme concentration was not low. On the other hand, the estimates obtained with the tQ model were not biased for any combination of enzyme and substrate concentrations. Thus, with the tQ model, the experimental data from various conditions can be pooled without any restrictions to improve the accuracy and precision of the estimation. For instance, when two sets of timecourse data obtained under low and high enzyme concentrations are used together, the tQ model, but not the sQ model, leads to accurate and precise estimation. Another advantage of our approach is that, by analyzing the scatter plots of current estimates, the next optimal experiment to ensure the parameter identifiability can be easily designed without requiring any prior knowledge of the k cat and K M values. The proposed optimized design yields accurate and precise estimation from a minimal amount of data simulated based on the kinetics of various enzymes: chymotrypsin, fumarase and urease, which have disparate catalytic efficiencies (k cat /K M ). We provide a publicly accessible computational package that performs the Bayesian inference based on the tQ model, thus leading to accurate and efficient estimation of enzyme kinetics.

Results
Two types of models describing enzyme kinetics: The sQ and tQ models. A fundamental enzyme reaction consists of a single enzyme and a single substrate, where the free enzyme (E) reversibly binds with the substrate (S) to form the complex (C), and the complex irreversibly dissociates into the product (P) and the free enzyme: where the total enzyme concentration (E T ≡ C + E) and the total substrate and product concentration (S T ≡ S + C + P) are conserved. A popular model describing the accumulation of the product over time is based on the MM equation, as follows (see Supplementary Method for detailed derivation): where K M = (k b + k cat )/k f is the Michaelis-Menten constant and k cat is the catalytic constant. This sQ model derived with the standard QSSA has been widely used to estimate the kinetic parameters, K M and k cat from the progress curve of the product [8][9][10][11]23,25 . Another model describing the accumulation of the product is derived with the total QSSA; it was developed later than the sQ model and thus has received less attention for parameter estimation 26-29 : = . T   T  M  T   T  M  T  TT  2 where K = k b /k f is the dissociation constant [27][28][29] . Importantly, this condition is generally valid and thus the tQ model, unlike the sQ model, is accurate even when the enzyme is in excess. See 14,30 for more details.
Next, we investigated the accuracy of the stochastic simulations performed with both models. Specifically, we compared the stochastic simulations using the Gillespie algorithm based on the propensity functions from either the original full model (described in Table S1), the sQ model (Table S2), or the tQ model (Table S3) for 9 different  conditions [31][32][33][34][35][36] : E T is either lower than, similar to, or higher than K M , and S T is also either lower than, similar to, or higher than K M (Fig. 1). The stochastic simulations of the sQ model fail to approximate those of the original full model when E T is not low (i.e., E T is lower than neither S T nor K M ). On the other hand, stochastic simulations using the tQ model are accurate for all conditions (Fig. 1), as is consistent with a recent study showing that stochastic simulations with the sQ and the tQ models are accurate when their deterministic validity conditions hold (Eqs (3) and (4)) 37,38 . Taken together, the tQ model is valid for a wider range of conditions than the sQ model is in both the deterministic and the stochastic sense.
Estimation with the tQ model is unbiased for any combination of enzyme and substrate concentrations. Because the tQ model is accurate for a wider range of conditions than the sQ model is (Fig. 1), we hypothesized that the parameter estimation based on the tQ model is also accurate for more general conditions. To investigate this hypothesis, we first generated 10 2 noisy progress curves of P from the stochastic simulations of the original full model (Fig. S1). Then, we inferred parameters (k cat and K M ) from these simulated data sets by applying the Bayesian inference with the likelihood functions based on either the sQ or the tQ model, under weakly informative gamma priors (Fig. S2) (see Methods for details). Note that throughout this study, we have used the simulated product progress curves (e.g. Fig. S1) because we need to know the true values of parameters for the accurate comparison of the estimations based on the sQ model and the tQ model.
We first focused on the estimation of the k cat under the assumption that the value of K M is known. When E T is low, so that both the sQ and the tQ models are accurate ( Fig. 1 left), posterior samples obtained with both models are similar and successfully capture the true value of k cat (Fig. 2a left). The posterior samples obtained with the two models are similar because, when E T is low and thus  + E S K T T M , both models (Eqs 1 and 2) are approximately equivalent as follows:  (Table S1), the sQ model (Table S2), and the tQ model (Table S3) were performed with S T = 0.2, 2, or 80 nM, and E T = 0.2, 2, or 40 nM. Note that these concentrations are either lower than, similar to, or higher than K M ≈ 2 nM. Here, the lines and colored ranges represent a mean trajectory and fluctuation range (±2σ from the mean) of 10 4 stochastic simulations.
where the first approximation comes from the Taylor expansion in terms of (see [27][28][29] for details). Therefore, when E S K and thus the sQ model is accurate, estimations with the sQ and the tQ models should be similar. On the other hand, when E T is high, they show clear differences ( Similar results are also observed in the box plots of posterior means and posterior coefficient of variations (CVs) (Fig. S3a,b). Whereas posterior means obtained with the sQ model are biased when E T is high, those obtained with the tQ model are accurate for all conditions (Fig. S3a). In particular, narrow distributions of posterior means indicate that the estimation of k cat with the tQ model is robust aginst the noise in the data (Fig. S1). Furthermore, posterior CVs are much smaller than prior CVs (Fig. S3b), indicating precise estimation of k cat with the tQ model.
Next, K M was estimated under the assumption that the value of k cat is known (Fig. 2b). Posterior samples of the K M obtained with the sQ model again show errors that grow with increasing E T . Note that the estimates of the K M are biased upward, which implies that using the posterior estimates of K M to validate the MM equation can be misleading. On the other hand, the estimates of K M obtained with the tQ model are little biased for all conditions. However, unlike the narrow posterior distributions of k cat (Fig. 2a), those of K M obtained with the tQ model become wider; so precision decreases as E T or S T increases (Fig. 2b). These patterns are also observed in the box plots of posterior means and posterior CVs (Fig. S3c,d). The identifiability problem arises because, , the K M is negligible in the tQ model (Eq. 2), as follows: ≈ . T  T  M  T  T T  T  T  T  T  T T  2  2 Specifically, when K M is too low, the value of K M has little effect on the dynamics of the tQ model and thus the K M is structurally unidentifiable. Taken together, the estimations of K M with both the sQ and the tQ models are not satisfactory, although for different reasons: estimations with the sQ model can be biased and those with the tQ model can be structurally unidentifiable (Fig. 2b). Similar patterns were also observed when a more informative prior was given (Fig. S4). In particular, even with the informative prior, estimates obtained with the sQ model still show considerable error as E T increases.
Simultaneous estimation of k cat and K M suffers from the lack of identifiability. Next, we considered simultaneous estimation of two parameters, k cat and K M , which is the typical goal of enzyme kinetics. For the same gamma priors used in the single-parameter estimation (Fig. 2), the distributions of posterior samples obtained with both models became wider overall (Fig. 3). To find the reason for such imprecise estimation, we analysed the scatter plots of posterior k cat and K M samples (Fig. 4). When S K T M  (Fig. 4a-c), the posterior samples of k cat and K M obtained with the sQ model exhibited a strong correlation, because the dynamics of the sQ model depend only on the ratio k cat /K M , as seen in the following approximation:  (Fig. 4g-i), the scatter plot of the sQ model becomes horizontal, indicating the structure unidentifiability of the K M . Indeed, the value of K M has nearly no effect on the dynamics of the sQ model, as seen in the following approximation: is consistent with previous studies, which recommend using S T ≈ K M for more precise estimation 22,23 . However, even   (Fig. 3). The scatter plots imply two types of structure unidentifiability: strong correlation between k cat and K M , and unidentifiability of K M , which is represented as a horizontal plot. Positively correlated scatter plots of the tQ model are changed to horizontal ones when the sampled K M is much lower than S T + E T (dashed gray lines). Here, green triangles represent the true values of parameters. when S T ≈ K M , estimates are still imprecise ( Fig. 3a and b middle). Furthermore, as E T increases, the estimates obtained with the sQ model are biased (Fig. 3) like in the single-parameter estimation (Fig. 2). Based on this analysis it appears that the simultaneous estimation of k cat and K M with the sQ model is challenging because of both identifiability and bias problems.
, the K M has a negligible effect on the dynamics of the tQ model (Eq. 6), and thus only k cat was identifiable in the single-parameter estimation ( Fig. 2a and b right or bottom). Similarly, when both k cat and K M are inferred simultaneously with the tQ model, estimation of only k cat is accurate and precise ( Fig. 3a and b right or bottom), as is shown by the horizontal scatter plots along the true value of k cat (Fig. 4c,f,g-i). In other cases (when neither , posterior variance of both parameters dramatically increases compared to the single-parameter estimation (Figs 2 and 3 left and top). Such imprecise estimation stems from two sources, according to the scatter plots (Fig. 4a,b,d,e). When k cat and K M decrease together, the behavior of the tQ model changes little as the SQ model (Eq. 5), which leads to the strong correlation between posterior samples of k cat and K M . As the estimates of K M keep decreasing together with those of k cat , so that they become much less than E T + S T (dashed vertical line of Fig. 4), the tQ model no longer depends on the value of K M , as shown in Eq. 6, and thus the scatter plots become horizontal.
Combined data from different experiments allow accurate and precise estimation with the tQ model. As shown above, the estimation of both k cat and K M using a single progress curve suffers from considerable bias and lack of identifiability (Figs 3 and 4), which is consistent with previous studies reporting that a progress curve obtained from a single experiment is not enough to identify both parameters simultaneously 19 . Thus, here, we investigate whether using multiple timecourse data sets obtained under different experimental conditions can improve the estimation.
In typical in vitro assays, progress curves are measured with either a fixed S T and varied E T or a fixed E T and varied S T 8-11,39 . We first consider the case when progress curves are measured with a fixed S T and a varied E T . Specifically, progress curves from both low and high E T are used to estimate parameters for a fixed S T at different levels ( Fig. S1 top and bottom). In this case, posterior samples obtained with the sQ model show considerable errors as the data from high E T is used (Figs 5a and S5). On the other hand, the posterior samples obtained with the tQ model accurately capture the true values of both k cat and K M with low variance (Figs 5a and S5). Such improvement stems from the fact that data obtained under the low and high E T provide different types of information for parameter estimation. Specifically, from the high E T data, although the K M is not identifiable, the k cat can be accurately estimated with the tQ model (Fig. 4c,f,i). Such accurate estimation of k cat from the high E T data can prevent the correlation between the k cat and the K M when they are estimated from the low E T data (Fig. 4a,d). Indeed, the narrow scatter plots of the tQ model ( Fig. 5b left and middle) are the intersection of two scatter plots, a horizontal one obtained with the high E T data (Fig. 4c,f) and a nonhorizontal one obtained with the low E T data (Fig. 4a,d). However, when S T is high, the scatter plot from the low E T also becomes horizontal (Fig. 4c), and thus the synergistic effect of using combined data decreases (Fig. 5a,b right). Taken together, the tQ model can accurately estimate both parameters from the combination of low E T and high E T data when S T is not much larger than K M . Note that such low S T is preferred for in vitro experiments 24,39-41 and is the case for most physiological conditions 24 .
Next, we consider the case when progress curves are measured with a fixed E T and a varied S T . Specifically, the combination of two progress curves from low and high S T is used to infer parameters for a fixed E T at different levels (Fig. S1 left and right). When E T is low, and thus the sQ and the tQ models behave similarly (Eq. 5), posterior samples obtained with both models accurately capture the true values of k cat and K M (Figs 6a left and  S6). Again, the narrow scatter plot (Fig. 6b left) is obtained as the intersection of a nonhorizontal scatter plot of low S T (Fig. 4a) and a horizontal scatter plot of high S T (Fig. 4g). However, as E T increases, and thus the sQ model becomes less accurate, those obtained with the sQ model are biased, as expected (Figs 6a right and S6). Whereas such biases are not observed in those obtained with the tQ model, the precision of K M estimates decreases as E T increases, as in the single-parameter estimation (Fig. 2 and Eq. 6).
Optimal design of experiments for accurate and efficient estimation with the tQ model. When a progress curve obtained from a single experiment is used, the posterior scatter plots of the tQ model can be categorized as a correlated type (Fig. 4a,b,d,e) and a horizontal type (Fig. 4c,f,g-i). The intersections of these two different types of scatter plots tend to be narrowly distributed near the true value (Figs 5b and 6b). Thus, combining two such data sets allows accurate estimation of both k cat and K M (Figs 5a and 6a) (Fig. 4a,b,d,e) and one measured under (Fig. 4c,f,g-i) provide different types of information for parameter estimation; so using both data sets leads to  (Fig. S1 left) and S T = 80 nM (Fig. S1 right) together for either E T = 0.2, 2, or 40 nM. When E T is low, both the sQ and the tQ models allow accurate and precise estimation. As E T increases, the estimates obtained with the sQ model become inaccurate, and the estimates of K M obtained with the tQ model become less precise, similar to the single-parameter estimation (Fig. 2). Here, green triangles represent the true values of k cat or K M . (b) The scatter plots of the posterior samples. Here green triangles, blue circles, and red squares represent true values, posterior means of the sQ model, and those of the tQ model, respectively.
ScIENTIfIc REPORTS | 7: 17018 | DOI:10.1038/s41598-017-17072-z successful estimation. However, it is hard to compare the values of S T , E T , and K M in practice, because the value of K M is usually unknown a priori. This problem can be easily resolved by using the scatter plot. That is, if the posterior scatter plot obtained from the first experiment is horizontal, then both E T and S T should be decreased for the next experiment, so that the nonhorizontal scatter plot can be obtained (Fig. 7a). On the other hand, if the scatter plot from the first experiment shows a strong correlation between K M and k cat , then either S T or E T should be increased in the next experiment (Fig. 7b). Basically, without any prior information of the value of K M and k cat , the shape of the scatter plots of the current estimates determines the next optimal experimental design, which ensures accurate and precise estimation. However, this approach cannot be used with the sQ model, because estimation with the sQ model can be biased, depending on the relationship between E T or S T and K M , which is unknown a priori. That is, unlike the tQ model, precise estimation does not always guarantee accurate estimation with the sQ model, as seen above (e.g. Fig. 5a right).
We test whether the proposed approach with the tQ model can accurately estimate k cat and K M for catalysis of the N-acetylglycine ethyl ester, fumarate, and urea by the enzymes the chymotrypsin, urease, and fumarase, respectively (Fig. 7c). These three enzymes were chosen because they have disparate catalytic efficiencies (k cat /K M ) 1 : 0.12, 4 · 10 5 , and 1.6 · 10 8 s −1 M −1 , respectively. For each enzyme, 10 2 noisy timecourse data sets were generated using stochastic simulations based on known enzyme kinetic parameters 1 . When progress curves obtained with low E T and low S T are used, as expected, nonhorizontal scatter plots of posterior samples were obtained for all three enzymes (Fig. 7c). This indicates that either E T or S T should be increased in the next experiment to obtain a horizontal scatter plot. Indeed, when a progress curve with a 100-fold increase of E T was used, horizontal scatter plots were obtained for all enzymes (Fig. 7c). Therefore, when these two progress curves are used together, both k cat and K M can be accurately estimated (Fig. 7c red dots). These results support that such two-step optimized experimental design (Fig. 7a,b) to get two different types of scatter plots allows accurate and efficient estimation of enzyme kinetics with the tQ model. The computational package performing such estimation is provided (see Method for the details).

Discussion
The standard approach for estimating enzyme kinetic parameters even today continues to be based on the 100-year old MM equation (Eq. 1) 5,6 . However, when enzyme concentration is high, this approach can lead to biased estimation (Fig. 2). Even when enzyme concentration is relatively low, it may not be possible to identify kinetic parameters (Figs 3 and 4). To overcome the limitations of the canonical approach, we proposed an Figure 7. The optimal experimental design for accurate and precise estimation with the tQ model. (a) When the scatter plot of posterior samples from the first experiment is horizontal, E T and S T need to be decreased to obtain the nonhorizontal scatter plot in the next experiment. Then, using the combination of the two experiments leads to accurate and precise estimation (red scatter plots). (b) When the scatter plot from the first experiment is nonhorizontal, E T or S T need to be increased in the next experiment to obtain a horizontal scatter plot. (c) Inference with a single progress curve from the low E T (0.1 K M ) and the high E T (10 K M ) leads to nonhorizontal and horizontal scatter plots, respectively, for chymotrypsin, urease, and fumarase (gray scatter plots). When both data sets were used together, accurate estimates were obtained for all enzymes (red scatter plots). Here, low S T (0.1 K M ) is used. Here, green triangles represent the true values of the parameters.
ScIENTIfIc REPORTS | 7: 17018 | DOI:10.1038/s41598-017-17072-z estimation method based on an alternative to the MM equation: the tQ model (Eq. 2), which is derived with the total QSSA [26][27][28][29] . Because the estimation procedure with the tQ model is not biased regardless of enzyme or substrate concentrations (Fig. 2), more accurate and precise estimations can be made when pooled data from different experimental conditions are used, unlike the canonical approach (Figs 5 and 6). It appears thus that the tQ model is especially appropriate for creating a consistent Bayesian inferential framework, which becomes more accurate as more data is used.
The canonical enzyme kinetic assay based on the MM equation generally requires a large excess of substrate over enzyme 42 . However, such conditions impose experimental limitations and cannot be always guaranteed and verified 15 . For instance, it is hard to generate a high concentration of barely soluble substrate 24 , and a low concentration of substrate is required for sensitive kinetic analysis, e.g., in the case of QD-FRET-based probes [39][40][41] . Importantly, to analyze in vivo enzyme kinetics, where enzyme concentration is often high [16][17][18] , our approach, but not the canonical approach, can be used. For example, one needs to estimate the kinetic parameters underlying drug metabolism by CYP enzymes in the liver in order to predict the effects of drugs, as is essential for drug development 43 . Because of dosing requirements for potent drugs, the amount of CYP enzyme can greatly exceed the drug amount in the liver 44,45 . Another large area where our estimation method can be applied is in the development of nanobiosensors, which measure in vivo activity of a specific enzyme for precise diagnostics, because such enzymes are often in large excess over biosensors 46,47 .
KinTek Explorer has been widely used to estimate enzyme kinetic parameters from the progress curves [48][49][50] . This software provides the confidence contours, which reveal the relationships between the estimated parameters. This approach recommends using multiple data sets to narrow down the confidence contours and thus improve precision of estimates and resolve the unidentifiability issue. Our finding (Fig. 7a,b) can provide the specific type of data sets required for the identifiability of k cat and K M , so that the KinTek Explore could perform parameter estimation more efficiently for the Michales-Menten type of enzyme reactions.
Since the initial velocity estimation with the MM equation is not accurate when enzyme concentration is high (Fig. 1), the standard initial velocity based on the MM equation would also be inaccurate 15 . On the other hand, the tQ model accurately captures the initial velocity for all conditions, and thus the modified initial velocity assay based on the tQ model is likely to be accurate over a wider range of conditions. To simplify such estimation procedures, an interesting future study could derive an analogous Lineweaver-Burk plot or the Hanes-Woolf plot 8,9,15,42 for the tQ model.
Even with relatively large noise in the data (Fig. S1), our proposed method leads to accurate estimation (Figs 5, 6 and 7), indicating its robustness against experimental noise and some minor inaccuracy of the tQ model in certain ranges of parameter observed in 14,30 . Furthermore, if there are departures from simple non-inhibitory enzyme kinetics (e.g. inhibition of enzyme by product) 51,52 , our method can be easily adjusted by modifying the tQ model (see 53,54 for the tQ model for other enzyme kinetics). Our work can also be used to improve the estimation of the kinetics underlying diverse biological functions, such as gene regulation 55,56 , cellular rhythms 57-59 , quorum sensing 60,61 , signal cascade 62,63 and membrane transport 64,65 , where the MM equation has been widely used.

Methods
Simulated Data. To obtain timecourse data (Fig. S1) for Bayesian inference, stochastic simulations of the original full model (Table S1) were performed with the Gillespie algorithm 66 . E(0) = E T , S(0) = S T , C(0) = 0, and P(0) = 0 are used as initial conditions following the typical in vitro enzyme kinetics protocol.
Description of the Bayesian inference approach. The Bayesian inference approach is used to estimate the catalytic constant k cat and the Michaelis-Menten constant K M , based on the hazard function, with respective rates described in Eq. 1 for the sQ model and in Eq. 2 for the tQ model (Fig. S2). The likelihood functions are constructed based on an approximation to the underlying Markov model 66   Weakly informative gamma priors are used for both k cat and K M : their prior means are the same as their true values, and their prior variance is 10 times larger than the prior mean, which covers orders of magnitude (e.g. Fig. 2). The estimation of a single parameter, i.e., either k cat or K M is done conditionally on the other parameter. For estimating the two parameters simultaneously, the Gibbs sampler method is used. In order to draw the sample for K M , we also use the Metropolis-Hastings algorithm within the Gibbs sampler step. See Supplementary material for further details.