Stochastic modelling of cellulose hydrolysis with Gauss and Weibull distributed transition probabilities

Two Markov-type stochastic models were developed to describe the kinetics of acid hydrolysis of cellulose. One of them involved a Gauss (normal) distribution of probabilities of chemical bond breaking, the other a Weibull distribution. It was considered that the random breaking of cellulose was based on the cleavage of a parent macromolecule into two descendants. Model equations and kinetics of acid hydrolysis of cellulose consisting of 10 and 100 units of cellobiose were presented. The effects of acid concentration and temperature on the kinetics of hydrolysis process were taken into account. The results obtained applying both stochastic models were in a reasonable agreement with those obtained using a deterministic kinetic model. These stochastic models can accurately describe the kinetics of acid hydrolysis and cover the drawbacks of some deterministic kinetic models, e.g., large number of model equations and parameters, modification of parameter values by changing the process conditions.


Stochastic modelling of cellulose hydrolysis with Gauss and Weibull distributed transition probabilities Joseph Mcgreg Duru, Oana Cristina Pârvulescu * , Tănase Dobre & Cristian Eugen Răducanu
Two Markov-type stochastic models were developed to describe the kinetics of acid hydrolysis of cellulose. One of them involved a Gauss (normal) distribution of probabilities of chemical bond breaking, the other a Weibull distribution. It was considered that the random breaking of cellulose was based on the cleavage of a parent macromolecule into two descendants. Model equations and kinetics of acid hydrolysis of cellulose consisting of 10 and 100 units of cellobiose were presented. The effects of acid concentration and temperature on the kinetics of hydrolysis process were taken into account. The results obtained applying both stochastic models were in a reasonable agreement with those obtained using a deterministic kinetic model. These stochastic models can accurately describe the kinetics of acid hydrolysis and cover the drawbacks of some deterministic kinetic models, e.g., large number of model equations and parameters, modification of parameter values by changing the process conditions.

Nomenclature c a
Acid concentration (%) c C Concentration of cellulose in the reaction medium (g C /L) c CB Concentration of cellobiose in the reaction medium (g CB /L) c C0 Initial concentration of cellulose in the reaction medium (g C /L) c E1 Concentration of enzymes E1 in the reaction medium (FPU/L) (FPU-filter paper units) c G Concentration of glucose in the reaction medium (g G /L) c HMF Concentration of hydroxymethylfurfural in the reaction medium (g HMF /L) Value of X α ji s Transition probability frequency of s polysaccharide species (s -1 ) β Shape parameter λ Scale parameter μ Mean of x values σ Standard deviation In the case of enzymatic hydrolysis, three basic processes, which are shown in scheme (3), where k 1 , k 2 , and k 3 are kinetic constants, whereas E1 and E2 are specific enzymes, can be assumed. Three types of enzymes are commonly considered in the biochemical hydrolysis of cellulose, i.e., endocellulases, exocellulases, and cellobiases (β-glucosidases). The mode of action of these enzymes is as follows: (i) endocellulases break cellulose chains into shorter polysaccharide chains; (ii) exocelullases attack from the ends of these shorter polysaccharide chains resulting in oligosaccharides, predominantly cellobiose; (iii) cellobiases cleave the cellobiose into glucose. In the presence of endo/exocellulases (E1) and β-glucosidases (E2), the processes presented in scheme (3) can be described by simple first-order kinetics expressed by Eqs. (4)−(6), where c C and c CB are concentrations of cellulose and cellobiose in the reaction medium, v R1 , v R2 , and v R3 are reaction rates. Reaction rate constants for the first and third process depend on the adsorption kinetics of enzymes on cellulose chains [16][17][18] . Assuming a rapid adsorption of enzymes E1, the process occurs near to equilibrium. Accordingly, considering a Langmuir adsorption isotherm, k 1 and k 3 are given by Eqs. (7) and (8), where c E1 is the concentration of enzymes E1 in the reaction medium, K L represents the Langmuir constant for sorption equilibrium, k max1 and k max3 are the maximum values of k 1 and k 3 at full saturation of the substrate with enzymes E1 for the first and third process in the scheme (3) 19 . To determine the reaction rate constant for the second process (transformation of cellobiose into glucose), the activity and concentration of enzymes E2 in liquid phase as well as the fact that there is a classic enzyme catalysis described by Michaelis-Menten equation must be taken into account 19 . Moreover, the effect of process temperature, reaction inhibition by product and substrate should be considered [18][19][20][21][22][23] . www.nature.com/scientificreports/ The decomposition of (bio)polymers leads to a large number of components involved in complex schemes of parallel and consecutive reactions. For example, in the thermal degradation of halogenated polymers, i.e., poly(chloroprene) and poly(vinyl chloride), kinetic mechanisms taking into account 38-40 species and pseudocomponents (molecules and radicals) involved in 190-250 chemical reactions were presented 24,25 . It is very difficult in this case to apply mathematical models based on formal kinetics. Models assuming random scission of linear polymer chains 26 and chain-end scission 27 can be more appropriate. Mechanisms based on systematic breakage of polymer chains explain quite well the hydrolysis process of cellulose 28,29 . Markov chain was used to describe the enzymatic and acid hydrolysis of cellulose 30 . Markov-type stochastic models are commonly applied in all engineering fields where elementary processes with random evolution appear [31][32][33] .
Stochastic models of cellulose hydrolysis, which are based on two basic phenomena, i.e., breaking or nonbreaking of polymer chain, could be effective and more realistic approaches. On the one hand, the concentration of cellulose-derived products and frequency distribution of molecular chain length during hydrolysis can be easily determined by a stochastic model, which can provide a better understanding of the mechanisms involved in the process. On the other hand, working with large transition probability matrices, imposed by the initial length of cellulose molecular chain or by the existence of several initial chain lengths, can be computationally quite complex.
Assuming that the highest probability of bond breaking is in the middle of the polymer chain, the transition probabilities can be expressed using a Gauss (normal distribution). A Weibull distribution can be applied if the breaking is more likely to occur at the ends of the polymer chain. For example, in the biochemical hydrolysis of cellulose, it can be assumed that endocellulases act in the middle of the cellulose chain, whereas exocelullases attack from the ends of polymer chains.
This paper aimed at studying the acid hydrolysis of cellulose. A Markov-type stochastic model was developed, assuming that the probabilities of chemical bond breaking followed either a Gauss (normal) distribution or a Weibull distribution.

Methods
The stochastic model focuses on partial hydrolysates obtained by acid hydrolysis of cellulose, which are numerically characterized by their molecular masses. The basic assumptions of the model are as follows: (i) cellulose subjected to acid hydrolysis contains a finite number of hydrolysable polysaccharide species (cellulose fragments with different polymerization degrees) with known molecular masses (M 0 s , s = 1, 2…S) and concentrations; (ii) random breaking of a polysaccharide into several macromolecules is based on the cleavage of a parent macromolecule into two descendants; (iii) breaking of the polysaccharide chain into different fragments can occur with different probabilities; (iv) the continuous distribution of molecular masses of hydrolysates can be divided into discrete intervals with a molecular mass corresponding to each interval; (v) a hydrolysate within a molecular mass range (class) is divided into two macromolecules within lower molecular mass ranges; (vi) the process of molecular fragmentation by hydrolysis is a homogeneous Markov process; (vii) hydrolysates cannot participate in any coupling reactions.
Stochastic model principle of hydrolysis of a polysaccharide species (s = 1, 2…S) is shown in Fig. 1. The breaking of a macromolecule within M 0 s molecular mass class into one within M 1 s molecular mass class and another within M N s molecular mass class is represented with continuous line. The process evolution, i.e., the macromolecule within M 1 s molecular mass class splits into one within M i s molecular mass class and another within M N−1 s molecular mass class, appears with dotted line. Transition probability is denoted by P ab s , where the subscripts a and b refer to the molecular mass ranges of hydrolysates after and before breaking, respectively.
The mass balance for s species parent macromolecules within the molecular mass class M k s which break at time τ into descendants within the molecular mass classes M l s and M m s is given by Eq. (9), where m k s represents the mass of parent macromolecules, whereas m l s and m m s are the masses of descendants. Moreover, if a parent macromolecule breaks into two descendants, the probability of its breaking by hydrolysis (p k s ) is equal to the probability of birth of each descendant (P lk s and P mk s ), as shown by Eq. (10).  (11) and (12). Assuming that the hydrolysis of s species is a Markov homogeneous stochastic process, the transition probability from the molecular mass class M i s to M j s , P ji s (i, j = 0, 1…N, j ≠ i), is defined by Eq. (13), where α ji s is the transition probability frequency and Δτ the time interval. The transition probability in mass units, P ji s,m , is given by Eq. (14)
According to Markov stochastic cellular models, a transition probability matrix, P 1 , containing values of transition probability P ji 1 , is defined by Eq. (19) 30 . P 1 can be determined using different probability density functions of random variable. The breaking of a cellulose species within the molecular mass class   www.nature.com/scientificreports/ Assuming the highest breaking probability in the centre of polymeric chain, a Gauss (normal) distribution can be used to compute the probabilities in the transition probability matrix. A random variable X that takes values x is normally (Gauss) distributed when the probability density function is expressed by Eq. (20), where μ is the mean of x values, σ 2 the variance, and σ the standard deviation.    www.nature.com/scientificreports/ Normal distributions have many useful properties, so random variables with unknown distributions are often considered to be normal, especially in chemistry, physics, biology, etc. This assumption is based on the central limit theorem (CLT), which states that if sufficiently large random samples are taken from a population having any distribution of a variable and finite values of mean and variance, then the distribution of sample mean will approach a Gauss (normal) distribution. There is an obvious tendency in sciences and social live to assume normal distributions in applications where they may not be suitable. As Lippmann concluded, "Everybody believes in the exponential law of errors: the experimenters, because they think it can be proved by mathematics; and the mathematicians, because they believe it has been established by observation" 35 .
In the case of cellulose hydrolysis, the values of shape parameter are interpreted as follows: (i) 0 < β < 1 indicates that the chain breaking rate decreases over time, meaning that the process hydrolysates become more stable, (ii) β = 1 implies a constant breaking rate over time, and (iii) β > 1 corresponds to an increase in the breaking rate. Characteristic parameters of Weibull distribution were expressed by Eqs. (26) and (27) Fig. 4.   5) and Weibull (Fig. 6) distributions of transition probabilities were considered. The results presented in Figs. 5 and 6 highlight a relatively quick disappearance of macromolecules with high molecular mass, suggesting that the hydrolysis of species with low molecular mass is the rate-limiting step. This finding is in line with the conclusions of other researches, where the cellulose hydrolysis was assumed as a homogeneous kinetic process, occurred according to scheme (29) and described by Eqs. (30)−(32), where c C , c G , and c HMF are the concentrations of cellulose (C), glucose (G), and hydroxymethylfurfural (HMF), k 1 and k 2 represent the rate constants, and τ is the time [40][41][42][43] . www.nature.com/scientificreports/ Acid hydrolysis dynamics are heavily affected by process temperature, acid type and concentration. Arrhenius equations [Eqs. (33) and (34)], where the contribution of acid concentration (divided by a reference value, c a /c a,ref ) was taken into account, were used to express characteristic reaction rate constants of homogeneous kinetic model (k 1 and k 2 ) depending on absolute temperature (T).
According to Eqs. (33) and (34), transition probability frequencies in the stochastic model, α ji 1 , can be estimated by Eq. (35), where E Am represents a mean reaction activation energy (taking into account all decompositions to glucose), E AG is the activation energy for transition from glucose (G) to HMF, and T ref is a reference absolute temperature.  Table S1 5 . The largest values of root mean square error (RMSE) and coefficient of variation (CV) for results presented in Figs. 7b and 8b are summarized in Table S2. Data predicted by deterministic and stochastic models for a cellulose with PD = 200 assuming Gauss and Weibull distributed transition probabilities were in a reasonable agreement, i.e., RMSE ≤ 6.50 g i /L (CV ≤ 0.098) and RMSE ≤ 5.26 g i /L (CV ≤ 0.410).   www.nature.com/scientificreports/

Conclusions
The production of cellulosic biofuel, which is technologically feasible, expects the transition to industrial scale. Accordingly, the hydrolysis of cellulose and (ligno)cellulosic biomass has been of great interest lately. Numerous experimental and theoretical studies on acid hydrolysis have been reported in the related literature. Deterministic kinetic models are commonly applied to describe the acid hydrolysis. Two stochastic models, one involving a Gauss distribution of transition probabilities, the other a Weibull distribution, were developed in this paper to predict the kinetics of acid hydrolysis of cellulose. It was assumed that the cellulose subjected to the hydrolysis contained a finite number of hydrolysable polysaccharide species (cellulose fragments with different PD), random breaking of a polysaccharide was based on the cleavage of a parent macromolecule into two descendants, and the molecular fragmentation was a homogeneous Markov process. Model equations, transition probability matrix and dynamics of mass fraction of hydrolysable species for a cellulose consisting of 10 units of CB (PD = 20) were presented in Methods and the first part of Results and discussions. The models were then extended, considering the effects of acid concentration and process temperature on the kinetics of acid hydrolysis of a cellulose consisting of 100 units of CB (PD = 200). Dynamics of mass concentration of hydrolysable species predicted by stochastic models and those determined using a homogeneous deterministic kinetic model were compared and a reasonable agreement was obtained. Both stochastic models can accurately predict the kinetics of acid hydrolysis and cover the limitations of some deterministic kinetic models, e.g., large number of equations and parameters, modification of parameter values by changing the process conditions.