Bayesian framework for analyzing adsorption processes observed via time-resolved X-ray diffraction

Clarifying dynamic processes of materials is an important research topic in materials science. Time-resolved X-ray diffraction is a powerful technique for probing dynamic processes. To understand the dynamics, it is essential to analyze time-series data using appropriate time-evolution models and accurate start times of dynamic processes. However, conventional analyses based on non-linear least-squares fitting have difficulty both evaluating time-evolution models and estimating start times. Here, we establish a Bayesian framework including time-evolution models. We investigate an adsorption process, which is a representative dynamic process, and extract information about the time-evolution model and adsorption start time. The information enables us to estimate adsorption properties such as rate constants more accurately, thus achieving more precise understanding of dynamic adsorption processes. Our framework is highly versatile, can be applied to other dynamic processes such as chemical reactions, and is expected to be utilized in various areas of materials science.

Bayesian framework.
First, we consider a generative model including a peak model and time-evolution models. The peak model determines the shape and number of diffraction peaks, while the time-evolution model expresses the time evolution of peak parameters such as peak areas and positions. By incorporating a time-evolution model, adsorption properties such as the rate constant (κ), growth dimension (n), and adsorption start time (t_0) can replace peak parameters. This approach enables a more accurate and reliable analysis of dynamic adsorption processes, as it captures the continuous changes in crystal structures over time. We formulate the conditional probability of generated data p(D_gen | κ, n, t_0, σ_noise, ...) considering Gaussian noise with variance σ_noise^2. Then, applying Bayes' theorem to the conditional probability, we estimate the probability distribution of parameters conditioned on the observed data, i.e., the posterior probability distribution p(κ, n, t_0, σ_noise, ... | D).
A comparison between the Bayesian framework and the conventional framework is shown in Fig. 1. The conventional framework is based on the stepwise application of non-linear least-squares fitting, i.e., (Step 1) fitting the diffraction peaks at each time point and (Step 2) fitting the variation in the peak area over time with the Kolmogorov-Johnson-Mehl-Avrami (KJMA) equation [23,24]. The conventional framework corresponds to point estimation without considering the accuracy of estimation. In contrast, the Bayesian framework enables the estimation of the posterior probability distribution, thereby providing more information based on the shape and width of the distribution. In addition, the Bayesian framework estimates t_0 directly from the observed data, while the conventional framework has difficulty extracting information about t_0 from the data and requires the gas-shot time (t_s) as a substitute. The Bayesian framework also allows the most plausible model to be selected from possible time-evolution models using Bayes free energy, which is one of the information criteria used to evaluate the consistency between data and models in Bayesian statistics [10,11].

Bayesian model selection.
In the Bayesian analysis, estimations based on different analysis models were performed to select the most plausible model. We adopted a peak model with two Gaussian functions corresponding to the adsorption and desorption phases, and designed two different time-evolution models, i.e., Model 1 (M = 1) and Model 2 (M = 2). For both models, the time evolution of the peak area was assumed to follow the KJMA equation, as in previous studies [6-8]; that is, we assume that the adsorption dynamics observed by Tr-XRD can be explained by the KJMA equation. Although Tr-XRD measurements observe the time variation of the average structure, we attempted to estimate t_0 from the Tr-XRD data. The time evolution of the peak center angle was assumed to be linear, given that there is little angular variation in each phase. On the other hand, the time evolution of the peak full width at half maximum (FWHM) was assumed to be constant in Model 1 and exponential in Model 2, representing the absence and presence, respectively, of peak broadening in the adsorption dynamics. To perform the analysis without bias, the prior probabilities of the models were set as p(M = 1) = p(M = 2) = 0.5, and the prior probability distribution of each parameter was set to be uniform. To obtain the posterior probability distribution, we used the exchange Monte Carlo method. In this method, Monte Carlo sampling is performed in replicas that each have their own virtual temperature, and samples are exchanged between neighboring replicas, which enables escape from local optima and efficient convergence to the global optimal solution. By utilizing the sampling results in all replicas, it is also possible to calculate Bayes free energy. The total number of Monte Carlo sweeps was 10,000, including 5,000 burn-in sweeps.
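The exchange Monte Carlo procedure described above can be sketched as follows. This is a minimal illustration and not the authors' implementation: the one-dimensional double-well energy, the inverse-temperature ladder `betas`, and the step size are all hypothetical choices, and in the actual analysis the energy would presumably be derived from the likelihood of the Tr-XRD data. Only the sweep counts (10,000 sweeps, 5,000 burn-in) are taken from the text.

```python
import numpy as np

def double_well(x):
    """Toy energy with minima at x = -1 and x = +1 (hypothetical stand-in)."""
    return 8.0 * (x**2 - 1.0) ** 2

def exchange_monte_carlo(energy, betas, n_sweeps, step, rng):
    """Sample exp(-beta * energy(x)) in every replica, with neighbor exchanges."""
    n_rep = len(betas)
    x = np.zeros(n_rep)                      # one state per replica
    e = np.array([energy(xi) for xi in x])   # current energy of each replica
    chain = np.empty((n_sweeps, n_rep))
    for sweep in range(n_sweeps):
        # Metropolis update inside each replica at its own inverse temperature
        for r in range(n_rep):
            proposal = x[r] + step * rng.normal()
            e_prop = energy(proposal)
            log_acc = -betas[r] * (e_prop - e[r])
            if rng.random() < np.exp(min(0.0, log_acc)):
                x[r], e[r] = proposal, e_prop
        # Exchange attempts between neighboring replicas
        for r in range(n_rep - 1):
            log_acc = (betas[r + 1] - betas[r]) * (e[r + 1] - e[r])
            if rng.random() < np.exp(min(0.0, log_acc)):
                x[r], x[r + 1] = x[r + 1], x[r]
                e[r], e[r + 1] = e[r + 1], e[r]
        chain[sweep] = x
    return chain

rng = np.random.default_rng(0)
betas = np.geomspace(0.01, 1.0, 8)           # last replica (beta = 1) is the target
chain = exchange_monte_carlo(double_well, betas, n_sweeps=10_000, step=1.0, rng=rng)
posterior = chain[5_000:, -1]                # drop burn-in, keep the beta = 1 replica
```

The high-temperature replicas (small `beta`) move freely across the barrier, and the exchange step passes those states down the ladder so that the `beta = 1` replica can visit both minima.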
Figure 2a-c shows the color maps of the observed Tr-XRD patterns and the estimated results with Model 1 and Model 2. Both models appear to be in good qualitative agreement with the observed data. To calculate the model selection probabilities p(M|D), which quantify the realization probability of each model, we compared the Bayes free energies of Model 1 and Model 2. The Bayes free energy of Model 2 is lower than that of Model 1 by around 300, meaning that p(M = 2|D) is almost equal to 1, as shown in Fig. 2d. The Bayesian model selection thus suggests that Model 2 is much better than Model 1, which indicates that peak broadening occurs during the adsorption process.
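To see why a free-energy difference of about 300 makes p(M = 2|D) practically equal to 1, the following sketch converts Bayes free energies into model selection probabilities. It assumes the usual definition F_M = -ln p(D|M), so that p(M|D) is proportional to p(M) exp(-F_M); the function name and the absolute free-energy values are placeholders, and only the difference of 300 and the equal priors come from the text.

```python
import numpy as np

def model_posterior(free_energies, priors):
    """Return p(M|D) from Bayes free energies F_M = -ln p(D|M) and priors p(M)."""
    log_w = np.log(np.asarray(priors, dtype=float)) - np.asarray(free_energies, dtype=float)
    log_w -= log_w.max()          # subtract the maximum for numerical stability
    w = np.exp(log_w)
    return w / w.sum()

# Placeholder values: only the difference F_1 - F_2 = 300 matters here.
probs = model_posterior(free_energies=[300.0, 0.0], priors=[0.5, 0.5])
p_model1, p_model2 = probs
```

With equal priors the ratio p(M = 1|D) / p(M = 2|D) equals exp(-300), which is far below double-precision resolution relative to 1.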

Estimation of the adsorption properties.
Based on the selected model (Model 2), we estimated the adsorption properties using the posterior probability distribution. Since p(θ_{M=2}, σ | D, M = 2) is a multivariate probability distribution and difficult to visualize, we consider the marginalized posterior probability distribution

$$p(\theta_k \mid D, M = 2) = \int p(\theta_{M=2}, \sigma \mid D, M = 2) \, d\sigma \prod_{l \neq k} d\theta_l ,$$

where k denotes the index of a specific parameter. By using the marginalization, we can obtain the one-dimensional probability distribution of each parameter. Figure 3 shows the marginalized posterior probability distributions of the adsorption properties, i.e., the adsorption start time (t_0), rate constant (κ), and number of dimensions (n). Each probability distribution in Fig. 3 consists of a single peak, indicating that the optimal value of each parameter was uniquely estimated. Using one standard deviation (1σ) of each distribution as a measure of accuracy, the adsorption start time was estimated to be t_0 = 7.4461 ± 0.0287 s. The estimated value of t_0 deviates significantly from t_s = 6.327 s, indicating that the time lag between t_s and t_0 (about 1.12 s) was large in the present experiment. Since the conventional analysis substitutes t_s for t_0, the large time lag leads to a misunderstanding of the dynamics. Since t_s is usually selected from the time points observed every 0.333 s, the accuracy of t_0 in the Bayesian analysis (~ 0.029 s) is about one order of magnitude higher than that in the conventional analysis (~ 0.333 s). The rate constant and the number of dimensions were estimated to be κ = 0.6192 ± 0.0235 1/s and n = 0.9680 ± 0.0382 in the Bayesian analysis. On the other hand, in the conventional analysis, the values were estimated to be κ = 0.1535 1/s and n = 1.8460. We consider that these differences are primarily due to the large time lag between t_s and t_0.
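When the posterior is represented by Monte Carlo samples, the marginalization described above reduces to looking at one column of the sample array. The sketch below illustrates this with synthetic samples; the array `samples` is a hypothetical stand-in drawn from independent normal distributions centered on the reported estimates, not the actual posterior, and `summarize_marginal` is an illustrative helper name.

```python
import numpy as np

def summarize_marginal(samples, k, bins=50):
    """Marginalize by keeping column k; return (histogram mode, standard deviation)."""
    theta_k = samples[:, k]
    counts, edges = np.histogram(theta_k, bins=bins)
    i = np.argmax(counts)
    mode = 0.5 * (edges[i] + edges[i + 1])   # center of the most populated bin
    return mode, theta_k.std(ddof=1)

rng = np.random.default_rng(0)
# Hypothetical posterior samples with columns (t0, kappa, n)
samples = rng.normal(
    loc=[7.4461, 0.6192, 0.9680],
    scale=[0.0287, 0.0235, 0.0382],
    size=(5000, 3),
)
mode_t0, std_t0 = summarize_marginal(samples, k=0)
```

The histogram mode corresponds to the maximum of the marginal distribution and the standard deviation to the 1σ width used as the accuracy measure in the text.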
To visualize the difference in results between the Bayesian analysis and the conventional analysis, we compared the time evolution of the fraction transformed to the adsorption phase. This fraction corresponds to the area of the 031 diffraction peak of the adsorption phase and was estimated by using the KJMA equation. Figure 4 shows the comparison of the fractions estimated by the Bayesian analysis and the conventional analysis. As shown in the figure, the time evolution obtained with the Bayesian framework is clearly different from that obtained with the conventional framework. Since the difference is significant in the time region between t_s and t_0, estimating t_0 is quite important for understanding the dynamics. To examine the accuracy of the estimated t_0, we consider that a comparison with another probing technique such as X-ray absorption spectroscopy would be helpful, because XRD is inherently less sensitive to local phenomena.
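The two fraction curves compared in Fig. 4 can be approximated from the reported parameter values. The sketch below assumes the KJMA form X(t) = 1 - exp[-(κ(t - t_start))^n] with X = 0 before the start time; this parameterization is an assumption (chosen because it is consistent with κ being quoted in 1/s), and the time grid is arbitrary.

```python
import numpy as np

def kjma_fraction(t, t_start, kappa, n):
    """Transformed fraction 1 - exp(-(kappa * (t - t_start))**n), zero before t_start."""
    dt = np.clip(np.asarray(t, dtype=float) - t_start, 0.0, None)
    return 1.0 - np.exp(-(kappa * dt) ** n)

t = np.linspace(6.0, 12.0, 7)   # 6, 7, ..., 12 s

# Bayesian estimates: start time t_0 inferred from the data
bayes = kjma_fraction(t, t_start=7.4461, kappa=0.6192, n=0.9680)

# Conventional estimates: gas-shot time t_s substituted for t_0
conventional = kjma_fraction(t, t_start=6.327, kappa=0.1535, n=1.8460)
```

Between t_s and t_0 the conventional curve is already rising while the Bayesian curve is still zero, which is the region where the text notes the largest discrepancy.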

Validation of the Bayesian analysis with numerical simulations.
To evaluate the deviation between true values and estimated values in the Bayesian framework, we performed numerical simulations using artificial Tr-XRD data. The artificial data were generated based on Model 2 by adding Gaussian noise at the same level as in the observed data. The Bayesian analysis of the artificial data was performed by using Model 2 under the same prior distributions as in the analysis of the observed data. The color maps of the artificial data and the estimation results are shown in Fig. 5a,b. These color maps are in good qualitative agreement, indicating successful fitting of the artificial data.
Figure 5c-e shows the marginalized posterior probability distributions of the adsorption properties. The adsorption start time was estimated to be t_0 = 7.3982 ± 0.0298 s, while the true value was t_0 = 7.4461 s. The true value is outside the estimated 1σ range of t_0 (from 7.3684 s to 7.4280 s). However, the true value lies within the probability distribution, and the estimation accuracy is much higher than the conventionally required accuracy (~ 0.333 s), as shown in Fig. 5c. Therefore, the estimation of t_0 is accurate enough to understand the adsorption dynamics. The rate constant and number of dimensions were estimated to be κ = 0.5980 ± 0.0219 1/s and n = 0.9921 ± 0.0343, respectively. Since the true values were κ = 0.6192 1/s and n = 0.9680, they are included in the estimated ranges of κ and n. In addition, the accuracies required to understand the dynamics are ~ 0.1 1/s for κ and ~ 0.1 for n in the present case, indicating that the estimation of κ and n was successful. The results of the numerical simulations show that the information about the adsorption properties can be extracted by the Bayesian analysis with high accuracy even when the data contain considerable noise, indicating that the Bayesian analysis enables a precise understanding of the dynamics from experimentally observed data.
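The interval statements above can be checked with a few lines of arithmetic. The numbers are those quoted in the text; the dictionary layout and helper name are illustrative only.

```python
estimates = {
    "t0":    {"mean": 7.3982, "std": 0.0298, "truth": 7.4461},   # s
    "kappa": {"mean": 0.5980, "std": 0.0219, "truth": 0.6192},   # 1/s
    "n":     {"mean": 0.9921, "std": 0.0343, "truth": 0.9680},
}

def within_one_sigma(mean, std, truth):
    """True if the true value lies inside [mean - std, mean + std]."""
    return abs(truth - mean) <= std

inside = {name: within_one_sigma(**v) for name, v in estimates.items()}

# Deviation of t0 from the true value, compared with the 0.333 s sampling interval
t0_error = abs(estimates["t0"]["truth"] - estimates["t0"]["mean"])
t0_better_than_sampling = t0_error < 0.333
```

The true t_0 falls outside the 1σ interval (a deviation of about 0.048 s), yet this deviation is still far smaller than the 0.333 s sampling interval.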

Discussion
To understand the dynamics observed via Tr-XRD, it is essential to analyze the data using the appropriate time-evolution model (M) and a precise start time (t_0). Since M and t_0 cannot be measured directly, the conventional analysis has difficulty extracting information about M and t_0 from the experimentally observed data. To overcome these difficulties, we established the Bayesian framework for analyzing dynamic processes and demonstrated the Bayesian analysis of the Ar gas adsorption process in a typical MOF. The Bayesian model selection suggested that Model 2 was much better than Model 1, indicating peak broadening during the adsorption process. Whereas features of the time evolution such as peak broadening and shift have so far been determined by human judgement, the Bayesian analysis enables the selection of the most plausible time-evolution model without involving human judgement.
Since the conventional analysis substitutes t_s for t_0, the large time lag between t_s and t_0 led to an incorrect understanding of the adsorption dynamics. In contrast, the Bayesian analysis enables the estimation of t_0 directly from the experimental data with an accuracy about one order of magnitude higher than that of the conventional analysis. In addition, other adsorption properties such as the rate constant (κ) and growth dimension (n) were estimated with much higher accuracies than those required to understand the dynamics.
The estimated rate constant of Ar gas adsorption was lower than that of O2 gas adsorption reported in ref. 6. If the time difference between t_s and t_0 is small in the O2 gas adsorption measurements, the slower adsorption can be attributed to the stronger influence of the potential barrier in the nanopores, because the molecular size of Ar (3.405 Å) is larger than that of O2 (2.930 Å) [6,25]. The number of dimensions was estimated to be ~ 1, indicating quasi-one-dimensional growth of the Ar gas adsorption phase, which is similar to the O2 gas adsorption case. By applying the Bayesian analysis to various gas adsorption processes, a more quantitative comparison can be realized on the basis of the posterior probability distributions. We also consider that an integrated Bayesian analysis of multiple diffraction peaks would be effective for a deeper understanding of the adsorption dynamics.
Our framework can be applied to other dynamic processes by modifying the time-evolution models. We should note that the applications are limited to dynamic processes for which time-evolution models can be formulated. For example, in the case of solid-state chemical reactions, various reaction models such as nucleation models and geometrical contraction models can be used to represent the time evolution [26]. The Bayesian analysis enables the selection of the most plausible reaction model among existing reaction models. It may also be possible to design new models and evaluate them through the Bayesian analysis, which could lead to advances in reaction model research. Moreover, our framework can be applied to a wide variety of one-dimensional datasets observed by various measurements such as electron powder diffraction, neutron powder diffraction, and various types of spectroscopy. It is also possible in principle to extend our framework to two-dimensional datasets. Hence, the Bayesian analysis is expected to be utilized in various research areas of materials science.

Methods
Time-resolved X-ray diffraction measurement.
The CPL-1 sample was placed at 2 mm from the tip into a borosilicate capillary with a 0.5-mm inner diameter, which was attached to a stainless-steel tube with double O-rings. Time-resolved in situ synchrotron X-ray powder diffraction patterns of CPL-1 during adsorption of Ar at 100 K were measured on the BL02B2 beamline of the SPring-8 synchrotron facility, Japan, by using a large Debye-Scherrer-type diffractometer with a multi-modular system constructed with six MYTHEN detectors [27]. The sample temperature was controlled by hot N2 gas flow devices, and the sample atmosphere was controlled

Generative model for time-resolved X-ray diffraction patterns.
The time-resolved X-ray diffraction data are given as D = {x, t, I} = {x_i, t_j, I_ij} (i = 1, ..., N_x; j = 1, ..., N_t), where x, t, and I denote the diffraction angles, time points, and diffraction intensities, respectively. We consider that a diffraction intensity at an angle x_i and a time point t_j is generated by adding measurement noise ε_ij as

$$I_{ij} = f(x_i, t_j; \theta_M, M) + \varepsilon_{ij}, \tag{1}$$

where f(x_i, t_j; θ_M, M) is a model function with parameters θ_M and M. If the noise is assumed to follow a Gaussian distribution as ε_ij ∼ N(0, σ^2) with noise variance σ^2, the conditional probability p(I_ij | θ_M, M, σ, x_i, t_j) is equal to p(ε_ij) and is represented as

$$p(I_{ij} \mid \theta_M, M, \sigma, x_i, t_j) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left[-\frac{\{I_{ij} - f(x_i, t_j; \theta_M, M)\}^2}{2\sigma^2}\right]. \tag{2}$$

Regarding the noise at each data point as independent, we can express the conditional probability for all data points as

$$p(I \mid \theta_M, M, \sigma, x, t) = \prod_{i=1}^{N_x} \prod_{j=1}^{N_t} p(I_{ij} \mid \theta_M, M, \sigma, x_i, t_j) = \left(\frac{1}{2\pi\sigma^2}\right)^{N_x N_t / 2} \exp\left[-\frac{N_x N_t}{\sigma^2} E(\theta_M, M)\right], \tag{3}$$

where

$$E(\theta_M, M) = \frac{1}{2 N_x N_t} \sum_{i=1}^{N_x} \sum_{j=1}^{N_t} \{I_{ij} - f(x_i, t_j; \theta_M, M)\}^2. \tag{4}$$

In a typical time-resolved X-ray diffraction experiment, x and t are set before the measurements in a non-probabilistic manner, which means p(D | θ_M, M, σ) = p(I | θ_M, M, σ, x, t).
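Under the independent Gaussian noise assumption stated above, the logarithm of the conditional probability for all data points is a sum of per-point Gaussian log densities. The following sketch evaluates it for arrays of observed and modeled intensities; the function name and the small example arrays are illustrative and not taken from the paper.

```python
import numpy as np

def gaussian_log_likelihood(intensity, model, sigma):
    """ln p(I | theta, M, sigma, x, t) for independent noise ~ N(0, sigma^2)."""
    residual = np.asarray(intensity, dtype=float) - np.asarray(model, dtype=float)
    n_points = residual.size                       # N_x * N_t
    return (-0.5 * n_points * np.log(2.0 * np.pi * sigma**2)
            - 0.5 * np.sum(residual**2) / sigma**2)

# Hypothetical 2 x 3 data set (angles x time points)
observed = np.array([[1.0, 2.0, 3.0], [0.5, 1.5, 2.5]])
predicted = np.array([[1.1, 1.9, 3.2], [0.4, 1.5, 2.4]])
log_like = gaussian_log_likelihood(observed, predicted, sigma=0.2)
```

Working with the logarithm avoids numerical underflow, since the product over N_x N_t data points is extremely small for realistic data sizes.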
Model functions for recognizing the observed data.
Assuming the shape of a diffraction peak to be a Gaussian function, we can express the model function f(x, t; θ_M, M) as

$$f(x, t; \theta_M, M) = \sum_{i=1}^{2} g\left(x; a_i^M(t), c_i^M(t), w_i^M(t)\right), \tag{5}$$

where

$$g(x; a, c, w) = \frac{a}{w} \sqrt{\frac{4 \ln 2}{\pi}} \exp\left[-4 \ln 2 \left(\frac{x - c}{w}\right)^2\right]. \tag{6}$$

Here, i = 1 and i = 2 correspond to the adsorption and desorption phases, respectively. a_i^M(t), c_i^M(t), and w_i^M(t) represent the time-evolution functions for the peak area, center angle, and full width at half maximum (FWHM), respectively. In this study, we designed two different time-evolution models, i.e., Model 1 (M = 1) and Model 2 (M = 2), and selected the more plausible model using Bayes free energy, as described below. In both models, we used the Kolmogorov-Johnson-Mehl-Avrami equation [23,24] for the time evolution of the peak area as

$$a_1^M(t) = A \left[1 - \exp\left\{-\left(\kappa (t - t_0)\right)^n\right\}\right] \quad (t \geq t_0), \tag{7}$$

and

$$a_2^M(t) = B \exp\left\{-\left(\kappa (t - t_0)\right)^n\right\} \quad (t \geq t_0), \tag{8}$$

where t_0, κ, and n indicate the adsorption start time, rate constant, and number of dimensions, respectively, while A and B correspond to the scale parameters. For the time evolution of the peak center angles, assuming little or no angular variation, we used linear functions.
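A Gaussian peak parameterized by its area, center angle, and FWHM, as used in the peak model above, can be sketched as follows. The normalization is the standard area-normalized Gaussian written in terms of the FWHM, which is an assumption about the exact form used in the paper; the angles, areas, and widths in the example are hypothetical values chosen only for illustration.

```python
import numpy as np

def gaussian_peak(x, area, center, fwhm):
    """Gaussian profile whose integral is `area` and whose FWHM is `fwhm`."""
    coeff = np.sqrt(4.0 * np.log(2.0) / np.pi) * area / fwhm
    return coeff * np.exp(-4.0 * np.log(2.0) * ((x - center) / fwhm) ** 2)

def two_peak_pattern(x, peaks):
    """Sum of Gaussian peaks; `peaks` is a list of (area, center, fwhm) tuples."""
    return sum(gaussian_peak(x, area, center, fwhm) for area, center, fwhm in peaks)

# Hypothetical snapshot at one time point: adsorption and desorption phase peaks
x = np.linspace(9.0, 11.0, 2001)
peaks = [(1.0, 9.8, 0.05), (0.5, 10.2, 0.08)]
pattern = two_peak_pattern(x, peaks)

dx = x[1] - x[0]
total_area = pattern.sum() * dx      # numerical integral on the uniform grid
```

In a time-resolved model, the tuple entries would be replaced by the time-evolution functions evaluated at each time point, so that a single parameter set describes the whole two-dimensional data set.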

Figure 1. Difference between (a) the Bayesian framework and (b) the conventional framework for analyzing a dynamic adsorption process observed via time-resolved X-ray diffraction (Tr-XRD). (a) In the Bayesian framework, we construct the generative model including a peak model and time-evolution models. This is a probabilistic modeling of generated data conditioned on parameters and noise. Then, we use Bayes' theorem to estimate the posterior probability distribution of adsorption properties such as the rate constant (κ), growth dimension (n), and adsorption start time (t_0). (b) The conventional framework is based on the stepwise application of non-linear least-squares fitting, i.e., (Step 1) fitting the diffraction peaks at each time point and (Step 2) fitting the variation in the peak area over time with the Kolmogorov-Johnson-Mehl-Avrami (KJMA) equation and gas-shot time (t_s).

Figure 2. (a-c) Color maps of the observed time-resolved X-ray diffraction patterns and estimated results with Model 1 and Model 2. (d) Comparison between Bayes free energies of Model 1 and Model 2. The model selection probabilities are also shown.

Figure 3. Marginalized posterior probability distributions of the adsorption properties, i.e., the adsorption start time (t_0), rate constant (κ), and number of dimensions (n). The vertical line and error bar for each parameter correspond to the maximum and one standard deviation (1σ) of the distribution.

Figure 4. Fractions of the adsorption phase estimated by the Bayesian framework and the conventional framework. The blue line denotes the Bayesian analysis result, while the red line corresponds to the conventional result. The vertical lines represent the gas-shot time (t_s) and estimated adsorption start time (t_0).

Figure 5. Color maps of the (a) artificial data and (b) estimation by the Bayesian framework. Posterior probability distributions of the (c) adsorption start time, (d) rate constant, and (e) number of dimensions. The vertical dashed lines denote the true values. The error bars for each parameter correspond to the accuracy required to understand the dynamics and one standard deviation (1σ) of the distribution.