Physical constraints and functional plasticity of cellulases

Enzyme reactions, both in Nature and technical applications, commonly occur at the interface of immiscible phases. Nevertheless, stringent descriptions of interfacial enzyme catalysis remain sparse, and this is partly due to a shortage of coherent experimental data to guide and assess such work. In this work, we produced and kinetically characterized 83 cellulases, which revealed a conspicuous linear free energy relationship (LFER) between the substrate binding strength and the activation barrier. The scaling occurred despite the investigated enzymes being structurally and mechanistically diverse. We suggest that the scaling reflects basic physical restrictions of the hydrolytic process and that evolutionary selection has condensed cellulase phenotypes near the line. One consequence of the LFER is that the activity of a cellulase can be estimated from its substrate binding strength, irrespectively of structural and mechanistic details, and this appears promising for in silico selection and design within this industrially important group of enzymes.

Enzyme reactions at interfaces are common both in Nature and industry 1. About half of the enzymes in the living cell work at a membrane surface 2 and many technical enzyme applications involve catalysis at the solid-liquid interface 3. Examples of the latter include the use of immobilized enzymes in protein arrays or biosensors 4, but more commonly, the activity of soluble enzymes on insoluble substrates such as polysaccharides, lipids, precipitated proteins 5 or, more recently, plastic 6. Studies of heterogeneous enzyme reactions have shown that substrate specificity 7, turnover number 8, and enzyme-substrate binding affinity 9 can be significantly altered at an interface compared to analogous reactions in the bulk. Nevertheless, the kinetics of interfacial reactions is typically disregarded or fleetingly treated in textbooks [10][11][12][13][14], and this state of affairs is quite different from conventional (non-biochemical) catalysis, where homogeneous and heterogeneous reactions are treated in parallel. Although insightful models and concepts of interfacial enzyme kinetics have been suggested [15][16][17], no generally applied kinetic approach or rate equation currently exists. Neither is it clear whether progress in this field should be based on adaptation of conventional enzyme kinetic theory, or modifications of concepts and principles taken from inorganic heterogeneous catalysis.
Here we investigated heterogeneous enzyme catalysis using cellulases as a paradigm. These enzymes catalyze the hydrolysis of the β-1,4 glycosidic bond that links glucopyranose units in (insoluble) cellulose and constitute a generic and experimentally convenient example of interfacial enzymes. In addition, cellulases are of direct industrial interest since enzymatic conversion of lignocellulosic biomass into fermentable sugars (known as saccharification) is expected to play a key role in the upcoming biorefineries that produce fuels, chemicals, and materials from sustainable feedstocks [18][19][20]. We focused on fungal cellulases, which are commonly applied in industrial enzyme cocktails 21, and investigated enzymes from Glycoside Hydrolase (GH) families 5, 6, 7, 12, and 45 22. Specifically, we produced and biochemically characterized 83 enzymes using insoluble cellulose as substrate. The characterized cellulases included both wild types and variants, and represented a wide range of structural and functional differences (see Fig. 1). Nevertheless, the kinetic data showed a clear common trait, as we found a conspicuous scaling between the apparent Michaelis-Menten (MM) constant (K M ) and the maximal turnover (k cat ) across the entire group of cellulases. The scaling could be expressed as a so-called linear free energy relationship (LFER), and we used this to discuss functional plasticity and physical constraints for the enzymatic conversion of cellulose. We argue that the LFER for cellulases may facilitate both mechanistic and evolutionary studies, and act as guidance in future attempts to select or design improved technical enzymes. Moreover, the observed LFER is reminiscent of the behavior found for some well-described inorganic heterogeneous catalysts, and this may help to establish better theoretical frameworks for interfacial enzyme reactions.

Results
Enzyme production. The investigated enzymes were selected from five GH families as illustrated in Fig. 1 and Table 1. These families (GH5, GH6, GH7, GH12, and GH45) cover essentially all major fungal cellulases 21 and hence represent a wide range of structures and mechanisms. This included enzymes with or without a carbohydrate-binding module (CBM), enzymes using an inverting or retaining mechanism, enzymes that attack the cellulose chain internally (endoglucanases, EGs) or at a chain end (cellobiohydrolases, CBHs), and enzymes with different degrees of processivity. In addition to the wild types, a library of cellulase variants was made with the intention of changing the enzyme-substrate binding strength. This library included variants with mutations in the CBM, linker, and catalytic domain, as well as variants where the CBM and linker were added, removed or swapped. A full list of the enzymes characterized here can be found in supplementary Table 4 in the supplementary information (SI).
Kinetic analysis. All enzymes were characterized by MM kinetics using microcrystalline cellulose (Avicel PH-101) as substrate. Quasi-steady-state rates (v ss ) were measured at a constant, low enzyme concentration (E 0 ) and different substrate loads (S 0 ), and analyzed by the MM-equation (Eq. 1) using non-linear regression.
\[ v_{ss} = \frac{k_{cat} E_0 S_0}{K_M + S_0} \qquad (1) \]
The resulting kinetic parameters (K M and k cat ) are listed in supplementary Table 4. Previous studies have identified practical procedures for measuring the quasi-steady-state rate for this type of system 23 and shown that Eq. 1 is valid and applicable even though the substrate is solid and specified by its mass load (S 0 ) in units of g/L 24,25. The derived rates were based on soluble products only, and control experiments (supplementary Table 3) showed that this was a good descriptor of the overall activity even for EGs.
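The non-linear regression mentioned above can be illustrated with a minimal sketch. It assumes the standard MM form v ss /E 0 = k cat S 0 /(K M + S 0 ) and is not the authors' fitting routine; the function name `fit_michaelis_menten` and the synthetic data are hypothetical. Because the model is linear in k cat for a fixed K M , the sketch solves k cat in closed form and searches K M by golden-section search on a logarithmic scale.

```python
import math

def fit_michaelis_menten(loads, specific_rates):
    """Least-squares fit of v/E0 = kcat * S0 / (KM + S0).

    loads: substrate loads S0 (e.g. g/L); specific_rates: v_ss / E0.
    Returns (KM, kcat). Hypothetical helper, not the authors' code.
    """
    def best_kcat(km):
        # For a fixed KM the model is linear in kcat (closed-form solution).
        g = [s / (km + s) for s in loads]
        return sum(gi * v for gi, v in zip(g, specific_rates)) / sum(gi * gi for gi in g)

    def ssr(log_km):
        # Sum of squared residuals after profiling out kcat.
        km = math.exp(log_km)
        kcat = best_kcat(km)
        return sum((v - kcat * s / (km + s)) ** 2 for s, v in zip(loads, specific_rates))

    # Golden-section search over ln(KM); the bracket is an assumption.
    lo, hi = math.log(min(loads) / 100.0), math.log(max(loads) * 100.0)
    ratio = (math.sqrt(5.0) - 1.0) / 2.0
    for _ in range(200):
        x1 = hi - ratio * (hi - lo)
        x2 = lo + ratio * (hi - lo)
        if ssr(x1) < ssr(x2):
            hi = x2
        else:
            lo = x1
    km = math.exp(0.5 * (lo + hi))
    return km, best_kcat(km)

if __name__ == "__main__":
    # Synthetic, noise-free example data (not from the paper).
    s0 = [1.0, 2.0, 5.0, 10.0, 20.0, 50.0, 100.0]
    v = [2.0 * s / (15.0 + s) for s in s0]
    print(fit_michaelis_menten(s0, v))
```

With real data, substrate loads should bracket K M ; otherwise the two parameters become strongly correlated and the fit is poorly determined.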
In Fig. 2, ln(k cat ) is plotted against ln(K M ) for all investigated enzymes to illustrate the power-law correlation between the kinetic parameters. From the main panel (Fig. 2a), it appeared that most enzymes clustered in a narrow lane around the diagonal. Some enzymes were located below the diagonal, but we did not find any above. To assess whether the experimental points in Fig. 2 correlated with structural or functional properties of the studied enzymes, we highlighted specific sub-groups in the dataset in five separate subplots (Panels b-f, Fig. 2). Linear regression showed that the slope in Fig. 2 was 0.74 ± 0.02. Regression outliers were identified based on studentized residual analysis using a conservative cutoff of ±2.5σ. The outliers were omitted from the regression analysis and are identified by open symbols in Fig. 2. A list of kinetic parameters for all investigated enzymes can be found in supplementary Table 4.
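The regression and outlier screening described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' analysis: it uses internally studentized residuals from an ordinary least-squares fit of ln(k cat ) against ln(K M ), flags points beyond the cutoff in a single pass, and refits without them. The function names and the example data are hypothetical.

```python
import math

def ols(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    slope = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    return slope, ybar - slope * xbar, xbar, sxx

def scaling_fit(km_values, kcat_values, cutoff=2.5):
    """Fit ln(kcat) vs ln(KM), flag outliers, and refit without them.

    Returns (slope, intercept, outlier_indices).
    """
    x = [math.log(v) for v in km_values]
    y = [math.log(v) for v in kcat_values]
    n = len(x)
    slope, intercept, xbar, sxx = ols(x, y)
    resid = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    s2 = sum(r * r for r in resid) / (n - 2)  # residual variance
    outliers = []
    if s2 > 0.0:
        for i in range(n):
            leverage = 1.0 / n + (x[i] - xbar) ** 2 / sxx
            t = resid[i] / math.sqrt(s2 * (1.0 - leverage))  # studentized residual
            if abs(t) > cutoff:
                outliers.append(i)
    keep = [i for i in range(n) if i not in outliers]
    slope, intercept, _, _ = ols([x[i] for i in keep], [y[i] for i in keep])
    return slope, intercept, outliers

if __name__ == "__main__":
    # Synthetic data on a power law with one impaired "variant" (index 10).
    km = [float(i) for i in range(1, 21)]
    kcat = [0.5 * k ** 0.74 for k in km]
    kcat[10] /= 10.0
    print(scaling_fit(km, kcat))
```

A single-pass screen like this can miss outliers that mask one another; iterating the procedure or using externally studentized residuals are common alternatives.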
Computational analysis. The strong correlation between ln(K M ) and ln(k cat ) shown in Fig. 2 motivated a computational analysis of enzyme-substrate binding strength by molecular dynamics (MD) simulations for a subset of the investigated enzymes (see supplementary Fig. 8). We selected enzymes that spanned a wide range of K M values and represented all structural and functional classes listed in Fig. 1. For modular cellulases, the contribution of the CBM to the binding energy ΔG°B was computed separately. To compare with experiments, we used k cat and K M to estimate changes in transition-state free energy (ΔΔG ‡ ) and standard free energy of ligand binding (ΔΔG°B), respectively, following well-established principles [26][27][28]. Specifically, we used the equations
\[ \Delta\Delta G^{\circ}_{B} = RT \ln\left(\frac{K_M}{K_{M,ref}}\right) \qquad (2) \]
\[ \Delta\Delta G^{\ddagger} = -RT \ln\left(\frac{k_{cat}}{k_{cat,ref}}\right) \qquad (3) \]
which introduce a reference enzyme with the kinetic parameters K M,ref and k cat,ref . Hence, the calculated free energies are energy changes relative to the selected reference. This approach alleviates ambiguities regarding standard states (Eq. 2) and pre-exponential factors (Eq. 3). We used the GH6 cellobiohydrolase from Trichoderma reesei (TrCel6A) as our reference enzyme, and it follows that this enzyme will have ΔΔG ‡ = ΔΔG°B = 0.
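As a worked illustration of how kinetic parameters map onto relative free energies, the sketch below reads Eq. 2 as ΔΔG°B = RT ln(K M /K M,ref ) and Eq. 3 as ΔΔG ‡ = −RT ln(k cat /k cat,ref ). This reading is an assumption based on the sign convention described in the Discussion; the temperature default and all numerical inputs are hypothetical.

```python
import math

GAS_CONSTANT = 8.314462618e-3  # kJ/(mol K)

def relative_free_energies(km, kcat, km_ref, kcat_ref, temperature=298.15):
    """Return (ddG_binding, ddG_activation) in kJ/mol relative to a reference enzyme.

    The temperature default (298.15 K) is an assumed value, not taken from the text.
    """
    rt = GAS_CONSTANT * temperature
    ddg_binding = rt * math.log(km / km_ref)          # weaker binding -> positive
    ddg_activation = -rt * math.log(kcat / kcat_ref)  # faster turnover -> negative
    return ddg_binding, ddg_activation

if __name__ == "__main__":
    # Hypothetical enzyme with twofold higher KM and kcat scaled by 2**0.74.
    ddg_b, ddg_act = relative_free_energies(6.0, 1.5 * 2 ** 0.74, 3.0, 1.5)
    print(ddg_b, ddg_act, ddg_act / ddg_b)
```

For an enzyme lying on a power law k cat ∝ K M ^0.74 through the reference, the ratio ΔΔG ‡ /ΔΔG°B equals −0.74, which is how a positive slope in the ln-ln plot corresponds to a negative scaling constant.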
The validity of Eq. 2 depends on whether K M can be interpreted as a descriptor of the enzyme-substrate affinity. The comparison in Fig. 3a showed that, despite the diversity of the analyzed cellulases, computed changes in binding affinity, ΔΔG B,MD , scaled reasonably well with the experimental values, ΔΔG B,exp , derived from Eq. 2. This supports the validity of Eq. 2 for this system and the idea of using computed ligand-binding energies to predict catalytic rates. Figure 3b illustrates the scaling between ΔΔG B,MD and ΔΔG ‡ exp .

Discussion
In this study, we produced and kinetically characterized 83 enzymes covering essentially all classes of fungal cellulases (Table 1 and Fig. 1). We used the same expression host to ensure that all enzymes were exposed to the same apparatus of posttranslational modifications. Moreover, kinetic characterizations were based on the same substrate, experimental conditions, and principles of analysis. This provided a robust basis for comparative analyses of interfacial enzymes in general and cellulases in particular. Indeed, the breadth of the dataset allowed us to identify a striking correlation between ln(K M ) and ln(k cat ), and in the following we discuss the origin and corollaries of this observation.
Enzyme fitness and physical constraints. Figure 2 may be seen as a fitness landscape for cellulases attacking their native insoluble substrate, and it appears that most enzymes accumulated around the diagonal. The diagonal defines a continuum ranging from enzymes with weak substrate interactions and rapid turnover (high K M and k cat ), to enzymes with stronger interactions, but slower turnover (low K M and k cat ). The tendency to accumulate along the diagonal was observed for all types of cellulases (refer to Table 1 and Fig. 1), and hence does not seem to rely on specific structural or mechanistic properties. Rather, it appears that the maximal turnover can be expressed solely by one descriptor, namely K M . The area below the diagonal in Fig. 2 represents a region where the enzymes have a low specificity constant (i.e., low k cat /K M ), and this seems to signify inefficient catalysis. We found some enzymes in this range, including some wild type enzymes and variants with replacements of key amino acid residues. We suggest that this southeastern region of the fitness landscape represents enzymes that have been either catalytically impaired by our engineering, are structurally unstable under the selected conditions, or have other primary substrate preference than cellulose.
On the other hand, the region above the diagonal in Fig. 2 specifies enzymes that have a high specificity constant on this substrate. This clearly appears functionally advantageous, but we did not find any cellulases in this northwestern region. We suggest that this absence is the result of basic physical restrictions of the cellulolytic process. It follows that the accumulation of data points in a narrow lane in Fig. 2 may be seen as a balance between evolutionary selection, which drives the kinetic parameters toward the northwest, and physical constraints, which prevent this development beyond the boundary defined by the line in Fig. 2.
Fig. 3 caption (partial): Experimental free energies were calculated using Eqs. 2 and 3. The kinetic parameters (K M and k cat ) of the nine cellulases can be found in supplementary Table 4. The selected cellulases covered a wide range of kinetic parameters shown in Fig. 2 and encompassed all main structural and functional traits specified in Fig. 1. Standard deviations of the experimental free energies and computed free energies are shown as error bars.
The engineered variants in Fig. 2b represent a range of replacements and deletions at different positions (see supplementary Table 4), which were designed with the overall purpose of altering ligand-binding strength. In a few cases, the mutations shifted the variants into the southeastern "wasteland" of the fitness landscape, but most remained on the diagonal. The tendency to stay on the line did not reflect that the variants had unaltered kinetic parameters. Rather, changes in K M and k cat tended to compensate. Some examples of this are highlighted in Fig. 4, and it appears that both point mutations, and extensive changes in the amino acid sequence, readily moved kinetic parameters up or down the diagonal, but rarely sent them off the line. Interestingly, the vast majority of the variants moved up the line compared to their respective wild type, and only in cases where a CBM was added to a CBM-less wild type (Fig. 4a) did the variant move down the line toward lower K M and k cat values. This indicates that the wild-type enzymes have evolved to have high affinity for the substrate rather than high turnover. Nonetheless, the differences in affinities across GH families may be important in Nature where cellulose is degraded by cellulases from multiple GH families.
Origin of physical constraint. Correlations between binding and activation free energies are well-known in both organic and inorganic catalysis 29, but have only been sporadically used for (homogeneous) enzyme reactions 30,31. An LFER exists if the binding free energy, ΔG°B, scales linearly with the free energy of activation, ΔG ‡ . This is tantamount to proportionality between the changes in these two free energies, and we may write
\[ \Delta\Delta G^{\ddagger} = \Phi \, \Delta\Delta G^{\circ}_{B} \qquad (4) \]
where Φ is a scaling constant that converts changes in the binding free energy (ΔΔG°B) to changes in activation free energy (ΔΔG ‡ ). The correlation shown in Fig. 2 may be interpreted as an LFER if the K M values can be interpreted as dissociation constants for the enzyme-substrate complex. In general, one has to be cautious when using the (apparent) K M value as an affinity descriptor for complex enzyme reactions such as the one studied here. However, such an interpretation of K M has been used successfully earlier [26][27][28], and it is also in line with the MD results (Fig. 3a), which showed good correlations between computed ligand-binding energies and experimental binding energies calculated using Eq. 2. The validity of K M as a descriptor of the enzyme-substrate affinity of the investigated enzymes is further discussed in the SI (see supplementary notes 1 and 2).
Using Eqs. 2 and 3 we calculated ΔΔG°B and ΔΔG ‡ and found that the two free energies correlated with a slope of Φ = −0.74 ± 0.02 (see supplementary Fig. 9). This is the same slope as found for the line in Fig. 2 but with opposite sign due to the minus in Eq. 3 (i.e., low activation energies give high k cat values). The scaling constant, Φ, in Eq. 4 provides some information about the nature of the transition state (TS), and this idea has been used, for example, to elucidate the TS of protein folding 32. As proposed by Warshel 27, the Φ-value also provides a means to classify effects of mutations on enzyme function. If, for example, both the enzyme-substrate complex and the TS in a variant are stabilized to the same extent (so-called uniform binding, see Figs. 5b-1), Φ would be 0 since the activation energy would remain unchanged (i.e., ΔΔG ‡ = 0). Another illustrative case is when changes in interactions only manifest themselves in the TS (so-called TS-stabilization, Figs. 5b-2). This results in Φ → ∞ since the activation energy can be changed independently of the binding energy. Finally, if mutations only act to stabilize the ground state complex (GS stabilization, Figs. 5b-3), ΔΔG ‡ will change commensurate with ΔΔG°B, and Φ = −1.
This interpretation of Φ-values was developed to classify mutants that were closely related in structure, but in the current context it may elucidate differences across cellulases (wild types and mutants) with widely different structures and mechanisms. We found Φ = −0.74 ± 0.02 (see supplementary Fig. 9), and it follows that kinetic differences among the investigated cellulases can be mostly ascribed to differences in the degree of GS stabilization. This has the noticeable consequence that the free energy of the (rate-limiting) TS is quite similar for all tested enzymes, and that the main kinetic diversity lies in different affinities for the substrate. This is illustrated in Figs. 5b-3, which shows that tighter binding to the substrate (red trace) unavoidably leads to a higher activation energy if the TS is (almost) fixed. Experimental studies have suggested that the rate-limiting step for some cellulases is slow dissociation [33][34][35][36][37]. Since weaker binding is associated with a lower activation barrier for dissociation (Figs. 5b-3), a dissociation-limited mechanism would explain the inverse correlation of binding strength and maximal turnover. Based on these considerations it is tempting to suggest that weak ligand binding is a functional advantage since it invariably increases k cat . However, mutational studies suggest that weak binding is not necessarily advantageous for the efficacy of GHs attacking solid carbohydrates 38,39. The characterized variants support this interpretation, since most of the variants moved up the line in Fig. 2 compared to the respective wild type, indicating that the wild types were optimized for high affinity. Strong ligand binding may be needed in order for the enzyme to transfer a cellulose chain from the cellulose surface, where it is strongly bound 40,41, to the binding cleft (see cartoon in Fig. 5a).
Hence, strong ligand binding appears to benefit catalysis by promoting ligand transfer 42, but it is inevitably associated with a slow turnover of an off-rate controlled reaction, as illustrated in Figs. 5b-3. We suggest that the LFER between the binding energy and activation energy is a direct consequence of the overall reaction being controlled by the on-off kinetics of the cellulases (see supplementary Fig. 6). The existence of LFERs for enzyme reactions governed by the chemical step remains to be investigated further, but meta-analyses of kinetic databases show little correlation between k cat and K M 26,28,43. This is unlike many reactions in both homogeneous and heterogeneous (non-biochemical) catalysis, which may be limited by an LFER even though the reaction is governed by a chemical step 44,45. Kinetic parameters for heterogeneous enzyme reactions are scarce. Thus, it is still an open question whether scaling relations are as common in heterogeneous biocatalysis as they are in inorganic heterogeneous catalysis 46, but the current study shows that cellulases are severely restricted by an LFER.
Fig. 5 caption (partial): (1) association, (2) hydrolysis, and (3) dissociation. b Schematic energy-diagrams for a wild type (black curve) and three conceptually different variants (red curves). c Expected scaling plots for a group of variants that behave according to the three different energy-diagrams shown in (b). If the energy of the variant differs from the wild type by the same amount in both transition state (TS) and ground state (GS), we have so-called uniform binding and Φ = 0 (panels b1 and c1). The parallel shift in energies for uniform binding implies that the same interactions occur in GS and TS. If, on the other hand, a mutation only lowers the TS energy, known as TS-stabilization, this leads to a vertical line in the scaling plot (panels b2 and c2). Finally, in GS-stabilization (panels b3 and c3), only the GS energy changes, while the TS remains fixed. In this case, Φ = −1, and this is close to the experimental observation (see supplementary Fig. 9).
Consequences of the scaling relationship. One aspect of the proposed scaling of K M and k cat is that the initial rate, v ss (Eq. 1), may be approximated by just one of the kinetic parameters. To illustrate this, we combined Eqs. 2-4 and solved for k cat :
\[ k_{cat} = A K_M^{a} \qquad (5) \]
In Eq. 5, a = −Φ and A = k cat,ref /(K M,ref )^a. Inserting Eq. 5 into Eq. 1 gives
\[ v_{ss} = \frac{A K_M^{a} E_0 S_0}{K_M + S_0} \qquad (6) \]
Equation 6 underscores how ligand affinity is a double-edged sword: a higher K M raises the turnover term in the numerator but lowers the degree of saturation through the denominator. As demonstrated in the SI (supplementary note 3), Eq. 6 has a global maximum when K M attains the value
\[ K_{M,opt} = \frac{a}{1-a} S_0 \qquad (7) \]
This implies that at a fixed load of substrate, S 0 , a cellulase with low K M (i.e., K M < K M,opt ) will become a better catalyst (increase v ss ) if it is engineered for weaker substrate binding. Conversely, weakly binding enzymes (K M > K M,opt ) will gain from tighter binding. In the current case, a = 0.74 and insertion into Eq. 7 shows that K M,opt = 2.8 S 0 . In other words, the fastest initial rate on the current substrate (Avicel) will be observed for a cellulase that has a K M value that is around threefold higher than the Avicel load. To illustrate this, we plotted v ss as a function of K M for all of the investigated enzymes (excluding outliers identified in Fig. 2) at different substrate loads (Fig. 6). The results are in line with a previous observation 47 showing so-called volcano plots, where cellulase activity tapers off on each side of the optimal affinity. Such volcano plots mirror the Sabatier principle, which states that the catalytic efficacy is optimal for a catalyst with intermediate binding strength 48. Higher/lower affinity leads to a situation where dissociation/association limits the overall rate. The optimal affinity, K M,opt , depends on the substrate load, and this is indicated by the black symbols in Fig. 6, which were calculated using Eq. 7. We emphasize that the appearance of an optimal K M is a direct consequence of the LFER, and that this type of analysis is well-established within (non-biochemical) heterogeneous catalysis 46,49.
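The optimum described above can be checked numerically. The sketch below assumes the rate law v ss = A K M ^a E 0 S 0 /(K M + S 0 ), obtained by inserting a power law k cat = A K M ^a into the MM equation, for which setting the derivative with respect to K M to zero gives K M,opt = a S 0 /(1 − a). The prefactor, enzyme concentration, and substrate load used here are arbitrary illustrative values.

```python
def steady_state_rate(km, s0, a=0.74, prefactor=1.0, e0=1.0):
    """Rate for an enzyme on the scaling line: kcat = prefactor * km**a."""
    return e0 * prefactor * km ** a * s0 / (km + s0)

def optimal_km(s0, a=0.74):
    """KM that maximizes the rate at substrate load s0 (requires 0 < a < 1)."""
    return a * s0 / (1.0 - a)

def scan_for_optimum(s0, a=0.74, points=20000, span=100.0):
    """Brute-force check: scan KM from s0/span to s0*span on a log grid."""
    best_km, best_rate = None, -1.0
    for i in range(points + 1):
        km = (s0 / span) * (span * span) ** (i / points)
        rate = steady_state_rate(km, s0, a)
        if rate > best_rate:
            best_km, best_rate = km, rate
    return best_km

if __name__ == "__main__":
    load = 10.0  # hypothetical substrate load in g/L
    print(optimal_km(load), scan_for_optimum(load))
```

For a = 0.74 the ratio a/(1 − a) is about 2.85, consistent with the roughly threefold ratio between optimal K M and substrate load stated in the text. The maximum is broad on a logarithmic K M axis, which is why the volcano tapers off gradually on either side.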
As a final example of an application, we note that the LFER may be useful in computational selection and design of enzymes for technical use. Thus, a link between activity and affinity provides an important simplification as it converts the highly complex problem of in silico assessment of enzyme turnover frequency to the more tractable challenge of calculating binding energy. To illustrate this, we computationally assessed the strength of enzyme-substrate interactions for a subset of nine enzymes spread along the diagonal in Fig. 2. As shown in Fig. 3a, the computed binding energies scaled with the experimental values. These results suggest that the kinetic properties of novel, uncharacterized enzymes may be estimated by combining computed binding data with an experimental LFER based on a limited number of enzymes. Hence, efficient enzymes for a given set of experimental conditions could be identified through in silico screening.
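The screening idea can be sketched in a few lines. Assuming the relations ΔΔG ‡ = Φ ΔΔG°B and ΔΔG ‡ = −RT ln(k cat /k cat,ref ), a relative binding energy translates into a predicted turnover as k cat = k cat,ref exp(−Φ ΔΔG°B /RT). The function name, the temperature default, and the candidate values below are hypothetical, and a computed binding energy would first need to be calibrated against experimental values.

```python
import math

GAS_CONSTANT = 8.314462618e-3  # kJ/(mol K)

def predict_kcat(ddg_binding, kcat_ref, phi=-0.74, temperature=298.15):
    """Estimate kcat from a binding free energy relative to the reference enzyme.

    ddg_binding is in kJ/mol; the temperature (298.15 K) is an assumed value.
    """
    return kcat_ref * math.exp(-phi * ddg_binding / (GAS_CONSTANT * temperature))

if __name__ == "__main__":
    # Hypothetical candidates: relative binding energies in kJ/mol.
    candidates = {"variant_A": -2.0, "variant_B": 0.0, "variant_C": 3.5}
    for name, ddg in candidates.items():
        print(name, round(predict_kcat(ddg, kcat_ref=1.0), 3))
```

With a negative Φ, weaker binding than the reference (positive ΔΔG°B) gives a higher predicted k cat . Whether that translates into a higher rate at a given substrate load still depends on K M relative to the load.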
In closing, the kinetic characterization of a wide group of fungal cellulases on their native, insoluble substrate revealed a LFER between substrate binding and activation barrier. We propose that this reflects basic physical restrictions of the hydrolytic reaction, which limits the evolutionary selection to a narrow lane around the scaling line, irrespectively of the enzymes' fold, modularity, or catalytic mechanism. The scatter around the proposed scaling line in Fig. 2 corresponds to a factor of about 2 in the value of k cat . Hence, our results suggested that experimental k cat values for enzymes with approximately the same K M varied within this range. This variance encompassed a minor contribution from experimental errors, but it may also reflect kinetic diversity that results from differences in the mechanism and specificity of the tested enzymes. However, when we zoomed out and considered a broad range of K M values, this variance was modest, and the fitness landscape was dominated by a common scaling for all enzymes. Comparisons of wild types and variants revealed that small alterations in sequence (even point mutations) could lead to significant kinetic changes. In most cases, however, the changes involved a stringent movement on the scaling line rather than a shift away from the line, and this further demonstrated a strong coupling between affinity and turnover. We propose that this behavior is linked to the interfacial nature of the reaction. On one hand, strong ligand interactions are required to enable the transfer of a cellulose chain from the cellulose surface to the enzyme complex. On the other, a highly stable enzyme-substrate complex is inescapably associated with slow turnover (Figs. 5b-3). These relationships may help rationalize cellulolytic mechanisms and guide the selection of technical enzymes. 
It also appears that LFERs for interfacial enzyme reactions may establish a connection to (inorganic) heterogeneous catalysis, and hence pave the way for the use of practices and principles from this field within enzymology.

Methods
Enzymes and kinetic measurements. Experimental methods used in this work have been described elsewhere (see supplementary Table 4). Briefly, we expressed all enzymes heterologously in Aspergillus oryzae and purified them as described elsewhere 50,51. Engineered enzymes containing single or multiple amino acid substitutions, deletions or insertions were made using splicing by overlap extension (SOE) PCR or by expression vector 50. A full list of primers can be found in supplementary Table 5. For variants with added CBM, gBlocks™ Gene Fragments were ordered from Integrated DNA Technologies (IDT) with an overhang of 24 bp for SOE. SDS-PAGE gels (15-well NuPAGE 4-12% BisTris, GE Healthcare) revealed a single band for the purified enzymes, and their concentrations were determined by UV [...]
[...] 59. The ten clusters lowest in energy were inspected, and the lowest energy configuration from the cluster with the closest distance between the catalytic residues and the glycosidic bond of interest was taken. The CHARMM36 force field was used to describe the system 60. All simulations were run in GROMACS 2018.6 61. Catalytic acids of all CDs were protonated. GROMACS was used to construct a cuboid box with edge lengths of 9.4 × 9.4 × 20 nm, and the complexes were positioned at 4.7, 4.7, and 3.3 nm. The complexes were rotated so that the centers of mass of the last and the fourth last sugar unit of the ligands were parallel to the z-axis. The systems were solvated with TIP3P water. To neutralize the net charges of the systems, random water molecules were exchanged with sodium ions. Minimization was conducted by steepest descent over 10,000 iterations. All subsequent simulations were performed at 300 K. NVT-simulations were performed for 100 ps while keeping the complex restrained. Thereafter, NPT-simulations with restraints on the solutes were performed for 100 ps. For all further simulations, only Cα atoms further away than 1.5 nm from the ligand were restrained.
A second round of NPT-simulations with the new restraints was performed for 100 ps. RMSD analysis of the protein backbone showed that this time was sufficient to reach an equilibrated state. Thereafter, steered MD simulations were done over 800 ps with a pulling rate of 0.01 nm/ps and a force constant of 1000 kJ/mol/nm². The pull was performed on the first sugar unit of the cellononaose ligand in the z direction. The resulting trajectories were used to prepare further simulations. Frames were extracted for every 0.5 Å travelled by the ligand, up to a final distance of 1 nm between the CD and the ligand. The extracted frames were used as starting configurations for umbrella sampling simulations along the binding path. Each window was simulated for 620 ps, where the first 20 ps were disregarded as equilibration. It should be noted that TrCel6A works from the opposite end compared to the other cellulases 21. The set-up was adapted accordingly.
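The frame selection for the umbrella sampling windows can be sketched as follows. This is a simplified illustration, not the GROMACS workflow itself: it assumes the ligand displacement along the pulling coordinate has already been extracted per frame into a list, and it interprets the 1 nm limit as 1 nm of displacement from the bound position. The function name and the idealized trajectory are hypothetical.

```python
def select_window_frames(displacements_nm, spacing_nm=0.05, max_nm=1.0):
    """Return the index of the first frame at or beyond each window position.

    Window positions are 0, spacing, 2*spacing, ..., max_nm (0.05 nm = 0.5 Angstrom).
    """
    n_windows = int(round(max_nm / spacing_nm)) + 1  # includes the bound state
    tolerance = 1e-9  # guards against floating-point round-off
    selected = []
    frame = 0
    for window in range(n_windows):
        target = window * spacing_nm
        # Advance until the ligand has travelled at least the target distance.
        while frame < len(displacements_nm) and displacements_nm[frame] < target - tolerance:
            frame += 1
        if frame == len(displacements_nm):
            break  # trajectory ended before reaching this window
        selected.append(frame)
    return selected

if __name__ == "__main__":
    # Idealized pull: one frame per ps at 0.01 nm/ps for 800 ps (8 nm in total).
    trajectory = [i * 0.01 for i in range(801)]
    frames = select_window_frames(trajectory)
    print(len(frames), frames[:5], frames[-1])
```

In a real steered-MD trajectory the ligand lags behind and fluctuates around the pulling position, so the displacement should be measured from the trajectory rather than inferred from the pulling rate.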
Simulations of the carbohydrate-binding modules. If available, the structures were taken from the Protein Data Bank (CBM1 of TrCel7A: 2CBH, CBM1 of TrCel7B: 4BMF). Otherwise, they were prepared through homology modelling with Modeller 62 (CBM1 of TrCel6A, CBM1 of TrCel5A, CBM1 of HiCel45A). A cellulose crystal of the type Iβ with a length of 5, a width of 6, and a depth of 3 unit cells was generated with the Cellulose Builder web server 63. The CBMs were placed on the surface according to Beckham et al. 64. A cubic box with a minimal distance of 1.0 nm was constructed. The crystal plane was oriented perpendicular to the z-axis. The simulations were performed in a similar fashion as the ones for the CDs. However, the heavy atoms of the crystals were kept constrained after the energy minimization, and the second NPT-simulation was extended to 1 ns to let the CBM settle on the crystal surface.
Analysis. Analysis of the trajectories was performed with GROMACS. The weighted histogram analysis method (WHAM) was applied to analyze the umbrella sampling simulations along the binding path 65. If density gaps occurred, additional windows at those distances were inserted iteratively until no gaps remained. From the resulting PMF curves, the energy difference between the minimum and the maximum was taken. The errors were estimated with bootstrapping. The ΔG B,MD values obtained for the CD and CBM parts were added up to give values for the full enzyme. The energies were normalized by the values from the reference enzyme TrCel6A. This resulted in ΔΔG B,MD values, which are more readily comparable to the experimental ΔΔG B,exp . Linear regressions of the experimental binding energy and the experimental activation energy against the computed binding energy were performed. The former resulted in a linear fit of the form y = 0.16x + 0.2 with a Pearson's coefficient r² = 0.93, and the latter resulted in y = 0.13x + 0.67 with r² = 0.81. To counteract the known systematic overestimation by the method [66][67][68] and for carbohydrate binding in general 69,70, a linear transformation of the initially obtained computed binding energies was performed using the parameters from the linear regressions. The final results for the prediction of the binding energies had a root-mean-squared error (RMSE) of 0.86 kJ/mol; the ones for the prediction of the activation energy had an RMSE of 1.20 kJ/mol.
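The linear rescaling and error measure described above can be sketched as follows. The slope and intercept (0.16 and 0.2) are the regression parameters quoted in the text for the binding energies; the function names and the example energies are hypothetical and serve only to show the arithmetic.

```python
import math

def rescale(computed, slope, intercept):
    """Apply the linear transformation y = slope * x + intercept to computed energies."""
    return [slope * x + intercept for x in computed]

def rmse(predicted, observed):
    """Root-mean-squared error between predicted and observed values."""
    squared = [(p - o) ** 2 for p, o in zip(predicted, observed)]
    return math.sqrt(sum(squared) / len(squared))

if __name__ == "__main__":
    # Hypothetical computed and experimental relative binding energies (kJ/mol).
    ddg_md = [-20.0, -5.0, 0.0, 12.0, 30.0]
    ddg_exp = [-3.1, -0.4, 0.0, 2.3, 4.9]
    ddg_pred = rescale(ddg_md, slope=0.16, intercept=0.2)
    print(ddg_pred, rmse(ddg_pred, ddg_exp))
```

Because the transformation parameters are fitted to the same enzymes that are later scored, an RMSE computed this way reflects goodness of fit rather than out-of-sample predictive accuracy.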
Reporting summary. Further information on research design is available in the Nature Research Reporting Summary linked to this article.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.