Abstract
Automated reaction prediction has the potential to elucidate complex reaction networks for applications ranging from combustion to materials degradation, but computational cost and inconsistent reaction coverage are still obstacles to exploring deep reaction networks. Here we show that cost can be reduced and reaction coverage can be increased simultaneously by relatively straightforward modifications of the reaction enumeration, geometry initialization and transition state convergence algorithms that are common to many prediction methodologies. These components are implemented in the context of yet another reaction program (YARP), our reaction prediction package with which we report reaction discovery benchmarks for organic single-step reactions, thermal degradation of a γ-ketohydroperoxide, and competing ring-closures in a large organic molecule. Compared with recent benchmarks, YARP (re)discovers both established and unreported reaction pathways and products while simultaneously reducing the cost of reaction characterization by nearly 100-fold and increasing convergence of transition states. This combination of ultra-low cost and high reaction coverage creates opportunities to explore the reactivity of larger systems and more complex reaction networks for applications such as chemical degradation, where computational cost is a bottleneck.
Data availability
The authors declare that the data supporting the findings of this study are available within the paper and its supplementary information files. Source data for Figs. 3–6 and Extended Data Figs. 2 and 3 are available in Source Data. Data referenced from other studies were scraped from the manuscripts or supporting information of the indicated publications, including the Zimmerman (ref. 20) and KHP decomposition (ref. 30) datasets. Further raw data sources generated by this work are available at https://doi.org/10.6084/m9.figshare.14766624 (ref. 66), including raw output files and molecular geometries.
Code availability
YARP and a guide to reproducing the results are available through GitHub under the GNU GPL-3.0 License (https://github.com/zhaoqy1996/YARP). The specific version of the package used to generate the results in the current study is archived at https://doi.org/10.5281/zenodo.4947195 (ref. 67).
References
Westbrook, C. K., Mizobuchi, Y., Poinsot, T. J., Smith, P. J. & Warnatz, J. Computational combustion. Proc. Combust. Inst. 30, 125–157 (2005).
Sarathy, S. M. et al. Comprehensive chemical kinetic modeling of the oxidation of 2-methylalkanes from C7 to C20. Combust. Flame 158, 2338–2357 (2011).
Rodrigo, G., Carrera, J., Prather, K. J. & Jaramillo, A. DESHARKY: automatic design of metabolic pathways for optimal cell growth. Bioinformatics 24, 2554–2556 (2008).
Wu, D., Wang, Q., Assary, R. S., Broadbelt, L. J. & Krilov, G. A computational approach to design and evaluate enzymatic reaction pathways: application to 1-butanol production from pyruvate. J. Chem. Inf. Model. 51, 1634–1647 (2011).
Stine, A. et al. Exploring de novo metabolic pathways from pyruvate to propionic acid. Biotechnol. Prog. 32, 303–311 (2016).
Jalan, A., Allen, J. W. & Green, W. H. Chemically activated formation of organic acids in reactions of the Criegee intermediate with aldehydes and ketones. Phys. Chem. Chem. Phys. 15, 16841–16852 (2013).
Rousso, A. C., Hansen, N., Jasper, A. W. & Ju, Y. Identification of the Criegee intermediate reaction network in ethylene ozonolysis: impact on energy conversion strategies and atmospheric chemistry. Phys. Chem. Chem. Phys. 21, 7341–7357 (2019).
Corey, E. J. & Wipke, W. T. Computer-assisted design of complex organic syntheses. Science 166, 178–192 (1969).
Coley, C. W., Green, W. H. & Jensen, K. F. Machine learning in computer-aided synthesis planning. Acc. Chem. Res. 51, 1281–1289 (2018).
Schwaller, P. et al. Predicting retrosynthetic pathways using transformer-based models and a hyper-graph exploration strategy. Chem. Sci. 11, 3316–3325 (2020).
Simm, G. N., Vaucher, A. C. & Reiher, M. Exploration of reaction pathways and chemical transformation networks. J. Phys. Chem. A 123, 385–399 (2018).
Green, W. H. Computer Aided Chemical Engineering Vol. 45, 259–294 (Elsevier, 2019).
Vernuccio, S. & Broadbelt, L. J. Discerning complex reaction networks using automated generators. AIChE J. 65, e16663 (2019).
Coley, C. W. et al. A graph-convolutional neural network model for the prediction of chemical reactivity. Chem. Sci. 10, 370–377 (2019).
Schreck, J. S., Coley, C. W. & Bishop, K. J. Learning retrosynthetic planning through simulated experience. ACS Cent. Sci. 5, 970–981 (2019).
Henkelman, G., Uberuaga, B. P. & Jónsson, H. A climbing image nudged elastic band method for finding saddle points and minimum energy paths. J. Chem. Phys. 113, 9901–9904 (2000).
Zimmerman, P. M. Growing string method with interpolation and optimization in internal coordinates: method and examples. J. Chem. Phys. 138, 184102 (2013).
Birkholz, A. B. & Schlegel, H. B. Path optimization by a variational reaction coordinate method. I. Development of formalism and algorithms. J. Chem. Phys. 143, 244101 (2015).
Behn, A., Zimmerman, P. M., Bell, A. T. & Head-Gordon, M. Efficient exploration of reaction paths via a freezing string method. J. Chem. Phys. 135, 224108 (2011).
Zimmerman, P. M. Reliable transition state searches integrated with the growing string method. J. Chem. Theory Comput. 9, 3043–3050 (2013).
Martínez, T. J. Ab initio reactive computer aided molecular design. Acc. Chem. Res. 50, 652–656 (2017).
Dewyer, A. L., Argüelles, A. J. & Zimmerman, P. M. Methods for exploring reaction space in molecular systems. Wiley Interdiscip. Rev. Comput. Mol. Sci. 8, e1354 (2018).
Unsleber, J. P. & Reiher, M. The exploration of chemical reaction networks. Annu. Rev. Phys. Chem. 71, 121–142 (2020).
Luo, Y., Maeda, S. & Ohno, K. Automated exploration of stable isomers of H+(H2O)n (n = 5–7) via ab initio calculations: an application of the anharmonic downward distortion following algorithm. J. Comput. Chem. 30, 952–961 (2009).
Maeda, S., Taketsugu, T. & Morokuma, K. Exploring transition state structures for intramolecular pathways by the artificial force induced reaction method. J. Comput. Chem. 35, 166–173 (2014).
Maeda, S., Harabuchi, Y., Takagi, M., Taketsugu, T. & Morokuma, K. Artificial force induced reaction (AFIR) method for exploring quantum chemical potential energy surfaces. Chem. Rec. 16, 2232–2248 (2016).
Shang, C. & Liu, Z. P. Stochastic surface walking method for structure prediction and pathway searching. J. Chem. Theory Comput. 9, 1838–1845 (2013).
Zimmerman, P. M. Automated discovery of chemically reasonable elementary reaction steps. J. Comput. Chem. 34, 1385–1392 (2013).
Suleimanov, Y. V. & Green, W. H. Automated discovery of elementary chemical reaction steps using freezing string and Berny optimization methods. J. Chem. Theory Comput. 11, 4248–4259 (2015).
Grambow, C. A. et al. Unimolecular reaction pathways of a γ-ketohydroperoxide from combined application of automated reaction discovery methods. J. Am. Chem. Soc. 140, 1035–1048 (2018).
Broadbelt, L. J., Stark, S. M. & Klein, M. T. Computer generated pyrolysis modeling: on-the-fly generation of species, reactions, and rates. Ind. Eng. Chem. Res. 33, 790–799 (1994).
Gao, C. W., Allen, J. W., Green, W. H. & West, R. H. Reaction mechanism generator: automatic construction of chemical kinetic mechanisms. Comput. Phys. Commun. 203, 212–225 (2016).
Van de Vijver, R. & Zádor, J. KinBot: automated stationary point search on potential energy surfaces. Comput. Phys. Commun. 248, 106947 (2020).
Bergeler, M., Simm, G. N., Proppe, J. & Reiher, M. Heuristics-guided exploration of reaction mechanisms. J. Chem. Theory Comput. 11, 5712–5722 (2015).
Puripat, M. et al. The Biginelli reaction is a urea-catalyzed organocatalytic multicomponent reaction. J. Org. Chem. 80, 6959–6967 (2015).
Ludwig, J. R., Zimmerman, P. M., Gianino, J. B. & Schindler, C. S. Iron(III)-catalysed carbonyl–olefin metathesis. Nature 533, 374–379 (2016).
Dewyer, A. L. & Zimmerman, P. M. Simulated mechanism for palladium-catalyzed, directed γ-arylation of piperidine. ACS Catal. 7, 5466–5477 (2017).
Jacobson, L. D. et al. Automated transition state search and its application to diverse types of organic reactions. J. Chem. Theory Comput. 13, 5780–5797 (2017).
Yang, M., Zou, J., Wang, G. & Li, S. Automatic reaction pathway search via combined molecular dynamics and coordinate driving method. J. Phys. Chem. A 121, 1351–1361 (2017).
Lu, T. & Law, C. K. Toward accommodating realistic fuel chemistry in large-scale computations. Prog. Energy Combust. Sci. 35, 192–215 (2009).
Van de Vijver, R. et al. Automatic mechanism and kinetic model generation for gas-and solution-phase processes: a perspective on best practices, recent advances, and future challenges. Int. J. Chem. Kinet. 47, 199–231 (2015).
Grimme, S., Bannwarth, C. & Shushkov, P. A robust and accurate tight-binding quantum chemical method for structures, vibrational frequencies, and noncovalent interactions of large molecular systems parametrized for all spd-block elements (Z = 1–86). J. Chem. Theory Comput. 13, 1989–2009 (2017).
Bannwarth, C., Ehlert, S. & Grimme, S. GFN2-xTB: an accurate and broadly parametrized self-consistent tight-binding quantum chemical method with multipole electrostatics and density-dependent dispersion contributions. J. Chem. Theory Comput. 15, 1652–1671 (2019).
Maeda, S. & Harabuchi, Y. On benchmarking of automated methods for performing exhaustive reaction path search. J. Chem. Theory Comput. 15, 2111–2115 (2019).
Jalan, A. et al. New pathways for formation of acids and carbonyl products in low-temperature oxidation: the Korcek decomposition of γ-ketohydroperoxides. J. Am. Chem. Soc. 135, 11100–11114 (2013).
Pracht, P., Bohle, F. & Grimme, S. Automated exploration of the low-energy chemical space with fast quantum chemical methods. Phys. Chem. Chem. Phys. 22, 7169–7192 (2020).
Zhao, Q. & Savoie, B. M. Self-consistent component increment theory for predicting enthalpy of formation. J. Chem. Inf. Model. 60, 2199–2207 (2020).
Tsai, C. J. & Jordan, K. D. Use of an eigenmode method to locate the stationary points on the potential energy surfaces of selected argon and water clusters. J. Phys. Chem. 97, 11227–11237 (1993).
Maeda, S. & Ohno, K. Global mapping of equilibrium and transition structures on potential energy surfaces by the scaled hypersphere search method: applications to ab initio surfaces of formaldehyde and propyne molecules. J. Phys. Chem. A 109, 5742–5753 (2005).
Maeda, S., Ohno, K. & Morokuma, K. Systematic exploration of the mechanism of chemical reactions: the global reaction route mapping (GRRM) strategy using the ADDF and AFIR methods. Phys. Chem. Chem. Phys. 15, 3683–3701 (2013).
Martínez-Núñez, E. An automated method to find transition states using chemical dynamics simulations. J. Comput. Chem. 36, 222–234 (2015).
Yoneda, Y. A computer program package for the analysis, creation, and estimation of generalized reactions: GRACE. I. Generation of elementary reaction network in radical reactions: GRACE (I). Bull. Chem. Soc. Jpn. 52, 8–14 (1979).
Zimmerman, P. M. Navigating molecular space for reaction mechanisms: an efficient, automated procedure. Mol. Simul. 41, 43–54 (2015).
Kim, Y., Kim, J. W., Kim, Z. & Kim, W. Y. Efficient prediction of reaction paths through molecular graph and reaction network analysis. Chem. Sci. 9, 825–835 (2018).
Ugi, I. et al. New applications of computers in chemistry. Angew. Chem. Int. Ed. 18, 111–123 (1979).
Di Maio, F. P. & Lignola, P. G. KING, a kinetic network generator. Chem. Eng. Sci. 47, 2713–2718 (1992).
Baker, J., Kessi, A. & Delley, B. The generation and use of delocalized internal coordinates in geometry optimization. J. Chem. Phys. 105, 192–212 (1996).
Rappé, A. K., Casewit, C. J., Colwell, K. S., Goddard III, W. A. & Skiff, W. M. UFF, a full periodic table force field for molecular mechanics and molecular dynamics simulations. J. Am. Chem. Soc. 114, 10024–10035 (1992).
O’Boyle, N. M. et al. Open Babel: an open chemical toolbox. J. Cheminf. 3, 33 (2011).
Larsen, A. et al. The atomic simulation environment: a Python library for working with atoms. J. Phys. Condens. Matter 29, 273002 (2017).
Melander, M., Laasonen, K. & Jónsson, H. Removing external degrees of freedom from transition-state search methods using quaternions. J. Chem. Theory Comput. 11, 1055–1062 (2015).
Dohm, S., Bursch, M., Hansen, A. & Grimme, S. Semiautomated transition state localization for organometallic complexes with semiempirical quantum chemical methods. J. Chem. Theory Comput. 16, 2002–2012 (2020).
Frisch, M. J. et al. Gaussian 16 Revision C.01 (Gaussian, 2016).
Wang, L. P. & Song, C. Geometry optimization made simple with translation and rotation coordinates. J. Chem. Phys. 144, 214108 (2016).
Aldaz, C., Kammeraad, J. A. & Zimmerman, P. M. Discovery of conical intersection mediated photochemistry with growing string methods. Phys. Chem. Chem. Phys. 20, 27394–27405 (2018).
Zhao, Q. & Savoie, B. M. YARP Dataset (FigShare, 2021); https://doi.org/10.6084/m9.figshare.14766624
Zhao, Q. & Savoie, B. M. YARP: Yet Another Reaction Program (YARP) (Zenodo, 2021); https://doi.org/10.5281/zenodo.4947195
Acknowledgements
The work performed by Q.Z. and B.M.S. was made possible by the Office of Naval Research (ONR) through support provided by the Energetic Materials Program (MURI grant no. N00014-21-1-2476, Program Manager: C. Stoltz). B.M.S. also acknowledges partial support for this work from the Dreyfus Program for Machine Learning in the Chemical Sciences and Engineering and the Purdue Process Safety and Assurance Center. The funders had no role in study design, data collection and analysis, decision to publish or preparation of the manuscript.
Author information
Contributions
Q.Z. and B.M.S. conceived and designed the study. Q.Z. developed tools, performed analysis and wrote the paper. B.M.S. oversaw the project and wrote the paper. All authors reviewed the final manuscript.
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Reviewer recognition statement: Nature Computational Science thanks Cyrille Lavigne, Andreas Hansen and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Handling editor: Jie Pan, in collaboration with the Nature Computational Science team.
Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Extended data
Extended Data Fig. 1 Illustration of Elementary Reaction Steps.
Two cases of the ‘break two bonds and form two bonds’ (b2f2) elementary reaction step (ERS). a, The two bonds involved in the ERS connect four different atoms. b, An atom is shared between the two bonds involved in the ERS.
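The first case in the caption above (two broken bonds spanning four different atoms) can be illustrated with a minimal Python sketch. It assumes a simple undirected bond graph with no bond orders or electron bookkeeping, so it is not YARP's implementation; the function name `b2f2_products` and the integer atom labels are hypothetical. The shared-atom case (b) is skipped because this section does not spell out how its electrons are tracked.

```python
from itertools import combinations


def b2f2_products(bonds):
    """Enumerate case-(a) b2f2 rewirings of a simple bond graph.

    `bonds` is an iterable of 2-tuples of atom labels. Each product is a
    frozenset of bonds with two bonds broken and two new bonds formed.
    """
    graph = frozenset(frozenset(b) for b in bonds)
    products = set()
    # Consider every unordered pair of bonds as the two bonds to break.
    for b1, b2 in combinations(sorted(graph, key=sorted), 2):
        if b1 & b2:
            continue  # shared atom: case (b), not handled in this sketch
        a, b = sorted(b1)
        c, d = sorted(b2)
        # Four distinct atoms allow exactly two ways to re-pair them.
        for new1, new2 in (((a, c), (b, d)), ((a, d), (b, c))):
            formed = {frozenset(new1), frozenset(new2)}
            if formed & graph:
                continue  # bond already present; would require bond orders
            products.add((graph - {b1, b2}) | formed)
    return products


# Two isolated bonds 0-1 and 2-3 can be re-paired in two ways.
print(len(b2f2_products([(0, 1), (2, 3)])))  # 2
```

Representing each product as a frozenset of bonds removes duplicates automatically when different bond pairs lead to the same connectivity. A production enumerator would additionally need bond orders, formal charges and valence checks.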
Extended Data Fig. 2 Timing comparisons for YARP.
Wall times for reaction enumeration, GFN2-xTB/GSM, and Berny optimization with respect to the number of heavy atoms in the reactant. The cases shown here are drawn from the Zimmerman dataset. Berny optimization accounts for 95% to 99% of the total computational cost, while the GSM contributes at most ~5%. All wall times are reported without parallelization (that is, single-core equivalent wall times). Additional timing details are reported in Section 1 of the Supporting Information.
Extended Data Fig. 3 Comparison of b2f2 and b3f3 reaction searches and performance statistics.
Comparison of b2f2 and b3f3 reaction enumeration for the reactants 1,3-butadiene and ethene (17), and isobutene and water (13) from the Zimmerman dataset. a, Number of potential products, b, average number of DFT gradient calls per successful channel, c, the success rates of unique reactions and d, the intended rates of unique reactions. e, Five b3f3 reactions for 17 that exhibit lower activation barriers compared with the lowest barrier b2f2 reaction, including the Diels-Alder reaction (top). Activation energies are reported in kcal/mol and additional technical details for this comparison are reported in Section 2 of the Supporting Information.
Supplementary information
Supplementary Information
Supplementary Figs. 1–6, Tables 1–6 and discussion.
Source data
Source Data Fig. 2
Statistical source data for figure panels.
Source Data Fig. 3
Statistical source data for figure panels.
Source Data Fig. 4
Statistical source data for figure panels.
Source Data Fig. 5
Statistical source data for figure panels.
Source Data Fig. 6
Statistical source data for figure panels.
Source Data Extended Data Fig. 2
Statistical source data for figure panels.
Source Data Extended Data Fig. 3
Statistical source data for figure panels.
About this article
Cite this article
Zhao, Q., Savoie, B.M. Simultaneously improving reaction coverage and computational cost in automated reaction prediction tasks. Nat Comput Sci 1, 479–490 (2021). https://doi.org/10.1038/s43588-021-00101-3
DOI: https://doi.org/10.1038/s43588-021-00101-3