Simultaneously improving reaction coverage and computational cost in automated reaction prediction tasks

A preprint version of the article is available at ChemRxiv.

Abstract

Automated reaction prediction has the potential to elucidate complex reaction networks for applications ranging from combustion to materials degradation, but computational cost and inconsistent reaction coverage are still obstacles to exploring deep reaction networks. Here we show that cost can be reduced and reaction coverage can be increased simultaneously by relatively straightforward modifications of the reaction enumeration, geometry initialization and transition state convergence algorithms that are common to many prediction methodologies. These components are implemented in the context of yet another reaction program (YARP), our reaction prediction package with which we report reaction discovery benchmarks for organic single-step reactions, thermal degradation of a γ-ketohydroperoxide, and competing ring-closures in a large organic molecule. Compared with recent benchmarks, YARP (re)discovers both established and unreported reaction pathways and products while simultaneously reducing the cost of reaction characterization by nearly 100-fold and increasing convergence of transition states. This combination of ultra-low cost and high reaction coverage creates opportunities to explore the reactivity of larger systems and more complex reaction networks for applications such as chemical degradation, where computational cost is a bottleneck.
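Geometry initialization for a double-ended search such as GSM starts from a reactant and a product that share the same atom ordering and a common reference frame. The sketch below is a generic preprocessing step, not necessarily the one YARP uses. It rigidly superimposes a product onto a reactant with the Kabsch algorithm and then builds a naive straight-line Cartesian interpolation between them. The function names `kabsch_align`, `rmsd` and `interpolate_path` are hypothetical. GSM itself grows its string in internal coordinates rather than by Cartesian interpolation.

```python
import numpy as np


def kabsch_align(mobile, target):
    """Rigidly rotate and translate `mobile` (N x 3) onto `target` (N x 3, same atom order)."""
    mobile_c = mobile - mobile.mean(axis=0)
    target_c = target - target.mean(axis=0)
    # SVD of the 3 x 3 cross-covariance matrix
    u, _, vt = np.linalg.svd(mobile_c.T @ target_c)
    # Flip the last axis if the optimal orthogonal matrix would be a reflection
    d = np.sign(np.linalg.det(vt.T @ u.T))
    rot = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    return mobile_c @ rot.T + target.mean(axis=0)


def rmsd(a, b):
    """Root-mean-square deviation between two N x 3 coordinate arrays."""
    return float(np.sqrt(((a - b) ** 2).sum(axis=1).mean()))


def interpolate_path(reactant, product, n_nodes=9):
    """Naive linear Cartesian interpolation between aligned endpoints (illustration only)."""
    product = kabsch_align(product, reactant)
    return [(1.0 - t) * reactant + t * product for t in np.linspace(0.0, 1.0, n_nodes)]
```

Removing the rigid-body offset first keeps the interpolated frames from encoding spurious rotations or translations. In practice, conformer choice for the two endpoints matters at least as much as the alignment itself.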

Fig. 1: Overview of the YARP methodology.
Fig. 2: Overview of YARP performance at predicting single-step organic reactions from the Zimmerman dataset.
Fig. 3: Characterization of sequential and concerted reaction mechanisms discovered by YARP.
Fig. 4: Overview of YARP performance on predicting unimolecular degradation of 3-hydroperoxypropanal.
Fig. 5: Five multistep reaction pathways identified by YARP that exhibit more than 20 kcal mol⁻¹ reduction in activation energy compared with single-step reaction pathways.
Fig. 6: Characterizing competing Diels–Alder ring-closures for a ketothioester.
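Fig. 5 compares multistep pathways with single-step ones by activation energy. The sketch below assumes one simple way to express that comparison, which is our assumption and not a definition taken from the paper: a multistep pathway's effective barrier is its highest transition state measured from the starting reactant. The helper names are hypothetical. The 20 kcal mol⁻¹ default threshold is taken from the Fig. 5 caption.

```python
def effective_barrier(ts_energies, reactant_energy):
    """Highest TS along a pathway, measured from the starting reactant (kcal/mol).

    Assumption: the effective barrier of a multistep pathway is set by its highest TS.
    """
    return max(ts_energies) - reactant_energy


def barrier_reduction(single_step_barrier, ts_energies, reactant_energy):
    """How much lower (kcal/mol) the multistep effective barrier is than the single-step one."""
    return single_step_barrier - effective_barrier(ts_energies, reactant_energy)


def flag_pathways(pathways, single_step_barrier, reactant_energy, threshold=20.0):
    """Names of pathways whose barrier reduction exceeds `threshold` kcal/mol.

    `pathways` maps a (hypothetical) pathway name to the list of its TS energies.
    """
    return [
        name
        for name, ts_energies in pathways.items()
        if barrier_reduction(single_step_barrier, ts_energies, reactant_energy) > threshold
    ]
```

Any energies fed to these helpers in testing are placeholders, not values from the study.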

Data availability

The authors declare that the data supporting the findings of this study are available within the paper and its supplementary information files. Source data for Figs. 3–6 and Extended Data Figs. 2 and 3 are available in Source Data. Data referenced from other studies were extracted from the manuscripts or supporting information of the indicated publications, including the Zimmerman (ref. 20) and KHP decomposition (ref. 30) datasets. Additional raw data generated by this work, including raw output files and molecular geometries, are available at https://doi.org/10.6084/m9.figshare.14766624 (ref. 66).

Code availability

The version of YARP used in this study and a guide to reproducing the results are available on GitHub under the GNU GPL-3.0 License (https://github.com/zhaoqy1996/YARP). The specific version of the package used to generate the results in this study is archived at https://doi.org/10.5281/zenodo.4947195 (ref. 67).

References

  1. Westbrook, C. K., Mizobuchi, Y., Poinsot, T. J., Smith, P. J. & Warnatz, J. Computational combustion. Proc. Combust. Inst. 30, 125–157 (2005).

  2. Sarathy, S. M. et al. Comprehensive chemical kinetic modeling of the oxidation of 2-methylalkanes from C7 to C20. Combust. Flame 158, 2338–2357 (2011).

  3. Rodrigo, G., Carrera, J., Prather, K. J. & Jaramillo, A. DESHARKY: automatic design of metabolic pathways for optimal cell growth. Bioinformatics 24, 2554–2556 (2008).

  4. Wu, D., Wang, Q., Assary, R. S., Broadbelt, L. J. & Krilov, G. A computational approach to design and evaluate enzymatic reaction pathways: application to 1-butanol production from pyruvate. J. Chem. Inf. Model. 51, 1634–1647 (2011).

  5. Stine, A. et al. Exploring de novo metabolic pathways from pyruvate to propionic acid. Biotechnol. Prog. 32, 303–311 (2016).

  6. Jalan, A., Allen, J. W. & Green, W. H. Chemically activated formation of organic acids in reactions of the Criegee intermediate with aldehydes and ketones. Phys. Chem. Chem. Phys. 15, 16841–16852 (2013).

  7. Rousso, A. C., Hansen, N., Jasper, A. W. & Ju, Y. Identification of the Criegee intermediate reaction network in ethylene ozonolysis: impact on energy conversion strategies and atmospheric chemistry. Phys. Chem. Chem. Phys. 21, 7341–7357 (2019).

  8. Corey, E. J. & Wipke, W. T. Computer-assisted design of complex organic syntheses. Science 166, 178–192 (1969).

  9. Coley, C. W., Green, W. H. & Jensen, K. F. Machine learning in computer-aided synthesis planning. Acc. Chem. Res. 51, 1281–1289 (2018).

  10. Schwaller, P. et al. Predicting retrosynthetic pathways using transformer-based models and a hyper-graph exploration strategy. Chem. Sci. 11, 3316–3325 (2020).

  11. Simm, G. N., Vaucher, A. C. & Reiher, M. Exploration of reaction pathways and chemical transformation networks. J. Phys. Chem. A 123, 385–399 (2018).

  12. Green, W. H. Computer Aided Chemical Engineering Vol. 45, 259–294 (Elsevier, 2019).

  13. Vernuccio, S. & Broadbelt, L. J. Discerning complex reaction networks using automated generators. AIChE J. 65, e16663 (2019).

  14. Coley, C. et al. A graph-convolutional neural network model for the prediction of chemical reactivity. Chem. Sci. 10, 370–377 (2019).

  15. Schreck, J. S., Coley, C. W. & Bishop, K. J. Learning retrosynthetic planning through simulated experience. ACS Cent. Sci. 5, 970–981 (2019).

  16. Henkelman, G., Uberuaga, B. P. & Jónsson, H. A climbing image nudged elastic band method for finding saddle points and minimum energy paths. J. Chem. Phys. 113, 9901–9904 (2000).

  17. Zimmerman, P. M. Growing string method with interpolation and optimization in internal coordinates: method and examples. J. Chem. Phys. 138, 184102 (2013).

  18. Birkholz, A. B. & Schlegel, H. B. Path optimization by a variational reaction coordinate method. I. Development of formalism and algorithms. J. Chem. Phys. 143, 244101 (2015).

  19. Behn, A., Zimmerman, P. M., Bell, A. T. & Head-Gordon, M. Efficient exploration of reaction paths via a freezing string method. J. Chem. Phys. 135, 224108 (2011).

  20. Zimmerman, P. M. Reliable transition state searches integrated with the growing string method. J. Chem. Theory Comput. 9, 3043–3050 (2013).

  21. Martínez, T. J. Ab initio reactive computer aided molecular design. Acc. Chem. Res. 50, 652–656 (2017).

  22. Dewyer, A. L., Argüelles, A. J. & Zimmerman, P. M. Methods for exploring reaction space in molecular systems. Wiley Interdiscip. Rev. Comput. Mol. Sci. 8, e1354 (2018).

  23. Unsleber, J. P. & Reiher, M. The exploration of chemical reaction networks. Annu. Rev. Phys. Chem. 71, 121–142 (2020).

  24. Luo, Y., Maeda, S. & Ohno, K. Automated exploration of stable isomers of H+(H2O)n (n = 5–7) via ab initio calculations: an application of the anharmonic downward distortion following algorithm. J. Comput. Chem. 30, 952–961 (2009).

  25. Maeda, S., Taketsugu, T. & Morokuma, K. Exploring transition state structures for intramolecular pathways by the artificial force induced reaction method. J. Comput. Chem. 35, 166–173 (2014).

  26. Maeda, S., Harabuchi, Y., Takagi, M., Taketsugu, T. & Morokuma, K. Artificial force induced reaction (AFIR) method for exploring quantum chemical potential energy surfaces. Chem. Rec. 16, 2232–2248 (2016).

  27. Shang, C. & Liu, Z. P. Stochastic surface walking method for structure prediction and pathway searching. J. Chem. Theory Comput. 9, 1838–1845 (2013).

  28. Zimmerman, P. M. Automated discovery of chemically reasonable elementary reaction steps. J. Comput. Chem. 34, 1385–1392 (2013).

  29. Suleimanov, Y. V. & Green, W. H. Automated discovery of elementary chemical reaction steps using freezing string and Berny optimization methods. J. Chem. Theory Comput. 11, 4248–4259 (2015).

  30. Grambow, C. A. et al. Unimolecular reaction pathways of a γ-ketohydroperoxide from combined application of automated reaction discovery methods. J. Am. Chem. Soc. 140, 1035–1048 (2018).

  31. Broadbelt, L. J., Stark, S. M. & Klein, M. T. Computer generated pyrolysis modeling: on-the-fly generation of species, reactions, and rates. Ind. Eng. Chem. Res. 33, 790–799 (1994).

  32. Gao, C. W., Allen, J. W., Green, W. H. & West, R. H. Reaction mechanism generator: automatic construction of chemical kinetic mechanisms. Comput. Phys. Commun. 203, 212–225 (2016).

  33. Van de Vijver, R. & Zádor, J. KinBot: automated stationary point search on potential energy surfaces. Comput. Phys. Commun. 248, 106947 (2020).

  34. Bergeler, M., Simm, G. N., Proppe, J. & Reiher, M. Heuristics-guided exploration of reaction mechanisms. J. Chem. Theory Comput. 11, 5712–5722 (2015).

  35. Puripat, M. et al. The Biginelli reaction is a urea-catalyzed organocatalytic multicomponent reaction. J. Org. Chem. 80, 6959–6967 (2015).

  36. Ludwig, J. R., Zimmerman, P. M., Gianino, J. B. & Schindler, C. S. Iron(iii)-catalysed carbonyl–olefin metathesis. Nature 533, 374–379 (2016).

  37. Dewyer, A. L. & Zimmerman, P. M. Simulated mechanism for palladium-catalyzed, directed γ-arylation of piperidine. ACS Catal. 7, 5466–5477 (2017).

  38. Jacobson, L. D. et al. Automated transition state search and its application to diverse types of organic reactions. J. Chem. Theory Comput. 13, 5780–5797 (2017).

  39. Yang, M., Zou, J., Wang, G. & Li, S. Automatic reaction pathway search via combined molecular dynamics and coordinate driving method. J. Phys. Chem. A 121, 1351–1361 (2017).

  40. Lu, T. & Law, C. K. Toward accommodating realistic fuel chemistry in large-scale computations. Prog. Energy Combust. Sci. 35, 192–215 (2009).

  41. Van de Vijver, R. et al. Automatic mechanism and kinetic model generation for gas- and solution-phase processes: a perspective on best practices, recent advances, and future challenges. Int. J. Chem. Kinet. 47, 199–231 (2015).

  42. Grimme, S., Bannwarth, C. & Shushkov, P. A robust and accurate tight-binding quantum chemical method for structures, vibrational frequencies, and noncovalent interactions of large molecular systems parametrized for all spd-block elements (Z = 1–86). J. Chem. Theory Comput. 13, 1989–2009 (2017).

  43. Bannwarth, C., Ehlert, S. & Grimme, S. GFN2-xTB: an accurate and broadly parametrized self-consistent tight-binding quantum chemical method with multipole electrostatics and density-dependent dispersion contributions. J. Chem. Theory Comput. 15, 1652–1671 (2019).

  44. Maeda, S. & Harabuchi, Y. On benchmarking of automated methods for performing exhaustive reaction path search. J. Chem. Theory Comput. 15, 2111–2115 (2019).

  45. Jalan, A. et al. New pathways for formation of acids and carbonyl products in low-temperature oxidation: the Korcek decomposition of γ-ketohydroperoxides. J. Am. Chem. Soc. 135, 11100–11114 (2013).

  46. Pracht, P., Bohle, F. & Grimme, S. Automated exploration of the low-energy chemical space with fast quantum chemical methods. Phys. Chem. Chem. Phys. 22, 7169–7192 (2020).

  47. Zhao, Q. & Savoie, B. M. Self-consistent component increment theory for predicting enthalpy of formation. J. Chem. Inf. Model. 60, 2199–2207 (2020).

  48. Tsai, C. J. & Jordan, K. D. Use of an eigenmode method to locate the stationary points on the potential energy surfaces of selected argon and water clusters. J. Phys. Chem. 97, 11227–11237 (1993).

  49. Maeda, S. & Ohno, K. Global mapping of equilibrium and transition structures on potential energy surfaces by the scaled hypersphere search method: applications to ab initio surfaces of formaldehyde and propyne molecules. J. Phys. Chem. A 109, 5742–5753 (2005).

  50. Maeda, S., Ohno, K. & Morokuma, K. Systematic exploration of the mechanism of chemical reactions: the global reaction route mapping (GRRM) strategy using the ADDF and AFIR methods. Phys. Chem. Chem. Phys. 15, 3683–3701 (2013).

  51. Martínez-Núñez, E. An automated method to find transition states using chemical dynamics simulations. J. Comput. Chem. 36, 222–234 (2015).

  52. Yoneda, Y. A computer program package for the analysis, creation, and estimation of generalized reactions: GRACE. I. Generation of elementary reaction network in radical reactions: GRACE (I). Bull. Chem. Soc. Jpn. 52, 8–14 (1979).

  53. Zimmerman, P. M. Navigating molecular space for reaction mechanisms: an efficient, automated procedure. Mol. Simul. 41, 43–54 (2015).

  54. Kim, Y., Kim, J. W., Kim, Z. & Kim, W. Y. Efficient prediction of reaction paths through molecular graph and reaction network analysis. Chem. Sci. 9, 825–835 (2018).

  55. Ugi, I. et al. New applications of computers in chemistry. Angew. Chem. Int. Ed. 18, 111–123 (1979).

  56. Di Maio, F. P. & Lignola, P. G. KING, a kinetic network generator. Chem. Eng. Sci. 47, 2713–2718 (1992).

  57. Baker, J., Kessi, A. & Delley, B. The generation and use of delocalized internal coordinates in geometry optimization. J. Chem. Phys. 105, 192–212 (1996).

  58. Rappé, A. K., Casewit, C. J., Colwell, K. S., Goddard III, W. A. & Skiff, W. M. UFF, a full periodic table force field for molecular mechanics and molecular dynamics simulations. J. Am. Chem. Soc. 114, 10024–10035 (1992).

  59. O’Boyle, N. M. et al. Open Babel: an open chemical toolbox. J. Cheminf. 3, 33 (2011).

  60. Larsen, A. et al. The atomic simulation environment: a Python library for working with atoms. J. Phys. Condens. Matter 29, 273002 (2017).

  61. Melander, M., Laasonen, K. & Jonsson, H. Removing external degrees of freedom from transition-state search methods using quaternions. J. Chem. Theory Comput. 11, 1055–1062 (2015).

  62. Dohm, S., Bursch, M., Hansen, A. & Grimme, S. Semiautomated transition state localization for organometallic complexes with semiempirical quantum chemical methods. J. Chem. Theory Comput. 16, 2002–2012 (2020).

  63. Frisch, M. J. et al. Gaussian 16 Revision C.01 (Gaussian, 2016).

  64. Wang, L. P. & Song, C. Geometry optimization made simple with translation and rotation coordinates. J. Chem. Phys. 144, 214108 (2016).

  65. Aldaz, C., Kammeraad, J. A. & Zimmerman, P. M. Discovery of conical intersection mediated photochemistry with growing string methods. Phys. Chem. Chem. Phys. 20, 27394–27405 (2018).

  66. Zhao, Q. & Savoie, B. YARP Dataset (FigShare, 2021); https://doi.org/10.6084/m9.figshare.14766624

  67. Zhao, Q. & Savoie, B. YARP: Yet Another Reaction Program (YARP) (Zenodo, 2021); https://doi.org/10.5281/zenodo.4947195

Acknowledgements

The work performed by Q.Z. and B.M.S. was supported by the Office of Naval Research (ONR) through the Energetic Materials Program (MURI grant no. N00014-21-1-2476, Program Manager: C. Stoltz). B.M.S. also acknowledges partial support for this work from the Dreyfus Program for Machine Learning in the Chemical Sciences and Engineering and the Purdue Process Safety and Assurance Center. The funders had no role in study design, data collection and analysis, decision to publish or preparation of the manuscript.

Author information

Contributions

Q.Z. and B.M.S. conceived and designed the study. Q.Z. developed the tools, performed the analysis and wrote the paper. B.M.S. oversaw the project and wrote the paper. All authors reviewed the final manuscript.

Corresponding author

Correspondence to Brett M. Savoie.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Reviewer recognition statement: Nature Computational Science thanks Cyrille Lavigne, Andreas Hansen and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Handling editor: Jie Pan, in collaboration with the Nature Computational Science team.

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data figures and tables

Extended Data Fig. 1 Illustration of Elementary Reaction Steps.

Two cases of the ‘break two bonds and form two bonds’ (b2f2) elementary reaction step (ERS). a, The two bonds involved in the ERS connect four different atoms. b, An atom is shared between the two bonds involved in the ERS.
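Extended Data Fig. 1 frames a b2f2 step as an edit to the bond graph. The sketch below enumerates generic "break n, form n" products on such a graph under simplifying assumptions that are ours and not YARP's. All bonds are single. Each atom's bond count must be conserved. Every formed bond must be new. Symmetry-equivalent products are not merged. `enumerate_bnfn` and `bond_counts` are hypothetical names.

```python
from collections import Counter
from itertools import combinations


def bond_counts(bonds):
    """Number of bonds each atom takes part in."""
    counts = Counter()
    for i, j in bonds:
        counts[i] += 1
        counts[j] += 1
    return counts


def enumerate_bnfn(bonds, n=2):
    """Products reachable by breaking n bonds and forming n new bonds.

    `bonds` is an iterable of (i, j) atom-index pairs; each product is returned
    as a frozenset of frozenset({i, j}) bonds.
    """
    bonds = {frozenset(b) for b in bonds}
    products = set()
    for broken in combinations(bonds, n):
        atoms = sorted(set().union(*broken))
        # Only bonds absent from the reactant may be formed
        candidates = [frozenset(p) for p in combinations(atoms, 2) if frozenset(p) not in bonds]
        for formed in combinations(candidates, n):
            # Conserve each atom's bond count (a crude stand-in for valence rules)
            if bond_counts(formed) != bond_counts(broken):
                continue
            products.add(frozenset((bonds - set(broken)) | set(formed)))
    return products
```

For two separate bonds 0–1 and 2–3 (case a), `enumerate_bnfn([(0, 1), (2, 3)])` yields the two partner-swapped products {(0, 2), (1, 3)} and {(0, 3), (1, 2)}. In this single-bond model, the shared-atom case (b) yields nothing. Capturing it would need richer bookkeeping, such as bond orders, which this sketch omits. Passing `n=3` gives the analogous b3f3 enumeration.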

Extended Data Fig. 2 Timing comparisons for YARP.

Wall times for reaction enumeration, GFN2-xTB/GSM and Berny optimization as a function of the number of heavy atoms in the reactant. The cases shown here are drawn from the Zimmerman dataset. Berny optimization accounts for 95–99% of the total cost, whereas GSM contributes at most ~5%. All wall times are reported without parallelization (that is, as single-core-equivalent wall times). Additional timing details are reported in Section 1 of the Supplementary Information.

Source data
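The percentages in Extended Data Fig. 2 are each stage's share of the summed single-core-equivalent wall time. A minimal sketch of that bookkeeping follows. It assumes that a job run on `n_cores` cores counts as `n_cores` times its wall time. That quantity is the core-seconds consumed, and it equals the serial run time only under ideal parallel scaling. The stage labels are hypothetical.

```python
def single_core_equivalent(wall_time_s, n_cores):
    """Core-seconds consumed; equals serial time only under ideal parallel scaling."""
    return wall_time_s * n_cores


def stage_shares(stage_times):
    """Fraction of the total cost spent in each stage, given single-core-equivalent times."""
    total = sum(stage_times.values())
    if total <= 0:
        raise ValueError("total time must be positive")
    return {stage: t / total for stage, t in stage_times.items()}
```

Any numbers passed to these helpers when trying them out are placeholders, not timings from the study.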

Extended Data Fig. 3 Comparison of b2f2 and b3f3 reaction searches and performance statistics.

Comparison of b2f2 and b3f3 reaction enumeration for the reactants 1,3-butadiene and ethene (17), and isobutene and water (13), from the Zimmerman dataset. a, Number of potential products; b, average number of DFT gradient calls per successful channel; c, success rates of unique reactions; d, intended rates of unique reactions. e, Five b3f3 reactions for 17 that exhibit lower activation barriers than the lowest-barrier b2f2 reaction, including the Diels–Alder reaction (top). Activation energies are reported in kcal mol⁻¹. Additional technical details for this comparison are reported in Section 2 of the Supplementary Information.

Source data
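Panels b–d of Extended Data Fig. 3 summarize attempted reaction channels, but the caption does not spell out the exact definitions. The sketch below therefore assumes the following rules:

- A channel succeeds when its TS optimization converges.
- A unique reaction succeeds if any of its channels succeeds.
- A unique reaction counts as "intended" if a converged TS connects the intended reactant and product, for example as judged by IRC.
- Cost is the total number of DFT gradient calls divided by the number of successful channels.

The `Channel` record and `summarize` function are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Channel:
    """One attempted reaction channel (hypothetical record layout)."""
    reaction: str        # identifier of the unique reaction this channel targets
    gradient_calls: int  # DFT gradient evaluations spent on the channel
    ts_converged: bool   # transition-state optimization converged
    intended: bool       # converged TS links the intended reactant and product


def summarize(channels):
    """Success rate, intended rate (per unique reaction) and gradient calls per successful channel."""
    reactions = {c.reaction for c in channels}
    succeeded = {c.reaction for c in channels if c.ts_converged}
    intended = {c.reaction for c in channels if c.ts_converged and c.intended}
    n_success_channels = sum(c.ts_converged for c in channels)
    total_calls = sum(c.gradient_calls for c in channels)
    return {
        "success_rate": len(succeeded) / len(reactions) if reactions else 0.0,
        "intended_rate": len(intended) / len(reactions) if reactions else 0.0,
        "calls_per_success": (
            total_calls / n_success_channels if n_success_channels else float("inf")
        ),
    }
```

Charging the gradient calls of failed channels to the successful ones reflects the true price of each characterized reaction. That choice is itself an assumption about how panel b was computed.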

Supplementary information

Supplementary Information

Supplementary Figs. 1–6, Tables 1–6 and discussion.

Source data

Source Data Fig. 2

Statistical source data for figure panels.

Source Data Fig. 3

Statistical source data for figure panels.

Source Data Fig. 4

Statistical source data for figure panels.

Source Data Fig. 5

Statistical source data for figure panels.

Source Data Fig. 6

Statistical source data for figure panels.

Source Data Extended Data Fig. 2

Statistical source data for figure panels.

Source Data Extended Data Fig. 3

Statistical source data for figure panels.

About this article

Cite this article

Zhao, Q., Savoie, B.M. Simultaneously improving reaction coverage and computational cost in automated reaction prediction tasks. Nat Comput Sci 1, 479–490 (2021). https://doi.org/10.1038/s43588-021-00101-3

  • DOI: https://doi.org/10.1038/s43588-021-00101-3
