
Organic reactivity from mechanism to machine learning

A Publisher Correction to this article was published on 22 March 2021

This article has been updated

Abstract

As more data are used to build models of chemical reactivity, the mechanistic component can be reduced until ‘big data’ applications are reached. These methods no longer depend on underlying mechanistic hypotheses and may instead learn them implicitly through training on extensive data. Reactivity models often focus on reaction barriers, but they can also be trained to predict lab-relevant properties, such as yields or conditions, directly. Calculations with a quantum-mechanical component are still preferred for quantitative predictions of reactivity. Although big data applications tend to be more qualitative, they have the advantage of being broadly applicable to different kinds of reactions. Between these extremes lies a continuum of methods, such as those that use quantum-derived data or descriptors in machine learning models. Here, we present an overview of recent machine learning applications in the field of chemical reactivity from a mechanistic perspective. Starting with a summary of how reactivity questions are addressed by quantum-mechanical methods, we discuss methods that augment or replace quantum-based modelling with faster alternatives relying on machine learning.
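Transition state theory links a reaction barrier to a lab-relevant timescale: the Eyring equation gives a rate constant k = (k_B·T/h)·exp(−ΔG‡/RT). The minimal sketch below, assuming a transmission coefficient of 1 and first-order kinetics, converts a free-energy barrier in kJ/mol into a rate constant and half-life, and converts the barrier difference ΔΔG‡ between two competing pathways into a rate ratio (a simple estimate of selectivity under kinetic control). The function names are illustrative and are not taken from the article.

```python
import math

# Physical constants (exact SI values)
K_B = 1.380649e-23      # Boltzmann constant, J/K
H = 6.62607015e-34      # Planck constant, J*s
R = 8.314462618         # molar gas constant, J/(mol*K)


def eyring_rate(delta_g_act_kj: float, temperature: float = 298.15) -> float:
    """Rate constant (s^-1) from a free-energy barrier in kJ/mol.

    Assumes a transmission coefficient of 1 (plain Eyring equation).
    """
    prefactor = K_B * temperature / H
    return prefactor * math.exp(-delta_g_act_kj * 1000.0 / (R * temperature))


def half_life(k: float) -> float:
    """Half-life (s) of a first-order reaction with rate constant k (s^-1)."""
    return math.log(2) / k


def selectivity_ratio(delta_delta_g_kj: float, temperature: float = 298.15) -> float:
    """Ratio k_fast / k_slow for two paths whose barriers differ by ddG (kJ/mol).

    The Eyring prefactors cancel, leaving a Boltzmann-type factor.
    """
    return math.exp(delta_delta_g_kj * 1000.0 / (R * temperature))


k = eyring_rate(90.0)
print(f"k = {k:.2e} s^-1, t1/2 = {half_life(k):.0f} s")
print(f"rate ratio for ddG = 5.7 kJ/mol: {selectivity_ratio(5.7):.1f}")
```

Under these assumptions, a 90 kJ/mol barrier at 298.15 K gives k of roughly 1 × 10⁻³ s⁻¹ (a half-life of about 11 minutes), and every ~5.7 kJ/mol (~1.36 kcal/mol) of ΔΔG‡ changes the rate ratio by about a factor of ten, so small errors in predicted barriers translate into large errors in predicted selectivity.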


Fig. 1: Approaches for modelling chemical reactivity.
Fig. 2: An example reaction profile of a simple E2 elimination.
Fig. 3: Molecular mechanics methods for generating transition states.
Fig. 4: Reactivity predictions from quantum mechanics data.
Fig. 5: Experiment versus prediction using a descriptor-based model.
Fig. 6: Different types of reaction fingerprints.
Fig. 7: General reactivity models.
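One common type of reaction fingerprint is a difference fingerprint: the feature-count vector of the products minus that of the reactants, so that unchanged parts of the molecules cancel and the reaction centre dominates. The sketch below is a toy illustration under stated assumptions: instead of real substructure features from a cheminformatics toolkit, each molecule is a hand-written list of bond labels (hypothetical placeholders), and the labels are folded into a fixed-length signed vector with a CRC32 hash. `label_difference` and `difference_fingerprint` are names chosen for this example only.

```python
import zlib
from collections import Counter
from typing import Dict, List, Sequence

Molecule = Sequence[str]  # toy stand-in: a molecule as a list of feature labels


def label_difference(reactants: Sequence[Molecule],
                     products: Sequence[Molecule]) -> Dict[str, int]:
    """Net change of each feature label (products minus reactants), zeros dropped."""
    delta = Counter(label for mol in products for label in mol)
    delta.subtract(label for mol in reactants for label in mol)  # keeps negative counts
    return {label: n for label, n in delta.items() if n != 0}


def hashed_counts(molecules: Sequence[Molecule], n_bits: int) -> List[int]:
    """Fold all feature labels of one reaction side into a fixed-length count vector."""
    vec = [0] * n_bits
    for mol in molecules:
        for label in mol:
            # crc32 is stable across runs, unlike Python's salted built-in hash()
            vec[zlib.crc32(label.encode("utf-8")) % n_bits] += 1
    return vec


def difference_fingerprint(reactants: Sequence[Molecule],
                           products: Sequence[Molecule],
                           n_bits: int = 64) -> List[int]:
    """Signed reaction fingerprint: product counts minus reactant counts, per bit."""
    r = hashed_counts(reactants, n_bits)
    p = hashed_counts(products, n_bits)
    return [pi - ri for pi, ri in zip(p, r)]


# Toy E2 elimination: CH3CH2Br + HO- -> CH2=CH2 + H2O + Br-
ethyl_bromide = ["C-C", "C-Br"] + ["C-H"] * 5
hydroxide = ["O-H"]
ethene = ["C=C"] + ["C-H"] * 4
water = ["O-H"] * 2
bromide = ["Br-"]

reactants = [ethyl_bromide, hydroxide]
products = [ethene, water, bromide]
print(label_difference(reactants, products))
# {'C=C': 1, 'C-H': -1, 'O-H': 1, 'Br-': 1, 'C-C': -1, 'C-Br': -1}
print(difference_fingerprint(reactants, products))
```

The unhashed difference makes the bond changes of the elimination explicit (C–C, C–H and C–Br lost; C=C, O–H and bromide gained), while the hashed vector trades that readability for a fixed length suitable as model input, at the cost of possible collisions between distinct labels.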


References

  1. 1.

    Engkvist, O. et al. Computational prediction of chemical reactions: current status and outlook. Drug Discov. Today 23, 1203–1218 (2018).

    CAS  PubMed  Google Scholar 

  2. 2.

    de Almeida, A. F., Moreira, R. & Rodrigues, T. Synthetic organic chemistry driven by artificial intelligence. Nat. Rev. Chem. 3, 589–604 (2019).

    Google Scholar 

  3. 3.

    Struble, T. J. et al. Current and future roles of artificial intelligence in medicinal chemistry synthesis. J. Med. Chem. 63, 8667–8682 (2020).

    CAS  PubMed  PubMed Central  Google Scholar 

  4. 4.

    Coley, C. W., Eyke, N. S. & Jensen, K. F. Autonomous discovery in the chemical sciences part II: outlook. Angew. Chem. Int. Ed. 59, 23414–23436 (2020).

    CAS  Google Scholar 

  5. 5.

    Zahrt, A. F., Athavale, S. V. & Denmark, S. E. Quantitative structure–selectivity relationships in enantioselective catalysis: past, present, and future. Chem. Rev. 120, 1620–1689 (2020).

    CAS  PubMed  Google Scholar 

  6. 6.

    Reid, J. P. & Sigman, M. S. Comparing quantitative prediction methods for the discovery of small-molecule chiral catalysts. Nat. Rev. Chem. 2, 290–305 (2018).

    CAS  Google Scholar 

  7. 7.

    Cramer, C. J. Essentials of Computational Chemistry: Theories and Models 2nd edn (Wiley, 2004).

  8. 8.

    Maskill, H. The Physical Basis of Organic Chemistry (Oxford Univ. Press, 1985).

  9. 9.

    Eyring, H. The activated complex in chemical reactions. J. Chem. Phys. 3, 107–115 (1935).

    CAS  Google Scholar 

  10. 10.

    Clot, E. & Norrby, P.-O. in Innovative Catalysis in Organic Synthesis: Oxidation, Hydrogenation, and C-X Bond Forming Reactions (ed. Andersson, P. G.) (Wiley, 2012).

  11. 11.

    Kozuch, S. & Shaik, S. How to conceptualize catalytic cycles? The energetic span model. Acc. Chem. Res. 44, 101–110 (2011).

    CAS  PubMed  Google Scholar 

  12. 12.

    Plata, R. E. & Singleton, D. A. A case study of the mechanism of alcohol-mediated Morita Baylis–Hillman reactions. The importance of experimental observations. J. Am. Chem. Soc. 137, 3811–3826 (2015).

    CAS  PubMed  PubMed Central  Google Scholar 

  13. 13.

    Jorner, K., Brinck, T., Norrby, P.-O. & Buttar, D. Machine learning meets mechanistic modelling for accurate prediction of experimental activation energies. Chem. Sci. 12, 1163–1175 (2021).

    CAS  Google Scholar 

  14. 14.

    Maeda, S. & Ohno, K. Global mapping of equilibrium and transition structures on potential energy surfaces by the scaled hypersphere search method: applications to ab initio surfaces of formaldehyde and propyne molecules. J. Phys. Chem. A 109, 5742–5753 (2005).

    CAS  PubMed  Google Scholar 

  15. 15.

    Nett, A. J., Zhao, W., Zimmerman, P. M. & Montgomery, J. Highly active nickel catalysts for C–H functionalization identified through analysis of off-cycle intermediates. J. Am. Chem. Soc. 137, 7636–7639 (2015).

    CAS  PubMed  Google Scholar 

  16. 16.

    Hansen, E., Rosales, A. R., Tutkowski, B., Norrby, P.-O. & Wiest, O. Prediction of stereochemistry using Q2MM. Acc. Chem. Res. 49, 996–1005 (2016).

    CAS  PubMed  PubMed Central  Google Scholar 

  17. 17.

    Houk, K. N. & Liu, F. Holy grails for computational organic chemistry and biochemistry. Acc. Chem. Res. 50, 539–543 (2017).

    CAS  PubMed  Google Scholar 

  18. 18.

    Guan, Y., Ingman, V. M., Rooks, B. J. & Wheeler, S. E. AARON: an automated reaction optimizer for new catalysts. J. Chem. Theory Comput. 14, 5249–5261 (2018).

    CAS  PubMed  Google Scholar 

  19. 19.

    Maeda, S., Ohno, K. & Morokuma, K. Systematic exploration of the mechanism of chemical reactions: the global reaction route mapping (GRRM) strategy using the ADDF and AFIR methods. Phys. Chem. Chem Phys 15, 3683–3701 (2013).

    CAS  PubMed  Google Scholar 

  20. 20.

    Bannwarth, C. et al. Extended tight-binding quantum chemistry methods. Wiley Interdiscip. Rev. Comput. Mol. Sci. 11, e1493 (2020).

    Google Scholar 

  21. 21.

    Grimme, S. et al. Fully automated quantum-chemistry-based computation of spin–spin-coupled nuclear magnetic resonance spectra. Angew. Chem. Int. Ed. 56, 14763–14769 (2017).

    CAS  Google Scholar 

  22. 22.

    Koerstz, M., Christensen, A. S., Mikkelsen, K. V., Nielsen, M. B. & Jensen, J. H. High throughput virtual screening of 230 billion molecular solar heat battery candidates. PeerJ Phys. Chem. 3, e16 (2021).

    Google Scholar 

  23. 23.

    Kromann, J. C., Jensen, J. H., Kruszyk, M., Jessing, M. & Jørgensen, M. Fast and accurate prediction of the regioselectivity of electrophilic aromatic substitution reactions. Chem. Sci. 9, 660–665 (2018).

    CAS  PubMed  Google Scholar 

  24. 24.

    Hwang, M. J., Stockfisch, T. P. & Hagler, A. T. Derivation of class II force fields. 2. Derivation and characterization of a class II force field, CFF93, for the alkyl functional group and alkane molecules. J. Am. Chem. Soc. 116, 2515–2525 (1994).

    CAS  Google Scholar 

  25. 25.

    Senftle, T. P. et al. The ReaxFF reactive force-field: development, applications and future directions. NPJ Comput. Mater. 2, 15011 (2016).

    CAS  Google Scholar 

  26. 26.

    Jensen, F. Introduction to Computational Chemistry 3rd edn (Wiley, 2017).

  27. 27.

    Jensen, F. Locating minima on seams of intersecting potential energy surfaces. An application to transition structure modeling. J. Am. Chem. Soc. 114, 1596–1603 (1992).

    CAS  Google Scholar 

  28. 28.

    Eksterowicz, J. E. & Houk, K. N. Transition-state modeling with empirical force fields. Chem. Rev. 93, 2439–2461 (1993).

    CAS  Google Scholar 

  29. 29.

    Åqvist, J. & Warshel, A. Simulation of enzyme reactions using valence bond force fields and other hybrid quantum/classical approaches. Chem. Rev. 93, 2523–2544 (1993).

    Google Scholar 

  30. 30.

    Hartke, B. & Grimme, S. Reactive force fields made simple. Phys. Chem. Chem. Phys. 17, 16715–16718 (2015).

    CAS  PubMed  Google Scholar 

  31. 31.

    Weill, N., Corbeil, C. R., De Schutter, J. W. & Moitessier, N. Toward a computational tool predicting the stereochemical outcome of asymmetric reactions: development of the molecular mechanics-based program ACE and application to asymmetric epoxidation reactions. J. Comput. Chem. 32, 2878–2889 (2011).

    CAS  PubMed  Google Scholar 

  32. 32.

    Sherrod, M. J. & Menger, F. M. “Transition-state modeling” does not always model transition states. J. Am. Chem. Soc. 111, 2611–2613 (1989).

    CAS  Google Scholar 

  33. 33.

    Rosales, A. R. et al. Rapid virtual screening of enantioselective catalysts using CatVS. Nat. Catal. 2, 41–45 (2019).

    CAS  Google Scholar 

  34. 34.

    Rosales, A. R. et al. Transition state force field for the asymmetric redox-relay Heck reaction. J. Am. Chem. Soc. 142, 9700–9707 (2020).

    CAS  PubMed  PubMed Central  Google Scholar 

  35. 35.

    Rosales, A. R. et al. Application of Q2MM to predictions in stereoselective synthesis. Chem. Commun. 54, 8294–8311 (2018).

    CAS  Google Scholar 

  36. 36.

    Burai Patrascu, M. et al. From desktop to benchtop with automated computational workflows for computer-aided design in asymmetric catalysis. Nat. Catal. 3, 574–584 (2020).

    CAS  Google Scholar 

  37. 37.

    Smith, J. S., Isayev, O. & Roitberg, A. E. ANI-1: an extensible neural network potential with DFT accuracy at force field computational cost. Chem. Sci. 8, 3192–3203 (2017).

    CAS  PubMed  PubMed Central  Google Scholar 

  38. 38.

    Smith, J. S., Isayev, O. & Roitberg, A. E. ANI-1, A data set of 20 million calculated off-equilibrium conformations for organic molecules. Sci. Data 4, 170193 (2017).

    CAS  PubMed  PubMed Central  Google Scholar 

  39. 39.

    Smith, J. S., Nebgen, B., Lubbers, N., Isayev, O. & Roitberg, A. E. Less is more: sampling chemical space with active learning. J. Chem. Phys. 148, 241733 (2018).

    PubMed  Google Scholar 

  40. 40.

    Kang, P.-L., Shang, C. & Liu, Z.-P. Glucose to 5-hydroxymethylfurfural: origin of site-selectivity resolved by machine learning based reaction sampling. J. Am. Chem. Soc. 141, 20525–20536 (2019).

    CAS  PubMed  Google Scholar 

  41. 41.

    Grambow, C. A., Pattanaik, L. & Green, W. H. Deep learning of activation energies. J. Phys. Chem. Lett. 11, 2992–2997 (2020).

    CAS  PubMed  PubMed Central  Google Scholar 

  42. 42.

    Grambow, C. A., Pattanaik, L. & Green, W. H. Reactants, products, and transition states of elementary chemical reactions based on quantum chemistry. Sci. Data 7, 137 (2020).

    CAS  PubMed  PubMed Central  Google Scholar 

  43. 43.

    Friederich, P., dos Passos Gomes, G., De Bin, R., Aspuru-Guzik, A. & Balcells, D. Machine learning dihydrogen activation in the chemical space surrounding Vaska’s complex. Chem. Sci. 11, 4584–4601 (2020).

    CAS  PubMed  PubMed Central  Google Scholar 

  44. 44.

    Mulliner, D., Wondrousch, D. & Schuurmann, G. Predicting Michael-acceptor reactivity and toxicity through quantum chemical transition-state calculations. Org. Biomol. Chem. 9, 8400–8412 (2011).

    CAS  PubMed  Google Scholar 

  45. 45.

    Palazzesi, F. et al. Bireactive: a machine-learning model to estimate covalent warhead reactivity. J. Chem. Inf. Model. 60, 2915–2923 (2020).

    CAS  PubMed  Google Scholar 

  46. 46.

    Mortelmans, K. & Zeiger, E. The Ames Salmonella/microsome mutagenicity assay. Mutat. Res. 455, 29–60 (2000).

    CAS  PubMed  Google Scholar 

  47. 47.

    Kuhnke, L., Ter Laak, A. & Goller, A. H. Mechanistic reactivity descriptors for the prediction of Ames mutagenicity of primary aromatic amines. J. Chem. Inf. Model. 59, 668–672 (2019).

    CAS  PubMed  Google Scholar 

  48. 48.

    Finkelmann, A. R., Goller, A. H. & Schneider, G. Site of metabolism prediction based on ab initio derived atom representations. ChemMedChem 12, 606–612 (2017).

    CAS  PubMed  Google Scholar 

  49. 49.

    Rydberg, P., Gloriam, D. E., Zaretzki, J., Breneman, C. & Olsen, L. SMARTCyp: a 2D method for prediction of cytochrome P450-mediated drug metabolism. ACS Med. Chem. Lett. 1, 96–100 (2010).

    CAS  PubMed  PubMed Central  Google Scholar 

  50. 50.

    Rydberg, P., Rostkowski, M., Gloriam, D. E. & Olsen, L. The contribution of atom accessibility to site of metabolism models for cytochromes P450. Mol. Pharm. 10, 1216–1223 (2013).

    CAS  PubMed  Google Scholar 

  51. 51.

    Olsen, L., Montefiori, M., Tran, K. P. & Jørgensen, F. S. SMARTCyp 3.0: enhanced cytochrome P450 site-of-metabolism prediction server. Bioinformatics 35, 3174–3175 (2019).

    CAS  PubMed  Google Scholar 

  52. 52.

    Tomberg, A., Johansson, M. J. & Norrby, P.-O. A predictive tool for electrophilic aromatic substitutions using machine learning. J. Org. Chem. 84, 4695–4703 (2019).

    CAS  PubMed  Google Scholar 

  53. 53.

    Li, X., Zhang, S. Q., Xu, L. C. & Hong, X. Predicting regioselectivity in radical C–H functionalization of heterocycles through machine learning. Angew. Chem. Int. Ed. 59, 13253–13259 (2020).

    CAS  Google Scholar 

  54. 54.

    De, S., Bartók, A. P., Csányi, G. & Ceriotti, M. Comparing molecules and solids across structural and alchemical space. Phys. Chem. Chem. Phys. 18, 13754–13769 (2016).

    CAS  PubMed  Google Scholar 

  55. 55.

    Beker, W., Gajewska, E. P., Badowski, T. & Grzybowski, B. A. Prediction of major regio-, site-, and diastereoisomers in Diels–Alder reactions by using machine-learning: the importance of physically meaningful descriptors. Angew. Chem. Int. Ed. 58, 4515–4519 (2019).

    CAS  Google Scholar 

  56. 56.

    Skoraczyński, G. et al. Predicting the outcomes of organic reactions via machine learning: are current descriptors sufficient? Sci. Rep. 7, 3582 (2017).

    PubMed  PubMed Central  Google Scholar 

  57. 57.

    Muratov, E. N. et al. QSAR without borders. Chem. Soc. Rev. 49, 3525–3564 (2020).

    CAS  PubMed  Google Scholar 

  58. 58.

    Sigman, M. S., Harper, K. C., Bess, E. N. & Milo, A. The development of multidimensional analysis tools for asymmetric catalysis and beyond. Acc. Chem. Res. 49, 1292–1301 (2016).

    CAS  PubMed  Google Scholar 

  59. 59.

    Woods, B. P., Orlandi, M., Huang, C.-Y., Sigman, M. S. & Doyle, A. G. Nickel-catalyzed enantioselective reductive cross-coupling of styrenyl aziridines. J. Am. Chem. Soc. 139, 5688–5691 (2017).

    CAS  PubMed  Google Scholar 

  60. 60.

    Hwang, Y., Jung, H., Lee, E., Kim, D. & Chang, S. Quantitative analysis on two-point ligand modulation of iridium catalysts for chemodivergent C–H amidation. J. Am. Chem. Soc. 142, 8880–8889 (2020).

    PubMed  Google Scholar 

  61. 61.

    Ferreira, M. A. B. et al. Noncovalent interactions drive the efficiency of molybdenum imido alkylidene catalysts for olefin metathesis. J. Am. Chem. Soc. 141, 10788–10800 (2019).

    CAS  PubMed  Google Scholar 

  62. 62.

    Verloop, A., Hoogenstraaten, W. & Tipker, J. in Drug Design Vol. 11 (ed. Ariëns, E. J.) 165–207 (Academic, 1976).

  63. 63.

    Santiago, C. B., Guo, J. Y. & Sigman, M. S. Predictive and mechanistic multivariate linear regression models for reaction development. Chem. Sci. 9, 2398–2412 (2018).

    CAS  PubMed  PubMed Central  Google Scholar 

  64. 64.

    Durand, D. J. & Fey, N. Computational ligand descriptors for catalyst design. Chem. Rev. 119, 6561–6594 (2019).

    CAS  PubMed  Google Scholar 

  65. 65.

    Ravasco, J. M. J. M. & Coelho, J. A. S. Predictive multivariate models for bioorthogonal inverse-electron demand Diels–Alder reactions. J. Am. Chem. Soc. 142, 4235–4241 (2020).

    CAS  PubMed  Google Scholar 

  66. 66.

    Reid, J. P., Proctor, R. S. J., Sigman, M. S. & Phipps, R. J. Predictive multivariate linear regression analysis guides successful catalytic enantioselective Minisci reactions of diazines. J. Am. Chem. Soc. 141, 19178–19185 (2019).

    CAS  PubMed  PubMed Central  Google Scholar 

  67. 67.

    Reid, J. P. & Sigman, M. S. Holistic prediction of enantioselectivity in asymmetric catalysis. Nature 571, 343–348 (2019).

    CAS  PubMed  PubMed Central  Google Scholar 

  68. 68.

    Ahneman, D. T., Estrada, J. G., Lin, S., Dreher, S. D. & Doyle, A. G. Predicting reaction performance in C–N cross-coupling using machine learning. Science 360, 186–190 (2018).

    CAS  PubMed  Google Scholar 

  69. 69.

    Chuang, K. V. & Keiser, M. J. Comment on “Predicting reaction performance in C–N cross-coupling using machine learning”. Science 362, eaat8603 (2018).

    PubMed  Google Scholar 

  70. 70.

    Estrada, J. G., Ahneman, D. T., Sheridan, R. P., Dreher, S. D. & Doyle, A. G. Response to Comment on “Predicting reaction performance in C–N cross-coupling using machine learning”. Science 362, eaat8763 (2018).

    PubMed  Google Scholar 

  71. 71.

    Mayr, H. & Patz, M. Scales of nucleophilicity and electrophilicity: a system for ordering polar organic and organometallic reactions. Angew. Chem. Int. Ed. Engl. 33, 938–957 (1994).

    Google Scholar 

  72. 72.

    Hoffmann, G. et al. Predicting experimental electrophilicities from quantum and topological descriptors: a machine learning approach. J. Comput. Chem. 41, 2124–2136 (2020).

    CAS  Google Scholar 

  73. 73.

    St. John, P. C., Guan, Y., Kim, Y., Kim, S. & Paton, R. S. Prediction of organic homolytic bond dissociation enthalpies at near chemical accuracy with sub-second computational cost. Nat. Commun. 11, 2328 (2020).

    Google Scholar 

  74. 74.

    St John, P. C. et al. Quantum chemical calculations for over 200,000 organic radical species and 40,000 associated closed-shell molecules. Sci. Data 7, 244 (2020).

    Google Scholar 

  75. 75.

    Guan, Y. et al. Regio-selectivity prediction with a machine-learned reaction representation and on-the-fly quantum mechanical descriptors. Chem. Sci. 12, 2198–2208 (2021).

    CAS  Google Scholar 

  76. 76.

    Zahrt, A. F. et al. Prediction of higher-selectivity catalysts by computer-driven workflow and machine learning. Science 363, eaau5631 (2019). A recent example of selectivity prediction with results close to experiment.

    CAS  PubMed  PubMed Central  Google Scholar 

  77. 77.

    Schneider, N., Lowe, D. M., Sayle, R. A. & Landrum, G. A. Development of a novel fingerprint for chemical reactions and its application to large-scale reaction classification and similarity. J. Chem. Inf. Model. 55, 39–53 (2015).

    CAS  PubMed  Google Scholar 

  78. 78.

    Ghiandoni, G. M. et al. Development and application of a data-driven reaction classification model: comparison of an electronic lab notebook and medicinal chemistry literature. J. Chem. Inf. Model. 59, 4167–4187 (2019).

    CAS  PubMed  Google Scholar 

  79. 79.

    Patel, H., Bodkin, M. J., Chen, B. & Gillet, V. J. Knowledge-based approach to de novo design using reaction vectors. J. Chem. Inf. Model. 49, 1163–1184 (2009).

    CAS  PubMed  Google Scholar 

  80. 80.

    Sandfort, F., Strieth-Kalthoff, F., Kühnemund, M., Beecks, C. & Glorius, F. A structure-based platform for predicting chemical reactivity. Chem 6, 1379–1390 (2020).

    CAS  Google Scholar 

  81. 81.

    Duvenaud, D. K. et al. in Advances in Neural Information Processing Systems 28 (eds Cortes, C., Lawrence, N. D., Lee, D. D., Sugiyama, M. & Garnett, R.) 2224–2232 (Curran Associates, 2015).

  82. 82.

    Wei, J. N., Duvenaud, D. & Aspuru-Guzik, A. Neural networks for the prediction of organic chemistry reactions. ACS Cent. Sci. 2, 725–732 (2016).

    CAS  PubMed  PubMed Central  Google Scholar 

  83. 83.

    Schwaller, P. et al. Mapping the space of chemical reactions using attention-based neural networks. Nat. Mach. Intell. 3, 144–152 (2021).

    Google Scholar 

  84. 84.

    Schwaller, P., Vaucher, A. C., Laino, T. & Reymond, J.-L. Prediction of chemical reaction yields using deep learning. Preprint at https://doi.org/10.26434/chemrxiv.12758474.v1 (2020).

  85. 85.

    Yang, K. et al. Analyzing learned molecular representations for property prediction. J. Chem. Inf. Model. 59, 3370–3388 (2019).

    CAS  PubMed  PubMed Central  Google Scholar 

  86. 86.

    Varnek, A., Fourches, D., Hoonakker, F. & Solov’ev, V. P. Substructural fragments: an universal language to encode reactions, molecular and supramolecular structures. J. Comput. Aided Mol. Des. 19, 693–703 (2005). This work introduced the CGR–ISIDA approach used for the reactions and conditions prediction, clustering, similarity searching etc.

    CAS  PubMed  Google Scholar 

  87. 87.

    Fujita, S. Description of organic reactions based on imaginary transition structures. 1. Introduction of new concepts. J. Chem. Inf. Model. 26, 205–212 (1986).

    CAS  Google Scholar 

  88. 88.

    Körner, R. & Apostolakis, J. Automatic determination of reaction mappings and reaction center information. 1. The imaginary transition state energy approach. J. Chem. Inf. Model. 48, 1181–1189 (2008).

    PubMed  Google Scholar 

  89. 89.

    Glavatskikh, M. et al. Predictive models for kinetic parameters of cycloaddition reactions. Mol. Inform. 38, 1800077 (2019).

    CAS  Google Scholar 

  90. 90.

    Madzhidov, T. I. et al. Structure–reactivity relationship in bimolecular elimination reactions based on the condensed graph of a reaction. J. Struct. Chem. 56, 1227–1234 (2016).

    Google Scholar 

  91. 91.

    Gimadiev, T. et al. Bimolecular nucleophilic substitution reactions: predictive models for rate constants and molecular reaction pairs analysis. Mol. Inform. 38, 1800104 (2019).

    Google Scholar 

  92. 92.

    Marcou, G. et al. Expert system for predicting reaction conditions: the Michael reaction case. J. Chem. Inf. Model. 55, 239–250 (2015).

    CAS  PubMed  Google Scholar 

  93. 93.

    Lin, A. I. et al. Automatized assessment of protective group reactivity: a step toward big reaction data analysis. J. Chem. Inf. Model. 56, 2140–2148 (2016).

    CAS  PubMed  Google Scholar 

  94. 94.

    Nugmanov, R. I. et al. CGRtools: python library for molecule, reaction, and condensed graph of reaction processing. J. Chem. Inf. Model. 59, 2516–2521 (2019).

    CAS  PubMed  Google Scholar 

  95. 95.

    Fialkowski, M., Bishop, K. J. M., Chubukov, V. A., Campbell, C. J. & Grzybowski, B. A. Architecture and evolution of organic chemistry. Angew. Chem. Int. Ed. 44, 7263–7269 (2005).

    CAS  Google Scholar 

  96. 96.

    Szymkuć, S. et al. Computer-assisted synthetic planning: the end of the beginning. Angew. Chem. Int. Ed. 55, 5904–5937 (2016).

    Google Scholar 

  97. 97.

    Klucznik, T. et al. Efficient syntheses of diverse, medicinally relevant targets planned by computer and executed in the laboratory. Chem 4, 522–532 (2018).

    CAS  Google Scholar 

  98. 98.

    Tiano, K. Merck acquires Grzybowski scientific inventions to expand chemical synthesis offering. Merck https://www.merckmillipore.com/SE/en/20170505_202234 (2017).

  99. 99.

    Plehiers, P. P., Marin, G. B., Stevens, C. V. & Van Geem, K. M. Automated reaction database and reaction network analysis: extraction of reaction templates using cheminformatics. J. Cheminformatics 10, 11 (2018).

    Google Scholar 

  100. 100.

    Krallinger, M., Rabal, O., Lourenço, A., Oyarzabal, J. & Valencia, A. Information retrieval and text mining technologies for chemistry. Chem. Rev. 117, 7673–7761 (2017).

    CAS  PubMed  Google Scholar 

  101. 101.

    Warr, W. A. A short review of chemical reaction database systems, computer-aided synthesis design, reaction prediction and synthetic feasibility. Mol. Inform. 33, 469–476 (2014).

    CAS  PubMed  Google Scholar 

  102. 102.

    Lowe, D. M. Extraction of Chemical Structures and Reactions from the Literature. Doctor of Philosophy (PhD) thesis, Univ. Cambridge (2012).

  103. 103.

    Zhang, Q.-Y. & Aires-de-Sousa, J. Structure-based classification of chemical reactions without assignment of reaction centers. J. Chem. Inf. Model. 45, 1775–1783 (2005).

    CAS  PubMed  Google Scholar 

  104. 104.

    Carrera, G. V. S. M., Gupta, S. & Aires-de-Sousa, J. Machine learning of chemical reactivity from databases of organic reactions. J. Comput. Mol. Des. 23, 419–429 (2009).

    CAS  Google Scholar 

  105. 105.

    Segler, M. H. S. & Waller, M. P. Neural-symbolic machine learning for retrosynthesis and reaction prediction. Chem. Eur. J. 23, 5966–5971 (2017).

    CAS  PubMed  Google Scholar 

  106. 106.

    Coley, C. W., Barzilay, R., Jaakkola, T. S., Green, W. H. & Jensen, K. F. Prediction of organic reaction outcomes using machine learning. ACS Cent. Sci. 3, 434–443 (2017).

    CAS  PubMed  PubMed Central  Google Scholar 

  107. 107.

    Segler, M. H. S., Preuss, M. & Waller, M. P. Planning chemical syntheses with deep neural networks and symbolic AI. Nature 555, 604–610 (2018). This work introduced a fully data-driven neural network for general reactivity prediction.

    CAS  PubMed  Google Scholar 

  108. 108.

    Schneider, N., Stiefl, N. & Landrum, G. A. What’s what: the (nearly) definitive guide to reaction role assignment. J. Chem. Inf. Model. 56, 2336–2346 (2016).

    CAS  PubMed  Google Scholar 

  109. 109.

    Jaworski, W. et al. Automatic mapping of atoms across both simple and complex chemical reactions. Nat. Commun. 10, 1434 (2019).

    PubMed  PubMed Central  Google Scholar 

  110. 110.

    Schwaller, P., Hoover, B., Reymond, J.-L., Strobelt, H. & Laino, T. Unsupervised attention-guided atom-mapping. Preprint at https://doi.org/10.26434/chemrxiv.12298559.v1 (2020).

  111. 111.

    Kayala, M. A., Azencott, C.-A., Chen, J. H. & Baldi, P. Learning to predict chemical reactions. J. Chem. Inf. Model. 51, 2209–2222 (2011).

    CAS  PubMed  PubMed Central  Google Scholar 

  112. 112.

    Kayala, M. A. & Baldi, P. ReactionPredictor: prediction of complex chemical reactions at the mechanistic level using machine learning. J. Chem. Inf. Model. 52, 2526–2540 (2012).

    CAS  PubMed  Google Scholar 

  113. 113.

    Fooshee, D. et al. Deep learning for chemical reaction prediction. Mol. Syst. Des. Eng. 3, 442–452 (2018).

    CAS  Google Scholar 

  114. 114.

    Sadowski, P., Fooshee, D., Subrahmanya, N. & Baldi, P. Synergies between quantum mechanics and machine learning in reaction prediction. J. Chem. Inf. Model. 56, 2125–2128 (2016).

    CAS  PubMed  Google Scholar 

  115. 115.

    Fujinami, M., Seino, J. & Nakai, H. Quantum chemical reaction prediction method based on machine learning. Bull. Chem. Soc. Jpn. 93, 685–693 (2020).

    CAS  Google Scholar 

  116. 116.

    Jin, W. C., Connor W., Barzilay, R. & Jaakkola, T. in Neural Information Processing Systems (eds Guyon, I., Luxburg, U. V., Bengio, S., Wallach, H., Fergus, R., Vishwanathan, S. & Garnett, R.) 2607–2616 (Curran Associates, 2017).

  117. 117.

    Coley, C. W. et al. A graph-convolutional neural network model for the prediction of chemical reactivity. Chem. Sci. 10, 370–377 (2019).

    CAS  PubMed  Google Scholar 

  118. 118.

    Schwaller, P. & Laino, T. in Machine Learning in Chemistry: Data-Driven Algorithms, Learning Systems, and Predictions Vol. 1326 61–79 (American Chemical Society, 2019).

  119. 119.

    Liu, B. et al. Retrosynthetic reaction prediction using neural sequence-to-sequence models. ACS Cent. Sci. 3, 1103–1113 (2017).

    CAS  PubMed  PubMed Central  Google Scholar 

  120. 120.

    Schwaller, P., Gaudin, T., Lanyi, D., Bekas, C. & Laino, T. “Found in Translation”: predicting outcomes of complex organic chemistry reactions using neural sequence-to-sequence models. Chem. Sci. 9, 6091–6098 (2018).

    CAS  PubMed  PubMed Central  Google Scholar 

  121. 121.

    Schwaller, P. et al. Molecular Transformer: a model for uncertainty-calibrated chemical reaction prediction. ACS Cent. Sci. 5, 1572–1583 (2019). In this work, natural language processing methods were successfully used for general reaction prediction.

    CAS  PubMed  PubMed Central  Google Scholar 

  122. 122.

    Alammar, J. The Illustrated Transformer. J. Alammar http://jalammar.github.io/illustrated-transformer/ (2018).

  123. 123.

    Walker, E. et al. Learning to predict reaction conditions: relationships between solvent, molecular structure, and catalyst. J. Chem. Inf. Model. 59, 3645–3654 (2019).

    CAS  PubMed  PubMed Central  Google Scholar 

  124. 124.

    Gao, H. et al. Using machine learning to predict suitable conditions for organic reactions. ACS Cent. Sci. 4, 1465–1476 (2018).

    CAS  PubMed  PubMed Central  Google Scholar 

  125. 125.

    Coley, C. W., Green, W. H. & Jensen, K. F. Machine learning in computer-aided synthesis planning. Acc. Chem. Res. 51, 1281–1289 (2018).

    CAS  PubMed  Google Scholar 

  126. 126.

    Segler, M. H. S. & Waller, M. P. Modelling chemical reasoning to predict and invent reactions. Chem. Eur. J. 23, 6118–6128 (2017).

    CAS  PubMed  Google Scholar 

  127. 127.

    Gromski, P. S., Henson, A. B., Granda, J. M. & Cronin, L. How to explore chemical space using algorithms and automation. Nat. Rev. Chem. 3, 119–128 (2019).

    Google Scholar 

  128. 128.

    Wang, Z., Zhao, W., Hao, G. & Song, B. Automated synthesis: current platforms and further needs. Drug Discov. Today 25, 2006–2011 (2020).

    CAS  Google Scholar 

  129. 129.

    Nesterov, V., Wieser, M. & Roth, V. J. 3DMolNet: a generative network for molecular structures. Preprint at https://arxiv.org/abs/2010.06477 (2020).

  130. 130.

    Pattanaik, L., Ingraham, J. B., Grambow, C. A. & Green, W. H. Generating transition states of isomerization reactions with deep learning. Phys. Chem. Chem. Phys. 22, 23618–23626 (2020).

    CAS  PubMed  Google Scholar 

  131. 131.

    Smith, J. S. et al. The ANI-1ccx and ANI-1x data sets, coupled-cluster and density functional theory properties for molecules. Sci. Data 7, 134 (2020).

    CAS  PubMed  PubMed Central  Google Scholar 

  132. 132.

    Kammeraad, J. A., Goetz, J., Walker, E. A., Tewari, A. & Zimmerman, P. M. What does the machine learn? Knowledge representations of chemical reactivity. J. Chem. Inf. Model. 60, 1290–1301 (2020).

    CAS  PubMed  PubMed Central  Google Scholar 

  133. 133.

    Herges, R. & Hoock, C. Reaction planning: computer-aided discovery of a novel elimination reaction. Science 255, 711–713 (1992).

    CAS  PubMed  Google Scholar 

  134. 134.

    William, B. et al. Discovery of novel chemical reactions by deep generative recurrent neural network. Sci. Rep. 11, 3178 (2021).

    Google Scholar 

  135. 135.

    Unsleber, J. P. & Reiher, M. The exploration of chemical reaction networks. Annu. Rev. Phys. Chem. 71, 121–142 (2020).

    CAS  PubMed  Google Scholar 

  136. 136.

    Sameera, W. M. C., Maeda, S. & Morokuma, K. Computational catalysis using the artificial force induced reaction method. Acc. Chem. Res. 49, 763–773 (2016).

    CAS  PubMed  Google Scholar 

  137. 137.

    Martínez, T. J. Ab initio reactive computer aided molecular design. Acc. Chem. Res. 50, 652–656 (2017).

    PubMed  Google Scholar 

  138. 138.

    Rappoport, D., Galvin, C. J., Zubarev, D. Y. & Aspuru-Guzik, A. Complex chemical reaction networks from heuristics-aided quantum chemistry. J. Chem. Theory Comput. 10, 897–907 (2014).

    CAS  PubMed  Google Scholar 

  139. 139.

    Bergeler, M., Simm, G. N., Proppe, J. & Reiher, M. Heuristics-guided exploration of reaction mechanisms. J. Chem. Theory Comput. 11, 5712–5722 (2015).

    CAS  PubMed  Google Scholar 

  140. 140.

    Smith, D. G. A. et al. The MolSSI QCArchive project: an open-source platform to compute, organize, and share quantum chemistry data. Wiley Interdiscip. Rev. Comput. Mol. Sci. 11, e1491 (2020).

  141.

    Álvarez-Moreno, M. et al. Managing the computational chemistry big data problem: the ioChem-BD platform. J. Chem. Inf. Model. 55, 95–103 (2014).

  142.

    Rogers, D. & Hahn, M. Extended-connectivity fingerprints. J. Chem. Inf. Model. 50, 742–754 (2010).

  143.

    Jaeger, S., Fulle, S. & Turk, S. Mol2vec: unsupervised machine learning approach with chemical intuition. J. Chem. Inf. Model. 58, 27–35 (2018).

  144.

    Feinberg, E. N. et al. PotentialNet for molecular property prediction. ACS Cent. Sci. 4, 1520–1530 (2018).

  145.

    Coley, C. W., Barzilay, R., Green, W. H., Jaakkola, T. S. & Jensen, K. F. Convolutional embedding of attributed molecular graphs for physical property prediction. J. Chem. Inf. Model. 57, 1757–1772 (2017).

  146.

    Korolev, V., Mitrofanov, A., Korotcov, A. & Tkachenko, V. Graph convolutional neural networks as “general-purpose” property predictors: the universality and limits of applicability. J. Chem. Inf. Model. 60, 22–28 (2020).

  147.

    Kearnes, S., McCloskey, K., Berndl, M., Pande, V. & Riley, P. Molecular graph convolutions: moving beyond fingerprints. J. Comput. Aided Mol. Des. 30, 595–608 (2016).

  148.

    Cawley, G. C. & Talbot, N. L. C. On over-fitting in model selection and subsequent selection bias in performance evaluation. J. Mach. Learn. Res. 11, 2079–2107 (2010).

  149.

    Varma, S. & Simon, R. Bias in error estimation when using cross-validation for model selection. BMC Bioinforma. 7, 91 (2006).

  150.

    Hanser, T., Barber, C., Marchaland, J. F. & Werner, S. Applicability domain: towards a more formal definition. SAR QSAR Environ. Res. 27, 865–881 (2016).

  151.

    Abu-Mostafa, Y. S., Magdon-Ismail, M. & Lin, H. T. Learning from Data: A Short Course (AMLBook.com, 2012).

  152.

    Hastie, T., Tibshirani, R. & Friedman, J. The Elements of Statistical Learning: Data Mining, Inference, and Prediction 2nd edn (Springer, 2009).

  153.

    Harrell, F. E. Regression Modeling Strategies: with Applications to Linear Models, Logistic and Ordinal Regression, and Survival Analysis 2nd edn (Springer, 2015).

  154.

    James, G., Witten, D., Hastie, T. & Tibshirani, R. An Introduction to Statistical Learning: with Applications in R (Springer, 2013).

Acknowledgements

K.J. is a fellow of the AstraZeneca Postdoc Programme.

Author information

Contributions

All authors contributed equally to the preparation of this manuscript.

Corresponding author

Correspondence to Per-Ola Norrby.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Peer review information

Nature Reviews Chemistry thanks the anonymous reviewers for their contribution to the peer review of this work.

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Related links

CGRtools: https://github.com/cimm-kzn/CGRtools

CIMtools: https://github.com/cimm-kzn/CIMtools

Daylight Chemical Information Systems: Fingerprints: https://www.daylight.com/dayhtml/doc/theory/theory.finger.html

Daylight Chemical Information Systems: SMILES: https://www.daylight.com/dayhtml/doc/theory/theory.smiles.html

Daylight Chemical Information Systems: SMARTS: https://www.daylight.com/dayhtml/doc/theory/theory.smarts.html

Dragon Descriptors: https://chm.kode-solutions.net/products_dragon_descriptors.php

IUPAC InChI: http://www.iupac.org/inchi/

Lowe, D. M. Patent Reaction Extractor: https://github.com/dan2097/patent-reaction-extraction

Open Reaction Database: https://ord-schema.readthedocs.io/en/latest/

RDKit: Open-Source Cheminformatics Software: https://www.rdkit.org/

Reaxys: http://www.reaxys.com

Glossary

Density functional theory

(DFT). A quantum-mechanical method based on electron density for simulating molecules and reactions.

Descriptors

The properties of molecules or reactions used as input to train a machine learning model. Also referred to as features.

Semiempirical QM methods

Methods that use the same algorithms as wave function and density functional theory methods, but with approximated, empirically parameterized values for the matrix elements, which makes them much faster.

Domains of applicability

The regions of chemical space within which a model can reliably make predictions.
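How the boundary of the domain is drawn is not fixed by this definition. The following is a minimal sketch of one common family of approaches. It assumes that each molecule or reaction is described by a numeric descriptor vector, and it treats a query as inside the domain if its nearest training example lies within a distance threshold. The Euclidean metric, the function names and the threshold heuristic are illustrative assumptions, not the method of any particular model discussed in this article.

```python
import math

def nearest_distance(x, training_set):
    """Euclidean distance from descriptor vector x to its closest training example."""
    return min(math.dist(x, t) for t in training_set)

def default_threshold(training_set):
    """Heuristic (an assumption): the largest nearest-neighbour distance inside the training set."""
    distances = []
    for i, point in enumerate(training_set):
        others = training_set[:i] + training_set[i + 1:]
        distances.append(nearest_distance(point, others))
    return max(distances)

def in_domain(x, training_set, threshold):
    """Treat x as inside the domain if some training example lies within `threshold` of it."""
    return nearest_distance(x, training_set) <= threshold

train = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
thr = default_threshold(train)            # 1.0 for the corners of a unit square
print(in_domain((0.5, 0.5), train, thr))  # True: close to the training data
print(in_domain((5.0, 5.0), train, thr))  # False: far outside it
```

Predictions for queries that fall outside the threshold would then be flagged as extrapolations rather than trusted.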

Gaussian process regression

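Machine learning algorithm in which the function values at any set of data points are assumed to follow a joint Gaussian distribution, whose covariance is given by a kernel function of the inputs. Delivers both a predicted mean and a variance (an uncertainty estimate) for each prediction.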
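The sketch below illustrates the idea for one-dimensional inputs. It is minimal and assumes a zero prior mean, a squared-exponential (RBF) kernel with unit length scale and unit variance, and a small noise term. These hyperparameters and all function names are illustrative choices. The posterior mean is k*^T K^-1 y and the posterior variance is k(x*, x*) - k*^T K^-1 k*. Here, K is the kernel matrix of the training inputs and k* is the vector of kernel values between the training inputs and the query x*. Both quantities are obtained through a Cholesky factorization of K.

```python
import math

def rbf(a, b, length_scale=1.0, signal_var=1.0):
    """Squared-exponential kernel: prior covariance between function values at a and b."""
    return signal_var * math.exp(-((a - b) ** 2) / (2.0 * length_scale ** 2))

def cholesky(A):
    """Lower-triangular L with A = L L^T (A must be symmetric positive definite)."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(A[i][i] - s) if i == j else (A[i][j] - s) / L[j][j]
    return L

def forward_sub(L, b):
    """Solve L x = b."""
    x = []
    for i in range(len(b)):
        x.append((b[i] - sum(L[i][k] * x[k] for k in range(i))) / L[i][i])
    return x

def back_sub(L, b):
    """Solve L^T x = b."""
    n = len(b)
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (b[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x

def gp_predict(xs, ys, x_star, noise_var=1e-6):
    """Posterior mean and variance at x_star under a zero-mean GP prior."""
    n = len(xs)
    K = [[rbf(xs[i], xs[j]) + (noise_var if i == j else 0.0) for j in range(n)] for i in range(n)]
    L = cholesky(K)
    alpha = back_sub(L, forward_sub(L, ys))                # alpha = K^-1 y
    k_star = [rbf(x, x_star) for x in xs]
    mean = sum(k * a for k, a in zip(k_star, alpha))       # k*^T K^-1 y
    v = forward_sub(L, k_star)
    var = rbf(x_star, x_star) - sum(vi * vi for vi in v)   # k** - k*^T K^-1 k*
    return mean, max(var, 0.0)

xs, ys = [0.0, 1.0, 2.0], [0.0, 1.0, 4.0]
print(gp_predict(xs, ys, 1.0))   # mean close to 1, variance close to 0
print(gp_predict(xs, ys, 10.0))  # mean close to 0, variance close to 1
```

Near a training point, the variance is almost zero. Far from the data, it returns to the prior variance of 1. This behaviour is what makes the variance useful for flagging unreliable predictions.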

Extra tree regressor model

Machine learning algorithm (extremely randomized trees) similar to random forest. Because split thresholds are drawn at random rather than optimized, this method is usually faster to train than a random forest.
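The practical difference lies in how each split is chosen. The sketch below contrasts the two strategies for a single descriptor, assuming a sum-of-squared-errors split criterion. It is a simplified illustration with hypothetical function names, not a full tree implementation. Complete versions are available in libraries; scikit-learn, for example, provides RandomForestRegressor and ExtraTreesRegressor. Scanning every candidate threshold requires one cost evaluation per distinct descriptor value, whereas the randomized split requires only one random draw. In a full extra-trees model, one such random threshold is drawn for each candidate descriptor, and the best of these is kept.

```python
import random

def sse(values):
    """Sum of squared deviations from the mean (0 for an empty list)."""
    if not values:
        return 0.0
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values)

def split_cost(xs, ys, threshold):
    """Total squared error of the two groups produced by the split x <= threshold."""
    left = [y for x, y in zip(xs, ys) if x <= threshold]
    right = [y for x, y in zip(xs, ys) if x > threshold]
    return sse(left) + sse(right)

def best_split(xs, ys):
    """Random-forest style: evaluate every candidate threshold and keep the cheapest."""
    candidates = sorted(set(xs))[:-1]
    return min(candidates, key=lambda t: split_cost(xs, ys, t))

def random_split(xs, rng):
    """Extra-trees style: draw one threshold uniformly between the descriptor's min and max."""
    return rng.uniform(min(xs), max(xs))

xs = [1, 2, 3, 4, 5, 6]
ys = [0, 0, 0, 10, 10, 10]
print(best_split(xs, ys))                  # 3 (a perfect split with cost 0)
print(random_split(xs, random.Random(0)))  # some value between 1 and 6
```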

Random forest

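Machine learning algorithm that builds an ensemble of decision trees, each trained on a bootstrap sample of the data and using a random subset of descriptors at each split. It predicts the value of a new example by combining the predictions of all the trees, for example by averaging them (regression) or by majority vote (classification).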
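The following is a compact sketch of the procedure. It assumes regression with a squared-error split criterion, depth-limited trees and a random subset of descriptors considered at each split. The function names, the default hyperparameters (25 trees, depth 6) and the toy data (y = 2x) are hypothetical. Production code would normally use an established library implementation.

```python
import random
from statistics import mean

def sse(values):
    m = mean(values)
    return sum((v - m) ** 2 for v in values)

def build_tree(X, y, depth, rng, n_features):
    """Greedy regression tree: internal nodes are (feature, threshold, left, right), leaves are means."""
    if depth == 0 or len(set(y)) == 1:
        return mean(y)
    best = None
    for f in rng.sample(range(len(X[0])), n_features):  # random descriptor subset at each split
        for t in sorted({row[f] for row in X})[:-1]:
            left = [i for i, row in enumerate(X) if row[f] <= t]
            right = [i for i, row in enumerate(X) if row[f] > t]
            cost = sse([y[i] for i in left]) + sse([y[i] for i in right])
            if best is None or cost < best[0]:
                best = (cost, f, t, left, right)
    if best is None:  # no usable split (identical descriptors)
        return mean(y)
    _, f, t, left, right = best

    def grow(idx):
        return build_tree([X[i] for i in idx], [y[i] for i in idx], depth - 1, rng, n_features)

    return (f, t, grow(left), grow(right))

def predict_tree(node, x):
    while isinstance(node, tuple):
        f, t, left, right = node
        node = left if x[f] <= t else right
    return node

def fit_forest(X, y, n_trees=25, depth=6, n_features=1, seed=0):
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(X)) for _ in range(len(X))]  # bootstrap: rows sampled with replacement
        forest.append(build_tree([X[i] for i in idx], [y[i] for i in idx], depth, rng, n_features))
    return forest

def predict_forest(forest, x):
    """Regression: average the individual tree predictions."""
    return mean(predict_tree(tree, x) for tree in forest)

X = [[float(i)] for i in range(20)]
y = [2.0 * i for i in range(20)]
forest = fit_forest(X, y)
print(predict_forest(forest, [10.0]))  # close to 20
```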

Sterimol parameters

A set of parameters that describes the steric effects of substituents.

Gradient boosting decision tree model

Machine learning algorithm based on decision trees (see ‘random forest’). The model is built stepwise. Each new tree is fitted to the remaining errors of the current ensemble, and its contribution is scaled by a learning rate. A small learning rate helps to reduce overfitting.
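The following is a minimal sketch for a single descriptor. It assumes squared-error loss, for which each new model is simply fitted to the current residuals, and uses one-split trees (stumps) as the base learners. The names and the learning rate of 0.1 are illustrative assumptions. Because each stump is scaled down by the learning rate, the fit improves gradually over many rounds rather than in a few large steps.

```python
def fit_stump(xs, residuals):
    """Least-squares one-split tree (stump) fitted to the current residuals."""
    best = None
    for t in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= t]
        right = [r for x, r in zip(xs, residuals) if x > t]
        left_mean, right_mean = sum(left) / len(left), sum(right) / len(right)
        cost = (sum((r - left_mean) ** 2 for r in left)
                + sum((r - right_mean) ** 2 for r in right))
        if best is None or cost < best[0]:
            best = (cost, t, left_mean, right_mean)
    _, t, left_mean, right_mean = best
    return lambda x: left_mean if x <= t else right_mean

def gradient_boost(xs, ys, n_rounds, learning_rate=0.1):
    """Start from the mean, then repeatedly add a stump fitted to the residuals, scaled by the learning rate."""
    base = sum(ys) / len(ys)
    predictions = [base] * len(ys)
    stumps = []
    for _ in range(n_rounds):
        # Residuals are proportional to the negative gradient of the squared error.
        residuals = [y - p for y, p in zip(ys, predictions)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        predictions = [p + learning_rate * stump(x) for x, p in zip(xs, predictions)]
    return lambda x: base + learning_rate * sum(s(x) for s in stumps)

def mse(model, xs, ys):
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(ys)

xs = list(range(10))
ys = [x * x for x in xs]
for n_rounds in (0, 10, 200):
    print(n_rounds, mse(gradient_boost(xs, ys, n_rounds), xs, ys))  # training error shrinks
```

With squared-error loss and a learning rate between 0 and 2, the training error cannot increase from one round to the next. Error on held-out data is therefore what reveals when further rounds start to overfit.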

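Receiver operating characteristic

(ROC). Curve of the true positive rate versus the false positive rate of a machine learning classification algorithm, traced out as its decision threshold is varied. The area under the ROC curve (AUC) is often used as a performance metric. An AUC of 1 corresponds to perfect ranking of positives above negatives, and 0.5 corresponds to random guessing.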
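The curve can be computed by sorting the examples by the classifier's score and lowering the decision threshold one score at a time, recording the false and true positive rates at each step. The area under the curve then follows from the trapezoidal rule. The sketch below is minimal, with hypothetical function names and toy labels (1 = positive) and scores.

```python
def roc_curve(labels, scores):
    """(FPR, TPR) points obtained by sweeping the decision threshold from high to low scores."""
    pos = sum(labels)
    neg = len(labels) - pos
    pairs = sorted(zip(scores, labels), reverse=True)
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(pairs):
        threshold = pairs[i][0]
        # Examples with tied scores cross the threshold together.
        while i < len(pairs) and pairs[i][0] == threshold:
            if pairs[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / neg, tp / pos))
    return points

def auc(points):
    """Trapezoidal area under the (FPR, TPR) curve."""
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(points, points[1:]))

labels = [1, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.6]
points = roc_curve(labels, scores)
print(points)       # [(0.0, 0.0), (0.0, 0.5), (0.5, 0.5), (0.5, 1.0), (1.0, 1.0)]
print(auc(points))  # 0.75
```

The value of 0.75 equals the fraction of positive–negative pairs in which the positive example receives the higher score (three of four pairs here). This is a common interpretation of the AUC.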

Support vector machine

(SVM). A machine learning algorithm based on the idea that data points of different classes can be separated by a hyperplane. The model places the hyperplane so as to maximize the margin, that is, the distance between the hyperplane and the nearest data points of each class.
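One standard way to find the maximum-margin hyperplane of a linear SVM is stochastic subgradient descent on the regularized hinge loss (the Pegasos algorithm). The sketch below is a minimal version under simplifying assumptions: a linear kernel, labels of +1 and -1, a hyperplane through the origin (no bias term) and a fixed rather than random pass order. The regularization strength, the number of epochs and the toy data are arbitrary illustrative choices.

```python
def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))

def train_linear_svm(X, y, lam=0.01, epochs=200):
    """Pegasos-style subgradient descent on (lam/2)||w||^2 + mean hinge loss; no bias term."""
    w = [0.0] * len(X[0])
    t = 0
    for _ in range(epochs):
        for x, label in zip(X, y):  # fixed order instead of random sampling, for reproducibility
            t += 1
            eta = 1.0 / (lam * t)   # decreasing step size
            margin = label * dot(w, x)
            w = [(1.0 - eta * lam) * wi for wi in w]  # shrink: gradient of the regularizer
            if margin < 1:  # hinge loss active: move w towards label * x
                w = [wi + eta * label * xi for wi, xi in zip(w, x)]
    return w

def predict(w, x):
    """Which side of the hyperplane w . x = 0 the point lies on."""
    return 1 if dot(w, x) >= 0 else -1

X = [(2, 2), (3, 3), (2, 3), (-2, -2), (-3, -3), (-2, -3)]
y = [1, 1, 1, -1, -1, -1]
w = train_linear_svm(X, y)
print([predict(w, x) for x in X])  # [1, 1, 1, -1, -1, -1]
```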

Deep feed-forward neural network models

A feed-forward neural network, also called a multilayer perceptron, is one of the basic architectures in machine learning. Its input nodes connect to one or more hidden layers of nodes, which, in turn, connect to the output nodes. A network is feed-forward when information flows only from input to output and no output is channelled back into the model, as opposed to recurrent networks. Deep networks have several hidden layers.
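The layer-by-layer flow of information can be illustrated with a tiny network that has two inputs, one hidden layer of two ReLU nodes and one output node. The weights below are hand-chosen, hypothetical values that make the network compute the exclusive-or (XOR) function, which a single linear layer cannot represent. In practice, the weights would be learned from data.

```python
def relu(values):
    return [max(0.0, v) for v in values]

def dense(inputs, weights, biases):
    """Fully connected layer: each output node is a weighted sum of all inputs plus a bias."""
    return [sum(w * x for w, x in zip(row, inputs)) + b for row, b in zip(weights, biases)]

# Hypothetical hand-chosen parameters for a 2-2-1 network that computes XOR.
W_HIDDEN, B_HIDDEN = [[1.0, 1.0], [1.0, 1.0]], [0.0, -1.0]
W_OUT, B_OUT = [[1.0, -2.0]], [0.0]

def forward(x):
    hidden = relu(dense(x, W_HIDDEN, B_HIDDEN))  # input layer -> hidden layer
    return dense(hidden, W_OUT, B_OUT)[0]        # hidden layer -> output; nothing flows back

for x in ([0, 0], [0, 1], [1, 0], [1, 1]):
    print(x, forward(x))  # outputs 0.0, 1.0, 1.0, 0.0
```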

Molecular fingerprints

Molecular representations derived from the molecular connectivity, typically bit vectors that record the presence of substructures or atom environments.
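The following is a simplified sketch in the spirit of extended-connectivity fingerprints. Each heavy atom starts from an identifier built from its element and its number of heavy-atom neighbours. This identifier is then extended over a given radius by appending the neighbours' identifiers. For readability, the sketch keeps the identifiers as strings and only hashes them when folding them into a fixed-length bit vector. Real implementations, such as the Morgan fingerprints in RDKit, use richer atom invariants and integer hashing. The molecule encoding (element list plus bond list, hydrogens implicit), the radius of 2 and the 2,048-bit length are assumptions.

```python
import zlib

def circular_identifiers(atoms, bonds, radius=2):
    """ECFP-like atom environments as strings (a simplification of real integer-hashed ECFP)."""
    neighbours = {i: [] for i in range(len(atoms))}
    for a, b in bonds:
        neighbours[a].append(b)
        neighbours[b].append(a)
    # Radius 0: element symbol plus number of heavy-atom neighbours.
    current = [f"{atoms[i]}{len(neighbours[i])}" for i in range(len(atoms))]
    identifiers = set(current)
    for _ in range(radius):
        # Extend each environment by one bond: append the sorted identifiers of its neighbours.
        current = [current[i] + "(" + ",".join(sorted(current[j] for j in neighbours[i])) + ")"
                   for i in range(len(atoms))]
        identifiers.update(current)
    return identifiers

def fold(identifiers, n_bits=2048):
    """Hash identifiers into a fixed-length bit vector, returned as the set of 'on' bit positions."""
    return {zlib.crc32(s.encode()) % n_bits for s in identifiers}

def tanimoto(a, b):
    """Shared features divided by the total number of distinct features."""
    return len(a & b) / len(a | b)

ethanol = circular_identifiers(["C", "C", "O"], [(0, 1), (1, 2)])  # CCO, hydrogens implicit
methanol = circular_identifiers(["C", "O"], [(0, 1)])             # CO
print(sorted(fold(ethanol)))
print(round(tanimoto(ethanol, methanol), 3))  # 0.154
```

Ethanol and methanol share only their two radius-0 identifiers, a terminal carbon and a terminal oxygen, out of 13 distinct identifiers in total, giving a Tanimoto similarity of 2/13.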

Representations

Machine-readable descriptions of a molecule as, for example, a string of characters, a vector or a graph.

Atom mapping

The labelling of each atom in the reactants with a number that identifies the corresponding atom in the products, for example in a reaction SMILES or reaction SMARTS.
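In reaction SMILES and SMARTS, the map number is written after a colon inside the atom's square brackets, so [CH3:1] on both sides of a reaction denotes the same carbon. The sketch below shows a minimal consistency check on a mapped SN2 reaction. It assumes that every mapped atom is written as a bracket atom with an uppercase element symbol and that the reaction has no agents section. The regular expression is a simplification that ignores aromatic (lowercase) atoms; it is not a full SMILES parser.

```python
import re

# A bracket atom carrying a map number, e.g. [CH3:1] or [Cl-:3]:
# group 1 is the element symbol, group 2 the map number.
MAPPED_ATOM = re.compile(r"\[([A-Z][a-z]?)[^\]]*?:(\d+)\]")

def atom_maps(smiles):
    """Map number -> element symbol for every mapped bracket atom."""
    return {int(number): element for element, number in MAPPED_ATOM.findall(smiles)}

def mapping_is_consistent(reaction_smiles):
    """Each map number must appear on both sides and label the same element."""
    reactants, products = reaction_smiles.split(">>")
    return atom_maps(reactants) == atom_maps(products)

sn2 = "[Cl-:3].[CH3:1][Br:2]>>[Cl:3][CH3:1].[Br-:2]"
print(atom_maps(sn2.split(">>")[0]))  # {3: 'Cl', 1: 'C', 2: 'Br'}
print(mapping_is_consistent(sn2))     # True
```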

Deep learning

The field of machine learning that uses neural networks with many hidden layers.

Templates

Patterns describing a chemical reaction, often represented by reaction SMARTS.

SMARTS

A string representation of a molecular pattern, based on the simplified molecular input line entry system (SMILES). SMARTS are used to define a substructure of a molecule. For example, ethanol can be represented by the SMILES string CCO. To define the alcohol functional group, one can use the SMARTS [#6][OX2H]. Here, each atomic position is enclosed in square brackets and encodes which atom types are allowed at this position: #6 is any carbon atom, and OX2H is an oxygen with two connections, one of which is to a hydrogen.
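To make the meaning of [#6][OX2H] concrete, the sketch below evaluates the two atom primitives by hand on small, hand-built molecules. Each atom carries explicit counts of its connections (SMARTS X) and attached hydrogens (SMARTS H). The Atom record and the function names are hypothetical, and bond types are not checked. In practice, a cheminformatics toolkit would be used instead, for example RDKit's Chem.MolFromSmarts together with HasSubstructMatch.

```python
from dataclasses import dataclass

@dataclass
class Atom:
    atomic_number: int
    connections: int  # SMARTS X: total connections, including attached hydrogens
    hydrogens: int    # SMARTS H: number of attached hydrogens

def matches_carbon(atom):  # [#6]: any carbon, aliphatic or aromatic
    return atom.atomic_number == 6

def matches_hydroxyl_o(atom):  # [OX2H]: oxygen, two connections, one hydrogen
    return atom.atomic_number == 8 and atom.connections == 2 and atom.hydrogens == 1

def has_match(atoms, bonds):
    """True if some bonded pair matches [#6][OX2H]; bond types are not checked."""
    for i, j in bonds:
        for a, b in ((atoms[i], atoms[j]), (atoms[j], atoms[i])):
            if matches_carbon(a) and matches_hydroxyl_o(b):
                return True
    return False

ethanol = ([Atom(6, 4, 3), Atom(6, 4, 2), Atom(8, 2, 1)], [(0, 1), (1, 2)])         # CCO
dimethyl_ether = ([Atom(6, 4, 3), Atom(8, 2, 0), Atom(6, 4, 3)], [(0, 1), (1, 2)])  # COC
acetic_acid = ([Atom(6, 4, 3), Atom(6, 3, 0), Atom(8, 1, 0), Atom(8, 2, 1)],
               [(0, 1), (1, 2), (1, 3)])                                            # CC(=O)O
print(has_match(*ethanol), has_match(*dimethyl_ether), has_match(*acetic_acid))  # True False True
```

The hydroxyl oxygen of acetic acid also matches. This pattern alone therefore does not distinguish alcohols from carboxylic acids.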

Negative reactions

Reactions that give a low or zero yield. These are important for machine learning because the model needs to learn that not all input leads to a product.

Graph convolutional networks

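Neural networks that operate directly on a molecular graph and use convolution to create their own features for learning. In each convolution, an atom's feature vector is updated by aggregating the feature vectors of its bonded neighbours.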
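The following is a minimal sketch of a graph-convolution layer. It assumes sum aggregation over bonded neighbours, one-hot atom-type features and small integer weight matrices chosen by hand; in a real model, these weights are learned. The layer updates each atom v as ReLU(W_self h_v + W_neigh Σ h_u), where the sum runs over the neighbours u of v. Summing the atom vectors then gives a molecule-level vector that does not depend on the order in which the atoms are numbered. The function names and weights are hypothetical.

```python
# Hypothetical weights; in a trained network these would be learned.
W_SELF = [[1, 0], [0, 1]]
W_NEIGH = [[1, 2], [0, 1]]

def relu(v):
    return [max(0, x) for x in v]

def matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def graph_conv_layer(features, neighbours):
    """h_v' = ReLU(W_SELF h_v + W_NEIGH * sum of the neighbours' feature vectors)."""
    updated = []
    for v, h in enumerate(features):
        agg = [sum(features[u][k] for u in neighbours[v]) for k in range(len(h))]
        updated.append(relu([a + b for a, b in zip(matvec(W_SELF, h), matvec(W_NEIGH, agg))]))
    return updated

def encode(features, neighbours, n_layers=2):
    for _ in range(n_layers):
        features = graph_conv_layer(features, neighbours)
    return [sum(column) for column in zip(*features)]  # sum readout over atoms

# Ethanol heavy atoms with one-hot features [is_C, is_O], numbered in two different orders.
order_a = ([[1, 0], [1, 0], [0, 1]], [[1], [0, 2], [1]])  # CH3, CH2, O
order_b = ([[0, 1], [1, 0], [1, 0]], [[1], [0, 2], [1]])  # O, CH2, CH3
print(encode(*order_a), encode(*order_b))  # [24, 5] [24, 5]
```

Both numberings of ethanol produce the same output. This order independence is why sum or mean readouts suit molecules, whose atom numbering is arbitrary.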

About this article

Cite this article

Jorner, K., Tomberg, A., Bauer, C. et al. Organic reactivity from mechanism to machine learning. Nat Rev Chem 5, 240–255 (2021). https://doi.org/10.1038/s41570-021-00260-x
