Bayesian reaction optimization as a tool for chemical synthesis

Abstract

Reaction optimization is fundamental to synthetic chemistry, from optimizing the yield of industrial processes to selecting conditions for the preparation of medicinal candidates [1]. Likewise, parameter optimization is omnipresent in artificial intelligence, from tuning virtual personal assistants to training social media and product recommendation systems [2]. Owing to the high cost associated with carrying out experiments, scientists in both areas set numerous (hyper)parameter values by evaluating only a small subset of the possible configurations. Bayesian optimization, an iterative response surface-based global optimization algorithm, has demonstrated exceptional performance in the tuning of machine learning models [3]. Bayesian optimization has also been recently applied in chemistry [4–9]; however, its application and assessment for reaction optimization in synthetic chemistry have not been investigated. Here we report the development of a framework for Bayesian reaction optimization and an open-source software tool that allows chemists to easily integrate state-of-the-art optimization algorithms into their everyday laboratory practices. We collect a large benchmark dataset for a palladium-catalysed direct arylation reaction, perform a systematic study of Bayesian optimization compared to human decision-making in reaction optimization, and apply Bayesian optimization to two real-world optimization efforts (Mitsunobu and deoxyfluorination reactions). Benchmarking is accomplished via an online game that links the decisions made by expert chemists and engineers to real experiments run in the laboratory. Our findings demonstrate that Bayesian optimization outperforms human decision-making in both average optimization efficiency (number of experiments) and consistency (variance of outcome against initially available data). Overall, our studies suggest that adopting Bayesian optimization methods into everyday laboratory practices could facilitate more efficient synthesis of functional chemicals by enabling better-informed, data-driven decisions about which experiments to run.
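
To make the phrase "iterative response surface-based global optimization" concrete, the following is a minimal sketch of one common form of Bayesian optimization over a finite set of reaction conditions. It is not the EDBO implementation. The Gaussian process surrogate with a squared-exponential kernel, the expected-improvement acquisition function, the two-dimensional grid of conditions and the `toy_yield` function standing in for a laboratory experiment are all assumptions chosen for illustration.

```python
import math
import random


def rbf(a, b, length=0.3):
    """Squared-exponential kernel between two condition vectors."""
    d2 = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-0.5 * d2 / length ** 2)


def cholesky(A):
    """Lower-triangular L with L L^T = A (A symmetric positive definite)."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = A[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L


def solve_lower(L, b):
    """Forward substitution: solve L y = b."""
    y = []
    for i, bi in enumerate(b):
        y.append((bi - sum(L[i][k] * y[k] for k in range(i))) / L[i][i])
    return y


def solve_upper(L, y):
    """Back substitution: solve L^T x = y."""
    n = len(y)
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x


def fit_gp(X, y, noise=1e-4):
    """Fit a constant-mean GP surrogate; return predict(x) -> (mean, std)."""
    K = [[rbf(a, b) + (noise if i == j else 0.0) for j, b in enumerate(X)]
         for i, a in enumerate(X)]
    L = cholesky(K)
    mean_y = sum(y) / len(y)
    alpha = solve_upper(L, solve_lower(L, [v - mean_y for v in y]))

    def predict(x_new):
        k = [rbf(a, x_new) for a in X]
        mu = mean_y + sum(ki * ai for ki, ai in zip(k, alpha))
        v = solve_lower(L, k)
        var = max(1.0 - sum(vi * vi for vi in v), 1e-12)
        return mu, math.sqrt(var)

    return predict


def expected_improvement(mu, sigma, best):
    """Expected improvement over the best yield observed so far."""
    z = (mu - best) / sigma
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return (mu - best) * cdf + sigma * pdf


def toy_yield(x):
    """Hypothetical stand-in for a laboratory experiment (fractional yield)."""
    return math.exp(-((x[0] - 0.7) ** 2 + (x[1] - 0.3) ** 2) / 0.05)


def optimize(candidates, run_experiment, n_init=3, budget=15, seed=0):
    """Sequential Bayesian optimization over a finite list of conditions."""
    rng = random.Random(seed)
    tried = rng.sample(candidates, n_init)      # random initial experiments
    yields = [run_experiment(x) for x in tried]
    while len(tried) < budget:
        predict = fit_gp(tried, yields)          # update the surrogate
        best = max(yields)
        remaining = [x for x in candidates if x not in tried]
        x_next = max(remaining,
                     key=lambda x: expected_improvement(*predict(x), best))
        tried.append(x_next)                     # "run" the chosen experiment
        yields.append(run_experiment(x_next))
    return tried, yields


if __name__ == "__main__":
    grid = [(i / 10, j / 10) for i in range(11) for j in range(11)]
    tried, yields = optimize(grid, toy_yield)
    print(f"best yield {max(yields):.2f} after {len(tried)} experiments")
```

In this sketch each iteration refits the surrogate to all results so far and selects the single untested condition with the highest expected improvement, which is how the loop trades off exploring uncertain regions against exploiting promising ones. Real reaction spaces are largely categorical (ligands, bases, solvents), so a practical tool would need an encoding of those choices; the numeric grid here sidesteps that step.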

Fig. 1: Bayesian reaction optimization.
Fig. 2: Training data used to select Bayesian optimizer parameters.
Fig. 3: Balancing exploration and exploitation in reaction optimization.
Fig. 4: Statistical validation of Bayesian reaction optimization.
Fig. 5: Applications of Bayesian reaction optimization.

Data availability

Quantum mechanical computation data and Gaussian output files used to parameterize reactions 1–5 are available at https://github.com/b-shields/auto-QChem. Processed reaction outcome data for reactions 1–5 are available at https://github.com/b-shields/edbo and in our published Code Ocean capsule at https://doi.org/10.24433/CO.3864629.v1. Tabulated player data for the reaction optimization game are available at https://github.com/b-shields/EvML.

Code availability

Two software packages and one web application were written to support this work. The first, auto-qchem, was written to facilitate high-throughput computational chemistry and reaction featurization. This package is freely available at https://github.com/b-shields/auto-QChem. The second, EDBO, was written as a user-friendly implementation of Bayesian optimization. This package is freely available at https://github.com/b-shields/edbo and in our published Code Ocean capsule at https://doi.org/10.24433/CO.3864629.v1. The web application, EvML, was written to collect user data for comparison of Bayesian optimization with human expert performance. This application is freely available at https://github.com/b-shields/EvML.

References

  1. Carlson, R. Design and Optimization in Organic Synthesis (Elsevier, 1992).
  2. Luo, G. A review of automatic selection methods for machine learning algorithms and hyper-parameter values. Netw. Model. Anal. Health Inform. Bioinform. 5, 18 (2016).
  3. Snoek, J., Larochelle, H. & Adams, R. P. Practical Bayesian optimization of machine learning algorithms. In Advances in Neural Information Processing Systems Vol. 25 (eds Pereira, F. et al.) 2951–2959 (Curran Associates Inc., 2012).
  4. Häse, F., Roch, L. M., Kreisbeck, C. & Aspuru-Guzik, A. Phoenics: a Bayesian Optimizer for Chemistry. ACS Cent. Sci. 4, 1134–1145 (2018).
  5. Griffiths, R.-R. & Hernández-Lobato, J. M. Constrained Bayesian optimization for automatic chemical design using variational autoencoders. Chem. Sci. 11, 577–586 (2020).
  6. Schweidtmann, A. M. et al. Machine learning meets continuous flow chemistry: automated optimization towards the Pareto front of multiple objectives. Chem. Eng. J. 352, 277–282 (2018).
  7. Burger, B. et al. A mobile robotic chemist. Nature 583, 237–241 (2020).
  8. Häse, F., Roch, L. M. & Aspuru-Guzik, A. Gryffin: an algorithm for Bayesian optimization for categorical variables informed by physical intuition with applications to chemistry. Preprint at https://arxiv.org/abs/2003.12127 (2020).
  9. Negoescu, D. M., Frazier, P. I. & Powell, W. B. The knowledge-gradient algorithm for sequencing experiments in drug discovery. INFORMS J. Comput. 23, 346–363 (2011).
  10. Santanilla, A. B. et al. Nanomole-scale high-throughput chemistry for the synthesis of complex molecules. Science 347, 49–53 (2014).
  11. Clayton, A. D. et al. Algorithms for the self-optimisation of chemical reactions. React. Chem. Eng. 4, 1545–1554 (2019).
  12. Häse, F., Roch, L. M. & Aspuru-Guzik, A. Next-generation experimentation with self-driving laboratories. Trends Chem. 1, 282–291 (2019).
  13. Weissman, S. A. & Anderson, N. G. Design of experiments (DoE) and process optimization. A review of recent publications. Org. Process Res. Dev. 19, 1605–1633 (2015).
  14. Lee, R. Statistical design of experiments for screening and optimization. Chem. Ing. Tech. 91, 191–200 (2019).
  15. Murray, P. M. et al. The application of design of experiments (DoE) reaction optimisation and solvent selection in the development of new synthetic chemistry. Org. Biomol. Chem. 14, 2373–2384 (2016).
  16. Hsieh, H.-W., Coley, C. W., Baumgartner, L. M., Jensen, K. F. & Robinson, R. I. Photoredox iridium–nickel dual-catalyzed decarboxylative arylation cross-coupling: from batch to continuous flow via self-optimizing segmented flow reactor. Org. Process Res. Dev. 22, 542–550 (2018).
  17. Mateos, C., Nieves-Remacha, M. J. & Rincón, J. A. Automated platforms for reaction self-optimization in flow. React. Chem. Eng. 4, 1536–1544 (2019).
  18. Feurer, M. & Hutter, F. in Automated Machine Learning: Methods, Systems, Challenges (eds Hutter, F. et al.) 3–33 (Springer, 2019).
  19. Shahriari, B., Swersky, K., Wang, Z., Adams, R. P. & de Freitas, N. Taking the human out of the loop: a review of Bayesian optimization. Proc. IEEE 104, 148–175 (2016).
  20. Maceiczyk, R. M. & deMello, A. J. Fast and reliable metamodeling of complex reaction spaces using Universal Kriging. J. Phys. Chem. C 118, 20026–20033 (2014).
  21. Rogers, A. & Ierapetritou, M. Feasibility and flexibility analysis of black-box processes part 1: surrogate-based feasibility analysis. Chem. Eng. Sci. 137, 986–1004 (2015).
  22. Boukouvala, F. & Ierapetritou, M. G. Feasibility analysis of black-box processes using an adaptive sampling Kriging-based method. Comput. Chem. Eng. 36, 358–368 (2012).
  23. Olofsson, S., Hebing, L., Niedenführ, S., Deisenroth, M. P. & Misener, R. GPdoemd: a Python package for design of experiments for model discrimination. Comput. Chem. Eng. 125, 54–70 (2019).
  24. Krivák, R., Hoksza, D. & Škoda, P. Improving quality of ligand-binding site prediction with Bayesian optimization. In 2017 IEEE International Conference on Bioinformatics and Biomedicine 2278–2279 (2017).
  25. Reker, D., Hoyt, E. A., Bernardes, G. J. L. & Rodrigues, T. Adaptive optimization of chemical reactions with minimal experimental information. Cell Rep. Phys. Sci. 1, 100247 (2020).
  26. Zhou, Z., Li, X. & Zare, R. N. Optimizing chemical reactions with deep reinforcement learning. ACS Cent. Sci. 3, 1337–1344 (2017).
  27. Kondo, M. et al. Exploration of flow reaction conditions using machine-learning for enantioselective organocatalyzed Rauhut–Currier and [3+2] annulation sequence. Chem. Commun. 56, 1259–1262 (2020); correction 56, 12256–12256 (2020).
  28. Ueno, T., Rhone, T. D., Hou, Z., Mizoguchi, T. & Tsuda, K. COMBO: an efficient Bayesian optimization library for materials science. Mater. Discov. 4, 18–21 (2016).
  29. Gardner, J., Pleiss, G., Weinberger, K. Q., Bindel, D. & Wilson, A. G. GPyTorch: blackbox matrix–matrix Gaussian process inference with GPU acceleration. In Advances in Neural Information Processing Systems Vol. 31 (eds Bengio, S. et al.) 7576–7586 (Curran Associates Inc., 2018).
  30. Mockus, J. On the Bayes methods for seeking the extremal point. IFAC Proc. 8, 428–431 (1975).
  31. Perera, D. et al. A platform for automated nanomole-scale reaction screening and micromole-scale synthesis in flow. Science 359, 429–434 (2018).
  32. Ahneman, D. T., Estrada, J. G., Lin, S., Dreher, S. D. & Doyle, A. G. Predicting reaction performance in C–N cross-coupling using machine learning. Science 360, 186–190 (2018).
  33. Moriwaki, H., Tian, Y.-S., Kawashita, N. & Takagi, T. Mordred: a molecular descriptor calculator. J. Cheminform. 10, 4 (2018).
  34. Biau, G. Analysis of a random forests model. J. Mach. Learn. Res. 13, 1063–1095 (2012).
  35. Rasmussen, C. E. & Williams, C. K. I. Gaussian Processes for Machine Learning (MIT Press, 2006).
  36. Pedregosa, F. et al. Scikit-learn: machine learning in Python. J. Mach. Learn. Res. 12, 2825–2830 (2011); https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestRegressor.html
  37. Reker, D. & Schneider, G. Active-learning strategies in computer-assisted drug discovery. Drug Discov. Today 20, 458–465 (2015).
  38. Jones, D. R., Schonlau, M. & Welch, W. J. Efficient global optimization of expensive black-box functions. J. Glob. Optim. 13, 455–492 (1998).
  39. Kandasamy, K., Krishnamurthy, A., Schneider, J. & Poczos, B. Parallelised Bayesian optimisation via Thompson sampling. In International Conference on Artificial Intelligence and Statistics 133–142 (2018).
  40. Hernández-Lobato, J. M., Requeima, J., Pyzer-Knapp, E. O. & Aspuru-Guzik, A. Parallel and distributed Thompson sampling for large-scale accelerated exploration of chemical space. Preprint at https://arxiv.org/abs/1706.01825 (2017).
  41. Ginsbourger, D., Le Riche, R. & Carraro, L. in Computational Intelligence in Expensive Optimization Problems (eds Tenne, Y. & Goh, C.-K.) 131–162 (Springer, 2010).
  42. Wang, J., Clark, S. C., Liu, E. & Frazier, P. I. Parallel Bayesian global optimization of expensive functions. Oper. Res. 68, 1850–1865 (2020).
  43. Surowiec, I. et al. Generalized subset designs in analytical chemistry. Anal. Chem. 89, 6491–6497 (2017).
  44. Davies, H. M. L. & Morton, D. Recent advances in C–H functionalization. J. Org. Chem. 81, 343–350 (2016).
  45. Lyons, T. W. & Sanford, M. S. Palladium-catalyzed ligand-directed C−H functionalization reactions. Chem. Rev. 110, 1147–1169 (2010).
  46. Alberico, D., Scott, M. E. & Lautens, M. Aryl−aryl bond formation by transition-metal-catalyzed direct arylation. Chem. Rev. 107, 174–238 (2007).
  47. Vitaku, E., Smith, D. T. & Njardarson, J. T. Analysis of the structural diversity, substitution patterns, and frequency of nitrogen heterocycles among U.S. FDA approved pharmaceuticals. J. Med. Chem. 57, 10257–10274 (2014).
  48. Fox, R. J. et al. C–H Arylation in the formation of a complex pyrrolopyridine, the commercial synthesis of the potent JAK2 inhibitor, BMS-911543. J. Org. Chem. 84, 4661–4669 (2019).
  49. Ji, Y. et al. Mono-oxidation of bidentate bis-phosphines in catalyst activation: kinetic and mechanistic studies of a Pd/xantphos-catalyzed C–H functionalization. J. Am. Chem. Soc. 137, 13272–13281 (2015).
  50. Durand, D. J. & Fey, N. Computational ligand descriptors for catalyst design. Chem. Rev. 119, 6561–6594 (2019).
  51. Duros, V. et al. Human versus robots in the discovery and crystallization of gigantic polyoxometalates. Angew. Chem. Int. Ed. 56, 10815–10820 (2017).
  52. Swamy, K. C. K., Kumar, N. N. B., Balaraman, E. & Kumar, K. V. P. P. Mitsunobu and related reactions: advances and applications. Chem. Rev. 109, 2551–2651 (2009).
  53. Mitsunobu, O. & Yamada, M. Preparation of esters of carboxylic and phosphoric acid via quaternary phosphonium salts. Bull. Chem. Soc. Jpn 40, 2380–2382 (1967).
  54. Fletcher, S. The Mitsunobu reaction in the 21st century. Org. Chem. Front. 2, 739–752 (2015).
  55. Gillis, E. P., Eastman, K. J., Hill, M. D., Donnelly, D. J. & Meanwell, N. A. Applications of fluorine in medicinal chemistry. J. Med. Chem. 58, 8315–8359 (2015).
  56. Hagmann, W. K. The many roles for fluorine in medicinal chemistry. J. Med. Chem. 51, 4359–4369 (2008).
  57. Hu, W.-L., Hu, X.-G. & Hunter, L. Recent developments in the deoxyfluorination of alcohols and phenols: new reagents, mechanistic insights, and applications. Synthesis 49, 4917–4930 (2017).
  58. Nielsen, M. K., Ahneman, D. T., Riera, O. & Doyle, A. G. Deoxyfluorination with sulfonyl fluorides: navigating reaction space with machine learning. J. Am. Chem. Soc. 140, 5004–5008 (2018).
  59. Nielsen, M. K., Ugaz, C. R., Li, W. & Doyle, A. G. PyFluor: a low-cost, stable, and selective deoxyfluorination reagent. J. Am. Chem. Soc. 137, 9571–9574 (2015).
  60. O’Boyle, N. M. et al. Open Babel: an open chemical toolbox. J. Cheminform. 3, 33 (2011).
  61. Frisch, M. J. et al. Gaussian 16 Revision A.03 (Gaussian, Inc., 2016).
  62. Kingma, D. P. & Ba, J. Adam: a method for stochastic optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017).
  63. Paszke, A. et al. PyTorch: an imperative style, high-performance deep learning library. In Advances in Neural Information Processing Systems Vol. 32 (eds Wallach, H. et al.) 8026–8037 (Curran Associates Inc., 2019).

Acknowledgements

Financial support was provided by Bristol-Myers Squibb, the Princeton Catalysis Initiative, the NSF under the CCI Center for Computer Assisted Synthesis (CHE-1925607) and the DataX Program at Princeton University through support from the Schmidt Futures Foundation. We thank A. Żurański and J. Ash for discussions. We thank all the participants in the reaction optimization game for their time and effort in contributing to this study. We thank B. Hao for help with HTE protocols.

Author information

Contributions

B.J.S. designed the overall research project with A.G.D., R.P.A. and J.J. providing guidance. B.J.S. wrote and ran the software with the assistance of F.D. and input from J.L.; J.J. and J.S. carried out the initial investigation to select the test reaction; J.S. designed and carried out HTE experiments with the assistance of M.P.; J.L. wrote the web application for the reaction optimization game with the assistance of J.S., J.J. and B.J.S.; and B.J.S. carried out data experiments and modelling with input from J.L. and F.D. J.S. and J.M.A. carried out Mitsunobu and deoxyfluorination reaction optimizations. B.J.S. wrote the manuscript with input from all authors.

Corresponding authors

Correspondence to Ryan P. Adams or Abigail G. Doyle.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Peer review information: Nature thanks Jason Hein and Tiago Rodrigues for their contribution to the peer review of this work. Peer reviewer reports are available.

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data figures and tables

Extended Data Table 1: Simulation outcome summary for reactions 1 and 2a–e
Extended Data Table 2: Summary of reaction encodings

Supplementary information

Supplementary Information

This file contains Supplementary Sections 1-12, including Supplementary Tables 1-14 and Supplementary Figs 1-73.

Peer Review File

About this article

Cite this article

Shields, B.J., Stevens, J., Li, J. et al. Bayesian reaction optimization as a tool for chemical synthesis. Nature 590, 89–96 (2021). https://doi.org/10.1038/s41586-021-03213-y
