  • Perspective

How to explore chemical space using algorithms and automation

Abstract

Although extending the reactivity of a given class of molecules is relatively straightforward, discovering genuinely new reactivity, and the molecules that result from it, is a far more challenging problem. If new reactions cannot be predicted from current chemical knowledge, then we suggest that they are not merely new but also novel. Such a classification, however, requires an expert judge with access to all current chemical knowledge; otherwise, a lack of information risks being interpreted as unpredictability. Here, we describe how searching chemical space with automation and algorithms improves the probability of discovery. Automation enables routine chemical tasks to be performed more quickly and consistently, while algorithms facilitate searches of chemical knowledge databases. Experimental systems can also be developed to discover novel molecules, reactions and mechanisms by augmenting the intuition of the human expert. To find new chemical laws, we must question current assumptions and biases. Accomplishing this requires two classes of algorithm: search algorithms, and more general machine learning and statistical modelling algorithms that predict the chemistry under investigation. We propose that such a chemical intelligence approach is already being used and that, in the not-too-distant future, automated chemical reactor systems controlled by these algorithms and monitored by sensor arrays will be able to navigate and search chemical space more quickly, more efficiently and, importantly, without bias. This approach promises to yield not only new molecules but also unpredictable, and thus novel, reactivity.
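The loop described above (experiments are run, a sensor reports each outcome, and results that current knowledge fails to predict are treated as candidates for novel reactivity) can be illustrated with a toy Python sketch. Everything in it is an assumption made for illustration: the two-parameter condition grid, the simulated `run_experiment` "reactor", the k-nearest-neighbour predictor standing in for a machine learning model, and the 0.3 residual threshold for "surprise". It is not the authors' novelty algorithm, and for a grid this small the search step is simply an exhaustive screen.

```python
import math
from itertools import product

# Hypothetical 2D condition space: temperature (deg C) x reagent equivalents.
TEMPS = list(range(20, 101, 10))            # 20, 30, ..., 100
EQUIVS = [1.0 + 0.5 * i for i in range(5)]  # 1.0, 1.5, ..., 3.0


def normalise(temp, equiv):
    """Map conditions onto the unit square so both axes weigh equally in distances."""
    return ((temp - 20) / 80, (equiv - 1.0) / 2.0)


def run_experiment(temp, equiv):
    """Simulated reactor plus sensor (hypothetical stand-in for real hardware).

    A smooth linear trend models 'known' behaviour; a narrow Gaussian bump at
    80 deg C / 2.5 equiv models reactivity that the trend does not anticipate.
    """
    x, y = normalise(temp, equiv)
    trend = 0.4 * x + 0.2 * y
    d2 = (x - 0.75) ** 2 + (y - 0.75) ** 2
    bump = 0.5 * math.exp(-d2 / (2 * 0.04 ** 2))
    return trend + bump


def knn_predict(target, observations, k=2):
    """Predict the outcome at `target` as the mean of its k nearest observed neighbours."""
    tx, ty = normalise(*target)
    ranked = sorted(
        observations.items(),
        key=lambda item: math.dist((tx, ty), normalise(*item[0])),
    )
    return sum(value for _, value in ranked[:k]) / k


def flag_surprises(observations, threshold=0.3, k=2):
    """Leave-one-out check: flag results that the remaining data fail to predict."""
    flagged = []
    for conditions, measured in observations.items():
        others = {c: v for c, v in observations.items() if c != conditions}
        residual = measured - knn_predict(conditions, others, k)
        if abs(residual) > threshold:
            flagged.append(conditions)
    return flagged


# Screen every grid point, then look for outcomes the data cannot explain.
observations = {(t, e): run_experiment(t, e) for t, e in product(TEMPS, EQUIVS)}
flagged = flag_surprises(observations)
print(flagged)  # [(80, 2.5)]
```

Only the hot spot is flagged: its neighbours lie on the smooth trend, so a model built from them under-predicts it by roughly 0.5. A real platform would replace the exhaustive screen with an adaptive search, replace the nearest-neighbour model with a properly validated statistical model, and re-run flagged conditions to rule out artefacts before calling a result novel.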

Fig. 1: Searching chemical space.
Fig. 2: Creation of databases and the extraction of data.
Fig. 3: Searching for new reactivity, methods or properties.
Fig. 4: Optimizing reaction conditions.
Fig. 5: A projected 3D search space.
Fig. 6: A flow chart to assist in a strict definition of validity, newness and novelty.

Acknowledgements

The authors gratefully acknowledge financial support from the UK Engineering and Physical Sciences Research Council (EPSRC) (grant nos EP/H024107/1, EP/I033459/1, EP/J00135X/1, EP/J015156/1, EP/K021966/1, EP/K023004/1, EP/K038885/1, EP/L015668/1 and EP/L023652/1) and the European Research Council (ERC) (project 670467 SMART-POM).

Reviewer information

Nature Reviews Chemistry thanks M. Waller and the other anonymous reviewer(s) for their contribution to the peer review of this work.

Author information

Contributions

P.S.G. and A.B.H. contributed equally to the article. L.C. conceived the framework and developed the novelty algorithm presented here; L.C., A.B.H., P.S.G. and J.M.G. performed the literature review; and all the authors wrote the article. The authors thank N. A. B. Johnson for the artistic depiction used in the graphical abstract.

Corresponding author

Correspondence to Leroy Cronin.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Gromski, P.S., Henson, A.B., Granda, J.M. et al. How to explore chemical space using algorithms and automation. Nat Rev Chem 3, 119–128 (2019). https://doi.org/10.1038/s41570-018-0066-y

  • DOI: https://doi.org/10.1038/s41570-018-0066-y