Perspective | Published:

How to explore chemical space using algorithms and automation


Although extending the reactivity of a given class of molecules is relatively straightforward, the discovery of genuinely new reactivity and the molecules that result is a wholly more challenging problem. If new reactions can be considered unpredictable using current chemical knowledge, then we suggest that they are not merely new but also novel. Such a classification, however, requires an expert judge to have access to all current chemical knowledge; otherwise, a lack of information risks being interpreted as unpredictability. Here, we describe how searching chemical space using automation and algorithms improves the probability of discovery. Automation enables routine chemical tasks to be performed more quickly and consistently, while algorithms facilitate the searching of chemical knowledge databases. Experimental systems can also be developed to discover novel molecules, reactions and mechanisms by augmenting the intuition of the human expert. In order to find new chemical laws, we must seek to question current assumptions and biases. Doing so involves two classes of algorithm: algorithms that perform searches, and more general machine learning and statistical modelling algorithms that predict the chemistry under investigation. We propose that such a chemical intelligence approach is already being used and that, in the not-too-distant future, automated chemical reactor systems controlled by these algorithms and monitored by a sensor array will be capable of navigating and searching chemical space more quickly, efficiently and, importantly, without bias. This approach promises to yield not only new molecules but also unpredictable and thus novel reactivity.
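The interplay of the two classes of algorithm described above (one that searches, one that predicts) can be illustrated with a toy closed loop. The following is a minimal sketch under stated assumptions, not the novelty algorithm developed by the authors: `run_experiment` is a hypothetical stand-in for an automated reactor with a sensor readout, a nearest-neighbour lookup stands in for a machine learning model, and farthest-point sampling stands in for the search algorithm. A result is flagged as surprising when the observation differs from the prediction by more than an arbitrary threshold, which mirrors the idea that novelty is what current knowledge fails to predict.

```python
def run_experiment(x):
    """Hypothetical reactor stand-in: x is an integer condition from 0 to 100."""
    baseline = 0.002 * x                        # smooth, predictable response
    anomaly = 1.0 if 68 <= x <= 72 else 0.0     # narrow window of unexpected reactivity
    return baseline + anomaly


def predict(x, history):
    """Toy predictive model: outcome of the nearest condition already tested."""
    nearest = min(history, key=lambda rec: abs(rec[0] - x))
    return nearest[1]


def pick_next(candidates, history):
    """Toy search step: the untested condition farthest from every tested one."""
    tested = [rec[0] for rec in history]
    return max(candidates, key=lambda c: min(abs(c - t) for t in tested))


def explore(n_steps=100, threshold=0.5, experiment=run_experiment):
    """Run the closed loop; return (history, conditions flagged as surprising)."""
    candidates = list(range(1, 101))
    history = [(0, experiment(0))]              # seed with a single known result
    surprises = []
    for _ in range(n_steps):
        if not candidates:                      # search space exhausted
            break
        x = pick_next(candidates, history)
        candidates.remove(x)
        expected = predict(x, history)          # what current knowledge predicts
        observed = experiment(x)                # what the (simulated) reactor reports
        if abs(observed - expected) > threshold:
            surprises.append(x)                 # prediction failed: candidate novelty
        history.append((x, observed))
    return history, surprises


if __name__ == "__main__":
    history, surprises = explore()
    print(f"{len(history)} conditions tested, surprising at: {sorted(surprises)}")
```

In this sketch the search step never consults the outcomes, only the coverage of the space, which is one simple way to avoid steering the search towards what is already expected. A real system would replace each of the three functions with hardware control, a trained model and a more considered search strategy.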


Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.




Acknowledgements

The authors gratefully acknowledge financial support from the UK Engineering and Physical Sciences Research Council (EPSRC) (grant nos EP/H024107/1, EP/I033459/1, EP/J00135X/1, EP/J015156/1, EP/K021966/1, EP/K023004/1, EP/K038885/1, EP/L015668/1 and EP/L023652/1) and the European Research Council (ERC) (project 670467 SMART-POM).

Reviewer information

Nature Reviews Chemistry thanks M. Waller and the other anonymous reviewer(s) for their contribution to the peer review of this work.

Author information

P.S.G. and A.B.H. contributed equally to the article. L.C. conceived the framework and developed the novelty algorithm presented here; L.C., A.B.H., P.S.G. and J.M.G. performed the literature review; and all the authors wrote the article. The authors thank N. A. B. Johnson for the artistic depiction used in the graphical abstract.

Competing interests

The authors declare no competing interests.

Correspondence to Leroy Cronin.


Fig. 1: Searching chemical space.
Fig. 2: Creation of databases and the extraction of data.
Fig. 3: Searching for new reactivity, methods or properties.
Fig. 4: Optimizing reaction conditions.
Fig. 5: A projected 3D search space.
Fig. 6: A flow chart to assist in a strict definition of validity, newness and novelty.