How to explore chemical space using algorithms and automation

Abstract

Although extending the reactivity of a given class of molecules is relatively straightforward, the discovery of genuinely new reactivity and the molecules that result is a wholly more challenging problem. If new reactions can be considered unpredictable using current chemical knowledge, then we suggest that they are not merely new but also novel. Such a classification, however, requires an expert judge to have access to all current chemical knowledge or risks a lack of information being interpreted as unpredictability. Here, we describe how searching chemical space using automation and algorithms improves the probability of discovery. The former enables routine chemical tasks to be performed more quickly and consistently, while the latter facilitate the searching of chemical knowledge databases. Experimental systems can also be developed to discover novel molecules, reactions and mechanisms by augmenting the intuition of the human expert. To find new chemical laws, we must seek to question current assumptions and biases. Accomplishing that involves two classes of algorithm: those that perform searches, and more general machine learning and statistical modelling algorithms that predict the chemistry under investigation. We propose that such a chemical intelligence approach is already being used and that, in the not-too-distant future, the automated chemical reactor systems controlled by these algorithms and monitored by a sensor array will be capable of navigating and searching chemical space more quickly, efficiently and, importantly, without bias. This approach promises to yield not only new molecules but also unpredictable and thus novel reactivity.
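
The distinction drawn above between a reaction that is new and one that is novel can be illustrated with a minimal sketch. It is not the authors' algorithm and does not reproduce the flow chart of Fig. 6. It assumes that "current chemical knowledge" can be stood in for by two hypothetical objects: a set of already recorded reactions (KNOWN) and a predictor (toy_predict) that returns the products it would expect from given reactants. The string encoding of reactions and the reproducibility flag are likewise assumptions made only for illustration.

```python
from typing import Callable, Set, Tuple

# Hypothetical encoding: a reaction is a (reactants, product) pair of strings.
Reaction = Tuple[str, str]


def classify(
    reaction: Reaction,
    reproducible: bool,
    known: Set[Reaction],
    predict: Callable[[str], Set[str]],
) -> str:
    """Label an observed reaction as 'invalid', 'known', 'new' or 'novel'."""
    reactants, product = reaction
    if not reproducible:
        # An outcome that cannot be repeated is not treated as a valid result.
        return "invalid"
    if reaction in known:
        # Already recorded in the (assumed) knowledge database.
        return "known"
    if product in predict(reactants):
        # Not recorded, but predictable from current knowledge: merely new.
        return "new"
    # Not recorded and not predicted: novel. This is only as reliable as the
    # predictor; a predictor with gaps turns missing information into
    # apparent unpredictability, which is the risk the abstract warns about.
    return "novel"


# Toy stand-ins for a database and a predictive model (both hypothetical).
KNOWN: Set[Reaction] = {("A+B", "C")}


def toy_predict(reactants: str) -> Set[str]:
    """Return the products a toy model would expect from the reactants."""
    return {"A+B": {"C", "D"}}.get(reactants, set())


if __name__ == "__main__":
    for observed in [("A+B", "C"), ("A+B", "D"), ("A+B", "E")]:
        print(observed, classify(observed, True, KNOWN, toy_predict))
```

In this toy setting, the recorded product C is labelled known, the unrecorded but predicted product D is labelled new, and the unpredicted product E is labelled novel. Replacing toy_predict with a weaker predictor would promote D to novel, which shows why the classification depends on how complete the available knowledge is.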

Fig. 1: Searching chemical space.
Fig. 2: Creation of databases and the extraction of data.
Fig. 3: Searching for new reactivity, methods or properties.
Fig. 4: Optimizing reaction conditions.
Fig. 5: A projected 3D search space.
Fig. 6: A flow chart to assist in a strict definition of validity, newness and novelty.

Acknowledgements

The authors gratefully acknowledge financial support from the UK Engineering and Physical Sciences Research Council (EPSRC) (grant nos EP/H024107/1, EP/I033459/1, EP/J00135X/1, EP/J015156/1, EP/K021966/1, EP/K023004/1, EP/K038885/1, EP/L015668/1 and EP/L023652/1) and the European Research Council (ERC) (project 670467 SMART-POM).

Reviewer information

Nature Reviews Chemistry thanks M. Waller and the other anonymous reviewer(s) for their contribution to the peer review of this work.

Author information

Contributions

P.S.G. and A.B.H. contributed equally to the article. L.C. conceived the framework and developed the novelty algorithm presented here; L.C., A.B.H., P.S.G. and J.M.G. performed the literature review; and all the authors wrote the article. The authors thank N. A. B. Johnson for the artistic depiction used in the graphical abstract.

Corresponding author

Correspondence to Leroy Cronin.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Cite this article

Gromski, P.S., Henson, A.B., Granda, J.M. et al. How to explore chemical space using algorithms and automation. Nat Rev Chem 3, 119–128 (2019). https://doi.org/10.1038/s41570-018-0066-y
