From desktop to benchtop with automated computational workflows for computer-aided design in asymmetric catalysis

Abstract

The organic chemist’s toolbox is vast, with technologies to accelerate the synthesis of novel chemical matter. Asymmetric catalysis is one approach to accessing new areas of chemical space, and computational power is now sufficient to assist in this exploration. Unfortunately, existing techniques generally require computational expertise and are therefore underutilized in synthetic chemistry. Here we present our platform VIRTUAL CHEMIST, which allows bench chemists to predict the outcomes of asymmetric chemical reactions ahead of testing in the laboratory, in just a few clicks. Modular workflows facilitate the simulation of various sets of experiments, including the four realistic scenarios discussed: one-by-one design, library screening, hit optimization and substrate-scope evaluation. Catalyst candidates are screened within hours, and the enantioselectivity predictions provide substantial enrichments compared with random testing. The achieved accuracies, within ~1 kcal mol–1, open opportunities for computational chemistry in asymmetric catalyst design, allowing bench chemists to guide the design and discovery of asymmetric catalysts.

Main

Organic chemistry research is vital to the discovery, optimization and large-scale production of numerous small molecules, such as novel drugs that treat life-threatening diseases. It contributes to the design of innovative materials for modern electronics, such as low-power-consumption OLEDs, and to the development of novel agricultural practices, cosmetics, textiles, inks and paints, to name a few1. Unfortunately, a major hurdle in the production of these complex small molecules is the challenging syntheses they often require. Although several research groups focus on developing and optimizing new methodologies, these are often reaction-specific, and generalizing them for mainstream wet laboratory chemistry requires substantial work.

For the design of novel organic synthetic methodologies to access novel compounds, chemists often make use of the vast organic chemistry toolbox at their disposal; they routinely use nuclear magnetic resonance (NMR), mass spectrometry (MS) and chromatography. These complex scientific technologies are largely accessible without expert knowledge of their inner workings. For example, synthetic chemists run standard 1H NMR, 13C NMR and various 2D NMR experiments without necessarily understanding or manipulating the magnetic pulse sequences. By contrast, computational chemistry remains largely inaccessible to the experimental chemistry community; complex theoretical calculations are neglected because coding or programming knowledge, and sometimes advanced experience, is often a prerequisite. The omission of computational techniques from the larger toolbox is regrettable, since computations have contributed, in part, to interpreting unexpected observations2 and proposing new reaction mechanisms3,4. With the rise of quantum mechanics (QM) methods (Hartree–Fock (HF) and density functional theory (DFT)) and molecular mechanics (MM) methods (docking and molecular dynamics), organic chemists have become aware of the power and utility of such computations. Computational experts frequently collaborate with experimentalists to rationalize the observations of organic chemists. However, rather than only offering post facto theories, computational chemistry could prospectively generate hypotheses and screen organic chemistry transformations. We are optimistic about such a possibility, given the similarly successful implementation of computer simulations in drug discovery5,6. After the pioneering development of DOCK7, a structure-based drug discovery tool, in 1982, an entire field of research emerged.
In fact, many computational techniques including machine learning8, molecular dynamics9, molecular docking10 and pharmacophore modelling11 are now commonplace, addressing research challenges in drug discovery. Theoretically, analogous computational techniques could tackle synthetic chemistry challenges; already, robotics12 and synthetic planning computational tools13,14 have been reported and will likely be incorporated into many chemistry laboratories soon.

Among synthetic methodologies are asymmetric transformations. While biocatalysis and the use of the chiral pool are common approaches for the synthesis of chiral molecules (for example, chiral drugs and chiral materials), their application is limited by the substrate specificity and stability of biocatalysts and by the limited number of available chiral molecules. Asymmetric synthesis is an attractive alternative for generating chiral molecules in large quantities and high purity. In practical terms, cheap, selective, synthetically accessible and green asymmetric catalysts are highly desired to shorten synthetic routes to complex small molecules. The vastness of chemical space suggests that many organocatalysts or transition metal catalysts exist, but their discovery is challenging, tedious and physically intractable using solely traditional experimental techniques15. The exploration of chemical space can, however, be more efficient when performed computationally. Furthermore, virtually applying identified and selected catalysts to predict stereoselectivity for a specific reaction is within reach16.

Several groups have focused on the prediction of stereoselectivity of asymmetric transformations17,18,19. Among the proposed approaches are statistical models20,21, neural networks (NNs)22,23, DFT17,18,24,25,26,27,28,29,30,31, QM/MM32 and MM-based methods (Q2MM33,34 and ACE35,36), with DFT being the most widely used. However, despite the demonstration of its feasibility, it was not until 2009 that the first use of DFT for screening a small set of asymmetric catalysts and substrates was reported37, with little work communicated since. In addition, Bootsma and Wheeler recently revealed a major potential pitfall when DFT free energy calculations are used to predict enantioselectivity38. NNs are more recent, but require large amounts of data and may not be appropriate for discovering novel catalysts. Our program ACE (Asymmetric Catalyst Evaluation) combines ground state parameters of reactants and products to predict transition state (TS) geometries and enantioselectivities. Alternatively, Q2MM can be used to derive a reaction-specific TS force field (TSFF).

Most software in this domain has suffered from poor usability and long computation times, although the Wiest/Norrby research group developed CatVS to begin addressing those concerns39. We suggest that organic chemists should be able to screen for potential asymmetric catalysts using computational methods themselves. More broadly, we aim to continue to advocate for the use of virtual asymmetric catalyst discovery and design as a complement to traditional and automated asymmetric catalysis.

Here we present our efforts to develop a platform (VIRTUAL CHEMIST) that integrates all the tools, accessories and automation required for organic chemistry laboratories to design experiments, rather than rationalize data. Its application to the simulation of four catalyst discovery scenarios (discovery of catalysis through trial and error, screening of potential catalysts, catalyst optimization and investigation of catalyst scope) has demonstrated its accuracy and usefulness. Moreover, VIRTUAL CHEMIST improves on CatVS by providing access to easily-customizable workflows, within the context of a graphical user interface. It allows application of a range of methods (ACE and Q2MM), with more approximate tools that can be applied almost immediately, up to methods that can be fine-tuned for each specific reaction type, as illustrated for two examples39.

Results

Initial considerations

For use by organic chemists, the accessibility of this technology must be addressed without sacrificing accuracy. Regarding accessibility, this technology should not require large computational resources, should ideally be usable on a standard desktop computer (Windows, Linux, MacOS) and should be substantially faster than the experiments being simulated. We believe that this software should bring knowledge complementary to that of chemists, taking advantage of complex calculations (machine time) and years of expertise (human time). For example, chemists should be able to instruct the software to account for specific desired properties (for example, protecting groups, water solubility and commercial availability of chemicals). Regarding accuracy, a difference of only 1 kcal mol–1 between diastereomeric TSs separates a non-selective catalyst (0% e.e.) from a moderately selective one (about 69% e.e. at 298 K). To put this margin of error in context, in the drug discovery process one often investigates molecules hitting a target with reasonable binding affinity. In this case, an accuracy of a few kcal mol–1 can differentiate between strong, weak and non-binders (for example, about 4 kcal mol–1 corresponds to the 1,000-fold difference between a nanomolar and a micromolar enzyme inhibitor). As such, accuracy is a major challenge in asymmetric catalyst screening.
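The energy scales above follow from the standard relation between a free-energy gap and a selectivity (or affinity) ratio, ratio = exp(ΔΔG/RT). The sketch below is a generic helper, not part of VIRTUAL CHEMIST; it assumes kinetic control, so the ratio of the two enantiomers equals the Boltzmann factor of the diastereomeric TS gap, and it takes the sign of ΔΔG to indicate which enantiomer is favoured.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal mol^-1 K^-1


def ddg_to_ee(ddg_kcal: float, temperature_k: float = 298.15) -> float:
    """Enantiomeric excess (as a fraction between -1 and 1) implied by a TS gap.

    Assumes kinetic control: enantiomeric ratio er = exp(ddG / RT).
    """
    er = math.exp(ddg_kcal / (R_KCAL * temperature_k))
    return (er - 1.0) / (er + 1.0)


def ee_to_ddg(ee: float, temperature_k: float = 298.15) -> float:
    """Inverse of ddg_to_ee: ddG = RT ln((1 + ee) / (1 - ee)), valid for |ee| < 1."""
    return R_KCAL * temperature_k * math.log((1.0 + ee) / (1.0 - ee))


def ratio_to_ddg(ratio: float, temperature_k: float = 298.15) -> float:
    """Free-energy gap corresponding to a selectivity or affinity ratio: RT ln(ratio)."""
    return R_KCAL * temperature_k * math.log(ratio)


if __name__ == "__main__":
    for ddg in (0.0, 1.0, 2.0, 3.0):
        print(f"ddG = {ddg:.1f} kcal/mol -> e.e. = {100 * ddg_to_ee(ddg):.0f}%")
    print(f"1000-fold ratio at 298 K: {ratio_to_ddg(1000):.2f} kcal/mol")
```

At 298 K, a 1,000-fold ratio corresponds to roughly 4.1 kcal mol–1, and the same ratio at 300 K gives about 4.12 kcal mol–1, the bound used for the random baseline in Fig. 4. Temperature is kept as an explicit parameter because many asymmetric reactions are run below room temperature.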

The ultimate objective of this research programme is to deliver software that can simulate an entire organic chemistry project from beginning to end. As an example, we demonstrate the development of a Diels–Alder organocatalyst (Fig. 1).

Fig. 1: Experimental versus computational catalyst discovery.

a, Experiments may start with a library of molecules that will be tested as catalysts. b, Computationally, this process would start with a database of chemicals that would be evaluated for their catalytic activity and stereoselectivity.

In this scenario, we would need software to prepare virtual libraries of potential catalysts, which requires understanding chemistry concepts such as chirality, functional group compatibility (chemoselectivity) and similarity; to evaluate the catalytic activity of these candidates; and to evaluate the enantioselectivity they induce. Ideally, a common platform would seamlessly execute all three actions without user intervention. Chemists should also be able to instruct the software through sketches using a program they are familiar with (for example, ChemDraw, IsisDraw, ChemWindow and so on).

Automation and accessory programs

To run the gamut of simulations mentioned above, several transformations and computations must be automated and concealed from the user; this is an often forgotten, yet major, challenge of this research.

We built on and expanded the user interface (UI) of our drug discovery platform, Forecaster, to create a novel platform, VIRTUAL CHEMIST. This UI contains a 2D sketcher for drawing input catalysts and substrates and an easy-to-use three-dimensional graphical interface for visualizing the calculated output TS structures (Fig. 2). Additionally, the resulting data (for example, the potential energies of TS structures and predicted enantioselectivities) are summarized in the UI. Finally, we have made strides towards broad applicability by enabling the creation of modular workflows.

Fig. 2: Screening catalysts for diethylzinc addition to aldehydes.

a, From reported Cartesian coordinates and drawn catalysts and substrates to accurate TSs. b, Workflow corresponding to the tasks shown in a.

Preparing libraries of potential catalysts may be the first step. Our previously reported programs SELECT (which searches for analogues or dissimilar compounds and optimizes library diversity) and REDUCE (which filters a chemical library for the presence of functional groups, such as secondary amines for the organocatalysed Diels–Alder cycloaddition)40 are accessible in modular workflows. A library of synthetic analogues can also be generated using our previously reported searching and combinatorial tools FINDERS and REACT2D41. In contrast to other virtual combinatorial library tools42, these programs consider changes in stereochemistry during a reaction (for example, inversion in a Mitsunobu reaction), ensuring that the asymmetric catalysts virtually screened are truly synthetically accessible.

Predicting enantioselectivity would be the next step. Generally, for each catalyst candidate, the software must build a TS model, parameterize that system and then compute energies. First, where does a TS come from? As an example, consider the diethylzinc addition to aldehydes previously investigated with Q2MM43. In this work, TS structures were provided as Cartesian coordinates in the common ‘xyz’ format. These structures (text files) could be used as a starting point for screening asymmetric catalysts without any graphical user interface or QM methods. As shown in Fig. 2, the provided Cartesian coordinates yield TS templates that are subsequently used to assemble realistic TS structures for a series of catalysts and substrates.
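Reading such a text file is the entry point for building a template. The sketch below is a generic reader for the widely used XYZ layout (atom count, comment line, then one "symbol x y z" line per atom), not the parser inside CONSTRUCTS; accepting several concatenated frames, as needed for the diastereomeric TSs of one reaction, is our assumption.

```python
from dataclasses import dataclass
from typing import List, Tuple

Atom = Tuple[str, float, float, float]  # element symbol and Cartesian coordinates (angstrom)


@dataclass
class XYZStructure:
    comment: str
    atoms: List[Atom]


def parse_xyz(text: str) -> List[XYZStructure]:
    """Parse one or more concatenated frames in the common XYZ format."""
    lines = text.splitlines()
    frames: List[XYZStructure] = []
    i = 0
    while i < len(lines):
        if not lines[i].strip():
            i += 1  # tolerate blank lines between frames
            continue
        n_atoms = int(lines[i].split()[0])
        comment = lines[i + 1] if i + 1 < len(lines) else ""
        atoms: List[Atom] = []
        for line in lines[i + 2 : i + 2 + n_atoms]:
            fields = line.split()
            x, y, z = (float(value) for value in fields[1:4])
            atoms.append((fields[0], x, y, z))
        if len(atoms) != n_atoms:
            raise ValueError(f"expected {n_atoms} atoms, found {len(atoms)}")
        frames.append(XYZStructure(comment.strip(), atoms))
        i += 2 + n_atoms  # jump to the next frame's atom-count line
    return frames
```

Keeping the parsed structure as plain symbols and coordinates leaves atom typing and connectivity perception to a later step, which is where the chemistry-specific work of template assembly would happen.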

All of these steps were successfully integrated into a single program, CONSTRUCTS (Converting and Orienting Native Structures on Templates of Rotatable and Unoptimized Chemical Transition States). In short, CONSTRUCTS assembles TS structures with reasonable geometry (later optimized by ACE) from simple text files (TSs of simple models) and 2D sketches of catalysts and substrates (Fig. 2a). For simplicity, we have also included templates for several reactions, obviating the need to acquire TS coordinates.

Our software ACE, which predicts stereoselectivity of reactions, relies on MM3 force field (FF) parameters. However, in the MM3 FF, metals are not parameterized, precluding the use of previous versions of ACE for metal-catalysed additions. Parameters can now be fully developed in a user-friendly manner using our program QUEMIST (QUantum Energy of Molecules Inducing Structural Transformations). QUEMIST enables single point energy calculations, geometry optimizations and Hessian calculations using HF methods to automate the generation of FF parameters using the method developed by Seminario44 and improved by Allen et al.45. We acknowledge that using HF methods for metal-containing systems is far from ideal. However, the results presented below are encouraging and support their usage here. Importantly, the parameters only need to be developed once and can subsequently be used to screen libraries of catalysts.
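The core of the Seminario approach is to project the interatomic block of the Cartesian Hessian onto the bond direction: with K_AB = −∂²E/∂r_A∂r_B, the bond force constant is k_AB = Σ_i λ_i |u_AB · v_i|, where λ_i and v_i are the eigenvalues and eigenvectors of K_AB and u_AB is the unit vector from A to B. The sketch below illustrates only this original bond projection, not QUEMIST's code; symmetrizing the 3 × 3 block so that its eigenvalues are real is our simplification, and the refinements of Allen et al. are not reproduced.

```python
import math
from typing import List, Sequence, Tuple

Matrix3 = List[List[float]]


def jacobi_eigen_3x3(m: Matrix3, sweeps: int = 50, tol: float = 1e-14) -> Tuple[List[float], Matrix3]:
    """Eigenvalues and eigenvectors (columns of v) of a real symmetric 3x3 matrix."""
    a = [list(row) for row in m]
    v = [[1.0 if i == j else 0.0 for j in range(3)] for i in range(3)]
    for _ in range(sweeps):
        if sum(a[i][j] ** 2 for i in range(3) for j in range(3) if i != j) < tol:
            break
        for p in range(2):
            for q in range(p + 1, 3):
                if a[p][q] == 0.0:
                    continue
                # Rotation angle chosen to zero a[p][q] (cyclic Jacobi method).
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(3):  # A <- A J (columns p and q)
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(3):  # A <- J^T A (rows p and q)
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
                for k in range(3):  # accumulate eigenvectors: V <- V J
                    vkp, vkq = v[k][p], v[k][q]
                    v[k][p], v[k][q] = c * vkp - s * vkq, s * vkp + c * vkq
    return [a[i][i] for i in range(3)], v


def seminario_bond_constant(h_ab: Matrix3, r_a: Sequence[float], r_b: Sequence[float]) -> float:
    """Bond stretching constant from the Hessian block d2E/(dr_A dr_B).

    Units follow the Hessian (for example kcal mol^-1 A^-2 in, same units out).
    """
    # K_AB = -H_AB, symmetrized (our assumption) to keep eigenvalues real.
    k_ab = [[-0.5 * (h_ab[i][j] + h_ab[j][i]) for j in range(3)] for i in range(3)]
    d = [rb - ra for ra, rb in zip(r_a, r_b)]
    length = math.sqrt(sum(x * x for x in d))
    u = [x / length for x in d]
    eigvals, vecs = jacobi_eigen_3x3(k_ab)
    return sum(
        eigvals[i] * abs(sum(u[k] * vecs[k][i] for k in range(3)))
        for i in range(3)
    )
```

For a pure harmonic bond E = ½k(|r_B − r_A| − r_0)², the interatomic block at equilibrium is −k u uᵀ, and the projection returns k exactly; this is a convenient sanity check before applying the method to a QM Hessian.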

Q2MM FFs have been reported for several reactions and made available to the community (diethylzinc addition to aldehydes43, asymmetric dihydroxylation46 and rhodium-catalysed hydrogenation of activated alkenes47). However, these TSFFs require access to an external MM package for TS optimization. We therefore took advantage of ACE, which includes all of these MM routines, and added the option to use Q2MM-derived TSFFs, thus improving the usability of Q2MM.

Finally, we need to evaluate catalytic activity. While some molecules may be predicted to be stereoselective, they may not be reactive (that is, they would not catalyse the reaction). To evaluate the catalytic activity of a set of molecules, various reactivity parameters, including a nucleophilicity index, can now be computed using QUEMIST, mentioned above, embedded into SMART. SMART was previously developed to compute a number of molecular properties and descriptors, such as molecular weight and the presence of some functional groups40. The pseudocode of all the programs used in this study is available in Supplementary Sections I.2–I.7.
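The exact index computed by QUEMIST is not specified here. As an illustration only, the sketch below uses common conceptual-DFT definitions from frontier orbital energies: chemical potential μ ≈ (E_HOMO + E_LUMO)/2, hardness η ≈ E_LUMO − E_HOMO (conventions differ by a factor of 2), Parr's electrophilicity ω = μ²/2η, and a Domingo-style nucleophilicity N = E_HOMO − E_HOMO(reference), where the reference (conventionally tetracyanoethylene) must be computed at the same level of theory.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class FrontierOrbitals:
    homo_ev: float  # HOMO energy in eV
    lumo_ev: float  # LUMO energy in eV


def global_reactivity_indices(mol: FrontierOrbitals, ref_homo_ev: float) -> Dict[str, float]:
    """Conceptual-DFT reactivity indices from frontier orbital energies (eV)."""
    mu = 0.5 * (mol.homo_ev + mol.lumo_ev)      # electronic chemical potential
    eta = mol.lumo_ev - mol.homo_ev              # chemical hardness (no factor 1/2 here)
    if eta <= 0:
        raise ValueError("LUMO must lie above HOMO")
    omega = mu * mu / (2.0 * eta)                # Parr electrophilicity index
    nucleophilicity = mol.homo_ev - ref_homo_ev  # Domingo-style N, relative to a reference
    return {"mu": mu, "eta": eta, "omega": omega, "N": nucleophilicity}
```

Because N is a difference of HOMO energies, only its ranking is meaningful across molecules computed with the same method; absolute values shift with the level of theory.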

To assess the applicability of these tools, we envisioned four different realistic scenarios. First, a chemist may draw catalysts one by one and test the potential stereoinduction. Second, a chemist may screen a large database of chiral molecules to identify novel chemical series. Third, a chemist may search for analogues as part of the lead optimization of a hit molecule (with analogy to drug discovery). Finally, a chemist may assess the substrate-scope of a specific catalyst.

Application of the software tools to scenario 1

A chemist may want to test one catalyst at a time and virtually identify the most promising candidates through direct interaction with the platform. In this scenario, each catalyst may be drawn using the provided sketcher; TS templates are either available directly or may be built from literature data (see the tutorial provided as Supplementary Data 1 for examples). We tested this scenario on over 350 reactions from seven reaction classes (Fig. 3; complete set given in Supplementary Tables 1–8) and compared the results with random predictions to assess the accuracy of the methodology (Fig. 4).

Fig. 3: ACE-optimized TS structures for selected reactions.

General reaction schemes are drawn, followed by 2D and 3D representations of TS models.

Fig. 4: Accuracy of the programs for scenario 1.

a, Mean unsigned error for ΔΔG (kcal mol–1) between the predicted and experimentally measured values for each catalyst/auxiliary–substrate pair; 1–7 refer to seven reaction types using ACE; 8–10 refer to three reactions using ACE and reported Q2MM-derived TSFFs. The black dots refer to the error expected if a random value from −4.12 to 4.12 kcal mol–1 (that is, a maximum stereoselectivity of 1,000:1) were selected. In red is shown the average unsigned error over the set of catalysts/auxiliaries used for each reaction type. b, Predicted versus observed ΔΔG for a set of 51 asymmetric catalyst/substrate pairs (epoxidation reaction). A positive ΔΔG value represents one enantiomer, while a negative ΔΔG value represents the other enantiomer. Details of the predictions are provided as Supplementary Information (Supplementary Tables 9–15). Diels–Alder aux., Diels–Alder cycloaddition with chiral auxiliaries; Organocat. Diels–Alder, organocatalysed Diels–Alder cycloadditions; Et2Zn add., Et2Zn addition to aldehyde. The trendline and correlation coefficient were computed in Microsoft Excel 2016.

To evaluate accuracy, we first visualized the proposed TS structures (Fig. 3). As observed with previous versions, ACE-generated TS structures resemble those previously proposed35,36. We then investigated whether the stereoselectivity predictions were accurate. The error in the predicted ΔΔG between the major diastereomeric TSs was computed and compared with that of a random assignment (Fig. 4). We note that none of the FFs used by ACE in these tests have been trained specifically on these reactions. Since the Q2MM TSFFs were derived to complement MM3*, whereas ACE uses MM3, the accuracy presented here may underestimate the accuracy of the TSFFs.
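The comparison with random assignment can be made concrete. The sketch below assumes, following the Fig. 4 caption, that the random baseline draws a guess uniformly from −4.12 to 4.12 kcal mol–1; it computes the expected unsigned error analytically, (a² + x²)/2a for an observed value x with |x| ≤ a, rather than by sampling. This analytic shortcut is our choice and not necessarily how the black dots were generated.

```python
from statistics import mean
from typing import Sequence

MAX_DDG = 4.12  # kcal/mol; bound of the random baseline (about 1,000:1 selectivity)


def mean_unsigned_error(predicted: Sequence[float], observed: Sequence[float]) -> float:
    """Average |predicted - observed| over paired ddG values."""
    if len(predicted) != len(observed) or not predicted:
        raise ValueError("need two non-empty sequences of equal length")
    return mean(abs(p - o) for p, o in zip(predicted, observed))


def random_baseline_mue(observed: Sequence[float], bound: float = MAX_DDG) -> float:
    """Expected MUE of a guess drawn uniformly from [-bound, bound]."""
    def expected_error(x: float) -> float:
        if abs(x) <= bound:
            return (bound * bound + x * x) / (2.0 * bound)
        return abs(x)  # the whole interval lies on one side of x, and E[U] = 0
    return mean(expected_error(x) for x in observed)
```

A value of zero observed ΔΔG gives a random-guess error of about 2.06 kcal mol–1, so a method with a mean unsigned error near 1 kcal mol–1 is clearly separated from chance.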

As can be seen in Fig. 4, the overall average error ranges between 0.94 and 0.97 kcal mol–1 (over five runs). This ~1.0 kcal mol–1 value, often referred to as chemical accuracy, is the gold standard in quantum chemistry and catalysis48. With this accuracy, the platform can distinguish poor asymmetric catalysts (0% e.e., ΔΔG ~0 kcal mol–1) from good asymmetric catalysts (90% e.e., ΔΔG ~1.7 kcal mol–1 at 298 K) and good asymmetric catalysts from excellent asymmetric catalysts (99% e.e., ΔΔG ~3.1 kcal mol–1 at 298 K). It is noteworthy that some of the catalysts used in this set have been reported to produce various enantioselectivities depending on conditions (such as acid cocatalyst, solvent and temperature; see for example ref. 49). Although ACE considers solvent (implicit model) and temperature (Boltzmann population), manipulating these two parameters did not improve accuracy. The nature of the acid cocatalyst in the Diels–Alder reaction was not considered. ACE produces a similar average mean unsigned error (within 0.2 kcal mol–1) whether using the original MM3 implementation or the Q2MM-generated TSFF.
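ACE accounts for temperature through Boltzmann populations. A common way to do this, shown here as an assumption rather than as ACE's implementation, is to collapse the set of TS conformers leading to each enantiomer into an effective free energy, G = −RT ln Σ exp(−E_i/RT), and take the difference between the two sets. The energies are shifted by the lowest value before exponentiation to avoid overflow.

```python
import math
from typing import Sequence

R_KCAL = 1.987204e-3  # kcal mol^-1 K^-1


def ensemble_free_energy(energies: Sequence[float], temperature_k: float) -> float:
    """-RT ln sum_i exp(-E_i / RT), computed relative to the lowest energy."""
    rt = R_KCAL * temperature_k
    e_min = min(energies)
    return e_min - rt * math.log(sum(math.exp(-(e - e_min) / rt) for e in energies))


def boltzmann_ddg(ts_to_r: Sequence[float], ts_to_s: Sequence[float],
                  temperature_k: float = 298.15) -> float:
    """ddG = G(S-forming TS ensemble) - G(R-forming TS ensemble); positive favours R."""
    return (ensemble_free_energy(ts_to_s, temperature_k)
            - ensemble_free_energy(ts_to_r, temperature_k))
```

With a single conformer on each side this reduces to a plain energy difference; two degenerate conformers on one side lower that side's effective energy by RT ln 2, about 0.41 kcal mol–1 at 298 K, which shows why an incomplete conformational search can shift predictions.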

A closer look at the epoxidation reaction (Fig. 4b) shows that the most weakly stereoselective catalysts (for example, ΔΔG < 1.0 kcal mol–1) were predicted to induce weak stereoselectivity, while the most strongly stereoselective (for example, ΔΔG > 2.0 kcal mol–1) were predicted to induce strong stereoselectivity.

We investigated the false positives and false negatives and found that, in large part, they result from poor parameters in the MM3 force field rather than from intrinsic problems in the methodologies. For example, sugar derivatives such as 6, conjugated systems (possessing an aniline nitrogen and axial chirality) such as 3, sulfonamides (5), silylethers (4), polycyclic compounds (8, 9) and complex phosphine ligands (10) are not well parameterized in MM3 (Fig. 5). In particular, phenyl sulfonamides have a very particular torsional energy profile (although MM3 parameters for alkyl sulfonamides have been reported50), while phosphines can adopt different cone angles51. Efforts to develop a FF with a large applicability domain are ongoing in our laboratory to address this issue52,53,54,55.

Fig. 5: Investigating the failures.

Examples of substrates and catalysts that resulted in ΔΔG errors of 2 kcal mol–1 or more. TES, triethylsilyl.

Overall, the data demonstrated that this platform can be used to retrospectively evaluate asymmetric catalysts through interaction with chemists, and prompted us to start a larger virtual screening study.

Application of the software tools to scenario 2

A chemist may be looking for a new chemical series of catalysts for a known reaction. As examples, the Shi epoxidation and the organocatalysed Diels–Alder reaction were used. These two well-characterized reactions were chosen because only a few highly selective catalysts are known for each. Library filtering was therefore expected to yield mostly decoys, and we attempted to recover the known catalysts embedded in this list.

A library of approximately 140,000 chiral amines was assembled from the ZINC database56 for the Diels–Alder reaction, and the workflow shown in Fig. 6a was built. Molecular descriptors were computed for these molecules and used to retain only those of interest (molecular mass < 500, uncharged, secondary amines only, with aldehydes and other reactive functional groups excluded). Then, any molecules too similar to known catalysts (for example, proline methyl ester in organocatalysed reactions) were removed, since the objective was to discover new chemical series. At this stage, nearly 10,000 potential catalysts were selected. SELECT was used to remove analogues and pick the most diverse molecules (to keep computing time manageable). To ensure that no duplicates were left, our program DIVERSE was applied. A total of 1,307 candidate catalysts remained for screening.
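The funnel above (descriptor filters, removal of close analogues of known catalysts, then diversity picking) can be sketched as follows. This is not the SELECT or DIVERSE code: fingerprints are represented as sets of on-bits, similarity is Tanimoto, diversity picking uses the MaxMin heuristic, and the descriptor fields, the 0.7 similarity cutoff and the `Candidate` record are all hypothetical choices for illustration.

```python
from dataclasses import dataclass, field
from typing import FrozenSet, List, Sequence


@dataclass
class Candidate:
    name: str
    mol_weight: float
    formal_charge: int
    n_secondary_amines: int
    has_reactive_group: bool  # e.g. an aldehyde; assumed precomputed elsewhere
    fingerprint: FrozenSet[int] = field(default_factory=frozenset)


def tanimoto(a: FrozenSet[int], b: FrozenSet[int]) -> float:
    union = len(a | b)
    return len(a & b) / union if union else 1.0


def passes_filters(c: Candidate) -> bool:
    """Descriptor filters mirroring the criteria listed in the text."""
    return (c.mol_weight < 500 and c.formal_charge == 0
            and c.n_secondary_amines >= 1 and not c.has_reactive_group)


def is_novel(c: Candidate, known: Sequence[FrozenSet[int]], max_sim: float = 0.7) -> bool:
    """Reject candidates too similar to any known catalyst (hypothetical cutoff)."""
    return all(tanimoto(c.fingerprint, k) < max_sim for k in known)


def maxmin_pick(pool: List[Candidate], n_pick: int) -> List[Candidate]:
    """Greedy MaxMin: repeatedly take the candidate farthest from everything picked."""
    if not pool or n_pick <= 0:
        return []
    picked = [pool[0]]
    remaining = pool[1:]
    nearest = [1.0 - tanimoto(c.fingerprint, picked[0].fingerprint) for c in remaining]
    while remaining and len(picked) < n_pick:
        idx = max(range(len(remaining)), key=nearest.__getitem__)
        chosen = remaining.pop(idx)
        nearest.pop(idx)
        picked.append(chosen)
        # Update each candidate's distance to its closest picked neighbour.
        nearest = [min(d, 1.0 - tanimoto(c.fingerprint, chosen.fingerprint))
                   for c, d in zip(remaining, nearest)]
    return picked


def select_library(library: List[Candidate], known: Sequence[FrozenSet[int]],
                   n_pick: int) -> List[Candidate]:
    filtered = [c for c in library if passes_filters(c) and is_novel(c, known)]
    return maxmin_pick(filtered, n_pick)
```

Exact duplicates have zero distance to an already picked molecule, so MaxMin naturally defers them to the very end; a dedicated de-duplication pass such as the one described in the text is still useful when the pick size approaches the pool size.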

Fig. 6: Accuracy of the programs for scenario 2.

a, Workflow for selecting the most diverse molecules for screening, with description of the actions on the right. b, Workflow for screening molecules, with description of the actions on the right. c, Ranking of predicted catalyst enantioselectivity by our program ACE in the Shi epoxidation and Diels–Alder reactions. The red lines in the bar indicate the ranks of known stereoselective catalysts. The graph indicates the portion of known catalysts versus the portion of molecules from the ZINC database.

The evaluation of the 1,307 chiral secondary amines was carried out in two steps. A second workflow (Fig. 6b) first filtered molecules by reactivity. It is well established that some amines are more reactive (basic and/or nucleophilic) than others57. In this workflow, various reactivity parameters, including a nucleophilicity index, were computed using our program QUEMIST. Subsequently, REDUCE removed molecules predicted to be less reactive than proline methyl ester, a known catalyst for the Diels–Alder reaction. CONSTRUCTS processed the remaining 798 molecules to assemble the TS structures that were finally used by ACE to compute stereoselectivity. These calculations were completed in ten days on a single core. Six known catalysts were added to the library to assess the ability of ACE to recover them (Fig. 6c).
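The reactivity cut can be expressed as a simple threshold against a reference catalyst. In this sketch the index values are hypothetical and assumed to be precomputed with one method for all molecules; the reference value would be that of proline methyl ester computed the same way. It keeps candidates at least as nucleophilic as the reference and lists the most reactive first.

```python
from typing import Dict, List


def reduce_by_reactivity(index: Dict[str, float], reference_value: float) -> List[str]:
    """Keep candidates whose reactivity index is at least the reference value.

    Survivors are ordered from most to least reactive; ties are broken by
    name so that repeated runs give the same order.
    """
    survivors = [name for name, value in index.items() if value >= reference_value]
    return sorted(survivors, key=lambda name: (-index[name], name))
```

Ordering the survivors is optional for the filter itself, but it makes it easy to send the most reactive candidates to the slower TS assembly and ACE steps first.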

The same overall process was applied to the search for Shi epoxidation catalysts, starting from chiral ketones (few of which are present in available chiral chemical databases) complemented with chiral secondary alcohols converted into chiral ketones using our program REACT2D. Eighteen known stereoselective catalysts were added to the library (Fig. 6c).

Most of the known stereoselective catalysts are ranked high, as shown in Fig. 6c (area under the receiver operating characteristic curve (AUROC): 0.79 for Shi epoxidation and 0.92 for Diels–Alder). The evaluation of the program in this second scenario suggests that our platform can virtually screen numerous chemicals and discover novel chemical series of asymmetric catalysts. In addition, the options to use these programs in workflows enable chemists to guide the platform towards novel chemical series with specific features and to reduce chemical compatibility issues.
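The AUROC values quoted above measure how well the ranking separates known catalysts from library molecules. A minimal way to compute it, shown as a generic sketch rather than the authors' script, is the Mann–Whitney form: the probability that a randomly chosen known catalyst scores higher than a randomly chosen decoy, with ties counted as one half. The score used here (for example, predicted |ΔΔG|) is an assumption.

```python
from typing import Sequence


def auroc(scores: Sequence[float], is_known: Sequence[bool]) -> float:
    """AUROC as P(known catalyst outscores decoy); ties count 1/2."""
    if len(scores) != len(is_known):
        raise ValueError("scores and labels must have the same length")
    pos = [s for s, k in zip(scores, is_known) if k]
    neg = [s for s, k in zip(scores, is_known) if not k]
    if not pos or not neg:
        raise ValueError("need at least one known catalyst and one decoy")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

The pairwise loop is quadratic, which is harmless for a few hundred candidates and a handful of known catalysts; for much larger libraries a rank-sum formulation would be preferable. An AUROC of 0.5 corresponds to random ranking and 1.0 to all known catalysts ranked above every decoy.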

Application of the software tools to scenario 3

A chemist may have a hit molecule (for example, from scenario 2) and will look for analogues with improved selectivity. We used a detailed study by Gerosa et al.58 to simulate this scenario. In that work, chiral pyrrolidine derivatives were synthesized and tested as organocatalysts for the Diels–Alder cycloaddition after the core scaffold was identified as a promising candidate. As shown in Fig. 7, this research project can be simulated within a single workflow. Imines are synthesized and subsequently reacted with a chiral dipolarophile to make three potential diastereomers. These pyrrolidines are then assessed in both endo-Diels–Alder and exo-Diels–Alder cycloadditions. In practice, each reaction step requires extensive experimental work to isolate and characterize the stereoisomers.

Fig. 7: Optimization of asymmetric organocatalysts for Diels–Alder cycloaddition.

a, Workflow. b, Predicted and experimentally observed enantioselectivity obtained with chiral pyrrolidine derivatives. Orange: endo adduct, blue: exo adduct. The inset shows the mean unsigned error (blue: each substrate, red: average, black: random). A positive ΔΔG value represents one enantiomer, while a negative ΔΔG value represents the other enantiomer. Details of the predictions are provided in Supplementary Table 16. The trendline and correlation coefficient (linear regression) have been computed using Microsoft Excel 2016.

As seen in Fig. 7, this virtual lead optimization had a mean unsigned error as low as 0.33 kcal mol–1. The most stereoselective catalysts predicted by ACE were experimentally the best (endo) and second best (exo). This study was completed in just a few days on a standard Windows PC and could be extended to hundreds of analogues.

Application of the software tools to scenario 4

A chemist may evaluate the potential substrate scope of a given catalyst. This last set of calculations was done using (DHQD)2PHAL, a catalyst for asymmetric dihydroxylation that is now commercially available (in AD-mix β). This catalyst has been applied virtually (and previously experimentally) to 25 substrates, and the predictions were compared with experimental data (Fig. 8).

Fig. 8: Substrate-scope study with (DHQD)2PHAL.

The inset on the left shows the mean unsigned error (blue: each substrate, red: average, black: random). Positive ΔΔG represents (R) and (R,R) isomers, while negative ΔΔG represents the other isomers. Details of the predictions are provided in Supplementary Tables 17 and 18. The trendline and correlation coefficient (linear regression) have been computed using Microsoft Excel 2016.

Overall, this last simulation suggests that the catalyst would be highly enantioselective (≥97% e.e.) on approximately 25% of the substrates and on approximately 20% it would be poorly selective (≤40% e.e.), in excellent agreement with the experiments. However, we observed a poorer reproducibility for dihydroxylation (large standard deviation over multiple runs) than with the other reactions (Supplementary Table 18). This can be explained by the significantly larger size and flexibility of the catalysts used in dihydroxylation (Fig. 5) and suggests a limitation of the approach. More time and computational resources may be required to search the conformational space of such systems adequately.
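The substrate classification and run-to-run spread discussed above can be summarized with a few lines of code. The thresholds (≥97% and ≤40% e.e.) come from the text; the data layout, one list of predicted e.e. values (percent) per substrate with one entry per independent run, is a hypothetical choice for illustration.

```python
from statistics import mean, stdev
from typing import Dict, List


def summarize_runs(ee_by_substrate: Dict[str, List[float]],
                   high: float = 97.0, low: float = 40.0) -> dict:
    """Summarize predicted e.e. (percent, one value per run) for each substrate."""
    if not ee_by_substrate:
        raise ValueError("no substrates given")
    # The sign of e.e. only encodes which enantiomer is favoured; use magnitudes.
    magnitudes = {s: [abs(v) for v in runs] for s, runs in ee_by_substrate.items()}
    means = {s: mean(vals) for s, vals in magnitudes.items()}
    spreads = {s: (stdev(vals) if len(vals) > 1 else 0.0) for s, vals in magnitudes.items()}
    n = len(means)
    return {
        "frac_high": sum(m >= high for m in means.values()) / n,
        "frac_low": sum(m <= low for m in means.values()) / n,
        "mean_ee": mean(means.values()),
        "stdev_by_substrate": spreads,
    }
```

A large per-substrate standard deviation flags systems where the conformational search is likely not converged, which is the behaviour described above for the large, flexible dihydroxylation catalysts.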

Three substrates are consistently poorly predicted (over five runs). One of these failures can be attributed to the poor parameterization of sulfur-containing groups (a tosylate in this case). The other two are a cis olefin (the FF parameters were developed using a trans olefin) and a naphthalene derivative that engages in significant π–π interactions with the catalyst. Interestingly, the predicted average enantioselectivity varies from 67 to 76% e.e. over five runs, compared with 73.6% e.e. experimentally, with an overall good correlation with experiments (r² varies from 0.51 to 0.64).
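The trendlines and r² values in Figs. 4, 7 and 8 were computed in Excel. For readers who prefer a script, the sketch below performs the equivalent ordinary least-squares fit of observed against predicted ΔΔG; for a linear fit with an intercept, r² equals the squared Pearson correlation, which is what a linear trendline reports. The function name and argument order are our own choices.

```python
from statistics import mean
from typing import Sequence, Tuple


def linear_fit(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float, float]:
    """Ordinary least squares y = slope * x + intercept; returns (slope, intercept, r2)."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need at least two paired points")
    mx, my = mean(x), mean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    if sxx == 0 or syy == 0:
        raise ValueError("r2 is undefined for constant data")
    slope = sxy / sxx
    intercept = my - slope * mx
    r2 = sxy * sxy / (sxx * syy)
    return slope, intercept, r2
```

Reporting the slope alongside r² is informative here: a high r² with a slope far from 1 would indicate systematic over- or under-estimation of selectivity, which the mean unsigned error alone does not reveal.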

Conclusions

Our efforts to interface computational and organic chemistry have led to the creation of the VIRTUAL CHEMIST platform, which aims to aid experimental chemists in the pursuit of asymmetric synthesis projects. This platform is user friendly (designed for organic chemists) and highly customizable through the introduction of modular workflows. The power of these modular workflows and of the individual programs making up the VIRTUAL CHEMIST software suite (free for academic use) has been demonstrated through the in-depth analysis of four realistic scenarios representative of various asymmetric synthesis projects.

Every computational approach carries its own caveats. Here, we acknowledge that the methodology presented requires a mechanism-based TS model, much like docking potential drug molecules requires a target structure. Additionally, the MM-based computations suffer from current FF limitations, although efforts are ongoing to overcome this obstacle. Finally, large catalytic systems pose a challenge for conformational searching in TS optimization and can lead to simulations trapped in local energy minima.

Notwithstanding these obstacles, we have demonstrated the reliability and accuracy of our platform, which is able to distinguish weak asymmetric catalysts from good asymmetric catalysts and good asymmetric catalysts from great asymmetric catalysts with chemical accuracy in most cases. With this platform, chemists could now test ideas in a matter of hours, a fraction of the time needed to synthesize and test novel catalysts.

We believe that our computational approach will lead to more efficient catalyst discovery, as simulated experiments allow a broader exploration of chemical space than laboratory experiments, in a shorter amount of time. Moving forward, we hope that computational chemistry will have the same impact on organic chemistry as NMR, MS and chromatography had at the time of their incorporation into the chemist’s toolbox, or as structure-based design software has had in medicinal chemistry.

Methods

The VIRTUAL CHEMIST platform (Subversion revision 5679) was used throughout this work. Pseudocode for all the tools is available in the Supplementary Information (Section I). The workflow module was used to generate all the parameters, and a batch script was used to run calculations. These were then ported to supercomputers for more time-efficient calculations. For scenarios 1, 3 and 4, catalyst and substrate structures were drawn using the sketcher provided in VIRTUAL CHEMIST and/or ChemDraw and provided as input to CONSTRUCTS. The templates needed to assemble the TS structures in CONSTRUCTS are either available in VIRTUAL CHEMIST35,36 or derived from reported TSs43,47,59. For metal-containing reactions, reactant and product structures were optimized using QUEMIST (HF/def2-SVP with D2 dispersion correction), and Hessians were computed to generate FF parameters ready to be used with ACE. For scenario 2, a library of chemicals was extracted from the ZINC database for both the Diels–Alder and Shi epoxidation reactions. In the case of the Diels–Alder reaction, we extracted a library containing only chiral amines, which we then processed as described in the main text. For the Shi epoxidation, we extracted a library of cyclic ketones, but as this contained an insufficient number of molecules, we supplemented it with secondary alcohols that were further processed as described in the main text.

Data availability

The sets of molecules used in this study (Supplementary Tables 1–8) and representative computed data (Supplementary Tables 9–18) are available as Supplementary Information. A tutorial for the use of this platform is provided as Supplementary Data 1. The programs are available (free of charge for academic research) at www.molecularforecaster.com. All the data, parameter files and structures are available at moitessier-group.mcgill.ca/software.html.

All other data are available from the authors upon reasonable request.

Code availability

Descriptions and pseudocode of all the programs used in this study are provided in the Supplementary Methods. The programs are available for download upon request from the authors (www.molecularforecaster.com).

References

1. ACS. Where Is Organic Chemistry Used? ACS Chemistry for Life www.acs.org/content/acs/en/careers/college-to-career/areas-of-chemistry/organic-chemistry.html (accessed 30 June 2020).

2. Wang, A. et al. Unraveling the mysterious failure of Cu/SAPO-34 selective catalytic reduction catalysts. Nat. Commun. 10, 1137 (2019).

3. Wang, X.-G. et al. Three-component ruthenium-catalyzed direct meta-selective C–H activation of arenes: a new approach to the alkylarylation of alkenes. J. Am. Chem. Soc. 141, 13914–13922 (2019).

4. Meucci, E. A. et al. Nickel(IV)-catalyzed C–H trifluoromethylation of (hetero)arenes. J. Am. Chem. Soc. 141, 12872–12879 (2019).

5. Durrant, J. D. & McCammon, J. A. Molecular dynamics simulations and drug discovery. BMC Biol. 9, 71 (2011).

6. Borhani, D. W. & Shaw, D. E. The future of molecular dynamics simulations in drug discovery. J. Comput.-Aided Mol. Des. 26, 15–26 (2012).

7. Kuntz, I. D., Blaney, J. M., Oatley, S. J., Langridge, R. & Ferrin, T. E. A geometric approach to macromolecule–ligand interactions. J. Mol. Biol. 161, 269–288 (1982).

8. Vamathevan, J. et al. Applications of machine learning in drug discovery and development. Nat. Rev. Drug Discov. 18, 463–477 (2019).

9. Liu, X. et al. Molecular dynamics simulations and novel drug discovery. Expert Opin. Drug Discov. 13, 23–37 (2018).

10. Wang, G. & Zhu, W. Molecular docking for drug discovery and development: a widely used approach but far from perfect. Future Med. Chem. 8, 1707–1710 (2016).

11. Santosh, A. K., Alpeshkumar, K. M., Evans, C. C. & Sudha, S. Pharmacophore modeling in drug discovery and development: an overview. Med. Chem. 3, 187–197 (2007).

12. Sanderson, K. Automation: chemistry shoots for the moon. Nature 568, 577–579 (2019).

13. Segler, M. H. S., Preuss, M. & Waller, M. P. Planning chemical syntheses with deep neural networks and symbolic AI. Nature 555, 604–610 (2018).

14. Zheng, S., Rao, J., Zhang, Z., Xu, J. & Yang, Y. Predicting retrosynthetic reactions using self-corrected transformer neural networks. J. Chem. Inf. Model. 60, 47–55 (2020).

15. Marques-Lopez, E., Herrera, R. P. & Christmann, M. Asymmetric organocatalysis in total synthesis – a trial by fire. Nat. Prod. Rep. 27, 1138–1167 (2010).

16. Maldonado, A. G. & Rothenberg, G. Predictive modeling in homogeneous catalysis: a tutorial. Chem. Soc. Rev. 39, 1891–1902 (2010).

17. Brown, J. M. & Deeth, R. J. Is enantioselectivity predictable in asymmetric catalysis? Angew. Chem. Int. Ed. 48, 4476–4479 (2009).

18. Houk, K. N. & Cheong, P. H. Y. Computational prediction of small-molecule catalysts. Nature 455, 309–313 (2008).

19. Harper, K. C. & Sigman, M. S. Predicting and optimizing asymmetric catalyst performance using the principles of experimental design and steric parameters. Proc. Natl Acad. Sci. USA 108, 2179–2183 (2011).

20. Reid, J. P. & Sigman, M. S. Holistic prediction of enantioselectivity in asymmetric catalysis. Nature 571, 343–348 (2019).

21. Norrby, P.-O. Holistic models of reaction selectivity. Nature 571, 332–333 (2019).

22. Beker, W., Gajewska, E. P., Badowski, T. & Grzybowski, B. A. Prediction of major regio-, site-, and diastereoisomers in Diels–Alder reactions by using machine-learning: the importance of physically meaningful descriptors. Angew. Chem. Int. Ed. 58, 4515–4519 (2019).

23. Zahrt, A. F. et al. Prediction of higher-selectivity catalysts by computer-driven workflow and machine learning. Science 363, eaau5631 (2019).

24. Reid, J. P. & Sigman, M. S. Comparing quantitative prediction methods for the discovery of small-molecule chiral catalysts. Nat. Rev. Chem. 2, 290–305 (2018).

25. Bahmanyar, S. & Houk, K. N. The origin of stereoselectivity in proline-catalyzed intramolecular aldol reactions. J. Am. Chem. Soc. 123, 12911–12912 (2001).

26. Gordillo, R. & Houk, K. N. Origins of stereoselectivity in Diels–Alder cycloadditions catalyzed by chiral imidazolidinones. J. Am. Chem. Soc. 128, 3543–3553 (2006).

27. Ford, D. D., Nielsen, L. P. C., Zuend, S. J., Musgrave, C. B. & Jacobsen, E. N. Mechanistic basis for high stereoselectivity and broad substrate scope in the (salen)Co(III)-catalyzed hydrolytic kinetic resolution. J. Am. Chem. Soc. 135, 15595–15608 (2013).

28. Lin, H., Pei, W., Wang, H., Houk, K. N. & Krauss, I. J. Enantioselective homocrotylboration of aliphatic aldehydes. J. Am. Chem. Soc. 135, 82–85 (2013).

29. Wolf, L. M. & Denmark, S. E. A theoretical investigation on the mechanism and stereochemical course of the addition of (E)-2-butenyltrimethylsilane to acetaldehyde by electrophilic and nucleophilic activation. J. Am. Chem. Soc. 135, 4743–4756 (2013).

30. Lam, Y.-h. & Houk, K. N. How cinchona alkaloid-derived primary amines control asymmetric electrophilic fluorination of cyclic ketones. J. Am. Chem. Soc. 136, 9556–9559 (2014).

31. Lam, Y.-h. & Houk, K. N. Origins of stereoselectivity in intramolecular aldol reactions catalyzed by cinchona amines. J. Am. Chem. Soc. 137, 2116–2127 (2015).

32. Reid, J. P., Simón, L. & Goodman, J. M. A practical guide for predicting the stereochemistry of bifunctional phosphoric acid catalyzed reactions of imines. Acc. Chem. Res. 49, 1029–1041 (2016).

33. Rosales, A. R. et al. Application of Q2MM to predictions in stereoselective synthesis. Chem. Commun. 54, 8294–8311 (2018).

34. Hansen, E., Rosales, A. R., Tutkowski, B., Norrby, P.-O. & Wiest, O. Prediction of stereochemistry using Q2MM. Acc. Chem. Res. 49, 996–1005 (2016).

35. Corbeil, C. R., Thielges, S., Schwartzentruber, J. A. & Moitessier, N. Toward a computational tool predicting the stereochemical outcome of asymmetric reactions: development and application of a rapid and accurate program based on organic principles. Angew. Chem. Int. Ed. 47, 2635–2638 (2008).

36. Weill, N., Corbeil, C. R., De Schutter, J. W. & Moitessier, N. Toward a computational tool predicting the stereochemical outcome of asymmetric reactions: development of the molecular mechanics-based program ACE and application to asymmetric epoxidation reactions. J. Comput. Chem. 32, 2878–2889 (2011).

37. Schneebeli, S. T., Hall, M. L., Breslow, R. & Friesner, R. A. Quantitative DFT modeling of the enantiomeric excess for dioxirane-catalyzed epoxidations. J. Am. Chem. Soc. 131, 3965–3973 (2009).

38. Bootsma, A. N. & Wheeler, S. Popular integration grids can result in large errors in DFT-computed free energies. Preprint at https://doi.org/10.26434/chemrxiv.8864204.v5 (2019).

39. Rosales, A. R. et al. Rapid virtual screening of enantioselective catalysts using CatVS. Nat. Catal. 2, 41–45 (2019).

40. Therrien, E. et al. Integrating medicinal chemistry, organic/combinatorial chemistry, and computational chemistry for the discovery of selective estrogen receptor modulators with FORECASTER, a novel platform for drug discovery. J. Chem. Inf. Model. 52, 210–224 (2012).

41. Pottel, J. & Moitessier, N. Customizable generation of synthetically accessible, local chemical subspaces. J. Chem. Inf. Model. 57, 454–467 (2017).

42. van Hilten, N., Chevillard, F. & Kolb, P. Virtual compound libraries in computer-assisted drug discovery. J. Chem. Inf. Model. 59, 644–651 (2019).

43. Rasmussen, T. & Norrby, P.-O. Modeling the stereoselectivity of the β-amino alcohol-promoted addition of dialkylzinc to aldehydes. J. Am. Chem. Soc. 125, 5130–5138 (2003).

44. Seminario, J. M. Calculation of intramolecular force fields from second-derivative tensors. Int. J. Quantum Chem. 60, 1271–1277 (1996).

45. Allen, A. E. A., Payne, M. C. & Cole, D. J. Harmonic force constants for molecular mechanics force fields via hessian matrix projection. J. Chem. Theory Comput. 14, 274–281 (2018).

46. Norrby, P.-O., Rasmussen, T., Haller, J., Strassner, T. & Houk, K. N. Rationalizing the stereoselectivity of osmium tetroxide asymmetric dihydroxylations with transition state modeling using quantum mechanics-guided molecular mechanics. J. Am. Chem. Soc. 121, 10186–10192 (1999).

47. Donoghue, P. J., Helquist, P., Norrby, P.-O. & Wiest, O. Prediction of enantioselectivity in rhodium catalyzed hydrogenations. J. Am. Chem. Soc. 131, 410–411 (2009).

48. Harvey, J. N., Himo, F., Maseras, F. & Perrin, L. Scope and challenge of computational methods for studying mechanism and reactivity in homogeneous catalysis. ACS Catal. 9, 6803–6813 (2019).

49. Yang, X. et al. Chiral pyrrolidine derivatives as catalysts in the enantioselective addition of diethylzinc to aldehydes. Tetrahedron: Asymmetry 10, 133–138 (1999).

50. Liang, G., Bays, J. P. & Bowen, J. P. Ab initio calculations and molecular mechanics (MM3) force field development for sulfonamide and its alkyl derivatives. J. Mol. Struct. THEOCHEM 401, 165–179 (1997).

51. Immirzi, A. & Musco, A. A method to measure the size of phosphorus ligands in coordination complexes. Inorg. Chim. Acta 25, L41–L42 (1977).

52. Liu, Z. et al. Elucidating hyperconjugation from electronegativity to predict drug conformational energy in a high throughput manner. J. Chem. Inf. Model. 56, 788–801 (2016).

53. Liu, Z., Barigye, S. J., Shahamat, M., Labute, P. & Moitessier, N. Atom types independent molecular mechanics method for predicting the conformational energy of small molecules. J. Chem. Inf. Model. 58, 194–205 (2018).

54. Champion, C. et al. Atom type independent modeling of the conformational energy of benzylic, allylic, and other bonds adjacent to conjugated systems. J. Chem. Inf. Model. 59, 4750–4763 (2019).

55. Wei, W. et al. Torsional energy barriers of biaryls could be predicted by electron-richness/deficiency of aromatic rings; advancement of molecular mechanics towards atom-type independence. J. Chem. Inf. Model. 59, 4764–4777 (2019).

56. Sterling, T. & Irwin, J. J. ZINC 15 – ligand discovery for everyone. J. Chem. Inf. Model. 55, 2324–2337 (2015).

57. Hall, H. K. Correlation of the base strengths of amines. J. Am. Chem. Soc. 79, 5441–5444 (1957).

58. Gerosa, G. G., Spanevello, R. A., Suárez, A. G. & Sarotti, A. M. Joint experimental, in silico, and NMR studies toward the rational design of iminium-based organocatalyst derived from renewable sources. J. Org. Chem. 80, 7626–7634 (2015).

59. DelMonte, A. J. et al. Experimental and theoretical kinetic isotope effects for asymmetric dihydroxylation. Evidence supporting a rate-limiting ‘(3 + 2)’ cycloaddition. J. Am. Chem. Soc. 119, 9907–9908 (1997).


Acknowledgements

We thank NSERC (Discovery programme) for financial support. Calcul Québec and Compute Canada are acknowledged for generous CPU allocations.

Author information

Contributions

N.M., M.B.P. and J.P. designed and wrote the programs REACT2D, FINDERS and CONSTRUCTS (J.P., N.M.), QUEMIST (M.B.P.), and UI, REDUCE, SELECT and ACE (N.M.). S.P. and M.B.P. tested the usability of the platform and contributed to its design. P.O.N. contributed to the design of the platform. The testing (four scenarios) and data analysis were performed by S.P., M.B.P. and N.M. All the authors contributed to the manuscript.

Corresponding author

Correspondence to Nicolas Moitessier.

Ethics declarations

Competing interests

Virtual Chemist is distributed (free of charge for academic research) by Molecular Forecaster, a company co-founded by N.M., of which J.P. is CEO.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary Information

Supplementary Methods, Figs. 1 and 2, Tables 1–18 and References.

Supplementary Data 1

About this article

Cite this article

Burai Patrascu, M., Pottel, J., Pinus, S. et al. From desktop to benchtop with automated computational workflows for computer-aided design in asymmetric catalysis. Nat Catal 3, 574–584 (2020). https://doi.org/10.1038/s41929-020-0468-3
