Precise atom-to-atom mapping for organic reactions via human-in-the-loop machine learning

Chen, Shuan; An, Sunggi; Babazade, Ramil; Jung, Yousung

doi:10.1038/s41467-024-46364-y

Article
Open access
Published: 13 March 2024

Nature Communications volume 15, Article number: 2250 (2024)

Abstract

Atom-to-atom mapping (AAM) is a task of identifying the position of each atom in the molecules before and after a chemical reaction, which is important for understanding the reaction mechanism. As more machine learning (ML) models were developed for retrosynthesis and reaction outcome prediction recently, the quality of these models is highly dependent on the quality of the AAM in reaction datasets. Although there are algorithms using graph theory or unsupervised learning to label the AAM for reaction datasets, existing methods map the atoms based on substructure alignments instead of chemistry knowledge. Here, we present LocalMapper, an ML model that learns correct AAM from chemist-labeled reactions via human-in-the-loop machine learning. We show that LocalMapper can predict the AAM for 50 K reactions with 98.5% calibrated accuracy by learning from only 2% of the human-labeled reactions from the entire dataset. More importantly, the confident predictions given by LocalMapper, which cover 97% of 50 K reactions, show 100% accuracy for 3,000 randomly sampled reactions. In an out-of-distribution experiment, LocalMapper shows favorable performance over other existing methods. We expect LocalMapper can be used to generate more precise reaction AAM and improve the quality of future ML-based reaction prediction models.

Introduction

Atom-to-atom mapping (AAM) plays a crucial role in preparing reaction data by identifying the one-to-one mapping between reactant atoms and product atoms. High-quality AAM allows fast recognition of the reaction center of a given chemical reaction, which is essential for many of the developed methods working on chemical reaction analysis and prediction.

One of the widely used applications of AAM is the construction of a condensed graph of reaction (CGR)^1,2, which combines the reactant and product graphs into a single representation and has shown promise in various reaction tasks, including reaction condition prediction^3,4, reaction similarity search⁵, and even predicting advanced reaction quantities such as activation energy or reaction yield^6,7. Additionally, AAM enables the automatic identification of reaction centers and extraction of reaction templates from databases, which are utilized in predicting reaction outcomes^8,9,10 and single-step retrosynthesis^{11,12,13,14,15} machine learning (ML) models. Since these applications are highly dependent on the AAM of reaction data, the quality of AAM greatly impacts the performance of machine learning models. For instance, the incorrect mapping on an alkene epoxidation would generate an invalid retrosynthesis reaction template and unclear reaction mechanism (Fig. 1), turning valuable reaction data into noisy reaction data. Unfortunately, the commonly used USPTO reaction dataset has been reported to contain issues like incorrect AAM or missing reactants, which directly affect downstream ML models^10,16,17,18. Incorrect AAM can lead to the learning of incorrect chemistry, resulting in unrealistic prediction models and retrosynthesis pathways. With the growing number of downstream models being developed, the curation of high-quality AAM for reaction datasets becomes an urgent task to ensure the quality of reaction prediction models.

**Fig. 1: The importance of accessing correct atom-to-atom-mapping (AAM) in terms of generating retrosynthesis reaction templates and deriving reaction mechanisms based on chemical knowledge.**

Existing methods for AAM identification can be generally categorized into rule-based^{17,19,20,21,22,23,24,25} and ML-based methods^18,26. Most of the rule-based methods identify AAM based on minimal chemical distance (MCD)²⁷ or maximum common subgraph (MCS) isomorphism algorithms²⁸. Solving the AAM problem by rule-based approaches is challenging because such a subgraph isomorphism problem has been known to be NP-hard since the 1970s, and there is no efficient algorithm to find the exact solutions^29,30,31. Recently, machine learning models have been developed to bypass the time-consuming subgraph matching process and map the atoms in the reactions directly from the information extracted from the model’s learned features. Schwaller et al.¹⁸ proposed an unsupervised-learning-based model called RXNMapper to link the grammar dependency of each atom between reactants and products. By focusing on specific attention weights of the language model, RXNMapper not only achieved a promising prediction accuracy that outperformed existing rule-based methods but also largely reduced the computational time of performing AAM on large reaction datasets. More recently, Nugmanov et al.²⁶ developed GraphormerMapper, which uses an unsupervised learning strategy similar to that of RXNMapper but is based on a graph-based Transformer and trained on a much larger reaction dataset.

Although the above-mentioned approaches have shown improving accuracy over previous methods, a perfect 100% accuracy of AAM is required since the flaw in the reaction data will be amplified in the downstream reaction prediction models. Yet, currently, existing methods have not shown a reliable approach to detect potentially incorrectly predicted AAM, which makes the error in the predictions hard to identify. Furthermore, although existing ML-based unsupervised methods are found to be much faster than rule-based methods and applicable to a wider range of reactions, training a model without knowing the correct AAM may lead to unexpected errors even for simple reactions. As later shown in this paper, previous methods have incorrectly mapped over 5% of reactions in the widely used US patent dataset.

Here, we present a precise graph-based AAM model, named LocalMapper, via human-in-the-loop machine learning. Unlike previous ML-based approaches, which learn the AAM without correct answers, we manually label the AAM of reaction data to train the model. Because manually labeling the AAM for a large dataset is an exhausting and expensive task, we design an active learning framework in which only a small fraction of reactions, diversely sampled from the dataset, is labeled by hand. With these chemist-labeled AAMs, we train a graph neural network (GNN) to learn the correct AAM of a reaction using both local message passing and long-range attention. For the publicly available USPTO-50K dataset, the model predicts the AAM with 98.5% accuracy after learning from chemist-labeled AAMs for only 2% of the reactions. More importantly, the 97% of reactions in the dataset that LocalMapper predicts confidently show 100% prediction accuracy. The same perfect accuracy on confident predictions is observed when testing the model on a diverse out-of-domain reaction test set. We expect our approach can be used to generate reliable AAM for reaction databases and improve the quality of future ML models relying on AAM. We summarize the important breakthroughs of this paper in three aspects.

1. The proposed knowledge-based uncertainty identification allows the fast chemical-aware verification of ML model predictions, yielding 100% correct AAM for 3,000 randomly sampled confident predictions.
2. The developed model, LocalMapper, achieves state-of-the-art AAM prediction accuracy by learning the chemist-verified AAM from high-quality training data curated by human-in-the-loop machine learning. We show a better prediction accuracy compared to the existing ML-based models, RXNMapper¹⁸ and GraphormerMapper²⁶, by only labeling 2% of the reactions.
3. In an out-of-distribution experiment, LocalMapper shows favorable prediction accuracy over two existing ML-based AAM models, while maintaining 100% accuracy on the confident predictions.

Results

The human-in-the-loop machine learning framework

In this work, we propose a graph-based model, LocalMapper, to learn the correct AAM through human-in-the-loop machine learning. To train LocalMapper, we manually label the AAM of each sampled reaction, which guarantees the correctness of the AAMs used for training. Because manually labeling AAM for chemical reactions is intensely time-consuming (in general over one minute per reaction), it is impractical to label a large portion of the reactions in a large dataset. Therefore, we introduce active learning to label only a small fraction of representative reactions. The overall workflow can be decomposed into the following 5 steps (Fig. 2a); more details about LocalMapper (Fig. 2b) and prediction confidence (Fig. 2c) are described in the next two subsections:

1. Random sampling: To initialize the active-learning process, we randomly sample k reactions from the unmapped reaction dataset, where k is a number small enough for a human expert to label in one session.
2. Label and train: Next, we manually label the AAM for the sampled k reactions and use these reactions to train the proposed graph-based model LocalMapper, which is structurally similar to the retrosynthesis model LocalRetro¹⁴ and the reaction outcome prediction model LocalTransform³². Reaction templates extracted from human-mapped reactions are used to update a template library, which will be used for later uncertainty identification.
3. AAM prediction: Next, we use LocalMapper to predict the atom-atom correlation between reactants and products for all the reactions in the dataset. According to the atom-atom correlation predicted by LocalMapper, we generate the AAMs for each reaction following the atom-mapping procedure introduced by Schwaller et al.¹⁸
4. Confidence identification: For each reaction’s predicted AAM, we extract the reaction template to represent its pattern of reactivity. If the extracted reaction template exists in the current template library, the set of AAMs predicted for the reaction is considered a confident prediction, otherwise an uncertain prediction.
5. Active sampling: For each unique template extracted from uncertain predictions, we sample one reaction, starting from the template shared by the most reactions, until k reactions are sampled. These reactions are then labeled by human chemists and used to train the model in the next iteration, repeating from step 2.
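Steps 4 and 5 can be illustrated with a minimal sketch. It assumes that each reaction's template has already been extracted from its predicted AAM and is available as a plain string; the reaction ids, template labels, and function names are hypothetical and do not come from the LocalMapper code base. The sketch also stops once every unverified template has contributed one reaction, which is a simplification.

```python
def split_by_confidence(predicted_templates, template_library):
    """Step 4: a prediction is confident iff its template is already verified."""
    confident, uncertain = [], []
    for rxn_id, template in predicted_templates.items():
        (confident if template in template_library else uncertain).append(rxn_id)
    return confident, uncertain


def active_sample(predicted_templates, uncertain, k):
    """Step 5: one reaction per unverified template, most populated templates first."""
    by_template = {}
    for rxn_id in uncertain:
        by_template.setdefault(predicted_templates[rxn_id], []).append(rxn_id)
    # Stable sort: ties keep the order in which templates were first seen.
    ranked = sorted(by_template.items(), key=lambda kv: -len(kv[1]))
    return [rxn_ids[0] for _, rxn_ids in ranked[:k]]


# Toy data (hypothetical): reaction id -> template derived from the predicted AAM.
predicted = {
    "r1": "T_amide", "r2": "T_amide",
    "r3": "T_x", "r4": "T_x",
    "r5": "T_y", "r6": "T_z",
}
library = {"T_amide"}  # templates verified by a chemist so far

confident, uncertain = split_by_confidence(predicted, library)
coverage = len(confident) / len(predicted)  # share of confident predictions
to_label = active_sample(predicted, uncertain, k=2)
print(confident, to_label, coverage)
```

Ranking templates by how many uncertain reactions share them means each newly labeled reaction verifies as many pending predictions as possible.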

From the second iteration, we train the model using semi-supervised learning by sampling 100 reactions from the confident predictions from each unique verified reaction template to increase the model’s robustness. These sampled reactions are split into the training and validation set by a 9:1 ratio to prevent overfitting.

LocalMapper

To predict the AAM between the reactant and product in the reaction, we design a graph-based model, called LocalMapper, to learn the probability $p(\mathrm{atom}_{r} \mid \mathrm{atom}_{p})$ that a given atom in the reactants corresponds to a given atom in the product. Similar to our previous models for retrosynthesis, LocalRetro¹⁴, and reaction outcome prediction, LocalTransform³², we represent molecules as graphs, with atoms as nodes and bonds as edges, and learn the AAM from both local and global features of the atoms in the reactions using message passing neural networks³³ and an attention mechanism³⁴ (Fig. 2b).

First, we encode the local chemical environment of each atom using 3 message-passing layers³³ and update the atom features in the product with atom features from the reactants through 3 multi-head cross-attention blocks³⁴. After the features of the atoms in the reactants and products have been sufficiently communicated, we calculate the AAM correlation between product and reactant atoms with a single-head attention block. Normalizing the attention scores with the Softmax function gives, for each atom in the product, the estimated probability that each atom in the reactants is the same atom. Following the atom-mapping procedure introduced in RXNMapper¹⁸, we use the resulting probabilities to assign the AAM from product to reactant, proceeding from the highest probability to the lowest. In the example shown in Fig. 2b, oxygen from the water molecule at the reactant side is identified as the source of the ketone oxygen in the product molecule. The mathematical details of each layer and pseudocode of LocalMapper can be found in the Methods section.
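The final decoding step, turning attention scores into a one-to-one mapping, can be sketched as follows. This is a simplified illustration under stated assumptions: the score matrix is a made-up toy input, the function names are hypothetical, and any additional refinements in the cited atom-mapping procedure are omitted. It only shows row-wise Softmax followed by greedy assignment from the most probable pair downwards.

```python
import math


def softmax(row):
    """Numerically stable softmax over one product atom's scores."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def greedy_mapping(scores):
    """scores[p][r]: raw attention score between product atom p and reactant atom r.

    Returns {product_index: reactant_index}; the most probable pairs are fixed
    first and each reactant atom is used at most once.
    """
    probs = [softmax(row) for row in scores]
    pairs = sorted(
        ((probs[p][r], p, r) for p in range(len(probs)) for r in range(len(probs[p]))),
        key=lambda t: -t[0],
    )
    mapping, used = {}, set()
    for _, p, r in pairs:
        if p not in mapping and r not in used:
            mapping[p] = r
            used.add(r)
    return mapping


# Toy scores (hypothetical): product atom 1 slightly prefers reactant atom 0,
# but atom 0 is claimed first by product atom 0, which is more certain.
scores = [
    [4.0, 0.5, 0.1],
    [3.5, 3.0, 0.2],
    [0.1, 0.3, 2.0],
]
mapping = greedy_mapping(scores)
print(mapping)
```

Assigning the most certain pairs first lets confident matches constrain ambiguous ones, which is why product atom 1 ends up on its second choice here.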

Knowledge-based prediction confidence

Assessing the prediction confidence is one of the most important features of ML models, since it informs the user whether the model’s prediction is reliable. Sampling and labeling uncertain predictions to train the model is usually referred to as active learning, which can efficiently identify the data that most need to be labeled for the model to learn further. Popular methods of quantifying the prediction confidence include Monte Carlo dropout³⁵, bootstrapping³⁶, and multiplying the prediction probabilities^18,37. Despite a positive correlation between accuracy and confidence under these approaches, none of them uses in-domain knowledge; they depend solely on the model parameters.

Here, we introduce a knowledge-based approach to identify prediction confidence by examining the presence of a reaction template derived from the predicted AAM in the chemist-verified template library. Since the reaction mechanism of a chemical reaction is determined by the reaction itself, we assume there exists only one correct, chemically reasonable reaction template that can be derived from the correctly predicted reaction AAM. On the other hand, reactions with incorrectly predicted AAMs would not give chemically reasonable templates. Therefore, we define a prediction as confident if the reaction template derived from the predicted AAM has already been identified and verified by a human expert during the manual reaction labeling process; otherwise, it is classified as uncertain. Hence, the uncertain predictions are either wrong, or correct but not yet validated by human experts. These uncertain predictions can be sampled and confirmed by human experts in the active learning process. The example illustrated in Fig. 2c shows that only when the AAM of the given reaction is correctly predicted does it yield a chemically reasonable reaction template (template A) and become a confident prediction; otherwise it is uncertain (template B). We use an extended version of the local reaction template, the extended-local reaction template (ELRT), to represent the reactions in this work. See the “Methods” section and Supplementary Section 1 for more details about ELRT.

Results of active-learning

In our experiments, we perform active-learning to train LocalMapper on the USPTO-50K dataset, containing 49,996 reactions curated by Schneider et al.³⁸, by sampling 200 reactions at each active-learning iteration and repeating for 5 iterations (k = 200, n = 5). The number of reaction templates and the prediction coverage, defined as the percentage of predicted AAMs yielding reaction templates that exist in the template library, are shown in Fig. 3a. At the first iteration, 200 reactions were randomly sampled from all the reactions. The AAMs of these reactions yield 90 unique reaction templates and cover 59.7% of the total reactions. As the iterations proceed, the number of unique reaction templates steadily increases (209, 348, 467, 555 for n = 2, 3, 4, 5), while the gain in prediction coverage diminishes (88.1%, 92.9%, 93.9%, 95.0% for n = 2, 3, 4, 5), meaning that less popular reaction templates were sampled in later iterations. This illustrates how active learning prioritizes the sampled reactions to maximize sample efficiency.

Next, we show examples of sampled reactions during active-learning at n = 1, 3, 5 in Fig. 3b–d. At n = 1, simple and popular reactions such as substitution reactions and redox reactions were sampled. The reaction shown in Fig. 3b is one of the nucleophilic acyl substitution reactions, accounting for 11.0% of the total reactions in the dataset. At n = 3, more organometallic reactions such as Grignard reactions and Stille coupling were sampled. The reaction shown in Fig. 3c is a nucleophilic methylation with methyllithium as the methylating reagent. At n = 5, several ring-forming intramolecular reactions were sampled. The reaction shown in Fig. 3d is a thiazole synthesis reaction from imine and primary thioamides.

AAM evaluation

To assess the prediction accuracy of LocalMapper, we conducted a comparative analysis with two unsupervised learning-based models: RXNMapper¹⁸ and GraphormerMapper²⁶. To ensure a fair model comparison, we evaluate these models without the use of AAM fixer³⁹, which automatically replaces known incorrect AAMs with correct ones after prediction. We ran these models using their publicly available software on GitHub. Before evaluating the AAM models, we filter out reactions from the USPTO-50K dataset that include invalid product mapping or confusing reagents, as previously reported by Schwaller et al.¹⁸. The former criterion filters out reactions with a product showing repeated atom-mapping numbers or atoms without atom-mapping, while the latter criterion filters out reactions having reactants that are structurally similar to the product (Tanimoto similarity ≥ 0.5) but do not participate in the reaction. Following these criteria, 1166 reactions were excluded, leaving 48,830 reactions for AAM evaluation. More definitions and examples of problematic reactions can be found in Supplementary Section 2.
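The two filtering criteria can be sketched as simple predicates. This is an illustrative sketch, not the authors' filtering code: it assumes the product's atom-map numbers and the molecular fingerprints have already been extracted with a cheminformatics toolkit, that a map number of 0 denotes an unmapped atom, and that fingerprints are given as sets of on-bit indices. All function names and toy values are hypothetical.

```python
def has_invalid_product_mapping(product_map_numbers):
    """True if any product atom is unmapped (0) or a map number is repeated."""
    mapped = [n for n in product_map_numbers if n > 0]
    return len(mapped) != len(product_map_numbers) or len(set(mapped)) != len(mapped)


def tanimoto(fp_a, fp_b):
    """Tanimoto similarity between two fingerprints given as sets of on-bits."""
    union = fp_a | fp_b
    return len(fp_a & fp_b) / len(union) if union else 0.0


def has_confusing_reagent(product_fp, spectator_fps, threshold=0.5):
    """True if a non-participating species is at least `threshold` similar to the product."""
    return any(tanimoto(product_fp, fp) >= threshold for fp in spectator_fps)


# Toy inputs (hypothetical values).
print(has_invalid_product_mapping([1, 2, 3, 4]))  # every atom mapped exactly once
print(has_invalid_product_mapping([1, 2, 2, 4]))  # map number 2 repeated
print(has_invalid_product_mapping([1, 0, 3]))     # one atom unmapped
print(has_confusing_reagent({1, 2, 3, 4}, [{2, 3, 4, 5}, {7, 8}]))
```

A reaction would be dropped from the evaluation set if either predicate returns true.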

Given that the USPTO-50K dataset was known to have potentially incorrect AAMs^10,16,18, we report three different accuracy metrics in this article. The first metric assumes the AAM recorded in the dataset as ground truth (referred to as “dataset accuracy” or $\mathrm{Accuracy}_{\mathrm{overall}}^{\mathrm{dataset}}$). The second metric is obtained by manually checking 3000 sampled reactions that are confidently predicted by both RXNMapper and LocalMapper (referred to as “manually checked accuracy”, $\mathrm{Accuracy}_{\mathrm{conf.}}^{\mathrm{manual}}$). Lastly, we introduce a “calibrated accuracy” metric ($\mathrm{Accuracy}_{\mathrm{overall}}^{\mathrm{calibrated}}$, Eq. 1) by combining the results from the dataset accuracy and the manually checked accuracy.

$$\mathrm{Accuracy}_{\mathrm{overall}}^{\mathrm{calibrated}} = \mathrm{Accuracy}_{\mathrm{unconf.}}^{\mathrm{dataset}} \times \mathrm{Ratio}_{\mathrm{unconf.}} + \mathrm{Accuracy}_{\mathrm{conf.}}^{\mathrm{manual}} \times \mathrm{Ratio}_{\mathrm{conf.}} \tag{1}$$

where the accuracy of unconfident prediction is estimated by

$$\mathrm{Accuracy}_{\mathrm{unconf.}}^{\mathrm{dataset}} = \frac{\mathrm{Accuracy}_{\mathrm{overall}}^{\mathrm{dataset}} - \mathrm{Accuracy}_{\mathrm{conf.}}^{\mathrm{dataset}} \times \mathrm{Ratio}_{\mathrm{conf.}}}{\mathrm{Ratio}_{\mathrm{unconf.}}} \tag{2}$$

Because RXNMapper also gives a confidence score for each prediction, which shows a positive correlation with the prediction accuracy¹⁸, we binarize the confidence score of RXNMapper at a threshold of 0.9 (according to the best performing results shown in the Supplementary Material of ref. ¹⁸) to facilitate the comparison with the confident predictions generated by LocalMapper. Note that GraphormerMapper does not generate a confidence score with its prediction; therefore, we did not assess the accuracy of the confident predictions of this model. The accuracy of AAM predictions is calculated by comparing the condensed graph of reaction (CGR) between the model’s prediction and the ground truth using the CGRtools toolkit², following previous works^26,39, to ensure that equivalent but different AAMs between the ground truth and model predictions do not lead to underestimation of the prediction accuracy.

The AAM results of LocalMapper, RXNMapper, and GraphormerMapper on the USPTO-50K dataset are shown in Table 1. Before we conducted manual checks to assess the correctness of the dataset’s AAM, RXNMapper exhibited an overall accuracy of 98.1% on the 48,830 reactions in the USPTO-50K dataset. In comparison, GraphormerMapper demonstrated an overall accuracy of 92.8%, based on the dataset AAM. Moreover, RXNMapper flags 30.4% of its predictions as confident (i.e., with a confidence score exceeding 0.9), and these show a nearly perfect accuracy of 99.7%. In contrast, LocalMapper yields a high ratio of confident predictions at 97% but only exhibits a 91.5% overall prediction accuracy, noticeably lower than RXNMapper’s accuracy and slightly behind that of GraphormerMapper. For these confident predictions from LocalMapper, the calculated accuracy was 92.8% based on the dataset AAM.

Table 1 Atom-to-atom mapping (AAM) results of RXNMapper, GraphormerMapper, and LocalMapper on the USPTO-50K dataset before and after manually checking the reaction AAMs

To investigate the incorrectly predicted confident predictions from RXNMapper and LocalMapper, we randomly sampled 3000 reactions from a pool of 14,422 reactions confidently predicted by both RXNMapper and LocalMapper. After manually checking these predicted AAMs, we found that all the confident predictions from LocalMapper are indeed correct; where they disagree with the dataset, it is the original dataset mapping that is incorrect. In particular, within the 3000 randomly sampled reactions, 6.6% were ester hydrolysis reactions, and they were all correctly predicted by LocalMapper but incorrectly predicted by RXNMapper. It is worth highlighting that these reactions were already mis-mapped in the dataset’s AAM in a way that matches RXNMapper’s predictions, further indicating that accuracy based on the dataset’s AAM overestimates RXNMapper’s performance and underestimates LocalMapper’s. To address this discrepancy, we computed the calibrated accuracy using Eq. 1, which aims to reflect the actual prediction accuracy more faithfully. The calibrated accuracy shows LocalMapper achieving a higher accuracy of 98.5% compared to RXNMapper’s 96.2%. Moreover, the 97% of AAMs that LocalMapper predicts confidently are highly likely to be perfectly accurate.
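As a worked check, the calibrated accuracy reported for LocalMapper can be reproduced from the figures quoted in the text (91.5% overall dataset accuracy, 92.8% dataset accuracy on confident predictions, 97% confident ratio, 100% manually checked accuracy). The sketch below assumes that the unconfident ratio is one minus the confident ratio; the function names are illustrative.

```python
def unconfident_dataset_accuracy(acc_overall_dataset, acc_conf_dataset, ratio_conf):
    """Eq. 2: dataset accuracy on the unconfident predictions."""
    ratio_unconf = 1.0 - ratio_conf  # assumption: the two ratios sum to one
    return (acc_overall_dataset - acc_conf_dataset * ratio_conf) / ratio_unconf


def calibrated_accuracy(acc_overall_dataset, acc_conf_dataset, acc_conf_manual, ratio_conf):
    """Eq. 1: manual accuracy for confident predictions, dataset accuracy for the rest."""
    ratio_unconf = 1.0 - ratio_conf
    acc_unconf = unconfident_dataset_accuracy(acc_overall_dataset, acc_conf_dataset, ratio_conf)
    return acc_unconf * ratio_unconf + acc_conf_manual * ratio_conf


# LocalMapper on USPTO-50K, using the numbers quoted in the text.
acc_unconf = unconfident_dataset_accuracy(0.915, 0.928, 0.97)
localmapper = calibrated_accuracy(0.915, 0.928, 1.0, 0.97)
print(f"unconfident: {acc_unconf:.1%}, calibrated: {localmapper:.1%}")
# prints "unconfident: 49.5%, calibrated: 98.5%"
```

The calculation makes explicit that the 3% of unconfident predictions, at roughly 50% dataset accuracy, account for the entire gap between the calibrated figure and 100%.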

To qualitatively compare and gain insights into the differences between LocalMapper and the second-best performing model, RXNMapper, we conducted a detailed analysis of the AAMs from the dataset, RXNMapper, and LocalMapper, as visually represented in Fig. 4a through a Venn diagram. For 90.5% of the reactions, all three sources give equivalent AAMs. For the remaining 9.5% of reactions, where the AAMs differed, RXNMapper shared 7.6% of equivalent AAMs with the dataset, while LocalMapper exhibited lower overlap, sharing only 1% and 0.8% with the dataset and RXNMapper, respectively. These statistics explain the “low accuracy” of LocalMapper when the dataset’s AAMs are assumed to be ground truth. In Fig. 4b, c, we illustrate two examples of unique AAM predictions generated by LocalMapper. These examples represent ester hydrolysis and acetal hydrolysis reactions, respectively. In Fig. 4b, LocalMapper correctly mapped the highlighted oxygen atom in the product (number 16) to water in ester hydrolysis reactions, whereas RXNMapper and the dataset suggested that the oxygen originated from the leaving group. In Fig. 4c, LocalMapper accurately mapped the highlighted oxygen atom in the product (number 16) to water, whereas RXNMapper and the dataset consistently misattributed the oxygen atom to the acetal oxygen.

**Fig. 4: Overall results of LocalMapper predicting atom-to-atom mappings (AAMs) on the USPTO-50K dataset.**

Further analysis of the reaction templates of the 3,308 confident predictions generated by LocalMapper, which differed from the dataset’s AAM, revealed interesting insights. Among these unique predictions, 81.7% were ester hydrolysis reactions (as shown in Fig. 4b), 8.1% were esterification reactions (Fig. 4e), 3.5% were acetal hydrolysis reactions (Fig. 4c), and 0.3% were Mitsunobu reactions (Fig. 4f). These AAMs were all correctly predicted by LocalMapper but incorrectly mapped in the original dataset.

Next, we examine the generalizability of LocalMapper on the golden dataset compiled by Lin et al.³⁹ including 1851 reactions after standardizing (fixing invalid valences or radicals) and manually mapping the reactions collected from Jaworski et al.¹⁷ and popular reactions from the USPTO collections⁴⁰. We found there are 90 unbalanced reactions, 2 repeated reactions, and 1 reaction without product in the golden dataset. Consequently, we evaluated the models on the remaining 1758 reactions. Examples of unbalanced reactions can be found in Supplementary Section 3.

Since this dataset mixes the sources from the original literature, it is hard to analyze the model performance on different reaction sources. Therefore, we compared the reactions recorded in the golden dataset with the categorized reaction sets compiled by Jaworski et al.¹⁷ and extracted 491 reactions from the golden dataset, including 256 USPTO reactions³⁸, 173 typical reactions from the Organic Synthesis collection⁴¹, and 62 mechanistically complex reactions from various literature sources^42,43. To enhance LocalMapper’s ability to confidently predict a wider spectrum of organic reactions, we conducted further training of the model for 2 additional iterations on the USPTO-FULL dataset^18,44 (containing 1,065,119 reactions) with sampling 500 reactions at each iteration (k = 500, n = 2). All the reactions in this test set were excluded from the training set of LocalMapper.

The results compared with RXNMapper¹⁸ and GraphormerMapper²⁶ are shown in Table 2. When evaluating the models on the golden dataset, we found there are 8 reaction AAMs confidently predicted from LocalMapper but different from ground truth. We found these reactions are either wrongly mapped in the golden dataset (7 reactions) or selective reactions (1 reaction), in which multiple AAMs are acceptable. Therefore, we show the prediction results after calibrating the accuracy after manual checking in Table 2. The different AAMs and the original results following ground truth AAMs can be found in Supplementary Section 4.

Table 2 Atom-to-atom mapping (AAM) results of RXNMapper, GraphormerMapper, and LocalMapper on manually mapped reactions examined on four different sources

When predicting on the full golden dataset, irrespective of reaction sources, LocalMapper achieves 89.8% prediction accuracy, surpassing RXNMapper by 3.3 percentage points and GraphormerMapper by 7.1 percentage points. On the 256 USPTO reactions, LocalMapper reaches 99.2% prediction accuracy, outperforming GraphormerMapper by 5.4 percentage points and RXNMapper by 9.7 percentage points. For typical reactions, LocalMapper achieves a prediction accuracy of 93.6%, exceeding the other two models by margins of 2.3 and 5.7 percentage points. For complex reactions, LocalMapper attains the highest prediction accuracy at 69.4%, slightly above GraphormerMapper (66.1%) and well above RXNMapper (59.7%). Importantly, although the ratio of confident predictions varies across datasets, the accuracy of these confident predictions consistently remains at 100% for all four examined datasets. In contrast, while RXNMapper demonstrates over 90% confident prediction accuracy for the golden dataset, USPTO reactions, and typical reactions, it only achieves a 50% confident prediction accuracy for complex reactions, even though its ratio of confident predictions there is double that of LocalMapper.

It’s worth noting that the confident prediction ratio of LocalMapper tends to decrease when applied to reactions that differ more significantly from the training reactions, i.e., USPTO reactions. This trend is evident in the prediction accuracy of LocalMapper, which decreases from 99.2% for USPTO reactions to 93.6% for typical reactions, and further to 69.4% for complex reactions. These findings underscore an essential insight from LocalMapper: not only are its confident predictions highly reliable, but the overall prediction accuracy for a set of reactions can be estimated based on the ratio of confident reactions.

In Fig. 5a–c, we conduct a detailed analysis of the number of bond changes derived from the AAMs (according to their CGRs) of reactions that were inaccurately predicted by LocalMapper, RXNMapper, and GraphormerMapper, respectively. Generally, the majority of incorrectly predicted AAMs across all three models and various reaction sources result in an increased number of bond changes compared to the corresponding ground truth AAMs. Notably, RXNMapper stands out for producing a substantial number of incorrectly predicted AAMs that result in a decreased number of bond changes in patent reactions, primarily involving ester hydrolysis reactions. To illustrate the impact of such predictions, we present an example in Fig. 5d, wherein even a small number (4) of incorrectly predicted AAMs can result in a significantly higher count (10) of bond changes compared to the correct AAMs. However, the examples depicted in Fig. 5e, f underscore that predicted AAMs showing either the same or fewer bond changes do not consistently align with the ground truth AAMs, especially in the context of complex reactions.
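To make the bond-change count concrete, the sketch below computes it for a toy ester hydrolysis (methyl acetate plus water giving acetic acid and methanol) under two different mappings. It is a simplified stand-in for a CGR comparison and makes several assumptions: only heavy atoms are considered, bond orders are ignored, and the map numbers are invented for illustration (1 = acyl methyl carbon, 2 = carbonyl carbon, 3 = carbonyl oxygen, 4 = ester oxygen, 5 = methoxy carbon, 6 = water oxygen).

```python
def bonds(*pairs):
    """Build a set of undirected bonds from pairs of atom-map numbers."""
    return {frozenset(p) for p in pairs}


def bond_changes(reactant_bonds, product_bonds):
    """Bonds present on only one side of the reaction (broken or formed)."""
    return reactant_bonds ^ product_bonds


reactants = bonds((1, 2), (2, 3), (2, 4), (4, 5))  # ester; water has no heavy-atom bond

# Mapping A: the acid hydroxyl oxygen comes from water (acyl C-O bond cleaved).
product_a = bonds((1, 2), (2, 3), (2, 6), (4, 5))
# Mapping B: the acid hydroxyl oxygen is the former ester oxygen.
product_b = bonds((1, 2), (2, 3), (2, 4), (5, 6))

changes_a = bond_changes(reactants, product_a)
changes_b = bond_changes(reactants, product_b)
print(len(changes_a), len(changes_b))  # prints "2 2"
```

Both mappings imply exactly two bond changes in this toy case, so the count alone cannot tell them apart, which is consistent with the observation that an equal or smaller number of bond changes does not guarantee a correct AAM.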

**Fig. 5: Comparative analysis of the number of bond changes in reaction AAMs.**

Discussions

The key distinction between LocalMapper and the other two existing ML-based models lies in the use of chemist-labeled data during training. While training without manual mapping may offer computational efficiency, it can lead to unforeseen systematic errors, such as those seen in AAM predictions for simple reactions in Fig. 4. This emphasizes the importance of meticulous data labeling, despite its time and expertise demands. Moreover, manual labeling yields valuable chemical rules that can be leveraged for robust knowledge-based prediction confidence identification, which contributes to the 100% confident prediction accuracy of LocalMapper.

It is vital to distinguish between LocalMapper’s knowledge-based prediction confidence and the AAM fixer employed in GraphormerMapper²⁶. While both methods leverage the insights of chemists to enhance prediction accuracy, they diverge significantly in their operational mechanisms. The AAM fixer directly rectifies the model’s AAM predictions but does not enhance the model itself. It relies on manual heuristics to correct known inaccuracies, making it challenging to scale up without extensive experimental adjustments. In contrast, knowledge-based prediction confidence empowers human chemists to label uncertain predictions, thereby facilitating model improvement through active learning. This approach is data-driven and easily scalable by expanding active learning iterations. We have also considered a model-free approach involving the direct application of all known reaction templates to the reactants for obtaining the reaction AAM by matching the known product. While this method enhances the robustness of AAM prediction, mapping on the USPTO-50k dataset requires approximately 10 times longer, taking 6 h to complete compared to the 35 min required by LocalMapper.

Although we labeled only 47.3 K reactions (97%) in the USPTO-50K dataset after manually annotating 1000 reactions in this paper, it is remarkable that the same model can confidently label 544.5 K reactions (51.1%) in the full USPTO dataset. This represents a substantial increase in the labeling efficiency achieved through manual annotation. Furthermore, with two additional active learning iterations, this number grows to 712.6 K reactions (66.9%). However, it is important to note that LocalMapper tends to yield a significantly lower ratio of confident predictions when applied to entirely distinct reaction datasets, such as quantum-mechanical reactions⁴⁵ and enzymatic reactions⁴⁶. These reactions may follow entirely different reaction mechanisms (unimolecular one-to-many reactions and enzyme-catalyzed reactions, respectively) that were never encountered during active learning on organic reaction datasets. For such cases, we recommend engaging domain-specific chemists to undertake active learning iterations to adapt the model effectively before using it for large-scale AAM tasks.

In summary, we propose a graph-based ML model, LocalMapper, to precisely identify the AAM for large reaction datasets via human-in-the-loop machine learning. By manually labeling a small amount of reaction data with expert knowledge, we train a human-in-the-loop ML model to precisely and automatically label a large number of reactions sharing similar reaction rules. The proposed knowledge-based active sampling enables the human expert to label the AAM of only 2% of the reactions, which include the reaction templates of 97.0% of the reactions in the entire dataset. We show an overall 98.5% AAM prediction accuracy, with 100% accuracy for confident predictions, on the widely used USPTO-50K dataset, and a similar result is also observed in a diverse out-of-distribution test set. We expect that LocalMapper can be used to provide precise reaction AAMs for future downstream reaction prediction models and help the chemistry community gain more statistical insights into reaction datasets. The trained LocalMapper model and the generated AAMs on the USPTO-50K and USPTO-FULL datasets introduced in this paper are available at https://github.com/snu-micc/LocalMapper.

Methods

Extended-local reaction template (ELRT)

After the AAM is identified from LocalMapper, we extract the reaction template from the mapped reaction to categorize the reaction type of the given reaction. As a popular reaction template extraction tool, RDChiral⁴⁷ was developed to extract the reaction template considering atom neighbors, special groups, and stereochemistry and has been used in many retrosynthesis prediction models. However, the reaction template extracted by RDChiral was considered to be too specific, leading to low generalizability of reactions with the same reaction type (four reactions per template on average, extracted from USPTO-50K). Therefore, Chen and Jung¹⁴ modified the reaction template to only focus on local changes, which significantly improved the template generalizability to 76 reactions per template on average. Despite the enhanced generalizability of the local reaction template, important functional groups, such as acetal, carbonyl group, and nitrile, need to be included to make the reaction template more chemically understandable for the present purpose of AAM. Therefore, we extend the local reaction template by including important functional groups and denote it as extended-local reaction template (ELRT). More examples and the full set of functional groups included in the ELRT can be found in Supplementary Section 1.
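The generalizability figure quoted above (reactions per template on average) is a simple ratio, sketched below. The template labels are hypothetical placeholders, not templates extracted from USPTO-50K.

```python
from collections import Counter


def reactions_per_template(template_of_each_reaction):
    """Average number of reactions that share one template."""
    counts = Counter(template_of_each_reaction)
    return len(template_of_each_reaction) / len(counts)


# Hypothetical labels: very specific templates split one reaction type
# into many templates, whereas local templates group them together.
specific = ["amide_a", "amide_b", "amide_c", "suzuki_a"]
local = ["amide", "amide", "amide", "suzuki"]

specific_ratio = reactions_per_template(specific)
local_ratio = reactions_per_template(local)
```

A higher ratio means each template covers more reactions, which is the sense in which local templates generalize better than highly specific ones.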

Due to the absence of essential reagent and catalyst information in many reactions within the USPTO-50K dataset, we do not incorporate reagent and catalyst details into the reaction templates. For instance, we observed that at least 1166 (49.4%) out of 2362 Suzuki coupling reactions lack Pd catalyst, 520 (34.6%) out of 1500 nitro reduction reactions do not feature a reduction agent, and 170 (39.4%) out of 431 Mitsunobu reactions do not include diethyl azodicarboxylate (DEAD) or diisopropyl azodicarboxylate (DIAD). As a result, we make the simplifying assumption that common and necessary reagents or catalysts are present in the reactions during template extraction.

Molecular graph

The inputs of LocalMapper are the graphs of the reactants and products of the target reaction. We represent the reactant graph as $\mathbf{G}_r = (\mathbf{V}_r, \mathbf{E}_r)$ and the product graph as $\mathbf{G}_p = (\mathbf{V}_p, \mathbf{E}_p)$, where $\mathbf{V}$ (vertices) denotes atoms and $\mathbf{E}$ (edges) denotes bonds. The initial atom and bond features are the same as the ones used in LocalRetro¹⁴ and LocalTransform³², available in Supplementary Section 6. Both graphs are built using the DGL-LifeSci⁴⁸ Python package. The features of each atom in the reactants are denoted as $\mathbf{h}_{r,u}$ (for atom $u$) and the features of each bond in the reactants are denoted as $\mathbf{h}_{r,uv}$ (for the bond between atom $u$ and atom $v$). Similarly, the features of each atom and bond in the products are denoted as $\mathbf{h}_{p,u}$ and $\mathbf{h}_{p,uv}$.
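As a rough picture of this representation, the sketch below stores atom features and bond features in plain Python dictionaries. It is a stand-in for illustration only: the study builds its graphs with DGL-LifeSci, and the class name, the toy molecule, and the two-element feature vectors here are assumptions rather than the features listed in Supplementary Section 6.

```python
from dataclasses import dataclass


@dataclass
class MolGraph:
    """Plain-Python stand-in for a molecular graph G = (V, E)."""
    atom_features: dict  # V: atom index u -> feature vector h_u
    bond_features: dict  # E: frozenset({u, v}) -> feature vector h_uv

    def neighbors(self, u):
        """Return the sorted indices of the atoms bonded to atom u."""
        return sorted(
            v for pair in self.bond_features if u in pair
            for v in pair if v != u
        )


# Hypothetical toy features: [is_carbon, is_oxygen] per atom, [bond order] per bond.
ethanol = MolGraph(
    atom_features={0: [1, 0], 1: [1, 0], 2: [0, 1]},  # C, C, O (heavy atoms only)
    bond_features={
        frozenset((0, 1)): [1.0],  # C-C single bond
        frozenset((1, 2)): [1.0],  # C-O single bond
    },
)
```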

Message passing neural network (MPNN)

To encode the surrounding environmental information for each atom, we used a message-passing neural network (MPNN)³³,⁴⁹ to update the atom features for 3 iterations. We denote the message passing function by $\mathrm{MPNN}(\cdot)$, which updates the atomic features $\mathbf{h}_u$ of atom $u$ using its neighboring atoms $\{v\}$ and bonds $\{uv\}$ in the molecule (Eqs. 3 and 4).

$$\mathbf{h}_{r,u}^{t+1} = \mathrm{MPNN}\left(\mathbf{h}_{r,u}, \{\mathbf{h}_v\}_{v \in \mathbf{V}_r}, \{\mathbf{h}_{uv}\}_{uv \in \mathbf{E}_r}\right)$$

(3)

$$\mathbf{h}_{p,u}^{t+1} = \mathrm{MPNN}\left(\mathbf{h}_{p,u}, \{\mathbf{h}_v\}_{v \in \mathbf{V}_p}, \{\mathbf{h}_{uv}\}_{uv \in \mathbf{E}_p}\right)$$

(4)
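The idea behind Eqs. 3 and 4 can be illustrated with a parameter-free sketch in which each atom adds the bond-weighted sum of its neighbors' features to its own. This is a deliberate simplification: the MPNN used in this work has learned weights, and the scalar bond weights and toy features below are assumptions for illustration.

```python
def mpnn_step(h, bonds):
    """One simplified message-passing step.

    h:     atom index -> feature vector
    bonds: (u, v) tuple -> scalar bond weight (stand-in for bond features h_uv)
    """
    new_h = {}
    for u, h_u in h.items():
        message = [0.0] * len(h_u)
        for (a, b), weight in bonds.items():
            if u in (a, b):
                v = b if u == a else a  # the neighbor on the other end of the bond
                message = [m + weight * x for m, x in zip(message, h[v])]
        new_h[u] = [x + m for x, m in zip(h_u, message)]
    return new_h


# Hypothetical three-atom chain 0-1-2 with two-dimensional features.
features = {0: [1, 0], 1: [0, 1], 2: [1, 0]}
chain_bonds = {(0, 1): 1.0, (1, 2): 1.0}

after_one = mpnn_step(features, chain_bonds)

# Three iterations, matching the number of MPNN iterations stated in the text.
after_three = features
for _ in range(3):
    after_three = mpnn_step(after_three, chain_bonds)
```

After each step an atom's features summarize a neighborhood one bond larger, so three iterations let every atom see its environment up to three bonds away.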

Reaction attention

After encoding the local chemical environment of each atom in the individual molecule, we enable the atoms in the product to refine their features by looking at the atoms in the reactants through multi-head attention blocks³⁴. In particular, we used multi-head attention $\mathrm{MultiHeadAtt}(\cdot)$ between the atoms in the products and reactants:

$$\mathrm{MultiHeadAtt}\left(\mathbf{h}_{p,u}, \{\mathbf{h}_v\}_{v \in \mathbf{G}_r}\right) = \mathrm{Concat}\left(\mathrm{head}_1\left(\mathbf{h}_{p,u}, \{\mathbf{h}_v\}_{v \in \mathbf{G}_r}\right), \mathrm{head}_2\left(\mathbf{h}_{p,u}, \{\mathbf{h}_v\}_{v \in \mathbf{G}_r}\right), \ldots, \mathrm{head}_n\left(\mathbf{h}_{p,u}, \{\mathbf{h}_v\}_{v \in \mathbf{G}_r}\right)\right)$$

(5)

where $\mathrm{Concat}(\cdot)$ is the concatenation operation over the attention heads.

The output of each attention head is the updated atom features, computed from the attention score $\mathbf{e}_{u,v}$ and the value $\mathbf{V}_{n,v}$:

$$\mathrm{head}_n\left(\mathbf{h}_u, \{\mathbf{h}_v\}\right) = \sum_{v} \mathrm{Softmax}\left(\mathbf{e}_{u,v}\right) \mathbf{V}_{n,v}$$

(6)

where the attention score $\mathbf{e}_{u,v}$ is computed from the query $\mathbf{Q}$ and key $\mathbf{K}$ of the atom features and normalized by the hidden dimension $d$ and the number of attention heads $n$. The query, key, and value $\mathbf{V}$ are calculated by the linear layers in each attention head:

$$\mathbf{Q}_u = \mathbf{w}_Q \mathbf{h}_u$$

(7)

$$\mathbf{K}_v = \mathbf{w}_K \mathbf{h}_v$$

(8)

$$\mathbf{V}_v = \mathbf{w}_V \mathbf{h}_v$$

(9)

$$\mathbf{e}_{u,v} = \frac{\mathbf{Q}_u \left(\mathbf{K}_v\right)^T}{\sqrt{d/n}}$$

(10)
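Eqs. 6 to 10 describe one attention head, which the sketch below reproduces in plain Python for a single product atom attending to a few reactant atoms. The identity weight matrices, the two-dimensional features, and the function names are assumptions chosen to keep the example small; the actual model uses learned linear layers.

```python
import math


def matvec(w, x):
    """Multiply matrix w (list of rows) by vector x."""
    return [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) for row in w]


def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attention_head(h_u, h_vs, w_q, w_k, w_v, d, n):
    """One attention head for product atom u over reactant atoms {v}."""
    q = matvec(w_q, h_u)                         # Eq. 7: query of atom u
    keys = [matvec(w_k, h_v) for h_v in h_vs]    # Eq. 8: key of each atom v
    values = [matvec(w_v, h_v) for h_v in h_vs]  # Eq. 9: value of each atom v
    scale = math.sqrt(d / n)
    scores = [sum(a * b for a, b in zip(q, k)) / scale for k in keys]  # Eq. 10
    weights = softmax(scores)
    # Eq. 6: attention-weighted sum of the values.
    output = [
        sum(w * v[i] for w, v in zip(weights, values))
        for i in range(len(values[0]))
    ]
    return output, weights


identity = [[1.0, 0.0], [0.0, 1.0]]  # hypothetical stand-in for learned weights
product_atom = [1.0, 0.0]
reactant_atoms = [[1.0, 0.0], [0.0, 1.0]]

output, weights = attention_head(
    product_atom, reactant_atoms, identity, identity, identity, d=2, n=1
)
```

The product atom places more weight on the reactant atom whose features resemble its own, which is the behavior the reaction attention relies on.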

In our experiment, we used 3 reaction attention blocks with 8 attention heads in each attention block. The dropout rate in the multi-head self-attention layer was set to 0.1. Gated transformation, skip connection, and layer normalization were applied after the attention mechanism, followed by standard feed-forward neural networks:

$$\mathbf{h}_{p,u}^{t+1} = \mathbf{h}_{p,u}^{t} + \mathbf{w}_f\left(\mathrm{Sigmoid}\left(\mathbf{w}_g \mathbf{m}_{p,u}^{t} + \mathbf{b}_g\right)\right) + \mathbf{b}_f$$

(11)

where $\mathbf{w}_f$ and $\mathbf{b}_f$ are the weights and biases of the feed-forward neural networks, $\mathbf{w}_g$ and $\mathbf{b}_g$ are the weights and biases of the gated transformation, and $\mathbf{m}_{p,u}^{t}$ is the message of atom $u$ in the product obtained from the multi-head attention block at step $t$.
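The update in Eq. 11 can be traced by hand with a small sketch. The identity weights, zero biases, and input vectors are illustrative assumptions, and layer normalization and dropout are omitted for brevity.

```python
import math


def matvec(w, x):
    """Multiply matrix w (list of rows) by vector x."""
    return [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) for row in w]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def gated_update(h_t, m_t, w_g, b_g, w_f, b_f):
    """Eq. 11: h^{t+1} = h^t + w_f(Sigmoid(w_g m^t + b_g)) + b_f."""
    gate = [sigmoid(z + b) for z, b in zip(matvec(w_g, m_t), b_g)]
    transformed = matvec(w_f, gate)
    # Skip connection: the previous features are added back unchanged.
    return [h + f + b for h, f, b in zip(h_t, transformed, b_f)]


identity = [[1.0, 0.0], [0.0, 1.0]]  # hypothetical stand-in for learned weights
zeros = [0.0, 0.0]

# With a zero message the gate is Sigmoid(0) = 0.5 in every dimension.
updated = gated_update([1.0, 2.0], [0.0, 0.0], identity, zeros, identity, zeros)
```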

Atom-mapping classifier

Finally, the AAM score $p(u_r | u_p)$ between atom $u_p$ in the products and atom $u_r$ in the reactants was computed by another single-head attention block acting as an atom-mapping classifier:

$$p(u_r | u_p) = \mathrm{Classifier}(\mathbf{h}_{r,u} | \mathbf{h}_{p,u}) = \mathrm{Softmax}\left(\frac{\mathbf{Q}_{r,u} \left(\mathbf{K}_{p,u}\right)^T}{\sqrt{d}}\right)$$

(12)
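A minimal sketch of the scoring in Eq. 12 follows. It assumes the query and key projections are identities, so the raw feature vectors are compared directly, and it reads off a mapping by taking the highest-scoring reactant atom for each product atom. Both choices, and the toy features, are assumptions for illustration rather than the published decoding procedure.

```python
import math


def aam_scores(h_products, h_reactants, d):
    """For each product atom, a softmax distribution over reactant atoms."""
    table = []
    for h_p in h_products:
        logits = [
            sum(a * b for a, b in zip(h_r, h_p)) / math.sqrt(d)
            for h_r in h_reactants
        ]
        m = max(logits)  # subtract the maximum for numerical stability
        exps = [math.exp(x - m) for x in logits]
        total = sum(exps)
        table.append([e / total for e in exps])
    return table


def predict_mapping(table):
    """Index of the highest-scoring reactant atom for each product atom."""
    return [max(range(len(row)), key=row.__getitem__) for row in table]


# Hypothetical three-dimensional features for three reactant and two product atoms.
h_reactants = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
h_products = [[0.0, 2.0, 0.0], [2.0, 0.0, 0.0]]

scores = aam_scores(h_products, h_reactants, d=3)
mapping = predict_mapping(scores)
```

An independent argmax per product atom does not guarantee a one-to-one mapping, so a practical implementation would need some way to resolve conflicts; how that is handled is not specified in this section.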

Training objectives

The LocalMapper model is trained to optimize the AAM score $p(u_r | u_p)$ between each pair of corresponding atoms in the products and reactants through a cross-entropy loss. Let $(u_{r,i}, u_{p,i})$ be the pair of atoms in the reactants and products sharing the same atom number $i$; the AAM objective for training the model parameters $\boldsymbol{\theta}$ is

$$\mathcal{L}_{\mathrm{AAM}} = \max_{\boldsymbol{\theta}} \mathbb{E}\left[\log\left(p_{\boldsymbol{\theta}}(u_{r,i} | u_{p,i})\right)\right]$$

(13)
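Maximizing the expected log-probability in Eq. 13 is equivalent to minimizing the average negative log-probability of the true reactant atom, which is the usual cross-entropy loss. The sketch below evaluates that quantity for a hypothetical score table; the numbers are made up for illustration.

```python
import math


def aam_loss(score_table, true_reactant_index):
    """Mean negative log-probability of the correct reactant atom.

    score_table[i][j]:      p(reactant atom j | product atom i)
    true_reactant_index[i]: index of the reactant atom mapped to product atom i
    """
    log_probs = [
        math.log(row[j]) for row, j in zip(score_table, true_reactant_index)
    ]
    return -sum(log_probs) / len(log_probs)


# Hypothetical scores for two product atoms over two reactant atoms.
scores = [[0.9, 0.1], [0.2, 0.8]]

loss_when_right = aam_loss(scores, [0, 1])  # scores agree with the labels
loss_when_wrong = aam_loss(scores, [1, 0])  # scores contradict the labels
```

The loss is small when the classifier puts high probability on the labeled atom pairs and grows as the scores drift away from them.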

Training hyperparameters and pseudocode

All the chemical operations are done using the RDKit⁵⁰ Python package. We used PyTorch⁵¹ and DGL-LifeSci⁴⁸ for neural network training and testing. We train our model for 100 epochs with a batch size of 16 using the Adam optimizer⁵² with ${10}^{-6}$ weight decay and an initial learning rate of ${10}^{-3}$. The learning rate is reduced by a factor of 0.5 whenever the validation loss does not decrease after a training epoch. Model gradients are clipped at a maximum norm of 20. Training LocalMapper takes around 4 h for the USPTO-50K dataset (at the fifth iteration) and 6 h for the full USPTO dataset (at the second iteration), while inference takes 35 min for the former and 14 h for the latter. The pseudocode for training LocalMapper is given in Algorithm 1.
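The learning-rate rule and the gradient clipping described above can be sketched without any deep learning framework. The sketch assumes that "does not decrease" means no improvement over the best validation loss seen so far; the loss values and gradient are invented for illustration, and the function names are hypothetical.

```python
import math


def lr_schedule(val_losses, initial_lr=1e-3, factor=0.5):
    """Learning rate after each epoch, halved when validation loss stalls."""
    lr, best, history = initial_lr, float("inf"), []
    for loss in val_losses:
        if loss < best:
            best = loss   # improvement: keep the current learning rate
        else:
            lr *= factor  # no improvement: reduce by the given factor
        history.append(lr)
    return history


def clip_by_norm(gradient, max_norm=20.0):
    """Rescale a gradient vector so its Euclidean norm is at most max_norm."""
    norm = math.sqrt(sum(g * g for g in gradient))
    if norm <= max_norm:
        return list(gradient)
    return [g * max_norm / norm for g in gradient]


# Hypothetical validation losses over five epochs.
rates = lr_schedule([0.9, 0.7, 0.8, 0.6, 0.6])

# A gradient with norm 50 is scaled down to norm 20.
clipped = clip_by_norm([30.0, 40.0])
```

In PyTorch the same behavior is typically obtained with a plateau-based scheduler and a gradient-norm clipping utility, but the exact calls used by the authors are not stated in this section.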

Algorithm 1

The pseudocode of training LocalMapper.

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.

Data availability

The AAMs for 3000 sampled reactions on the USPTO-50K dataset and out-of-distribution reactions predicted by the three evaluated ML models can be found at https://github.com/snu-micc/LocalMapper⁵³. The USPTO-50K and USPTO-full datasets remapped by LocalMapper generated in this study have been deposited in Figshare (https://doi.org/10.6084/m9.figshare.25046471.v1⁵⁴). Source data are provided with this paper.

Code availability

The code for LocalMapper described in this manuscript is publicly available at https://github.com/snu-micc/LocalMapper.

References

1. de Luca, A., Horvath, D., Marcou, G., Solov’ev, V. & Varnek, A. Mining chemical reactions using neighborhood behavior and condensed graphs of reactions approaches. J. Chem. Inf. Model. 52, 2325–2338 (2012).
2. Nugmanov, R. I. et al. CGRtools: Python Library for molecule, reaction, and condensed graph of reaction processing. J. Chem. Inf. Model. 59, 2516–2521 (2019).
3. Lin, A. I. et al. Automatized assessment of protective group reactivity: a step toward big reaction data analysis. J. Chem. Inf. Model. 56, 2140–2148 (2016).
4. Marcou, G. et al. Expert system for predicting reaction conditions: the Michael reaction case. J. Chem. Inf. Model. 55, 239–250 (2015).
5. Varnek, A., Fourches, D., Hoonakker, F. & Solov’ev, V. P. Substructural fragments: an universal language to encode reactions, molecular and supramolecular structures. J. Comput. Aided Mol. Des. 19, 693–703 (2005).
6. Heid, E. & Green, W. H. Machine learning of reaction properties via learned representations of the condensed graph of reaction. J. Chem. Inf. Model. 62, 2101–2110 (2022).
7. Spiekermann, K. A., Pattanaik, L. & Green, W. H. Fast predictions of reaction barrier heights: toward coupled-cluster accuracy. J. Phys. Chem. A 126, 3976–3986 (2022).
8. Coley, C. W., Barzilay, R., Jaakkola, T. S., Green, W. H. & Jensen, K. F. Prediction of organic reaction outcomes using machine learning. ACS Cent. Sci. 3, 434–443 (2017).
9. Wei, J. N., Duvenaud, D. & Aspuru-Guzik, A. Neural networks for the prediction of organic chemistry reactions. ACS Cent. Sci. 2, 725–732 (2016).
10. Chen, S. & Jung, Y. A generalized-template-based graph neural network for accurate organic reactivity prediction. Nat. Mach. Intell. 4, 772–780 (2022).
11. Segler, M. H. S. & Waller, M. P. Neural-symbolic machine learning for retrosynthesis and reaction prediction. Chemistry 23, 5966–5971 (2017).
12. Coley, C. W., Rogers, L., Green, W. H. & Jensen, K. F. Computer-assisted retrosynthesis based on molecular similarity. ACS Cent. Sci. 3, 1237–1245 (2017).
13. Dai, H., Li, C., Coley, C. W., Dai, B. & Song, L. Retrosynthesis prediction with conditional graph logic network. Adv. neural inf. process. syst. 32 (2019).
14. Chen, S. & Jung, Y. Deep retrosynthetic reaction prediction using local reactivity and global attention. JACS Au 1, 1612–1620 (2021).
15. Seidl, P. et al. Improving few- and zero-shot reaction template prediction using modern Hopfield networks. J. Chem. Inf. Model. 62, 2111–2120 (2022).
16. Toniato, A., Schwaller, P., Cardinale, A., Geluykens, J. & Laino, T. Unassisted noise reduction of chemical reaction datasets. Nat. Mach. Intell. 3, 485–494 (2021).
17. Jaworski, W. et al. Automatic mapping of atoms across both simple and complex chemical reactions. Nat. Commun. 10, 1434 (2019).
18. Schwaller, P., Hoover, B., Reymond, J.-L., Strobelt, H. & Laino, T. Extraction of organic chemistry grammar from unsupervised learning of chemical reactions. Sci. Adv. 7, eabe4166 (2021).
19. Indigo Toolkit. https://lifescience.opensource.epam.com/indigo/ (2024).
20. Chemaxon Docs. AutoMapper user’s guide. https://docs.chemaxon.com/display/docs/automapper-user-s-guide (2024).
21. Akutsu, T. Efficient extraction of mapping rules of atoms from enzymatic reaction data. J. Comput. Biol. 11, 449–462 (2004).
22. Latendresse, M., Malerich, J. P., Travers, M. & Karp, P. D. Accurate atom-mapping computation for biochemical reactions. J. Chem. Inf. Model. 52, 2970–2982 (2012).
23. First, E. L., Gounaris, C. E. & Floudas, C. A. Stereochemically consistent reaction mapping and identification of multiple reaction mechanisms through integer linear optimization. J. Chem. Inf. Model. 52, 84–92 (2012).
24. Lynch, M. F. & Willett, P. The automatic detection of chemical reaction sites. J. Chem. Inf. Comput. Sci. 18, 154–159 (1978).
25. McGregor, J. J. & Willett, P. Use of a maximum common subgraph algorithm in the automatic identification of ostensible bond changes occurring in chemical reactions. J. Chem. Inf. Comput. Sci. 21, 137–140 (1981).
26. Nugmanov, R., Dyubankova, N., Gedich, A. & Wegner, J. K. Bidirectional graphormer for reactivity understanding: neural network trained to reaction atom-to-atom mapping task. J. Chem. Inf. Model. 62, 3307–3315 (2022).
27. Jochum, C., Gasteiger, J. & Ugi, I. The principle of minimum chemical distance (PMCD). Angew. Chem. Int. Ed. Engl. 19, 495–505 (1980).
28. Raymond, J. W. & Willett, P. Maximum common subgraph isomorphism algorithms for the matching of chemical structures. J. Comput. Aided Mol. Des. 16, 521–533 (2002).
29. Cook, S. A. The complexity of theorem-proving procedures. in Proc. Third Annual ACM Symposium on Theory of Computing - STOC ’71 151–158 (ACM Press, 1971).
30. Chen, W. L., Chen, D. Z. & Taylor, K. T. Automatic reaction mapping and reaction center detection. WIREs Comput. Mol. Sci. 3, 560–593 (2013).
31. Crabtree, J. D. & Mehta, D. P. Automated reaction mapping (ACM). J. Exp. Algorithmics 13, 15:1.15–15:1.29 (2009).
32. Chen, S. & Jung, Y. A generalized-template-based graph neural network for accurate organic reactivity prediction. Nat. Mach. Intell. 4, 772–780 (2022).
33. Gilmer, J., Schoenholz, S. S., Riley, P. F., Vinyals, O. & Dahl, G. E. Neural message passing for quantum chemistry. International conference on machine learning. 1263–1272 (PMLR, 2017).
34. Vaswani, A. et al. Attention is all you need. Advances in neural information processing systems 30 (2017).
35. Noh, J., Gu, G. H., Kim, S. & Jung, Y. Uncertainty-quantified hybrid machine learning/density functional theory high throughput screening method for crystals. J. Chem. Inf. Model. 60, 1996–2003 (2020).
36. Jang, J., Gu, G. H., Noh, J., Kim, J. & Jung, Y. Structure-based synthesizability prediction of crystals using partially supervised learning. J. Am. Chem. Soc. 142, 18836–18843 (2020).
37. Schwaller, P. et al. Molecular transformer: a model for uncertainty-calibrated chemical reaction prediction. ACS Cent. Sci. 5, 1572–1583 (2019).
38. Schneider, N., Stiefl, N. & Landrum, G. A. What’s what: the (nearly) definitive guide to reaction role assignment. J. Chem. Inf. Model. 56, 2336–2346 (2016).
39. Lin, A. et al. Atom-to-atom mapping: a benchmarking study of popular mapping algorithms and consensus strategies. Mol. Inform. 41, 2100138 (2022).
40. Brown, D. G. & Boström, J. Analysis of past and present synthetic methodologies on medicinal chemistry: where have all the new reactions gone? J. Med. Chem. 59, 4443–4458 (2016).
41. Organic Syntheses. http://www.orgsyn.org/ (2024).
42. Kurti, L. & Czako, B. Strategic Applications of Named Reactions in Organic Synthesis (Elsevier, 2005).
43. Grossman, R. B. The Art of Writing Reasonable Organic Reaction Mechanisms (Springer International Publishing, 2019).
44. Lowe, D. M. Extraction of Chemical Structures and Reactions from the Literature (University of Cambridge, 2012).
45. Grambow, C. A., Pattanaik, L. & Green, W. H. Reactants, products, and transition states of elementary chemical reactions based on quantum chemistry. Sci. Data 7, 137 (2020).
46. Caspi, R. et al. The MetaCyc database of metabolic pathways and enzymes. Nucleic Acids Res. 46, D633–D639 (2018).
47. Coley, C. W., Green, W. H. & Jensen, K. F. RDChiral: an RDKit wrapper for handling stereochemistry in retrosynthetic template extraction and application. J. Chem. Inf. Model. 59, 2529–2537 (2019).
48. Li, M. et al. DGL-LifeSci: an open-source toolkit for deep learning on graphs in life science. ACS Omega 6, 27233–27238 (2021).
49. Li, Y., Tarlow, D., Brockschmidt, M. & Zemel, R. Gated graph sequence neural networks. International Conference on Learning Representations (2016).
50. RDKit: open-source cheminformatics. http://www.rdkit.org (2024).
51. Paszke, A. et al. PyTorch: an imperative style, high-performance deep learning library. Adv. Neural Inf. Process. Syst. 32, 8026–8037 (2019).
52. Kingma, D. P. & Ba, J. Adam: a method for stochastic optimization. International Conference on Learning Representations (2015).
53. Chen, S. snu-micc/LocalMapper: first release. zenodo https://doi.org/10.5281/zenodo.10555198 (2024).
54. Chen, S. USPTO reaction datasets remapped by LocalMapper. figshare https://doi.org/10.6084/m9.figshare.25046471.v1 (2024).

Acknowledgements

This work was supported by the Digital Research Innovation Institution Program funded by NRF Korea (RS-2023-00283902, Y.J.), Technology Innovation Program funded by MOTIE Korea (20015850, Y.J., S.C., S.A.), SRC Center for Electron Transfer (2021R1A5A1030054, Y.J., S.A.) funded by NRF Korea, and AI Graduate School Program of SNU funded by IITP Korea (2021-0-01343, Y.J.).

Author information

Authors and Affiliations

Department of Chemical and Biomolecular Engineering, KAIST, Daejeon, South Korea
Shuan Chen, Sunggi An & Yousung Jung
Department of Chemical and Biological Engineering, Seoul National University, Seoul, South Korea
Shuan Chen, Sunggi An & Yousung Jung
Graduate School of AI, KAIST, Daejeon, South Korea
Ramil Babazade
Institute of Chemical Processes, Seoul National University, Seoul, South Korea
Yousung Jung
Interdisciplinary Program in Artificial Intelligence, Seoul National University, Seoul, South Korea
Yousung Jung

Contributions

S.C. designed the methods, performed the computational experiments and analyses, and wrote the initial draft of the manuscript. S.A. assisted with the computational experiments. R.B. assisted in evaluating the mechanisms of complex reactions. Y.J. discussed the results, edited the manuscript, and supervised the project.

Corresponding author

Correspondence to Yousung Jung.

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review

Peer review information

Nature Communications thanks Ramil Nugmanov and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. A peer review file is available.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Chen, S., An, S., Babazade, R. et al. Precise atom-to-atom mapping for organic reactions via human-in-the-loop machine learning. Nat Commun 15, 2250 (2024). https://doi.org/10.1038/s41467-024-46364-y

Received: 04 August 2023
Accepted: 20 February 2024
Published: 13 March 2024
DOI: https://doi.org/10.1038/s41467-024-46364-y
