Abstract
Interatomic potentials derived with machine learning algorithms such as Deep Neural Networks (DNNs) achieve the accuracy of high-fidelity quantum mechanical (QM) methods in areas traditionally dominated by empirical force fields and enable massive simulations. Most DNN potentials were parametrized for neutral molecules or closed-shell ions due to architectural limitations. In this work, we propose an improved machine learning framework for simulating open-shell anions and cations. We introduce the AIMNet-NSE (Neural Spin Equilibration) architecture, which can predict molecular energies for an arbitrary combination of molecular charge and spin multiplicity with errors of about 2–3 kcal/mol and spin charges with errors of ~0.01e for small and medium-sized organic molecules, compared to the reference QM simulations. The AIMNet-NSE model allows one to fully bypass QM calculations and derive the ionization potential, electron affinity, and conceptual Density Functional Theory quantities like electronegativity, hardness, and condensed Fukui functions. We show that these descriptors, along with learned atomic representations, can be used to model chemical reactivity, as illustrated by an example of regioselectivity in electrophilic aromatic substitution reactions.
Introduction
A large body of research in the field of chemistry is concerned with the flow and behavior of electrons, which give rise to important phenomena such as the making and breaking of chemical bonds. Quantum chemistry (QC) provides a mathematical framework for describing the behavior of atomistic systems through the solution of the Schrödinger equation, allowing for a detailed description of charge distribution and molecular energetics. QC provides the tools to accurately construct the potential energy surface (PES) of molecules, i.e., the energy as a function of molecular geometry. The Density Functional Theory (DFT) framework often underpins the methods of choice for such calculations on medium-sized molecules, providing a good balance between accuracy and computational cost. Unfortunately, standard DFT methods for the treatment of an N-electron system typically incur ~O(N^{3}) numerical cost. This cubic scaling has become a critical challenge that limits the applicability of DFT to systems of a few hundred atoms. It also limits access to longer dynamical simulation time scales, which are critical for simulating certain experimental observables. Consequently, much progress has been made in the development of interatomic potentials that provide the sought-after PES functional (geometry → energy) using machine learning (ML)^{1,2}, and these have been applied to a variety of systems^{3,4,5,6,7,8}.
Deep neural networks (DNNs)^{9,10} are a particular class of ML algorithms proven to be universal function approximators^{11}, and are therefore well suited to learn a representation of the PES of molecules. Multiple distinct DNN models for ML potentials have been reported in the literature; they can be divided into two groups. The original Behler–Parrinello (BP) model^{12} and its modifications ANI^{13,14} and TensorMol^{15} rely on two-body (radial) and three-body (angular) symmetry functions to construct a unique descriptor of the atomic environment for a particular atom, then use a DNN to predict atomic properties as a function of that descriptor. Other models, for example, HIP-NN^{16}, DTNN^{4}, SchNet^{17}, and PhysNet^{18}, use radial symmetry functions or interatomic distances and iteratively construct a representation of the atomic environment through message-passing techniques^{19}.
The ANAKIN-ME (ANI) method^{13,20} is one example of a technique for building transferable DNN-based molecular potentials. The key components of ANI models are the diverse training data set^{21} and BP-type descriptors^{12} with modified symmetry functions^{13}. The ANI-1ccx data set was built from energies and forces for ~60K small organic molecules, comprising 5 million and 0.5 million non-equilibrium molecular conformations calculated at the DFT and high-fidelity coupled cluster (CCSD(T)) levels, respectively^{21}. Test cases showed the ANI-1ccx model to be chemically accurate compared to the reference coupled cluster calculations and to exceed the accuracy of DFT in multiple applications^{14}. Finally, the AIMNet (Atoms-In-Molecules neural Network) architecture, a chemically inspired, modular deep neural network molecular potential, improves the performance of ANI models for long-range interactions and continuum solvent effects^{8}.
The physical properties of molecular systems are often labeled as intensive or extensive. This nomenclature relates to the dependency of the property on the size of the system in question^{22}; the notation was introduced by Tolman over one hundred years ago^{23}. Several studies have used ML for intensive properties^{24,25,26,27,28,29}, which are independent of system size and pose challenges to ML techniques due to spatial nonlocality and long-range interactions.
In this work, we examine how DNN models like ANI and AIMNet can be applied to predicting intensive properties like electron attachment (electron affinity) and electron detachment (ionization potential). The conventional wisdom would be to fit a separate ML potential for every quantum-mechanical state (neutral, cation, and anion). QM calculations for ionized states of a molecule are typically more expensive due to the unrestricted Hamiltonian formalism and subsequent spin polarization of orbitals. Therefore, we seek to answer a critical question: can we fuse information from different molecular charge states to make ML models more accurate, general, and data-efficient? With the success of deep learning in many applications involving complex multimodal data, this question can be addressed by learning different states of the molecules with one common ML model, with the goal of using the data in a complementary manner toward learning a single complex problem. We explore two synergistic strategies for joint modeling: multitask learning^{24,30} and data fusion. One of the main advantages of joint learning is that a hierarchical representation can be learned automatically for each state, instead of training independent models individually. In addition to electron attachment and detachment energies, we also choose to learn spin-polarized charges for every state, reflecting the quantum mechanics of the wavefunctions. This choice of properties is deliberate, as it allows us to compute reactivity descriptors such as philicity indexes and Fukui functions based on conceptual Density Functional Theory (cDFT)^{31,32}. cDFT, or Chemical Reactivity Theory, is a powerful tool for the prediction, analysis, and interpretation of chemical reactions^{33,34}. Here, all cDFT indexes were computed directly from the neural network without additional training, which permitted us to bypass quantum mechanical calculations entirely.
Results
High-dimensional neural networks (HDNNs)^{12} rely on the nearsightedness of chemical bonding ('chemistry is local') by decomposing the total energy of a chemical system into atomic contributions. For each atom in the molecule, HDNN models encode the local environment (a set of atoms within a predefined cutoff radius) as a fixed-size vector and use it as input to a feed-forward DNN to infer the individual atomic contribution to the total energy. The ANI model (Fig. 1a) transforms the coordinates R of the atoms in the molecule into atomic environment vectors (AEVs): a set of translation-, rotation-, and permutation-invariant two-body radial \({g}_{ij}^{(r)}\) (Gaussian expansion of interatomic distances) and three-body angular \({g}_{ijk}^{(a)}\) (joint Gaussian expansion of average distances to a pair of neighbors and cosine expansion of the angle to those atoms) symmetry functions, where index i corresponds to a "central" atom and j and k refer to the atoms from its environment. Using the information on atomic species types Z, the AEVs are reduced in a permutation-invariant manner into the embedding vectors G, which encode both the geometry and the type information of the atomic environment. The ANI model uses the concatenation of the sums of \({g}_{ij}^{(r)}\) and \({g}_{ijk}^{(a)}\) that correspond to a distinct chemical type of neighbor, or to a combination of the types of two neighbors. This is equivalent to multiplying the matrices \({g}_{i}^{(r)}\) and \({g}_{i}^{(a)}\), with rows composed of AEVs, by the corresponding matrices A^{(r)} and A^{(a)} composed of one-hot (categorical) encoded atom or atom-pair types:
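Schematically, this reduction can be written as follows (a reconstruction in the notation of the text; the published display equation may differ in detail):

```latex
\mathbf{G}_i \;=\; \operatorname{concat}\!\left[\,\bigl(\mathbf{g}_i^{(r)}\bigr)^{\top}\mathbf{A}^{(r)},\;\bigl(\mathbf{g}_i^{(a)}\bigr)^{\top}\mathbf{A}^{(a)}\,\right]
\qquad (1)
```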
This definition of the HDNN models suffers from the "curse of dimensionality": the size of G depends on the number of unique combinations of atomic species included in the parametrization (the size of the vectors in A^{(a)}). Also, since information about the type of the "central" atom is not included in G, the model uses multiple independent DNNs, one per atom type (\({\mathcal{F}}^{({Z}_{i})}\)), to model the interactions of the atom with its environment and to output the atomic energy \({E}_{i}\):
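In the standard HDNN form (reconstructed here; symbols as defined above), the per-atom networks yield atomic energies that sum to the molecular total:

```latex
E_i \;=\; \mathcal{F}^{(Z_i)}\!\left(\mathbf{G}_i\right), \qquad E \;=\; \sum_{i} E_i
\qquad (2)
```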
The AIMNet model (Fig. 1b) was developed to address the dimensionality issue of the ANI model. Instead of one-hot encoding of atomic species, it uses learnable atomic feature vectors (AFVs) A in Eq. 1. The AFVs encode similarities between chemical elements. This approach eliminates the dependence of the size of the embedding layer on the number of parametrized chemical species. The AIMNet model utilizes the idea of multimodal learning, making simultaneous predictions of different atomic properties from several output heads attached to a common layer of the multilayer neural network. This layer is enforced to capture the relationships across the learned modalities and serves as a joint latent representation of atoms in the molecule; therefore, we call it the AIM vector. Finally, the AIMNet architecture has a specific implementation of message passing through updates of the AFV based on the atomic environments of neighboring atoms. The model thus operates iteratively, at each iteration t predicting atomic properties P and updated features A using the same (shared across iterations) neural network function \({\mathcal{F}}\):
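The iterative scheme can be sketched as follows (a reconstruction consistent with the description above; the published equation may differ in form):

```latex
\left(\mathbf{P}_i^{\,t},\;\mathbf{A}_i^{\,t+1}\right) \;=\; \mathcal{F}\!\left(\mathbf{G}_i^{\,t}\right),
\qquad t = 1, \dots, T
\qquad (3)
```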
The approach is analogous to the solution of the one-electron Schrödinger equation with self-consistent field (SCF) iterations, where one-electron orbitals (the AFVs in the case of AIMNet) adapt to the potential introduced by the other orbitals in the molecule (the embedding vectors G in the case of AIMNet). Though there is no convergence guarantee for AIMNet due to the absence of a variational principle, we observe empirically that statistical errors decrease and converge by t = 3.
The AIMNet and ANI models do not use the total molecular charge and therefore cannot discriminate between different charge states of the same conformer. The straightforward way to obtain reasonable predictions is to train separate models for neutral, anionic, and cationic species. Since the AIMNet model works well in a multitask regime^{8}, we also designed an AIMNet architecture that simultaneously predicts energies and spin-polarized atomic charges with multiple output heads from the same AIM layer for a predefined set of charge states (AIMNet-MT, Fig. 1c). All three states share the same AFV representation, Interaction, and Update blocks. This setting allows us to evaluate whether the common feature representations can capture correlations across different states and, if possible, take advantage of them.
In this paper, we introduce an extension to the AIMNet architecture which allows the model to predict energies, properties, and partial atomic charges for a specified state based on the total molecular charge and spin multiplicity (or, alternatively, the total α and β spin charges) given as input to the model. The key component of the new model is the Neural Spin-charge Equilibration (NSE) unit (Fig. 1d), which predicts partial spin-polarized atomic charges \({\widetilde{q}}^{s}\) and atomic weight factors \({f}^{s}\) (conceptually related to atomic Fukui functions, ∂q/∂Q) from the AIM layer using a fully connected NN output head. The factors \({f}^{s}\) are used to redistribute atomic spin charges such that their sum equals the specified total molecular spin charge:
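A plausible form of this normalization, reconstructed from the symbol definitions that follow (the published equation may differ in detail), is:

```latex
q_i^{s} \;=\; \widetilde{q}_i^{\,s} \;+\; \frac{f_i^{s}}{\sum_{j=1}^{N} f_j^{s}}
\left( Q^{s} \;-\; \sum_{j=1}^{N} \widetilde{q}_j^{\,s} \right)
\qquad (4)
```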
where the index s corresponds to the spin component of the charge density, \(\widetilde{q}\) and \(q\) are the initial and renormalized charges, N is the number of atoms, and \(Q\) is the total charge of the molecule. The subsequent Update block injects the normalized atomic charges into the AFV vector. This way, during the next AIMNet iteration, the information about the charge distribution is used in the Embedding block. We should note that for the AIMNet and AIMNet-MT models the sum of atomic charges is not necessarily an integer, but rather is very close to the total integer molecular charge, owing to errors in the atomic charge predictions. For the AIMNet-NSE model, however, the charges are conserved and add up to the total molecular charge by construction.
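The weighted redistribution described above can be sketched in a few lines of NumPy. This is an illustrative sketch, not the published implementation: the function name and the exact functional form of the correction are assumptions for demonstration.

```python
import numpy as np

def nse_renormalize(q_tilde, f, q_total):
    """Redistribute predicted atomic (spin) charges q_tilde so that they
    sum exactly to the requested total q_total, using the learned weight
    factors f as per-atom shares of the correction.

    A sketch of the NSE normalization described in the text; the exact
    functional form in the published model may differ."""
    q_tilde = np.asarray(q_tilde, dtype=float)
    f = np.asarray(f, dtype=float)
    correction = q_total - q_tilde.sum()       # total charge deficit/excess
    return q_tilde + f / f.sum() * correction  # weighted redistribution
```

By construction the returned charges sum to `q_total`, and atoms with larger weights `f` (e.g., aromatic and hetero atoms) absorb a larger share of the correction.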
A summary of the performance of all four models is presented in Table 1. Vertical ionization potentials (IP) and electron affinities (EA) were computed directly from the corresponding differences of the energies of the neutral and charged states:
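In the usual finite-difference form (reconstructed here), with all energies evaluated at the same fixed geometry R (hence "vertical"):

```latex
\mathrm{IP} \;=\; E_{\mathrm{cation}}(\mathbf{R}) - E_{\mathrm{neutral}}(\mathbf{R}),
\qquad
\mathrm{EA} \;=\; E_{\mathrm{neutral}}(\mathbf{R}) - E_{\mathrm{anion}}(\mathbf{R})
```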
The prediction errors are evaluated on the Ions-12 (up to 12 non-H atoms) data set, which provides a measure of the performance of the model on data points similar to those used for training. On the other hand, errors on Ions-16 (13–16 non-H atoms) can be seen as a more appropriate testbed probing the generalization capabilities of the model across unknown chemical and conformational degrees of freedom (i.e., unseen molecules). Further, we evaluate the performance of the models on a data set of equilibrium conformations of neutral drug-like molecules, ChEMBL-20 (13–20 non-H atoms), as a realistic example application of the model. We report root-mean-square errors (RMSE) rather than the mean absolute errors (MAE) more popular in the field^{5,17,35}; MAE is less sensitive to severe prediction errors and can mislead about the generalization capabilities of a model.
While ANI models are known to achieve state-of-the-art performance^{14,36} on conformational energies and reaction thermochemistry in drug-like molecules, the problem addressed here is challenging due to the presence of charged species. Similar to our previous results for neutral molecules^{8}, all AIMNet flavors substantially improve upon ANI, especially for the total energy of cations and for vertical IPs. The original ANI model does not include explicit long-range interactions; all interactions are described implicitly by the neural network and therefore do not extend beyond the AEV cutoff distance (R_{cut} = 5.2 Å in this work). Since the ANI model performs well on neutral molecules but is completely short-sighted, with no capability to perform charge equilibration either explicitly or implicitly, we use it as a baseline for comparison. Because both extra electrons (in the case of anions) and holes (in the case of cations) are spatially delocalized, the nonlocal electrostatics extends beyond the cutoff distance and spans the molecule.
While the AIMNet and AIMNet-MT models show reasonable accuracy for neutral and anionic species, the errors for cations are a few times larger, especially for the ChEMBL data set. This indicates a shortcoming in the extensibility of implicit charge equilibration with "SCF-like" passes. Overall, the data-fused AIMNet-MT model performs marginally better than separate AIMNet models for each charge state. In contrast, the AIMNet-NSE model with explicit charge equilibration shows consistent performance across charge states and molecule sizes, both for near- and off-equilibrium conformers. The RMS errors on IP and EA values approach 0.1 eV for optimized structures and 0.15 eV for off-equilibrium geometries. Fig. 2 provides overall correlation plots for energies and charges as predicted by the AIMNet-NSE model for the Ions-16 data set; see Supplementary Figs. 3–5 for similar plots produced with the other models. Note that since the regression plots are colored by the density of points on a log scale, the vast majority of points lie on the diagonal line. The AIMNet-NSE models consistently provide the same level of performance across an energy range of 400 kcal/mol (~17 eV) without noticeable outliers. The model is able to predict atomic charges to within 0.01e (electron, elementary charge) for neutral molecules and 0.02e for ions, as shown in Fig. 2 (also see Supplementary Table 2). Table 1 also compares the performance of the individual models to that of their ensemble prediction (marked as "ens5"). In principle, model ensembling is always desirable and, on average, provides a performance boost of 0.5 kcal/mol for all energy-based quantities.
The AIMNet-NSE model has superb utility for high-throughput applications. In this sense, it is interesting to compare it with the excellent semiempirical IPEA-xTB method^{37}. IPEA-xTB is a reparametrization of the GFN-xTB Hamiltonian to predict EA and IP values of organic and inorganic molecules, aimed at reproducing PW6B95/def2-TZVPD results. The IPEA-xTB method was successfully used to make accurate predictions of electron ionization mass spectra^{37} and for high-throughput screening of polymers^{38,39}. For medium-sized organic molecules, the AIMNet-NSE model brings the accuracy/computational performance ratio to a new level. For the ChEMBL-20 data set, the RMSEs of IPEA-xTB EA and IP vs PBE0/ma-def2-SVP are 4.6 and 10.6 kcal/mol, compared to AIMNet-NSE errors of 2.7 and 2.4 kcal/mol, respectively. The AIMNet-NSE model is therefore considerably more accurate and at least two orders of magnitude faster than IPEA-xTB when running on similar hardware.
To elucidate the importance of the iterative "SCF-like" updates, the AIMNet model was evaluated with different numbers of passes t. AIMNet with t = 1 is very similar to the ANI model: the receptive field of the model is roughly equal to the size of the AEV descriptor in ANI, and no updates are made to the AFV vector and atomic embeddings. Fig. 3 shows that the aggregated performance of energy prediction improves with an increasing number of passes t. This trend is especially pronounced for cations. As expected, the accuracy of AIMNet with t = 1 is very similar to, or better than, that of the ANI network. The second iteration (t = 2) provides the largest improvement in performance for all three states. After t = 3, the results are virtually converged; therefore, we used t = 3 to train all models in this work. These observations for charged molecules are remarkably consistent with the results for neutral species^{8}.
Let us consider the 4-amino-4′-nitrobiphenyl molecule as an illustrative example (Fig. 3). This is a prototypical optoelectronic system, where a π-conjugated system separates the electron-donating (NH_{2}) and accepting (NO_{2}) groups. These polar moieties underpin an increase in the transition dipole moment upon electronic excitation, leading to two-photon absorption. The effect of donor–acceptor substitution is apparent from ground-state calculations of the charged species, where the hole and excess electron in the cation and anion, respectively, are shifted towards the substituent groups with strong delocalization across the π orbitals of the aromatic rings. Fig. 3 illustrates the charge equilibration procedure in the AIMNet-NSE model and compares it to DFT results. During the first pass, before charge normalization, the predicted densities are the same for the anion and cation (note the inverse color codes for anion and cation in Fig. 3), but after weighted normalization, the spin-charge density is already slightly shifted towards the nitro group in the anion and the amino group in the cation. At the same time, the spin charges on the hydrogen atoms do not change, as expected. After three iterations, the AIMNet-NSE model correctly reproduces the wave-like behavior of the spin density, with opposite phases for the cation and anion, as predicted by DFT. There is no sign alternation of the spin charge at the 4,4′ positions; however, the absolute value of the spin-charge difference for these atoms is high. Overall, the AIMNet-NSE model predicts spin charges for the non-hydrogen atoms of this molecule with an MAE of 0.03e for the anion and 0.02e for the cation. Notably, the 4-amino-4′-nitrobiphenyl molecule was part of neither the training nor the validation data, exemplifying the new architecture's ability to transfer spin-density predictions to completely unseen molecules.
In AIMNet-NSE, the physical meaning of the weights f (see Eq. 4) is related to the atomic Fukui functions, \(\partial {q}_{i}/\partial Q\), i.e., how much the atomic charge \({q}_{i}\) would change with a change of the total charge Q. In practice, the model assigns higher values of f to atoms that tend to have different charges in different charge states of the molecule, for example, aromatic and hetero atoms. The value of f also reflects the uncertainty in the charge distribution predicted by the neural network. A somewhat related approach to weighted charge renormalization was used previously^{40}; it was based on the charge prediction uncertainty estimated with an ensemble of random forests, however, without noticeable improvement in charge prediction accuracy. Our neural spin-charge equilibration method provides a simple and affordable alternative to other ML charge equilibration approaches^{41,42,43} based on the QEq method, which finds the charge distribution by minimization of the molecular Coulomb energy. While the QEq solution imposes physics-based constraints on the obtained charge distribution, it is limited by the approximate form of the Coulomb integral and can be computationally demanding due to the required matrix inversion.
The described neural charge equilibration could be an attractive alternative to popular charge equilibration schemes like EEM^{44}, QEq^{45}, and QTPIE^{46} that use simple physical relationships but often suffer from transferability issues and might produce unphysical results. To our knowledge, this is a prime example where an ML model provides consistent and qualitatively correct physical behavior relating molecular geometry, energy, integral molecular charge, and partial atomic charges. Upon submitting this manuscript, we learned about the work by Xie^{47}, where an ML model was built to predict energy as a function of electron populations in prototypical LiH clusters. Other schemes like BP^{12}, TensorMol^{15}, HIP-NN^{48,49}, and PhysNet^{18} typically employ an auxiliary neural network that predicts atomic charges from a local geometrical descriptor; electrostatic interactions are then computed with Coulomb's law based on those charges. In principle, many effects can be captured by a geometrical descriptor, but it does not depend on the total charge and spin multiplicity of the molecule. Following the basic principles of quantum mechanics, to incorporate such information successfully the model should adapt to changes in the electronic structure, preferably in a self-consistent way. This is exemplified here by the AIMNet-NSE model.
Case study for chemical reactivity and reaction prediction
As a practical application of the AIMNet-NSE model, we demonstrate a case study on chemical reactivity and the prediction of reaction outcomes. The robust prediction of the products of chemical reactions is of central importance to the chemical sciences. In principle, chemical reactions can be described by the stepwise rearrangement of electrons in molecules, also known as a reaction mechanism^{50}. Understanding the reaction mechanism is crucial because it provides atomistic insight into how and why specific products are formed.
DFT has been shown to be a powerful interpretative and computational tool for mechanism elucidation^{51,52,53,54}. In particular, conceptual DFT (cDFT) popularized many intuitive chemical concepts like electronegativity (χ) and chemical hardness (η)^{55}. In cDFT, reactivity indexes measure the energy (E) change of a system when it is subject to a perturbation in its number of electrons (N). The foundations of cDFT were laid by Parr et al.^{56} with the identification of the electronic chemical potential µ and hardness η as the Lagrangian multipliers in the Euler equation. In the finite-difference formulation, these quantities can be derived from the EA and IP values as
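One common finite-difference convention (reconstructed here; some works include an extra factor of 1/2 in the hardness) is:

```latex
\chi \;=\; -\mu \;=\; \frac{\mathrm{IP} + \mathrm{EA}}{2},
\qquad
\eta \;=\; \mathrm{IP} - \mathrm{EA}
```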
The Fukui function f(r) is defined as the derivative of the electron density with respect to the total number of electrons in the system. These global and condensed-to-atom local indexes have been successfully applied to a variety of problems in chemical reactivity^{57,58}. Using the finite-difference approximation and the condensed-to-atoms representation, the Fukui functions for electrophilic (\({f}_{a}^{-}\)), nucleophilic (\({f}_{a}^{+}\)), and radical (\({f}_{a}^{0}\)) reactions are defined as:
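In terms of the atomic charges q_a of the (N−1)-, N-, and (N+1)-electron systems (cation, neutral, and anion), the standard finite-difference condensed forms (reconstructed here) read:

```latex
f_a^{-} \;=\; q_a(N-1) - q_a(N),
\qquad
f_a^{+} \;=\; q_a(N) - q_a(N+1),
\qquad
f_a^{0} \;=\; \tfrac{1}{2}\left(f_a^{+} + f_a^{-}\right)
```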
Another useful cDFT reactivity descriptor is the electrophilicity index given by
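In its standard form (reconstructed from the symbols defined above):

```latex
\omega \;=\; \frac{\mu^{2}}{2\eta}
```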
as well as its condensed-to-atoms variants for electrophilic (\({\omega }_{a}^{-}\)), nucleophilic (\({\omega }_{a}^{+}\)), and radical (\({\omega }_{a}^{\pm }\)) attacks:^{59}
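A common condensed-to-atom definition (reconstructed here) multiplies the global index by the corresponding Fukui function:

```latex
\omega_a^{-} \;=\; \omega\, f_a^{-},
\qquad
\omega_a^{+} \;=\; \omega\, f_a^{+},
\qquad
\omega_a^{\pm} \;=\; \omega\, f_a^{0}
```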
On the basis of the vertical IPs, EAs, and charges predicted with AIMNet-NSE, we can directly compute all the listed cDFT indexes. Fig. 4 displays the correlation plots for all nine quantities. The AIMNet-NSE model achieves an excellent quality of prediction for the three global indexes, with R^{2} ranging from 0.93 to 0.97. The condensed indexes are more challenging to predict, with the philicity index (\({\omega }_{a}^{+}\)) being the hardest (R^{2} of 0.82). This is related to the overall larger errors in the cation energy predictions. We would like to emphasize again that none of these properties were part of the cost function or the training data. The values were derived from the pre-trained neural network, which opens the possibility of direct modeling that fully bypasses cDFT calculations and wavefunction analysis. The accuracy of the AIMNet-NSE predicted condensed indexes appears to be sufficient for reliable prediction of reaction outcomes.
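Given vertical state energies from any source (here, hypothetically, a trained model), the global cDFT descriptors follow from a few arithmetic steps. The function below is an illustrative sketch using the finite-difference conventions stated above; the function name and dictionary layout are assumptions for demonstration.

```python
def global_cdft_indices(e_neutral, e_cation, e_anion):
    """Compute global cDFT descriptors from vertical state energies
    (all in the same units, e.g., eV, at a fixed geometry).

    Uses the finite-difference conventions chi = (IP + EA)/2 and
    eta = IP - EA; conventions with an extra 1/2 in eta also exist."""
    ip = e_cation - e_neutral      # vertical ionization potential
    ea = e_neutral - e_anion       # vertical electron affinity
    chi = 0.5 * (ip + ea)          # Mulliken electronegativity
    mu = -chi                      # electronic chemical potential
    eta = ip - ea                  # chemical hardness
    omega = mu ** 2 / (2.0 * eta)  # global electrophilicity index
    return {"IP": ip, "EA": ea, "chi": chi, "mu": mu,
            "eta": eta, "omega": omega}
```

Condensed-to-atom indexes then follow by multiplying `omega` by the per-atom Fukui functions obtained from the charge differences between states.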
Let us exemplify the prediction of site selectivity for aromatic C–H bonds using the electrophilic aromatic substitution (EAS) reaction. The EAS reaction is a standard organic transformation. Its mechanism involves the addition of an electrophile to the aromatic ring to form a σ-complex (Wheland intermediate), followed by deprotonation to yield the observed substitution product (Fig. 5). The reactivity and regioselectivity of EAS generally depend on the ability of the substituents to stabilize or destabilize the σ-complex.
Recently, EAS has attracted significant attention from computational studies due to its importance in late-stage functionalization (LSF) for the drug development process^{60}. A direct and numerically very expensive approach to EAS selectivity prediction is to calculate all transition states on the complete path from reactants to products. A popular approach called RegioSQM achieves high site prediction accuracy based on the enumeration and calculation of σ-complexes with semiempirical quantum mechanical methods^{61}.
Table 2 lists the accuracy of regioselectivity prediction with recently published methods, using data from ref. ^{60}. A random forest (RF) model with DFT TPSSh/Def2-SVP derived descriptors like charges (q), bond orders (BO), Fukui indexes, and solvent accessible surface (SAS) achieves 90% accuracy on the validation data (note the different DFT methodologies used in that study and for training our DNNs). This model relies on QM calculations of the reagents but does not require searching for σ-complexes. When the QM descriptors are combined with RegioSQM, the RF classifier exhibits an excellent performance of 93%. While the RegioSQM model is accurate, it is too slow for high-throughput screening: a modest data set of a few hundred molecules takes about two days to complete on a multicore compute node. Very recently, a Weisfeiler–Lehman Neural Network (WLNN) was suggested to predict site selectivity in aromatic C–H functionalization reactions^{62}. This model was trained on 58,000 reactions from the Reaxys database and used RDKit molecular descriptors. The WLNN achieves an accuracy approaching 90% for the prediction of EAS regioselectivity.
We used AIMNet-NSE to calculate Fukui coefficients and atomic philicity indexes. We also added the AIM layer of the query atom in the cation-radical form of the molecule as an additional set of descriptors. The size of the AIM layer (144 elements) is smaller than the training data set size (602 data points); the use of cross-validation scores and the random forest method generally mitigates any overfitting issues. As we argued before^{8}, the multimodal knowledge residing inside the AIM layer can be exploited as an information-rich feature representation. The RF classifier trained with AIMNet-NSE descriptors displays an excellent performance of 90% on the validation set and 85% on the test set. While the obtained predictions for the electrophilic aromatic substitution reaction are only marginally better than previously reported values, our model achieves a six-orders-of-magnitude computational speedup, since no quantum mechanical simulations are necessary.
Discussion
We have recently witnessed machine learning models trained on quantum-mechanical data achieve formidable success in quantitative predictions of ground-state energies and interatomic potentials for common, typically charge-neutral organic molecules. Nevertheless, a quantitative description of complex chemical processes involving reactions, bond breaking, charged species, and radicals remains an outstanding problem for data science. The conceptual challenge is a proper description of the spatially delocalized electronic density (which strongly depends on molecular conformation) and accounting for the long-range Coulombic interactions stemming from inhomogeneously distributed charges. These phenomena appear as a consequence of the quantum-mechanical description of delocalized electronic wavefunctions. Consequently, the representation of spatially nonlocal, frequently intensive molecular properties is problematic for common neural nets adopting local geometric descriptors. The recently developed AIMNet neural network architecture addresses this challenge via an iterative message-passing-based process, which ultimately captures complex latent relationships across atoms in the molecule.
In the present work, we introduced the AIMNet-NSE architecture to learn a transferable potential for organic molecules in arbitrary charge states. For neutral, cation-radical, and anion-radical species, the AIMNet-NSE model achieves a consistent 3–4 kcal/mol accuracy in predicting the energies of larger molecules (13–20 non-H atoms), even though it was trained only on small molecules with up to 12 non-H atoms. In addition to energies, the AIMNet-NSE model achieves state-of-the-art performance in the prediction of intensive properties, demonstrating an accuracy of about 0.10–0.15 eV for vertical electron affinities and ionization potentials across a broad chemical and conformational space.
The key ingredients that allow the AIMNet-NSE model to achieve such a high level of accuracy are (i) multimodal learning, (ii) a joint, information-rich representation of an atom in a molecule that is shared across multiple modalities, and (iii) the Neural Spin-Charge Equilibration (NSE) block inside the neural network. In contrast to standard geometric descriptors, we have highlighted the importance of incorporating adaptable electronic information into ML models. Essentially, the AIMNet-NSE model serves as a charge equilibration scheme. AIMNet-NSE brings ML and physics-based models one step closer by offering a discrete, physically correct dependence of the system energy on the total molecular charge and spin states.
As a side benefit, it can provide a high-quality estimate of reactivity indices based on conceptual DFT and reliable prediction of reaction outcomes. Overall, the demonstrated flexible incorporation of quantum-mechanical information into the AIMNet structure and data fusion exemplify a step toward developing a universal, single neural network architecture capable of quantitative prediction of multiple properties of interest. As we show in our case studies, the AIMNet-NSE model appears as a fast and reliable method to compute multiple properties, such as ionization potential, electron affinity, spin-polarized charges, and a wide variety of conceptual DFT indices. It potentially emerges as a drop-in replacement calculator in a myriad of applications where high computational accuracy and throughput are required.
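The conceptual-DFT quantities mentioned above follow from finite differences of energies and atomic charges predicted for the three charge states at a fixed geometry. The sketch below is our illustration, not code from the paper: the function names are hypothetical, and the hardness convention η = IP − EA is one of several in use.

```python
import numpy as np

def reactivity_descriptors(e_neutral, e_cation, e_anion):
    """Vertical conceptual-DFT descriptors from total energies (e.g., in eV)
    of the three charge states at the neutral geometry."""
    ip = e_cation - e_neutral        # vertical ionization potential
    ea = e_neutral - e_anion         # vertical electron affinity
    chi = 0.5 * (ip + ea)            # Mulliken electronegativity
    eta = ip - ea                    # chemical hardness (finite-difference convention)
    return ip, ea, chi, eta

def condensed_fukui(q_neutral, q_cation, q_anion):
    """Condensed Fukui functions per atom from atomic charges of the
    three charge states (charge convention: q = Z - population)."""
    q0, qp, qm = map(np.asarray, (q_neutral, q_cation, q_anion))
    f_minus = qp - q0   # susceptibility to electrophilic attack
    f_plus = q0 - qm    # susceptibility to nucleophilic attack
    return f_minus, f_plus
```

With energies from an AIMNet-NSE-like model for the three charge states, these one-line finite differences replace three separate QM calculations.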
Methods
Data set
For the training data set, we randomly selected about 200k neutral molecules from the UniChem database^{63} with molecule size up to 16 “heavy” (i.e., non-hydrogen) atoms and the set of elements {H, C, N, O, F, Si, P, S, Cl}. We chose molecular dynamics (MD) as a fast and simple method to explore molecular PESs around their minima. Thermal fluctuations of atoms in MD simulations allow for near-equilibrium sampling of molecular conformational space. Similar approaches have been explored in previous reports^{13,21}. Notably, all traditional molecular force fields are designed to describe closed-shell molecules only. To overcome this limitation, we chose a quantum mechanically derived force field (QMDFF^{64}) as an efficient method to construct a system-specific and charge-specific mechanistic potential for each molecule. We relied on the GFN2-xTB^{65} tight-binding model to obtain the minimum-energy conformation, force constants, charges, and bond orders that are needed for the QMDFF model.
The workflow to generate molecular conformations is summarized in Fig. 6. Starting from SMILES representations, we generated a single 3D conformation for each molecule using the RDKit^{66} library. The molecule in each of three charge states (i.e., neutral, cation, and anion) was optimized using the GFN2-xTB method, followed by a calculation of force constants, charges, and bond orders to fit molecule-specific QMDFF parameters. This custom force field was used to perform a 500 ps NVT MD run, with snapshots collected every 50 ps for the subsequent DFT calculations. For each snapshot, we performed several single-point DFT calculations with the molecular charge set to the value at which the MD was performed, as well as its neighboring charge states, i.e., −1, 0 for anions; −1, 0, +1 for neutrals; and 0, +1 for cations (Fig. 6). This results in up to 70 single-point DFT calculations per molecule. For the DFT calculations, we selected the PBE0/ma-def2-SVP level of theory as a reasonable compromise between accuracy and computational expense. PBE0 is a non-empirical hybrid DFT functional that is widely used to compute molecular properties. Exact exchange and diffuse functions in the basis set are needed to describe anionic species. All DFT calculations were performed using the ORCA 4.0 package^{67}. Atomic spin-polarized charges were calculated with the NBO7 software package^{68} for the PBE0/ma-def2-SVP wavefunction.
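The bookkeeping behind “up to 70 single-point DFT calculations per molecule” can be reproduced directly from the sampling scheme above; the snippet below is an illustrative sketch (the variable names are ours):

```python
# Each molecule is run in three charge states; a 500 ps MD trajectory with
# snapshots every 50 ps yields 10 snapshots per trajectory. Each snapshot is
# then evaluated at its own charge and at the neighboring charge states.
snapshots_per_state = 500 // 50          # 10 snapshots per MD trajectory

# DFT charge states evaluated for a snapshot taken from MD at a given charge
dft_charges = {
    -1: (-1, 0),        # anion trajectory: anion + neutral
     0: (-1, 0, +1),    # neutral trajectory: all three states
    +1: (0, +1),        # cation trajectory: cation + neutral
}

total_single_points = sum(
    snapshots_per_state * len(charges) for charges in dft_charges.values()
)
print(total_single_points)  # 10 * (2 + 3 + 2) = 70
```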
We split all data into two subsets. The Ions-12 data set contains 6.44 M structures with up to 12 heavy atoms, of which 45%, 25%, and 30% are neutral, cations, and anions, respectively. The Ions-16 data set has 295k structures with 13–16 non-hydrogen atoms, with 48%, 24%, and 26% of neutral, anionic, and cationic species, respectively. Please see Supplementary Table 1 and Supplementary Figs. 1–2 for more details. We used the Ions-12 data set for training and validation, whereas Ions-16 was utilized for testing. The Ions-16 data set has larger, more complex structures and thus probes the model’s transferability.
For further evaluation of model performance, transferability, and extensibility, we compiled a data set that is close to real-world applications. We randomly selected 800 organic molecules from the ChEMBL database^{69,70} with 13–20 non-hydrogen atoms, 100 per molecular size. The neutral state of each molecule was optimized with the B97-3c composite DFT method^{71}; then a single-point energy calculation with the same B97-3c method was performed for the anion and cation radicals. The resulting data set, referred to as ChEMBL-20, covers equilibrium conformations of “drug-like” molecules.
Training protocol
The ANI model and AIMNet variants were trained using mini-batch gradient descent powered by the Adam optimizer^{72}. For training performance, all mini-batches were composed of molecules with the same number of atoms, to avoid padding. Proper shuffling of the data feed was achieved with a multi-GPU data-parallel approach: gradients on model weights were averaged after 8 random batches were evaluated in parallel. The effective combined batch size was 2048. The training was performed on 8 Nvidia V100 GPUs, with a computational cost per epoch of the Ions-12 data set (6.4 M data points) of about 200 s for the AIMNet-MT model and 130 s for the AIMNet-NSE model. We employed a reduce-on-plateau learning rate schedule, which leads to training convergence within 400–500 epochs.
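Composing mini-batches from molecules with the same atom count can be sketched as a simple bucketing scheme. This is an illustration using only the standard library (the function name and data layout are hypothetical; the actual implementation in the repository may differ):

```python
import random
from collections import defaultdict

def same_size_batches(mol_sizes, batch_size, seed=0):
    """Group molecule indices into mini-batches in which every molecule has
    the same number of atoms, so no padding is needed.
    `mol_sizes` maps a molecule index to its atom count."""
    buckets = defaultdict(list)
    for idx, n_atoms in mol_sizes.items():
        buckets[n_atoms].append(idx)

    rng = random.Random(seed)
    batches = []
    for indices in buckets.values():
        rng.shuffle(indices)                 # shuffle within each size bucket
        for i in range(0, len(indices), batch_size):
            batches.append(indices[i:i + batch_size])
    rng.shuffle(batches)                     # shuffle batch order across sizes
    return batches
```

Shuffling within buckets and across batches keeps the data feed randomized while every batch stays rectangular, which is what allows padding-free tensors on the GPU.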
The training objective was minimization of a weighted multi-target mean squared error (MSE) loss function that includes the errors in energy and charge predictions. The AIMNet architecture shares the weights of the Embedding and Interaction blocks and the fully connected output heads across all “SCF-like” iterative passes. The models were trained with 3 passes. The outputs from each pass were included in the loss function, except when training the AIMNet-NSE model: due to its architecture, during the first pass the AIMNet-NSE model makes predictions without using information about the total spin charge. Therefore, for this model only, the outputs from the last two passes were included in the loss function. Although all final predictions of the AIMNet models were obtained with t = 3, we found it beneficial to restrain the network to give reasonably accurate results on earlier iterative passes, as this provides regularization to the model. Additional details about the loss function are given in the SI.
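A minimal sketch of such a weighted multi-pass loss is given below. This is our illustration with NumPy in place of the PyTorch implementation; the target weights here are placeholders, and the exact weighting is given in the SI, not reproduced here.

```python
import numpy as np

def multipass_loss(energy_preds, charge_preds, energy_ref, charge_ref,
                   w_energy=1.0, w_charge=1.0, skip_first_pass=False):
    """Weighted multi-target MSE accumulated over the iterative "SCF-like"
    passes. `energy_preds` / `charge_preds` hold one array per pass
    (t = 1..3). With skip_first_pass=True (the AIMNet-NSE case), the t = 1
    outputs, produced before spin-charge information is available, are
    excluded from the loss."""
    start = 1 if skip_first_pass else 0
    loss = 0.0
    for e_pred, q_pred in zip(energy_preds[start:], charge_preds[start:]):
        loss += w_energy * np.mean((e_pred - energy_ref) ** 2)
        loss += w_charge * np.mean((q_pred - charge_ref) ** 2)
    return loss
```

Penalizing every pass, not just the final t = 3 output, is what restrains the network to stay accurate on earlier iterations and acts as the regularizer described above.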
The baseline ANI and AIMNet models were trained independently for each of the three charge states of the molecules. For AIMNet-MT and AIMNet-NSE, joint training for all charge states was performed, and errors for each charge state were included in the loss function. The training was done with 5-fold cross-validation data splits. The five independent models were used to build an ensemble for more accurate predictions, denoted as “ens5” later in the text. All AIMNet model variants, as well as the ANI model, were implemented with the PyTorch framework^{73}. The AIMNet-NSE model, example inference scripts, and test datasets are available in a public code repository at https://github.com/isayevlab/aimnetnse.
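Ensemble (“ens5”) inference then reduces to averaging the five cross-validated models; a minimal sketch (the model interface shown here is hypothetical):

```python
import numpy as np

def ens5_predict(models, inputs):
    """Average predictions of the five cross-validation models ("ens5").
    `models` are callables returning per-molecule energies; the standard
    deviation across the ensemble gives a simple uncertainty estimate."""
    preds = np.stack([model(inputs) for model in models])  # shape (5, n_mol)
    return preds.mean(axis=0), preds.std(axis=0)
```

Beyond the accuracy gain from averaging, the ensemble spread offers a cheap per-molecule confidence signal for downstream screening.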
Data availability
The test datasets used in this study are publicly available at https://doi.org/10.5281/zenodo.5007980.
Code availability
The trained AIMNet-NSE models and the code to reproduce this study are available at https://doi.org/10.5281/zenodo.5008270 and on GitHub at https://github.com/isayevlab/aimnetnse.
References
Butler, K. T., Davies, D. W., Cartwright, H., Isayev, O. & Walsh, A. Machine learning for molecular and materials science. Nature 559, 547–555 (2018).
Dral, P. O. Quantum chemistry in the age of machine learning. J. Phys. Chem. Lett. 11, 2336–2347 (2020).
Rupp, M., Tkatchenko, A., Müller, K.R. & von Lilienfeld, O. A. Fast and accurate modeling of molecular atomization energies with machine learning. Phys. Rev. Lett. 108, 058301 (2012).
Schütt, K. T., Arbabzadah, F., Chmiela, S., Müller, K. R. & Tkatchenko, A. Quantumchemical insights from deep tensor neural networks. Nat. Commun. 8, 13890 (2017).
Chmiela, S. et al. Machine learning of accurate energyconserving molecular force fields. Sci. Adv. 3, e1603015 (2017).
Bartók, A. P., Payne, M. C., Kondor, R. & Csányi, G. Gaussian approximation potentials: the accuracy of quantum mechanics, without the electrons. Phys. Rev. Lett. 104, 136403 (2010).
Bartók, A. P. et al. Machine learning unifies the modeling of materials and molecules. Sci. Adv. 3, e1701816 (2017).
Zubatyuk, R., Smith, J. S., Leszczynski, J. & Isayev, O. Accurate and transferable multitask prediction of chemical properties with an atomsinmolecules neural network. Sci. Adv. 5, eaav6490 (2019).
LeCun, Y., Bengio, Y. & Hinton, G. Deep learning. Nature 521, 436–444 (2015).
Schmidhuber, J. Deep learning in neural networks: an overview. Neural Netw. 61, 85–117 (2015).
Hornik, K. Approximation capabilities of multilayer feedforward networks. Neural Netw. 4, 251–257 (1991).
Behler, J. & Parrinello, M. Generalized neuralnetwork representation of highdimensional potentialenergy surfaces. Phys. Rev. Lett. 98, 146401 (2007).
Smith, J. S., Isayev, O. & Roitberg, A. E. ANI1: an extensible neural network potential with DFT accuracy at force field computational cost. Chem. Sci. 8, 3192–3203 (2017).
Smith, J. S. et al. Approaching coupled cluster accuracy with a generalpurpose neural network potential through transfer learning. Nat. Commun. 10, 2903 (2019).
Yao, K., Herr, J. E., Toth, D. W., Mckintyre, R. & Parkhill, J. The TensorMol0.1 model chemistry: a neural network augmented with longrange physics. Chem. Sci. 9, 2261–2269 (2018).
Lubbers, N., Smith, J. S. & Barros, K. Hierarchical modeling of molecular energies using a deep neural network. J. Chem. Phys. 148, 241715 (2018).
Schütt, K. T., Sauceda, H. E., Kindermans, P. J., Tkatchenko, A. & Müller, K. R. SchNet  a deep learning architecture for molecules and materials. J. Chem. Phys. 148, 241722 (2018).
Unke, O. T. & Meuwly, M. PhysNet: a neural network for predicting energies, forces, dipole moments, and partial charges. J. Chem. Theory Comput. 15, 3678–3693 (2019).
Gilmer, J., Schoenholz, S. S., Riley, P. F., Vinyals, O. & Dahl, G. E. Proceedings of the 34th International Conference on Machine Learning, PMLR 70, 1263–1272 (2017).
Smith, J. S., Nebgen, B., Lubbers, N., Isayev, O. & Roitberg, A. E. Less is more: sampling chemical space with active learning. J. Chem. Phys. 148, 241733 (2018).
Smith, J. S. et al. The ANI1ccx and ANI1x data sets, coupledcluster and density functional theory properties for molecules. Sci. Data 7, 134 (2020).
Redlich, O. Intensive and extensive properties. J. Chem. Educ. 47, 154 (1970).
Tolman, R. C. The measurable quantities of physics. Phys. Rev. 9, 237–253 (1917).
Montavon, G. et al. Machine learning of molecular electronic properties in chemical compound space. N. J. Phys. 15, 095003 (2013).
Pronobis, W., Schütt, K. T., Tkatchenko, A. & Müller, K.R. Capturing intensive and extensive DFT/TDDFT molecular properties with machine learning. Eur. Phys. J. B 91, 178 (2018).
Westermayr, J. et al. Machine learning enables long time scale molecular photodynamics simulations. Chem. Sci. 10, 8100–8107 (2019).
Chen, W. K., Liu, X. Y., Fang, W. H., Dral, P. O. & Cui, G. Deep learning for nonadiabatic excitedstate dynamics. J. Phys. Chem. Lett. 9, 6702–6708 (2018).
Dral, P. O., Barbatti, M. & Thiel, W. Nonadiabatic excitedstate dynamics with machine learning. J. Phys. Chem. Lett. 9, 5660–5663 (2018).
St. John, P. C., Guan, Y., Kim, Y., Kim, S. & Paton, R. S. Prediction of organic homolytic bond dissociation enthalpies at near chemical accuracy with subsecond computational cost. Nat. Commun. 11, 2328 (2020).
Westermayr, J., Gastegger, M. & Marquetand, P. Combining SchNet and SHARC: the SchNarc machine learning approach for excitedstate dynamics. J. Phys. Chem. Lett. 11, 3828–3834 (2020).
Geerlings, P., De Proft, F. & Langenaeker, W. Conceptual density functional theory. Chem. Rev. 103, 1793–1873 (2003).
Chattaraj, P. K. Chemical Reactivity Theory (2009).
Cohen, M. H. & Wasserman, A. On the foundations of chemical reactivity theory. J. Phys. Chem. A 111, 2229–2242 (2007).
Sandfort, F., StriethKalthoff, F., Kühnemund, M., Beecks, C. & Glorius, F. A structurebased platform for predicting chemical reactivity. Chem 6, 1379–1390 (2020).
Christensen, A. S., Bratholm, L. A., Faber, F. A., Glowacki, D. R. & von Lilienfeld, O. A. FCHL revisited: faster and more accurate quantum machine learning. J. Chem. Phys. 152, 044107 (2020).
Devereux, C. et al. Extending the applicability of the ANI deep learning molecular potential to sulfur and halogens. J. Chem. Theory Comput. 16, 4192–4202 (2020).
Ásgeirsson, V., Bauer, C. A. & Grimme, S. Quantum chemical calculation of electron ionization mass spectra for general organic and inorganic molecules. Chem. Sci. 8, 4879–4895 (2017).
HeathApostolopoulos, I., Wilbraham, L. & Zwijnenburg, M. A. Computational highthroughput screening of polymeric photocatalysts: exploring the effect of composition, sequence isomerism and conformational degrees of freedom. Faraday Discuss 215, 98–110 (2019).
Wilbraham, L., Berardo, E., Turcani, L., Jelfs, K. E. & Zwijnenburg, M. A. Highthroughput screening approach for the optoelectronic properties of conjugated polymers. J. Chem. Inf. Model. 58, 2450–2459 (2018).
Bleiziffer, P., Schaller, K. & Riniker, S. Machine learning of partial charges derived from highquality quantummechanical calculations. J. Chem. Inf. Model. 58, 579–590 (2018).
Ghasemi, S. A., Hofstetter, A., Saha, S. & Goedecker, S. Interatomic potentials for ionic systems with density functional accuracy based on charge densities obtained by a neural network. Phys. Rev. B 92, 45131 (2015).
Faraji, S. et al. High accuracy and transferability of a neural network potential through charge equilibration for calcium fluoride. Phys. Rev. B 95, 1–11 (2017).
Ko, T. W., Finkler, J. A., Goedecker, S. & Behler, J. A fourthgeneration highdimensional neural network potential with accurate electrostatics including nonlocal charge transfer. Nat. Commun. 12, 398 (2021).
Mortier, W. J., Van Genechten, K. & Gasteiger, J. Electronegativity equalization: application and parametrization. J. Am. Chem. Soc. 107, 829–835 (1985).
Rappé, A. K. & Goddard, W. A. III Charge equilibration for molecular dynamics simulations. J. Phys. Chem. 95, 3358–3363 (1991).
Chen, J. & Martínez, T. J. QTPIE: charge transfer with polarization current equalization. a fluctuating charge model with correct asymptotics. Chem. Phys. Lett. https://doi.org/10.1016/j.cplett.2007.02.065 (2007).
Xie, X., Persson, K. A. & Small, D. W. Incorporating electronic information into machine learning potential energy surfaces via approaching the groundstate electronic energy as a function of atombased electronic populations. J. Chem. Theory Comput. 16, 4256–4270 (2020).
Sifain, A. E. et al. Discovering a transferable charge assignment model using machine learning. J. Phys. Chem. Lett. 9, 4495–4501 (2018).
Nebgen, B. et al. Transferable dynamic molecular charge assignment using deep neural networks. J. Chem. Theory Comput. 14, 4687–4698 (2018).
Herges, R. Organizing principle of complex reactions and theory of coarctate transition states. Angew. Chem. Int. Ed. Eng. 33, 255–276 (1994).
Houk, K. N. Frontier molecular orbital theory of cycloaddition reactions. Acc. Chem. Res. 8, 361–369 (1975).
Houk, K. et al. Theory and modeling of stereoselective organic reactions. Science 231, 1108–1117 (1986).
Jones, G. O., Liu, P., Houk, K. N. & Buchwald, S. L. Computational explorations of mechanisms and liganddirected selectivities of coppercatalyzed Ullmanntype reactions. J. Am. Chem. Soc. 132, 6205–6213 (2010).
Reid, J. P. & Sigman, M. S. Holistic prediction of enantioselectivity in asymmetric catalysis. Nature 571, 343–348 (2019). https://doi.org/10.1038/s41586-019-1384-z
Ayers, P. W. & Levy, M. Perspective on “density functional approach to the frontierelectron theory of chemical reactivity.” Theor. Chem. Acc. https://doi.org/10.1007/s002149900093 (2000).
Parr, R. G. & Yang, W. Density functional approach to the frontierelectron theory of chemical reactivity. J. Am. Chem. Soc. 106, 4049–4050 (1984).
Chermette, H. Chemical reactivity indexes in density functional theory. J. Comput. Chem. 20, 129–154 (1999).
Chattaraj, P. K. (ed.) Chemical Reactivity Theory: A Density Functional View (CRC Press, 2009). https://doi.org/10.1201/9781420065442
Chattaraj, P. K., Maiti, B. & Sarkar, U. Philicity: a unified treatment of chemical reactivity and selectivity. J. Phys. Chem. A 107, 4973–4975 (2003).
Tomberg, A., Johansson, M. J. & Norrby, P. O. A predictive tool for electrophilic aromatic substitutions using machine learning. J. Org. Chem. 84, 4695–4703 (2019).
Kromann, J. C., Jensen, J. H., Kruszyk, M., Jessing, M. & Jørgensen, M. Fast and accurate prediction of the regioselectivity of electrophilic aromatic substitution reactions. Chem. Sci. 9, 660–665 (2018).
Struble, T. J., Coley, C. W. & Jensen, K. F. Multitask prediction of site selectivity in aromatic C–H functionalization reactions. React. Chem. Eng. 5, 896–902 (2020).
Chambers, J. et al. UniChem: a unified chemical structure crossreferencing and identifier tracking system. J. Cheminform. 5, 1–9 (2013).
Grimme, S. A general quantum mechanically derived force field (QMDFF) for molecules and condensed phase simulations. J. Chem. Theory Comput. 10, 4497–4514 (2014).
Bannwarth, C., Ehlert, S. & Grimme, S. GFN2-xTB — an accurate and broadly parametrized self-consistent tight-binding quantum chemical method with multipole electrostatics and density-dependent dispersion contributions. J. Chem. Theory Comput. 15, 1652–1671 (2019).
Landrum, G. RDkit: Opensource Cheminformatics https://www.rdkit.org/ (2021).
Neese, F. The ORCA program system. Wiley Interdiscip. Rev. Comput. Mol. Sci. 2, 73–78 (2012).
Glendening, E. D. et al. NBO 7.0 (Theoretical Chemistry Institute, University of Wisconsin, Madison, 2018).
Davies, M. et al. ChEMBL web services: streamlining access to drug discovery data and utilities. Nucleic Acids Res. 43, W612–W620 (2015).
Mendez, D. et al. ChEMBL: towards direct deposition of bioassay data. Nucleic Acids Res. https://doi.org/10.1093/nar/gky1075 (2019).
Brandenburg, J. G., Bannwarth, C., Hansen, A. & Grimme, S. B973c: a revised lowcost variant of the B97D density functional method. J. Chem. Phys. 148, 064104 (2018).
Loshchilov, I. & Hutter, F. Fixing weight decay regularization in Adam. Preprint at https://arxiv.org/abs/1711.05101 (2017).
Paszke, A. et al. Pytorch: an imperative style, highperformance deep learning library. In: Advances in Neural Information Processing Systems 8026–8037 (2019).
Sfiligoi, I. et al. The pilot way to grid resources using GlideinWMS. In: 2009 WRI World Congress on Computer Science and Information Engineering, CSIE 2009, IEEE Vol. 2, 428–432. https://doi.org/10.1109/CSIE.2009.950 (2009).
Pordes, R. et al. The open science grid. J. Phys. Conf. Ser. 78, 012057 (2007).
Acknowledgements
O.I. acknowledges support from NSF CHE-1802789 and CHE-2041108. This work was performed, in part, at the Center for Integrated Nanotechnologies, an Office of Science User Facility operated for the U.S. Department of Energy (DOE) Office of Science. The authors acknowledge Extreme Science and Engineering Discovery Environment (XSEDE) award CHE200122, which is supported by NSF grant number ACI-1053575. This research is part of the Frontera computing project at the Texas Advanced Computing Center. Frontera is made possible by National Science Foundation award OAC-1818253. This research was done in part using resources provided by the Open Science Grid^{74,75}, which is supported by award 1148698 and the U.S. DOE Office of Science. We gratefully acknowledge the support and hardware donation from NVIDIA Corporation and express our special gratitude to Jonathan Lefman. The work at Los Alamos National Laboratory (LANL) was supported by the Laboratory Directed Research and Development (LDRD) program and was done in part at the Center for Nonlinear Studies (CNLS) and the Center for Integrated Nanotechnologies (CINT), a U.S. Department of Energy, Office of Basic Energy Sciences user facility at LANL. J.S.S., R.Z., and O.I. thank CNLS and CINT for their support and hospitality.
Author information
Authors and Affiliations
Contributions
R.Z., S.T., and O.I. conceived the idea. R.Z. carried out the implementation with input from J.S. and B.N., R.Z., J.S., and B.N. run Q.M. calculations. R.Z. and O.I. wrote the manuscript. All authors provided critical feedback and helped shape the research, analysis, and manuscript. S.T. and O.I. supervised the project.
Corresponding author
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Peer review information Nature Communications thanks the anonymous reviewer(s) for their contribution to the peer review of this work.
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary information
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Zubatyuk, R., Smith, J. S., Nebgen, B. T. et al. Teaching a neural network to attach and detach electrons from molecules. Nat. Commun. 12, 4870 (2021). https://doi.org/10.1038/s41467-021-24904-0