Quantum-chemical insights from deep tensor neural networks

Learning from data has led to paradigm shifts in a multitude of disciplines, including web, text and image search, speech recognition, as well as bioinformatics. Can machine learning enable similar breakthroughs in understanding quantum many-body systems? Here we develop an efficient deep learning approach that enables spatially and chemically resolved insights into quantum-mechanical observables of molecular systems. We unify concepts from many-body Hamiltonians with purpose-designed deep tensor neural networks, which leads to size-extensive and uniformly accurate (1 kcal mol$^{-1}$) predictions in compositional and configurational chemical space for molecules of intermediate size. As an example of chemical relevance, the model reveals a classification of aromatic rings with respect to their stability. Further applications of our model for predicting atomic energies and local chemical potentials in molecules, reliable isomer energies, and molecules with peculiar electronic structure demonstrate the potential of machine learning for revealing insights into complex quantum-chemical systems.

The box spans between the 25% and 75% quantiles, while the whiskers mark the 5% and 95% quantiles.

Supplementary Figure 4: List of 6-membered carbon rings ordered by the sum of the energy contributions of the ring atoms. The energy contributions were predicted using the GDB-9 model with three interaction passes trained on 50k reference calculations. Energy contributions are given in kcal mol$^{-1}$.

Fig. 1: The deep network can be interpreted as representing a local potential $\Omega_M^A(\mathbf{r})$ created by the atoms of the molecule. Placing a probe atom $A$ with nuclear charge $z$ at a position $\mathbf{r}$, described by the distances $d_1, \dots, d_n$ to the atoms of the molecule, yields an energy $E_{\text{probe}}$.

Training durations (in hours) for models with one, two and three interaction passes:

Data set         Training set   1 pass   2 passes   3 passes
GDB-9            25k            28       35         42
GDB-9            50k            55       71         82
GDB-9            100k           110      139        162
Benzene          25k            21       27         32
Benzene          50k            44       53         61
Benzene          100k           84       104        121
Toluene          25k            24       27         32
Toluene          50k            45       55         64
Toluene          100k           88       108        127
Malonaldehyde    25k            21       25         29
Malonaldehyde    50k            41       52         59
Malonaldehyde    100k           85       106        117
Salicylic acid   25k            22       31         32
Salicylic acid   50k            44       54         65
Salicylic acid   100k           91       109        125

All models were trained using stochastic gradient descent with momentum for 3,000 epochs on an NVIDIA Tesla K40 GPU.
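As a concrete illustration of the probe-atom picture in Fig. 1, the sketch below evaluates such a local potential on a set of grid points. The function `predict_probe_energy` is a hypothetical stand-in for the probe-energy evaluation of a trained DTNN and is not part of the original code.

```python
import numpy as np

def local_potential(predict_probe_energy, atom_positions, z, grid_points):
    """Evaluate Omega_M^A(r) of a molecule M for a probe atom with charge z.

    predict_probe_energy: hypothetical callable mapping (z, distances d_1..d_n
    to the molecule's atoms) to the probe energy E_probe.
    atom_positions: (n, 3) array; grid_points: (m, 3) array of probe positions.
    """
    omega = np.empty(len(grid_points))
    for k, r in enumerate(grid_points):
        distances = np.linalg.norm(atom_positions - r, axis=1)  # d_1, ..., d_n
        omega[k] = predict_probe_energy(z, distances)
    return omega
```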

Supplementary Discussion
Performance depending on the number of reference calculations and interaction passes
Supplementary Figs. 1 and 2 show the dependence of the performance on the number of training examples for the benzene MD data set and GDB-9, respectively. In both learning curves (a), an increase from 1,000 to 10,000 training examples reduces the error drastically, while a further increase to 100,000 examples yields comparatively small improvement. The error distributions (b) show that models with two and three interaction passes trained on at least 25,000 GDB-9 reference calculations predict 95% of the unknown molecules with an error of 3.0 kcal mol$^{-1}$ or lower. Correspondingly, the same models trained on 25,000 or more MD reference calculations of benzene predict 95% of the unknown benzene configurations with a maximum error below 1.3 kcal mol$^{-1}$. Beyond a certain number of reference calculations, the models with one interaction pass perform significantly worse in all these respects. Thus, multiple interaction passes indeed enrich the learned feature representation, as demonstrated by the improved predictions for previously unseen molecules.
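For reference, the quoted 95% figures are quantiles of the absolute-error distribution over held-out molecules; a minimal sketch of their computation is shown below, with placeholder array names rather than the original evaluation code.

```python
import numpy as np

def error_statistics(y_pred, y_true):
    """Return the MAE and the 95% quantile of the absolute error (kcal/mol)."""
    abs_err = np.abs(np.asarray(y_pred) - np.asarray(y_true))
    mae = abs_err.mean()
    q95 = np.quantile(abs_err, 0.95)  # 95% of molecules have an error <= q95
    return mae, q95
```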

Relation to convolutional neural networks
In a convolutional layer, local filters are applied to local environments, e.g., image patches, extracting features relevant to the classification task. Similarly, local correlations of atoms can be exploited in a chemistry setting. The atom interactions in our model can indeed be regarded as a non-linear generalization of a convolution. In contrast to images, however, the atoms of a molecule are not arranged on a grid. Therefore, the convolution kernels need to be continuous. We define a function $C^t: \mathbb{R}^3 \to \mathbb{R}^B$ yielding $\mathbf{c}_i^t = C^t(\mathbf{r}_i)$ at the atom positions. Now we can rewrite the interactions as

$$\mathbf{v}_i = \sum_{j \neq i} h\bigl(f(\mathbf{r}_j) \circ g(\mathbf{r}_i - \mathbf{r}_j)\bigr)$$

with $f(\mathbf{r}_j) = W^{cf} C^t(\mathbf{r}_j) + b^{f_1}$, $g(\mathbf{r}_i - \mathbf{r}_j) = W^{df} \hat{d}_{ij} + b^{f_2}$ and $h(\mathbf{x}) = \tanh(W^{fc} \mathbf{x})$, where $\circ$ denotes the element-wise product.
For $h$ being the identity, the sum is equivalent to a discrete convolution of $f$ and $g$.
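To make the correspondence concrete, here is a minimal numpy sketch of this continuous convolution. It assumes $\hat{d}_{ij}$ is a Gaussian expansion of the interatomic distance; all weight names, shapes and helper functions are illustrative placeholders, not the trained model's parameters.

```python
import numpy as np

def gaussian_expansion(d, centers, gamma=10.0):
    """Expand a scalar distance d into Gaussian basis features (d_hat)."""
    return np.exp(-gamma * (d - centers) ** 2)

def interaction(positions, c, W_cf, b_f1, W_df, b_f2, W_fc, centers):
    """Continuous 'convolution' v_i = sum_{j != i} h(f(r_j) * g(r_i - r_j)).

    positions: (n, 3) atom coordinates; c: (n, B) atom coefficient vectors.
    """
    n = positions.shape[0]
    v = np.zeros((n, W_fc.shape[0]))
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            d_hat = gaussian_expansion(
                np.linalg.norm(positions[i] - positions[j]), centers)
            f = W_cf @ c[j] + b_f1           # filter value from atom features
            g = W_df @ d_hat + b_f2          # filter value from relative position
            v[i] += np.tanh(W_fc @ (f * g))  # h = tanh; '*' is element-wise
    return v
```

Replacing `np.tanh(W_fc @ ...)` with the identity makes each inner term a plain product of $f$ and $g$, i.e., the usual structure of a discrete convolution sampled at the atom positions.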

Supplementary Methods
Computing an alchemical path with the DTNN
The alchemical paths in Supplementary Fig. 5 were generated by gradually moving the atoms as well as interpolating between the initial coefficient vectors for changes of atom types. Given two nuclear charges $A$ and $B$, the coefficient vector for any charge $Z_i = \alpha_i A + (1 - \alpha_i) B$ with $0 \leq \alpha_i \leq 1$ is given by

$$\mathbf{c}_{Z_i}^{(0)} = \alpha_i \mathbf{c}_A^{(0)} + (1 - \alpha_i) \mathbf{c}_B^{(0)}.$$

Similarly, in order to add or remove atoms, we introduce fading factors $\beta_1, \dots, \beta_n \in [0, 1]$, one for each atom. This way, the influence on other atoms as well as the energy contributions to the molecular energy $E = \sum_{i=1}^{n} \beta_i E_i$ can be faded out.
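A minimal sketch of both operations follows, assuming the initial coefficient vectors are rows of a learned embedding matrix and that per-atom energy contributions $E_i$ are available from the model; all names are illustrative.

```python
import numpy as np

def interpolate_coefficients(c_A, c_B, alpha):
    """Initial coefficient vector for an atom alchemically mixed from A and B.

    c_A, c_B: embedding vectors of the pure atom types; alpha in [0, 1].
    """
    return alpha * c_A + (1.0 - alpha) * c_B

def faded_energy(atomic_energies, betas):
    """Molecular energy E = sum_i beta_i * E_i with fading factors beta_i."""
    return float(np.dot(betas, atomic_energies))
```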