When a glass-forming liquid is cooled rapidly, its viscosity increases dramatically and it eventually transforms into an amorphous solid, called a glass, whose physical properties are profoundly different from those of ordered crystalline solids1. At even lower temperature, around 1 K, the specific heat of a disordered solid is much larger than that of its crystalline counterpart as it scales linearly rather than cubically with temperature. Similarly, the temperature evolution of the thermal conductivity in glasses is quadratic, rather than cubic2,3,4,5,6,7,8,9,10,11. A theoretical framework rationalizing such anomalous behavior was provided by Anderson, Halperin, and Varma12 and by Phillips13,14. They argued that the energy landscape of amorphous solids contains many nearly-degenerate minima, connected by the localized motion of a few atoms, that can act as tunneling defects, called two-level systems (TLS)15,16,17,18,19. Since then, localized structural defects have been understood to play a crucial role in many other glass properties20. Understanding the microscopic origin of such localized defects and how to control their density and physical properties is a major goal not only to improve our fundamental understanding of amorphous solids, but also for technological applications, such as optimizing the performance of certain quantum devices21,22.

The development of particle-swap computer algorithms23,24 has allowed the creation of computer glasses at unprecedentedly low temperatures. Combined with potential energy landscape exploration algorithms25,26,27,28,29,30,31,32,33,34, this provides a powerful method to investigate the nature of defects in materials prepared under conditions comparable to experimental studies35,36. These tools have enabled direct numerical observation of TLS35,37, confirming the experimental result7,8,10,11,38 that the density of tunneling defects is strongly depleted as the kinetic stability of a glass increases. Similar results have been obtained for a different kind of defect, namely soft vibrational modes39. The direct detection of TLS revealed some of their microscopic features, namely that fewer atoms participate in the TLS of stable glasses35. It was also shown that TLS are not in one-to-one correspondence with soft harmonic35,36,40 or localized36,41 modes.

The main limitation of the direct landscape exploration method is its large computational cost, which makes it hard to construct the large library of defects needed for a robust statistical analysis of their physical properties. After accumulating a large number of inherent structures (IS), one must run a computationally expensive algorithm to find the minimum energy path connecting pairs of IS in order to determine whether the pair forms a proper defect (e.g., to form a TLS, a defect must have a quantum energy splitting within the thermal energy at 1 K). The very large number of IS pairs detected makes it impossible to characterize all of them. In previous works, some ad hoc filtering rules were introduced in order to identify candidate TLS and focus the computational effort on them35,36,37, but the success rate of such filters is poor. A considerable computational effort, which consisted of sampling \(\sim 10^{8}\) IS, resulted in the direct detection of about 60 TLS. It is then obvious that most of the computational effort has been wasted on the study of pairs that form defects that do not tunnel at low temperatures. Looking for TLS is akin to finding the proverbial needle in a haystack.

In this paper, we demonstrate the relevance of machine learning techniques to predict with enhanced accuracy whether a pair of inherent structures forms a defect of a certain type. Recently, machine learning (ML) was shown to be extremely effective in using structural indicators to predict structural, dynamical, or mechanical properties of glassy solids20,42,43,44,45,46,47,48,49,50. Here, we use supervised learning to streamline the identification of defects. We focus in particular on the classical energy barrier and the quantum splitting associated with defects, which are relevant to identify TLS. Our study has two goals: (i) develop a faster way to identify TLS compared to the standard approach described below35,37 in order to collect a statistically significant number of tunneling defects; (ii) determine the structural and dynamical features characterizing TLS as well as their evolution with glass preparation. To address (i) we show that our machine learning model can be trained in a few minutes using a small quantity of data, after which the model is able to identify candidate TLS with high speed and accuracy. To address (ii) we determine which static features are the most important for the model prediction. We show that TLS are not necessarily pairs of IS explored consecutively in the dynamics. We conclude by explaining how the ML model distinguishes TLS from non-TLS and how it is able to identify glasses prepared at different temperatures. While here we mostly focus on TLS, which is the rarest defect in glasses, our method should easily apply to other problems, such as supercooled liquid dynamics, plasticity, or devitrification of glassy solids.


In the following, for concreteness, we focus on the problem of detecting rare tunneling TLS. We also demonstrate that our method can successfully predict the classical energy barrier between two energy minima, with applications to the efficient detection of other kinds of defects.

Machine-learning approach

The standard procedure32,33,35 to identify TLS is sketched in Fig. 1a. The following steps aim at identifying potential candidates for TLS (see the Methods section for details): (1) Equilibrate the system at the preparation temperature Tf. Glasses with lower Tf have increased glass stability. (2) Run molecular dynamics to sample configurations along a dynamical trajectory at the exploration temperature T < Tf. (3) Perform energy minimization from the sampled configurations to produce a time series of energy minima, or inherent structures (IS). (4) Analyze the number of transitions recorded between pairs of IS in the dynamical exploration of step 2, and select the pairs of IS that are explored consecutively.

Fig. 1: Numerical search for two-level systems.

a Exploring the potential energy landscape: different glass samples define different metabasins in the rough landscape. (Inset) Each glass metabasin is explored via molecular dynamics simulations (black arrow) during which frequent energy minimization (dashed arrows) generates a large number of inherent structures (IS). Previous works restricted the search for candidate defects only to pairs of IS explored consecutively in the dynamics. b Our machine-learning approach instead considers all IS pairs, irrespective of the dynamical exploration, and rapidly provides a robust prediction for their properties, such as quantum splitting. The candidates selected by the ML model are then analyzed via a minimum energy path-finding protocol (NEB algorithm) and their properties are computed exactly and compared with the ML prediction.

Step 4 was necessary because it is computationally impossible to analyze all pairs of IS, as the number of pairs scales quadratically with the number of minima. The filter defined in step 4 was physically motivated by the fact that TLS tend to originate from IS that are not too distant, so that the tunneling probability is reasonable. As such, it is likely that those pairs of IS get explored one after the other during the exploration dynamics in step 2. Overall, given \(N_{IS}\) inherent structures, this procedure selects \(\mathcal{O}(N_{IS})\) pairs to be analyzed.
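To make the counting argument concrete, the following minimal sketch records transitions along a time series of IS labels and compares the number of consecutively explored pairs with the number of all possible pairs. The function names and the toy trajectory are illustrative assumptions, not taken from the original work.

```python
from collections import Counter


def count_transitions(is_sequence):
    """Count directed transitions between consecutively visited, distinct IS labels."""
    counts = Counter()
    for a, b in zip(is_sequence, is_sequence[1:]):
        if a != b:  # staying in the same IS is not a transition
            counts[(a, b)] += 1
    return counts


def consecutive_pairs(counts):
    """Unordered pairs with at least one recorded transition in either direction."""
    return {tuple(sorted(pair)) for pair in counts}


def n_all_pairs(n_is):
    """Number of unordered pairs among n_is inherent structures: N(N-1)/2."""
    return n_is * (n_is - 1) // 2


# Hypothetical time series of IS labels obtained after minimization (step 3).
trajectory = [0, 0, 1, 0, 2, 2, 3, 2, 0]
counts = count_transitions(trajectory)
pairs = consecutive_pairs(counts)
print(len(pairs), "consecutive pairs out of", n_all_pairs(4), "possible pairs")
```

The gap between the two numbers grows linearly with the number of IS, which is why a filter was needed in the standard procedure.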

Once potential candidates are selected, the procedure continues as follows: (5) For each selected pair of IS, look for the minimum energy path and the classical barrier between them by running a minimum energy path-finding algorithm, such as the nudged elastic band (NEB) algorithm51,52,53. This provides the value of the potential energy V(ξ) along the minimum energy path between the pair, where 0 ≤ ξ ≤ 1 is the reaction coordinate. (6) Select pairs whose energy profile V(ξ) has the form of a double well (DW), i.e., exclude paths with multiple wells. (7) Solve the one-dimensional Schrödinger equation:

$$-\frac{\hbar^{2}}{2 m d^{2} \epsilon}\,\partial_{\xi}^{2}\Psi(\xi) + V(\xi)\Psi(\xi) = \mathcal{E}\,\Psi(\xi),$$

where ξ = x/d is a normalized distance along the reaction path and the energy is normalized by the Lennard-Jones energy scale ϵ. The effective mass m and the distance d are calculated as in ref. 35. We obtain the quantum splitting (QS) \(E_{qs} = \mathcal{E}_{2} - \mathcal{E}_{1}\) from the first two energy levels \(\mathcal{E}_{1}\) and \(\mathcal{E}_{2}\). The quantum splitting is the most relevant parameter for TLS because when Eqs ~ T the system can transition from one state to the other via quantum tunneling13. In particular, since we choose to report the data in units that correspond to Argon35, a double well excitation will be an active TLS at T = 1 K when Eqs < 0.0015ϵ, where ϵ sets the energy scale of the pair interactions in the simulated model.
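As an illustration of step 7, the sketch below diagonalizes a finite-difference version of the dimensionless Schrödinger equation above and returns the splitting between the two lowest levels. It is not the solver used in ref. 35: the hard-wall boundary conditions, the uniform grid, and the names `lowest_levels`, `quantum_splitting`, and `prefactor` (standing for \(\hbar^{2}/(2 m d^{2} \epsilon)\)) are assumptions made for this example. In practice the NEB profile would have to be extended beyond the two minima before being passed in.

```python
import numpy as np

TLS_THRESHOLD = 0.0015  # in units of epsilon, the value quoted in the text


def lowest_levels(v_on_grid, xi, prefactor, n_levels=2):
    """Lowest eigenvalues of -prefactor * d^2/dxi^2 + V(xi).

    Uses second-order finite differences on a uniform grid, with hard walls
    one grid spacing outside the first and last grid points (an assumption).
    """
    h = xi[1] - xi[0]
    n = len(xi)
    main = 2.0 * prefactor / h**2 + np.asarray(v_on_grid, dtype=float)
    off = -prefactor / h**2 * np.ones(n - 1)
    hamiltonian = np.diag(main) + np.diag(off, 1) + np.diag(off, -1)
    return np.linalg.eigvalsh(hamiltonian)[:n_levels]


def quantum_splitting(v_on_grid, xi, prefactor):
    """Energy difference between the two lowest levels."""
    e1, e2 = lowest_levels(v_on_grid, xi, prefactor)
    return e2 - e1


# Sanity check on a case with a known answer: a flat potential in a box of
# unit width has levels prefactor * (k * pi)^2, so the splitting is 3 * pi^2.
n = 400
xi = np.linspace(0.0, 1.0, n + 2)[1:-1]  # interior points, walls at 0 and 1
flat_splitting = quantum_splitting(np.zeros(n), xi, prefactor=1.0)
print(flat_splitting, 3 * np.pi**2)
```

A pair would then be labeled an active TLS when the splitting computed from its double-well profile falls below `TLS_THRESHOLD`.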

Overall, since at low temperature the landscape exploration dynamics is slow, one would like to spend most of the computational time on steps 2–3 to construct a large library of pairs of IS. A first problem is that when the library of IS grows larger it takes a lot of time to perform steps 5–7. Moreover, the main bottleneck lies in the fact that most of the pairs that go through the full procedure turn out not to be TLS. The large computational time dedicated to steps 5–7 is thus wasted. Furthermore, many pairs of IS can be close but not sampled consecutively during the dynamics, owing to the high-dimensional nature of the potential energy landscape.

We now introduce our machine learning (ML) approach to the problem, whose main advantage is to consider all pairs of IS as TLS candidates. As shown below, our approach can detect TLS which are otherwise excluded in step 4. We distinguish two phases: training and deployment. Our supervised training approach, detailed in the Methods section and sketched in Fig. 2, takes just a few hours of training on a single CPU. It requires an initial dataset of \(\mathcal{O}(10^{4})\) full NEB calculations, whose collection is the most time-consuming part of the training phase. Once training is complete, the ML model can be deployed to identify new TLS.

Fig. 2: Flowchart of the machine learning model.

The dataset is constructed by comparing all the pairs of inherent structures (IS), focusing on the M atoms that displace the most between two IS (highlighted by colors: bright, resp. dark, indicate small, resp. large particle radius). Specific features are extracted to construct the input vector X. We then train a classifier to predict whether a pair of inherent structures forms a double well potential (DW) or not. The DW is finally processed using a multi-layer stacking strategy to predict the quantum splitting of the double well potential. Our pipeline analyses a given pair of IS in \(\sim 10^{-4}\) s.

Its workflow is similar to the standard one, with some major improvements. It proceeds with the following steps: (1)–(3) The first 3 steps are similar to the standard procedure to obtain a collection of inherent structures from a dynamical exploration. (4) Apply the ML model to all possible pairs of IS to predict which pairs form a DW potential. (5) Apply the ML model to predict the quantum splitting (QS) for all predicted DW and filter out the configurations that are not predicted to be TLS by the ML model. (6)–(8) For the pairs predicted to be TLS by the ML model only, run the NEB algorithm and select the pairs that form a DW potential. Solve the one-dimensional Schrödinger equation in order to obtain the exact value of the quantum splitting.
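Steps 4 and 5 of this workflow can be sketched as a two-stage filter over all IS pairs. The callables `is_double_well` and `predict_qs` stand in for the trained classifier and regressor; their names and signatures, as well as the toy models used to exercise the function, are hypothetical.

```python
from itertools import combinations


def select_tls_candidates(n_is, is_double_well, predict_qs, qs_cutoff=0.0015):
    """Return (alpha, beta, predicted_qs) for pairs predicted to be TLS.

    Every unordered pair alpha < beta is examined: first by the DW classifier
    (step 4), then by the quantum-splitting predictor (step 5). Only the
    survivors would be passed on to the NEB calculation (steps 6-8).
    """
    candidates = []
    for alpha, beta in combinations(range(n_is), 2):
        if not is_double_well(alpha, beta):
            continue
        qs = predict_qs(alpha, beta)
        if qs < qs_cutoff:
            candidates.append((alpha, beta, qs))
    return candidates


# Toy stand-ins for the trained models, used only to exercise the function.
def toy_classifier(alpha, beta):
    return beta - alpha == 1


def toy_regressor(alpha, beta):
    return 0.001 * (alpha + 1)


result = select_tls_candidates(4, toy_classifier, toy_regressor)
print(result)
```

Because each model call is cheap compared with an NEB calculation, looping over all pairs remains affordable even though their number grows quadratically.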

In the Methods section, we provide details on how steps 1–3 are performed: glass preparation, exploration of the potential energy landscape via molecular dynamics simulations and minimization procedure, as well as NEB computation. We also explain how it is possible to use steps 4-5 as a single shot or as an iterative training approach, see Methods Sec. IV H.

Importantly, the well-trained ML model has two significant advantages over the standard approach. First, \(\mathcal{O}(N_{IS}^{2})\) pairs of IS are scanned to identify TLS, compared to a much smaller number \(\mathcal{O}(N_{IS})\) in the standard procedure. Second, if a pair of IS passes step 5 and goes through the full procedure, it is very likely to be a real TLS. As a consequence, by using the ML approach one can spend more time on steps 2–3 to produce new IS, since fewer pairs pass step 5. At the same time, for any given number of IS, the ML approach can analyze all possible pairs and is therefore able to identify many more TLS, as we demonstrate below.

Quality of the machine learning prediction

In refs. 35,36, the authors analyze a library of 14,202, 23,535, and 117,370 pairs of inherent structures for a continuously polydisperse system of soft repulsive particles, equilibrated at reduced temperatures Tf = 0.062, 0.07, and 0.092, respectively. The standard approach described in Sec. II A leads to the identification of 61, 291, and 1008 TLS for the three temperatures, respectively. Note that this approach uses pairs of IS that are selected by the dynamical information contained in the transition matrix between pairs of IS35. This was done to filter out all non-DW potentials. For all pairs in this small subset, the quantum splitting was then calculated.

Instead, the ML approach starts by independently evaluating the relevant information contained in each IS and constructs all possible combinations, even for pairs that are not dynamically connected in the landscape exploration. Following the steps discussed in Sec. II A the model is then able to predict which of all pairs form a DW, as well as the value of their quantum splitting, very accurately. From a quantitative perspective, this means that the same dynamical trajectories now contain many more TLS candidates in the ML approach compared to the standard approach.

In this section we briefly describe the flowchart of the model summarized in Fig. 2. A detailed description of the machine learning model is provided in the Methods section. First, for all the available IS, we evaluate a set of static quantities that we use to construct the input features for each pair of IS. By convention, we label the IS with \(\alpha = 1, \dots, N_{IS}\) by increasing potential energy \(E_{1} < \dots < E_{N_{IS}}\), where Eα is the potential energy of ISα. We use the convention that α < β. The input features for a pair αβ of IS consist of the potential energy difference ΔE = Eβ − Eα between the two minima, the displacements \(\Delta \overrightarrow{r}_{i}\) of the M particles that displace the most between the two IS (labeled with i = 1, …, M by decreasing displacement), the relative positions between those M particles, the total displacement of particles d, and the participation ratio PR, all computed by comparing the two IS. We also use as input the number of transitions recorded in the dynamical exploration, Tαβ (from the lowest energy IS to the highest, in our convention) and Tβα (from highest to lowest IS). See Methods Sec. IV E for more details. We then apply two model ensembles in series. Model ensembling consists of averaging the output of different ML models to achieve better predictions than each of them separately. The first ensemble is trained to classify DW (Methods, Sec. IV F), which is a necessary condition for TLS. A DW is defined when the minimum energy path between the two IS resembles a quartic potential, as sketched in Fig. 1b. For the pairs that are predicted to be DW, a second model ensemble (Methods, Sec. IV G) is used to predict the quantum splitting (QS), which determines whether the pair is a TLS or not.
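The static part of the input vector can be sketched as follows. The precise definitions are given in Methods Sec. IV E, which is not reproduced here, so the formulas used below for the total displacement (root of the summed squared displacements) and for the participation ratio (the usual normalized fourth-moment ratio) are assumptions, as is the neglect of periodic boundary conditions. The function name `pair_features` is hypothetical.

```python
import numpy as np


def pair_features(pos_a, pos_b, e_a, e_b, m=3):
    """Static features of an IS pair from two (N, 3) position arrays.

    Assumptions of this sketch: no periodic wrapping, d = sqrt(sum |dr_i|^2),
    PR = (sum |dr_i|^2)^2 / (N * sum |dr_i|^4), at least one moving particle.
    """
    pos_a = np.asarray(pos_a, dtype=float)
    pos_b = np.asarray(pos_b, dtype=float)
    if e_a > e_b:  # enforce the convention E_alpha < E_beta
        pos_a, pos_b, e_a, e_b = pos_b, pos_a, e_b, e_a

    disp = np.linalg.norm(pos_b - pos_a, axis=1)
    top = np.argsort(disp)[::-1][:m]  # indices of the m largest displacements
    sq = disp**2

    return {
        "delta_e": e_b - e_a,
        "top_displacements": disp[top],
        # distance of the other top particles from the most displaced one
        "relative_distances": np.linalg.norm(pos_a[top[1:]] - pos_a[top[0]], axis=1),
        "total_displacement": float(np.sqrt(sq.sum())),
        "participation_ratio": float(sq.sum() ** 2 / (len(disp) * (sq**2).sum())),
    }


# Toy pair: four particles, two of which move between the minima.
pos_a = np.array([[0.0, 0, 0], [1.0, 0, 0], [0, 2.0, 0], [0, 0, 3.0]])
pos_b = pos_a.copy()
pos_b[1, 0] += 0.4
pos_b[2, 1] += 0.3
features = pair_features(pos_a, pos_b, e_a=-1.0, e_b=-0.9, m=2)
print(features)
```

The dynamical inputs Tαβ and Tβα would be appended to this dictionary from the transition counts recorded during the exploration.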

To showcase the performance of the ML model we report in Fig. 3a–c the exact QS calculated from the NEB procedure, compared with the value predicted by the model. We see that the data concentrate around the diagonal, indicating good correlation between true and predicted values. The Pearson correlation reported in the figure provides a quantitative measure of the correlation. The train/test split is performed by randomly selecting 10% of the pairs to be used only for the evaluation. We have trained three independent models to work at the three different glass preparation temperatures. As explained in the Methods (Sec. IV E), the model needs information about only M of the N particles, namely those that displace the most between the two IS, to achieve the excellent precision demonstrated in Fig. 3. In Supplementary Fig. 2 we show that the optimal value is M = 3, i.e., information on only three particles is needed for the model to identify TLS, confirming the low participation ratio in TLS. Furthermore, the models have been trained using the smallest number of samples, randomly selected from all the IS pairs available, that allows the model to reach its top performance. We have also performed an analysis of the optimal training time. Details on these points are provided in Supplementary Note 1. The performances presented in Fig. 3 are achieved by training the model for ~10 hours of single CPU time, but we also show in Supplementary Note 1 that it is possible to already achieve >90% of this performance by training the ensemble for only 10 minutes.
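The evaluation protocol described above (a random 10% hold-out and a Pearson correlation between true and predicted values) can be sketched with NumPy alone. The helper names and the fixed seed are illustrative assumptions.

```python
import numpy as np


def holdout_split(n_samples, test_fraction=0.1, seed=0):
    """Return (train_indices, test_indices) from a random permutation."""
    rng = np.random.default_rng(seed)
    perm = rng.permutation(n_samples)
    n_test = int(round(test_fraction * n_samples))
    return perm[n_test:], perm[:n_test]


def pearson(x, y):
    """Pearson correlation coefficient between two 1D sequences."""
    return float(np.corrcoef(x, y)[0, 1])


train_idx, test_idx = holdout_split(100)
print(len(train_idx), len(test_idx))
print(pearson([1.0, 2.0, 3.0], [2.0, 4.0, 6.0]))
```

The held-out indices would be used only to compare the predicted quantum splitting with the NEB value, never for fitting.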

Fig. 3: Machine-learning prediction for the quantum splitting and classical energy barrier between pairs of inherent structures (IS).

a–c Quantum splitting and d–f energy barrier predicted by the ML model compared to the exact value. The model was not trained on these IS pairs. Glass stability decreases from left to right: glasses are equilibrated at (a, d) Tf = 0.062, (b, e) Tf = 0.07, and (c, f) Tf = 0.092. The ML model was trained on 7000, 10,000, and 30,000 samples respectively, using the information on the M = 3 particles with the largest displacements. All models were trained for ~10 hours of single CPU time.

The ML approach introduced here is also easily generalizable to target other quantities related to state-to-state transition, such as excitations and higher energy effects. We modified the quantum splitting predictor to instead predict the classical energy barrier between two IS states. If the minimum energy path between two IS forms a DW, we define the classical energy barrier as the energy difference between the saddle point and the lowest energy minimum. In Fig. 3d–f, we report the value of the energy barrier predicted by the ML model compared to the exact value calculated from the NEB procedure. We use the same hyperparameters and features as the quantum splitting predictor. Such a high performance demonstrates that our ML approach can predict other types of transitions between states, associated with distinct kinds of defects.

Capturing elusive TLS with machine learning

We now use ML in order to speed up the TLS search. This highly efficient method allows us to collect a library of TLS of unprecedented size, generated from numerical simulations with the same interaction potential as in ref. 35, see Methods for its definition. First of all, we reprocess the data produced to obtain the results presented in ref. 35 with our new ML method based on iterative training (Methods, Sec. IV H), obtaining new information about the connection between TLS and dynamics. Next, we perform ML-guided exploration to collect as many TLS as possible. This sizable library of TLS allows us to perform for the first time a detailed statistical analysis of TLS and compare their distribution to the distribution of double wells. We perform this analysis for glasses of three different stabilities. Finally, we discuss the microscopic features of TLS not only by looking at their statistics, but also by analyzing what the ML model has learned, and how it expresses its predictions.

Prior to this paper, it was not possible to evaluate all the IS pairs collected in ref. 35. For this reason, the authors introduced a filtering rule based on the assumption that high transition rates during the dynamic landscape exploration are a good indicator that the minimum energy path between two IS forms a double well. Therefore, ref. 35 discarded all pairs αβ of IS such that the number of jumps Tαβ (from low to high energy IS) or Tβα (high to low) during the MD exploration is smaller than four, i.e., \(\min (T_{\alpha \beta }, T_{\beta \alpha }) < 4\). This reduced the number of pairs to 14,202, 21,109, and 117,339 for glasses prepared at Tf = 0.062, 0.07, and 0.092, respectively. In order to have comparable data at the three temperatures, for Tf = 0.092 we only consider a subset of glasses corresponding to 30,920 IS pairs.
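The dynamical filtering rule of ref. 35 quoted above amounts to a threshold on the smaller of the two directed transition counts. A minimal sketch, assuming the counts are stored in a dictionary keyed by ordered label pairs (a hypothetical data layout):

```python
def dynamical_filter(transitions, min_count=4):
    """Keep unordered pairs with min(T_ab, T_ba) >= min_count.

    transitions: dict mapping (alpha, beta) to the number of recorded
    jumps from alpha to beta; missing entries count as zero.
    """
    kept = set()
    for (a, b), t_ab in transitions.items():
        t_ba = transitions.get((b, a), 0)
        if min(t_ab, t_ba) >= min_count:
            kept.add((min(a, b), max(a, b)))
    return kept


# Toy counts: only the pair (0, 1) has at least four jumps in both directions.
toy_transitions = {(0, 1): 5, (1, 0): 4, (0, 2): 10, (2, 0): 1, (1, 2): 7}
kept_pairs = dynamical_filter(toy_transitions)
print(kept_pairs)
```

Pairs with no recorded transition are never even considered by this rule, which is the limitation the ML approach removes.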

The results of the TLS search are summarized in the red columns of Tab. 1. Overall, the standard procedure reaches a rate of TLS found per NEB calculation of \(4 \times 10^{-3}\), \(13 \times 10^{-3}\), and \(8 \times 10^{-3}\) with increasing Tf. We compare these numbers with those obtained with our iterative training procedure applied to the same data, see green columns of Tab. 1. We immediately notice two major improvements. First, the overall number of TLS that we find from the same dataset is more than twice as large. Second, the ratio of TLS per NEB calculation is more than 15 times larger, corresponding to \(62 \times 10^{-3}\), \(211 \times 10^{-3}\), and \(194 \times 10^{-3}\) with increasing Tf.
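The factor of "more than 15" follows directly from the rates quoted in this paragraph, as the short check below shows. Only the numbers stated in the text are used; the variable names are illustrative.

```python
# TLS found per NEB calculation, for Tf = 0.062, 0.07, 0.092 (values from the text).
standard_rate = [4e-3, 13e-3, 8e-3]
ml_rate = [62e-3, 211e-3, 194e-3]

improvement = [ml / std for ml, std in zip(ml_rate, standard_rate)]
for tf, factor in zip([0.062, 0.07, 0.092], improvement):
    print(f"Tf = {tf}: {factor:.1f}x more TLS per NEB calculation")
```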

Table 1 Comparison of the standard procedure with our iterative training approach

We conclude that the iterative ML approach is much more efficient than the standard procedure, and also that TLS do not necessarily have a large dynamical transition rate, since the dynamical-filtering approach discards more than half of them.

Differences between DW and TLS

With our ML-driven exploration of the energy landscape we can focus the numerical effort on DW and favorable TLS candidates, while processing a larger number of exploration trajectories and/or longer ones. This allows us to consider a larger set of independent glasses of the same type as those treated in ref. 35, which is particularly relevant for ultrastable glasses generated at the lowest temperature Tf = 0.062. While in ref. 35 the collection of 61 TLS required more than 14,000 NEB calculations, we are able to identify 872 TLS with 11 rounds of iterative training, using a total of only 5500 NEB calculations in addition to the ~6000 used for pretraining. In the next section we analyze these results to discuss the nature of TLS.

The database of glasses that we analyze with iterative training contains 5 times more IS than in ref. 35, and we find up to 15 times more TLS by running around half of the NEB calculations. We report in Fig. 4 the results from this extended TLS search. In Fig. 4a, we report predicted and true values for the quantum splitting, with a background color coding for the confusion matrix. The threshold is set by the fact that TLS have Eqs < 0.0015ϵ. The number of data points in each quadrant is reported in the inset. The horizontal dashed lines highlight the percentage of true TLS detected by running the NEB algorithm for all points with a predicted quantum splitting below the line. Due to false negatives, it is better to also consider transitions slightly above the TLS threshold. We see that all TLS are identified by considering only the pairs that are predicted to be within twice the quantum splitting threshold of TLS. All TLS are thus safely detected by running 2484 NEB calculations, out of 4147 DW in total. In Fig. 4b, we report the cumulative density of TLS quantum splitting n(Eqs), which according to the TLS model scales as \(n(E_{qs}) \sim n_{0} E_{qs}\) at low Eqs12,14. We show indeed that n(Eqs)/Eqs converges to a plateau n0 for small Eqs. We have recorded n0 = 0.67, 4.47 and 25.14 in units of \(\epsilon^{-1}\sigma^{-3}\). This is approximately \(10^{23}\), \(10^{24}\) and \(10^{25}\) eV\(^{-1}\) cm\(^{-3}\) in Ar units. In refs. 35,37, we discuss the comparison between numerical and experimental TLS densities. The ML approach allows us to collect significantly better statistics compared to ref. 35, confirming that the TLS density n0 decreases by several orders of magnitude from hyperquenched to ultrastable glasses. Lastly, in Fig. 4c, we report the histograms of the number of TLS and DW per glass at the three temperatures. We see that when the glasses are ultrastable (Tf = 0.062) most of the glasses have very few TLS. Conversely, poorly annealed (Tf = 0.092) glasses show a very unbalanced distribution, with a few glasses that contain most of the DW and TLS.
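The confusion matrix of Fig. 4a and the effect of relaxing the selection threshold can be sketched as follows. The function names, the `margin` parameter, and the four toy data points are illustrative assumptions; only the threshold 0.0015ϵ and the factor of two come from the text.

```python
import numpy as np


def confusion_counts(true_qs, pred_qs, threshold=0.0015):
    """Confusion matrix for 'is a TLS', i.e. quantum splitting below threshold."""
    true_tls = np.asarray(true_qs) < threshold
    pred_tls = np.asarray(pred_qs) < threshold
    return {
        "tp": int(np.sum(true_tls & pred_tls)),
        "fp": int(np.sum(~true_tls & pred_tls)),
        "fn": int(np.sum(true_tls & ~pred_tls)),
        "tn": int(np.sum(~true_tls & ~pred_tls)),
    }


def recall_with_margin(true_qs, pred_qs, margin=2.0, threshold=0.0015):
    """Fraction of true TLS recovered when NEB is run on every pair whose
    predicted splitting is below margin * threshold, and the NEB count."""
    true_tls = np.asarray(true_qs) < threshold
    selected = np.asarray(pred_qs) < margin * threshold
    n_true = int(true_tls.sum())
    recall = float((true_tls & selected).sum() / n_true) if n_true else float("nan")
    return recall, int(selected.sum())


# Toy data in units of epsilon: one TLS is a false negative at the strict threshold.
true_qs = [0.0010, 0.0012, 0.0020, 0.0100]
pred_qs = [0.0009, 0.0020, 0.0010, 0.0200]
counts = confusion_counts(true_qs, pred_qs)
recall, n_neb = recall_with_margin(true_qs, pred_qs, margin=2.0)
print(counts, recall, n_neb)
```

In the toy example the false negative is recovered once the margin is applied, at the price of one extra NEB calculation, which mirrors the trade-off described above.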

Fig. 4: The machine-learning-driven exploration identifies an unprecedentedly large number of two-level systems.

a We compare the model predictions to the calculated values at the end of our iterative training, for stable glasses Tf = 0.062. The confusion matrix is color-coded in the background and reported on the bottom right of a. Horizontal dashed lines report the percentage of TLS that is predicted below that value of quantum splitting, showing that >95% of the TLS are within twice the TLS threshold of 0.0015ϵ. b Cumulative distribution of quantum splitting n(Eqs) divided by Eqs. c Histograms of the number of TLS and DW per glass, at the three preparation temperatures Tf = 0.092, 0.07, and 0.062 from top to bottom. We have considered 5, 30, and 237 glasses, respectively. Results are reported for Ar units.

Interpretation of the ML model

The ML model contains precious information about the distinctive structure of TLS. First, the present and previous works7,8,10,11,35,37,38 find that the density of TLS decreases upon increasing glass stability, which in our simulations is controlled by the preparation temperature Tf. Thus, one may also expect temperature-dependent TLS features. In the Supplementary Note 4 we show that when the ML model is trained on glasses prepared at Ttrain and deployed on glasses prepared at Tprediction ≠ Ttrain there is only a minor performance drop and the model is able to perform reasonably well. This implies that the model captures distinctive signatures of TLS that do not depend strongly on the preparation temperature. Yet, we also show in the Supplementary Note 4 that it is very easy to train another ML model to predict the temperature itself and eventually add it to the pipeline.

Overall, the ML model is not only able to capture the different microscopic features of TLS, but it can also suggest what the specific influence of each feature is. To interpret this information we calculate Shapley values54 for each input feature and report them in Fig. 5 for the quantum splitting predictor (a) and the double well classifier (b). The features are ranked from the most important (top) to the least important (bottom). We first discuss the quantum splitting predictor, Fig. 5a. The horizontal axis reports the impact of each feature (SHAP value) on the model output: large positive SHAP values predict on average a high value of the quantum splitting (QS). The data points are colored following the value of the feature itself. The most important feature is the classical energy splitting ΔE, corresponding to the potential energy difference between the two IS. In our model, a large value of classical splitting ΔE (red) implies a large QS, i.e., non-TLS transitions. The second most important feature is the largest single particle displacement \(\Delta \overrightarrow{r}_{1}\), which has to be larger than a threshold corresponding to 0.3σ in order to predict a TLS (low SHAP, hence low QS). The total displacement d is the third most important and shows a similar effect. All the remaining features have a less clear and much smaller effect on the model prediction, and they only contribute collectively to the final QS prediction. Details on the feature definitions are provided in the Methods. In Supplementary Note 2 we show that it is possible to obtain very good performance even when removing some of the features with the largest Shapley values, which means that the ML interpretation is not unique.
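To clarify what a Shapley value measures, the sketch below computes exact Shapley values by enumerating all feature subsets for a toy model. This is not the SHAP implementation of ref. 54: here "absent" features are simply replaced by a single baseline value, which is an assumption of the example, and the brute-force enumeration is only practical for a handful of features. For a linear model the result reduces to the weight times the deviation from the baseline, which serves as a check.

```python
from itertools import combinations
from math import factorial


def shapley_values(model, x, baseline):
    """Exact Shapley values of model(x) relative to model(baseline).

    model: callable taking a list of feature values and returning a float.
    Features outside the coalition are set to their baseline value.
    """
    n = len(x)

    def value(coalition):
        mixed = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return model(mixed)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                coalition = set(subset)
                phi[i] += weight * (value(coalition | {i}) - value(coalition))
    return phi


# Toy linear model with three features (hypothetical weights).
def toy_model(features):
    return 2.0 * features[0] - 3.0 * features[1] + 0.5 * features[2]


x = [1.0, 2.0, 4.0]
baseline = [0.0, 0.0, 0.0]
phi = shapley_values(toy_model, x, baseline)
print(phi)
```

The values sum to the difference between the prediction and the baseline prediction, which is the property that makes them interpretable as per-feature contributions.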

Fig. 5: Determining important features to predict quantum splittings and classify double well potentials.

Importance of the different features for the ML models a predicting the quantum splitting (QS), and b classifying double wells, both at Tf = 0.062. The features include the single particle displacements \(\Delta {\overrightarrow{r}}_{i}\) (which decrease with increasing label i), their relative position compared to the most displacing particle \(|{\overrightarrow{r}}_{1}-{\overrightarrow{r}}_{i}|\) and the number of recorded transitions from the lowest to highest energy IS in the dynamic exploration Tαβ (and Tβα from high to low-energy IS). The features are ordered from top to bottom by decreasing importance. Each point corresponds to a single IS pair, with a color coding for the feature value (red: high, green: low). The points are spread vertically for readability. The feature impact on the model output is given on the horizontal axis: large SHAP values predict large QS values, low SHAP values predict low QS (more likely to be a TLS).

According to this Shapley analysis we explain the ML prediction in the following way: the energy difference ΔE between two IS is the main predictor for the quantum splitting, and it has to be small for TLS. Then, the largest particle displacement \(\Delta {\overrightarrow{r}}_{1}\) is necessary to understand if the two IS are similar and what is their stability (we show in the Supplementary Note 2 that \(\Delta {\overrightarrow{r}}_{1}\) is the most important feature to identify the glass stability). Then the total displacement d complements this information and gives local information about the displacements of the other particles. Lastly, all the other inputs provide fine tuning to refine the final prediction and are discussed in more detail in the Supplementary Note 2. Interestingly, in the Supplementary Note 2 we also show that even without the two most important features, the ML approach can still identify TLS candidates reasonably well.

Microscopic features of TLS

We have shown that by following an ML-driven approach it is possible to collect a significant library of TLS for any preparation temperature. However, it may be useful to discuss alternative strategies to rapidly identify TLS. In general, since TLS are extremely rare defects7,8,10,11,38, a filtering rule is necessary in order to reduce the number of possible candidates. In particular, refs. 35,36,37 proposed to use the number of transitions recorded between two IS during the MD exploration, and to exclude pairs that are not explored consecutively. This is based on the assumption that DW (and consequently TLS) correspond to IS pairs that are close to each other and should thus be visited consecutively in the dynamics (non-zero number of recorded transitions).

Instead, we show in Fig. 6 that a filter based on dynamical information only is a poor predictor. In Fig. 6a, we report the distribution of the number of dynamical transitions recorded between two inherent structures α and β, Tαβ + Tβα (both from low to high and high to low-energy IS). We report three curves, for TLS, DW, and all pairs, measured at Tf = 0.062. While the slowly decaying tail of TLS and DW suggests that they often exhibit a large transition rate, most TLS and DW are actually formed by IS with no recorded transitions between them in the dynamic exploration.

Fig. 6: Microscopic features of two-level systems and double-well potentials in ultrastable glasses.

Probability distributions of a number of recorded transitions between the two inherent structures Tαβ + Tβα, b classical energy splitting ΔE, c largest particle displacement \(\Delta \overrightarrow{r}_{1}\), d total displacement d and e WKB off-diagonal element Δ0. We color-coded in red the regions of parameters where we do not expect to find TLS, which instead concentrate in the green regions. The points in the yellow regions could be any of TLS, DW, or non-DW. The green regions could serve as an alternative way to rapidly identify TLS. Data for stable glasses created at Tf = 0.062.

Our interpretation is that even though the transition from one IS to the other is favorable, the landscape has such a large dimensionality (3N) that even very favorable transitions may never take place in a finite exploration time. This issue can become more severe when the trajectories are shorter, for example if the exploration is performed in parallel. We confirm this observation with the results reported in Tab. 1, where we have used our iterative training approach to re-analyze the data of ref. 35, including pairs with no recorded transition, and found many more TLS.

We conclude that even though the number of recorded transitions is the most important feature to predict which pair forms a double well, as seen in Fig. 5b, a filter based solely on it still misses many pairs of interest and is therefore not the most efficient.

In Fig. 6b, we focus on the distribution of the classical splitting ΔE, i.e., the energy difference between the two IS. When ΔE is large, the transition path between the IS rarely forms a DW or a TLS (red region). On the other hand, the many pairs with a very small ΔE are not necessarily more likely to be DW or TLS, hence the yellow region (could be any of TLS, DW, or non-DW). Ultimately we find a ‘sweet spot’ (green region), where TLS are more frequent. The ML model also captures this feature, as seen from the SHAP parameter of ΔE in Fig. 5a. The next most important feature according to the ML model is the largest particle displacement \(\Delta {\overrightarrow{r}}_{1}\), reported in Fig. 6c. When it is larger than ~0.8σ we rarely find TLS and DW, and we also do not find them when \(\Delta {\overrightarrow{r}}_{1} < 0.3\sigma\). The second row in Fig. 5a confirms that the ML model has discovered this feature. In Fig. 6d, we report the total displacement d. If d > 0.9σ the two IS are so different that the pair is not likely to be a TLS or DW, while this probability increases for smaller d. In Fig. 6e, we report the distribution of off-diagonal elements Δ0, measured using the WKB approximation as explained in ref. 35. We find that the distribution obtained from TLS and DW scales as 1/Δ0, in good agreement with the standard TLS model12.

Finally, if one is interested in identifying TLS in a ‘quick and dirty’ way, we propose to use the number of recorded transitions to filter DW from non-DW, and then to select a sweet spot for the classical energy splitting and the displacements for selecting optimal TLS candidates.
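The ‘quick and dirty’ selection described above can be sketched as a simple boolean mask. This is a minimal illustration, not the procedure used to produce the results: the function name and arguments are hypothetical, the displacement bounds (0.3σ to 0.8σ and d < 0.9σ) are read off the discussion of Fig. 6c, d, and the sweet-spot window for ΔE is not quantified in the text, so it must be supplied by the user.

```python
import numpy as np

def quick_tls_candidates(T_ab, T_ba, dE, dr1, d, dE_window,
                         dr1_window=(0.3, 0.8), d_max=0.9):
    """Boolean mask of IS pairs passing the heuristic filter.

    T_ab, T_ba : recorded transitions between the two IS (both directions)
    dE         : classical energy splitting of each pair
    dr1        : largest single-particle displacement (units of sigma)
    d          : total displacement (units of sigma)
    dE_window  : (low, high) sweet spot for dE; hypothetical, user supplied
    """
    T_ab, T_ba, dE, dr1, d = (np.asarray(a, dtype=float)
                              for a in (T_ab, T_ba, dE, dr1, d))
    # Step 1: pairs with at least one recorded transition are likely DW.
    likely_dw = (T_ab + T_ba) > 0
    # Step 2: keep the sweet spots for energy splitting and displacements.
    in_dE = (dE >= dE_window[0]) & (dE <= dE_window[1])
    in_dr1 = (dr1 >= dr1_window[0]) & (dr1 <= dr1_window[1])
    return likely_dw & in_dE & in_dr1 & (d < d_max)
```

As stressed above, the first step discards every pair with no recorded transition, so a filter of this kind trades completeness for speed.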


In this paper we have introduced a machine-learning approach to explore complex energy landscapes, with the goal of efficiently locating double wells (DW) and two-level systems (TLS). We demonstrate that it is possible to use ML to rapidly estimate the quantum splitting of a pair of inherent structures (IS) and accurately predict if a DW is a TLS or not. We also show that our ML approach can be used to predict very accurately the energy barrier between pairs of IS, which would be useful to analyze supercooled liquid dynamics, or for the response to a mechanical deformation. Overall, this approach allows us to collect a library of defects of unprecedented size that would be impossible to obtain without the support of ML.

The ML model uses as input the information calculated from a pair of inherent structures. After just a few minutes of supervised training it is able to infer with high accuracy the quantum splitting of any new pair of inherent structures. We establish that our ML model based on model ensembling and gradient boosting is fast and precise. Its efficiency allows us to introduce an iterative training procedure, where we perform a small batch of predictions and then retrain the model.

After performing statistical analysis over the unprecedented number of TLS collected with our method, we have discovered that many DW and TLS are not consecutively visited during the dynamical exploration. We reanalyzed the data collected for the study of ref. 35 and found that more than half of the TLS were missed, because the corresponding IS were not visited consecutively and the pair was consequently discarded. Our ML approach not only finds more than twice the number of TLS from the same data, but it also requires significantly fewer calculations. We conclude that ML significantly improves the existing approaches. The ML method allows us to propose a ‘quick and dirty’ way to predict TLS: a) use Tαβ, Tβα for predicting DW; b) for those which are DW, use the classical energy splitting between the two IS to predict which are TLS.

We also discuss the microscopic nature of DW and TLS. We perform a Shapley analysis to dissect the ML model and understand what it learns, and we compare this with the extended statistics of TLS that we are able to collect. We find that the quantum splitting is mostly related to the classical energy splitting and the displacements of the particles. Overall, the Shapley analysis suggests that TLS are characterized by one particle that displaces between 0.3 and 0.9 of its size, while the total displacement and the energy difference between the two states remain small. The local structure around the particle is not as important, nor is the number of times we actually see this transition during the exploration dynamics.

Lastly we investigate the effect that glass stability (equivalent to the preparation temperature in our simulations) has on double wells and TLS. The ML model learns that at higher temperatures all the pairs are characterized by more collective rearrangements, but TLS are similar for any preparation temperature.

Ultimately, since our ML approach is extremely efficient in exploring the energy landscape and is easy to generalize to target any type of state-to-state transition (as we show for the energy barriers), we hope that our method will be used in the future to analyze not only TLS, but also many other examples of phenomena related to specific transitions between states in complex classical and quantum settings.


Glass-forming model

We study a three-dimensional polydisperse mixture of N = 1500 particles of equal mass m. The particle diameters σi are drawn from the normalized distribution \(P(0.73 < \sigma < 1.62)\propto 1/{\sigma }^{3}\). Two particles i and j separated by a distance rij interact via the repulsive pair potential

$$v({r}_{ij})/\epsilon={({\sigma }_{ij}/{r}_{ij})}^{12}+{c}_{0}+{c}_{2}{({r}_{ij}/{\sigma }_{ij})}^{2}+{c}_{4}{({r}_{ij}/{\sigma }_{ij})}^{4},$$

only if rij ≤ 1.25σij, with the non-additive interaction σij = 0.5(σi + σj)(1 − 0.2∣σi − σj∣). The polynomial coefficients c0, c2, c4 ensure continuity of v and its first two derivatives at the interaction cutoff. We study the system at number density ρ = 1 in a cubic box with periodic boundary conditions. We express energies and lengths in units of ϵ and the average diameter σ, respectively. Times measured during molecular dynamics (MD) simulations are expressed in units of \(\sqrt{m{\sigma }^{2}/\epsilon }\). We make two choices for physical units following past work35: one corresponds to Ar atoms55, with m = 6 × \(10^{-26}\) kg, ϵ/kB = 478 K, σ = 3.41 × \(10^{-10}\) m and τ = 1.08 × \(10^{-12}\) s; the other is for a NiP alloy56, with m = 1.02 × \(10^{-25}\) kg (62Ni isotope), ϵ/kB = 4447 K, σ = 2.21 × \(10^{-10}\) m and τ = 2.86 × \(10^{-13}\) s.
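For illustration, the coefficients c0, c2, c4 follow from requiring that v and its first two derivatives vanish at the cutoff x_c = r/σij = 1.25, which gives c0 = −28/x_c^12, c2 = 48/x_c^14 and c4 = −21/x_c^16. The sketch below evaluates the pair potential with these values; the function name is hypothetical and σij is taken as an input, so the mixing rule is left to the caller.

```python
import numpy as np

X_CUT = 1.25  # cutoff in units of sigma_ij, as stated in the text

# Solving v = v' = v'' = 0 at x = X_CUT for the three coefficients:
C0 = -28.0 / X_CUT**12
C2 = 48.0 / X_CUT**14
C4 = -21.0 / X_CUT**16

def pair_potential(r, sigma_ij, eps=1.0):
    """Repulsive pair potential v(r), zero beyond r = 1.25 * sigma_ij."""
    x = np.asarray(r, dtype=float) / sigma_ij
    v = x**-12 + C0 + C2 * x**2 + C4 * x**4
    return eps * np.where(x <= X_CUT, v, 0.0)
```

A quick check is that `pair_potential(1.25 * s, s)` vanishes to machine precision for any `s`, and that a finite-difference derivative just inside the cutoff is also negligible.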

Glass sample preparation

We fully equilibrate ng = 5, 50, 200 independent configurations of the liquid at preparation temperatures Tf = 0.092, 0.07, 0.062, respectively. We do so employing the hybrid MD/particle-swap Monte Carlo algorithm described in ref. 24. The algorithm alternates between blocks of 5N attempts of particle-swap Monte Carlo moves and short MD runs of duration tMD = 0.1 to thermalize the liquid efficiently. Glassy samples are then created by rapidly cooling the equilibrium configurations to T = 0.04 using regular MD with a Berendsen thermostat57. The preparation temperature Tf is thus Tool’s fictive temperature58 and characterizes the degree of stability of the glass: glasses prepared at a lower Tf are more stable. We compare these Tf with characteristic temperature scales. The mode-coupling crossover temperature is Td = 0.1, and the extrapolated experimental glass transition temperature, where the relaxation time is 12 decades larger than at the onset of glassy dynamics, is Tg = 0.067 (ref. 24). The lower Tf = 0.062 glasses are ultrastable, while the higher Tf = 0.092 glasses are hyperquenched.

Energy landscape exploration and transition matrix Tαβ

We use classical molecular dynamics (MD) to explore the potential energy landscape of the glasses. We run MD simulations at T = 0.04 in the NVE ensemble with a time step dt = 0.01. Configurations along the MD trajectory are quenched to the closest potential energy minimum, or inherent structure (IS), every τquench = 0.2, 0.1, 0.05 (for glasses prepared at Tf = 0.062, 0.07, 0.092, respectively) using the conjugate gradient method. The quench period τquench is chosen such that two consecutive quenches typically reach different IS, separated by one energy barrier. For each of the ng glass samples we perform 100, 100, 200 MD trajectories starting from different initial velocities, each lasting 40000, 100000, 10000 time steps (low to high Tf). For Tf = 0.092 we used a subset of the data obtained in ref. 35.

The transition matrix elements Tαβ count how many times ISα and ISβ are visited consecutively, i.e., ISα is reached at time t, and ISβ at time t + τquench. Overall, Tαβ is a number that counts the number of transitions observed from ISα to ISβ, with no physical dimensions. Since this quantity depends on the specific trajectories collected, it varies for different quenching rates and simulation times. Here, we demonstrate that one advantage of our ML approach over a brute-force approach, as employed in ref. 35, is to consider all IS pairs, not only those visited consecutively in the MD trajectory (characterized by Tαβ > 0), as potential TLS. This expands massively the pool of candidates, while ensuring that computational effort is targeted to IS pairs that are likely to be TLS.
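As a minimal sketch of this counting, assume each trajectory has already been reduced to the ordered list of IS labels reached by successive quenches (the labels and the function name below are hypothetical). Consecutive quenches that land in the same IS are skipped here, since they do not correspond to a transition; that choice is an assumption of this sketch.

```python
from collections import Counter

def transition_counts(is_sequence):
    """Count consecutive visits (alpha at t, beta at t + tau_quench).

    is_sequence : ordered IS labels from successive quenches of one trajectory.
    Returns a Counter mapping (alpha, beta) -> T_alpha_beta.
    """
    counts = Counter()
    for alpha, beta in zip(is_sequence, is_sequence[1:]):
        if alpha != beta:  # same minimum twice in a row is not a transition
            counts[(alpha, beta)] += 1
    return counts
```

Counters from different trajectories can be added together with `+`, and a pair never visited consecutively simply returns zero, which is exactly the class of pairs that a filter based on Tαβ > 0 would discard.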

To analyze the transition between two IS we compute the multi-dimensional minimum energy path separating them. This is done by the nudged elastic band (NEB) method51,52 implemented in the LAMMPS package. We interpolate the minimum energy path with 40 images connected by springs of constant κ = 1.0 \(\epsilon {\sigma }^{-2}\), and use the FIRE algorithm in the minimization procedure51,53. The NEB algorithm outputs a one-dimensional potential energy profile V(ξ) defined for the reduced coordinate ξ, between the two minima.

Quantum splitting computation

We extrapolate the potential obtained from the NEB, defined only between the two minima, to obtain a full double well potential. We use a linear extrapolation of the NEB reaction path. Let us denote by r1 and r2 the coordinates of the particles in the first two images of the system along the reaction path (r1 is an energy minimum). We extrapolate the potential V starting from r1 and measuring the potential energy of the configuration while moving in the direction r1 − r2, i.e., away from the second image. We perform a similar extrapolation at the other minimum.

Once the classical potential V(ξ) is obtained by extrapolation as discussed above, we solve numerically the Schrödinger Eq. (1) using a finite difference method. In general, the Laplacian term should take into account curvature effects along the reaction coordinates, as discussed in ref. 35. For simplicity, we neglect these effects and use the standard second derivative along the reaction coordinate.
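A minimal sketch of such a finite-difference solution is given below, assuming a uniform grid in ξ, wavefunctions that vanish at the grid edges, and the standard second-derivative form of the kinetic term. The function names and the default prefactor ħ²/2m = 0.5 (i.e., ħ = m = 1) are illustrative assumptions; in practice the prefactor follows from the chosen physical units.

```python
import numpy as np

def lowest_two_levels(V, xi, hbar2_over_2m=0.5):
    """Two lowest eigenvalues of -(hbar^2/2m) d^2/dxi^2 + V(xi).

    V  : potential sampled on the uniform grid xi
    xi : uniformly spaced reaction coordinate
    """
    V = np.asarray(V, dtype=float)
    xi = np.asarray(xi, dtype=float)
    h = xi[1] - xi[0]
    k = hbar2_over_2m / h**2
    off = -k * np.ones(len(xi) - 1)
    # Three-point stencil: tridiagonal Hamiltonian with Dirichlet boundaries.
    H = np.diag(V + 2.0 * k) + np.diag(off, 1) + np.diag(off, -1)
    E = np.linalg.eigvalsh(H)  # eigenvalues in ascending order
    return E[0], E[1]

def quantum_splitting(V, xi, hbar2_over_2m=0.5):
    """Energy difference between the first two levels."""
    E0, E1 = lowest_two_levels(V, xi, hbar2_over_2m)
    return E1 - E0
```

A convenient sanity check is the harmonic potential V = ξ²/2 with ħ = m = 1, for which the two lowest levels should approach 0.5 and 1.5 as the grid is refined.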

Dataset and features construction for ML approach

The first step of our machine learning approach is the evaluation of a set of static quantities for all the available IS. This set consists of: potential energy, particle positions, averaged bond-orientational order parameters59 determined via a Voronoi tessellation, that we denote {q2, q3, …, q12}, and finally particle radii. The cost of this operation scales as the number of available states NIS, but we use these quantities to calculate the features of \(\sim {N}_{IS}^{2}\) pairs. A detailed analysis reported in Supplementary Note 1 shows that the bond-orientational parameters and the particle sizes are not very useful for the ML model. Since their calculation is slower than all the other features, we do not include them in the final version of the ML approach.

To construct the input features for each pair of IS we combine the information of the two states evaluating the following: (1) Energy splitting ΔE: energy difference between the two IS. (2) Displacements \(\Delta {\overrightarrow{r}}_{i}\): displacement vector of particle i between the two configurations. When used in this context index i increases with decreasing displacement \(\Delta {\overrightarrow{r}}_{1}\, > \, \Delta {\overrightarrow{r}}_{2} > ...\). (3) Total displacement d: total distance between the two IS defined as \({d}^{2}={\sum }_{i}|\Delta {\overrightarrow{r}}_{i}{|}^{2}\). (4) Participation ratio PR: defined as \(PR={({d}^{2})}^{2}/({\sum }_{i}|\Delta {\overrightarrow{r}}_{i}{|}^{4})\). (5) Distance from the displacement center \(|{\overrightarrow{r}}_{1}-{\overrightarrow{r}}_{i}|\): we measure the average distance of particle i from the center of displacement \({\overrightarrow{r}}_{1}\), identified as the position of the particle that moves the most. This quantity identifies the typical size of the region of particles that rearrange. (6) Transition matrix Tαβ (resp. Tβα): number of times the dynamics explores consecutively the lowest (resp. highest) then the highest (resp. lowest) energy minimum.
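The geometric features listed above can be sketched as follows for a single pair of IS. This is a minimal illustration under stated assumptions: a cubic periodic box of side `box` with minimum-image displacements, the absolute value taken for ΔE, and the distances from the displacement center returned for the `m_top` most displaced particles. The function and key names are hypothetical, and the pair is assumed to have non-zero displacement.

```python
import numpy as np

def pair_features(pos_a, pos_b, e_a, e_b, box, m_top=3):
    """Features of an IS pair from positions (N, 3) and energies."""
    pos_a = np.asarray(pos_a, dtype=float)
    delta = np.asarray(pos_b, dtype=float) - pos_a
    delta -= box * np.round(delta / box)       # minimum-image convention
    norms = np.linalg.norm(delta, axis=1)
    order = np.argsort(norms)[::-1]            # index 1 = largest displacement
    top = order[:m_top]

    d2 = np.sum(norms**2)                      # d^2 = sum_i |dr_i|^2
    pr = d2**2 / np.sum(norms**4)              # participation ratio

    # Distance of the top movers from the particle that moves the most.
    sep = pos_a[top] - pos_a[order[0]]
    sep -= box * np.round(sep / box)

    return {
        "dE": abs(e_b - e_a),
        "dr_top": norms[top],
        "d": np.sqrt(d2),
        "PR": pr,
        "dist_from_center": np.linalg.norm(sep, axis=1),
    }
```

With this definition PR equals 1 when a single particle moves and N when all N particles move by the same amount, which is why it serves as a measure of how collective the rearrangement is.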

The crucial step of the feature construction is that we can reduce the number of features by considering only the M particles whose displacement is the largest between pairs of IS. We make this assumption because we expect that the low temperature dynamics is characterized by localized rearrangements involving only a small fraction of the particles7,8,10,11,35,38. In Supplementary Note 1, we confirm this assumption by showing that the ML model achieves optimal performances even when M is very small. So, the choice of M ≪ N makes the ML model computationally effective without any performance drop.

Double well classifier

A necessary condition in order for a pair of IS to be a TLS is that the transition between the pair forms a double well (DW) potential. A DW is defined when the minimum energy path between the two IS resembles a quartic potential, as sketched in Fig. 1b. The final goal of the ML model is to predict the quantum splitting (QS) of the pair and identify pairs with low QS. The first obstacle in the identification of TLS is that DW represent only a small subgroup of all IS pairs. For instance in ref. 35, only ~0.5% of all the IS pairs are DW at the lowest temperature. It is then mandatory to filter out pairs that are not likely to be a DW.

In the machine learning field there are usually many different models that can be trained to achieve similar performances, with complexity ranging from polynomial regression to deep neural networks. Here, we perform model ensembling and use ensembles both for DW classification and QS prediction. Model ensembling consists in averaging the output of different ML models to achieve better predictions compared to each of them separately. In practice, we use the publicly available AutoGluon library60. In this approach, we train in a few minutes a single-stack ensemble that is able to classify DW with >95% accuracy. In the Supplementary Note 1, we justify this choice of ML model and provide details on performances and hyperparameters.

In particular, we get the best results using ensembles of gradient boosting models61,62, which have proven to be the optimal choice in estimating barrier heights of chemical reactions computed with similar methods63. The gradient boosting approach predicts the probability p(yi) = G(xi) that a pair xi is a DW, where yi = 1 if the pair is a DW and 0 otherwise. It is based on a series of m = 1, …, nWL weak learners Wm that minimize the log-loss score

$$-\frac{1}{n}\mathop{\sum }\limits_{i=1}^{n}\left\{{y}_{i}\log \left[p({y}_{i})\right]+(1-{y}_{i})\log \left[1-p({y}_{i})\right]\right\},$$

where n is the number of IS pairs. In this approach, each of the weak learners Wm attempts to improve over the result of its predecessor by predicting the residual hm−1(xi) = yi − Wm−1(xi). The final prediction thus becomes

$$p({y}_{i})=\mathop{\sum }\limits_{m=1}^{{n}_{{{{{{{{\rm{WL}}}}}}}}}}{W}_{m}({x}_{i}|{W}_{l < m}).$$

This approach usually outperforms random forests64, where the prediction is just the average over the weak learners \(p({y}_{i})=\frac{1}{{n}_{{\rm{WL}}}}\mathop{\sum }\nolimits_{m=1}^{{n}_{{\rm{WL}}}}{W}_{m}({x}_{i})\). We then build a stack of gradient boosting models such that the final prediction is given by \(p({y}_{i})=\mathop{\sum }\nolimits_{k=1}^{{n}_{{\rm{GB}}}}{c}_{k}{G}_{k}({x}_{i})/\mathop{\sum }\nolimits_{k=1}^{{n}_{{\rm{GB}}}}{c}_{k}\), which is the weighted average over the nGB models in the ensemble with learnable weights ck. Overall, our DW classifier turns out to be very accurate and rapid, achieving >95% accuracy after only 10 minutes of training. As such, it is convenient to use it to filter out the pairs that do not require the attention of the QS predictor because they cannot be TLS anyway.
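To make the two ingredients of the classifier concrete, the sketch below evaluates the standard binary log-loss and the weighted average of the outputs of several models. It is only an illustration with hypothetical function names: it does not reproduce the AutoGluon internals, and the weights ck are taken as given rather than learned.

```python
import numpy as np

def log_loss(y, p, eps=1e-12):
    """Standard binary log-loss for labels y in {0, 1} and probabilities p."""
    y = np.asarray(y, dtype=float)
    p = np.clip(np.asarray(p, dtype=float), eps, 1.0 - eps)  # avoid log(0)
    return -np.mean(y * np.log(p) + (1.0 - y) * np.log(1.0 - p))

def weighted_ensemble(model_outputs, weights):
    """Weighted average of model outputs.

    model_outputs : array of shape (n_models, n_pairs), one row per model G_k
    weights       : array of shape (n_models,), the weights c_k
    """
    model_outputs = np.asarray(model_outputs, dtype=float)
    weights = np.asarray(weights, dtype=float)
    return weights @ model_outputs / weights.sum()
```

For instance, two models predicting probabilities (0.2, 0.8) and (0.6, 0.4) with weights 1 and 3 combine to (0.5, 0.5), and an uninformative prediction p = 0.5 for every pair gives a log-loss of ln 2.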

Quantum splitting predictor

We want to predict the quantum splitting of a pair of IS for which the features discussed in the section “Dataset and features construction for ML approach” have been computed. We need this prediction to be very precise, because we know that a pair can be considered a TLS when Eqs < 0.0015ϵ, but Eqs can vary significantly so errors may be large. In Supplementary Note 1 we show that models such as deep neural networks and regression are not stable or powerful enough to achieve satisfying results. We thus follow the same strategy introduced for DW classification, by using model ensembling60 and gradient boosting61,62,64,65. Compared to the DW predictor, each model Gk in the ensemble now performs a regression task by predicting Eqs,k = Gk(xi). We then construct a multi-layer stack (schematized in Fig. 2), where the prediction of the first stack \({E}_{qs}^{(0)}=\mathop{\sum }\nolimits_{k=1}^{{n}_{{\rm{GB}}}}{c}_{k}{G}_{k}({x}_{i})/\mathop{\sum }\nolimits_{k=1}^{{n}_{{\rm{GB}}}}{c}_{k}\) is concatenated to xi and used as input for the following stack. At the same time, we also perform k-fold bagging66, which consists in splitting the data in k subsets used to train k copies of each model with different data. This has been shown to be particularly effective in improving the prediction for small datasets60.

In order to train the model we first collect a set of Eqs examples. The size of this training set is discussed in Supplementary Fig. 2, where we find that the minimum number is around \(10^{4}\). We can use some of the data already collected in previous work (ref. 35) for the training. Moreover, since we are interested in estimating with more precision the lowest values of Eqs, we train the model to minimize the following loss function

$${{{{{{{\mathcal{L}}}}}}}}=\frac{\mathop{\sum }\nolimits_{i=1}^{n}{w}_{i}{\left({E}_{qs,{{{{{{{\rm{true}}}}}}}}}-{E}_{qs,{{{{{{{\rm{predicted}}}}}}}}}\right)}^{2}}{n\mathop{\sum }\nolimits_{i=1}^{n}{w}_{i}},$$

which is a weighted mean-squared error. The weights correspond to wi = 1/Eqs,true in order to give more importance to low Eqs values. We thus train our model to provide a very accurate prediction of the value Eqs for any given pair. Once the model is trained it takes only ~\(10^{-4}\) s to predict the QS of a new pair (compared to 1 minute to run the standard procedure). If we predict a value Eqs < 0.0015ϵ, then we have identified a TLS much faster.
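The loss above can be written in a few lines; the sketch below follows the formula as given in the text, including the normalization by n times the sum of the weights. The function name is hypothetical, and the true splittings are assumed to be strictly positive so that the weights are finite.

```python
import numpy as np

def weighted_mse(e_true, e_pred):
    """Weighted mean-squared error with weights w_i = 1 / E_qs,true."""
    e_true = np.asarray(e_true, dtype=float)
    e_pred = np.asarray(e_pred, dtype=float)
    w = 1.0 / e_true                      # emphasize pairs with small E_qs
    n = e_true.size
    return np.sum(w * (e_true - e_pred) ** 2) / (n * np.sum(w))
```

With these weights, the same absolute error is penalized 100 times more for a pair with Eqs,true = 0.001 than for one with Eqs,true = 0.1, which pushes the regressor towards accuracy in the TLS regime.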

Iterative training procedure

We finally introduce an approach to optimally employ our ML model to process new data: the iterative training procedure. To produce the results reported in Fig. 3 we trained the model once using a subset of the already available data. This is a natural way to proceed when the goal is to process new data that are very similar to the training set, and the training set is itself large enough. However, since the goal of the proposed ML model is to ultimately drive the landscape exploration and collect new samples, the single-training approach may encounter two types of problems. First, at the beginning there may not be enough data and, second, the findings of the model do not provide any additional feedback.

To solve both problems we introduce the iterative training procedure. The idea of iterative training is to use the predictive power of ML to create and expand its own training set, consequently enhancing its performance by iteratively retraining over the new data. Compared to standard active learning methods, iterative training does not focus on new samples with the highest model uncertainty, but instead it iterates the predictions on the samples below the threshold of interest. Details on the method and parameters are discussed in Supplementary Note 3. In practice, we start from a training set of K0 ~ \(10^{3}\)–\(10^{4}\) randomly selected pairs to have an initial idea of the relation between input and output. We then use the ML approach outlined in Fig. 2 to predict the Ki = 500 pairs with the lowest QS. For these TLS candidates, we perform the full procedure to calculate the true QS and determine whether the pair is a DW or a TLS. In Supplementary Fig. 6, we report the result of this procedure when we process a new set of trajectories from the same polydisperse soft sphere potential as in ref. 35. In general the first few iterations of iterative training have a poor performance. In fact, we find that >70% of the first Ki pairs are actually non-DW. After collecting additional Ki measurements, we retrain the model. We report in Tab. 2 the average time for each step of the ML procedure. The retraining can be done in ~10 min, after which the model is able to predict the next Ki pairs with lowest QS. Overall, to process NISpairs we estimate that the computational time of the iterative approach is \({t}_{i}=\big[{K}_{0}\cdot 1{0}^{2}+{N}_{iter}\big({K}_{i}\cdot 1{0}^{2}+1{0}^{3}+{N}_{{\rm{ISpairs}}}\cdot 1{0}^{-5}\big)\big]\) s. If NISpairs > \(10^{9}\) it is possible to significantly reduce ti by permanently discarding the worst pairs, but this is not needed here.
We iterate this procedure Niter times, until the last batch of Ki candidates contains less than 1% of the total number of TLS. We believe that continuing this iterative procedure would lead to the identification of even more TLS/DW, but this is out of the scope of this paper.
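The loop described above can be sketched schematically as follows. This is not the implementation used here: the callables `measure_qs` (standing for the expensive NEB plus Schrödinger calculation) and `fit` (standing for the training of the ML ensemble) are hypothetical placeholders supplied by the user, and all names and default values other than Ki = 500 and the threshold 0.0015ϵ are illustrative.

```python
import numpy as np

def iterative_training(X, measure_qs, fit, k0=1000, ki=500,
                       max_iter=20, tls_threshold=0.0015, seed=0):
    """Grow the training set with the pairs predicted to have the lowest QS.

    X          : (n_pairs, n_features) features of all candidate IS pairs
    measure_qs : callable, indices -> true QS (the expensive calculation)
    fit        : callable, (X_train, y_train) -> predict(X) callable
    Returns the indices of all measured pairs and their true QS.
    """
    rng = np.random.default_rng(seed)
    labelled = rng.choice(X.shape[0], size=k0, replace=False)  # random start
    y = np.asarray(measure_qs(labelled), dtype=float)

    for _ in range(max_iter):
        predict = fit(X[labelled], y)              # (re)train on all data so far
        pred = np.array(predict(X), dtype=float)
        pred[labelled] = np.inf                    # never re-select measured pairs
        batch = np.argsort(pred)[:ki]              # lowest predicted QS
        batch = batch[np.isfinite(pred[batch])]
        if batch.size == 0:
            break
        y_new = np.asarray(measure_qs(batch), dtype=float)
        labelled = np.concatenate([labelled, batch])
        y = np.concatenate([y, y_new])
        # Stop when the last batch holds < 1% of all TLS found so far.
        if np.sum(y_new < tls_threshold) < 0.01 * np.sum(y < tls_threshold):
            break
    return labelled, y
```

The design choice that distinguishes this from uncertainty-driven active learning is visible in the selection step: the batch is chosen by lowest predicted QS, not by largest model uncertainty.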

Table 2 Computational time needed to perform our ML approach, on an Intel i9-9980HK CPU