Nanoclusters have drawn much attention due to their special physical and chemical properties1,2 which are distinct from molecules or bulk crystal materials. These properties make them useful in diverse research fields including catalysis3,4,5,6, chemical sensing7, fluorescence8,9, and medicine10. The unique properties of nanoclusters are largely the consequence of distinct size-dependent atomic structures, quantum finite-size effects, and very large surface-to-volume ratios11,12. These properties are generally not smooth functions of cluster sizes and can fluctuate with the addition or removal of a single atom13.

Computational screening is a promising way to identify nanoclusters with desirable properties, but to predict the properties of a nanocluster from first principles it is necessary to first identify the low-energy atomic structures of the cluster. Many optimization methods have been proposed to perform global structure searches for nanoclusters, including the basin hopping method14,15, unbiased random sampling16, particle swarm optimization17,18, simulated annealing19,20 and genetic algorithms (GA)21,22,23,24,25. Each of these methods involves the evaluation of the energies of a large number of candidate structures, which makes it critically important to evaluate structure energies with a method that is both fast and sufficiently accurate to distinguish between competing structures. Density functional theory (DFT) provides a high level of accuracy, but its speed and scalability typically limit the search to nanoclusters of sizes up to only several dozen atoms26,27,28,29. Classical interatomic potentials, which typically have simple functional forms derived from fundamental physics, are several orders of magnitude faster than DFT and have been used to search for ground state structures with up to a few hundred atoms14,30,31. However, classical interatomic potentials often lack the accuracy required to resolve the energy differences between competing candidate structures, especially for the low-lying local minima on the potential energy surface (PES) that are often only tens of meVs apart21,32.

In recent years, an alternative type of interatomic potential has emerged in the form of machine-learned interatomic potentials (MLIPs)33,34,35,36,37, which are parameterized by fitting to a set of training data. Examples of MLIPs are neural network potentials38,39,40, Gaussian approximation potentials (GAP)41,42,43, spectral neighbor analysis potentials44,45, moment tensor potentials (MTP)46,47,48, the atomic cluster expansion (ACE)49, and potentials found through symbolic regression50. Although they may be slower than classical interatomic potentials by an order of magnitude or more, MLIPs are generally more accurate and are still orders of magnitude faster than ab initio calculations51.

One of the challenges in using MLIPs to search for ground state structures is that because the ground state is unknown, it is difficult to ensure that the potential is constructed in a way that will yield accurate ground state energies. To address this challenge we use active learning52,53,54, in which the potential is trained adaptively with new data generated during the search. Recently, similar strategies have also been successfully applied using MLIPs for global optimization of bulk crystalline materials55,56 and nanoclusters57,58. Kolsbjerg et al. used actively-learned neural network potentials to identify the structures for small (up to 13-atom) Pt-based clusters on an MgO (100) support57. Tong et al. identified low-energy structures for B36, B40, and B84 clusters using a GAP potential trained on the fly58. Here we demonstrate that MTP, which have been shown to have a good balance between accuracy and speed for bulk materials51, can be used to accurately and efficiently identify low-energy structures for metal clusters from 21 to 55 atoms. As the properties of small metal clusters can change in a non-smooth way with cluster size, we further investigate the transferability of these potentials to clusters of varying size.

We demonstrate our approach by searching for low-energy structures of aluminum clusters. Aluminum nanoclusters are actively studied for applications such as catalysts for nitrogen dissociation59 and optoelectronics60. They are also important model systems in theoretical chemistry for metallic aromaticity61, magnetism of nanoclusters62 and superatoms63. For 25 of the 35 sizes studied, we have discovered new aluminum cluster structures that are at least 1 meV/atom lower in DFT-calculated energy than the lowest-energy structures we have found in the literature29,30,64,65,66. New lowest-energy structures for an additional two sizes were discovered by the DFT-only GA used for benchmarking. Our approach, described in detail below, provides a template that can be used to significantly accelerate the computational design of atomic clusters, and paves the way for determining atomic structures of large nanoclusters.


Hyperparameter selection for moment tensor potentials

Existing benchmarks of MTP on bulk crystalline structures51,67 give generally good sets of parameters for training reliable MTP, but little information is available on good parameters for training on clusters. To identify a good set of parameters for our calculations we used Al clusters with 24 atoms as a model system and tested various combinations of hyperparameters, including potential complexity (defined by the parameter levmax)48, the amount of training data, and the weight for force components (the “force weight”) relative to the weight for energies. One quarter of the structures were randomly selected for validation, with the rest used to train the potential. Additional details about the construction of this dataset are provided in the “Methods” section and Supplementary Note 1.

For a fixed training set, both energy and force errors decrease steadily as increasingly complex potentials are used (Fig. 1b). However, this gain comes at the expense of an exponential growth in training costs (Fig. 1c). Additional analysis of other combinations of hyperparameters (Supplementary Figs. 4–7) shows trends similar to those in Fig. 1. To balance accuracy and training costs we used levmax = 14 and a force weight that was 1/1000 that of the energy weight for all subsequent active learning genetic algorithm (GA_AL) runs. Using these parameters, we found that force and energy errors plateaued after the training set exceeded about 1000 structures (Fig. 1d).

Fig. 1: Benchmarks of hyperparameters for training MTP potentials.

Training (dashed lines) and validation (solid lines) root-mean-squared errors for both energies and force components are plotted against a force weights relative to the energy weight, b potential complexity levmax, and d the number of structures in the training set. a, b, d share the same legend as shown in a. Training costs of potentials with different relative force weights and complexities are shown in c. Where not specified, the training data contained 3000 clusters, levmax = 14, and the force weight was 1/1000 that of the energy weight.

Prediction of structures for Al clusters with 21–40 atoms

We evaluated our approach by predicting structures of aluminum nanoclusters with 21–40 atoms (Fig. 2). The performance of the GA_AL algorithm was evaluated by comparing it with a GA that used only DFT to calculate energies (GA_DFT), where both algorithms were run for the same amount of computing time. For our initial evaluation, GA_AL was initialized with untrained potentials and a new potential was trained at each cluster size. Details of how we performed the comparison are provided in the “Methods” section.

Fig. 2: Performance benchmarks between GA_AL initialized with untrained potentials and GA_DFT.

a DFT-calculated energy differences between the lowest-energy structures found in GA_AL and GA_DFT. Negative values indicate GA_AL discovered structures with lower energies. b Acceleration ratios of GA_AL relative to GA_DFT. The sizes for which GA_AL discovered better structures are marked with *, suggesting that their acceleration ratios could be larger if GA_DFT were allowed to run for a longer time.

For eight cluster sizes (21, 23, 24, 26, 28, 33, 35 and 36), mostly among the smaller clusters, GA_AL and GA_DFT found essentially the same lowest-energy clusters (Fig. 2a), with similarity scores, a measure of geometrical differences, below 0.3 (see the “Methods” section). For 10 out of the 20 sizes (25, 27, 30–32, 34, 37–40), GA_AL found clusters that were lower in energy than those found by GA_DFT by at least 1 meV/atom, with an average energy difference of −5.06 meV/atom (or −169.64 meV/cluster). For clusters of 22 atoms, GA_AL identified a distinct cluster with a calculated energy within 0.1 meV/atom of the lowest-energy cluster identified by GA_DFT. The lowest-energy 33-atom cluster found by GA_AL is 1.47 meV/atom lower in energy than the one found by GA_DFT, but it is structurally similar based on both the similarity score and visual inspection (see Supplementary Fig. 16). Therefore, it is not counted as a new lowest-energy cluster. There was only one cluster size (29 atoms) for which GA_DFT found a distinct cluster with lower energy than that found by GA_AL. For this size the cluster found by GA_DFT was lower in energy by 3.61 meV/atom (104.69 meV/cluster). On average, the energies of structures found by GA_AL are lower by 2.43 meV/atom (82.27 meV/cluster).

To quantify how much more quickly the GA_AL approach finds low-energy structures, we define the “acceleration ratio” as the ratio of the time it took GA_DFT to find its lowest-energy structure to the time it took GA_AL to find a structure with an energy at least as low. The time spent for GA_AL includes the time spent in the GA search, time required to generate training data and evaluate low-energy clusters using DFT, and time spent retraining the interatomic potential. In the case of size 29, GA_AL failed to discover better or equivalent configurations, so the ratio is set to 0. Among the remaining sizes, the acceleration ratio ranged from 0.19 to 9.12. The average acceleration ratio across all 20 sizes is 2.29 with a median of 1.80 (Fig. 2b). GA_DFT often did not find a cluster with energy as low as that found by GA_AL (Fig. 2b), suggesting that if the acceleration ratios were based on the time required to find the lowest-energy structure from both algorithms they would be larger. Energy evolution plots illustrating the acceleration of GA_AL relative to GA_DFT are provided in Supplementary Fig. 12.
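The definition of the acceleration ratio can be illustrated with a minimal sketch. It assumes each run is recorded as a list of (elapsed time, best DFT-calculated energy so far) pairs; the function names and this data layout are hypothetical and are not taken from our code.

```python
from typing import Optional, Sequence, Tuple

# A run history: (elapsed_time, best_dft_energy_so_far) pairs.
# For GA_AL the elapsed time is assumed to already include GA search,
# DFT evaluations and retraining of the potential.
History = Sequence[Tuple[float, float]]


def time_to_reach(history: History, target_energy: float) -> Optional[float]:
    """Earliest elapsed time at which the run reached target_energy or lower."""
    for elapsed, energy in sorted(history):
        if energy <= target_energy:
            return elapsed
    return None  # the run never reached the target


def acceleration_ratio(dft_history: History, al_history: History) -> float:
    """Time for GA_DFT to find its best energy, divided by the time
    GA_AL needed to find an energy at least as low (0 if it never did)."""
    target = min(energy for _, energy in dft_history)
    t_dft = time_to_reach(dft_history, target)
    t_al = time_to_reach(al_history, target)
    if t_al is None:
        return 0.0
    return t_dft / t_al
```

A ratio above 1 means GA_AL reached the GA_DFT target sooner, and a ratio of 0 marks the sizes for which it never reached the target.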

Size-transferable interatomic potentials for nanoclusters

The results presented in the previous section were obtained by training a new potential at every cluster size, as there is a risk that a potential trained on one cluster size might not work well for clusters of another size because the properties of atomic clusters can change discontinuously with the number of atoms in the cluster. However, using a potential trained at one size to find structures of a different size could significantly speed up the structure search by reducing the total amount of training data that must be generated. In particular, using potentials trained with smaller clusters to predict the structures of larger clusters can have significant performance advantages, as the cost of generating training data using DFT typically scales as approximately the cube of the number of valence electrons in the cluster68.
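As a rough illustration of this scaling, assume that the DFT cost t grows exactly as the cube of the number of valence electrons and that every atom contributes the same number of valence electrons, so that the electron count is proportional to the atom count N. Both are simplifying assumptions; actual timings depend on the code, basis set and convergence behavior.

```latex
\frac{t(N_2)}{t(N_1)} \approx \left(\frac{N_2}{N_1}\right)^{3},
\qquad
\left(\frac{55}{30}\right)^{3} \approx 6.2
```

Under these assumptions, a single DFT evaluation of a 55-atom cluster costs about six times as much as one of a 30-atom cluster.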

We examined how accurately potentials trained on clusters of a range of small sizes are able to predict the energies of clusters with larger sizes. Training data were separated into a group of 3000 clusters with even numbers of atoms (22, 26, 30, 34, and 38) and another group of 3000 clusters with odd numbers of atoms (21, 25, 29, 33, and 37), as DFT calculations indicate that even-sized clusters and odd-sized clusters have distinct ground state magnetic moments (see Supplementary Note 3). A third training set, also of 3000 clusters, was selected from a set containing all sizes listed above, odd and even. The validation sets were composed of about 3000 clusters for each cluster size from 50 to 55 atoms. Details of the construction of the training and validation sets can be found in the “Methods” section.

All three mixed-size potentials predicted energies of the large clusters with validation errors (~10 meV/atom) comparable to training errors (Fig. 3a, b). The errors in the predicted forces (~165 meV/Å) were slightly worse than the fitting errors. These errors are similar to the training and validation errors achieved when all of the training and validation data consisted of clusters of 24 atoms (Fig. 1). The validation errors are below those achieved on silicon42 and boron58 clusters with GAP potentials, but some of this difference is likely due to the different natures of the elements. Mixing training data with different magnetic moments did not have a significant adverse effect on model predictions (Fig. 3). The potential trained on clusters with odd numbers of atoms has slightly smaller prediction errors for both force and energy than the potential trained on clusters with even numbers of atoms, regardless of whether the validation set contains even-sized or odd-sized clusters. The potential trained with both even and odd clusters has larger energy training errors than both the even and odd potentials, but its energy validation errors fall between those of the even and odd potentials. For forces, the potential trained with both even and odd clusters has the lowest training and validation errors in all cases.

Fig. 3: Training and validation root-mean-square errors in energy and force components for different potentials.

The validation data consisted of clusters with 50–55 atoms. Plots in the left column show a energy and c force errors for potentials validated on odd-sized clusters, while the right column displays b energy and d force errors for potentials validated on even-sized structures. Potentials are labeled by the type of training data used to generate them. Numeric labels represent single-size potentials whose training sets contain exclusively clusters of the labeled sizes. The labels “odd”, “even” and “all” represent potentials whose training sets are made up of clusters with odd, even, and a mixture of odd and even numbers of atoms, respectively.

For comparison, we evaluated how well potentials trained on clusters of a single size predict the energies and forces of the same validation clusters with 50–55 atoms. For each size, the training data consisted of 3000 dissimilar clusters, with the exception of clusters with 21 atoms (the smallest size), for which our training set only had 2136 clusters after removing structurally similar clusters (see “Methods”). For potentials trained on clusters of a single size, training and validation errors were similar for forces, and potentials trained on a single size may predict forces with significantly lower errors than the potentials trained on a mixed set of sizes. However, validation errors for energies are notably worse than the training errors, especially for potentials trained on small clusters (Fig. 3a, b). The accuracy strongly depends on the size of the clusters in the training set, with larger sizes having the lowest errors. This suggests that quantum finite-size effects may be particularly pronounced for clusters with fewer than about 30 atoms, limiting the extent to which potential models trained at these sizes can be transferred to larger sizes. The training algorithm may also have a difficult time determining how the undercoordination of surface atoms in a cluster affects its energy when all clusters in the training set have approximately the same surface area. In contrast, training sets with a mixture of cluster sizes provide more information on how the energy is affected by the cluster surface area, which may improve the prediction accuracy for clusters of varying sizes. Parity plots of energies and force components of both single-size and mixed-size potentials can be found in Supplementary Figs. 8 and 9.

Because of the particular importance of identifying the structures of low-energy clusters, we evaluated the potentials on the lowest-energy structures we found or collected from the literature with 50 to 55 atoms (see Supplementary Fig. 10). The diversity of the training sets with mixed cluster sizes proved beneficial for identifying low-energy clusters, as the MTP extrapolation grades (see Methods) for the lowest-energy clusters in the validation set are less than one, suggesting interpolation, with respect to the training sets of odd-sized and even-sized clusters. On the other hand, the low-energy structures had extrapolation grades above 1, suggesting extrapolation, with respect to the training sets of single-sized clusters. Accordingly, the mixed-size potentials had much lower energy errors than the single-size ones.

Prediction of structures for Al clusters with 41–55 atoms

To identify low-energy structures with 41–55 atoms, we used the mixed-size potentials trained on clusters with odd or even numbers of atoms to initialize GA_AL searches for clusters with an odd or even number of atoms, respectively. Initializing the GA_AL algorithm with a pre-trained potential demonstrated significant performance advantages (Fig. 4). On average GA_AL explored 144 times as many clusters as GA_DFT during the calculation time of the GA_AL run, with an average acceleration ratio of 12.06 (and a median of 8.86) compared to GA_DFT (Table 1). To calculate the acceleration ratios, the time spent pre-training the potential for GA_AL was also included in the total GA_AL time. However, the one-time cost of generating the training data used for pre-training was not included, as there is no incremental cost of generating these data for each new size. For sizes 45 and 54, the acceleration ratio was set to 0 because GA_AL failed to discover configurations as good as or better than those found by GA_DFT. The acceleration ratios for the remaining sizes ranged from 1.24 to 71.21, which is consistent with the spread of nearly two orders of magnitude we observed for the runs initialized on untrained potentials. The full set of times to solutions for both GA_DFT and GA_AL are plotted on a log scale in Supplementary Fig. 15.

Fig. 4: Performance benchmark of GA_AL against GA_DFT on clusters with 41 to 55 atoms.

a Energy evolution plots for GA_AL initialized with untrained potentials (green), GA_AL initialized with mixed-size potentials (orange) and GA_DFT (blue), for clusters with 50 to 55 atoms. The energy levels used to calculate acceleration ratios are marked by dashed lines. The CPU times used to compute acceleration ratios are marked by crosses. Lines start from the time when the first DFT-calculated energy is available. b Energy differences between the lowest-energy clusters identified by GA_AL and the lowest-energy clusters identified by GA_DFT at the end of simulations. c Acceleration ratios of GA_AL relative to GA_DFT. The sizes for which GA_AL successfully discovered configurations with lower energy than the lowest of GA_DFT are marked by *.

Table 1 Summary of the performance of GA_AL relative to GA_DFT.

The sizable increase of acceleration ratios for GA_AL with pre-trained potentials can be credited to a significant reduction in the number of times DFT is called for learning on the fly in the early stages of the search. For GA_AL runs initialized with untrained potentials, clusters in the early stages of the runs tend to have high energies (Fig. 4a), so training steps in the early stages are sampling a relatively high-energy region of configuration space. For GA_AL runs initialized with well-trained potentials, computational resources are more efficiently spent exploring the low-energy configurations.

Literature comparison

We examined the quality of the ground state configurations discovered using GA_AL for clusters of 21–55 atoms by comparing them with the lowest-energy structures that have been previously reported for aluminum clusters. Here we only consider studies for which we were able to find the atomic coordinates of the discovered structures29,30,64,65,66. All structures collected from the literature were reoptimized by DFT using the same settings as those used in GA_AL. For 25 of the 35 sizes, GA_AL found structures at least 1 meV/atom lower in energy than the lowest-energy structure in the literature, and for another 7 sizes it identified the same lowest-energy structures as were available in the literature. For clusters of 22 atoms, GA_DFT found a structure that is structurally distinct from the lowest-energy structure reported in literature29 (and was rediscovered by GA_AL), but has only slightly lower energy (by 0.15 meV/atom). The GA_DFT algorithm discovered structures lower in energy than those discovered by GA_AL and the literature for 2 sizes (29 and 54). For clusters of 45 atoms, GA_DFT rediscovered the best-known structure from the literature but GA_AL did not. Detailed results are provided in Fig. 5 and Supplementary Table 6. A complete panel of lowest-energy clusters with 21 to 55 atoms can be found in Supplementary Fig. 17 and coordinates of these clusters have been published on the Novel Materials Discovery (NOMAD) repository69,70 and listed in Supplementary Dataset 1.

Fig. 5: Comparison between lowest-energy clusters reported in the literature and those discovered by GA_AL and GA_DFT.

a Energy differences of lowest-energy clusters from various methods relative to the energies of clusters reported by Doye et al.64. The lowest-energy frontier is connected by a red dashed line, and the sizes for which GA_AL did not discover the best configurations are circled. “Morse”, “Glue” and “DNN” in the legend represent the Morse potential, a glue potential and a deep neural network potential. All clusters from the literature were re-optimized by DFT using the same settings that we used in the genetic algorithms. b New lowest-energy clusters discovered by GA_AL and GA_DFT. The labels of clusters identified by GA_DFT and not GA_AL are blue. Symmetries are denoted in parentheses. Alternative views of representative layered close-packed structures (43) and tetrahedral clusters (36) are included above and to the right of the clusters.

Structures found using GA_AL have lower energies than the lowest-energy literature structures by an average of 16.81 meV/atom, with a maximum of 51.81 meV/atom at size 36 (1.87 eV/cluster). For the five sizes for which GA_AL did not discover the best structures (Fig. 5a), the energies are no more than 4.0 meV/atom above those of the structures with the lowest known energies.

Cluster structure analysis

Low-energy aluminum clusters already start to show morphological regularity in the size range studied in this work (Supplementary Fig. 17). Favorable structures of low-lying clusters include layered close-packed structures (sizes 40–43) and tetrahedra with close-packed surfaces (sizes 35–37, 54–55). Tetrahedra are favored when they can form mostly closed shells of atoms. These structure types reflect the facts that aluminum has an FCC crystal structure in its bulk phase and (111) surfaces have the lowest free surface energies among low-index facets71. Analysis of cohesive energies shows a peak at size 36 (Supplementary Fig. 18), indicating the cluster with 36 atoms is highly stable relative to the clusters of neighboring sizes. It has a perfect tetrahedral shape and the highest degree of symmetry (D2d) among the lowest-energy clusters. Experimental studies showed a surprisingly high melting temperature, close to the bulk melting temperature, for Al cation clusters with around 37 atoms72, consistent with the cohesive energy peak we observed. More details can be found in Supplementary Note 6.


The GA_AL approach presented here has clear advantages, including about an order of magnitude acceleration compared to GA_DFT on average. The advantages of the active learning approach are apparent from a comparison with the work of Tuo et al.66, who used a neural-network potential trained on DFT calculations in a GA to search for low-energy aluminum clusters. However, the potential they used was not retrained on the fly, and the DFT-calculated energies of the structures they discovered are significantly higher than the ones discovered by our approach (Fig. 5) and higher than those discovered by a DFT search by Aguado and López29. The difference is likely due to the inability of the machine-learned potential to extrapolate accurately to structural motifs that were not present in the original training data.

Due to the stochastic nature of GA, the performance advantage varied by nearly two orders of magnitude across systems, and in a small number of cases GA_DFT outperformed GA_AL. The performance of the active learning approach depends on the particular implementation of the algorithm, and there are several potential areas for improvement for the GA_AL approach used here. One is the relatively high energy-prediction errors of MTP for nanoclusters compared with bulk systems. Benchmarks by Zuo et al.51 showed that MTP has energy errors generally less than 5 meV/atom and sometimes even lower than 1 meV/atom for bulk elemental systems. However, for nanoclusters, as exhibited in Fig. 1, validation energy errors are on the order of 10 meV/atom for small clusters with 24 atoms. They can be lowered by using smaller force weights, but the improvement comes at the expense of driving up force errors, which increases the possibility of creating artificial local minima. The relatively large energy errors increase the chance that the energy of the lowest-energy cluster is overestimated and never enters the pool. To mitigate this risk we used a relatively large pool size (25 clusters) to expand the energy window of pool clusters and raise the chance of the lowest-energy cluster being captured in the pool. A large pool has also been shown to increase the success rate of identifying the lowest-energy isomer due to the structural diversity of the pool22, although at the expense of slowing convergence speed21,22. An alternative approach would be to use a machine learning framework that results in a more accurate interatomic potential. Very recently, Lysogorskiy et al. have demonstrated an efficient implementation of the ACE, which was shown to be faster and more accurate than MTP on bulk copper and silicon systems73. The ACE approach could also be promising for nanoclusters and is worth evaluating in future work.

We found that the energy window of pool clusters became narrower as the structure search continued, implying a high density of metastable states with energies close to the global minimum, especially for large clusters. This is not unexpected since the dimension of configuration space dramatically increases as system size grows. The relatively narrow energy window increases the chance of the pool missing the lowest-energy isomer, as the window size may be comparable to the error in MTP energy predictions. A possible workaround is to run GA_AL and then use the discovered low-energy clusters to seed a GA_DFT search. This would consume additional computational resources but decrease the uncertainty in the proposed lowest-energy structures.

Another area for improvement is the relationship between the extrapolation grades (used to identify structures that trigger retraining) and prediction errors. A high extrapolation grade normally implies an energy evaluation with high uncertainty, but a low grade does not necessarily guarantee an accurate prediction (see Supplementary Fig. 19). In practice, we addressed this challenge by starting DFT re-optimization and retraining whenever the majority of clusters in the pool had MTP-calculated energies but not DFT-calculated energies. An alternative approach would be to implement similarity-based measurements of uncertainty, which might more accurately identify structures for which the prediction errors are likely to be large.

The calculation of extrapolation grades was a significant portion of the overall computational cost of the GA_AL algorithm, taking on average about 10% of total wall-time (Supplementary Fig. 20). The routines for calculating the extrapolation grade in the MLIP package were not parallelized, so this portion of the algorithm would run on a single processor while the other processors reserved for the workflow sat idle. Having a parallelized and internal implementation of the grade calculation could considerably reduce the cost of this step and substantially increase the acceleration ratios of GA_AL against GA_DFT.

The choice of exchange-correlation functional (or other sources of inaccuracy in DFT calculations) could also affect the energy ranking of low-lying clusters. Galvão and Viegas showed that the lowest-energy clusters from more accurate functionals are generally already included in the set of 5–10 lowest-energy clusters found by less accurate functionals32. For this reason, we list the ten lowest-energy clusters of each size from 21 to 55 atoms in Supplementary Dataset 1 and have also published them on the NOMAD repository69,70.

Although we have demonstrated that potentials trained on small clusters can be used to predict the structures of clusters about twice as large, it is not clear how well these potentials will work on significantly larger particles. If transferability can be retained up to larger clusters, the methods we have presented could be used to efficiently create comprehensive datasets of cluster structures for small particles with structures that cannot be simply described as that of a truncated crystal.


Similarity measurement

We quantify geometric similarity between two cluster structures of the same size by a similarity score calculated using an approach based on the spectral decomposition of extended distance matrices74. The score is non-negative and a smaller value implies higher similarity. Identical clusters have a score of 0, and visually distinguishable clusters typically have a score above about 0.3. The similarity measure is used to prevent geometrically similar clusters from being simultaneously included in the pool, which can improve the efficiency of GA22, and to select diverse training data for the MTP56.
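To make the role of such a score concrete, the sketch below uses a much simpler stand-in: the root-mean-square difference between the sorted lists of interatomic distances of two clusters. This is not the spectral method of ref. 74, and its values are not on the same scale as our similarity score (so the 0.3 guideline does not carry over). It only shares the basic properties that matter here: it is non-negative, it is zero for identical clusters, and it is unchanged by translation, rotation, or relabeling of atoms. All names are hypothetical.

```python
import math
from itertools import combinations
from typing import List, Sequence, Tuple

Coord = Tuple[float, float, float]


def sorted_distances(coords: Sequence[Coord]) -> List[float]:
    """All pairwise interatomic distances, sorted in ascending order."""
    return sorted(math.dist(p, q) for p, q in combinations(coords, 2))


def dissimilarity(a: Sequence[Coord], b: Sequence[Coord]) -> float:
    """RMS difference between sorted distance lists (stand-in score).

    Smaller values mean more similar clusters; 0 means the two
    distance lists are identical.
    """
    if len(a) != len(b):
        raise ValueError("clusters must have the same number of atoms")
    da, db = sorted_distances(a), sorted_distances(b)
    if not da:  # fewer than two atoms: nothing to compare
        return 0.0
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(da, db)) / len(da))
```

A score of this kind can be passed to any routine that needs to decide whether two candidate clusters are effectively the same structure.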

Genetic algorithm

A GA is a global optimization method inspired by the principles of natural selection75. We developed our own code based on the pool-based Birmingham Parallel Genetic Algorithm24 with some variations. A pool of low-energy clusters of fixed size is maintained during the search. Initial clusters are generated by randomly distributing atoms in space. Once the pool is filled, genetic operations, namely, crossover and mutation, are applied to parent clusters selected from the pool to generate child clusters. Child clusters that are dissimilar to all pool clusters and have a lower energy than the pool cluster with the highest energy will replace the highest-energy pool cluster. Additional details of the GA can be found in the Supplementary Method.
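The pool replacement rule can be sketched as follows. This is an illustrative reading of the rule stated above and not an excerpt from our code: the `Candidate` class, the function name, the use of 0.3 as the cutoff, and the handling of a pool that is not yet full are all assumptions.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Candidate:
    energy: float       # energy used for ranking (e.g., eV)
    structure: object   # any representation the score function accepts


def try_insert(pool: List[Candidate], child: Candidate, capacity: int,
               dissimilarity: Callable[[object, object], float],
               threshold: float = 0.3) -> bool:
    """Apply the pool update rule; return True if the child entered the pool."""
    # Reject children that are too similar to any cluster already in the pool.
    if any(dissimilarity(child.structure, m.structure) < threshold for m in pool):
        return False
    # Assumed behavior while the pool is still being filled.
    if len(pool) < capacity:
        pool.append(child)
        return True
    # Replace the highest-energy pool cluster if the child is lower in energy.
    worst = max(range(len(pool)), key=lambda i: pool[i].energy)
    if child.energy < pool[worst].energy:
        pool[worst] = child
        return True
    return False
```

Rejecting near-duplicates before the energy comparison keeps the pool structurally diverse, which is the purpose of the similarity screening.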

Genetic algorithm with actively learned interatomic potentials

To accelerate the GA search for new stable nanoclusters, we use MLIPs (to improve speed) trained on-the-fly using active learning (to maintain accuracy). We refer to this combination of GA and active learning as “GA_AL”. The active-learning query strategy uses the generalized D-Optimality criterion implemented in the MLIP package47,48, which assigns unlabeled data an “extrapolation grade” based on a measure of the extent to which the unlabeled data is outside of the space spanned by the training data. An extrapolation grade above 1 implies extrapolation relative to the current training set and large errors should be expected, while a value below 1 indicates interpolation48.
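For intuition, the D-optimality idea can be written down for the simpler case of a model that is linear in its m parameters. Here b(x) denotes the row vector of basis-function values for a candidate configuration x, and A is the m × m matrix whose rows are the corresponding vectors of the m configurations in the current active set. These symbols are introduced only for this illustration. The generalized criterion in the MLIP package extends the idea to MTP, whose parameters do not all enter linearly, so the expression below conveys the concept and should not be read as the exact formula used by the package.

```latex
c = b(x)\,A^{-1},
\qquad
\gamma(x) = \max_{1 \le j \le m} \lvert c_j \rvert
```

In this linear picture, γ(x) > 1 means that swapping x into the active set would increase |det A| by a factor of γ(x), which is why such a configuration is regarded as extrapolative.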

The GA_AL runs batch retraining cycles and maintains a waitlist of structures to be included in the next cycle (Fig. 6). Two extrapolation grade thresholds are used when determining whether a newly-generated cluster should be added to the waitlist. The first threshold, γbreak, is used to screen clusters before relaxation using MTP. The trained potential may struggle to relax clusters with extrapolation grades above this threshold, so they are automatically added to the waitlist. Structures with extrapolation grades below γbreak are relaxed. If the extrapolation grade of the relaxed structure is greater than the second threshold, γselect, then it too is added to the waitlist. In this work γbreak was set to 10, the default value recommended by the MTP code48, for GA_AL initialized with an untrained potential. A looser value of 1000 was used for the pre-trained potential, as it is not as important to add training data to a potential that has already been trained. As-generated clusters with extrapolation grades of about 1000 typically cannot be evaluated accurately by MTP potentials, but we found they could still be relaxed by MTP to reasonable configurations. DFT relaxations were started from configurations pre-relaxed using MTP to reduce computational costs. The parameter γselect was set to 1.01 for all searches. When the waitlist reaches a user-defined size (here set to 5), the GA is paused and a retraining cycle begins. All new clusters in the pool as well as clusters on the waitlist are relaxed using DFT and added to the training set. Before retraining the potential, a similarity screening is applied to select the most geometrically diverse set of configurations from all relaxation steps (discussed below), which maximizes structural diversity and reduces training cost.
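The double-threshold screening can be summarized in a short sketch. The functions `grade` and `relax` stand in for the extrapolation-grade calculation and the MTP relaxation; they, the return labels, and the function names are hypothetical placeholders and are not calls into the MLIP package. What happens to an accepted cluster afterwards (energy comparison against the pool) is outside this sketch.

```python
from typing import Callable, List, TypeVar

S = TypeVar("S")  # whatever type represents a cluster structure


def screen_candidate(structure: S,
                     grade: Callable[[S], float],
                     relax: Callable[[S], S],
                     waitlist: List[S],
                     gamma_break: float = 10.0,
                     gamma_select: float = 1.01) -> str:
    """Route a newly generated cluster through the two thresholds."""
    # First threshold: too far outside the training data to relax reliably.
    if grade(structure) > gamma_break:
        waitlist.append(structure)
        return "waitlisted_unrelaxed"
    # Otherwise relax with the current potential and grade the result.
    relaxed = relax(structure)
    # Second, tighter threshold: the relaxed structure is still extrapolative.
    if grade(relaxed) > gamma_select:
        waitlist.append(relaxed)
        return "waitlisted_relaxed"
    return "accepted"


def waitlist_full(waitlist: List[S], capacity: int = 5) -> bool:
    """True when the waitlist has reached the size that triggers retraining."""
    return len(waitlist) >= capacity
```

The defaults mirror the values quoted above for runs initialized with an untrained potential; for a pre-trained potential `gamma_break` would be set to 1000.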

Fig. 6: Schematic workflow of genetic algorithm with on-the-fly active learning (GA_AL).
A double-threshold scheme is employed in which a looser threshold, γbreak, and a tighter threshold, γselect, are used to determine extrapolation of as-generated structures and MTP-relaxed structures, respectively. Retraining starts when the waitlist exceeds a user-defined capacity, which was set to 5 in this work.
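The double-threshold screening can be summarized in a few lines of Python. This is a minimal sketch and not the authors' implementation: the function name `screen_cluster` and the callables `grade` (returns an extrapolation grade) and `relax` (performs an MTP relaxation) are hypothetical placeholders, and the handling of a grade exactly equal to γbreak is an assumption, since the text only specifies the cases above and below it.

```python
from typing import Any, Callable, Tuple

def screen_cluster(
    cluster: Any,
    grade: Callable[[Any], float],   # hypothetical: extrapolation grade of a structure
    relax: Callable[[Any], Any],     # hypothetical: MTP relaxation of a structure
    gamma_break: float = 10.0,       # 10 for untrained, 1000 for pre-trained potentials
    gamma_select: float = 1.01,      # value used for all searches in the text
) -> Tuple[Any, bool]:
    """Return (structure, add_to_waitlist) for a newly generated cluster."""
    # Strongly extrapolating clusters skip MTP relaxation and go to the waitlist.
    if grade(cluster) > gamma_break:
        return cluster, True
    # Otherwise relax with MTP (a grade equal to gamma_break is relaxed here;
    # that tie-breaking choice is an assumption).
    relaxed = relax(cluster)
    # The relaxed structure is waitlisted if it still extrapolates.
    return relaxed, grade(relaxed) > gamma_select
```

In this sketch a structure returned with `add_to_waitlist=True` would later be relaxed with DFT during the next retraining cycle.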

Because of the uncertainty in MTP-predicted energies, there is a risk that the pool over time becomes polluted with structures with erroneously low MTP-predicted energies. To mitigate this risk, a retraining cycle is also started whenever a majority of the clusters in the pool (>50%) have energies that were calculated using MTP and not DFT. GA_AL is considered to be converged when no cluster with an energy lower than the lowest-energy pool cluster has been found for 4000 new clusters.
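The two retraining triggers and the convergence test described above can be written as simple predicates. The sketch below is illustrative only; the function names and the representation of the pool as a list of booleans (True if a cluster's energy comes from DFT) are assumptions made for this example.

```python
from typing import Sequence

def needs_retraining(
    waitlist_size: int,
    pool_is_dft: Sequence[bool],   # assumed representation: True = DFT energy
    waitlist_capacity: int = 5,
) -> bool:
    """Retrain when the waitlist is full or most pool energies come from MTP."""
    mtp_only = sum(1 for is_dft in pool_is_dft if not is_dft)
    majority_mtp = mtp_only > 0.5 * len(pool_is_dft)   # strictly more than 50%
    return waitlist_size >= waitlist_capacity or majority_mtp

def is_converged(clusters_since_last_improvement: int, patience: int = 4000) -> bool:
    """Stop once `patience` new clusters were generated without a new lowest energy."""
    return clusters_since_last_improvement >= patience
```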

When initializing GA_AL with pre-trained potentials, it is beneficial to switch off retraining at the beginning of the search. In this approach, extrapolating clusters are discarded and new clusters are generated until an interpolating one is obtained, allowing the GA to more fully explore the PES of the pre-trained potential. We did this for the first 5000 clusters in GA_AL runs initialized with mixed-size potentials when generating clusters with 41–55 atoms. Additional discussion and justification for this approach are provided in the Supplementary Method.

Moment tensor potentials

We used the MLIP package48 to train the MTPs. The hyperparameters for training include the potential complexity, energy weight, force weight and stress weight. Potential complexity is characterized by the maximum level of moments, levmax, of the basis functions46,48. The energy weight was always set to 1, so the force weight can be seen as the weight of force components relative to the weight of the energy. The stress weight was set to 0 because stress is irrelevant for clusters, which lack a lattice. We generated potentials with levmax = 14 and a force weight of 1/1000 relative to the energy weight, to balance accuracy against training cost, as shown in the “Results” section and Supplementary Note 1. The inner and outer cutoff radii defining the local atomic neighborhood were set to the default values of 2 and 5 Å, and eight radial basis functions were used48. A maximum of 5000 training iterations was allowed for potential fitting. This limit was never reached, as the maximum number of training iterations in any GA_AL run was 1399.

Training data selection for pre-trained potentials

To select the training data for the pre-trained mixed-size potentials, we used a diversity-based strategy and an energy-based strategy to select structurally diverse structures from DFT relaxations and to improve accuracy in the low-lying regions of the PES. All DFT calculations were collected from GA_DFT and GA_AL runs on clusters with 21–40 atoms. For each of the constituent cluster sizes, relaxation trajectories were only kept if the corresponding local ground states have similarity scores larger than 0.3 with the local ground states of all other trajectories already included in the training set. Within each trajectory, only dissimilar ionic steps were selected as well. We accomplished this by including the fully relaxed structure and iterating backwards through the relaxation until encountering a structure with a similarity score, relative to the most recently-added structure, that was at least 0.3. That structure was then added to the training set, and we repeated this procedure until all relaxation steps were exhausted. This diversity-based strategy was applied throughout this work, in data preparation processes for both training and validation sets.
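The diversity-based filtering can be sketched as follows. This is a minimal illustration under stated assumptions: `score(a, b)` stands for the pairwise similarity score used in this work, treated here as a measure for which larger values mean more dissimilar structures (as the thresholds in the text imply), and both function names are hypothetical.

```python
from typing import Callable, List, Sequence, TypeVar

S = TypeVar("S")  # any structure representation

def is_new_trajectory(
    final: S, kept_finals: Sequence[S],
    score: Callable[[S, S], float], threshold: float = 0.3,
) -> bool:
    """Keep a trajectory only if its relaxed end point differs from all kept ones."""
    return all(score(final, other) > threshold for other in kept_finals)

def select_ionic_steps(
    trajectory: Sequence[S],
    score: Callable[[S, S], float], threshold: float = 0.3,
) -> List[S]:
    """Walk a relaxation backwards, keeping steps dissimilar to the last kept one."""
    if not trajectory:
        return []
    selected = [trajectory[-1]]              # always keep the fully relaxed structure
    for step in reversed(trajectory[:-1]):   # iterate from late to early ionic steps
        if score(step, selected[-1]) >= threshold:
            selected.append(step)
    return selected
```

Comparing each step only with the most recently added structure keeps the cost linear in the trajectory length, at the price of not guaranteeing pairwise dissimilarity among all selected steps.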

Following the similarity screening, we performed an energy-based selection strategy. First, all structures that passed through the diversity screening were grouped into sets based on the number of atoms in the cluster. In total, 50% of the training data selected from each set consisted of the structures with lowest energy, 10% consisted of the structures with the highest energy, and the remaining 40% were randomly picked from the remaining ionic steps. The inclusion of high-energy training data ensures that relaxation by MTP does not lead to physically unrealistic high-energy configurations.
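A possible way to implement the 50%/10%/40% split for one cluster size is shown below. The function name `select_by_energy`, the use of integer division to size the three groups, and the fixed random seed are assumptions for illustration; the text does not specify how fractional counts were rounded.

```python
import random
from typing import List, Sequence

def select_by_energy(energies: Sequence[float], n_select: int, seed: int = 0) -> List[int]:
    """Return indices: ~50% lowest-energy, ~10% highest-energy, rest random."""
    order = sorted(range(len(energies)), key=lambda i: energies[i])
    n_select = min(n_select, len(order))
    n_low = n_select // 2                    # 50% lowest-energy structures
    n_high = n_select // 10                  # 10% highest-energy structures
    n_rand = n_select - n_low - n_high       # remaining ~40% drawn at random
    lowest = order[:n_low]
    highest = order[len(order) - n_high:]
    middle = order[n_low:len(order) - n_high]
    rng = random.Random(seed)                # seeded for reproducibility (assumption)
    return lowest + highest + rng.sample(middle, n_rand)
```

For example, selecting 20 structures out of 100 yields the 10 lowest-energy structures, the 2 highest-energy structures, and 8 drawn at random from the rest.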

A total of 3000 structures were selected for each of the training sets. For potentials trained with clusters of multiple sizes (the ones labeled by “odd”, “even” and “all” in Fig. 3), equal numbers of structures were chosen from each constituent size. For potentials trained with clusters of a single size, all 3000 structures were chosen from clusters of that size. The one exception is the potential trained only on 21-atom clusters: only 2136 structures remained after the diversity filtering of DFT calculations, and all of them were included in the training set.

Validation data selection for pre-trained potentials

We collected validation data for mixed-size potentials from GA_DFT runs on clusters with 50–55 atoms. The diversity-based strategy discussed above was used to select a structurally diverse set of structures. The validation sets contain around 3000 structures for each size. The validation data cover a wide range of energies and forces to ensure a thorough validation of potentials in a variety of atomic environments. The average force components are about 0.08 eV/Å with maxima of about 5 eV/Å across validation cluster sizes. Details of the validation dataset can be found in Supplementary Tables 2 and 3.

Training data selection for on-the-fly retraining in GA_AL

We also use similarity filtering, as described above, to select distinct structures from each relaxation trajectory for on-the-fly retraining during active learning. We used a tight similarity threshold of 0.3 on small to medium clusters (21–40 atoms) and a looser threshold of 0.15 on large clusters (41–55 atoms). The looser threshold is meant to increase the fraction of available data being added to the training set at each retraining cycle, as relaxing large clusters is computationally more expensive. We do not check similarity between new training data and all existing data on-the-fly, as this is computationally costly.

Comparison of GA_AL against GA_DFT

Both GA_AL and GA_DFT were run using the same set of GA parameters (see Supplementary Method) on all 24 cores of Intel E5-2680 V3 processors. The GA_AL runs were performed until 4000 consecutive new clusters had been generated without identifying a new lowest-energy cluster. GA_DFT runs were performed for at least the same amount of time as GA_AL runs for a fair comparison. For large clusters with 50–55 atoms, GA_DFT searches were executed for a much longer time of 21 days, to reach energy levels comparable to those of the GA_AL runs (see also Supplementary Method). The only difference between the GA_AL and GA_DFT algorithms was that GA_AL used both MTP (retrained with active learning) and DFT for relaxation, whereas GA_DFT used only DFT. Low-energy structures identified in GA_AL were re-optimized using DFT at the retraining stage and only DFT-evaluated energies were reported at the end, to ensure an ab initio level of accuracy.

To determine whether a new low-energy structure had been found, we considered both the DFT-calculated energy and similarity scores. A cluster is identified as a new lowest-energy cluster if it is lower in energy than the existing lowest-energy structure by at least 1 meV/atom and, at the same time, has a similarity score relative to that structure that is greater than 0.3. Clusters that have a total energy within 1 meV/atom of the existing lowest-energy cluster but are structurally dissimilar are considered energetically similar clusters and are not counted as new lowest-energy clusters. Borderline cases were inspected manually. More details of how we determined whether the algorithm had found a new lowest-energy structure can be found in Supplementary Note 5.
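The decision rule can be expressed as a small classifier. This is a sketch under assumptions: energies are passed per atom in eV, the function name and return labels are hypothetical, and every case not explicitly described in the text is returned as `"other"` (some of these would correspond to the manually inspected borderline cases).

```python
def classify_candidate(
    e_new: float,          # DFT energy per atom of the candidate (eV/atom)
    e_best: float,         # DFT energy per atom of the current lowest-energy cluster
    score: float,          # similarity score relative to the current lowest-energy cluster
    e_tol: float = 1e-3,   # 1 meV/atom
    s_tol: float = 0.3,    # similarity-score threshold
) -> str:
    """Classify a candidate as 'new_lowest', 'energetically_similar' or 'other'."""
    dissimilar = score > s_tol
    if e_best - e_new >= e_tol and dissimilar:
        return "new_lowest"
    if abs(e_new - e_best) < e_tol and dissimilar:
        return "energetically_similar"
    return "other"   # not specified in the text; includes cases left to manual inspection
```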

DFT calculations

All DFT calculations were carried out using the Vienna ab initio simulation package (VASP)68,76,77,78 with the Perdew-Burke-Ernzerhof (PBE) exchange-correlation functional79,80,81. The projector augmented wave (PAW) dataset shipped with VASP with the title “PAW_PBE Al 04Jan2001” was used82,83. Reciprocal space was sampled by a single k-point at the Γ point and the kinetic energy cutoff for the plane-wave basis was set to 240 eV. The electronic self-consistency loop was considered to reach convergence when subsequent steps had an energy difference below 10⁻⁵ eV and the convergence criterion for ionic relaxation was set to a force difference below 0.01 eV/Å. Our dataset84 shows that ground state Al nanoclusters with even sizes above 18 and odd sizes above 7 have net spins of 0 μB and 1 μB, respectively (see Supplementary Note 3). Therefore, all ab initio calculations fix the magnetic moment to 0 μB for even-sized clusters and to 1 μB for odd-sized clusters using the parameter NUPDOWN in VASP. Periodic images of clusters are separated by a vacuum of size at least 10 Å to avoid any spurious interactions (see convergence tests in Supplementary Note 9).
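For readers who wish to reproduce a comparable setup, the settings above map onto standard VASP INCAR tags roughly as sketched below. This is an assumed mapping, not the authors' input files: `incar_settings` is a hypothetical helper, `ISPIN = 2` is added on the assumption that the calculations were spin-polarized (as fixing NUPDOWN implies), and the negative `EDIFFG` value assumes that the ionic criterion was applied to forces. The Γ-only k-point sampling would be specified separately in the KPOINTS file.

```python
from typing import Dict, Union

def incar_settings(n_atoms: int) -> Dict[str, Union[int, float]]:
    """Assumed INCAR tags for an Al cluster with `n_atoms` atoms."""
    return {
        "ENCUT": 240,            # plane-wave cutoff in eV
        "EDIFF": 1e-5,           # electronic convergence in eV
        "EDIFFG": -0.01,         # negative value: force criterion in eV/Å (assumption)
        "ISPIN": 2,              # spin-polarized calculation (assumption)
        "NUPDOWN": n_atoms % 2,  # 0 for even-sized, 1 for odd-sized clusters
    }

# Example: render the tags for a 21-atom cluster as INCAR-style lines.
incar_text = "\n".join(f"{key} = {value}" for key, value in incar_settings(21).items())
```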