Unfolding the structural stability of nanoalloys via symmetry-constrained genetic algorithm and neural network potential

The structural stability of nanoalloys is a challenging research subject due to the complexity of size, shape, composition, and chemical ordering. The genetic algorithm is a popular global optimization method that can efficiently search for the ground-state nanoalloy structure. However, the algorithm suffers from three significant limitations: the efficiency and accuracy of the energy evaluator and the algorithm’s efficiency. Here we describe the construction of a neural network potential intended for rapid and accurate energy predictions of Pt-Ni nanoalloys of various sizes, shapes, and compositions. We further introduce a symmetry-constrained genetic algorithm that significantly improves the efficiency and viability of the algorithm for realistic size nanoalloys. The combination of the two allows us to explore the space of homotops and compositions of Pt-Ni nanoalloys consisting of up to 4033 atoms and quantitatively report the interplay of shape, size, and composition on the dominant chemical ordering patterns.


INTRODUCTION
The world's energy future is toward the transition from fossil fuels to renewable energy sources. Bimetallic nanoparticles (NPs) are a class of heterogeneous catalysts that are widely used in energy conversion due to their superior catalytic activity and selectivity compared to their bulk and monometallic counterparts [1][2][3][4][5][6] . To study the catalytic properties of a bimetallic NP, the prerequisite is that the structure has to be thermodynamically stable. In the context of a nanoalloy (NA), the stability is governed by many factors including size, shape, chemical composition and chemical ordering. Complementing experimental methods, atomistic simulations offer great opportunities in the study of the structural properties of NAs, as thermodynamic stability can be accurately quantified. Global optimization methods are commonly used in computational studies to search for stable structures through the large space of all homotops and compositions of a NA system. 7 A genetic algorithm (GA) is a metaheuristic global optimization method inspired by the idea of natural selection and survival of the fittest. Over the years, the GA has been developed as a robust and efficient tool for finding the optimal shape and chemical ordering of a NA [8][9][10][11] . Due to the large number of required energy evaluations, analytical models have been used as the primary energy calculator in the GA. Radillo-Díaz et al. 12 employed the Gupta many-body potential 13,14 in the GAs to investigate the structural properties of Pt-Ni NAs with less than 30 atoms and found that Pt 0.5 Ni 0.5 and Pt 0.75 Ni 0.25 NAs exhibit strong Pt and Ni surface segregation, respectively, whereas no clear segregation is found in Pt 0.25 Ni 0.75 NAs. Yang et al. 15 performed GAs on Pt 1−x Ni x NAs with sizes of 38, 55, 147, and 309 atoms employing the Gupta potential and found that the most stable structures are icosahedral NAs with the composition of Pt 0.45 Ni 0.55 . Using an embedded-atom method 16,17 (EAM) in combination with the GA, Oh et al. conducted a computational study on Pt-based binary truncated octahedrons (TOhs) with 1654 atoms at fixed compositions and the predicted stable chemical orderings are in good agreement with experimental observations. Lysgaard et al. 18 employed the effective medium theory 19 (EMT) potential in a GA, searching for stable stoichiometry and chemical ordering of the 309-atom Cu-Au icosahedral NA and found that the most stable structure is a Cu 0.44 Au 0.56 NA with Cu core and Au shell. By incorporating a bond centric (BC) model 20 , Dean et al. 21 recently developed a GA that can predict stable binary NA structures of any size, shape and composition. An alternative global optimization method employing the SMATB potential is also proved to be efficient and exhaustive 22 . Despite the fact that these analytical models are very fast to evaluate, they can yield spurious local minima for NAs due to poor description of the PES 23 . To improve the reliability of the result, GAs incorporated with density functional theory (DFT) calculations have also been performed in recent years [24][25][26][27][28][29] . However, almost all DFT-based GAs are limited to particle sizes of <50 atoms due to the high computational demand.
The trade-off between speed and accuracy of the energy calculator has always been a major concern for the GAs. While accurate DFT calculations can be performed on small NAs, a compromise of speed over accuracy seems to be the only option for larger NAs. One solution toward this dilemma is to incorporate a DFT-trained machine learning (ML) model into the GA. Jennings et al. 30 proposed an active learning scheme where a Gaussian process regression model is used in conjunction with the GA and achieved a 50-fold reduction in the number of energy evaluations for the 147-atom Pt-Au NA compared to a full DFT-based GA. Zhang et al. 31 performed a ML-accelerated GA on the 38-atom Pt-Au NA by training the SchNet 32 model on-the-fly. Despite the fact that these active learning schemes minimize the required DFT calculations by a large margin, they are still inapplicable to large systems where DFT relaxations are intractable. An alternative solution is to fit a ML force field on small clusters and apply it on larger systems. The high-dimensional neural network potential (NNP) proposed by Behler and Parrinello 33,34 has shown good transferability beyond the sizes of the reference NAs in the training set [35][36][37] . The NNP has already been used in combination with the GA in recent studies on nanoclusters [38][39][40] , yet the studied systems remain relatively small. With the energy basins being evaluated at DFT-level accuracy while the computational cost w.r.t the number of atoms N being reduced from O(N 3 ) to O(N), it is even more beneficial to use the NNP for larger NAs.
One of the merits of GA is that it can efficiently locate the full convex hull of a A 1−x B x NA system in a single run. From the convex hull one can get useful information such as the vertices representing the lowest mixing energy (or excess energy) at each stable composition. For a binary NA, the mixing energy per atom is defined as [Eq. 1] 41,42 where N is the total number of atoms, N A and N B are the numbers of atoms of type A and B, respectively, E AB is the total energy of the relaxed NA, E A and E B are the energies of the relaxed pure NPs A and B, respectively. A traditional generation-based GA for searching the convex hull starts by generating a random population of binary NAs. Throughout the GA run, the population size remains the same in each generation. The candidates of the population perform genetic operations including mutation, permutation and crossover to procreate offsprings which evolves a generation. Cut-splice crossover 11,43 is a typical crossover operator which acts by juxtaposing two parts from each participating parent and thereby creating two offsprings with inherited properties. A mutation (or permutation) operator introduces a random structural or chemical change to a chosen parent so that the offspring can gain new traits compared to its parent. All the individual candidates are ranked in accordance with their mixing energies. Candidates with lower mixing energies are assigned with higher fitness scores. We employ a fitness function that follows a niching routine 44 , where the fittest candidates for each composition are equally weighted. This forces each NA to only compete with other NAs with the same composition, thereby ensuring that the population can maintain diversity in composition. The niche GA evolves toward an optimal population in which all candidates across all compositions are low in mixing energy.
As the size of the NA increases, the efficiency of the GA itself also becomes a bottleneck. It becomes especially challenging to perform a niche GA across all compositions for a large NA given the enormous search space. An effective acceleration of sampling can be achieved by enforcing constraints based on symmetry considerations, as realized in the class of 'grouping' approaches 41,[45][46][47] .
Here we combine GA with grouping approaches, thus managing to exploit the advantages of both techniques. In detail, we have developed a GA variant based on the rotational and reflectional symmetries of the NAs, namely symmetry-constrained genetic algorithm (SCGA), to adapt to the dire needs of an efficient structural optimization tool for NAs comprising thousands of atoms. The SCGA still follows the typical workflow of a traditional GA, except that the population in each generation now only consists of NAs with symmetric chemical orderings.
In general, the chemical ordering of a binary NA can be classified either of mixed or segregated types. Mixed patterns can be found in both extended multicomponent systems and NAs, where the components are distributed either randomly or in a regular arrangement as in L1 0 and L1 2 chemical ordering patterns, see Fig. 1(a) and (b) for illustrative depictions. Besides mixed patterns, NAs are also known to be particularly prone to phase segregation due to size confinement. 4 typical segregated patterns are illustrated in Fig. 1: (c) core-shell, where one component at the core is surrounded by a shell of a different component; (d) multi-shell, where multiple shells are populated with alternating species; (e) patchy, where one component appears as islands of atoms, forming a patchwork-like nanostructure; and (f) Janus, where the nanostructure is bisected into two distinct phases comprising different components.
The common ground of all patterns depicted in Fig. 1 except the randomly-mixed pattern is that the geometric shape and chemical ordering are both reproducible when rotating around one of their symmetry axes. From this we come to the definition of symmetric in the context of NA 47 -both the shape and chemical ordering being rotationally invariant to at least one C n axis by an angle of 2π/n. Most NPs have a multitude of C n axes, often with multiple C 2 axes and a few higher order symmetry axes. In this work, we only consider higher order symmetries, around which a symmetric chemical ordering is more likely to present in a stable NA. Figure 2 shows all unique C n axes of fcc, icosahedral and decahedral NPs excluding their C 2 axes.
To further classify the symmetries of different types of symmetric NAs, we introduce the concept of three basic symmetries: spherical, cylindrical and planar 47 . To exploit the symmetry of a symmetric NA, a set of atom groups, which we here name as symmetry-equivalent groups (SEGs), can be chosen with caution so that all atoms are of the same type in each group. The atoms in a symmetric NA can be grouped under different criteria depending on the symmetry that one wants to exploit. Assuming the C n axis is aligned to the z direction, the SEGs divided by the three basic symmetries are then defined as spherical: constant distance to the geometric center; cylindrical: constant horizontal distance to the z-axis; planar: constant z-coordinate. According to these definitions, the L1 0 NA shown in Fig. 1(b) has both cylindrical and planar symmetries, whereas the segregated patterns in (c), (d) and (e) only have spherical symmetry, and the Janus pattern in (f) only has planar symmetry. For many systems such as Pt-Ni, Janus-like phase separation is known to be highly unlikely due to the size mismatch. Therefore it is useful to also  take into consideration the reflectional symmetry and single out a high symmetry subset from planar symmetry with the definition of mirror planar: constant vertical distance to the xy-plane. However, the examples depicted in Fig. 1(b)-(f) are all highly symmetrical. In reality, NAs with lower symmetry can still be favored over the highly symmetrical ones. Given that the degree of symmetry is inversely related to the number of SEGs divided by it, we can introduce additional symmetry definitions with more SEGs to account for NAs with lower symmetry. One way to achieve this is by combining two of the three basic symmetries 47 . This will always result in a circular symmetry regardless of the intersection between any two of the sphere, cylinder and plane symmetries. The SEGs divided by circular symmetry is thereby defined as circular: constant horizontal distance to the z-axis and constant z-coordinate.
Similar to mirror planar symmetry, we can also introduce a high symmetry subset for circular symmetry with the definition of mirror circular: constant horizontal distance to the z-axis and constant vertical distance to the xy-plane.
In most cases, there are only so many atoms that are perfectly located on a constant surface. Thereby we generalize the definition of a constant surface by introducing a tolerance measurement for each SEG. The tolerance can be interpreted as the largest allowed difference between the distance metrics of the closest-to-center atoms in two adjacent SEGs, where the type of distance metrics is determined by the criterion of the symmetry. A comprehensive summary of these NA symmetries can be found in Supplementary Table 2. As can be seen, the number of circular (or mirror circular) SEGs are much more than those divided by basic symmetries, hence exists much more unique symmetric NAs. This is also the reason we neglect the lowest symmetries such as twofold rotation, mirror and inversion symmetries, as the search space is not significantly reduced.
SCGA is a GA variant specifically designed for symmetric NAs, where the genetic operators are adapted in a way that the symmetry of the NA is always preserved. To achieve this, one must first provide the SEGs given by the symmetry that one wants to exploit. A SCGA starts by randomly generating an initial population of symmetric NAs, then applying symmetry-constrained operators to procreate offsprings with inherited symmetry. Compared with a traditional GA, the SCGA adds the constraint that the atoms in each SEG must have the same type of element for every candidate throughout the run.
The obvious advantage of a SCGA over a traditional GA is that it only explores a drastically reduced subspace of NA structures. To quantify the difference, we can calculate the number of all homotops of an N-atom binary NA at a composition of A n B N−n as [Eq. 2] However, the number of symmetric homotops at a certain composition is system-specific. As an alternative, we can calculate the sum of the number of homotops at each composition. For a binary NA system the total number of homotops across all compositions is given by [Eq. 3] In contrast, the total number of symmetric binary NAs across all compositions can be written as [Eq. 4] where K Cn is the number of SEGs around a C n axis.
As an example, we illustrate the number of homotops at each composition of a TOh260 binary NA in Fig. 3(a). It is common to perform multiple SCGA runs in parallel w.r.t different symmetry axes. In the case of TOh260, we consider symmetric homotops with threefold and fourfold mirror circular symmetry, each constituting 22 SEGs. As can be seen, when approaching the 1:1 composition, the number of symmetric homotops only accounts for about 1/10 70 of the number of all homotops. As the particle size becomes larger, the total number of homotops across all compositions becomes astronomical due to the combinatorial explosion as described in Eq. (3). As a result, the SCGA becomes more advantageous, reaching a reducing factor of 10 1160 when the number of atoms exceeds 4,000, as manifested in Fig. 3(b). The significant reduction of the search space makes it possible to perform global optimization on large NAs at a reasonable computational cost. Also, searching through the whole space of homotops will inevitably encounter duplicates due to the selfsymmetry of the NPs, whereas every symmetric homotop is unique by definition. We want to add that the validity of the results obtained by the SCGA should not be a major concern given the fact that the ground states of NAs are either perfectly symmetric or highly symmetrical in most cases.
With the combination of the NNP and the SCGA, our scheme aims to predict stable NA structures with DFT-level accuracy at any given size, shape and composition, thereby bridging the materials gap between theory and experiment. Among various reported bimetallic nanocatalysts, we resort to the Pt-Ni NA as our target system since it has received considerable attention due to the low material cost of Ni, as well as the exceptional electrocatalytic properties for the oxygen reduction reaction (ORR) [48][49][50][51] . It has been shown that Pt-Ni NAs can be applied to hydrogen production 52-54 , proton exchange membrane fuel cells (PEMFCs) [55][56][57][58] and metal-air batteries. 59 However, despite extensive studies in experiments, the structural stability of Pt-Ni NAs in relation to size, chemical composition and chemical ordering still remain controversial. In the present work, a NNP is constructed to represent the high-dimensional potential energy surface (PES) of the Pt-Ni bimetallic system. The NNP is then employed in conjunction with the SCGA to determine the optimal chemical orderings of Pt x Ni 1−x NAs across all compositions in a wide range of sizes, encompassing the most interesting ones in view of applications, i.e., from 1.5 to 5 nm in diameter and in the most favorable shapes.

RESULTS AND DISCUSSION
In this section, we first describe the construction of the NNP for Pt-Ni NAs including four main steps: structure sampling and resampling, and NNP training and validation. In the second part of the section, we report quantitatively the results of the stable Pt-Ni NAs from 1.5 to 5 nm obtained by NNP-based SCGA runs. We then carry out a comprehensive analysis of the favored chemical ordering as a function of shape, size and composition, followed by a comparison of our results with previous experimental and theoretical findings on Pt-Ni NAs. Technical details for the reference first-principles calculations, the construction of the NNP and the SCGA runs can be found in the Methods section.

Construction of the neural network potential
To construct an initial training set for the NNP, a small set of monometallic motifs are prepared as prototypes. To ensure the diversity and versatility of the training set, we include NPs, surface slabs and bulk structures in these prototypes. This is especially important for applications on large NPs, in which the inner core becomes more bulk-like and the facets closely resemble extended surfaces. For bulk structures, 4 space groups are considered: I4/ mmm, P4/mmm, Pm3m and R3m. For surface slabs, we include 8 Miller indices reported to be exposed on pure Pt and Ni NPs 60 : (100), (111), (110), (221), (311), (322), (331) and (332). For NPs we consider the three most abundant structural motifs-fcc, icosahedron (Ih) and decahedron (Dh), together with their derivatives formed by various truncations, all depicted in Supplementary Fig.  5. The sizes of these NP prototypes are strictly restricted to be no larger than 210 atoms so that all DFT calculations are still relatively efficient. On the other hand, no NPs under 100 atoms are considered in the prototypes as they often exhibit strong quantum effects and complex spin multiplicities, which could significantly bias the PES fit on larger particles.
To extensively sample the configurational space of Pt-Ni alloys, semi-grand canonical Monte Carlo (SGCMC) simulations are performed for all prototypes with an EAM potential 61 using the implementation in LAMMPS. 62 Since the chemical potential difference between the two species, Δμ, is fixed in each SGCMC simulation, we hence carry out SGCMC simulations at various Δμ to account for various compositions. The SGCMC simulations are coupled with canonical molecular dynamics (MD) simulations to also sample the structural space. As a result, we have sampled more than 26,000 Pt-Ni alloy structures with random chemical orderings from the monometallic prototypes, including more than 20,000 NAs. The detailed hybrid MC-MD simulations are described in Supplementary Note 1.
Despite extensive sampling of Pt-Ni NAs using hybrid MC-MD simulations, the sampled structures either favor limited types of chemical orderings (e.g., core-shell and multi-shell in the case of EAM for Pt-Ni), or do not exhibit any strong ordering at all. Therefore it is important to also sample NAs with a variety of symmetries displayed by their chemical orderings. In this work, we use the symmetric NA generator implemented in the Alloy Catalysis Automated Toolkit (ACAT) 63 to generate NAs with circular and mirror circular symmetries from the prototypes. This allows us to sample more than 29,000 symmetric NAs with various degrees of symmetry. When generating symmetric NAs, we always align the z direction to the corresponding C n axes depending on the shape.
It is still a daunting task to perform DFT calculations on all 55,000 sampled Pt-Ni alloy structures. Query by committee (QBC) 64 is a good way to refine the training set without losing too much diversity. In this work, we combine a bootstrap ensemble with a batch-based QBC scheme to efficiently screen images in the training set. We use the serial version of RuNNer 34,65 to minimize the computational cost of constant training and retraining of all NNP models in the ensemble. A detailed description of the QBC resampling can be found in Supplementary Note 2.
As manifested in Supplementary Table 1, after the first round of QBC resampling, the number of images in the training set has been reduced to more than 17,000. The training set has been further refined to 6828 images by repeating another round of QBC resampling with a reduced batch size. A third trial resampling has been carried out with a smaller batch size, but the accuracy of the NNP fit on the reduced training set is unable to meet the target energy convergence criterion 3 meV ⋅ atom −1 . Therefore, we use the database after the second QBC resampling as our final training set.
We have calculated the energy and forces of each structure in the resampled database using DFT. To estimate the accuracy of the DFT-trained NNPs, the resampled database have been split into a training set and a test set with a ratio of 9:1. Table 1 shows the energy and force root-mean-square errors (RMSEs) of the training and the test sets for different neural network (NN) architectures after 30 epochs of training. We use the NN architecture of two hidden layers each containing 20 neurons as our final model since it provides the best compromise between accuracy and efficiency. With this model, the final energy RMSEs for the training and the test sets are 1.54 meV ⋅ atom −1 and 1.77 meV ⋅ atom −1 , respectively, while the force RMSEs for the training and the test sets are 69.07 meV ⋅ Å −1 ⋅ atom −1 and 69.40 meV ⋅ Å −1 ⋅ atom −1 , respectively. As manifested in the parity plots in Supplementary Fig. 6, the PES located by the NNP is in good agreement with DFT. Also, there is no observable difference in performance between the NNP predictions on the training and the test sets, indicating that the model is not overfit. The NNP is further validated on an independent test set, as demonstrated in Supplementary Note 3.  (111) and (100) facets exposed, while Ih motif only has (111) facets exposed. We do not consider other fcc motifs such as cube and octahedron, as Pt-Ni NAs in the size range of 3-5 nm are most frequently found to be cubo-octahedral in experiments. 66 We first look at the chemical composition in the 1st, 2nd and 3rd outermost shells of each stable NA. In Fig. 4 we plot the Ni content as a function of both size and composition. The number of atoms in each NP is converted to the diameter by an approximation of D = 0.315N 1/3 for better visualization, where D is defined as the diameter of a sphere with the same volume as the particle. As can be seen, surface segregation of Pt with subsurface segregation of Ni is favored by all three motifs in most concentration regions. The Pt surface segregation is not a surprise since Ni possesses comparable surface energy to Pt 68 but a much smaller atomic radius. Interestingly, as the particle size increases, the Pt enrichment on the surface becomes more severe, whereas the subsurface becomes more mixed. We also notice that the Ih motif also exhibits Pt enrichment in the 3rd outermost shell. This is confirmed by analyzing the Pt content in each concentric shell of the most stable Ih Pt-Ni NAs at various sizes and nine representative compositions, as depicted in Supplementary Fig. 8. In addition, the Ni content tends to increase gradually as the shell moves toward the core.
Next, we investigate the ordering patterns for the bulk-like core of Pt-Ni NAs, i.e., all concentric shells excluding the strongly segregated surface and subsurface. Considering the complexity of chemical orderings, it is important to quantify the degree of intermixing of the two components in each SEG of a binary NA. Herein, we introduce a normalized mixing order parameter with the definition adapted from a previous paper 47 as [Eq. 5] where K is the number of SEGs, x A i and x B i are the mole fraction of metal A and metal B in the ith SEG, respectively. Mixed patterns often have medium to high S mix , whereas segregated patterns tend to have very low S mix values. For example, a perfect core-shell or multi-shell Ih NA has a S mix of 0 if the atoms are grouped by spherical symmetry, which is inherently present in the concentric shells. If we group the atoms by planar symmetry, a perfect layered (including Janus) NA also has a S mix of 0, while rock salt structure represents a perfectly mixed pattern with a S mix of 1.
The ordering parameters S mix for the three basic symmetriesspherical, cylindrical and planar-are plotted against both size and composition in Supplementary Fig. 9. In general, Pt-Ni NAs show a trend of intermixing as the size becomes larger. For instance, 1.5-3.5 nm Ih and Dh Pt-Ni NAs exhibit strong chemical orderings with spherical and cylindrical symmetries respectively, but both become less segregated after the diameter exceeds 3.5 nm. However, strong chemical ordering with planar symmetry is present in both stable fcc and Dh Pt-Ni NAs at almost all compositions and sizes, indicating an inclination of forming layered ordering patterns. Figure 5 depicts the putative ground-state structures (hereinafter referred to as ground-state structures for simplicity) of 2-5 nm Pt-Ni NAs with fcc, Ih and Dh motifs considering all possible compositions. It is not a surprise that the ground-state structures of all motifs have symmetry around their highest order symmetry axes, e.g., C 4 over C 3 for fcc and C 5 over C 3 for Ih. The ground state of the 2 nm TOh Pt 166 Ni 148 exhibits an L1 2 phase in the inner core. As the size becomes larger, the inner planar layers have a strong tendency of forming the L1 0 phase, in which the number of heterobonds between layers is maximized. The ground-state structures of Dh Pt-Ni NAs also show similar L1 0 layered phase present in the inner bulk-like region. With the additional cylindrical symmetry, they exhibit onion-like patterns in the direction perpendicular to the C 5 axis. The ground-state structures of Pt-Ni NAs with Ih motif also form onion-like multishell patterns in the direction perpendicular to the symmetry axis. However, unlike fcc and Dh, there is no strong ordering present in the direction along the symmetry axis. Despite the discrepancies between different motifs, all ground-state structures of Pt-Ni NAs have segregated Pt-skin and Ni-subskin in common, therefore higher Pt concentration than Ni.
Finally, we study the structure crossover of Pt 1−x Ni x NAs of various motifs, as well as the evolution of the ordering pattern as a function of the particle size at various compositions. Direct comparisons of the energetics of NAs with different compositions and sizes are made by introducing the structural transformation inertness, Δ 69,70 , defined as [Eq. 6] where N is the total number of atoms, N A and N B are the number of atoms of type A and B, respectively, E AB is the total energy of the NA, E A bulk and E B bulk are the per atom potential energies of the bulk metals A and B, respectively, which are calculated to be The structural transformation inertness of pure Pt and Ni NPs at all magic numbers is shown in Supplementary Fig. 10. Apparently, Ni NPs are more stable than Pt NPs, and the thermodynamic stability of a NP is proportional to the particle size. The TOh motif is favored by Pt NPs in the whole size range from 1.5 to 5 nm, followed by Dh. The Ih motif is strongly disfavored by Pt, becoming even less stable than COh after the particle size exceeds 2 nm. On the other hand, Ih is stable up to size of 3 nm for Ni NPs and then plateaus. The energy differences between different motifs of Ni NPs are not as distinct as in Pt NPs. The TOh motif is still favored at almost all sizes from 2 to 5 nm, while Dh becomes more stable than Ih after the diameter exceeds 2.7 nm.
The structural transformation inertness of the lowest-energy Pt-Ni NAs with representative Pt:Ni ratios of 4:1, 3:1, 2:1, 3:2, 1:1, 2:3, 1:2, 1:3 and 1:4 at all magic numbers is summarized in Fig. 6. In general, TOh motif is always either the most or the second most stable motif at all compositions and sizes. Noticeably, there is a competition between TOh and Dh for Pt-Ni NAs at high Pt concentration, while Ih becomes the competitor with TOh at high Ni concentration. Inspecting the chemical ordering patterns allows us to rationalize the stability competition between different motifs.
Starting with fcc Pt-Ni NAs at low Ni content (below 33% in Ni), Ni starts populating the particle by segregating below the (100) facets as a prodrome of the L1 0 ordering. As shown in Fig. 7(a), the combined pattern of Pt-skin and Ni-subskin with very poor Ni in the interior can have important consequences on the ORR catalytic activity of these systems 51,71 . With a subsequent enrichment in Ni, the fcc NAs evolves into L1 0 to maximize the number of heterobonds between layers, determining a stabilization of NAs characterized by a Pt-skin followed by a perfect L1 0 inner core, as depicted in Fig. 7(b). In the same compositional range, Ih motif is characterized by a Pt-skin followed by a rigorous subsurface segregation of Ni and by patchy patterns in the internal shells 46 . On the side of stability, Ih NAs are in general less stable than fcc NAs due to the stress accumulated in the (Pt-rich) internal regions of the particles. However, a crossover between TOh and Ih motifs takes place when moving from 33 to 50% in Ni content. The rationale for this crossover is that the TOh NAs accommodate the extra Ni in the subsurface shell by keeping the unaltered L1 0 pattern in the core. The L1 0 pattern is so stable that, when the subsurface sites are not sufficient to accommodate all the extra Ni, the particle prefers to decorate the surface with Ni rather than altering the interior. This Ni enrichment simultaneously determines the destabilization of TOh NAs due to the difficulty of stabilizing a complete subsurface shell of Ni in a crystalline fcc motif, as well as the stabilization of Ih due to its natural multi-shell structure and the stress release arisen from the size mismatch between Pt and Ni. Interesting results are observed at the composition of around 1:1, where Pt-Ni NAs with Ih motif are remarkably more stable than those with TOh motif in almost the whole size range considered, although an eventual crossing between the two motifs (favoring TOh) seems to take place in the region between 4.5 nm and 5 nm. When further increasing Ni content, the scenario does not change much in terms of the strong competition between TOh and Ih motifs. At Ni content of around 75%, an ordered L1 2 pattern appears in the interior of fcc NAs as shown in Fig. 7(c), while Ni tends to populate the subsurface and the whole interior of Ih NAs.
We then look into the specific compositions and sizes at which Ih motif is stable. First, we notice that even a tiny amount of Pt doping would significantly stabilize an Ih Ni NP, even at size of 5 nm. Among all the stable Ih ordering patterns, the flower-like pattern present in the Ih923 Pt-Ni NAs at Ni concentrations of 50%, 60% and 67% is particularly interesting. As shown in Fig. 8, the formation of two patchy patterns in the 3rd and 4th outermost shells makes the Ih923 NAs distinctly more stable than TOh motif at the same size around 3 nm. By analyzing the ratio of Pt-Ni bonds within each concentric shell of the most stable Ih NAs at the nine representative compositions, we confirm that the heterobonding between Pt chains and Ni islands in the 3rd outermost shell gives rise to the stabilization of Ih motif when the Ni concentration is no <50%, depicted in Supplementary Fig. 11.
On a side note, Dh Pt-Ni NAs are always less stable than fcc and Ih motifs in the whole compositional range, despite the noticeable stabilization at high Pt concentration. COh is always the unfavorable fcc structure compared to TOh due to the large surface area of the less stable (100) facets.
As a general comparison, the oscillating concentration profile for the near-surface region of the ground-state Pt-Ni NAs obtained in this work, i.e., Pt segregation in the first and third outermost shells and a Ni-enriched subsurface, is consistent with the previous Monte Carlo simulations of segregation in Pt-Ni NAs using the MEAM potential 72 , as well as the experimental findings of sandwich-like structures present in the stable Pt-Ni (111) and (100) surfaces 71,73,74 . Significant surface Pt enrichment in annealed Pt-Ni NAs of various compositions has been reported experimentally [75][76][77] , but a detailed dissection of the interior chemical ordering is still lacking from experiments. According to the phase diagram of bulk Pt-Ni alloy obtained both experimentally and theoretically [78][79][80] , the L1 2 ordered phases are formed at Pt:Ni The fcc ground-state NAs cutting along and perpendicularly through the C axis, respectively; c the Ih ground-state NAs cutting along the C axis; d, e the Dh ground-state NAs cutting along and perpendicularly through the C axis, respectively. Pt is gray, Ni is green. ratios of 3:1 and 1:3, while the L1 0 ordered phase is formed at the ratio of 1:1. As depicted in Fig. 7, the stable fcc Pt-Ni NAs obtained in this work show similar trend in the interior region of Pt 0.25 Ni 0.75 and Pt 0.5 Ni 0.5 , but some prefer a Pt-rich core over L1 2 in Pt 0.75 Ni 0.25 . In fact, at Pt:Ni ratio of 3:1, we only observe the L1 2 phase in relatively small fcc Pt-Ni NAs (≤586 atoms). It appears to us that the fcc Pt-Ni NAs, at least in the size range of 2.8-5 nm, tend to firmly hold the segregated pattern of Pt-skin and Ni-subskin when decreasing the Ni content from 50 to 25%, which prevents the subsurface Ni atoms from entering the interior to form the L1 2 pattern. On the other hand, when increasing the Ni content from 50 to 75%, some Pt atoms start to abandon the surface in order to mix with Ni atoms and form the L1 2 phase in the interior. Interestingly, our finding is in good agreement with a previous global optimization study of COh561 and COh923 Pt-Ni NAs using a tight-binding potential 81 , where only two ordered phases of L1 0 and L1 2 are found to be stable at Pt:Ni ratios of 1:1 and 1:3,   respectively. However, one noticeable difference is that there is no strong surface or subsurface segregation found in their work.
Lastly, we want to add that the experimental observation of Pt-Ni NAs is most likely to be a mix of TOh and Dh motifs at high Pt concentration, and a mix of TOh and Ih at medium to high Ni concentration, with an unexpected Ih domination at around 1:1 composition. The present crossover analysis and the energetic information provided above only aim to suggest the probable ratio of each motif at each size and composition that should be observed experimentally.
To summarize, we have devised a SCGA methodology based on the rotational and reflectional symmetries of the NAs, and integrate it with a NNP to predict the structures of stable Pt-Ni NAs of various sizes, shapes and chemical compositions with high accuracy and efficiency. We have trained an accurate NNP with an energy training RMSE of only 1.54 meV ⋅ atom −1 . We have demonstrated the prominent predictive power of the NNP as well as the efficiency and accuracy of the SCGA by multiple validation case studies. Finally, we have integrated the NNP with the SCGA to search for the full convex hulls of 36 Pt-Ni NA systems with fcc, Ih and Dh motifs, and sizes from 1.5 to 5 nm, i.e., in the most interesting size range for Pt-Ni NA applications in hydrogen production, PEMFCs, and metal-air batteries [52][53][54][55][56][57][58][59] .
Based on the resulting convex hulls, we predict that for 1.5 to 5 nm Pt-Ni NAs: (i) all motifs have strong Pt segregation to the surface (Ptskin) accompanied by Ni segregation to the subsurface (Ni-subskin); (ii) Ih is the predominant motif at 1:1 composition for almost all considered sizes, especially at around 3 nm thanks to the formation of a flower-like ordering pattern; (iii) stable fcc motif forms sandwichlike inner core along the symmetry axis, stable Ih motif forms onionlike structure in the direction perpendicular to the symmetry axis, while stable Dh motif forms a combination of both; (iv) TOh and Dh motifs are mainly observed for stable Pt-rich NAs, whereas TOh and Ih motifs are favored in equally-mixed and Ni-rich NAs; (v) the interior of the stable fcc NAs exhibits the L1 0 ordered phase at the composition around Pt 0.5 Ni 0.5 NAs, while the L1 2 phase is only found at the composition of Pt 0.25 Ni 0.75 for medium-sized NAs.
These findings provide theoretical insight to the largely unexplored structural stability of Pt-Ni NAs, with potential implications on their catalytic activity [52][53][54][55][56][57][58][59] . Overall, our NNPbased SCGA scheme allows for efficient and effective study on the mixing behavior between a variety of metals. In addition, our SCGA implementation is not limited by the number of metal components in the NA, hence can be especially beneficial for studying high-entropy alloys 82,83 . The rapid and accurate predictions of the stable shape, chemical composition and chemical ordering of realistic size NAs enable rational design of stable NA catalysts used in real-word applications.

Density functional theory calculations
All reference DFT calculations for the training set of 6828 Pt-Ni alloy structures have been performed using the Vienna Ab initio Simulation Package 84,85 with the spin-polarized revised Perdew-Burke-Ernzerhof 86 exchange-correlation functional. We have also performed convergence tests w.r.t k-points and vacuum size. The convergence criterion is decided by the target accuracy of the PES given by the NNP w.r.t. DFT, which is <3 meV ⋅ atom −1 . According to the results shown in Supplementary Fig. 7, a vacuum layer of 8 Å is sufficient for Pt-Ni NAs. For bulk and surface slabs, we use the same k-point density as that obtained from a conventional 4-atom bulk Pt-Ni unit cell (500 k-points per Å −3 of reciprocal cell). The Γcentered k-point grids are constructed for all NAs using the Monkhorst-Pack scheme 87 . The ionic cores are described by the projector augmented wave potentials 88 and a plane wave cutoff energy of 500 eV is used for all calculations. Gaussian smearing is used with a width of 0.1 eV and the electronic energy convergence criterion is set to 1 × 10 −6 eV.

Neural network potential
In the Behler-Parrinelo neural network framework, the total energy E of a system σ with N atoms is given by the sum of all atom-wise contributions governed by their local chemical environments, [Eq. 7] where σ i is the local structure of the ith atom within a cutoff radius r c sufficient enough to include all relevant interactions (6 Å in this work). Atom-centered symmetry functions 89 are used as local structure fingerprints to be fed into the input layer of a NNP. In this work, we employ both radial (type 2) and angular (type 3) symmetry functions defined as [Eqs. [8][9]] where j and k are the two neighboring atoms of the central atom i, r ij , r ik , r jk are the interatomic distances between every two atoms, θ jik is the interatomic bond angle centered at atom i. The form of radial symmetry functions is characterized by the two parameters η and r s , whereas the form of angular symmetry functions is determined by three parameters η, ζ and λ. The cutoff function f c is defined as [Eq. 10] The training of the NNPs has been carried out with a Kalman filter 90 using the n2p2 91,92 implementation. The parameters for the radial and angular symmetry functions used in this work can be found in Supplementary Tables 3, 4. All possible elemental combinations are considered for each parameter set, leading to 106 symmetry functions in total.

Symmetry-constrained genetic algorithm
Symmetric NAs are evolved during a SCGA by performing random symmetry-constrained genetic operations on the SEGs. As depicted in Fig.  9, we have implemented three genetic operators for the SCGA: GroupSubstitute randomly selects one of the groups and change the type of all atoms in that group to another type. This mutation operator inevitably changes the composition of the NA; GroupPermutation takes two random groups of different atom types and swap the type of all atoms in one group with the other. The composition can be fixed by only allowing swaps between two groups with identical number of atoms; GroupCrossover takes two parents, concatenates the atoms in the first half subgroups from mother with those in the second half subgroups from father. The crossover point is chosen by an algorithm instead of the mid point if the composition is meant to be fixed in the SCGA. One clear advantage of group crossover over the traditional cut-splice crossover is Fig. 9 Schematics of the genetic operators for the SCGA. G1 to G6 denote the 6 groups divided by a certain symmetry criterion. The type of all atoms in a group always remains the same. Groups depicted in different colors are populated with different types of atoms. that the operation does not involve direct atom manipulation, which potentially speeds up the GA and at the same time avoids creating distorted or unphysical structures by splicing particle parts together. This is especially advantageous when employing the NNP in the GA since the relaxation of distorted structures are more likely to fail due to extrapolation of the NNP.
All genetic operators for the SCGA are implemented in ACAT 63 and used in conjunction with the GA module of Atomic Simulation Environment 93 (ASE). The SCGA is first validated by benchmarking against a baseline (traditional GA), as demonstrated in Supplementary Note 4. Next, we have performed SCGA to search for full convex hulls of Pt 1−x Ni x NAs with mirror circular symmetry around their symmetry axes.
To start with each SCGA run, we initialize a population of random symmetric NAs by randomly populating SEGs with Pt or Ni atoms using a symmetric NA generator. we then use the NNP calculator with the FIRE 94 optimizer for small particles (<1000 atoms) or the L-BFGS 95 optimizer for larger particles to efficiently locate energy basins (local minima) of each candidate. The force convergence criterion is set to 0.01 eV ⋅ Å −1 for particles with <500 atoms, 0.05 eV ⋅ Å −1 for particles with 500-1000 atoms, and 0.1 eV ⋅ Å −1 for particles with more than 1000 atoms. Next, the relaxed candidates are ranked according to their fitness given by − E mix . Parents are selected based on their ranks (roulette wheel selection) 10 to breed a new generation by performing mutation or crossover. The local optimization, selection and genetic operation processes are reiterated until convergence.
SCGA runs have been performed in parallel w.r.t. C 3 and C 4 axes for fcc, C 3 and C 5 axes for Ih, and C 5 axis for Dh motifs. For NAs with more than 1000 atoms, C 3 axis are abandoned for fcc and Ih motifs for simplification, and additional fixed-composition SCGA runs probing both high and low concentrations are performed since the symmetric NAs at medium concentration quickly becomes too dominant in the population after few generations of full convex hull search. A SEG tolerance of 1 Å is used for all SCGA runs. Group substitute, group permutation and group crossover operators are used in all SCGAs with a ratio of 3:4:3. For the group substitute operator, we mutate only one SEG if the number of SEGs K is <20, mutate two if 20≤K≤50, and mutate three if K > 50. When initializing the SCGA runs, the population size is always set to half of the number of atoms in the NA. For NAs with <300 atoms, the SCGA converges when there are no change in the population for five consecutive generations. Looser convergence criteria are set for larger NAs.

DATA AVAILABILITY
The majority of the data generated in this study are openly accessible on our Zenodo repository. 96 The repository includes: (i) the dataset consisting of the initial 55,982 Pt-Ni alloys sampled by EAM; (ii) the final DFT dataset consisting of 6828 resampled Pt-Ni alloys used for training the NNP; (iii) all stable Pt-Ni NAs (vertices on the convex hulls) obtained by the NNP-SCGA runs; (iv) all the input files and scripts for hybrid MC-MD simulations, QBC resampling, DFT calculations, NNP training, NNP-SCGA runs and convex hull analyses.

CODE AVAILABILITY
A code for generating symmetric NAs and running the SCGA is freely available on GitLab. 63 We note that both the symmetric NA generator (acat.build. ordering.SymmetricClusterOrderingGenerator) and the group operators (acat.ga.group_operators.GroupPermutation / GroupCrossover) have the option of fixing or varying the composition, which allows the user to perform the SCGA for NAs at a certain composition or across all compositions.