Introduction

Disordered and partially disordered systems can contain a relatively high number of components. In recent years, active research in these high-component disordered systems has spanned a range of breakthrough technologies1,2, including high-temperature, strong, and lightweight high-entropy alloys3, superionic lithium conductors4, ultra-high temperature ceramics for structural applications in extreme environments5, and sustainable battery design with improved performance6,7.

It is known that local configurations are important for some materials properties. For instance, in multi-principal element alloys (MPEAs), magnetic interactions can drive atomic orderings which explain otherwise anomalous material properties8. Given the challenges in modeling multicomponent alloys9, coarse-grained Hamiltonians such as the cluster expansion (CE) approach have been remarkably useful, leading to the discovery of hierarchical ground state orderings10, prediction of configurational energetics11, and generation of mesoscale phase-field models12.

The CE approach, which maps the configurational problem in a crystalline solid onto that of a lattice model, has been used in pseudo-binary and ternary ionic systems to predict the solid-state phase diagram of the CaO–MgO system13, understand fluorine solubility14, observe lithium (Li)-gettering in fluorinated cathodes15, and characterize short-range order16. In this work, we apply the CE approach to study a new class of partially disordered spinel (PDS) materials which exhibit ultrahigh energy and power density17, and we discuss new challenges specific to high-component ionic CE. Since ionic systems have greater formation energies than metallic systems, prediction errors tend to be higher18, motivating methodology studies such as this one to help reduce sources of error and develop predictive CE models. We first explain the complexity of PDS and summarize the theory of multicomponent, multi-sublattice systems introduced in detail elsewhere19,20. Next, we discuss ab-initio data generation and preparation specific to ionic CEs, namely species charge assignments and structural relaxations that maintain sublattice topology.

We then introduce new methods for fitting the CE by grouping the thousands of possible effective cluster interactions, which are the expansion coefficients of the basis functions that describe the configurational arrangement. We demonstrate how to group the site interactions in order to address the compositional constraints arising from the charge neutrality requirement in ionic systems. Rank deficiency problems occur within groups of basis functions on the same lattice figure because it is not possible to sample all configurations with ab-initio calculations. We handle this by applying sparse group lasso regularization when the energetics of unsampled configurations is represented in lower-order features. Finally, we show that models of high-component systems are prone to higher errors than models of the well-explored lower-dimensional systems, and bring the new perspective that model predictability should instead scale with configuration space size.

Background: motivation to use CE to study high-component ionic systems

Our work was inspired by a new class of Li-Mn-oxyfluorides in the PDS structure, which have demonstrated ultrahigh power and energy density in Li-ion batteries, delivering over 900 Wh kg−1 17,21. These materials are approximately based on an AB2X4 spinel structure, which consists of a face-centered cubic (FCC) anion (X) framework with half the octahedral sites occupied by metal B (the “16d” sites) and the other half (“16c” sites) unoccupied. A small fraction of the tetrahedral sites (the “8a” sites) are occupied by the A metal. It is the requirement that these occupied tetrahedral sites have no face-sharing with octahedral sites that creates the 16d/16c cation ordering on the octahedral sites. PDS is a significant departure from this classic stoichiometric spinel both because its cation/anion ratio is higher than the spinel value of 3/4 and because it is partially disordered. For example, the PDS compound of Ji et al.17 has stoichiometry Li1.68Mn1.6O3.7F0.3 with 84% of the Mn in 16d sites and 16% of the Mn in 16c sites. Li, which in LiMn2O4 would solely occupy the 8a site, fills only 52% of the tetrahedral 8a sites, 30% of the octahedral 16d sites, and 28% of the octahedral 16c sites.

The structure of the PDS is challenging to understand because the cation excess removes the baseline understanding of how cations can be arranged in the structure. For instance, it is unclear how the Mn occupancy of the 16c site affects the occupancy of the nearest-neighbor 8a sites with which it is face-sharing, an arrangement which is usually not preferable in oxides. Because the 8a sites form a percolating transport channel for Li, one would expect their blockage to lead to poor Li transport, but this is contrary to what is observed experimentally.

In principle, a well-parametrized configurational CE would enable equilibration of the local structure in the system with Monte Carlo (MC) techniques, as is done to identify chemical short-range order22,23,24, compute phase diagrams25,26,27, and find ground states26,28,29. We show that in practice, simple assumptions and typical approaches to obtain the CE are difficult to apply to this system because of its many configurational degrees of freedom. For the PDS materials, which have the stoichiometries Li1.68Mn1.6O3.7F0.3 and Li1.68Mn1.6O3.4F0.6, the anion FCC lattice hosting binary disorder (O2−, F−) forms two types of symmetrically distinct cation sites with different allowed species on them: an octahedral site with quinary disorder taken from the space of (Li+, Mn2+, Mn3+, Mn4+, Vacancy) and a tetrahedral site with ternary disorder among (Li+, Mn2+, Vacancy). Without symmetrizing, the configuration space in a primitive cell has dimension 90, obtained by taking the product of all site spaces from the anion, octahedral, and two tetrahedral sites30. It is clear that this CE transcends the usual complexity of CE models, which are typically built for dimension two or three, resulting from one site space with binary or ternary disorder31,32.
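The dimension of 90 quoted above can be checked with a few lines of Python. This is only a minimal sketch: the dictionary keys and the function name are illustrative labels, and the species lists are copied from the site spaces named in the text.

```python
from math import prod

# Allowed species on each site of the primitive cell, as listed in the text.
# The dictionary keys are illustrative labels, not names from any package.
site_spaces = {
    "anion": ["O2-", "F-"],
    "octahedral": ["Li+", "Mn2+", "Mn3+", "Mn4+", "Vacancy"],
    "tetrahedral_1": ["Li+", "Mn2+", "Vacancy"],
    "tetrahedral_2": ["Li+", "Mn2+", "Vacancy"],
}


def configuration_space_dimension(spaces):
    """Product of the site-space sizes, without any symmetrization."""
    return prod(len(species) for species in spaces.values())


dimension = configuration_space_dimension(site_spaces)
print(dimension)  # 2 * 5 * 3 * 3 = 90
```

For comparison, a single site with binary or ternary disorder gives a dimension of two or three under the same count.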

Background: introduction to CE theory

We provide only a brief introduction to the mathematics of CE, referring the reader to classic works by Sanchez, Ducastelle, and Gratias19 and van de Walle20 for a comprehensive explanation of multi-component CE. For multi-sublattice ionic CE, we refer the reader to work by Tepesch, Garbulsky, and Ceder30 and our recent review33.

The CE approach assumes an underlying well-defined set of sites (“the lattice”) over which species can distribute. The lattice can be partitioned into “sublattices” with different allowed species decorations. For instance, in ionic systems there are typically at least two such sublattices: one for the cation species and another for the anion species. Here, the terminology “lattice” is used in a broader sense than in crystallography where in the strictest sense of the term it only refers to the Bravais lattice of a structure.

The basic principle of the CE is that a relaxed DFT structure is represented by an occupation string σ which describes exactly which species occupy each site on all sublattices. A CE representation of the energy is possible as long as this mapping between a DFT-relaxed structure and occupation string σ is one-to-one. This distinction is necessary because a lattice cluster expansion model cannot capture the exact spatial positions of atoms. Rather, it strictly specifies the decoration σ of a lattice.

Any scalar extensive quantity q can be represented as a function of its decoration σ as:

$$q\left( {\mathbf{\upsigma}} \right) = \mathop {\sum }\limits_\beta m_\beta J_\beta {\langle{{\Phi }}_\alpha \left( {\mathbf{\upsigma}} \right)\rangle}_\beta$$
(1)

where β are symmetrically distinct groupings of site basis functions on the lattice. The cluster α is a multi-index array with entries which label the corresponding single-site basis functions. mβ is the number of clusters α equivalent by symmetry in whatever normalizing unit scalar q is taken (e.g. per cell, per site, etc.), Jβ is the effective cluster interaction (ECI), and Φα is the cluster function. The 48 symmetry operations for the CE lattice are generated by four C3 rotation axes, three C4 rotation axes, and an inversion, corresponding to the point group \(m\bar 3m\). Lastly, the average of cluster functions evaluated over a crystal is the correlation function and the concatenation of all correlation functions is referred to as the correlation vector.

As an example, we demonstrate the construction of a single cluster function Φα and evaluate it for a LiF structure using the orthogonal sinusoidal basis20. The ni site basis functions, indexed by αj = 0, …, ni − 1, for a single site σi with ni possible species are:

$$\phi _{\alpha _j,n_i}\left( {\sigma _i} \right) = \left\{ {\begin{array}{ll} 1,\,\,\qquad\qquad\,\,\,\,\qquad{{{{\mathrm{if}}}}\,\alpha _{{{\mathrm{j}}}} = 0} \hfill \\ { - {{{\mathrm{cos}}}}\left( {\frac{{2\pi \left\lceil\frac{{\alpha _j}}{2}\right\rceil\sigma _i}}{{n_i}}} \right),{{{\mathrm{if}}}}\,\alpha _{{{\mathrm{j}}}} \,>\, 0\,{{{\mathrm{and}}}}\,{{{\mathrm{odd}}}}} \hfill \\ { - {{{\mathrm{sin}}}}\left( {\frac{{2\pi \left\lceil\frac{{\alpha _j}}{2}\right\rceil\sigma _i}}{{n_i}}} \right),{{{\mathrm{if}}}}\,\alpha _{{{\mathrm{j}}}} \,>\, 0\,{{{\mathrm{and}}}}\,{{{\mathrm{even}}}}} \hfill \end{array}} \right.$$
(2)

Given a set of single-site basis functions \(\{ {\phi _{\alpha _j,n_i}} \}\), the cluster function in Eq. (3) is the product of single-site basis functions, one chosen from the ni available on each site in σ:

$${{\Phi }}_\alpha \left( {\mathbf{\upsigma}} \right) = \mathop {\prod }\limits_{i = 1}^N \phi _{\alpha_i ,n_i}\left( {\sigma _i} \right)$$
(3)

The product is “cluster-like”19 because only the sites on which the site function is not equal to the constant “1” are relevant.

To be explicit, we write the cluster function for a specific octahedral-tetrahedral “geometric cluster”. (A geometric cluster is strictly a set of crystallographic sites, whereas a cluster α is, in full technicality, a cluster of functions. However, for simplicity we refer to α as a cluster.) We then evaluate the cluster function for the occupancy string of Li1F1, which is \({\mathbf{\upsigma}} _{{LiF}} = [\sigma _1 = 0,\sigma _2 = 2,\sigma _3 = 2,\sigma _4 = 1]\) because the species on the octahedral (site 1), tetrahedral (sites 2 and 3), and anion (site 4) are [“Li+”, “Vacancy”, “Vacancy”, and “F-”]. In this example, we have chosen the site variables for an octahedral Li, a tetrahedral vacancy, and an anion fluorine to be 0, 2, and 1 respectively, but other site variables can be chosen.

The cluster function for sites 1 and 2 is:

$$\begin{array}{l}{{\Phi }}_\alpha \left( {\mathbf{\upsigma}} \right) = \left[ {1, - {{{\mathrm{cos}}}}\left( {\frac{{2\pi \left\lceil\frac{1}{2}\right\rceil\sigma _1}}{5}} \right), - {{{\mathrm{sin}}}}\left( {\frac{{2\pi \left\lceil\frac{2}{2}\right\rceil\sigma _1}}{5}} \right),} \right.\\ \left. { - {{{\mathrm{cos}}}}\left( {\frac{{2\pi \left\lceil\frac{3}{2}\right\rceil\sigma _1}}{5}} \right), - {{{\mathrm{sin}}}}\left( {\frac{{2\pi \left\lceil\frac{4}{2}\right\rceil\sigma _1}}{5}} \right)} \right] \otimes \left[ {1, - {{{\mathrm{cos}}}}\left( {\frac{{2\pi\left\lceil\frac{1}{2}\right\rceil\sigma _2}}{3}} \right), - {{{\mathrm{sin}}}}\left( {\frac{{2\pi \left\lceil\frac{2}{2}\right\rceil\sigma _2}}{3}} \right)} \right]\end{array}$$
(4)

This tensor product yields a basis set for the geometric cluster comprising site 1 and site 2. The basis functions for sites 1 and 2 are indexed as \(\left(\alpha_{j},\,\alpha_{{j^{\prime}}}\right)\), where αj indexes the basis functions for the octahedral site (with ni = 5) and \(\alpha_{{j^{\prime}}}\) indexes the basis functions for the tetrahedral site (with ni = 3). The set of multi-indices for cluster α is therefore the Cartesian product of the nonzero basis function indices, specifically: \(\left(\alpha_{j},\,\alpha_{{j^{\prime}}}\right) \in\) {(1, 1), (2, 1), (3, 1), (4, 1), (1, 2), (2, 2), (3, 2), (4, 2)}. The relevant set of basis functions has contracted multi-indices, meaning that all labels that are 0 (i.e. not part of the cluster in general) are dropped. When translational symmetry is included as well, these contracted multi-indices make up a set we call B, which is the set of symmetrically distinct orbits β. We will demonstrate the use of B in our regularization scheme later on.

Using Eq. (4), we calculate that \({{\Phi }}_{\alpha _j,\alpha _{j\prime } = (1,\,1)}\left( {\sigma _{LiF}} \right) = - 0.5\). The entire set of cluster function values for the set of contracted multi-indices, in the order listed above, is then: [−0.5, 0, −0.5, 0, −0.866, 0, −0.866, 0].
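The worked example can be reproduced numerically. The following is a minimal sketch that implements Eq. (2) and the product of Eq. (3) directly in plain Python; the function names are illustrative and are not taken from smol or any other package. The occupation variables σ1 = 0 (octahedral Li+) and σ2 = 2 (tetrahedral vacancy) are those chosen in the text.

```python
import math


def site_basis(alpha, n_species, sigma):
    """Sinusoidal site basis function of Eq. (2) for occupation variable sigma."""
    if alpha == 0:
        return 1.0
    angle = 2 * math.pi * math.ceil(alpha / 2) * sigma / n_species
    # Odd labels use the cosine branch, even labels the sine branch.
    return -math.cos(angle) if alpha % 2 == 1 else -math.sin(angle)


def cluster_function(alphas, n_species, sigmas):
    """Product of site basis functions over the sites of a cluster, Eq. (3)."""
    value = 1.0
    for alpha, n, sigma in zip(alphas, n_species, sigmas):
        value *= site_basis(alpha, n, sigma)
    return value


# Octahedral site 1 (5 species, sigma = 0) and tetrahedral site 2 (3 species, sigma = 2).
n_oct, n_tet = 5, 3
sigma_1, sigma_2 = 0, 2

# Contracted multi-indices in the order (1,1), (2,1), (3,1), (4,1), (1,2), ..., (4,2).
multi_indices = [(a_oct, a_tet) for a_tet in (1, 2) for a_oct in (1, 2, 3, 4)]
values = [
    cluster_function(index, (n_oct, n_tet), (sigma_1, sigma_2))
    for index in multi_indices
]
# Adding 0.0 turns a floating-point -0.0 into 0.0 for display.
rounded = [round(v, 3) + 0.0 for v in values]
print(rounded)
```

Rounded to three decimals, the printed list should match the values quoted above for the LiF occupancy string.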

Results: high-component CE in oxides: example of partially disordered spinels (PDS)

In addition to the CE, which captures the configurational energy dependence of the PDS, we add an explicit term to capture the electrostatic energy in Ewald form34. The Ewald summation is a technique to efficiently sum long-range electrostatic interactions and their periodic images, and is a sum of direct-space, constant, and reciprocal-space terms. The proportionality constant for this term is also fitted and can be thought of as representing an inverse dielectric constant. We use Pymatgen35 to calculate the total Ewald energy. Lastly, we apply a form of the structure inversion method proposed by Connolly and Williams36 to determine the ECI and the inverse dielectric constant for the Ewald energy by fitting to DFT energies.

In building our CE, we consider relevant geometric clusters arising from multi-body interactions within a certain distance from one another. Our geometric clusters consist of pairs of sites less than 7 Å apart, triplets with points less than 5 Å apart, quadruplets with points less than 4 Å apart, and quintuplets with points less than 3 Å apart. Given that our lattice model fixes the nearest-neighbor octahedral-tetrahedral cation distance to 1.82 Å, the octahedral-anion bond length to 2.1 Å, and the tetrahedral-anion bond length to 1.82 Å, the correlation vector for a given structure has length 4587. Adding the Ewald energy adds one more dimension to our feature vector, resulting in a total length of 4588.

We use an in-house developed Python package, Statistical Mechanics On Lattices (smol), to generate the correlation vectors on the orthogonal sinusoidal basis in Eq. (2). Because of the large number of possible ECI in these high-component systems, even when limiting their interaction range to 7 Å, the fitting of cluster interactions to DFT energies always starts off as an under-determined system because the number of DFT-relaxed structures used as training data will be smaller than the number of ECI. Well-known statistical tools based on regularized regression exist to handle model generation in under-determined systems: lasso37, group lasso38, and sparse group lasso (SGL)39, all techniques which we will discuss later.

Results: data preparation – automatic, optimized charge assignments

In ionic systems, the same ion can behave differently in terms of its size, site coordination preference, or local interactions when it has a different formal valence. For instance, crystal field effects lead to a strong preference for Mn2+ to be tetrahedral, which is not observed for Mn3+ or Mn4+. This site and interaction preference cannot easily be captured when all Mn ions are treated as the same “Mn” species, as would be done in the CE of metallic systems, and therefore different charge states of Mn ions must be treated as different species. Prior work in ionic CE has also explicitly treated these charge states40,41. In this section, we describe how to optimally assign charge states to ions from electronic structure data. The details of DFT calculations are provided in Methods.

The charge density around a transition metal ion itself is often remarkably invariant with respect to the formal valence42,43, due to the hybridization shift with the anion that takes place when an electron is removed from the metal44,45. For example, total charge density integration upon Li insertion in λ-Mn2O4 (to spinel LiMn2O4) reveals greater charge transfer to the oxygen anion than to Mn: upon Li insertion the Mn ion gains 0.136 electronic charge per electron, whereas the oxygen accepts 0.171 electronic charge43. Thus, there is a strong electron exchange with oxygen, and for this reason magnetic moments have instead been found to be a much better guide to the formal valence of an ion46.

We use the magnetic moment arising from d-orbital contributions to identify Mn charge states. These magnetic moments are obtained by integrating the local (spin up minus spin down) moments in a sphere around each Mn atom. Charge assignment is non-trivial because the moment distribution around a Mn ion varies depending on its environment. For instance, we find that in MnF3 and Mn2O3 the magnetic moments for Mn3+ are 3.770 μB and 3.797 μB respectively, reflecting little difference between an F and an O environment for Mn3+. Yet, in Mn3OF5, which contains Mn2+ and Mn3+, the moment on Mn3+ is 4.077 μB, which is significantly higher than in MnF3 and Mn2O3. (The moments on the two Mn2+, 4.351 μB and 4.393 μB, are clearly different from that on Mn3+.) Evidently, knowing the Mn moments in the pure oxide and pure fluoride reference states is not enough to assign charges in mixed-valence Mn-oxyfluoride compositions.

The cation configurations may also influence the magnetic moment distribution in non-obvious ways. To see this, we provide the Mn moments for three different polymorphs of Li6Mn4O10, in which the average Mn oxidation state is 3.5+, in Table 1, along with their nearest neighbor (NN) cation environments. Table 1 shows that all three polymorphs of Li6Mn4O10 are assigned to be “charge-balanced” if appropriate differentiation between moments for Mn3+ and Mn4+ is made. The magnetic moments for high-spin Mn3+ and Mn4+ are expected to be 4 μB (t2g3eg1) and 3 μB (t2g3eg0), which are reasonably represented in Polymorphs A and C, given that moments in reality are lower on the metal center since the surrounding oxygen hybridizes and shares some of the magnetic moment45,47,48.

Table 1 Description of three polymorphs of composition Li6Mn4O10.

Within the Mn4+ environments, we observe that the magnetic moments on the Mn4+ ion are relatively rigid when there are six or eight surrounding Li+ in polymorphs A and C: the moment is always about 2.7 or 3.0 μB, even when the surrounding environment is more Mn-rich, as seen in polymorph C. However, the Mn4+ moment can be higher (3.169 μB) if the environment around Mn4+ has seven Li+, as seen in polymorph B.

Within the Mn3+ environments, the effects are even less clear: having eight surrounding Li+ is associated with a range of moments, from 3.232 μB to 3.629 μB. When Mn3+ is surrounded by equal numbers of Mn3+ and Mn4+, the moment is about 3.5 μB (polymorph A), but a more oxidized environment around Mn3+ can lead to either lower (3.232 μB in polymorph A) or higher (3.629 μB in polymorph C) moments. It may be necessary to know details of how the NN cations are arranged around Mn3+ to systematically understand how the moment is distributed.

As a final indication of the effects which can influence the magnetic moment distribution, we observe that Polymorph B does not have as well-separated magnetic moments, indicating some degree of self-interaction error which could be reduced by applying a Hubbard U correction49.

Given hundreds of relaxed DFT structures with moments arising from the various effects described (chemical, configurational, and remnant self-interaction), the challenge is to find an optimal solution for differentiating among Mn2+, Mn3+, and Mn4+. Our approach is to use Bayesian optimization via Gaussian Processes50 and assign charges to moments via a black box mapping function f, with the aim of maximizing the total number of charge-neutral DFT structures. Black box optimization is particularly useful in this situation, where each set of magnetic moments is computationally expensive and the exact form of f is neither known nor necessarily differentiable51. We formulate f to depend on three magnetic moment upper cutoffs (one for each of the three Mn valence states) that determine the charge for each Mn atom. The loss is the sum over structures of the absolute value of each structure’s net charge, and the set of cutoffs which minimizes this loss is the final solution. We apply the Bayesian optimization module in scikit-optimize52 to charge-balance 642 out of 775 structures. The optimized upper cutoffs are Mn4+: 3.228 μB, Mn3+: 4.082 μB, and Mn2+: 4.973 μB. Explanation S1 describes the approach in more detail. All magnetic moments in all DFT structures and their Bayesian-optimized cutoffs are plotted in Fig. 1.
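To make the role of the cutoffs concrete, the sketch below maps Mn moments to charges using the three upper cutoffs quoted above and evaluates the loss as the sum of absolute net charges. It is an illustration under stated assumptions, not the procedure of Explanation S1: the function names are hypothetical, the magnitude of the moment is compared against the cutoffs, a moment above the largest cutoff is treated as an error, and the example moments for the toy LiMn2O4 cell are invented for illustration. The Bayesian search over the cutoffs is not shown.

```python
# Upper moment cutoffs in Bohr magnetons, as quoted in the text: (cutoff, Mn charge).
CUTOFFS = [(3.228, 4), (4.082, 3), (4.973, 2)]
# Fixed formal charges of the remaining species.
FIXED_CHARGES = {"Li": 1, "O": -2, "F": -1}


def assign_mn_charge(moment, cutoffs=CUTOFFS):
    """Return the Mn charge of the lowest cutoff that the moment does not exceed."""
    for upper, charge in sorted(cutoffs):
        if abs(moment) <= upper:  # assumption: compare the moment magnitude
            return charge
    raise ValueError(f"moment {moment} exceeds every cutoff")


def net_charge(sites, cutoffs=CUTOFFS):
    """Net charge of a structure given as a list of (element, moment) pairs."""
    total = 0
    for element, moment in sites:
        if element == "Mn":
            total += assign_mn_charge(moment, cutoffs)
        else:
            total += FIXED_CHARGES[element]
    return total


def loss(structures, cutoffs=CUTOFFS):
    """Sum of |net charge| over all structures; zero when every structure is neutral."""
    return sum(abs(net_charge(sites, cutoffs)) for sites in structures)


# Toy LiMn2O4 formula unit with hypothetical moments (one Mn3+-like, one Mn4+-like).
toy_limn2o4 = [("Li", 0.0), ("Mn", 3.8), ("Mn", 3.0)] + [("O", 0.0)] * 4
print(net_charge(toy_limn2o4), loss([toy_limn2o4]))
```

With these cutoffs, the moments reported earlier for Mn3OF5 (4.077 μB on Mn3+; 4.351 μB and 4.393 μB on Mn2+) fall into the 3+ and 2+ windows respectively.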

Fig. 1: All Mn moments in 775 DFT-SCAN structures with Bayesian-optimized cutoffs (dashed lines).

The optimized cutoffs result in 642 out of 775 charge-balanced structures.

Results: data preparation – structure mapping

As mentioned earlier, the rigorous implementation of the cluster expansion to model configurational disorder relies on a one-to-one mapping between relaxed DFT structures and a lattice occupation53. Typically, mapping back to the lattice configuration is done by performing structure matching after density rescaling, such that the density of the relaxed DFT structure is a multiple of the primitive cell14,16. Such mapping can be performed using the StructureMatcher functionality in Pymatgen35. In this structure mapping, an attempt is made to map all atoms from the relaxed DFT structure onto a subset of the sites of a supercell of the primitive cell within a set tolerance. Because the sites of the supercell of the primitive cell are the ideal “rigid” lattice sites, the mapping allows for each atom (and its species) in the relaxed DFT structure to be associated with a lattice site, and the remaining lattice sites (with no associated relaxed atom) are assumed to be vacant.

However, in ionic systems significant relaxation may occur in the DFT calculation of a structure. This may include distortions of the anion lattice due to size differences of the cations, vacancies, Jahn–Teller effects, and off-center relaxations of the cations in their anion coordination polyhedron. As long as the relaxed DFT structures maintain the topology of the CE lattice, they can in principle be mapped onto the lattice model. In the previously described structure mapping method, atoms that distort outside of the set tolerance can no longer be mapped to lattice sites. For example, in Fig. 2a, the relaxed Li+ (green sphere) should be associated with the cation lattice site (white sphere) at the center of the anion coordination polyhedron because it still sits within that polyhedron, yet it is outside the set tolerance for structure mapping. While one could attempt to include the case in Fig. 2a by simply increasing the tolerance for mapping, such increased tolerance can result in the mis-mapping of other atoms. Because in ionic systems the identification of a cation with a specific anion polyhedron is a key topological element, we propose a new method to properly map moderately distorted cations to cation lattice sites based on their anion coordination polyhedra.

Fig. 2: Details of structural mapping in Li-Mn-O-F rocksalt system.

a Example of Li+ (green sphere) and its anion polyhedron in a relaxed DFT structure which cannot be mapped to its proper cation site (white sphere) with the traditional approach, but which can be mapped using the new mapping technique. b Diagram of the new structure mapping process, which involves mapping the anions (ai) of the relaxed structure srelaxed to the anion sites of the lattice configuration slattice (left), followed by mapping the cations of srelaxed to the proper cation sites in slattice by matching the anions in their anion polyhedra. c Example structure which fails to map using the new mapping technique due to an Mn2+ that has relaxed too far from one of the O2− anions in its anion polyhedron.

Figure 2b demonstrates how we can obtain mappings from the relaxed DFT structures to the lattice configuration. Because the anion FCC framework of the spinel materials defines the cation sites, we first map only the anion sites (ai) in the relaxed structure (srelaxed) to the anion lattice sites (slattice) directly using the traditional StructureMatcher approach. This mapping must be successful for the relaxed structure to be considered as having an FCC anion lattice. For the cations, which can undergo larger relaxations, we associate each cation (ci) in srelaxed with its anion polyhedron by finding the set of nearest-neighbor ai whose convex hull is not broken by ci, i.e. still encloses ci. Because we can map the anions from structure srelaxed to its anion lattice sites (the anions in slattice), and we can also locate the cations in srelaxed in their anion polyhedra, we can map the cations to their cation lattice sites via an intermediate mapping based on the anion polyhedra of the cation sites in both srelaxed and slattice.
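The geometric core of this step, deciding whether a relaxed cation still sits inside the convex hull of a set of anions, can be sketched in plain Python. This is a simplified illustration and not the implementation used in this work: it assumes Cartesian coordinates without periodic images, polyhedra whose vertices are not all coplanar, and candidate polyhedra that are already known; all function and variable names are hypothetical.

```python
from itertools import combinations


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def inside_convex_hull(point, vertices, tol=1e-8):
    """True if point lies inside or on the convex hull of the vertices."""
    for tri in combinations(vertices, 3):
        normal = _cross(_sub(tri[1], tri[0]), _sub(tri[2], tri[0]))
        if _dot(normal, normal) < tol:
            continue  # collinear triple, defines no plane
        sides = [_dot(normal, _sub(v, tri[0])) for v in vertices]
        if all(s <= tol for s in sides):
            sign = 1.0   # normal points away from the polyhedron
        elif all(s >= -tol for s in sides):
            sign = -1.0  # normal points into the polyhedron
        else:
            continue     # plane cuts through the polyhedron, not a hull face
        if sign * _dot(normal, _sub(point, tri[0])) > tol:
            return False  # point is on the outer side of a hull face
    return True


def find_polyhedron(cation, polyhedra):
    """Return the label of the first candidate polyhedron enclosing the cation."""
    for label, anions in polyhedra.items():
        if inside_convex_hull(cation, anions):
            return label
    return None


# Two ideal octahedra with 2.1 A cation-anion distance, centered 4.2 A apart along x.
d = 2.1
octa_0 = [(d, 0, 0), (-d, 0, 0), (0, d, 0), (0, -d, 0), (0, 0, d), (0, 0, -d)]
octa_1 = [(x + 2 * d, y, z) for x, y, z in octa_0]
candidates = {"oct_site_0": octa_0, "oct_site_1": octa_1}
print(find_polyhedron((4.0, 0.2, 0.1), candidates))
```

A cation that has left every candidate polyhedron returns `None`, which corresponds to the kind of failed identification discussed for strongly relaxed cations.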

Using a combination of the StructureMatcher method and the method for mapping cations based on their anion polyhedra in the FCC lattice, we successfully obtain lattice configurations for 448 of the 642 charge-balanced relaxed DFT structures, resulting in an overall efficiency of 70%. Of the 194 structures that fail to map, we are unable to map 106 structures due to a failure in the anion mapping (i.e., the anion FCC lattice is not adequately maintained). An additional 16 structures contain mappings of species to cation lattice sites where they are disallowed (i.e., Mn3+ or Mn4+ on the tetrahedral sites). The remaining 72 structures cannot be mapped due to improper identification of the anion polyhedra in srelaxed. Improper identification of an anion polyhedron can result when relaxation of the cation is so severe that it moves so far (>3.1 Å) away from one or more of the anions constituting its polyhedron that the neighboring anion is no longer identified as a possible member of the cation’s anion polyhedron, as in Fig. 2c. In this case the Mn ion in the octahedral site has taken on a 2+ valence state, which strongly prefers tetrahedral coordination. In an octahedron, a pseudo-tetrahedral environment can be achieved by relaxing to the center of the pyramid that constitutes half of the octahedron.

Lastly, we de-duplicate all 448 structures by their correlation vectors, finding a total of 428 distinct structures to be used for training and testing. Figure S2a shows the 0 K DFT ground states. The ground states are consistent with low-temperature experimental phases reported in the phase diagram of Li-Mn-O spinel in air by Paulsen and Dahn54.
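De-duplication by correlation vector can be sketched as follows. This is a minimal illustration, assuming that two structures count as duplicates when their correlation vectors agree after rounding; the rounding precision `ndigits`, the function name, and the three-component toy vectors are assumptions made for the example and are not taken from this work.

```python
def deduplicate(structures, ndigits=6):
    """Keep the first structure seen for each distinct rounded correlation vector.

    structures: list of (label, correlation_vector) pairs.
    ndigits: rounding precision used to compare vectors (an assumed tolerance).
    """
    seen = set()
    unique = []
    for label, correlations in structures:
        key = tuple(round(c, ndigits) for c in correlations)
        if key not in seen:
            seen.add(key)
            unique.append((label, correlations))
    return unique


# Toy correlation vectors: s2 differs from s1 only by numerical noise.
toy = [
    ("s1", [1.0, -0.5, 0.0]),
    ("s2", [1.0, -0.5000000001, 0.0]),
    ("s3", [1.0, 0.5, 0.0]),
]
unique_labels = [label for label, _ in deduplicate(toy)]
print(unique_labels)
```

Rounding is a simple choice but can split near-identical vectors that straddle a rounding boundary; a pairwise comparison with an absolute tolerance would avoid this at a higher cost.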

Results: charge constraints on point basis functions in ionic CE

Given DFT structures which are charge-balanced and mapped to all sublattices, we next describe fitting procedures specific to reducing error in ionic CE. In statistics and machine learning, it is standard to center the target vector, i.e. train on \(E\left( {\vec \sigma } \right) -\left\langle{E\left( {\vec \sigma } \right)}\right\rangle\), so we propose here that J0 can be fitted to the average energy of the training set. However, note that since the zeroth basis function is defined as 1, the true value of J0 is the average energy of the random sample with sampled centered basis functions.

Next, the charge neutrality constraint limits the rank of the point basis functions. By writing the charge constraint for the number of species Ni, of each type i:

$$N_{Li^ + } + 2N_{Mn^{2 + }} + 3N_{Mn^{3 + }} + 4N_{Mn^{4 + }} = 2N_{O^{2 - }} + N_{F^ - }$$
(5)

it is clear that any function of N, such as the occupation mapping function, \(f\left( {N_{Li^ + }, \ldots } \right) = \sigma\), and functions of σ, such as the single-site basis functions, will also be constrained, with the rank reduced by one as a result of Eq. (5). This is why, when fitting the point ECI, the correct number of degrees of freedom also needs to be enforced by setting one ECI to 0. Otherwise, overfitting of the point ECI will result in higher out-of-sample error.
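The loss of one degree of freedom can be seen directly from the charge balance: once all but one of the species counts are chosen, the remaining count is fixed. The sketch below uses the formal charges of the species named in the text; the function names are illustrative.

```python
# Formal charges of the species in the Li-Mn-O-F system.
CHARGES = {"Li+": 1, "Mn2+": 2, "Mn3+": 3, "Mn4+": 4, "O2-": -2, "F-": -1}


def total_charge(counts):
    """Net charge of a composition given as {species: count}."""
    return sum(CHARGES[species] * n for species, n in counts.items())


def dependent_count(counts, species):
    """Count of `species` that charge neutrality forces, given all other counts."""
    others = {s: n for s, n in counts.items() if s != species}
    return -total_charge(others) / CHARGES[species]


# LiMn2O4 with one Mn3+ and one Mn4+ per formula unit is charge neutral.
spinel = {"Li+": 1, "Mn3+": 1, "Mn4+": 1, "O2-": 4}
print(total_charge(spinel))

# Given the Mn and O counts alone, the Li+ count is no longer free.
print(dependent_count({"Mn3+": 1, "Mn4+": 1, "O2-": 4}, "Li+"))
```

Because one count is always a linear combination of the others, the point correlation functions built from these counts cannot all be linearly independent over charge-neutral structures.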

Results: applying structured sparsity due to rank deficiency

In principle a CE is always under-determined because there exists an infinite number of basis functions for a finite number of training data. In simple binary systems, we can sometimes posit that a subset of basis functions is relevant and solve for the corresponding ECI by fitting to an over-determined system. However, with high-component systems this procedure becomes complicated because even with a small set of clusters we have a large number of ECI. In these cases, we will always start from an under-determined system and use statistical approaches to enforce sparse solutions. One might add a constraint to the least-squares error function using Lagrange multipliers to penalize the L1 norm of solutions, an approach known as lasso regularization37. Lasso regularization returns sparser models than least-squares regression, which generally returns dense solutions. When coefficients are set to zero in lasso regularization, their corresponding basis functions play no role in energy prediction. Lasso regularization has been used to study Ag-Pt, model protein folding in the zinc-finger motif55, and construct models for Cu-Pt, Ag-Pt, and Ag-Pd via reweighted Bayesian compressive sensing56. We find that applying lasso regularization alone to our system results in higher average training and testing errors (see Figure S1 for the pure lasso case). Other approaches to select clusters in fitting CE include using genetic algorithms57,58,59 or the steepest descent algorithm to add or remove clusters one at a time as a function of cross-validation score60.

In this section we introduce another regularization approach, sparse group lasso (SGL)39, which builds on lasso regularization by applying structured sparsity. Starting from the usual penalized lasso regression framework with an n by p covariate matrix X (made of p–1 correlation functions and the Ewald energy, for n structures) and a response vector of centered energies E′, SGL further breaks down X into sub-matrices X(B), where each sub-matrix has dimension n by pB and pB is the size of a member orbit B of the orbit set introduced above. (Remember that pB is the number of contracted multi-indices labeling a geometric cluster, which, symmetrized and evaluated over the random structure on this CE lattice, produces member B.) The factor \(\sqrt {p_B}\) effectively weights the group penalty by group size. The ECI Jβ are chosen such that they minimize the objective function of the convex optimization problem:

$$\mathop {\min }\limits_{J_\beta }\left( {\frac{1}{{2n}}\left\| {E^\prime - \mathop {\sum }\limits_B {{{\boldsymbol{X}}}}^{\left( B \right)}J_\beta ^{\left( B \right)}} \right\|_2^2 \,+ \,\lambda \alpha \left\| {J_\beta } \right\|_1 \,+ \,\left( {1 - \lambda } \right)\alpha \mathop {\sum }\limits_B \sqrt {p_B} \left\| {J_\beta ^{\left( B \right)}} \right\|_2} \right)$$
(6)

The penalty parameter α > 0 scales both the l1 penalty on all ECI Jβ and the l2 penalty on the vector of ECI within each orbit B, \(J_\beta ^{\left( B \right)}\). λ ∈ [0, 1] is a mixing parameter. In the limiting case λ = 1, the objective function becomes that of lasso; when λ = 0, it becomes that of group lasso38, an approach which enforces orbit-wise sparsity. Intermediate values of λ additionally enforce sparsity within each \(J_\beta ^{\left( B \right)}\). The mixing parameter λ is set to 0.5 and α is 0.056 in this study, and details of the hyperparameter optimization are given in Methods and Figure S1.
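To make the three terms of Eq. (6) explicit, the sketch below evaluates the objective for a given set of ECI in plain Python. It only evaluates the function and does not perform the minimization; the function name, the representation of orbits as lists of column indices, and the toy numbers are assumptions made for illustration.

```python
import math


def sgl_objective(X, energies, eci, groups, alpha, lam):
    """Value of the sparse group lasso objective of Eq. (6).

    X: n rows of p features; energies: n centered energies; eci: p coefficients.
    groups: list of lists of column indices, one list per orbit B.
    alpha: penalty strength; lam: mixing parameter between lasso and group lasso.
    """
    n = len(energies)
    residual_sq = 0.0
    for row, energy in zip(X, energies):
        prediction = sum(x * j for x, j in zip(row, eci))
        residual_sq += (energy - prediction) ** 2
    l1_term = sum(abs(j) for j in eci)
    group_term = sum(
        math.sqrt(len(g)) * math.sqrt(sum(eci[i] ** 2 for i in g))
        for g in groups
    )
    return residual_sq / (2 * n) + lam * alpha * l1_term + (1 - lam) * alpha * group_term


# Toy problem: 2 structures, 3 features, two orbits of sizes 2 and 1.
X = [[1.0, 0.0, 1.0], [0.0, 1.0, 1.0]]
energies = [1.0, 2.0]
eci = [1.0, 0.0, 0.0]
groups = [[0, 1], [2]]

mixed = sgl_objective(X, energies, eci, groups, alpha=0.056, lam=0.5)
pure_lasso = sgl_objective(X, energies, eci, groups, alpha=0.056, lam=1.0)
pure_group = sgl_objective(X, energies, eci, groups, alpha=0.056, lam=0.0)
print(mixed, pure_lasso, pure_group)
```

Setting `lam=1.0` or `lam=0.0` recovers the lasso and group lasso objectives respectively, and the second orbit contributes nothing to either penalty because its only ECI is zero.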

Our approach of enforcing structured sparsity by applying SGL is fundamentally different from enforcing structured sparsity via hierarchical cluster selection rules with group lasso, the approach used by Leong and Tan to study the ternary Mo-V-Nb alloy61. In their work, cluster functions are selected only after their sub-clusters are also selected. Here, we do not employ such hierarchical constraints. Instead, structured sparsity is obtained by grouping ECI by their corresponding orbit B, obtaining orbit-wise sparse solutions when entire groups of ECI are set to zero.

Furthermore, sparsity is also attained within \(J_\beta ^{\left( B \right)}\) for a given orbit B. This is necessary to handle under-determined sub-matrices X(B), which are common when including larger geometric clusters such as quadruplets. Consider the tetrahedral site with basis functions \(\{ \gamma _{\alpha _{j\prime },3}\}\) face-sharing with three of its nearest-neighbor octahedral sites, each with basis functions \(\{ \gamma _{\alpha _j,5}\}\). The cluster α labelled with these basis functions is shown in Fig. 3a. In total, there are 80 contracted multi-indices, which can be obtained by taking the Cartesian product of the single-site basis functions, removing all labels where αj = 0 or \(\alpha_{j^{\prime}}= 0\), and applying translation symmetry. During fitting of \(J_\beta ^{\left( B \right)}\), for the submatrix corresponding to orbit B to be full rank, we need to train with at least 80 unique, symmetrized decorations so that nunique = pB = 80. We call the submatrix for our training data X(B) and the submatrix of the fully random set of structures on this CE lattice \({{{\boldsymbol{X}}}}_{random}^{\left( B \right)}\). The fully random set of structures is the set that contains every possible lattice configuration in a supercell with size up to the largest cutoff (7 Å).

Fig. 3: Rank deficiency in high-component ionic systems.

a Illustration of the cluster α with single-site basis functions labelled by color. Since the zeroth basis function is independent of occupation and is always 1, its color is the same on every site (orange). The remaining basis functions (\(\gamma _{\alpha _j > 0,n_i}\left( {\sigma _i} \right)\)) depend on occupation and are therefore colored differently. b Rank of all orbits B for the fully random set of structures and c rank deficiency of all orbits B for the set of structures in this study. The size of the cluster generating orbit B is indicated for points (1), pairs (2), triplets (3), quadruplets (4), and quintuplets (5).

However, the rank of this quadruplet in our DFT input set is only 61, so X(B) is clearly rank-deficient. The complexity comes in part from the ability of Li+ and Mn2+ to occupy the central tetrahedral site when the neighboring octahedral sites host Mn2+, Mn3+, or Mn4+. Such configurations exist in theory, but in reality tetrahedral Li+ or Mn2+ will only occur if its three nearest-neighbor octahedral sites are vacant or host Li+, since the octahedral and tetrahedral sites are very close together (around 1.8 Å apart) and experience strong electrostatic repulsion when both are occupied. Between these two closely situated cation sites there is little charge shielding. Thus, the configuration space of this quadruplet cluster is larger than the space that can be physically sampled, and rank deficiency, defined as \(rank({{{\boldsymbol{X}}}}_{random}^{\left( B \right)}) - rank\left( {{{{\boldsymbol{X}}}}^{\left( B \right)}} \right)\), is observed.
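The rank deficiency defined above can be computed directly from the two sub-matrices. The sketch below uses `numpy.linalg.matrix_rank`; the function name `rank_deficiency` and the small toy matrices are assumptions for illustration and do not correspond to the 80-column quadruplet orbit discussed in the text.

```python
import numpy as np

def rank_deficiency(X_random_B, X_B, tol=None):
    """Return rank(X_random^(B)) - rank(X^(B)) for a single orbit B."""
    full_rank = np.linalg.matrix_rank(X_random_B, tol=tol)
    sampled_rank = np.linalg.matrix_rank(X_B, tol=tol)
    return int(full_rank - sampled_rank)

# Toy orbit with 4 features (assumed data). The full enumeration spans all
# four directions, while the sampled set never separates features 3 and 4.
X_random_B = np.eye(4)
X_B = np.array([
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 1.0],
])

print(rank_deficiency(X_random_B, X_B))  # 1
```

In this toy case the sampled structures never distinguish the last two features, so one direction of the orbit's feature space carries no information, analogous to the physically inaccessible decorations described above.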

This symptom of under-sampling is evident in the submatrices X(B) of other members B. Figure 3b shows the rank of the submatrices for each member in B for the fully random set of structures on this CE lattice, \({{{\boldsymbol{X}}}}_{random}^{\left( B \right)}\), and Fig. 3c shows the rank deficiency observed in the physical set of structures used in this study. There is clear rank deficiency across triplets, quadruplets, and quintuplets, ranging from five for triplets to almost 30 for quadruplets. To avoid overfitting in all cases, sparsity of the ECI within an orbit can be enforced by using sparse group lasso. The lack of information on the energetics of these configurations is not a problem as long as their energies, as represented by lower-order clusters, are high enough that they are never sampled in MC simulations with the CE. Figure 4 shows examples of clusters with face-sharing cations, where occupancy of the 8b or 48f site results in face-sharing with octahedral sites. When Li+ or Mn2+ occupies the 48f site, which face-shares with two occupied octahedral sites (Fig. 4a, c), the defect energies predicted by the CE are +0.052 eV/spinel formula unit and +0.067 eV/spinel formula unit, respectively. Thus, the even more cation-rich clusters in Fig. 4b, d, where Li+ or Mn2+ face-shares with four octahedral sites, are unlikely to be sampled during MC, since their CE-predicted energies are even higher.

Fig. 4: Examples of high-energy cation configurations predicted by the CE per spinel formula unit (f.u.), LiMn2O4, where the scenarios are Li+ or Mn2+ insertions onto vacant sites in spinel (48f or 8b).

The defect energies are calculated as \(E^{final} - E^{initial} - \mu _{tet}^{Li^ + }\) for Li+ insertion and \(E^{final} - E^{initial} - \mu _{tet}^{Mn^{2 + }}\) for Mn2+ insertion. The chemical potentials are calculated, starting from the spinel structure, as \(E\left( {Li_8Mn_{16}O_{32}} \right) - E\left( {Li_7Mn_{16}O_{32}} \right) = \mu _{tet}^{Li^ + }\) and \(E\left( {Li_7Mn_1^{2 + }Mn_{16}O_{32}} \right) - E\left( {Li_7Mn_{16}O_{32}} \right) = \mu _{tet}^{Mn^{2 + }}\). a The Li+-occupied 48f site face-shares with two Mn, resulting in a +0.052 eV/spinel f.u. increase in energy. b Adding Li+ to a more metal-rich cluster, the 8b site, results in an even higher increase in energy: +0.071 eV/spinel f.u. c The Mn2+-occupied 48f site, face-sharing with two Mn, has a +0.067 eV/spinel f.u. increase, while d the Mn2+-occupied 8b site increases by +1.12 eV/spinel f.u.

Results: applying sparse group lasso

The testing and training errors depend on the number of training examples (the learning curve) and on model complexity (the capacity curve)62, and both evaluations are shown in Fig. 5. The learning curve, which compares training data size against a loss (root mean squared error (RMSE) per primitive cell in our case), is a widely used metric to assess model convergence. In this process, shown in Fig. 5a, we conduct 50 cross-validation trials, setting aside 80% of the total sample size for training and testing on the remaining 20%. Since the training and testing RMSE of the learning curve converge, SGL is neither under-fit nor over-fit, and the validation dataset is representative63. The mark of an under-fit model is that training and cross-validation RMSE continue to decrease with increasing examples, indicating the model fitting was halted prematurely. Over-fit models, on the other hand, diverge in training and cross-validation RMSE because the model has been fit too closely to the training samples.
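The repeated 80/20 splitting described above can be sketched as follows. This is one possible reading of the procedure, not the code used in this study: the function `cv_rmse`, the stand-in estimator `least_squares_fit` (ordinary least squares, not SGL), and the synthetic data are all assumptions made for illustration. A learning curve would repeat the same evaluation on subsets of increasing size.

```python
import numpy as np

def least_squares_fit(X, E):
    # Stand-in estimator for illustration only (NOT sparse group lasso).
    return np.linalg.lstsq(X, E, rcond=None)[0]

def cv_rmse(X, E, fit, n_trials=50, train_frac=0.8, seed=0):
    """Mean and standard deviation of train/test RMSE over random splits."""
    rng = np.random.default_rng(seed)
    n = len(E)
    n_train = int(round(train_frac * n))
    train, test = [], []
    for _ in range(n_trials):
        # Fresh random split for every trial
        perm = rng.permutation(n)
        tr, te = perm[:n_train], perm[n_train:]
        J = fit(X[tr], E[tr])
        train.append(np.sqrt(np.mean((X[tr] @ J - E[tr]) ** 2)))
        test.append(np.sqrt(np.mean((X[te] @ J - E[te]) ** 2)))
    return np.mean(train), np.std(train), np.mean(test), np.std(test)

# Synthetic data standing in for correlation functions and energies.
rng = np.random.default_rng(1)
X = rng.normal(size=(428, 10))
E = X @ rng.normal(size=10) + 0.05 * rng.normal(size=428)

train_mean, train_std, test_mean, test_std = cv_rmse(X, E, least_squares_fit)
```

Passing the estimator as a function keeps the splitting logic independent of the regularization, so the same loop could in principle be reused for lasso, group lasso, or SGL fits.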

Fig. 5: Loss as a function of sample size and model complexity.

The green (red) colors indicate the average and standard deviation of the loss for training (testing) in 50 cross-validation trials, setting aside 80% of the 428 structures for training and 20% for testing. a The learning curve for SGL with a loss function of root mean squared error (RMSE) per primitive cell as a function of sample size. The chosen hyperparameters are α = 0.056 and λ = 0.5. b The RMSE as a function of model complexity, starting from including only the first orbit in a pair cluster and ending with including all geometric clusters up to quintuplets. Each individual model always uses the Ewald energy and all features in orbits up to the orbit number indicated. The number of significant features selected for each model is in yellow, showing how the number of ECI increases to over 150 in the last model.

The learning curve in Fig. 5a shows the typical behavior of testing and training64. With few training samples, the model has enough free parameters to completely model the training set, so the training error is small. Such a small sample is not representative of the test set, so the out-of-sample error is high. With increasing training set size, the test error decreases. As we further increase the training set size, a testing RMSE of around 70 meV/primitive cell and a training RMSE of around 60 meV/primitive cell are reached. Figure S3 applies early theoretical work by Cortes et al.65 on the convergence of learning curves to project that the asymptotic RMSE for PDS is 70 meV/primitive cell, indicating that more structure sampling is not expected to reduce the error in Fig. 5a. This convergence is not a property of the group lasso, but the best achievable performance among “all” models62,65.

In CE, it is generally known and demonstrated through examples that better predictability can be achieved if more interactions are considered66. We apply this concept in the capacity curve in Fig. 5b, carrying out 50 cross-validation trials, again setting aside 80% of the 428 structures for training and 20% for testing. We build increasingly complex models by including more orbits, and always fitting with the Ewald energy. By adding more orbits in B, we find that both training and testing RMSE converge to 60 meV/primitive cell and 70 meV/primitive cell, respectively, approaching the limiting performance or asymptotic performance of the data62.

In fact, this convergence is almost achieved using solely orbits from pairs and triplets. The absence of a continued decrease in RMSE indicates that the predictability does not improve even with information from quadruplets or quintuplets, suggesting that the most critical information for capturing the configurational energetics of Li-Mn-O-F is contained in pairs and triplets. Figure S2 shows the convex hull of the CE, which reproduces most of the ground states predicted by DFT. However, the CE also stabilizes additional ground state configurations along the MnO-MnO2 tie line (Figure S2c). The depths of the hull in the CE and DFT phase diagrams are similar (−0.3 eV/atom).

Figure 5 shows conclusively that in training a model for this high-component system, the out-of-sample average RMSE converges to around 70 meV/primitive cell or 35 meV/atom, with on average 176 selected features. This error is higher than errors reported for CE fits in multicomponent rocksalt systems: less than 8 meV/atom in ternary disorder for Li–Mn–Zr–O and Li–Mn–Ti–O16, 18 meV/oxygen in ternary Li–Ni–Vac–O67, and 21 meV/atom in ternary-binary disorder for rocksalt Li+–Vac–Cr3+–O2-–F18. However, as we will discuss, this error may be reasonable given the dimensionality of the PDS system, as lower-dimensional fits to subspaces of this dataset provide RMSE comparable to those in the literature.

Discussion: high prediction error in high-component ionic systems

We demonstrated that charge constraints on point-orbits limit their rank and showed that rank deficiency in higher-order orbits can be handled by applying SGL. However, even with increasing model complexity as shown in Fig. 5b, we are unable to converge to an out-of-sample RMSE lower than around 35 meV/atom. In fact, Fig. 5b shows that including orbits beyond pairs and triplets offers little improvement in RMSE.

Here we show data indicating that this higher RMSE may be reasonable for high-component systems due to their dimensionality.

Table 2 illustrates the RMSE for increasing chemical complexity in ternary, quaternary, ternary-binary, quaternary-binary, quinary-binary, binary-quinary-binary, and ternary-quinary–binary systems, subspaces of our cluster expansion model. We fit each system using pairs up to 7 Å and triplets up to 5 Å because Fig. 5b suggests predictability is mostly achieved with these interactions in the Li–Mn–O–F space. We again perform 50 cross-validation trials, training SGL on 80% of the number of structures available and testing on the remaining 20% of structures. The average number of features, average RMSE, and total number of applicable structures (of the 428) are shown for each composition space modeled. Where possible, we juxtapose with reported RMSE in the literature in parentheses.

Table 2 The performance of various composition models using only pairs up to 7 Å and triplets up to 5 Å and the Ewald energy.

Even when using 126 structures, for the ternary Li+–Mn3+–Mn4+–O system, we achieve a respectable RMSE of 13 meV/atom, compared to values of 8 meV/atom for Li–Mn–Ti–O16 and 18 meV/atom for Li–Ni–Vac–O67. Including another level of complexity, vacancies on the octahedral site, results in a similar level of error of 13 meV/atom. The error increases to 25 meV/atom for a ternary-binary system, which is similar to 22 meV/atom18 for Li–Cr–O–F and 24 meV/atom for Li–V–O–F68. For quaternary-binary and quinary-binary disorder, the RMSE are 28 meV/atom and 27 meV/atom. Lastly, we observe the RMSE of 35 meV/atom with the inclusion of binary or ternary disorder on the third sublattice.

Table 2 shows that adding sublattices in ionic CE always increases the RMSE, but the level of error in lower-dimensional systems is still comparable to those reported in the literature. This finding is remarkable, as it shows that ternary and ternary-binary ionic systems do not require many hundreds of DFT calculations and that pair and triplet interactions are a reasonably sufficient starting point for CE models, provided that they utilize the approaches described here.

As this is the first high-component ionic CE that uses three sublattices with 10 species, the high RMSE may be reasonable in that the same approach is able to represent lower-dimensional systems well. The analysis suggests that high-component systems may be limited to higher RMSE than lower-component systems. New compressed sensing approaches, such as one which employs coherency and redundancy to utilize the compressibility of configurational energy, may be promising alternative routes to increase predictability69.

Conclusions

We have described practical and theoretical advances in high-component ionic CE models. Automated charge assignments and modified structure mapping procedures enable more complex data to be included during fitting. We show that electroneutrality constraints decrease the rank of charge-constrained orbits, and that rank deficiency in orbits can be handled by using sparse group lasso regularization. The resulting lack of information is not a problem as long as the energetics of high-energy configurations are represented in lower-order clusters so that they are never sampled during Monte Carlo simulations. We show that the approaches that give a higher RMSE for the full system in this work still give RMSE values for lower-dimensional systems that are consistent with those in the literature, and suggest that, considering practical limitations, the high RMSE may be unavoidable for high-component ionic CE. In summary, the approaches outlined in this work provide guidance for the careful study of other high-dimensional ionic systems, not limited to those with an FCC anion lattice.

Methods: first-principles data generation

We use Density Functional Theory (DFT) with the semi-local SCAN meta-generalized gradient approximation for the exchange-correlation functional. Previous studies found SCAN70 to be most suitable for ground state structure prediction in ionic systems71 due to its ability to capture medium-range van der Waals interactions72. In addition, internal coordinate relaxations with SCAN are closer to experimentally reported values than those observed with PBE and PBE+U73. These reasons make the DFT-SCAN approximation a rational choice for parametrizing the effective cluster interactions in an ionic system, despite its higher computational cost.

For our system, 775 DFT-SCAN structures are calculated using the Vienna Ab Initio Simulation Package (VASP)74,75 with the projector augmented wave (PAW) method76,77, a reciprocal space discretization of 25 k-points per Å\(^{-1}\), and a plane wave energy cutoff of 520 eV. Calculations use the VASP-recommended pseudopotentials (Li_sv, Mn_pv, O, and F) and are converged to \(10^{-6}\) eV in total energy and 0.01 eV/Å on atomic forces. The initial set of structures was generated from four sources: an internal database, scraped for structures within the Li-Mn-O-F composition space containing fewer than 50 atoms to limit computational cost; ionic substitution of Mn4+ onto spinel-like Li-Ti-O structures from another work78; the Inorganic Crystal Structure Database, for defect spinels; and Monte Carlo CE searches for ionic configurations with low Ewald energy. The typical iterative approach to refine structures79,80 was completed using CE-Monte Carlo, concluding the search for new structures when the cross-validation (CV) score reached 65 meV/primitive cell, equivalently 33 meV/atom, assuming a rocksalt composition. The smallest and largest cell sizes sampled are 1, corresponding to Li2O, and 64, corresponding to Li32Mn32O64.

Methods: hyperparameter optimization in lasso, sparse group lasso, and group lasso

Algorithm overview

Deviating from the algorithm described in ref. 39, which cyclically iterates through all groups, we iterate through each member in B only once, from the first member B1 to the last member.

  1. (Outer loop) For the current group Bi, execute step 2.

  2. Check if the coefficients are identically 0 by seeing if they obey the sub-gradient equations in ref. 39. If not, apply step 3.

  3. (Inner loop) Solve for the coefficients \(J_\beta ^{\left( B \right)}\) using Elastic Net regularization, choosing a random coefficient to be updated every iteration.
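The zero-check in step 2 can be sketched as below. The condition is derived here from the sub-gradient of Eq. (6) restricted to one group and is an assumption about how the check is carried out, not a transcription of the code used in this study. The names `soft_threshold` and `group_is_zero`, and the partial residual `r_minus_B` (the centered energies minus the contributions of all other groups), are hypothetical.

```python
import numpy as np

def soft_threshold(z, t):
    """Elementwise soft-thresholding operator S(z, t)."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def group_is_zero(X_B, r_minus_B, alpha, lam):
    """Check whether all ECI of orbit B can be set to zero.

    With the penalties of Eq. (6), the group is zero when
    || S(X_B^T r_minus_B / n, lam * alpha) ||_2 <= (1 - lam) * alpha * sqrt(p_B).
    """
    n, p_B = X_B.shape
    grad = X_B.T @ r_minus_B / n
    lhs = np.linalg.norm(soft_threshold(grad, lam * alpha))
    rhs = (1.0 - lam) * alpha * np.sqrt(p_B)
    return bool(lhs <= rhs)

# Toy orbit with 3 features (assumed data).
X_B = np.eye(3)
no_signal = group_is_zero(X_B, np.zeros(3), alpha=0.056, lam=0.5)
strong_signal = group_is_zero(X_B, np.array([10.0, 0.0, 0.0]), alpha=0.056, lam=0.5)
```

Groups that fail this check are the ones passed on to the inner loop of step 3; that inner solve is not sketched here.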

We test the hyperparameter α and degree of mixing λ for the three approaches (Lasso, Group Lasso, and Sparse Group Lasso), and show the results in Figure S1. We sample α from 25 evenly spaced intervals on the log scale from \(10^{-1.5}\) to \(10^{-0.5}\) and λ at 0 (pure group lasso), 0.25, 0.50, 0.75, and 1.0 (pure lasso), consistent with Eq. (6). The root-mean-squared errors, obtained by setting aside 80% of the structures for training and 20% for testing, and the number of features are averaged over 50 cross-validation trials. We find that the sparsest solutions and lowest error result from even mixing of lasso and group lasso (λ = 0.5) and α = 0.056.
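The hyperparameter grid described above can be written out as a short sketch. It assumes that "25 evenly spaced intervals" means 25 grid points including both endpoints; under that reading, the reported α = 0.056 coincides with the seventh grid point, \(10^{-1.25}\). The variable names are illustrative.

```python
import numpy as np

# 25 log-spaced values of alpha between 10^-1.5 and 10^-0.5 (assumed reading)
alphas = np.logspace(-1.5, -0.5, 25)
# Mixing parameters sampled in the text
lambdas = [0.0, 0.25, 0.50, 0.75, 1.0]

# Every (alpha, lambda) pair to be scored by cross-validation
grid = [(a, l) for a in alphas for l in lambdas]

print(len(grid))            # 125
print(round(alphas[6], 3))  # 0.056
```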