Coevolutionary search for optimal materials in the space of all possible compounds

Over the past decade, evolutionary algorithms, data mining, and other methods showed great success in solving the main problem of theoretical crystallography: finding the stable structure for a given chemical composition. Here, we develop a method that addresses the central problem of computational materials science: the prediction of material(s), among all possible combinations of all elements, that possess the best combination of target properties. This nonempirical method combines our new coevolutionary approach with the carefully restructured “Mendelevian” chemical space, energy filtering, and Pareto optimization to ensure that the predicted materials have optimal properties and a high chance to be synthesizable. The first calculations, presented here, illustrate the power of this approach. In particular, we find that diamond (and its polytypes, including lonsdaleite) are the hardest possible materials and that bcc-Fe has the highest zero-temperature magnetization among all possible compounds.


INTRODUCTION
Finding materials with optimal properties (the highest hardness, the lowest dielectric permittivity, etc.) or a combination of properties (e.g., the highest hardness and fracture toughness) is the central problem of materials science. Until recently, only experimental materials discovery was possible, with all limitations and expense of the trial-and-error approach, but the ongoing revolution in theoretical/computational materials science (see 1,2 ) begins to change the situation. Using quantum-mechanical calculations, it is now routine to predict many properties when the crystal structure is known. In 2003, Curtarolo demonstrated the data mining method for materials discovery 3 by screening crystal structure databases (which can include known or hypothetical structures) via ab initio calculations. At the same time, major progress in fully nonempirical crystal structure prediction took place. Metadynamics 4 and evolutionary algorithms [5][6][7] have convinced the community that crystal structures are predictable. Despite the success of these and other methods, a major problem remains unsolved: the prediction of a material with optimal properties among all possible compounds. Totally, 4950 binary systems, 161,700 ternary systems, 3,921,225 quaternary systems, and an exponentially growing number of highercomplexity systems can be created from 100 best-studied elements in the Periodic Table. In each system, a very large number of compounds and, technically, an infinite number of crystal structures can be constructed computationally, and an exhaustive screening of such a search space is impractical. Only about 72% of binary, 16% of ternary, 0.6% of quaternary, and less than 0.05% of more complex systems have ever been studied experimentally 8 , and even in those systems that have been studied, new compounds are being discovered continually [9][10][11] . Studying all these systems, one by one, using global optimization methods is unrealistic. Data mining is a more practical approach, but the statistics shows that the existing databases are significantly incomplete even for binary systems, and much more so for ternary and more complex systems. Besides, data mining cannot find fundamentally new crystal structures. When searching for materials optimal in more than one property, these limitations of both approaches become even greater. We present a new method implemented in our code, MendS (Mendelevian Search), and show its application to the discovery of (super)hard and magnetic materials.

Mendelevian space
Global optimization methods are effective only when applied to property landscapes that have an overall organization, e.g., a landscape with a small number of funnels, where all or most of the good solutions (e.g., low-energy structures) are clustered. Discovering materials with optimal properties, i.e., performing a complex global optimization in the chemical and structural space, requires a rational organization of the chemical space that puts compounds with similar properties close to each other. If this space is created by ordering the elements by their atomic numbers, we observe a periodic patchy pattern (Fig. 1a), unsuitable for global optimization.
In 1984, Pettifor suggested a new quantity, the so-called "chemical scale," that arranges the elements in a sequence such that similar elements are placed near each other, and compounds of these elements also display similar properties 12 . This way, structure maps 13 with well-defined regions of similar crystal structures or properties can be drawn. In a thus ordered chemical space, evolutionary algorithms should be extremely effective: they can zoom in on the promising regions at the expense of unpromising ones.
What is the nature of the chemical scale or the Mendeleev number (MN), which is an integer showing the position of an element in the sequence on the chemical scale? Pettifor derived these quantities empirically, while we redefined them using a more universal nonempirical way that clarifies their physical meaning (the method for computing MN is explained in the Supplementary Information). Goldschmidt's law of crystal chemistry states that the crystal structure is determined by stoichiometry, atomic size, polarizability, and electronegativity of atoms/ ions 14,15 , while polarizability and electronegativity are strongly correlated 16 . Villars et al. 17 introduced another enumeration of the elements, emphasizing the role of valence electrons, which he called "Periodic number" (PN). He also showed that atomic size and electronegativity can be derived from AN and PN 17 . In redefining the chemical scale and MN, we used the most important chemical properties of the atom-size R and electronegativity χ (Pauling electronegativity)-the combination of which can be used as a single parameter succinctly characterizing the chemistry of the element. However, we need to emphasize that the chemical scale and MN are only used in this method for visualizing the results (the choice of MN for plotting such a Pettifor map is up to the user), while in our global coevolutionary algorithm, each atom is represented by both its size R and electronegativity χ to increase the accuracy. In this work, the atomic radius R is defined as half the shortest interatomic distance in the relaxed (for most elements hypothetical) simple cubic structure of an element-see the Supplementary Table 1. Figure 2 shows the overall linear correlation between the MNs redefined in this work and those proposed by Pettifor. Carefully chosen MNs should lead to strong clustering in the chemical space, where neighboring systems have similar properties. The results of our searches for hard binary compounds using the PN, the MNs suggested by Pettifor and our redefined MNs are shown on Pettifor maps (Fig. 1b-d). Satisfyingly, our redefined MNs result in a better-organized chemical space with a clearer separation of regions containing binary systems with similar hardness. In fact, if our MNs (which are the sequences of projected elements on their regression line in the space of crudely correlated atomic radius and electronegativity) generate a good 2D map, with clear grouping of similar chemical systems (e.g., Na-Cl, K-Cl, Ca-Cl, Na-Br systems are located nearby), then a much better grouping is expected in the space of the initial two parameters R and χ, and it is in this space where variation operators of our method are defined (Fig. 3a, b). Also it worth mentioning that sizes and Noble gases were excluded because of their almost complete inability to form stable compounds at normal conditions. Rare earths and elements heavier than Pu were excluded because of the problems of the DFT calculations. In total, we consider 74 elements that can be combined into 2775 possible binary systems. Each pixel is a binary system, the color encodes the highest hardness in each system. electronegativities of the atoms change under pressure-and using standard definitions of the MN (such as AN, PN, or Pettifor's MN) will not work well. Our recipe, however, is universal and only requires atomic sizes and electronegativities at the pressure of interest. In this paper, we illustrate our method by binary systems, although more complex, at least ternary, systems are also tractable. In a nutshell (but see Methods section for details), our method performs evolution of a population of variablecomposition chemical systems (each of which is tackled by an evolutionary optimization) -i.e. is an evolution over evolutions. Individual chemical systems are allowed to evolve and improve, then are compared and ranked, and the fittest ones get a chance to produce new chemical systems (which will partially inherit structural and chemical information from their parents). Evolving the population of such chemical systems, one efficiently finds the globally optimal solution and numerous high-quality suboptimal solutions as well.
Search for hard and superhard binary systems Pareto optimization 18 of hardness and stability was performed over all possible structures (with up to 12 atoms in the primitive cell) and compositions limited to the binary compounds of 74 elements (i.e., all elements excluding the noble gases, rare earth elements, and elements heavier than Pu). In this work, 600 systems have been computed in 20 MendS generations from a total of 2775 unary and binary systems that can be made of 74 elements, i.e., only about one fifth of all possible systems were sampled. Figure 4 shows the efficiency of this method in finding optimal materials. In this fast calculation, numerous stable and metastable hard and superhard materials were detected in a single run. Carbon (diamond and other allotropes) and boron, known to be the only superhard elements, were both found. In addition, both new and numerous known hard and superhard binary systems, as  well as potentially hard systems, were found in the same calculation, among them B and Ru x B y 37-39 . We reported some of the results of our search in a separate paper on the Cr-B, Cr-C, and Cr-N systems 40 , and our study of the W-B system 41 was inspired by the present finding of promising properties in the Mo-B system (also published in 42 ). The list of all systems studied during the calculation is available in Supplementary Information.
Because of the huge compositional space (2775 systems, each with 10 2 possible compositions, each of which having a very large number of possible structures), it was necessary to shorten the time of calculations by reducing the number of generations and/ or population size. Therefore, the structures and compositions found may be approximate and may need to be refined for the most interesting systems by a precise evolutionary calculation for each system. The results are shown in Table 1. Of these, some transition metal borides are predicted to be hard, some already reported as hard materials (e.g., Mo x B y 43, 44 and Mn x B y 45 ) or discussed as potentially hard (e.g., Tc x B y 46 , Fe x B y 47 , and V x B y 48 ). Interestingly, a number of previously unknown hard structures more stable than those reported so far were predicted in these systems. Our calculations also revealed completely new hard systems, S x B y and B x P y , and, quite unexpectedly, the Mn x H y system was discovered to contain very hard phases (Table 1).
For the Mo x B y system, several simultaneously hard and lowenergy structures were detected in our calculations. Of these, only the stable R3m structure of MoB 2 was studied before, and the reported hardness for this structure (experimentally obtained 24.2 GPa 49 and theoretically found 33.1 GPa 44 ) is in close agreement with the value calculated in this work (28.5 GPa). MoB 3 and MoB 4 were studied widely before 43,44 , and a few low-energy and in some cases hard structures were reported for these systems (i.e., R3m-MoB 3 , 31.8 GPa; 43 P6 3 /mmc-MoB 3 , 37.3 GPa 44 ; and much softer P6 3 /mmc-MoB 4 , 8.2 GPa 44 ). In this work, new low-energy structures with high hardness were discovered in these systems (Table 1).
For the Mn x B y system, we propose several new compounds which are simultaneously hard and have low energy (Table 1). In the previous study 50 , P2 1 /c-MnB 4 was discussed to be stable and have a very high hardness (computed to be 40.1 GPa 50 , experimentally obtained 34.6-37.4 GPa 51 ), while C2/m-MnB 4 was claimed to be the second lowest-energy structure with the energy difference of 18 meV/atom. Our study confirms the stability of P2 1 / c-MnB 4 . However, we discovered another MnB 4 structure, with the Pnnm space group, the energy of which is lies between the energies of two aforementioned structures of MnB 4 ( Table 1). In this work, we found that the ferromagnetic phase of Pnnm-MnB 4 is more stable than the nonmagnetic one, and the hardness of 40.7 GPa was computed for this magnetic structure.
Because of the radioactivity of technetium, the Tc x B y system has not been studied experimentally, while computational studies of this system started recently 46,[52][53][54] . In 2015, P3m1-TcB was predicted to be energetically more favorable than the previously discussed Cmcm and WC-type structures 55 . The reported hardness for this structure, 30.3 GPa 55 , is very close to the value predicted in this study (31 GPa). Because of the prediction of other stable compounds (e.g., Tc 3 B 5 ) in our work, this structure became metastable (by 13 meV/atom). In this work, P6m2-TcB 3 with the computed hardness of 27.2 GPa was predicted as a stable structure at zero pressure. Other works 53,54 , conducted in parallel to ours, also detected this structure and claimed that it is synthesizable at pressures above 4 GPa 56 . Another low-energy (3 meV above the convex hull) hard structure (33.1 GPa) with the P3m1 space group for TcB 3 was also predicted in our study. P6m2-Tc 3 B 5 , a compound having a hardness of 30.6 GPa and stable at zero pressure, is predicted in our work for the first time. Several other simultaneously hard (in the range of 30 to 36 GPa) and lowenergy metastable phases of Tc x B y predicted in this work are shown in Table 1.
In recent years, many efforts were focused on searching for lowenergy phases of V x B y and studying their electrical and mechanical properties. As a result, several low-energy hard and superhard phases were predicted 48,55 . Nevertheless, the experimental data exist only for the well-known hexagonal VB 2 (AlB 2 -type) with the P6/mmm space group 57 . In addition to some previously studied structures 58 (e.g., Cmcm-VB, Immm-V 3 B 4 , and P6/mmm-VB 2 ), which were also found in our calculations, a few boron-rich phases possessing simultaneously low energy and very high hardness were discovered ( Table 1). The calculated hardness for these boron-rich phases is very close to or above 40 GPa (VB 7 : 39.7 GPa, VB 5 : 40 GPa, and VB 12 : 44.5 GPa). A new extremely hard P4m2-V 3 B 4 phase is predicted here, with the energy 6 meV lower than the previously proposed Immm structure. Most of the studies of the Fe x B y system were dedicated to the FeB 2 and FeB 4 phases 47,59,60 . Several works studying different Fe x B y compounds 61,62 reported Fe 2 B, FeB, and FeB 2 as stable phases. In this work, we detected another stable phase, FeB 3 , with the P2 1 /m space group and the hardness of 30.7 GPa. To the best of our knowledge, FeB 3 was never reported, neither theoretically nor experimentally. The orthorhombic Pnnm-FeB 4 , with the energy of 2 meV above the convex hull (Table 1), was synthesized at pressures above 8 GPa, and its hardness was reported to be 62 (5) GPa 59 , which encouraged many computational studies of this structure. However, none of them confirmed such a high value of hardness, while the Vickers hardness reported in several independent works varies in the range of 24-29 GPa 47,60,62,63 . We calculated its hardness to be 28.6 GPa.
In the B x P y system, the cubic boron phosphide BP with the zincblende structure is a well-known compound with the hardness reported to be roughly the same as that of SiC 64 . In our calculations, the hardness of SiC and BP was found to be 33 GPa For these phases we found that ferromagnetic solutions are more stable than nonmagnetic. Elastic constant were computed assuming these are ferromagnetic structures, the energy difference between the ferromagnetic and non-magnetic solutions for † and ‡ is 0.037 (eV/transition-metal) and 0.092 (eV/transition-metal) and magnetization is equal to 0.016 and 0.034 μ B Å −3 , respectively.
Z. Allahyari and A.R. Oganov and 37 GPa, respectively. Moreover, B 6 P was discovered as another stable compound in this system and predicted to be superhard, with the computed hardness exceeding 41 GPa. In the SiC system, in addition to the known diamond-type β-SiC, another similar structure (actually, a polytype of β-SiC) with the R3m space group and nearly the same hardness was found. The energy of this structure is just 1 meV/atom higher than that of β-SiC.
The Mn x H y system is unexpected in the list of hard systems, but several very hard phases were indeed found in it ( Table 1). All of these phases are nonmagnetic, highly symmetric, and energetically favorable (lying either on the convex hull or close to it), with the hardness of up to 30 GPa. In this system, two thermodynamically stable compounds (Mn 2 H and MnH) were predicted, with the space groups P3m1 and P6 3 /mmc, and computed hardness of 21.5 and 29.5 GPa, respectively (in Table 1, only structures with the hardness above 26 GPa are shown for this system).
Generally, B x S y system is not hard, but metastable boron sulfides turn out to be potentially hard. We found a low-energy metastable phase of this system, Cmcm-B 4 S 3 , with the hardness unexpectedly exceeding 30 GPa. This can stimulate future studies of this system.
For a better insight, some of the prominent structures seen in our simulations are shown in Fig. 5a. More details on all phases presented in Table 1 are given in Supplementary Information.
In our calculations, some boron hydrides were predicted to be superhard, but they had high energy and were not included in Table 1. However, it may be possible to stabilize these hard phases under pressure, or by chemical modification. Figure 5b shows the studied materials in the space "hardnessfracture toughness." Diamond, lonsdaleite and cubic BN possess the best properties, but are metastable at normal conditions. Among the stable phases, borides of transition metals (especially from groups VB, VIB, VIIB) stand out: we note VB 2 , V 3 B 4 , MoB 2 , CrB 4 , WB 5 , and MnB 4 in particular. These and related materials (see 65 ) present a high technological interest.
The fact that all known binary superhard systems were found in a short coevolutionary run demonstrates the power of the method, which is ready to be applied to the other types of materials.
Search for magnetic binary systems In addition to the Mendelevian search for stable/metastable hard and superhard materials, we performed another Mendelevian search for materials with maximum magnetization and stability to examine the power and efficiency of the method in fast and accurate determination of materials with target properties. We performed this calculation over all possible structures (with up to 12 atoms in the primitive cell) and compositions limited to the binary compounds of 74 elements (i.e., all elements excluding the noble gases, rare earth elements, and elements heavier than Pu). In this calculation, well-known ferromagnets iron, cobalt, nickel, and several magnetic materials made from the combination of these elements with other elements were detected before the sixth generation. Here, for each structure we performed spinpolarized calculations using the GGA-PBE functional 66 as implemented in the VASP code 67,68 . More details on structure relaxation and input parameters can be found in Supplementary Information. The chemical landscape of magnetization and evolution of its sampling in the Mendelevian search for magnetic materials are shown in Fig. 6d-f; this was formed after calculating 450 binary systems over 15 generations. In this plot, materials with high magnetization are clearly clustered together. Figure 6d, f shows how the (co)evolutionary optimization discovered all the promising regions at the expense of the unpromising ones. This calculation has found that among all substances, bcc-Fe has the highest magnetization at zero Kelvin.

DISCUSSION
We have developed a method for predicting materials having one or more optimal target properties. The method, called Mendelevian search (MendS), based on the suitably defined chemical space, powerful coevolutionary algorithm, and multi-objective Pareto optimization technique, was applied to searching for lowenergy hard and superhard materials. Note that due to the property of evolutionary and coevolutionary algorithms to enhance sampling of the most promising regions of the search space (where the optimal, as well as all or most of the high-quality suboptimal solutions are clustered together), each MendS search discovers a large number of materials with excellent properties at a low computational cost. Well-known superhard systemsdiamond, boron allotropes, and the B-N system-were found in a single calculation together with other notable hard systems (Si-C, B-C, Cr-N, W-C, metal borides, etc.). The Mn-H system was discovered to be unexpectedly hard, and several new hard and superhard phases were revealed in the previously studied systems (V-B, Tc-B, Mn-B, etc.). The method successfully found almost all known hard systems in a single run, and a comprehensive chemical map of hard materials was produced. A similar chemical map was plotted for magnetic materials; well-known magnetic systems such as Ni, Co, Fe were found within just a few generations. These examples show the power and efficiency of our method, which can be used to search for optimal materials with any combination of properties at arbitrary conditions. As the first step in prediction of novel materials possessing desired properties, the method to a large extent solves, in a fully nonempirical way, the central problem of computational materials science.

METHODS
The whole process can be described as a joint evolution (or coevolution) of evolutionary runs, each of which deals with an individual variable-composition system. Having defined the chemical space, we initialize the calculation by randomly selecting a small number of systems from the entire chemical space for the first MendS generation. These systems are then optimized by the evolutionary algorithm USPEX 5-7 in its variablecomposition mode 69 , searching for compounds and structures with optimal properties (e.g., here we simultaneously maximized hardness and stability), after which MendS jointly analyses the results from all these systems. Removing identical structures using the fingerprint method 70 , jointly evaluating all systems, refining and preparing the dataset, and discarding the structures that are unstable by more than 1.0 eV/atom, MendS ranks all systems of the current generation and selects the fittest (in present calculations, fittest 60% were selected) variable-composition systems as potential parents for new systems. Applying chemical variation operators, such as mutation and heredity, to these parent systems yields offspring systems for the next coevolutionary generation. In addition, some systems are generated randomly to preserve the chemical diversity of the population. This process is continued until the number of coevolutionary generations reaches the maximum predefined by the user (Fig. 3c). The underlying ab initio structure relaxations and energy calculations were performed using density functional theory with the projector augmented wave method (PAW) as implemented in the VASP code 67,68 . Further details on the input parameters of MendS, USPEX, and VASP are given in Supplementary Information.
Defining fitness: multi-objective (Pareto) optimization Many scientific and engineering problems involve optimization of multiple conflicting objectives, for example, predicting novel materials that improve upon all critical properties of the known ones. The multi-objective evolutionary algorithm (MOEA) enables searching simultaneously for materials with multiple optimal properties, such as the enthalpy, hardness, density, dielectric permittivity, magnetization, etc. Here we performed searches optimizing simultaneously (1) stability, measured as the distance Fig. 6 Sampling of the chemical space. Systems produced (a, d) randomly in the 1st generation, and using all variation operators in the (b, e) 5th and (c, f) 10th generations in searching for hard (a-c) and magnetic (d-f) materials. Randomly generated systems are shown as violet circles.
above the convex hull (chances of a compound to be synthesizable are higher if the compound is stable or low-energy metastable, i.e., is on the convex hull or close to it), and (2) hardness, computed using the Lyakhov-Oganov model 71 .
Hardness is a complicated property of materials which cannot be evaluated directly and rigorously from the crystal structure because it usually includes many nonlinear and mesoscopic effects. However, there are number of empirical models making it possible to estimate hardness from atomic-scale properties. The Chen-Niu empirical model 72 is based on the relation between the elastic moduli and hardness. Although this model is reliable, calculating the elastic constants of materials on a large scale is computationally expensive. A similar model based on the elastic moduli was recently proposed by Mazhnik and Oganov 73 and unlike Chen's model, does not overestimate the hardness value of materials with low or negative Poisson's ratio while for other materials gives similar results. The Lyakhov-Oganov model 71 , which computes the hardness from bond hardnesses, is more convenient for high-throughput searches: it is numerically stable, usually reliable, and can be used in calculations without significant cost, taking the crystal structure and chemical composition as input. For better understanding of the reliability of the mentioned models, a comparison of the computed values and experimental results for hardness of various materials is presented in ref. 65 .
The result of the multi-objective optimization is, in general, not a single material, but a set of materials with a trade-off between their properties, and these optimal materials form the so-called first Pareto front. Similarly, 2nd, 3rd, … nth Pareto fronts can be defined (Fig. 4). In our method, the Pareto rank 18 is used as a fitness.
Variation operators in the chemical space are of central importance for an efficient sampling of the chemical space using the previously sampled compositions and structures. These operators ensure that different populations not only compete, but also exchange information, i.e., learn from each other. An efficient algorithm could be constructed where the chemical space is defined by just one number for each element-the MN (or chemical scale); we use this for plotting the Pettifor maps, but within the algorithm itself, we resort to an even better option where each element is described by two numbers-electronegativity χ and atomic radius R, rescaled to be between 0 and 1-and it is this space where the variation operators act. There are three variation operators defined in the chemical space: chemical heredity, reactive heredity, and chemical mutation.
Chemical heredity replaces elements in parent systems with new elements such that their electronegativities and atomic radii lie in-between those of their parents (Fig. 3a). In doing so, we explore the regions of the chemical space between the parents AB þ CD ! XY; (1) where A, B, C, D, X, and Y are different elements, X is between A and C or A and D which is chosen randomly, and Y is between the other two elements (B and C or B and D). Reactive heredity creates offspring by taking combinations of the elements from parents. For example, if the parents are A-B and C-D, their child is one of the A-C, A-D, B-C, and B-D systems.
Chemical mutation randomly chooses one of the elements of a parent and substitutes it with an element in its vicinity in the space of χ and R (Fig.  3b).
In both chemical mutation and chemical heredity, all elements are assigned the probability P i ¼ e Àαxi P e Àαxi ; i ¼ 1; 2; :::; to be selected, where x is the distance of element i from the parent element (in the case of chemical heredity, this formula is used to give a higher weight to the fitter parent, shown by a dark green point in Fig. 3a), and α is a constant (α = 1.5 is used here). The result of applying these chemical variation operators is shown in Fig. 6: the promising regions of the chemical space are sampled more thoroughly at the expense of the unpromising regions. When a new system is produced from parent system (s), it inherits from them a set of optimal crystal structures which are added to the first generation, greatly enhancing the learning power of the method.
After finishing the coevolutionary simulation, we took the most promising systems identified in it and performed longer evolutionary runs for each of them, calculating the final hardness using the Chen-Niu model 72 , and fracture toughness-using the Niu-Niu-Oganov model 74 .

DATA AVAILABILITY
The raw data required to reproduce most of the findings are available to download from [https://data.mendeley.com/datasets/jbp7rs29cc/draft?a=adad25b3-f101-4cfe-864f-6979ab6800f7]. The raw data required to reproduce the results for the Mn-H system cannot be shared at this time because they form a part of an ongoing study.