Introduction

Machine learning (ML) is increasingly used in materials informatics as an effective tool for discovering quantitative structure–property or composition–property relationships that can accelerate materials design1,2,3,4,5. However, black-box ML models are often criticised for being unable to provide new “physical laws”, which limits their potential in certain cases6,7. Symbolic regression (SR) is an interpretable machine learning approach that simultaneously searches for the optimal mathematical form of a function and the set of parameters in that function1,8. SR is therefore capable of delivering interpretable mathematical formulas that may provide direct guidance for materials design. Despite this great potential, its application in materials science is still limited.

In this communication, we demonstrate that SR can construct a simple descriptor that accelerates the discovery of oxide perovskite catalysts. Oxide perovskites (ABO3) are an important family of catalysts for the oxygen evolution reaction (OER)9,10 because of their structural flexibility, compositional versatility, and chemical stability13. OER catalysts are in high demand for renewable energy production and storage, such as hydrogen production from water splitting11 and rechargeable metal–air batteries12. Moreover, oxide perovskites have recently been extended to bifunctional catalysis of the OER and the oxygen reduction reaction14,15. The catalytic activities of oxide perovskites can be described by descriptors, as demonstrated by various studies over the past sixty years. Several descriptors, such as the reaction free energy16,17 and the eg occupancy9,18, have been used with great success to understand the trend of OER activity. Nevertheless, those descriptors require prior knowledge from density functional theory (DFT) calculations and therefore have limited applicability to the design of new materials, for which DFT-calculated values are unknown a priori and depend strongly on the methodology used19. Moreover, it is difficult for DFT calculations to determine the eg occupancy accurately when the surface spin state is not well known20. A good descriptor should be simple and yet provide physical insight21, so that it can guide and accelerate the discovery of new perovskite oxide OER catalysts. In this work, we propose that SR is well suited to identifying such a descriptor and thereby accelerating the discovery of new perovskite catalysts.

Figure 1 shows the workflow of this study. SR analysis does not necessarily require massive datasets, provided that the datasets used are consistent and reliable1,22. Therefore, we first synthesised 18 well-studied oxide perovskite catalysts to produce a consistent and comparable dataset of OER activities for SR analysis. A descriptor that balances simplicity and accuracy was then chosen and used to develop strategies for accelerating the discovery of new oxide perovskites. The generality of the descriptor was confirmed by analysing data reported independently by other research groups. Based on this descriptor, materials screening was conducted to search for new oxide perovskite catalysts with improved OER activities. To validate the predictions, a small number of new oxide perovskites with potentially high OER activity were synthesised, and their OER activities were characterised and compared with the predicted values and with those of current state-of-the-art oxide perovskite catalysts.

Fig. 1: Workflow diagram.

It contains four major parts: dataset generation (blue), SR (red), materials design and screening (green) and experimental verification (brown).

Results

Data acquisition

Comparable training data are of crucial importance for SR to produce useful mathematical formulas23. Since the first discovery of the oxide perovskite LaNiO3 as an OER catalyst in the 1970s (ref. 24), varying the A- or B-site cations has been used to tune the OER activity, which is permitted by the structural and chemical flexibility of the perovskite structure. The results reported by different groups under different experimental conditions over half a century are summarised in a recent review article13. However, the comparability of those data is doubtful because of the different experimental and measurement environments. To ensure a meaningful and valuable SR analysis, we synthesised eighteen known oxide perovskite catalysts (Supplementary Fig. 1; Supplementary Table 1). Four samples were made for each perovskite, and the OER measurement of each sample was conducted three times under the same conditions with freshly made catalyst inks. For each measurement, the VRHE values at five current densities (50 µA cm−2, 5 mA cm−2, 10 mA cm−2, 15 mA cm−2 and 20 mA cm−2) in the linear sweep voltammetry (LSV) curve were adopted for SR analysis. In total, there are therefore 18 perovskites × 4 samples × 3 measurements × 5 current densities = 1080 data points. The values were then normalised by the catalyst loading amount and the Brunauer–Emmett–Teller (BET) surface area (Supplementary Table 2) and are shown in Fig. 2a and Supplementary Data 15. Details of the materials synthesis, along with the structural and OER characterisation, can be found in the “Methods”. Seven of these oxide perovskite catalysts were also reported by Suntivich et al.9, and the results from both groups show the same trend in VRHE values (Supplementary Fig. 2), although the absolute values differ slightly.

Fig. 2: Data collection and process.

a The landscape of all VRHE data produced by experiments, covering eighteen conventional and five new perovskites (twenty-three perovskites in total, listed as ‘Materials index’ in the sequence shown in Table 1). Four samples were made of each perovskite and each sample was measured three times (twelve measurements in total, listed as ‘No. of measurements’). For each measurement, we adopted the VRHE values at five current densities of 50 µA cm−2, 5 mA cm−2, 10 mA cm−2, 15 mA cm−2, and 20 mA cm−2. The exact values of those data points are provided in Supplementary Data 15. b The flowchart of symbolic regression based on genetic programming (see the Supplementary Information for more details of this flowchart and of SR).

SR training

With the experimental data shown in Fig. 2a, SR was then used to construct mathematical formulas linking the materials parameters to VRHE. To ensure that the SR analysis yields formulas that are useful for our purpose, it is critical to select, on the basis of prior knowledge, the relevant parameters to be included in the formulas1. Considering the importance of previous descriptors3,9,12,13,25,26, we chose electronic parameters such as the number of d electrons of the transition-metal (TM) ions (Nd), the electronegativity values χA and χB, and the valence state QA, as well as structural parameters such as the ionic radius RA, the tolerance factor t, and the octahedral factor μ, where A and B refer to the A- and B-site cations, respectively (Table 1). The tolerance factor, defined as \(t = \frac{r_{\mathrm{A}} + r_{\mathrm{O}}}{\sqrt{2}\,(r_{\mathrm{B}} + r_{\mathrm{O}})}\), and the octahedral factor, defined as μ = rB/rO, where rA, rB, and rO are the ionic radii of the A-site cation, the B-site cation, and the oxide anion, are commonly used features in ML studies of perovskites23,27,28.
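The two structural features and the ratio μ/t follow directly from the ionic radii. The sketch below is a minimal illustration of those definitions, not the authors' code. The function names are hypothetical, and the example radii (in Å) are illustrative inputs close to tabulated Shannon radii for Sr2+, Ti4+ and O2−, not values taken from Table 1.

```python
import math


def tolerance_factor(r_a: float, r_b: float, r_o: float) -> float:
    """Tolerance factor t = (r_A + r_O) / (sqrt(2) * (r_B + r_O))."""
    return (r_a + r_o) / (math.sqrt(2) * (r_b + r_o))


def octahedral_factor(r_b: float, r_o: float) -> float:
    """Octahedral factor mu = r_B / r_O."""
    return r_b / r_o


# Illustrative radii in angstrom (assumed inputs, not data from this study).
r_a, r_b, r_o = 1.44, 0.605, 1.40

t = tolerance_factor(r_a, r_b, r_o)
mu = octahedral_factor(r_b, r_o)
descriptor = mu / t  # the mu/t ratio discussed in the text

print(f"t = {t:.4f}, mu = {mu:.4f}, mu/t = {descriptor:.4f}")
```

For mixed A- or B-site occupancy, an effective radius would have to be defined first (for example an occupancy-weighted mean); that choice is not specified in this section and is left out of the sketch.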

Table 1 Key materials parameters of 23 selected oxide perovskites.

The mathematical formulas were then generated and selected using SR with genetic programming (GPSR) as implemented in the gplearn code29. The flowchart of the GPSR process in this work is shown in Fig. 2b. SR initially builds a population of random mathematical formulas with these parameters as variables. These formulas then breed, mutate, and evolve to form new ones via genetic programming. The derived formulas compete to model the experimental data, with fitness evaluated as the mean absolute error (MAE) between the predicted and experimental VRHE. A grid search over the hyper-parameters resulted in ~8640 mathematical formulas (descriptors), which were characterised by their MAEs and complexities, as shown in Fig. 3a. The hyper-parameter setup can be found in the Methods section, and further information about GPSR can be found in the Supplementary Information.

Fig. 3: Descriptor generation and performance.

a Pareto front of MAE vs. complexity of 8640 mathematical formulas shown as a density plot. b VRHE vs. μ/t (black diamonds: conventional perovskites; red dots: new perovskites). The current densities are normalised by the BET surface areas (Supplementary Table 2) and the loading amount. c Figure 2 from the study of Suntivich et al.9, reproduced with permission from the American Association for the Advancement of Science. d Data of c replotted against the descriptor μ/t. The MAEs (Pearson correlation coefficients) for c and d were 20.6 meV (0.923) and 21.0 meV (0.928), respectively. The error bars in b span the maximum and minimum values of the experimental data.

Descriptor generation and analysis

Of the descriptors produced, only those with low MAE (high accuracy) and low complexity are suitable for guiding the discovery of new oxide perovskite catalysts. The nine mathematical formulas at the Pareto front (marked A–I in Fig. 3a) that met the criteria of simplicity and accuracy among the 43,200,000 candidates are shown in Table 2. Among them, μ/t is the best compromise between complexity and accuracy. To show the correlation clearly, the VRHE values at a current density of 5 mA cm−2 are plotted against μ/t in Fig. 3b. For each perovskite, the average value and the error bar are obtained from the 12 measurements (4 samples with 3 measurements each). Interestingly, the plot shows a linear, monotonic behaviour instead of the volcano shape prevalent for conventional descriptors. Such linear correlations persist at the other current densities, i.e. 50 µA cm−2, 10 mA cm−2, 15 mA cm−2 and 20 mA cm−2, as shown in Supplementary Fig. 3. To further verify the generality of this descriptor, we used μ/t to fit the data of the experimental work9 that originally reported the volcano shape for the eg descriptor (Fig. 3c). As shown in Fig. 3d, μ/t gives a clear linear, monotonic correlation with VRHE, with an MAE comparable to that of the volcano fit for the eg descriptor. Apart from the seminal work of ref. 9, the generality of μ/t is also confirmed by recent works30,31,32, whose data are reorganised in Supplementary Fig. 4. For experimental data spanning over sixty years from different groups (Table 6 of ref. 13), the VRHE values were reorganised according to their μ/t values; despite some discrepancies, a roughly linear correlation was observed for the majority of the data points (Supplementary Fig. 5). Such good correlation shows that SR-derived descriptors such as μ/t indeed provide meaningful insight into the OER activity of oxide perovskites.
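The quality of a linear descriptor is summarised in the text by two numbers, the MAE of a straight-line fit and the Pearson correlation coefficient. The sketch below shows one way these could be computed with the Python standard library. The (μ/t, VRHE) pairs are hypothetical placeholders chosen only to make the example run; they are not measurements from this work.

```python
import math
from statistics import mean


def linear_fit(x, y):
    """Ordinary least-squares slope and intercept of y against x."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


def mean_absolute_error(y_true, y_pred):
    """Mean of |observed - predicted|."""
    return mean([abs(a - b) for a, b in zip(y_true, y_pred)])


def pearson_r(x, y):
    """Pearson correlation coefficient between x and y."""
    mx, my = mean(x), mean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical placeholder data: descriptor values and potentials in volts.
mu_over_t = [0.40, 0.42, 0.44, 0.46, 0.48]
v_rhe = [1.50, 1.54, 1.57, 1.62, 1.65]

slope, intercept = linear_fit(mu_over_t, v_rhe)
predicted = [slope * x + intercept for x in mu_over_t]
mae = mean_absolute_error(v_rhe, predicted)
r = pearson_r(mu_over_t, v_rhe)
print(f"slope = {slope:.3f}, MAE = {mae * 1000:.1f} mV, r = {r:.3f}")
```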

Table 2 The nine mathematical formulas at the Pareto front in Fig. 3a.
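Selecting formulas at the Pareto front means keeping only those for which no other formula is at least as simple and at least as accurate, and strictly better in one of the two. The following is a minimal sketch of that selection, assuming each candidate is summarised by a complexity and an MAE. The candidate expressions and their numbers are hypothetical placeholders, not entries from Table 2.

```python
from typing import List, NamedTuple


class Candidate(NamedTuple):
    expression: str
    complexity: int
    mae: float


def dominates(a: Candidate, b: Candidate) -> bool:
    """True if a is no worse than b on both criteria and strictly better on one."""
    no_worse = a.complexity <= b.complexity and a.mae <= b.mae
    strictly_better = a.complexity < b.complexity or a.mae < b.mae
    return no_worse and strictly_better


def pareto_front(candidates: List[Candidate]) -> List[Candidate]:
    """Return the non-dominated candidates, sorted by increasing complexity."""
    front = [c for c in candidates
             if not any(dominates(other, c) for other in candidates)]
    return sorted(front, key=lambda c: c.complexity)


# Hypothetical placeholder candidates (complexity in nodes, MAE in volts).
candidates = [
    Candidate("mu", 1, 0.050),
    Candidate("t", 1, 0.060),
    Candidate("mu/t", 3, 0.030),
    Candidate("sqrt(mu)/t", 4, 0.028),
    Candidate("mu/t + Nd", 5, 0.031),
]

for c in pareto_front(candidates):
    print(c.expression, c.complexity, c.mae)
```

Choosing a single descriptor from the front, as the text does for μ/t, remains a judgement about how much accuracy is worth an extra unit of complexity.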

The descriptor μ/t reveals that the OER activity of oxide perovskite catalysts is closely related to the structural factors of the catalysts: a smaller μ and a larger t should lead to higher OER activity. Such a simple descriptor is superior to conventional descriptors because it does not require additional DFT calculations and can be used directly for materials design. Accordingly, we used a rational strategy to accelerate the screening process: adopting large cations on the A site (increasing t) and small cations on the B site (decreasing μ). Previously, the A-site cations commonly used in oxide perovskite catalysts have been group IIA (Ca, Sr, Ba) and group IIIB (La, Ce, Pr) elements13. Based on the insight from the new descriptor, we considered incorporating large group-IA elements (K, Rb, Cs) on the A site to increase t. Among the TM ions that can form perovskite oxides, 3d TM ions have the smallest ionic radii, which is consistent with the fact that all existing active oxide perovskite catalysts contain Mn, Fe, Co, and Ni cations (the smallest among the 3d TM ions) on the B site. 4d/5d TM oxide perovskites are catalytically less active, despite having similar d-electron configurations. Therefore, we considered compositions in which the A site contains up to two ions from (K1+, Rb1+, Cs1+, Ca2+, Sr2+, Ba2+, La3+, Ce3+, Pr3+) and the B site contains up to eight ions from (Mn3+, Mn4+, Fe3+, Fe4+, Co3+, Co4+, Ni3+, Ni4+), with the A- and B-site ionic ratios varied in increments of 0.25. Note that the actual stoichiometric ratios depend on the synthesis conditions and the formability of the target perovskites. Subject to the requirement of charge balance, 3545 oxide perovskites were obtained and their μ/t values were calculated. These oxide perovskites are listed in Supplementary Data 6 in order of increasing μ/t value. Many of them have μ/t values smaller than those of materials reported in the literature, revealing a new and large group of previously unexplored OER catalysts.
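The screening step combines an enumeration of site occupancies with a charge-balance filter. The sketch below illustrates both for the A site and for a single candidate composition, under the assumption that charge balance means the occupancy-weighted A- and B-site charges sum to +6 for ABO3. The function names are hypothetical, the B-site oxidation states in the example are assigned only for illustration, and the sketch makes no attempt to reproduce the reported count of 3545 compositions.

```python
from fractions import Fraction
from itertools import combinations

# Formal charges of the candidate A-site cations listed in the text.
A_CHARGES = {"K": 1, "Rb": 1, "Cs": 1, "Ca": 2, "Sr": 2, "Ba": 2,
             "La": 3, "Ce": 3, "Pr": 3}
# Candidate B-site cations, keyed by species and oxidation state.
B_CHARGES = {"Mn3+": 3, "Mn4+": 4, "Fe3+": 3, "Fe4+": 4,
             "Co3+": 3, "Co4+": 4, "Ni3+": 3, "Ni4+": 4}
STEP = Fraction(1, 4)  # occupancy increment of 0.25


def a_site_occupancies():
    """Yield every A-site occupancy with one or two cations in steps of 0.25."""
    for ion in A_CHARGES:
        yield {ion: Fraction(1)}
    for first, second in combinations(A_CHARGES, 2):
        for k in (1, 2, 3):
            yield {first: k * STEP, second: 1 - k * STEP}


def site_charge(occupancy, charges):
    """Occupancy-weighted formal charge of one site."""
    return sum(fraction * charges[ion] for ion, fraction in occupancy.items())


def is_charge_balanced(a_occupancy, b_occupancy):
    """ABO3 is neutral when the cation charges sum to +6 (three O2- anions)."""
    total = site_charge(a_occupancy, A_CHARGES) + site_charge(b_occupancy, B_CHARGES)
    return total == 6


# Ba0.75Sr0.25NiO3: a 2+ A site needs Ni in the 4+ state to balance.
a_site = {"Ba": Fraction(3, 4), "Sr": Fraction(1, 4)}
print(is_charge_balanced(a_site, {"Ni4+": Fraction(1)}))  # True
print(is_charge_balanced(a_site, {"Ni3+": Fraction(1)}))  # False
```

Exact fractions are used instead of floats so that the equality test for neutrality is not affected by rounding.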

Screening, synthesis and characterisation of new oxide perovskite catalysts

The formability and stability of these 3545 oxide perovskites have not been verified. Therefore, we selected thirteen new oxide perovskites with the smallest μ/t values (the topmost region in Supplementary Data 6), spaced at increments of ~0.015 in μ/t to provide sufficient elemental and compositional diversity for experimental verification. These thirteen perovskite oxides are: Ba0.75Sr0.25NiO3, Cs0.4La0.6Mn0.25Co0.75O3, SrNi0.75Co0.25O3, Cs0.3La0.7NiO3, Cs0.25La0.75Mn0.5Ni0.5O3, Cs0.5La0.5Mn0.5Ni0.5O3, Sr0.25La0.75Mn0.5Fe0.5O3, Ba0.75Pr0.25Ni0.5Fe0.5O3, Cs0.6La0.4Mn0.75Co0.25O3, Cs0.5La0.5MnO3, Cs0.5La0.5Mn0.25Co0.75O3, Cs0.5La0.5Mn0.5Co0.5O3, and Cs0.25Pr0.75Mn0.25Fe0.25Co0.25Ni0.25O3. The synthesis method is described in detail in the Methods section. We found that eight of them contained significant amounts of impurity or secondary phases, as indicated by the asterisks in the powder X-ray diffraction (PXRD) patterns (Supplementary Fig. 6). For example, Cs0.5La0.5Mn0.5Ni0.5O3, Cs0.6La0.4Mn0.75Co0.25O3, Cs0.5La0.5MnO3, Cs0.5La0.5Mn0.25Co0.75O3, and Cs0.5La0.5Mn0.5Co0.5O3 showed an impurity phase of MnO4+δ (main diffraction peaks at 12° and 24°), and Ba0.75Pr0.25Ni0.5Fe0.5O3 contained Pr2O3 and NiO impurity phases. Five compounds, namely Cs0.4La0.6Mn0.25Co0.75O3, Cs0.3La0.7NiO3, Cs0.25La0.75Mn0.5Ni0.5O3, Sr0.25Ba0.75NiO3, and SrNi0.75Co0.25O3, formed pure perovskite phases, as confirmed by PXRD (Supplementary Fig. 6). The OER activities of these five new pure oxide perovskites were then characterised (Fig. 4a–c). Cs0.4La0.6Mn0.25Co0.75O3, Cs0.3La0.7NiO3, SrNi0.75Co0.25O3, and Sr0.25Ba0.75NiO3 showed lower VRHE values (higher OER activity) than Ba0.5Sr0.5Co0.8Fe0.2O3 (BSCF). The specific activities were also compared with those of state-of-the-art perovskite oxide catalysts20. We found that our materials are among the oxide perovskite catalysts with the highest specific activities10 (Supplementary Fig. 7). Remarkably, the experimental VRHE values of these new oxide perovskite catalysts follow the trend given by the SR-derived descriptor μ/t, as shown in Fig. 3b. To further verify the descriptor, the SR procedure was repeated with the five new perovskites included in the training data. Most of the derived mathematical formulas that had resided near the Pareto front (Fig. 3a), including μ/t, remained (Supplementary Fig. 8 and Supplementary Table 3). This persistence shows that adding more training examples does not significantly alter the model’s response, indicating that the model remains predictive for these new perovskites. It is worth noting that we selected a very limited number of compositions for experimental synthesis and characterisation because of limited resources. We anticipate that more of the predicted oxide perovskite catalysts with high OER activities can be synthesised and their OER activities verified experimentally.

The stability of the four new oxide perovskite catalysts with OER activities higher than those of previously reported oxide perovskite catalysts was tested galvanostatically at a disk current density of 10 mA cm−2 (Fig. 4d). We selected a higher disk current density for stability testing in order to examine the activity decay under strong polarisation conditions. Cs0.4La0.6Mn0.25Co0.75O3, Cs0.3La0.7NiO3, SrNi0.75Co0.25O3, and Sr0.25Ba0.75NiO3 showed less activity degradation than BSCF. In particular, the Sr0.25Ba0.75NiO3 electrode maintained a stable VRHE over 12 h of stability testing without significant decay. Under the same conditions, the BSCF sample degraded much faster, with only 90% retention after 9 h. After the OER durability tests, the Sr0.25Ba0.75NiO3 electrode maintained its original morphology. Scanning transmission electron microscopy (STEM) and high-resolution transmission electron microscopy (HRTEM) images revealed no significant surface amorphisation. The surfaces of the Sr0.25Ba0.75NiO3 particles maintained good crystallinity after the stability tests, as confirmed by the observation of the same lattice spacings (Fig. 5) and by elemental analysis (Supplementary Table 4). Recent work has shown that increasing the valence states of 3d TMs such as Ni and Co from 2+/3+ to 3+/4+ can boost the OER activities of LaCoO3 and LaNiO3 (ref. 33). Interestingly, apart from increasing t, Cs1+ substitution on the A site is a viable route to raising the valence states of the B-site TM ions in oxide perovskites. This is consistent with the SR-derived descriptor μ/t, since increasing the valence state inevitably reduces the ionic radius of the TM ion, which in turn reduces μ and therefore μ/t. Meanwhile, recent theoretical reports predicted that SrNiO3 should have high OER activity34. Unfortunately, the hexagonal close packing of Sr and O atoms prevents the formation of the perovskite structure. To mitigate this issue, partial substitution of Sr by La was proposed. However, partial La substitution leads to the formation of a Ruddlesden–Popper crystal structure instead of the perovskite structure31. Interestingly, the descriptor μ/t suggests that partial substitution of Sr by the larger Ba can enhance catalytic activity. Our experiments showed that Ba0.75Sr0.25NiO3 can be synthesised with the perovskite structure and that its OER activity is even higher than that of BSCF (Fig. 5), demonstrating the usefulness of the SR-derived descriptor.

Fig. 4: OER characterisations of Ba0.5Sr0.5Co0.8Fe0.2O3 and predicted new oxide perovskites.

a LSV curves. b Corresponding Tafel slopes. c Mass and specific activities. d Results of stability tests under galvanostatic conditions at 10 mA cm−2 disk current density.

Fig. 5: Morphology measurements of Ba0.75Sr0.25NiO3 before and after OER testing.

a HRTEM before a stability test. b HRTEM after a stability test. Right side: STEM atomic mapping (scale bar: 500 nm). The labelled lattice spacing is around 0.3 nm, which corresponds to the (110) lattice planes of Ba0.75Sr0.25NiO3, in good agreement with the PXRD measurements. The insets of a and b show the fast Fourier transform (FFT) of the corresponding HRTEM image. The regularly arrayed spots indicate that the crystal has high crystallinity. The HRTEM images before and after OER testing show the same lattice spacing and very similar FFT patterns, indicating that Ba0.75Sr0.25NiO3 retains good crystallinity and is a stable electrocatalyst under OER conditions. To verify the atomic distribution, STEM mapping was conducted; the even distribution of the atoms over the analysed area further supports the stability of the sample.

Discussion

These results show that, even with a small dataset, SR analysis can provide simple and meaningful descriptors, which enabled us to discover new oxide OER catalysts with improved activities; this is consistent with the successful application of small data in materials design by adaptive ML4,5. The descriptor μ/t implies that the catalytic activity of oxide perovskites is closely related to their structural stability, i.e. a lower stability leads to a higher activity. Feature analysis in the SR process shows that μ, t, and QA correlate with the catalytic activity more strongly than RA, Nd, χA, and χB (Supplementary Fig. 9). Considering that t and μ are functions of rA and rB, we also trained an SR model on the parameters rA, rB, Nd, χA, χB, and QA, without t and μ. The results are shown in Supplementary Fig. 13 and Supplementary Table 5. However, at the same complexity, the MAEs of the descriptors on this Pareto front are mostly larger than those of the descriptors discovered with μ, t, rA, Nd, χA, χB, and QA. The oxide perovskites showing improved OER activity have t > 1 (Table 1 and also Table 6 in ref. 13) and have therefore been considered unstable perovskites35. However, we found that these perovskites can be synthesised under suitable conditions. Notably, we exhaustively searched the Inorganic Crystal Structure Database (ICSD) and found that the existing oxide perovskites mostly have t < 0.95 and μ > 0.55 (Supplementary Fig. 14), whereas the oxide perovskites reported as catalysts in the last forty years lie in a small, confined range (t > 0.95 and μ < 0.55). According to the descriptor μ/t, most oxide perovskites are therefore catalytically less active, which seems consistent with the experimental observation that oxide perovskite catalysts are limited to a few types of perovskites10. A more in-depth understanding of the correlation among μ/t, catalytic activity, and structural stability is beyond the scope of the current research but deserves further study.

In summary, we used SR to identify a simple descriptor for describing the OER activity of oxide perovskite catalysts. This simple descriptor quantitatively predicted the OER activity of oxide perovskites and enabled us to rapidly discover a series of new oxide perovskite catalysts with improved OER activities. For proof of concept, we successfully synthesised five oxide perovskites and four of them exhibited OER activities surpassing those of existing oxide perovskite catalysts reported in the literature. We anticipate that more of the predicted new oxide perovskite catalysts can be synthesised and their OER activities verified. Our results demonstrate that SR is a powerful ML technique to discover physically meaningful descriptors when sufficient comparable data is available. This work suggests a new direction for discovering functional materials with improved activities.

Methods

Symbolic regression

Symbolic regression analysis based on genetic programming was performed using gplearn29, a Python library that extends the machine learning tool scikit-learn to symbolic regression. The hyper-parameter setup for gplearn is listed in Table 3. The explanation of each hyper-parameter in Table 3 is as follows:

Table 3 The setup of hyper-parameters in gplearn for GPSR.

The meanings of the genetic operations pc, ps, ph, and pp above can be found in Supplementary Fig. 10. The grid search method was used for pc, ps, and the parsimony coefficient. As shown in Table 3, there are 18 pc values from 0.5 to 0.95 with a step of 0.025, 8 ps values, and 3 parsimony coefficients. A grid search therefore contains 18 × 8 × 3 = 432 hyper-parameter combinations. More information about SR can be found in the Supplementary Information.
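The size of the grid can be checked by enumerating it. The sketch below builds one keyword-argument dictionary per grid point, using the gplearn `SymbolicRegressor` argument names `p_crossover`, `p_subtree_mutation`, `parsimony_coefficient` and `metric`. It rests on several assumptions: that pc and ps correspond to `p_crossover` and `p_subtree_mutation`, that the 18 pc values start at 0.5 and exclude the upper bound of 0.95, and that the ps and parsimony values shown are hypothetical placeholders, since the actual values are given only in Table 3.

```python
from itertools import product

# 18 crossover probabilities: 0.5, 0.525, ..., 0.925 (assumes 0.95 is excluded).
pc_values = [round(0.5 + 0.025 * i, 3) for i in range(18)]

# Hypothetical placeholders: the real 8 ps values and 3 parsimony
# coefficients are listed in Table 3, which is not reproduced here.
ps_values = [round(0.005 * k, 3) for k in range(1, 9)]
parsimony_values = [0.0005, 0.001, 0.005]

grid = [
    {
        "p_crossover": pc,
        "p_subtree_mutation": ps,
        "parsimony_coefficient": parsimony,
        "metric": "mean absolute error",  # fitness measure named in the text
    }
    for pc, ps, parsimony in product(pc_values, ps_values, parsimony_values)
]

print(len(grid))  # 18 * 8 * 3 = 432 combinations
```

Each dictionary could then be passed as `SymbolicRegressor(**params)` together with the remaining settings from Table 3. gplearn requires the genetic-operation probabilities to sum to at most 1, which the placeholder values respect.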

Experimental synthesis of oxide perovskites

The oxide perovskites were synthesised using a modified Pechini method followed by thermal calcination at 850–1000 °C under dry air/oxygen atmospheres. Briefly, the acetate or nitrate precursors of the perovskite oxides (4 mmol) were mixed in methanol/H2O (10 mL, 2:1 v:v), and citric acid (10 mmol) was added to obtain a clear sol. The mixture was dried at 120 °C and the remaining solid was calcined at 500 °C for 1 h in air. The obtained powder was then ground into a fine powder and pressed into pellets with a diameter of 15 mm using a hydraulic press at 20 MPa. Finally, the pellets were calcined at 850–1000 °C for 6 h under dry air/oxygen atmospheres.

Crystal structure characterisation

The structure and phase of the synthesised materials were examined by PXRD (Ultima III, Rigaku, Japan) and Raman spectroscopy (Bruker FT Raman Spectrometer with a laser wavelength of 532 nm). The morphology of the films was characterised using transmission electron microscopy (TEM; JEOL 3011, Japan), scanning transmission electron microscopy (STEM; Hitachi HD-2300A, Japan), and high-resolution TEM (HRTEM; Hitachi HD-3010A, Japan). Elemental compositions were determined using energy-dispersive X-ray spectroscopy (EDS; Oxford Instruments, UK) and inductively coupled plasma mass spectrometry (ICP-MS; Thermo Scientific XSeries 2 ICPMS, USA). The catalyst surface area was determined by Brunauer–Emmett–Teller (BET) analysis using a BELSORP-mini II (BEL. Japan Inc.) under a flow of N2 gas.

OER characterisation

OER characterisation was performed on a glassy carbon rotating disk electrode. First, 2 mg of catalyst was dispersed in 2 mL ethanol and 100 μL Nafion solution was added. The mixture was then sonicated for 30 min to form a homogeneous mixture. Subsequently, 90 μL of the slurry was loaded onto the surface of a glassy carbon electrode (GCE; 0.196 cm2) and the electrode was dried at room temperature. The electrolyte was purified to remove trace Fe using Ni(OH)2 powder. The OER measurements were performed using a Voltalab PGZ-301 potentiostat/galvanostat (Radiometer Analytical, France), with a Pt foil and a Ag/AgCl electrode used as the counter and reference electrodes, respectively. The loading amount of the catalysts was 0.168 mg cm−2. All potentials were plotted versus the reversible hydrogen electrode (RHE) as E(RHE) = E(Ag/AgCl) + 0.197 + 0.0591 × pH. All linear sweep voltammetry measurements were performed at a scan rate of 5 mV s−1. All OER measurements were iR-compensated (98%). Each measurement was conducted three times under the same conditions. The error bars denote the variations arising from sample synthesis and OER measurements. The stability test was performed using the controlled current electrolysis method. PXRD measurements verified that all the obtained materials had the perovskite structure.
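The reference-electrode conversion quoted above is a single linear expression. The sketch below applies it as written; the function name is hypothetical, and the example potential and pH are illustrative inputs only, since the electrolyte pH is not stated in this section.

```python
def ag_agcl_to_rhe(e_ag_agcl: float, ph: float) -> float:
    """Convert a potential (V) measured vs. Ag/AgCl to the RHE scale.

    Implements E(RHE) = E(Ag/AgCl) + 0.197 + 0.0591 * pH from the text.
    """
    return e_ag_agcl + 0.197 + 0.0591 * ph


# Illustrative inputs (assumed, not from this study): 0.5 V vs. Ag/AgCl at pH 13.
e_rhe = ag_agcl_to_rhe(0.5, 13.0)
print(f"E(RHE) = {e_rhe:.4f} V")
```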

To evaluate the intrinsic activities, the current densities were normalised by the loading amount and the BET surface area in order to exclude increases in current that result from a higher loading or a higher surface area. Normalisation was performed according to the expression: i (mA cm−2 oxide current) = i (mA cm−2 disk current) ÷ (loading amount (g cm−2) × BET surface area (cm2 g−1)). Here, i (mA cm−2 oxide current) is denoted as the normalised specific activity, while i (mA g−1 oxide current) = i (mA cm−2 disk current) ÷ (loading amount (g cm−2)) refers to the mass activity.
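The two normalisations above differ only in whether the BET surface area enters the denominator. The sketch below works through both with the loading of 0.168 mg cm−2 stated in the Methods. The function names are hypothetical, and the BET surface area and disk current density are illustrative values, not entries from Supplementary Table 2.

```python
def mass_activity(i_disk_ma_cm2: float, loading_g_cm2: float) -> float:
    """Mass activity in mA per gram of oxide: disk current density / loading."""
    return i_disk_ma_cm2 / loading_g_cm2


def specific_activity(i_disk_ma_cm2: float, loading_g_cm2: float,
                      bet_cm2_g: float) -> float:
    """Specific activity in mA per cm2 of oxide surface.

    Disk current density divided by (loading * BET surface area).
    """
    return i_disk_ma_cm2 / (loading_g_cm2 * bet_cm2_g)


loading = 0.168e-3       # g cm-2, i.e. 0.168 mg cm-2 as stated in the Methods
bet_area = 10.0 * 1.0e4  # cm2 g-1; illustrative 10 m2 g-1 (assumed value)
i_disk = 10.0            # mA cm-2 disk current density (illustrative)

i_mass = mass_activity(i_disk, loading)
i_specific = specific_activity(i_disk, loading, bet_area)
print(f"mass activity = {i_mass:.1f} mA g-1")
print(f"specific activity = {i_specific:.4f} mA cm-2 oxide")
```

Note the unit conversions: the loading must be in g cm−2 and the BET area in cm2 g−1 for the product in the denominator to be dimensionless.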