Machine-learning guided discovery of a new thermoelectric material

Thermoelectric technologies are becoming indispensable in the quest for a sustainable future. Recently, an emerging phenomenon, the spin-driven thermoelectric effect (STE), has garnered much attention as a promising path towards low-cost and versatile thermoelectric technology with easily scalable manufacturing. However, progress in the development of STE devices is hindered by the lack of understanding of the fundamental physics and materials properties responsible for the effect. In such a nascent scientific field, data-driven approaches relying on statistics and machine learning, instead of more traditional modeling methods, can exhibit their full potential. Here, we use machine learning modeling to establish the key physical parameters controlling STE. Guided by the models, we have carried out actual material synthesis, which led to the identification of a novel STE material with a thermopower an order of magnitude larger than that of the current generation of STE devices.

In particular, machine-learning-informed models are utilized to search for new materials, including potential magnets 22 , ferroelectrics 23 and superconductors 24 . Although material synthesis guided by machine learning has been relatively rare so far 25 , it is very likely to become commonplace in the future.
Utilizing these methods, we have developed a systematic approach to uncovering the major materials variables governing the SSE. We combine machine learning modeling with high-throughput experimentation, and use modeling results for designing combinatorial libraries [26][27][28][29][30] . We have successfully leveraged the machine-learning-informed knowledge of the dependence of SSE on materials parameters to arrive at a novel and high-performance STE material utilizing ANE, which converts a heat current into an electrical current via the spin-orbit interaction in a single ferromagnetic material. Out of a number of proposed materials systems, a composition spread of one ternary system has led to the identification of Fe 0.665 Pt 0.27 Sm 0.065 , which exhibits thermopower as large as 11.12 μV/K.

Results
Material data collection. First, we performed experiments to collect STE data for various materials. Figure 1a shows the general configuration of one of the STE devices, which uses the spin-Seebeck effect (SSE). It is composed of a paramagnetic conductive layer, a magnetic layer, and a single-crystal substrate. We adopted a bilayer consisting of platinum (Pt) and rare-earth-substituted yttrium iron garnet (R 1 Y 2 Fe 5 O 12 , referred to as R:YIG), where R stands for a rare-earth element. When a temperature difference ΔT and a magnetic field H are applied along the z and the x direction, respectively, one can detect the thermopower S STE along the y direction. Thus, this device converts thermal energy into electric energy. The details of the STE device are given in Supplementary Information 1.
The substitution in R:YIG of different rare-earth elements changes the physical properties of the material, and allows us to study the impact this has on the STE phenomena. Figure 1b shows the thermopower measured for Pt/R:YIG samples fabricated on Gd 3 Ga 5 O 12 (GGG) and Gd 2.675 Ca 0.325 Ga 4.025 Mg 0.325 Zr 0.65 O 12 (SGGG) substrates, with various rare-earth elements R: La, Ce, Pr, Nd, Sm, Eu, Gd, Tb, Dy, Ho, Er, Tm, Yb, and Lu (Pm, which is radioactive, was excluded from the study). It is clear that S STE depends strongly on the choice of the R element, and the differences can be dramatic; for example, the response of Pt/Yb:YIG on GGG is about three times as large as that of Pt/YIG on GGG.
The striking dependence of S STE on the choice of R suggests that the physical parameters of R strongly influence STE. To expose and quantify this connection we employed machine learning. As a first step, we considered different descriptors that can encode the properties of the rare-earth elements, such as the atomic weight n R , the spin and orbital angular momenta S R and L R , the lattice mismatch Δa between R:YIG and the substrate, the number of unfilled orbitals, elemental melting temperatures, magnetic moments, ground-state volumes, etc. It is, however, difficult to experimentally isolate and extract the dependence of S STE on a given physical parameter of R. To delineate the relation between the atomic weight n R and S STE , for instance, it would be necessary to measure S STE for different n R while keeping all other predictors fixed, which is experimentally not feasible. In order to uncover the subtle correlations and the physical origin of the STE hidden in the initial experimental results, we first calculated the Pearson correlation coefficients (PCC) [shown in Fig. 1c]. There appear to be roughly linear relationships between S STE and Δa, n R , S R and L R . However, the absolute values of the PCC with respect to S STE are less than 0.5, which makes the results inconclusive. In order to extract more reliable information, we carried out further investigation using supervised machine learning.
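For illustration, the PCC used in this screening step can be computed in a few lines. The sketch below is in Python rather than the R environment used for the modeling, and the toy data are ours, not the experimental dataset.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# A perfectly linear pair gives PCC = 1; descriptor-vs-S_STE columns with
# |PCC| < 0.5, as found here, would be considered inconclusive.
print(pearson([1, 2, 3, 4], [2, 4, 6, 8]))  # 1.0
```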
Machine learning modeling. We employed four types of supervised machine learning models: Decision Tree Regression (DTR), Elastic Net (EN), Quadratic Polynomial LASSO (QP-LASSO), and Neural Network (NN) 31 . The DTR and EN models are constrained to simple dependencies, but they are straightforward to apply and can be used to extract the dominant descriptors. In contrast, the NN model is very flexible and can reproduce highly non-linear dependencies, but it is much more difficult to interpret and prone to over-fitting. The QP-LASSO is in between the NN and the EN in terms of complexity and interpretability. In order to reduce the risk of over-fitting the available experimental data, we fix the number of descriptors to four, namely Δa, n R , S R , and L R . In the Methods section, we discuss how these parameters were chosen out of a larger set of possible descriptors.
We start by constructing a DTR model, which predicts a target variable (in this case S STE ) by learning a series of simple decision rules based on the descriptors (Δa, n R , S R , and L R ). Figure 2a shows a visualization of the DTR model, in which simple decision rules based on inequalities sort the data points quite accurately. Note that in the figure the numbers in white and the percentages denote the average S STE values and the proportion of data points in each group, respectively. By inspecting the DTR model we can directly infer the relation between S STE and the descriptors: smaller S R , smaller Δa and larger n R lead to larger S STE . However, we were not able to obtain a relationship between S STE and L R from this model.
As a next step, we constructed a generalized linear model, EN, which is a combination of Ridge and LASSO regressions. This method assumes a linear relationship between S STE and the descriptors:

S STE = β 0 + β 1 Δa + β 2 n R + β 3 S R + β 4 L R . (1)

Although linearity can be an unnecessarily strong assumption, it helps minimize over-fitting. Figure 2b shows the values of β 1 , β 2 , β 3 and β 4 obtained from the regression fit, from which we can directly read off the relationship between the descriptors and S STE : n R and L R are positively correlated with S STE , while Δa and S R are negatively correlated with it. However, the DTR and especially the EN models are very constrained and preclude proper modeling of more complicated dependencies. Therefore, to verify the patterns found by the EN, a non-linear regression analysis using the quadratic polynomial LASSO (QP-LASSO) model is applied next. We expand the linear model in equation (1) into a quadratic one:

S STE = β 0 + β 1 Δa + β 2 n R + β 3 S R + β 4 L R + β 5 Δa 2 + β 6 n R 2 + β 7 S R 2 + β 8 L R 2 + β 9 n R S R + β 10 n R L R + β 11 S R L R . (2)

(Note that interaction terms between the extrinsic (Δa) and the intrinsic (n R , S R , L R ) factors with respect to the R element were deliberately not included.) QP-LASSO performs descriptor selection by adding an L 1 regularization term. This term tends to suppress the coefficients (β 0 , …, β 11 ), so that only the ones in front of the most significant descriptors remain finite. Here, the QP-LASSO automatically selected four important descriptors: Δa, n R 2 , S R 2 and n R L R . The values of the corresponding coefficients β 1 , β 6 , β 7 and β 10 are shown in Fig. 2c. The n R 2 and n R L R terms are positively correlated with S STE , while the coefficients in front of Δa and S R 2 are negative. This agrees with the conclusions of the EN model.
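As an illustration of the quadratic expansion, the helper below builds the eleven non-constant terms of the quadratic model from the four descriptors, omitting the Δa cross terms as described in the text. This is a hypothetical Python sketch (the actual fits were done in R), and the grouping of terms reflects one plausible ordering of the coefficients.

```python
def quadratic_features(da, n_r, s_r, l_r):
    """Expand (Δa, n_R, S_R, L_R) into the non-constant terms of the
    quadratic model; cross terms mixing Δa with the intrinsic factors
    (n_R, S_R, L_R) are deliberately omitted, following the text."""
    return [da, n_r, s_r, l_r,                         # linear terms
            da * da, n_r * n_r, s_r * s_r, l_r * l_r,  # squared terms
            n_r * s_r, n_r * l_r, s_r * l_r]           # intrinsic interactions

print(quadratic_features(1, 2, 3, 4))
# [1, 2, 3, 4, 1, 4, 9, 16, 6, 8, 12]
```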
The fourth algorithm we use is the NN, by far the most flexible of the four machine learning models we have employed. The flexibility comes at the price of a significant risk of over-fitting, as well as difficulties in interpreting the results. Figure 2d shows a visualization of the NN modeling result. By carefully investigating the complex connections between the nodes (balls), we find that S STE increases with increasing n R and L R , and with decreasing Δa and S R . The details are given in Supplementary Information 2. Figure 2e shows the accuracy of the DTR, EN, QP-LASSO, and NN models. The horizontal and vertical axes are the values of S STE measured in the experiments and those predicted by the machine learning models, respectively. We see that the NN model has better accuracy than the DTR, EN and QP-LASSO models, owing to its much higher complexity. On the other hand, although the accuracy of the DTR, EN and QP-LASSO models is not as high, interpreting their implications is much more straightforward. Despite these differences, all four machine learning algorithms converge on a similar picture, in which S STE is positively correlated with n R and L R , and negatively correlated with Δa and S R .
The correlations between S STE and Δa, n R , and S R can be explained based on the conventional understanding of the STE phenomena. However, the positive correlation between L R and S STE uncovered by the machine learning models appears to be beyond our current knowledge of STE. (The details of the physical interpretation underlying these relations are discussed in Supplementary Information 3). The surprising connection between S STE and L R , discovered by the machine learning models here, can lead to a more comprehensive understanding of the mechanism of STE.
Development of a superior STE material. We now demonstrate that the unanticipated results of the machine learning modelling can indeed help us develop improved STE materials. We use the positive correlation between L R and S STE to search for other STE materials relying on the anomalous Nernst effect (ANE). The SSE and the ANE are distinct but similar spin-driven thermoelectric phenomena, both originating in the spin-orbit interaction. Thus, it is reasonable to conjecture that tuning the orbital angular momentum L R can enhance not only the SSE but also the ANE. Among conventional STE materials utilizing ANE, FePt alloy has exhibited the largest thermopower so far 32 . Therefore, we expect that adding rare-earth elements R with finite L R to the Fe-Pt alloy will increase the thermopower. As an initial example, we investigate the thermopower of the Fe-Pt-Sm ternary alloy, where we have selected Sm as one of the R elements with large L R . Figure 3a shows the general configuration of the STE device using ANE. When a temperature difference ΔT and a magnetic field H are applied along the z and the x direction, respectively, one can detect the thermopower S STE along the y direction, just like in the case of SSE shown in Fig. 1a. In order to optimize the composition within the ternary alloy, various Fe-Pt-Sm alloys with different compositions were investigated. The details of this device are provided in Supplementary Information 4. Figure 3b shows the measured S STE values for Fe-Pt-Sm as a function of composition. The largest S STE was detected near the composition Fe 0.7 Pt 0.3 Sm 0.05 . To investigate the region outlined by the green rectangle in greater detail and to confirm the contribution of L R (Sm), we measured the thermopower of (Fe 0.7 Pt 0.3 ) 1−x M x alloys with small amounts of different M atoms (M = Sm, Gd, Cu and W). Of these elements only Sm has a finite orbital angular momentum, while the others have L R = 0.
Figure 3c shows the S STE of the (Fe 0.7 Pt 0.3 ) 1−x M x . It is clear that a small amount of Sm is necessary to maximize S STE . With Gd, Cu and W, there is no S STE enhancement, thus confirming the crucial role of large L R .

Discussion
To benchmark the S STE of the Fe-Pt-Sm material, we compare it with those of other STE materials utilizing ANE 32 . According to Ikhlas et al., the S STE of most ferromagnetic materials is on the order of 0.1 μV/K. In comparison, the largest S STE obtained here (11.12 μV/K for Fe 0.665 Pt 0.27 Sm 0.065 in Fig. 3c) is at least one order of magnitude larger than those of other known ANE materials. It is interesting to note that while some of the previously known materials with the highest S STE values consist of Fe and Pt at 50/50 concentration (Pt/Fe multilayers and L1 0 FePt, with S STE ≤ 1 μV/K), our investigation shows that the optimum Fe-Pt composition occurs at around Fe 0.7 Pt 0.3 (with S ANE of ≈7 μV/K), which is then further enhanced by Sm substitution as discussed above. Thermoelectric generators enable the reuse of ubiquitous waste heat, and are becoming indispensable components in energy harvesting systems. Heat flow sensors, on the other hand, can be used in smart thermal management systems for mapping the flow of thermal energy. In order to make STE devices practical for either of these applications, significant improvement of their thermopower is necessary. The identification of a novel material with dramatically enhanced thermopower reported here demonstrates that such improvements are possible, and that STE technologies present a viable path towards a more energy-efficient future.
In summary, we have demonstrated the utility of machine learning both in exploring the fundamental physics of the STE phenomena and in optimizing the materials harnessing these effects. The data-driven approach has allowed us to construct unbiased statistical models for STE, which led us to a materials design rule not rooted in the conventional theory of STE. Combining this rule with experimentation, we have discovered an STE material with S STE an order of magnitude larger than that of any previously known ANE material. Thus, machine learning was the key to an important step in turning STE into a practical and affordable technology.

Methods
Fabrication of STE devices using SSE (Pt/R:YIG/GGG or SGGG). The fabrication of the STE devices using SSE followed two steps. First, the R:YIG layer was formed on the substrate (GGG or SGGG, 500 μm thickness) by means of the metal-organic-decomposition (MOD) method 33 . The MOD solution contains R, Y and Fe carboxylates dissolved in organic solvents, with chemical composition R:Y:Fe = 1:2:5. The MOD solution was spin-coated on the substrate at 1000 r.p.m. for 30 seconds, and then dried at 150 °C for 5 minutes. After pre-annealing at 450 °C for 5 minutes, it was annealed at 700 °C for 14 hours in air to form a crystallized R:YIG layer, whose thickness was estimated to be 60 nm by interference thickness measurement. After completion of the R:YIG layer, a 10-nm-thick Pt layer was deposited on top by sputtering. For the measurement, the devices were cut into small chips with a length and width of 8.0 mm and 2.0 mm, respectively.

Fabrication of STE devices using ANE (Fe-Pt-Sm/SiO 2 /Si). The STE devices using ANE were fabricated as follows. The Fe-Pt-Sm film with a composition gradient was deposited on a 3-inch SiO 2 /Si wafer by combinatorial sputtering at room temperature. The thicknesses of the Fe-Pt-Sm, SiO 2 and Si layers are 150 nm, 0.5 μm and 381 μm, respectively. For the measurement, the wafer was cut into small chips with length and width identical to those cut from the Pt/R:YIG film.

Measurement of the STE thermopower S STE .
A temperature difference ΔT directed along the z direction as shown in Figs 1a and 3a was applied between the top and the bottom of the devices, by sandwiching them between copper heat baths at 300 K and 300 + ΔT K. The magnetic field H was applied along the x direction. Under these conditions, the STE thermopower can be detected along the y direction. The distance between the voltage-detection terminals was set to 6 mm.

Selecting descriptors and hyper-parameters for the machine learning models. One major issue in developing machine learning models is avoiding over-fitting. As a general rule, when the amount of available data is small, the number of descriptors should be constrained. For example, Seko et al. 34 employed machine learning to predict melting temperatures with 248 data points (compounds) and 10 descriptors (predictors). In our case there are only 112 data points (see Fig. 2e), and it is therefore advisable to use an even smaller number of descriptors. We have considered a number of descriptors covering different properties of the rare-earth elements. These include atomic weight, spin and orbital angular momenta, number of unfilled orbitals, melting temperatures, magnetic moments, volumes and space groups (the last four are calculated for the elemental ground state). The Magpie software was used to generate some of these 35 . For data pre-processing, we calculated the PCC in order to detect multicollinearity. Almost all of the descriptors calculated by the Magpie software have high PCC values with respect to either Δa, n R , S R or L R . These four descriptors are easy to interpret and connect to the STE phenomenology, and at the same time models built with only these have accuracy comparable to that of models utilizing the full list of descriptors. In the future, we hope to increase the size of the experimental dataset and thus be able to include more descriptors in the model.
The hyper-parameters of the models were decided with the help of Leave-One-Out Cross-Validation (LOOCV), a widely used model validation technique. In this scheme one data point is retained as validation data for testing the model, while the rest of the dataset is used as training data. The hyper-parameters of the model are determined by minimizing an error indicator, such as the root mean square error (RMSE) or the mean square error (MSE), on the held-out points. In this paper, we used the RMSE as the error indicator. The LOOCV was carried out using the "caret" package in the R programming language.
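LOOCV with an RMSE score can be written generically in a few lines. The following Python sketch is our illustration of the mechanics, not the "caret" implementation, and uses a trivial predict-the-mean model on toy data.

```python
import math

def loocv_rmse(xs, ys, fit, predict):
    """Leave-one-out CV: fit on all-but-one point, score on the held-out one."""
    errs = []
    for i in range(len(xs)):
        tr_x = xs[:i] + xs[i + 1:]
        tr_y = ys[:i] + ys[i + 1:]
        model = fit(tr_x, tr_y)
        errs.append((predict(model, xs[i]) - ys[i]) ** 2)
    return math.sqrt(sum(errs) / len(errs))

# Toy "model": always predict the training mean (hyper-parameter-free baseline).
fit = lambda xs, ys: sum(ys) / len(ys)
predict = lambda m, x: m
print(loocv_rmse([0, 1, 2, 3], [1.0, 1.0, 3.0, 3.0], fit, predict))  # ≈ 1.333 (= 4/3)
```

In the actual workflow this loop would be run once per candidate hyper-parameter value, keeping the value with the smallest RMSE.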
Decision tree regression (DTR). The Decision Tree Regression is a non-parametric machine learning model based on a series of simple decision rules, which combines flexibility with interpretability. The only model hyper-parameter, the complexity parameter (cp), was set to 3.90625 × 10 −3 by LOOCV, with a cross-validation RMSE of 8.560104 × 10 −2 . The DTR was carried out using the "rpart" package in the R programming language.
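A depth-1 version of such a tree (a regression "stump") is easy to write out and makes the splitting rule explicit. This is an illustrative Python sketch on toy data, not the "rpart" algorithm itself.

```python
def fit_stump(x, y):
    """Depth-1 regression tree: pick the threshold on one descriptor that
    minimizes the summed squared error of the two leaf means."""
    best = None
    for t in sorted(set(x)):
        left = [yi for xi, yi in zip(x, y) if xi <= t]
        right = [yi for xi, yi in zip(x, y) if xi > t]
        if not left or not right:
            continue
        ml, mr = sum(left) / len(left), sum(right) / len(right)
        sse = sum((v - ml) ** 2 for v in left) + sum((v - mr) ** 2 for v in right)
        if best is None or sse < best[0]:
            best = (sse, t, ml, mr)
    _, t, ml, mr = best
    return lambda xi: ml if xi <= t else mr

# Toy data with an obvious step at x = 2: the stump recovers the split
# and predicts each leaf's mean.
predict = fit_stump([0, 1, 2, 3, 4, 5], [0.1, 0.0, 0.1, 1.0, 0.9, 1.1])
print(predict(1), predict(4))
```

A full DTR applies this greedy splitting recursively over all four descriptors, with the cp hyper-parameter controlling how deep the recursion is allowed to go.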
Elastic Net (EN). The Elastic Net is a generalized linear model, a combination of Ridge and LASSO regressions. The mixing ratio of the Ridge and the LASSO (Ridge:LASSO) was set to 1:0 based on the LOOCV; therefore, in our case the EN model was equivalent to a Ridge regression. The LOOCV also determined the regularization strength (λ = 3.90625 × 10 −3 ), and the cross-validation RMSE was 8.798218 × 10 −2 . The EN was carried out using the "glmnet" package in the R programming language.
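Since the LOOCV drove the EN to pure Ridge, the fit reduces to the closed form β = (XᵀX + λI)⁻¹Xᵀy. Below is an illustrative NumPy sketch on synthetic data; the actual analysis used "glmnet" in R.

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Closed-form ridge regression: solve (XᵀX + λI)β = Xᵀy.
    Columns of X are assumed already standardized."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

# Synthetic data: y = 2·x1 - 1·x2 exactly; with tiny λ the fit
# recovers coefficients close to (2, -1).
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 2))
y = 2 * X[:, 0] - X[:, 1]
print(ridge_fit(X, y, lam=1e-6))  # ≈ [ 2. -1.]
```

Larger λ shrinks all coefficients toward zero, which is the mechanism by which the EN controls over-fitting on the 112-point dataset.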

Quadratic polynomial least absolute shrinkage and selection operator (QP-LASSO). The LASSO is a regression analysis method that performs both variable selection and regularization. The QP-LASSO selects among the quadratic, linear and constant terms. In our case the QP-LASSO selected four variables, namely Δa, n R 2 , S R 2 and n R L R , from equation (2). The LOOCV-determined regularization strength is λ = 7.55559 × 10 −3 , and the cross-validation RMSE is 8.547411 × 10 −2 . The QP-LASSO was carried out using the "glmnet" package in the R programming language.
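The L1-driven variable selection can be illustrated with a small cyclic coordinate-descent LASSO in NumPy. This is our sketch on synthetic data, not the "glmnet" solver, but it shows how uninformative coefficients are driven exactly to zero.

```python
import numpy as np

def soft_threshold(z, t):
    """L1 proximal step: shrink z toward zero by t, zeroing small values."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def lasso_cd(X, y, lam, n_iter=200):
    """Cyclic coordinate-descent LASSO (no intercept); small coefficients
    end up exactly zero, which is what performs descriptor selection."""
    n, d = X.shape
    beta = np.zeros(d)
    col_sq = (X ** 2).sum(axis=0)
    for _ in range(n_iter):
        for j in range(d):
            r = y - X @ beta + X[:, j] * beta[j]   # partial residual
            beta[j] = soft_threshold(X[:, j] @ r, n * lam) / col_sq[j]
    return beta

# Synthetic data: y depends only on the first column; the LASSO zeroes
# out the second coefficient while shrinking the first.
rng = np.random.default_rng(1)
X = rng.normal(size=(80, 2))
y = 3 * X[:, 0]
beta = lasso_cd(X, y, lam=0.5)
print(beta)  # second coefficient is exactly 0
```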
Neural Network (NN). The NN method models the data by means of a statistical learning algorithm mimicking the brain. Here we have utilized a simple 3-layer perceptron NN, with the numbers of input units, hidden units and output units being 4, 8 and 1, respectively. The hidden units and the output unit simulate the activation of a neuron by applying the hyperbolic tangent and the sigmoid functions, respectively. Mathematically, the NN models the non-linear function S STE (Δa, n R , S R , L R ) by performing the following calculation:

S STE = σ( Σ j W j (2) h( Σ i W ji (1) x i + b j (1) ) + b (2) ),

where x 1 , x 2 , x 3 , x 4 , h and σ are Δa, n R , S R , L R , the hyperbolic tangent function and the sigmoid function, respectively. The weights W ji (1) , W j (2) and the bias parameters b j (1) , b (2) are determined by minimizing the cost function with the back-propagation algorithm. The weight-decay value was set to 1.220703 × 10 −4 . The hyper-parameters, such as the number of hidden units and the decay value, were decided by LOOCV, and a cross-validation RMSE of 5.516461 × 10 −2 was achieved. For the NN analysis, we used the "nnet" package in the R programming language.
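The forward pass of such a 4-8-1 network is compact enough to write out. The NumPy sketch below uses zero weights purely as a shape check; real weights would come from back-propagation, and the input/output scaling applied by "nnet" is not reproduced here.

```python
import numpy as np

def mlp_forward(x, W1, b1, W2, b2):
    """Forward pass of a 4-8-1 perceptron: tanh hidden layer, sigmoid
    output. Shapes: x (4,), W1 (8, 4), b1 (8,), W2 (1, 8), b2 (1,)."""
    hidden = np.tanh(W1 @ x + b1)        # 8 hidden activations
    out = W2 @ hidden + b2               # scalar pre-activation
    return 1.0 / (1.0 + np.exp(-out))    # sigmoid output in (0, 1)

# Shape check with all-zero parameters: tanh(0) = 0, so the output
# is sigmoid(0) = 0.5.
x = np.array([0.1, 0.2, 0.3, 0.4])       # scaled (Δa, n_R, S_R, L_R)
W1, b1 = np.zeros((8, 4)), np.zeros(8)
W2, b2 = np.zeros((1, 8)), np.zeros(1)
print(mlp_forward(x, W1, b1, W2, b2))  # [0.5]
```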

Data Availability
The data and the code that support the results within this paper and other findings of this study are available from the corresponding author upon reasonable request.