Machine learning based prediction of lattice thermal conductivity for half-Heusler compounds using atomic information

Half-Heusler compound has drawn attention in a variety of fields as a candidate material for thermoelectric energy conversion and spintronics technology. When the half-Heusler compound is incorporated into the device, the control of high lattice thermal conductivity owing to high crystal symmetry is a challenge for the thermal manager of the device. The calculation for the prediction of lattice thermal conductivity is an important physical parameter for controlling the thermal management of the device. We examined whether lattice thermal conductivity prediction by machine learning was possible on the basis of only the atomic information of constituent elements for thermal conductivity calculated by the density functional theory in various half-Heusler compounds. Consequently, we constructed a machine learning model, which can predict the lattice thermal conductivity with high accuracy from the information of only atomic radius and atomic mass of each site in the half-Heusler type crystal structure. Applying our results, the lattice thermal conductivity for an unknown half-Heusler compound can be immediately predicted. In the future, low-cost and short-time development of new functional materials can be realized, leading to breakthroughs in the search of novel functional materials.

Half-Heusler compound has drawn attention in a variety of fields as a candidate material for thermoelectric energy conversion and spintronics technology. When the half-Heusler compound is incorporated into the device, the control of high lattice thermal conductivity owing to high crystal symmetry is a challenge for the thermal manager of the device. The calculation for the prediction of lattice thermal conductivity is an important physical parameter for controlling the thermal management of the device. We examined whether lattice thermal conductivity prediction by machine learning was possible on the basis of only the atomic information of constituent elements for thermal conductivity calculated by the density functional theory in various half-Heusler compounds. Consequently, we constructed a machine learning model, which can predict the lattice thermal conductivity with high accuracy from the information of only atomic radius and atomic mass of each site in the half-Heusler type crystal structure. Applying our results, the lattice thermal conductivity for an unknown half-Heusler compound can be immediately predicted. In the future, low-cost and shorttime development of new functional materials can be realized, leading to breakthroughs in the search of novel functional materials.
Thermal conductivity is a physical quantity representing how heat is transferred from one side of a material to the other when thermal energy is applied. It is one of the most fundamental and important physical quantities. Moreover, it is an important physical quantity in terms of application, which is necessary for the understanding of thermal management to ensure the performance, life-time, and safety for thermoelectric energy conversion devices, and spintronics technology.
Thermal conductivity can be divided into electron and lattice contributions. The thermal conductivity of electrons can be determined from the electrical conductivity using the Wiedemann-Franz law. Half-Heusler compounds have a high lattice thermal conductivity due to their high crystal symmetry. Therefore, it is necessary to know the exact lattice thermal conductivity for thermal management in half-Heusler devices. Theoretical predictions of the lattice thermal conductivity of solids can be made using non-equilibrium molecular dynamics simulations [1][2][3][4][5][6][7] or the density functional theory (DFT) calculations [8][9][10][11][12][13][14][15][16] . In recent years, with the advances in machine learning (ML) algorithms, several studies on understanding the heat transport properties of functional materials have been reported [17][18][19][20][21] . The prediction of thermal conductivity using non-equilibrium molecular dynamics requires an enormous amount of computational time because of the need to calculate the time evolution of the vast amount of atomic movements. On the other hand, because the DFT calculation can accurately calculate the interaction between atoms, the thermal conductivity can be predicted for several hundred times. Presently, theoretical studies of the lattice thermal conductivity are limited to systems with a small number of atoms in the unit cell, such as simple pure metals 8,11,16 , binary materials 9,[13][14][15][16]22 , and full and half-Heusler  [27][28][29][30][31][32][33][34][35] and spintronics technology [36][37][38][39][40] .This is because it has various electronic structures, such as semi-metals, semiconductors, and a topological insulator. Carrete et al. 41 and Liu et al. 42 attempted to predict the lattice thermal conductivity of half-Heusler compounds obtained from the results of the DFT calculations by ML. Carrete et al. reported that the lattice thermal conductivity of half-Heusler compounds can be predicted in the range of 10% using the Young's modulus value obtained from the DFT calculations as the descriptor for the ML. Liu et al. reported that the lattice thermal conductivity of half-Heusler compounds can be predicted in the range of 10% using the atomic numbers, atomic masses, and atomic radii of the constituent atoms of half-Heusler compounds as the descriptors for ML.
The prediction of the lattice thermal conductivity must be highly accurate to predict the performance of thermoelectric materials and thermal management of electronic devices. Therefore, we have developed a ML algorithm to predict the lattice parameter and lattice thermal conductivity of half-Heusler compounds from the atomic information of their constituent atoms (atomic radius and atomic mass) only. This algorithm can predict lattice thermal conductivity values with high accuracy of less than 4% for many half-Heusler compounds. In this study, we found that the lattice thermal conductivity of the half-Heusler compounds, which is difficult to predict using the DFT calculations, can be predicted with high accuracy by ML. This report will provide a breakthrough in the development of new materials, as it can contribute to the discovery of innovative materials for next-generation thermoelectric conversion and spintronics materials, and other technologies requiring thermal management in the future. Figure 1 shows a flowchart of the ML used to predict the lattice parameter and lattice thermal conductivity in half-Heusler compounds. The atomic radii, r 1 , r 2 , r 3 , and atomic masses, m 1 , m 2 , m 3 , of the elements at 4c, 4a, and 4b sites in the C1 b -type crystal structure were used as descriptors. The Python library Pymatgen, Python Materials Genomics 43 , was used to obtain the elemental information. The swapped data set of the elemental information of the 4a and 4b sites was also created because the 4a and 4b sites are interchangeable. First, to build a ML model to predict the lattice parameters, the atomic radii and masses were used as descriptors. Second, to build a ML model to predict the calculated lattice thermal conductivity, the parameters generated by various combinations of atomic masses, atomic radii, and the lattice parameters were used. The parameters of the combinations of atomic radii and atomic masses used for lattice thermal conductivity prediction are listed in Table S2 in the Supplementary materials section. Finally, to find the best combination of parameters, we used the Wrapper method with a backward feature elimination to sequentially remove unimportant parameters hence build an optimal ML model for predicting the lattice thermal conductivity. For lattice parameter and lattice thermal conductivity prediction, the multiple linear regression and boosted decision tree regression models were used as the ML models. The hyperparameters were adjusted by random sweeps to adjust the optimal hyperparameters. Python 3.6 was used for implementing the ML model. The training and test data were split between 80 and 20%. We used fivefold cross validation on the data set to evaluate the decision coefficients using the test data for each fold, and the mean of the decision coefficients, R 2 , was used to evaluate the ML model. www.nature.com/scientificreports/ Figure 2a-c show the results of ML using multiple linear regression of the lattice parameters for various half-Heusler compounds obtained from the structural optimization by DFT calculations. ML with multiple linear regression predicts the lattice parameter with a higher accuracy using the atomic masses and atomic radii than that using the atomic radii only. As described in Table S1, the stability of half-Heusler compounds is affected by the atomic radii of each site and the atomic masses. Therefore, besides the atomic radii, the atomic masses have an important influence on the lattice parameter determination. The prediction of the lattice parameter using multiple linear regression can be determined with an accuracy of approximately 5%, as shown in Fig. 2c. To determine the lattice parameter with a better accuracy, we performed ML by boosted decision tree regression as shown in Fig. 2d-f. Using atomic radius and mass as parameters, R 2 improved to 0.979. As shown in Fig. 2f, the ML model reproduced the lattice parameter almost perfectly with an accuracy of approximately ± 1%. The boosted decision tree regression using the atomic radii and masses of the atoms at the three sites of the C1 b -type structure as a description was found to be the most suitable for predicting the lattice parameter by ML. Figure 3 show the results of the predicted lattice thermal conductivity for various half-Heusler compounds using ML with multiple linear regression and boosted decision tree regression. As shown in Fig. 3a, the boosted decision tree regression model shows a higher coefficient of determination than the multiple linear regression model, and reproduces the thermal conductivity of the half-Heusler compounds well. The ML model, such as a simple multiple linear regression is not suitable for ML of the lattice thermal conductivities. This result suggests that the lattice thermal conductivity exhibits a high accuracy owing to the complex interaction of various descriptors. Figure 3b shows the feature importance scores for the 55 parameters in ML of boosted decision tree regression. Among the 55 parameters, the top 4 parameters make a significant contribution to the accuracy www.nature.com/scientificreports/ of ML. It is known that when many 55 parameters are used in ML, the prediction accuracy is reduced because of over-fitting. It is necessary to find the best combination of parameters to improve the prediction accuracy of the lattice thermal conductivity. Figure 3c shows the results of the evaluation by the Wrapper Method using backward feature elimination. The backward feature elimination calculates the permutation feature importance of each parameter and sequentially removes the unimportant parameter to find the optimal combination of parameters. The R 2 for the ML of thermal conductivity using the top four parameters is the highest R 2 of 0.84, which is an improvement over the R 2 when all the parameters are considered. The best parameter combination of the important features score for the prediction of the lattice thermal conductivity are in the following order:   Figure 4c shows the number of deviations between the predicted and calculated lattice thermal conductivities by the ML model of the boosted decision tree regression with 55 and 4 parameters. When 55 parameters were used, the lattice thermal  www.nature.com/scientificreports/ conductivity was overestimated, with a deviation of approximately 8%. Conversely, when 4 parameters were used to describe the lattice thermal conductivity, the accuracy improved to approximately ± 4%. Carrete et al. 41 and Liu et al. 42 have previously reported that the accuracy of the lattice thermal conductivity prediction in half-Heusler compounds predicted by ML is approximately ± 10%, as shown in Fig. 4d. In this study, our ML model using the lattice parameter as a descriptor and selecting an appropriate combination of atomic radii and atomic masses as a descriptor led to a significant improvement in the accuracy of the prediction of lattice thermal conductivity. Figure 5 plots the relationship between three parameters and thermal conductivity, which are important in determining the lattice thermal conductivity of half-Heusler compounds. For many compounds with large lattice parameters, the lattice thermal conductivity is below 6, which is a low lattice thermal conductivity material. The lattice thermal conductivity of a solid material is expressed by the equation κ ⁓ C v l, where C, v, and l represent the specific heat, sound velocity, and phonon mean free path, respectively. It is essential to reduce the thermal conductivity to improve the performance of thermoelectric conversion materials. For this purpose, C, v, and l should be reduced. However, it is difficult to change C and l significantly in the same crystal system. The lattice thermal conductivity is lower in a compound with a large sum of atomic mass, including the structure and small lattice parameter because the sound velocity is expressed as a function inversely proportional to the density. Therefore, for compounds with small lattice parameters, the decrease in thermal conductivity can be qualitatively explained by the decrease in sound velocity. In half-Heusler compounds with lattice parameters of 0.7 to 0.8, the thermal conductivity is below 4 for the systems BaNaSb, KBaSb, CaCdSn, and KSrSb with large differences between the average atomic mass and 4c sites in the compounds. Even in half-Heusler compounds with large lattice parameters, small lattice thermal conductivities can be achieved by selecting the heavy elements to occupy the 4c site.

Results and discussions
The material design of half-Heusler compounds has focused on compounds with elements from group 1 to 4 applied to the 4a site, elements from group 12 to 16 applied to the 4b site, and elements from group 8 to 11 applied to the 4c site. This study revealed that many half-Heusler compounds contain the group 14-16 elements to the 4c site and group 1 to 2 elements to the 4a and 4b sites. When heavier elements, such as Sn and Sb, are occupied to the 4c site, the difference between the average atomic weight in the compound and the 4c site is particularly large, and compounds are predicted to actually show a significant decrease in thermal conductivity. In the future, the synthesis of these groups of materials will lead to new low thermal conductivity half-Heusler compounds.
The computational cost is defined as the computation time required to perform a calculation, considering the number of CPUs used to accomplish the computational process. A computation time of only 0.5 h was required to develop the prediction model of thermal conductivity using ML, whereas the corresponding DFT calculations for a single compound required approximately 72 h. Therefore, the computation time required to build a model for predicting thermal conductivity using ML is sufficiently smaller than that required for the DFT calculations. This implies that the ML-based prediction models are more efficient than the DFT method for thermal conductivity calculations. Thus, ML-based modeling is a highly effective tool for predicting the thermal conductivity of half-Heusler compounds.

Conclusion
In this study, we investigated whether the lattice thermal conductivity of half-Heusler compounds can be predicted by ML from the atomic radii and masses of the constituent elements. The results show that the lattice thermal conductivity of the half-Heusler compounds can be predicted with an accuracy of ± 4% using the predicted www.nature.com/scientificreports/ lattice parameters and the atomic radii and masses. In addition to the conventional material design in which a material with a small lattice parameter and low density has a low thermal conductivity, it was found that even for materials with a large lattice parameter, one can design a material with a low thermal conductivity by selecting elements occupying the 4c site of the half-Heusler structure with a larger atomic mass than those occupying the 4a and 4b sites. Using the results of ML, the thermal conductivity of unknown half-Heusler compounds can be instantly predicted. It is expected that the search for half-Heusler compounds with the desired thermal conductivity will become easier and the development of functional materials in a short time and at a low cost will be advanced in the future.

Calculation methods
Evaluation of site selection of the half-Heusler structure and lattice parameter. Candidate half-Heusler compounds for lattice thermal conductivity calculations were sought from the Materials Project 44 . In the crystal structure of C1 b -type half-Heusler compounds, there are three types of atomic sites: 4a (0, 0, 0), 4b (0.5, 0, 0), and 4c (0.25, 0.25, 0.25) sites. The 4a and 4b sites are crystallographically interchangeable. Three structural models can be considered as one of the three constituent elements occupies the 4c site and the other two elements occupy the 4a and 4b sites. The lattice parameters were optimized using the DFT calculations for the three models. By comparing the total energies of the three models, the structural model with the lowest energy was determined to be the most stable structural model for the half-Heusler compound. The DFT calculations were performed using the Vienna ab initio Simulation Package (VASP) [45][46][47] . We adopted the projector augmentedwave (PAW) method 48,49 with the generalized gradient approximation of Perdew, Burke, and Ernzerh 50 for the exchange-correlation interactions. The k-mesh, cut-off energy and convergence energy were set to 7 × 7 × 7, 600 eV, and 10 -6 eV, respectively.
Evaluation of lattice thermal conductivity of half-Heusler compounds. The lattice thermal conductivity of half-Heusler compounds was calculated using the Phono3py code, developed by Atushi Togo et al. 22 .
The interatomic force constants were determined using various positions of the atoms in a supercell made of 2 × 2 × 2 primitive cells. Although the lattice thermal conductivity shows a temperature dependence, herein, we discuss the lattice thermal conductivity at 300 K. The calculated lattice thermal conductivities and lattice parameters are summarized in Table S1 in the Supplementary materials section. www.nature.com/scientificreports/