Introduction

The lattice thermal conductivity (κL) is a key design parameter for various technological applications. For example, heat sinks in electronic devices require a high κL to dissipate excess thermal energy1, while reducing κL is an effective approach to improve the efficiency of thermoelectric (TE) conversion2. It is thus highly desirable to discover or design systems with a desired κL. On the theoretical side, the most reliable approach for predicting κL is the solution of the phonon Boltzmann transport equation (BTE) within the framework of density functional theory (DFT)3,4. However, the required calculations of the interatomic force constants (IFCs) are time-consuming, especially for systems with large unit cells and low symmetry. As an alternative, classical molecular dynamics (MD) simulations can be utilized to predict the κL of systems with complex crystal structures5. Nevertheless, the accuracy of MD depends strongly on the choice of interatomic potentials, which limits its wide application. In short, it remains challenging to accurately predict κL, especially in a high-throughput way.

As an important technique of artificial intelligence, machine learning (ML) can efficiently uncover the underlying relationships within large amounts of data at extremely low cost6,7,8,9. During the past few decades, many efforts have been devoted to evaluating the κL of various systems, both theoretically and experimentally10,11,12,13,14. Based on these available data, ML can establish a mapping between the target property (κL) and the input features (such as the atomic mass, the phonon frequency, and the volume of the unit cell8,15). Compared with first-principles calculations and MD simulations, data-driven ML models enable high-throughput evaluation of κL and exhibit strong predictive power for systems both inside and beyond the training set15,16. In addition to such direct prediction of κL, ML has been successfully used to build accurate interatomic potentials for MD simulations. Generally speaking, a machine learning potential (MLP) employs regression algorithms to determine the ab-initio potential energy surface (PES), usually adopting the atomic configurations as input features17,18. Recently, MLPs have been utilized to accurately predict the κL of systems with complex crystal structures and chemical compositions, such as alloys19, heterostructures20, and molten salts21. On the other hand, since the IFCs are derivatives of the total energy with respect to atomic displacements, they can be obtained from a Taylor expansion of the PES4,22. An MLP that accurately determines the PES can therefore deliver the IFCs at almost negligible computational cost, enabling an accelerated solution of the phonon BTE for the evaluation of κL22,23. Collectively speaking, ML can overcome the inherent disadvantages of MD simulations and first-principles calculations to accurately and readily predict κL.

The remainder of this review is organized as follows. In the “Direct prediction” section, we give a brief introduction to dataset construction, feature selection, and training algorithms, which are then combined to obtain high-throughput models for predicting κL. In the “Indirect approach” section, we focus on the construction of MLPs, and highlight their first-principles level accuracy as well as their advantages over conventional approaches in predicting κL. The review is concluded with a summary of current works and future perspectives.

Direct prediction

Dataset construction and related features

As a data-driven technique, ML requires a dataset that contains the κL of a substantial number of systems to derive a reliable prediction model. In general, the κL can be collected from first-principles calculations, MD simulations, and experimental measurements. In addition, one can obtain the κL from materials databases. For example, thousands of entries in the Automatic FLOW (AFLOW) database include κL values calculated by the so-called Automatic Gibbs Library (AGL) method12,24. Here, the GIBBS quasiharmonic Debye model25 is employed to evaluate the Debye temperature and the Grüneisen parameter based on the computationally feasible adiabatic bulk modulus, which are then inserted into the well-known Slack model for the determination of κL26. We should emphasize that it is preferable to collect all the κL values obtained using the same approach, for example, either first-principles or MD. However, if not enough data are available for ML, one can also combine both, as long as the interatomic potentials adopted in MD are well-tested and the results exhibit sufficient accuracy. Figure 1 is a schematic illustration of ML for the high-throughput prediction of κL, where the dataset is usually divided into two subsets, the training and testing sets. To avoid a purely random selection of training data, principal component analysis (PCA) can be used to identify systems with distinct features in the dataset. For example, Tranås et al. demonstrated that a model trained on a semi-random pool of half-Heusler (HH) compounds (i.e., assuming “bad luck” in the training set) was unable to correctly predict the small κL values of the remaining systems in the testing set27. As an alternative, they used active sample selection based on PCA, where three compounds with extremely low κL were included in the training process. Such an approach can significantly improve the model performance, in particular the ability to identify low-κL compounds in the testing set.
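To make the idea of PCA-based sample selection concrete, the following minimal Python sketch (using scikit-learn) projects a feature matrix onto its leading principal components and flags the compounds farthest from the centroid, one simple way to ensure that outlying systems end up in the training set. The feature matrix and the number of selected compounds are placeholders, and this illustrates the general strategy rather than the exact procedure of ref. 27.

```python
import numpy as np
from sklearn.decomposition import PCA

def select_distinct(features, n_select=3):
    """Project the feature matrix onto its two leading principal
    components and return the indices of the compounds that lie
    farthest from the centroid, i.e. the most 'distinct' systems."""
    scores = PCA(n_components=2).fit_transform(features)
    dist = np.linalg.norm(scores - scores.mean(axis=0), axis=1)
    return np.argsort(dist)[-n_select:]

# Toy usage: 50 compounds described by 6 features each.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 6))
print(select_distinct(X))
```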

Fig. 1: A schematic illustration of machine learning for the high-throughput prediction of lattice thermal conductivities.
figure 1

Three components are usually involved in machine learning: dataset construction, input features, and training algorithms.

As mentioned in the introduction, ML is implemented to establish a mapping between the κL and related input features, which usually contain information about: (1) the structural properties, such as the lattice constant, the volume of the unit cell, the number of atoms, and the bond length; (2) the elemental properties of the constituent atoms, including the atomic number, the atomic mass, and the Pauling electronegativity; (3) the phonon properties, such as the phonon frequency, the group velocity, the heat capacity, and the Grüneisen parameter. To be compatible with most ML algorithms, systems of different sizes need to be represented by feature vectors of fixed length. This problem can be solved by adopting statistical values of the elemental properties, such as the maximum, the minimum, the composition-weighted (CW) value, and the standard deviation28. Obviously, it is quite important to screen out features that are closely related to the target property. For instance, Juneja et al. revealed a high Pearson correlation between κL and several fundamental properties, including the maximum phonon frequency, the average atomic mass, the volume of the unit cell, and the integrated Grüneisen parameter up to 3 THz15. Using these input features, they developed a Gaussian process regression-based ML model by training on the κL of 120 dynamically stable and nonmetallic compounds. To keep only those features that are highly related to κL, Chen et al. performed recursive feature elimination (RFE) on the initial feature vector, which significantly reduced its dimensionality from 63 to 29 (ref. 29). It should also be noted that highly intercorrelated features can increase the computational cost and may degrade the predictive power. It is thus quite necessary to check feature relevancy and redundancy before any training process.
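As a concrete illustration of such fixed-length feature vectors, the minimal Python sketch below maps a chemical composition to the maximum, minimum, composition-weighted mean, and standard deviation of two elemental properties. The small property table and the chosen properties are illustrative placeholders rather than the feature set of any particular study.

```python
import numpy as np

# Tiny elemental property table (atomic mass in amu, Pauling
# electronegativity); only two elements are included for illustration.
ELEMENTS = {
    "Mg": {"mass": 24.305, "electroneg": 1.31},
    "O":  {"mass": 15.999, "electroneg": 3.44},
}

def composition_features(composition):
    """Map a composition {element: count} to a fixed-length vector of
    statistics (max, min, composition-weighted mean, std) for each
    elemental property, so that compounds of any size share one
    feature dimensionality."""
    counts = np.array(list(composition.values()), dtype=float)
    weights = counts / counts.sum()
    vector = []
    for prop in ("mass", "electroneg"):
        values = np.array([ELEMENTS[el][prop] for el in composition])
        vector += [values.max(), values.min(),
                   float(np.dot(weights, values)),   # CW value
                   values.std()]
    return np.array(vector)

print(composition_features({"Mg": 1, "O": 1}))
```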

Machine learning algorithms

With the rapid development of artificial intelligence, various ML algorithms have been proposed, such as Bayesian Optimization (BO)30, eXtreme Gradient Boosting (XGBoost)31, Neural Networks (NN)32, Kernel Ridge Regression (KRR)33, the Least Absolute Shrinkage and Selection Operator (LASSO)34, the Sure Independence Screening and Sparsifying Operator (SISSO)35, Generalized Linear Regression (GLR)36, Random Forests (RF)37, and Gaussian Process Regression (GPR)38. In this section, we give a brief introduction to NN, SISSO, and RF, which are widely used to construct high-throughput ML models for the prediction of κL.

In the learning process, an NN first feeds the feature data into the input layer, which is then manipulated by several hidden layers, and the output layer finally generates the target value. Each neuron is connected to all the neurons of the previous layer, and it processes the data according to a specific activation function. When data are transferred between neurons, their values are multiplied by weight parameters. To generate the best model, one should optimize the hyperparameters (such as the activation function, the number of neurons, and the number of hidden layers) and minimize the loss function. It has been demonstrated that NNs can effectively handle non-linear and complex problems, whereas the obtained model is usually treated as a black box.
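A minimal regression sketch in Python (scikit-learn) illustrates these ingredients: the hidden-layer sizes and activation function are the hyperparameters mentioned above, and the random arrays are mere stand-ins for a real (features, κL) dataset.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 8))    # stand-in feature vectors
y = rng.normal(size=200)         # stand-in log-scaled kappa_L values

model = make_pipeline(
    StandardScaler(),                          # NNs are scale-sensitive
    MLPRegressor(hidden_layer_sizes=(64, 64),  # two hidden layers
                 activation="relu",
                 max_iter=2000,
                 random_state=0),
)
model.fit(X, y)                  # minimizes the squared-error loss
print(model.predict(X[:3]))
```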

The SISSO algorithm relies on two key steps: feature-space construction and descriptor identification. Specifically, the input features are first combined by iteratively applying the algebraic operators \(\{I, +, -, \times, /, \exp, \log, |\cdot|, \sqrt{\;}, {}^{-1}, {}^{2}, {}^{3}\}\), which can construct a huge feature space. The sure independence screening (SIS) step then scores each new feature with a metric (correlation magnitude) and selects the subspace containing the descriptors most highly related to the training data. The sparsifying operator (SO) is finally utilized to find the optimal n-dimensional descriptor. Compared with many other ML algorithms, SISSO can identify descriptors that are explicit and analytic functions of key inputs, which is very beneficial for understanding the inherent physical mechanisms.
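The following Python sketch mimics the two SISSO ingredients on toy data: candidate features are generated by applying a few of the listed operators to two primary features, and a simple SIS step ranks them by the magnitude of their Pearson correlation with the target. A full SISSO implementation, including the final sparsifying step that selects the n-dimensional descriptor, is of course far more elaborate.

```python
import numpy as np

def sure_independence_screening(features, names, target, n_keep=5):
    """Score each candidate feature by |Pearson correlation| with the
    target and keep the top-n subspace, mimicking the SIS step."""
    scores = []
    for col, name in zip(features.T, names):
        r = np.corrcoef(col, target)[0, 1]
        scores.append((abs(r), name))
    return sorted(scores, reverse=True)[:n_keep]

# Toy primary features a, b and a target that secretly behaves as 1/a^2.
rng = np.random.default_rng(1)
a, b = rng.uniform(1, 3, 100), rng.uniform(0.5, 2, 100)
target = 1.0 / a**2 + 0.1 * rng.normal(size=100)

# Feature space built from a few of the SISSO operators.
candidates = np.column_stack([a + b, a - b, a * b, a / b,
                              np.exp(-a), np.log(a), np.sqrt(a),
                              a**2, a**-2])
names = ["a+b", "a-b", "a*b", "a/b", "exp(-a)", "log(a)",
         "sqrt(a)", "a^2", "a^-2"]
for score, name in sure_independence_screening(candidates, names, target):
    print(f"{name}: |r| = {score:.3f}")
```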

As an ensemble learning algorithm, RF combines multiple decision trees (DT)39 to avoid overfitting. Each DT in the “forest” is trained individually on a randomly selected subset of features. The training data are divided into two or more categories at the root node based on the feature values; each subsequent node receives a data subgroup and passes its separations on to the next nodes until all the generated groups are homogeneous. The output of a final node (called a “leaf”) is the mean value of the corresponding separated samples, so a trained DT predicts a value by locating the input features within the learned intervals. Collectively, the RF model can rank the importance of each feature according to the order of the nodes, and it outputs the average of the values predicted by all the DTs in the ensemble.
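A compact scikit-learn sketch of this ensemble idea is given below; the toy target and the specific hyperparameters (number of trees, feature subsampling) are placeholders.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(2)
X = rng.normal(size=(300, 5))
y = 2.0 * X[:, 0] - X[:, 3] + 0.1 * rng.normal(size=300)  # toy target

# An ensemble of decision trees; each tree sees a bootstrap sample and
# a random feature subset at each split, which suppresses overfitting.
forest = RandomForestRegressor(n_estimators=200, max_features="sqrt",
                               random_state=0)
forest.fit(X, y)

# The prediction is the mean over all trees, and the model also ranks
# how much each feature contributes to the node splits.
print("prediction:", forest.predict(X[:1]))
print("importances:", forest.feature_importances_)
```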

High-throughput prediction models

As an effective high-throughput method, ML has been widely used in recent years to predict the κL of various systems15,16,27,28,29,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57. For example, Wang et al. developed an XGBoost model whose predictive power was checked against 549 compounds in the testing set, as shown in Fig. 2a44. Among 75 input features, the average atomization enthalpy (ΔHatomic) and the density (ρ) were found to be the most relevant to κL. Note that the training set contains 4937 κL values calculated by the above-mentioned AGL method12, which were collected from the AFLOW database24. The model was then applied to all the entries in the Inorganic Crystallographic Structure Database (ICSD)58, and it was found that compounds containing halogen elements or heavy atoms exhibit low κL (see Fig. 2b). Among them, potential TE materials (such as BiTe2Tl and Cl2CsI) were screened out, and the prediction accuracy was validated by first-principles calculations. Besides, the NN algorithm has been successfully applied to predict the κL of random multilayer (RML) and gradient multilayer (GML) structures composed of two types of conceptual atoms with different masses, as shown in Fig. 2c45. To construct the training set, the κL of 1600 RMLs and their corresponding 1600 GMLs were calculated by MD simulations. In contrast to the generally used crystal and elemental properties, the input features here include several key parameters quantifying the disorder in the layer thicknesses of RMLs, as listed in Table 1. Figure 2d shows the predicted κL for 200 multilayer structures beyond the training set, which are in good agreement with the MD results. Unlike most ML models, which appear as black boxes, Liu et al. employed the SISSO method to establish a physically intuitive descriptor for predicting the κL of HH compounds46. They found that the first term \(D_1 = \frac{\bar{m} \times \chi_{\mathrm{B}} \times \left|\chi_{\mathrm{A}} - \chi_{\mathrm{B}}\right|}{e^{2a}}\) dominates the three-dimensional (3D) descriptor, where the κL decreases with the lattice constant a but increases with the electronegativity difference |χA − χB| between the atoms at sites A and B. This is consistent with the general belief that systems with larger unit cells usually have smaller κL, and that stronger chemical bonding leads to higher κL. Beyond the initial 86 training data, the strong predictive power of the descriptor was confirmed for 75 HH compounds and 15 full-Heusler (FH) systems.
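For illustration, the dominant term D1 can be evaluated with a one-line function. Note that ranking or predicting κL in ref. 46 also requires the remaining terms of the 3D descriptor and the fitted coefficients, which are omitted here, and the sample inputs below are hypothetical rather than taken from that work.

```python
import math

def d1_descriptor(mean_mass, chi_a, chi_b, lattice_constant):
    """Dominant SISSO term D1 = m_bar * chi_B * |chi_A - chi_B| / e^(2a);
    units and normalization follow the original work (ref. 46)."""
    return mean_mass * chi_b * abs(chi_a - chi_b) / math.exp(2.0 * lattice_constant)

# Hypothetical half-Heusler-like inputs:
print(d1_descriptor(mean_mass=60.0, chi_a=1.5, chi_b=2.0, lattice_constant=6.0))
```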

Fig. 2: High-throughput prediction of lattice thermal conductivities by machine learning models.
figure 2

a The XGBoost model-predicted log-scaled κL versus the calculated values for the testing set. The top and right histograms show the corresponding data distributions. b Dependence of the predicted κL on specific elements for compounds in the ICSD, and the values are shown by colors along with the ΔHatomic and ρ. a and b are reproduced with permission from ref. 44. c Schematics of the multilayer structures and the MD simulation setup. d Comparisons of the real and predicted κL for 100 randomly generated RMLs and their corresponding 100 GMLs in the testing set. c and d are reproduced with permission from ref. 45.

Table 1 Summary of representative machine learning works on the direct prediction of lattice thermal conductivities from the past five years.

Accurate structural parameters, typical input features for many ML models, are usually obtained from first-principles calculations40,41,54,56 if experimental results are not available. Alternatively, Jaafreh et al. utilized the crystal features of a series of prototype structures to establish an RF-based model, which can be applied to related systems without any DFT-relaxed structural parameters16. Note that the crystal features are generated from the area of each face of the Wigner-Seitz cell (see Fig. 3a) and the characteristics of neighboring atoms. The training set contains 2146 κL values of 119 compounds at a series of temperatures from 100 to 1000 K. As shown in Fig. 3b, the RF-based model exhibits strong predictive power for 4 systems in the testing set. Going further, the model was used to predict the room-temperature κL of 32,116 compounds in the ICSD, among which 273 have ultralow values and 4 are even below 0.1 Wm−1K−1, suggesting very promising applications in the field of energy harvesting.

Fig. 3: Machine learning models using the Wigner-Seitz cell or the graph representing the connection of atoms in the crystal as input features.
figure 3

a The Wigner-Seitz cell used to construct the feature space. b Comparisons of the calculated and predicted κL of 4 compounds in the testing set. a and b are reproduced with permission from ref. 16. c Schematic of transfer learning based on CGCNN for the prediction of κL. Here, the crystal structure is converted to a graph, where the nodes represent atoms and the edges connect the neighboring nodes. Reproduced with permission from ref. 51.

In recent years, the Convolutional Neural Network (CNN) algorithm has been adopted to predict the κL of porous graphene47, hybrid carbon-boron nitride honeycombs50, aperiodic superlattices57, and so on. The input layer of a CNN is fed with particular arrays, which can be obtained by extracting the characteristics of an image representing the system, instead of selecting features from various physical properties. In particular, the Crystal Graph Convolutional Neural Network (CGCNN) algorithm enables the prediction of target properties from a graph representing the connection of atoms in the crystal59. As a major advance, Zhu et al. employed CGCNN to predict the κL of all known inorganic crystals directly from their atomic structures51. As shown in Fig. 3c, they established a model based on a dataset60 containing 2668 calculated lattice thermal conductivities (named κC in their work), where the crystal structure was converted to a graph with the nodes and the edges respectively representing atoms and connections between neighboring atoms. The CNN was initialized by feature vectors that characterize each node and edge. It should be noted that these κC were calculated by a semi-empirical model and thus inevitably exhibit insufficient accuracy51,61. To address this problem, the authors collected 132 experimentally measured lattice thermal conductivities (κexp). Due to the small size of this training set, the resulting model exhibits a large mean absolute error (MAE) of 0.51 (for log-scaled κexp). As correlated datasets share similar domain knowledge, they developed a transfer learning scheme (see Fig. 3c) in which all layers of the model trained on κC were transferred to initialize a second CGCNN, reducing the MAE to ~0.27.
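The transfer-learning step can be sketched generically in PyTorch: the layers of a network pretrained on the large κC dataset are frozen, and a re-initialized output head is fine-tuned on the small κexp set. The architecture below is a placeholder multilayer perceptron rather than the actual CGCNN, and the checkpoint file name is hypothetical.

```python
import torch
import torch.nn as nn

class PretrainedNet(nn.Module):
    """Placeholder network standing in for a model pretrained on kappa_C."""
    def __init__(self):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(64, 128), nn.ReLU(),
                                  nn.Linear(128, 128), nn.ReLU())
        self.head = nn.Linear(128, 1)   # predicts log-scaled kappa

    def forward(self, x):
        return self.head(self.body(x))

model = PretrainedNet()
# model.load_state_dict(torch.load("kappa_c_pretrained.pt"))  # hypothetical file

for p in model.body.parameters():       # freeze the transferred layers
    p.requires_grad = False
model.head = nn.Linear(128, 1)          # re-initialize the output layer

optimizer = torch.optim.Adam(model.head.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()
# ...fine-tune on the small set of experimental kappa_exp values...
```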

It should be noted that ML models can be further optimized via active learning, which is very useful for the inverse design of systems with a desired κL. For example, it is very time-consuming to identify the distribution of holes that minimizes the κL of two-dimensional (2D) materials, since the design space expands dramatically with increasing hole density. Taking porous graphene as a prototypical example, Wan et al. adopted a CNN-based inverse design approach to determine the structure with the lowest κL, which only required simulating ~10³ systems by MD out of ~10⁶ possible candidates47. By performing MD simulations, Chowdhury et al. obtained the κL of 300 randomly generated Si/Ge RMLs, which served as the initial dataset57. They iteratively identified RMLs with locally enhanced phonon transport and included them as additional training data. Using the CNN model, RMLs with unexpectedly high κL were discovered, which can be attributed to the presence of closely spaced interfaces.
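A generic active-learning loop of this kind can be sketched as follows. Here a random-forest surrogate and a cheap toy stand-in for the MD evaluation replace the CNN models and MD simulations of refs. 47,57, and the selection criterion (lowest predicted κL) corresponds to the inverse-design setting; flip the sort order to hunt for high-κL structures instead.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def run_md_simulation(structure):
    """Placeholder for an expensive MD evaluation of kappa_L; a cheap
    toy function is used here so the sketch runs end-to-end."""
    return float(np.sum(structure) ** 2)

def active_learning_loop(candidates, initial_idx, n_rounds=5, batch=10):
    """Train a surrogate on the labeled pool, label the most promising
    unlabeled structures by 'MD', and retrain iteratively."""
    labeled = {i: run_md_simulation(candidates[i]) for i in initial_idx}
    for _ in range(n_rounds):
        X = np.array([candidates[i] for i in labeled])
        y = np.array(list(labeled.values()))
        surrogate = RandomForestRegressor(random_state=0).fit(X, y)
        pool = [i for i in range(len(candidates)) if i not in labeled]
        preds = surrogate.predict(candidates[pool])
        # Query the structures with the lowest predicted kappa_L.
        for i in np.asarray(pool)[np.argsort(preds)[:batch]]:
            labeled[int(i)] = run_md_simulation(candidates[int(i)])
    return labeled

rng = np.random.default_rng(0)
candidates = rng.random((500, 16))   # 500 candidate structures
result = active_learning_loop(candidates, initial_idx=range(20))
```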

Summarizing this section, considerable progress has been made in the high-throughput prediction of κL by leveraging various data-driven models. For a quick overview, Table 1 lists the training and testing sets, the input features, and the adopted algorithms of representative ML works from the past five years. With the continued growth of big data and the accelerated development of artificial intelligence, it is expected that ML will become a major scientific paradigm for accurately predicting κL, and that more ML models or descriptors will emerge to give physical insights into the different mechanisms for manipulating κL, such as phonon coherence, weak coupling of phonons, and high-order phonon anharmonicity62.

Indirect approach

Due to the lack of available training data, the above-mentioned ML models are not directly applicable to various 2D materials, nanowires, alloys, ternary salts, and so on. As an alternative, ML can also be utilized to construct accurate interatomic potentials or force constants, so that an efficient evaluation of κL becomes feasible using MD simulations or even first-principles calculations.

It is known that atomic-scale simulations need a PES that provides the potential energy as a function of the atomic positions. In principle, the most accurate PES can be obtained from quantum mechanical calculations, which are however very time-consuming and even prohibitive for large systems. Based on physical knowledge of the interatomic bonding, many specific analytic expressions have been proposed, known as empirical potentials63. However, the PES is a multidimensional real-valued function that cannot be completely captured by these specific functional forms64. Empirical interatomic potentials thus usually exhibit insufficient accuracy, and the involved parameters must be carefully optimized for different systems. Taking bulk silicon as an example, the κL calculated using the original Stillinger-Weber potential (~244 Wm−1K−1 at room temperature) is much higher than the experimentally measured result (~148 Wm−1K−1)65. It is thus quite necessary to develop alternative potentials for the accurate prediction of κL, and the MLP is a promising choice.

Training data and input features

In principle, ML can be used to fit the correlation between atomic configurations and physical properties of given systems. Compared with empirical potentials, MLPs determine the PES in a data-driven manner to describe the interatomic interactions. To ensure that MLPs exhibit first-principles level accuracy, the dataset is usually constructed by performing ab-initio molecular dynamics (AIMD) simulations at a series of temperatures, where the energies, forces, and stresses of different atomic configurations are recorded66,67,68. It should be noted that the atomic configurations are sampled from the AIMD trajectories such that they are uncorrelated with each other.

Unlike many ML models for the high-throughput prediction of κL, establishing an MLP requires input features that represent the local environment around each atom, usually within a specific cutoff radius. The adopted features must be invariant under Euclidean transformations and under permutation of chemically equivalent atoms69. A simple counterexample is the list of atomic Cartesian coordinates, which cannot be used for training: when the system is rotated or chemically equivalent atoms are exchanged, a new list of Cartesian coordinates is generated that nevertheless corresponds to the same atomic configuration. For the evaluation of κL, the widely used features are the moment tensor70, the atom-centered symmetry functions (ACSFs)71,72, and the smooth overlap of atomic positions (SOAP)18. Taking the ACSFs as an example, the \(G_i^2\) and \(G_i^4\) in the following expressions respectively describe the radial and angular environment of atom i,

$$G_i^2 = \sum_j \mathrm{e}^{-\eta_s \left( R_{ij} - R_s \right)^2} \cdot f_\mathrm{c}\left( R_{ij} \right)$$
(1)

and

$$G_i^4 = 2^{1-\zeta} \sum_{j,k \ne i}^{\mathrm{all}} \left( 1 + \lambda \cos\theta_{ijk} \right)^{\zeta} \cdot \mathrm{e}^{-\eta_a \left( R_{ij}^2 + R_{ik}^2 + R_{jk}^2 \right)} \cdot f_\mathrm{c}\left( R_{ij} \right) \cdot f_\mathrm{c}\left( R_{ik} \right) \cdot f_\mathrm{c}\left( R_{jk} \right).$$
(2)

Here Rij is the distance between atoms i and j, θijk is the angle centered at atom i, and fc is a smooth cutoff function. ηs and Rs define the width and the center of the Gaussians, respectively. In Eq. (2), the angular resolution and distribution are determined by ζ and ηa, and λ takes the values +1 or −1. It should be emphasized that the construction of appropriate features is a very challenging task, and we refer the interested reader to a review article73 that summarizes recent work on efficient representations of atomic and molecular structures.
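A minimal numerical sketch of the radial function of Eq. (1) is given below. The cosine form of fc is the smooth cutoff commonly used with ACSFs, and the neighbor distances and hyperparameters (ηs, Rs, Rc) are arbitrary illustrative values.

```python
import numpy as np

def cutoff(r, r_c):
    """Smooth cosine cutoff f_c(R) = 0.5*(cos(pi*R/R_c) + 1) for R < R_c,
    zero beyond the cutoff radius."""
    return np.where(r < r_c, 0.5 * (np.cos(np.pi * r / r_c) + 1.0), 0.0)

def g2(distances, eta_s, r_s, r_c):
    """Radial symmetry function G_i^2 of Eq. (1) for one central atom,
    given the distances R_ij to all of its neighbors."""
    d = np.asarray(distances)
    return float(np.sum(np.exp(-eta_s * (d - r_s) ** 2) * cutoff(d, r_c)))

# Toy neighbor shell: three neighbors at 2.3, 2.5 and 3.9 angstroms.
print(g2([2.3, 2.5, 3.9], eta_s=4.0, r_s=2.4, r_c=4.0))
```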

Machine learning potentials

Table 2 summarizes several important MLPs used for the evaluation of κL, including the Moment Tensor Potentials (MTPs)70, the Neural Network Potentials (NNPs)71,74, and the Gaussian Approximation Potentials (GAPs)75. In particular, the MTPs exhibit an excellent balance between accuracy and computational efficiency76 and have been widely used to predict the κL of various systems, such as monolayers, alloys, and complex compounds68,77,78,79,80,81,82,83,84,85,86,87,88,89,90,91,92,93. In principle, training an MTP amounts to minimizing the difference between the predicted and DFT-calculated energies (E), forces (f), and stresses (σ) over K atomic configurations:70,94,95

$$\sum_{k=1}^{K} \left[ w_\mathrm{e} \left( E_k^{\mathrm{AIMD}} - E_k^{\mathrm{MTP}} \right)^2 + w_\mathrm{f} \sum_{i=1}^{N} \left| f_{k,i}^{\mathrm{AIMD}} - f_{k,i}^{\mathrm{MTP}} \right|^2 + w_\mathrm{s} \sum_{i,j=1}^{3} \left| \sigma_{k,ij}^{\mathrm{AIMD}} - \sigma_{k,ij}^{\mathrm{MTP}} \right|^2 \right] \to \mathrm{minimum}.$$
(3)
Table 2 Summary of widely adopted machine learning potentials for indirectly predicting lattice thermal conductivities.

Here, we, wf, and ws are positive weights that express the relative importance of the energies, forces, and stresses in the training process. To improve the quality of an MTP, active learning is usually implemented, where an atomic configuration is included in the training set if its extrapolation grade (a quantity correlated with the prediction error94) is above a threshold but below the allowed maximum. Figure 4 shows the widely used active learning scheme for training an MTP, which usually contains six stages, labeled A to F.
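For clarity, the objective of Eq. (3) can be written as a few lines of Python. The weights and the dictionary layout of each configuration are illustrative, and a real MTP fit minimizes this quantity with respect to the potential parameters rather than merely evaluating it.

```python
import numpy as np

def mtp_loss(configs, w_e=1.0, w_f=0.01, w_s=0.001):
    """Weighted sum-of-squares objective of Eq. (3). Each config is a
    dict holding the AIMD reference and MTP-predicted energy (scalar),
    forces (N x 3 array), and stresses (3 x 3 array); the weight
    values here are illustrative placeholders."""
    total = 0.0
    for c in configs:
        total += w_e * (c["E_aimd"] - c["E_mtp"]) ** 2
        total += w_f * np.sum((c["f_aimd"] - c["f_mtp"]) ** 2)
        total += w_s * np.sum((c["s_aimd"] - c["s_mtp"]) ** 2)
    return total
```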

Fig. 4: Scheme of active learning bootstrapping iterations for training the Moment Tensor Potential (MTP).
figure 4

By selecting extrapolative configurations from the MD trajectories, the MTPs are trained in a loop until the simulations finish without exceeding the maximum allowed extrapolation grade. Reproduced with permission from ref. 94.

To validate the accuracy of a trained MLP, the energies, forces, and stresses of different atomic configurations are checked against the testing and predicting sets. For instance, Huang et al. proposed a single atom neural network potential (SANNP) for amorphous silicon based on a training set containing 800 atomic configurations from AIMD simulations67. Figure 5a−c respectively show the predicted atomic energies, total energies, and atomic forces in the testing set, which agree well with those obtained from DFT calculations.

Fig. 5: Validation of the accuracy of machine learning potentials.
figure 5

The linear correlations between the predicted a atomic energies, b total energies, and c atomic forces and those calculated by DFT in the testing set for amorphous silicon. Reproduced with permission from ref. 67.

Application examples

As mentioned in the introduction, MLPs with first-principles level accuracy can be implemented in MD simulations or in the phonon BTE to indirectly predict the κL of given systems. Unlike conventional first-principles calculations or classical MD with empirical potentials, the evaluation of κL employing MLPs simultaneously offers strong reliability and high efficiency, as demonstrated for the various systems summarized in Table 220,21,22,23,66,67,68,77,78,79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100,101,102,103,104,105,106,107,108,109,110,111,112,113,114,115,116,117,118,119. For example, Korotaev et al. used the active learning algorithm to develop an MTP for the CoSb3 skutterudite and accurately predicted its κL at different temperatures68. Indeed, we see from Fig. 6a that the κL indirectly obtained from the MTP almost coincides with the experimentally measured results. It should be emphasized that, compared with conventional first-principles calculations, the MTP can significantly accelerate the prediction process (the computational speed is increased by more than four orders of magnitude).

Fig. 6: Accurate prediction of lattice thermal conductivities by utilizing machine learning potentials.
figure 6

a Comparison of the calculated and experimentally measured κL of CoSb3. The vertical lines show twice the standard deviation of the results calculated by the Green-Kubo method. Reproduced with permission from ref. 68. b The lattice thermal conductivities along two different directions (κa and κc) and their average value (κp) of (Ti0.2Zr0.2Hf0.2Nb0.2Ta0.2)B2, plotted as a function of temperature. The inset shows the auto-correlation function. Reproduced with permission from ref. 107. c κph-v and d κ3ph+ph-v of silicon as functions of vacancy concentration at 300 K, predicted by DFT, GAP, and empirical potentials. Reproduced with permission from ref. 116.

Due to their compositional complexity, predicting the κL of high-entropy materials is usually a challenging task. Recently, Dai et al. established a deep learning potential for the thermally insulating material (Ti0.2Zr0.2Hf0.2Nb0.2Ta0.2)B2120, which was then used in MD simulations to calculate its average lattice thermal conductivity along two directions (κp), as shown in Fig. 6b107. At room temperature, κp is predicted to be 4.0 Wm−1K−1, which is close to the experimentally measured value of ~4.8 Wm−1K−1.
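In such equilibrium MD simulations driven by an MLP, κL is typically extracted with the Green-Kubo method from the heat-flux auto-correlation function (cf. the inset of Fig. 6b). The sketch below assumes a time series of one Cartesian component of the heat flux (heat current per unit volume) in SI units; in practice the flux would come from the MD code, and the running integral must be inspected for convergence.

```python
import numpy as np

KB = 1.380649e-23  # Boltzmann constant, J/K

def green_kubo_kappa(J, dt, volume, temperature, n_corr):
    """Green-Kubo estimate kappa = V / (k_B T^2) * integral of the
    heat-flux autocorrelation <J(0)J(t)>, for one Cartesian component
    of the flux J. Returns the running integral kappa(t) so that its
    convergence can be checked."""
    J = np.asarray(J) - np.mean(J)   # the equilibrium flux averages to zero
    n = len(J)
    acf = np.array([np.mean(J[:n - t] * J[t:]) for t in range(n_corr)])
    running = np.cumsum(acf) * dt    # time integral of the autocorrelation
    return volume / (KB * temperature**2) * running
```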

In addition, a GAP for crystalline Si with vacancies was adopted by Babaei et al. to determine the lattice thermal conductivity limited by phonon-vacancy scattering (κph-v)116. As can be seen from Fig. 6c, the κph-v predicted by the GAP shows good agreement with the DFT-calculated results at different vacancy concentrations, while the predictions of empirical potentials exhibit much larger errors. A similar picture is found in Fig. 6d, where the effects of three-phonon and phonon-vacancy scattering are both included (κ3ph+ph-v). Note that the computational cost of the GAP was five orders of magnitude smaller than that of the DFT calculations in that work, indicating the high efficiency of this kind of MLP for the prediction of κL.

Last but not least, we note that several other MLPs have been proposed recently, such as the spectral neighbor analysis potential, the bond order potential, the force constant potential, and the spatial density neural network force fields, which have also been demonstrated to accurately evaluate the κL of many systems at low computational cost19,121,122,123,124,125,126,127. With a deepening understanding of interatomic interactions, it is reasonable to expect that more reliable and universal MLPs will be developed in the future.

Summary and perspective

To conclude, we hope this mini review enables the interested reader to gain a preliminary understanding of predicting κL via ML, either directly or indirectly. For high-throughput ML models that predict κL directly, the input features usually contain several fundamental physical properties of the investigated systems and their constituent elements, such as the lattice constants, the phonon frequencies, and the atomic masses. Such data-driven models can be utilized for the rapid screening and inverse design of materials with a desired κL, and their predictive power has been demonstrated for many systems both inside and beyond the training sets. In addition, MLPs can be readily implemented in MD simulations or the phonon BTE, offering an indirect but quite efficient prediction of κL for particular systems, including crystals with defects, high-entropy compounds, amorphous structures, and so on. Compared with conventional DFT and MD approaches, MLPs can significantly accelerate the evaluation of κL while retaining first-principles level accuracy.

Although considerable advances have been made in the direct prediction of κL via ML, several challenges remain to be addressed. For example, it is still difficult to construct the large and reliable datasets required for training, which affects the predictive power and the transferability of such data-driven methods. In particular, many ML models are severely limited to specific systems (e.g., HH compounds, zincblende and rocksalt structures), leaving a much larger materials space unexplored40,41,43,46,52,54. Although one can find κL values for thousands of compounds in the AFLOW and TE Design Lab repositories, they are usually calculated using empirical models12,61 and may exhibit insufficient accuracy compared with those obtained from first-principles calculations, MD simulations, or experimental measurements. On the other hand, substantial advances have been made in the high-throughput discovery of 2D materials, while their thermal transport properties remain largely unknown128,129,130,131. Due to the limited experimental and theoretical data, it is rather difficult to derive a reliable ML model for predicting the κL of various 2D materials. It is believed that transfer learning can overcome the disadvantage of small data size by pretraining on a correlated dataset. However, this still requires accurate first-principles calculations to obtain the scattering phase space of numerous systems53, which remains a tough task. To take better advantage of transfer learning, much effort should be devoted to identifying readily available physical properties that are highly correlated with κL.

In the case of developing efficient MLPs, it is usually necessary to calculate the energies, forces, and stresses of a substantial number of atomic configurations within the framework of DFT. This task is however very time-consuming for systems with large unit cells and complex chemical compositions, such as molten salts21, skutterudites68, and high-entropy compounds107. Besides, the employed features usually describe the atomic environment within a certain cutoff radius, so the established MLPs ignore the long-range interactions that can be very important for thermal transport in some cases. It is expected that the efficiency and accuracy of MLPs can be further improved by careful selection and optimization of the input features and/or learning schemes.