Introduction

Lattice thermal conductivity (κl) is essential for a wide range of applications of high scientific and societal impact, including thermal insulation for energy savings1, thermal management of semiconductor devices2, thermoelectrics3, and thermal barrier coatings4. κl can be measured experimentally or predicted accurately with first principles calculations of phonon scattering rates coupled with the Boltzmann transport equation (BTE). On the computational side, Peierls et al. formulated the phonon BTE approach5 and Maradudin et al. developed the three-phonon (3ph) scattering theory6. Broido et al. combined them with ab initio force constants to enable the first principles prediction of κl7. More recently, Feng et al. formulated a general theory of four-phonon (4ph) scattering and predicted its importance8,9, which was subsequently confirmed by independent experiments10,11,12. It is now generally accepted that both three-phonon and four-phonon (3ph+4ph) scattering should be included for an accurate prediction of κl that can be compared with experimental data. Meanwhile, a number of packages have been released for first principles prediction of κl, including ShengBTE13, AlmaBTE14, Phono3py15 and FourPhonon16.

However, both experimental measurements and first principles calculations of κl are generally expensive or even unaffordable, especially when 4ph scattering is included. As a result, κl has been measured or accurately predicted for only a small fraction of all materials. Take silicon as an example. Using a 16 × 16 × 16 q-point mesh and a unity broadening factor, it takes about 7000 CPU hours to calculate κl with 3ph+4ph scattering under the relaxation time approximation (RTA)16. For many other technologically important materials, including perovskites for solar cells17,18,19, tetrahedrites for thermoelectric devices20 and lithium intercalation materials for Li-ion batteries21,22, the large number of phonon branches due to complex crystal structures makes 4ph or even 3ph scattering calculations unaffordable.

In light of this, machine learning (ML) approaches that can achieve accuracy similar to that of first principles calculations or experiments have long been desired, but how to build them has remained an open question. Recent progress has been made in developing end-to-end surrogate models to predict κl23,24,25 by taking the structural information of the material as descriptors, without involving phonon scattering. However, their accuracy falls far short of that of experiments or first principles calculations, so they can only be used for rough estimations.

In this work, we provide a machine learning approach that can predict phonon scattering rates and thermal conductivity at the experimental and first principles accuracy level, for a wide range of materials represented by Si, MgO, and LiCoO2. The success of our approach is enabled by mitigating computational challenges associated with the high skewness of phonon scattering rates and their complex contributions to the total thermal resistance. Furthermore, transfer learning between different orders of phonon scattering is used to improve the model performance. Compared to first principles calculations, our surrogates achieve up to two orders of magnitude acceleration, which would enable large-scale thermal transport informatics.

Results

BTE workflow and computational cost analysis

We begin with a computational time analysis of the phonon BTE workflow. Figure 1 illustrates the complete workflow for predicting \({\kappa }_{{{{\rm{l}}}}}^{{{{\rm{3ph+4ph}}}}}\). The phonon phase space is first calculated based on phonon dispersions, from which the 3ph and 4ph scattering processes that satisfy energy and momentum conservation are identified. Combined with the third- and fourth-order interatomic force constants determined by the material structure, the 3ph scattering rates (\({\Gamma }_{\lambda {\lambda }^{{\prime} }{\lambda }^{{\prime\prime} }}^{{{{\rm{3ph}}}}}\)) and 4ph scattering rates (\({\Gamma }_{\lambda {\lambda }^{{\prime} }{\lambda }^{{\prime\prime} }{\lambda }^{{\prime\prime} {\prime} }}^{{{{\rm{4ph}}}}}\)) of all the allowed scattering processes within the phonon phase space are calculated, where \(\lambda ,{\lambda }^{{\prime} },{\lambda }^{{\prime\prime} }\) and \(\lambda ,{\lambda }^{{\prime} },{\lambda }^{{\prime\prime} },{\lambda }^{{\prime\prime} {\prime} }\) represent the phonon modes that are involved in the corresponding scattering processes. Subsequently, the relaxation times of the phonon mode λ for 3ph and 4ph scattering (\({\tau }_{\lambda }^{{{{\rm{3ph}}}}}\) and \({\tau }_{\lambda }^{{{{\rm{4ph}}}}}\)) are derived by considering all corresponding scattering processes of the mode λ. The total phonon relaxation time for the mode λ (τλ) is then derived based on the spectral Matthiessen’s rule26: \({\tau }_{\lambda }^{-1}={({\tau }_{\lambda }^{{{{\rm{3ph}}}}})}^{-1}+{({\tau }_{\lambda }^{{{{\rm{4ph}}}}})}^{-1}\). Finally, \({\kappa }_{{{{\rm{l}}}}}^{{{{\rm{3ph+4ph}}}}}\) is calculated considering the spectral contribution from every phonon mode. When calculating \({\kappa }_{{{{\rm{l}}}}}^{{{{\rm{3ph}}}}}\), only 3ph scattering is considered and τλ only contains \({\tau }_{\lambda }^{{{{\rm{3ph}}}}}\). A detailed explanation of the phonon BTE can be found in Supplementary Section 1.
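
The last steps of this workflow can be illustrated with a minimal Python sketch. It assumes, for simplicity, that the inverse relaxation time of a mode is the plain sum of the rates of its scattering processes (prefactors that distinguish process types are omitted), and then applies the spectral Matthiessen’s rule quoted above. The function names and numbers are illustrative and are not taken from this work.

```python
def relaxation_time(rates):
    """Relaxation time of one mode from the rates of its scattering processes.

    Simplifying assumption: the total rate of the mode is the plain sum of
    the per-process rates.
    """
    total = sum(rates)
    return float("inf") if total == 0.0 else 1.0 / total


def matthiessen(tau_3ph, tau_4ph):
    """Spectral Matthiessen's rule: 1/tau = 1/tau_3ph + 1/tau_4ph."""
    return 1.0 / (1.0 / tau_3ph + 1.0 / tau_4ph)


# Illustrative rates in arbitrary units, not data from this work.
tau_3ph = relaxation_time([0.02, 0.03])   # 1 / 0.05 = 20
tau_4ph = relaxation_time([0.0125])       # 1 / 0.0125 = 80
tau = matthiessen(tau_3ph, tau_4ph)       # 1 / (0.05 + 0.0125) = 16
```

The combined relaxation time is always shorter than either contribution, which is why adding 4ph scattering can only lower the predicted κl within this picture.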

Fig. 1: Workflow of predicting κl by analytically solving BTE versus using the surrogate model.

Due to the large number of scattering processes, analytically calculating every \({\Gamma }_{\lambda {\lambda }^{{\prime} }{\lambda }^{{\prime\prime} }}^{{{{\rm{3ph}}}}}\) and \({\Gamma }_{\lambda {\lambda }^{{\prime} }{\lambda }^{{\prime\prime} }{\lambda }^{{\prime\prime} {\prime} }}^{{{{\rm{4ph}}}}}\) (gray box) is very time-consuming. We train surrogate models with a small portion of phonon scattering processes from the phonon phase space, then use them to generate the rest of \({\Gamma }_{\lambda {\lambda }^{{\prime} }{\lambda }^{{\prime\prime} }}^{{{{\rm{3ph}}}}}\) and \({\Gamma }_{\lambda {\lambda }^{{\prime} }{\lambda }^{{\prime\prime} }{\lambda }^{{\prime\prime} {\prime} }}^{{{{\rm{4ph}}}}}\) with a much faster speed (red box). As a result, the process of predicting κl is greatly accelerated.

In this workflow, the most time-consuming step is calculating \({\Gamma }_{\lambda {\lambda }^{{\prime} }{\lambda }^{{\prime\prime} }}^{{{{\rm{3ph}}}}}\) and \({\Gamma }_{\lambda {\lambda }^{{\prime} }{\lambda }^{{\prime\prime} }{\lambda }^{{\prime\prime} {\prime} }}^{{{{\rm{4ph}}}}}\). Taking silicon as an example, the calculation of \({\Gamma }_{\lambda {\lambda }^{{\prime} }{\lambda }^{{\prime\prime} }}^{{{{\rm{3ph}}}}}\) and \({\Gamma }_{\lambda {\lambda }^{{\prime} }{\lambda }^{{\prime\prime} }{\lambda }^{{\prime\prime} {\prime} }}^{{{{\rm{4ph}}}}}\) accounts for more than 75% of the total computational cost for predicting \({\kappa }_{{{{\rm{l}}}}}^{{{{\rm{3ph}}}}}\) and \({\kappa }_{{{{\rm{l}}}}}^{{{{\rm{3ph+4ph}}}}}\), respectively (see Supplementary Section 2). Considering the large number of scattering processes (\(10^{6}\) for 3ph and \(10^{11}\) for 4ph), we can expect significant time savings if the prediction of the scattering rate of each individual process can be accelerated. This leads us to the idea of using deep neural networks (DNNs) as surrogate models to calculate \({\Gamma }_{\lambda {\lambda }^{{\prime} }{\lambda }^{{\prime\prime} }}^{{{{\rm{3ph}}}}}\) and \({\Gamma }_{\lambda {\lambda }^{{\prime} }{\lambda }^{{\prime\prime} }{\lambda }^{{\prime\prime} {\prime} }}^{{{{\rm{4ph}}}}}\). We randomly choose a relatively small portion of scattering processes from the phonon phase space and calculate their scattering rates. Then we use them as the training set for our surrogate model. After training, the large number of remaining scattering processes in the phonon phase space are evaluated with the trained model. Due to the fast forward pass of the DNN, the average time of predicting a single scattering rate is anticipated to be greatly reduced, which will lead to a large acceleration of the κl prediction.
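
The sampling step described above can be sketched as follows. This is a minimal illustration using only the Python standard library; the function name `split_processes`, the process count and the training fraction are hypothetical choices for the example, not values prescribed by this work.

```python
import random


def split_processes(n_processes, train_fraction, seed=0):
    """Randomly pick a small training subset of scattering-process indices.

    Returns (train_ids, remaining_ids). The training subset would be
    evaluated analytically; the remaining processes are left to the
    trained surrogate model.
    """
    rng = random.Random(seed)
    n_train = max(1, round(n_processes * train_fraction))
    chosen = set(rng.sample(range(n_processes), n_train))
    remaining_ids = [i for i in range(n_processes) if i not in chosen]
    return sorted(chosen), remaining_ids


# Illustrative sizes only.
train_ids, remaining_ids = split_processes(n_processes=1000, train_fraction=0.03)
```

Fixing the seed makes a split reproducible; repeating the workflow with different seeds gives the kind of independent random dataset splits used later to report error bars.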

Phonon scattering surrogate models setup

In this work, we study Si, MgO and LiCoO2, three representative thermal conductors with various κl values. These three materials are well studied, with predicted values consistent with the experimental values8,22,27. Also, LiCoO2 serves as an example of complex materials, where there are four atoms in its primitive cell compared with two for Si and MgO. For each material, we train two individual DNN models for \({\Gamma }_{\lambda {\lambda }^{{\prime} }{\lambda }^{{\prime\prime} }}^{{{{\rm{3ph}}}}}\) and \({\Gamma }_{\lambda {\lambda }^{{\prime} }{\lambda }^{{\prime\prime} }{\lambda }^{{\prime\prime} {\prime} }}^{{{{\rm{4ph}}}}}\), respectively. The models are used to replace the analytical calculation of \({\Gamma }_{\lambda {\lambda }^{{\prime} }{\lambda }^{{\prime\prime} }}^{{{{\rm{3ph}}}}}\) and \({\Gamma }_{\lambda {\lambda }^{{\prime} }{\lambda }^{{\prime\prime} }{\lambda }^{{\prime\prime} {\prime} }}^{{{{\rm{4ph}}}}}\).

As the input features for an ML model, the set of descriptive qualities (termed descriptors) should have a good correlation with the target property28,29. ML models for phonon scattering have not been attempted before and would need some careful consideration. When calculating phonon scattering rates for a particular material, each phonon scattering process is sufficiently determined by the relevant phonons involved in that process. So the descriptors for a specific 3ph or 4ph scattering process are the information of the three or four participating phonons. To describe one phonon, we choose its frequency ω, wave vector k, eigenvector e and group velocity v as the descriptors. The ω and k are used to determine a phonon based on the dispersion relation. The e is to describe the vibrational amplitude and v is to describe the propagation of the corresponding phonon mode. All of them can be obtained by solving the dynamical matrix in the lattice dynamics.
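
As a rough illustration of how such descriptors could be assembled, the sketch below flattens (ω, k, e, v) for each participating phonon and concatenates them. It assumes that each complex eigenvector component is split into its real and imaginary parts; the actual encoding used in this work is described in the “Methods” section and may differ. All names and placeholder values are hypothetical.

```python
def phonon_descriptor(omega, k, e, v):
    """Flatten one phonon's (omega, k, e, v) into a list of real numbers.

    Assumption: complex eigenvector components are stored as
    (real, imaginary) pairs.
    """
    features = [float(omega)]
    features += [float(x) for x in k]      # wave vector, 3 components
    for component in e:                    # eigenvector, 3 * n_atoms components
        component = complex(component)
        features += [component.real, component.imag]
    features += [float(x) for x in v]      # group velocity, 3 components
    return features


def process_descriptor(phonons):
    """Concatenate the descriptors of the 3 (3ph) or 4 (4ph) participating phonons."""
    descriptor = []
    for omega, k, e, v in phonons:
        descriptor += phonon_descriptor(omega, k, e, v)
    return descriptor


# Placeholder phonon for a primitive cell with two atoms (values are arbitrary).
n_atoms = 2
phonon = (1.0, (0.0, 0.0, 0.5), [0j] * (3 * n_atoms), (0.0, 0.0, 1.0))
x_3ph = process_descriptor([phonon] * 3)
x_4ph = process_descriptor([phonon] * 4)
```

Under this encoding the descriptor length grows with the number of atoms in the primitive cell through the eigenvector term, which is the source of the dimension mismatch between materials noted in the Discussion.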

With suitable descriptors in hand, there were still many barriers to overcome. Figure 2a, b shows the distribution of \({\Gamma }_{\lambda {\lambda }^{{\prime} }{\lambda }^{{\prime\prime} }}^{{{{\rm{3ph}}}}}\) and \({\Gamma }_{\lambda {\lambda }^{{\prime} }{\lambda }^{{\prime\prime} }{\lambda }^{{\prime\prime} {\prime} }}^{{{{\rm{4ph}}}}}\) and Fig. 2c shows how we addressed the challenges we met. Both \({\Gamma }_{\lambda {\lambda }^{{\prime} }{\lambda }^{{\prime\prime} }}^{{{{\rm{3ph}}}}}\) and \({\Gamma }_{\lambda {\lambda }^{{\prime} }{\lambda }^{{\prime\prime} }{\lambda }^{{\prime\prime} {\prime} }}^{{{{\rm{4ph}}}}}\) span tens of orders of magnitude, so the datasets are highly imbalanced, which can lead to great bias for DNN models trained on the raw target values. With such models, the derived τλ in the low-frequency range deviate from the analytical results and cannot follow the physical scaling law \(\mathop{\lim }\limits_{\omega \to 0}{\tau }_{\lambda }^{-1}=0\), which leads to a large underprediction of κl (see Supplementary Fig. 3a, b). To reduce the skewness, we perform a negative logarithm transform on our target label and train the surrogate model on the transformed dataset. The predicted τλ can now follow the physical scaling law. However, for every phonon mode, τλ tends to be overpredicted, which leads to an overprediction of κl (see Supplementary Fig. 3c, d). After careful analysis, we find a larger negative error for scattering processes with large scattering rates. Although high scattering rate processes only account for a small portion of the whole phonon phase space for both 3ph and 4ph scattering, they are the major contributors to the total τλ and thus have a greater impact on thermal transport than other processes. Moreover, since an error on the logarithm scale means orders of magnitude of error on the linear scale, scattering processes with larger scattering rates will be affected more than others. 
Proper weights need to be assigned to high scattering rate processes when training the surrogate models, and the weights should be able to generalize to different materials and scattering types. After attempting a variety of forms of weights, we develop appropriate target-value-based loss function weights, which are suitable for both 3ph and 4ph scattering for all the materials we tested. This allows the surrogate model to reduce the error of prediction on the important high scattering rate data points during training, which leads to an accurate prediction of κl. More statistical information of the phonon scattering is shown in Supplementary Table 3. The details of assigning the target-value-based weights can be found in the “Methods” section and Supplementary Section 5.
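
The two ingredients discussed above, the negative logarithm transform and a loss that emphasizes high scattering rate processes, can be sketched as follows. The base-10 logarithm and, in particular, the power-law form of `rate_weight` are assumptions made only for this illustration; the weights actually used are defined in the “Methods” section and Supplementary Section 5.

```python
import math


def to_label(gamma):
    """Negative base-10 logarithm of a positive scattering rate (base is an assumption)."""
    return -math.log10(gamma)


def from_label(y):
    """Inverse transform back to the linear scale."""
    return 10.0 ** (-y)


def rate_weight(gamma, gamma_ref, power=0.5):
    """HYPOTHETICAL target-value-based weight that grows with the true rate."""
    return (gamma / gamma_ref) ** power


def weighted_mse(pred_labels, true_gammas, gamma_ref):
    """Weighted mean squared error on the log-transformed labels."""
    numerator = 0.0
    denominator = 0.0
    for y_pred, gamma in zip(pred_labels, true_gammas):
        w = rate_weight(gamma, gamma_ref)
        numerator += w * (y_pred - to_label(gamma)) ** 2
        denominator += w
    return numerator / denominator


# Two illustrative processes whose rates differ by ten orders of magnitude.
gammas = [1e-12, 1e-2]
exact = [to_label(g) for g in gammas]
# The same label error of 0.5 placed on the small-rate or the large-rate process.
loss_small_rate_wrong = weighted_mse([exact[0] + 0.5, exact[1]], gammas, max(gammas))
loss_large_rate_wrong = weighted_mse([exact[0], exact[1] + 0.5], gammas, max(gammas))
```

With an unweighted loss both mistakes would be penalized equally, although a label error of δ corresponds to a factor of 10^δ in the linear rate and matters far more for the large-rate process. A weight of this kind makes the second mistake the more costly one during training.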

Fig. 2: Challenges of developing surrogate models for phonon scattering and how we mitigate them.

a Histogram of \({\Gamma }_{\lambda {\lambda }^{{\prime} }{\lambda }^{{\prime\prime} }}^{{{{\rm{3ph}}}}}\) for Si, MgO and LiCoO2 on a logarithm scale. b Histogram of \({\Gamma }_{\lambda {\lambda }^{{\prime} }{\lambda }^{{\prime\prime} }{\lambda }^{{\prime\prime} {\prime} }}^{{{{\rm{4ph}}}}}\) for Si, MgO and LiCoO2 on a logarithm scale. The distribution of \({\Gamma }_{\lambda {\lambda }^{{\prime} }{\lambda }^{{\prime\prime} }{\lambda }^{{\prime\prime} {\prime} }}^{{{{\rm{4ph}}}}}\) is estimated by sampling 1,000,000 processes from the phonon phase space. There is a long tail as \({\Gamma }_{\lambda {\lambda }^{{\prime} }{\lambda }^{{\prime\prime} }}^{{{{\rm{3ph}}}}}\) and \({\Gamma }_{\lambda {\lambda }^{{\prime} }{\lambda }^{{\prime\prime} }{\lambda }^{{\prime\prime} {\prime} }}^{{{{\rm{4ph}}}}}\) approach zero, which shows that the dataset is highly imbalanced. c The process of designing suitable DNN models.

Model performance for three-phonon scattering

We start by predicting \({\Gamma }_{\lambda {\lambda }^{{\prime} }{\lambda }^{{\prime\prime} }}^{{{{\rm{3ph}}}}}\) and \({\kappa }_{{{{\rm{l}}}}}^{{{{\rm{3ph}}}}}\), which only include 3ph scattering. Figure 3a shows a comparison of \({\Gamma }_{\lambda {\lambda }^{{\prime} }{\lambda }^{{\prime\prime} }}^{{{{\rm{3ph}}}}}\) estimated by our surrogate models against the analytical values. The coefficient of determination (\(R^{2}\)) is 0.922 and 0.891 for Si and MgO, respectively, which demonstrates the high accuracy of our surrogate models. For LiCoO2, the \(R^{2}\) is 0.477, which is lower than in the previous two cases. This is expected considering the complexity of its phonon scattering, since LiCoO2 has more atoms in the primitive cell than Si and MgO. Figure 3b shows the comparison of \({\tau }_{\lambda }^{{{{\rm{3ph}}}}}\) between the surrogate models and the analytical models. The \(R^{2}\) values of \({\tau }_{\lambda }^{{{{\rm{3ph}}}}}\) are 0.968, 0.957 and 0.945 for Si, MgO and LiCoO2, respectively, which are always higher than the corresponding \(R^{2}\) values for \({\Gamma }_{\lambda {\lambda }^{{\prime} }{\lambda }^{{\prime\prime} }}^{{{{\rm{3ph}}}}}\). This implies the existence of an error-canceling effect during the summation over all scattering processes. Furthermore, the ML predicted \({\tau }_{\lambda }^{{{{\rm{3ph}}}}}\) at low frequency satisfies the physical scaling law \(\mathop{\lim }\limits_{\omega \to 0}{\tau }_{\lambda }^{-1}=0\) (see Supplementary Fig. 4), suggesting that our surrogate models can capture the inherent physical correlations between the phonon frequency and the scattering rates. With τλ, we then calculate the spectral contribution of each phonon mode to \({\kappa }_{{{{\rm{l}}}}}^{{{{\rm{3ph}}}}}\) and sum them up to get the final \({\kappa }_{{{{\rm{l}}}}}^{{{{\rm{3ph}}}}}\). 
Figure 3c shows the cumulative \({\kappa }_{{{{\rm{l}}}}}^{{{{\rm{3ph}}}}}\) with respect to the phonon frequency, i.e., the value of κl when only phonons with frequencies below a certain threshold are taken into account. The mean free path accumulated \({\kappa }_{{{{\rm{l}}}}}^{{{{\rm{3ph}}}}}\) is shown in Supplementary Fig. 15. The excellent agreement between our results and the analytical results shows the capability of our model to predict \({\kappa }_{{{{\rm{l}}}}}^{{{{\rm{3ph}}}}}\) in every frequency range. We repeat the calculation six times with different random dataset splits and report errors based on one standard deviation. The predicted bulk \({\kappa }_{{{{\rm{l}}}}}^{{{{\rm{3ph}}}}}\) values are 137.9 ± 3.6, 46.79 ± 0.30 and 16.82 ± 0.42 W/(m K) for Si, MgO and LiCoO2, respectively. Compared with the analytical results (139.7, 47.4 and 17.01 W/(m K) for Si, MgO and LiCoO2), the mean absolute percentage errors (MAPEs) of the three materials are 2.43%, 1.39% and 2.03%, all less than 3%. Considering that the experimental uncertainty for measuring thermal conductivity can typically be 10%, the accuracy of our models is exceptional. We also consider the anisotropy of LiCoO2 and calculate the in-plane lattice thermal conductivity (\({\kappa }_{{{{\rm{l,\parallel }}}}}^{{{{\rm{3ph}}}}}\)) and cross-plane lattice thermal conductivity (\({\kappa }_{{{{\rm{l,\perp }}}}}^{{{{\rm{3ph}}}}}\)), which are shown in Supplementary Table 6 together with all the results for each run. Moreover, our model also performs well at high temperature, as demonstrated in Supplementary Section 11.
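
The error metric can be made concrete with a short sketch. It assumes that the MAPE is the average over the individual runs of the absolute percentage deviation from the analytical value; the per-run values are given in Supplementary Table 6 and are not reproduced here, so the example only evaluates the error of the averaged predictions quoted above.

```python
def mape(predictions, reference):
    """Mean absolute percentage error of repeated predictions against one reference value."""
    total = sum(abs(p - reference) for p in predictions)
    return 100.0 * total / (len(predictions) * reference)


def error_of_mean(mean_prediction, reference):
    """Percentage error of the averaged prediction."""
    return 100.0 * abs(mean_prediction - reference) / reference


# (mean prediction, analytical value) quoted in the text, in W/(m K).
reported = {"Si": (137.9, 139.7), "MgO": (46.79, 47.4), "LiCoO2": (16.82, 17.01)}
errors = {name: error_of_mean(mean, ref) for name, (mean, ref) in reported.items()}
```

By the triangle inequality, the error of the averaged prediction cannot exceed the MAPE over the individual runs, because runs that scatter on both sides of the analytical value partly cancel in the mean. The values obtained here (roughly 1.1% to 1.3%) are indeed below the reported MAPEs.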

Fig. 3: The performance of 3ph scattering surrogate models for Si, MgO and LiCoO2.

a Scatter plot of estimated \({\Gamma }_{\lambda {\lambda }^{{\prime} }{\lambda }^{{\prime\prime} }}^{{{{\rm{3ph}}}}}\) with respect to analytical \({\Gamma }_{\lambda {\lambda }^{{\prime} }{\lambda }^{{\prime\prime} }}^{{{{\rm{3ph}}}}}\). b Scatter plot of estimated \({\tau }_{\lambda }^{{{{\rm{3ph}}}}}\) with respect to analytical \({\tau }_{\lambda }^{{{{\rm{3ph}}}}}\). c Cumulative \({\kappa }_{{{{\rm{l}}}}}^{{{{\rm{3ph}}}}}\) with respect to the phonon frequency. d Comparison of the total computational cost of predicting \({\kappa }_{{{{\rm{l}}}}}^{{{{\rm{3ph}}}}}\) between analytical and surrogate models. The surrogate models are up to four times faster than the analytical models. e Computational cost of each step in the prediction of \({\kappa }_{{{{\rm{l}}}}}^{{{{\rm{3ph}}}}}\) with surrogate models. f Average computational cost for calculating a single \({\Gamma }_{\lambda {\lambda }^{{\prime} }{\lambda }^{{\prime\prime} }}^{{{{\rm{3ph}}}}}\). The surrogate models are two orders of magnitude faster compared with the analytical models.

To demonstrate the significant reduction in computational time brought by our surrogate model, Fig. 3d shows the comparison of the computational cost between the analytical models and the surrogate models. The reported time is CPU time, i.e., the cumulative time of all CPU cores working on the job. Our surrogate models achieve a speedup of 2.85×, 2.5×, and 3.57× for Si, MgO and LiCoO2, respectively. We then analyze the computational cost of every procedure in the surrogate models, which is shown in Fig. 3e. For the prediction of \({\kappa }_{{{{\rm{l}}}}}^{{{{\rm{3ph}}}}}\), most of the time is used to generate datasets and train DNNs. Although our surrogate models achieve two orders of magnitude acceleration on average for a single scattering process (Fig. 3f), this large acceleration is partly masked by these overheads. In comparison, 4ph scattering has a much larger phonon phase space than 3ph scattering, i.e., more allowed scattering processes (\(10^{11}\) versus \(10^{6}\)), and more time saving is expected for predicting \({\kappa }_{{{{\rm{l}}}}}^{{{{\rm{3ph+4ph}}}}}\).
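
The way overheads mask the per-process acceleration can be seen in a simple cost model. The sketch below considers only the scattering-rate step and assumes a constant cost per process; the training fractions, per-process times and training time are hypothetical numbers chosen for illustration, not measurements from this work.

```python
def surrogate_speedup(n_processes, train_fraction, t_analytical, t_surrogate, t_training):
    """Speedup of the surrogate workflow over the fully analytical one (scattering step only)."""
    analytical_total = n_processes * t_analytical
    n_train = n_processes * train_fraction
    surrogate_total = (
        n_train * t_analytical                     # generate the training set analytically
        + t_training                               # train the DNN
        + (n_processes - n_train) * t_surrogate    # predict the remaining processes
    )
    return analytical_total / surrogate_total


# Hypothetical inputs: the surrogate is 100 times faster per process in both cases.
speedup_small_space = surrogate_speedup(1e6, 0.10, 1.0, 0.01, 2e5)    # about 3.2
speedup_large_space = surrogate_speedup(1e11, 0.01, 1.0, 0.01, 2e5)   # about 50
```

In this model the overall speedup is bounded both by the per-process acceleration and by the inverse of the training fraction, so a larger phase space that can be learned from a smaller fraction of its processes leaves more room for acceleration.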

Model performance for four-phonon scattering

Four-phonon scattering represents the frontier of the first principles prediction of thermal conductivity, but it is prohibitively expensive and complex. Encouraged by the 3ph result, we proceed to train surrogate models for 4ph scattering following a similar procedure. The DNN structures are set to be the same as those of the 3ph models, except that the information of one more phonon is added to the descriptors. Due to the huge number of scattering processes, we cannot save \({\Gamma }_{\lambda {\lambda }^{{\prime} }{\lambda }^{{\prime\prime} }{\lambda }^{{\prime\prime} {\prime} }}^{{{{\rm{4ph}}}}}\) of all possible processes because doing so would exceed the available computer memory. To deal with this problem, we conduct the prediction of \({\Gamma }_{\lambda {\lambda }^{{\prime} }{\lambda }^{{\prime\prime} }{\lambda }^{{\prime\prime} {\prime} }}^{{{{\rm{4ph}}}}}\) mode by mode. After calculating \({\tau }_{\lambda }^{{{{\rm{4ph}}}}}\), the memory is freed for the next phonon mode (see Methods for details). As \({\Gamma }_{\lambda {\lambda }^{{\prime} }{\lambda }^{{\prime\prime} }{\lambda }^{{\prime\prime} {\prime} }}^{{{{\rm{4ph}}}}}\) is not saved, we only present the performance on \({\tau }_{\lambda }^{{{{\rm{4ph}}}}}\) and \({\kappa }_{{{{\rm{l}}}}}^{{{{\rm{3ph+4ph}}}}}\). The pair plots between the predicted and analytically calculated \({\tau }_{\lambda }^{{{{\rm{4ph}}}}}\) are shown in Fig. 4a, with \(R^{2}\) values of 0.994, 0.995 and 0.979 for Si, MgO and LiCoO2, respectively. These results demonstrate the success of our DNN surrogate model in predicting \({\tau }_{\lambda }^{{{{\rm{4ph}}}}}\) with very high accuracy. The cumulative \({\kappa }_{{{{\rm{l}}}}}^{{{{\rm{3ph+4ph}}}}}\) (Fig. 4b) further confirms the accuracy of predicting \({\tau }_{\lambda }^{{{{\rm{4ph}}}}}\) and spectral \({\kappa }_{{{{\rm{l}}}}}^{{{{\rm{3ph+4ph}}}}}\). 
Note that for \({\kappa }_{{{{\rm{l}}}}}^{{{{\rm{3ph+4ph}}}}}\), we use \({\tau }_{\lambda }^{{{{\rm{4ph}}}}}\) from surrogate models and \({\tau }_{\lambda }^{{{{\rm{3ph}}}}}\) from analytical models to calculate τλ. This is to show the error solely brought by the 4ph surrogate models. Based on the average of six different random dataset splits, the total \({\kappa }_{{{{\rm{l}}}}}^{{{{\rm{3ph+4ph}}}}}\) values for Si, MgO and LiCoO2 are 120.5 ± 0.2, 42.32 ± 0.10 and 6.812 ± 0.288 W/(m K), respectively. Compared with the analytical results (120.6, 42.2 and 6.619 W/(m K) for Si, MgO and LiCoO2), the MAPEs are 0.09%, 0.36% and 4.46%, respectively, all less than 5%. The predicted \({\kappa }_{{{{\rm{l}}}}}^{{{{\rm{3ph+4ph}}}}}\) has high accuracy and also reflects a clear 4ph effect, as the 4ph scattering is expected to decrease \({\kappa }_{{{{\rm{l}}}}}^{{{{\rm{3ph+4ph}}}}}\). The relation between \({\tau }_{\lambda }^{{{{\rm{4ph}}}}}\) and ω is shown in Supplementary Fig. 5. The anisotropic \({\kappa }_{{{{\rm{l}}}}}^{{{{\rm{3ph+4ph}}}}}\) for LiCoO2, together with the results of each run for all three materials, is shown in Supplementary Table 7. The mean free path accumulated \({\kappa }_{{{{\rm{l}}}}}^{{{{\rm{3ph+4ph}}}}}\) is shown in Supplementary Fig. 15. The performance of the 4ph surrogate model at high temperature is shown in Supplementary Section 11. Figure 4c shows that the surrogate models achieve 64.3×, 69.9× and 17.1× speedups for predicting \({\kappa }_{{{{\rm{l}}}}}^{{{{\rm{3ph+4ph}}}}}\) of Si, MgO and LiCoO2, respectively. Due to a larger phonon phase space, the evaluation of \({\Gamma }_{\lambda {\lambda }^{{\prime} }{\lambda }^{{\prime\prime} }{\lambda }^{{\prime\prime} {\prime} }}^{{{{\rm{4ph}}}}}\) takes a larger portion of the total time compared with 3ph scattering (Fig. 4d), which leads to more acceleration for the prediction of \({\kappa }_{{{{\rm{l}}}}}^{{{{\rm{3ph+4ph}}}}}\). 
Note that the time saving is smaller for LiCoO2 than for Si and MgO, because we use a smaller q-mesh and broadening factor for generating training data (see Methods for details), which leads to fewer phonon modes and a smaller phonon phase space compared with the other two materials. Figure 4e shows that the surrogate models are on average two orders of magnitude faster than the analytical models for the prediction of a single \({\Gamma }_{\lambda {\lambda }^{{\prime} }{\lambda }^{{\prime\prime} }{\lambda }^{{\prime\prime} {\prime} }}^{{{{\rm{4ph}}}}}\), which is the same as what we found for 3ph scattering.
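
The mode-by-mode strategy described above can be sketched as a loop that predicts rates in batches and keeps only a running sum per mode. The sketch assumes that the inverse relaxation time is the plain sum of the predicted rates; `processes_for_mode` and `predict_batch` are hypothetical callables standing in for the phase-space enumeration and the trained DNN.

```python
def tau_mode_by_mode(modes, processes_for_mode, predict_batch, batch_size=4096):
    """Accumulate 1/tau for each mode without storing all per-process rates."""
    tau = {}
    for mode in modes:
        inverse_tau = 0.0
        batch = []
        for descriptor in processes_for_mode(mode):    # descriptors generated lazily
            batch.append(descriptor)
            if len(batch) == batch_size:
                inverse_tau += sum(predict_batch(batch))
                batch = []                              # rates of this batch are discarded
        if batch:                                       # last, partially filled batch
            inverse_tau += sum(predict_batch(batch))
        tau[mode] = 1.0 / inverse_tau if inverse_tau > 0.0 else float("inf")
    return tau


# Stand-ins for illustration: 10 processes per mode, rate fixed by the mode index.
def fake_processes(mode):
    return ([float(mode)] for _ in range(10))


def fake_predict(batch):
    return [0.01 * (1.0 + d[0]) for d in batch]


tau_4ph = tau_mode_by_mode(range(3), fake_processes, fake_predict, batch_size=4)
```

Only one batch of descriptors and rates is held in memory at any time, so the memory footprint is set by the batch size rather than by the size of the 4ph phase space.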

Fig. 4: The performance of 4ph scattering surrogate models for Si, MgO and LiCoO2.

a Scatter plot of estimated \({\tau }_{\lambda }^{{{{\rm{4ph}}}}}\) with respect to analytical \({\tau }_{\lambda }^{{{{\rm{4ph}}}}}\). b Cumulative \({\kappa }_{{{{\rm{l}}}}}^{{{{\rm{3ph+4ph}}}}}\) with respect to the phonon frequency. c Comparison of the total computational cost of predicting \({\kappa }_{{{{\rm{l}}}}}^{{{{\rm{3ph+4ph}}}}}\) between the analytical and surrogate models. The surrogate models are up to seventy times faster than the analytical models. d Computational cost of each step in the prediction of \({\kappa }_{{{{\rm{l}}}}}^{{{{\rm{3ph+4ph}}}}}\) with surrogate models. e Average computational cost of calculating a single \({\Gamma }_{\lambda {\lambda }^{{\prime} }{\lambda }^{{\prime\prime} }{\lambda }^{{\prime\prime} {\prime} }}^{{{{\rm{4ph}}}}}\). The surrogate models are two orders of magnitude faster compared with the analytical models.

Transfer learning from 3ph scattering to 4ph scattering

Transfer learning is a technique that improves a model by using knowledge learned in a different but related task. For example, it can leverage knowledge gained from learning a proxy property to improve the prediction of a target property. In thermal science, some work has been done on utilizing transfer learning to improve model performance30,31,32. Considering the similarity of the 3ph and 4ph scattering formalisms, we take \({\Gamma }_{\lambda {\lambda }^{{\prime} }{\lambda }^{{\prime\prime} }}^{{{{\rm{3ph}}}}}\) as the proxy property and employ transfer learning from 3ph models to 4ph models to improve the prediction of \({\Gamma }_{\lambda {\lambda }^{{\prime} }{\lambda }^{{\prime\prime} }{\lambda }^{{\prime\prime} {\prime} }}^{{{{\rm{4ph}}}}}\). Generally, to perform transfer learning, the two models should share the same architecture. This is not true in our case because 4ph scattering involves one more phonon, which leads to a discrepancy in the dimension of the descriptors.

To overcome this problem, we add a “virtual phonon” to the 3ph surrogate model so that the dimensions of the descriptors of the modified 3ph model and the 4ph surrogate model become the same. The “virtual phonon” is a set of dummy inputs with zero values and the same dimensions as the descriptor for a phonon. Figure 5a shows the workflow of transfer learning from the 3ph model to the 4ph model. The performance of the modified 3ph models is comparable with that of the previous 3ph surrogate models (see Supplementary Section 8), which demonstrates that the “virtual phonon” does not degrade the learning ability of the models and that the modified 3ph models can still capture the mechanism of phonon scattering.
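
The “virtual phonon” amounts to zero padding of the 3ph descriptor. A minimal sketch is given below; appending the zeros at the end of the descriptor is an assumption made for illustration, and the function name and per-phonon length are hypothetical.

```python
def add_virtual_phonon(descriptor_3ph, phonon_dim):
    """Append a zero-valued 'virtual phonon' so a 3ph descriptor has the 4ph input size."""
    if len(descriptor_3ph) != 3 * phonon_dim:
        raise ValueError("expected a descriptor made of three phonons of length phonon_dim")
    return list(descriptor_3ph) + [0.0] * phonon_dim


# Illustrative per-phonon descriptor length (hypothetical).
phonon_dim = 19
descriptor_3ph = [1.0] * (3 * phonon_dim)
padded = add_virtual_phonon(descriptor_3ph, phonon_dim)
```

After padding, the modified 3ph model and the 4ph model accept inputs of identical length, so the weights of one can be used directly to initialize the other.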

Fig. 5: Transfer learning from 3ph to 4ph surrogate models.

a The workflow of transfer learning. A modified 3ph model with dummy inputs for the ‘virtual phonon’ is first trained and used as a ‘warm start’ for the 4ph model. b The MAPE of \({\kappa }_{{{{\rm{l}}}}}^{{{{\rm{3ph+4ph}}}}}\) on a logarithm scale comparing the 4ph surrogate models and the 4ph transfer learning models. The predictions with the transfer learning models are more accurate. c The MAPE of \({\kappa }_{{{{\rm{l}}}}}^{{{{\rm{3ph+4ph}}}}}\) comparing the 4ph surrogate models trained on 3% and 0.3% of the 4ph phase space and the transfer learning model trained on 0.3% of the 4ph phase space. After transfer learning, the error of the model trained on a small training set is reduced.

Using the ‘warm start’ strategy, i.e., keeping the weights and biases of 3ph models as the initialization and training the models on 4ph data, the knowledge of 3ph scattering is then transferred to 4ph models. We first train the transferred models with the same datasets as the previous 4ph surrogate models. Figure 5b shows the MAPE of \({\kappa }_{{{{\rm{l}}}}}^{{{{\rm{3ph+4ph}}}}}\) compared with the previous 4ph surrogate models. The transferred models reduce error by 66.7%, 75.0% and 55.8% for Si, MgO, and LiCoO2, respectively. The better performances of the transferred models suggest that our surrogate model can capture the mechanism of phonon scattering. Information embedded in the 3ph surrogate can be used to improve the performance of 4ph surrogate. More details of the performance of 4ph surrogate models are shown in Supplementary Section 8.
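
The warm-start idea can be illustrated with a toy example in which a small linear model stands in for the DNN. The model is first trained on a source task whose second input is zero-padded, and its parameters are then copied as the initialization for a related target task. The data, learning rate and model are all hypothetical and serve only to show the mechanism, not the performance reported here.

```python
import copy
import random


def init_params(n_inputs, seed=0):
    """Small random weights and zero bias (a cold start)."""
    rng = random.Random(seed)
    return {"w": [rng.uniform(-0.1, 0.1) for _ in range(n_inputs)], "b": 0.0}


def predict(params, x):
    return sum(w * xi for w, xi in zip(params["w"], x)) + params["b"]


def mse(params, data):
    return sum((predict(params, x) - y) ** 2 for x, y in data) / len(data)


def train(params, data, lr=0.3, epochs=500):
    """Full-batch gradient descent on the mean squared error (updates params in place)."""
    for _ in range(epochs):
        grad_w = [0.0] * len(params["w"])
        grad_b = 0.0
        for x, y in data:
            err = predict(params, x) - y
            for i, xi in enumerate(x):
                grad_w[i] += 2.0 * err * xi / len(data)
            grad_b += 2.0 * err / len(data)
        params["w"] = [w - lr * g for w, g in zip(params["w"], grad_w)]
        params["b"] -= lr * grad_b
    return params


# Toy source task (second input zero-padded) and a related toy target task.
source = [([x, 0.0], 2.0 * x + 1.0) for x in (0.0, 0.5, 1.0)]
target = [([x, z], 2.0 * x + 0.5 * z + 1.0) for x in (0.0, 0.5, 1.0) for z in (0.0, 1.0)]

model_source = train(init_params(2), source)
model_target = copy.deepcopy(model_source)        # warm start: reuse weights and bias
loss_warm_start = mse(model_target, target)
loss_cold_start = mse(init_params(2), target)
train(model_target, target)                       # continue training on the target task
loss_after_training = mse(model_target, target)
```

Because the two toy tasks share most of their structure, the warm-started model begins with a much lower target loss than a freshly initialized one, and only the weight attached to the previously padded input still has to be learned.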

Having found that transfer learning can improve the model performance, we then utilize it to further accelerate the prediction of \({\kappa }_{{{{\rm{l}}}}}^{{{{\rm{3ph+4ph}}}}}\). We take LiCoO2 as an example because generating its training set takes a larger portion of the total time than for the other two materials. In the previous 4ph surrogate models, our sampling technique utilizes 3% of the phonon phase space as the training set. Now we take fewer data from each phonon mode and the new training set is only 0.3% of the phonon phase space. Figure 5c shows the MAPE of \({\kappa }_{{{{\rm{l}}}}}^{{{{\rm{3ph+4ph}}}}}\) on a logarithm scale. Compared with the previous result, the prediction error doubles when using the small training set. However, with transfer learning incorporated, the prediction error is reduced to a level similar to that obtained with the larger training set. At the same time, the total computational time of predicting \({\kappa }_{{{{\rm{l}}}}}^{{{{\rm{3ph+4ph}}}}}\) with transfer learning decreases by around 75%, which comes from a reduction of around 90% in the time spent generating the dataset and training the surrogate model (see Supplementary Fig. 8). Overall, by taking advantage of the similarity between 3ph and 4ph scattering processes, our transfer learning model reaches better performance.
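
The two percentages quoted above can be related by a short estimate. Assuming that the time of all other steps is unchanged, and writing f for the fraction of the original total time spent on dataset generation and training (a symbol introduced here only for this estimate), the approximate figures give

```latex
\[
\frac{\Delta t}{t_{\mathrm{total}}} \approx 0.9\,f \approx 0.75
\quad\Longrightarrow\quad
f \approx \frac{0.75}{0.9} \approx 0.83 .
\]
```

Under this assumption, dataset generation and training would account for roughly five sixths of the original LiCoO2 computational time, which is consistent with the choice of LiCoO2 as the example where the training set dominates the cost.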

Discussion

Our ML model is targeted at the phonon scattering level by predicting \({\Gamma }_{\lambda {\lambda }^{{\prime} }{\lambda }^{{\prime\prime} }}^{{{{\rm{3ph}}}}}\) or \({\Gamma }_{\lambda {\lambda }^{{\prime} }{\lambda }^{{\prime\prime} }{\lambda }^{{\prime\prime} {\prime} }}^{{{{\rm{4ph}}}}}\), which are subsequently used to derive τλ and κl. Compared with end-to-end ML models, i.e., models that predict κl directly from material structural information, our model is distinctly different and brings several advantages. First, instead of acting as a black box, our workflow retains all the phonon scattering physics and insights, which serve a foundational role in understanding thermal transport. Our approach allows us to obtain essential quantities like τλ, which are highly important to various topics including optical linewidth33,34,35, thermal barrier coatings4,36 and radiative cooling37. This preservation of the underlying physics brings the second advantage. While end-to-end models can only predict κl to the right order of magnitude, which is only useful for very rough estimation, our models give less than 5% error and can be used for quantitative materials design with high confidence. A comparison is shown in Fig. 6, where the relative errors of end-to-end models are around ±30% and can be over ±100%, whereas our surrogate models achieve over 6× improvement in the accuracy of predicting κl.

Fig. 6: The relative error of predicting κl comparing end-to-end ML models and our 3ph+4ph surrogate models.

Results of end-to-end models are from refs. 23,31,42. The line and dot inside each box represent the median and the mean of the data, respectively. The box represents the interquartile range from the 25th to the 75th percentile. The upper and lower limits of the whiskers are the maximum and minimum values in the dataset. Compared with end-to-end machine learning models, where the relative error can be over 100%, the relative error of our surrogate model is always within 5%, which shows much higher accuracy.

The acceleration of our surrogate models originates from the fast forward pass of DNNs. Compared with ShengBTE, which evaluates scattering processes sequentially in each thread, our models evaluate multiple scattering processes together in batches, which can take advantage of fast matrix operations. Compared with other works which accelerate the prediction of phonon scattering by utilizing GPU parallel computing ability38,39, the acceleration of our surrogate models does not rely on GPU. To demonstrate this, we run our surrogate model on one CPU core, and we find that the computational time only increases a little compared with running our model on GPU, which originates from a longer training time (see Supplementary Fig. 9). We still have a huge acceleration for the prediction of \({\kappa }_{{{{\rm{l}}}}}^{{{{\rm{3ph+4ph}}}}}\).

Our surrogate models provide an ML framework for the prediction of phonon scattering, which could also be used for other purposes. For example, Supplementary Section 10 shows a classification model based on the 3ph DNN architecture that can identify the 3ph scattering processes that are less important to \({\tau }_{\lambda }^{{{{\rm{3ph}}}}}\) and \({\kappa }_{{{{\rm{l}}}}}^{{{{\rm{3ph}}}}}\). Eliminating them prior to the calculation of \({\Gamma }_{\lambda {\lambda }^{{\prime} }{\lambda }^{{\prime\prime} }}^{{{{\rm{3ph}}}}}\) can also lead to significant acceleration. This suggests a broad application of our approach in phonon scattering research. Also, for transfer learning, we take advantage of the similarity between the 3ph and 4ph processes. It may likewise be possible to do transfer learning between materials with similar structures to achieve better performance.

Our surrogate model is based on RTA, which is valid for materials where Umklapp scattering dominates. It is worth noting that our 3ph+4ph result for Si is slightly underpredicted compared to the experimental value, consistent with previous works8,9,16 and due to the neglect of phonon renormalization at finite temperature. Literature has shown that adding phonon renormalization can make the prediction in agreement with experimental results for Si40. Future work could be done on using machine learning to predict scattering rates with phonon renormalization.

There is still room for us to further accelerate our models. In the 4ph surrogate models, we evaluate \({\Gamma }_{\lambda {\lambda }^{{\prime} }{\lambda }^{{\prime\prime} }{\lambda }^{{\prime\prime} {\prime} }}^{{{{\rm{4ph}}}}}\) mode by mode and do file I/O to avoid exceeding the memory limit, which slows down the evaluation of scattering processes. More acceleration could be obtained if we could evaluate more scattering processes at once. Furthermore, our surrogate models are built with Python, which is an interpreted programming language and is less efficient than Fortran (the programming language of ShengBTE). More time could be saved by porting the code to a more efficient language such as C or Fortran. Another potential limitation of our work is the mismatch of descriptors between materials with different structures. For materials with different numbers of atoms in one primitive cell, the dimensions of their eigenvectors are different (see “Methods” for detail). This may hamper the use of transfer learning between materials with different structures. To deal with this problem, more work could be done on finding a better way to describe phonon eigenvector terms so that the dimension is the same for every material.

In summary, we develop machine learning models at the phonon scattering level, which has not been achieved so far. The predicted phonon scattering rates and relaxation times agree well with the analytical result. Lattice thermal conductivities are then derived with relative error less than 3% for \({\kappa }_{{{{\rm{l}}}}}^{{{{\rm{3ph}}}}}\) and 5% for \({\kappa }_{{{{\rm{l}}}}}^{{{{\rm{3ph+4ph}}}}}\). Our models achieve a speedup of up to four times for 3ph and up to seventy times for 3ph+4ph BTE workflow compared with ShengBTE. Transfer learning from 3ph to 4ph scattering models can further improve the model performance. The proposed surrogate models are able to capture the mechanism of phonon scattering and would assist the discovery of desired κl materials. The work can potentially remove the paramount computational cost barrier of high-order phonon scattering and enable large-scale thermal transport informatics.

Methods

Dataset

Our datasets contain the information of phonon pairs in the phonon phase space as input and the corresponding \({\Gamma }_{\lambda {\lambda }^{{\prime} }{\lambda }^{{\prime\prime} }}^{{{{\rm{3ph}}}}}\) or \({\Gamma }_{\lambda {\lambda }^{{\prime} }{\lambda }^{{\prime\prime} }{\lambda }^{{\prime\prime} {\prime} }}^{{{{\rm{4ph}}}}}\) as output. The datasets on Si, MgO and LiCoO2 are generated by a custom ShengBTE package with the FourPhonon module. The temperature is set to 300 K. The first Brillouin zone is discretized into an N × N × N q-mesh. After careful convergence tests, N is set to 28, 20 and 10 when calculating 3ph scattering for Si, MgO and LiCoO2, respectively. A broadening factor of unity is used for all three materials in the 3ph scattering. For the calculation of 3ph+4ph scattering, N is set to 16, 15 and 10 for Si, MgO and LiCoO2, respectively. Considering the memory and computational cost, we set the broadening factor to 0.1 for Si and MgO and 0.01 for LiCoO2, which is enough for reaching convergence. Isotopic scattering is included in both 3ph and 4ph scattering for all three materials. For data cleaning, we remove negative \({\Gamma }_{\lambda {\lambda }^{{\prime} }{\lambda }^{{\prime\prime} }}^{{{{\rm{3ph}}}}}\) and \({\Gamma }_{\lambda {\lambda }^{{\prime} }{\lambda }^{{\prime\prime} }{\lambda }^{{\prime\prime} {\prime} }}^{{{{\rm{4ph}}}}}\) from the generated dataset because they are unphysical. The target value is then transformed as \(-{\log }_{10}(\Gamma )\), where Γ stands for \({\Gamma }_{\lambda {\lambda }^{{\prime} }{\lambda }^{{\prime\prime} }}^{{{{\rm{3ph}}}}}\) or \({\Gamma }_{\lambda {\lambda }^{{\prime} }{\lambda }^{{\prime\prime} }{\lambda }^{{\prime\prime} {\prime} }}^{{{{\rm{4ph}}}}}\) in the corresponding model.
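The cleaning and target transformation described above can be sketched as follows. This is a minimal illustration, not the authors' code: the function names and the sample values are hypothetical, and dropping exactly-zero rates (for which the logarithm is undefined) is an assumption beyond the stated removal of negative rates.

```python
import math

def clean_and_transform(rates):
    """Drop unphysical rates and return the targets y = -log10(rate)."""
    # Negative rates are removed as unphysical. Zero rates are also dropped
    # here (assumption), because log10(0) is undefined.
    kept = [g for g in rates if g > 0.0]
    return [-math.log10(g) for g in kept]

def inverse_transform(targets):
    """Map predicted targets back to scattering rates: rate = 10**(-y)."""
    return [10.0 ** (-y) for y in targets]

# Made-up rates, for illustration only.
raw_rates = [1e-3, -2e-5, 4e-6, 1.0]
targets = clean_and_transform(raw_rates)
recovered = inverse_transform(targets)
```

The inverse transform is what turns a network prediction back into a scattering rate before relaxation times are computed.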

For each scattering process, we use the information of the three or four participating phonons as descriptors. A phonon is described as λ(ω, k, e, v), where the matrix properties are flattened into 1D vectors. Note that for materials like Si and MgO, with 2 atoms per primitive cell, the dimension of the eigenvector e is 12, while for materials like LiCoO2, with 4 atoms per primitive cell, the dimension of e is 24. In total, the dimension of the descriptors for 3ph scattering is 57 (Si and MgO) or 93 (LiCoO2), and the dimension of the descriptors for 4ph scattering is 76 (Si and MgO) or 124 (LiCoO2). The dimensions of each physical term in the descriptors are shown in Supplementary Section 3.
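The descriptor sizes quoted above can be checked with a short calculation. The per-phonon split used here (one frequency, three wave-vector components, an eigenvector stored as real and imaginary parts, three group-velocity components) is an assumption chosen because it reproduces the quoted totals; the authoritative breakdown is in Supplementary Section 3.

```python
def phonon_descriptor_size(atoms_per_cell):
    """Length of one flattened phonon descriptor (assumed split)."""
    frequency = 1                         # omega
    wave_vector = 3                       # k
    eigenvector = 3 * atoms_per_cell * 2  # e: 3N complex entries as (real, imag)
    group_velocity = 3                    # v
    return frequency + wave_vector + eigenvector + group_velocity

def process_descriptor_size(atoms_per_cell, phonons_in_process):
    """Descriptor length of a scattering process with 3 or 4 phonons."""
    return phonons_in_process * phonon_descriptor_size(atoms_per_cell)

sizes = {
    ("Si/MgO", "3ph"): process_descriptor_size(2, 3),
    ("LiCoO2", "3ph"): process_descriptor_size(4, 3),
    ("Si/MgO", "4ph"): process_descriptor_size(2, 4),
    ("LiCoO2", "4ph"): process_descriptor_size(4, 4),
}
```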

To generate the training set, we perform sampling over the phonon phase space. Considering that the numbers of allowed scattering processes are quite different for different modes (see Supplementary Fig. 2), some phonon modes would contribute more data than others and more weights would be given to these modes if we randomly select a certain percentage of scattering processes from the phonon phase space. To treat all the phonon modes more equally, we sample a certain number of scattering processes from each mode. The number of samples is set to be 2,000 for 3ph surrogate models and 20,000 for 4ph surrogate models. The generated training set is only a small fraction of the phonon phase space, as exhibited in Supplementary Table 4.
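A minimal sketch of this per-mode sampling is given below. The data layout (a dictionary from mode index to a list of process identifiers) and the function name are hypothetical, and taking every process from a mode that has fewer than the requested number is an assumption not stated in the text.

```python
import random

def sample_per_mode(processes_by_mode, n_per_mode, seed=0):
    """Pick at most n_per_mode scattering processes from every phonon mode."""
    rng = random.Random(seed)
    training = {}
    for mode, processes in processes_by_mode.items():
        # Assumption: a mode with fewer allowed processes contributes all of them.
        k = min(n_per_mode, len(processes))
        training[mode] = rng.sample(processes, k)  # sampling without replacement
    return training

# Toy phase space with very different numbers of processes per mode.
phase_space = {0: list(range(50)), 1: list(range(5000)), 2: list(range(300))}
training_set = sample_per_mode(phase_space, n_per_mode=100)
```

With uniform sampling over the whole phase space, mode 1 would dominate this toy training set; sampling a fixed number per mode caps its share.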

DNN structure

DNN models are built with TensorFlow41 and consist of 4 hidden layers with 1000, 1000, 1000 and 10 neurons. Rectified Linear Unit (ReLU) activation functions are used in the hidden layers. The output layer has one neuron with a linear activation function. The loss function is chosen to be the Mean Square Error (MSE), which is given by:

$$\mathrm{MSE}=\frac{1}{N}\sum_{i=1}^{N}{({y}_{i}-\hat{{y}_{i}})}^{2},$$
(1)

where N is the number of samples, \({y}_{i}\) is the true value and \(\hat{{y}_{i}}\) is the predicted value. The structure of the DNN is determined by hyperparameter tuning.
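To make the architecture concrete, the sketch below writes the forward pass of a network with the stated layer sizes in plain NumPy, together with the MSE of Eq. (1). It is an illustration only and not the TensorFlow implementation used in this work: the He-style random initialization, the function names and the input size of 57 (the Si 3ph descriptor) are choices made for the example.

```python
import numpy as np

def init_mlp(layer_sizes, seed=0):
    """Random weights and zero biases for a fully connected network."""
    rng = np.random.default_rng(seed)
    params = []
    for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]):
        # He-style scaling is an assumption made for this sketch.
        weights = rng.normal(0.0, np.sqrt(2.0 / n_in), size=(n_in, n_out))
        params.append((weights, np.zeros(n_out)))
    return params

def forward(params, x):
    """ReLU in the hidden layers, linear activation in the output layer."""
    for weights, bias in params[:-1]:
        x = np.maximum(x @ weights + bias, 0.0)
    weights, bias = params[-1]
    return (x @ weights + bias).ravel()

def mse(y_true, y_pred):
    """Mean square error, Eq. (1)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean((y_true - y_pred) ** 2))

# Input of 57 descriptors, hidden layers of 1000, 1000, 1000 and 10, one output.
layer_sizes = [57, 1000, 1000, 1000, 10, 1]
params = init_mlp(layer_sizes)
n_parameters = sum(w.size + b.size for w, b in params)
predictions = forward(params, np.zeros((4, 57)))  # a batch of 4 processes
```

Because a whole batch of processes is pushed through a few matrix products, the cost per process is small, which is the source of the speedup over evaluating processes one at a time.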

Recognizing that high \({\Gamma }_{\lambda {\lambda }^{{\prime} }{\lambda }^{{\prime\prime} }}^{{{{\rm{3ph}}}}}\) and \({\Gamma }_{\lambda {\lambda }^{{\prime} }{\lambda }^{{\prime\prime} }{\lambda }^{{\prime\prime} {\prime} }}^{{{{\rm{4ph}}}}}\) processes are more important, we develop target value-based loss function weights w. Multiplying the loss function of the DNN model by w makes it focus more on high \({\Gamma }_{\lambda {\lambda }^{{\prime} }{\lambda }^{{\prime\prime} }}^{{{{\rm{3ph}}}}}\) and \({\Gamma }_{\lambda {\lambda }^{{\prime} }{\lambda }^{{\prime\prime} }{\lambda }^{{\prime\prime} {\prime} }}^{{{{\rm{4ph}}}}}\) processes. After careful searching, the form of weight we use is:

$$w={\Gamma }^{0.4}.$$
(2)

The parameter 0.4 in w is determined by hyperparameter tuning on the Si 3ph dataset and is able to generalize to both other materials and other scattering types. Some other forms of weight were attempted but they are not consistently suitable for either different materials or different orders of phonon scattering. More details of weight selection are shown in Supplementary Section 5.
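One way the weight of Eq. (2) could enter the loss is sketched below. The sketch assumes that the targets are the transformed values y = −log10(Γ), so that Γ is recovered as 10^(−y), and that the weighted squared errors are averaged over the batch without renormalizing by the sum of the weights; neither detail is specified here, and the function name is hypothetical.

```python
import numpy as np

def weighted_mse(y_true, y_pred, exponent=0.4):
    """MSE with per-sample weights w = Gamma**exponent."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    gamma = 10.0 ** (-y_true)     # undo the assumed transform y = -log10(Gamma)
    weights = gamma ** exponent   # Eq. (2) when exponent = 0.4
    return float(np.mean(weights * (y_true - y_pred) ** 2))

# Two processes with the same absolute error but very different rates.
loss = weighted_mse([0.0, 5.0], [1.0, 6.0])
```

In this toy case the process with Γ = 1 carries a weight of 1, while the process with Γ = 10^−5 carries a weight of 10^−2, so an equal error on the weaker process contributes one hundred times less to the loss.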

To minimize the loss, we perform error back-propagation with the Adam optimizer to update the model weights and biases. The mini-batch size, which is the number of samples per gradient update, is set to 2048. The early stopping technique is employed to mitigate overfitting and save the optimal model state during training. The predicted result is evaluated by the coefficient of determination (\({R}^{2}\)), which is defined as:

$${R}^{2}=1-\frac{\sum_{i=1}^{N}{({y}_{i}-\hat{{y}_{i}})}^{2}}{\sum_{i=1}^{N}{({y}_{i}-\bar{y})}^{2}}$$
(3)

where N is the size of the dataset and \(\bar{y}\) is the mean of the true values. \({R}^{2}\) equals 1 for a perfect fit and 0 for a model that always predicts the mean; it can become negative for a model that performs worse than the mean. A higher value means better fitting performance.
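Eq. (3) translates directly into a few lines of code. The function name and the sample values are illustrative only.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination, Eq. (3)."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

y = [1.0, 2.0, 3.0]
perfect = r_squared(y, [1.0, 2.0, 3.0])    # exact predictions
mean_only = r_squared(y, [2.0, 2.0, 2.0])  # always predicting the mean
reversed_fit = r_squared(y, [3.0, 2.0, 1.0])
```

The three cases bracket the behaviour of the metric: exact predictions give 1, predicting the mean gives 0, and a fit that is worse than the mean gives a negative value.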

When predicting scattering rates for the remaining scattering processes in phonon phase space using surrogate models, we set the maximum size of the mini-batch to \({2}^{20}\) considering the memory limit. For 4ph models, the descriptors and scattering rates of all processes together would far exceed the available memory because of the huge number of scattering processes. To deal with this problem, we generate the descriptors from file mode by mode and only evaluate \({\Gamma }_{\lambda {\lambda }^{{\prime} }{\lambda }^{{\prime\prime} }{\lambda }^{{\prime\prime} {\prime} }}^{{{{\rm{4ph}}}}}\) within one mode at a time. After calculating \({\tau }_{\lambda }^{{{{\rm{4ph}}}}}\), the memory is freed for the next phonon mode.
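The memory-bounded evaluation described above can be sketched as a loop over modes with batched prediction inside each mode. Everything named here is hypothetical: `predict_fn` stands in for the trained DNN, `load_mode_descriptors` for the file reader, and the plain sum of back-transformed rates is a simplification, since the actual prefactors that turn process rates into a relaxation time are given in Supplementary Section 1.

```python
import numpy as np

def predict_in_batches(predict_fn, descriptors, max_batch):
    """Run predict_fn over descriptors in chunks of at most max_batch rows."""
    outputs = [predict_fn(descriptors[start:start + max_batch])
               for start in range(0, len(descriptors), max_batch)]
    return np.concatenate(outputs) if outputs else np.empty(0)

def mode_total_rate(predict_fn, descriptors, max_batch):
    """Sum of predicted rates of one mode, with Gamma = 10**(-target)."""
    targets = predict_in_batches(predict_fn, descriptors, max_batch)
    return float(np.sum(10.0 ** (-targets)))

def rates_mode_by_mode(load_mode_descriptors, n_modes, predict_fn, max_batch):
    """Process one phonon mode at a time so only its descriptors are in memory."""
    totals = []
    for mode in range(n_modes):
        descriptors = load_mode_descriptors(mode)
        totals.append(mode_total_rate(predict_fn, descriptors, max_batch))
        del descriptors  # release the reference before loading the next mode
    return totals

def toy_predict(batch):
    # Hypothetical stand-in for the trained DNN: one target per row.
    return batch.sum(axis=1)

toy_files = {mode: np.ones((10, 3)) for mode in range(2)}
totals = rates_mode_by_mode(toy_files.__getitem__, 2, toy_predict, max_batch=4)
```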

The prediction of relaxation time and lattice thermal conductivity

After getting \({\Gamma }_{\lambda {\lambda }^{{\prime} }{\lambda }^{{\prime\prime} }}^{{{{\rm{3ph}}}}}\) and \({\Gamma }_{\lambda {\lambda }^{{\prime} }{\lambda }^{{\prime\prime} }{\lambda }^{{\prime\prime} {\prime} }}^{{{{\rm{4ph}}}}}\) based on RTA, we calculate \({\tau }_{\lambda }^{{{{\rm{3ph}}}}}\) and \({\tau }_{\lambda }^{{{{\rm{4ph}}}}}\) by considering all corresponding processes in the mode λ. We then send them back to ShengBTE to complete the calculation of κl with the same settings (q-mesh, broadening factor, etc.) as those used to generate the dataset. A detailed description of computing τλ and κl is given in Supplementary Section 1. The reported κl is based on the average results of six surrogates trained by different random splits of the training set. We calculate the mean absolute percentage error (MAPE) to evaluate the prediction accuracy, which is given by:

$$\mathrm{MAPE}=\frac{1}{N}\sum_{i=1}^{N}\left\vert \frac{{y}_{i}-\hat{{y}_{i}}}{{y}_{i}}\right\vert$$
(4)
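A direct implementation of Eq. (4) is shown below as a small sketch. The function name and the numbers are illustrative; the result is returned as a fraction, so it must be multiplied by 100 to be quoted in percent.

```python
def mape(y_true, y_pred):
    """Mean absolute percentage error, Eq. (4), returned as a fraction."""
    errors = [abs((t - p) / t) for t, p in zip(y_true, y_pred)]
    return sum(errors) / len(errors)

# Made-up reference and predicted values, for illustration only.
error_fraction = mape([100.0, 200.0], [95.0, 210.0])
```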

Transfer learning

We perform transfer learning from the 3ph to the 4ph DNN models. To keep the same input dimensions, dummy inputs (called a ‘virtual phonon’) are added to the descriptors of the 3ph surrogate models and their values are set to zero. When training the 4ph models, we employ the ‘warm start’ strategy by using the weights and biases of the modified 3ph models as the initialization to train the 4ph models. The prediction of \({\Gamma }_{\lambda {\lambda }^{{\prime} }{\lambda }^{{\prime\prime} }{\lambda }^{{\prime\prime} {\prime} }}^{{{{\rm{4ph}}}}}\), \({\tau }_{\lambda }^{{{{\rm{4ph}}}}}\) and \({\kappa }_{{{{\rm{l}}}}}^{{{{\rm{3ph+4ph}}}}}\) is the same as in the previous workflow. We keep using MAPE as the metric to compare the performance of transfer learning models with the surrogate models.
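The ‘virtual phonon’ padding can be sketched as follows. The sketch assumes that the zero block is appended after the three real phonons and that one phonon occupies 19 entries for Si (57 / 3); the position of the padding inside the descriptor is not specified in the text, and the function name is hypothetical.

```python
import numpy as np

def pad_with_virtual_phonon(descriptors_3ph, per_phonon_size):
    """Append a zero-valued 'virtual phonon' so 3ph inputs match the 4ph width."""
    descriptors_3ph = np.asarray(descriptors_3ph, dtype=float)
    zeros = np.zeros((descriptors_3ph.shape[0], per_phonon_size))
    # Assumption: the virtual phonon takes the slot of the fourth phonon.
    return np.hstack([descriptors_3ph, zeros])

# Five toy Si 3ph descriptors of width 57, padded to the 4ph width of 76.
toy_3ph = np.arange(5 * 57, dtype=float).reshape(5, 57)
padded = pad_with_virtual_phonon(toy_3ph, per_phonon_size=19)
```

A network trained on the padded 3ph descriptors has the same input width as the 4ph network, so its weights and biases can be copied over as the starting point for 4ph training.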

For LiCoO2, we use transfer learning to further accelerate the prediction of \({\kappa }_{{{{\rm{l}}}}}^{{{{\rm{3ph+4ph}}}}}\). The number of samples drawn from each phonon mode is set to 2000. Compared with the sampling number used in the 4ph surrogate models (20,000 from each mode), this reduces the time for generating the training set and training the DNN by roughly 90%.

Computational cost

Our analytical BTE calculations are done using the ShengBTE package integrated with the FourPhonon module, on the Bell cluster of the Purdue University Rosen Center for Advanced Computing (RCAC), which provides AMD EPYC 7502 32-core processors. The reported time is CPU time, which is the cumulative time of all CPU cores working on the job. For 3ph analytical results, we perform serial computing using 1 CPU core. For 4ph analytical results, the wall time for serial computing is unaffordable, so we do parallel computing using 128 CPU cores, which may slightly increase the total CPU time compared with serial computing. The prediction with the surrogate model is performed on the RCAC Gilbreth cluster, which provides Nvidia A30 GPUs and Intel(R) Xeon(R) Silver 4114 CPUs. We perform serial computing using 1 CPU core and 1 GPU. The reported time is CPU time plus GPU time. We also report the computational cost of running the surrogate model on a CPU only (Supplementary Section 9), which is performed on the RCAC Bell cluster with 1 CPU core.

As for the computational cost of every procedure, we divide the process of predicting κl into generating data, training the model, calculating scattering rates and others. Others include reading datasets, preparing training data, calculating τλ, calculating κl, etc. For the 3ph+4ph BTE workflow, the computational cost for analytically calculating \({\tau }_{\lambda }^{{{{\rm{3ph}}}}}\) is also included. We multiply the total CPU time of the analytical calculation with the proportion of training set in the phonon phase space to get the time of data generation.
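The data-generation time estimate described above is a simple proportion, sketched below. The function name and all numbers are made up for illustration and are not results from this work.

```python
def data_generation_time(total_cpu_hours, n_training, n_phase_space):
    """Analytical CPU time attributed to generating the training set."""
    training_fraction = n_training / n_phase_space
    return total_cpu_hours * training_fraction

# Hypothetical example: 2,000 sampled processes out of 100,000 in the phase
# space, for an analytical calculation that takes 1000 CPU hours in total.
estimate = data_generation_time(1000.0, 2_000, 100_000)
```

The estimate assumes that every scattering process costs about the same to evaluate analytically, so that time scales linearly with the number of processes.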