Physical and chemical descriptors for predicting interfacial thermal resistance

Heat transfer at interfaces plays a critical role in material design and device performance. Higher interfacial thermal resistances (ITRs) affect the device efficiency and increase the energy consumption. Conversely, higher ITRs can enhance the figure of merit of thermoelectric materials by achieving ultra-low thermal conductivity via nanostructuring. This study proposes a dataset of descriptors for predicting the ITRs. The dataset includes two parts: one part consists of ITRs data collected from 87 experimental papers and the other part consists of the descriptors of 289 materials, which can construct over 80,000 pair-material systems for ITRs prediction. The former part is composed of over 1300 data points of metal/nonmetal, nonmetal/nonmetal, and metal/metal interfaces. The latter part consists of physical and chemical properties that are highly correlated to the ITRs. The synthesis method of the materials and the thermal measurement technique are also recorded in the dataset for further analyses. These datasets can be applied not only to ITRs predictions but also to thermal-property predictions or heat transfer on various material systems.


Methods
The ITR data were collected from experimental results reported in 87 published papers 1, ; some values were extracted from plots using WebPlotDigitizer (https://automeris.io/WebPlotDigitizer) 93. The interfacial thermal resistance (10⁻⁹ m² K/W), thermal boundary conductance (MW/m² K), material system of the interface as a chemical formula (e.g., Bi/Si), temperature (K), and film thickness were compiled. Moreover, the preparation methods for the materials (such as sputtering and evaporation), the ITR measurement methods, the pretreatment of substrates, and other details concerning the interfaces were collected when they were mentioned in the references.
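Because both the ITR and the thermal boundary conductance are compiled, it can help to see how the two columns relate. The following is a minimal sketch that assumes the usual reciprocal relation between thermal boundary resistance and conductance; the function names are illustrative and are not part of the dataset.

```python
def itr_to_tbc(itr_1e9_m2k_per_w: float) -> float:
    """Convert an ITR given in 10^-9 m^2 K/W to a conductance in MW/m^2 K."""
    if itr_1e9_m2k_per_w <= 0:
        raise ValueError("ITR must be positive")
    # R = x * 1e-9 m^2 K/W  ->  G = 1/R = (1e9 / x) W/m^2 K = (1e3 / x) MW/m^2 K
    return 1.0e3 / itr_1e9_m2k_per_w


def tbc_to_itr(tbc_mw_per_m2k: float) -> float:
    """Convert a conductance in MW/m^2 K back to an ITR in 10^-9 m^2 K/W."""
    if tbc_mw_per_m2k <= 0:
        raise ValueError("Conductance must be positive")
    # The same factor of 1e3 applies in the reverse direction.
    return 1.0e3 / tbc_mw_per_m2k
```

With these units, an ITR of 10 (in 10⁻⁹ m² K/W) corresponds to a conductance of 100 MW/m² K.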
The descriptor dataset includes the specific heat capacity, melting point, density, unit cell volume, electronegativity (EN), ionic potential (IP), atomic ratio (R), mass, atomic coordinate (AC), and binding energy (E_b) of 298 materials. The atomic ratios of the first and second elements of a compound are defined as R₁ and R₂, respectively; for example, R₁ = 1 and R₂ = 2 for SiO₂. AC denotes the position of an element in the periodic table, with the group as the x-coordinate and the period as the y-coordinate, written as (ACᵢx, ACᵢy), where i is the order of the element in the compound. For example, for GaN, (AC₁x, AC₁y) and (AC₂x, AC₂y) are (13, 4) and (15, 2), respectively.
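The atomic ratio and atomic coordinate definitions above can be illustrated with a short sketch. It assumes simple formulas of one or two elements written without parentheses; the lookup table covers only a handful of elements, and the function name and dictionary keys (R1, AC1x, and so on) are hypothetical labels rather than the column names used in the dataset.

```python
import re

# (group, period) for a few elements only; extend as needed.
PERIODIC_POSITION = {
    "N": (15, 2),
    "O": (16, 2),
    "Al": (13, 3),
    "Si": (14, 3),
    "Ti": (4, 4),
    "Ga": (13, 4),
}


def formula_descriptors(formula: str) -> dict:
    """Return R_i and (AC_i x, AC_i y) for a single-element or binary formula."""
    tokens = re.findall(r"([A-Z][a-z]?)(\d*)", formula)
    # Reject anything the simple pattern cannot fully account for.
    if not tokens or "".join(sym + num for sym, num in tokens) != formula:
        raise ValueError(f"Unsupported formula: {formula!r}")
    if len(tokens) > 2:
        raise ValueError("Only single elements and binary compounds are handled")

    descriptors = {}
    for i, (symbol, count) in enumerate(tokens, start=1):
        group, period = PERIODIC_POSITION[symbol]
        descriptors[f"R{i}"] = int(count) if count else 1  # atomic ratio
        descriptors[f"AC{i}x"] = group                     # group as x-coordinate
        descriptors[f"AC{i}y"] = period                    # period as y-coordinate
    return descriptors
```

Calling `formula_descriptors("GaN")` reproduces the coordinates (13, 4) and (15, 2) quoted above, and `formula_descriptors("SiO2")` gives R1 = 1 and R2 = 2.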

Fig. 1
A schematic overview of the ITR and descriptor datasets. The ITR dataset includes experimental data collected from 87 papers, the experimental conditions, and their reference details. The descriptor datasets are composed of the physical and chemical descriptors of different materials that can be used for model training and/or prediction via machine learning.
The specific heat capacity was collected from the TPRC data series 94; the melting point, density, and unit cell volume were collected from AtomWork-Adv, provided by the National Institute for Materials Science (NIMS) (https://atomwork-adv.nims.go.jp/) 95; EN, IP, and the mass were collected from the periodic table via the Pauling scale and the National Institute of Standards and Technology (NIST) 96,97; and E_b was calculated from the total energy of the relaxed crystal structure of each compound, which was taken from the Computational Electronic Structure Database (CompES-X) 98. CompES-X is a database of electronic structures predicted by first-principles calculations for mono-element and multi-element crystalline inorganic compounds, based on experimental crystal-structure data. The total energies of the constituent atoms can be found in the atom_energy_vasp sheet at https://doi.org/10.5281/zenodo.3564173 99, in which each isolated atom was simulated by placing one atom in a cubic supercell with an edge length of 15 Å and was calculated using the same computational method as that used for the compounds in CompES-X. For example, the binding energy of TiO₂, E_b[TiO₂], is calculated according to Eq. (1).
Taking Table 1 as an example, all the Au/SiO₂/Si data with ids 1 to 5 were obtained from the same sample measured at different temperatures from 100 K to 296 K; therefore, they share the same interface id. Each interface is written as the chemical formulas or names of its materials separated by slashes, for example, Al/Si, as shown in Table 1. To simplify data input for machine learning, six materials are abbreviated in the "Film 1" and "Film 2" columns: C for diamond, gp-C for graphene, g-C for graphite, a-SiO₂ for glass, SiO₂ for quartz, and Al₂O₃ for sapphire. Note that most of the Film 2 entries are the substrates themselves, as is typical for commonly used measurement methods such as time-domain thermoreflectance (TDTR) and frequency-domain thermoreflectance (FDTR) 55,83.
For some of the others, the Film 2 entry is not the substrate itself, and the ITR at the Film 2/substrate interface has been extracted or eliminated from the total resistance. Accordingly, the materials in the "Film 2" and "substrate details" columns are not consistent for some interfaces, such as Au/TiO₂ and Au/a-SiO₂ in Table 1. The interlayer column indicates whether an interlayer is present between the materials (Film 1/Film 2) at the interface; its value is 1 if an interlayer is present and 0 if it is absent. For example, the interlayer values of Cr/Si and Cr/a-Si/Si in Table 1 are 0 and 1, respectively. Interlayers include adhesion layers, naturally or thermally formed oxide layers (e.g., Au/SiO₂/Si in Table 1) 55, and surface plasma treatments (e.g., Bi/H-diamond in Table 1), which form an interlayer or a mixed region between the materials instead of a sharp interface. Information on the experimental and interfacial conditions can be found in the substrate pretreatment columns, and other interfacial properties can be found in the "ITR dataset" file at https://doi.org/10.5281/zenodo.3564173 99. Further details can be found in the "ITR Reference" sheet using the reference-tracking id (id-R).
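The slash-separated interface labels and the 0/1 interlayer flag described above could be handled roughly as follows. This is a heuristic sketch and not the procedure used to build the dataset: it assumes that the first and last parts of a label are Film 1 and Film 2, that any middle part is an interlayer, and that treatment-induced interlayers (as in Bi/H-diamond) must be flagged explicitly through the hypothetical `surface_treated` argument.

```python
def parse_interface(label: str, surface_treated: bool = False) -> dict:
    """Split an interface label such as 'Cr/a-Si/Si' and derive an interlayer flag."""
    parts = label.split("/")
    if len(parts) < 2 or any(not part for part in parts):
        raise ValueError(f"Not a valid interface label: {label!r}")
    # A middle material, or a treated surface, counts as an interlayer (flag = 1).
    has_interlayer = len(parts) > 2 or surface_treated
    return {
        "film1": parts[0],
        "film2": parts[-1],
        "interlayer": int(has_interlayer),
    }
```

Under these assumptions, "Cr/Si" yields an interlayer flag of 0 and "Cr/a-Si/Si" yields 1, matching the Table 1 examples quoted in the text.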
Descriptor dataset. The descriptor dataset is composed of the physical and chemical descriptors of 298 materials. The physical descriptors are the specific heat capacity, melting point, density, unit cell volume, and mass; the chemical descriptors are the electronegativity (EN), IP, atomic ratio (R), atomic coordinate (AC), and binding energy (E_b). The materials are single elements or binary compounds, and each is assigned a material id (id-M), as shown in Table 2. The units for the specific heat capacity, melting point, density, unit cell volume, mass, IP, and E_b are J/(g K), K, g/cm³, 10⁻²⁹ m³ per formula unit (f.u.), u, eV, and eV/f.u., respectively; the other quantities are dimensionless.
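As described in the Methods, E_b is derived from the total energy of the relaxed compound and the total energies of its isolated constituent atoms. The sketch below shows one way to combine them. Two points are assumptions: the sign convention (isolated-atom energies minus the compound energy, so that a bound compound has a positive E_b) and the use of a total energy per formula unit. The numerical values are hypothetical placeholders, not entries from CompES-X or the atom_energy_vasp sheet.

```python
def binding_energy(total_energy: float, atom_energies: dict, composition: dict) -> float:
    """Binding energy in eV/f.u. from a compound's total energy per formula unit.

    Assumed convention: E_b = sum_i n_i * E_atom[i] - E_total[compound].
    """
    isolated = sum(count * atom_energies[element] for element, count in composition.items())
    return isolated - total_energy


# Hypothetical placeholder energies in eV; NOT values from CompES-X
# or from the atom_energy_vasp sheet.
placeholder_atoms = {"Ti": -2.0, "O": -1.5}
e_b_tio2 = binding_energy(
    total_energy=-25.0,                 # placeholder total energy of TiO2 per f.u.
    atom_energies=placeholder_atoms,
    composition={"Ti": 1, "O": 2},      # one Ti and two O atoms per formula unit
)
```

If the opposite sign convention is preferred, only the final subtraction needs to be reversed.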

Technical Validation
In this section, we present the statistical analyses and experimental variations of the ITR dataset and use the data selection for ITR prediction as an example. First, the experimental data distribution is shown in Fig. 2. Most of the material systems show small standard deviations, and Al/Si has the largest number of data points, at 106. Al and Au account for high percentages of the film materials in the dataset because they are commonly used as heat transducer layers that absorb the laser heat in TDTR and FDTR measurements 55,83. Of the material systems, Au/Si has the largest standard deviation, which can be attributed to unusual experimental conditions, including heavy ion bombardment or plasma treatment 62. Such variations pose a considerable challenge for model training. Apart from data with special treatments, the heat transport modes and main heat carriers at metal/metal interfaces or in two-dimensional (2D) materials differ from those at metal/nonmetal interfaces. Therefore, material systems containing 2D materials such as graphene, metal/metal interfaces, and materials with no exact composition ratio were removed from the dataset used for the ITR prediction model. However, the data selection criteria change depending on the purpose. If one focuses on thermal transport at metal/diamond, metal/Si, or metal/sapphire interfaces, then data with surface treatments such as H-plasma or bombardment would be helpful for broader considerations and comparisons.
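The data selection described above amounts to a filter over the ITR records. The following sketch shows the idea under stated assumptions: the field names (`category`, `has_2d_material`, `exact_composition`, `special_treatment`) are hypothetical and do not correspond to actual column names, and the rows are invented for illustration rather than taken from the dataset.

```python
# Invented rows with hypothetical field names; not copied from the ITR dataset.
records = [
    {"interface": "Al/Si", "category": "metal/nonmetal",
     "has_2d_material": False, "exact_composition": True, "special_treatment": False},
    {"interface": "Au/Si", "category": "metal/nonmetal",
     "has_2d_material": False, "exact_composition": True, "special_treatment": True},
    {"interface": "Al/gp-C", "category": "metal/nonmetal",
     "has_2d_material": True, "exact_composition": True, "special_treatment": False},
    {"interface": "Al/Cu", "category": "metal/metal",
     "has_2d_material": False, "exact_composition": True, "special_treatment": False},
    {"interface": "Al/SiOx", "category": "metal/nonmetal",
     "has_2d_material": False, "exact_composition": False, "special_treatment": False},
]


def keep_for_training(record: dict) -> bool:
    """Apply the exclusion criteria used for the ITR prediction model."""
    return (
        record["category"] != "metal/metal"   # different main heat carriers
        and not record["has_2d_material"]     # e.g., graphene
        and record["exact_composition"]       # composition ratio must be known
        and not record["special_treatment"]   # e.g., ion bombardment, plasma
    )


training = [record for record in records if keep_for_training(record)]
```

For a different purpose, such as studying surface treatments at metal/diamond interfaces, the last condition could simply be dropped.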
To further verify the ITR data for other specific thermal analyses, the ITR data distributions without and with an interlayer are shown in Figs. 3 and 4, respectively. The ITR data without an interlayer are categorized into three groups (metal/metal, metal/nonmetal, and nonmetal/nonmetal) in Fig. 3. In Fig. 3(a), the ITR mostly decreases with increasing temperature, and the ITR values of metal/metal interfaces are two to four orders of magnitude lower than those of metal/nonmetal and nonmetal/nonmetal interfaces. In Fig. 3(b), no clear thickness dependence is observed for the different groups, and a thickness near 100 nm is most commonly used owing to laser absorption depth considerations. The ITR data organized into seven different interlayer groups are plotted versus temperature in Fig. 4. Although the ITR values depend on the material system, the interlayer materials affect the ITR values as well: the 2D material group (including graphene) has relatively high ITR values, while the metal group tends to have lower ITR values.

Fig. 3
The ITR data distribution without an interlayer. The ITR data distribution versus the temperature and the Film 1 thickness are shown in (a,b), respectively. The data include three types of material systems: metal/metal in red, metal/nonmetal in blue, and nonmetal/nonmetal in yellow.

Usage Notes
A description of the two datasets (the ITR and descriptor datasets) and of the total energies of isolated atoms calculated via first-principles calculations (atom_energy_vasp) is provided. Further, the training data for the ITR machine-learning model are furnished under the file name "training dataset for ITR prediction" and can be used directly as training data for ITR predictions. Accordingly, the archive contains four files; their content, units, and sheets are listed in Online-only Table 1. This table can assist in locating data for broader thermal management studies; in addition, each ITR data point can be tracked via its reference id (id-R) in the "ITR References" sheet for further information. All the datasets can be found at https://doi.org/10.5281/zenodo.3564173 99.
The datasets can serve a range of research purposes, as mentioned in the Background & Summary section; here we take ITR prediction as an example. The construction steps are briefly as follows: (1) The target, ITR, and the descriptors related to ITR should be provided as input for training the machine-learning model. Taking the Al/Si interface as an example, the experimental ITR at different temperatures (if available in the papers) and the chemical and physical descriptors of both Al and Si should be collected. (2) The file "training dataset for ITR prediction" at https://doi.org/10.5281/zenodo.3564173 99, which includes the experimental ITR data and the materials' descriptors, can be used directly as the training dataset. (3) The model is then trained by tuning the hyperparameters via cross-validation; it is usually evaluated by the mean square error and R². (4) Once good predictive performance is achieved, various material systems such as Si/Ge, with a specified temperature, film thickness, and material properties, can be input for prediction. (5) Promising candidates from the prediction can be analyzed further via experiments or simulations.
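Step (3) above can be sketched with the standard library alone. The example below runs a k-fold cross-validation and reports the mean square error and R² for each fold. It is only an illustration of the evaluation loop: the straight-line fit on a single feature stands in for whichever machine-learning model is chosen, no hyperparameters are tuned, and the temperature/ITR pairs are synthetic numbers rather than dataset values.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least-squares fit y = slope * x + intercept."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
    return slope, y_bar - slope * x_bar


def mse(y_true, y_pred):
    """Mean square error between targets and predictions."""
    return mean((t - p) ** 2 for t, p in zip(y_true, y_pred))


def r2(y_true, y_pred):
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    y_bar = mean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - y_bar) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def k_fold_scores(xs, ys, k=3):
    """Return one (MSE, R2) pair per fold, holding out every k-th point."""
    scores = []
    for fold in range(k):
        held_out = set(range(fold, len(xs), k))
        pairs = list(zip(xs, ys))
        train = [p for i, p in enumerate(pairs) if i not in held_out]
        test = [p for i, p in enumerate(pairs) if i in held_out]
        slope, intercept = fit_line(*zip(*train))
        preds = [slope * x + intercept for x, _ in test]
        truth = [y for _, y in test]
        scores.append((mse(truth, preds), r2(truth, preds)))
    return scores


# Synthetic temperature (K) and ITR values, for illustration only.
temperatures = [100, 150, 200, 250, 300, 350]
itr_values = [18.1, 16.9, 16.0, 15.1, 13.9, 13.0]
scores = k_fold_scores(temperatures, itr_values, k=3)
```

In practice the same loop would be repeated for each candidate hyperparameter setting, and the setting with the best average score would be kept.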
The details of descriptor selection, algorithm selection, and prediction analysis for the ITR machine-learning model and its applications can be found in our previous studies 1,6. Before applying the provided training dataset ("training dataset for ITR prediction"), some restrictions should be considered in light of your research: (1) The training data exclude metal/metal interfaces, two-dimensional (2D) materials, materials that have no exact composition ratio, and interfaces with special treatments such as heavy ion bombardment, all of which are present in the original "ITR dataset" file. (2) The chemical and physical descriptors were collected from a data platform (AtomWork-Adv) 95 or handbooks (TPRC data series) 94 because the original papers provide limited information. Therefore, there may be some mismatch between the materials and their descriptors, such as density and unit cell volume. (3) The data distribution differs among material systems and samples. For example, Al/Si has far more data points than other material systems. In addition, the ITR dataset contains 1318 data points from only 457 interface samples because some samples have many ITR data points measured at different temperatures. For prediction purposes, the temperature could be calibrated to prevent data distortion.
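The imbalance noted in restriction (3) can be inspected by counting data points per interface id before training. The sketch below uses a small list of (id, interface id) pairs: the first five rows mirror the Au/SiO₂/Si example in which ids 1 to 5 share one interface id, and the remaining rows are invented for illustration.

```python
from collections import Counter

# (id, interface id) pairs. Rows 1-5 follow the shared-sample example from
# the text; rows 6-8 are invented for illustration.
rows = [(1, 1), (2, 1), (3, 1), (4, 1), (5, 1), (6, 2), (7, 3), (8, 3)]

# Number of ITR data points contributed by each interface sample.
points_per_sample = Counter(interface_id for _, interface_id in rows)

n_points = len(rows)
n_samples = len(points_per_sample)

# Fraction of all data points that come from the most heavily sampled interface.
largest_share = max(points_per_sample.values()) / n_points
```

A large `largest_share` indicates that a single sample measured at many temperatures dominates the data, which is worth knowing before splitting the data for validation.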