Dimension reduction of thermoelectric properties using barycentric polynomial interpolation at Chebyshev nodes

The thermoelectric properties (TEPs), consisting of Seebeck coefficient, electrical resistivity and thermal conductivity, are infinite-dimensional vectors because they depend on temperature. Accordingly, a projection of them into a finite-dimensional space is inevitable for use in computers. In this paper, as a dimension reduction method, we validate the use of high-order polynomial interpolation of TEPs at Chebyshev nodes of the second kind. To avoid the numerical instability of high order Lagrange polynomial interpolation, we use the barycentric formula. The numerical tests on 276 sets of published TEPs show at least 8 nodes are recommended to preserve the positivity of electrical resistivity and thermal conductivity. With 11 nodes, the interpolation causes about 2% error in TEPs and only 0.4% error in thermoelectric generator module performance. The robustness of our method against noise in TEPs is also tested; as the relative error caused by the interpolation of TEPs is almost the same as the relative size of noise, the interpolation does not cause unnecessarily high oscillation at unsampled points. The accuracy and robustness of the interpolation indicate digitizing infinite-dimensional univariate material data is practicable with tens or less data points. Furthermore, since a large interpolation error comes from a drastic change of data, the interpolation can be used to detect an anomaly such as a phase transition.

Scientific RepoRtS | (2020) 10:13456 | https://doi.org/10.1038/s41598-020-70320-7 www.nature.com/scientificreports/ can accept only a finite number of values as input. Therefore it is unavoidable to describe the TEPs with a finite number of values, i.e., to project the infinite-dimensional material properties into a finite-dimensional space and reconstruct them. One way to reduce the dimension is to extract TEP values at a finite number of temperature values. Then the full TEP curves are reconstructed by interpolation which preserves the raw data. Suppose n + 1 sample values f j , j = 0, 1, . . . , n, of a TEP are extracted at n + 1 distinct temperature values T j , j = 0, 1, . . . , n where T j 's are strictly increasing: T 0 < T 1 < · · · < T n . Among the many interpolation methods, here we focus on polynomial interpolation because it is computationally cheap, and the derivatives and integrals of polynomials are directly obtainable. The ease of differentiation can help to calculate significant transport properties such as the effective masses of electrons and holes. A well-known formula for polynomial interpolation is the Lagrange formula: where ℓ j is the Lagrange polynomial which satisfies ℓ j (T j ) = 1 and ℓ j (T k ) = 0 for j = k . The subscript n of the p n (T) denotes the degree of the polynomial.
A popular choice of T j is equidistant nodes: However, the polynomial interpolation at equidistant nodes generates superfluous oscillations near the boundary of the interval [T 0 , T n ] for large n's and even diverges as n → ∞ , as Runge 3 first proved with the function 5] . The Runge's phenomenon arises naturally for many continuous curves. As an example, consider a Ag-doped Mg 2 Si 0.6 Ge 0.4 thermoelectric material in 4 . The top of Fig. 1 shows the polynomial interpolation of the TEPs highly deviates from the exact curve near the boundaries of the temperature intervals.
To alleviate the Runge's phenomenon, the choice of sample nodes T j is critical; the density of sample points should follow an asymptotic density proportional to (1 − x 2 ) −1/2 when the interval of x is [−1, 1] . Hence the density should be higher near the boundaries of the interval than the inside. One of such a choice is Chebyshev nodes of the second kind: A mathematical theory 5 shows the Runge's phenomenon would not be severe under the use of the Chebyshev nodes. The bottom of Fig. 1 shows that the polynomial interpolation at the Chebyshev nodes gives a substantially better result than the top of Fig. 1, overcoming the Runge's phenomenon.
But still there is a computational issue in the Lagrange formula. When n is large, the direct computation of the Lagrange formula (1) is numerically unstable due to the high degree of the Lagrange polynomials (2); the numerator of ℓ j (T) essentially contains the T n term so even with a moderate T, the numerator becomes too large to evaluate for large n. We use the barycentric formula of polynomial interpolation 6,7 as a numerically stable algorithm.
In this paper, using experimental thermoelectric data, we validate the use of the barycentric polynomial interpolation at the Chebyshev nodes of the second kind as an accurate dimension reduction method for thermoelectric material property curves. The interpolation is tested for 276 sets of TEPs acquired from published papers. Information on the TEP dataset can be found in Supplementary Information (SI). In the following section, the barycentric formula and its derivation are given. In subsequent sections, the accuracy of the interpolation on TEPs and module performance (figure of merit zT, power density, and efficiency) is studied. Then the effect of noise on the accuracy is tested. We conclude that the interpolation is accurate and robust for continuous TEPs, allowing its further application into various curves of scientific data.

Methods
Barycentric formula of polynomial interpolation. Since the barycentric formula has not been emphasized in elementary numerical analysis, here we include its derivation by following 8 . Let us define ℓ(T) := n k=0 (T − T k ) and the barycentric weights by Then obviously ℓ j (T) = ℓ(T) w j T−T j from (2). Hence from the Lagrange formula (1) we have (3) T j := T 0 − T n 2 cos jπ n + T 0 + T n 2 , j = 0, 1, . . . , n.
(4) Since the numerator and denominator in (6) have the same barycentric weights w j , any scaling of (4) can be used instead. For equidistant nodes, the barycentric weights can be explicitly computed by w j = (−1) j n j with a proper scaling 8 , where n j is the binomial coefficient. For the Chebyshev nodes (3), with a proper scaling 9 . This simplicity of w j 's makes the choice of the Chebyshev nodes (3) particularly intriguing among other choices of nodes. The barycentric polynomial interpolation (6) with (7) is explicit, hence its computational cost is cheap. Due to the singular term 1/(T − T j ) in (6), the barycentric formula (6) is not defined at T = T j and need to be specially treated as the sample value f j . However, when T ≃ T j , because the numerator and the denominator have the same singular terms 1/(T − T j ) , the inaccuracies due to the singular terms may cancel out 8 . If we use www.nature.com/scientificreports/ the Chebyshev nodes (3), the barycentric formula is indeed numerically forward stable 10 and no severe inaccuracy arises at T ≃ T j . The derivative of the barycentric formula (6) can be written in the same form. Using the barycentric representation of ℓ j (T) , we can show 8 that Considering the p ′ n (T i ) above as a new sample value at T i , we have the barycentric formula of p ′ n (T) just replacing f i in (6) by p ′ n (T i ) in (8):

Accuracy of interpolation.
We test the barycentric polynomial interpolation (6) at the Chebyshev nodes (3) by reconstructing 276 sets of TEPs from published papers. The list of the papers are given in the SI. The TEP dataset was previously used to validate a theory of thermoelectric conversion efficiency in 11,12 .
To assess the accuracy of interpolating curves, an exact curve should be known but this is not possible since the determination of the exact curve requires an infinite (uncountable) number of measurements. Hence we assume that Seebeck coefficient curve α(T) is given by a second-order spline (a spline is a piecewise polynomial; see, e.g., 13 ), and electrical resistivity ρ(T) and thermal conductivity κ(T) curves are given by first-order splines (i.e., piecewise linear curves). With this assumption, only one exact curve is obtained for each TEP from the raw data points. We use a second-order spline for α because our evaluation of thermoelectric module performance requires the temperature derivative of Seebeck coefficient α ′ (T) = dα dT (T) ; this point will be clear in the next section. We use first-order splines for ρ and κ to secure the strict positivity of ρ and κ ; higher-order splines can give unphysical properties of zero or negative ρ and κ due to superfluous oscillations. Also note that the choice of the piecewise linear exact curve makes polynomial interpolation even harder, compared to the choice of higher-order splines; it would not be an easy task for smooth polynomials to imitate non-differentiable piecewise linear curves.
The top of Fig. 2 shows the superiority of Chebyshev nodes over equidistant nodes. In the figure, the relative error is measured by the L 1 -norm: where f is an exact function and f is an interpolating function. As the number of nodes n increases, the error of the interpolation at Chebyshev nodes consistently decreases. With 11 Chebyshev nodes one may expect relative L 1 -norm errors of 0.5% for α and κ , and 1% for ρ ; see the bottom of Fig. 2. Meanwhile, the error of the interpolation at equidistant nodes significantly increases for large n's: the error is at the minimum with 7 nodes and exceeds 6% with 16 nodes.
As shown in the top of Fig. 3, the superiority of Chebyshev nodes is more apparent if the relative L ∞ -norm is considered. The error for equidistant nodes grows serious and exceeds 50% with 16 nodes, while the error for Chebyshev nodes consistently decreases. With 11 Chebyshev nodes, one may expect relative L ∞ -norm errors of 2% for α and κ , and 2.5% for ρ , as shown in the bottom of Fig. 3.

Accuracy of module performance.
Here we examine how much error in performance of thermoelectric modules is caused by the polynomial interpolation of α(T) , ρ(T) and κ(T) . We consider a single-material singleleg thermoelectric power generation module with the length L of 1 mm and cross-sectional area A of 1 mm 2 . Then the temperature distribution T(x) inside the module with a spatial coordinate x ∈ [0, L] is given by the following second-order ordinary differential equation called the thermoelectric equation (for derivation, refer to 2 ): www.nature.com/scientificreports/ where J is a given electric current density: J = I/A for a given electric current I. We assume the thermoelectric module is under fixed temperatures at the boundaries: T(0) = T h and T(L) = T c . The hot-side temperature T h and the cold-side temperature T c are chosen as the maximum and minimum temperature values in the TEP data where all the α(T), ρ(T) and κ(T) are available. As before, the exact curves are assumed to be a second-order spline for α(T) , and first-order splines for ρ(T) and κ(T) . We avoided using a first-order spline for α(T) because the thermoelectric equation (9) contains the derivative of α(T) ; if α(T) is a first-order spline, then its derivative is discontinuous so the computation of a numerical solution of (9) becomes difficult. The power P generated by the thermoelectric module is given by where the V OC is the open-circuit voltage and R is the electrical resistance inside the module: The energy conversion efficiency of the module is given by Since the power and the efficiency depend on the given electric current I, we can maximize the power or efficiency by choosing a suitable I. Such maximum values are referred as the maximum power and maximum efficiency. Another popular performance parameter is the thermoelectric figure of merit zT. Here the z is defined by The zT m := z T h +T c 2 is proportional to the efficiency for temperature-independent TEPs 14 . Although the proportional relation is no longer valid for temperature-dependent TEPs 11,12,[15][16][17] , the zT has been widely used to assess thermoelectric materials due to the conciseness of the formula. Because the zT depends on T, the maximum value of zT over T ∈ [T c , T h ] is often used for material evalulation.  www.nature.com/scientificreports/ Here we consider three performance parameters of thermoelectric power generation modules: the maximum power density P/A, maximum efficiency η , and maximum zT. The errors in those parameters, caused by the interpolation, are given in Fig. 4. In the figure, 8 or more nodes are considered because fewer nodes did not guarantee the positivity of ρ(T) for some of 276 TEPs. At least 8 nodes are recommended to secure the strict positivity of ρ(T) and κ(T) . With 11 Chebyshev nodes, one may expect the relative error of 0.4% for the maximum power density, maximum efficiency, and the corresponding electric currents where the maximum values are attained. The low error is in accordance with the L 1 -norm error in the interpolation of TEPs. This is not surprising because  www.nature.com/scientificreports/ the solution T(x) of (9) is mainly affected by integrated quantities of TEPs rather than the TEP themselves; refer to an integral formulation of the thermoelectric equation in 11,12,15 . On the contrary, the errors in the maximum zT and the corresponding T show a higher error of 1%. Since the zT depends on the TEPs directly, the error is in accordance with the L ∞ -norm error in the interpolation of TEPs.

Robustness on noise.
We have assumed so far the values extracted at Chebyshev nodes are exact. If there is noise in the sampling, how much the accuracy of the interpolation is affected? To assess the robustness on the noise, we add random noise on the extracted value in (6) by replacing f j with where U (a, b) is the uniform random variable of which probability density function is given by ∈ (a, b) and f U (x) ≡ 0 otherwise. Let us call this process p% noise for convenience.
The Fig. 5 shows the error caused by the interpolation of TEPs linearly increases with the degree of noise. The percentage of the error is almost the same as the percentage of noise. This implies the interpolation does not cause unnecessarily high oscillation at unsampled points even there is sampling noise. A large number of nodes is slightly detrimental when there is a high degree of noise. This is because a high order polynomial needlessly struggles for interpolating the noise-added curves. But such degradation due to the choice of the number of nodes is less than 1%. The Fig. 6 shows similar, linearly increasing error trends for module performance. The result is better for maximum power density and maximum efficiency; the percentage of error is below half of the percentage of noise. For example, with 11 Chebyshev nodes, the average relative error in maximum power density and maximum efficiency is below 2% under 5% noise. On the other hand, the result for the maximum zT is worse; the percentage of error is about one and a half of the percentage of noise. Also note that the error trends for the maximum zT are only valid when the interpolation preserves the positivity of ρ and κ . If it does not due to the noise, the error soars because ρ × κ is the denominator of z as in (10). But negative ρ and κ are so unphysical and conspicuous that one would fix them by resampling. For that reason, we deliberately avoided such negative ρ and κ cases in the numerical simulation.

Physical implication of small and large interpolation error cases. We have verified that tens or less
Chebyshev nodes provide an accurate polynomial interpolation on average. But even a smaller number of nodes is enough for many TEP curves. For example, with 7 nodes, 45% of the interpolation results has less than 1.5% error in the relative L ∞ -norm, and 80% of the results has less than 4.8% error (see Table S17 in SI). Let us call such well-interpolated curves normal and the other curves abnormal.
As an example of a normal curve, Fig. 7 shows that the nanostructured Bi-Sb-Te bulk alloys 18 have slowly varying behavior of measured TEP values in the observed temperature range smaller than 550K. The electron and phonon Boltzmann transport equations (BTE) with the relaxation time approximation (RTA) within firstprinciples calculations 19-21 may predict the TEP curves of this material; refer to SI for the details of the method. The predicted curves in Fig. 7 not only follow the variation on temperature but also have a reliable size of TEP values. Within this semi-classical theory, the electron and phonon quasiparticles are responsible for the charge and heat transport in crystalline solids 22 . Within the RTA, the interaction between fundamental particles and various imperfections lead to the particle scattering and enhanced resistivity [23][24][25] . Hence, when there is no phase transition by temperature change, the thermoelectric properties predicted by the BTE with RTA are smooth functions of temperature T because the charge and heat carrier densities as well as the relaxation time are smooth functions of T. The strong agreement between the measured and computed TEPs suggests that the nanostructured Bi-Sb-Te bulk alloys in 18 have no material phase transition and their TEP curves are smooth. www.nature.com/scientificreports/ On the other hand, abnormal TEP curves have a large interpolation error with a small number of nodes because they undergo a drastic change with temperature. As a phase transition can be manifested through a drastic change or a discontinuity in TEP curves, temperature points where a large interpolation error occurs can be a phase transition temperature. Using this idea, we search for large interpolation error points to detect phase transition behaviors. Figure 8(a)-(c) show three examples of phase transition successfully detected by this method without using any domain knowledge. The detailed algorithm is given in SI.
In contrast to Bi 2 Te 3 -based thermoelectric materials, other tellurides can have phase transitions. One case is the Ag 2 Te-based materials: at room temperature Ag 2 Te has a low symmetric monoclinic phase and it transforms into a high temperature fcc phase above 417 K 26 . Consistent with this fact, our algorithm detects an abnormal temperature for Ag 2 Se 0.5 Te 0.5 27 near 380 to 400 K; see Fig. 8(a). The drastic change of the TEP curves occurs in a  www.nature.com/scientificreports/ narrow temperature range. It is argued that due to the alloying effect between Ag 2 Te and Ag 2 Se , the phase transition seems to occur gradually in the narrow temperature range from 397 to 424 K 28,29 . Noteworthily, such phasetransition-induced abnormal transport behaviors are detected for all Ag 2 Te-containing materials in our dataset. Figure 8(b) shows the abnormal transport behavior of Cu 2 Se 30 known to exhibit a continuous phase transition. The continuous change of lattice angle in the temperature range from 340 to 410 K was confirmed by HTXRD data 30 . The key temperature points are well detected by our algorithm. www.nature.com/scientificreports/ SnSe is one of the well-known thermoelectric materials having high zT of 2.6 along the b-direction in the single crystalline phase 31 . Figure 8(c) shows our data analysis detects several abnormal temperature points for this material. There are two important temperature points, ∼ 600 K and ∼ 800 K, where the slopes of Seebeck coefficient and electrical resistivity rapidly change. The detected temperature point in 600-800 K well coincides with the range under the thermally activated generation of hole charge carriers in SnSe reported in 32 . From density functional theory calculations, it is shown that the formation of acceptor Sn-vacancy is responsible for the increasing electrical conductivity with temperature 32 : n h = N exp(−E form /(kT)) where n h is the hole carrier density, N is the site density of Sn, and E form is the defect formation energy of Sn-vacancy. The next critical point is about 800 K. The origin of the derivative discontinuity at near T = 800 K is considered to be the phase transition from Pnma to Cmcm 31,32 . Mathematical interpretation of small and large interpolation error cases. We have observed that many TEP curves are accurately interpolated with even a small number of Chebyshev nodes, and a large interpolation may indicate a drastic change in the curves due to a phase transition. There is a mathematical basis for why the polynomial interpolation can perform well for many TEPs. For a given continuous function on a closed interval, there is a polynomial arbitrarily close to the given function in the L ∞ -norm, by the Stone-Weierstrass theorem (see, e.g., Theorem 7.26 in 33 ). The issue is to find such a polynomial under a limited number of sampling nodes, and our approach here is the sampling at Chebyshev nodes.
On the other hand, for a discontinuous function, there is no reason for a polynomial interpolation to work. The right-hand side shows the error depends on the normalized highestorder derivative f (n+1) . This explains why a drastically changing or discontinuous curve ruins the interpolation. Also note that the error does not depend on the size of the interval T n − T 0 as long as the normalized highestorder derivative is the same. Hence the error is affected by the oscillation shape of the curves rather than the size of the interval. If the function f is continuous, there is a high-order polynomial arbitrary close to the f. As a polynomial has a high-order derivative, our Chebyshev-node interpolation becomes close to this polynomial as the number of nodes becomes large. Therefore by using more nodes, one can accurately recover the continuous function f. But for a discontinuous function, the discontinuity is not treatable with a high-order polynomial. A simple method to handle a drastically changing or discontinuous curve is to first spot the problematic points and to apply our interpolation method in each interval where the curve is continuous and midly changing. Some examples of this approach are given in Fig. 8(a)-(c); the interpolation consisting of multiple polynomials (red lines) are noticeably improved over the single-polynomial interpolation (dark blue dashed lines). The problematic points used there is the previously found abnormal temperature points.

conclusion
To reduce the infinite dimension of TEPs into a finite dimension, we propose the use of high-order polynomial interpolation at Chebyshev nodes of the second kind. To evaluate polynomials in a numerically stable way, the barycentric formula is used. The tests on 276 sets of published TEPs show our interpolation method is accurate, and robust on noise. For example, with 11 Chebyshev nodes, the error caused by the interpolation is about 2% for TEPs and 0.4% for module performance parameters of maximum power density and maximum efficiency. Even if there is noise in sampling, the interpolation does not cause any further error beyond the degree of noise.
While a small number of nodes is enough for many TEP curves, drastically changing or discontinuous curves can have a high interpolation error. A simple remedy to enhance the accuracy is to confine the temperature range to where the set of TEPs is continuous and midly changing. Meanwhile, as the drastic change may indicate a phase transition, the interpolation method can be used to detect an unidentified phase transition.
Since our polynomial interpolation method is robust on noise and presumes no physical model, it would perform well for various curves other than TEPs. The method projects infinite-dimensional vectors acquired from scientific experiments into a finite-dimensional space. Then a smooth, raw-data-preserving curve is efficiently constructed using the barycentric formula. Our empirical validation shows that the dimension reduction method allows digitizing infinite-dimensional univariate material data, with tens or less data points. Hence the method enables using the infinite-dimensional data for material informatics and machine learning. www.nature.com/scientificreports/ computational method All the numerical computations in this paper are performed using the python programming language (version 3.6). The thermoelectric equation (9) with the Dirichlet boundary condition is solved by the fourth-order collocation method implemented in the solve_bvp function 34 of SciPy library (version 1.2.1). The maximum power density and maximum efficiency are found by the Brent-Dekker optimization method implemented in the minimize_scalar function 35 of the SciPy library.

Data availability
The data generated during the current study is available in the GitHub repository https ://githu b.com/jaywa n-chung /tep-cheby shev. More information on the TEP dataset is given in SI. The numeric values of the confidence intervals and standard deviations in Fig. 2, 3, 4, 5 and 6 are given in SI.

code availability
The code used to generate data and figures in this paper is available in the GitHub repository https ://githu b.com/ jaywa n-chung /tep-cheby shev.