Combining thermodynamics with tensor completion techniques to enable multicomponent microstructure prediction

Multicomponent alloys show intricate microstructure evolution, providing materials engineers with a nearly inexhaustible variety of solutions to enhance material properties. Multicomponent microstructure evolution simulations are indispensable to exploit these opportunities. These simulations, however, require the handling of high-dimensional and prohibitively large data sets of thermodynamic quantities, of which the size grows exponentially with the number of elements in the alloy, making it virtually impossible to handle the effects of four or more elements. In this paper, we introduce the use of tensor completion for high-dimensional data sets in materials science as a general and elegant solution to this problem. We show that we can obtain an accurate representation of the composition dependence of high-dimensional thermodynamic quantities, and that the decomposed tensor representation can be evaluated very efficiently in microstructure simulations. This realization enables true multicomponent thermodynamic and microstructure modeling for alloy design.


INTRODUCTION
A number of recent discoveries has largely increased the interest in multicomponent alloy design. High entropy or multi-principle element alloys, for example, can give access to a wide range of properties and combinations of properties that cannot be obtained in alloys based on a single major element. 1 Moreover, the large range over which the diffusion properties of the different elements can vary gives rise to complex precipitation, interdiffusion, and coarsening paths, [2][3][4] resulting in new ways to stabilize metastable phases or structures for extended times. 5 Adding too many alloying elements may, however, also unnecessarily complicate material recovery and recycling. 6 Hence, in order to exploit in an optimal and responsible way the opportunities multicomponent alloy design can bring, a profound understanding of the effects of compositional variations and heat treatment on microstructure evolution in these alloys is required. Microstructure simulations are indispensable to obtain these insights given the immense parameter space, including not only the current temperature and the amount of each element, but also many parameters describing the history of the material, such as cooling rate and the duration of a heat treatment or mechanical loading.
Phase-field models (PFM) are generally considered as most suitable to study multicomponent microstructure evolution as diffusion and phase morphology often play a crucial role. [7][8][9] They describe complex morphological changes such as precipitate shape evolution during growth and coarsening and dendrite growth. The effects of various competing thermodynamic driving forces and transport processes can be considered. For multicomponent alloys, the chemical bulk driving force is an important contribution. CALPHAD thermodynamic models are usually adopted to describe the composition and temperaturedependent Gibbs free energy expressions of the different phases. For many technologically relevant alloy systems, CALPHAD models have been assessed to reproduce experimental, theoretical and ab initio data on thermodynamic quantities and phase diagrams. [10][11][12] Different approaches have been proposed to include the CALPHAD thermodynamic descriptions in PFM. [13][14][15][16][17][18][19][20][21][22][23][24][25][26] A classification and discussion of the different approaches is given further in the paper. However, to date, only a few results for quaternary or higher-order systems are reported in the literature. The reason is that the proposed methods were either for a specific case and cannot be generalised towards systems containing other types of phases, or become prohibitively complex or computation and data-intensive when the number of elements in the alloy is increased. There is thus no robust simulation approach available facilitating in silico microstructure design for multicomponent alloys.
Here, we introduce the use of tensor completion [27][28][29][30] in materials science to enable the representation and use of highdimensional data sets efficiently. More specifically, we used a canonical polyadic decomposition (CPD) with the factor vectors constrained to polynomial expressions 31 to include highdimensional thermodynamic data sets obtained from a CALPHAD model in phase-field simulations. Using CPD, the multicomponent composition dependence can be described using a limited number of coefficients. We show that the accuracy obtained with the proposed approach is comparable or better than that obtained with previous approaches, whereas the applicability is general and the computational cost to evaluate data is low. Moreover, the addition of more elements does not lead to a substantial increase of the complexity, and the number of coefficients and the computational cost to evaluate a CPD depend only linearly on the number of components in the system, in contrast to an exponential increase of the amount of thermodynamic data represented. The observed advantages will thus become increasingly prominent when adding more elements. The capabilities of the approach are illustrated for phase-field simulations of spinodal decomposition and subsequent coarsening, an important phenomenon in multicomponent alloys, 32 for Ag-Cu-Ni-Sn liquid alloys.

RESULTS AND DISCUSSION
Tensor completion of thermodynamic data In material science, we are familiar with the tensorial character of direction-dependent material properties, like stresses and strains. However, in mathematical engineering, the term tensor has been used much more widely. In general, tensors are higher-order generalisations of vectors (first-order tensors) and matrices (second-order tensors). 33,34 They can be seen as multiway arrays of any order holding numerical values. The term thermodynamic data tensor (TDT) is introduced here to denote a tensor that is obtained by calculating and storing the data set of any thermodynamic quantity of a phase, evaluated from a CALPHAD model. The TDTs typically needed in PFM implementations are the molar Gibbs free energy, G, the diffusion potential of each component n, P ðnÞ , and the derivative of the diffusion potential of the component n with respect to the molar fraction of components m, D ðn;mÞ .
Typically, the molar fractions x n of the components are used to indicate an alloy composition. For an alloy with C components, the composition dependence of any thermodynamic quantity can be expressed as a function of C À 1 molar fractions x n , with n ¼ 1; ; C À 1 referring to C À 1 of the C components. The order N of the TDT, this is the number of dimensions of the tensor, then equals C À 1. A graphical illustration of a third-order TDT, representing composition dependence of a thermodynamic quantity is shown in Fig. 1b. As by definition x C þ P N n¼1 x n ¼ 1 and 0 < x n < ð1 À x C Þ, tensor entries (these are the values in the tensor) for a combination of molar fraction values disobeying this constraint are not physical and cannot be calculated. A TDT is therefore necessarily incomplete. Besides composition dependence, other dependencies of the thermodynamic function, such as temperature, can be considered, increasing the order of the tensor with one for each extra dependency. If a thermodynamic model is available, TDTs can be precalculated and used in phase-field simulations to get the required thermodynamic quantities. However, the great challenge of dealing with TDTs is that their number of entries increases exponentially with the order of the tensor, as illustrated in Fig. 1a. Therefore, when more components are considered, it quickly becomes impossible to generate, store, or handle the huge data sets. The computational difficulties caused by this exponential dependence are known as the curse of dimensionality. 28 Moreover, as illustrated in Fig. 1a, b, the step size, δ xn ; with which the molar fraction x n is sampled, obviously also affects the number of entries in the TDT, as the size of the tensor along each dimension I n is inversely proportional to the step size. The step size can be chosen differently along each dimension, but in this study we take a same step size for all molar fractions, i.e., If TDTs are used to provide thermodynamic quantities in phase-field simulations, a small step size δ x is desirable to avoid sudden molar fraction jumps and the need for multi-directional interpolation between entries.
In practice, the use of TDTs in microstructure evolution simulations is thus limited to lower-order systems. The hypothesis motivating our work is that, if a higher-order TDT is too large to be constructed or used, the information present in the TDT may still be accessible if a CPD of the TDT is obtained. See Methods section Tensor Decomposition for a general introduction to this methodology. When applied to a quaternary thermodynamic system (N ¼ 3), such as the liquid Ag-Cu-Ni-Sn alloys under study, our technique decomposes the TDT into a sum of R rank-1 terms, of which each term is the outer product of N factor vectors, and with R called the rank of the decomposed tensor model. This is graphically illustrated in Fig. 2a, b. An alternative representation of the same CPD is obtained by collecting all factor vectors of the CPD related to each dimension n into a factor matrix A ðnÞ . In this case, the dimensions n refer to the molar fractions x n , with n ¼ 1; 2; 3, and factor matrices A ð1Þ , A ð2Þ , and A ð3Þ are obtained, as shown in Fig. 2a-c. A major benefit of this approach is demonstrated in Fig. 2d, where the number of entries in a TDT is shown to grow exponentially with the number of components in the system, whereas the number of parameters in a CPD, representing the same TDT only grows linearly. For alloys with four or more components, the reduction in coefficients when using a CPD representation of a full tensor is gigantic, enabling multicomponent microstructure evolution simulations.
While determining the rank value, R is needed for an exact decomposition of a TDT is difficult, previous applications in other fields have shown that tensors representing physical data can often be well approximated using a low-rank CPD. [35][36][37][38] In this work, we applied tensor decomposition to high-dimensional  Fig. 1 Characteristics of a thermodynamic data tensor. a The number of entries in a TDT is plotted as a function of the step size δ x (used for data collection) and the number of components in the system. b Visualisation of two third-order incomplete TDTs sampled with different step size. b The dashed box represents what would be the shape of a full third-order tensor; however, owing to the constraint P C i¼1 x i ¼ 1 on the molar fractions, the TDT is incomplete and only the entries within the indicated tetrahedron, i.e., the entries satisfying P N i¼1 x i 1 can be evaluated. The step size δ x determines the density and hence the number of entries in the TDT. On the other hand, the number of TDT entries depends exponentially on the number of components, or, more generally, on the order N of the tensor, making storage, and computation quickly intractable, as can be seen in a. thermodynamic data sets and we show that a low-rank approximation can be used to obtain an accurate decomposed tensor model describing the dependence of Gibbs energies and derived thermodynamic functions on the molar fractions, representing the alloy composition.
Moreover, when a priori knowledge on solution thermodynamics is taken into account, a CPD can be made even more compact and a continuous description of the composition dependence can be obtained. In this study, for example, we exploited the polynomial dependence of thermodynamic quantities on the molar fractions when the ideal mixing term is subtracted. This approach is reasonable as the contribution from ideal mixing depends only on known information (namely, the molar fractions of all components and the temperature) and can be calculated separately and added to the values evaluated from the CPD in the phase-field simulation (see Methods sections CALPHAD thermodynamic model and Thermodynamic tensor model). A polynomial constraint could therefore be imposed on the factor vectors, 31,39,40 and the CPDs representing the molar Gibbs free energy, Ã G (the superscript Ã denotes that the ideal mixing contribution is subtracted from the thermodynamic quantity), the diffusion potential of component n, Ã P ðnÞ , (i.e., derivative of the molar Gibbs free energy) and the derivative of the diffusion potential of component n with respect to the molar fractions of the component m, Ã D ðn;mÞ , (i.e., the second derivatives of the molar Gibbs free energies) were coupled. A single set of factor matrices modeling the composition dependence of all the required thermodynamic quantities at once was then obtained. Such a set containing all factor matrices necessary to completely describe the thermodynamic quantities of the system in a phasefield simulation is further referred as a thermodynamic tensor model of rank R. For a quaternary system, it is represented as The factor matrices A ðnÞ give the contribution related to x n to the Gibbs free energy TDT. The _ A ðnÞ model, the contribution related to taking the first partial derivative of the Gibbs energy with respect to x n , is needed to represent the TDT of the diffusion potential of component n. The € A ðnÞ model contributions related to taking the second partial derivative of the Gibbs energy with respect to x n . A cross partial derivative with respect to x n and x m is obtained using _ A ðnÞ and _ A ðmÞ .
From these factor matrices, the thermodynamic tensor models representing the required TDTs can be obtained, for instance, for a quaternary system Ã G % ½ ½A ð1Þ ; A ð2Þ ; A ð3Þ ; giving an extremely compact representation of an immense amount of data. Taking, for example, N ¼ 3, δ x ¼ 0:0001, and R ¼ 7, a TDT of about ð1=δ x Þ N =N! % 1:6 10 11 entries (the factor 1=N! takes into account that only the entries within the tetrahedron in the TDT in Fig. 1a, b are calculated) is represented using factor matrices that contain only R N 1=δ x % 2:1 10 5 coefficients. The CPD model is thus about a million times more compact than the TDT. Furthermore, the coefficients of the degree d polynomial constraint alone already suffice for the representation of the TTM. The Gibbs free energy and derived TDTs can thus be represented with R N ðd þ 1Þ ¼ 105 coefficients only. Such impressive compression ratios have recently revolutionised tensor-based scientific computing; [35][36][37][38] in materials science applications, the approach is new. Canonical polyadic decomposition of a thermodynamic data tensor (TDT). a-c Visualisation of a canonical polyadic decomposition (CPD) when applied to a third-order TDT with dimension I 1 I 2 I 3 . The third-order TDT a is written as a sum of R rank-1 terms b, each of which is represented by the outer product of three nonzero factor vectors a ð1Þ r , a ð2Þ r , and a ð3Þ r . The collection of R factor vectors into factor matrices c with dimensions I 1 R; I 2 R; and I 3 R, provides a convenient representation of the CPD. Each factor matrix contains the coefficients necessary to describe the contribution of the molar fraction of one of the components to the TDT. The entire tensor in a corresponds to the dashed box in Fig. 1b; the CPD in b will be determined from only the tetrahedron part in Fig. 1b. d The number of entries in the TDT and the number of coefficients in the CPDs with rank values R = 3, 6, 10 are plotted as a function of the number of components, for step size δ x ¼ 0:0001. The exponential dependence of the number of entries in the TDT on the number of components is broken when a CPD representation is used instead. Indeed, the CPD needs only ðI 1 þ I 2 þ I 3 ÞR coefficients, which depends linearly on the number of components. e The same information as in d is plotted, but excluding the TDT size for a better visualisation of the number of coefficients in the TTM as a function of the number of system components.
Moreover, the coefficients in the factor vectors can be obtained using only a limited set of entries from the original TDT, called the training set (see Methods section Thermodynamic tensor model training for more details). There is thus no need to compute the full TDT at any point in the derivation.
Another major advantage of the approach is that values approximating individual entries of the TDT can be calculated very efficiently and easily from the TTM. For example, the diffusion potential of the first componentμ 1 at a composition given by the molar fraction values x 1 , x 2 , and x 3 , for which the entry in the tensor is indexed by i 1 , i 2 , and i 3 , is calculated as Therefore, again we avoid the need to construct explicitly the full TDT Ã P ð1Þ before data can be extracted. This provides us with an efficient framework to supply PFM with CALPHAD thermodynamic quantities.
First TTMs with rank values R ¼ 3; ; 10 are optimised using Tensorlab, 40 modeling the composition dependence of the molar Gibbs energy, the diffusion potentials of the elements Ag, Cu, and Ni, and the partial derivatives of these diffusion potentials with respect to the molar fractions Ag, Cu, and Ni. The data used for training were obtained using Thermo-Calc 2017b with the COST 531 database. 41 Further details are given in Methods section Thermodynamic tensor model training. This particular system was chosen for validation purposes, as we have access to all coefficients of the CALPHAD model for this quaternary system. The rank R of the TTM is an important parameter in this study. A rank as low as possible is preferred, as the size of the factor matrices scales linearly with R (see Fig. 2b, c), and consequently the number of coefficients that has to be optimised and the number of terms that has to be evaluated when extracting data from the tensor model (equation (3)) in the phase-field simulations, increase linearly with R as well.
The dependence of the accuracy of the TTMs on the rank value R is evaluated by comparing thermodynamic quantities approximated with TTMs R ¼ 3; ; 10 with the corresponding entries in the validation TDTs (see Methods section Validation). The empirical cumulative density function (ECDF) of the relative errors (RE) on the thermodynamic quantities calculated from the TTMs is plotted for the different rank values in Fig. 3a. The interquartile range (IQR) and Q ¼ 0:95 and Q ¼ 0:98 quantiles of the relative error are given for each rank value R in the table in Fig. 3b. The plot and table show that up to R ¼ 7, the accuracy improves for increasing R, whereas for higher values of the rank further improvement is limited. Using the TTM with rank value R ¼ 7, 98% of the data points are represented with more than four digits of accuracy, i.e., RE < 10 À4 . For a small fraction (<0:1%) of points, the relative error remains large. However, further inspection of these larger RE showed that they are only found when the considered thermodynamic quantity itself has a value close to zero. It is verified, for instance, that TTMs with R ¼ 7 or higher all have a maximum absolute error smaller than 32 J/mol, which is very small compared with the range over which the Gibbs free energy data vary, namely between −1.02 × 10 5 J/mol and 3.82 × 10 6 J/mol.
Phase-field simulations Next, the TTMs for different rank values are used to evaluate the diffusion potentials as a function of the local composition in a multicomponent phase-field model simulating spinodal decomposition and further coarsening in Ag-Cu-Ni-Sn liquids, using the model described in Methods section Phase-field microstructure evolution model. The accuracy and advantages of the approach using TTMs are compared with those of the existing approaches for using CALPHAD thermodynamic models in phase-field simulations. Fig. 4 gives an overview of the most common coupling approaches. [13][14][15][16][18][19][20][22][23][24][25][26] The methods using direct implementation of the CALPHAD Gibbs free energy expressions (b) or an interface with a thermodynamic software (d), give an exact and continuous representation of the composition dependence of the CALPHAD Gibbs free energies. They are considered as a reference against which the accuracy of the simulations using TTMs is evaluated. In comparison with the method using lookup tables (e-f), the curse of dimensionality is broken and an arbitrarily fine resolution, and even continuous representation, of the composition dependence can be obtained.
First, 1D simulations were conducted using TTMs and validated against the approach using an interface with thermodynamic software. The conclusions are discussed in the Supplementary Information. Next, 2D simulations were conducted for the six alloy compositions given in Fig. 4i, using TTMs with rank values R ¼ 3; ; 10, and validated against the approach where the CALPHAD substitutional model is directly implemented in the PFM. For each of the six alloy compositions an initial condition is created by adding small random noise at each position in the system to the values of the molar fractions as given in Fig. 4i. The initial condition and all phase-field simulation parameters are the same for all simulations. The only difference is the approach used to include the CALPHAD diffusion potentials in the PFM.
A selection of the 2D-simulated microstructures is shown in Fig.  5a, c, e. Results obtained using the direct implementation of the CALPHAD model expressions (CE) are shown in the first column and results obtained using TTMs R ¼ 4 and R ¼ 6 in the second and third column. Even for R ¼ 4 and R ¼ 6, the microstructures obtained using the TTMs look very similar to those obtained using   Table with the interquartile range (IQR) and quantiles Q ¼ 0:95 and Q ¼ 0:98 of the relative error of TTMs for rank values R ¼ 3; ; 10. In both cases a, b the relative error is calculated for the Gibbs free energy, diffusion potentials, and derivatives taking data collected using TC-toolbox with COST 531 database as the reference. There is little improvement of the accuracy of the TTM for rank values R ! 7. For a more quantitative comparison, the corresponding probability density functions (PDF) of the molar fraction fields are visualised in Fig. 5b, d, f. The distributions are bimodal with two means, m 1 and m 2 , corresponding to the different compositions of the co-existing liquid phases. According to the Langer-Bar-On-Miller (LBM) method, 42-44 the microstructure characteristics can be quantified based on the amplitude A, defined as A ¼ jm 2 À m 1 j (see Methods subsection Validation). For all six alloys, the bimodal PDFs obtained from the simulations using the TTM with rank value R ¼ 6, coincide with those obtained evaluating the full CALPHAD model expressions. When using the TTM R ¼ 4, a deviation of the PDF from that obtained for CE is observed in some cases, see Fig. 5d. These deviations are also reflected in the values of the amplitude A. Therefore, the relative error on the values obtained for the amplitude A from those measured from the profiles obtained using the direct implementation of the CALPHAD model (CE) are considered as an appropriate measure to quantify the accuracy of the phase-field simulations using TTMs with a different rank value R.
In Fig. 6a, the CDFs of the RE on the amplitudes of the bimodal distributions of the molar fraction values are plotted for varying rank. Up to rank value R ¼ 6, the accuracy improves when the rank value of the TTM is increased. For R ! 6, no further improvement of the accuracy is obtained. The analysis of the IQR given in Fig. 6b indicates a low dispersion in the relative error, even for R < 6. The Q ¼ 0:95 and Q ¼ 0:98 quantile show that even when approaching the upper limit of the distribution, high accuracy is maintained.
As an additional comparison, the volume fraction of one of the phases is measured as a function of time from the simulations. In Fig. 6c, d, results from simulations using the full CALPHAD expression (CE) are plotted along results from simulations using TTMs with rank values R ¼ 3; ; 6. The volume fractions obtained using a TTM with rank value R ¼ 6 are almost identical to those obtained with the direct implementation of the CALPHAD expression in the PFM. However, if the main interest is in the volume fraction measurements, a TTM with rank value R ¼ 5 or even R ¼ 4 may already give sufficient accuracy (see Fig. 5c, d). Depending on the application and the desired accuracy, a lower rank value R can thus be chosen to limit the number of coefficients in the TTM and the computation time needed to extract data from the tensor by evaluation of equation (3).
In conclusion, exploiting the low-rank tensor character of thermodynamic properties for higher-order multicomponent systems, a highly accurate and compact representation of their composition dependence can be obtained, which can be evaluated easily and efficiently in microstructure evolution simulations. Although tensor decomposition and tensor completion can be applied as a purely data-driven method, we illustrated that there is the flexibility to also account for a priori knowledge on the models or physical phenomena behind the data by applying functional constraints to the factor vectors. It is possible to extend the proposed method to other types of thermodynamic solution models, to include more elements or the dependence on other variables than composition, or to represent different types of composition-dependent phase properties, such as diffusion mobilities, other kinetic coefficients, interface properties, and mechanical properties. The rank value may be slightly higher for more complex dependencies. It is also possible to adjust the degree of the polynomial constraint or even use different types of constraints to model thermodynamic functions with certain features. The statement that the number of coefficients in the CPD depends only linearly on the number of components in the system is an intrinsic property of the CPD and will thus remain valid. The important advantage is that the efficiency and lower complexity is not compromised when adding more elements or variables, which opens up the possibility for detailed higher-order However, most solid phases are described using a sublattice model for which it is complex to relate the phase-field and the CALPHAD variables. c A paraboloid expression fitted to data calculated with the CALPHAD method is used to approximate the composition dependence of the Gibbs free energy. However, this approximation can only describe the Gibbs free energy accurately over limited composition ranges. Moreover, for higher-order systems, it becomes hard to prevent molar fractions from taking nonphysical values below 0 or above 100%. d External software is used to evaluate thermodynamic quantities as required by the phase-field simulation. The time spent in the communication between software is the main disadvantage of this approach, making it less efficient than a. e, f The sampling of data as a function of composition using a thermodynamic software into lookup tables that are consulted in the phase-field simulation, which is heavily affected by the curse of dimensionality. g Proposed methodology in this work: a canonical polyadic decomposition is applied to describe efficiently the thermodynamic quantities, resulting in factor matrices h from which the thermodynamic quantities can be evaluated in PFM. The alloy compositions considered for the phase-field simulations are given in i.

CALPHAD thermodynamic model
Ag-Cu-Ni-Sn liquid alloys are considered in this study. According to the CALPHAD method, [10][11][12]45 the required thermodynamic quantities are described assuming a random substitutional solution model. They are a function of C ¼ 4 molar fractions x i , with i ¼ 1; ; C, and the temperature T. The molar Gibbs free energy consists of three parts in which G o refers to the Gibbs free energy of a reference surface, G id mix to the contribution from ideal random mixing (i.e., owing to the configurational entropy), and G xs mix to the excess energy owing to deviations from ideal behavior. The term G o is given by where G o i is the temperature-dependent molar Gibbs free energy of pure component i with respect to the SGTE 46 reference state. G id mix is given by in which R is the ideal gas constant. G xs mix is expressed using a Redlich-Kister-Muggianu polynomial 47 ijk þ x k L ð2Þ ijk Þ; in which L ðvÞ ij are binary interaction parameters and L ð0;1;2Þ ijk ternary interaction parameters.
As the molar fractions should always sum to one, i.e., P C i¼1 x i ¼ 1, one molar fraction, e.g., the Cth, is dependent on the other fractions, i.e., x C ¼ 1 À P N n¼1 x n . Therefore, the molar Gibbs free energy can be expressed as a function of N ¼ C À 1 molar fractions x n , with n ¼ 1; ; N, and the temperature T.
The diffusion potential of the nth component, needed in multicomponent phase-field simulations, is given by the first partial derivative Alloy 3 - which is related to the chemical potentials μ n of component n and μ C of component C, as defined in standard solution thermodynamics, as 14 μ n ¼ μ n À μ C : The partial derivatives of the diffusion potential of the nth component with respect to x m , are often necessary to numerically solve phase-field equations. Inspection of the different terms in equation (4) shows that only the contributions G o and G xs mix contain model parameters from the thermodynamic database, namely G o i , L ðvÞ ij , and L ð0;1;2Þ ijk . As these parameters may not be accessible by the user or the PFM, the terms G o and G xs mix must be modelled. The contribution G id mix given by equation (6), however, is a function of known variables only, namely the independent molar fractions x n and the temperature T. Therefore, the contribution of G id mix to equation (4) can be calculated at any moment and for any entry of the TDT without consulting the thermodynamic database. As the logarithmic dependence on the molar fractions of this term complicates the formulation of an accurate TTM, we define a starred version of the Gibbs free energy, diffusion potentials, and their partial derivatives with respect to the molar fractions in which the contribution of G id mix is not included: Ãμ n ¼μ n À RTðlogx n À logx C Þ; Equations (5) and (7) show that Ã G m , Ãμ n , and ∂ Ãμ n ∂xm depend polynomially on the molar fractions of the different components. As discussed in Methods section Thermodynamic tensor model, only these polynomial contributions to the TTMs are modelled, knowing that equation (7) for G id mix and its derivatives can be evaluated and added during the phase-field simulation.
For the simulations, temperature-dependent expressions for all binary and ternary interaction parameters L ðvÞ ij , L ð0;1;2Þ ijk , and the Gibbs free energies G o i of the pure elements for the liquid phase are obtained from the COST 531 thermodynamic database. 41 Tensor decomposition A CPD writes a Nth-order TDT T as a sum of R rank-1 terms, 28,33,34 each of which is the outer product, denoted by , of N nonzero factor vectors a ðnÞ r , with r ¼ 1; ; R and n ¼ 1; ; N: Elementwise, this means that each entry indexed with i n , for n ¼ 1; ; N, of T is given by There exist various techniques to determine the coefficients in the factor matrices based on a limited set of data entries of the original tensor. 28 Moreover, when available, a priori knowledge on the model or physical process behind the data can be taken into account, resulting in an even more compact and continuous tensor model of the data. In this work, for example, we exploited the polynomial dependence of Ã G m on the molar fractions, and the fact that the Ãμ n and ∂ Ãμ n ∂xm are the first and second partial derivatives of Ã G m . A polynomial constraint could therefore be applied on the factor vectors and the CPDs for Ã G m and its first and second partial  Fig. 6 Rank dependence of the accuracy of the 2D simulations using thermodynamic tensor models. a The ECDF of the relative errors on the amplitude of the molar fraction variation plotted in different line colours for TTM rank values R ¼ 3; ; 10. For R ¼ 6 and above, markers are also used for better identification of overlapping lines. b Table showing the interquartile region (IQR) and quantiles Q ¼ 0:95 and Q ¼ 0:98 of the relative errors for rank values R ¼ 3; ; 10. c, d Evolution of the volume fraction of the liquid phases as a function of time steps obtained from simulations using directly the CALPHAD model expressions (CE) and using TTMs with rank values R ¼ 3; ; 6. No further improvement in the accuracy of the simulations using TTMs is obtained when using a rank value higher than R ¼ 6. It should be noticed, however, that R ¼ 5 already gives high accuracy. Moreover, for certain applications, lower rank values might already give sufficient accuracy.
derivatives Ãμ n and ∂ Ãμ n ∂xm could be coupled within a single tensor model, as discussed in Methods Thermodynamic tensor model.

Thermodynamic tensor model
We first show how a TDT can be written as a CPD in which each factor vector a ðnÞ r is a discretized smooth function in a single variable x n , with n ¼ 1; ; N. Moreover, these functions are linear combinations of a set of basis functions common to each factor vector. This way, the problem of modeling all TDTs is recast into finding N coefficient matrices. The method is built upon two mild assumptions: separability and a linear combination of basis functions.
The first assumption is that the underlying function f ðx 1 ; ; x N Þ is a sum of separable functions, i.e., the terms can be written as the product of N functions in a single variable: in which R is preferably low. This is exactly the continuous formulation of equation (15). For example, G m ðx 1 ; ; x N Þ is not separable with low R because of the term G id mix (equation (6)), which contains x C logx C with x N Þ is, as it is the sum of products of polynomials in x n . As G id mix (equation (6)) contains only known constants, we can subtract this term and work with Ã G m ðx 1 ; ; x N Þ (equation (11)) without loss of generality. If the function f , and therefore also each f ðnÞ r ðx n Þ, is evaluated on (a subset of points of) an N-dimensional grid, an (incomplete) tensor is obtained. For example, in this paper, an equidistant grid is defined by > for some δ x and the TDT Ã G is Ã G m ðx 1 ; ; x N Þ evaluated on this grid. If we collect the functions f ðnÞ r ðx n Þ evaluated in x n as columns of A ðnÞ , n ¼ 1; ; N, we obtain the CPD: where Ã G is the tensor containing the values Ã G m ðx 1 ; ; x N Þ. Similarly, the TDTs for Ãμ n and ∂ Ãμ n ∂xm can be written as a CPD. Note that the grid points do not need to be chosen in the same way for different components n, nor do they have to be equidistant.
Second, we assume that each f ðnÞ By evaluating f ðnÞ r ðx n Þ on the grid, we obtain in which B ðnÞ collects the basis functions evaluated in x n and U ðnÞ the coefficients. As the basis functions are fixed, only the coefficients U ðnÞ need to be determined. In this paper, we choose a basis of monomial functions evaluated in the same points in all dimensions, i.e., b ðnÞ k ðx n Þ ¼ b k ðxÞ ¼ x ðkÀ1Þ for k ¼ 1; ; d þ 1, and n ¼ 1; ; N, which is reasonable given the polynomial nature of the theoretical models; see equations (11), (12), and (13). This way, each TDT can be written as a CPD in which each factor matrix A ðnÞ is subject to a linear constraint: in which the powers are elementwise. Note that the coefficient matrices U ðnÞ do not necessarily contain the coefficients G o i , L ðvÞ ij , and L ð0;1;2Þ ijk as their entries.
A key property following from the definition of the TDTs and the CPD formulation with linear constraints is that computing derivatives only requires the computation of derivatives of the known basis functions b k ðx n Þ: For example, as Ãμ 1 ¼ ∂ Ã Gm ∂x1 , the corresponding TDT Ã P ð1Þ can be represented as as only A ð1Þ depends on x 1 . _ B contains the evaluated derived functions _ b k ðxÞ ¼ ðk À 1Þx kÀ1 : Similarly, the derivatives of the potentials depend on _ B and € B, in which the latter matrix contains the second-order derivatives: respectively. Hence, once the coefficient matrices U ðnÞ , n ¼ 1; ; N, are determined, all TDTs are modeled. Moreover, as the coefficients are independent of the grid, different grids can be used in the training phase, i.e., to find U ðnÞ and when actually using the model, i.e., during the phasefield simulations.

Thermodynamic tensor model training
In the optimisation routine for modeling the TDTs of a quaternary system (N ¼ 3), we want to find the coefficients in the matrices U ð1Þ , U ð2Þ , U ð3Þ that minimise the least squares cost function min U ð1Þ ;U ð2Þ ;U ð3Þ T , and Ã P ð3Þ T are training TDTs, which are computed with a step size of δ x ¼ 0:025 using the TC-Toolbox (Thermo-Calc 2017b) with the COST 531 thermodynamic database 41 at 1400 K, each with 11,482 entries. The residual between the tensor and its low-rank, polynomial-constrained CPD model, e.g., Ã P ð1Þ T À ½ ½ _ BU ð1Þ ; BU ð2Þ ; BU ð3Þ , is multiplied elementwise (denoted by Ã) with a binary tensor S which is one for each computed training entry and zero otherwise, such that only the limited set of training entries contribute to the objective function. S can be different for each term of the objective problem during the optimisation, but here we choose to keep it constant. Only the diffusion potentials are used in the optimisation routine since other approaches, e.g., using Ã G or a combination of all TDTs, have shown to be less accurate. Therefore, as only derivatives are used, the Gibbs free energy is determined up to a constant α, which can be estimated as the mean of the residual and must be added to the computed CPD for the Gibbs free energy.
In this work, a conservative, i.e., small enough, value for δ x is chosen when selecting the training data to guarantee a highly accurate model. No further investigations are conducted on the influence of the step size used for the training data on the accuracy of the tensor model or phase-field simulations. Various other aspects can affect the accuracy of the tensor model, such as the degree of the polynomial constraint, the rank value of the CPD and the distribution of the training data over the system. However, the grid with which data are sampled is not very relevant, as we are interested in the optimisation of the coefficient in the U ðnÞ matrices and not the factors in the A ðnÞ matrices. The factors in the latter are only computed after the optimisation routine with equation (20) and the grid spacing can be chosen according to the necessity. This procedure also assures that the thermodynamic quantities are accurately modelled over the entire composition domain (including the dilute solution regions, where at least one of the molar fractions takes a very low value).
The least squares cost function is optimised in Tensorlab 40 using the data independent algorithm for the CPD of incomplete tensors subject to linear constraints (CPDLI DI). 31 A few decompositions are computed starting from random initial guesses for U ð1Þ , U ð2Þ , U ð3Þ , given R and d, and the best result is retained. CPDs for rank values R ¼ 3; ; 10 are computed. (Models with maximal degrees per variable d ¼ 3; 4; 5 have been computed as well to validate that the correct degree d ¼ 4 can indeed be recovered. This degree d ¼ 4 can be obtained from equation (4) and the parameters obtained from the thermodynamic database COST 531. 41 ) Once the matrices with the unknown coefficients U ð1Þ , U ð2Þ , U ð3Þ are found, the factor matrices A ðnÞ , _ A ðnÞ and € A ðnÞ , for n ¼ 1; 2; 3, are constructed using equation (20) and basis matrices B, _ B, € B, which are evaluated on a grid with δ x ¼ 0:0001.

Validation
The TC-toolbox (Thermo-Calc 2017b) and MATLAB are used to generate validation TDTs (indicated with subscript V) of the Gibbs free energy G V , diffusion potentials P of the liquid phase at 1400 K, with a step size of δ x ¼ 0:0001 and using the COST 531 thermodynamic database. 41 The accuracy of the TTMs with rank value R ¼ 3; ; 10 is verified based on the RE on the entries of the validation TDTs. For example, for the Gibbs free energy tensor, the relative error E ðGÞ R given by i2;r a ð3Þ i3;r þ RT The contribution from the ideal mixing term G id mix is calculated separately and added to the values calculated from the TTMs, which only include the Ã G m in equation (11) (12) and (13). As α is a constant, it does not appear in the expressions for the diffusion potentials and their partial derivatives with respect to the molar fractions.
An ECDF is then determined for each rank value R considering all the values obtained for E ðGÞ R , E ðPÞ R , and E ðDÞ R together. The MATLAB fuction ecdf is used for this. The ECDFs are plotted in Fig. 3a; the IQR, and quantiles Q ¼ 0:95 and Q ¼ 0:98 are presented in Fig. 3b. Additional quantile values can be found in the Supplementary Information.
The validation of the 2D simulations using TTMs is conducted based on a comparison with simulations performed using the direct implementation of the CALPHAD model (approach b in Fig. 4). The reason for this choice is the fact that most of the alternatives, presented in Fig. 4, cannot be applied to a quaternary system owing to certain limitations. For instance, in the approach that uses polynomial fitting (Fig. 4c), only a narrow region of the thermodynamic function can be accurately described and since diffusion paths can become complex for multicomponent systems, it is difficult to identify which composition region will be most relevant for a simulation. Moreover, polynomial fits cannot prevent molar fractions from taking an unphysical value below zero or above one. Although this can be avoided in binary system, by designing the simulation in such a way that the molar fraction does not take a negative value, when considering four or more elements, there is always a time or location in the simulation where one of the molar fraction fields goes below zero, making the remainder of the simulation unstable. The approach that uses lookup tables (Fig. 4f) is also unfeasible in this case owing to the size of the sampled data sets given the required high resolution, as seen in (Fig. 1a). These approaches, however, are suitable and straightforward to be applied to binary or ternary systems, whereas the proposed use of tensor completion requires at least three independent variables in the thermodynamic descriptions.
The approach that uses the direct implementation (Fig. 4b) of the CALPHAD model expression and the one that uses an interface with a thermodynamic software (Fig. 4d) can be applied for quaternary and higher-order systems. With our implementation of the method that uses an interface with a thermodynamic software we could only run 1D simulations in a reasonable amount of time (these are presented in the Supplementary  Information). It should be mentioned that optimisations (mainly based on the idea of interpolation between previously calculated data) of phase-field codes using an interface with a thermodynamic software have been reported in the literature; 18,19 however, to the best of our knowledge, only for binary and ternary systems.
For the method using the direct implementation of the CALPHAD model expression (CE), we found that (for our implementation) the efficiency is similar to that using the TTM for the quaternary system. However, such a comparison is often not relevant. The direct implementation of the CALPHAD model expression requires access to an open thermodynamic database (while today, the most advanced thermodynamic databases are commercial). Moreover, if the phase under study is modelled using sublattices, it is mostly impossible (except for some special cases) to relate the phase-field molar fractions with the site fractions in the CALPHAD expression. Hence, in these cases, the direct implementation of the CALPHAD model expression is not even possible, whereas the tensor model can still be formulated. We also observed that the current implementation of the TTM in the phase-field model is memory bound, which can be addressed with further optimisation studies. Theoretically, the TTM method should be cheaper to evaluate than CE.
For the validation of the 2D simulation, the LBM method is used to estimate the amplitudes of the molar fraction distributions of the simulated microstructures during spinodal decomposition and consequent coarsening. [42][43][44] For each simulation, the microstructures obtained at every thousandth time step are considered. For each microstructure, a Gaussian mixture model is used to fit the distributions of the molar fraction values for each of the three independent components with the MATLAB function gmdistribution. A bimodal distribution is obtained with two means m 1 and m 2 , corresponding to the compositions of the two liquid phases, as shown in Fig. 5b-d. The amplitude A is then calculated as the difference between these two means, i.e., A ¼ jm 1 À m 2 j. The amplitude A R from the simulation using the TTM with rank value R is compared with the amplitude A obtained from the simulation using the direct implementation of the CALPHAD model (approach b in Fig. 4). The relative error E 2D R is defined by and is determined for each molar fraction field and each alloy composition and for rank values R ¼ 3; ; 10. Then, the ECDF is calculated for each rank value R, considering all the RE determined for the different alloy compositions and the three independent molar fraction fields, obtained for the given rank value R, together. The ECDFs are plotted in Fig. 6a and the information of the IQR, Q ¼ 0:95 and Q ¼ 0:98 are presented in Fig. 6b. In addition, the volume fractions of the two liquid phases are measured at every thousandth time step as follows. First, the mean of one of the molar fraction distributions is computed. Then, the grid cells of the simulation domain are divided into two groups, one containing the grid cells with a molar fraction value greater than the mean and the other with the grid cells with a molar fraction value smaller than the mean. The volume fractions of the two phases are then obtained by dividing the number of grid cells in each group by the total number of cells in the simulation domain. This information is presented in Fig. 6c, d.
Phase-field microstructure evolution model Assuming a volume-fixed frame of reference 48 , the temporal and spatial evolution of the non-equilibrium microstructure of the quaternary Ag-Cu-Ni-Sn system (C ¼ 4), is described with N ¼ C À 1 ¼ 3 conserved field variables, namely the molar fractions x 1 ðr; tÞ, x 2 ðr; tÞ, and x 3 ðr; tÞ of silver, copper, and nickel, respectively. The molar fraction of tin is then obtained as x C ¼ 1 À x 1 À x 2 À x 3 .
Following the Cahn-Hilliard 49 formalism, the non-uniform free energy functional is defined as where f 0 ðx 1 ; x 2 ; x 3 Þ is the homogeneous free energy density and k n are positive gradient energy coefficients for n ¼ 1; 2; 3. The variational derivative of F with respect to x n is given by δF δx n ¼ ∂f 0 ðx 1 ; x 2 ; x 3 Þ ∂x n À k n ∇ 2 x n : The partial derivative of the homogeneous free energy is obtained given the thermodynamic relation 14 in equation (9) as withμ n the diffusion potential and μ n the chemical potential of component n (n ¼ 1; 2; 3) and μ C the chemical potential of Sn. The diffusion equation describes the evolution of the conserved field variables for the multicomponent system 7 and is given by 1 V m ∂x n ðr; tÞ ∂t ¼ À∇ J n ; in which V m is a constant coefficient approximating the molar volume of Y.A. Coutinho et al.
the alloy. J n is the diffusion flux of the nth component, which is defined with the Onsager reciprocal relations 50 as in which M nm is the mobility of component n taken with component m as reference. As information on the diffusion coefficients for quaternary systems are scarce, equal atomic mobilities β for all components are adopted, giving 48 Vm βx n x m ; n ≠ m: Equation (34)

DATA AVAILABILITY
All data are kept available at KU Leuven and can be accessed upon request. Alternatively, they can be generated in limited time using the provided codes in combination with Tensorlab 3.0, MATLAB and TC-toolbox (Thermo-Calc 2017b).

CODE AVAILABILITY
All codes used to generate the data for this project are publicly available. 51 Received: 26 August 2019; Accepted: 5 December 2019;