A data-informatics method to quantitatively represent ternary eutectic microstructures

Many of the useful properties of modern engineering materials are determined by the material’s microstructure. Controlling the microstructure requires an understanding of the complex dynamics underlying its evolution during processing. Investigating the thermal and mass transport phenomena responsible for a structure requires establishing a common language to quantitatively represent the microstructures being examined. Although such a common language exists for some of the simple structures, which has allowed these materials to be engineered, there has yet to be a method to represent complex systems, such as the ternary microstructures, which are important for many technologies. Here we show how stereological and data science methods can be combined to quantitatively represent ternary eutectic microstructures relative to a set of exemplars that span the stereological attribute space. Our method uniquely describes ternary eutectic microstructures, allowing images from different studies, with different compositions and processing histories, to be quantitatively compared. By overcoming this long-standing challenge, it becomes possible to begin to make progress toward a quantitatively predictive theory of ternary eutectic growth. We anticipate that the method of quantifying instances of an object relative to a set of exemplars spanning attribute-space will be broadly applied to classify materials structures, and may also find uses in other fields.

Materials design involves observing and cataloging materials structures, understanding the underlying relationship between the multilevel structures and resulting material properties, and developing processing routes to prepare materials with the properties that yield optimal engineering performance 1 . It relies upon the existence of a universally agreed upon language to quantitatively represent and subsequently catalog the observed structures 2 . One of the most fundamental components of a material's hierarchical structure is the microstructure; however, there is yet to be a consensus regarding the language for quantitatively representing many complex microstructures.
Recent attempts at microstructural quantification have involved data-driven and machine learning based approaches [3][4][5][6][7] . Principal component analysis (PCA), a statistical method that has been successfully applied to identify trends in complex multivariate materials data [8][9][10] , has also found application in the quantitative representation of microstructures. PCA was used by Zabaras et al. to construct a dynamic library of single phase polyhedra microstructures 11,12 . Instead of focusing on the many sub-features of the microstructures, each was considered as a single entity that was quantified by a set of coefficients. Single grain microstructures were used as input to PCA and were combined with a support vector machine algorithm for classification 11,12 . PCA also has been used in stochastic modeling of microstructures. The spatial relations were described using two-point correlations and the overall state of the structure was treated as a statistical distribution 2,[13][14][15][16] . The variance of the two-point correlations was examined in PCA and this allowed the creation of a structure-property map in PCA space 2,13 . This approach allows the simultaneous classification and structure-property analysis of multiple complex microstructures 14 including, for example, the structure-diffusivity relationship in porous transport layers of polymer electrolyte fuel-cells 15 , and the structure-plasticity relationship in non-metallic inclusion/steel composite material systems 16 . A similar method has been used to couple phase field simulations with spatial correlations to quantify and classify the evolution of microstructural changes in ternary eutectic structures 17,18 .
In this work we also focus on the ternary eutectic microstructures, as an exemplar. Even simple ternary microstructures exhibit a high degree of morphological variation due to the complex dynamics present during their evolution. The significant advances that have been made in understanding the solidification of regular binary eutectics 19,20 are based on the common language used to quantitatively describe the resulting microstructure, i.e., the lamellar and rod morphologies. The absence of a universal language to quantitatively represent the ternary eutectic microstructures has prohibited the development of an accurate theory of ternary eutectic solidification and this motivates our study. Initial classification efforts described ternary eutectic microstructures as combinations of the lamellar and rod morphologies observed in the regular binary eutectic microstructures [21][22][23] , but this approach was unable to represent the multitude of complex morphologies observed in experiments [24][25][26][27][28] . Currently, the most widely used classification scheme for ternary eutectic morphologies is given by Ruggiero and Rutter 29 and its analytical solution is an extension of Jackson and Hunt's analytical solution of binary eutectics 30 ; three distinct growth modes are identified: semi-regular brick (SRB), lamellar (LAM), and rod-hexagon (RHN) 31,32 . A small set of geometric features, such as fixed eutectic spacings and the fixed spacing ratio of phases, is used to describe the relative scale of these microstructures. These approaches, although yielding important insights, have yet to produce a universal representation of ternary eutectic microstructures that can be used to develop a predictive model of solidification. As a result, new parameters have been suggested to describe the microstructures 33,34 .
The greatest challenge for developing a quantitatively correct theory of ternary eutectic solidification is the creation of a universal language to allow a representation of the observed structures.
We demonstrate a data-informatics approach to quantitatively represent the ternary eutectic microstructure. The microstructures examined are from the ternary eutectic Al-Cu-Ag system and are taken from refs 35,36 , along with the stereological descriptors used to describe them. Although there is a continuum of possible structures, the data-informatics method presented here allows any microstructure to be uniquely referenced to the three idealized morphologies identified in refs 29,31,32 . This approach directly allows data from different sources to be integrated. In addition, the resulting numerical regression is applicable to ternary eutectic microstructures of other compositions.

Results
Dataset. It was reported in refs 35,36 that three variations of the semi-regular brick structure were observed in samples produced by directional solidification processed over the velocity range 0.0005 mm/s to 0.018 mm/s. In the low-velocity range, 0.0005 mm/s to 0.001 mm/s, an ordered semi-regular brick structure was observed; in the mid-velocity range, 0.001 mm/s to 0.01 mm/s, the intermetallic phases, Ag₂Al(hcp) and Al₂Cu(tet), began connecting to their nearest same-phase neighbors, resulting in a more elongated form; at higher velocities the microstructure somewhat resembled the lamellar morphology. Due to the relatively slow diffusion rate of Ag in the liquid, at high velocities the Ag₂Al(hcp) phase had a more fragmented morphology than the Al₂Cu(tet) phase, which continued to maintain a lamellar form. At the highest velocities Al(fcc) lost its elongated form, adopting a more circular shape, similar to Ag₂Al(hcp), and the periodicity between intermetallic phases also became less regular. The morphological changes in the images were smooth with increasing velocity. Examples of these microstructures are shown in Fig. 1.
These observations, while instructive for understanding the Al(fcc)-Ag₂Al(hcp)-Al₂Cu(tet) ternary eutectic system, do not provide a quantitative representation that allows the development of a general theory of solidification, nor do they allow for comparison of these microstructures to those reported from other studies. Well-known stereological descriptors that define the scale of the phases provide a quantification of the microstructure, but do not accurately capture all the qualitative changes observed in the images 35 . For example, Fig. 1(e) shows that the relative eutectic spacing remains constant for all growth velocities. This quantitative result does not explain the apparent microstructural differences observed in Fig. 1(a-d). Sargin and Napolitano concluded that scale-defining attributes alone are insufficient to quantitatively represent the microstructures and developed descriptors that describe both the shape and scale of the phases 35,36 .
The stereological attributes used in the data analysis consist of three different elements. The first element involves the quantification of area, perimeter, length, and width of each phase. The phase fractions are obtained from the area measurements. The values are then averaged across each image. The second element is the analysis of Fourier transform patterns from single-phase masked images. For each Fourier transform pattern, a radial and an angular distribution plot are generated. The radial and angular order parameters are defined as the height over width of the peaks in the distribution plots. The final element is the analysis of the phase boundary distribution according to angle and type. The 19 stereological attributes used in this study are given in Table 1.
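As a minimal illustration of the first element, the sketch below computes phase fractions from a masked image by counting pixels. It assumes each phase is encoded by a single pixel value (the codes follow the values quoted for the masked images in this work); the function name `phase_fractions` and the tiny demo array are hypothetical and are not taken from refs 35,36.

```python
import numpy as np

# Pixel codes assumed for the three phases in a masked micrograph
PHASE_CODES = {"Al(fcc)": 2, "Ag2Al(hcp)": 255, "Al2Cu(tet)": 127}

def phase_fractions(image, codes=PHASE_CODES):
    """Area fraction of each phase, taken as pixel count over total pixels."""
    image = np.asarray(image)
    return {name: float(np.mean(image == value)) for name, value in codes.items()}

# Tiny synthetic 2 x 4 "micrograph", for illustration only
demo = np.array([[2, 2, 255, 127],
                 [2, 2, 255, 127]])
fractions = phase_fractions(demo)
```

Perimeter, length, and width require per-feature segmentation and are not covered by this sketch.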
Rather than compare microstructures to each other directly, a quantitative representation is constructed that describes the microstructures relative to the ideal SRB, LAM, and RHN ternary eutectic microstructures proposed by Ruggiero and Rutter 29,31,32 . The ideal microstructural images with equilibrium phase fractions, shown in Fig. 1(g-i), are generated such that the scale of the microstructures is consistent with that of the sample pulled at 0.001 mm/s. The phases are assigned the same pixel values as the masked experimental ones, 2, 255, and 127 for Al(fcc), Ag₂Al(hcp), and Al₂Cu(tet), respectively. The stereological attributes extracted for these three are combined with data from refs 35-37 , discussed above, to yield the dataset.
Data analysis. The dataset is standardized and PCA is performed. The scree plot in Fig. 2 shows that 90% of the variance in the dataset is accounted for by the first four principal components (PCs). The loadings of these PCs are given in Table 2 and the score plots are given in Fig. 3.
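The standardization and PCA step can be sketched as follows. This is an illustrative sketch, not the authors' code: it uses a synthetic stand-in for the attribute table (the sample count of 12 and the latent-factor structure are assumptions), performs PCA through a singular value decomposition, and finds the smallest number of PCs whose cumulative variance reaches 90%, which is the quantity read from a scree plot.

```python
import numpy as np

def standardize(X):
    """Column-wise z-score: subtract the mean, divide by the standard deviation."""
    X = np.asarray(X, dtype=float)
    return (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

def pca(X_std):
    """PCA via SVD of standardized data: scores, loadings, variance ratios."""
    U, s, Vt = np.linalg.svd(X_std, full_matrices=False)
    scores = U * s                      # sample coordinates in PC space
    loadings = Vt.T                     # columns are the PC directions
    var_ratio = s**2 / np.sum(s**2)     # fraction of variance per PC
    return scores, loadings, var_ratio

rng = np.random.default_rng(0)
# Synthetic stand-in: 12 samples x 19 attributes from 3 latent factors plus noise
latent = rng.normal(size=(12, 3))
X = latent @ rng.normal(size=(3, 19)) + 0.1 * rng.normal(size=(12, 19))

scores, loadings, var_ratio = pca(standardize(X))
cumulative = np.cumsum(var_ratio)
# Smallest number of PCs whose cumulative variance reaches 90%
n_keep = int(np.searchsorted(cumulative, 0.90) + 1)
```

On the real dataset the corresponding count reported in the text is four PCs; the synthetic data here will generally give a different number.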
The distinction between experimental and ideal microstructures is the source of greatest variance in the dataset, as shown in Fig. 3(a-c), and is captured by PC1. In contrast, the score plot of PC2, PC3, and PC4, in Fig. 3(d), shows that the experimental microstructures are bracketed by the ideal SRB, LAM, and RHN structures. The loadings of PC2, PC3, and PC4 have a larger variability than those of PC1, demonstrating that each PC represents unique stereological information. PC1 is removed for the remainder of the analysis because it primarily characterizes the trivial distinction between the experimental and ideal structures. The principal component transformation is distance preserving, i.e., the relative distances between samples in attribute space remain the same in PC space; it is therefore possible to use the relative distance between experimental and ideal microstructures in PC space to quantify the similarity of experimental microstructures to the ideal ones. The experimental scores are projected into the plane defined by the SRB, LAM, and RHN scores, as shown in Fig. 4. This allows each experimental microstructure to be uniquely triangulated in terms of its fraction similarity to the ideals. The fraction similarity to the SRB, LAM, and RHN structures is determined for each microstructure and the results are plotted in Fig. 5. The growth velocity is used to label each microstructure in the figure.
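One way to read the triangulation step is as a barycentric decomposition: the score of a microstructure is projected orthogonally onto the plane through the three ideal scores, and the projected point is written as a weighted sum of the three vertices with weights that sum to one. The sketch below implements that reading. It is an assumption about how the fraction similarity could be computed, not a statement of the authors' exact procedure, and the function name and demo coordinates are hypothetical.

```python
import numpy as np

def fraction_similarity(point, srb, lam, rhn):
    """Barycentric weights (SRB, LAM, RHN) of `point` projected onto their plane."""
    point, srb, lam, rhn = (np.asarray(v, dtype=float) for v in (point, srb, lam, rhn))
    # Two in-plane edge vectors with SRB as the origin of the plane
    edges = np.column_stack([lam - srb, rhn - srb])
    # Least squares gives the orthogonal projection in edge coordinates
    (w_lam, w_rhn), *_ = np.linalg.lstsq(edges, point - srb, rcond=None)
    return np.array([1.0 - w_lam - w_rhn, w_lam, w_rhn])

# Hypothetical scores in (PC2, PC3, PC4) space
srb, lam, rhn = [0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]
weights = fraction_similarity([0.2, 0.3, 0.5], srb, lam, rhn)
# weights -> [0.5, 0.2, 0.3]; the out-of-plane component (0.5) is discarded
```

The weights always sum to one. A negative weight would indicate that the projected point falls outside the triangle spanned by the ideal structures.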
A partial least squares (PLS) regression is used to quantify the relationship between a microstructure's stereological attributes and its fraction similarity to the ideal structures. To cross-validate the regression, the leave-one-out method is used. This approach helps compensate for the relatively small sample population. The resulting regression has a mean squared error (MSE) of 4.8 × 10⁻⁵, 7.5 × 10⁻⁵, and 1.6 × 10⁻⁵ for SRB, LAM, and RHN, respectively.
It is known from the PCA that not all 19 of the stereological attributes are significant, and it is found that using only 9 attributes yields a regression with MSE of 1.4 × 10⁻³, 6.0 × 10⁻⁴, and 4.0 × 10⁻⁴ for SRB, LAM, and RHN, respectively. Reducing the number of attributes allows the most significant ones to be identified and assists future studies by reducing the stereological analysis needed. The predicted regression results using only 9 attributes are compared to regression results using all 19 attributes in Fig. 5. The regressions are written as linear combinations of the attributes, where A_n are the measured attributes and C_n are the coefficients given in Table 3.

Discussion
It has been shown that ternary eutectic microstructures cannot be quantitatively represented using simple geometric measures, nor can they be represented using stereological analysis of individual features 35 . Here we demonstrate that it is possible to decompose ternary eutectic microstructures using stereological attributes and then apply informatics methods to quantify the similarity of the microstructure relative to the ideal SRB, LAM, and RHN structures. The directionally solidified microstructures' fractional similarity to the ideal LAM, SRB, and RHN structures is shown in Fig. 5. Analysis of the results tells us that for velocities less than 0.001 mm/s the structures are strongly SRB in character. This agrees with the qualitative observations in Fig. 1: Fig. 1(a) demonstrates SRB features for low velocities, Fig. 1(b,c) demonstrate a primarily LAM structure, and Fig. 1(d) demonstrates a mixing of the LAM and RHN features.
Looking closely at the results in Fig. 5, we see a large change in predicted character going from 0.009 mm/s to 0.010 mm/s. Such a large change in microstructure for an incremental change in velocity is unexpected, but careful examination of the 0.009 mm/s structure, shown in Fig. 1(f), and the 0.010 mm/s structure, shown in Fig. 1(c), validates this prediction. Although the physical origin of the changes is unclear at this time, it is apparent that our analysis method accurately and quantitatively captures them.
Our confidence in this approach to represent the microstructures is due to the overall robustness of the analysis methods and data. First, PCA is a distance-preserving transformation, meaning that the relative stereological similarities of the samples in attribute space are preserved when transformed to PC space. The first four PCs capture 90% of the attribute variance, meaning that the resolution loss due to truncation is on the order of the uncertainty introduced from other sources, such as the inherent variability from the experimental methods. Second, the first PC clearly captures the large difference between the ideal and experimental images. This is immediately observable from the score plots in Fig. 3(a-c). Discarding the trivial information from PC1 refines the data, leaving the important differences in the remaining PCs. This is analogous to masking the through beam in an electron diffraction experiment, which allows access to the fine-grained information in the diffracted data. Third, even though only a limited number of images are available, each image is information rich and the stereological analysis makes redundant measurements across each image. The attributes used in the analysis are not single measurements, but are averages from 25 different images taken from each sample. Fourth, this method is able to quantify the dramatic changes in the microstructures as the solidification velocities change, for example the crossover between SRB and LAM at 0.001 mm/s. The emergence of RHN at velocities greater than 0.012 mm/s also is clearly demonstrated. Fifth, the method presented here is based purely on experimental microstructures, comparing them to a small set of idealized archetypal structures. It does not require any theoretical inputs that may bias the results.
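The distance-preservation argument can be checked numerically. The sketch below, on synthetic data (the 10 x 6 array is a hypothetical stand-in for the attribute table), shows that pairwise distances are reproduced exactly when all PCs are retained, and that keeping only a subset of PCs can only shorten distances, which is the truncation loss discussed above.

```python
import numpy as np

def pairwise_distances(A):
    """Euclidean distance between every pair of rows of A."""
    diff = A[:, None, :] - A[None, :, :]
    return np.sqrt((diff**2).sum(axis=-1))

rng = np.random.default_rng(1)
X = rng.normal(size=(10, 6))                       # synthetic attribute table
X = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)   # standardize columns

U, s, Vt = np.linalg.svd(X, full_matrices=False)
scores = U * s                                     # PCA scores (a rotation of X)

d_attr = pairwise_distances(X)                     # distances in attribute space
d_full = pairwise_distances(scores)                # all PCs retained
d_trunc = pairwise_distances(scores[:, :3])        # only the first three PCs

max_err_full = np.abs(d_attr - d_full).max()       # numerically zero
max_loss_trunc = (d_attr - d_trunc).max()          # non-negative truncation loss
```

How large the truncation loss is in practice depends on how much variance the discarded PCs carry.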
The PLS regression can be used for determination of fraction similarity to SRB, LAM, and RHN of any invariant ternary eutectic structure, independent of alloy system, i.e., the attributes have no explicit compositional dependence. The regression also is independent of the velocity and thermal gradient because the ideal standards used in the mapping do not depend on processing.

Conclusions
An approach is demonstrated that accurately and quantitatively represents ternary eutectic microstructures by classifying them according to their relative similarity to the ideal SRB, LAM, and RHN microstructures. The method is sufficiently general that it can be applied to any directionally solidified invariant ternary eutectic microstructure regardless of the alloy system. Therefore it provides the common language necessary for the development of a general theory of ternary eutectic microstructure evolution during directional solidification.
Although this study is focused on microstructure, the data analysis process can be applied to quantitatively characterize any collection of structures or objects that have well-defined bracketing ideals. The essential elements of the approach are the identification of ideal standards, the identification of quantifiable attributes, the use of the attributes and PCA to classify the structures, the determination of the relevant PCs needed to explain the sample variance, and the exclusion of the trivial PCs that merely highlight the difference between the ideal and actual structures.

Methods
In the dataset, the length over width, area over perimeter, shape factor, angular order, and radial order attributes were subjected to a logarithmic transformation. Attribute standardization was used to avoid bias due to scaling differences. Standardization was applied according to the equation

$$X_i^S = \frac{X_i - \mu_X}{\sigma_X},$$

where $X_i^S$ was the standardized value of the $i$-th observation $X_i$ of an attribute, $\mu_X$ was the mean, and $\sigma_X$ was the standard deviation of that attribute. Following this, PCA was applied and the distances and similarities of experimental structures to the ideal ones were calculated. The microstructures were characterized relative to the ideal ones by projecting them into the plane defined by the SRB, LAM, and RHN structures in PC2-PC3-PC4 space. The PC1 component was discarded because it primarily contained the information distinguishing the experimental and ideal images. The projected microstructures were bounded by a triangle defined by the SRB, LAM, and RHN structures, which allowed the similarity of each microstructure to be determined in terms of its unique position relative to the ideal structures.
A PLS regression was used to determine the microstructures' fraction similarity to SRB, LAM, and RHN as a function of the microstructural attributes. The PLS implementation in scikit-learn 38 , which uses the nonlinear iterative partial least squares (NIPALS) algorithm, was used. The optimal number of components to be used in the PLS regression was obtained by the leave-one-out cross-validation method. Eight components were used for SRB and RHN, and nine for LAM. The minimum number of attributes needed to accurately represent the triangulated data was determined to be 9. The 9-attribute linear equations for SRB, LAM, and RHN were determined independently of each other.
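The component-selection procedure can be sketched as follows. To keep the example self-contained it does not call scikit-learn; instead it uses a small hand-written single-response PLS with NIPALS-style deflation as a stand-in, fitted separately for one response, in the spirit of determining the SRB, LAM, and RHN equations independently. The data, the function names `pls1_fit` and `loo_mse`, and the candidate range of components are all hypothetical.

```python
import numpy as np

def pls1_fit(X, y, n_components):
    """Single-response PLS with deflation; returns (coef, x_mean, y_mean)."""
    X, y = np.asarray(X, dtype=float), np.asarray(y, dtype=float)
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xk, yk = X - x_mean, y - y_mean
    W, P, q = [], [], []
    for _ in range(n_components):
        w = Xk.T @ yk                    # weight: covariance of X with residual y
        norm = np.linalg.norm(w)
        if norm < 1e-12:                 # residual already explained
            break
        w /= norm
        t = Xk @ w                       # score vector
        tt = t @ t
        p = Xk.T @ t / tt                # X loading
        qk = (yk @ t) / tt               # y loading
        Xk = Xk - np.outer(t, p)         # deflate X
        yk = yk - qk * t                 # deflate y
        W.append(w); P.append(p); q.append(qk)
    if not W:
        return np.zeros(X.shape[1]), x_mean, y_mean
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    coef = W @ np.linalg.solve(P.T @ W, q)
    return coef, x_mean, y_mean

def loo_mse(X, y, n_components):
    """Leave-one-out mean squared prediction error."""
    X, y = np.asarray(X, dtype=float), np.asarray(y, dtype=float)
    errors = []
    for i in range(len(y)):
        keep = np.arange(len(y)) != i    # hold out sample i
        coef, x_mean, y_mean = pls1_fit(X[keep], y[keep], n_components)
        pred = (X[i] - x_mean) @ coef + y_mean
        errors.append((pred - y[i]) ** 2)
    return float(np.mean(errors))

rng = np.random.default_rng(2)
X = rng.normal(size=(15, 5))             # synthetic attributes
y = X @ np.array([0.4, -0.2, 0.1, 0.3, 0.0]) + 0.05 * rng.normal(size=15)

mse_by_k = {k: loo_mse(X, y, k) for k in range(1, 6)}
best_k = min(mse_by_k, key=mse_by_k.get)  # component count with lowest LOO error
```

With as many components as there are independent attributes, this PLS fit coincides with ordinary least squares; fewer components trade some fit for stability, which is why the count is chosen by cross-validation.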