Introduction

The mapping of material properties, pioneered by Ashby and coworkers, leads to the creation of charts that condense a large quantity of information into a useful representation that has the virtue of revealing important property correlations and that facilitates materials selection and design.1,2 These (usually two-dimensional) charts form the basis of an optimized, systematic methodology for materials selection based on materials informatics. In particular, this program involves the identification of technical design requirements and the associated materials indices used for materials selection to meet these requirements.3 The utility of material property charts is evidenced by their centrality in the materials selection process and, more specifically, their ubiquity in engineering design courses.

The correlations between materials properties have been expressed in terms of generic rules-of-thumb and also couched more formally in terms of constraints on dimensionless property groups.4 In many cases, however, materials property correlations or design specifications are intrinsically highly multidimensional, and one therefore naturally seeks a representation that conveys the underlying complexity of these relationships. For example, mechanical properties such as stiffness and strength, as described by the Young’s modulus and the ultimate tensile strength, respectively, may depend on the magnitude of interatomic interactions and the density, as described by the melting temperature and the mass density, respectively. This situation may be further complicated by the presence of a confounding variable, such as a particular materials property, that can lead one to ascribe spurious associations between other properties.5 In addition, for a given implementation, materials selection is often subject to practical, multidimensional constraints (e.g., on specimen dimensions, mechanical properties, cost, etc.) that are dictated, for example, by a fabricated component’s function.

The visualization and analysis of complex, high-dimensional materials data present several challenges. First, information on many axes must be conveyed in a digestible, graphical format permitting one to identify relationships among variables. (In some cases, it is possible to apply dimensional reduction techniques to find an approximate representation of the data.6) As the data may reflect different physical properties spanning a wide range of values, a proper normalization is required to make meaningful comparisons. Moreover, it is necessary to define metrics that quantify trends and provide a means to interpret the data. For these reasons, as noted by Bryden et al., despite the implementation of visualization tools for multidimensional data analysis in fields such as computer science7 and genetics,8 the exploitation of such tools to interpret materials data is still in its infancy.9

One especially useful tool for multidimensional visualization is parallel coordinates. In a parallel-coordinate representation, as developed by Inselberg,10,11,12 a set of d parallel axes replaces the conventional d-dimensional orthogonal Cartesian axes such that each point has a unique representation. In particular, a point given by the coordinates \((c_1, c_2, c_3, \ldots, c_d)\) in a d-dimensional Cartesian system is represented in parallel coordinates by locating each coordinate on its respective (parallel) axis and then joining these points to form a polyline. As an example, consider the point (0, 3, 1) in Cartesian coordinates. Its representation in parallel coordinates is given in Fig. 1a. More generally, it has been shown that mathematical objects may be readily mapped from Cartesian to parallel coordinates, and that the latter coordinate system is well-suited to multivariate statistical analysis.12 Another example of an important duality is presented in Fig. 1b, which shows that lines in a Cartesian coordinate system map into (polyline intersection) points in parallel coordinates.

Fig. 1

a The point (0, 3, 1) (in Cartesian coordinates) represented in parallel coordinates. Note that the intercepts on the axes are joined by line segments to make a polyline. b An illustration of the point-line duality. A line (shown in blue) with a negative slope m, namely y = 5 − x, in Cartesian coordinates maps into a point in parallel coordinates, the point lying at the intersection of polyline segments. A line (shown in red) with a positive slope m, namely y = (x/2) + 1, also maps into an (extrapolated) intersection point, now to the right of the rightmost axis
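The construction in Fig. 1a can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `to_polyline` and the assumption of unit spacing between adjacent axes are ours, not part of the original work.

```python
def to_polyline(point, spacing=1.0):
    """Return the polyline vertices for one d-dimensional point.

    Each vertex is (horizontal position of the k-th parallel axis, coordinate value).
    Unit spacing between axes is an assumption made for illustration.
    """
    return [(k * spacing, value) for k, value in enumerate(point)]


# The point (0, 3, 1) from Fig. 1a becomes a polyline with three vertices.
vertices = to_polyline((0, 3, 1))
```

Joining consecutive vertices with straight segments gives the polyline that represents the point.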

In this work, we illustrate the utility of combining the methods of data analytics with parallel coordinates to represent and interrogate high-dimensional materials property data. While parallel-coordinate plots have been employed in other science and engineering contexts, including Internet attack detection, process monitoring and analysis of climate data, their application to materials science problems has been quite limited.13,14,15 There are, however, several recent noteworthy papers in this area, including the work by Bryden et al.9 on visualization, Gorai et al.16 on the development of tools for thermoelectric materials design and Kamath et al.17 on the analysis of additively manufactured parts. Our focus here is on the construction and systematic analysis of multidimensional materials properties charts for metallic and ceramic systems using the parallel-coordinate framework. For clarity and to illustrate the main concepts, we have focused here on these two materials classes; the same approach can, of course, be applied to other materials classes (e.g., polymers, electronic materials) with the same benefits. As will be seen below, when combined with data analytics, this approach constitutes a powerful tool for identifying important property relationships that can guide materials selection.

Table 1 A property table summarizing the thermomechanical properties considered here, along with their reference values for nickel

Results

Parallel coordinates and correlations

To illustrate the construction and analysis of multidimensional materials property charts, we consider two distinct materials classes, namely elemental metals and (mostly) technical ceramics, each comprising 25 systems. The thermomechanical properties that will be examined are listed in Table 1. Also shown in the table are the corresponding values for a reference system, chosen for convenience to be nickel, that provides a useful basis for comparison. These reference values (denoted with the subscript ‘0’) will be used to normalize property values to create dimensionless variables that will be denoted with a prime (e.g., H′ = H/H0).

As a first illustration of a multidimensional property chart, Fig. 2 displays a parallel-coordinate chart showing the normalized values of the p = 7 properties listed in the table for the n = 25 elemental metals. It should be noted that the order of the parallel axes was chosen here to highlight important pairwise correlations. While there is inherent clutter in this presentation (an issue to be addressed below), some information can be gleaned upon inspection. For example, given the behavior illustrated in Fig. 2, it is evident that there is some degree of positive correlation between the pairs E′ and \(T_m^\prime\), H′ and \(T_m^\prime\) and H′ and UTS′.

Fig. 2

A parallel-coordinate chart that displays the values of the normalized thermomechanical properties for the 25 elemental metals listed in the property table

It is of interest to explore these correlations in more detail. The dependence of the stiffness and hardness on the melting temperature follows intuitively from the fact that the melting temperature is a measure of bond strength. One can ask whether, given their association with \(T_m^\prime\), E′ and H′ are correlated and whether this correlation is direct. Several authors have explored this relationship using indentation measurements, finding that in many, but not all, cases E′ and H′ are positively correlated.18,19 From the metals data shown in Fig. 2 one finds that the correlation coefficient between E and H is \(\rho _{EH} = 0.78\). Performing a hypothesis test, one can conclude that since \(\rho _{EH} > \rho _{crit}\) (where \(\rho _{crit} = 0.396\) is the critical value) for n − 2 degrees of freedom, one can reject the null hypothesis that E and H are uncorrelated at the 5% level of significance.20 Thus, there is some significance to this property association. To determine the degree of correlation between stiffness and hardness after the influence of the melting temperature is eliminated, one calculates the partial correlation coefficient21

$$\rho _{EH \cdot T_m} = \frac{{\rho _{EH} - \rho _{ET_m}\rho _{T_mH}}}{{\sqrt {1 - \rho _{ET_m}^2} \sqrt {1 - \rho _{T_mH}^2} }},$$
(1)

where the subscripts denote the properties. One finds that \(\rho _{EH \cdot T_m} = 0.29\). Assessing the significance of this coefficient with a hypothesis test, one finds a critical value of 0.488 at the 5% level of significance,20 implying that there may, in fact, be no direct correlation between E and H when controlling for other factors, such as the melting temperature.
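As a minimal sketch of how Eq. (1) can be evaluated, the following Python fragment computes Pearson coefficients and the first-order partial correlation. The function names and the short data lists are illustrative placeholders, not the property data analyzed here.

```python
import math


def pearson(x, y):
    """Sample Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def partial_correlation(r_eh, r_et, r_th):
    """Eq. (1): correlation between E and H with the influence of T_m removed."""
    return (r_eh - r_et * r_th) / (math.sqrt(1 - r_et**2) * math.sqrt(1 - r_th**2))


# Placeholder values (NOT the metals data of Fig. 2), for illustration only.
T = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
E = [1.1, 1.9, 3.2, 3.9, 5.1, 6.0]
H = [0.9, 2.1, 2.9, 4.2, 4.8, 6.1]

r_partial = partial_correlation(pearson(E, H), pearson(E, T), pearson(T, H))
```

When both E and H track the controlling variable closely, the partial coefficient can be much smaller than the raw coefficient, which is the situation described above.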

Property trends and cluster validation

We now make a comparison of properties across materials classes. Figure 3a displays a parallel-coordinate chart showing the normalized values of the p = 7 properties listed in the table for the n = 25 elemental metals and the n = 25 (mostly) technical ceramics. Given the clutter in this figure, it is useful to construct a somewhat simpler representation to highlight important property trends. For each class of materials, one convenient and robust measure of centrality, especially in higher dimensions, is the geometric median, \(\tilde x\). In this context, the geometric median of the property points is defined such that

$$\tilde x = \mathop {{\mathrm{argmin}}}\limits_x \mathop {\sum}\limits_{i = 1}^n d\left( {x,x_i} \right),$$
(2)

where \(d(x,x_i)\) denotes the Euclidean distance between points x and \(x_i\).22 (We note that the use of other metrics here is also possible.) These median values are shown in Fig. 3b. Several trends are evident upon inspection of these charts. As a class, the selected ceramics have lower densities, lower coefficients of thermal expansion and higher melting temperatures than the selected metals, as expected. Given that melting temperature is a measure of bond strength, it is not surprising that these ceramics are, in general, harder than the metals.
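The geometric median has no closed-form expression in general and is usually found iteratively. The sketch below uses Weiszfeld's algorithm, a standard choice that we assume here for illustration; the text does not specify which solver was used, and the function names are ours.

```python
import math


def total_distance(x, points):
    """Objective of Eq. (2): sum of Euclidean distances from x to all points."""
    return sum(math.dist(x, p) for p in points)


def geometric_median(points, tol=1e-9, max_iter=1000):
    """Approximate the geometric median with Weiszfeld's fixed-point iteration."""
    dim = len(points[0])
    # Start from the centroid (arithmetic mean) of the points.
    x = [sum(p[k] for p in points) / len(points) for k in range(dim)]
    for _ in range(max_iter):
        numerator = [0.0] * dim
        denominator = 0.0
        for p in points:
            d = math.dist(x, p)
            if d < 1e-12:
                # Simplification: skip a point that coincides with the estimate
                # to avoid division by zero.
                continue
            weight = 1.0 / d
            denominator += weight
            for k in range(dim):
                numerator[k] += weight * p[k]
        if denominator == 0.0:
            return x
        new_x = [v / denominator for v in numerator]
        if math.dist(new_x, x) < tol:
            return new_x
        x = new_x
    return x
```

Applied separately to the normalized property points of each class, a routine of this kind yields one representative polyline per class, as in Fig. 3b.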

Fig. 3

a A parallel-coordinate chart showing the normalized values of the p properties listed in the table for the n = 25 elemental metals (blue) and the n = 25 (mostly) technical ceramics (red). b The geometric median of properties for both the metals and ceramics. This analysis removes much of the clutter seen in a

With this division into materials classes, one can determine the degree to which these classes are distinct. From a physical point of view, the nature of the interatomic bonding in these two classes is, of course, different and this difference leads to distinct properties. From a data analytics perspective, one can ask whether this expected difference in properties leads to a sensible division of the data into clusters. To make this determination, one can employ clustering metrics known as validation measures that are used in data analytics to measure the quality of a cluster analysis. Two such metrics are particularly useful in this context, namely the Thornton separability index,23 τ, and the Dunn index,24,25 Δ. To define these metrics, suppose that one wishes to compare two point clusters (e.g., corresponding to metals and ceramics), \(C_m\) and \(C_c\), that comprise a distinct partitioning, C, of a set of points. Then, if there are N total points, \(x_i\) is the i-th point and \(x_{i^{\prime}}\) is its nearest neighbor,

$${\mathrm{\Delta }} = \frac{{{{\rm min}}_{x_i \in C_m,x_j \in C_c}d\left( {x_i,x_j} \right)}}{{{{\rm max}}_{x_s,x_t \in C_k;k:C_k \in C}d\left( {x_s,x_t} \right)}},$$
(3)

and

$$\tau = \frac{{\mathop {\sum}\limits_{i = 1}^N \left[ {\left( {f\left( {x_i} \right) + f\left( {x_{i^{\prime}}} \right) + 1} \right)\bmod 2} \right]}}{N},$$
(4)

where the indicator function \(f(x_i) = 0\) if \(x_i \in C_m\) and 1 otherwise. The Dunn index is then the ratio of the minimum intercluster distance to the maximum intracluster distance, with larger values signifying greater separability, while the Thornton index is the fraction of points whose nearest neighbor is in the same (materials) class.26 Evidently, 0 ≤ τ ≤ 1, with values closer to 1 indicating well-separated clusters, those closer to 0.5 indicating little to no clustering and those close to 0 indicating a strong inverse correlation between classes (e.g., an alternating grid of points).
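Both indices follow directly from Eqs. (3) and (4). The Python sketch below is a brute-force illustration for two clusters; the function names are ours, and class labels are assumed to be encoded as 0 and 1 to match the indicator function f.

```python
import math
from itertools import combinations


def dunn_index(cluster_a, cluster_b):
    """Eq. (3): minimum intercluster distance over maximum intracluster distance.

    At least one cluster must contain two or more points.
    """
    min_inter = min(math.dist(a, b) for a in cluster_a for b in cluster_b)
    max_intra = max(
        math.dist(s, t)
        for cluster in (cluster_a, cluster_b)
        for s, t in combinations(cluster, 2)
    )
    return min_inter / max_intra


def thornton_index(points, labels):
    """Eq. (4): fraction of points whose nearest neighbor has the same label (0 or 1)."""
    same = 0
    for i, p in enumerate(points):
        others = (k for k in range(len(points)) if k != i)
        nearest = min(others, key=lambda k: math.dist(p, points[k]))
        # (f_i + f_i' + 1) mod 2 equals 1 when the labels agree and 0 otherwise.
        same += (labels[i] + labels[nearest] + 1) % 2
    return same / len(points)
```

For two tight, well-separated groups both indices are large; for an alternating arrangement of the two classes along a line the Thornton index drops to 0.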

For these data one finds that Δ = 0.10 and τ = 0.82. The large value of τ indicates that points corresponding to the same type of material have a relatively high degree of association. However, the relatively small value of Δ implies that the material clusters may be anisotropically dispersed such that clusters are in proximity in some direction in the multidimensional space. Thus, there is a moderate degree of separation between these two materials classes, suggesting some distinction between these two types of materials. It is of interest to compare these results with those obtained by a data-mining clustering algorithm, namely k-means clustering.27,28 This unsupervised learning methodology assigns points to clusters such that (property) space is divided into Voronoi cells. When applied to partition the data here into two clusters, the resulting clusters differ somewhat from those based on materials class. To highlight these differences, one can calculate the purity index, π (0 ≤ π ≤ 1), for the k-means clusters, a measure of the fraction of data points that have been correctly classified by a clustering scheme.29 In this case, π = 0.66, indicating that the k-means classification reflects some of the underlying physics. Evidently, the proximity of the property class clusters inhibits a clear spatial partitioning based on material properties.
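The purity index can be computed from any set of cluster assignments once the true classes are known. The sketch below assumes the common definition of purity (each cluster is credited with its majority class); the function name and the labels shown are hypothetical and are not the k-means output reported above.

```python
from collections import Counter


def purity(cluster_ids, class_labels):
    """Fraction of points that belong to the majority class of their cluster."""
    counts_by_cluster = {}
    for cluster_id, label in zip(cluster_ids, class_labels):
        counts_by_cluster.setdefault(cluster_id, Counter())[label] += 1
    majority_total = sum(max(c.values()) for c in counts_by_cluster.values())
    return majority_total / len(class_labels)


# Hypothetical assignments for six points (illustration only).
clusters = [0, 0, 0, 1, 1, 1]
classes = ["metal", "metal", "ceramic", "ceramic", "ceramic", "metal"]
pi_value = purity(clusters, classes)
```

With two classes of equal size, a purity near 0.5 indicates that the clusters carry essentially no class information, whereas a value of 1 indicates perfect agreement.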

Dimensional reduction

Parallel coordinates are also useful in displaying the impact of dimensional reduction strategies. With these strategies one seeks an approximate representation of the data in fewer dimensions.6 In this case, we apply a principal-component analysis (PCA)30,31 to effect this reduction. In brief, if we form n × 1 column vectors \(\vec x_j\left( {j = 1,2, \cdots ,p} \right)\) of observations of the p properties and assemble these column vectors to form an n × p data matrix, then we can readily construct the corresponding p × p covariance matrix Σ having an eigenvalue spectrum \(\left\{ {\lambda _i} \right\}\) (i = 1, 2, …, p) and whose eigenvectors, \(\vec v_i\) (i = 1, 2, …, p), form an orthonormal basis.32 In a PCA one projects the \(\vec x_j\) onto a subspace of dimension k < p by constructing a projection operator from the k eigenvectors of Σ having the largest corresponding eigenvalues.

The number of principal components required for a satisfactory representation of the data can be determined by using the rule of thumb that a component j is retained if \(\lambda _j >\mathop {\sum}\nolimits_{i = 1}^p \lambda _i{\mathrm{/}}p\).32 Although a two-dimensional representation is satisfactory in this case, one additional principal component was retained to explain the data more completely. Accordingly, a PCA with k = 3 was applied to the ceramic property information shown in Fig. 3 to reconstruct the data in the relevant subspace. These three components account for 87% of the total variance. In Fig. 4a a parallel-coordinate plot shows the fractional difference between the original and the reconstructed data from the PCA. In this format it is straightforward to determine the range of validity of the reconstruction. For this purpose horizontal lines show fractional differences of ±0.25. Figure 4b shows the data plotted in the k = 3 subspace determined by the PCA. Also shown is the ellipsoid that captures much of the behavior and is given by \(\left( {v_1{\mathrm{/}}\lambda _1} \right)^2 + \left( {v_2{\mathrm{/}}\lambda _2} \right)^2 + \left( {v_3{\mathrm{/}}\lambda _3} \right)^2 = 1\).
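The reduction and reconstruction steps described above can be sketched with NumPy as follows. This is an illustrative implementation under our own assumptions: the function name is hypothetical, the data matrix is random placeholder data rather than the ceramic properties, and the reconstruction adds back the column means.

```python
import numpy as np


def pca_reconstruct(X, k=None):
    """Project the rows of X onto the top-k principal axes and map them back."""
    mu = X.mean(axis=0)
    Xc = X - mu
    cov = np.cov(Xc, rowvar=False)              # p x p covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)      # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]           # re-sort in descending order
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    if k is None:
        # Rule of thumb: retain components whose eigenvalue exceeds the mean eigenvalue.
        k = int(np.sum(eigvals > eigvals.mean()))
    V = eigvecs[:, :k]
    scores = Xc @ V                             # coordinates in the k-dimensional subspace
    X_hat = scores @ V.T + mu                   # reconstruction in the original space
    explained = eigvals[:k].sum() / eigvals.sum()
    return X_hat, scores, explained, k


# Placeholder data: n = 25 systems, p = 7 positive normalized properties.
rng = np.random.default_rng(0)
X = rng.uniform(0.5, 2.0, size=(25, 7))
X_hat, scores, explained, k = pca_reconstruct(X, k=3)
frac_diff = (X - X_hat) / X                     # quantity of the kind plotted in Fig. 4a
```

Each row of `frac_diff` corresponds to one polyline in a plot such as Fig. 4a, and `explained` is the fraction of the total variance captured by the retained components.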

Fig. 4

a A parallel-coordinate plot of the fractional difference between the original (from Fig. 3) and the reconstructed ceramic property data from the PCA (red). For comparison, horizontal lines (green) show fractional differences of ±0.25. b The property data for ceramics plotted in the k = 3 subspace obtained from the PCA. The ellipsoid with semi-major axes obtained from the PCA eigenvalues is also shown

Visualization of class envelopes

One may also wish to visualize the outer envelopes of materials classes to determine their relative hypervolumes in the materials property space. Parallel coordinates provide a powerful framework for this purpose. One approach to the construction of these envelopes is to determine the convex hulls associated with the respective collection of material points. Alternatively, one can use the results of a PCA to obtain an approximate, hyperellipsoidal envelope surrounding the points. More specifically, employing this latter option, one takes the envelope at position \(\vec u\) to be the quadratic form \(g\left( {\vec u} \right) = \left( {\vec u - \vec \mu } \right)^T{\mathrm{\Sigma }}^{ - 1}\left( {\vec u - \vec \mu } \right)\), where \(\vec \mu\) is the vector of mean values for properties and Σ−1 is the (inverse of the) aforementioned covariance matrix. The associated p-dimensional hypervolume is

$$V = \frac{{2\pi ^{p/2}}}{{p{\mathrm{\Gamma }}\left[ {p{\mathrm{/}}2} \right]}}\left| \Sigma \right|^{1/2},$$
(5)

where Γ denotes the gamma function33 and \(\left| \cdot \right|\) denotes a determinant.
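Equation (5) is straightforward to evaluate once the eigenvalues of Σ are known, since the determinant is their product. The following Python sketch uses that identity; the function name is ours and the eigenvalues in the example are arbitrary placeholders.

```python
import math


def ellipsoid_hypervolume(cov_eigenvalues):
    """Eq. (5): hypervolume of the envelope defined by a covariance matrix.

    The determinant |Sigma| is computed as the product of the eigenvalues of Sigma.
    """
    p = len(cov_eigenvalues)
    determinant = math.prod(cov_eigenvalues)
    unit_ball = 2.0 * math.pi ** (p / 2) / (p * math.gamma(p / 2))
    return unit_ball * math.sqrt(determinant)


# Placeholder spectra for two classes (illustration only).
volume_ratio = ellipsoid_hypervolume([4.0, 1.0, 1.0]) / ellipsoid_hypervolume([1.0, 1.0, 1.0])
```

For p = 2 and p = 3 with unit eigenvalues the prefactor reduces to the familiar area π of the unit disk and volume 4π/3 of the unit sphere, which provides a quick check of the formula.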

Figure 5 shows the property envelope for the ceramic class in the principal axis frame. The overall shape of the envelope can be rationalized by noting that there is a mapping duality between an ellipse in d-dimensional Cartesian coordinates and a hyperbola in parallel coordinates.34 The doubly serrated shape of the envelope is a succession of hyperbolas that highlights the unequal lengths of the principal axes. A similar envelope can be constructed for the metals class. Upon comparing their corresponding volumes using Eq. (5), one finds that Vmetal/Vceram = 1.95, indicating that the metals class occupies a larger fraction of the multidimensional property space.

Fig. 5

The property envelope for the ceramic class in the principal axis frame in parallel coordinates

Materials selection

Finally, we examine the application of parallel-coordinate charts to facilitate materials selection. The selection process typically involves the imposition of constraints among properties. Given the aforementioned duality relations between multidimensional Cartesian coordinates and parallel coordinates, one can describe these constraints in the latter representation. For example, for the points on the line \(y_i(x) = m_ix + b_i\), the corresponding polylines in parallel coordinates intersect at the point \(\left( {1{\mathrm{/}}\left( {1 - m_i} \right),b_i{\mathrm{/}}\left( {1 - m_i} \right)} \right)\), assuming a unit spacing between parallel axes.11 Thus, for points constrained to be between the lines \(y_1\) and \(y_2\) (with, say, \(m_2 > m_1 > 0\)), the corresponding polylines pass through a ‘constraint’ trapezoid with bases \(\left| {b_2 - b_1} \right|{\mathrm{/}}\left( {1 - m_i} \right)\left( {i = 1,2} \right)\) and height \(\left( {m_2 - m_1} \right){\mathrm{/}}\left[ {\left( {1 - m_1} \right)\left( {1 - m_2} \right)} \right]\) in parallel coordinates.
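The point-line duality quoted above can be checked numerically. In the sketch below, which assumes unit spacing between two adjacent axes, a Cartesian point (x, y) is drawn as the segment joining (0, x) on the left axis to (1, y) on the right axis; the function names are ours.

```python
def dual_point(m, b):
    """Parallel-coordinate image of the line y = m*x + b (unit axis spacing, m != 1)."""
    if m == 1:
        raise ValueError("lines with unit slope map to a point at infinity")
    return (1.0 / (1.0 - m), b / (1.0 - m))


def segment_intersection(x1, x2, m, b):
    """Intersect the extended segments representing two points on y = m*x + b."""
    y1, y2 = m * x1 + b, m * x2 + b
    slope1, slope2 = y1 - x1, y2 - x2      # segment slopes for unit axis spacing
    t = (x2 - x1) / (slope1 - slope2)      # horizontal position of the crossing
    return (t, x1 + slope1 * t)


# Lines from Fig. 1b: y = 5 - x and y = x/2 + 1.
blue = segment_intersection(0.0, 1.0, -1.0, 5.0)
red = segment_intersection(0.0, 2.0, 0.5, 1.0)
```

For the negative-slope line the crossing lies between the two axes, whereas for the positive-slope line (0 < m < 1) it lies to the right of the rightmost axis, consistent with Fig. 1b.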

To illustrate the application of this approach, consider a materials selection problem in which one wishes to identify a metal for a relatively light structure that is both stiff and strong and that meets cost specifications. For example, one may seek a metal having a (normalized) specific stiffness E′/ρ′ > 1.4, a (normalized) specific strength UTS′/ρ′ > 1.0 and a normalized cost Cost′ < 2.0. For these requirements, the corresponding ‘constraint’ trapezoids collapse to line segments along the abscissa. Thus, for the stiffness relation, one looks for a polyline segment with a positive slope that, upon extrapolation, intercepts the abscissa between 1/(1 − 1.4) = −2.5 and 0. Similarly, the strength relation leads to a polyline segment with a positive slope that intercepts the abscissa between 0 and 1.0. Figure 6 shows a truncated parallel coordinates plot focusing on three properties with a few candidate materials that satisfy at least some of these constraints. The white segments (i.e., the collapsed trapezoids) on the plot highlight regions that satisfy the constraints. In this plot beryllium is shown in red, molybdenum and chromium in green and titanium and aluminum in blue. Also shown are extrapolated polyline segments for molybdenum and chromium (dashed green) and beryllium (dashed red) that are seen to intercept the abscissa between −2.5 and 0. By contrast, the polyline segments for titanium and aluminum (dashed blue) do not satisfy this constraint. It should also be noted that only beryllium satisfies the strength constraint, as indicated by the extrapolated polyline (dashed red). Finally, the selected metals, with the exception of beryllium, satisfy the cost constraint, and therefore no material satisfies all three constraints.
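The stiffness criterion can be illustrated with a short Python sketch. It assumes, for illustration, that the density axis sits at horizontal position 0 and the stiffness axis at position 1, and that both normalized values are positive; the function names and the numerical inputs are hypothetical and do not correspond to any of the metals in Fig. 6.

```python
def abscissa_intercept(left_value, right_value):
    """Horizontal position where the extrapolated segment crosses zero height.

    The segment joins (0, left_value) to (1, right_value).
    """
    if left_value == right_value:
        raise ValueError("a horizontal segment never crosses the abscissa")
    return left_value / (left_value - right_value)


def satisfies_ratio_bound(left_value, right_value, min_ratio):
    """True if right/left > min_ratio (> 1), tested via the abscissa intercept."""
    lower_limit = 1.0 / (1.0 - min_ratio)      # e.g. 1/(1 - 1.4) = -2.5
    return lower_limit < abscissa_intercept(left_value, right_value) < 0.0


# Hypothetical normalized (density, stiffness) pairs.
passes = satisfies_ratio_bound(1.0, 2.0, 1.4)   # ratio 2.0, intercept at -1.0
fails = satisfies_ratio_bound(1.0, 1.2, 1.4)    # ratio 1.2, intercept near -5.0
```

Because the intercept equals 1/(1 − E′/ρ′), testing whether it falls in the highlighted interval is equivalent to testing the ratio constraint directly; the graphical form simply makes the test visible on the chart.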

Fig. 6

A truncated parallel coordinates plot focusing on three properties, namely the (normalized) stiffness, strength and cost, with a few candidate materials that satisfy at least some of the stated constraints. Beryllium is shown in red, molybdenum and chromium in green and titanium and aluminum in blue. The gray regions identify constraints for the polyline segments. The extrapolated polyline segments (dashed lines) show which materials satisfy the design constraints

Discussion

In this work, we illustrated the utility of combining the methods of data analytics with a parallel coordinates representation to construct and interpret multidimensional materials property charts. This construction, along with associated materials analytics, permits the identification of important property correlations, quantifies the role of property clustering, highlights the efficacy of dimensional reduction strategies, provides a framework for the visualization of materials class envelopes and facilitates materials selection by displaying multidimensional property constraints. Given these capabilities, this approach constitutes a powerful tool for exploring complex property interrelationships that can guide materials selection.

For reference, the important aspects of this analysis are summarized below.

  • The use of cluster validation to identify cluster centrality (via the geometric median) and remove clutter, the bane of parallel-coordinate plots (Fig. 3a, b). This analysis facilitates a discussion of property trends in metallic versus ceramic systems.

  • The use of the Thornton and Dunn metrics to characterize the quality of clusters. As discussed in the subsection entitled Property Trends and Cluster Validation, these metrics allow one to quantify the degree to which there are separable clusters based on materials properties. In this way, one can quantify, for example, the extent to which metals and ceramics differ in terms of their attributes.

  • The use of the purity index to quantify the correctness of property-point clustering. In this context, as noted above, the value of the index indicates that the proximity of the property class clusters inhibits a clear spatial partitioning based on material properties. Thus, the purity index complements the Thornton and Dunn metrics and gives one a quantitative measure of the degree to which materials classes are distinct.

  • The visualization of property envelopes in a high-dimensional space (Fig. 5) exploits the geometric duality between ellipsoids in Cartesian space and hyperboloids in parallel-coordinate space.11 In this case, parallel coordinates permit one to infer important directions (i.e., principal axes) of the hyperellipsoid that defines the property envelope in Cartesian space and thereby extend the Ashby property envelope to higher dimensions.

  • The use of parallel coordinates in combination with principal component analysis (PCA) enables one to quantify readily the degree of dimensional reduction achieved with PCA (Fig. 4a,b). In this way, one can determine whether a description of a materials class by a small number of linear property combinations is accurate.

While for the purposes of illustration and clarity we have focused here on two materials classes, the same approach outlined above can, of course, be applied to other materials classes (e.g., polymers, electronic materials) with the same benefits. It is especially well-suited to situations in which there are many coupled property variables and/or constraints, and one wishes to quantify systematically these relationships while navigating through a high-dimensional property space. Generalizations of this approach are also possible. For example, one may extend the representation and associated analysis to spatially heterogeneous systems with attendant property variations via position-dependent parallel coordinates. These extensions and other applications of this methodology are currently under investigation.

Methods

We employ here data analytics in conjunction with parallel coordinates, a technique for visualizing and interpreting multidimensional data. The data analytics methods used in our analysis include: principal component analysis (PCA),32 the Thornton separability23 and the Dunn clustering indices24,25 and centrality measures.22 The parallel-coordinate representation11 of the data enables one to identify trends and material property envelopes in high-dimensional systems.

Data availability

The author will make available, upon request, the data used in the applications described in this work. It is understood that the data provided will not be for commercial use.