Predicting temperature-dependent ultimate strengths of body-centered-cubic (BCC) high-entropy alloys

This paper presents a bilinear log model for predicting the temperature-dependent ultimate strength of high-entropy alloys (HEAs), based on 21 HEA compositions. We consider the break temperature, T_break, introduced in the model, an important parameter for the design of materials with attractive high-temperature properties, one warranting inclusion in alloy specifications. For reliable operation, the operating temperature of alloys may need to stay below T_break. We introduce a technique of global optimization, one enabling concurrent optimization of model parameters over low-temperature and high-temperature regimes. Furthermore, we suggest a general framework for joint optimization of alloy properties, capable of accounting for physics-based dependencies, and show how a special case can be formulated to address the identification of HEAs offering attractive ultimate strength. We advocate for the selection of an optimization technique suitable for the problem at hand and the data available, and for properly accounting for the underlying sources of variations.


INTRODUCTION
Metallic structural materials with excellent mechanical properties have been widely used in a variety of operating conditions and are often applied under constant or static loads. Engineering components under either loading condition are usually required to exhibit high strength. Thus, it is important to be able to design advanced materials with favorable strength properties. High-entropy alloys (HEAs) have drawn great attention in the recent decade due to their excellent mechanical properties and vast compositional space, which makes them suitable for this purpose 1. A key objective is to suggest a framework for joint optimization of mechanical properties, to introduce (in context with such a framework) compositions of HEAs yielding high ultimate strengths (USs), and to conduct experimental verification of our findings. Figure 1 outlines the multiple sources that impact the mechanical properties of HEAs and highlights the dependence between them. It is worth noting that improvements in the US may come at the expense of other properties (hence, the framework for joint optimization). For example, there usually is a trade-off between the ductility and the strength of alloys. Sources of variations in the US may involve differences in compositions, microstructures, parameters of post-fabrication processes, or defect levels.
Data analytics and machine learning (ML) can help with rapid screening, i.e., expedite identification of HEAs exhibiting given properties of interest 17 . But as opposed to specifically applying ML, (narrowly) defined in terms of single-layer or multi-layer neural networks 17 , Bayesian graphical models, support vector machines, or decision trees, to identification of HEA compositions of interest, we reformulate the task in the broader context of engineering optimization. We recommend picking an optimization technique suitable for the application at hand and data available. But we certainly include ML in the consideration. For background material on ML, refer to 17 .
Effective application of ML may require a large number of data points. If such data are available, ML can help organize the data in a meaningful fashion and extract complex, hidden relationships 17. But in the case of experimental data on HEA compositions with attractive strength properties (the present state of affairs), we are working in a domain of relatively limited data, a domain where traditional ML may exhibit limitations. Producing high-quality experimental data is usually both time consuming and expensive. In case of such limited data, it is essential to make the most of the underlying physics, i.e., to account for underlying physical dependencies, in the prediction model. Occam's razor and Bayesian learning provide tools for quantifying the notion of limited data in this context 17. Our approach is in part based on observations of Agrawal et al. 18. Table 2 and Figure 5 of 18 illustrate that there is at most a difference of a few percent between the techniques applied to predict the fatigue strength of stainless steels. Table 2 of 18 shows that both simple linear regression and pace regression yield a coefficient of determination, R², of 0.963, while an artificial neural network, a traditional ML technique, results in an R² of 0.972.
In terms of an important contribution, this study presents a method capable of yielding consistency among the predictions of HEA compositions with attractive US, empirical rules of thermodynamics 19,20 , and experimental results. We accomplish this goal, despite relatively limited data available, and the corresponding selection of a simple prediction algorithm (multi-variate regression).
Then, we present in Fig. 2 elements of a physics-based model for predicting the US, a model that accounts for physical dependencies as a priori information.
But more importantly, we introduce a bilinear log model for predicting USs across temperatures. The model consists of separate exponentials, for a low-temperature and a high-temperature regime, with a break temperature, T_break, in between. The model accounts for the underlying physics, in particular diffusion processes required to initiate phase transformations in the high-temperature regime 21. Furthermore, we show how piecewise linear regression can be employed to extend the model beyond two exponentials and yield an accurate fit, in case of a non-convex objective function caused by hump(s) in the data. Previous models for the temperature dependence of yield strengths (YSs) only accounted for a single exponential 22,23. Hence, there was no break temperature, T_break. We consider the break point critical for the optimization of the high-temperature properties of alloys. For reliable operation, the temperature of turbine blades made out of refractory alloys may need to stay below T_break. Once above T_break, materials can lose strength rapidly due to rapid diffusion, leading to easy dislocation motion and dissolution of strengthening phases 21. We consider T_break an important parameter for the design of materials with attractive high-temperature properties, one warranting inclusion in alloy specifications. Hence, it is important to accurately estimate T_break, e.g., using the global optimization approach presented. Figure 26 of 17 summarizes the rationale for initially restricting the analysis to room-temperature data. As illustrated in the figure, the US exhibits significant dependence on the temperature. While all compositions in Figure 26 of 17 contained a body-centered-cubic (BCC) phase, and were subjected to some type of annealing, the US at 1000°C can be ~1/8th (~12%) to ~1/3rd (~33%) of the US at room temperature.
With this fact in mind, and to maintain consistency across compositions, we elected to first apply the optimization framework to US at room temperature only.

Room temperature
Our original data set, listed in Table 13 of 17 , contains some 24 compositions that yield relatively high US at room temperature.
To accommodate the elements involved, we derived two feature sets, hereafter referred to as A and B, from the original data set. We have available 19 instances of Feature Vector A, and 22 of Feature Vector B. While the set of input data may seem small, we will show that the data suffice for meaningful prediction, provided that a suitable optimization technique is selected 17,24.
In terms of data curation, we concluded that the US values, except for MoNbTiVxZr (from 25), were recorded with high enough fidelity to warrant inclusion. For revised US measurements for MoNbTiV0.75Zr, refer to Table 1. To catch suspicious recordings of the US, one can employ proportionality relationships with the YS as a guideline. At least about half of the references associated an uncertainty interval with the US values reported, with ΔUS usually within 1% of the US reported.
In order to develop insight into the causes of variations in the US for the pure elements comprising Feature Vectors A and B, and for the identification of a model for predicting compositions yielding high US, we point to Figures 24 and 25 of 17. Figure 24 of 17 shows that processing conditions and purity can contribute to variations of ~3x in the US of Al and of ~4x in the US of Co. Figure 25 of 17 similarly illustrates that processing methods can have significant influence on the US of V and Cr. For V, the variations in the US are ~2x, and for Cr, we are looking at variations of up to ~5x. This trend suggests that the inputs listed in Eq. (22) are indeed able to account for the variations in the US observed. But for a relative comparison of US across compositions, for the same heat-treatment process and defect levels, and at a fixed (room) temperature, the model may suffice. There may be additional sources of variations, such as the test mode applied. But according to our tentative analysis, presented in Supplementary Table 2, the variations in the YS observed, based on the test mode applied, tend to be relatively small. For the prediction presented in Supplementary Fig. 3, and given the relatively small size of the data set in Table 13 of 17, it appears that we may not be ready for traditional ML models. Models such as artificial neural networks, decision trees, support vector machines, Bayesian networks, or genetic algorithms tend to be effective in organizing and extracting complex patterns from large sets of data 17. But for the application and limited data set at hand, it makes sense to select a simple linear-prediction model, multi-variate linear regression, to begin with, and build from there. As suggested by Agrawal et al. 18, changing the method may not vary the results that much. According to Figure 5 and Table 2 of 18, linear regression yields an R² of 0.963 when predicting the fatigue strength of a stainless steel, compared to an R² of 0.972 for the artificial neural networks.
Our approach, outlined in 17 , assumes starting with a simple model, multi-variate linear regression, and accounting for the input sources that contribute to variations in the US observed. The approach then involves expanding the model, and adding non-linearities, based on the underlying physics, and as necessitated by the application at hand and data available.
When applying the multi-variate linear regression, we solve an unconstrained optimization problem of the form

$$\min_{b} \; \lVert y - X b \rVert_2^2 .$$

Here, y represents a vector of US values, b denotes a vector of regression coefficients, and X symbolizes a training set of compositions (a stacked version of x vectors 17, derived from Table 13 of 17). This unconstrained optimization problem has a well-known, closed-form solution 17:

$$b = (X^T X)^{-1} X^T y .$$

When the training set is very small, the inverse (X^T X)^-1 may not exist. In that case, we recommend replacing (X^T X)^-1 with (X^T X)^+, the pseudoinverse 26.
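The closed-form solution can be illustrated with a minimal, standard-library sketch that builds the normal equations and solves them by Gaussian elimination. The function names and the toy data are hypothetical and not taken from the paper; the sketch raises an error when X^T X is singular, which is exactly the situation in which the pseudoinverse would be substituted.

```python
def solve_linear_system(a, rhs):
    """Solve a @ z = rhs by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [rhs[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("X^T X is singular; a pseudoinverse would be needed")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    z = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = sum(m[i][j] * z[j] for j in range(i + 1, n))
        z[i] = (m[i][n] - s) / m[i][i]
    return z


def fit_linear_regression(X, y):
    """Return b minimizing ||y - X b||^2 through the normal equations."""
    p = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
    return solve_linear_system(xtx, xty)


# Hypothetical toy data: two molar fractions per alloy, US in MPa (not from the paper).
X = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5], [0.25, 0.75]]
true_b = [1200.0, 1800.0]
y = [sum(xi * bi for xi, bi in zip(row, true_b)) for row in X]
b = fit_linear_regression(X, y)
print([round(v, 6) for v in b])
```

In practice a numerical library routine for least squares or for the pseudoinverse would normally be preferred over hand-written elimination; the explicit version is shown only to keep the link to the formula visible.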
The observations reported in Supplementary Fig. 2 (and more extensively in 17) strengthen our belief that the prediction accuracy, measured in terms of R² and the standard deviation normalized per data point, is primarily limited by the quality of (variance in) the input data. These limitations in the prediction accuracy are consistent with the variations observed in Table 13 and Figures 24 and 25 of 17. These observations further suggest that multi-variate regression is indeed a suitable technique for this application.

Table 1. Compression yield strength, σ_y, maximum strength, σ_max, and fracture strain, ε_f, of the reference and predicted compositions at room temperature (columns include Composition and Sample diameter (mm)).

Fig. 2 Our intent is to construct models capturing the underlying physics. This abstracted model shows that the microstructure formed depends on the heat-treatment process applied, manufacturing, and processing, as well as the composition 34. In case that an artificial neural network (ANN) is deemed suitable for the application at hand, we suggest employing custom kernel functions consistent with the underlying physics, for the purpose of attaining tighter coupling, better prediction, and extracting the most out of the (usually limited) input data available. Note that the composition can include trace-level elements (impurities), in addition to the principal components.

Elevated temperatures
In an effort to identify compositions exhibiting the ability to retain strength at high temperatures, we present Fig. 3. In case of high-temperature applications, we are looking to derive a model of the form

$$US = h(\text{composition}, T) \qquad (6)$$

for the prediction of the US across temperatures. In addition to the temperature dependence of pure tungsten and the HEAs, the temperature dependence of the commercial alloys [the Mo-rich Titanium-Zirconium-Molybdenum alloy and the Nb-rich C-103 alloy] is also of interest.
Looking at Fig. 3, one first notices that the strength vs. temperature data definitely do not look linear. Hence, the multi-variate linear regression may no longer be the preferred approach. Second, the temperature dependence does come across as approximately exponential, but not exactly. It seems to entail a high-temperature and a low-temperature regime. Third, one may shy away from employing an automated ML suite, such as the Tree-Based Pipeline Optimization Tool 27, because of the limited ability of such black-box models to provide much needed insights into the underlying physics. One is motivated to make the most of the limited data available, by incorporating important a priori information about the underlying physics into the model structure, for the purpose of deriving such insights. Fourth, Fig. 3a, e.g., the high-temperature data points for MoNbTaW, highlights the need for data curation.

Motivated by Fig. 3c, d, together with physics-based insights from 21, we model the temperature dependence of the US, US(T), in terms of a bilinear log model, parametrized by the melting temperature, T_m, with separate exponentials, US_1(T) and US_2(T), for the low-temperature and high-temperature regimes [Eqs. (8) and (9)].
There is an additional physics (diffusion) induced constraint on T_break 21, and a continuity constraint between the low-temperature and the high-temperature regimes. A conceptually simple approach for fitting the model in Eqs. (7)-(12) to the US data available consists of first deriving the constant coefficients, C_1 and C_2, by applying linear regression to the data points available in the lowest temperature region (0 < T < 0.35 T_m) as well as in the intermediate region (0.35 T_m ≤ T ≤ 0.55 T_m). One can then derive the constants, C_3 and C_4, by applying linear regression to the data points available in the intermediate (0.35 T_m ≤ T ≤ 0.55 T_m) and high-temperature (T > 0.55 T_m) regions. Note that T_break does not need to be known in advance. According to Eq. (12), this inherent property of a given alloy comes out of the model as the break point between the two linear regions. The model consists of only four independent parameters, C_1, C_2, C_3, and C_4, which can simply be estimated by applying linear regression separately to the low-temperature and high-temperature regimes, even for a fairly small data set. Note, furthermore, that for a new alloy system, T_m does not need to be known experimentally in advance either; a rough estimate for T_m can be obtained using "the rule of mixing", and a more refined estimate obtained by employing Calculation of Phase Diagram (CALPHAD) simulations 17.
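The two-stage procedure above can be sketched in a few lines of Python. The sketch assumes that the two regimes take the form log US_1 = -C_1 T/T_m + C_2 and log US_2 = -C_3 T/T_m + C_4, which is an assumed reading of the description rather than a quotation of Eqs. (8) and (9); under that assumption, continuity gives T_break = T_m (C_4 - C_2)/(C_3 - C_1). The function names, the synthetic data, and the equi-atomic rule-of-mixtures estimate of T_m are illustrative only.

```python
import math

# Elemental melting points in kelvin, used for a rule-of-mixtures T_m estimate.
MELTING_K = {"Mo": 2896.0, "Nb": 2750.0, "Ta": 3290.0, "W": 3695.0}


def rule_of_mixtures_tm(fractions):
    """Rough T_m estimate: mole-fraction-weighted mean of elemental melting points."""
    return sum(frac * MELTING_K[el] for el, frac in fractions.items())


def fit_line(ts, zs):
    """Ordinary least squares for z = slope * t + intercept."""
    n = len(ts)
    t_mean = sum(ts) / n
    z_mean = sum(zs) / n
    sxx = sum((t - t_mean) ** 2 for t in ts)
    sxz = sum((t - t_mean) * (z - z_mean) for t, z in zip(ts, zs))
    slope = sxz / sxx
    return slope, z_mean - slope * t_mean


def fit_two_stage(temps_k, us_mpa, t_m):
    """Fit C1..C4 regime by regime, then locate T_break at the intersection."""
    pts = [(t / t_m, math.log(u)) for t, u in zip(temps_k, us_mpa)]
    low = [(r, z) for r, z in pts if r <= 0.55]   # lowest + intermediate region
    high = [(r, z) for r, z in pts if r >= 0.35]  # intermediate + high region
    s1, c2 = fit_line(*zip(*low))
    s3, c4 = fit_line(*zip(*high))
    c1, c3 = -s1, -s3  # assumed sign convention: positive C means decaying strength
    t_break = (c4 - c2) / (c3 - c1) * t_m
    return c1, c2, c3, c4, t_break


# Synthetic, noise-free data (hypothetical, not from the paper).
t_m = rule_of_mixtures_tm({"Mo": 0.25, "Nb": 0.25, "Ta": 0.25, "W": 0.25})
true_c = (0.5, 7.0, 5.0, 9.025)  # C1, C2, C3, C4 chosen so that T_break = 0.45 T_m
reduced = [0.1, 0.2, 0.3, 0.6, 0.7, 0.8]
temps_k = [r * t_m for r in reduced]
us_mpa = [math.exp(min(-true_c[0] * r + true_c[1], -true_c[2] * r + true_c[3]))
          for r in reduced]
c1, c2, c3, c4, t_break = fit_two_stage(temps_k, us_mpa, t_m)
print(f"T_break = {t_break:.0f} K ({t_break / t_m:.2f} T_m)")
```

Because data points in the intermediate region enter both regressions, points lying on the far side of the true break bias each line; this is one motivation for fitting all four coefficients concurrently.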
A superior approach for deriving the coefficients, C_1, C_2, C_3, and C_4, involves concurrent optimization over the low-temperature and high-temperature regimes using global optimization. Here, we seek to minimize

$$\min_{C_1, C_2, C_3, C_4} \; \sum_i \left( US(T_i) - y_i \right)^2 , \qquad (13)$$

where y_i represents the measured US values, and US_1(T_i) and US_2(T_i) are modeled through Eqs. (8) and (9), respectively. Matlab provides a function, fminunc(), for solving this type of unconstrained minimization over a generic function. The results in Fig. 3a, b were derived using the global optimization approach, applied separately to individual alloys, for the purpose of obtaining a tighter fit and a more accurate estimation of T_break than for separate optimization over the low-temperature and high-temperature regimes.
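A minimal sketch of such a concurrent fit follows. Several choices in it are assumptions rather than the authors' implementation: the bilinear form log US = min(-C_1 r + C_2, -C_3 r + C_4) with r = T/T_m, residuals taken in the log domain, and a hand-written compass (pattern) search used in place of Matlab's fminunc() so that the example runs on the Python standard library alone. The compass search is a local, derivative-free method, so the starting point matters.

```python
def model_log_us(params, r):
    """log US at reduced temperature r = T / T_m (assumed bilinear form)."""
    c1, c2, c3, c4 = params
    return min(-c1 * r + c2, -c3 * r + c4)


def objective(params, rs, log_ys):
    """Sum of squared residuals in the log domain (an assumption)."""
    return sum((model_log_us(params, r) - z) ** 2 for r, z in zip(rs, log_ys))


def pattern_search(f, x0, step=0.5, tol=1e-8, max_iter=5000):
    """Derivative-free compass search; a stand-in for a library optimizer."""
    x, fx = list(x0), f(x0)
    for _ in range(max_iter):
        improved = False
        for i in range(len(x)):
            for delta in (step, -step):
                trial = x[:]
                trial[i] += delta
                ft = f(trial)
                if ft < fx:  # accept only strict improvements
                    x, fx, improved = trial, ft, True
        if not improved:
            step /= 2.0  # refine the mesh once no axis move helps
            if step < tol:
                break
    return x, fx


# Synthetic data generated from hypothetical coefficients (not from the paper).
rs = [0.1 * k for k in range(1, 9)]
true_params = (0.5, 7.0, 5.0, 9.025)
log_ys = [model_log_us(true_params, r) for r in rs]

x0 = [1.0, 7.5, 4.0, 8.5]  # all four coefficients are updated together
f0 = objective(x0, rs, log_ys)
params, f_best = pattern_search(lambda p: objective(p, rs, log_ys), x0)
c1, c2, c3, c4 = params
print("objective:", f0, "->", f_best)
if c3 != c1:
    print("T_break / T_m =", (c4 - c2) / (c3 - c1))
```

Initializing x0 from the regime-by-regime regression and then refining all four coefficients together is one natural way to combine the two approaches.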
It is worth noting that previous models for the temperature dependence of the YS only accounted for a single exponential 22,23. Hence, there was no break temperature, T_break. We consider the break point critical for the optimization of the high-temperature properties of alloys. For reliable operation, the temperature of turbine blades made out of refractory alloys may need to stay below T_break (not accounting for coatings 17). Once the temperature of the turbine blades exceeds T_break, undesirable phase transformations (e.g., dissolution of strengthening precipitates) may start to take place 21, and the alloy may begin to lose structural integrity. Here, the second exponential, modeled through US_2(T), may prove detrimental. Turbine blades made out of certain alloys, such as Ni-based superalloys, should only be operated above T_break if supported by extensive experimental test results. We consider the break point, T_break, an important parameter for the design of materials with attractive high-temperature properties, one warranting inclusion in alloy specifications. Hence, it is important to be able to accurately estimate T_break, e.g., using the global optimization approach presented [Eq. (13)].

In the low-temperature regime, the atoms cannot move out of the lattice, and no phase transformations can take place. This trend applies to low-entropy alloys, medium-entropy alloys, and HEAs, and serves to explain the two exponentials. A solid-solution strengthening model by Rao et al. 28,29 does not take into account diffusion effects and agrees well with the experimental data only at relatively low temperatures, where diffusion-controlled deformation mechanisms can be ignored. Then, as the temperature increases, the chemical bonds between the elements become softer. The diffusion-controlled regime generally occurs above ~0.5-0.6 T_m. It can be distinguished from the low-temperature regime by a more rapid drop in strength with increasing temperature, because dislocations are able to move more easily around obstacles 30.
A related model for the prediction of YS over temperature was presented by Wu et al. 22. The authors separately analyzed the temperature dependencies of the YS and strain hardening of a family of equi-atomic binary, ternary, and quaternary alloys based on the elements Fe, Ni, Co, Cr, and Mn, which had been shown to form single-phase FCC solid solutions. The authors presented a model with a single exponential for the overall YS, σ_y(T) 22, in which σ_a, C, and σ_b were fitting coefficients. The authors showed that lattice friction appeared to be the predominant component of the temperature-dependent YS, possibly because of the Peierls barrier height decreasing with increasing temperature, due to a thermally induced increase in dislocation width. Note that, while similar to the YS, we are here modeling the US. According to Maresca et al., the YS of the solid-solution BCC matrix, which constitutes the major part of the alloy, can be estimated 23 in terms of τ_y0, the zero-temperature flow stress; ΔE_b, the energy barrier for dislocation movement; T, the absolute temperature; ε̇, the strain rate; and k_B, the Boltzmann constant. For accurate modeling of the YS, it is important to consider dislocations as well as atomic and volume misfits.
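For orientation, a single-exponential form built from the three fitting coefficients named above could be written as follows. This is an assumed form given for illustration; the exact expression should be taken from Wu et al. 22.

$$\sigma_y(T) = \sigma_a \exp\left(-\frac{T}{C}\right) + \sigma_b$$

Under this assumed form, and when σ_b is negligible, log σ_y is a single straight line in T with slope -1/C, so no break temperature can arise; the bilinear log model differs precisely by allowing a second line with a steeper slope above T_break.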
Depending on the grain sizes and compositions involved, a trilinear log model may yield a better fit for certain alloys 31.

B. Steingrimsson et al.
Given an anomalous yield stress phenomenon in CMSX-4, a single-crystal, nickel-based superalloy, three exponentials are needed for accurate modeling in case of Heat Treatment A, but four exponentials in case of Heat Treatment B, according to Supplementary Fig. 4. This phenomenon manifests itself as a hump between the low-temperature and high-temperature regimes, found in superalloys strengthened by L1_2-ordered intermetallics. Here, the increased strength of the γ′ phase with temperature is explained by thermally activated cross-slip of dislocations from {111} planes to {100} planes. Supplementary Tables 4 and 5 present a practical approach to model selection suitable for this case. We stop increasing the model order once the mean squared error (MSE) starts to taper off. Supplementary Figs. 5 and 6 capture an application of piecewise linear regression needed to address the challenges imposed by the non-convexity of the objective function possible in this case. Here, we expand the parameter set so as to explicitly include the break temperatures. Supplementary Figs. 7 and 8 contain Matlab pseudo-code for the bilinear log model (a convex case) and a trilinear log model (a possibly non-convex case).
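The model-selection idea can be illustrated with a short Python sketch that treats the break positions as explicit parameters and searches over them exhaustively. This is a simplified stand-in for the procedure in the Supplementary material: the segments are fitted independently (continuity is not enforced), the stopping rule is an absolute MSE gain threshold chosen for illustration, and the data with a hump are synthetic.

```python
def fit_line_sse(pts):
    """Least-squares line through pts; returns the sum of squared errors."""
    n = len(pts)
    t_mean = sum(t for t, _ in pts) / n
    z_mean = sum(z for _, z in pts) / n
    sxx = sum((t - t_mean) ** 2 for t, _ in pts)
    sxz = sum((t - t_mean) * (z - z_mean) for t, z in pts)
    slope = sxz / sxx if sxx > 0 else 0.0
    icpt = z_mean - slope * t_mean
    return sum((slope * t + icpt - z) ** 2 for t, z in pts)


def best_piecewise_sse(pts, segments, min_pts=2):
    """Exhaustive search over break positions for a given number of segments."""
    if segments == 1:
        return fit_line_sse(pts)
    best = float("inf")
    last_start = len(pts) - min_pts * (segments - 1)
    for k in range(min_pts, last_start + 1):
        sse = fit_line_sse(pts[:k]) + best_piecewise_sse(pts[k:], segments - 1, min_pts)
        best = min(best, sse)
    return best


def select_order(pts, max_segments=4, min_gain=1e-4):
    """Increase the number of segments until the MSE stops improving by min_gain."""
    pts = sorted(pts)
    order, mse = 1, best_piecewise_sse(pts, 1) / len(pts)
    for s in range(2, max_segments + 1):
        if len(pts) < 2 * s:
            break
        mse_s = best_piecewise_sse(pts, s) / len(pts)
        if mse - mse_s < min_gain:  # MSE has tapered off
            break
        order, mse = s, mse_s
    return order, mse


# Synthetic log-strength data with a hump (hypothetical, not from the paper):
# (reduced temperature T / T_m, log US).
low = [(r, 7.0 - 0.5 * r) for r in (0.1, 0.2, 0.3)]
hump = [(r, 6.85 + 1.0 * (r - 0.3)) for r in (0.35, 0.4, 0.45, 0.5)]
high = [(r, 7.05 - 6.0 * (r - 0.5)) for r in (0.6, 0.7, 0.8)]
order, mse = select_order(low + hump + high)
print("selected number of segments:", order)
```

The exhaustive scan over break positions sidesteps the non-convexity that a gradient-based optimizer would face, at a cost that grows quickly with the number of segments; for the small data sets discussed here that cost is negligible.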
In Supplementary Note 7, we provide physics-based reasons explaining why a bilinear model will likely suffice for refractory HEAs 32. We also address the number of data points needed for modeling. Since the hump between the low-temperature and high-temperature regimes originates from the γ′ phase (which is an L1_2 phase, i.e., an ordered FCC structure), and since most refractory HEAs contain BCC or hexagonal-close-packed phases, with totally different dislocation systems, it is unlikely that cross-slip from {111} to {100} planes will happen in refractory HEAs 32. Hence, we expect the bilinear log model (two exponentials) to suffice for most refractory HEAs. Table 2 further characterizes the ability of the 21 compositions under consideration to retain strength at high temperatures, both in terms of a high break temperature, T_break, and a small slope, C_3. Table 2 also compares the modeling accuracy of the bilinear log model to that of a single exponential. It is not surprising that the composition MoNbTaW, which consists of strongly refractory elements (Mo, Nb, Ta, and W), i.e., elements with a melting point above 2200°C 33, yields the highest T_break of 1124°C. This evidence serves to validate the model. MoNbTaVW, which includes one weakly refractory element (V), i.e., an element with a melting point above 1850°C, results in the smallest slope, C_3, of 4.85, compared to 7.82 for MoNbTaW. But this observation assumes omitting the data point at T = 1600°C, as a result of data curation. MoNbTaW will result in the lowest slope (C_3 = 2.75) if this data point is included. This trend underscores the importance of considering dislocations, interactions between elements, volume or lattice misfit, and atomic mismatch 23, in addition to the melting points of the constituent elements, when designing materials with attractive high-temperature properties. Similarly, it is not surprising that the composition AlMo0.5NbTa0.5TiZr, which additionally contains the weakly refractory elements Ti and Zr, also results in a small slope (C_3 = 4.95). While AlMo0.5NbTa0.5TiZr does seem to offer relatively favorable high-temperature properties, it is worth noting that the estimation of its slope is based on only three data points.
In terms of the modeling accuracy, the bilinear log model yields an average MSE of 0.003 in the log domain, for compositions No. 1-10 and 17-21 from Table 2. Fig. 4 provides graphical insight as to why the bilinear log model yields a better match to the data available than a model consisting of a single exponential. Supplementary Figs. 9-23 provide similar diagrams for the other 20 alloy compositions from Table 2.

T_break refers to the breaking point between bilinear log models, defined in Fig. 3.

DISCUSSION AND CONCLUSION
For interpretation of the prediction results, we refer to Section 4.7 of 17. To analyze consistency with experimental verification, we point to the Supplementary material. Compression tests were conducted on both the reference and predicted compositions in the as-cast condition. Figure 5 summarizes the engineering stress-strain curves of the predicted compositions, including Al0.5Mo0.5Nb1.5Ta0.5Zr1.5 and Mo1.25Nb1.25Ti0.5V0.5Zr1.25, in comparison to the respective references, Al0.5Mo0.5NbTa0.5TiZr and MoNbTiV0.75Zr. The compression properties of these alloys, such as the YS, σ_y, maximum strength, σ_max, and fracture strain, ε_f, are listed in Table 1. We conclude from the experimental results in Fig. 5 that the candidate compositions, Al0.5Mo0.5Nb1.5Ta0.5Zr1.5, Mo1.25Nb1.25Ti0.5V0.75Zr, and Mo1.25Nb1.25Ti0.5V0.5Zr1.25, indeed exhibit higher strengths than the respective references, hence confirming the outcome of our two sets of predictions. Figure 6 shows the energy dispersive X-ray spectroscopy (EDX) mapping for the predicted alloys. It can be noted that both of the predicted compositions feature a typical dendrite-inter-dendrite microstructure, which indicates elemental segregation during solidification with high cooling rates. Al0.5Mo0.5Nb1.5Ta0.5Zr1.5 exhibits segregation in all five elements. The X-ray diffraction (XRD) results in Fig. 7 indicate that both of the predicted alloys contain two BCC phases. These results are consistent with known properties of BCC and FCC phases, in terms of the BCC phases usually helping improve material strength, but the FCC phases helping improve ductility.
In this study, we proposed a bilinear log model for predicting the US of HEAs across temperatures and evaluated its effectiveness for 21 compositions. We considered the break temperature, T_break, an important parameter for the design of materials with attractive high-temperature properties, one warranting inclusion in alloy specifications. For reliable operation, the operating temperature for the corresponding alloys may need to stay below T_break. Previous models for the temperature dependence of the YS only accounted for a single exponential. Hence, there was no break temperature.
We further suggested a general methodology for joint optimization, a methodology capable of accounting for physics-based dependencies, and presented the maximization of the US as an initial step toward the joint optimization of mechanical properties. We applied an optimization technique suitable for the problem under study, linear regression analysis, to a data set of modest size from the literature, to predict HEA compositions yielding exceptional US at room temperature. For accurate prediction, we recommended picking an optimization technique appropriate for the application at hand and the data available, and carefully accounting for the underlying sources of variations. Despite relatively limited data and a simple prediction algorithm 17, we were able to attain the goal of successfully predicting HEA compositions exhibiting superior strength, compared to previous work, and to demonstrate consistency of our prediction, both with empirical rules (in Table 16 of 17) and with an experimental finding (per Fig. 5). In this way, we successfully addressed the research objective of predicting compositions of HEAs yielding the best strength, and conducting experimental verification of our findings. Next, one needs to account for the ductility. In case of the maximization of US and the presence of a relatively small data set, we recommended multi-variate linear regression as the method of choice. In this case, the prediction rule is fairly general: One can extrapolate in the direction of the gradient from the data point in the training set exhibiting the highest US. As long as the step size is selected as sufficiently small (only aiming for a 5-10% increase in US for a single step), the resulting prediction is considered much superior to a trial-and-error approach. Sequential learning 17 is expected to greatly expedite the identification of alloys exhibiting given mechanical properties of interest.
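The gradient-based prediction rule can be sketched as follows for a linear model in which the predicted US is the inner product of b and x, so that the gradient with respect to the composition vector is simply b. The projection of the gradient onto the zero-sum subspace, which keeps the molar fractions summing to one, is an added assumption of this sketch, and the coefficients and starting composition are hypothetical. Non-negativity of the fractions is not enforced here.

```python
def propose_next_composition(x_best, b, us_best, target_gain=0.05):
    """Step from x_best along the projected gradient so that the linear model
    predicts a relative US increase of target_gain (0.05 means 5%)."""
    mean_b = sum(b) / len(b)
    g = [bi - mean_b for bi in b]        # projected gradient, sums to zero
    gain_per_unit_step = sum(bi * gi for bi, gi in zip(b, g))
    if gain_per_unit_step <= 0.0:
        raise ValueError("gradient has no component that preserves the sum")
    step = target_gain * us_best / gain_per_unit_step
    x_new = [xi + step * gi for xi, gi in zip(x_best, g)]
    predicted = us_best + step * gain_per_unit_step
    return x_new, predicted


# Hypothetical regression coefficients (MPa per unit molar fraction) and an
# equi-atomic five-element starting point; neither is taken from the paper.
b = [1000.0, 1500.0, 1200.0, 900.0, 1400.0]
x_best = [0.2, 0.2, 0.2, 0.2, 0.2]
us_best = sum(bi * xi for bi, xi in zip(b, x_best))
x_new, predicted = propose_next_composition(x_best, b, us_best, target_gain=0.05)
print([round(v, 4) for v in x_new], round(predicted, 1))
```

Keeping target_gain in the 5-10% range mirrors the small-step recommendation above: the linear model is only trusted close to the training data, and each proposed composition would be verified experimentally before the next step is taken.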

METHODS
While the primary emphasis here is on the US, the optimization of the mechanical properties is assumed to take place within a framework for joint optimization. We essentially take the US to represent the ultimate compression strength (see the Supplementary material). The spider chart in Fig. 8 provides a high-level depiction of the content of the database used for the present research. The database currently captures some 218 US recordings for HEA configurations, many of which contain refractory elements, with 147 configurations measured at room temperature and 71 at elevated temperatures. The US was measured using tensile testing for 29 of the recordings, while the remaining 189 recordings were obtained through compression testing. Figure 2 captures an abstracted model of physical dependencies for the prediction of the US. This model is an extension of the input sources modeled in Eq. (21). Capturing the physical dependencies helps greatly in terms of the incorporation of a priori knowledge, derived from the underlying physics, and in terms of making the most of the (usually limited) input data available.

Fig. 7 XRD patterns for the predicted compositions. The two BCC phases could be related to the segregated microstructures observed in the EDX maps, which may contribute to the high strengths of the alloys due to the solid-solution strengthening effect and the second-phase strengthening effects. For comparison and consistency, note that (according to Table 13 of 17) the training set consisted of a combination of single-phase and multi-phase, mostly BCC, compositions. A Panalytical Empyrean X-ray diffractometer, with Cu Kα radiation, was used to identify the crystal structure of the alloys.
Our intent is to accurately capture the input sources that contribute to variations in the US observed (to variations in the output). In the present research, we model the input combination as

$$\text{Input} = (\text{composition},\, T,\, \text{process},\, \text{defects},\, \text{grain size},\, \text{microstructure}). \qquad (21)$$

Here, "defects" are defined broadly so as to include inhomogeneities, impurities, dislocations, or unwanted features. "T" represents temperature. Similarly, the term "microstructure" broadly represents microstructures, at nano- or micro-scale, as well as phase properties. The term "process" broadly refers both to manufacturing processes and post-processing. Correspondingly, the term "grain size" refers to the distribution in grain sizes. Section 4.4 of 17 allows for dependence between input sources, and Section 4.5 outlines the expected dependence of the US on the individual input sources listed. Dependence amongst the inputs is further addressed in Supplementary Note 2.

Methodology for maximization of the US
The overall methodology for predicting the US is presented in Supplementary Fig. 2. We summarize the prediction model as follows:

$$US = h[\text{composition},\, T,\, \text{process},\, \text{defects}(\text{process}, T),\, \text{grains}(\text{process}, T),\, \text{microstructure}(\text{process}, T)].$$

If the US corresponding to a given input combination is known, one can simply look up the known value. If the US corresponding to a given input combination is not known, then a prediction step can be applied (e.g., interpolation or extrapolation).
The purpose of the data curation step in Supplementary Fig. 2 is to ensure that input data to the prediction step are of the highest quality possible 17 . Here, the intent is to look for outliers, suspected cases of discrepancy, or incorrect data (data that one may not fully trust). Generally, it is recommended to filter out data that have no relevance to the application domain or the task at hand 17 .
The step in our methodology for maximization of the predicted US, ỹ, assumes a generic model of the type

$$\tilde{y} = h(x).$$

Here, the input vector, x, can be considered as the definition of a feature set comprising parameters related to the compositions, temperature, heat-treatment process, defect property, grain size, microstructure (phase properties), manufacturing process, or post-processing, essentially all the source parameters that impact the output quantity of interest. The transformation, h(·), can be a non-linear function of the input, x. Artificial intelligence and supervised ML are presented as one of the alternatives for deriving the system model 17. For a parametrized description of the terms in Eq. (22), including composition, manufacturing processes, microstructures, and defects, and for support for the model in context with theory on ML, refer to Supplementary Notes 3-5.

Experimental validation
The alloys predicted were prepared by arc-melting a mixture of pure elements [purity >99.9 weight percent (wt.%)] in a Ti-gettered argon atmosphere. The ingots were flipped and remelted at least five times to achieve homogenized elemental distributions. The ingots were cast into a water-cooled copper hearth and then cut into the desired shapes for further experiments. Compression tests were performed on a computer-controlled uniaxial mechanical testing system with a servo-hydraulic load frame at a default strain rate of 1 × 10⁻³ s⁻¹.

DATA AVAILABILITY
The data in this paper, including those in the Supplementary Figures, can be requested by contacting the corresponding authors (baldur@imagars.com or drzhangy@ustb.edu.cn).