To improve their efficiency, gas turbine engines (GTEs) must be able to operate at higher temperatures. The development of materials capable of withstanding these demanding operating conditions has played a key role in the evolution of GTE technologies. Ni-based superalloys are currently the material of choice for GTE blades, and have been continually redesigned over the past 40 years to increase their ability to operate at higher temperatures. Starting with PWA1480 and culminating in TMS-238, six generations of single-crystal Ni-based superalloys have been developed1. TMS-238 is the most advanced Ni-based superalloy to date, and is able to withstand 1000 hours of creep testing under 137 MPa tensile stress at 1100 °C1. However, these alloys are approaching their operational limits as they are being designed to operate near their solidus temperatures. As a result, the discovery and development of ultrahigh-temperature materials are necessary to enable further increases in operating temperatures for GTE blades2.

Refractory multi-principal-element alloys (MPEAs) have shown promise as structural materials for gas turbine engine blades3. These alloys consist of multiple alloying elements (typically four or more) at concentrations ranging from 5 to 35 at%. The diversity of MPEA compositions offers the potential to design alloys with desirable properties such as low density, high-temperature yield strength, creep resistance, and oxidation resistance. However, the MPEA design space has been largely unexplored to date4. The high dimensionality of this space and the combinatorial explosion of constituent combinations make it challenging to explore. For example, a 5-component alloy system sampled at 5 at% increments would result in over 10,000 candidate designs, not including the exploration of microstructure space. Given the vast size of the MPEA space, exhaustively exploring it through traditional experimental (or even computational) approaches is impractical.
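The size of this design space can be checked with a short counting sketch. It assumes, purely for illustration, that compositions lie on a regular 5 at% grid and that each element may range from 0 to 100 at%, so that every composition on the grid is counted; the function names are hypothetical.

```python
from itertools import product
from math import comb


def count_grid_compositions(n_elements: int, step_at_pct: int) -> int:
    """Count grid compositions with the stars-and-bars formula."""
    if 100 % step_at_pct != 0:
        raise ValueError("step must divide 100 at%")
    n_units = 100 // step_at_pct  # number of step-sized units to distribute
    return comb(n_units + n_elements - 1, n_elements - 1)


def enumerate_grid_compositions(n_elements: int, step_at_pct: int):
    """Brute-force enumeration, used here only to cross-check the formula."""
    n_units = 100 // step_at_pct
    for head in product(range(n_units + 1), repeat=n_elements - 1):
        remainder = n_units - sum(head)  # units left for the last element
        if remainder >= 0:
            yield tuple(u * step_at_pct for u in head) + (remainder * step_at_pct,)


n_formula = count_grid_compositions(5, 5)
n_brute = sum(1 for _ in enumerate_grid_compositions(5, 5))
print(n_formula, n_brute)
```

Under these assumptions both counts come to 10,626, consistent with the figure of over 10,000 candidates quoted above.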

Moreover, candidate alloys for complex engineering applications such as GTE must meet multiple design objectives and constraints, all at once. For example, they must be ductile at room temperature for formability while retaining their yield strength at high temperatures. However, the ‘strength-ductility trade-off’5 makes it difficult to design such an alloy. In addition to these objectives, candidate alloys must also meet a number of performance constraints, including low density, high thermal conductivity, narrow solidification range, high solidus temperature, and a small linear thermal expansion coefficient. The design of structural materials for GTE blades is, therefore, a highly constrained problem, requiring the simultaneous satisfaction of multiple objectives and constraints. It is not possible to know beforehand whether a given alloy will meet all of these requirements, so each point in the design space must be individually evaluated. These multi-objective, multi-constraint problems are more complex and resource-intensive than conventional single-objective, loosely-constrained design problems.

The Integrated Computational Materials Engineering (ICME) paradigm6 offers a promising approach for designing alloys with tailored properties through computational means by inverting the process-structure-property-performance (PSPP) chain. However, constructing meaningful linkages along the PSPP chain is a resource-intensive process, both experimentally and computationally. Traditional ICME methods are not sufficient for efficiently exploring a vast, high-dimensional design space while simultaneously optimizing for multiple objectives and satisfying a range of constraints. This presents a major challenge in the field, as it is crucial to identify constraint-satisfying Pareto-optimal designs within limited resources. Without more efficient approaches for exploring and exploiting highly-constrained multi-objective design problems, it will be difficult to make significant progress in this area.

Multi-objective Bayesian optimization (MOBO) methods have been popular in materials design because they work with minimal data and employ a heuristic-based search for the observations most likely to be informative, thereby increasing a system’s state of knowledge about the optimal design. MOBO schemes have been successfully deployed in various contexts within the domain of materials science. For example, Arpan et al.7 leveraged MOBO to design interfacially controlled ferroelectric materials for superior energy storage and minimal energy loss. The authors performed 4-objective optimization over the following parameters: temperature, partial O2 pressure, film thickness, and surface ion energy. Solomou et al.8 optimally explored the multi-objective Pareto front in precipitation-strengthened shape-memory alloys by maximizing the Expected Hypervolume Improvement (EHVI) scalar metric9. In another work, Suzuki et al.10 proposed a MOBO scheme known as Pareto-frontier entropy search (PFES). The proposed acquisition function evaluates the information gain via the mutual information between the objective functions and the Pareto front and selects the design most likely to improve the system’s knowledge of the Pareto front. The authors benchmarked the proposed optimization scheme against two datasets concerning the design of battery materials. Within the first dataset, the simultaneous maximization of ion conductivity and stability (minimization of formation energy) was performed within the Bi1−x−y−zErxNbyWzO48+y+3/2z chemical space, where a pool of 335 candidate designs was available. Within the second dataset, the simultaneous maximization of ion conductivity and stability was performed within the La2/3−xLi3xTiO3 chemical space, where a pool of 1119 candidate designs was available. The authors note that this entropy-based approach to MOBO converged faster than implementations such as ParEGO11 in both design spaces.

An improvement to the Bayesian optimization paradigm is to employ multiple models representing the same quantity of interest. This is known as multi-fidelity BO and has been shown to effectively increase the robustness and efficiency of engineering design schemes12,13,14,15,16. These models are built upon different assumptions and/or simplifications and vary in fidelity and cost of the evaluation. The models can then be considered to be information sources that provide useful knowledge about a given quantity of interest (QoI). In multi-fidelity BO, the assumption is that every source has some helpful information about the design space. By accurately fusing the information from all available sources, it is possible to construct a fused model that is a better approximation to the ground truth than any information source in isolation. In the earlier works of refs. 12,13,14,15,16, a multi-fidelity approach has been employed to optimize a single quantity of interest (single-objective optimization). Recently, this multi-fidelity setting has been expanded to multi-objective design problems as well17. However, none of these prior works have tackled problems for which constraints must be actively learned to identify the feasible design space.

Constrained design problems pose a significant challenge because it can be difficult to handle constraints and ensure the feasibility of proposed solutions. Without properly identifying the feasible region in the design space, there is a risk that optimal designs may be infeasible. Recently, Hickman and Aldeghi et al.18 proposed a method for using Bayesian optimization (BO) with constraints in their Python module, GRYFFIN. However, this method assumes that the constraints are already known and can be easily checked, which is often not the case. Additionally, checking if a design satisfies a constraint can often require expensive computational modeling or resource-intensive experiments. In such cases, machine learning approaches can be more effective at learning and modeling the constraints. The main focus when learning a constrained design space is the boundary of the feasible space, rather than the value of the constrained quantity of interest (QoI) at a particular location. Instead of a regression model, it may be more efficient to use a classifier to represent the feasibility boundary that separates feasible and infeasible regions in the design space. Once this boundary has been correctly identified, optimization can be performed within the feasible design space, which increases the efficiency of the design process by limiting expensive queries against design objectives to only feasible design choices.

Of particular interest to this work is the closed-loop autonomous materials exploration and optimization (CAMEO) framework19. The authors deployed CAMEO within the Ge-Sb-Te chemistry space in search of optimal phase-change memory materials for application in photonic switching devices. The authors first use GRENDEL (graph-based endmember extraction and labeling)20 to determine where boundaries between phases lie in the chemistry space. Once the phase boundaries have been learned, CAMEO optimizes within a phase of particular interest; priority is given to designs near phase boundaries, where significant changes in the optical contrast between amorphous and crystalline states (the target property) are expected and where local maxima are therefore likely to be located. However, CAMEO was limited to mapping phase boundaries. During constrained optimization in the context of alloy design, it is often the case that multiple constraints (not just phase boundaries) must be mapped in order to identify the regions of the design space worth optimizing in. In this work, we propose a framework that actively learns the boundaries of multiple constraints and then searches within these boundaries for optimal materials.

To use classifiers effectively to represent constraint boundaries, the feasibility boundaries must be learned accurately, since the accuracy of the classifier predictions depends on them. In this work, we build upon our previous efforts21 in constraint-satisfaction multi-objective Bayesian optimization by introducing an entropy-based approach to the decision-making process. Our previous approach calculated entropy based on the difference between class membership probabilities predicted by Gaussian process classifiers, resulting in higher entropy for designs close to the predicted boundary. However, this approach did not take into account uncertainty in the probability predictions, and the entropy was heavily influenced by the location of the predicted boundary, which can change as the system learns more, potentially making previously queried data points less valuable. In this work, we propose calculating entropy based on uncertainty in class membership probability predictions so that designs with higher uncertainty about their class membership will have higher entropy regardless of their distance from the predicted boundary. This approach improves upon our previous efforts by considering uncertainty in probability predictions and reducing the reliance on the location of the predicted boundary.
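The difference between the two notions of entropy can be illustrated with a small sketch. The exact formulas are not given in this section, so both functions below are assumptions: the binary Shannon entropy of the mean feasibility probability stands in for a boundary-centered measure, and the differential entropy of a Gaussian with the predicted standard deviation stands in for an uncertainty-centered measure. All names and numbers are hypothetical.

```python
import numpy as np


def boundary_entropy(p_feasible):
    """Binary Shannon entropy (bits) of the predicted feasibility probability.

    Peaks at p = 0.5, i.e. at the predicted boundary, and ignores how
    uncertain the probability prediction itself is.
    """
    p = np.clip(np.asarray(p_feasible, dtype=float), 1e-12, 1 - 1e-12)
    return -(p * np.log2(p) + (1 - p) * np.log2(1 - p))


def uncertainty_entropy(prob_std):
    """Differential entropy (nats) of a Gaussian with the given std.

    ASSUMPTION: the probability prediction is treated as Gaussian. This is an
    illustrative stand-in, not necessarily the formula used in this work.
    """
    s = np.maximum(np.asarray(prob_std, dtype=float), 1e-12)
    return 0.5 * np.log(2 * np.pi * np.e * s**2)


# Two hypothetical candidate designs (values invented for illustration).
p_mean = np.array([0.50, 0.90])  # design A sits on the predicted boundary
p_std = np.array([0.02, 0.30])   # design B has a far less certain prediction

print(boundary_entropy(p_mean))    # larger for design A
print(uncertainty_entropy(p_std))  # larger for design B
```

In this toy case the boundary-centered measure would query design A, whereas the uncertainty-centered measure would query design B even though it lies far from the predicted boundary.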

Our proposed method for solving constrained design problems is not only faster than previous approaches but also allows for more informed decision-making at every stage of the process. By introducing an entropy-based approach to the Bayesian optimization (BO) framework, we are able to accurately learn the feasibility boundaries while also improving the system’s knowledge of the optimal values of the quantities of interest (QoIs). This is exemplified in our application of the method to a tri-objective, multi-constrained design problem over the Mo-Nb-Ti-V-W system, a complex multi-physics problem space. The efficiency and effectiveness of our approach are further enhanced when it is implemented with a batch variant in the BO stage. With this method, we are able to make confident and strategic decisions that lead to successful design outcomes.

The deployment of our framework within the Mo-Nb-Ti-V-W high entropy alloy system resulted in the identification of 21 constraint-satisfying Pareto optimal alloys. Importantly, the framework converges on a Pareto front of alloys that is interpretable. With regard to constraint satisfaction, we find that alloys that meet constraints relevant to GTE blades are lean in W and Mo due to the dominance of the density constraint. On the other hand, Ti- and V-rich alloys failed the minimum solidus temperature constraint. When considering the multi-objective optimization problem, compositions along the tri-objective Pareto front were found to have more W when near the strength axis. At the same time, they were rich in Nb when the alloys were near the axes for both ductility indicators. We note that identifying these Pareto optimal alloys with a brute force approach would have required the querying of ~10,000 alloys for five constraints each, just to learn the feasible space. On the other hand, the proposed framework learns the feasible space and identifies the Pareto set in ~700 queries. Furthermore, we demonstrate that employing a batch querying policy after the feasible space has been identified can decrease the time required to identify the Pareto set by ~95%.


Definition of design problem

Alloys suitable for GTE blades must meet several objectives and satisfy numerous constraints. For the sake of simplification, in this work, we consider two opposing types of design objectives, summarized in Table 1. On the one hand, the alloy must have high strength at high temperatures in order to carry the necessary structural loads during operation. On the other hand, the alloy has to possess some degree of ductility at room temperature to minimize the risk of fracture.

Table 1 The five constraints and the three objectives associated with the design problem addressed in this work.

In this work, we evaluate the high-temperature (HT, 1300 °C) yield-strength objective using a physics-based model developed by Curtin and Maresca22. We consider this model to be the truth model for the HT yield strength objective, as detailed in the Methods section. This model relies on the assumption that a hypothetical homogeneous ‘average’ alloy has all the macroscopic properties of the true random alloy22. The model’s grounding assumption is that the intrinsic strength of compositionally complex BCC alloys originates from the increased ‘roughness’ of the landscape that dislocations must traverse to induce plastic deformation. The model is capable of incorporating temperature effects and has been found to be in moderately good agreement with available experimental data.

While models for the elongation at fracture (ϵf) of MPEAs are not available, the ductility of MPEAs can be roughly inferred from ground state properties of alloys, such as the Pugh ratio and the Cauchy pressure. These two indicators of ductility have been used extensively in the design of ductile MPEAs21,23,24,25. In the context of metals, Pugh’s ratio is defined as the ratio of the bulk modulus to the shear modulus (B/G). Because B reflects the resistance to fracture and G the resistance to plastic deformation, a high B/G indicates that a material can deform plastically before it fractures26. Pettifor27 proposed the Cauchy pressure, the difference between the two elastic constants C12 and C44 (C12 − C44), as an indicator of intrinsic ductility/brittleness. A positive Cauchy pressure indicates non-directional metallic bonds resulting in intrinsic ductility of the crystal, whereas a negative Cauchy pressure corresponds to directional bonds and results in an intrinsically brittle crystal structure. Both indicators can be estimated with high-fidelity DFT frameworks at a great computational cost. However, as the MPEA composition space is combinatorially vast, sufficient exploration of the space is intractable using conventional brute-force approaches. In the case of this work, the truth model for both ductility objectives is the DFT-based Korringa–Kohn–Rostoker Green’s function (DFT-KKR-CPA) method, as detailed in the Methods section.
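As a minimal sketch of how the two ductility indicators follow from elastic constants, the function below evaluates them for a cubic crystal. It assumes that the shear modulus is taken as the Voigt-Reuss-Hill average, which is a common convention but is not stated in this section; the function name and the input values are hypothetical.

```python
def cubic_ductility_indicators(c11: float, c12: float, c44: float):
    """Return (Pugh ratio B/G, Cauchy pressure C12 - C44) for a cubic crystal.

    Inputs are in GPa. ASSUMPTION: G is the Voigt-Reuss-Hill average.
    """
    bulk = (c11 + 2.0 * c12) / 3.0
    shear_voigt = (c11 - c12 + 3.0 * c44) / 5.0
    shear_reuss = 5.0 * (c11 - c12) * c44 / (4.0 * c44 + 3.0 * (c11 - c12))
    shear_hill = 0.5 * (shear_voigt + shear_reuss)
    return bulk / shear_hill, c12 - c44


# Invented elastic constants (GPa), for illustration only.
pugh_ratio, cauchy_pressure = cubic_ductility_indicators(250.0, 130.0, 30.0)
print(f"B/G = {pugh_ratio:.2f}, C12 - C44 = {cauchy_pressure:.0f} GPa")
```

For an elastically isotropic crystal (C11 − C12 = 2 C44) the Voigt and Reuss bounds coincide, which offers a quick sanity check of the expressions.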

In addition to the objectives associated with strength and ductility, candidate alloys for next-generation GTE blades must satisfy several constraints. Feasible alloys must have a sufficiently high solidus temperature to operate in the hot zone of the turbine. As such, we stipulate that the solidus temperature be greater than 2000 °C. Moreover, candidate alloys must also be lightweight, both to minimize centripetal forces caused by the rotation of the blades28 and to reduce the total mass of the GTE system. For this reason, we stipulate that feasible alloys must have a density of less than 9 g/cc. Alloys should also be designed with the thermal management system of the turbine blade in mind. As such, the material comprising the turbine blades must have high thermal conductivity to dissipate the large amounts of heat from the hot zone of the engine29.

Additionally, the blade must be compatible with thermal barrier coatings. To ensure this, the linear thermal expansion from room temperature to 1300 °C must not exceed 2%/K. Furthermore, from the manufacturing standpoint, these alloys must be resistant to solidification tearing, a common concern during the synthesis/fabrication of metallic parts from melt precursors. While solidification tearing results from very complex physical processes, a narrow solidification range can protect against this failure mode. Here, we stipulate that the solidification range must not exceed 400 °C. Finally, we note that the design constraints and objectives described above and summarized in Table 1 are derived directly from the challenge specifications of the Department of Energy’s ARPA-E ULTIMATE program2. Thus, the present alloy design exercise has some practical relevance.
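The five constraints can be collected into a simple feasibility check, sketched below. The thresholds follow the wording of this section, with the 20 W/m/K thermal conductivity limit taken from the results reported later in the text; the class, the field names, and the two example alloys are hypothetical, and the thermal expansion value is used in the units quoted above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AlloyProperties:
    solidus_c: float               # solidus temperature, deg C
    density_g_cc: float            # density, g/cc
    thermal_cond_w_mk: float       # thermal conductivity, W/m/K
    thermal_expansion_pct: float   # linear thermal expansion, units as quoted in the text
    solidification_range_k: float  # solidification range, K


def constraint_report(alloy: AlloyProperties) -> dict:
    """Evaluate each constraint separately so that failures can be traced."""
    return {
        "solidus": alloy.solidus_c > 2000.0,
        "density": alloy.density_g_cc < 9.0,
        "thermal_conductivity": alloy.thermal_cond_w_mk >= 20.0,
        "thermal_expansion": alloy.thermal_expansion_pct <= 2.0,
        "solidification_range": alloy.solidification_range_k <= 400.0,
    }


def is_feasible(alloy: AlloyProperties) -> bool:
    """An alloy is feasible only if all five constraints are satisfied."""
    return all(constraint_report(alloy).values())


# Invented property values, for illustration only.
light_alloy = AlloyProperties(2200.0, 8.5, 35.0, 1.2, 250.0)
heavy_alloy = AlloyProperties(2500.0, 12.0, 60.0, 0.9, 300.0)
print(is_feasible(light_alloy), is_feasible(heavy_alloy))
print(constraint_report(heavy_alloy))
```

In the framework itself these checks are expensive model evaluations whose outcomes are learned by classifiers; the explicit thresholds above only illustrate what a single labeled query returns.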

Deployment of framework

The proposed framework connects a Bayesian classification loop and a Bayesian optimization loop. The goal of the Bayesian classification loop is to actively learn the boundaries separating the feasible and infeasible regions, for which a binary classifier is a natural choice. A Bayesian approach to learning the boundaries requires classifiers capable of providing uncertainty for class membership predictions. Thus, Gaussian process classifiers are employed to represent design constraint boundaries. A formal way to make uncertainty a comparable quantity is to represent it as entropy. Active learning in the Bayesian classification framework is therefore done by attempting to reduce the entropy associated with the classifiers, where the entropy is obtained by inserting the prediction standard deviations provided by the Gaussian process classifiers for a set of designs into the Shannon entropy formula. Once the reduction in entropy drops below a threshold, the predicted feasible regions are passed to the Bayesian optimization loop in the form of feasible designs to be searched. The Bayesian optimization framework uses Gaussian process regressions (GPRs) to model objective functions and Expected Hypervolume Improvement (EHVI) as the acquisition function to suggest the most informative experiments to discover better approximations of the Pareto frontier. Note that the Bayesian classification loop runs in parallel to the Bayesian optimization loop in search of experiments that may significantly reduce entropy. Thus, the framework is capable of dynamically switching between both loops depending on the expected information gain calculations. The schematic of the framework is illustrated in Fig. 1. All codes will be made publicly available in a GitHub repository upon the end of the project.

Fig. 1: Schematic of the Bayesian optimization framework with active learning of the design constraints.

In every iteration of the framework, both Bayesian classification and Bayesian optimization loops run in parallel. The algorithm starts with Bayesian classification and switches to Bayesian optimization once the average reduction in entropy of all constraint models falls below a threshold. The framework switches back to Bayesian classification if a valuable experiment is suggested accordingly.
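The switching rule described in this caption can be sketched as follows. The sketch assumes that the switch is driven by the average relative entropy reduction across the constraint classifiers, compared against the 3% threshold reported in the results; how the expected information gains of the two loops are compared in the actual framework is not detailed here. Function names and entropy values are hypothetical.

```python
def average_entropy_reduction(previous, current):
    """Mean relative drop in entropy across all constraint classifiers."""
    if not previous or len(previous) != len(current):
        raise ValueError("need one entropy value per constraint, before and after")
    if any(p <= 0 for p in previous):
        raise ValueError("this sketch assumes strictly positive entropies")
    return sum((p - c) / p for p, c in zip(previous, current)) / len(previous)


def choose_stage(previous, current, threshold=0.03):
    """Pick the loop to run next from the latest entropy readings."""
    if average_entropy_reduction(previous, current) >= threshold:
        return "classification"  # constraint boundaries are still being learned
    return "optimization"        # entropy has flattened; search the feasible set


# Invented entropy values for five constraint classifiers.
early = choose_stage([1.0, 0.8, 0.9, 0.7, 0.6], [0.8, 0.7, 0.8, 0.6, 0.5])
late = choose_stage([0.50] * 5, [0.49] * 5)
print(early, late)
```

Because the rule is re-evaluated at every iteration, a later query that again reduces the entropy by more than the threshold would send the sketch back to the classification stage.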

Figure 2 illustrates the overall results of implementing the proposed framework to solve the 3-objective, 5-constraint design problem in this study. A total of 700 iterations were completed, and the Bayesian optimization stage was initiated after iteration 420, when all the average entropy reduction plots flattened and dropped below 3%. At the beginning of the Bayesian optimization stage, classifiers were used to filter the design space, discarding the infeasible regions first. As more queries were made to the objective functions, better estimations of the Pareto front were obtained, as indicated by improvements in the hypervolume value. Initially, larger improvements were observed. However, the improvements gradually decreased, indicating convergence to the optimal Pareto front. It is important to note that all queries were made around one corner of the objective space corresponding to the maximum values of each quantity of interest, which confirms that the framework effectively recognized the optimal design region and is searching that area to discover better non-dominated solutions.

Fig. 2: Overall results of the 5-constraint 3-objective material design problem.

The figure shows the application of the proposed framework to solve the problem. The process begins with learning the constraint boundaries by querying the constraints, effectively reducing the entropy associated with each classifier that represents a specific constraint. Once the entropy curves for all classifiers have flattened, Bayesian optimization begins to learn the non-dominated design region. As the estimations of the Pareto front improve, the hypervolume increases accordingly. The figure also includes an illustration of the objective space, showing all the queries to the ground truth model and the final estimation of the Pareto front.
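The relationship between the Pareto front and the hypervolume can be made concrete with a short sketch. It assumes that all objectives are maximized and estimates the dominated hypervolume by Monte Carlo sampling; it is not the EHVI acquisition function used in the framework, only the underlying quality metric. The function names, reference point, and objective values are hypothetical.

```python
import numpy as np


def pareto_mask(objectives):
    """Boolean mask of non-dominated rows, assuming every objective is maximized."""
    y = np.asarray(objectives, dtype=float)
    mask = np.ones(len(y), dtype=bool)
    for i in range(len(y)):
        # Row i is dominated if another row is >= everywhere and > somewhere.
        dominated_by = np.all(y >= y[i], axis=1) & np.any(y > y[i], axis=1)
        mask[i] = not dominated_by.any()
    return mask


def hypervolume_mc(objectives, reference, n_samples=100_000, seed=0):
    """Monte Carlo estimate of the hypervolume dominated by the points.

    Samples uniformly in the box between `reference` and the per-objective
    maximum, then scales the dominated fraction by the box volume.
    """
    y = np.asarray(objectives, dtype=float)
    ref = np.asarray(reference, dtype=float)
    upper = y.max(axis=0)
    if np.any(upper <= ref):
        raise ValueError("reference point must lie below the best value of each objective")
    rng = np.random.default_rng(seed)
    samples = rng.uniform(ref, upper, size=(n_samples, y.shape[1]))
    dominated = np.zeros(n_samples, dtype=bool)
    for point in y[pareto_mask(y)]:
        dominated |= np.all(samples <= point, axis=1)
    return dominated.mean() * np.prod(upper - ref)


# Invented objective values for three candidate designs.
points = [[1.0, 1.0, 1.0], [2.0, 0.5, 1.0], [0.5, 0.5, 0.5]]
print(pareto_mask(points))
print(hypervolume_mc(points, reference=[0.0, 0.0, 0.0]))
```

Adding a new non-dominated design can only enlarge the dominated region, which is why the hypervolume curve rises and then flattens as the Pareto front estimate converges.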

While the aforementioned results are obtained using a sequential approach during the Bayesian optimization stage, we also consider a batch Bayesian optimization approach. Since there is no change in the Bayesian classification stage and the related results, the batch process begins after the optimization stage is triggered. Employing the batch Bayesian optimization scheme enables the execution of 48 experiments in parallel. This is equivalent to processing a batch of 48 samples at every single iteration at no or low additional costs. While economies of scale are likely to be more modest in the context of actual physical experiments, in this computational study, the batch of 48 simultaneous calculations was executed at no additional cost (per sample).

By employing the batch Bayesian optimization scheme, the same hypervolume improvement is obtained in only 13 iterations, as shown by the comparison in Fig. 3. In contrast, 280 iterations were needed to explore the Pareto set using sequential MOBO. This corresponds to a 95% reduction in the time necessary to discover the Pareto set. While the total cost (in terms of supercomputing time) associated with the calculations was roughly the same in both cases, there is a significant opportunity cost incurred during sequential BO by not learning the Pareto set early enough. Assuming each iteration lasts one day, it is much more valuable to learn the design capabilities of an alloy system in just 2 weeks rather than 9 months. In this context, batch-based strategies can significantly reduce opportunity costs related to long development times.
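The time savings quoted above follow from simple arithmetic, reproduced in the sketch below. The one-day-per-iteration figure is the assumption made in the text; the function name and the 30.4-day average month are illustrative choices.

```python
def wall_time_reduction(sequential_iterations: int, batch_iterations: int) -> float:
    """Fractional reduction in wall time when each iteration takes equally long."""
    return 1.0 - batch_iterations / sequential_iterations


sequential_iterations = 280
batch_iterations = 13
days_per_iteration = 1.0  # assumption made in the text

reduction = wall_time_reduction(sequential_iterations, batch_iterations)
batch_weeks = batch_iterations * days_per_iteration / 7.0
sequential_months = sequential_iterations * days_per_iteration / 30.4

print(f"wall-time reduction: {reduction:.1%}")
print(f"batch: {batch_weeks:.1f} weeks, sequential: {sequential_months:.1f} months")
```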

Fig. 3: Comparison of hyper-volume improvements in batch and sequential Bayesian optimization.

Only 13 iterations are needed to reach the same Pareto front estimation quality as the sequential approach. Because 48 cores are accessible in our supercomputing system, 48 experiments can be run in parallel without additional wall-time in the batch Bayesian optimization case.

The fact that the batch and sequential BO schemes show the same hypervolume improvement at convergence means that they achieve a predicted Pareto set of similar quality. However, the non-dominated designs found (i.e., the alloys comprising the Pareto sets) may not necessarily be the same due to the high dimensionality of the input space and the stochastic nature of the BO process.

Regarding the discovery of constraint-satisfying candidate alloys, the UMAP (Uniform Manifold Approximation and Projection) plots in Fig. 4a–c show that alloys rich in Ti, Mo, and particularly W fail one or more of the five constraints; these alloys are depicted in gray. For a more quantitative view of this filtering process, in Fig. 4d, a Kernel Density Estimate (KDE) is fit over the frequency at which elements at various concentrations remain after filtering to visualize the chemical signature of the resultant feasible space. In these chemical signature plots, we see that the Ti and Mo signatures are slightly shifted back, indicating a slight depletion in these elements. On the other hand, the W signature is shifted back significantly, indicating that W-rich alloys fail at least one of the design constraints.

Fig. 4: Visualizations of constraint-passing and Pareto-optimal alloys.

a ROM Cauchy pressure (indicator of true objective) plotted over the design space. b ROM Pugh ratio (indicator of true objective) plotted over the design space. c Estimated yield strength from the Curtin–Maresca model plotted over the design space. d Chemical signature of the feasible chemical space. e Chemical signature of the Pareto-optimal set of alloys.

The optimization portion of the framework converged on 21 Pareto-optimal alloys. The best-performing alloys with regard to the ductility indicators are rich in Nb. This can be seen in the UMAP, where the Pareto-optimal alloys, represented by stars, are located near the Nb-rich corner of the diagram. On the other hand, Pareto-optimal alloys that perform the best with regards to the HT yield strength metric have higher W-content. Again, this can be seen in the UMAP, where Pareto-optimal alloys approach the W-rich corner of the diagram until reaching the border of the feasible region. Likewise, the alloys that strike a trade-off between these three objectives have a wide range of potential Nb and W contents. This range of Nb and W contents can be seen in the chemical signature of the Pareto set, where the chemical signature of these two elements has broad peaks. These alloys and their associated objective and constraint values are summarized in Table 2. We recommend further investigation of these 21 Pareto-optimal alloys to properly characterize their behavior in the context of GTE blade applications.

Table 2 The set of alloys that lie on the tri-objective strength-ductility Pareto-front identified by the proposed framework.


To benchmark the performance of the constraint-satisfaction aspect of the proposed framework, a factorial exploration of the space was performed. The information sources for the 5 constraints were queried at increments of 5 at%, considering binaries to quinaries, resulting in 10,626 queries of each model (53,130 queries in total). Using the proposed batch active learning of constraints, only 420 queries were required to learn the extent of the feasible design space, demonstrating the improved efficiency of the proposed framework over a brute-force approach, with a total reduction in effort of ~96%. Here we note that, while in this work the constraints were evaluated computationally at relatively modest cost, in a real physical setting such a reduction in effort would have a dramatic impact on the feasibility of experimental campaigns.
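The query counts in this benchmark can be reproduced with a few lines of arithmetic. The sketch assumes that the ~96% figure compares the 420 active-learning queries with the 10,626 grid points of the factorial design; this reading is an inference from the numbers rather than something stated explicitly, and the variable names are hypothetical.

```python
from math import comb

n_elements = 5
step_at_pct = 5
n_constraints = 5
active_learning_queries = 420

# Number of compositions on a 5 at% grid for a five-element system.
grid_points = comb(100 // step_at_pct + n_elements - 1, n_elements - 1)

# Brute force queries every constraint model at every grid point.
brute_force_queries = grid_points * n_constraints

# ASSUMPTION: the reduction is measured against the number of grid points.
effort_reduction = 1.0 - active_learning_queries / grid_points

print(grid_points, brute_force_queries, f"{effort_reduction:.1%}")
```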

We note that the classification of the feasible space has arrived at interpretable results. Regarding the solidus temperature, 85.96% of alloys pass the Ts ≥ 2000 °C constraint. Alloys that fail this constraint are rich in Ti and V. This is to be expected, as Ti and V are the least refractory elements comprising this design space. Most alloys in the space (99.46%) pass the thermal conductivity constraint κ ≥ 20 W/m/K. The few alloys that fail this constraint are, again, rich in Ti and V, the two elements with the lowest thermal conductivities. Again, this is likely because Ti and V are the least refractory elements in this design space. In addition to Ti-rich and V-rich alloys, compositionally complex alloys are also more likely to fail this constraint due to enhanced phonon and electron scattering, which decreases the thermal conductivity and puts a slight penalty on higher-entropy alloys.

All alloys in the Mo-Nb-Ti-V-W space pass the thermal expansion coefficient constraint CTE < 2% 1/K. Regarding solidification range, 97.12% of candidate alloys pass the ΔT ≤ 400 K constraint. Alloys that fail this constraint are rich in W and Ti. This is to be expected, as W and Ti have the largest difference in melting temperature, i.e., 3422 °C and 1668 °C, respectively. Furthermore, increased alloy complexity increases the solidification range, again putting a penalty on high-entropy alloys. Regarding density, 42.55% of alloys pass the ρ ≤ 9 g/cc constraint. Alloys rich in the three most refractory elements, W, Mo, and to a lesser extent Nb, fail this constraint. Figure 5 depicts a summary of this filtering.

Fig. 5: Pairwise plot demonstrating correlations and trade-offs between the 5 constraints applied to the design space.

Alloys that are composed of more than 50% of a particular element are colored accordingly. Alloys that do not have a majority element are colored in gray. Diagonal panels depict property distributions for each class of alloy. The lower-left triangle depicts kernel density estimates (KDEs) of the joint property distributions to better visualize the structure of the data.

Likewise, the optimization aspect of the framework has converged on results that can be understood using metallurgical intuition. The fact that Nb-rich alloys perform well concerning the ductility objectives agrees with other works where Nb is shown to enhance the ductility of otherwise brittle RHEAs30. The Pugh ratios and Cauchy pressures of these 21 Pareto-optimal alloys are on the order of 3.32 ± 0.266 and 93.1 ± 1.92 GPa, respectively. These values are comparable to those of the ductile refractory MPEAs TiHfVNbTa (B/G = 3.817, C12 − C44 = 75 GPa, ϵfrac = 12.6%)31 and NbMoTaWTi (B/G = 2.74, C12 − C44 = 73 GPa, ϵfrac = 13%)32. Regarding yield strength, increasing the W content within MPEAs has been shown to increase the yield strength of alloys33.

To further benchmark the performance of the optimization aspect of the proposed framework, we carried out a DFT analysis of the Pareto front. For example, in Fig. 6, we analyzed the effect of the Nb and V content in at.% (both elements are from the same group of the periodic table) on key DFT quantities such as formation energy (Eform), intrinsic strength, and Pugh’s ratio34.

Fig. 6: Phase stability, intrinsic-strength, and ductility.

a, b Formation energy plotted with respect to Mo+Nb and V concentration for 21 MPEAs in Table 2. c Intrinsic strength (bulk-moduli), and d Pugh’s ratio with respect to V + Nb concentration.

In Fig. 6a, b, we plot Eform with respect to (Mo + Nb) and V concentration, respectively. Increasing the combined Nb and Mo content increases the alloy stability, whereas increasing the V content destabilizes the BCC phase. We found that there is an optimal V (<50 at%) or Mo + Nb (>50 at%) concentration that stabilizes the alloy. On the temperature scale, 25 meV is equivalent to approximately 300 K (room temperature, RT; 27 °C), i.e., all predicted HEAs (except one) show RT stability.

The intrinsic strength (bulk modulus, B) and Pugh’s ratio (i.e., ductility indicator) in Fig. 6c, d show a strong correlation with V + Nb concentration for the predicted HEAs in Table 2. As seen in Fig. 6c, the intrinsic strength decreases sharply with increasing V + Nb concentration, while Pugh’s ratio (shown in Fig. 6d) increases. Alloys with G/B < 0.57 (equivalently, B/G > 1.75) are considered ductile based on Pugh’s criterion26. Furthermore, a good correlation is observed between framework-predicted properties in Fig. 4 and DFT calculations in Fig. 6 for increasing Nb composition. This correlation suggests the utility of such frameworks for reliable exploration and understanding of the strength-ductility trade-offs in HEAs.

In light of recent initiatives for ICME-enabled closed-loop design platforms and autonomous materials discovery, it is important to note that the methodology used in this work, while conducted in silico, can also be used to guide experimental exploration of design spaces. One possible approach would be to use computational models to initially reduce the design space by applying relaxed constraints, such as a predicted thermal conductivity greater than 10 W/(m·K), to eliminate candidates that are likely to fail one or more constraints. This initial filtering could then be followed by experimental campaigns that determine the true boundaries of the constraint-satisfying regions more accurately using stricter constraints, such as a thermal conductivity greater than 20 W/(m·K). A possible design of experiments could include using a dilatometer to measure the thermal expansion coefficient35, a densimeter to measure the density36, a laser flash apparatus to measure thermal conductivity37, and a high-temperature tensile testing rig to measure the yield strength and elongation at yield38. For constraints related to the solidus and solidification range, the design space could be reduced by relying on CALPHAD-based predictions, as it is currently not feasible to experimentally determine the melting temperature of such refractory alloys in an HTP manner. After reducing the design space, an experimental campaign could be undertaken to optimize simultaneously for strength and ductility. The proposed framework can be useful for autonomous and closed-loop material design campaigns, as depicted in Fig. 7.
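The two-stage screening idea described above can be sketched as follows, using the thermal conductivity thresholds quoted in the text. This is only an illustration: the function name, the dictionary of predicted values, and the `measure` callback standing in for an experiment are all hypothetical.

```python
from typing import Callable, Dict, List


def two_stage_screen(
    predicted_kappa: Dict[str, float],
    measure: Callable[[str], float],
    relaxed_min: float = 10.0,
    strict_min: float = 20.0,
) -> List[str]:
    """Screen candidates with a relaxed computational filter, then a strict measured one.

    predicted_kappa : model-predicted thermal conductivity per alloy, W/(m.K)
    measure         : hypothetical stand-in for an experimental measurement
    """
    # Stage 1: cheap model predictions discard clear failures.
    shortlist = [name for name, kappa in predicted_kappa.items() if kappa > relaxed_min]
    # Stage 2: only shortlisted alloys are measured against the stricter bound.
    return [name for name in shortlist if measure(name) > strict_min]
```

Only the shortlisted alloys incur the cost of a measurement, which is the point of relaxing the first-stage constraint.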

Fig. 7: Schematic representation of an experimental campaign utilizing the proposed framework.

The process begins with a closed-loop exploration of the design space to identify the range of compositions that meet all requirements. This forms the feasible region of the design space. Optimization is then carried out within this region to identify a set of Pareto-optimal alloys as the final outcome of the campaign.


In this study, we proposed and implemented an approach to solving constrained multi-objective design problems by deploying a Bayesian classification and optimization-based active learning strategy. The framework is capable of handling an arbitrary number of objectives and constraints. Moreover, the Bayesian classification scheme uses an entropy-based measure to select an optimal sequence of informative experiments. As a result, this approach can identify the feasible boundaries of the design space more efficiently than previous approaches21 by incorporating the uncertainty that Gaussian process classifiers provide for their class membership predictions. The advantage of our MOBO framework is that its Bayesian classification approach can handle any number of constraints and identify the feasible region for each constraint without spending a substantial computational budget on the training data needed to distinguish feasible from infeasible regions accurately. Since the models representing the constraints are not computationally cheap to evaluate, it is vital to manage the available resources and evaluate only the alloys whose observations are the most informative.

To determine the overall uncertainty of a classifier, the class memberships of a set of randomly generated samples are predicted. The predicted labels themselves are not used; instead, the uncertainty of the predictions, in the form of their standard deviations, is used to calculate the entropy. As the classifiers gain more information about the boundaries, the standard deviations get smaller, and so does the entropy. A user-defined criterion triggers the transition to the Bayesian optimization stage once all classifiers are sufficiently confident in their label predictions. The entropy data is noisy because a different set of samples is generated in the composition space at every iteration, which ensures that no part of the space is overlooked. For this reason, a window of 50 iterations is used to calculate the average reduction in entropy (over intervals of 25 iterations). In this work, a constraint is no longer considered among the possible experiments for the next step once this average drops below 3 percent.
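One possible reading of this criterion is sketched below. It assumes that the "average reduction" is the mean relative drop in entropy between iterations that are 25 steps apart, taken over the most recent 50 iterations; this interpretation, and the function names, are assumptions rather than the exact rule used in this work.

```python
import numpy as np


def average_entropy_reduction(entropy_history, window=50, lag=25):
    """Mean relative entropy drop between points `lag` iterations apart.

    Only the last `window` entries are used. Returns None when the
    history is still shorter than the window.
    """
    history = np.asarray(entropy_history, dtype=float)
    if history.size < window:
        return None
    recent = history[-window:]
    earlier, later = recent[:-lag], recent[lag:]
    return float(np.mean((earlier - later) / np.abs(earlier)))


def constraint_is_converged(entropy_history, threshold=0.03, window=50, lag=25):
    """True once the averaged reduction falls below the threshold (3 percent)."""
    reduction = average_entropy_reduction(entropy_history, window, lag)
    return reduction is not None and reduction < threshold
```

Averaging over the window smooths the iteration-to-iteration noise caused by resampling the composition space.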

Once all constraints meet the defined criterion, the Bayesian optimization stage begins. However, the framework still tracks the entropy values for all constraints at every iteration, so that if it finds an experiment of great value (i.e., the average entropy reduction rises above 3 percent), it can switch back to the classification stage and perform that experiment. This dynamic decision-making makes the framework capable of switching between the classification and optimization stages when necessary. Below, all the ingredients of this framework are introduced.

Gaussian process regression

Surrogate models are essential for a Bayesian optimization framework to model black-box functions, given prior observations made from these functions. Moreover, surrogate models make it possible to search the space at low computational costs, looking for the best next experiment that adds the most information about the optimum design to the system. This work uses GPRs as surrogates to model the objective functions39. Gaussian process models are powerful tools for probabilistic modeling due to the ease with which models can be updated with newly acquired information. Moreover, they provide probabilistic predictions that model the uncertainty associated with unobserved regions in a given design space. Finally, GPs are constructed with an intrinsic notion of distance (or correlation) between points in a design space. This correlation is exploited when predicting the model uncertainty.

Since more than one model may represent the same quantity of interest, each model needs its own GPR. These models are treated as different sources from which the system can gain the required information about a quantity of interest; such frameworks are known as multi-information source approaches. Following refs. 15,16, we formulate the surrogates (GPRs) by assuming we have available multiple information sources, fi(x), where i ∈ {1, 2, …, S}, to estimate a quantity of interest, f(x), at design point x. These surrogates are indicated by fGP,i(x). Assuming there are Ni evaluations of information source i denoted by \(\{{{{{\bf{X}}}}}_{{N}_{i}},{{{{\bf{y}}}}}_{{N}_{i}}\}\), where \({{{{\bf{X}}}}}_{{N}_{i}}=({{{{\bf{x}}}}}_{1,i},\ldots ,{{{{\bf{x}}}}}_{{N}_{i},i})\) represents the Ni input samples to information source i and \({{{{\bf{y}}}}}_{{N}_{i}}=\left({f}_{i}({{{{\bf{x}}}}}_{1,i}),\ldots ,{f}_{i}({{{{\bf{x}}}}}_{{N}_{i},i})\right)\) represents the corresponding outputs from information source i, then the posterior distribution of information source i at design point x is given as

$${f}_{{{{\rm{GP}}}},i}({{{\bf{x}}}})| {{{{\bf{X}}}}}_{{N}_{i}},{{{{\bf{y}}}}}_{{N}_{i}} \sim {{{\mathcal{N}}}}\left({\mu }_{i}({{{\bf{x}}}}),{\sigma }_{{{{\rm{GP}}}},i}^{2}({{{\bf{x}}}})\right)$$


$$\begin{array}{rlr}{\mu }_{i}({{{\bf{x}}}})&={K}_{i}{({{{{\bf{X}}}}}_{{N}_{i}},{{{\bf{x}}}})}^{T}{[{K}_{i}({{{{\bf{X}}}}}_{{N}_{i}},{{{{\bf{X}}}}}_{{N}_{i}})+{\sigma }_{n,i}^{2}I]}^{-1}{{{{\bf{y}}}}}_{{N}_{i}}&\\ {\sigma }_{{{{\rm{GP}}}},i}^{2}({{{\bf{x}}}})&={k}_{i}({{{\bf{x}}}},{{{\bf{x}}}})-{K}_{i}{({{{{\bf{X}}}}}_{{N}_{i}},{{{\bf{x}}}})}^{T}\\ &\quad{[{K}_{i}({{{{\bf{X}}}}}_{{N}_{i}},{{{{\bf{X}}}}}_{{N}_{i}})+{\sigma }_{n,i}^{2}I]}^{-1}{K}_{i}({{{{\bf{X}}}}}_{{N}_{i}},{{{\bf{x}}}})\end{array}$$

where ki is a real-valued kernel function, \({K}_{i}({{{{\bf{X}}}}}_{{N}_{i}},{{{{\bf{X}}}}}_{{N}_{i}})\) is the Ni × Ni matrix whose m, n entry is ki(xm,i, xn,i), and \({K}_{i}({{{{\bf{X}}}}}_{{N}_{i}},{{{\bf{x}}}})\) is the Ni × 1 vector whose mth entry is ki(xm,i, x) for information source i. We have also included the term \({\sigma }_{n,i}^{2}\), which is used to model observation error for information sources based on experiments or expert opinion. Note that the signal variance term covers two sources of uncertainty: the variance associated with the GPR estimate of the objective function and the variance of the information source with respect to the highest-fidelity model, also known as the ground truth.
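The posterior mean and variance above can be evaluated numerically as in the following minimal sketch for a single information source. The squared-exponential kernel, its hyperparameters, and the function names are assumptions made for illustration; the kernel actually used in this work is not specified in this section.

```python
import numpy as np


def rbf_kernel(a, b, length_scale=1.0, signal_var=1.0):
    """Squared-exponential kernel k(a, b); an assumed choice of k_i."""
    sq_dist = np.sum(a**2, axis=1)[:, None] + np.sum(b**2, axis=1)[None, :] - 2.0 * a @ b.T
    return signal_var * np.exp(-0.5 * np.maximum(sq_dist, 0.0) / length_scale**2)


def gp_posterior(x_train, y_train, x_query, noise_var=1e-6,
                 length_scale=1.0, signal_var=1.0):
    """Posterior mean and variance of a GP at the query points.

    x_train : (N, d) inputs, y_train : (N,) outputs, x_query : (M, d) inputs.
    """
    k_xx = rbf_kernel(x_train, x_train, length_scale, signal_var)
    k_xx = k_xx + noise_var * np.eye(len(x_train))            # K + sigma_n^2 I
    k_xq = rbf_kernel(x_train, x_query, length_scale, signal_var)
    k_qq = np.diag(rbf_kernel(x_query, x_query, length_scale, signal_var))
    chol = np.linalg.cholesky(k_xx)                            # stable inverse
    alpha = np.linalg.solve(chol.T, np.linalg.solve(chol, y_train))
    v = np.linalg.solve(chol, k_xq)
    mean = k_xq.T @ alpha                                      # mu(x)
    var = k_qq - np.sum(v**2, axis=0)                          # sigma_GP^2(x)
    return mean, np.maximum(var, 0.0)
```

The Cholesky factorization replaces the explicit matrix inverse in the equations; the two are algebraically equivalent, but the factorization is numerically better behaved.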

Gaussian process classification

In Bayesian classification frameworks, as in Bayesian optimization, Bayes’ theorem is employed, here to calculate the posterior probability p(y∣x) of a class label y given an input x:

$$p(y| {{{\bf{x}}}})=\frac{p(y)p({{{\bf{x}}}}| y)}{\mathop{\sum }\nolimits_{c = 1}^{C}p({C}_{c})p({{{\bf{x}}}}| {C}_{c})}$$

Gaussian process classifiers (GPCs) are probabilistic models that predict the probability of belonging to a specific class by placing a Gaussian process prior over a latent function f(X) and computing the posterior distribution at a desired location x39,40. GPCs are formulated similarly to GPRs but with labeled data, instead of a continuous objective value, as follows:

$$\begin{array}{ll}{\mu }_{i}({{{\bf{x}}}})={K}_{i}{({{{{\bf{X}}}}}_{{N}_{i}},{{{\bf{x}}}})}^{T}{[{K}_{i}({{{{\bf{X}}}}}_{{N}_{i}},{{{{\bf{X}}}}}_{{N}_{i}})]}^{-1}f({{{\bf{X}}}})\\ {{{\Sigma }}}_{i}({{{\bf{x}}}})={k}_{i}({{{\bf{x}}}},{{{\bf{x}}}})-{K}_{i}{({{{{\bf{X}}}}}_{{N}_{i}},{{{\bf{x}}}})}^{T}\\ \qquad\qquad {[{K}_{i}({{{{\bf{X}}}}}_{{N}_{i}},{{{{\bf{X}}}}}_{{N}_{i}})]}^{-1}{K}_{i}({{{{\bf{X}}}}}_{{N}_{i}},{{{\bf{x}}}})\end{array}$$

The class label predictions are obtained by performing Monte Carlo sampling from the calculated posterior distribution and then passing the samples through a sigmoid function σ so that the output is bounded to [0, 1]. The mean and variance of the resulting distribution then define the class membership probability and the uncertainty associated with the predicted label.

Because the methodology is Bayesian, the uncertainty in the predictions enters directly into the expected utility of a candidate experiment. This feature distinguishes Gaussian process classification (GPC), as a probabilistic model, from many other classification methods. As a result, GPC is particularly well suited to applications built on probabilistic frameworks. A more detailed discussion is presented in ref. 39.

Active learning in Bayesian classification

As mentioned earlier, GPCs are probabilistic models well-suited for Bayesian classification frameworks because they provide uncertainty associated with the predicted class memberships. The class membership predicted by a GPC of information source i is a random variable defined via a normal distribution \({\mathbb{Y}} \sim {{{\mathcal{N}}}}\left({p}_{i}({{{\bf{x}}}}),{\sigma }_{i}^{2}({{{\bf{x}}}})\right)\). A Bayesian classification framework aims to reduce the overall classifier’s uncertainty associated with class membership predictions. To further quantify the uncertainty of a classifier, a measure is needed to compare how newly added information to the system may help to achieve more accurate classifiers. Entropy is a natural choice here to determine the uncertainty of different models.

Herein, we propose to use the uncertainty, in the form of the standard deviation, assigned to the class membership predictions of a GPC. We then employ the discrete entropy formula to determine the entropy:

$$H=-\mathop{\sum }\limits_{j=1}^{k}{\sigma }_{j}\,\log ({\sigma }_{j})$$

where we have predicted the labels of k randomly generated samples, and σj is the standard deviation of the predicted class membership provided by the GPC. The more accurate a classifier is about the boundary, the less uncertain it will be about the assigned labels. Such a decrease in uncertainty is manifested as a lower model/classifier entropy. By employing the entropy measure as the utility function in a Bayesian classification framework, we can identify the next experiment that yields the largest reduction in a classifier’s entropy and update the system accordingly.
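A minimal sketch of this entropy measure is given below. The natural logarithm and the small clipping constant that guards against log(0) are assumptions; the text does not state the base of the logarithm.

```python
import numpy as np


def classifier_entropy(std_devs, eps=1e-12):
    """Entropy H = -sum_j sigma_j * log(sigma_j) over k random samples.

    std_devs : standard deviations of the predicted class memberships
    eps      : assumed lower clip so that log(0) is never evaluated
    """
    sigma = np.clip(np.asarray(std_devs, dtype=float), eps, None)
    return float(-np.sum(sigma * np.log(sigma)))
```

Note that −σ log σ decreases toward zero only for σ below 1/e ≈ 0.37, so the measure falls as the classifier becomes more confident once the standard deviations are in that range.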

Information fusion of multiple sources

Several approaches exist for fusing multiple sources of information, such as Bayesian model averaging41,42,43,44,45,46, the use of adjustment factors47,48,49,50, covariance intersection methods51, and fusion under known correlation52,53,54.

In engineering design, there often exist multiple models that represent the same system of interest. Each model provides valuable information about the quantity of interest. By combining all of this knowledge through a process known as model fusion, more accurate and less biased models can be produced. As more sources are incorporated into the fusion process, the variance of the estimates of the quantity of interest is commonly expected to decrease. However, this reduction is not guaranteed by fusion techniques such as Bayesian model averaging41,42,43,44,45,46,55, the use of adjustment factors47,48,49,50, or covariance intersection methods51; fusion under known correlation52,53,54 is the exception.

Unlike some multi-fidelity methods, our approach does not rely on any assumptions about the relative importance of the information sources. As a result, it is crucial to establish the correlation between information sources prior to the fusion process. We use a technique called the reification process to estimate the correlation coefficients between the different information sources56,57. In accordance with the methodology outlined in previous studies such as refs. 15,56,57,58, once the correlation coefficients are determined, the fused mean and variance at a specific design point x can be defined using the method proposed in ref. 54.

$${\mathbb{E}}[\hat{f}({{{\bf{x}}}})]=\frac{{{{{\bf{e}}}}}^{{{{\rm{T}}}}}\tilde{{{\Sigma }}}{({{{\bf{x}}}})}^{-1}{{{\boldsymbol{\mu }}}}({{{\bf{x}}}})}{{{{{\bf{e}}}}}^{{{{\rm{T}}}}}\tilde{{{\Sigma }}}{({{{\bf{x}}}})}^{-1}{{{\bf{e}}}}}$$
$${{{\rm{Var}}}}(\hat{f}({{{\bf{x}}}}))=\frac{1}{{{{{\bf{e}}}}}^{{{{\rm{T}}}}}\tilde{{{\Sigma }}}{({{{\bf{x}}}})}^{-1}{{{\bf{e}}}}}$$

where e = [1, …, 1]T, \({{{\boldsymbol{\mu }}}}({{{\bf{x}}}})={[{\mu }_{1}({{{\bf{x}}}}),\ldots ,{\mu }_{S}({{{\bf{x}}}})]}^{{{{\rm{T}}}}}\) given S models, and \(\tilde{{{\Sigma }}}{({{{\bf{x}}}})}^{-1}\) is the inverse of the covariance matrix between the information sources. A more detailed discussion and examples can be found in Refs. 12,14,16,56,59,60,61,62.
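The fused mean and variance can be computed directly from these two expressions, as in the sketch below. The function name and the construction of the covariance matrix from per-source standard deviations and a correlation matrix are illustrative assumptions; the reification step that estimates the correlations is not shown.

```python
import numpy as np


def fuse_sources(means, std_devs, correlation):
    """Fuse S correlated estimates of the same quantity at one design point.

    means       : (S,) source means mu_i(x)
    std_devs    : (S,) source standard deviations at x
    correlation : (S, S) correlation matrix between the sources at x
    """
    mu = np.asarray(means, dtype=float)
    sd = np.asarray(std_devs, dtype=float)
    cov = np.asarray(correlation, dtype=float) * np.outer(sd, sd)  # Sigma~(x)
    ones = np.ones_like(mu)                                        # e
    weights = np.linalg.solve(cov, ones)                           # Sigma~^{-1} e
    denom = ones @ weights                                         # e^T Sigma~^{-1} e
    fused_mean = (weights @ mu) / denom
    fused_var = 1.0 / denom
    return float(fused_mean), float(fused_var)
```

For uncorrelated sources this reduces to the familiar precision-weighted average, so two sources with means 1 and 3 and standard deviations 1 and 2 fuse to a mean of 1.4 with variance 0.8.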

Multi-objective optimization

A multi-objective optimization problem is defined as

$$\,{{\mbox{minimize}}}\,\,\,\,\{{f}_{1}({{{\bf{x}}}}),...,{f}_{n}({{{\bf{x}}}})\},{{{\bf{x}}}}\in {{{\mathcal{X}}}}$$

where f1(x), …, fn(x) are the objectives and \({{{\mathcal{X}}}}\) is the feasible design space. In multi-objective optimization, there is typically no single solution that simultaneously optimizes all objectives. Rather, the optimal solutions are represented by a set of non-dominated designs, which form the Pareto front in the objective space. In this context, an objective vector y dominates another vector y′, denoted as \({{{\bf{y}}}}\prec {{{{\bf{y}}}}}^{{\prime} }\), when y is no worse than y′ in every objective and strictly better in at least one. For a problem with n objectives, this can be expressed as

$$\left\{{{{\bf{y}}}}:{{{\bf{y}}}}=\left({y}_{1},{y}_{2},\ldots ,{y}_{n}\right),\ {y}_{i}\le {y}_{i}^{{\prime} }\ \ \forall \ i\in \{1,2,\ldots ,n\},\ \exists \ j\in \{1,2,\ldots ,n\}:{y}_{j} \,<\, {y}_{j}^{{\prime} }\right\}$$

where \({{{{\bf{y}}}}}^{{\prime} }=({y}_{1}^{{\prime} },{y}_{2}^{{\prime} },\ldots ,{y}_{n}^{{\prime} })\) denotes any other possible objective output. The set of \({{{\bf{y}}}}\in {{{\mathcal{Y}}}}\) that are not dominated by any other point, where \({{{\mathcal{Y}}}}\) is the objective space, is known as the Pareto front.
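The dominance relation and the resulting Pareto front can be illustrated with a short brute-force filter for minimization problems. The function names are illustrative, and the quadratic-time search is chosen for clarity rather than efficiency.

```python
import numpy as np


def dominates(y, y_prime):
    """True if y dominates y_prime: no worse everywhere, strictly better somewhere."""
    y, y_prime = np.asarray(y, dtype=float), np.asarray(y_prime, dtype=float)
    return bool(np.all(y <= y_prime) and np.any(y < y_prime))


def pareto_front(points):
    """Return the non-dominated rows of `points` (all objectives minimized)."""
    pts = np.asarray(points, dtype=float)
    keep = [
        i for i, candidate in enumerate(pts)
        if not any(dominates(other, candidate)
                   for j, other in enumerate(pts) if j != i)
    ]
    return pts[keep]
```

For the points (1, 4), (2, 2), (4, 1) and (3, 3), only (3, 3) is removed, because (2, 2) is better in both objectives.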

There are various techniques for estimating the Pareto front in multi-objective optimization problems, such as the weighted sum approach63, the adaptive weighted sum approach64, normal boundary intersection methods65 and hypervolume indicator methods66,67,68,69,70,71,72, among others. In Bayesian optimization frameworks, hypervolume indicator approaches are well suited to handling the probabilistic nature of these frameworks and to approximating the Pareto front of solutions efficiently. We adopt the methodology presented in refs. 17,73 to conduct Bayesian optimization of multi-objective functions in multi-fidelity settings. For a detailed explanation of the calculation of the expected hypervolume improvement (EHVI), we refer readers to ref. 9.
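To make the hypervolume indicator concrete, the sketch below computes the dominated hypervolume for two minimized objectives relative to a reference point, along with the deterministic improvement from adding one new point. It does not compute the EHVI itself, which is the expectation of this improvement under the surrogate’s predictive distribution; the function names and the two-objective restriction are assumptions made for brevity.

```python
def hypervolume_2d(front, reference):
    """Area dominated by `front` and bounded by `reference` (two minimized objectives)."""
    ref_1, ref_2 = reference
    # Keep only points that are better than the reference in both objectives.
    pts = sorted((float(a), float(b)) for a, b in front if a < ref_1 and b < ref_2)
    volume, best_f2 = 0.0, float(ref_2)
    for f1, f2 in pts:                      # sweep in order of increasing f1
        if f2 < best_f2:                    # point adds a new rectangular strip
            volume += (ref_1 - f1) * (best_f2 - f2)
            best_f2 = f2
    return volume


def hypervolume_improvement(front, new_point, reference):
    """Gain in dominated hypervolume from adding `new_point` to `front`."""
    extended = list(front) + [new_point]
    return hypervolume_2d(extended, reference) - hypervolume_2d(front, reference)
```

A candidate that is dominated by the current front yields zero improvement, which is why hypervolume-based acquisition functions favor points that extend or fill gaps in the front.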

Models for constraints

In this work, the truth models for all five constraints were derived from high-fidelity CALculation of PHase Diagrams (CALPHAD)-based simulations. The high-entropy alloy database TCHEA5 and Thermo-Calc’s equilibrium simulations were used to calculate the density, solidus, solidification range, CTE, and thermal conductivity. Specifically, the solidus and solidification range (the difference between the liquidus and solidus temperatures) were extracted from phase diagrams generated from CALPHAD models74. The coefficient of thermal expansion (CTE) was calculated by using Thermo-Calc to estimate the equation of state of the system at a given reference temperature T0 and a target temperature Tf, and then determining the amount of expansion between those two temperatures74. The density was calculated in the same manner. Finally, thermal conductivity was determined by querying fitted polynomials within the CALPHAD database. Where separate data for electronic and lattice thermal conductivities were not available, the thermal conductivity was estimated using the Slack model74,75,76 for lattice thermal conductivity and the Wiedemann–Franz law74,77 for electronic thermal conductivity, and the two estimates were summed to obtain the overall thermal conductivity74. Thermo-Calc’s Python API, TC-Python, was used to integrate these models with the proposed framework.
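The final summation step can be illustrated with the Wiedemann–Franz law, κe = L0·T/ρ, where ρ is the electrical resistivity. In the sketch below the lattice contribution is taken as an input rather than computed from the Slack model, and the Sommerfeld value of the Lorenz number is assumed; how Thermo-Calc evaluates these terms internally is not described here.

```python
LORENZ_NUMBER = 2.44e-8  # W.Ohm/K^2, Sommerfeld value (assumed)


def total_thermal_conductivity(lattice_kappa, resistivity_ohm_m, temperature_k):
    """Sum of lattice and electronic thermal conductivity in W/(m.K).

    lattice_kappa     : lattice contribution, e.g. from a Slack-type model
    resistivity_ohm_m : electrical resistivity in Ohm.m
    temperature_k     : absolute temperature in K
    """
    # Wiedemann-Franz law: kappa_e = L0 * T / rho
    electronic_kappa = LORENZ_NUMBER * temperature_k / resistivity_ohm_m
    return lattice_kappa + electronic_kappa
```

For instance, a resistivity of 2 × 10⁻⁷ Ω·m at 300 K gives an electronic contribution of 36.6 W/(m·K).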

Models for objectives

The yield strength objective in this study is modeled using the analytical framework proposed by Curtin and Maresca in ref. 22. This model considers the behavior of an edge dislocation within a random solute field present in a body-centered cubic high-entropy alloy. In order to minimize the energy associated with the dislocation in this random alloy, the dislocation adopts a wavy configuration. This allows the dislocation to avoid high-energy areas in the medium due to dislocation-solute interactions while being attracted to and pinned to areas with lower energy from such interactions. This wavy configuration results in an increased line tension, which represents the energy cost of this configuration. However, the characteristic waviness also minimizes the overall energy of the dislocation by simultaneously reducing the energy associated with the interaction between the edge dislocation and the solute field and the energy associated with the line tension of the edge dislocation. A statistical analysis of the energy barrier required for thermally activated edge glide was carried out, leading to the following equations:

$$\begin{array}{r}{\tau }_{y0}=0.040{\alpha }^{-1/3}\bar{\mu }{\left(\frac{1+\bar{\nu }}{1-\bar{\nu }}\right)}^{4/3}{\left[\frac{{\sum }_{n}{c}_{n}{{\Delta }}{{V}_{n}}^{2}}{{b}^{6}}\right]}^{2/3}\end{array}$$
$$\begin{array}{r}{{\Delta }}{E}_{b}=2.00{\alpha }^{1/3}\bar{\mu }{b}^{3}{\left(\frac{1+\bar{\nu }}{1-\bar{\nu }}\right)}^{2/3}{\left[\frac{{\sum }_{n}{c}_{n}{{\Delta }}{{V}_{n}}^{2}}{{b}^{6}}\right]}^{1/3}\end{array}$$
$$\begin{array}{r}{\tau }_{y}(T,\dot{\epsilon })={\tau }_{y0}\exp \left[-\frac{1}{0.55}{\left(\frac{{k}_{B}T}{{{\Delta }}{E}_{b}}\ln \frac{\dot{{\epsilon }_{0}}}{\dot{\epsilon }}\right)}^{0.91}\right]\end{array}$$
$$\begin{array}{r}{\sigma }_{y}(T,\dot{\epsilon })=M{\tau }_{y}(T,\dot{\epsilon })\end{array}$$

The variables in the above equations are as follows: α is the line tension parameter, set to 1/12 for edge dislocations; \(\bar{\mu }\) is the average shear modulus of the alloy; \(\bar{\nu }\) is the average Poisson ratio of the alloy; b is the Burgers vector associated with the BCC edge dislocation within the random alloy; cn is the concentration of the nth solute; ΔVn is the misfit volume of the nth solute, which can be estimated as ΔVn = Vn − ∑mcmVm according to Vegard’s law; \({\tau }_{y0}\) is the zero-temperature yield stress; ΔEb is the energy barrier for thermally activated flow; \(\dot{{\epsilon }_{0}}\) is the reference strain rate, which is typically set to \({10}^{4}\,{{{\rm{s}}}}^{-1}\); \(\dot{\epsilon }\) is the applied strain rate, which is typically set to \({10}^{3}\,{{{\rm{s}}}}^{-1}\) and is set to this value in the current work; M is the Taylor factor for edge glide in a random BCC polycrystal; kB is the Boltzmann constant; and \({\sigma }_{y}(T,\dot{\epsilon })\) is the yield strength estimated at a finite temperature T and strain rate \(\dot{\epsilon }\).
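A numerical sketch of this yield strength model is given below. It assumes the form of the model in ref. 22, in which the misfit term is normalized by b⁶, the energy barrier scales with b³, and the temperature dependence is exponential. The function and argument names are illustrative, the default Taylor factor of 3.067 is an assumed value that is not quoted in this section, and the strain rates must be supplied by the caller. All inputs are in SI units.

```python
import math

K_B = 1.380649e-23  # Boltzmann constant, J/K


def yield_strength(conc, volumes, burgers, shear_mod, poisson, temperature,
                   strain_rate, strain_rate_ref, alpha=1.0 / 12.0,
                   taylor_factor=3.067):
    """Yield strength (Pa) of a random BCC alloy from edge-dislocation theory.

    conc, volumes : atomic fractions and atomic volumes (m^3) per element
    burgers       : Burgers vector magnitude (m)
    shear_mod     : average shear modulus (Pa); poisson : average Poisson ratio
    taylor_factor : assumed default, not a value taken from the text
    """
    if strain_rate > strain_rate_ref:
        raise ValueError("strain_rate must not exceed strain_rate_ref")
    v_mean = sum(c * v for c, v in zip(conc, volumes))        # Vegard's law
    misfit = sum(c * (v - v_mean) ** 2 for c, v in zip(conc, volumes)) / burgers ** 6
    if misfit == 0.0:
        return 0.0                                            # no solute strengthening
    elastic = (1.0 + poisson) / (1.0 - poisson)
    tau_0 = 0.040 * alpha ** (-1 / 3) * shear_mod * elastic ** (4 / 3) * misfit ** (2 / 3)
    barrier = (2.00 * alpha ** (1 / 3) * shear_mod * burgers ** 3
               * elastic ** (2 / 3) * misfit ** (1 / 3))
    thermal = (K_B * temperature / barrier) * math.log(strain_rate_ref / strain_rate)
    tau = tau_0 * math.exp(-(thermal ** 0.91) / 0.55)
    return taylor_factor * tau
```

At T = 0 the exponential factor equals one, so the function returns M·τy0, and the predicted strength decreases monotonically as the temperature rises.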

In this study, we employed the DFT-based KKR (Korringa-Kohn-Rostoker Green’s function) method as the reference model for calculating key properties such as intrinsic strength and phase stability for arbitrary compositions. The method uses a coherent-potential approximation (CPA) to account for direct configurational averages over chemical disorder78,79. The gradient-corrected exchange-correlation functional provided by Perdew, Burke, and Ernzerhof (PBE)80 was used in the DFT-KKR-CPA calculations, which were used to obtain bulk moduli and derived quantities such as shear moduli and Pugh’s ratio. A 24 × 24 × 24 Monkhorst-Pack mesh was used for Brillouin zone integrations and core-electrons were treated relativistically, while valence-electrons were handled scalar-relativistically, with no spin-orbit coupling78. The Fermi energy was determined by integrating the complex Green’s function on a semicircular energy contour of 25 points on a Gauss-Chebyshev mesh78.

The mechanical properties (bulk and shear moduli) of down-selected compositions were calculated and assessed by designing Super-Cell Random Approximates (SCRAPs)34 and employing a computationally intensive stress-strain method as implemented within the DFT-based Vienna Ab-initio Simulation Package (VASP)81,82,83,84. The PBE generalized-gradient approximation (GGA) functional80 was employed for geometrical relaxations with total-energy and force convergence criteria of \({10}^{-6}\) eV and 0.01 eV/Å, respectively. The Brillouin-zone integration during ionic relaxation was performed on a 1 × 1 × 1 k-mesh, while the mechanical property calculations were performed on a 3 × 3 × 3 k-mesh, both generated using the Monkhorst-Pack method85 with a plane-wave cutoff energy of 520 eV. The effect of the core electrons and the interaction between the nuclei and the valence electrons were treated by the projector-augmented wave (PAW) method86.

Limitations of models

Regarding the accuracy of the yield strength truth model, in their original publication using a similar refractory MPEA system and its subsystems (Mo-Nb-Ta-V-W), Maresca et al.22 compared their model to experimental yield strength measurements captured over a range of temperatures. They determined that their model agrees with experiment well enough (MAE = 126 MPa, RMSE = 138 MPa from 800 K to 1800 K) to be used for HTP alloy design. Furthermore, the Curtin–Maresca model has been successfully used in alloy design87,88. Specifically, Rao et al.87 showed that the Curtin–Maresca model accurately (MAE = 167 MPa, RMSE = 209 MPa) predicts the temperature-dependent yield strength of 4 refractory MPEAs at 5 temperatures ranging from 25 °C to 1200 °C.

DFT is the truth model for the two ductility indicators in this work, the Pugh ratio and the Cauchy pressure. A large potential source of error in these calculations is the exchange-correlation functional. However, the exchange-correlation functional used in this work has been extensively tested in MPEA composition spaces21,89. Specifically, DFT-calculated elastic constants were consistently within 10% of experimental values89, which represents a smaller scatter than that of values calculated from elemental averages.

Regarding the accuracy of the constraint models, Thermo-Calc’s equilibrium simulations have successfully predicted phase stability90, solidus temperatures91, thermal conductivity92,93, and thermal expansion coefficients93,94. Specifically, Abu-Odeh et al.90 benchmarked the TCHEA1 database against experiments and found that phase predictions from Thermo-Calc’s equilibrium simulations were in 70.8% agreement with experimental data; the authors note that discrepancies between Thermo-Calc predictions and experiments may stem from experimental procedures, such as not allowing enough time for the alloys to reach thermodynamic equilibrium. In this work, the fifth iteration of this database is used, likely increasing the accuracy of the model. Regarding the solidus constraint, Kirk et al. demonstrated that Thermo-Calc equipped with the TCHEA4 database accurately predicts the melting temperature of MPEAs (MAE = 10.5 K), outperforming ROM predictions of solidus temperatures. Regarding thermal conductivity, preliminary work conducted within the BIRDSHOT collaboration2 (a project conducted for ARPA-e’s ULTIMATE program) indicates that Thermo-Calc’s property module equipped with the TCHEA5 database is able to accurately predict the thermal conductivity of BCC alloys (MAE = 14.9 W/m/K)95. As previously stated, when data for a particular system is sparse, Thermo-Calc’s thermal conductivity is informed by the Slack model96. As such, the model used for thermal conductivity is at least as accurate as the commonly used96 Slack model.