Introduction

A phase diagram serves as an essential tool in materials science, providing detailed mappings of various phases and their transformations on changes in thermodynamic variables such as temperature, pressure, and composition. Extensive research in materials science has led to the development of numerous phase diagrams for alloys and compounds1,2,3,4; the study of phase diagrams for magnetic structures is also prevalent in condensed-matter physics5,6,7. However, generating a phase diagram typically involves a multidimensional search space requiring extensive experiments or simulations, which can be resource-intensive in terms of time, cost, and human effort.

The advent of data-driven approaches in materials research8,9,10,11,12 has seen the emerging application of these methodologies to phase diagrams. Machine learning techniques enable the prediction of phase diagrams for previously unexplored materials based on existing phase diagrams, thereby circumventing the need for additional experiments or simulations. Applications of machine learning in this domain include predicting phase formation in high-entropy alloys13, stability of quasicrystals14, coexisting phases in ternary sections15, and phase boundaries in binary systems16. In condensed-matter physics, data-driven techniques have been employed to analyze simulation-based phase diagrams for studies on critical phenomena, including research on strongly correlated fermions17 and topological quantum systems18,19,20.

Developments in active learning for phase diagram analysis have led to methods where the algorithm proposes pivotal experiments for delineating a phase diagram. This process involves three iterative steps: (i) identifying the most informative experiments through a machine learning model, (ii) conducting these experiments, and (iii) retraining the machine learning model with the newly acquired data. In uncertainty sampling, the informative experiments are gauged by the uncertainty in predictions, with Gaussian process regression commonly used for uncertainty evaluation. While active learning approaches based on Gaussian process regression have been explored for phase diagrams21,22,23, adapting them for multiple discrete categories, typical in phase diagram investigations with various phase types, necessitates the implementation of an appropriate acquisition function.

To address the challenge of multiple discrete categories, an active learning method known as the PDC (Phase Diagram Construction) algorithm was developed24,25,26,27. This algorithm is based on the label propagation (LP) approach, a form of semi-supervised learning. Through LP, it becomes possible to determine the probabilities of unlabeled points belonging to each phase region. These probabilities are then used to assess uncertainty within the phase diagram, so that the most uncertain points can be selected as the most informative experiments. The efficacy of the PDC algorithm was evidenced by its ability to reduce the number of required experiments by 20% compared to random experimentation24.

The practical utility of the PDC algorithm was further validated in an experimental study focusing on the phase diagram for Zn–Sn–P film deposition using molecular beam epitaxy (MBE)28. Additionally, the algorithm has been applied to ascertain phase boundaries and property transitions. For instance, it facilitated the determination of temperature- and composition-dependent boundaries between the creep zone and the lower creep zone in cross-linked polymers29. Consequently, the PDC algorithm holds significant potential for broad application in both material development and fundamental scientific research, particularly where efficient investigation of boundaries across various categories is crucial. Through the development of the PDC algorithm and its application to experiments, we found that visualizing the phase diagrams predicted by the algorithm is important for understanding them. In particular, the two experimental phase diagrams constructed so far with the PDC algorithm28,29 were defined on two-dimensional spaces; constructing phase diagrams in a three-dimensional space, which is difficult to picture even for researchers familiar with phase diagrams, requires an appropriate visualization technique.

This paper reports the development of AIPHAD (Artificial Intelligence technique for PHAse Diagram), a web application based on the PDC algorithm, designed for the investigation and visual understanding of phase diagrams. It is accessible at https://aiphad.org/. AIPHAD streamlines the visualization of key experimental proposals, maps of uncertainty, and the estimated phase diagrams. The application encompasses five types of phase diagrams: (i) two-variable diagrams; (ii) three-variable diagrams; (iii) ternary sections; (iv) ternary phase diagrams; and (v) quaternary sections. Additionally, a Python version of AIPHAD is available on GitHub30. The utility of AIPHAD is illustrated through its application in the study of Fe-Ti-Sn ternary phase diagrams, which are known to contain Heusler compounds31,32. The emergence of the Heusler phase is significant, as it is associated with vital electronic and magnetic properties for practical applications33,34. AIPHAD’s capability to create and verify estimated phase diagrams is successfully demonstrated. In developing AIPHAD, we focused on providing a framework as a web application that can be easily used by researchers and engineers who are not familiar with programming. We hope that our user-friendly application will contribute to the efficient construction of phase diagrams.

The structure of this paper is as follows. The Methods section reviews the PDC algorithm and explains the usage of the AIPHAD web application and its Python version. The Results section presents the experimental findings obtained for the Fe-Ti-Sn system using the AIPHAD web application.

Methods

Review of the PDC algorithm

The PDC algorithm commences with the discretization of the phase diagram and an initial setting. The estimation of phase regions and the computation of their uncertainties are conducted using machine learning techniques. Based on these estimations, informative experiments are suggested, which are then conducted to identify phase information. By iteratively executing these steps, phase diagrams are derived from a limited number of experiments, as illustrated in Fig. 1. Incorporating thermodynamics into this closed-loop investigation further accelerates the process. The details of each step are shown below.

Fig. 1: Flowchart illustrating the phase diagram construction using AIPHAD.

The diagram shows the closed-loop process, which includes phase estimation, uncertainty score calculation, and conducting experiments.

Initial setting

The process begins by setting up the space for the phase diagram, with each dimension discretized into candidate points for experiments (Fig. 1). For a phase diagram of dimension \(d\), the discretized position vector is represented as \({{{\bf{x}}}} \in {R}^{d}\); \({X=\left\{{{{{\bf{x}}}}}_{i}\right\}}_{i=1,\ldots ,N}\) denotes the dataset comprising \(N\) candidate points. An initial training dataset of \(M\) points, known as labeled data, is prepared from these candidate points, based on completed experiments. This initial dataset is either derived from pre-existing data or generated from preliminary experiments using random sampling. The indices of the labeled data points are denoted as \({\{{l}_{j}\}}_{j=1,\ldots ,M}\). The remaining indices, \(i=1,\ldots ,N\) excluding \({\{{l}_{j}\}}_{j=1,\ldots ,M}\) correspond to unlabeled data points. For the labeled data, the experimentally determined categories within the phase diagram, such as phase names, coexisting phase names, and regions with large or small properties, are known. In the AIPHAD implementation, single phases and coexisting phases have to be categorized as distinct “phases.” For simplicity, all categories in the phase diagram are referred to as “phase.” Each category in the initial dataset is assigned an integer index from \(L=\{1,\ldots ,C\}\) when there are \(C\) categories. This index serves as a label for the labeled data points, denoted as \({y}_{{l}_{j}}\in L\) for \(j=1,\ldots ,M\).
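
The initial setting can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the helper `make_grid`, the axis ranges, and the phase names are hypothetical and are not part of AIPHAD. It builds the candidate set \(X\), assigns an integer index from \(L=\{1,\ldots ,C\}\) to each observed category, and marks unlabeled points with −1, the convention used by the AIPHAD Python version.

```python
from itertools import product


def make_grid(ranges):
    """Return all candidate points for axes given as (start, stop, step) tuples."""
    axes = []
    for start, stop, step in ranges:
        n_steps = int(round((stop - start) / step)) + 1
        axes.append([start + k * step for k in range(n_steps)])
    return [list(point) for point in product(*axes)]


# Hypothetical two-variable diagram: temperature (deg C) and composition (%).
X = make_grid([(700, 1000, 100), (0, 100, 25)])  # N = 4 * 5 = 20 candidates

# Hypothetical completed experiments: candidate index -> observed category.
observed = {0: "alpha", 4: "beta", 19: "alpha+beta"}

# Every distinct category, including a coexisting-phase region, gets its own label.
categories = sorted(set(observed.values()))
label_of = {name: k + 1 for k, name in enumerate(categories)}  # L = {1, ..., C}

# Label vector: phase index for labeled points, -1 for unlabeled points.
y = [-1] * len(X)
for index, name in observed.items():
    y[index] = label_of[name]
```

Here the coexisting region "alpha+beta" is treated as its own category, in line with the requirement that single phases and coexisting phases be categorized as distinct “phases.”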

Phase estimation

Phase estimation for unlabeled data within the PDC algorithm employs machine learning techniques, specifically LP and label spreading (LS). These methods function as follows:

1. Label propagation (LP): In LP, the labels of the labeled data \(\{y_{{l}_{j}}\}_{j=1,\ldots ,M}\) are propagated across the dataset \(X\), estimating the probabilities of each unlabeled data point belonging to various labels. This process begins by constructing a fully connected graph for \(X\). The weight \({w}_{{ij}}\) for the edge connecting the \(i\) th and \(j\) th data points in this graph is defined using the RBF kernel as

$${w}_{{ij}}=\exp \left(-\gamma {\left|{{\bf{x}}}_{i}-{{\bf{x}}}_{j}\right|}^{2}\right),$$
(1)

where \(\gamma\) is a hyperparameter, set to \(\gamma =20\) in the AIPHAD web application, following the default value in the Scikit-learn package35. This value can be adjusted in the Python software. Using the weights \({w}_{{ij}}\), the transition matrix \(T\) on the graph is defined, with each element \({t}_{{ij}}\) representing the transition probability from the \(j\) th to the \(i\) th data point, expressed as

$${t}_{{ij}}=\frac{{w}_{{ij}}}{{\sum }_{k=1}^{N}{w}_{{kj}}}.$$
(2)

In the subsequent step, a vector \({{{{\bf{p}}}}}_{i}\in {R}^{C}\) is prepared for each data point; its elements represent the probabilities that the \(i\) th data point belongs to each phase in \(L\). The probability matrix \(P\), whose rows are \(\left\{{{\bf{p}}}_i\right\}_{i=1,\ldots ,N}\), is then defined. The initial state of \(P\) is prepared as follows: for each labeled data point in \({\{{l}_{j}\}}_{j=1,\ldots ,M}\), the element corresponding to the label \({y}_{{l}_{j}}\) is 1 and all other elements of \({{{{\bf{p}}}}}_{i}\) are 0. For the unlabeled data points, all elements of \({{{{\bf{p}}}}}_{i}\) are set to 0. The probability matrix \(P\) is then updated using the transition matrix \(T\) through the following steps:

  (i) Update operation: Apply the operation \(P\leftarrow {TP}\).

  (ii-1) Normalization: Normalize each \({{{{\bf{p}}}}}_{i}\) such that the sum of the elements becomes 1 for unlabeled data.

  (ii-2) Normalization: Return \({{{{\bf{p}}}}}_{i}\) for the labeled data in \({\{{l}_{j}\}}_{j=1,\ldots ,M}\) to the initial state; that is, the element corresponding to the label \({y}_{{l}_{j}}\) is 1 and the other elements are 0.

  (iii) Convergence check: Repeat steps (i) and (ii) iteratively until the \({{{{\bf{p}}}}}_{i}\) reach convergence.

Upon convergence, each vector \({{{{\bf{p}}}}}_{i}\) represents the final probability distribution across different phase regions for the corresponding data point. Notably, in the LP method, due to the resetting mechanism in step (ii-2), the probabilities for labeled data points remain consistent with their initial values, ensuring that their original labels \(\{{y}_{{l}_{j}}\}_{j=1,\ldots ,M}\) are preserved throughout the process.

2. Label spreading (LS): The LS method is similar to the LP method, but the labels of the labeled data are allowed to change from \(\{{y}_{{l}_{j}}\}_{j=1,\ldots ,M}\), which makes the method more robust to noise in the labeled data. As in the LP method, a fully connected graph is prepared for \(X\), and the weight \({w}_{{ij}}\) for the edge between the \(i\) th and \(j\) th data points is calculated using Eq. (1) when \(i\ne j\). For \(i=j\), in contrast to the LP method, \({w}_{{ij}}=0\). Using these \({w}_{{ij}}\), the transition matrix \(T\) is defined by Eq. (2). The probability matrix \(P\) is prepared in the same way as for the LP method, and its initial value is denoted \({P}_{0}\). The probability matrix \(P\) is propagated using the transition matrix \(T\) as follows:

  (i) Update operation: Apply the operation \(P\leftarrow \alpha {TP}+(1-\alpha ){P}_{0}\), where \(0 < \alpha < 1\) signifies the likelihood of changing the label of labeled data, functioning as a hyperparameter in the LS method.

  (ii) Normalization: Normalize each \({{{{\bf{p}}}}}_{i}\) such that the sum of its elements equals 1, applicable across all data points.

  (iii) Convergence check: Repeat steps (i) and (ii) iteratively until the \({{{{\bf{p}}}}}_{i}\) reach convergence.

In the AIPHAD web application, \(\alpha\) is preset to 0.2, aligning with the default in the Scikit-learn package. However, users have the option to adjust this value in the Python software to fine-tune the LS process according to specific dataset characteristics or objectives.

Phase diagrams are predicted using the derived \({{{{\bf{p}}}}}_{i}\) vectors. For the \(i\) th data point, the label \({y}_{i}={{\arg }}{\max }_{L}{{{{\bf{p}}}}}_{i}\) denotes the predicted phase region.
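
The LP and LS updates can be written out directly from Eqs. (1) and (2) and the steps listed for each method. The following is a minimal pure-Python sketch, not the AIPHAD or Scikit-learn implementation: the function name `propagate_labels`, the tolerance, the iteration cap, and the toy dataset are assumptions made for illustration, and labels are 0-based here for indexing convenience, whereas the text uses \(L=\{1,\ldots ,C\}\).

```python
import math


def propagate_labels(X, y, n_classes, method="LP", gamma=20.0, alpha=0.2,
                     tol=1e-9, max_iter=10000):
    """Return the N x C probability matrix P; y[i] is 0..C-1, or -1 if unlabeled."""
    n = len(X)
    spreading = method == "LS"
    # Eq. (1): RBF weights on the fully connected graph; w_ii = 0 for LS only.
    w = [[0.0 if (spreading and i == j) else
          math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(X[i], X[j])))
          for j in range(n)] for i in range(n)]
    # Eq. (2): t_ij = w_ij / sum_k w_kj.
    col = [sum(w[k][j] for k in range(n)) for j in range(n)]
    t = [[w[i][j] / col[j] for j in range(n)] for i in range(n)]
    # P0: one-hot rows for labeled points, zero rows for unlabeled points.
    p0 = [[1.0 if y[i] == c else 0.0 for c in range(n_classes)] for i in range(n)]
    p = [row[:] for row in p0]
    for _ in range(max_iter):
        # Step (i): P <- T P (LP) or P <- alpha T P + (1 - alpha) P0 (LS).
        new = [[sum(t[i][j] * p[j][c] for j in range(n)) for c in range(n_classes)]
               for i in range(n)]
        if spreading:
            new = [[alpha * new[i][c] + (1.0 - alpha) * p0[i][c]
                    for c in range(n_classes)] for i in range(n)]
        for i in range(n):
            if not spreading and y[i] >= 0:
                new[i] = p0[i][:]  # LP step (ii-2): reset labeled rows
            else:
                total = sum(new[i])  # step (ii): normalize the row to sum to 1
                if total > 0.0:
                    new[i] = [v / total for v in new[i]]
        # Step (iii): stop once the largest element-wise change is below tol.
        delta = max(abs(new[i][c] - p[i][c])
                    for i in range(n) for c in range(n_classes))
        p = new
        if delta < tol:
            break
    return p


# Toy one-dimensional diagram: 11 candidates, one labeled point at each end.
X = [[0.1 * i] for i in range(11)]
y = [0] + [-1] * 9 + [1]
P_lp = propagate_labels(X, y, n_classes=2, method="LP")
P_ls = propagate_labels(X, y, n_classes=2, method="LS")
pred_lp = [row.index(max(row)) for row in P_lp]  # y_i = argmax over p_i
pred_ls = [row.index(max(row)) for row in P_ls]
```

In this toy case LP leaves the two labeled rows exactly one-hot, while LS lets a small amount of probability leak to the other phase at the labeled points, which is the behavior that makes LS more tolerant of noisy labels.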

Uncertainty sampling

The uncertainty map is generated from the obtained \({{{{\bf{p}}}}}_{i}\) vectors, which contain probabilities of belonging to each phase region \(\in L\) for the \(i\) th data point. In uncertainty sampling, the most uncertain point in dataset \(X\) is selected for informative experiments. To quantify uncertainty, three types of uncertainty scores are commonly used, each a function of the data point \({{{\bf{x}}}}\). These scores are defined as follows:

$${u}^{{{{\rm{LC}}}}}\left({{{\bf{x}}}}\right)=1-P({k}_{1}{{{\rm{|}}}}{{{\bf{x}}}}),$$
(3)
$${u}^{{{{\rm{MS}}}}}\left({{{\bf{x}}}}\right)=1-\left[P\left({k}_{1} | {{{\bf{x}}}}\right)-P\left({k}_{2} | {{{\bf{x}}}}\right)\right],$$
(4)
$${u}^{{{{\rm{EA}}}}}\left({{{\bf{x}}}}\right)=-{\sum }_{k=1}^{C}P(k{{{\rm{|}}}}{{{\bf{x}}}})\log P(k{{{\rm{|}}}}{{{\bf{x}}}}),$$
(5)

where elements of \({{{{\bf{p}}}}}_{i}\) are defined as \(P({k}|{{\bf{x}}}_{i})\) with \(k\in L\). The indices \({k}_{1}\) and \({k}_{2}\) represent the elements with the highest- and second-highest values of \(P({k}|{{\bf{x}}})\) respectively. The uncertainty in the phase diagram is quantified using three different methods, as outlined in Eqs. (3)–(5): the least confident (LC), margin sampling (MS), and entropy-based approach (EA). These methods determine the most uncertain point, denoted as \({{{{\bf{x}}}}}^{* }\), which is proposed for conducting the most informative experiment as

$${{{{\bf{x}}}}}^{* }={{\arg }}{\max }_{X}u({{{\bf{x}}}}).$$
(6)

An experiment conducted at \({{{{\bf{x}}}}}^{* }\) identifies the phase at that point and increases the number of labeled data points. This process is pivotal for refining the phase diagram and enhancing the accuracy of the machine learning model.
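
As a small worked illustration of Eqs. (3)–(6), the sketch below evaluates the three scores for a few probability vectors and picks the most uncertain one. The function names and the example probabilities are hypothetical; in practice the rows would be the \({{{{\bf{p}}}}}_{i}\) vectors of the unlabeled points.

```python
import math


def uncertainty(p, method="LC"):
    """Uncertainty score of one probability vector p (Eqs. 3-5)."""
    top = sorted(p, reverse=True)
    if method == "LC":  # least confident, Eq. (3)
        return 1.0 - top[0]
    if method == "MS":  # margin sampling, Eq. (4)
        return 1.0 - (top[0] - top[1])
    if method == "EA":  # entropy-based approach, Eq. (5); 0 log 0 is taken as 0
        return -sum(q * math.log(q) for q in p if q > 0.0)
    raise ValueError("method must be 'LC', 'MS', or 'EA'")


def most_uncertain(P, method="LC"):
    """Index of the row with the highest uncertainty score (Eq. 6)."""
    scores = [uncertainty(p, method) for p in P]
    return max(range(len(P)), key=scores.__getitem__)


# Hypothetical probabilities for three unlabeled points and three phases.
P = [[0.9, 0.1, 0.0],
     [0.5, 0.3, 0.2],
     [0.4, 0.4, 0.2]]
best = {m: most_uncertain(P, m) for m in ("LC", "MS", "EA")}
```

For these particular vectors all three scores select the third point, but in general the scores can rank points differently.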

In scenarios where experiments are conducted in parallel, the methodology requires multiple suggestions. As discussed in ref. 27, two straightforward but effective methods for selecting multiple candidates have been identified:

1. Only the uncertainty score (US) ranking: This approach involves selecting multiple candidates based on their descending order of uncertainty scores.

2. Neighbor exclusion method: In this method, multiple candidates are also selected based on their descending order of uncertainty scores. However, the neighboring points of the selected candidates are excluded to ensure diversity in the selection. This method incorporates a hyperparameter \(K\), which determines the extent of exclusion. Data points that are closer than the \(K\) th nearest neighbor points are not included in the selection.

These strategies are crucial for efficiently exploring the phase diagram space, particularly when aiming to maximize the information gained from parallel experiments.
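
The two selection strategies can be sketched as follows. This is an illustration, not the AIPHAD code: the function name is hypothetical, and neighbor exclusion is implemented under the assumed reading that the \(K\) nearest neighbors of each selected point (by Euclidean distance) are removed from further consideration.

```python
def select_candidates(X, scores, n_proposals, method="OU", k=1):
    """Pick n_proposals indices by uncertainty score, optionally excluding neighbors."""
    order = sorted(range(len(X)), key=lambda i: scores[i], reverse=True)
    if method == "OU":  # only the uncertainty-score ranking
        return order[:n_proposals]
    selected, excluded = [], set()
    for i in order:  # neighbor exclusion
        if len(selected) == n_proposals:
            break
        if i in excluded:
            continue
        selected.append(i)
        # Assumed interpretation: drop the k nearest neighbors of the selected point.
        others = sorted((j for j in range(len(X)) if j != i),
                        key=lambda j: sum((a - b) ** 2 for a, b in zip(X[i], X[j])))
        excluded.update(others[:k])
    return selected


# Hypothetical one-dimensional candidates with their uncertainty scores.
X = [[0], [1], [2], [3], [4]]
scores = [0.9, 0.8, 0.1, 0.7, 0.6]
only_us = select_candidates(X, scores, 2, method="OU")
with_ne = select_candidates(X, scores, 2, method="NE", k=1)
```

With the ranking alone, the two proposals are adjacent points; with neighbor exclusion, the second proposal moves to a different part of the search space.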

Using thermodynamic considerations

Reference 26 presents a study focusing on optimizing the investigation of phase diagrams through thermodynamic considerations. While utilizing the same algorithm as described earlier, this approach demonstrates that incorporating information about coexisting phases and the phase rule can lead to more efficient construction of phase diagrams. This methodology involves two key strategies:

1. Utilization of coexisting phases: When coexisting phases are identified through a proposed experiment, that single experiment generates a substantial volume of labeled data. For instance, if two coexisting phases are discovered, the tie line’s endpoints indicate single phases, while points on the tie line represent the two-phase region. This information, when used as labeled data, enriches the machine learning model with extensive phase-related details from a single experiment.

2. Application of the Gibbs phase rule: The Gibbs phase rule serves as a tool to streamline the search process by excluding specific regions from the search space. In a ternary phase diagram, for instance, if three coexisting phases are identified, they form a triangle devoid of any other phases. Consequently, this area can be excluded from further investigation. This exclusion significantly reduces the number of candidate points, thereby enhancing the efficiency of phase diagram determination.

The PDC algorithm remains applicable, with the modification that data points deemed unnecessary for search are omitted from the dataset \(X\). This strategic approach not only optimizes the phase diagram exploration process but also maximizes the information extracted from each experimental result, contributing significantly to the advancement of materials science research.
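
As an illustration of the second strategy, the sketch below removes candidate points that lie strictly inside a three-phase triangle of a ternary section. It is a simplified example under stated assumptions, not the procedure of ref. 26: the function names, the 10% grid, and the triangle vertices are hypothetical, and points on the triangle edges and vertices are kept.

```python
def cross(o, a, b):
    """z-component of (a - o) x (b - o), using the first two composition axes."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def strictly_inside(p, triangle, eps=1e-9):
    """True if p lies strictly inside the triangle given by three vertices."""
    a, b, c = triangle
    d1, d2, d3 = cross(a, b, p), cross(b, c, p), cross(c, a, p)
    return ((d1 > eps and d2 > eps and d3 > eps) or
            (d1 < -eps and d2 < -eps and d3 < -eps))


def exclude_three_phase_region(X, triangle):
    """Drop candidates inside a three-phase triangle from the search space."""
    return [x for x in X if not strictly_inside(x, triangle)]


# Hypothetical ternary section in 10% steps; each point is (A, B, C) in percent.
X = [(a, b, 100 - a - b) for a in range(0, 101, 10) for b in range(0, 101 - a, 10)]

# Hypothetical compositions of three coexisting phases.
triangle = [(20, 20, 60), (60, 20, 20), (20, 60, 20)]
remaining = exclude_three_phase_region(X, triangle)
```

Because the third composition is fixed by the other two, the test can be carried out on the first two coordinates alone.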

Usage of the AIPHAD web application version

The AIPHAD web application can be used to investigate five types of phase diagrams: (i) two-variable diagrams, (ii) three-variable diagrams, (iii) ternary sections, (iv) ternary phase diagrams, and (v) quaternary sections. These diagrams are represented in either two- or three-dimensional spaces. The following steps outline the procedure for deriving phase diagrams, as depicted in Fig. 2:

Fig. 2: Control panel of the AIPHAD web application.

A detailed procedure for investigating phase diagrams using the AIPHAD web application is presented, outlining each step in the process.

1. Defining the search space: In the “Search Space” menu, users specify axis names, parameter ranges, and step sizes.

2. Inputting phase information: Experiments are labeled at specific points on the phase diagram. Users can select points directly on the diagram or input them in the “Data Table” menu. The application supports both numerical and textual input for phase names. Unnecessary unlabeled points can be excluded using the “Delete Mode” in the “Data Table” menu.

3. Selecting active learning conditions: The “Proposal Method” menu allows users to choose between LP (Label Propagation) and LS (Label Spreading) as the estimation method, and LC (Least Confident), MS (Margin Sampling), EA (Entropy-based Approach), and RS (Random Sampling) as the sampling method. If “RS” is selected, candidates are proposed randomly from unlabeled data. The number of candidates proposed for informative experiments is determined based on the “only US ranking” strategy. The “Neighbor exclusion” method can be used to remove points close to proposed candidates. Default values for hyperparameters \(\gamma\) and \(\alpha\) are set according to Scikit-learn.

4. Running calculations: Clicking the “Run” button initiates calculations, with candidate points for informative experiments displayed in both the phase diagram and the “Data Table.” Additionally, a map of the uncertainty score and the estimated phase diagram are shown. Labeled data do not appear on the uncertainty map. While LS may alter the labels of labeled data, the phase information displayed in the phase diagram remains consistent with the input information.

5. Viewing phase probabilities: The “Probability” menu ranks unlabeled points in descending order of probability for the selected phase. Probabilities are evaluated using the chosen phase estimation method.

This detailed procedure enables users to efficiently explore and analyze phase diagrams, significantly aiding in the understanding and advancement of materials research.

Usage of the AIPHAD Python version

The AIPHAD Python manual, accessible at https://nims-da.github.io/aiphad/docs/en/index.html, provides comprehensive guidance on its usage. Below is a basic overview of utilizing AIPHAD in Python.

Install

AIPHAD is developed in Python3 (requires version 3.6 or higher) and can be installed via PyPI as follows:

$ pip3 install aiphad

Single suggestion

The program outlined in Scheme 1 describes the fundamental steps for using the AIPHAD Python package for phase diagram estimation and uncertainty sampling. The program flow can be summarized as follows:

Scheme 1

A basic Python program for AIPHAD. Uncertainty sampling with a single proposal using AIPHAD.

1. Import libraries: Initially, the ‘pdc_sampler’ from AIPHAD and ‘numpy’ are imported into the Python environment. This step prepares the necessary functions and data structures for phase diagram estimation.

2. Specify parameters for ‘pdc_sampler’:

  • ‘estimation’: Choose between “LP” or “LS” as the method for phase diagram estimation.

  • ‘sampling’: Select an uncertainty score from “LC”, “MS”, “EA”, or “RS”.

  • ‘proposal’: Define the number of proposals as an integer, which determines how many points will be suggested for experimental investigation.

3. Prepare dataset:

  • ‘X’: A list representing all candidate points in the discretized phase diagram. The method can handle datasets with arbitrary dimensions.

  • ‘y’: A one-dimensional list corresponding to ‘X’ that contains the label data. Each data point where the phase is already known is assigned a phase index from the set \(L=\{1,\ldots ,C\}\). For unlabeled data points, an index of −1 is used.

4. Phase diagram estimation: Utilize ‘pdc.fit(X, y)’ to estimate the phase diagram using the chosen LP or LS method on the input data arrays.

5. Uncertainty sampling: Uncertainty scores are calculated for the candidate points using ‘pdc.us()’. The indices of candidate points with the highest uncertainty scores are stored in ‘pdc.proposals’, and their corresponding position vectors (\({{{\bf{x}}}}{{\in }}{R}^{d}\)) are in ‘pdc.proposals_X’. The uncertainty scores of the selected points can be accessed using ‘pdc.proposals_us’.
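
The five steps above can be put together roughly as follows. This is a hedged sketch rather than a reproduction of Scheme 1: the argument and attribute names are those given in the steps, but the import path `from aiphad import pdc_sampler` is an assumption inferred from the package name and should be checked against the AIPHAD manual. The toy `X` and `y` are hypothetical, and the wrapper function is defined but not called here because it needs the installed package.

```python
# Step 3: candidate points and labels (a hypothetical one-dimensional example).
X = [[0.0], [0.25], [0.5], [0.75], [1.0]]
y = [1, -1, -1, -1, 2]  # phase index for known points, -1 for unlabeled points


def propose_with_aiphad(X, y):
    """Run one estimation-and-proposal cycle; requires `pip3 install aiphad`."""
    # Step 1. Assumption: the import path is inferred, not confirmed by the text.
    from aiphad import pdc_sampler

    # Step 2: LP estimation, least-confident score, one proposal.
    pdc = pdc_sampler(estimation="LP", sampling="LC", proposal=1)

    # Step 4: estimate the phase diagram from labeled and unlabeled points.
    pdc.fit(X, y)

    # Step 5: compute uncertainty scores and collect the proposal.
    pdc.us()
    return pdc.proposals, pdc.proposals_X, pdc.proposals_us
```

With the package installed, `propose_with_aiphad(X, y)` would return the proposed index, its position vector, and its uncertainty score.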

Multiple suggestion

To generate multiple proposals using the ‘pdc_sampler()’ function, the ‘proposal’ argument can be set to the desired number. Additionally, the function enables the specification of the following arguments to tailor the proposal strategy:

  • ‘multi_method’: This argument defines the method for generating multiple proposals. Two options are available: (i) “OU” (Only US ranking), which selects unlabeled data points in descending order of their uncertainty scores; and (ii) “NE” (Neighbor Exclusion), which also ranks unlabeled points by their uncertainty scores but excludes points that are adjacent to already selected ones. If no method is specified, “OU” is used by default.

  • ‘NE_k’: Relevant only when the “NE” method is selected, this argument is set as an integer. It defines the exclusion radius around each selected point, ensuring that no data point within the nearest ‘NE_k’ neighbors of any selected data point is included in the proposal. The default is 1.

Hyperparameters

For phase estimation methods, the hyperparameters are \(\gamma\) for the “LP” method, and both \(\gamma\) and \(\alpha\) for the “LS” method. In the Python version of AIPHAD, the ‘gamma’ and ‘alpha’ arguments allow users to modify these hyperparameters from their default values in scikit-learn.

Probabilities of belonging to each phase

Uncertainty scores and probabilities for each label are computed for all unlabeled points, and the uncertainty scores are derived from these probabilities. AIPHAD stores the indices of unlabeled data in ‘pdc.unlabeled_index_list’ and their associated uncertainty scores in ‘pdc.u_score_list’. Moreover, probabilities for each label are contained within ‘pdc.label_distributions’, arranged according to the phase index. The ‘pdc.label_distributions’ facilitate the identification of points with the highest probability of belonging to a specified phase.
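
Ranking unlabeled points by their probability of belonging to a chosen phase takes only a few lines. The sketch below uses plain lists with hypothetical values; it assumes that each row of the probability table corresponds to the entry at the same position in the list of unlabeled indices, which should be verified against the AIPHAD manual before applying it to ‘pdc.label_distributions’.

```python
def rank_by_phase_probability(unlabeled_indices, label_distributions, phase_column):
    """Return (index, probability) pairs sorted by descending probability."""
    pairs = [(index, probs[phase_column])
             for index, probs in zip(unlabeled_indices, label_distributions)]
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)


# Hypothetical data: four unlabeled candidates and three phases.
unlabeled_indices = [2, 5, 7, 9]
label_distributions = [[0.7, 0.2, 0.1],
                       [0.1, 0.6, 0.3],
                       [0.2, 0.3, 0.5],
                       [0.3, 0.4, 0.3]]

# Candidates most likely to belong to the phase stored in column 1.
ranking = rank_by_phase_probability(unlabeled_indices, label_distributions, 1)
```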

Results

The Fe-Ti-Sn system was subject to experimental exploration guided by AIPHAD. In this context, the Fe2TiSn Heusler phase, known for its potential in thermoelectric materials, was examined36,37,38. Because the electronic properties of the Fe2TiSn Heusler phase change with the composition ratio39,40, a ternary phase diagram is important for accurately determining the compositional region where the Heusler phase is stably generated. Additionally, the ordering of each element significantly influences the properties of the Heusler phase40,41, with the atomic ordering within the phase varying considerably based on synthesis and annealing temperatures42. Therefore, the temperature dependence of the phase diagram is a vital aspect of this study.

This research focused on the stability of the Fe2TiSn Heusler phase within a ternary phase diagram, using guidance from AIPHAD. In the Fe-Ti-Sn system, extensive heat treatments, approximately 1000 h at temperatures ranging from 800 to 1000 °C, are required to achieve equilibrium43. Notably, unlike a previous report44, this study did not investigate the equilibrium phase diagram. Instead, the objective was to delineate the metastable phase diagram, labeling phase regions based on the predominant phase(s) following short-duration heat treatments, particularly identifying the region where the Heusler phase is stable. Metastable phases and unreacted raw materials often remain post-thermal treatment in the metastable phase diagram. AIPHAD’s flexible labeling system facilitates the accelerated determination of both equilibrium and metastable phase diagrams, as demonstrated in this study.

Experimental detail

The samples for this study were prepared via a solid–liquid reaction using high-purity elemental powders. Specifically, Fe, Ti, and Sn powders, each with a purity of 99.99% obtained from Kojundo Chemical Laboratory Co., were measured in a predetermined ratio. These powders were then placed into a boron nitride crucible, sourced from Zikusu Industry Co., Ltd., with a purity of 99.7%, an outer diameter of 8.5 mm, an inner diameter of 6.5 mm, and a depth of 18 mm. The crucible was sealed in a stainless-steel reaction container within an argon-filled glove box (with O2 and H2O levels below 1 ppm). The design of this reaction container aligns with that reported in a previous study45. To synthesize the sample, this container was heated for 24 h in an electric furnace in an air atmosphere. The crystalline phases present in the synthesized samples were identified through powder X-ray diffraction (XRD), utilizing a Bruker D2-Phaser system with Cu-Kα radiation at 30 kV and 10 mA.

Isothermal section at 900 °C

The construction of the phase diagram for the ternary section of the Fe-Ti-Sn system at 900 °C was undertaken using AIPHAD. The phase diagram was discretized into 231 points with composition increments of 5%. From these candidate points, initial experiments were conducted on seven compositions, including a Heusler composition. Four distinct phases were identified: a Ti-rich phase, a Sn-rich phase, an Fe-rich phase, and a Heusler phase. The specific compositions at which these phases were found and the results of the XRD for each composition are summarized in Supplementary Note 1 and Supplementary Fig. 1.
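
The count of 231 candidate points follows from the 5% composition step, as the short sketch below confirms. The helper name and the ordering of the components as (Fe, Ti, Sn) are assumptions made for illustration.

```python
def ternary_grid(step_percent):
    """All ternary compositions (in percent) on a regular grid with the given step."""
    n = 100 // step_percent
    return [(i * step_percent, j * step_percent, 100 - (i + j) * step_percent)
            for i in range(n + 1) for j in range(n + 1 - i)]


grid = ternary_grid(5)  # 21 * 22 / 2 = 231 compositions

# Assumed ordering (Fe, Ti, Sn): the Fe2TiSn composition lies on this grid.
heusler = (50, 25, 25)
on_grid = heusler in grid
```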

Using these initial seven experiments as labeled data, AIPHAD was employed to propose the next promising points for further investigation of the phase diagram. Figure 3 illustrates the proposed points and the distributions of uncertainty scores based on the chosen phase-estimation method (LP or LS) and uncertainty scores (LC, MS, or EA). Notably, the selection of proposed points varied significantly with different uncertainty scores under the LP method. However, with the LS method, despite minor variations in the distribution of uncertainty scores, the most uncertain point remained consistent across different uncertainty score methods. Furthermore, the phase boundary at which the uncertainty score increased was more distinctly observed when using the LS method. This approach highlights the potential of AIPHAD in efficiently navigating the complex landscape of phase diagrams, particularly in systems with multiple phases, such as the Fe-Ti-Sn system.

Fig. 3: Uncertainty scores for the Fe-Ti-Sn ternary system at 900 °C.

Distributions of uncertainty scores in the Fe-Ti-Sn ternary system at 900 °C are shown, according to different phase estimation methods (LP or LS) and uncertainty scores (LC, MS, or EA). High uncertainty points are marked in dark green, with the red circle highlighting the most uncertain point proposed by AIPHAD for experimental validation.

In this study, LS combined with the LC method was selected to identify the next experimental points. This is because LS can clearly predict the phase boundaries, and LC is suitable for constructing ternary phase diagrams. By the definition of LC, the uncertainty score at a boundary point between three or more phases is higher than that at a boundary between two phases, and finding invariant points and monovariant lines is essential to complete ternary phase diagrams. Figure 4 presents the results of three closed-loop cycles, illustrating the identified experimental points, the distribution of uncertainty scores, and the evolving predicted phase diagram through each cycle. The experimental points proposed by AIPHAD, particularly around the predicted phase boundaries, indicate that the phase boundaries were accurately delineated as the closed-loop cycles progressed. However, no new phases were discovered by the experiments at the selected conditions during these cycles, so the outline of the predicted phase diagram changed little between the initial data and the data after the three additional experiments.
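
As an idealized illustration of the argument for LC (assuming the probabilities split evenly among the phases that meet at a point, which actual LS output will only approximate), compare a two-phase boundary with a three-phase junction:

$${u}^{{\rm{LC}}}=1-\frac{1}{2}=\frac{1}{2}\quad \text{for}\quad {\bf{p}}=\left(\tfrac{1}{2},\tfrac{1}{2},0\right),\qquad {u}^{{\rm{LC}}}=1-\frac{1}{3}=\frac{2}{3}\quad \text{for}\quad {\bf{p}}=\left(\tfrac{1}{3},\tfrac{1}{3},\tfrac{1}{3}\right).$$

Under the same assumption, the MS score of Eq. (4) equals 1 in both cases, so it would not rank the three-phase junction above the two-phase boundary.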

Fig. 4: AIPHAD-guided determination of the Fe-Ti-Sn ternary system at 900 °C.

AIPHAD-guided determination of the Fe-Ti-Sn ternary section at 900 °C following short heat treatments is presented. The LS method and LC uncertainty score are used for estimation. The figure shows the labeled points, AIPHAD’s experimental proposal (green point in red circle), uncertainty score distribution, and estimated phase diagrams.

Ternary phase diagram

The ternary phase diagram of the Fe-Ti-Sn system is represented as a triangular prism, with ternary sections stacked along the temperature axis. This diagram was discretized for temperatures ranging from 700 °C to 1000 °C in 100 °C increments and for compositions in 5% steps. The construction of the phase diagram commenced with the same initial data points as those used for the isothermal section at 900 °C, all of which lie at 900 °C. In AIPHAD, the LS method was employed with the LC uncertainty score. The evolution of the uncertainty scores and the estimated phase diagrams across iterations are summarized in Fig. 5.

Fig. 5: AIPHAD-guided determination of the metastable ternary Fe-Ti-Sn system.

Evolution of the uncertainty score distribution and the predicted phase diagram during the AIPHAD-guided determination of the metastable ternary Fe-Ti-Sn system is presented. The figure also highlights changes in the temperature range and dataset volume, with phase regions labeled according to the predominant phase(s) found after short heat treatments.

Initially, temperatures were set at 700, 800, 900, and 1000 °C, with AIPHAD proposing 14 experimental points using the “Only US” option. The primary phases identified at each proposed point are detailed in Supplementary Table 1. Subsequent experimental findings at 700 and 800 °C revealed three new phase regions: FeSn, FeSn2, and a mixed sample containing unreacted Fe and Ti, labeled as Fe + Ti. These discoveries expand the understanding of the Fe-Ti-Sn system, particularly in lower temperature ranges, highlighting the efficacy of AIPHAD in guiding experimental investigations to uncover complex phase relationships in multicomponent systems.

The Heusler phase did not form at 700 °C, probably because the heat treatments at this temperature were too short. Further exploration at 700 °C was therefore judged unlikely to yield relevant results, and the subsequent analysis focused on the range from 800 °C to 1000 °C with a finer temperature increment of 50 °C instead of the initial 100 °C. In this range, AIPHAD proposed 14 new experiments. Although these experiments did not reveal additional phase regions, they were instrumental in clarifying the existing phase boundaries. Using all collected data, including those from the 700 °C experiments, a detailed metastable phase diagram was constructed with custom labeling. For each stage of this process, Supplementary Data 1–5, which are compatible with the AIPHAD web application, are available for visualization. Note that the phase diagrams shown in Figs. 4 and 5 are metastable because the heat treatments were short. Equilibrium is therefore not guaranteed at every point, and the shape of the phase diagram differs from the equilibrium one reported in refs. 43,44.

Search for a specific phase region

AIPHAD can also be used to target a particular phase region. It displays the probability that each unlabeled point belongs to each already identified phase, enabling a focused search within a desired phase region. This capability was demonstrated in the search for stable regions of the Heusler phase along the 900 °C isothermal section of the Fe-Ti-Sn ternary system. Using the same initial data as for the 900 °C isothermal section and employing the LS method for phase estimation, six candidate points were identified as having a high probability of yielding the Heusler phase. Subsequent experiments confirmed the presence of the Heusler phase at four of these points, while the other two fell in different phase regions. Figure 6 depicts the isothermal section before and after the experiments guided by AIPHAD. This exploration delineated the region conducive to synthesizing the Heusler phase. In summary, AIPHAD is an effective tool for identifying specific phase regions of interest within complex multicomponent systems.
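The targeted search can be sketched as a ranking of unlabeled points by their estimated probability of belonging to the phase of interest. The example below is a minimal illustration, not AIPHAD's implementation: the probabilities are invented, the compositions and the phase names other than Heusler are placeholders, and the threshold of 0.5 is an arbitrary assumption. In practice the probabilities would come from the fitted label spreading model.

```python
def rank_for_phase(prob_table, phases, target, threshold=0.5):
    """Rank unlabeled points by the probability of the target phase.

    prob_table maps a composition to its per-phase probabilities, in the
    same order as `phases`. Points below `threshold` are dropped.
    """
    idx = phases.index(target)
    hits = [
        (point, probs[idx])
        for point, probs in prob_table.items()
        if probs[idx] >= threshold
    ]
    return sorted(hits, key=lambda item: item[1], reverse=True)


# Placeholder phase labels and hypothetical (Fe, Ti, Sn) compositions.
phases = ["Heusler", "phase_B", "phase_C"]
prob_table = {
    (50, 25, 25): [0.80, 0.15, 0.05],
    (45, 30, 25): [0.62, 0.30, 0.08],
    (60, 20, 20): [0.35, 0.60, 0.05],
    (40, 25, 35): [0.10, 0.20, 0.70],
}
heusler_candidates = rank_for_phase(prob_table, phases, target="Heusler")
```

Unlike uncertainty sampling, this ranking favors points where the model is confident about the target phase, so some proposals may still land in a neighboring region when the estimated boundary is off, as happened for two of the six points in the experiment above.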

Fig. 6: AIPHAD-guided determination of the Heusler phase.

Progression of the 900 °C isothermal section in the Fe-Ti-Sn system before and after six targeted experiments to identify the Heusler phase is presented. The figure shows the labeled points and AIPHAD’s proposals (red circles) and the estimated phase diagrams after experiments.

Conclusions

In this study, the AIPHAD toolbox was introduced as an efficient means of determining phase diagrams using machine learning, comprising both a web application and a Python program. The underlying PDC algorithm in AIPHAD performs uncertainty sampling through the label propagation and label spreading methods, which were described in detail. The study demonstrated that phase diagrams, here for the Fe-Ti-Sn system exhibiting the Heusler phase, can be constructed efficiently with fewer experiments using the PDC algorithm starting from no data. This demonstration shows that AIPHAD is a powerful tool when a phase diagram is constructed from scratch without prior knowledge of the target system. Samples were prepared by solid–liquid reactions of the elemental components, and phases were identified by XRD measurements. The AIPHAD web application facilitated visualization of the evolving phase diagram and delineation of the phase boundaries.

Additionally, this study incorporated NIMS-OS, which is designed to integrate robotic experiments and artificial intelligence for autonomous materials exploration46. The implementation of the PDC algorithm within NIMS-OS enables phase diagrams created through autonomous experiments to be visualized on the AIPHAD platform. AIPHAD’s accessibility and user-friendly interface are expected to simplify the construction of phase diagrams using machine learning for a broad user base. The versatility of the developed method extends beyond the exploration of phase regions or equilibrium diagrams, suggesting wide-ranging applications in materials science. Moreover, integrating this tool with the CALPHAD method could enhance the efficiency and accuracy of phase diagram predictions in diverse material systems. These findings underscore the potential of combining machine learning tools with traditional materials science approaches to advance materials exploration and discovery.