Coupling physics in machine learning to predict properties of high-temperatures alloys

Peng, Jian; Yamamoto, Yukinori; Hawk, Jeffrey A.; Lara-Curzio, Edgar; Shin, Dongwon

doi:10.1038/s41524-020-00407-2

Open access
Published: 18 September 2020

npj Computational Materials volume 6, Article number: 141 (2020)

Abstract

High-temperature alloy design requires a concurrent consideration of multiple mechanisms at different length scales. We propose a workflow that couples highly relevant physics into machine learning (ML) to predict properties of complex high-temperature alloys, using the yield strength of 9–12 wt% Cr steels as an example. We have incorporated synthetic alloy features that capture microstructure and phase transformations into the dataset. The high-impact features affecting the yield strength of 9Cr steels, identified by correlation analysis, agree well with the generally accepted strengthening mechanisms. As a part of the verification process, the consistency of sub-datasets has been extensively evaluated with respect to temperature and then refined for the boundary conditions of trained ML models. The predicted yield strength of 9Cr steels using the ML models is in excellent agreement with experiments. The current approach introduces physically meaningful constraints in interrogating the trained ML models to predict properties of hypothetical alloys when applied to data-driven materials design.

Introduction

Material design assisted by data analytics is an emerging area of materials science and engineering that offers a reduction in cost, risk, and time over traditional material development approaches based solely on experimental investigations and/or physics-based simulations^1,2,3,4,5. Due to their complexity (i.e., chemistry, melt process, thermo-mechanical process, heat treatment, and resulting developed microstructure), the rational design of high-temperature alloys by machine learning (ML) requires a comprehensive dataset that can cover various aspects: multi-component, multi-phase, multi-physics, multi-scale, and multiple strengthening mechanisms as well as significant influence of processing conditions on the properties of final products.

The majority of previous efforts applying ML to predict the properties of high-temperature alloys have used alloy compositions and simple processing conditions as features^6,7,8,9,10,11,12,13. While these approaches can leverage experimental data accumulated over decades, extrapolating (and even interpolating) these models outside the range of the input data is risky due to the absence of physical constraints. There have been attempts to incorporate atomistic-level features, e.g., atomic radius/volume, electronegativities, cohesive energy, and local electronegativity mismatch, for predicting high-temperature alloy properties^14,15,16,17, but features related to phenomena/mechanisms occurring at larger length scales (i.e., micro- and meso-scale) may have more impact on alloy properties.

For high-temperature alloy design, physical information, such as microstructure, is essential for representing process–structure–property correlation^18,19,20,21,22. Pioneering work by Zhao and Henry showed that the performance of a regression model for predicting the rupture time of Ni-based alloys could be significantly improved by incorporating the equilibrium volume fraction of the γ′ phase²¹. Recently, further advancement in this area was made by establishing a data analytics workflow that integrates microstructure-related synthetic features via the CALPHAD approach to predict the creep strength of alumina-forming austenitic stainless steels²³ and high-strength stainless steel²⁴.

However, for many material systems (high-temperature alloys in particular), microstructure-related synthetic features from CALPHAD are often not enough since the microstructure changes over time and the strengthening mechanisms evolve with applied stress and temperature. Consider the case of 9–12 wt% Cr martensitic–ferritic steels (hereafter referred to as 9Cr steel) as an example. This class of alloy consists of a tempered martensitic microstructure, where temperature plays a critical role with respect to strengthening mechanisms²⁵. The fine prior austenite grain/packet/lath structure and the dislocation density largely control the room-temperature strength. With increasing temperature, up to around 600–650 °C, second-phase precipitates, i.e., M₂₃C₆ (M = Fe, Cr, and Mn), MX (M: mainly V, X: C and N), and even Laves phase, within the sub-grain interior or along the sub-grain boundaries, play an important role in strengthening. Above 700 °C, microstructure instability, such as rapid precipitate coarsening, recovery, and/or recrystallization, leads to a significant loss in mechanical strength.

Thus, relevant features such as phase transformation temperatures, e.g., A3 temperature (the temperature at which transformation of ferrite to austenite is completed during heating) and martensite start temperature (Ms) etc., should be considered, in addition to microstructure information. These phase transformation temperatures are directly correlated with martensitic microstructure evolution, other microstructural features (e.g., prior austenite grain size (PAGS), packet/lath sizes, dislocation density in the as-normalized condition, etc.^26,27), and consequently influence their initial mechanical properties as well as long-term microstructural stability.

Herein, we demonstrate a workflow of coupling highly relevant physics into ML models for predicting properties of multi-phase and multi-component high-temperature alloys. A yield strength dataset of 9Cr martensitic–ferritic steels is selected to elucidate this strategy. Figure 1 illustrates the structure of the yield strength dataset of 9Cr steel used in this study. The computed synthetic alloy features, along with raw experimental data, are listed in Table 1. The correlation between these features in the dataset and the 9Cr yield strength was quantitatively determined and compared with the mechanisms generally accepted by the community. We evaluated the performance of representative ML models, i.e., linear regression (LR)²⁸, Bayesian ridge (BR)^29,30, k-nearest neighbor (NN)³¹, random forest (RF)³², and support vector machines (SVM)³³. Additional work was also carried out to assess the performance of ML models in predicting the PAGS of the 9Cr steel, since PAGS is an essential input for calculating Ms of 9Cr steels.

Table 1 List of alloy features considered in this work to predict 0.2% yield strength (MPa) of 9–12 wt% Cr steels.

Results and discussion

We started with the 9Cr dataset with only raw experimental data (i.e., elemental alloy compositions, processing and testing conditions, and PAGS, corresponding to groups 1 and 2 in Fig. 1) to train five different ML models. Figure 2 shows the average accuracy of these models and their standard deviation from ten training runs as a function of the number of top-ranking features from Pearson’s correlation coefficient (PCC)³⁴ and maximal information coefficient (MIC)³⁵ analyses. Overall, the RF, NN, and SVM models exhibit high accuracy (R² > 0.9) regardless of the number of top-ranking features. More specifically, RF was the most accurate (always higher than 0.95), followed by SVM. Nevertheless, the applicability of these models for alloy design is questionable since PAGS is the only physically measured microstructure-related feature involved in ML training. Other relevant physically meaningful features, such as volume fraction of key phases and phase transformation temperatures, are required to properly represent the process–structure–property relationship and serve as physical constraints in ML.

**Fig. 2: Machine learning accuracy based on correlation analysis.**

Analyses of temperature-based sub-datasets

Given the lack of physically measurable microstructure features in the 9Cr dataset, the raw experimental data were augmented with synthetically derived features, i.e., groups 3 and 4 in Fig. 1 (see Table 1), from high-throughput CALPHAD calculations. Since the primary strengthening mechanisms of 9Cr steel are temperature dependent, it was essential to carefully examine whether the present dataset is capable of representing the temperature-dependent strengthening mechanism. Thus, we divided the 9Cr dataset into several sub-datasets based on the testing temperature for further analysis. As such, we performed correlation analysis for each sub-dataset.
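The per-temperature correlation analysis described above can be sketched in a few lines of plain Python. This is a minimal illustration, not the ASCENDS implementation used in the study: the function names, the target column name `YS`, and all numerical values in the toy rows are assumptions made for the example, while `TTTemp`, `wNi`, and `Ms` follow the feature names used in the text.

```python
from collections import defaultdict
from math import sqrt

def pearson(xs, ys):
    """Pearson's correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant column carries no linear information
    return sxy / sqrt(sxx * syy)

def pcc_by_temperature(rows, features, target="YS", temp_key="TTTemp"):
    """Split rows by test temperature and score each feature against the target."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[temp_key]].append(row)
    result = {}
    for temp, sub in groups.items():
        ys = [r[target] for r in sub]
        result[temp] = {f: pearson([r[f] for r in sub], ys) for f in features}
    return result

# Toy rows with made-up numbers (NOT data from the paper).
rows = [
    {"TTTemp": 200, "wNi": 0.1, "Ms": 400, "YS": 500},
    {"TTTemp": 200, "wNi": 0.3, "Ms": 380, "YS": 540},
    {"TTTemp": 200, "wNi": 0.5, "Ms": 360, "YS": 580},
    {"TTTemp": 650, "wNi": 0.1, "Ms": 400, "YS": 250},
    {"TTTemp": 650, "wNi": 0.3, "Ms": 380, "YS": 270},
    {"TTTemp": 650, "wNi": 0.5, "Ms": 360, "YS": 290},
]
scores = pcc_by_temperature(rows, ["wNi", "Ms"])
```

In this toy case `scores[200]["wNi"]` is close to +1 and `scores[650]["Ms"]` is close to −1 because the rows were constructed to be exactly linear; real sub-datasets would give intermediate values.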

The top 10 and bottom 10 features from the PCC analysis were evaluated at four representative temperatures covering three regimes, i.e., 200 °C (low temperature), 550 and 650 °C (medium to high temperatures), and 750 °C (above service temperature). These results are presented in Fig. 3. The closer the absolute value of the correlation coefficient is to 1, the stronger the correlation between the feature and yield strength. The features identified with either a positive or negative correlation with yield strength at 200, 550, and 650 °C were consistent and mostly in good agreement with generally accepted strengthening factors/mechanisms in 9Cr steel. For example, Ni content exhibited a strong positive correlation with yield strength, i.e., the higher the Ni content, the higher the yield strength. This is in accordance with the practice of adding Ni to 9Cr steel to stabilize austenite at high temperatures, lower the martensitic transformation temperatures, and consequently increase the hardenability in the normalization process. These effects generally increase the yield strength of martensitic–ferritic steels, including the 9Cr family of steel²⁷. This result is also logically supported by the present correlation analysis, which shows a strong negative correlation between the Ms temperatures and the yield strength.

**Fig. 3: Correlation analysis of subset data at different temperatures.**

The M₂₃C₆ phase also plays an important role in strengthening the 9Cr steel from the precipitate strengthening perspective and stabilizes the tempered martensite microstructure, especially at elevated temperatures³⁶. A higher volume fraction of M₂₃C₆ leads to higher yield strength. Thus, it is reasonable that the volume fraction of M₂₃C₆ has one of the strongest positive correlations with yield strength. The elements V and N facilitate the formation of strengthening MX precipitates during tempering, which also assists in increasing yield strength by impeding dislocation motion during deformation and stabilizing the sub-grain structure. Co is also an austenite stabilizer that suppresses δ-ferrite formation during the normalizing heat treatment step. Ms and microstructure-related features (e.g., volume fractions of M₂₃C₆, hcp, and fcc phases) from our high-throughput calculation are highly impactful features, critical to obtaining high-fidelity surrogate ML models. This finding is also applicable to the other sub-datasets up to 650 °C (see Supplementary Table 1).

For the sub-datasets above 650 °C (e.g., 750 °C in Fig. 3), the correlation coefficients are smaller than those at low temperatures, indicating a weaker response between alloy features and yield strength. In addition, the feature ranking order at 750 °C is counterintuitive and very different from the trends below 650 °C. For instance, Ms has a negative correlation below 650 °C but shows a positive correlation at 750 °C. Features wC, wCr, wW, and PAGS, which should contribute positively to yield strength, are identified as having a negative impact at 750 °C. The MIC analysis also shows a similar trend (see Supplementary Fig. 1).

The correlation between alloy features and yield strength at 750 °C is much weaker than that at lower temperatures. Typical high-impact features, such as Temper 1, wV, wNb, wNi, wC, and T2_VPV_M23C6, have been correctly identified at 200, 550, and 650 °C, while at 750 °C their rankings are counterintuitive. The present findings may be put into context by realizing that (1) the number of data points at >650 °C is insufficient for representing the effects of certain features on yield strength correctly, and (2) the microstructure changes during exposure at high temperatures are significant and may result in a variation of yield strength attributed to other factors that are not considered in the present dataset (e.g., the heating rate and/or the holding time before tensile testing at temperature).

We then trained five ML models (BR, LR, RF, NN, and SVM) with these sub-datasets at each temperature. Since these sub-datasets have a maximum of 44 data points, we limited the number of top-ranking features used in ML to 10 to avoid overfitting. The top 10 features of each sub-dataset from correlation analysis are summarized in the Supplementary information (Supplementary Table 1). As an example, Fig. 4 shows the accuracy of the RF model trained with various numbers of top-ranking features for each temperature-based sub-dataset. The results of the entire 9Cr dataset (“All”) are also included for comparison. As shown in Fig. 4a, the accuracy of the models trained with sub-datasets is always lower than that of the model trained with the entire 9Cr dataset (i.e., “All”), which can be attributed to the smaller data volume of the sub-datasets. The performance of RF trained with top-ranking features from MIC does not improve with more features, and the top 4 features already lead to the maximal accuracy, showing that these features are sufficient to fit the RF model well. In contrast, the top 8 features from PCC analysis are required to reach maximal accuracy (Fig. 4b). In both cases, the maximum accuracy is always >0.8 from room temperature (RT) to 600 °C regardless of whether the features are ranked by MIC or PCC. Above 600 °C, the accuracy decreases monotonically, which is in accordance with the decreasing data volume above 600 °C (see Fig. 1). Since the ranking of features at 650 °C is reasonable (see Fig. 3), the lower accuracy at 650 °C may be attributed to its slightly smaller data volume compared with the lower-temperature sub-datasets.

**Fig. 4: ML performance of respective temperature sub-datasets.**

For the sub-datasets above 650 °C, no matter how many top-ranking features are used in ML models, the accuracy (R²) is always below 0.6. This observation again confirms that data at >650 °C are insufficient, and the features in the present 9Cr dataset cannot represent the microstructure instability at high temperatures. Therefore, including the data at >650 °C could mislead the training of ML models and consequently result in incorrect predictions. For this reason, data above 650 °C were removed, resulting in the truncated (≤650 °C) 9Cr dataset used for the subsequent ML modeling.
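The truncation step amounts to a simple filter on the test temperature. The sketch below assumes the dataset is held as a list of dictionaries with a `TTTemp` key (the feature name used in the text); the function name, the `YS` key, and the toy values are hypothetical.

```python
def truncate_by_temperature(rows, max_temp_c=650.0, temp_key="TTTemp"):
    """Keep only rows tested at or below max_temp_c (degrees Celsius)."""
    return [row for row in rows if row[temp_key] <= max_temp_c]

# Toy rows with made-up numbers (NOT data from the paper).
rows = [
    {"TTTemp": t, "YS": ys}
    for t, ys in [(25, 600), (550, 420), (650, 300), (700, 220), (750, 150)]
]
truncated = truncate_by_temperature(rows)

# Fraction of rows dropped by the cutoff, useful as a sanity check.
removed_fraction = 1 - len(truncated) / len(rows)
```

Note that the cutoff is inclusive, matching the "≤650 °C" definition of the truncated dataset.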

Truncated (≤650 °C) dataset

Figure 5 and Table 2 summarize the results of correlation analysis for the truncated dataset. Many physically meaningful features (i.e., volume fractions of phases and Ms) that we added into the raw 9Cr yield strength dataset have high correlation coefficients. These highly impactful features from both PCC and MIC analyses are in good agreement with the generally accepted strengthening mechanisms, indicating that the features collected in the truncated dataset can capture the strengthening mechanisms of 9Cr steel well in the given temperature range. The tensile testing temperature (TTTemp) is included in this dataset, which allows the ML models to capture the temperature dependence of yield strength. TTTemp possesses a strong negative correlation with yield strength, which is consistent with the experimental observation that the higher the test temperature, the lower the yield strength.

Table 2 Top 20 features from the correlation analysis between alloy features (simple features plus synthetic features populated from the high-throughput calculation) and yield strength using the MIC and PCC methods for the truncated (≤650 °C) dataset.

There is a discrepancy between the results from MIC and PCC analyses. For example, MIC ranked wCo 1st (9th in PCC), while PCC ranked T2_VPV_M23C6 2nd (14th in MIC). This is attributed to the different algorithms used to assign the strength of correlation. PCC only evaluates the strength of the linear relationship, and MIC has an advantage over PCC when there is a non-linear correlation between input feature and target property. A detailed comparison of MIC and PCC analyses with different data structures is available in ref. ³⁷. It should be emphasized that the purpose of performing both MIC and PCC analyses in this study is not to rank one method over the other. Correlation analysis is a topic of its own, aiming to study the strength of the statistical relationship between two variables. It is also a category of feature selection approach that facilitates the choice of the most relevant input features for ML²³. The intent here is also to demonstrate that correlation analysis is necessary to validate whether or not underlying mechanisms have been effectively captured by quantitatively evaluating the score of the features considered. It can also be used to evaluate the quality and consistency of a material dataset. The results of different correlation analyses can be further analyzed to inspire alloy design experts to generate alloy hypotheses.
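A small numerical illustration may help show why PCC can rank a feature low even when the dependence is strong. In the sketch below, which uses made-up data and a hand-written `pearson` helper (both assumptions for illustration), a perfectly deterministic but symmetric non-linear relationship yields a PCC of zero, whereas a linear one yields a PCC of one. MIC is not computed here; the example only shows the limitation of PCC that motivates using a second measure.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson's correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

xs = list(range(-5, 6))            # symmetric around zero
linear = [2 * x + 1 for x in xs]   # y depends linearly on x
quadratic = [x * x for x in xs]    # y is fully determined by x, but not linearly

r_linear = pearson(xs, linear)        # close to 1
r_quadratic = pearson(xs, quadratic)  # 0: PCC misses the dependence
```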

Five ML models (i.e., BR, LR, RF, NN, and SVM) were trained using the truncated dataset. The results are shown in Fig. 6. As before, the number of top-ranking features based on the MIC and PCC analyses was varied to train these models. The accuracy of the models using the top-ranking features from the MIC and PCC analyses shows similar trends. For example, increasing the number of top-ranking features from 5 to 10 for PCC, and from 5 to 15 for MIC, increased the accuracy of these models significantly. Beyond these numbers of top-ranking features, the accuracy of the BR, LR, RF, and SVM models was almost constant, while the NN model showed a monotonic decrease in accuracy. For the models utilized, it was necessary to include at least the top 10 features for PCC and the top 15 features for MIC to obtain good accuracy.

**Fig. 6: ML performance as a function of the number of top-ranking features.**

Regardless of the type and number of features used for the PCC and MIC analyses, the accuracy of the trained models in predicting yield strength was, in order: RF > SVM > NN > BR ≈ LR. More specifically, RF, NN, and SVM exhibited very high accuracy (R² > 0.9), while the maximum accuracy of the LR and BR models was ~0.85. For example, Fig. 7 shows the predicted yield strength using the RF model. It exhibits an excellent agreement with the experimentally determined yield strength. Although the accuracy of ML models trained with the dataset augmented by synthetic features is similar to that of models trained only with raw experimental data (see Fig. 2), the fidelity of these models is notably enhanced for LR, BR, and SVM. This is because the synthetic features we incorporated into the dataset proved to be highly correlated with the yield strength of 9Cr steel. Moreover, the ML models still achieve very high accuracy even though the truncated dataset contains ~10% less data than the initial 9Cr dataset, mainly because the inconsistent data above 650 °C were eliminated. As such, we believe that the trained ML models (as described in this section) are more accurate and can provide more realistic predictions.

**Fig. 7: Experimental vs. predicted yield strength of the 9Cr steel with random forest (RF) with the top 10 features from MIC and PCC analyses.**

The high-fidelity surrogate models obtained in this work will allow prediction of the yield strength of hypothetical 9Cr alloys. However, in this case, additional work on predicting PAGS is required, as it was used as an input feature to predict the yield strength. Among all features in groups 1 and 2 (see Fig. 1), PAGS is unique. It is an essential input for predicting Ms³⁸, which was previously identified as a highly relevant feature for yield strength and served as an important constraint in training high-fidelity surrogate models. PAGS also depends on various details of the composition and processing conditions, yet the PAGS of an alloy can only be obtained by physical inspection, i.e., metallography. Thus, following a workflow similar to that of the present study, surrogate models for PAGS were trained using the truncated dataset. The predicted PAGS using the NN, RF, and SVM models is in excellent agreement with experimental data (see Supplementary Fig. 2). As an example, a comparison between experimental and predicted PAGS of the 9Cr steel using the RF model is shown in Fig. 8. We believe the outstanding performance of the trained ML models is attributable to the extremely high correlation between input features and PAGS (see the correlation scores of high-ranking features in Supplementary Table 2). The average MIC score of the top 15 features is 0.933 ± 0.061, which is extremely high. The average PCC scores are not as high as those of MIC, but the average score of the top 10 features is 0.660 ± 0.100, which can still be regarded as high. With the success of this approach, PAGS for any 9Cr steel alloy can be derived and used as input to predict yield strength via a data analytics approach as demonstrated in the present study.

**Fig. 8: Experimental vs. predicted PAGS of the 9Cr steel with random forest (RF) with the top 10 features from MIC and PCC analyses.**

In summary, we have demonstrated a workflow that can incorporate highly relevant physics into ML for predicting properties of complex heat-resistant alloys. Using a yield strength dataset of the 9–12 wt% Cr steel as an example, the approach has been described in detail. We augmented raw experimental data with key features that can capture both the microstructure and phase transformation of this class of alloy, i.e., the volume fraction of key phases, A3, and martensite phase transformation temperatures. It is worth mentioning that the present features could not capture the complex location- and size-specific microstructural detail of the secondary phases that form in the 9Cr alloys. It would be ideal to incorporate such detailed microstructure-related information into the data analytics workflow. However, obtaining such a large volume of high-fidelity microstructural details for all the alloy chemistries and processing conditions would be prohibitively time-consuming and costly.

We computed these synthetic features using high-fidelity thermodynamic models in a high-throughput manner. Critical evaluation of each temperature-based sub-dataset, including correlation analysis and ML training, showed that data above 650 °C are insufficient for correctly capturing the significant factors related to the yield strength of 9Cr steel due to the relative lack of experimental data and relevant microstructure features. Thus, these data were removed from the 9Cr dataset, and correlation analysis of this truncated dataset showed that the high-ranking features were in good agreement with the generally accepted strengthening mechanisms.

We tested the performance of representative ML models, i.e., RF, SVM, NN, BR, and LR, as a function of the number of top-ranking features. From this exercise, the top 10 features from PCC and the top 15 features from MIC are necessary to obtain good accuracy for all models. Among the ML models tested, the RF and SVM ones exhibited very high accuracy (R² > 0.95) for predicting 9Cr steel yield strength. In conclusion, this study demonstrated that high-fidelity surrogate models could be trained with highly relevant and physically meaningful features. Such physical constraints help prevent erroneous predictions of the properties of hypothetical candidate alloys when interrogating trained ML models in data-driven materials design. We anticipate that the approach demonstrated in the present work can be further extended by integrating additional alloy physical/chemical features beyond what is achievable in this study.

Methods

Experimental dataset and synthetic alloy features via thermodynamic calculations

The raw experimental dataset was compiled by the National Energy Technology Laboratory, USA^8,9, using the creep datasheet for high Cr steel³⁹ in the MatNavi materials database by the National Institute for Materials Science, Japan. The dataset consists of compositions of 18 elements, processing and testing temperatures, and PAGS (converted from austenite grain size number). The state-of-the-art steel and Fe-alloys database TCFE9⁴⁰ was used to compute the volume fractions of the phases and the A3 temperature for each steel composition by the CALPHAD approach⁴¹. A recently developed thermodynamic model³⁸ (also implemented in the Thermo-Calc software package^42,43) was adopted to calculate Ms temperatures. This analytical model, which is an extension of the models developed by Borgenstam and Hillert⁴⁴ and Stormvinter et al.⁴⁵, takes into account the thermodynamic driving force for the FCC–BCC phase transformation as the major contribution, as well as PAGS as a non-chemical contribution, to predict Ms of a given 9Cr alloy. Raw experimental data were augmented with these synthetic features by high-throughput calculation using Thermo-Calc, resulting in a dataset with 451 instances/rows, 45 input features/columns, and one target (0.2% yield strength), covering the temperature range from RT to 800 °C.

Correlation analysis

The necessity of correlation analysis in materials data analytics is threefold: (1) validate if high-ranking features are consistent with generally accepted mechanisms; (2) provide a numerical/statistical basis for the selection of input features in the training of ML models; and (3) facilitate the generation of alloy hypotheses by identifying overlooked/hidden features in previous work. The correlation between the input features and the target was represented by PCC³⁴ and MIC³⁵. While PCC only evaluates the strength of the linear relationship, MIC identifies the strength of both linear and non-linear relationships. The correlation coefficient of PCC lies between −1 and 1, where 1 indicates a total positive linear correlation, −1 indicates a total negative (inverse) linear correlation, and 0 indicates no linear correlation. The closer the coefficient is to 1 or −1, the stronger the correlation between the two variables is. The correlation coefficient of MIC ranges between 0 and 1. The closer the coefficient is to 1, the stronger the correlation is.
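For reference, the standard sample form of Pearson's correlation coefficient between a feature x and the target y over n data points can be written as follows. The symbols (x_i, y_i, and the sample means) are notation introduced here for illustration and do not appear in the original text.

```latex
r_{xy} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}
              {\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^{2}}\,
               \sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^{2}}},
\qquad
\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i,\quad
\bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i,
\qquad -1 \le r_{xy} \le 1 .
```

Ranking by |r_xy|, as done in this work, treats strong negative and strong positive linear correlations as equally informative.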

Machine learning

The performance of five representative ML models was studied: (1) LR²⁸, (2) BR^29,30, (3) k-NN³¹, (4) RF³², and (5) SVM³³. Different numbers of top-ranking features based on the ranking from MIC and |PCC| (i.e., the absolute value of the correlation coefficient of PCC) were used to train ML models and evaluate their performance. The hyperparameters of each model were tuned by using up to 150 iterations to identify the optimum parameters. Each model was trained ten times for a given set of features to determine the averaged accuracy and its standard deviation. The ranking from correlation analysis does not assign any hierarchical factor to the features, i.e., all features have the same weight in ML training regardless of their ranking. The coefficient of determination (R²) was adopted to represent the accuracy of ML models. The correlation analysis and ML were performed using the open-source data analytics frontend, Advanced data SCiEnce toolkit for Non-Data Scientists (ASCENDS)^46,47, which is available via GitHub (https://github.com/ornlpmcp/ASCENDS).
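The two bookkeeping steps in this paragraph, selecting the top-ranking features by absolute correlation score and scoring predictions with R², can be sketched as follows. This is an illustrative sketch and not the ASCENDS code: the function names are hypothetical, the scores and yield-strength values are made up, and R² is computed with its usual definition, 1 − SS_res/SS_tot.

```python
from statistics import mean, stdev

def top_k_features(scores, k):
    """Rank features by absolute correlation score and keep the k strongest."""
    ranked = sorted(scores, key=lambda name: abs(scores[name]), reverse=True)
    return ranked[:k]

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_bar = mean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - y_bar) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

# Made-up PCC scores (NOT values from the paper); the sign is ignored for ranking.
pcc_scores = {"TTTemp": -0.8, "wNi": 0.5, "Ms": -0.6, "wCr": 0.1}
selected = top_k_features(pcc_scores, 2)

y_true = [300, 400, 500]
perfect = r_squared(y_true, y_true)            # every prediction exact
baseline = r_squared(y_true, [400, 400, 400])  # always predicting the mean

# Averaged accuracy and spread over repeated runs (hypothetical R² values).
run_scores = [0.91, 0.93, 0.92, 0.90, 0.94]
summary = (mean(run_scores), stdev(run_scores))
```

Because the selected features enter training with equal weight, the ranking only decides which columns are kept, not how strongly each one influences the model.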

Depending on the flexibility of the ML models, overfitting or underfitting the data is possible. The k-fold approach⁴⁸ with k = 5 was used for the ML training. Four groups were used to train the ML model, and the one remaining group (i.e., unseen data) was withheld during training and later used as the validation data to evaluate the accuracy of the models. We then trained the same ML model (i.e., the same feature set for a given ML algorithm) ten times to obtain the statistics for uncertainty quantification. This procedure ensured that the fitting of the ML models to the data was balanced.
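A minimal sketch of a five-fold split is given below. It assumes the standard rotation in which each of the five groups serves once as the withheld validation set; whether the toolkit used in this work rotates the folds in exactly this way is an assumption, and the function name and seed handling are hypothetical.

```python
import random

def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train, validation) index lists for a shuffled k-fold split."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # k nearly equal groups
    for i in range(k):
        validation = folds[i]
        # The remaining k - 1 groups form the training set.
        train = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train, validation

splits = list(k_fold_indices(20, k=5, seed=0))

# Repeating the split with different seeds gives the repeated runs
# from which a mean accuracy and its standard deviation can be computed.
repeated = [list(k_fold_indices(20, k=5, seed=s)) for s in range(10)]
```

Each sample appears in exactly one validation fold per split, so every data point is used for validation once and for training four times.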

Data availability

The data that support the findings of this study are available from the corresponding authors upon reasonable request.

References

Ramprasad, R., Batra, R., Pilania, G., Mannodi-Kanakkithodi, A. & Kim, C. Machine learning in materials informatics: recent applications and prospects. npj Comput. Mater. 3, 54 (2017).
Article Google Scholar
Ramakrishna, S. et al. Materials informatics. J. Intell. Manuf. 30, 2307–2326 (2019).
Article Google Scholar
Bock, F. E. et al. A review of the application of machine learning and data mining approaches in continuum materials mechanics. Front. Mater. 6, 110 (2019).
Article Google Scholar
Alberi, K. et al. The 2019 materials by design roadmap. J. Phys. D: Appl. Phys. 52, 013001 (2018).
Article Google Scholar
Rajan, K. Materials informatics. Mater. Today 8, 38–45 (2005).
Article CAS Google Scholar
Sourmail, T., Bhadeshia, H. K. D. H. & MacKay, D. J. C. Neural network model of creep strength of austenitic stainless steels. Mater. Sci. Technol. 18, 655–663 (2002).
Article CAS Google Scholar
Agrawal, A. et al. Exploration of data science techniques to predict fatigue strength of steel from composition and processing parameters. Integr. Mater. Manuf. Innov. 3, 90–108 (2014).
Article Google Scholar
Verma, A. K. et al. Mapping multivariate influence of alloying elements on creep behavior for design of new martensitic steels. Metall. Mater. Trans. A 50, 3106–3120 (2019).
Article CAS Google Scholar
Verma, A. K. et al. Screening of heritage data for improving toughness of creep-resistant martensitic steels. Mater. Sci. Eng. A, 763, 138142 (2019).
Zhang, M. et al. High cycle fatigue life prediction of laser additive manufactured stainless steel: a machine learning approach. Int. J. Fatigue 128, 105194 (2019).
Article CAS Google Scholar
Bhadeshia, H. K. D. H. & Sourmail, T. Design of creep-resistant steels: success & failure of models. Jpn. Soc. Promot. Sci. Comm. Heat.-Resist. Mater. Alloy. 44, 299–314 (2003).
Google Scholar
Dimitriu, R. C. & Bhadeshia, H. K. D. H. Hot strength of creep resistant ferritic steels and relationship to creep rupture data. Mater. Sci. Technol. 23, 1127–1131 (2007).
Article CAS Google Scholar
Bhadeshia, H. K. D. H. Neural networks in materials science. ISIJ Int. 39, 966–979 (1999).
Article CAS Google Scholar
Shin, D., Lee, S., Shyam, A. & Haynes, J. A. Petascale supercomputing to accelerate the design of high-temperature alloys. Sci. Technol. Adv. Mater. 18, 828–838 (2017).
Article Google Scholar
Wen, C. et al. Machine learning assisted design of high entropy alloys with desired property. Acta Mater. 170, 109–117 (2019).
Article CAS Google Scholar
Huang, W., Martin, P. & Zhuang, H. L. Machine-learning phase prediction of high-entropy alloys. Acta Mater. 169, 225–236 (2019).
Article CAS Google Scholar
Zhang, Y. et al. Phase prediction in high entropy alloys with a rational selection of materials descriptors and machine learning models. Acta Mater. 185, 528–539 (2020).
Meredig, B. Five high-impact research areas in machine learning for materials science. Chem. Mater. 31, 9579–9581 (2019).
Kalidindi, S. R. Data science and cyberinfrastructure: critical enablers for accelerated development of hierarchical materials. Int. Mater. Rev. 60, 150–168 (2015).
Panchal, J. H., Kalidindi, S. R. & McDowell, D. L. Key computational modeling issues in integrated computational materials engineering. Comput. Aided Des. 45, 4–25 (2013).
Zhao, J. C. & Henry, M. F. CALPHAD—is it ready for superalloy design? Adv. Eng. Mater. 4, 501–508 (2002).
Kalidindi, S. R. & De Graef, M. Materials data science: current status and future outlook. Annu. Rev. Mater. Res. 45, 171–193 (2015).
Shin, D., Yamamoto, Y., Brady, M. P., Lee, S. & Haynes, J. A. Modern data analytics approach to predict creep of high-temperature alloys. Acta Mater. 168, 321–330 (2019).
Shen, C. et al. Physical metallurgy-guided machine learning and artificial intelligent design of ultrahigh-strength stainless steel. Acta Mater. 179, 201–214 (2019).
Abe, F. in Proceedings of the Materials for Advanced Power Engineering, COST Conference, Liege, Belgium, September 18–20, 2020.
Washko, S. & Aggen, G. ASM Handbook Volume 1, Properties and Selection: Irons, Steels, and High-Performance Alloys (ASM International, 1990).
Dossett, J. L. & Totten, G. E. ASM Handbook, Volume 4D: Heat Treating of Irons and Steels, 382–396 (ASM International, 2014).
Freedman, D. A. Statistical Models: Theory and Practice. 26 (Cambridge University Press, 2009).
MacKay, D. J. Bayesian interpolation. Neural Comput. 4, 415–447 (1992).
Tipping, M. E. Sparse Bayesian learning and the relevance vector machine. J. Mach. Learn. Res. 1, 211–244 (2001).
Altman, N. S. An introduction to kernel and nearest-neighbor nonparametric regression. Am. Stat. 46, 175–185 (1992).
Ho, T. K. The random subspace method for constructing decision forests. IEEE Trans. Pattern Anal. Mach. Intell. 20, 832–844 (1998).
Awad, M. & Khanna, R. Efficient Learning Machines: Theories, Concepts, and Applications for Engineers and System Designers (Apress, 2015).
Sedgwick, P. Pearson’s correlation coefficient. BMJ 345, e4483 (2012).
Reshef, D. N. et al. Detecting novel associations in large data sets. Science 334, 1518–1524 (2011).
Abe, F. Strengthening mechanisms in steel for creep and creep rupture, in Creep-Resistant Steels (eds Kern, T. U., Abe, F. & Viswanathan, R.) 279–304 (Woodhead Publishing Series in Metals and Surface Engineering, 2008).
Kortum, F., Karras, O., Klünder, J. & Schneider, K. in Proceedings of International Conference on Product-Focused Software Process Improvement. 725–740 (Springer, 2019).
Hanumantharaju Gulapura, A. K. Thermodynamic Modelling of Martensite Start Temperature in Commercial Steels, Master thesis (KTH Royal Institute of Technology, 2018).
National Research Institute for Metals, NIMS Materials Database (MatNavi), Creep Data Sheet, Category: High Cr Steels, Technical Reports 10B, 13B, 19B, 43A, 46A, 48B, 51A, 52A (NIMS, Japan, 1994–2018). https://smds.nims.go.jp/creep/en/.
Thermo-Calc Software AB, TCFE9: TCS Steel and Fe-alloys Database, 2019, https://www.thermocalc.com/media/10306/tcfe9_extended_info.pdf.
Lukas, H. L., Fries, S. G. & Sundman, B. Computational Thermodynamics: The Calphad Method 131 (Cambridge University Press, Cambridge, 2007).
Andersson, J.-O., Helander, T., Höglund, L., Shi, P. & Sundman, B. Thermo-Calc & DICTRA, computational tools for materials science. Calphad 26, 273–312 (2002).
Sundman, B., Jansson, B. & Andersson, J.-O. The Thermo-Calc databank system. Calphad 9, 153–190 (1985).
Borgenstam, A. & Hillert, M. Driving force for fcc → bcc martensites in Fe-X alloys. Acta Mater. 45, 2079–2091 (1997).
Stormvinter, A., Borgenstam, A. & Ågren, J. Thermodynamically based prediction of the martensite start temperature for commercial steels. Metall. Mater. Trans. A 43, 3870–3879 (2012).
Lee, S., Peng, J., Williams, A. & Shin, D. ASCENDS: advanced data SCiENce toolkit for non-data scientists. J. Open Source Softw. 5, 1656 (2020).
Peng, J., Lee, S., Williams, A., Haynes, J. A. & Shin, D. Advanced data science toolkit for non-data scientists—a user guide. Calphad 68, 101733 (2020).
James, G., Witten, D., Hastie, T. & Tibshirani, R. An Introduction to Statistical Learning. Vol. 112, 181 (Springer, 2013).

Acknowledgements

This research was sponsored by the US Department of Energy, Office of Fossil Energy, eXtreme environment MATerials (XMAT) consortium. This research used resources of the Compute and Data Environment for Science (CADES) at the Oak Ridge National Laboratory, which is supported by the Office of Science of the US Department of Energy under Contract No. DE-AC05-00OR22725. The authors thank YiYu Wang for valuable discussion and Chris Layton for his support on using CADES. This paper has been authored by UT-Battelle, LLC, under contract DE-AC05-00OR22725 with the US Department of Energy (DOE). The US government retains and the publisher, by accepting the article for publication, acknowledges that the US government retains a nonexclusive, paid-up, irrevocable, worldwide license to publish or reproduce the published form of this paper, or allow others to do so, for US government purposes. DOE will provide public access to these results of federally sponsored research in accordance with the DOE Public Access Plan (http://energy.gov/downloads/doe-public-access-plan).

Author information

Authors and Affiliations

Materials Science and Technology Division, Oak Ridge National Laboratory, Oak Ridge, TN, 37831, USA
Jian Peng, Yukinori Yamamoto, Edgar Lara-Curzio & Dongwon Shin
Materials Performance Division, National Energy Technology Laboratory, Albany, OR, 97321-2198, USA
Jeffrey A. Hawk

Contributions

D.S. conceived the study. J.A.H. provided the dataset. J.P. performed the correlation analysis and machine learning training. J.P., Y.Y., and D.S. analyzed the data. J.P. drafted the paper. Y.Y., J.A.H., E.L.-C., and D.S. reviewed the paper.

Corresponding author

Correspondence to Dongwon Shin.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Peng, J., Yamamoto, Y., Hawk, J.A. et al. Coupling physics in machine learning to predict properties of high-temperatures alloys. npj Comput Mater 6, 141 (2020). https://doi.org/10.1038/s41524-020-00407-2

Received: 09 April 2020
Accepted: 19 August 2020
Published: 18 September 2020
DOI: https://doi.org/10.1038/s41524-020-00407-2