Expanded dataset of mechanical properties and observed phases of multi-principal element alloys

This data article presents a compilation of mechanical properties of 630 multi-principal element alloys (MPEAs). Built upon recently published MPEA databases, this article includes updated records from previous reviews (with minor error corrections) along with new data from articles that were published since 2019. The extracted properties include reported composition, processing method, microstructure, density, hardness, yield strength, ultimate tensile strength (or maximum compression strength), elongation (or maximum compression strain), and Young’s modulus. Additionally, descriptors (e.g. grain size) not included in previous reviews were also extracted for articles that reported them. The database is hosted and continually updated on an open data platform, Citrination. To promote interpretation, some data are graphically presented.

Measurement(s): alloy • oxygen content • carbon content • nitrogen content • yield strength • elongation • ultimate strength • hardness • density • grain size • observed phases
Technology Type(s): digital curation
Factor Type(s): alloy processing method • test temperature • test type • alloy composition

Machine-accessible metadata file describing the reported data: https://doi.org/10.6084/m9.figshare.12804137


Background & Summary
Traditional engineering alloys consist of a single principal element (e.g., Fe in steels and Ni in superalloys) and one or more solute elements present in much lower concentrations than the principal element. In contrast, multi-principal element alloys (MPEAs), also called complex concentrated alloys (CCAs), are a class of alloys where no single element dominates the composition and 3 or more principal elements are present in significant amounts. The term high entropy alloy (HEA) is often used to describe MPEAs with 5 or more principal elements and medium entropy alloy typically describes MPEAs with 3 or 4 principal elements. These alloys exhibit unique and extensively tunable properties compared to traditional single principal element alloys 1–11 .
A primary driver of interest in MPEAs is the significantly larger compositional design space they open up for new alloy development compared to traditional alloys 12 . Assuming a palette of 30 elements to choose from, there are approximately 143,000 potential 5-component systems and 594,000 potential 6-component systems to explore, with countless compositions within each system to synthesize and characterize, often with unknown processing routes. This large design space presents a challenge, since examining each system experimentally is prohibitively expensive. As such, there has been recent interest in employing computational and data-driven methods to accelerate exploration of MPEA systems and identify promising candidates for experimental study 13,14 .
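The system counts quoted above follow from counting unordered selections of elements from the palette. The short sketch below reproduces them with the standard-library binomial coefficient; the function name `count_systems` is illustrative and does not come from the article.

```python
from math import comb


def count_systems(palette_size: int, n_components: int) -> int:
    """Number of distinct element combinations (alloy systems).

    Only the choice of elements is counted; the concentrations
    within each system are ignored.
    """
    return comb(palette_size, n_components)


for k in (5, 6):
    n = count_systems(30, k)
    print(f"{k}-component systems from a 30-element palette: {n:,}")
```

The exact values are 142,506 and 593,775, which round to the approximate figures given in the text.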
Since the approach for MPEA design was defined in 2004 1,2 , there has been a growing body of work in the literature exploring these systems experimentally, with a focus on mechanical properties. An accurate accounting of high quality data from these studies is necessary to aid in further MPEA development, such as identifying gaps in design space, training machine learning models, flagging of outliers, etc. Given the large interest in this class of alloys, data on new systems are rapidly being published, necessitating frequent database updates to maintain relevancy. The updated MPEA mechanical properties database presented here combines data from previous reviews, makes corrections to data, and adds new data from articles published in 2019. The complete database will be hosted online in conjunction with a template to ensure routine updating and public availability of the database.

Fig. 1 The database generation workflow. Records are first extracted from various publications and input into a defined template format. Post-processing tools are used to identify outliers or erroneous data points. A detailed review of the number of records and properties contained in the resultant database is presented in Table 2.
www.nature.com/scientificdata

During extraction and digitization of the initial database, various typos and extraction errors were identified and corrected. Once digitized, the initial database was combined with the newly extracted data and put into a single spreadsheet, as demonstrated in Table 1.
To identify new sources of MPEA data, a keyword search for "high entropy alloy" was conducted on Web of Science (query performed October 2019) and the results were filtered for articles published in 2019. From this query, 136 articles were identified as potentially viable sources of experimental MPEA mechanical property data (i.e. articles reporting single and multiphase materials with a minimum of three elements). Relevant mechanical property data, defined in detail in the Data Records section, were extracted from plots, tables, and text and input into a tabular format. To extract data from plots, WebPlotDigitizer 17 was employed. The newly extracted data were combined with the previously digitized data to complete the database. A high-level overview of the extraction workflow is provided in Fig. 1.
Data from future publications. For any data that are relevant, but not present in the current review, researchers are encouraged to make their own contributions. Using the template provided on GitHub 18 , data extraction and digitization can be performed by many groups asynchronously. This template is formatted such that data can be easily uploaded to Citrination, an online platform for materials data 19 . Upon notification, any data added to the database on Citrination will be verified for integrity by the authors. Researchers are also encouraged to upload their data to other open data resources and contact the authors directly for integration with the MPEA database.

Data records
The database contains 1545 records from 265 articles. An individual record is defined as a unique combination of composition, property, temperature, and reference. For example, if two articles each measured the yield strength of HfNbTaTiZr at five temperatures, the number of records extracted is ten. On a per-record basis, this database represents an increase of more than 100% in the amount of available data compared with the 2018 reviews.
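The record definition above can be illustrated with a small counting sketch. The dictionary keys, temperature values, and reference labels below are hypothetical placeholders chosen for illustration; they are not the field names or values used in the published database.

```python
from itertools import product


def count_records(rows):
    """Count unique (composition, property, temperature, reference) tuples."""
    unique = {
        (r["composition"], r["property"], r["temperature_C"], r["reference"])
        for r in rows
    }
    return len(unique)


# Hypothetical example: two articles, five test temperatures each.
temperatures_C = [25, 600, 800, 1000, 1200]
references = ["article_A", "article_B"]
rows = [
    {
        "composition": "HfNbTaTiZr",
        "property": "yield strength",
        "temperature_C": t,
        "reference": ref,
    }
    for ref, t in product(references, temperatures_C)
]

# An exact duplicate of an existing row does not create a new record.
rows.append(dict(rows[0]))

print(count_records(rows))
```

With two references and five temperatures, the sketch counts ten records, matching the HfNbTaTiZr example in the text.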
The data in the database are extracted to best represent the data made available by the authors. Often, not all properties in the database are reported for every record. For example, despite the importance of grain size and interstitial content for properties, particularly for refractory MPEAs, these features are missing from many articles. The data are made available on Figshare 20 and in various tabular formats on the project GitHub 18 . The data have also been digitized into Physical Information File (PIF) records, an open-source JSON-based schema for materials data 21 . PIF records are hosted on Citrination (https://citrination.com/datasets/190954) to provide easy access for data visualizations and machine learning. Each data source will be updated continuously as more data are extracted.
The database records consist of the following fields, as available:
• Alloy composition: Normalized and alphabetized nominal alloy composition, in atomic percent. Validation and alphabetization were performed using the Pymatgen Composition module 22 .
• Microstructure: The experimentally observed phases (e.g. FCC, BCC, B2). Any phases that were not BCC, FCC, HCP, L12, B2, or Laves were labeled as "Sec. = secondary" or "Other".
• Processing method: The conditions under which the alloy was synthesized. CAST = as-cast or directional casting. POWDER = gas atomization, mechanical alloying, sintering, spark plasma sintering, or vacuum hot pressing. WROUGHT = cold-rolled, hot-rolled, or hot-forged. ANNEAL = annealed, homogenized, or aged. OTHER = additive manufacturing, hot isostatic pressing, or severe plastic deformation.
• Grain size (μm): The average grain size of the alloy.

A portion of the database (the 25 alloys with the highest yield strength at room temperature) is highlighted in Table 1 to provide an example of how data are stored in each field. This is only a subset of the properties collected in the database. Statistics on all properties extracted for the database are presented in Table 2.

[Figure caption fragment] … other (magenta)). "Other" is defined as any report of an MPEA that is either multiphase, or single-phase but not FCC or BCC.

[Figure caption fragment] … "Other" is defined as any report of an MPEA that is either multiphase, or single-phase but not FCC or BCC. The trend may suggest that BCC MPEAs have better high-temperature strength, but also highlights the lack of data available for FCC MPEAs.

(2020) 7:430 | https://doi.org/10.1038/s41597-020-00768-9

Figure 2 illustrates the relationship between yield strength and elongation for compressive and tensile tests.
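To make the "normalized and alphabetized" composition field described above concrete, the following is a minimal standard-library sketch. It is a stand-in for illustration only, not the Pymatgen-based routine the authors used: the function name `normalize_composition` and the simple regular-expression parser are assumptions, and the parser handles only flat formulas such as Al0.5CoCrFeNi (no parentheses or nested groups).

```python
import re

# One element symbol followed by an optional numeric amount, e.g. "Al0.5" or "Co".
_TOKEN = re.compile(r"([A-Z][a-z]?)(\d*\.?\d*)")


def normalize_composition(formula: str) -> dict:
    """Return {element: atomic percent}, with elements in alphabetical order."""
    amounts = {}
    for element, amount in _TOKEN.findall(formula):
        value = float(amount) if amount else 1.0  # a missing amount means 1
        amounts[element] = amounts.get(element, 0.0) + value
    total = sum(amounts.values())
    if total == 0:
        raise ValueError(f"no elements found in formula: {formula!r}")
    # Scale so the amounts sum to 100 at.% and sort the element symbols.
    return {el: 100.0 * amounts[el] / total for el in sorted(amounts)}


for formula in ("ZrTiTaNbHf", "Al0.5CoCrFeNi"):
    normalized = normalize_composition(formula)
    print(formula, "->", {el: round(pct, 2) for el, pct in normalized.items()})
```

Storing one canonical ordering and scale means that the same alloy written in different element orders in different articles maps to a single key, which matters when records are grouped or de-duplicated by composition.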
Figure 3 illustrates the temperature dependence of yield strength across three microstructure classifications (single-phase BCC, single-phase FCC, and multiphase/other).
Technical Validation
Review by domain experts. The data were collected, processed, and verified for accuracy by a team familiar with MPEAs and their properties. This domain knowledge was useful during data compilation and formatting of the dataset.

Extreme value identification.
During the processing of the database, various statistical plots were generated to assist in the identification of outliers and subsequent removal or correction of inaccurate data. Figure 4 is provided as an example of the outlier identification process. Box plots are generated for properties of interest and extreme values in the tails of the distribution are investigated. Extreme values that could not be verified were either removed or corrected.
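A minimal sketch of this kind of screening is shown below, assuming the common box-plot convention of flagging points more than 1.5 interquartile ranges beyond the quartiles. The article does not state which whisker rule was used, so the threshold, the function name `flag_extremes`, and the hardness values are all illustrative assumptions.

```python
from statistics import quantiles


def flag_extremes(values, whisker=1.5):
    """Return the values lying outside the box-plot whiskers.

    Assumes the conventional rule: below Q1 - whisker * IQR
    or above Q3 + whisker * IQR.
    """
    q1, _median, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    lower = q1 - whisker * iqr
    upper = q3 + whisker * iqr
    return [v for v in values if v < lower or v > upper]


# Hypothetical hardness column (HV); one entry looks like it was typed in GPa.
hardness_hv = [450, 480, 500, 520, 530, 550, 600, 5.2]

for value in flag_extremes(hardness_hv):
    print(f"check against the original source: {value}")
```

Flagged values are only candidates for review; as described above, each one still has to be checked against its source before it is corrected or removed.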

Usage Notes
This expanded dataset on MPEAs is intended to guide experiments for future alloy development. As shown in Figs. 2 and 3, the dataset can produce informative visualizations to guide researchers' efforts. Each record can be accessed programmatically via the Citrination API 23 . In conjunction with traditional Python data processing packages (e.g. pandas), the dataset can serve as training data for machine learning applications. To ensure data quality, each record is associated with a digital object identifier (DOI) link to the original source. To improve the predictive capabilities of subsequent machine learning models, researchers are encouraged to contribute to this database through the addition of new data as they are generated.
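As a rough illustration of working with a tabular export of the dataset, the sketch below ranks room-temperature records by yield strength using only the Python standard library. The column names, the inline sample rows, and the placeholder source identifiers are hypothetical; the actual headers in the published files may differ, and no Citrination API calls are shown.

```python
import csv
import io

# Hypothetical sample in the spirit of a tabular export; not real data.
SAMPLE_CSV = """formula,test_temperature_C,yield_strength_MPa,source
AlloyA,25,1200,placeholder-a
AlloyB,25,,placeholder-b
AlloyC,800,900,placeholder-c
AlloyD,25,1500,placeholder-d
"""


def top_yield_strength(csv_text, n=25, room_temp_C=25.0):
    """Return the n room-temperature rows with the highest yield strength."""
    rows = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Not every property is reported for every record, so skip gaps.
        if not row["yield_strength_MPa"] or not row["test_temperature_C"]:
            continue
        if float(row["test_temperature_C"]) != room_temp_C:
            continue
        rows.append(row)
    rows.sort(key=lambda r: float(r["yield_strength_MPa"]), reverse=True)
    return rows[:n]


for row in top_yield_strength(SAMPLE_CSV, n=2):
    print(row["formula"], row["yield_strength_MPa"], row["source"])
```

Keeping the source column alongside each value preserves the link back to the original article, which is what allows a suspicious value to be re-checked later.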

Code availability
Data processing, validation and statistical plotting were performed using visualization tools on Citrination and Jupyter notebooks 24 in a Python 3 25 environment. The code is available on GitHub (https://github.com/CitrineInformatics/MPEA_dataset).

Fig. 4 Step 1: Box plots were generated for properties of interest (e.g. alloy hardness) and the source of any extreme values was investigated.
Step 2: In this case, the inaccuracy was units-related; the value was recorded in units of GPa, whereas the database expected units of HV.
Step 3: The value with correct units was updated in place of the originally recorded value. Original source reproduced with data from Jumaev et al. 26 .
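The units mismatch in Step 2 can be illustrated with the standard relation between Vickers hardness numbers and pressure units (1 HV corresponds to 1 kgf/mm², which is 9.80665 MPa). The sketch below is one possible way to apply such a correction; the article does not say whether the authors converted the value or re-read it from the source, and the function name `gpa_to_hv` is illustrative.

```python
# 1 kgf/mm^2 = 9.80665 MPa = 9.80665e-3 GPa
KGF_PER_MM2_IN_GPA = 9.80665e-3


def gpa_to_hv(hardness_gpa: float) -> float:
    """Convert a hardness reported in GPa to a Vickers hardness number (HV)."""
    return hardness_gpa / KGF_PER_MM2_IN_GPA


# A hypothetical entry of 5.0 recorded in GPa corresponds to roughly 510 HV.
print(round(gpa_to_hv(5.0)))
```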