JARVIS: An Integrated Infrastructure for Data-driven Materials Design

Kamal Choudhary, Kevin F. Garrity, Andrew C. E. Reid, Brian DeCost, Adam J. Biacchi, Angela R. Hight Walker, Zachary Trautt, Jason Hattrick-Simpers, A. Gilad Kusne, Andrea Centrone, Albert Davydov, Jie Jiang, Ruth Pachter, Gowoon Cheon, Evan Reed, Ankit Agrawal, Xiaofeng Qian, Vinit Sharma, Houlong Zhuang, Sergei V. Kalinin, Bobby G. Sumpter, Ghanshyam Pilania, Pinar Acar, Subhasish Mandal, Kristjan Haule, David Vanderbilt, Karin Rabe, Francesca Tavazza

Especially in the field of computational materials design, quantum mechanics-based density functional theory (DFT) 23 has proven to be an immensely successful technique, and several databases of automated DFT calculations are widely used in materials design applications. Despite their successes, existing DFT databases face limitations due to issues intrinsic to conventional DFT approaches, e.g. the generalized gradient approximation of Perdew-Burke-Ernzerhof (GGA-PBE) 23,24 . Drawbacks of the existing DFT databases include non-inclusion of van der Waals (vdW) interactions 8 , bandgap underestimations 25 , non-inclusion of spin-orbit coupling 7 , overly simplifying magnetic ordering 26 , neglecting defects 27 (point, line, surface and volume), unconverged computational parameters such as k-points 28 , ignoring temperature effects 29 (generally DFT calculations are performed at 0 K), lack of layer/thickness-dependent properties of low dimensional materials 30 , and lacking interfaces/heterostructures of materials 31 , all of which can be critical for realistic material-applications. Additionally, there are several other computational approaches, such as classical force-field (FF) 32 , computational microscopy, phasefield (PF), CALculation of PHAse Diagrams (CALPHAD) 33 , and Orientation Distribution Functions (ODF) 34 which lack the integrated tools and databases that have been developed for DFT-based computational approaches. Finally, the integration of computational approaches with experiments, the application of statistical uncertainty analysis, and the implementation of data analytics and artificial intelligence (AI) techniques require significant developments to meet the goals set forth by the MGI.

, Novel Materials Discovery (NOMAD) 9 , Computational Materials Repository (CMR) 35 42 , and Phase-Field hub (PFhub) 43 . Some of the commonly used computational tools are Python Materials Genomics (PYMATGEN) 44 , Atomic Simulation Environment (ASE) 45 , Automated Interactive Infrastructure and Database (AIIDA) 6 and MPinterfaces 46 . The data most commonly included in these databases consists of crystal structures, formation energies, bandgaps, elastic constants, Poisson ratios, piezoelectric constants, and dielectric constants. These material properties can be used directly to screen for potentially interesting materials for a given application as candidates for experimental synthesis and characterization, as well as part of a PSPP design approach to better understand the factors driving material performance. Beyond the directly calculated material properties mentioned above, several new selection metrics are also being developed to aid materials design, such as scintillation attenuation length 47 , thermoelectric complexity factor 48 , spectroscopy limited maximum efficiency 49,50 , exfoliation energy 8 , and spin-orbit spillage 7,26,51 .
Akin to DFT-like standard computational approaches that are used as screening tools for experiments, machine learning (ML) [14][15][16]52 models for materials design are being developed as pre-screening tools for other conventional computational methods such as DFT. In addition, ML tools are proposed to accelerate experimental methods directly based on computational data 53 .
Started in 2017, JARVIS-DFT 7,8,[25][26][27]30,31,49,53,55 is a repository based on DFT calculations that mainly uses the vdW-DF-OptB88 van der Waals functional 56 . The database also uses beyond-GGA approaches for a subset of materials, including the Tran-Blaha modified Becke-Johnson (TBmBJ) meta-GGA 57 , the hybrid functional PBE0, the hybrid range-separated functional Heyd-Scuseria-Ernzerhof (HSE06), Dynamical Mean Field Theory (DMFT), and G0W0. In addition to hosting conventional properties such as formation energies, bandgaps, elastic constants, piezoelectric constants, dielectric constants, and magnetic moments, it also contains unique datasets, such as exfoliation energies for van der Waals bonded materials, the spin-orbit coupling (SOC) spillage, improved meta-GGA bandgaps, frequency-dependent dielectric functions, the spectroscopy limited maximum efficiency (SLME), infrared (IR) intensities, electric field gradients (EFG), heterojunction classifications, and Wannier tight-binding Hamiltonians. These datasets are compared to experimental results wherever possible to evaluate their accuracy as predictive tools. JARVIS-DFT also introduced protocols such as automatic k-point convergence, which can be critical for obtaining precise and accurate results. JARVIS-DFT is distributed through the website: https://www.ctcms.nist.gov/~knc6/JVASP.html.
The JARVIS-FF 27,58 database, also started in 2017, is a repository of classical force-field/potential computational data intended to help a user select the most appropriate force-field for a specific application. Many classical force-fields are developed for a particular set of properties (such as energies), and may not have been tested for properties not included in training (such as elastic constants, or defect formation energies). JARVIS-FF provides an automatic framework to consistently calculate and compare basic properties, such as the bulk modulus, defect formation energies, phonons, etc., that may be critical for specific molecular-dynamics simulations. JARVIS-FF relies on DFT and experimental data to evaluate accuracy. JARVIS-FF is distributed through the website: https://www.ctcms.nist.gov/~knc6/periodic.html.
JARVIS-ML 49,53,55,59,60 is a repository of machine learning (ML) model parameters, descriptors, and ML-related input and target data. JARVIS-ML introduced Classical Force-field Inspired Descriptors (CFID) in 2018 as a universal framework to represent a material's chemistry-structure-charge related data. With the help of CFID and JARVIS-DFT data, several high-accuracy classification and regression ML models were developed, with applications to fast materials screening and energy-landscape mapping. Some of the trained property models include formation energies, exfoliation energies, bandgaps, magnetic moments, refractive indexes, dielectric constants, thermoelectric performance, and maximum piezoelectric and infrared modes. Also, several ML interpretability analyses have provided physical insights beyond intuitive materials-science knowledge 59 . These models, the workflow, the datasets, etc. are disseminated to enhance the transparency of the work. Recently, JARVIS-ML was expanded to include ML models to analyze STM-images in order to directly accelerate the interpretation of experimental images.
Graph convolution neural network models are currently being developed for automated handling of images and crystal-structure analysis in materials science. JARVIS-ML is distributed through the website: https://www.ctcms.nist.gov/jarvisml.
JARVIS-Tools is the underlying computational framework used for automation, data-generation, data-handling, analysis and dissemination of all the above repositories. JARVIS-Tools uses cloud-based continuous integration, low-software dependency, auto-documentation, Jupyter and Google-Colab notebook integration, pip installation and related strategies to make the software robust and easy to use. JARVIS-Tools also hosts several examples to enable a user to reproduce the data in the above repositories or to apply the tools for their own applications. JARVIS-Tools are provided through the GitHub page: https://github.com/usnistgov/jarvis.
This paper is organized as follows: 1) we introduce the main computational techniques, organized by the time and length scales, 2) we illustrate JARVIS-Tools and its functionalities, 3) we discuss the contents of the major JARVIS databases, 4) we demonstrate some of the derived applications, and 5) we discuss outstanding challenges and future work.

Overview of computational techniques
There are many computational tools for simulating realistic materials depending on the time and length scales of interest 61 . Before we discuss the details of JARVIS, we will provide a brief list of these techniques and highlight their range of applicability, as summarized in Fig. 1. Relevant techniques include quantum mechanical computations, classical/molecular mechanics, mesoscale modeling, finite element analysis, and engineering design. Each of these methodologies has its own ontology and semantics for describing themselves and the PSPP relationship. For example, 'structure' may imply electronic configurations in the quantum regime, atomic arrangement in molecular mechanics, microstructure, segments in phase field-based mesoscale modeling, and mesh-structure in finite element analysis. Material properties are calculated using corresponding physical laws such as the Schrödinger equation in the quantum regime, or Newton's laws of motion for classical regimes. For realistic material design, it is important to integrate these methods. A major challenge for multiscale modeling is propagating the results of one simulation into another while capturing the relevant physics. Artificial Intelligence (AI) techniques have been applied in each of these domains and can be used to integrate the methods to a certain extent 14 . In JARVIS, we primarily focus on atomistic-based classical and quantum simulations and machine-learning, but we also attempt to integrate other simulation methods with our atomistic data for a few specific applications.

Software and databases
The JARVIS infrastructure (Fig. 2) is a combination of databases and tools for running and integrating some of the computational methods mentioned above. The general procedure for adding a dataset to JARVIS is as follows. We start with the goal of finding or designing a material to display or optimize a given property. Then, we decide on an appropriate computational method, as well as a computationally efficient way to screen for the best candidate materials. The screening process can proceed in several steps, with computationally inexpensive methods applied first, followed by more computationally intensive methods on the remaining materials. Whenever possible, the data is compared with available experiments to evaluate the accuracy and quality of the database. Once a large enough dataset is generated, machine learning techniques can be utilized to accelerate the traditional computational approaches.
As an example, we consider the goal of finding materials to maximize solar-cell efficiency, where the appropriate computational tool is DFT. We develop a screening criterion (Spectroscopic Limited Maximum Efficiency, SLME) and calculate the necessary properties (dielectric function and band gap). We test the method by comparing known materials to experiment, and we perform more accurate meta-GGA and GW calculations as additional screening and validation steps.
Finally, we develop a machine learning model to accelerate future materials design.
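The multi-step screening procedure described above can be sketched in a few lines of Python. This is a minimal illustration and not the JARVIS-Tools implementation: the record fields (`gap_cheap`, `slme`), the threshold values, and the candidate data are all hypothetical and chosen only to show how inexpensive filters are applied before more expensive ones.

```python
from typing import Callable, Dict, List

Record = Dict[str, object]


def screen(candidates: List[Record],
           stages: List[Callable[[Record], bool]]) -> List[Record]:
    """Apply filters in order (cheapest first); each stage only sees survivors."""
    survivors = list(candidates)
    for stage in stages:
        survivors = [rec for rec in survivors if stage(rec)]
    return survivors


# Hypothetical records; the numbers are invented for illustration only.
candidates = [
    {"name": "A", "gap_cheap": 1.3, "slme": 30.0},
    {"name": "B", "gap_cheap": 0.0, "slme": 0.0},
    {"name": "C", "gap_cheap": 2.0, "slme": 5.0},
    {"name": "D", "gap_cheap": 3.5, "slme": 12.0},
]

# Stage 1: inexpensive bandgap window (assumed 0.5 to 2.5 eV).
# Stage 2: more expensive efficiency criterion (assumed SLME > 10 %).
stages = [
    lambda rec: 0.5 <= rec["gap_cheap"] <= 2.5,
    lambda rec: rec["slme"] > 10.0,
]

selected = [rec["name"] for rec in screen(candidates, stages)]
print(selected)
```

Ordering the stages by cost means the expensive criterion is evaluated only for materials that pass the cheap one, which is the point of the funnel.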
The database component of JARVIS consists of JARVIS-DFT for DFT calculations and JARVIS-FF for molecular dynamics simulations. JARVIS-ML hosts several machine learning models based on our datasets. JARVIS-Tools contains tools for automating, post-processing and disseminating generated data, as well as several derived applications such as JARVIS-Heterostructure. We also include precision and accuracy analyses of the generated data, which consist of comparing DFT data with experiments, comparing FF data with DFT, comparing ML models with DFT, etc. As a lower-level technique (see Fig. 1), JARVIS-DFT data can be fed into JARVIS-FF and JARVIS-ML models, but not vice versa. We use JARVIS-ML to accelerate both JARVIS-DFT and JARVIS-FF. In this way, the JARVIS-infrastructure establishes a joint integration for automation and generation of repositories. We provide several social-media platforms to build a community of interest. Some of the key resources for the JARVIS-infrastructure are shown in Table 1.
An example Python class in JARVIS-Tools is 'Atoms'. It uses atomic coordinates, element types and lattice vectors to build an 'Atoms' object from which several properties, such as density and chemical formula, can be calculated. This 'Atoms' class, along with several other modules (discussed later), can be used for setting up calculations with external software packages. An example of the 'Atoms' class is shown in Fig. 4.

Fig. 4 Examples of using python classes in JARVIS-Tools for constructing 'Atoms' class and downloading data.
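To make the idea of an 'Atoms'-like object concrete, the following is a self-contained sketch of how density and a reduced chemical formula can be derived from lattice vectors, element types and coordinates. It is deliberately not the JARVIS-Tools class: the name `SimpleAtoms`, its attributes, and the small mass table are hypothetical, and the silicon lattice constant is an approximate value used only as test input.

```python
from collections import Counter
from dataclasses import dataclass
from functools import reduce
from math import gcd
from typing import List, Sequence

# Atomic masses in amu for the few elements used in this sketch.
ATOMIC_MASS = {"Si": 28.0855, "Ni": 58.6934, "Al": 26.9815}
AMU_PER_A3_TO_G_PER_CM3 = 1.66053907  # 1 amu/Angstrom^3 in g/cm^3


@dataclass
class SimpleAtoms:
    """Hypothetical stand-in for an 'Atoms'-like object (not the JARVIS-Tools API)."""
    lattice: Sequence[Sequence[float]]      # three lattice vectors in Angstrom
    elements: List[str]                     # one symbol per atom
    frac_coords: Sequence[Sequence[float]]  # fractional coordinates per atom

    @property
    def volume(self) -> float:
        """Cell volume from the scalar triple product a . (b x c)."""
        a, b, c = self.lattice
        cross = (b[1] * c[2] - b[2] * c[1],
                 b[2] * c[0] - b[0] * c[2],
                 b[0] * c[1] - b[1] * c[0])
        return abs(sum(a[i] * cross[i] for i in range(3)))

    @property
    def density(self) -> float:
        """Mass density in g/cm^3."""
        mass = sum(ATOMIC_MASS[el] for el in self.elements)
        return mass * AMU_PER_A3_TO_G_PER_CM3 / self.volume

    @property
    def reduced_formula(self) -> str:
        """Element counts divided by their greatest common divisor."""
        counts = Counter(self.elements)
        divisor = reduce(gcd, counts.values())
        return "".join(
            el + (str(n // divisor) if n // divisor > 1 else "")
            for el, n in sorted(counts.items())
        )


half = 5.431 / 2  # approximate Si lattice constant (Angstrom), halved
si = SimpleAtoms(
    lattice=[[0.0, half, half], [half, 0.0, half], [half, half, 0.0]],
    elements=["Si", "Si"],
    frac_coords=[[0.0, 0.0, 0.0], [0.25, 0.25, 0.25]],
)
print(si.reduced_formula, round(si.density, 2))
```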
The 'Atoms' class, along with many other modules in JARVIS-Tools, is used to generate input files for automating software codes. Currently, JARVIS-Tools can be used to automate DFT calculations with packages such as the Vienna Ab-initio simulation package (VASP) 62 . Later, custom jobs can also be run on the optimized structure using 'VaspJob'. A similar workflow is shown for an example of FF calculations based on LAMMPS in Fig. 5b.
Here, for a particular force-field such as Ni-Al 58 , for example, all the structures related to Ni, Al and Ni-Al are obtained from the DFT database and converted into a LAMMPS input format using 'Atoms', 'LammpsData' and 'LammpsJob' objects. Then a series of geometry optimization, vacancy formation energy, surface energy, and phonon-related calculations are run, based on the symmetry of the structure. All of these steps use a set of ".mod" module files with input parameters that control respective LAMMPS calculations. The obtained results are compared with corresponding DFT data, to evaluate the quality of an FF for a particular system or simulation.
In machine learning calculations, the input materials-data is transformed into several machine-readable descriptors 71

Fig. 5 Flowcharts showing some of the main steps used in the most common calculations: a) JARVIS-DFT, b) JARVIS-FF and c) JARVIS-ML workflows.
After running the automated calculations, the data is post-processed to predict various material properties (such as bandgap, formation energy, spin-orbit spillage, SLME, density of states,  2D, 1D and 0D materials. This functional has been shown to provide accurate predictions for lattice-parameters and energetics for both vdW and non-vdW bonded materials 30 . In addition to hosting 3D bulk materials, the database consists of 2D-monolayer, 1D-nanowire, and 0D-molecular materials (as shown in Table 2). However, to date, 3D and 2D materials have primarily been distributed publicly. Moreover, other exchange-correlation functionals are considered (as shown in Table 3), which can help estimate the prediction uncertainty. While vdW-DF-OptB88 can predict accurate lattice parameters and formation energies, bandgaps are still underestimated.
Calculations with hybrid functionals (such as range-separated HSE06 and PBE0) and many-body approaches (such as G0W0) remain too computationally expensive 23 to use in a high-throughput methodology for thousands of materials. Hence, a meta-GGA Tran-Blaha-modified Becke-Johnson (TBmBJ) potential is used to provide a good balance between computational expense and accuracy. The TBmBJ accuracy has been shown to be close to that of high-level methods such as HSE06 at up to ten times lower computational expense 57 . Accurate bandgaps also lead to more accurate frequency-dependent dielectric functions and optical gaps, which are critical for applications such as solar-cell efficiency calculations; however, TBmBJ cannot describe the excitonic nature of electron-hole pairs in low-dimensional materials. In addition to TBmBJ, we are generating HSE06, PBE0, G0W0 and DMFT datasets, which can be considered beyond-DFT methods and are discussed in the next section. Next, calculations are performed with and without SOC to analyze the differences introduced by this coupling. These differences are used to discover 3D and 2D topological materials. In addition, several new DFT datasets have been developed, including properties such as the frequency-dependent dielectric function and the electric field gradient. A few important protocols, such as automatic k-point convergence, are also introduced. A snapshot of the JARVIS-DFT website along with a list of properties that are available is shown in Fig. 7. JARVIS-DFT has several filtering options on the website to screen candidate materials.
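The idea behind an automatic k-point convergence protocol can be illustrated with a short sketch. This is not the procedure of Ref. 28: the function name, the starting value, the step, the tolerance, and the stopping rule (energy change between successive k-point settings below a tolerance) are assumptions made for illustration, and `mock_energy` is a hypothetical stand-in for a real DFT total-energy calculation.

```python
from typing import Callable


def converge_kpoints(energy_of: Callable[[int], float],
                     start: int = 10,
                     step: int = 5,
                     tol: float = 1e-3,
                     max_steps: int = 20) -> int:
    """Increase the k-point setting until the energy changes by less than tol.

    Returns the first k-point setting whose energy differs from the
    previous setting by less than tol (in the units of energy_of).
    """
    k = start
    previous = energy_of(k)
    for _ in range(max_steps):
        k += step
        current = energy_of(k)
        if abs(current - previous) < tol:
            return k
        previous = current
    raise RuntimeError("k-point convergence not reached within max_steps")


def mock_energy(k: int) -> float:
    """Hypothetical energy (eV) that approaches -5.0 as k grows."""
    return -5.0 + 2.0 / k ** 3


converged_k = converge_kpoints(mock_energy)
print(converged_k)
```

In a real workflow `energy_of` would launch a single-point calculation, so each call is expensive; the loop therefore stops at the first setting that satisfies the tolerance.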
We provide the input files as downloadable .zip files, especially for the users who do not have much expertise in using python-based codes. Raw input and output files (on the order of 1 terabyte)

JARVIS-Beyond-DFT
While quantum mechanical methods in single-particle theories such as DFT or DFT+U methods (mainly GGA) are fast and can predict accurate results for most structural parameters, even when relatively strong electron correlations are present, qualitative predictions of excited state properties may require beyond-DFT methods 75 . Beyond-DFT calculations have been applied to many materials systems, including cuprates and Fe-based high-temperature superconductors, Mott insulators, heavy-fermion systems, semiconductors, photovoltaics, and topological Mott insulators 75 . In the last few decades, both perturbative and stochastic approaches have been developed to understand these strongly correlated materials. These approaches, such as the GW approximation, Dynamical Mean Field Theory (DMFT) 76 , or hybrid functionals are often called beyond-DFT methods since they go beyond the limit of semilocal DFT. The materials design community needs to have a way of answering the question of whether, in a particular case, it is necessary to use a beyond-DFT method, and most importantly which method to use. In JARVIS-Beyond-DFT, we are building a database of spectral functions and related quantities as computed using meta-GGA, GW, hybrid functionals, and LDA+DMFT for head-to-head comparison on 100+ materials.
In the JARVIS-Beyond-DFT 75 database we try to answer a few key questions regarding discoveries through a materials database for quantum materials. First, where is it necessary to use a beyond-DFT method, and which method should be used? Second, how do different "beyond-DFT" methods compare with experiments? Target materials include but are not limited to various transition metal oxides, perovskites and mixed perovskites, nickelates, transition metal dichalcogenides, a wide range of metals from alkali metals to transition metals, and various iron-based superconductors. JARVIS-Beyond-DFT will be distributed through the website: https://www.ctcms.nist.gov/~knc6/BDFT.html.

JARVIS-FF
Classical force-field-/interatomic-potential-based simulations are the workhorse technique for large scale atomistic simulations. They are especially suited for temperature-dependent and defect-related phenomena. Several varieties of FFs differ based on the materials system and the underlying phenomena under investigation, e.g., whether they include bond-angle information and fixed or dynamic charges. Also, they are generally designed for particular applications and phases, making it difficult to ascertain whether they will perform well in simulations for which they were  Table. 5 . Furthermore, we plan to include several recently developed machine learning force-fields into JARVIS-FF. A snapshot of the JARVIS-FF website is also shown in Fig. 8.

JARVIS-ML
Machine learning has several applications in materials science and engineering 14,80,81 , such as automating experimental data analysis, discovering new functional materials, optimizing known ones by accelerating conventional methods such as DFT, automating literature searches, discovering new physical equations, and efficient clustering of materials and their properties.
There are several data types that can be used in ML such as scalar data (e.g., formation energies, bandgaps), vector/spectra data (e.g., density of states, dielectric function, charge density, X-ray diffraction patterns, etc.), image-based data (such as scanning tunneling microscopy and transmission electron microscopy images), and natural language processing-based data (such as scientific papers). In addition, ML can be applied on a variety of materials classes such as bulk crystals, molecules, proteins and free-surfaces.
Currently, there are two types of data that are machine-learned in JARVIS-ML 49,53,55,59,60 : discrete and image-based. The discrete target is obtained from the JARVIS-DFT database for 3D and 2D materials. There have been several descriptor developments as attempts to capture the complex chemical-structural information of a material 71 . We compute CFID descriptors for most crystal structures in various databases (as shown in Table 6). Many of these structures are non-unique but can still be used for pre-screening applications 49  descriptors. More details can be found in Ref. 59 . Currently, we provide CFID descriptors only, but other descriptors such as the Coulomb matrix and sine matrix will be provided soon. With CFID descriptors, we trained several classification and regression models. Once these models are trained, their parameters are stored so that the properties of an arbitrary compound can be predicted quickly. We developed a web-based application to host the trained models, as shown in Fig. 9, and a list of the trained properties is displayed there as well. We note that classical quantities such as bulk modulus, maximum infrared (IR) active mode, and formation energies can be accurately trained, especially with regression models. For other properties such as bandgaps, magnetic moments, piezoelectric coefficients, and thermoelectric coefficients, high accuracy models are obtained for classification tasks only. In addition to the descriptor-based data, we develop Scanning Tunneling Microscopy (STM) 53 image classification models that can be used to accelerate the analysis of STM data. The images are converted into black/white images to identify spots with/without atoms.
The model's accuracy is compared with respect to DFT data or experiments wherever applicable.
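The conversion of an STM image into a black/white image can be illustrated with a simple threshold. The sketch below is an assumption about one possible way to do this, not the procedure of Ref. 53: the choice of the midpoint between the minimum and maximum intensity as the threshold, the function name, and the tiny test image are all hypothetical.

```python
from typing import List


def binarize(image: List[List[float]]) -> List[List[int]]:
    """Map each pixel to 1 (bright spot, atom assumed present) or 0 (dark).

    The threshold is the midpoint between the minimum and maximum
    intensity of the image; this choice is an assumption of the sketch.
    """
    flat = [value for row in image for value in row]
    threshold = (min(flat) + max(flat)) / 2
    return [[1 if value > threshold else 0 for value in row] for row in image]


# Hypothetical 2 x 3 intensity map standing in for an STM image.
stm_like = [
    [0.1, 0.9, 0.2],
    [0.8, 0.3, 0.7],
]
mask = binarize(stm_like)
print(mask)
```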

Derived apps
The knowledge developed through the above-mentioned databases and tools can serve as static content, and can also be accessed through dynamic user-defined inputs. Derived applications (apps) are designed to help a user analyze the combinatorics in the data. Based on the databases and tools discussed above, several apps are derived from JARVIS such as JARVIS-Heterostructure 31 , JARVIS-WannierTB, and JARVIS-ODF. JARVIS-Heterostructure (as shown in Fig. 10a) can be used to characterize heterojunction types and to model interfaces for exfoliable 2D materials. We classify these heterostructures into type-I, II and III systems according to Anderson's rule, which is based on the band-alignment with respect to the vacuum potential of non-interacting monolayers, obtained from JARVIS-DFT. The app also generates crystallographic positions for the heterostructure that could be used as input for subsequent calculations. JARVIS-WannierTB (as shown in Fig. 10b) can be used to solve Wannier tight-binding Hamiltonians on arbitrary k-points for 3D and 2D materials. Properties such as the band structure and the density of states can be predicted on the fly from this app. Additionally, many other apps are being developed, which are primarily based on the Flask Python package 74 . The JARVIS-ODF (Orientation Distribution Function) library is under development, which aims to calculate volume-averaged (meso-level) material properties, including the elasto-plastic deformation behavior, using the property data available for single crystals in the JARVIS database.
Once generated, the JARVIS-ODF library will be capable of obtaining such material properties for all crystalline structures.
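The type-I/II/III classification based on Anderson's rule can be written as a short function of the band edges of the two non-interacting monolayers, referenced to the vacuum level. The sketch below follows the usual textbook definitions (straddling, staggered and broken gap); the function name and the band-edge values are hypothetical and are not taken from JARVIS-DFT or the JARVIS-Heterostructure app.

```python
def heterojunction_type(vbm_a: float, cbm_a: float,
                        vbm_b: float, cbm_b: float) -> str:
    """Classify a heterojunction from vacuum-referenced band edges (eV).

    Type III (broken gap): the gaps of the two layers do not overlap.
    Type I (straddling): one gap lies entirely within the other.
    Type II (staggered): the gaps overlap only partially.
    """
    if cbm_a <= vbm_b or cbm_b <= vbm_a:
        return "III"
    a_contains_b = vbm_a <= vbm_b and cbm_b <= cbm_a
    b_contains_a = vbm_b <= vbm_a and cbm_a <= cbm_b
    if a_contains_b or b_contains_a:
        return "I"
    return "II"


# Hypothetical band edges relative to vacuum (eV).
print(heterojunction_type(-6.0, -4.0, -5.5, -4.5))  # layer B's gap inside A's
print(heterojunction_type(-6.0, -4.0, -5.0, -3.0))  # partial overlap
print(heterojunction_type(-6.0, -5.0, -4.5, -3.0))  # no overlap
```

Because the rule assumes non-interacting layers, it ignores charge transfer and interface dipoles, so the predicted type is a first screening estimate rather than a replacement for an explicit heterostructure calculation.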

Accuracy and precision analysis
In simulations, accuracy refers to the degree of closeness between a calculated value and a reference value, which can be from an experiment or a high-fidelity theory. Precision refers to the degree of closeness between numerical approaches to solving a certain model, including the effect of convergence and other simulation parameters.
In JARVIS-DFT, the accuracy of the DFT data is obtained by comparing it to available experimental results. The accuracy of JARVIS-FF and JARVIS-ML, instead, is given with respect to DFT results. Note that the numbers of high-quality experimental measurements or high-fidelity calculations for a given property are often low. Therefore, the accuracy metrics we derive in our works are obtained only for the few cases we can directly compare, not for the entire dataset.
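One common way to express such an accuracy metric is the mean absolute error (MAE) evaluated only over the entries for which a reference value exists. The sketch below assumes MAE as the metric and uses hypothetical material identifiers and values; it is meant to show why the resulting number describes only the overlapping subset and not the whole dataset.

```python
from typing import Dict, Tuple


def mae_on_overlap(calculated: Dict[str, float],
                   reference: Dict[str, float]) -> Tuple[float, int]:
    """Return (MAE, number of compared entries) over the shared keys only."""
    shared = sorted(set(calculated) & set(reference))
    if not shared:
        raise ValueError("no overlapping entries to compare")
    total = sum(abs(calculated[key] - reference[key]) for key in shared)
    return total / len(shared), len(shared)


# Hypothetical values: three calculated entries, two with reference data.
calculated = {"mat-1": 1.0, "mat-2": 2.0, "mat-3": 3.0}
reference = {"mat-1": 1.5, "mat-3": 2.5}

mae, n_compared = mae_on_overlap(calculated, reference)
print(mae, n_compared)
```

Reporting the number of compared entries alongside the error makes the limited coverage of the reference data explicit.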
Below, we provide accuracy metrics for some material properties in JARVIS-DFT, with respect to experiments. In addition to the scalar data, vector/continuous data, such as the frequency-dependent dielectric function and Scanning Tunneling Microscopy (STM) images, are compared to a handful of experimental data points as well. Details of individual properties can be found in Refs. 8,30,49,52,53,55,59,60 53 . We find high precision (more than 0.87) for all of the 2D-Bravais lattices. Precision analysis for regression tasks is still ongoing and will be available soon.

Future work
Given that the number of all possible materials 77 could be of the order of 10^100 , and furthermore existing materials properties can be computed at increasing levels of accuracy/cost, the JARVIS databases will always be incomplete. This represents an opportunity for JARVIS to be drastically expanded in the future. Future work will be aimed at addressing some of the limitations of the existing databases, and may include additions like defect/disorder properties, magnetic ordering, non-linear optoelectronics, more beyond-DFT calculations, temperature-dependent properties, integration with experiments, and more detailed uncertainty analysis. Moreover, new ML models and methods for data-prediction and uncertainty quantification will be developed for 'explainable AI' (XAI) and transfer-learning (TL)-based research. Other derived apps such as JARVIS-ODF, JARVIS-Beyond-DFT, JARVIS-GraphConv, and JARVIS-STM are also being developed. In addition to the technical aspects, the broader impact of the infrastructure will be to provide a research platform that will allow maximum participation of worldwide researchers. NIST-JARVIS currently hosts pre-computed data and is intended to host on-the-fly calculation resources as well. We believe the publicly available data and resources provided here will significantly accelerate future materials design in various areas of science and technology.

Methods
The entire study was managed, monitored, and analyzed using the modular workflow, which we have made available 54 on our JARVIS-Tools GitHub page (https://github.com/usnistgov/jarvis).
The DFT calculations are mainly carried out using the Vienna Ab-initio simulation package (VASP) 62,63 . We use the projector augmented wave method and the OptB88vdW functional 56 , which gives accurate lattice parameters for both van der Waals (vdW) and non-vdW solids 30 . Both the internal atomic positions and the lattice constants are allowed to relax in spin-unrestricted calculations until the maximal residual Hellmann-Feynman forces on atoms are smaller than 0.001 eV Å^-1, with an energy tolerance of 10^-7 eV. We do not yet consider magnetic orderings other than ferromagnetic because of the high computational cost. We note that nuclear spins are not explicitly considered during the DFT calculations. The list of pseudopotentials used in this work is given on the GitHub page. The k-point mesh and plane-wave cut-off were converged for each material using the automated procedure described in Ref 28 . The elastic constants are calculated using the finite difference method with six finite symmetrically distinct distortions. The thermoelectric coefficients such as power factor and Seebeck coefficients are obtained with the BoltzTraP code with the Constant Relaxation Time approximation (CRTA) 78 . Optoelectronic properties such as dielectric function and solar-cell efficiency are calculated using linear-optics methods, mainly with OptB88vdW and TBmBJ. We also compared such data with HSE06 and G0W0. The piezoelectric, dielectric and phonon modes at the Γ-point are calculated using Density Functional Perturbation Theory (DFPT). Topological spillage for identifying topologically nontrivial materials is calculated by comparing DFT wave functions with/without SOC 7,26 . 2D exfoliation energies are calculated by comparing bulk and 2D monolayer energy per atom. The 2D heterostructure 31 behavior is predicted using Zur and Anderson methods. Wannier tight-binding Hamiltonians are generated using the Wannier90 code 69 . 2D STM images are predicted using the Tersoff-Hamann method 53 .
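The exfoliation-energy comparison mentioned above amounts to a difference of energies per atom between the monolayer and the bulk. The following sketch assumes that definition, with the result expressed in meV per atom; the function name and the total energies are hypothetical and serve only to show the arithmetic.

```python
def exfoliation_energy_mev(e_monolayer: float, n_monolayer: int,
                           e_bulk: float, n_bulk: int) -> float:
    """Energy per atom of the 2D monolayer minus that of the bulk, in meV/atom.

    Inputs are total energies in eV and the number of atoms in each cell.
    """
    per_atom_2d = e_monolayer / n_monolayer
    per_atom_bulk = e_bulk / n_bulk
    return (per_atom_2d - per_atom_bulk) * 1000.0


# Hypothetical total energies (eV): a 3-atom monolayer and a 6-atom bulk cell.
e_exf = exfoliation_energy_mev(-29.7, 3, -60.0, 6)
print(round(e_exf, 3))
```

Normalizing per atom makes the comparison independent of the cell sizes used for the bulk and monolayer calculations.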
Classical force-field calculations are carried out with the LAMMPS software package 65 . In our structure minimization calculations, we used 10^-10 eV Å^-1 for force convergence and a maximum of 10000 iterations. The geometric structure is minimized by expanding and contracting the simulation box with the 'fix box/relax' command and adjusting atoms until they reach the force convergence criterion. These are commonly used computational set-up parameters. After structure optimization, point vacancy defects are created using Wyckoff-position data. Free surfaces for maximum Miller indices up to 3 are generated. The defect structures were required to be at least 1.5 nm long in the x, y and z directions to avoid spurious self-interactions with the periodic images of the simulation cell. We enforce the surfaces to be at least 2.5 nm thick and with 2.5 nm vacuum in the simulation box. The 2.5 nm vacuum is used to ensure no self-interaction between slabs, and the slab-thickness is used to mimic an experimental surface of a bulk crystal. Using the energies of perfect bulk and surface structures, surface energies for a specific plane are calculated. We should point out that only unreconstructed surfaces without any surface-segregation effects are computed, as our high-throughput approach does not yet allow for taking into account specific, element-dependent reconstructions. Phonon structures are generated mainly using the Phonopy package interface 79 .
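For reference, a surface energy can be obtained from slab and bulk energies with the standard slab expression, in which the excess energy of the slab over the same number of bulk atoms is divided by the area of its two free surfaces. The sketch below assumes this standard expression and a single-species (or stoichiometric) slab; it is not taken from the JARVIS-FF code, and the function name and input values are hypothetical.

```python
EV_PER_A2_TO_J_PER_M2 = 16.02176634  # 1 eV/Angstrom^2 expressed in J/m^2


def surface_energy(e_slab: float, n_slab: int,
                   e_bulk_per_atom: float, area: float) -> float:
    """Surface energy in J/m^2 from the standard slab formula.

    e_slab: total energy of the slab cell (eV)
    n_slab: number of atoms in the slab cell
    e_bulk_per_atom: bulk energy per atom (eV)
    area: in-plane area of the slab cell (Angstrom^2)
    The factor 2 accounts for the two free surfaces of the slab.
    """
    excess = e_slab - n_slab * e_bulk_per_atom
    return excess / (2.0 * area) * EV_PER_A2_TO_J_PER_M2


# Hypothetical numbers: 10-atom slab, 10 Angstrom^2 in-plane area.
gamma = surface_energy(e_slab=-99.0, n_slab=10,
                       e_bulk_per_atom=-10.0, area=10.0)
print(round(gamma, 3))
```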
Machine learning models are mainly trained using Scikit-learn 66

Code Availability
Python-language based codes with examples are available at JARVIS-Tools page: https://github.com/usnistgov/jarvis .