Evaluation and comparison of classical interatomic potentials through a user-friendly interactive web-interface

Classical empirical potentials/force-fields (FF) provide atomistic insights into material phenomena through molecular dynamics and Monte Carlo simulations. Despite their wide applicability, a systematic evaluation of materials properties using such potentials and, especially, an easy-to-use user-interface for their comparison is still lacking. To address this deficiency, we computed energetics and elastic properties of variety of materials such as metals and ceramics using a wide range of empirical potentials and compared them to density functional theory (DFT) as well as to experimental data, where available. The database currently consists of 3248 entries including energetics and elastic property calculations, and it is still increasing. We also include computational tools for convex-hull plots for DFT and FF calculations. The data covers 1471 materials and 116 force-fields. In addition, both the complete database and the software coding used in the process have been released for public use online (presently at http://www.ctcms.nist.gov/∼knc6/periodic.html) in a user-friendly way designed to enable further material design and discovery.

and output files for all the calculations as well, for the LAMMPS software, to further facilitate the user in running similar benchmarking tests.
The paper is organized as follows: first the methods for developing this project are discussed, followed by data records and the important key elements of the data. After that, technical validation of the data is discussed using examples of single element, binary and ternary systems. Finally, usage notes have been discussed.

Methods
The methodology behind the current work consisted of several steps: 1. Obtaining structural data: We started by downloading all the available potentials from the NIST interatomic potential repository (IPR) and from LAMMPS itself (15May2015 version). At present, there are 92 EAM, 9 Tersoff, 6 COMB, 5 ReaxFF, 2 AIREBO, 1 MEAM and 1 EIM potentials in the database. For each element having at least a potential, we downloaded all the corresponding crystal structures from the Materials Project (MP) database 30 . We also downloaded all the energetics and mechanical properties data from MP and stored them in a separate database for a later comparison with the classical results (LAMMPS calculations at T = 0 K). More specifically, the download of the structures from MP was done using the Restful Application Program Interface (MP REST-API) 25,31 , while the python library URLLIB was used for downloading the potentials from IPR. A 'from_ctcms. py' script was used in this process and it is distributed in the github page of this project.
2. Setting High-throughput framework: The high-throughput setting of LAMMPS jobs was done using the MPinterfaces code 32 . Originally the part of the MPinterfaces code designed to work with LAMMPS only prepared scripts to calculate the system energy, but later we added the functionality of accepting generic scripts, such as those to compute elastic constants. Using such a high-throughput framework we calculated the 21 distinct elastic constants 33 using the script provided in the LAMMPS distribution version 15May2015. It is to be noted that the above script is one of the many other methods to calculate elastic constants and comparison and integration of other scripts with the one used here is subject to future work. Such a script allows to computation of elastic constant and energetics for any available empirical potential type and any crystal structure 34,35 . In our runs we used 10 − 06 as strain, 10 − 10 eV/Å for force convergence during the minimization to optimize the structure and 1000 maximum iteration for structure optimization. These are generalized computational set-up parameters, and the energetics and elastic constant data may or may not depend on them. We tested strain parameters for a range of values (10 − 04 , 10 − 06 and 10 − 08 ) but obviously evaluating such set of parameters for all the calculations was too extensive a work and was not carried out here. The data for different strains (10 − 04 , 10 − 06 and 10 − 08 ) for EAM potentials is given in the Supplementary Information as an example of testing of the script. For each structure and force-field, the code automatically generated the input structure file and input controlling parameters for energetics and elastic constant calculations. The relaxed structure was also stored along with the above files for later use such as for performing defect, phonon or other similar calculations. In some cases, (about 20 out of 3248) the jobs stopped running because of computer-cluster operation problems, so we manually reran them. Our current set up/scripts do not allow for checking of errors and resubmitting calculations automatically. Such an extension of the software is currently under work and will be part of a future project extension. Using the energetics information for each material, convex hull plot were drawn using the package provided in pymatgen 31 (script named 'PDPlotter.py'). All the input and output data for the calculations were saved in zipped and JSON format files for data accessibility and reproducibility. The JSON files are stored in a MongoDB database, which we found to be more advantageous over Structured Query Language (SQL) databases 36 . In MongoDB, the data is the schema and there is symmetry between the way data goes into the database and the way it comes out. A 'ctcms_alloy.py' script was used for all the 'alloy' class of force-fields. Similar ones were used for other force-fields. The scripts are provided on the github page.
system, while clicking on 'forcefield' tab downloads the potential used in the calculation. Each zipped directory obtained by clicking on the 'Calcs' tab consists of following files: 'data' (the input structure coordinates and dimensions), 'log.lammps' (the file containing LAMMPS output), 'restart.equil' (the input file for restarting LAMMPS calculation if needed), 'in.elastic' (the input file for LAMMPS). The file 'in.elastic' is obtained by combining the three sub-files 'init.mod' (initializing calculation setup), 'potential.mod' (setting up of potential format and file), and 'displace.mod' (for taking into account all the displacements). To combine above-mentioned scripts 'inelast.mod' is used. These files were obtained from the LAMMPS source-code distribution. The 'init.mod' and the 'potential.mod' files were modified for specific structures and force-fields by the MPInterfaces code automatically while  All the data we computed is also available in JavaScript Object Notation (JSON) 37 format, to facilitate a command interface for downloading the data instead of just a web-based interface.
A flow-chart for the process we followed to compute and display all our data is given in Fig. 2.

Code availability
The code is publicly available at the github link (https://github.com/JARVIS-Unifies/JARVIS-FF) and it is written in the python language. The code can be schematically divided in three major parts: 1) the assimilation of structures from the MP website 25 , and force-fields from the IPR website 3 , 2) the configuring of the high throughput calculation framework, and 3) the post-processing of the data.
Step 2) heavily uses the MPinterfaces code 25 to convert the DFT-format structure files into LAMMPS format, and to automatically prepare the input files for LAMMPS based on the specific structures at hand 38 . The Post-processing (step 3) is performed using normal file viewing or querying a JSON file. The work is under General Public License (GPL). The convex hull plot was obtained with the pymatgen distribution which uses composition and energy of the system. The script for calculating the convex hull using materials project is given at the github link. The reference DFT bulk modulus data was obtained from Materials Project elastic calculation data 33 . The data is available in JSON format at the github link. A back-up of the data is also kept at NIST's materialsdata webpage https://materialsdata.nist.gov/dspace/ xmlui/handle/11256/702. We compared the bulk modulus from LAMMPS to DFT wherever applicable. The script 'get_entries_from_json.py' can be used to plot the convex hull data. The input to the script is mainly the composition, e.g., Cu-H-O and the force-field, e.g., 'ffield.CuOCH.comb3'.

Data Records
All data computed in this work can be found at the github link https://github.com/JARVIS-Unifies/ JARVIS-FF. The complete data set can also be downloaded in a JSON from Dryad Digital Repository (Data Citation 1). Key variables for the JSON file are shown in Table 1. They include energy per atom, elastic constants, associated structures and force-fields. We included energy/atom, instead of cohesive energy, to match the quantities provided in MP. For many potentials, however, the two energies coincide because of original potential fitting choices. Energy per atom values can be post-processed to calculate the cohesive energy and heat of formation of the system with respect to their corresponding reference state. The data is available in three forms: the JSON file, the html page (http://www.ctcms.nist.gov/~knc6/ periodic.html) and the REST API services (available soon). The script for plotting the convex hull and fetching energetic/elastic constant information is provided on the github webpage mentioned above. The Pymatgen code is used for plotting the data. The convex hull plot uses the 'composition' and 'totenergy' parameter of the above JSON file for a particular system such as Ni-Al. The convex-hull 39 plot hence obtained can be compared with data obtained from MP. More on this will be discussed in the Technical Evaluation section.

Technical Validation
In this section, the reproducibility of the data in our database is discussed. We also demonstrate how to use these data to determine the range of applicability of the investigated force-field/potentials. All the data in the database is intended to be reproduced very easily, as they were obtained by running the publicly available code LAMMPS and all the used input files are available for download from the zipped folders stored on the web for each calculation. Each folder contains the input controlling parameters, the structure file and the corresponding force-field. Combining the 3 files along with 'displace.mod' and 'inelast.mod' (provided on the github page and taken form LAMMPS source distribution) produces the input file LAMMPS needs to run the exact same calculation. These were obtained from LAMMPS software and are available at the github link https://github.com/JARVIS-Unifies/JARVIS-FF. A database of force-field evaluation with different version of LAMMPS as well as force-field can enable version control of the entire data.
To validate the FF applicability, the elastic constants and cohesive energy data of each structure are compared with those obtained using DFT, if available. These DFT data were taken from the Materials Project (MP) and were reported to be within ±15% of the experiments 33 . We found that given the computational set up parameters we used, many FFs produce elastic constants that differ significantly from their experimental values. This was not unexpected, as many FFs are not designed to reproduce mechanical deformations. Before giving detailed analysis of force-field data, a brief overview of FF types can be useful.
Embedded atom method (EAM) is one of most accurate predicting FFs for metals and alloys with relatively inexpensive calculations. In EAM potentials basic mechanical properties are always included in the fitting, making these potentials well suited for simulating mechanical deformations. As an example, Al EAM potential (Mishin-NiAl-2009) 40 produces bulk modulus, C 11 , and cohesive energy of 78.9 GPa, 113.8 GPa and −3.36 eV/atom respectively. The bulk modulus, C 11 and cohesive energy (E c ) from DFT are 83.28 GPa, 103.93 GPa and −3.78 eV/atom, while the experimental values are 76 GPa, 113 GPa, and −3.360 eV/atom respectively. Hence, the classical mechanical properties data here remains within 5% of the experimental or DFT values. It is to be noted that DFT-single atom energy is typically strongly negative, but it is close to zero for most of the force-fields. Modified embedded atom method (MEAM) potential 41 was developed to include directional bond characteristics to EAM potential, therefore allowing description of certain non-metallic systems. MEAM potentials such for SiC (P6 3 mc) has C 11 , Voigt-bulk modulus (B v ) and cohesive energy (E c ) as 211.  While all the above-mentioned force-fields do not contain charge information, ReaxFF force-field contains dynamic charge based on immediate environment. It is known to perform well for molecules as well as periodic structures. Fe Im3m in ReaxFF 44,45 has B v , C 11 , as 173.1 GPa, 282.8 GPa and. while the DFT values are 182.46 GPa, 247.06 GPa respectively. The ReaxFF cohesive energy (E c ) for Fe is −4.44 eV/ atom, and the corresponding experimental value is − 4.26 eV. In this case, MP only provides the energy per atom, not the cohesive energy, so a direct comparison is not possible. However other DFT evaluation of the iron cohesive energy validate the ReaxFF result, as in the case of the DFT work by Philipsen et al. 46 where a value of −4.36 eV is reported for E c . Similar to ReaxFF, COMB potential is dynamic charge and bond-order based potential. It is known to work well for metallic, ionic and polymeric systems. Cuprous oxide in COMB 47,48 has B v , C 11 and cohesive energy (E c ) as 102 GPa, 114.1 GPa and −3.65 eV/atom while the DFT values are 111.5 GPa,124.16 GPa and −4.79 eV/atom respectively. Again, it is to be noted these values can be sensitive to the initial calculation set-up and the target property/data for which the forcefields were fit. Now, we give some specific examples of FF data and the analysis of the force-field data as follows:

1) Single element: Al
Clicking on Al (or entering Al in search box) produces a list of results, one row for each of the 25 different force-fields available for Al. As the Material Project only has data for Al in face-centre cubic (FCC) structure (mp-134, space group Fm3m), we also only list results for this one structure for different FFs. The energy per atom for all the potentials is around −3.36 eV/atom, with the exception of those of Al_zhou.eam 49 (−3.58 eV/atom), Farkas_Nb-Ti-Al 50 (−3.21 eV/atom), Al1.eam.fs (−3.41 eV/atom), Al90Sm10_Mendelev (−3.91 eV/atom), Al_wkg_MSME (−2.65 eV/atom). The names of the force fields are given here as they appear on the JARVIS-FF web-page, where the complete reference for each potential can be found. While it is beyond the scope of this work to analyse the reasons behind these outliers, we want to point out that energy differences are more important than the absolute energy/atom value, and, therefore, the variability of the above results may not have significant repercussions in the modelling of physical phenomena. However, the differences found in the elastic constant values among the various potential will have consequences when modelling mechanical deformations. Therefore, any user interested in modelling such phenomena should pay careful attention to the choice of force field. For instance, the C 12 of Al is 61.6 GPa, while it is 123.6 GPa for Farkas_Nb-Ti-Al potentials using the abovementioned calculation set-up parameters. With respect to bulk modulus values, all potentials for Al except for the Farkas-Al 50 are within 2% of the experimental value (which is 76 GPa). This is to be noted that this potential was not mainly developed primarily for mechanical properties, so our web-interface can be used as an easy-look up table. Lastly, as there is only one structure for Al, the energy above hull is zero i.e., the structure in on the convex hull.

2) Binary system: Ni-Al
As 'Ni-Al' is entered in search box or, alternatively, 'Click Al +click Ni' is selected, the webpage returns 49 rows of results because the Materials Projects has seven structures for the Ni-Al binary alloy and we found seven potentials for such compounds. Out of these seven, six compounds were stable or on convex hull ('Energy_hull (DFT)' (energy above hull from materials project's DFT data) was 0.0 on webpage) The output is sorted based on formulae of structures to facilitate comparison of properties. As an example, results for Ni 3 Al, one of the Ni-Al primary phases (energy above hull 0.0), are found under the calculation id 'Calc-178'. Clicking on this box downloads all the input and output files for calculation. Clicking on 'mp-2593' (the corresponding MP id) redirects to the MP data where the DFT energy/atom (−5.7024 eV) and elastic constant matrix can be found. This way the comparison between DFT and forcefield data is effortless and immediate. In addition, the structure and Inorganic Crystal Structure Database (ICSD) data can be visualized in the materials-project webpage as well. The ICSD 51 tag is an important link that can later be used to compare to other data resources such as 'The Open Quantum Materials Database' (OQMD) 52 and 'Automatic Flow for Materials Discovery' (AFLOW) 53 , which is planned for future work. The ICSD tags are available at materials project webpages of the structures. It is to be noted that the DFT energy/atom was much lower than the potential data (−4.63 eV/atom), however the experimental value for this cohesive energy in − 4.620 eV/atom 40 , that nicely validates the result. Also, these potentials are generally fit to experimental data. Similar searches can be made for ternary system as well. Next, convex-hull plot is used to know the phase stability of materials using their composition and energetics information. All the calculations done here have the information to generate a convex hull plot for DFT as well as force-fields. The data can be accessed using the script given in the github page. An example for Ni-Al system is given in Fig. 3. Here, the main low energy phases are similar for force-fields compared to DFT but high-energy structures (which are difficult to fit in force-fields) are not on convexhull plot for force-fields. Again, this is due to the fact that the potential was not fitted for the energetics of the high energy phases. To reproduce the DFT convex hull plot, the potential would need to be parametrized. However, this was not the goal when the potential was made.

3) Ternary system (Fe-O-H, Cu-O-H):
Complex force-fields such as ReaxFF 44,45 and COMB potentials 47,48 are used for investigating heterogeneous systems. In these cases, it is generally common that the DFT convex hull plot and    Here, only stable structures are shown for clarity. We notice that ReaxFF shows FeHO2, Fe 21 HO 32 , FeO 2 are stable in force-field but not in DFT. Similarly, Cu (HO)2 and CuH are stable in COMB but not in DFT. This predicts that an MD simulation may predict different system behaviour than DFT-MD.
Next, the FF and DFT Voigt-bulk modulus data were compared as shown in Fig. 6. The x-axis shows the bulk modulus for materials from DFT calculation obtained through materials project, while the y-axis shows the bulk modulus for corresponding materials using the force-field calculations. It is to be noted that potentials are generally fit to lattice constants, elastic constants of particular target materials and may not produce reasonable results for structures other than the ones they are fit to. For example, if a potential . The x-axis shows the bulk modulus for materials (for which FFs were available to us) from DFT calculation obtained through materials project, while the y-axis shows the bulk modulus for corresponding materials using the force-field calculations. The figure shows bulk moduli that correspond to structures that are common to JARVIS-FF and the Materials-project database. Because of this reason, out of 3248 entries in JARVIS-FF, only 408 entries are plotted. is fit to simulate Al-Cu binary system only, then it might not give very accurate description for elemental Al and Cu. In the Fig. 6 the slope of the curve is close to one for about 70% of the calculations (based on available DFT data) indicating many of the potentials were probably fitted to elastic constants.

4) Data analytics:
Data analytics tools provide the opportunity to determine correlations in data sets that can illuminate the accuracy of predictions and the flexibility of underlying models. To investigate if there are any trends in the elastic properties predicted by the various potentials, we perform a principal-component analysis on the relative errors of the predicted elastic constants compared to experimental data 7 for the four FCC elements with the largest number of available potentials, i.e., Al, Cu, Ni, and Ag. In our analysis, we removed outliers with errors larger than 50% in c 11 and c 12 and larger than 100% in c 44 , as these indicate that the potentials were likely not optimized for the elastic properties. Figure 7 shows the relative errors of the elastic coefficients of the four elements and illustrates the larger relative errors for c 44 , compared to c 11 and c 12 . Figure 8a,b show the results of the principal component analysis, Fig. 8a shows the variance that is predicted by each of the three principal component eigenvectors and Fig. 8b the data projected onto the first two principal component eigenvectors. The results illustrate that most of the variance is explained by the first eigenvector (0.33, 0.31, 0.89) where the three components correspond to c 11 , c 12 , and c 44 . Thus, the principal component analysis for this subset of empirical potentials predicts a relative error in c 44 of about 2.5 times the sum of the errors in c 11 and c 12 . This example illustrates the usefulness of comprehensive datasets for predictions of fundamental materials parameters by empirical potentials and other models to aid in the selection and the development of empirical potentials.
The above results emphasize the need for judgement in the use of empirical potentials and motivates the design of better empirical potentials. Also, it is to be noted that the DFT databases have been recently made available, hence researchers have now access to data in order to compare their testing and development of potentials with the easy-accessible DFT results 54 .

Usage Notes
The database presented here represents the to-date largest collection of consistently calculated properties of materials using interatomic potentials. We anticipate that this dataset, and the methods provided for accessing it, will provide a useful tool in fundamental and application-related studies of materials especially metallic and ionic systems for material design. Based on the list of potentials, user will be able to choose potential for their particular applications. Future applications will include the implementation of similar evaluation technique to polymer/bio-materials and on the fly calculation of properties on web by uploading or specifying particular structure. Data mining, data analytics, and machine learning tools then can be added to guide screening of materials.