Efficient prediction of temperature-dependent elastic and mechanical properties of 2D materials

An efficient automated toolkit for predicting the mechanical properties of materials can accelerate new materials design and discovery; this process often involves screening large configurational space in high-throughput calculations. Herein, we present the ElasTool toolkit for these applications. In particular, we use the ElasTool to study diversity of 2D materials and heterostructures including their temperature-dependent mechanical properties, and developed a machine learning algorithm for exploring predicted properties.

The main input file for the ElasTool toolkit is the file named elastool.in; the key parameters are given in Table S1 and described below. We have also provided a sample elastool.in file for a typical 2D material in Input 1. The main input parameters in a typical elastool.in are described as follow.
(a) run mode controls the type or level of calculations we want to perform. run mode = 1 is for automatic run, i.e., the calculations starts from reading the crystal information and finishes by outputting the elastic tensor and related mechanical properties; run mode = 2 should be used if the structure has been optimized at fixed volume or pressure with the subdirectory OPT containing the CONTCAR and the OUTCAR file; and run mode = 3 is a postprocessing option that enables recalculation of the elastic parameters. The subfolder STRESS that contains the various calculations at various strains must exist. (b) dimensional determines the dimension of the crystal system. Currently, ElasTool can perform elastic constant calculations for both 2D and 3D materials. (c) structure f ile supplies the crystal structure information of the material in either the standard VASP POSCAR file or CIF with the extensions ".vasp" and ".cif", respectively. (d) method stress statistics sets either zerotemperature (denoted as static) or finitetemperature (denoted with dynamic) calculations. (e) strains matrix enables choosing the method to be used for computing the stresses. The highefficiency strain-matrix sets (OHESS) is our most efficient method. (f) strains list is used to specify the strain values for the calculations, e.g., for a static calculations, one can use -0.06 -0.03 0.03 0.06. For finite-temperature calculations, because of the computational cost, the strain list can be -0.03 0.03. (g) repeat num is used to specify the size of the supercell, e.g., 3 × 3 × 1. This flag is only used for finitetemperature calculations. It is a dummy variable for zero temperature calculations. (h) num last samples is used to specify the the number of last molecular dynamics (MD) steps to average thermal stresses. A good number is 500 for a 1000 MD steps. This flag is dormant for zero temperature calculations. (i) parallel submit command is used to specify how the VASP executable is called. This can take serial as well as parallel calls to the VASP code.  An auxiliary input file named INCARs are provided as well. This file supplies all the necessary VASP-specific input parameters and flags need to optimize the structure and compute the stresses with either density functional theory for zero-temperature (Input 2) and ab initio molecular dynamics for finite-temperature calculations (Input 3). The given INCARs are for the calculation of a typical 2D material with two atoms per unit cell, e.g., MoS 2 . The flowchart of ElasTool package is presented in Fig.S1. As explained above, the main input file for the ElasTool toolkit is supplied through the elastool.in file. Crystal structure information is read by ElasTool either in the standard VASP format or in the Crystal Information Format (CIF). The crystal structure is then optimized at fixed pressure or volume using VASP as the calculator 1 . After the structural optimization, based on the strain values supplied by the user in the input file, ElasTool applies different deformation matrices based on the symmetry and further optimizes all the atomic positions based at each specified strain value. If it is a zero-temperature calculations, ElasTool computes all the stress tensors corresponding to each deformation matrix. If it is finite-temperature elastic tensor calculations, ElasTool averages the thermal stresses based on the number of last MD steps specified by the user. To obtain the elastic tensor, ElasTool fits first-order function to the collected stress-strain data. Further data visualization of the mechanical properties can be performed by supplying the computed elastic tensor to codes such as EIAM, 2 ELATE, 3 , etc.
In Tables S3 and S4, we present the elastic and mechanical properties for several 2D materials and their heterostructures computed at zero-temperature and a temperature of 300 K, respectively. The database will be FIG. S1. The flowchart of ElasTool for computing the elastic and mechanical properties of materials. By using external opensource codes such as the EIAM, 2 ELATE, 3 etc., one can further visualize the mechanical properties of materials.

MACHINE LEARNING ALGORITHM
We present the calculated elastic and mechanical properties of several 2D materials and heterostructures at zero-temperature (Table S3) and a temperature of 300 K (Table S4). Data for 600 K are also computed and available in the manuscript GitHub website. Our database currently contains four hundred and eight (408) 2D materials and six thousand six hundred and fifty four (6654) 2D-based heterostructures.
To enable us to gain a deeper understanding of the correlations between the computed mechanical properties and develop a model framework for exploring and exploiting the predicted properties, we employ machine learning (ML) models to predict the lattice constant as the target. Several multilinear regression models were explored, including the boosting models, XGBoost and LightGBM. To make the developed ML model tractable, we focus on those materials with equal lattice constants for a and b. To develop the ML model, we chose the lattice constant (a) as the target. The initial ML-model is developed with T emp (i.e., temperature), SG (i.e., space group), C 11 , C 12 , and c (i.e., vacuum size) as the features. Unlike several other calculations where the vacuum size is normally fixed, we self-consistently compute the vacuum size for each material. Then, we employ their pair-plot correlations (see Figures S2) to determine their level of correlation.
Before training and testing our ML model, we set aside 20% each for 2D materials and the heterostructures as unseen data for validation of the developed ML model. In general, an MLR can be written as where Y is the target, X j are the features of the model, and the intercept β o and the coefficients β j are the model parameters. To begin with, we fit, excluding the data set aside as an unseen sample to the MLR models with default parameters. We initially trained the data using standard multilinear regression (MLR) model to enable us to establish a baseline model; this led to an R 2 score of 0.60 and 0.74 for the 2D materials and 2D-based heterostructures, respectively (see the manuscript code in GitHub for details). To avoid any potential overfitting (or underfitting) we employed a cross-validation (CV) test (also known as out-of-sample testing or rotation estimation); we used the ShuffleSplit CV approach, which is particularly more appropriate for our case because it iteratively and randomly samples data instead of partitioning. In general, the CV will lead to the reduction of the accuracy scores, but will give a better description of the out-of-sample performance.
A more appropriate picture of ML-trained models can be obtained by hyperparameter tuning on top of the cross-validation process. The hyperparameter tuning was carried out using the grid search technique to obtain the optimal hyperparameter combinations for our various ML models based on the CV test. For example, for the XGBoost, we have tuned the number of weak learners, max depth, colsample-bytree, and the learning rate. Finally, the optimal hyperparameters were used to obtain our final ML models. Overall, the boosting ML models showed the best performance with training R 2 close to 1.0 and out-of-sample R 2 and CV accuracy approaching 0.90 for both 2D materials and their heterostructures (see Tables S2). Particularly, the XGBoost had the best combination of the accuracy score and error metrics; it has been used in the description of the results presented in the main text. A Python script providing a detailed step-by-step guide and other information on the implementation of the ML models are provided in a GitHub repository at https://github.com/gmp007/2D_ Elastic-Properties and the obtained accuracy scores and error metrics are presented in Table S2 for 2D materials and 2D-based heterostructures.

Feature Importance
Presented in Figure S4 is the distribution of the feature importance in our best ML model -XGBoost for both 2D materials and 2D-based heterostructures. In both cases, all the features play a significant role in the model prediction. However, the level of importance is switched for 2D materials and 2D-based heterostructures. In the 2D materials, our ML model identifies C 11 as the most The performance of a machine learning model in production is an essential step in developing a robust algorithm that is capable of predicting target to within acceptable accuracy. We have performed a rigorous outof-sample analysis to determine the accuracy of the developed ML model when applied to unseen data, i.e., the dataset that was not part of both training and testing samples. As explained in the code documentation, before training our model, we set aside ∼20% of the data for both 2D materials and 2D-based heterostructures for additional model validation as unseen data (see Figure 3 in the main text and the manuscript code in GitHub). For both the 2D materials and the heterostructures, the performance of our ML model is at the same level as that from our cross-validation analysis. Additionally, for the 2D materials, we have obtained from the current literature about 12 materials that meet the same standard as our data, i.e., reliable vacuum size and isotropic crystal lattice (lattice constants a and b are the same) to further validate the model (see Figure S5 and additional details in the manuscript code in GitHub). Again, the level of accuracy remained very high with an R 2 score that is basically the same as our CV score. S3: Calculated lattice constants a(b), mechanical, and elastic properties of 2D materials at zero pressure and temperature. The lattice constants are inÅ, the in-plane stiffness K (i.e., the 2D equivalent of bulk modulus), the shear modulus G, the 2D Young's modulus Y 2D , and the elastic constant tensor Cij are in N/m; ν is the Poisson ratio; and V l and Vt are the longitudinal and shear sound velocity in km/s, respectively. The superscript ( † ) indicated auxetic materials, i.e., negative Poisson ratio, which implies anti-rubber behavior. Note for isotropic 2D materials, C66 = (Cii − Cij)/2 , where i = 1, 2 and has been omitted in the table. Superscript ♣ denotes an unstable structure.   In both 2D and heterostructures, the accuracy (R 2 ) score is basically the same. Note the R 2 value in the training data is practically unity.
Continued on next page    S5. The relation between some lattice constants from literature (see the code for details) and those predicted with our machine learning model for 2D materials. This further validates the high accuracy of our ML model when applied to unknown (independent) data.                  TABLE S4: Calculated mechanical and elastic properties of 2D materials at zero pressure and temperature of 300 K. The in-plane stiffness K (i.e., the 2D equivalent of bulk modulus), the shear modulus G, the 2D Young's modulus Y 2D , and the elastic constant tensor Cij are in N/m; ν is the Poisson ratio; and V l and Vt are the longitudinal and shear sound velocity in km/s, respectively. The superscript ( † ) indicated auxetic materials, i.e., negative Poisson ratio, which implies anti-rubber behavior.Note for isotropic 2D materials, C66 = (Cii − Cij)/2 , where i = 1, 2 and has been omitted in the table. Superscript ♣ denotes an unstable structure.