Vickers hardness prediction from machine learning methods

The search for new superhard materials is of great interest for extreme industrial applications. However, the theoretical prediction of hardness is still a challenge for the scientific community, given the difficulty of modeling plastic behavior of solids. Different hardness models have been proposed over the years. Still, they are either too complicated to use, inaccurate when extrapolating to a wide variety of solids or require coding knowledge. In this investigation, we built a successful machine learning model that implements Gradient Boosting Regressor (GBR) to predict hardness and uses the mechanical properties of a solid (bulk modulus, shear modulus, Young’s modulus, and Poisson’s ratio) as input variables. The model was trained with an experimental Vickers hardness database of 143 materials, assuring various kinds of compounds. The input properties were calculated from the theoretical elastic tensor. The Materials Project’s database was explored to search for new superhard materials, and our results are in good agreement with the experimental data available. Other alternative models to compute hardness from mechanical properties are also discussed in this work. Our results are available in a free-access easy to use online application to be further used in future studies of new materials at www.hardnesscalculator.com.

Hardness is a measure of the resistance of a material to localized plastic deformation. Over the years, several hardness-testing techniques (like Brinell, Vickers, Knoop and Rockwell) have been developed, and each one has its own scale. However, the basic principle to measure hardness is to force an indenter into the surface to be tested under controlled load conditions. The larger the indentation, the softer the material. The depth and size of the indentation are then converted into a hardness number. In this work we will focus on Vickers hardness, which is one of the most popular techniques given that it is experimentally easy to calculate and can be used for all materials regardless of hardness. Vickers hardness test uses a very small diamond indenter with a pyramidal geometry that has an angle of 136 • between the plane faces of the indenter tip. The Vickers hardness measurement is determined by the following ratio: where F is the applied force (kgf) and d is the average length of the diagonal left by the indenter (mm).
The search for new materials with superior hardness has generated considerable interest in the scientific community for many years [1][2][3] . These materials are needed in extreme industrial applications, such as hard cutting tools, abrasion, and wear-resistant coatings. Traditionally, diamond, titanium nitride, and cubic boron nitride (c-BN) are the preferred materials for these applications. However, they have limitations due to the difference in the chemical bonding character and chemical reactivity. For example, diamond reacts with iron, and the synthesis process of the first two materials requires high-pressure and high-temperature conditions making them costly 4 .
First principle methods have demonstrated to be viable for predicting many physical properties of materials. Among many existing techniques, density functional theory (DFT) stands out for its practical and helpful approach to solving condensed matter systems. DFT has become a primary tool for calculating crystal structures and elastic properties of a wide range of materials with remarkable success when comparing the results to experiment 5 . However, predicting hardness from ab initio calculations is not a trivial task. Hardness is a measure of the resistance of a solid to plastic deformation 6 . Despite its success in calculating elastic properties, DFT cannot predict a solid's plastic behavior directly.
In recent years, correlations between the elastic properties and the plastic behavior of materials have been established to evaluate the hardness from a theoretical approach 4,7,8 . A hard material will exhibit a slight indentation. The observed shape can be correlated to the elastic response a hard material should have: be incompressible www.nature.com/scientificreports/ (high bulk modulus), not deform in a direction different from the applied load (high shear modulus), and not distort plastically (strong directional bonds that prevent the creation and motion of dislocations) 4 . The Poisson's ratio relates the bulk modulus and the shear modulus. A high shear modulus requires a high bulk modulus and a small Poisson's ratio. A low value for the Poisson's ratio results from directional bonds in the crystal 4,8 . For example, the Poisson's ratio for diamond is 0.07, 0.1 for a typical covalent material, and 0.3 for an ionic one 8 . On the other hand, the resistance of a material to plastic deformation depends on the chemical environment of the crystal; a material with short covalent bonds will minimize the activation and mobility of dislocations enhancing the hardness. Thus, covalent materials are generally harder than ionic or metallic 4 . Given the complexity of the problem, there is no universal method that predicts hardness accurately from previously known properties of a material. With these ideas in mind, several semi-empirical relations between hardness and elastic properties of materials have been proposed over the years 7,[9][10][11][12] . Usually, these correlations reasonably agree with the experiment for a specific set of materials, but they would not hold when extrapolating to a wide variety of solids.
In this investigation, we proposed various models to compute hardness using the mechanical properties of a solid. The mechanical properties (bulk modulus, shear modulus, Young's modulus, and Poisson's ratio) were obtained from the theoretical elastic tensor. As shown in Fig. 1, we used two approaches: classic and machine learning (ML).
In the classic approach we studied the six different macroscopic relations for hardness nicely presented by Ivanovskii in Ref. 13 , listed in Eqs. (2)-(7), with a database of more than 140 materials. These relations depend solely on mechanical properties. We calculated the Vickers hardness ( H v ) using the six relations and compared the results with the experiment to evaluate which method is more suitable for each material kind. We observed the correlation between the six different hardness relations and some physical properties of solids (crystal system, bandgap, and density). From this approach, we developed The Classic Calculator, a selection model of the best relation to compute hardness based on simple properties of a solid.
Given the exponential growth in computing power and the development of highly efficient algorithms, machine learning is used today to solve numerous kinds of problems 14 . In the second part of this study, we built a successful machine learning regression model (GBR) to predict the value of hardness directly using the mechanical properties of a solid as input variables. This model demonstrated the highest predicting power among all proposed models in this work. However, given that many scientists use machine learning with hesitation, we also created a classification ML model (GBC) that predicts the best relation to compute hardness with the same data and input variables. This method allows users to select the best relation to compute hardness using the robustness of modern ML algorithms without losing track of the physics behind the calculation. Both ML models, GBR and GBC, are referred to as The Machine Learning Calculator in this work.
Both, classic and ML schemes, are discussed, compared to each other, and used successfully to predict new hard and superhard materials. In general, The Machine Learning Calculator has proven to be more accurate than The Classic Calculator. However, both schemes have demonstrated superior predicting power. The most accurate model was proven to be the machine learning GBR, followed by GBC, and the classic model that uses crystal system and density simultaneously.
This investigation aims to provide valuable tools for the theoretical prediction of hardness. The Hardness Calculator, which includes classic and ML predictors, is presented in a free access online application for users to discriminate between the different available results. We believe The Hardness Calculator stands out among other methods proposed in the past because: (1) it can be used for a wide variety of solids, (2) it's easy to use, (3) it is available for everyone as a free-access website that does not require any coding knowledge, (4) and it provides different hardness models simultaneously. Even though GBR is the recommended model in this work, users have the option to consider GBC or any of the classic calculators instead.

Methods
For most of the database, the elastic tensor was extracted from the Materials Project's database 15 , while for a few materials (18), it was calculated using first principles. The latter materials were added to the database to ensure a wide variety of materials for the study. The subsequent elastic properties: bulk modulus (B), shear modulus (G), Young's modulus (Y), and Poisson's ratio ( ν ) were calculated using the MechElastic package 16 . The detailed www.nature.com/scientificreports/ database used in this investigation, including the experimental hardness and the mechanical properties, is presented in the supplemental information.
The first-principles calculations were performed within the framework of DFT 17 . The exchange and correlation effects were treated using the Generalized Gradient Approximation (GGA) with the parameterization of Perdew-Burke-Ernzerhof (PBE) 18 . The valence electrons wave functions were described by the projector augmented-wave method (PAW) 19 . The cutoff energy and the gamma-centered k-point mesh 20 were converged in each case to assure a maximum error of 1 meV/atom. The self-consistent electronic loop was set to a maximum total energy difference of 10 −6 eV. The calculations were performed using the Vienna Ab initio Simulation Package (VASP) 21-24 . Semi-empirical relations for hardness. For each material, the Vickers hardness was estimated using the following six different semi-empirical relations: Each result was compared to the experimental value in order to determine the absolute error in each calculation. The absolute error was defined as the absolute value of the difference between the experimental ( H exp ) and the predicted ( H pred ) Vickers hardness as shown in the following equation. GPa and H 5 = 93.0 GPa. As observed, some relations work better than others. The absolute error (Eq. 8) reveals the accuracy of each relation when predicting hardness of a given material. For the case of diamond, the best relation to estimate hardness is H 5 because it exhibits the lowest absolute error (3.0 GPa).
To determine which hardness calculation method is more suitable for each type of material, they were classified by crystal system, electronic bandgap ( E ), and density ( ρ ). According to the bandgap, materials were defined as insulators ( �E > 2eV ), semiconductors ( �E < 2eV ) and metals ( E = 0 ). Additionally, the compounds were arranged by low ( ρ < 4 g/cm 3 ), medium (4 g/cm 3 ≤ ρ ≤ 9 g/cm 3 ) and high density ( ρ > 9 g/ cm 3 ). Each of these models was analyzed and compared to each other to establish which is more effective in minimizing the mean absolute error (MAE) in the hardness calculation. The MAE is defined in Equation 9, where N is the number of samples.
Further correlations, including two variables simultaneously (Crystal System + Bandgap, Crystal System + Density, and Bandgap + Density), were also studied.

Machine learning.
To find a methodology that predicts the hardness based on different elastic properties, we have used diverse supervised learners, where hardness is the expected output, and the user needs to provide the mechanical properties of a solid (B, G, Y, ν ) as input variables. There are two types of supervised learning techniques: classification and regression. In this study, the classification algorithms target the best hardness calculation relation ( H 1a , H 1b , H 2 , H 3 , H 4 , or H 5 ), while the regression algorithms aim to predict the value of hardness directly. Therefore, to generate and compare different algorithms, the created experimental database of 143 materials was split into train and test sets, where the train set has 80% of the data, and the test set the remaining 20%. This approach is essential to have an out-of-sample accuracy.
Classification. Supervised learning classification algorithms such as K-Nearest Neighbors (KNN), Decision Trees (DT), Logistic Regression (LR), Support Vector Machines (SVM), Random Forest (RF), AdaBoost (ADA), www.nature.com/scientificreports/ and Gradient Boosting Classifier (GBC) were used to generate algorithms capable of predicting the best hardness calculation relation given the mechanical properties of a material (B, G, Y, and ν ) as an input 25 . KNN finds the k closest training examples (k is the number of nearest neighbors) and assigns the new object with the most common class among its k nearest neighbors. DT is an algorithm that splits the data according to certain parameters, in this case the mechanical properties. LR works with the probability of an object belonging to a certain class. SVM is an algorithm that classifies cases by finding a separator or a boundary. RF is built by a multitude of decision trees, and the output is the class selected by most trees. ADA is built by a multitude of weak learners each one with a different weight, and the output is the class that gets the most points in the weighted sum. Gradient boosting (GBC for classification tasks) is an ensemble of decision trees that are built subsequently based on the errors of the previous tree. All trees have equal saying in the final output.
The KNN algorithm was optimized for a k-parameter of three neighbors. The DT classifier was defined for a maximum tree depth of three. The inverse of regularization strength for LR was set to 0.01, and the solver liblinear was used given it is the best for small datasets. The SVM was trained with the Radial Basis Function kernel. The RF was built with a maximum tree depth of two and a random seed of zero. The ADA classifier was set with a maximum number of estimators equal to 100 and a zero random seed. The GBC was parameterized with 100 estimators, a maximum depth of the individual regression estimators of 1, a learning rate of 0.6, and a random seed of zero. The rest of the parameters have default values in all cases.
The different classifiers were compared using out-of-sample accuracy and Jaccard index. These metrics are defined as follows: where N is again the number of samples, ŷ are the predicted labels, and y are the actual labels. The MAE was also computed in each case.
Regression. Gradient boosting can be used in regression and classification tasks. To predict the hardness directly, the Gradient Boosting Regressor (GBR) was implemented 25 . GBR is a supervised learning regression technique that creates a prediction model with the same input variables used before (B, G, Y, ν ). The algorithm was only parameterized with a random seed of zero. All the other parameters have default values. The MAE was also computed to measure the accuracy of the model.

Results and discussion
Comparing different relations of hardness. We started by defining the best hardness calculation relation based on the crystal system. As observed in Table 1, for the 143 structures considered in this study, relation H 1a is the most accurate, with an MAE of 3.3 GPa. This relation is also the preferred one for cubic structures. Nevertheless, some crystal systems work better with other approximations. The hexagonal, monoclinic, and tetragonal groups prefer the H 4 relation, while the orthorhombic and trigonal types minimize their MAE by using H 2 . The triclinic group works better with the H 5 relation. Calculating the hardness with the selected relation for each crystal type reduces the general MAE from 3.3 to 3.0 GPa.
As observed, systems with all lattice parameters equal to each other (cubic and trigonal) work successfully with relations of hardness that depend solely on the shear modulus ( H 1a and H 2 respectively). On the other hand, systems with all angles equal to 90 • (cubic, orthorhombic and tetragonal) do not display such a clear trend. While cubic and orthorhombic systems also work better with the shear modulus ( H 1a and H 2 ), tetragonal systems prefer a combination of the bulk modulus and Poisson's ratio ( H 4 ), and the shear modulus appears as the second-best option ( H 1a ). Nevertheless, the latter results suggest that, in general, for high-symmetry systems, www.nature.com/scientificreports/ the shear modulus is a good descriptor of hardness. Perhaps, it is simple to capture the overall rigidity of a solid in a single parameter if the system is highly-symmetric.
On the other hand, systems with two of their lattice parameters equal to each other and well-defined angles (hexagonal and tetragonal) exhibit an inclination toward a combination of the bulk modulus and Poisson's ratio ( H 4 ). Notably, having an expression that depends simultaneously on these two parameters provides significant flexibility in describing the rigidity of a solid in these cases.
Finally, low-symmetry systems, with all lattice parameters different from each other and at least one angle different from 90 • (monoclinic and triclinic), exhibit a preference for the combination of the bulk modulus with another property. Monoclinic structures work better with the combination of bulk modulus and Poisson's ratio ( H 4 ), while triclinic structures prefer the combination of bulk and shear modulus ( H 5 ).
Similar to the previous discussion, additional analyses were performed but now considering different electronic bandgaps (insulators, semiconductors, and metals) and density (low, medium, and high) as criteria to distinguish the elastic response. The general MAE was 3.0 GPa and 2.6 GPa, respectively. Table 2 displays the details for the bandgap classification. The best approach for insulators is H 2 , while for semiconductors is H 1a , and for metals H 4 . These results indicate that for insulators and semiconductors, the shear modulus is a better descriptor of hardness, while metallic systems work better with a combination of bulk modulus and Poisson's ratio. The latter result suggests that the shear modulus can capture a solid's overall rigidity when it is composed of strong directional atomic bonds. Table 3 presents the details for the density analysis. Materials with a low density behave better with the H 2 approximation, while materials with medium or high-density incline for H 4 . This observation aligns with the previous findings, given that low-density materials usually have strong directional bonds and small packing factors, while high-density materials have metallic bonds and close-packed crystal structures.
A similar exercise including two variables simultaneously was executed to minimize the absolute error. Table 4 presents the results for the different single and combined methods. The first row presents the best possible result; when the hardness of each material is calculated with the relation ( H 1a , H 1b , H 2 , H 3 , H 4 , or H 5 ) that minimizes the absolute error in each case. The MAE column suggests that the best mode to reduce the hardness calculation error is to simultaneously consider the Crystal System and Density classification (CLA 2 ). This model exhibits the lowest MAE of 2.2 GPa with a standard deviation of 2.2 GPa. The second best combination is Crystal System and Bandgap (CLA 1 ) followed by Bandgap and Density (CLA 3 ).
The classic calculator. Even though the combination of Crystal System and Density exhibits the best result, the data presented in Table 4 reveals no statistical significant difference among the three combined methods  www.nature.com/scientificreports/ (CLA 1 , CLA 2 and CLA 3 ). Based on the latter observation, The Classic Calculator was developed as a selection model considering simple properties of a solid like crystal system, bandgap, and density. Table 5 summarizes the results considering the crystal system and the bandgap simultaneously. This table presents the relation that minimizes the error in the hardness calculation based on these two criteria. Figure 2a compares the experimental with the theoretical data calculated using this method. Most data points lie close to the red line, indicating that the calculated values greatly resemble the experimental data. The coefficient of determination ( R 2 = 0.95 ) between the observed and estimated values also shows a strong correlation validating the model.
Similarly, Table 6 presents the results of simultaneously considering the crystal system and density, and Table 7 the bandgap and density. Any of the three different approaches of the The Classic Calculator can be used to select a proper relation for calculating hardness depending on the available information.
For example, diamond is a low-density ( ρ = 3.5 g/cm 3 ) insulator ( E = 4.3 eV) with a cubic crystal system ( ρ and E correspond to theoretical values extracted from the Materials Project's database). Table 5 displays the classic calculator considering crystal system and the bandgap simultaneously (CLA 1 ). In the case of diamond, the latter suggests using relation H 2 (89.3 GPa) to estimate the hardness of diamond. Table 6 is the classic calculator considering crystal system and density simultaneously (CLA 2 ). For diamond, CLA 2 suggests using relation H 5 (93.0 GPa) to compute hardness. Table 7 shows the classic calculator built upon bandgap and density (CLA 3 ). In the case of diamond CLA 3 recommends using relation H 2 (89.3 GPa) for hardness. As observed the three classic models display very similar results, but one can be more accurate than the other. Given the experimental Vickers hardness of diamond is 96 GPa, CLA 2 exhibits the best prediction, which agrees with the results presented in Table 4. However, any of the classic models may be used to estimate hardness depending on the available information.
The machine learning calculator. Table 8 displays the performance of different supervised machine learning techniques when trying to solve the hardness problem. The results for seven different classification methods and one regression algorithm are shown and compared to each other.
Classification. The classification algorithms target the best calculation relation in each case. As observed in Table 8, GBC (31%) and DT (31%) have the highest accuracy, followed by KNN (21%). The Jaccard index reflects, almost identically, the same behavior. At first glance, 31% accuracy may suggest a low performance. However, this not necessarily means the classifier did a poor job because some materials can work successfully with two, three, or four hardness relations. Therefore, to keep a more balanced measure of the performance of the different classifiers, we have selected the best by minimizing the MAE. GBC presented the lowest MAE (1.4 GPa), followed by KNN (2.3 GPa), DT (2.9 GPa) and SVM (2.9 GPa). Also, GBC (1.9 GPa) exhibited the lowest standard deviation, followed by KNN (2.9 GPa) and SVM (3.2 GPa). Based on the latter results, it is indisputable that GBC is the best classifier, given its higher accuracy and low MAE.
GBC is a very sophisticated technique, so it is not surprising that it outperforms KNN or DT. However, it is remarkable to observe that even though KNN has a lower accuracy, its MAE is smaller than DT. This confirms Table 4. Comparison of the hardness MAE (GPa) and standard deviation σ (GPa) for various classification methods. Accuracy was calculated with respect to the best possible result. www.nature.com/scientificreports/  Table 5 (CLA 1 ), (b) The Machine Learning Calculator using GBC and (c) GBR. Table 6. The Classic Calculator considering crystal system and density simultaneously (CLA 2 ). Materials are classified by density ( ρ ) as low ( ρ < 4 g/cm 3 ), medium (4 g/cm 3 ≤ ρ ≤ 9 g/cm 3 ) and high density ( ρ > 9 g/ cm 3 ).

Cubic Hexagonal Monoclinic Orthorhombic Tetragonal Triclinic Trigonal
Low  Figure 2b shows the experimental and predicted values of hardness using GBC. As observed, there is a clear linear trend corroborated by the coefficient of determination ( R 2 = 0.98 ). Also, the dispersion of the data points in Fig. 2b is less than the one observed in Fig. 2a, suggesting that the GBC provides a better model for future forecasts than The Classic Calculator.
Regression. The results in the previous section show that the Gradient Boosting Classifier (GBC) is the best algorithm to select the hardness calculation relation given the properties of a solid. Gradient boosting is a robust algorithm used for regression or classification tasks. Given that the classifier did such an outstanding job, the Gradient Boosting Regressor (GBR) was implemented to predict the value of hardness directly in this study. As observed in Table 8, the performance of the regressor is better than the classifier. While the regressor displays a MAE of 1.3 GPa, the classifier shows 1.4 GPa, a small difference of 0.1 GPa that favors the regressor over the classifier. Additionally, the standard deviation of the regressor and the classifier have the same value, suggesting an overall better prediction by the regressor.
Comparing the MAE of GBR (1.3 GPa) with the best possible result (1.0 GPa) shown in Table 4, it is clear that the GBR works effectively predicting the value of hardness, followed by the GBC (1.4 GPa) and KNN (2.3 GPa). Also, GBR (1.9 GPa) and GBC (1.9 GPa) display the lowest standard deviation among all the ML techniques explored in this work, followed by KNN (2.9 GPa) and SVM (3.2 GPa). The standard deviations of GBR and GBC are only 0.7 GPa above the best possible result (1.2 GPa), a small value compared to the results exhibited by other methods. The latter results demonstrate that GBR has the best performance among all the ML algorithms evaluated in this work. Consequently, GBC holds second place, followed by KNN.
In the case of diamond, the classification algorithms KNN, DT, LR, SVM, RF, and GBC predicted the best relation is H 5 (93.0 GPa), while ADA inclined towards H 2 (89.3 GPa). On the other hand, the regressor GBR directly predicts a value of 95.9 GPa. Figure 2c displays the experimental and predicted values of hardness using GBR. As observed most of the data points lie very close to the red line, minimizing the dispersion of the data. The coefficient of determination in this case ( R 2 = 0.99 ) is very close to 1.0, indicating that the statistical model predicts hardness successfully. In Fig. 2c we can observe that GBR manages to correct some data points that were not predicted correctly neither by CLA or GBC. Given these observations, we recommend GBR as the most reliable method for predicting hardness, among all the different techniques proposed in this study.
Prediction of hard and superhard materials. The Materials Project's database was explored for compounds with the computed elastic tensor. Approximately 12,000 materials meet the criteria. The mechanical properties (B, G, Y, ν ) were calculated for each one of them using the MechElastic package 16 . The materials were further classified (by crystal system, density, and bandgap) using the theoretical data provided by the Table 7. The Classic Calculator considering bandgap and density simultaneously (CLA 3 ). Materials are classified by bandgap ( E ) as insulators ( �E > 2 eV), semiconductors ( �E < 2 eV) and metals ( E = 0 ); and by density ( ρ ) as low ( ρ < 4 g/cm 3 ), medium (4 g/cm 3 ≤ ρ ≤ 9 g/cm 3 ) and high density ( ρ > 9 g/cm 3 ).

Low Medium High
Insulator Table 8. Machine learning for hardness prediction. Out-of-sample accuracy and Jaccard index for different machine learning algorithms. The MAE (GPa) and standard deviation σ (GPa) consider the entire dataset.
Classification algorithms (C) target the best calculation relation, and the regression algorithms (R) the hardness value directly.  Table 9 presents some of the materials predicted to be hard and superhard using The Hardness Calculator. From this list, we found that five materials have experimental hardness measurements, ten have been predicted to be hard by other authors, and the remaining sixteen are predicted to be hard within this work.
The compounds BN, Be 2 C , Si 3 N 4 , VB 2 and HfB 2 have been previously synthesized and were predicted to be superhard at least by one of the methods presented in Table 9. Even though, in general, the experimental values are slightly below the predictions, BN is experimentally superhard, and the rest of the materials are hard, corroborating the goodness of the methods implemented in The Hardness Calculator.
In agreement with our predictions, other theoretical studies have suggested that C 3 N 4 , BC 2 N and CN 2 are excellent candidates to be superhard materials. From first-principles calculations, Teter et al. predicted a cubic form of C 3 N 4 with a zero-pressure bulk modulus exceeding that of diamond. The authors suggested that this phase could potentially be synthesized for use as a superhard material 31 . Also, Hong Sun et al. studied different cubic BC 2 N structures from ab initio methods 32 . The authors stated that the two hardest c-BC 2 N structures have bulk and shear moduli comparable to or slightly higher than c-BN, suggesting these compounds are superhard. They also believe these structures are similar to c-BC 2 N synthesized by Knittle et al. 42 . However, the experimental hardness of this compound is still unknown. Finally, Quan Li et al. predicted the body-centered tetragonal structure of CN 2 from first principles 33 . The authors simulated a hardness of 77 GPa for this compound, indicating that it has excellent incompressible and superhard properties. Similarly, other authors have suggested that BeCN 2 , B 2 CN , ReN 2 , TcOs 3 , CrC, TcB 2 , and ReC are good candidates for hard materials. All these observations suggest that the methods implemented in The hardness calculator are coherent with the findings in previous studies.
To our knowledge, the remaining sixteen materials proposed to be hard in this work have not yet been studied for hardness. We hope this work motivates the experimental study of these compounds.

Website. The Hardness
Calculator is a standalone online application created for simple analysis of hardness (available at https:// www. hardn essca lcula tor. com). It is a user-friendly interface that requires mechanical properties as an input to compute the hardness of a material. The program displays the hardness values calculated by The Machine Learning Calculator ( H GBC and H GBR ) as well as all the other values of hardness estimated by the six different relations described in Sect. 2.1 ( H 1a , H 1b , H 2 , H 3 , H 4 , and H 5 ). If the user provides the crystal system, density and/or bandgap, the program will also indicate the preferred relation to estimate hardness according to The Classic Calculator.

Conclusions
In this study, we have discussed several methodologies to compute hardness using the mechanical properties of a solid (bulk modulus, shear modulus, Young's modulus, and Poisson's ratio) as input variables. We have approached the hardness estimation problem from two different perspectives.
In the first approach, we investigated the correlation between different hardness relations ( H 1a , H 1b , H 2 , H 3 , H 4 , and H 5 ) and some physical properties of solids, such as crystal system, bandgap, and density. From this first part, we developed The Classic Calculator, which is a selection model based on the simple properties of a solid. The best results were observed considering two properties simultaneously: Crystal System + Bandgap, Crystal System + Density, or Bandgap + Density. The MAE (standard deviation) in the hardness calculation for each one of these methods is 2.3 GPa (2.7 GPa), 2.2 GPa (2.2 GPa), and 2.5 GPa (2.9 GPa), respectively. Even thou the combination of Crystal System + Density exhibits the better performance among the three approaches, there is www.nature.com/scientificreports/ no significant statistical difference between these methods; any of them can be used to select the proper relation to calculate hardness depending on the available information.
The second approach is based on Machine Learning and is referred to as The Machine Learning Calculator. We proposed two models to compute hardness using ML: a classifier (GBC) and a regressor (GBR). The classifier targets the best relation to calculate the crystal hardness using the mechanical properties of a solid as input variables. On the other hand, the regressor directly predicts the hardness value using the same input variables as the classifier. GBC and GBR display a MAE (standard deviation) of 1.4 GPa (1.9 GPa) and 1.3 GPa (1.9 GPa), respectively. GBR displays the best performance among all the different techniques studied in this work.
The Hardness Calculator, composed of classic and ML schemes, was used to search for hard and superhard materials within the Materials Project's database. This exploration demonstrated that The Hardness Calculator shows great predictive power as our results match other experimental or theoretical studies. As a result, sixteen materials were proposed as new hard or super hard candidates by this work.
The Hardness Calculator is available as a free access online application for users to discriminate between the different results at https:// www. hardn essca lcula tor. com.

Data availability
The authors declare that all data that support the findings of this study are included in the paper and/or its supplementary information files. Table 9. Prediction of hard and superhard materials using The Hardness Calculator. Materials Project's database identification number, chemical formula, crystal system (CS), bandgap ( E in eV), density ( ρ in g/cm 3 ), bulk modulus (B in GPa), shear modulus (G in GPa), Young's modulus (Y in GPa) and Poisson's ratio ( ν ) are shown. Vickers hardness (in GPa) was calculated using The Classic Calculator (H CLA 1 ) according to Table 5, and the The Machine Learning Calculator using the GBC (H GBC ) and the GBR (H GBR ). The comments specify whether the material was predicted to be hard by this work, by this work and other authors or if the hardness has been previously measured experimentally. a Ref. 26