Machine-learning potentials for nanoscale simulations of deformation and fracture: example of TiB$_2$ ceramic

Machine-learning interatomic potentials (MLIPs) offer a powerful avenue for simulations beyond length and timescales of ab initio methods. Their development for investigation of mechanical properties and fracture, however, is far from trivial since extended defects -- governing plasticity and crack nucleation in most materials -- are too large to be included in the training set. Using TiB$_2$ as a model ceramic material, we propose a strategy for fitting MLIPs suitable to simulate mechanical response of monocrystals until fracture. Our MLIP accurately reproduces ab initio stresses and failure mechanisms during room-temperature uniaxial tensile deformation of TiB$_2$ at the atomic scale ($\approx{10}^3$ atoms). More realistic tensile tests (low strain rate, Poisson's contraction) at the nanoscale ($\approx{10}^4$--10$^6$ atoms) require MLIP up-fitting, i.e. learning from additional ab initio configurations. Consequently, we elucidate trends in theoretical strength, toughness, and crack initiation patterns under different loading directions. To identify useful environments for further up-fitting, i.e., making the MLIP applicable to a wider spectrum of simulations, we asses transferability to other deformation conditions and phases not explicitly trained on.


Introduction
Simulations of materials' mechanical response-including (i) intrinsic strength and toughness, (ii) nucleation of extended defects (e.g.dislocations, stacking faults, cracks) and their implications for (iii) plasticity and fracture mechanisms [1][2][3] require length and time scales beyond limits of ab initio methods (≈ 10 3 atoms, ≪ns) [4][5][6] .The go-to method in most cases would be Molecular Dynamics (MD), allowing to access atomistic pathways for deformation and fracture in nanoscale systems (≈ 10 6 atoms) and "realistic" operation conditions (e.g.ultra-high temperatures, times up to µs).However, a severe problem of classical MD is that the necessary interatomic potentials do not exist for most engineering materials, or, are limited in accuracy and transferability with respect to temperatures, phases, and defect structures (see e.g.8][9] ).
A powerful avenue for MD simulations on multiple time and length scales with near ab initio accuracy is the application of machine learning interatomic potentials [10][11][12][13] (MLIPs, in case of no ambiguity just "potentials").MLIPs learn the atomic energy (or other atomic properties) as a nontrivial function of a descriptor quantifying local atomic environments in an ab initio training set 14 .Compared to conventional ab initio calculations, MLIPs can achieve a speed up of as much as 5 orders of magnitude 15,16 .Previous studies showed examples of MLIPs' transferability with respect to defects (e.g. grain boundaries 17 , dislocation structures 18,19 ) and phases 20,21 (e.g.Ni-Mo phase diagram illustrating superior performance of a MLIP over a classical potential 22 ).Recently, Tasnádi et al. 23 have demonstrated high accuracy of MLIP-predicted elastic constants for TiAlN ceramics, hence, have set the stage for MLIP development beyond linear elastic regime.
Based on the parametrization of local structural properties, MLIPs can be fitted employing different formalisms: spectral neighbor analysis potentials (SNAP) 24 , neural networks potentials (NNP) 25,26 , Gaussian approximation potentials (GAP) 27 , moment tensor potentials (MTP) 28 , linearized interatomic potentials 29 , or atom cluster expansion (ACE) potentials 14 , where only the last three are mathematically complete 30 .Though some parametrizations have been formally expressed as a special case of ACE 30 , their particular implementations may be more suitable for different training datasets and applications.Benchmarks for competing parametrizations have been published in case of carbon 31 or graphene 32 , but are missing for chemically complex materials.
Besides various MLIP formalisms, additional fundamental challenges in the field are: (i) efficient training dataset generation (i.e.minimising computationally-intensive ab initio calculations), (ii) standardized validation procedure, and (iii) training strategies for simulations beyond ab initio reach.A solution to (iii) would facilitate nanoscale description of solids with atomic-scale resolution, hence, is the key to understanding differences between material's sub-micrometer and macroscopic behavior [33][34][35] .Point (iii) is also a primary motivation of this study which presents methodological (MLIP development) as well as materials science discussion of nanoscale MD simulations on deformation and fracture for a model TiB 2 ceramic.
The chosen material, TiB 2 , is a widely researched representative of ultra-high temperature ceramics (UHTCs) with mature synthesis technologies 36,37 and melting point of 3500 K 38 .
Here, we propose a training strategy for MLIPs suitable to model deformation and crack growth mechanisms in monocrystals from atomic to nanoscale.Viability of our approach is illustrated by MD simulations of room-temperature uniaxial tensile loading for supercells with ≈ 10 3 -10 6 atoms.Specifically, we use the MTP formalism and train on a small fraction (initially < 1%) of snapshots from room-temperature ab initio molecular dynamics (AIMD) tensile tests, iteratively expanding the training set until all configurations are reliably described.Our MLIP accurately reproduces stress evolution and failure mechanisms during tensile loading of TiB 2 at the atomic scale, but requires up-fitting for more realistic nanoscale tensile tests.Consequently, we discuss size effects in the predicted mechanical properties and fracture patterns, as well as asses MLIP's transferability with respect to other loading conditions and phases.

Results and discussion 1 Training procedure and fitting initial MLIPs
Our general training procedure is described below (Procedure 1) and schematically depicted in Fig. 1a.Throughout this work, ab initio training data is generated by finite-temperature AIMD calculations generally producing many highly correlated configurations i .To avoid over-representation, our MLIP training initiates with a small fraction of AIMD configurations and exploits the MLIP's uncertainty indication-quantified through the extrapolation/Maxvol grade 60 (γ)-to iteratively expand the training set until all AIMD configurations are reliably described.1][62] , γ of a given configuration expresses how much MLIP extrapolates when predicting the corresponding energy, forces, and stresses.Specifically, γ ≤ 1 means interpolation and γ > 1 extrapolation.
Note that some degree of extrapolation is acceptable.Following Shapeev and co-workers 63 who describe γ ≤ 2 as accurate extrapolation, we employ a γ threshold, γ thr = 2, as a condition for exiting the training loop in Procedure 1.
i By a configuration, we mean a structure labelled by ab initio total energy, forces acting on each ion, and six stress tensor components.
(2) Divide the pool into an initial training set (TS 0 ), a learning set (LS), and a validation set (VS) by randomly selecting 0.5%, 79.5%, and 20% of non-overlapping configurations.
(3) Fit an initial MLIP (MLIP 0 , trained on TS 0 ).If γ of all configuration in the LS and VS is below γ thr = 2, exit.Else, build TS 1 by adding (maximum 15) highly extrapolative configurations from the LS to TS 0 and fit a new MLIP (MLIP 1 , trained on TS 1 ).
(4) While γ of all configurations in the LS and VS is above γ thr = 2, build TS i by adding (maximum 15) highly extrapolative configurations from the LS to TS i−1 , and fit a new MLIP (MLIP i , trained on TS i ).
Technical comments on Procedure 1 (for further details of MLIP training, please see the Methodology): • MLIP 0 in the step (3) is trained from an untrained MTP.
To speed-up the fitting process, MLIP i in step (4) is fitted from MLIP i−1 if maximum γ in the (i − 1)th iteration is below 1000 (otherwise it is fitted from an untrained MTP).
• The VS is not used for training but only as a reference ii .
• Besides γ, quality of the fit at each iteration i in (4) is monitored through errors of energies, forces and stresses (quantified by common regression model evaluation metrics, MAE, RMSE, R 2 , see e.g.5][66] ) for the TS i (fitting errors) and the VS (validation errors) iii .
Employing ii Although it was not the case here, the maximum γ of VS may remain high (γ ≫ γ thr ) even if the maximum γ of LS is already below γ thr (imagine, e.g., all configurations relevant for the description of material's fracture ending up in the VS).Then the strategy could be switching the LS and VS, i.e. use the old VS for learning and the old LS for validation.
iii While not done here, one may use fitting and validation errors as additional criterion (besides γ thr ) for exiting the training loop.
iv Since we run AIMD tensile tests until fracture, directions with higher fracture strains produce more configurations.fitting and validation errors, quantified through the residual mean square error (RMSE), of total energies, forces, and stresses do not exceed 2.6 meV/at., 0.11 eV/Å, and 0.19 GPa, respectively.

MLIPs' validation against atomic scale tensile tests
As a next step, the above developed MLIPs are employed for MD calculations, from now on referred to as ML-MD.Specifically, we simulate uniaxial tensile deformation of TiB 2 with computational setup equivalent to that used in AIMD.The aim is two-fold: (i) To asses accuracy of quantities relevant for the targeted MLIP's application and possibly measurable in experiment.Here, key descriptors of mechanical response are (time-averaged) stresses in the loaded direction and quantities characterizing material's strength and toughness.
(ii) To decide if MLIP up-fitting is necessary.By up-fitting we understand expanding the LS by additional ab initio configurations and going back to the step (4) of Procedure 1 (see Fig. 1b,c).Up-fitting is triggered when γ during a ML-MD simulation exceeds reliable extrapolation v .Following Ref. 63 we use γ reliable ≤ 10 and show that such choice provides a good accuracy of stresses during atomic scale ML-MD tensile tests in relation to AIMD results.
Fig. 2a depicts stress/strain curves derived from roomtemperature AIMD and ML-MD tensile tests, in which TiB 2 supercell (≈ 10 3 atoms, ≈ 1.5 3 nm 3 ) is loaded in the [0001], [1010], and [1210] direction, respectively.Note that each deformation is simulated with a MLIP trained to the respective loading condition (e.g., MLIP-[0001] for the [0001] tensile test etc.).Excellent quantitative agreement between AIMD and ML-MD results indicates reliability of our MLIPs.Specifically, (time-averaged) stresses in ML-MD differ from AIMD values by 0.07-1.94GPa, yielding statistical errors RMSE ≈ 1.02 GPa, R 2 ≈ 0.9997 vi .The fracture point in [0001] deformation is excluded from the analysis (there, the [0001] stress component in AIMD does not drop to zero due to long-range electrostatic effects absent in ML-MD).The extrapolation grade during all ML-MD simulations remains low (γ ≤ 5 < γ reliable ), thus, suggests reliable extrapolation and does not trigger MLIP up-fitting.
v From our training procedure (Procedure 1), it follows that the final MLIP yileds γ ≤ γ thr = 2 for all ab initio configurations in the TS, VS, and LS.Finitetemperature effects, however, may cause γ > γ thr , even during a ML-MD simulation with computational setup equivalent to that used to generate the training data (since ML-MD trajectories will not be exactly the same as those in AIMD).Then, (some of) the configurations causing γ > γ reliable can be simply labelled by ab initio energies, forces, stresses and added to the LS for up-fitting (note that this is a typical definition of active learning 67 ).At scales where direct ab initio calculations are unfeasible, additional configurations for up-fitting need to be generated in a different way, example of which is given in the following section.
vi Note that stresses normal to the loaded direction-not vanishing due to the omission of Poisson's effect in both AIMD and ML-MD simulations-are also used for quantitative comparison.Furthermore, we use the predicted stress/strain data to evaluate TiB 2 's (theoretical, intrinsic) tensile strength and toughness in the [0001], [1010], and [1210] direction.By tensile strength we mean the ultimate tensile strength, which corresponds to the global stress maximum during the tensile test and may be reached at a strain beyond the yield point 69 .By tensile toughness (or toughness modulus) we understand the integrated stress/strain area until the fracture point.This property describes the ability of a material with no initial cracks to absorb mechanical energy until failure.The [1210], [1010], and [0001] tensile strength in ML-MD reaches 63.7, 55.0, and 52.7 GPa, respectively, differing from AIMD values by maximum 0.8 GPa (0.99%).The ML-MD predicted toughness along [1210], [1010], and [0001] reaches 4.3, 3.1, and 4.3 GPa, respectively, differing from AIMD values by maximum 0.029 GPa (0.67%).Note, however, that theoretical tensile strength and toughness values may be size-dependent and should saturate at large-enough supercell sizes (see nanoscale simulations in the following sections).
For MLIPs targeted to simulations of deformation and fracture, additional aspect of validation is qualitative analysis of strain-induced structural changes.This means confirming that the MLIP captures phenomena as, e.g., transformation toughening mechanisms or material's cleavage predicted by equivalent AIMD simulations.Here, our MLIPs correctly reproduce brittle failure pathways observed in AIMD of TiB 2 subject to [0001], [1010], and [1210] tensile deformation (Fig. 2b-c), as also indicated by the agreement between ML-MD and AIMD stress/strain curves (Fig. 2a).The TiB 2 's fracture strains (same in AIMD and ML-MD) are 24%, 20%, and 18% during [0001], [1010], and [1210] tensile tests, respectively.

MLIP-[0001], MLIP-[1010], and MLIP-[1210]
provide reliable description of TiB 2 's response to uniaxial tensile loading at the atomic scale.This is indicated by low extrapolation grades (γ ≤ 5 < γ reliable ) as well as accuracy of stress/strain curves, theoretical strength and toughness, and fracture mechanisms consistent with AIMD observations (see previous section).
As a next step, we simulate more realistic tensile tests at the nanoscale, including Poisson's contraction and continuously increasing the applied strain.For clarity, we establish the following notation.• Atomic scale tensile tests (presented in the previous section) are uniaxial tensile tests with the following setup: (i) Supercells with ≈ 10 3 atoms.Specifically, our supercell has 720 atoms, corresponding to dimensions of ≈ (1.5 × 1.6 × 2.6) nm 3 .(ii) A 2% strain step with a 3 ps equilibration under fixed lattice vectors normal to the loaded direction (i.e.no Poisson's contraction).
Employing MLIP-[0001], MLIP-[1010], and MLIP-[1210] for room-temperature nanoscale tensile tests results in unphysical dynamics (losing atoms) and rapidly increasing extrapolation grades (γ ≫ 10 3 ≫ γ reliable ) when approaching the fracture point.This indicates we have reached length scales where extended defects-absent in atomic scale simulationsdominate materials' deformation.To enable description of TiB 2 's fracture at the nanoscale, our MLIPs require up-fitting (recall Fig. 1b).Generally, this is a very non-trivial task 70,71 , since structures causing large γ cannot be directly treated by ab initio calculations vii .
Below we describe up-fitting steps leading to MLIP-[4] (schematically depicted in Fig. 1c), which is a MLIP enabling room-temperature nanoscale tensile tests for TiB 2 .
• We produce MLIP- [1] by up-fitting MLIP-[0001], where the LS is expanded by final TSs of MLIP-[1010] and MLIP-[1210] viii .Although MLIP- [1] does not yet suffice to describe TiB 2 's fracture during tensile deformation at the nanoscale (γ ≫ γ reliable ), it can be applied to simulate atomic scale tensile tests for all the three loading directions.
In particular, MLIP- [1] yields low extrapolation grades (γ ≤ 6 < γ reliable ) and correctly reproduces fracture mechanisms from AIMD.MLIP- [1] also preserves nearly the same accuracy as MLIP-[0001], MLIP-[1010], and vii Due to the fact that direct ab initio calculations are unfeasible at the nanoscale, proving that a MLIP predicts the "correct physics" is possible only through indirect indications.These include the MLIP's uncertainty indication, γ, and calculations of properties that can be derived also from atomic scale simulations, e.g.lattice parameters or theoretical tensile strength.
MLIP-[1210] ix .In terms of (time-averaged) stresses, differences from AIMD values, are 0.13-1.47GPa, resulting in statistical errors RMSE ≈ 0.79 GPa, R 2 ≈ 0.9999.The corresponding stress/strain curves are presented in the Suppl.Mat. (Fig. S2).Additionally, MLIP- [1] provides good accuracy of room-temperature elastic constants, C i j (Tab.1), differing from AIMD-calculated values by less than 4.9%.Although our MLIPs are not targeted to accurate predictions of C i j x due to low energy cutoff used in the underlying AIMD training set, the here obtained values further underpin reliability of MLIP- [1] xi .
Besides reliability indication through γ, MLIP- [4] provides physically sound stress/strain curves and dynamics of nanoscale tensile deformation (illustrated in Supp.Mat.Fig. S4, and in the following section).Additional indication of MLIP- [4]'s reliability are lattice parameters evaluated by equilibrating atomic scale (≈ 10 3 atoms) ix Generally, this does not have to be the case.Making a MLIP applicable to a wider spectrum of environments (here, MLIP- [1] can tackle all the three loading conditions) may lead to decreased accuracy for a particular simulation (here, e.g., MLIP-[1010] may be in principle more accurate in describing the [1010] loading condition).
x A training approach suitable for accurate predictions of C i j has been recently suggested by Ref. 23 .
xi Furthermore, the only computational report of TiB 2 's elastic constants at finite temperatures, besides ours, is Ref. 72 , which does not use AIMD but a method based on ab initio calculations of C i j at 0 K and ab initio phonon theory of thermal expansion.
xii Note that all the three LSs are generated at low computational and human time.In particular, we do not "hand-design" any material-specific environments or e.g.surfaces, only pull a TiB 2 supercell (720-atoms, equilibrated at 300 K and 1200 K) along one of the three lattice vectors.

5/18
Table 1.Elastic constants, C i j (in GPa), of TiB 2 (at temperature T (K))-calculated using the here-developed MLIP (MLIP- [1])-together with the polycrystalline bulk modulus, B (in GPa), shear modulus, G (in GPa), Young's modulus, E (in GPa), and Poisson's ratio, ν, compared to reference ab initio and experimental (exp.)data.Ref. 73 and Ref. 74 is for TiB 2 single and polycrystal, respectively.AIMD and ML-MD elastic constants were evaluated following Ref. 75 To offer some understanding of why MLIP- [4] enables nanoscale tensile tests, one needs to analyse the corresponding training set, TS(MLIP- [4]).In Fig. 3, we visualize selected characteristics of TS(MLIP- [4]) in comparison to the training set of MLIP- [1], TS(MLIP- [1]), where the latter is not applicable to simulate TiB 2 's fracture at the nanoscale.The radial distribution function (RDF, Fig. 3a) and bond angle distribution analysis (Suppl.Mat., Fig. S6) suggest minor geometrical differences between structures contained in TS(MLIP- [1]) and TS(MLIP- [4]).Their total energy and stress distribution, however, differ significantly (Fig. 3b).In particular, TS(MLIP- [4]) contains atomic configurations with higher total energy and higher total energy in combination with higher stress in principal crystallographic axes, which are missing in TS(MLIP- [1]).An illustration of structures from the two training sets is given in Fig. 3c.The chosen snapshots indicate that TS(MLIP- [4]) provides a variety of atomic environments relevant for simulations of non-stoichiometry, locally amorphous regions, and surfaces, which are likely to be helpful also for simulations of extended defects nucleating due to high strains during tensile tests.

6/18
4 Size effects in tensile response of TiB 2 Equipped with the above developed MLIP- [4], in this section we discuss TiB 2 's response to room-temperature uniaxial tensile loading from atomic to nanoscale.Recall that an important difference between our atomic and nanoscale tensile tests is that the former disregard Poisson's contraction.The stress/strain curves calculated at room temperature are depicted in Fig. 4. For strains below ≈ 10%, stresses for the respective loading direction ([0001], [1010], and [1210]) almost perfectly overlap regardless the supercell size (≈ 10 3 -10 6 atoms).Such overlap indicates consistency in elastic constants derived from atomic and nanoscale models xiii .Furthermore, also elastic isotropy of the basal plane is size independent.The isotropy is suggested by nearly the same initial slope of stress/strain curves for [1210] and [1010] elongation and in line with experimental reports for hexagonal crystals 80 .
Besides characterizing directional dependence of tensile strength and toughness in dislocation-free monocrystals, nanoscale simulations also provide insights into crack nucleation and growth mechanisms.This can be illustrated-and contrasted with atomic scale ML-MD-using the example of xiii Note that in order to obtain the C 44 elastic constant, we would need to perform additional shear simulations, which we did only at the atomic scale (recall Tab. 1).
xiv The Poisson contraction was calculated as ν = − , where the dε compressed (dε elongated ) is the lattice parameter shrinkage (increment) orthogonal (parallel) to the loading direction.The presented value is an average of Poisson's ratios for both in-plane directions.
xv Even more significant size effects may be expected for materials with strain-activated plasticity mechanisms.
[0001] tensile test (Fig. 5).At the atomic scale, all atoms in TiB 2 vibrate close to their ideal lattice sites until a sudden brittle cleavage induces formation of two surfaces almost perfectly parallel with (0001) basal planes (Fig. 5, row 1).At the nanoscale, fracture is initiated by opening of voids accompanied by local necking which produces lattice re-orientations (Fig. 5, row 2 and 3).Rapid void coalescence and fraying of ligaments results in corrugated fractured surfaces, predominantly with (0001) orientation.Following the stress release, inner parts of the crystal relax back to the ideal TiB 2 lattice sites.
The S1 supercell yields in only one region (Fig. 5, row 2).The larger S2, S3, and S4 supercells-where only S4 is depicted in Fig. 5-do not fracture in two pieces but show fewnm-size cracks inside.Here, fracture surfaces do not align only with the basal planes but also with the {1011} first order pyramidal planes (recall the notation in Fig. 2d).Volumetric strain analysis (Fig. 5d,e) highlights locally increased tensile strain concentration surrounding small voids (see TiB 2 slice at ≈ 27% strain) due to decreased interplanar spacings between Ti and B layers (predominantly due to [0001] compression) above and below the voids.The largest S4 supercell allows for crack propagation under different directions, thus, provides a certain statistical viewpoint missing in the S1 supercell and at the atomic scale.
For the [1010] tensile test, size effects in fracture mechanisms are compared in Fig. 6.At the atomic scale, two voids open diagonally across Ti/B layers (Fig. 6, row 1).At the nanoscale, we observe nucleation of V-shaped cracks, as illustrated for the S1 and the S4 supercell (Fig. 6, row 2 and 3, and panels c, d), where S4 additionally reveals lattice rotation near the V-shaped defects.We infer that loading in the direction of strong covalent B-B bonds most often induces crack deflection and fracture at {1122} family of surfaces parallel to the second order pyramidal planes.
Changing to the [1210] tensile deformation, atomic scale simulations predict fracture along {1010} prismatic planes (Fig. 7, row 1).This is underpinned also by nanoscale ML-MD (Fig. 7, row 2 and 3), suggesting that crack growth is most often orthogonally and diagonally across Ti/B layers (see the dashed line with arrow in Fig. 7e).

Other loading conditions and MLIP's transferability
Targeted applications of our MLIP (MLIP- [4]) are atomic to nanoscale simulations of TiB 2 subject to room-temperature uniaxial tensile loading.Although the training set only contained snapshots of a 720-atom TiB 2 supercell, with one of the three lattice vectors elongated with respect to the equilibrium length at 300 K, Fig. 3c indicates variety of atomic environments relevant for simulations of non-stoichiometry, locally amorphous regions, or e.g.surfaces.
Here we discuss transferability of our MLIP to simulations for which it has not been explicitly trained.Accuracy of the predicted observables (e.g.shear strengths or surface energies) is presented in the context of extrapolations grades, allowing  • Room-temperature shear deformation of TiB 2 .Simulations of shear deformation provide useful insights for understanding of how dislocations nucleate and move in generally brittle UHTCs 82,83 .Furthermore, as diborides typically crystallize in layered structures (α, γ, ω) which can be viewed as different stackings of the transition metal planes, shearing may also induce transformation toughening 84 .Following Ref. 55  xviii There are various slip systems experimentally observed in diborides depending on temperature.For an overview, please see Ref. 80 .
Overall, we infer that MLIP up-fitting would benefit from severely sheared configurations.Already the current potential (MLIP- [4]), nonetheless, suffices to estimate shear strengths and stress release mechanisms at the atomic scale.• Room-temperature volumetric compression of TiB 2 .
Training on snapshots of compressed structures may be important not only in obvious cases, such as e.g.simulations of uniaxial compression or nanoindentation, but also in order to correctly account for Poisson's contraction during tensile deformation.For TiB 2 , the relatively low Poisson's ratio (Tab. 1) induces rather small compression, accurately reproduced by MLIP- [4]  at the nanoscale xxi .Here we simulate severe (roomtemperature) volumetric compression-shrinking all lattice vectors by up to 12%-to illustrate rapidly growing γ and how this can be improved by up-fitting.
As shown in Fig. 9, compression-induced stresses along main crystallographic directions are indeed extreme.In AIMD, they exceed 50 GPa for a 5% compression and reach ≈ 150 GPa for a 10% compression.Our MLIP yields satisfactory agreement with AIMD for volumetric compression of 1-2%, with stress tensor components differing from ab initio values by less than 1.83 GPa (9.57%) and γ indicating reliable extrapolation (γ ≤ 10 = γ reliable ).Further compression (of each lattice vector) up to 10%, however, causes increasing deviations from AIMD stresses andγ ≈ 10 2 -10 3 .
We illustrate the effect of up-fitting by producing a new MLIP (MLIP- [4] Plus ) which learns from AIMD snapshots of a 10% volumetrically-compressed TiB 2 (added to the LS of MLIP- [4]).This not only greatly improves accuracy for the 10% compression-stress differences are maximum 0.79 GPa (0.48%) and γ ≤ 2-but also within the entire compression range (see red data points in Fig. 9).
• Surface energies of TiB 2 .Surfaces are undoubtedly relevant for simulations of fracture.For routine MLIP development, however, it is desirable to minimize human time handcrafting material-specific surfaces.The heresuggested training data generation is easily automated and may already include atomic environments relevant for calculations of low-energy surfaces (recall Fig. 2c).
xxi Specifically, Poisson's ratio derived from lattice parameter shrinkage during nanoscale tensile tests quantitatively agrees with that calculated from AIMD elastic constants (Tab.1).
xxii For accurate surface energy calculations, however, relevant surface structures should be explicitly trained on using accurate-enough ab initio dataset (in contrast to our philosophy of large but less accurate training configurations).
Our MLIP was trained to snapshots of TiB 2 (AlB 2 -type phase, P6/mmm) with a perfect stoichiometry (speaking of the entire supercell).Visualization of the training set (Fig. 3c), however, indicates presence of atomic environments with various Ti-to-B ratios as well as bond lengths and angles different from those in TiB 2 .This may be useful for simulations of e.g.vacancy-containing TiB 2 structures commonly reported by synthesis 39 or other phases in the Ti-B phase diagram, although it is quite a stretch from the intended applicability of our MLIP.
To investigate transferability to other phases, we use MS to find the ground-state of known phases from the Ti-B phase diagram 85

Conclusions
We proposed a strategy for the development of MLIPs suitable to portray intrinsic tensile strength, toughness, and fracture mechanisms of monocrystals from atomic scale (≈ 10 3 atoms, accessible by ab initio methods) to nanoscale (≈ 10 4 -10 6 atoms).Training data generation, fitting, and validation procedure were illustrated using the moment tensor potential (MTP) formalism.Material-wise, TiB 2 ceramic served as a model system.The MLIP was used to simulate room-temperature uniaxial tensile deformation of TiB 2 at length scales beyond ab initio reach, ≈ 5 3 nm 3 -15 3 nm 3 .Key findings of our work are summarized below.
Methodological: MLIP development 1. MLIPs for simulations of tensile deformation until fracture can be trained following the scheme in Fig. 1, based on snapshots from finite-temperature AIMD simulations of sequential elongation with supercells near ab initio size limit (≈ 10 3 atoms).The strategy can be generalized to other loading conditions (e.g.shearing).
2. Due to strain-induced nucleation of extended defects, a transition from atomic to nanoscale simulations-here including also more realistic computational setup (continuous deformation, Poisson's contraction)-may require up-fitting.We propose to generate additional ab inito xxiv We can also compare the structures relaxed by ab initio calculations and MS, by calculating dfferences between the corresponding atomic positions.
Prior to up-fitting, square root of the maximum difference (of each corresponding atom position in Ti 3 B 4 , 780 at.) is 6.77 Å, which decreases to 6.60 after up-fitting.For comparison, initial TiB 2 (720 at.) configuration would have maximum difference of 2.28 Å.
data by finite-temperature AIMD: imposing a large strain along one lattice vector, initializing atoms at ideal lattice sites, and equilibrating the supercell under fixed volume and shape.Again, this can be generalized to, e.g., simple shear or compression.In projection, the here-proposed training strategy should be applicable to other ceramics (particularly transition metal diborides but also e.g.nitrides, carbides) as well as other MLIP formalisms (e.g.neural network potentials).Our nanoscale simulations for TiB 2 provide guidance for interpretation of micromechanics experiments, in particular, can complement post-mortem transmission electron microscopy.Follow-up work could focus on MLIP up-fitting for more complex loading geometries, e.g., with a pre-crack.
Stress tensor components were calculated as averages of the last 0.5 ps.Elastic constants, C i j , were evaluated following Ref. 75 For purposes of the Transferability section, simulations of volumetric compression were carried out for a 720-atom TiB 2 supercell maintained at 300 K (for 2 ps) with the NVT ensemble.Surface energies were calculated using a 60-atom TiB 2 supercell (2 × 1 × 5, with a k-mesh of 60 in each direction) together with a 10Å vacuum layer.The supercell was fully relaxed at 0 K (in terms of lattice constants and atomic positions) until forces on atoms were below 10 −5 eV/Å and total energy was converged until 0.01 eV/supercell.Other ground-state and higher-energy structures from the Ti-B phase diagram (Ti 2 B, Ti 3 B 4 , TiB, TiB 12 , etc.) were fully relaxed at 0 K starting from lattice parameters and atomic positions from the Materials Project 85 , applying the same convergence criteria as for surface energies.

Development of machine-learning interatomic potentials (MLIPs)
We used the moment tensor potential (MTP) formalism, as implemented in the MLIP-2 package 96 .Training data generation and general workflow are detailed in the section Training dataset generation and workflow.Note that the pool of training/learning/validation configurations did not contain snapshots from the very initial stage (first 5%) of NVT simulations (with potentially largely oscillating total energies).MLIPs were fitted based on the 16g MTPs (referring to the highest degree of polynomial-like basis functions in the analytic description of the MTP 28 ), using the Broyden-Fletcher-Goldfarb-Shanno method 97 with 1500 iterations and 1.0, 0.01 and 0.01 weights for total energy, stresses and forces in the loss functional.The cutoff radius of 5.5 Å was employed, similar to other recent MLIP studies 23,98 (our tests for larger cutoffs, 7.4 and 10.0 Å did not show notable changes in accuracy).

Molecular dynamics with MLIPs (ML-MD)
ML-MD calculations were performed with the LAMMPS code 99 interfaced with MLIP-2 package 96 which allows to run simulations with MTP-type potentials.Computational setup of atomic scale ML-MD tensile and shear tests (at 300 or 1200 K) was equivalent to 14/18 AIMD tensile and shear tests described above, in particular, carried out for TiB 2 with the same supercell size and orientation.Stress tensor components and elastic constants were calculated in the same way as described above in case of AIMD.
Atomic scale volumetric compression simulations used analogical setup and the same supercell sizes as described above in case of AIMD.Surface structures and other Ti-B phases were fully relaxed at 0 K to a local energy minimum using the conjugate gradient method (molecular statics, MS) following analogical setup as described above in case of ab initio calculations.

Visualization and structural analysis
The OVITO package 100 allowed to visualize and analyze selected AIMD and ML-MD simulations, in particular, using functions (i) Radial pair distribution function (with a cut-off radius of 5.5 Å, i.e. the cutoff of our potentials), (ii) Elastic strain calculation and (iii) Atomic strain (with cut-off radius of ± 0.1 mm).For details see the OVITO documentation.

Figure 3 .
Figure 3. Qualitative differences between training sets (TSs) of the here-developed ML potentials (MLIP-[1], MLIP-[4]).Only MLIP-[4] is suitable for nanoscale ML-MD tensile tests.(a) Radial distribution function (RDF, with 5.5 Å cutoff) for B-B, Ti-B, and Ti-Ti bonds (integrated over all configurations).(b) Stress components (in-plane and in the loaded direction) vs. total energy of all configurations in the training set.(c) Snapshots of representative structures from the two training sets.

Figure 4 .
Figure 4. ML-MD stress/strain curves (obtained by MLIP-[4]) for TiB 2 subject to (a) [0001], (b) [1010], and (c) [1210] tensile deformation at 300 K.Only the stress component in the loaded direction is plotted.The orange diamonds correspond to atomic scale ML-MD simulations, while the solid lines correspond to nanoscale ML-MD simulations, as defined at the beginning of this section.

Figure 5 . 18 Figure 6 .
Figure 5. Illustration of size effects in (room-temperature) ML-MD [0001] tensile tests for TiB 2 at key deformation stages: (a) bond elongation in the loaded direction, (b) onset of crack nucleation, and (c) fracture.Thin slices of the nanoscale (d) S1 and (e) S4 supercells color-coded based on volumetric strain (using the corresponding equilibrium structure as reference).Red (blue) regions denote high tensile (compressive) strain.

Figure 7 .
Figure 7. Illustration of size effects in (room-temperature) ML-MD [1210] tensile tests for TiB 2 at key deformation stages: (a) bond elongation in the loaded direction, (b) onset of crack nucleation, and (c) fracture.Thin slices of the nanoscale (d) S1 and (e) S4 supercells color-coded based on volumetric strain (using the corresponding equilibrium structure as reference).Red (blue) regions denote high tensile (compressive) strain.

Figure 9 .
Figure 9. Transferability of the here-developed MLIP (MLIP-[4]) to atomic scale room-temperature simulations of volumetric compression.(a) Differences in main stress tensor components (σ x,y , in the basal plane, and σ z in the [0001] direction) between ML-MD and AIMD simulations, plotted as a function of the compression percentage.(b) Evolution of extrapolation grades.Blue and red data point correspond to ML-MD simulation with different MLIPs, MLIP-[4] and MLIP-[4] Plus .

Table 3 .
Mechanical properties of various TiB 2 supercells derived from room-temperature ML-MD stress/strain data (Fig.4) together with 1200 K results for comparison.