Using machine learning and a data-driven approach to identify the small fatigue crack driving force in polycrystalline materials

The propagation of small cracks contributes to the majority of the fatigue lifetime for structural components. Despite significant interest, criteria for the growth of small cracks, in terms of the direction and speed of crack advancement, have not yet been determined. In this work, a new approach to identify the microstructurally small fatigue crack driving force is presented. Bayesian network and machine learning techniques are utilized to identify relevant micromechanical and microstructural variables that influence the direction and rate of the fatigue crack propagation. A multimodal dataset, combining results from a high-resolution 4D experiment of a small crack propagating in situ within a polycrystalline aggregate and crystal plasticity simulations, is used to provide training data. The relevant variables form the basis for analytical expressions thus representing the small crack driving force in terms of a direction and a rate equation. The ability of the proposed expressions to capture the observed experimental behavior is quantified and compared to the results directly from the Bayesian network and from fatigue metrics that are common in the literature. Results indicate that the direction of small crack propagation can be reliably predicted using the proposed analytical model and compares more favorably than other fatigue metrics. A machine learning technique can identify the complex variables behind the propagation direction of small cracks in a titanium alloy. A team led by Michael Sangid at Purdue University in the U.S.A built two separate Bayesian networks using machine learning to analyse diffraction and tomography data acquired during in situ fatigue cycling of a titanium alloy. The orientation of the first principal stress axis in a specific direction and the maximum resolved shear stress were the most strongly correlated with crack propagation, and were incorporated into an analytical relationship to describe the probability of the crack propagation direction. This analytical expression reproduced experimental results and was more reliable than previous literature predictions. This sort of semi-supervised machine learning methodology may help us identify driving forces in other complex engineering problems.


INTRODUCTION
Modeling the propagation of small fatigue cracks, especially cracks that are intragranular in nature, requires information about how the underlying microstructure affects the crack behavior. While, crack initiation has been modeled as both stochastic 1,2 and deterministic, [3][4][5][6] there is still an open question if the small fatigue crack behavior can be predicted. Small crack propagation follows crystallographic directions and planes, and thus is said to be a slipmediated process. [7][8][9] The behavior of long cracks is well described by linear elastic fracture mechanics through the Paris law. 10 While for small cracks, the propagation rate strongly deviates from linear elastic fracture mechanics behavior and exhibits large scatter, [11][12][13] based on the complex interactions between the small crack and the local microstructure. Several relationships have been proposed to capture the small crack behavior, albeit these theories have not been validated at the appropriate length-scale due to prior limitations in the experimental measurements. With the advent of synchrotron-based x-ray tomography and diffraction techniques combined with in situ loading, the necessary data are available for the crack direction and propagation rate with respect to the microstructure. In this work, experimental data for the evolution of a fatigue crack relative to the local microstructure during in situ loading 14,15 are used as the foundation to build a model for the driving force of small fatigue cracks.
Based on the 3D nature and intricacies of the local crack growth process, simple relationships governing the fatigue crack dynamics are very difficult to extract, thus data-driven approaches offer a promising path forward. Specifically, machine-learning techniques can be utilized to address the complexity of the small crack propagation phenomenon by identifying statistically relevant correlations. Bayesian networks 16 (BNs) provide a machine learning, data-driven framework offering two major benefits. First, BNs are non-parametric by construction, thus equations are not required a priori to construct a model of the investigated phenomenon, and second, the results of the BN are presented in terms of probabilities and correlations. The fact that a BN model does not require equations is instrumental to avoid assumptions, and the associated inherent biasing, regarding the influence of each variable on the target response. Based on the micromechanical fields ahead of the small crack, the interpretation of the BN results provides a means to build a deterministic metric for the small crack driving force, which is supported by the available data. The use of BNs have been underutilized in fatigue, but the few available studies have shown promising results. [17][18][19][20] In this study, as shown in Fig. 1, a propagating crack is characterized relative to the local microstructure in a polycrystalline beta-metastable titanium alloy (Fig. 1a) and combined with associated crystal plasticity simulations (Fig. 1b) to complement the dataset. Machine-learning techniques are applied to the available data to build a BN framework (Fig. 1c), which is used to compute correlations (Fig. 1d). This BN can be used, on its own to predict crack growth, but this study aims to identify an analytical relationship for the crack driving force metric. Thus, the relevant variables, as identified from correlations produced from the BN, are selected, and a functional form of the deterministic driving force metric is ascertained through a machine learning approach (Fig. 1e). Finally, experimental observations are compared with the predictions of fatigue crack growth via the (i) BN and (ii) the analytical form of the driving force metric identified by interpreting the results of the BN (Fig. 1f).
Through the years, researchers have identified multiple micromechanical variables and microstructure features that influence the propagation of small fatigue cracks. Since small crack growth is crystallographic in nature, it is heavily influenced by the microstructure of the material. Originating from multi-axial approaches, several researchers have pointed to the important role of a tensile opening stress acting normal to the critical (slip) plane 21,22 (σ N ) on material failure. Stress triaxiality (σ TriAX ), which is the ratio between the hydrostatic stress (σ H ) and the von Mises equivalent stress (σ VM ), has been reported to be a precursor to material failure. 23,24 Moreover, at the continuum length-scale, the fatigue crack propagation direction is based on the orientation of the principal stress. 25 Therefore, at the grain-scale, the minimum angle of the principal stress axes with respect to the active 〈111〉 slip direction, α σI;II;III 111 h i , and the corresponding magnitude of the principal stresses (σ I,II,III ) are included in the crack propagation analysis. The reference 〈111〉 slip directions are selected due to the prevalence of pencil-glide mechanics for the BCC material analyzed in this study. 26,27 The values of micromechanical fields in the proximity of the crack front are strongly influenced by the microstructural features. Two of the most influential variables governing the rate of small crack propagation are the distance of the crack to the nearest grain boundary (GB) along the direction of crack propagation (GB dist ) and the character of the GB imposing the barrier to crack growth. [28][29][30][31] The GB character is not investigated in this present work, as the crack is confined to a small sample of grains and does not cross a statistically significant number of GBs to employ a correlation analysis. To accurately predict the small crack propagation direction and rate, the aforementioned micromechanical and microstructure variables are considered in a semi-supervised machine learning framework to identify correlations via two distinct BNs. Starting from the results of the machine learning framework, this work aims to identify a metric that can be used to describe the small crack driving force for BCC alloys subjected to high-cycle fatigue loading. In the last decades, many analytical surrogate metrics for the small crack driving forces, which are commonly known as fatigue indicator parameters, have been postulated, 21,32-35 albeit seldom of these theories have been directly compared to 3D experimental data. [36][37][38] Due to the nature of their construction, these fatigue metrics would benefit from a systematic analysis of the variables possibly influencing the small crack propagation process. In previous work, we show that the available fatigue metrics, present in the literature, do not exhibit Fig. 1 Schematic depicting the adopted procedure to identify a data-driven deterministic driving force for small crack propagation, in this figure X 1 , X 2 ,…, X n represent the microstructural and micromechanical variables considered within this framework satisfactory predictive fatigue performance for the analyzed BCC alloy. 37 Therefore, we improved the fatigue prognosis based on a non-local data mining procedure along potential crack paths aligned with the available 〈111〉 crystallographic slip directions. 38 This echoes the pencil-glide deformation mechanics observed in other BCC alloys 26,27,39 ; although the spatial resolution available in this work (slightly below a micrometer), ultimately limits the microscopic failure mechanism determination. A similar approach is used in this work but considering multiple micromechanical and microstructural variables to first predict the crack path and then the associated speed of crack propagation. Two separate BNs are constructed by semi-supervised machine learning, one to model the small crack propagation direction, and the other to compute the associated small crack propagation rate. The correlation between each variable and the fatigue crack mechanics, in terms of both the direction and propagation rate, will be reported and used to construct an analytical form of the small crack driving force. The predictions based on the BNs and analytical formulation will be directly compared with the experimental results.

RESULTS
BNs can provide a representation for machine learning applications and causal relationships. The foundation of BNs is the Bayes' theorem (Eq. 1), which is used to update the probability of an event θ occurring, given a set of evidence x.
where f xjθ À Á is the likelihood function and π represents a probability distribution. In this work, θ will either represent the crack propagation direction or the associated propagation rate, while x will represent all the micromechanical and microstructural variables investigated in this work.
The goal of this analysis is to use BNs to identify analytical expressions for the crack growth direction and the associated propagation rate. The first step toward this goal is to identify and characterize correlations embedded within the BN models and use these correlations to construct the building blocks of the analytical forms for the small crack driving forces. In addition to the micromechanical variables previously mentioned, the BN models account for the maximum resolved shear stress τ α max À Á and the maximum accumulated plastic shear Γ α max À Á along a slip direction. Furthermore, the overall crack length (a) has been included in the BN model describing the crack propagation rate.
Relevant variables are identified by quantifying their correlation with the experimental observations (e.g., crack propagation in this case). In BNs, the relationship between discretized variables are encoded into joint probability tables (i.e., p(x,y), where x and y are the states of two discrete variables X and Y, respectively). Therefore, the normalized mutual information (NMI, as defined in the Methods section, Eq. 24) metric is used in this work to quantify the correlations. The NMI is a measure of the uncertainty reduction achieved on the value of one variable by observing the state of another. Figure 2 depicts the values of the NMI, and therefore their associated correlation, between the selected micromechanical/microstructural variables and the crack propagation direction (Fig. 2a) and speed (Fig. 2b).
By interrogating the BN models, the nature of these correlations (i.e., linear, quadratic, etc.) between each variable and the observed crack propagation direction and rate can be identified. To perform this analysis, the BN models are systematically interrogated by continuously varying the mean of the variables of interest (e.g., those variables displaying a strong correlation in Fig. 2), while recording the updated value of the objective quantities, the probability of the crack propagating along a certain crystallographic direction, P(F), or the rate of propagation, da/dN. This procedure is applied to one variable at a time without enforcing any constraints on the other variables. The result of this procedure is a statistical trend representing the effect that a variable has on the crack propagation direction and rate (Fig. 3).
The correlation analysis shows, in Fig. 2, that the micromechanical quantities most influencing the crack propagation direction are (i) the orientation of the first principal stress axes with a given slip direction (α σI 111 h i , NMI = 12%) and (ii) the maximum resolved shear stress (τ α max , NMI = 10%). All other variables exhibit a relatively low correlation value. In addition to α σI 111 h i and τ α max , the magnitude of the maximum principal stress (σ I ) and the maximum accumulated plastic shear along a slip direction Γ α max À Á have been selected as possible candidates for deriving the analytical form of the direction of crack propagation. The magnitude of the first principal stress is selected as a natural complement to the axes of the principal stress, which displays the largest correlation to the direction of crack growth. Meanwhile Γ α max has been included since the effect of plasticity and irreversibility ahead of the crack tip have historically been reported as one of the major drivers of small crack propagation in ductile materials. 7,[40][41][42][43] Fig. 3 a, b, d shows the almost linear effect of τ α max , Γ α max and σ I on the crack propagating direction and a nonlinear effect of the alignment of the first principal stress axis with a given slip direction, α σI 111 h i , displaying an optimum value around 60° (Fig. 3c). Moreover, very similar effects are observed for τ α max and Γ α max , and the only noticeable difference is the less pronounced slope exhibited by Γ α max . The similarity in their trends is due to the formulation of the Hutchinson flow rule 44 utilized in Fig. 2 Correlations, as demonstrated based on the normalized mutual information (NMI), between the investigated variables denoted on the x-axes and the probability of the a direction of crack propagation and b rate of crack propagation. A full description of the variables analyzed are given in Table 1 Using machine learning and a data-driven approach to identify A Rovinelli et al.
the crystal plasticity simulations, which deterministically relates τ α max and Γ α max . Therefore, only a linear term corresponding to τ α max is included in the formulation to describe the direction of crack growth. Furthermore, based on the Fig. 3c, the effect of α σI 111 h i is described as cos 2 α σI 111 h i À 60 . Combining the normalized effects of τ α max , σ I , and α σI 111 h i , a general functional relationship to describe the direction of crack propagation can be written as where τ α 0 is the initial critical resolve shear stress and σ Y represents the yield stress of the material (1000 MPa, in this case).
The terms inside the parentheses in Eq. 2 can be combined in multiple ways. Therefore, to identify an appropriate closed form expression for the direction of crack propagation, the machine learning software Eureqa 45 is utilized. Specifically, Eureqa identifies a least square fit of the values of P(F) computed by the BN model by using any combination or function of the three terms inside the parenthesis in Eq. 2. Eq. 3 shows the identified driving force representing the probability of the crack propagating along a given 〈111〉 direction.
where logistic(X) = 1/(1 + exp(−X)), w 1 and w 2 are constants quantifying the weights of the shear and principal stress term, respectively, c is a parameter identifying the baseline value, and 〈•〉 means that only positive values are retained. The values of w 1 , w 2 , and c are 10.5, 5, and −7.09, respectively. The functional form of Eq. 3 is naturally bounded between 0 and 1. In defining the appropriate direction of crack growth, P(F) = 0.5 has been selected as the threshold value to identify failure. For a given spatial position located on the crack front, if more than one slip direction is classified as failing, then the direction exhibiting the highest value of P(F) is used to define the crack propagation path. A comparison between the derived analytical form defining the direction of crack propagation and the experimental results is presented in Fig. 4. The crack propagation directions are depicted by unit vectors with the origins located at the crack front. For the experimental results in Fig. 4a, c representing crack growth after 56k and 112k cycles, respectively, the blue vectors represent the physical observations. While in the numerical predictions depicted in Fig. 4b, d, based on the analytical expression in Eq. 3, the green and red vectors correspond to correct and incorrect predictions of the crack direction, respectively. The dark gray regions behind the crack front represent the failed portions of the crack surface and shades of gray ahead of the crack front are proportional to the residual fatigue life based on the experiment. The histograms in the inset of Fig. 4b, d summarize the distributions for the prediction results. Orange tortuous lines represent the GBs intersected by the crack during the propagation process. Two different crack snapshots are depicted in Fig. 4. In one occurrence, the majority of the crack front is in the proximity of a GB (cycle 56k in Fig. 4a, b) and another instance representing a longer crack (cycle 112k in Fig. 4c, d). Based on the resulting histograms, the predictions at 56k cycles (Fig. 4b) represent the worst-case scenario, as the correct crack direction are only predicted 55% of the time, since the underlying crystal plasticity model, which provides data to the BN model, produces a less accurate description of the micromechanical fields near the GB. By construction, crystal plasticity models do not conserve the nature of the Burgers vector for the mobile dislocations representing plastic deformation at GBs and therefore limits their accuracy at the GB. 46 By comparison, at cycle 112k (Fig. 4d), more than 65% of the resulting predictions are in accordance with experimental observations, which is more representative of the overall model's ability to forecast the crack path.  Fig. 2. Specifically, the influences of the following variables are depicted: magnitude of first principal stress, σ I , misalignment between the principal stress axis and slip direction, α σI 111 h i , distance between the crack front and the grain boundary along a slip direction, GB dist , overall crack length, a, and slip system values of the maximum shear stress, τ α max , and maximum accumulated plastic shear, Γ α max Using machine learning and a data-driven approach to identify A Rovinelli et al.
The quantitative behavior of the proposed analytical model describing the direction of crack growth is probed by the use of a receiver operative characteristic (ROC) curve. 47 The ROC curve is commonly used in the literature to quantify the reliability of a binary classifier, by comparing the rate of correct versus incorrect predictions while systematically decreasing the threshold acceptance value of the classifier, in this case the crack driving force to describe the direction of crack advancement. In this graph, a curve overlaid with the 45°line would represent a model response equivalent to a random choice, i.e., no better than flipping a coin. While a Γ-shape curve embedded in the upper left corner of the graph (i.e., the point with coordinates (0,1)) would represent a model with a perfect predictive capability. Figure 5 provides a comparison by means of the ROC curve for the analytical formulation of the crack direction derived from the present work (i.e. black line, Eq. 3) with the results directly from the BN (i.e., blue line) and fatigue metrics commonly present in the literature (i.e., red curves, see Table 2 for a complete description of these fatigue indicator parameters). The ROC curves provide evidence that the data-driven analytical form of the crack driving force describing the crack direction exhibits a more quantitative behavior than the existing metrics available in the literature. Most of the metrics from the literature provide similar predictive capabilities and are relatively close to the random choice; e.g., 45°line. Moreover, the black line representing Eq. 3 from this work is further away from the random choice line and closer to the upper left corner than the responses of the existing metrics from literature. This entails that the proposed crack driving force is significantly more sensitive to the selected threshold value to describe the crack front advancement and is physically intuitive. Therefore, the values calculated via Eq. 3 can be directly related to the propensity of the crack to advance, including the crystallographic direction of the crack propagation. Further, this suggests that the present model can capture intragranular crack arrest, which is not necessarily true for existing metrics from the literature due to their random predictive behavior. The BN model provides the best quantitative behavior, because it includes the effects of all mined variables on the crack propagation direction. Yet, the objective in the present study is to identify an analytical expression for the crack driving force for future use by researchers and in engineering analyses. The arrows represent the direction of crack propagation either observed from the experiments (blue) or computed (red and green). The green color represents correct predictions and the red color represent incorrect predictions, which are summarized in the histograms in the inset of (b, d). Orange tortuous lines represents the GBs on the crack surface. Additionally, the shades of gray represent the residual life on the crack plane (dark gray regions behind the crack front have already failed). The crack structure and predicted crack directions are shown (a, b) after 56k cycles and (c, d) after 112k cycles Using machine learning and a data-driven approach to identify A Rovinelli et al.

DISCUSSIONS
The crack driving force describing the direction of short fatigue crack propagation proposed in this work exhibits distinctive features but also supports previous works. Noticeably, the analytical form expressed in Eq. 3 does not include the plasticity ahead of the crack tip, but contains terms for the shear stress along the crack propagating direction and the principal stress state. The majority of the existing metrics, describing fatigue crack propagation direction and rate, available in the literature Table 2 are proportional to the accumulated microplasticity ahead of the crack tip 21,[32][33][34][35]48 ; however, this assumption is not supported by the available data. The comparison of Fig. 3b, e shows opposite effects of Γ α max and σ I accounts for the effect of the opening stress on the crack plane. Furthermore, Fatemi and Socie 22 suggested that the ratio between the shear and opening stress terms is approximately 0.5, which is in agreement with the ratio of the weights, w 2 and w 1 , in the present analytical expression of 0.48.
From a crystallographic perspective, the first principal stress for a BCC material is analogous to the opening stress on the critical plane in a FCC material. In FCC materials, the distribution of the four available slip planes facilitates the identification of a critical plane because the opening stress acting on each plane is distinct. In BCC materials, the availability of 42 potential slip planes produces a geometrical uncertainty to identify the critical plane, thus generating a low correlation between σ N and the direction of crack propagation (see Fig. 2a). In contrast, by considering pencil-glide deformation, the effect of an opening stress on the critical plane can be replicated by the first principal stress magnitude (σ I ) and orientation with respect to the propagation direction, α σI 111 h i . The fact that the driving force does not include σ N is in accordance with the pencil-glide approach for small crack propagation in BCC materials that we previously proposed. 38 The slip mediate nature of the small crack propagation mechanics is supported by experimental and numerical data. The optimal orientation angle of the first principal stress axis shown in Fig. 3c (i.e., 60°) and the absence of plasticity in Eq. 3 are not resulting from the competition or transition between ductile and cleavage fracture. 40,49,50 In BCC materials, brittle fracture may occur along the {100} cleavage plane aligned with the maximum principal stress axis. 51 The angle between a 〈111〉 slip direction and the three closest {100} planes is α = 54.7°, which is very similar to the identified optimal angle in this study, 60° (Fig. 3c). Therefore, to identify if the presence of the principal stress axis is related to brittle fracture, the correlation between the alignment of the first principal stress axis and the closest cleavage plane α σI f100g with the crack driving force, thereby negating the possibility of a brittle fracture mechanism in this scenario. These results are in accordance with the experiment, in which crystallographic failure is observed (see Fig. 1a or ref. 37,38 ), thus building confidence in a slip-mediated model to describe the small crack driving force. The intragranular crack deflections present in the analyzed dataset, as depicted by the orange dashed lines in Fig. 6, suggest the influence of the first principal stress as a macroscopic driver for the crack direction. The orientation of the first principal stress axis is in turn, related to the geometry of the crack, strongly influenced by the macroscopic loading conditions, and mediated by the local grain orientation. At the intragranular length scale, small cracks propagate in a shear dominated mode due to the underlying crystallography. 11 However, it has been reported that a small crack can alter its propagation direction according to the orientation of the remote stress field for near threshold stress intensity factor values. 25 When the small crack front excessively deviates from the average crack plane, the crack deflects to restore the global mechanical equilibrium required to satisfy the remote boundary conditions. The crack deflections indicated by the orange lines labeled 3 and 4 in Fig. 6 occur at 112k cycles, as the crack front is located in the core of a grain and at the maximum distance from the initial notch plane (Δz in Fig. 6) observed during the experimental crack propagation. This behavior is captured by the model, as shown in Fig. 4c, d. Thus  Table 2) Fig. 6 The crack surface is colored accordingly to its orientation in the global reference system. An abrupt change in color represents a deflection of the crack during propagation. Black tortuous lines represent GBs, while the orange dashed lines highlight the locations of intragranular crack deflections; Δz represents the distance between the original notch plane (z = 170 μm) and the height at which deflections 3 and 4 occur (z = 80 μm) Using machine learning and a data-driven approach to identify A Rovinelli et al.
far, we have discussed the crack driving force to describe the direction of crack advancement (Eq. 3). A similar analysis was performed to identify the associative speed of crack propagation and an analytical form for the rate of crack advancement is derived from the associated BN. From Fig.  2b, the variables exhibiting the strongest correlation with the crack propagation rate are the overall crack length (a, NMI = 31%) and the maximum accumulated plastic shear (Γ α max , is inversely correlated to the crack propagation rate (Fig. 3f), although this unexpected result may be a numerical artifact of the crystal plasticity model or data mining procedure or be due to insufficient data. The overall crack length, a, exhibits an almost quadratic influence on the propagation rate (Fig. 3e). The GB dist does not have a pronounced influence on crack propagation rate; nevertheless, the propagation rate constantly increases until the crack front is a few microns away from the GB, then it decreases (Fig.  3g). This result is in accordance with experimental observation obtained by Schäf et al. in mild steel 30 ; however, due to its weak correlation, the GB dist has not been used in the formulation of the associated propagation rate. The above analysis entails that the propagation rate needs to include at least two terms: one for accumulated plastic strain and the other for the influence of the overall crack length.
Eureqa 45 was used to identify the functional form of the analytical expression, yet it should be noted that the expression for the crack propagation rate is the result of observing only a small portion of the fatigue life for a single specimen. The resulting predictions for both the BN and the analytical form of the rate of propagation are not as promising as those obtained for the crack direction (Eq. 3). Albeit, both models for rate of crack growth exhibit higher correlations and lower relative errors than the fatigue metrics available in the literature (Table 2). In this comparison, the values of the surrogate fatigue metrics at the crack front are assumed to be directly proportional to the crack propagation rate. 21 Fig. 7a, b depict Pearson's correlation coefficient (Eq. 25) and the root mean square relative error (RMSRE) (Eq. 26), respectively, obtained by comparing the prediction of each model's fatigue crack growth rate against the observed experimental data. The BN model and the analytical form identified with Eureqa show a Pearson's correlation coefficients value of~0.65, while the fatigue metrics from the literature possess a correlation value of~0. 25. In general, a correlation greater than 0.3 is considered significant. Despite the high correlation values for the BN and the analytical expression for the crack growth rate, the error analysis shows that neither model can be considered predictive. In fact, both models exhibit a RMSRE between 3 and 4 (i.e., 300% and 400%). Thus, the BN and the analytical expression for the crack growth rate capture the overall trend in the observed data but cannot provide reliable predictions. Furthermore, the crack growth rate value determined from the experiment has larger error associated with it, due to the resolution of the phase-contrast tomography characterization and the uncertainty introduced by the crack segmentation procedure. Thus the location of the crack front is accurate to ±1-2 voxels (±1.4-2.8 μm). Additionally, since the phase-contrast tomography was performed at distinct cyclic intervals, information about the local crack growth rate at each cycle was not available. Due to its unreliable predictive capabilities, the analytical form of the crack growth rate will not be reported here, as it is not representative of general fatigue crack growth rate behavior. We believe that the poor predictive behavior for the crack growth rate originates from examining a single, relatively sparse dataset, in which the crack only crosses~12 GBs, thus additional datasets from multiple samples are necessary to determine an appropriate analytical expression for the rate of small crack propagation.
Throughout this work, we showed how to utilize machine learning techniques to distill a useful and simple analytical expression for the small crack driving force. Two BN models have been constructed for the direction and speed of fatigue crack propagation, and the correlations resulting from these BN models are leveraged to identify relevant micromechanical and microstructure variables that influence the small crack driving force. From these variables, an analytical expression to predict the threshold and direction of crack advancement (Eq. 3) and the associated speed of crack growth are identified. The analytical form of the direction of crack growth presents more reliable predictions than other models in the literature. While the BN and the analytical form of the rate of crack growth presents high correlations between predictions and experimental observations (Fig. 7a), the RMSRE (Fig. 7b) suggests that the available data may be insufficient to produce a predictive model. In both cases, care should be taken before application of these models, as the dataset represents only one experimental specimen of a specific alloy. Although optimism exists, as the presented methodology is flexible and can be used in a variety of problems to gain useful insights on the effect of the governing variables and identify the driving force for complex engineering problems. Fig. 7 Comparisons of the fatigue crack growth rate predictions for the Bayesian network, analytical expression proposed in this work, and common fatigue metrics from the literature via a the Pearson's correlation coefficient and b root mean square relative error Using machine learning and a data-driven approach to identify A Rovinelli et al.

METHODS
The material analyzed in this work is a beta-metastable titanium alloy, Ti-55531, which was annealed to obtain a full beta microstructure with a grain size of 65 μm. A notch was cut into the specimen via focus ion beam with geometry of 2 μm in height, 140 μm in width, and 25 μm in depth. Diffraction contrast tomography was used to characterize the microstructure. Afterwards the sample was subjected to a stress controlled, cyclic loading, with minimum and maximum stress values of 10 and 320 MPa, respectively (R = 0.03). The fatigue test was interrupted at maximum load, every 1000 loading cycles, and a phase-contrast tomography scan was performed in situ to record the crack geometry. The phase and diffraction contrast tomography reconstructions were consolidated to create a complete description of the propagating crack front and the surrounding microstructure. The experiment was performed at the European Synchrotron Radiation Facility at beamline ID-19. The reader is referred to refs. 14,15 for more details.
The micromechanical fields not available during the experiment were computed by means of crystal plasticity simulations, for which results of diffraction contrast tomography provide the 3D microstructure and the results of the phase-contrast tomography provide the crack morphology and evolution during cycling. One crystal plasticity simulation has been performed for each crack geometry. The adopted computational framework is a parallelized implementation 53 of the small strain, elastoviscoplastic fast Fourier transform based crystal plasticity solver proposed by Lebensohn et al. 54 For more detail on the simulation setup the reader is referred to Rovinelli et al. 37 Simulations and experimental results are consolidated into a multimodal dataset. The crack propagation direction and the associated propagation rate were reconstructed by utilizing a slip direction-based procedure. The data are collected from the crack front by using a non-local 2D data mining procedure that is physically based along potential crack propagating directions as determined from the pencil-glide mechanics within this BCC material. Within the data mining procedure, the data are collected for a given slip direction transcribed along a predetermined length of 6.3 μm along the advancing crack front and over a plane with width of 2.8 μm. These values are determined based upon a sensitivity analysis and informed from the resolution of the associated characterization techniques and corresponding setup of the crystal plasticity simulations. 38 The data are then averaged over the 2D plane, while point-wise macroscopic variables, i.e., the alignment of the principal stress axis, are computed by averaging data over all slip systems belonging to the pencil-glide slip direction. The micromechanical and microstructural variables analyzed in the BN framework for correlation with the direction and rate of small fatigue crack growth are summarized in Table 1.
Further, the analytical models for the crack driving force derived in the present work are compared to common models from the literature, denoted as fatigue indicator parameters. The fatigue indicator parameters used for comparisons in the present work are summarized in Table 2. 21,[32][33][34][35] The resulting data are used to train the BNs. To obtain uniform prior distributions, the data associated with the crack propagation direction have been randomly sampled, while a stratification procedure has been applied to the ones pertaining to the crack propagation rate. Data stratification is adopted to account for the sparse availability of crack propagation rate data (~650 data points). In both instances, all the sampled data are used, while if a crack front location is stationary, it is sampled only once. A K-fold cross validation procedure is then applied to check the independence of the BN parameters from the data. Results showed only a minor influence of the data, based on the overall reliability and precision metrics. For more details about the crack propagation reconstruction, data mining, and sampling procedures, the reader is referred to Rovinelli et al. 38 The overall crack length a is computed assuming a quarter-elliptic corner crack with major and minor axes of length a and b, respectively. The minor axis has been recorded from the reconstructed phase diffraction contrast tomography images. The major axis has been computed projecting the crack surface on the plane perpendicular to the loading axis (i.e., S C ) according to Eq. 13.
Two BNs were constructed: (i) one describing the crack propagation direction and (ii) the other representing the associated crack propagation rate. The rationale for the construction of two BNs, as opposed to a single BN, is that the crack propagation direction and rate are influenced by different variables, as shown in Figs. 2 and 3. For both BNs, the augmented Naïve-Bayes structure has been selected 55 , which assumes conditional dependence between the crack behavior and all the variables listed in Table 1, thus allowing the use of machine learning to establish the conditional dependence between the involved variables. The commercial software Bayesialab 56 was utilized to construct the BNs by identifying the correlations embedded in the sample data using machine learning.
The correlations embedded in the joint probability tables defining the BN's structure can be quantified by means of mutual information. 57 For discrete distributions, the mutual information between two variables (i.e. X and Y) is given by where x and y are the indices specifying the states of the variable X and Y, respectively, p(x) and p(y) are the marginal probabilities, and p(x, y) represents their joint distribution. To obtain comparable correlation values for the analyzed variables in Fig. 2, the mutual information have been normalized by the entropy (H) for the given variable, X representing either the direction or speed of the propagating crack. Eq. 14 Distance between the crack front and the grain boundary along a slip direction To compute the correlations between the numerical predictions (X) and experimental observations (Y), the Pearson's correlation coefficient has been adopted where N is the number of observation, i is an index representing a specific observation, μ X and μ Y represent the mean value of predictions and observations, respectively, and σ X and σ Y represent their variance. To further quantify the predictive ability of the investigated models, the RMSRE between predictions and observations has been computed:

Data availibility
The authors will make the data utilized in this work available, upon request. It is understood that the data provided will not be for commercial use.