Damage mechanism identification in composites via machine learning and acoustic emission

Damage mechanism identification has scientific and practical ramifications for the structural health monitoring, design, and application of composite systems. Recent advances in machine learning uncover pathways to identify the waveform-damage mechanism relationship in higher-dimensional spaces for a comprehensive understanding of damage evolution. This review evaluates the state of the field, beginning with a physics-based understanding of acoustic emission waveform feature extraction, followed by a detailed overview of waveform clustering, labeling, and error analysis strategies. Fundamental requirements for damage mechanism identification in any machine learning framework, including those currently in use, under development, and yet to be explored, are discussed.


INTRODUCTION
Thousands of years predating the wheel and written language, potters across the world, from the Fertile Crescent to eastern Asia, listened for sounds emanating from ceramics during cooling as a prognosis of structural integrity 1,2 . At the start of the Bronze Age in 3000 BC, smelters in Asia Minor documented the "tin cry," a characteristic crackling produced during forming, to guide tin mining and alloy processing 3 . By the eighth century, the Persian alchemist Jabir ibn Hayyan reported on the sounds emitted by Venus (copper) and Mars (iron) as part of a treatise on seven metals; his work contributed to alchemic theories about metallic composition that were widely accepted for a millennium 4,5 .
We now call these phenomena acoustic emission (AE): the release of accumulated strain energy from material displacement in the form of elastic waves 6 . While most historical documents report damage activity audible to humans, AE signals have been captured from damage sources down to the order of nanometers in size. For example, AE has been used to indicate the formation of polar nanoregions in relaxor ferroelectrics 7 . In practice, the AE technique is used to triangulate damage source locations 6 , identify areas of concern in the material bulk 8 , and evaluate the severity of incurred damage 9 .
An outpouring of AE research efforts emerged in the 1980s 10-12 , but these were limited by both the experimental hardware and the data storage constraints of the time. The narrow frequency response of resonant transducers forced waves to be recorded as damped sine waves, losing detailed frequency (and therefore time domain) resolution. Moreover, storage and memory constraints mandated that waves be preserved in terms of time-domain parameters such as first peak amplitude, energy, and duration (Table 1), which resulted in further information loss 6 . The development of broadband transducers combined with advances in computer hardware during the 1990s enabled researchers to record the full frequency spectrum of a signal and digitally store the waveform. This method, known as modal AE, is now standard practice 6,13,14 .
Armed with improved waveform resolution in the time and frequency domains, researchers could discern previously obfuscated differences between waveforms. A new era of AE studies began, aimed at evaluating the chronology of damage progression by leveraging a range of specimen geometries [14][15][16] , architectural and processing choices 17 , and loading conditions 18 .
These efforts laid the foundation for a hypothesis that we are still wrestling with today: that there are features encoded in AE signatures that can be used to discriminate between damage modes. The ability to discriminate damage modes from AE information has the potential to revolutionize non-destructive structural health monitoring tools for scientific investigations and practical applications 19 . Resolving the local damage chronology would shed light on how processing and microstructural landscapes are tied to the failure response of structural materials. A detailed understanding of the damage accumulation and failure envelope would significantly broaden the potential application space for structural materials of more complex geometries [20][21][22][23][24] . As such, high-fidelity damage mode identification of AE signals has become a core objective of modern AE investigations.
Initial efforts in damage mode identification used only one or a few features of the waveform to describe the emitting damage mode [25][26][27] . While these low-dimensional representations are useful for indicating broader trends in damage mode shifts, they obstruct the high-fidelity characterization of signals. It is well known that certain waveform features, such as peak frequency, are affected by propagation pathways [28][29][30] and difficult to control experimental factors 31 . These effects are included in the received waveforms and currently influence subsequent analyses; thus, they lower the confidence with which damage modes can be assigned for any one waveform. In general, many features can be used to describe a waveform (Table 1); such features are complex, and often interdependent. Furthermore, it is possible that while one waveform feature may be an obvious discriminator between some mechanisms, this may not be the case for all mechanisms 9,27 . Therefore, multivariate (high-dimensional) analyses are necessary to explore relationships between features and tie them to the related damage mode.
Historically, the AE community has used the well-established k-means algorithm for such multivariate analyses [32][33][34][35] . Although k-means is in general a robust and repeatable algorithm with many applications, it is not necessarily well-suited to AE data. K-means finds isotropic clusters, and so it is best suited for data that has this structure 36 ; it is unclear if AE data abide by this constraint, as up to this point there has been limited analysis of the validity of partitions of AE data by k-means.
The recent, rapid development of machine learning (ML), also termed pattern recognition, tools has greatly increased the range of data structures that can be analyzed 36,37 . With pre-packaged ML algorithms 38 now readily available, the AE community has easy access to analytical pathways for high-dimensional AE representations in the identification of waveform-damage mode relationships. Despite the growing abundance of available statistical tools, the majority of analyses are still limited to specific, well-understood techniques. Therefore, a discussion of canonical and emerging approaches, their strengths and weaknesses, and their applicability to the outstanding question of damage source identification via AE is needed.
In this review, we discuss current ML efforts to identify damage modes based on their AE signatures. We restrict our discussion to the analysis of composites as a model material system, as their multi-phase structures host distinct damage mechanisms well suited for study by AE (Fig. 1a). We begin with a description of wave propagation physics in solids as it relates to ML. We follow with an overview of how waveforms are represented in feature space (an example of which is shown in Fig. 1b) and subsequently relate feature choice to the current understanding of wave propagation. Then, we highlight clustering algorithms that have been employed to distinguish source mechanisms (the outcome of which is shown schematically in Fig. 1c) and provide guidelines for effective use. Finally, we review prior efforts that have utilized the aforementioned tools and propose avenues for future work.

WAVE PROPAGATION IN SOLID MATERIALS
Historically, researchers have used Lamb's homogeneous equations (a special case of the wave equation) to understand AE phenomena 13,39 , where it is assumed that acoustic waves are confined to a thin plate of thickness h. This model has been useful for engineering applications as many modern manufactured composites typically follow this constraint 40 . When waves are modeled this way, the predicted out-of-plane flexural (w) and in-plane extensional (u_0, v_0) displacements are governed by:

$$D \nabla^4 w + \rho h \frac{\partial^2 w}{\partial t^2} = 0$$

$$A \left[ \frac{\partial^2 u_0}{\partial x^2} + \frac{1-\nu}{2} \frac{\partial^2 u_0}{\partial y^2} + \frac{1+\nu}{2} \frac{\partial^2 v_0}{\partial x \partial y} \right] = \rho h \frac{\partial^2 u_0}{\partial t^2}$$

(with the analogous equation for v_0), with constants A and D (the bending stiffness):

$$A = \frac{Eh}{1-\nu^2}, \qquad D = \frac{Eh^3}{12(1-\nu^2)}$$

where ρ is the density, E is the elastic modulus, and ν is the Poisson ratio.
While this model is a powerful tool for predicting the basic properties of elastic waves in solids, the assumptions of elastic isotropy and material homogeneity in this model are often violated, which prevents its use for damage mode discrimination. The microstructure (e.g., pores, grain boundaries, and architectural heterogeneities) affects density and introduces scattering sites 14,15,[41][42][43] .
Current efforts model AE through a process known as forward modeling, wherein dynamic equilibrium equations are solved via finite element method (FEM) calculations [44][45][46][47][48][49][50] . In these studies, a wave (or a damage event that emits a wave) is simulated and modeled as it propagates through the material. The wave is then passed through a set of filters to mimic the sensor response to the received wave.
Forward modeling allows researchers to explore the effects of microstructural heterogeneities and determine potentially useful waveform characteristics without experimental evidence. Sause and Horn demonstrated differences between damage modes in the time-frequency domain with an FEM model of a carbon fiber-reinforced composite 48 and supported the claim that changes to the power carried by a frequency band are minimal in the near field 48,51 . This suggests that partial power (Table 1) is a salient feature. Other researchers have employed forward modeling to evaluate the efficacy of commonly used AE features in the identification of damage modes [52][53][54][55] . Gall et al. showed that in the absence of defect structures, rise time, amplitude, and energy scatter are dependent on source-to-sensor distance, while partial power is not 53 . Similarly, Aggelis et al. used numerical simulations to show that RA values (Table 1) increase with increasing source-to-sensor distance 54 and provide evidence that the density of scattering interfaces distorts the waveform in the time domain.

Fig. 1 The problem statement of an AE-ML framework. a When damage occurs in a composite, accumulated strain energy from various damage mechanisms is released as AE into the surrounding medium until it is recorded by broadband sensors. b The recorded waveforms are described with features hypothesized to encode information about the source damage mechanism. c Machine learning uses these features to partition waveforms, which can then be labeled according to their source damage mechanism.
In general, forward modeling studies guide us to two complementary conclusions: (i) if there is information encoded in AE that enables damage mode discrimination, it likely exists within the frequency domain; and (ii) the high degree of perturbation imposed on time-domain features by experimental factors likely renders them inadequate for differentiating source mechanisms. These deductions highlight the necessity for determining salient features for ML applications, where the accuracy of the output is directly dependent on the quality of feature choice.

MACHINE LEARNING
ML describes algorithms that learn from input data 56 . Broadly speaking, ML tasks are categorized as clustering (the grouping of similar objects), regression (model selection), or forecasting (prediction of future events). An ML algorithm accomplishes these tasks either in an unsupervised or supervised manner, depending on the input information. In unsupervised algorithms, raw data are supplied and a model is generated [57][58][59] . In contrast, supervised algorithms build a model based on a ground-truth data set and subsequently interpret new data according to this model 36,37,60 . As ground-truth data sets rarely exist (exceptions are detailed in the section "Toward improved AE-ML frameworks"), researchers are constrained to unsupervised clustering methods for categorizing AE data, whose workflow follows 37 :
1. Experimentation: a number, n, of observations are collected
2. Feature extraction: observations are vectorized by extracting d pertinent features (i.e., each waveform is represented by a d-dimensional vector)
3. Metrics and transformations: a metric or similarity function is chosen and the set of feature vectors undergoes an optional transformation
4. Machine learning algorithm: an algorithm is selected and waveforms are clustered
5. Error analysis: post-clustering analysis is performed to assess the validity of results
The following discussion focuses on steps 2-4. A general overview of how AE data are obtained (step 1) is provided in ref. 6 and a description of error quantification (step 5) is given in the section "Toward improved AE-ML frameworks." The authors note that clustering tasks are also identified as pattern recognition tasks 60,61 ; these terms are equivalent, and here we use the term clustering for consistency.
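As a minimal, hedged sketch of the five-step workflow above (the synthetic waveforms, spectral feature choices, and cluster count are illustrative assumptions, not recommendations from the review):

```python
# Sketch of the five-step unsupervised AE clustering workflow, assuming
# waveforms are already loaded as rows of a NumPy array. The feature
# choices here (peak frequency, spectral centroid) are illustrative only.
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
waveforms = rng.normal(size=(200, 1024))   # step 1: n = 200 synthetic "signals"

def extract_features(w, fs=1e7):
    """Step 2: represent each waveform by d spectral features."""
    spec = np.abs(np.fft.rfft(w)) ** 2
    freqs = np.fft.rfftfreq(w.size, d=1 / fs)
    peak_freq = freqs[np.argmax(spec)]
    centroid = np.sum(freqs * spec) / np.sum(spec)
    return np.array([peak_freq, centroid])

X = np.array([extract_features(w) for w in waveforms])

X_scaled = StandardScaler().fit_transform(X)   # step 3: metric/transformation
labels = KMeans(n_clusters=2, n_init=50, random_state=0).fit_predict(X_scaled)  # step 4
# step 5 (error analysis) is discussed later in the review
print(labels.shape)
```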

Feature extraction
The accuracy of a clustering algorithm is foremost contingent on feature choice (i.e., objects must be represented by salient features in order to find meaningful partitions). To illustrate this, consider a researcher who aims to differentiate a set of ducks and cats (Fig. 2a). The researcher could represent each animal by color, resulting in the partition in Fig. 2b, or by their ability to fly (Fig. 2c). While both partitions are valid, only Fig. 2c is a useful result if the researcher aimed to differentiate species. This example is formalized by Watanabe's ugly duckling theorem, which states that any two objects are equally similar unless a bias is applied when clustered 37,62 . A corollary of this theorem is that, when described by improper features, it is possible to obtain compact and well-separated clusters that do not represent the desired categories. Without salient feature representation, researchers will incur a substantial loss in discriminating power.

Fig. 2 Feature selection determines the quality of clusters. a The initial group of objects is partitioned in b based on color and in c based on flight capability. The researcher must tailor the feature set to achieve the desired partition.
Historically, AE waveforms have been represented by parameters in the time domain (Fig. 3a), the frequency domain, or by composite values comprising two or more base values 13,28,63-74 . Table 1 lists common time domain, frequency, and composite features, where x(t) is the measured signal and F[*] is the Fourier transform operator. The predominance of time domain features appears to be for historical reasons, as AE was originally stored in terms of these parameters; today, these parameters are readily extractable either through commercial AE software or in-house programs.
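As a hedged illustration, a few of the features of the kind listed in Table 1 can be computed directly from a digitized waveform; the sampling rate, toy damped sine wave, and frequency band edges below are assumptions for demonstration:

```python
# Minimal sketch of extracting time- and frequency-domain features from a
# digitized waveform x(t). Sampling rate and band edges are assumptions.
import numpy as np

fs = 1e7                                   # 10 MHz sampling rate (assumed)
t = np.arange(2048) / fs
x = np.sin(2 * np.pi * 300e3 * t) * np.exp(-t * 2e4)   # toy damped sine

# Time-domain features
amplitude = np.max(np.abs(x))
energy = np.sum(x ** 2) / fs

# Frequency-domain features via the FFT
spec = np.abs(np.fft.rfft(x)) ** 2
freqs = np.fft.rfftfreq(x.size, d=1 / fs)
peak_frequency = freqs[np.argmax(spec)]

# Partial power: fraction of spectral energy in an (assumed) band
band = (freqs >= 200e3) & (freqs < 400e3)
partial_power = np.sum(spec[band]) / np.sum(spec)
```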
Despite their prevalence, the efficacy of certain traditional features is unclear. For example, many features in the time domain are dependent on the signal time-of-arrival (ToA) (e.g., signal duration and RA value). If parameters derived from ToAs have large scatter for the same damage source, then this will negatively impact the ability to discriminate between damage modes. This is evidenced by forward modeling studies, which cast doubt on the salience of time domain features and instead advocate for the use of frequency features.
While there is agreement that frequency features should be used as inputs to ML, AE waveforms are non-stationary (their frequency content changes in time), and the optimal way to extract frequency information remains unclear. The fast Fourier transform (FFT) is typically used to extract frequency information, but it systematically misrepresents the frequency content of non-stationary waves 75 . For this reason, researchers have espoused time-frequency representations 45,[76][77][78][79][80] .
Fig. 3 Feature extraction pathways for AE waveforms (a Parameter Features; b Continuous Wavelet Coefficients; c Wavelet Packet Decomposition Energy; d Marginal Hilbert Spectrum). b The continuous wavelet transform is applied to the signal using Eq. (6); these CWT coefficients are then used as input to the feature vector. c Wavelet packet decomposition takes the original signal and down-samples it into 2^j subsignals, each containing frequency content according to Eq. (7); the energy contained in each of these subsignals is then used as features. d Features from the marginal Hilbert spectrum are extracted; the frequency centroid of each IMF is used in this example. Panel b is adapted from ref. 79 with permission from Elsevier. Panel c is adapted from ref. 83 with permission from Elsevier. Panel d is adapted from ref. 89 with permission from Elsevier.

The continuous wavelet transform (CWT) is a popular time-frequency representation for inspecting non-stationary signals (Fig. 3b). It can be used for either qualitative AE analyses 45,76,81 or quantitative characterization 79,82 . The CWT is described by:

$$\mathrm{CWT}(a, b) = \frac{1}{\sqrt{a}} \int_{-\infty}^{\infty} x(t) \, \phi^* \!\left( \frac{t - b}{a} \right) dt \quad (6)$$

where ϕ* is the complex conjugate of the mother wavelet ϕ (a function satisfying certain criteria 79 ), a is the scale parameter corresponding to a pseudo-frequency, and b is the translation parameter. Common choices for ϕ include the Daubechies wavelet and the Morlet wavelet 79,83 . Eq. (6) is a convolution calculation; however, when a and b are considered fixed, Eq. (6) is the inner product of the time series with the mother wavelet. This leads to an intuitive interpretation: high values of the CWT at time b indicate a large frequency content corresponding to a. The authors are aware of only one method to extract features with the CWT: calculate all desired values of CWT(a, b) and represent them as a column vector 79,84 . For example, if an AE signal has 1000 time domain samples and CWT(a, b) is calculated for N values of frequency, the result is a feature vector whose length is 1000 × N.
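A hedged sketch of this CWT feature-extraction scheme follows; the toy signal, scale grid, and choice of the Morlet wavelet are assumptions, and the PyWavelets package is used:

```python
# Sketch of CWT-based feature extraction: compute CWT(a, b) over a grid
# of scales and flatten the coefficients into one long feature vector.
import numpy as np
import pywt

fs = 1e7
t = np.arange(1000) / fs
x = np.sin(2 * np.pi * 400e3 * t) * np.exp(-t * 5e4)   # toy waveform

scales = np.arange(1, 33)                  # N = 32 pseudo-frequencies (assumed)
coeffs, freqs = pywt.cwt(x, scales, 'morl', sampling_period=1 / fs)

# coeffs has shape (N, len(x)); flatten to a feature vector of length 1000 x N
feature_vector = np.abs(coeffs).ravel()
print(feature_vector.shape)   # (32000,)
```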
An alternate method for interrogating non-stationary signals is the wavelet packet transform (WPT) 83,[85][86][87] . The WPT is a set of decompositions of the original signal, x(0,0). At each decomposition iteration, two down-sampled signals containing the original's low-frequency (approximation) and high-frequency (detail) components are retrieved. The decomposition process continues on both subsignals until a specified decomposition level is reached. This is shown schematically in Fig. 3c. A signal decomposed to level j will produce 2^j waveforms, x(j, i), i ∈ [1, 2, . . . , 2^j], with non-overlapping frequencies in each subsignal 86,88 . The relative energies (the portion of the original signal's energy contained within each down-sampled signal at the final decomposition level) are then extracted and used as features 83,86,88 .
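A minimal sketch of wavelet packet energy features, again with PyWavelets; the decomposition level, db4 wavelet, and toy signal are assumptions:

```python
# Sketch of WPT energy features: decompose to level j and use the
# relative energy of each of the 2**j subsignals as the feature vector.
import numpy as np
import pywt

fs = 1e7
t = np.arange(1024) / fs
x = np.sin(2 * np.pi * 500e3 * t) * np.exp(-t * 5e4)   # toy waveform

j = 3
wp = pywt.WaveletPacket(data=x, wavelet='db4', maxlevel=j)
nodes = wp.get_level(j, order='freq')      # the 2**j = 8 subsignals

energies = np.array([np.sum(n.data ** 2) for n in nodes])
relative_energies = energies / energies.sum()   # features sum to 1
print(relative_energies.shape)   # (8,)
```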
While the CWT and WPT are useful for inspecting non-stationary signals, these approaches decompose signals with a fixed set of basis functions that may not be representative of the underlying process. The Hilbert-Huang transform (HHT) addresses this issue by extracting the instantaneous frequencies and amplitudes using basis functions defined by the signal itself 75,77,89 . An HHT analysis consists of two steps: (i) empirical mode decomposition (EMD) of a signal into a set of intrinsic mode functions (IMFs), and (ii) a Hilbert transformation of the IMFs.
In a well-behaved time series x(t) (symmetric about 0 with no superimposed waves), the Hilbert transform y(t) is defined as 75,77 :

$$y(t) = \frac{1}{\pi} \, \mathrm{PV} \int_{-\infty}^{\infty} \frac{x(\tau)}{t - \tau} \, d\tau$$

where PV is the Cauchy principal value of the improper integral. From this, the analytic function z(t) is defined:

$$z(t) = x(t) + i y(t)$$

This has a natural phasor representation:

$$z(t) = a(t) e^{i \theta(t)}$$

where a(t) is the instantaneous amplitude of x(t) and θ(t) is the instantaneous phase of x(t):

$$a(t) = \sqrt{x^2(t) + y^2(t)}, \qquad \theta(t) = \arctan\!\left( \frac{y(t)}{x(t)} \right)$$

The instantaneous frequency ω(t) is then calculated:

$$\omega(t) = \frac{d\theta(t)}{dt}$$

In practice, x(t) is unlikely to be well behaved; however, EMD ensures that each IMF, c_i, is. By EMD, the frequency content of the first IMF is the highest and decreases monotonically with index i. For AE data, the lowest-order IMFs isolate frequency information from the signal, thereby expelling statistical noise. An analogous approach is ensemble empirical mode decomposition (EEMD), which is more robust than EMD and should be used when computationally affordable 75 . The readers are referred to Huang et al. 75 for details of executing E/EMD; in addition, there are MATLAB 88 and Python 90,91 packages that readily perform both E/EMD and the Hilbert transform.
After EMD, the Hilbert transform is applied to each IMF. For a given IMF, we can define:

$$H_i(\omega, t) = \begin{cases} a_i(t), & \omega = \omega_i(t) \\ 0, & \text{otherwise} \end{cases}$$

which assigns a time value and its related instantaneous frequency to the instantaneous amplitude of c_i. The sum of all H_i is the Hilbert spectrum, which represents the extracted instantaneous frequency and amplitude from all c_i 77 . An alternative visualization of the Hilbert spectrum is the marginal Hilbert spectrum, defined as:

$$h(\omega) = \int_0^T H(\omega, t) \, dt$$

where T is the length of signal x(t) (Fig. 3d). The marginal Hilbert spectrum gives the probability of finding an instantaneous frequency at any given point in the signal, which Huang et al. contrasted with the FFT, where the spectrum represents the total energy persisting at a frequency throughout the signal 77 .
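A hedged sketch of step (ii), the Hilbert transformation, applied to a single well-behaved mode standing in for one IMF (EMD itself is performed by the packages cited above); the toy 50 kHz mode and sampling rate are assumptions, and SciPy supplies the analytic signal:

```python
# Sketch of the Hilbert-transform step of the HHT on a toy IMF: recover
# the instantaneous amplitude a(t) and instantaneous frequency w(t).
import numpy as np
from scipy.signal import hilbert

fs = 1e6
t = np.arange(4096) / fs
c = np.sin(2 * np.pi * 50e3 * t)        # toy IMF: a pure 50 kHz mode

z = hilbert(c)                          # analytic signal z(t) = c(t) + i y(t)
a = np.abs(z)                           # instantaneous amplitude a(t)
theta = np.unwrap(np.angle(z))          # instantaneous phase theta(t)
inst_freq = np.diff(theta) * fs / (2 * np.pi)   # w(t)/2*pi in Hz

# away from the record edges, the instantaneous frequency recovers 50 kHz
print(np.median(inst_freq[100:-100]))
```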
While the HHT has the potential to reveal hidden insights in the data, its applicability for feature extraction is uncertain 89,[92][93][94] . Although previous work by Hamdi et al. suggested that HHT features could be used for unsupervised ML, their findings are inconclusive: their input data were labeled on the basis of other studies 95,96 that employed ML techniques unverified by experiments 89 . In general, the properties of the HHT are not as well characterized as those of the FFT, CWT, or WPT, and for this reason it has not been as widely adopted.

Metrics and transformations
Once features are extracted, a metric, or similarity function, d(*, *), can be used to define the degree of similarity between observations. In general, d must satisfy:
1. d(x, y) ≥ 0 (non-negativity)
2. d(x, y) = 0 if and only if x = y (identity)
3. d(x, y) = d(y, x) (symmetry)
4. d(x, z) ≤ d(x, y) + d(y, z) (triangle inequality)
In some cases (e.g., an algorithm working on local neighborhoods), requirement 4 can be relaxed 97 .
The efficacy of a given metric is tied to the geometry of the input data, which manifests itself in the quality of the resulting partition 37 . For example, neighborhood-based distances are better suited for grouping nested circles than Euclidean distance 97 . While there is flexibility in choosing a metric, an alternative strategy is to pre-process feature vectors by a transformation 36,37,60 .
Principal component analysis (PCA) is a transformation used for decorrelation, a requirement for k-means, and for dimension reduction 34,61,79,98 . PCA is a linear and orthogonal change-of-basis calculation whereby the new coordinate system describes the directions of maximum variance. It transforms the ordered basis (e_1, e_2, . . . , e_d) to a new basis (e′_1, e′_2, . . . , e′_d) with the property that e′_1 is oriented along the direction of greatest variance, e′_2 is oriented along the direction of second-greatest variance, and so on, with a diagonal covariance matrix (i.e., no correlation).
In addition, when the feature vector v = Σ_{i=1}^{d} a_i e′_i is projected onto the q-dimensional subspace by dropping the last d − q basis vectors, the reconstruction error is minimized, allowing data to be efficiently represented in a lower-dimensional space.
It is worth noting that dimension reduction via PCA aims to decrease the computational load. However, the intrinsic computational load for AE data is often light, with few observations (≲10,000) and low dimensionality (≲20), and runtimes for common ML algorithms tend not to exceed a few minutes. As such, the dimensionality reduction described above is not recommended, as information is lost. However, PCA is still a useful transformation. Because features with large variance tend to dominate clustering results 37 , it may be desirable to perform PCA whitening, or sphering 37,79,99 . The whitening transformation re-scales the PCA axes such that the diagonal covariance matrix obtained by PCA becomes the identity matrix. Major programming languages offer packages to execute both PCA and whitening 88,91,100 . The reader is referred to MacGregor et al. 101 for further details.
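A minimal sketch of PCA whitening with scikit-learn; the correlated toy data are an assumption, and the check confirms that the whitened covariance is (approximately) the identity:

```python
# Sketch of PCA whitening: after the transformation, the feature
# covariance is the identity, so no single feature dominates the metric.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
# correlated toy features with very different scales
X = rng.normal(size=(500, 3)) @ np.array([[5.0, 0.0, 0.0],
                                          [2.0, 1.0, 0.0],
                                          [0.0, 0.0, 0.1]])

X_white = PCA(whiten=True).fit_transform(X)
cov = np.cov(X_white, rowvar=False)
print(np.round(cov, 2))   # approximately the identity matrix
```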
In general, many transformations are available, but choosing an appropriate one is non-trivial. Transformations alter the geometry of the input data so that clustering can be executed. However, a complete understanding of a transformation's effect on the input data is contingent on a priori knowledge of the starting geometry. This is problematic, as AE data live in a high-dimensional space and can never be viewed in their natural setting. Modern projection techniques, described further in the section "Data visualization," allow for estimation of the high-dimensional geometry.

Machine learning algorithms
After pre-processing, we arrive at the matter of choosing a clustering algorithm, which must complement the new geometry of the processed data. The authors stress that if an informed decision is not made at this juncture, an accurate partition cannot be achieved, even if the initial data are properly represented.
K-means is among the earliest clustering techniques and, as such, it has positioned itself as the primary clustering method for AE applications. The k-means algorithm finds spherical clusters and is therefore best suited for clustering convex data 36 . However, it remains unknown whether AE data obey this constraint, and thus whether k-means should be the default choice. To this end, a deeper understanding of the k-means algorithm is needed to properly leverage its strengths and interpret the efficacy of previous AE studies that have utilized it.
The k-means algorithm seeks to partition a set of observations into k clusters by finding the minimum of the L_2 loss function:

$$L_2 = \sum_{j=1}^{k} \sum_{x_i \in C_j} \| x_i - \mu_j \|^2$$

where μ_j is the center of cluster j ∈ {1, 2, . . . , k} and x_i are all data points associated with cluster C_j 36,37,60 . The process is outlined in Fig. 4a and works as follows:
1. k centroids are initialized
2. Feature vectors are labeled according to the closest centroid
3. The center of the labeled data is calculated, and the respective centroids are moved
4. Steps 2-3 are repeated until a stopping criterion is reached
Because L_2 is minimized by an iterative, greedy descent, the final clusters depend on where the centroids are initialized. For this reason, k-means tends to fall into local minima 102 , and for any given run of k-means there is likely a better partition to be found (Fig. 4b).
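The four steps above can be sketched directly; this toy implementation assumes Euclidean distance, two well-separated blobs of synthetic data, and a fixed iteration count as the stopping criterion:

```python
# Minimal sketch of the four k-means steps on synthetic 2-D data.
import numpy as np

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 0.5, (100, 2)),
               rng.normal(5, 0.5, (100, 2))])   # two toy clusters

k = 2
centroids = X[rng.choice(len(X), k, replace=False)]   # step 1: initialize

for _ in range(20):                                   # step 4: repeat
    dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    labels = np.argmin(dists, axis=1)                 # step 2: label by closest
    centroids = np.array([                            # step 3: move centroids
        X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
        for j in range(k)
    ])

# the two centroids should land near (0, 0) and (5, 5)
```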
To avoid falling into such local minima, variants of k-means modify the initialization scheme 65,95,103 . Among these, the maxmin and k-means++ variants are the most robust 59,91,102 , as both initialize well-dispersed seeds. Maxmin selects a point for the first cluster center and, at each subsequent step, chooses as the next centroid the point farthest from its nearest existing centroid. K-means++ selects subsequent seeds from a statistical distribution in which farther distances are preferred 102 . Sibil et al. developed an approach using a genetic algorithm, where multiple runs of k-means are initialized and the best results are cross-bred according to a set of user-defined rules 103 . While this genetic algorithm allows k-means to find an optimal solution with fewer restarts, appreciable gains in runtime or accuracy are not achieved by its use.
An alternative to modifying the initialization scheme is to alter how centroids move after initialization. One such approach is fuzzy c-means (FCM), in which a feature vector (x_j) is assigned a membership grade (u_i) to cluster i such that u_i(x_j) ∈ [0, 1]. FCM updates centroids according to weighted (rather than absolute) membership, which reduces the tendency to fall into local minima 37 . Despite these benefits, it is unclear if FCM is superior to k-means 96,104-108 due to difficulties in verifying cluster results.
Ultimately, the user can opt for another strategy altogether: restarting k-means multiple times and selecting the best run. When the number of clusters is low and data are poorly separated, as is often the case with AE data, this strategy yields acceptable results provided that the number of restarts is sufficiently large (>100) 59 . Given that clustering via k-means tends to have short runtimes, this is a feasible strategy.
However, none of the aforementioned techniques address the issue of data geometry. Therefore, alternative algorithms must be considered that are suited for larger ranges of data geometries at similar computational costs.
One viable alternative to k-means found in the literature is the Gaussian mixture model (GMM) 74 . In contrast to k-means, GMMs find Gaussian distributions that model the input data and are thus well suited to clustering data sampled from Gaussian distributions. Moreover, they provide greater flexibility for clustering non-convex geometries and data with different scales (Fig. 4c).
GMMs model input data as belonging to a set of d-dimensional Gaussian distributions and search for the set of related parameters 58,109 . The probability of finding feature vector x_j ∈ X, where X is the set of sampled feature vectors, is:

$$p(x_j \mid \lambda) = \sum_{i=1}^{M} w_i \, g(x_j \mid \mu_i, \Sigma_i)$$

where M is the number of component Gaussians and w_i is the weight associated with the multivariate Gaussian distribution:

$$g(x \mid \mu_i, \Sigma_i) = \frac{1}{(2\pi)^{d/2} |\Sigma_i|^{1/2}} \exp\!\left( -\frac{1}{2} (x - \mu_i)^{\mathsf{T}} \Sigma_i^{-1} (x - \mu_i) \right)$$

parameterized by covariance matrix Σ_i and mean μ_i. The weights are normalized such that their sum is unity. The Gaussian mixture is then completely described by the set of parameters:

$$\lambda = \{ w_i, \mu_i, \Sigma_i \}, \quad i \in \{1, 2, \ldots, M\}$$

The GMM searches for the λ that maximizes the likelihood of sampling X by way of the Expectation-Maximization (EM) algorithm 110 . Once EM reaches a stopping criterion, feature vectors are labeled according to the Gaussian that provides the highest likelihood of sampling them. Final GMM partitions are also sensitive to initialization, and therefore multiple restarts are recommended. For an in-depth discussion of GMMs, the reader is referred to Reynolds et al. 58 .
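A hedged sketch of GMM clustering with scikit-learn; the two synthetic components (one anisotropic, one with a different scale, echoing Fig. 4c) and M = 2 are assumptions:

```python
# Sketch of GMM clustering: EM fits the parameters {w_i, mu_i, Sigma_i}
# and each point is labeled by its most likely component Gaussian.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
X = np.vstack([
    rng.multivariate_normal([0, 0], [[1.0, 0.8], [0.8, 1.0]], 300),  # anisotropic
    rng.multivariate_normal([6, 0], [[0.2, 0.0], [0.0, 0.2]], 300),  # small scale
])

gmm = GaussianMixture(n_components=2, n_init=5, random_state=0).fit(X)
labels = gmm.predict(X)            # label by most likely Gaussian

print(np.round(gmm.weights_, 2))   # the weights w_i, roughly [0.5, 0.5]
```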

Hyperparameter selection
Once a suitable clustering algorithm has been chosen, a set of user input parameters known as hyperparameters must be specified. These control how the ML algorithm operates on the data and influence the final partition.
In unsupervised learning, a common hyperparameter is the number of clusters within the natural structure of the data, which, in practice, is uncertain. In the AE literature, this is analogous to identifying the number of unique damage modes that are discernible. We have previously described two hypotheses developed by the AE community: (i) any AE event is generated by only one damage mode and (ii) waveforms have distinct features corresponding to that damage mode. It then follows that the number of clusters should correspond with the number of detectable damage modes. However, if two or more damage modes have similar characteristics, experimental noise may mask the true source, in which case the number of distinct clusters will be between one and the number of unique damage modes.
The above assumes that each AE signal is the result of only one damage mode, yet in practice, the AE-damage relationship is more nuanced. It has been experimentally shown in situ that more than one damage mechanism can be active within the timeframe of a single AE event 9 , such that in the case of a fiber-bridged matrix crack, the damage mechanisms of transverse matrix fracture, fiber debonding, and interfacial sliding all occur within a short period of time and are captured by a single AE signal. It is infeasible to deconvolute the contribution of each mechanism to the signal. Yet the critical features of that signal are the sum of the contributing features, and thus are dependent on the activity of each contributing mechanism. Future work must address whether multi-mode AE events fall into the cluster of the dominant damage mode, or into their own multi-mode cluster(s).
Ultimately, due to multi-mode damage mechanisms and noise masking, the true number of clusters is unknown. In addition, data projection methods are unsuitable for determining cluster number due to inherent information loss 101,111,112 . As a solution, researchers apply heuristic functions to measure intercluster/intracluster spread. All such functions operate on the premise that the optimal number of clusters yields the highest intercluster separation and lowest intracluster variance [113][114][115][116][117][118][119] , and the true number of clusters is estimated where the function is extremized.
The silhouette value (SV) is one of the most popular heuristic functions 115 . It has become a staple in unsupervised clustering because of its ease of use and natural graphical representation. For a feature vector x_k, the SV is defined as:

$$SV(x_k) = \frac{b(x_k) - a(x_k)}{\max\{a(x_k),\, b(x_k)\}}$$

where a(x_k) is the average distance of x_k to all objects in its cluster, and b(x_k) is the minimum of the average distances from x_k to all objects in each other cluster. While defined for a single feature vector, the SV of a partition is the average SV over all feature vectors; this is the value used to estimate the number of clusters. The SV is bounded such that SV ∈ [−1, 1], with 1 corresponding to perfectly compact, well-separated clusters.

Fig. 4 A visual guide to k-means and other ML algorithms. a When clustering data with k-means, seed points are selected to represent the initial data, which are subsequently labeled (or colored) according to the closest seed point. The seeds are then moved to the centroid of the labeled data, and the points are relabeled. This process is repeated until a user-defined stopping criterion is reached. b K-means often falls into local minima, resulting in sub-optimal partitions. An initialization with 15 centroids is shown, where two centroids have fallen into local minima (denoted with minuses) and are unable to move to their optimal positions (denoted with pluses). c Although widely used, k-means is unsuited for clustering certain types of data. The performance of k-means, GMM, and spectral clustering is demonstrated on a Gaussian data set (row 1), an anisotropic data set (row 2), and a concentric circle data set (row 3). The performance of an ML algorithm is dependent on the data scale and geometry. Panel a is adapted from ref. 36 with permission from Elsevier. Panel b is adapted from ref. 59 under creative commons license. Panel c is generated from the sci-kit learn toolbox 38 .
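A cluster-count sweep of this kind can be sketched with scikit-learn's `silhouette_score`; the synthetic data and candidate range below are illustrative assumptions:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

rng = np.random.default_rng(1)
# Three synthetic "damage mode" clusters standing in for AE feature vectors.
X = np.vstack([rng.normal(c, 0.3, size=(100, 2))
               for c in ([0, 0], [3, 0], [0, 3])])

# Sweep candidate cluster counts; the partition maximizing the mean SV
# is taken as the estimate of the number of clusters.
scores = {}
for k in range(2, 7):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    scores[k] = silhouette_score(X, labels)

best_k = max(scores, key=scores.get)
```

On this well-separated toy data the sweep recovers three clusters; with real AE features the maximum is typically far less pronounced.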
The SV is applicable to evaluate clusters independent of their geometry; however, it is best suited for convex and spherical data. When non-convex data are clustered, other heuristic functions should be considered. In GMMs, both the Akaike information criterion (AIC) and Bayesian information criterion (BIC) are established model selection techniques 91 . It has been shown that AIC generally overestimates the true number of mixture components, whereas there is considerable evidence that BIC consistently selects the correct number 120 .
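For GMMs, a minimal BIC-based model selection might look like the following sketch; the synthetic data and the candidate component range are assumptions for illustration:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(2)
# Three well-separated synthetic clusters.
X = np.vstack([rng.normal(c, 0.3, size=(150, 2))
               for c in ([0, 0], [4, 0], [0, 4])])

# BIC penalizes model complexity; the component count minimizing BIC
# is selected as the estimate of the number of mixture components.
bics = {m: GaussianMixture(n_components=m, n_init=5,
                           random_state=0).fit(X).bic(X)
        for m in range(1, 7)}
best_m = min(bics, key=bics.get)
```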
The authors stress that heuristic functions are estimates. When possible, multiple heuristic functions should be used to corroborate the results of each other. Finally, researchers cannot rely on heuristic functions to evaluate clustering success, as it is possible to obtain compact and well-separated clusters whose membership does not reflect damage modes.

Data visualization
In addition to quantitative assessments of partition quality, such as those described in the section "Hyperparameter selection," it is often useful to visualize data to inspect the partition quality. The self-organizing map (SOM) is a type of neural network used for the visual assessment of partition quality through dimension reduction 111 . SOMs maintain their topology, meaning that nearby points in the reduced representation correspond to similar feature vectors. Unlike PCA, SOMs are non-linear and thereby better suited to visualize a wider variety of data geometries.
The SOM consists of a grid G of l × l nodes, h_ij. The grid architecture describes node connections and dictates how nodes move as the SOM learns the structure of the input data. Finally, each node has an associated weight vector, w_ij, which can be considered the node's position within the high-dimensional feature space. The SOM learns the structure of the input data as follows:
1. Weights w_ij are randomly initialized.
2. A random observation x_k is selected (presented) and its distance from the high-dimensional embedding of node ij is calculated as:
$$d_{ij} = \lVert x_k - w_{ij} \rVert$$
3. Step 2 is repeated for all nodes, and the closest node is identified by:
$$h^{*}_{ij} = \arg\min_{ij}\, d_{ij}$$
4. The weight of node h*_ij is updated according to:
$$w_{ij}(t+1) = w_{ij}(t) + \eta(t)\,[x_k - w_{ij}(t)]$$
where t is the time step and η is the learning gain factor that starts at 1 and monotonically decreases with t.
5. The weights of nodes in the neighborhood of h* are updated according to Step 4, and t is incremented.
6. Steps 2-5 are repeated until all data points have been presented at least once.
Once completed, the high-dimensional data can be visualized by assigning a gray value to the 2D grid of the SOM according to the maximum distance of a node from its neighbors (Fig. 5a) 121 . Thus, the SOM effectively has two representations: a high-dimensional representation and a 3D representation (2D grid + 1D gray values). The strength of the SOM is rooted in the fact that the high-dimensional representation adapts to the structure of the feature vectors, while maintaining a meaningful 3D representation 111 .
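The steps above can be sketched in NumPy. This is a minimal, untuned implementation; the Gaussian neighborhood function and its width schedule are our assumptions, as the text does not prescribe a specific neighborhood:

```python
import numpy as np

def train_som(X, l=8, epochs=5, seed=0):
    """Minimal SOM following the steps above (a sketch, not a tuned tool)."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    W = rng.normal(size=(l, l, d))  # step 1: random weights w_ij
    grid = np.stack(np.meshgrid(np.arange(l), np.arange(l),
                                indexing="ij"), axis=-1)
    n_steps = epochs * len(X)
    t = 0
    for _ in range(epochs):
        for k in rng.permutation(len(X)):  # step 2: present a random x_k
            x = X[k]
            dist = np.linalg.norm(W - x, axis=-1)  # distances to all nodes
            bmu = np.unravel_index(dist.argmin(), dist.shape)  # step 3
            eta = 1.0 - t / n_steps  # step 4: gain starts at 1, decreases
            sigma = max(l / 2 * (1 - t / n_steps), 0.5)
            # step 5: Gaussian neighborhood around the best-matching unit
            g = np.exp(-np.linalg.norm(grid - np.array(bmu), axis=-1) ** 2
                       / (2 * sigma ** 2))
            W += eta * g[..., None] * (x - W)
            t += 1  # step 6: repeat until all data are presented
    return W

def u_matrix(W):
    """Gray values: maximum distance of each node to its grid neighbors."""
    l = W.shape[0]
    U = np.zeros((l, l))
    for i in range(l):
        for j in range(l):
            nbrs = [W[a, b] for a, b in ((i - 1, j), (i + 1, j),
                                         (i, j - 1), (i, j + 1))
                    if 0 <= a < l and 0 <= b < l]
            U[i, j] = max(np.linalg.norm(W[i, j] - n) for n in nbrs)
    return U
```

The `u_matrix` helper produces the 2D grid + gray-value representation described above, which can then be rendered as an image.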
Previously, AE studies have used SOMs to pre-process data by associating input data to the closest w ij and clustering these weight vectors with k-means 95,122,123 . However, approximating points by their closest node can only reduce the amount of available information, thereby degrading the fidelity of k-means. For this reason, the authors recommend that data be clustered and subsequently visualized with an SOM, rather than clustering SOM nodes directly.
Another method for visualizing high-dimensional data is t-distributed stochastic neighbor embedding (t-SNE), a manifold learning algorithm. While the SOM projection loses geometric structure in the low-dimensional representation, the t-SNE projection retains a sense of the high-dimensional geometry 112 .
At the highest level, t-SNE maintains all pairwise distances between the high-dimensional and low-dimensional representations of feature vectors, x_i and y_i, respectively 112 . The fundamental assumption in t-SNE is that for a given feature vector x_i, all other points follow a Gaussian distribution with standard deviation σ_i centered at x_i. The conditional probability of finding another feature vector x_j is then:

$$p_{j \mid i} = \frac{\exp\!\left(-\lVert x_i - x_j \rVert^2 / 2\sigma_i^2\right)}{\sum_{k \neq i} \exp\!\left(-\lVert x_i - x_k \rVert^2 / 2\sigma_i^2\right)}$$

There does not exist an optimal value of σ_i that describes all data points. Instead, it is estimated with a hyperparameter termed perplexity, which is a measure of how much of the local structure is retained in the final low-dimensional map. As perplexity increases, local structure information is exchanged for global structure information 112 . Pairwise probabilities in the high dimension are translated to similar pairwise probabilities in the low dimension, q_j|i . If the low-dimensional representation has correctly maintained the high-dimensional pairwise distances, then p_j|i = q_j|i for all pairs i, j.
In contrast to the high-dimensional representation, where similarities are calculated according to a Gaussian distribution, pairwise similarities in the low dimension are calculated according to a Student t-distribution with a single degree of freedom. This alleviates the crowding problem in traditional SNE while permitting for tractable runtimes.
Like PCA, t-SNE axes have no intrinsic meaning; only relative distances between points are valuable. Intercluster separation and cluster size also have no intrinsic meaning. Despite this, t-SNE is exceptionally powerful, as few user inputs are needed and it outperforms other manifold learning techniques such as Sammon mapping, Isomap, and locally linear embedding [124][125][126] (Fig. 5b). t-SNE is described in greater detail in van der Maaten and Hinton 112 , and a practical guide is provided by Wattenberg et al. 127 .
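In practice, computing a t-SNE embedding reduces to a few lines with scikit-learn; the perplexity value below is an illustrative choice, not a recommendation, and the data are synthetic stand-ins:

```python
import numpy as np
from sklearn.manifold import TSNE

rng = np.random.default_rng(4)
# Two well-separated high-dimensional clusters as stand-ins for AE features.
X = np.vstack([rng.normal(0, 0.5, size=(60, 10)),
               rng.normal(6, 0.5, size=(60, 10))])

# Perplexity trades local for global structure; it must be smaller than
# the number of samples.
Y = TSNE(n_components=2, perplexity=30, init="pca",
         random_state=0).fit_transform(X)
```

Because t-SNE axes carry no intrinsic meaning, the embedding `Y` should only be read for relative point arrangement, as noted above.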
TOWARD IMPROVED AE-ML FRAMEWORKS
Experimental, numerical, and statistical tools have all been leveraged to infer damage modes from AE. While the first two have provided valuable insights, there are still unknowns regarding experimental factors (e.g., the effect of sensor coupling) and microstructural details (e.g., wave scattering in a complex microstructural landscape). This mandates the use of the statistical tools provided by ML to explore the waveform-mechanism relationship in light of what is known about AE. In this section, we discuss the advantages and limitations of previous studies and highlight issues that must be addressed moving forward.
While methodologies and findings in AE/ML studies differ within composites, there is consensus on the following: (i) damage modes are distinguishable by their AE signatures (Fig. 1), (ii) clusters can be discerned with unique activities, and (iii) robust methods for verifying results are needed.
The details of what features are deemed salient vary between investigations, but there is overwhelming agreement that they exist and are frequency based. Forward models of carbon/glass fiber-reinforced polymers show explicit differences in frequency characteristics of measured waveforms 48,128 , lending support to experimental efforts suggesting that damage modes can be distinguished by frequency 17,18,26,27,33,129 .
Clusters can be identified with reasonable degrees of separation based on heuristic functions 34,65,68,70,130 . This suggests that ML can efficiently partition AE signatures, although what the partitions represent remains uncertain. Despite variations in experimental configurations, material systems, and representation schemes, the number of clusters reported is consistently in the range of 3-6 67,69,84,95,131 . This agrees with the number of expected damage modes, suggesting that partitions are meaningful; few studies report more clusters than the feasible number of damage modes 66 .
The verification of ML results is a circular problem: the optimal feature representation and ML algorithm are not known a priori, motivating manual verification, which itself requires a priori knowledge of distinguishing features. As a result, strategies that rely on domain knowledge, such as the activity of damage in the stress domain, have been employed. The simplest verification method is correlation of average cluster frequency values (peak or centroid) to damage modes, or correlation of cluster activity to damage modes 35,66,67,79,80,89,96,108,130 . Some researchers have gone a step further and incorporated high-resolution postmortem techniques (namely, microscopy) to further justify cluster labels 65,69,123,[132][133][134][135] .
More complex solutions involve the creation of AE libraries to serve as ground truth data sets 32,68,136,137 . This is a particularly powerful approach because, once ground truth sets have been established, there is no difficulty in the interpretation of final results, and refinement of the learning procedure becomes possible. One such method is to isolate one or two damage modes by using specialized sample geometries, and then leverage these cluster characteristics to label the damage modes in a more relevant geometry. For example, Godin et al. used AE from pure epoxy resin and single-fiber microcomposites to label the unsupervised results from unidirectional composites 32 . Another method of library creation is to test full-scale samples in ways that promote one damage mechanism. Gutkin et al. tested a variety of composite geometries under monotonic tension, compact tension, compact compression, double cantilever beam, and four-point end notch flexure tests 68 . Finally, libraries have been created by testing specimens with known damage modes and using secondary high-resolution in situ methods to assign labels to clusters. Tat et al. collected AE from open-hole tension samples in conjunction with x-ray radiography and DIC to characterize damage modes 137 . While such studies should elucidate the nature of the AE-mechanism relationship, there are caveats that limit their usefulness, as discussed below.
Despite progress toward the development of an AE/ML framework, there are outstanding issues in: (i) verification procedures, (ii) error quantification, and (iii) feature selection and algorithm choice, which obstruct continued advances.
Verification procedures are each accompanied by certain deficiencies. When cluster verification is limited to correlation of frequency values, there is often large experimental variance. We now know that singleton frequency values (e.g., peak, average, or centroid) are influenced by factors such as source-to-sensor distance and the state of damage accumulation; both affect waveform attenuation and the number of scattering sites 29,30,54,61 . When cluster activity is considered, there is often considerable overlap in ways that are not physically expected. For example, Lyu et al. concluded that all modes of damage in 0/90 woven C f /SiC CMCs, including fiber bundle failure, are active from the beginning of a creep test. It is unclear if this is a real phenomenon, or if fewer damage modes are active and are instead being incorrectly distributed across all clusters 138 . Few studies show distinct differences in chronological cluster activation 84,130,139,140 . More complex fiber architectures often show simultaneous cluster activation; however, these findings have not been verified experimentally 34,66,141,142 . Furthermore, previously applied postmortem methods are insufficient for informing on damage chronology and can only provide information on what labels are available to be assigned. Finally, the library methods discussed train models with flawed data sets. They use unsupervised clustering results whose complete membership is unknown, rather than using reference data to directly train models. Preliminary results by Godin et al. suggest that the accuracy of such methods is low (≈55%) 32 .
There exists a natural marriage between error quantification and cluster verification. Robust procedures for error quantification exist and have been adapted for AE 61 . However, the ability to broadly implement them hinges on knowledge of ground truth. Although libraries have been developed to serve as ground truth data sets, they cannot be used to assess error because they lack one-to-one correlation between events and damage modes. Moreover, of the studies that employ these libraries 32,84,87,103,136,143 , there is no overlap between material systems or experimental configurations, and no framework currently exists to address these differences. This prevents meaningful comparison of the efficacy of different representation schemes and choice of ML algorithm.
The authors are only aware of one library, from Tat et al. 137 , that is a candidate for assessing error. However, Tat et al. opted to capture a single tomographic scan in the period of interest, limiting temporal resolution as well as the library's utility. In addition, this study used an open-hole dogbone specimen geometry, which alters waveform characteristics 144 , preventing its extension to other geometries. More generally, a library built for a specific specimen geometry can only be used to classify waveforms from the same geometry. Moving forward, meaningful error quantification relies on the use of spatially and temporally resolved bulk damage observations in situ, which can then be used to assign labels to AE events. While this is experimentally challenging, it is the key to confirming that an ML framework is effective and for quantifying its accuracy.
Related to feature selection, there is significant experimental evidence 17,18,26,33,145 and modeling efforts 48,49,53,98,128 , which indicate that damage mechanisms exhibit distinguishable differences in frequencies. It then follows that waveforms should be represented by methods that encode frequency information such as frequency parameters, CWT coefficients, and HHT spectra. However, the salient frequency features remain unknown because the nature of experimental frequency variations are in dispute 29,30,54 .
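Frequency parameters of this kind (peak and centroid frequency) can be sketched from the magnitude spectrum; the decaying synthetic burst below is a stand-in for a real AE waveform, and its parameters are illustrative:

```python
import numpy as np

def frequency_features(waveform, fs):
    """Peak and centroid frequency from the magnitude spectrum."""
    spectrum = np.abs(np.fft.rfft(waveform))
    freqs = np.fft.rfftfreq(len(waveform), d=1.0 / fs)
    peak = freqs[spectrum.argmax()]                       # peak frequency
    centroid = np.sum(freqs * spectrum) / np.sum(spectrum)  # centroid
    return peak, centroid

# A synthetic "AE signal": a decaying 300 kHz tone sampled at 5 MHz.
fs = 5e6
t = np.arange(0, 200e-6, 1 / fs)
sig = np.sin(2 * np.pi * 300e3 * t) * np.exp(-t / 50e-6)
peak, centroid = frequency_features(sig, fs)
```

As the surrounding discussion stresses, such singleton values shift with source-to-sensor distance and geometry, so they should not be relied on in isolation.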
Moreover, such features depend on the experimental configuration. Changes to the sample geometry affect waveform features, even when the dominant damage mode has not changed 46,49,72,146 . Ospitia et al. demonstrated that plate geometries exhibit lower frequencies for the same artificial excitations as compared to beam geometries, with similar trends observed in actual mechanical tests 146 . This difference was attributed to the unrestricted crack extension in the beam geometry compared with incremental extensions in the plate geometry. In addition, variations in propagation pathways distort acoustic waveforms resulting from a single event recorded at different sensors 46,49,72,146 . Hamstad showed, via FEA calculations, that excitations will produce lower frequencies corresponding to the flexural wave mode when the excitation is farther away from the midplane of the plate 46 ; this has been experimentally verified 29 . Maillet et al. experimentally demonstrated that path lengths shift frequency centroids to lower values for larger source-to-sensor distances 72 . Thus, an event not emitted from the center of a specimen will have different frequency content depending on which sensor is used to analyze the wave. Finally, the sensor contact is also known to influence the frequency spectra of a signal [147][148][149] . Theobald et al. showed that the acoustic impedance matching between a couplant and the sensor significantly influences the recorded frequency spectra 148 . It is the authors' opinion that future AE studies aiming to study point-by-point AE information across specimens must standardize methods for ensuring experimentally consistent sensor contact.
Because waveform features are tied to the experimental configuration (and for other historical reasons 6 ), large numbers of features, including time domain features, are commonly employed (Table 1), with the expectation that the correct features are included. This leads to the use of many features whose physical basis is unclear. This practice has been observed to degrade the fidelity of clustering 32,66,98,150 .
Data geometry and spread are affected by the feature extraction process. Many common parameter features are highly correlated 65,67 , which implies that many composite features are also correlated. In some cases, the method of feature extraction introduces artificial variance, thereby increasing the intrinsic cluster spread. For example, signal onset time is often determined by hand or by use of a floating threshold 66,68,87,143 . While hand-selected signal onset time introduces less variance than threshold-determined onset time 72 , both are artificial in the sense that there is no mathematical rigor involved. In contrast, the AIC method offers a mathematically consistent procedure, rooted in statistical theory, for extraction of signal onset time 72,151 . Although manual measurements are considered to be more accurate than AIC, it is time consuming to extract first peak times for large quantities of signals (>1000), which can introduce inconsistencies in analysis that are mitigated by the AIC approach. Moving forward, other features should be extracted in a similarly consistent manner.
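An AIC onset picker of the kind referenced above is straightforward to implement. This sketch uses the commonly cited two-window form, AIC(k) = k·log(var(x[:k])) + (N−k−1)·log(var(x[k:])), with the onset taken at the minimum; the synthetic burst parameters are illustrative:

```python
import numpy as np

def aic_onset(x):
    """Two-window AIC onset picker (a sketch): onset = argmin AIC(k)."""
    N = len(x)
    aic = np.full(N, np.inf)
    for k in range(2, N - 2):
        v1, v2 = np.var(x[:k]), np.var(x[k:])
        if v1 > 0 and v2 > 0:
            aic[k] = k * np.log(v1) + (N - k - 1) * np.log(v2)
    return int(np.argmin(aic))

# Synthetic signal: noise floor followed by a burst arriving at sample 500.
rng = np.random.default_rng(5)
x = rng.normal(0, 0.05, 1000)
n = np.arange(500)
x[500:] += np.sin(2 * np.pi * 0.05 * n) * np.exp(-n / 200)
onset = aic_onset(x)
```

The statistic splits the record at each candidate sample and rewards the split that best separates low-variance noise from the high-variance burst, which is what makes it reproducible across analysts.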
Finally, while most investigators agree that broader categories of damage mechanisms can be distinguished 32,48,130 , it is unclear if more precise labels can be assigned 34,67,138,152 . For example, Moevus et al. claimed to discriminate three types of matrix cracking in woven composites 67 based on cluster activity. More recently, Lyu et al. claimed to discriminate between single-fiber failures and fiber bundle failures but did not differentiate types of matrix cracks 138 . Meanwhile, Li et al. made no distinction between delamination and fiber failure 34 in woven glass fiber composites. Furthermore, the lack of consensus when assigning more precise labels to damage modes is exacerbated by the uncertainty in whether feature differences are phenomenologically expected for these labels.

CONCLUSIONS
Recent advances in ML techniques have created an opportunity for high-fidelity labeling of single AE signals. Such a breakthrough would be seminal to our understanding of the structure-damage mode relationship at the length scale on which damage initiates. This clarity would enable an understanding of the local structural drivers of the damage response of composites. Ultimately, high-fidelity characterization would shed light on the critical microstructural features that, under thermo-mechanical loading, determine when and how the material will fail.
In this review, we summarize and discuss ML techniques that have been leveraged for damage mode identification in composites from their AE. We present an overview of the AE phenomenon and the full implementation of a general ML framework. We detail a variety of commonly used features and their extraction in practice. Specific considerations at each step of building an ML framework are identified such that the user maximizes the possibility of accurately clustering a set of AE waveforms. We end with an overview of the state of ML as it relates to AE in composite materials, discussing current agreements within the community and challenges that must be addressed before an AE-ML framework can be used to relate damage modes to microstructural parameters.
In reviewing the AE, composites, and ML literature, we draw the following conclusions:
1. There are encoded characteristics within the frequency domain of waveforms that relate AE to damage modes. However, singular frequency values should not be used alone, due to their dependence on the propagation pathway from the damage source.
2. Features are often used with no physical basis, and in some cases features are extracted by arbitrary methods. This practice degrades the quality of partitions. Only physically meaningful features should be used, and their extraction should be mathematically consistent.
3. Previous efforts in AE-ML have explored different feature sets, yet have largely relied on k-means clustering. Future work should move beyond k-means, as it is only suited for a narrow set of data geometries.
4. There is limited work in verifying the fidelity of cluster labels and membership. Progress in this field hinges on quantifying error, which is contingent on spatially and temporally resolved in situ observations of bulk damage accumulation.

DATA AVAILABILITY
Data sharing not applicable to this article as no data sets were generated or analyzed during the current study.