A machine learning framework for damage mechanism identification from acoustic emissions in unidirectional SiC/SiC composites

In this work, we demonstrate that damage mechanism identification from acoustic emission (AE) signals generated in minicomposites with elastically similar constituents is possible. AE waveforms were generated by SiC/SiC ceramic matrix minicomposites (CMCs) loaded under uniaxial tension and recorded by four sensors (two sensor models, with each model placed at both ends of the specimen). Signals were encoded with a modified partial power scheme and subsequently partitioned through spectral clustering. Matrix cracking and fiber failure were identified based on the frequency information contained in the AE events they produced, despite the similar elastic properties of the matrix and fiber. Importantly, the resultant identification of AE events closely followed CMC damage chronology, wherein early matrix cracking is later followed by fiber breaks, even though the approach is fully domain-knowledge agnostic. Additionally, the partitions were highly precise across both the model and location of the sensors, and the partitioning was repeatable. The presented approach is promising for CMCs and other composite systems with elastically similar constituents.


INTRODUCTION
Silicon carbide/silicon carbide ceramic matrix composites (SiC/SiC CMCs) are structural ceramics with advantageous mechanical properties for use in extreme environments. As a result, CMCs have been increasingly incorporated in applications such as high-pressure turbine (HPT) shrouds in the hot section of aircraft engines, HPT vanes, and combustor liners, and are being considered as fuel cladding in generation IV nuclear reactors [1][2][3][4] . In these and other applications, CMC failure is costly and dangerous. A detailed understanding of the failure envelope is critical for predicting component lifetimes and optimizing material performance; to this end, structural health monitoring (SHM) techniques are well suited for characterizing the damage state in these high-impact materials.
Acoustic emission (AE) is one well-established SHM technique that is widely used for nondestructive monitoring of damage accumulation across a range of material systems. AE captures the elastic waves that are produced when accumulated strain energy is rapidly released from damage sources 5 . In modern practice, AE is used to triangulate the locations of damage sources 5 , identify highly damaged areas in the bulk 6 , and assess the severity of incurred damage 7 . As a characterization tool, the spatial resolution of AE for triangulating surface cracking in situ has been shown to be ±100 μm 7 , and the temporal distance between resolvable AE events is 100 μs 8 . These resolutions exceed those of other techniques, such as eddy current 9 , electrical resistance 10 , and X-ray computed tomography (XCT) 11 , making AE uniquely powerful for capturing evolving damage 8 .
Despite the rich scope of information that AE provides, the full range of its utility remains to be explored. Akin to how humans interpret variations in sounds to distinguish a tuba from a trombone, one longstanding hypothesis is that AE waveforms contain signal-specific features that can be used to identify the damage mechanism(s) that generated them [12][13][14][15] . In continuous fiber reinforced composites, this signal-specific identification is equivalent to mapping an AE signal to the matrix crack, interfacial damage, or fiber failure event that emitted it. The ability to identify a damage mechanism from its AE event has far-reaching advantages both in SHM and in predictive modeling. It would allow for the nondestructive, in situ identification of damage sources, enable researchers to bypass expensive high-resolution methods (such as scanning electron microscopy and XCT 7,8,11 ), and produce information-rich datasets for large-scale statistical analyses of the effects of small-scale damage in full composite structures.
However, damage mechanism identification in CMCs is nontrivial and has remained an elusive goal 16,17 . In order to assign a damage mechanism to an AE waveform, the identification of patterns between many features (i.e., the use of a high-dimensional representation) is necessary 16 , which is infeasible to perform manually. Instead, this objective requires the use of machine learning (ML) techniques, which are capable of finding structure in datasets where objects are described by high-dimensional feature vectors 18,19 .
AE features can be understood in terms of the orthotropic model for wave propagation in solids, in which the out-of-plane (w) and in-plane (u_0, v_0) displacements are governed by a set of coupled plate equations 20 . In these equations, A_ij and D_ij are functions of the orthotropic elastic constants and the plate thickness (h), and ρ* is an associated density term. In this model, the waveform structure is dictated by the elastic constants and plate density. It follows that, when the elastic properties of the constituents are disparate, AE waveforms are distinct in both the time and frequency domains 21 .
Over the past 30 years, many AE-ML frameworks have been developed around material systems that obey the constraint of constituent elastic dissimilarity, such as polymer matrix composites or elastically dissimilar brittle matrix composites (BMCs) [21][22][23][24] . These frameworks typically encode waveforms with time-domain parameters that are easily extractable by the commercial software used to record the waveforms. ML techniques are then applied to partition the encoded signals based on the source damage mechanism 21,25,26 . As the ground truth of partition membership is rarely known, the results must be manually interpreted to ensure that the ML algorithm has indeed discriminated signals by damage mechanisms. This workflow is demonstrated by Kostopoulos et al., who recorded waveforms in elastically dissimilar BMCs. Waveforms were encoded with time-domain features and sorted with the k-means algorithm. Their findings showed clear differences in cluster activity, which corresponded with mechanistic expectations for their simplified composite geometry 21 , allowing them to assign damage mechanism labels to clusters.
Despite the success of frameworks developed for elastically dissimilar composites, damage mechanism identification in SiC/SiC CMCs is more nuanced. In this system, where the elastic properties of the fiber and matrix are similar, the microstructural landscape drives waveform structure in the time domain 14,16,[27][28][29][30][31] . Commonly used time-domain parameters, such as rise time and amplitude, are then dictated by waveform propagation pathways (both as a function of propagation distance and of the accumulated damage state in the material bulk) rather than by the generating damage mechanism 32 . This relationship is detrimental to the discriminating power of AE-ML frameworks, as waveforms are sorted based on stochastic waveform distortions rather than the source damage mechanism 33 .
It has also been proposed that, given their elastically similar properties, the SiC/SiC matrix and fiber will fracture in similar manners, thus preventing discrimination between damage mechanisms altogether 34 . This hypothesis was predicated on well-established theories of plate-wave propagation 20,27,35 and further substantiated by experiments that found little or no discriminating power between matrix cracking and fiber failure 34,36 .
In this work, it is instead hypothesized that the historic lack of discriminating power between the dominant damage mechanisms in SiC/SiC CMCs is a result of both the encoding scheme and algorithm choice rather than their constituent elastic properties. In addition to the improper use of time-domain features for encoding schemes, the majority of previous ML frameworks 21,22,34 use the k-means algorithm, which can only find isotropic clusters. Because of the restrictive assumptions of k-means, it is often outperformed by algorithms that are less limiting 16,37,38 . This restriction motivates the exploration of alternate waveform feature representations in conjunction with alternate ML algorithms for damage mechanism identification from AE.
Here a frequency-based AE-ML framework capable of damage mechanism identification in SiC/SiC minicomposites is presented. The minicomposite architecture was chosen for this demonstration as it exhibits a limited number of damage mechanisms, in which a well-established damage chronology enables a straightforward evaluation of clustering results. AE was recorded with a four-sensor experimental configuration (Fig. 1a) in three SiC/SiC specimens subjected to monotonic tensile loading. This configuration allowed for evaluation of framework precision, a metric of the labeling consistency. AE waveforms were encoded with a modified partial power scheme and partitioned according to their generating damage mechanism using the spectral clustering algorithm 37,39 . We find distinct activity of clusters in the stress domain that follows the established damage chronology of minicomposites. This ML approach is shown to assign labels independently of the sensor used to record the AE signals. This sensor independence demonstrates that differences between the fundamental damage mechanisms drive label assignment, rather than the stochastic distortion of waveforms resulting from the experimental configuration. The framework developed herein can be more broadly applied to brittle composites whose constituents have elastically similar properties.

RESULTS
Unsupervised classification of acoustic spectra
AE data collection and preliminary filtering (described in "Methods") represent Step 1 of the general unsupervised AE-ML framework, which proceeds as follows:
1. Experimentation: n waveforms are collected.
2. Feature extraction: waveforms are represented in feature space by extracting d pertinent features (i.e., each waveform is represented by a d-dimensional vector).
3. ML algorithm: an algorithm is selected to partition waveforms into clusters that are representative of damage mechanisms.
4. Labeling and error analysis: post-clustering analysis is performed to assign damage mechanism labels to clusters and assess the validity of results.
At each step, considerations must be made to ensure the framework functions properly. The following sections describe Steps 2-4. The code created for this investigation utilizes the Scikit-learn toolbox 39 and is available without restriction 40 .

Fig. 1 Diagram of the experimental set-up. a Two sets of AE sensors were coupled to the minicomposite (at locations 'a' and 'b'), whose length is nominally 20 mm. When any sensor was triggered, all sensors began recording, ensuring that each AE event could be correlated between all sensors. b A photograph of a sample. Epoxy tabs used to mount AE sensors are denoted with arrows. Scale bar length is 10 mm.

Feature extraction
After AE waveforms are collected, suitable representations to encode damage mechanism information must be determined. Appropriate waveform features are those that are more dependent on the generating damage mechanism than on extrinsic factors, such as propagation distance. Prior finite element analysis studies, supported by experimental evidence, indicate that partial power is one such feature, provided that signals are recorded in the near field 31,41 .
Here AE waveforms are encoded with a modified partial power scheme, where the ith component of the feature vector is:
$$P_i = \frac{\int_{k_{i-1}}^{k_i} \left|\mathcal{F}[x(t)]\right|^2 \, dk}{\int_{k_0}^{k_d} \left|\mathcal{F}[x(t)]\right|^2 \, dk}, \qquad k_i = k_0 + i\,\frac{k_d - k_0}{d}, \quad i = 1, \ldots, d$$
where F[*] is the Fourier transform operator, x(t) is the recorded signal, and d is the number of entries in the feature vector. We set k_0 = 200 kHz and k_d = 800 kHz for all specimens. To determine the value of k_0, a parametric sweep from k_0 = 50 to 350 kHz in increments of 50 kHz was conducted, and the value that optimized the validity metrics was chosen. Including partial powers >800 kHz did not improve clustering quality. This was attributed to the power of the frequency spectrum approaching zero above 800 kHz, so that those bands could not provide additional discriminating power.
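The encoding can be sketched in a few lines of NumPy. This is a minimal sketch, not the authors' released code. It assumes that each feature is the spectral power in one of d equal-width bands between k_0 and k_d, normalized by the total power in that range. It also approximates the integrals with sums over discrete FFT bins. The sampling rate (10 MHz) and record length (1024 points) follow the acquisition settings given in the experimental configuration. The function name `partial_power_features` is hypothetical.

```python
import numpy as np


def partial_power_features(x, fs, k0=200e3, kd=800e3, d=26):
    """Return d partial powers of signal x sampled at fs (Hz).

    Feature i is the power in band [k_i, k_{i+1}) divided by the total power
    in [k0, kd), so the features sum to 1 (assumed normalization).
    """
    x = np.asarray(x, dtype=float)
    power = np.abs(np.fft.rfft(x)) ** 2           # one-sided |F[x(t)]|^2
    freqs = np.fft.rfftfreq(x.size, d=1.0 / fs)   # frequency of each FFT bin (Hz)
    edges = np.linspace(k0, kd, d + 1)            # band edges k_0 ... k_d
    total = power[(freqs >= k0) & (freqs < kd)].sum()
    feats = np.empty(d)
    for i in range(d):
        in_band = (freqs >= edges[i]) & (freqs < edges[i + 1])
        feats[i] = power[in_band].sum() / total
    return feats


# Usage: a Hann-windowed 300 kHz tone, 1024 samples at 10 MHz
fs = 10e6
t = np.arange(1024) / fs
tone = np.hanning(t.size) * np.sin(2 * np.pi * 300e3 * t)
features = partial_power_features(tone, fs)
print(features.shape, features.argmax())   # band width: (800 - 200) / 26 ≈ 23 kHz
```

With 1024 points at 10 MHz, the FFT bin spacing is about 9.8 kHz, so each ≈23 kHz band averages over only two or three bins. This is one practical reason the band count cannot be increased indefinitely.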
The frequency bounds encompass the pre-amplifier bandpass on the Digital Wave system and the flat frequency response of the B1025 sensors. While the frequency range includes values outside the flat frequency response of the S9225 sensors, this does not impact discriminating power: any partitions made by the ML algorithm result from differences between waveform characteristics. The only stipulation is that the waveforms recorded by each sensor are clustered independently (i.e., waveforms from all sensors are not pooled into a single set), so that the partitions capture shifts in damage mechanism rather than differences between sensors.
Another parametric study was conducted to determine d, where d was swept from d = 2 to 45. It was found that, when all other parameters are fixed, d = 26 (Δk = 23 kHz) optimized validity metrics for all specimens.
Though previous investigations have included partial power as part of their representation schemes 15,42-45 , the scheme described here is unique in that it uses a much finer frequency resolution and relies on partial power alone. Typical partial power bands are 200-600 kHz wide, whereas the approach herein uses bands 23 kHz wide.

Spectral clustering
Once AE data are properly represented, a suitable ML algorithm for clustering can be chosen. For this task, spectral clustering was used. This is an unsupervised learning technique that has been shown empirically to outperform k-means 37,38 and is less restrictive with respect to assumptions about input data geometries.
Spectral clustering models the input dataset as a graph with nodes (data points) connected by edges whose weight is 1 if the nodes are nearest neighbors (NNs) and 0 if they are not. The algorithm finds the optimal place to remove edges and segment the original graph into a user-specified number of subgraphs (i.e., clusters) 37 .
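To illustrate why an algorithm with fewer geometric assumptions can matter, the sketch below compares k-means with spectral clustering on scikit-learn's synthetic two-moons data. This is a toy geometry, not AE data. It uses `n_neighbors=10` (scikit-learn's default) rather than the NN = 5 used for the AE data. Note also that scikit-learn symmetrizes the kNN graph, so its edge weights are not strictly 0/1 as in the description above.

```python
from sklearn.cluster import KMeans, SpectralClustering
from sklearn.datasets import make_moons
from sklearn.metrics import adjusted_rand_score

# Two interleaved, non-convex clusters (a toy geometry, not AE data)
X, y_true = make_moons(n_samples=400, noise=0.05, random_state=0)

# k-means assigns each point to its nearest centroid, favouring compact, convex clusters
kmeans_labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)

# Spectral clustering on a k-nearest-neighbour connectivity graph.
# scikit-learn symmetrizes the kNN graph as 0.5 * (A + A.T), so mutual
# neighbours get weight 1 and one-way neighbours get weight 0.5.
spectral_labels = SpectralClustering(
    n_clusters=2,
    affinity="nearest_neighbors",
    n_neighbors=10,
    assign_labels="kmeans",
    random_state=0,
).fit_predict(X)

ari_kmeans = adjusted_rand_score(y_true, kmeans_labels)
ari_spectral = adjusted_rand_score(y_true, spectral_labels)
print(f"k-means ARI:  {ari_kmeans:.2f}")
print(f"spectral ARI: {ari_spectral:.2f}")
```

Because the graph connects only nearby points, the spectral cut can follow each curved moon, whereas the centroid-based k-means boundary cuts straight across both.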
Both the number of clusters and the number of NNs are considered hyperparameters (i.e., user-selected parameters). One common method for estimating the number of clusters is the eigengap heuristic, a measure of the differences between successive eigenvalues of the graph Laplacian of the data 46 . However, noisy data, which is often the case for AE, can reduce the differences between successive eigenvalues. In this case, the eigengap heuristic is not sufficient for determining the number of clusters. Instead, a parametric sweep from two to five clusters is performed, and a drop in validity metrics is used to indicate the optimum number of clusters. A steep drop is observed after two clusters (Fig. 2), consistent with the hypothesis that matrix cracking and fiber failure events (the dominant damage mechanisms in SiC/SiC minicomposites) can be differentiated.
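For reference, the eigengap heuristic can be sketched as follows. This is a minimal sketch on synthetic blobs, not the authors' code. It assumes a symmetrized kNN graph and the normalized Laplacian from SciPy. The estimate is the index of the largest gap among the smallest eigenvalues. As the text notes, this estimate becomes unreliable when noise blurs the gaps, which is why a validity-metric sweep is used for the AE data instead. The helper name `eigengap_estimate` is hypothetical.

```python
import numpy as np
from scipy.sparse.csgraph import laplacian
from sklearn.datasets import make_blobs
from sklearn.neighbors import kneighbors_graph


def eigengap_estimate(X, n_neighbors=5, max_clusters=5):
    """Estimate the number of clusters from the largest gap between successive
    eigenvalues of the normalized Laplacian of a symmetrized kNN graph."""
    A = kneighbors_graph(X, n_neighbors=n_neighbors, include_self=False).toarray()
    A = 0.5 * (A + A.T)                               # undirected kNN graph
    L = laplacian(A, normed=True)                     # I - D^-1/2 A D^-1/2
    eigvals = np.linalg.eigvalsh(L)[: max_clusters + 1]   # ascending order
    gaps = np.diff(eigvals)                           # gaps[j] = lambda_{j+1} - lambda_j
    return int(np.argmax(gaps)) + 1, eigvals


# Synthetic, well-separated data; real AE features are far noisier
X, _ = make_blobs(n_samples=300, centers=3, cluster_std=0.6, random_state=0)
k, eigvals = eigengap_estimate(X)
print("smallest eigenvalues:", np.round(eigvals, 4))
print("eigengap estimate of the number of clusters:", k)
```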
It is important to note that other, less dominant damage mechanisms are active in minicomposites during loading (e.g., interfacial debonding and frictional sliding). It is well established from a micromechanics perspective that, when matrix cracking occurs, there is simultaneous debonding and sliding in the crack wake; it is also understood that, when fibers fail, there is simultaneous fiber sliding and pullout 47,48 . When dominant and non-dominant mechanisms occur simultaneously, their waveforms become superimposed 49 ; when non-dominant mechanisms occur independently, they likely do so in quantities too small for recognition by the ML algorithm. As such, it is currently infeasible to isolate events resulting from damage to the boron nitride (BN) interphase or to determine which AE features are characteristic of such damage. This is expected to be a source of error. Moreover, the inability to discriminate interfacial damage from other types of damage is reflected by the steep drop in validity metrics seen after two clusters in Fig. 2.
Similar to the determination of d and the number of clusters, a parametric study from NN = 5 to 20 was performed. The value of NN that optimized validity metrics varied slightly between studies (5, 5, and 7 for the three experiments, respectively); however, a range of NN values produced acceptable results. To demonstrate the effectiveness of our approach, NN is fixed at 5 for all specimens. This results in suboptimal validity metrics for Experiment 3, particularly between sensors B1025-a and S9225-a (Table 3).
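The hyperparameter sweeps can be sketched by clustering each sensor's features separately and scoring the agreement between the two partitions with the ARI. This is a hedged illustration, not the authors' code. The features are synthetic two-blob data, and the second sensor is simulated as a slightly perturbed copy of the first. The NN values shown are only a subset of the 5-20 range. The helper name `partition` is hypothetical.

```python
import numpy as np
from sklearn.cluster import SpectralClustering
from sklearn.datasets import make_blobs
from sklearn.metrics import adjusted_rand_score

# Hypothetical features for the same 300 events as seen by two sensors:
# sensor "a" sees two well-separated groups; sensor "b" sees a slightly distorted copy.
X_a, _ = make_blobs(n_samples=300, centers=[[0.0, 0.0], [8.0, 8.0]],
                    cluster_std=1.0, random_state=0)
rng = np.random.default_rng(1)
X_b = X_a + rng.normal(scale=0.1, size=X_a.shape)


def partition(X, n_clusters, n_neighbors):
    """Spectral clustering on a kNN graph; each sensor is clustered independently."""
    model = SpectralClustering(n_clusters=n_clusters, affinity="nearest_neighbors",
                               n_neighbors=n_neighbors, random_state=0)
    return model.fit_predict(X)


results = {}
for n_clusters in range(2, 6):            # number of clusters: 2 to 5
    for n_neighbors in (5, 7, 10):        # a few NN values from the 5-20 range
        labels_a = partition(X_a, n_clusters, n_neighbors)
        labels_b = partition(X_b, n_clusters, n_neighbors)
        results[(n_clusters, n_neighbors)] = adjusted_rand_score(labels_a, labels_b)

for (n_clusters, n_neighbors), ari in sorted(results.items()):
    print(f"clusters={n_clusters}  NN={n_neighbors}  ARI={ari:.3f}")
```

When more clusters are requested than the data naturally support, the extra splits fall in arbitrary places and tend to differ between the two sensors, which lowers their ARI. This mirrors the drop described for Fig. 2.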

Validity metrics for error analysis
To evaluate the efficacy of our AE-ML framework, events at each sensor were clustered according to the steps described above, the results of which are called a partition. The desired outcome is that all partitions for AE data from a given specimen are the same and are independent of both the sensor model and sensor location.
To quantify partitioning success, the total matching rate is first considered, which is the percentage of events assigned the same label by the clustering routine. Because clusters have unbalanced sizes (fewer fiber break events are expected than matrix crack events), a partition can retain a high matching rate while having no ability to discriminate between damage mechanisms. For example, if 90% of AE events come from matrix cracks, then classifying every AE event as a matrix crack would yield a 90% matching rate, yet the ML framework would have no discriminating power.
Fig. 2 Adjusted Rand Index as a function of the number of clusters. When more than two clusters are used to initialize spectral clustering, the steep drop in ARI corresponds to a decrease in precision. As such, when more than two clusters are specified, the spectral clustering algorithm is forced to find clusters that are not correlated with damage mechanisms. This drop occurs in all experiments and is corroborated by results shown in Fig. 3.
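The 90% example can be checked numerically. In this minimal sketch, the matching rate is computed after the best one-to-one relabelling of one partition. This step is an assumption, needed because cluster IDs from independent clustering runs are arbitrary. The function name `matching_rate` is hypothetical.

```python
from itertools import permutations

import numpy as np
from sklearn.metrics import adjusted_rand_score


def matching_rate(a, b):
    """Fraction of events with the same label in partitions a and b, after the
    best one-to-one relabelling of b (cluster IDs are arbitrary)."""
    a, b = np.asarray(a), np.asarray(b)
    labels = np.unique(np.concatenate([a, b]))
    best = 0.0
    for perm in permutations(labels):
        mapping = dict(zip(labels, perm))
        relabelled_b = np.array([mapping[v] for v in b])
        best = max(best, float(np.mean(a == relabelled_b)))
    return best


# 90 matrix-crack events (label 0) and 10 fiber-break events (label 1)
reference = np.array([0] * 90 + [1] * 10)
# A degenerate partition that calls every event a matrix crack
all_matrix = np.zeros(100, dtype=int)

print("matching rate:", matching_rate(reference, all_matrix))
print("ARI:", adjusted_rand_score(reference, all_matrix))
```

The matching rate is 0.9 even though the degenerate partition carries no information, while the chance-adjusted ARI is 0. Trying all permutations is cheap for the two to five clusters considered here.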
Therefore, it is also useful to consider the permutation model of the adjusted Rand Index (ARI), which accounts for unbalanced cluster sizes 50,51 . The ARI is an adjusted-for-chance version of the Rand Index (RI), a metric for comparing the similarity of a partition to the ground truth. First, the RI for two partitions, A and B, is calculated as:
$$\mathrm{RI}(A, B) = \frac{N_{11} + N_{00}}{\binom{N}{2}}$$
where N is the number of elements, N_11 is the number of element pairs that are grouped into the same cluster in both partitions, and N_00 is the number of element pairs that are grouped into different clusters in both A and B. The ARI is then calculated as 39 :
$$\mathrm{ARI}(A, B) = \frac{\mathrm{RI}(A, B) - E[\mathrm{RI}(A, B)]}{\max[\mathrm{RI}(A, B)] - E[\mathrm{RI}(A, B)]}$$
where E[RI(A, B)] is the expected value of the RI under a random model. The ARI has a maximum of 1, corresponding to perfectly matching labels, and an expected value of 0 for random label assignments; it can take small negative values when agreement is worse than chance. This metric is useful for comparing a partition to the ground truth. It is also useful for comparing two partitions that are assumed to be drawn from the same random model 51 , which makes it an effective tool for comparing two partitions whose ground truth is not known a priori. ARI values exceeding 0.40 are correlated with high values of other classification metrics 52 .
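The pair-counting definition of the RI, RI = (N_11 + N_00) / C(N, 2), can be implemented directly and cross-checked against scikit-learn's `rand_score`. This is a minimal sketch; the function name `rand_index` and the two example label vectors are hypothetical. The ARI itself is computed with scikit-learn's `adjusted_rand_score`.

```python
from itertools import combinations
from math import comb

from sklearn.metrics import adjusted_rand_score, rand_score


def rand_index(a, b):
    """Rand Index from pair counts: (N11 + N00) / C(N, 2)."""
    n11 = n00 = 0
    for i, j in combinations(range(len(a)), 2):
        same_in_a = a[i] == a[j]
        same_in_b = b[i] == b[j]
        if same_in_a and same_in_b:
            n11 += 1          # pair grouped together in both partitions
        elif not same_in_a and not same_in_b:
            n00 += 1          # pair separated in both partitions
    return (n11 + n00) / comb(len(a), 2)


sensor_a = [0, 0, 0, 0, 1, 1, 1, 0]
sensor_b = [1, 1, 1, 0, 0, 0, 0, 1]   # IDs are arbitrary: cluster 1 here ~ cluster 0 above
print(rand_index(sensor_a, sensor_b), rand_score(sensor_a, sensor_b))
print(adjusted_rand_score(sensor_a, sensor_b))
```

Because only pair co-membership is counted, the index is unaffected by how the clusters are numbered. This is what allows partitions from different sensors to be compared without first matching cluster IDs.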

T-distributed stochastic neighbor embedding (t-SNE)
A final, necessary step in this study was to confirm that the AE data form identifiable clusters in the chosen feature space. To this end, t-SNE was employed. t-SNE is a manifold learning algorithm used to produce a low-dimensional visualization of high-dimensional data. Although t-SNE axes, inter-cluster separation, and cluster size have no intrinsic meaning, t-SNE has been empirically shown to be a powerful tool for identifying natural cluster structures in high-dimensional data 53 .
t-SNE seeks to preserve pairwise similarities between the high- and low-dimensional representations of the feature vectors, x_i and y_i, respectively 53 . For a given feature vector x_i, t-SNE models similarities in the high-dimensional representation with a Gaussian probability distribution of standard deviation σ_i, centered at x_i. Under this model, the conditional probability of picking another feature vector x_j as a neighbor is:
$$p_{j|i} = \frac{\exp\left(-\lVert x_i - x_j \rVert^2 / 2\sigma_i^2\right)}{\sum_{k \neq i} \exp\left(-\lVert x_i - x_k \rVert^2 / 2\sigma_i^2\right)}$$
and the symmetrized pairwise similarities in the high-dimensional space are:
$$p_{ij} = \frac{p_{j|i} + p_{i|j}}{2n}$$
where n is the number of feature vectors. The pairwise similarities in the high dimension are then translated to corresponding pairwise similarities in the low dimension, q_ij, which follow a Student's t-distribution with a single degree of freedom:
$$q_{ij} = \frac{\left(1 + \lVert y_i - y_j \rVert^2\right)^{-1}}{\sum_{k \neq l} \left(1 + \lVert y_k - y_l \rVert^2\right)^{-1}}$$
If the low-dimensional representation has correctly preserved the high-dimensional pairwise similarities, then p_ij = q_ij for all pairs i, j.
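The two similarity distributions can be computed directly with NumPy. This is a hedged sketch for illustration only. It uses fixed bandwidths σ_i (real t-SNE calibrates each σ_i from the perplexity), random stand-in data, and an arbitrary embedding Y rather than an optimized one. It also reports the Kullback-Leibler divergence KL(P || Q), the cost that t-SNE minimizes when it adjusts Y. The function names are hypothetical.

```python
import numpy as np


def tsne_similarities(X, Y, sigma):
    """Joint probabilities P (high-dimensional, Gaussian) and Q (low-dimensional,
    Student's t with one degree of freedom) as defined for t-SNE."""
    n = X.shape[0]
    d2_high = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    d2_low = np.sum((Y[:, None, :] - Y[None, :, :]) ** 2, axis=-1)

    # Conditional p_{j|i}: row i uses its own bandwidth sigma[i]; k == i is excluded
    logits = -d2_high / (2.0 * sigma[:, None] ** 2)
    np.fill_diagonal(logits, -np.inf)
    logits -= logits.max(axis=1, keepdims=True)   # numerical stability
    p_cond = np.exp(logits)
    p_cond /= p_cond.sum(axis=1, keepdims=True)
    P = (p_cond + p_cond.T) / (2.0 * n)           # symmetrized p_ij

    # q_ij from a Student's t kernel, normalized over all pairs k != l
    kernel = 1.0 / (1.0 + d2_low)
    np.fill_diagonal(kernel, 0.0)
    Q = kernel / kernel.sum()
    return P, Q


def kl_divergence(P, Q):
    """KL(P || Q) over pairs with P > 0 (the diagonal is excluded)."""
    mask = P > 0
    return float(np.sum(P[mask] * np.log(P[mask] / Q[mask])))


rng = np.random.default_rng(0)
X = rng.normal(size=(50, 26))      # stand-in for 26-dimensional feature vectors
Y = rng.normal(size=(50, 2))       # an arbitrary (unoptimized) 2-D embedding
sigma = np.ones(50)                # fixed bandwidths; t-SNE tunes these per point
P, Q = tsne_similarities(X, Y, sigma)
print(P.sum(), Q.sum(), kl_divergence(P, Q))
```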
An important consideration for t-SNE plots is the choice of perplexity. Because no single value of σ_i describes all data points, σ_i is not set directly; instead, the user specifies a perplexity, and each σ_i is then chosen so that the corresponding conditional distribution has that perplexity 53 . Perplexity governs how much of the local structure is retained in the final low-dimensional map; as perplexity increases, local structure information is exchanged for global structure information 53 . Typical values of perplexity range from 5 to 50; a perplexity of 15 was chosen, as it best shows the cluster structure within our data.
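In scikit-learn, the perplexity is passed directly to `TSNE`. The sketch below uses perplexity = 15 as in the text. The input is hypothetical: Dirichlet-distributed 26-component vectors stand in for partial-power features, since each row sums to 1. Perplexity must be smaller than the number of samples.

```python
import numpy as np
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)
# Hypothetical stand-ins for 26-band partial-power vectors (rows sum to 1)
group_a = rng.dirichlet(np.full(26, 5.0), size=150)
group_b = rng.dirichlet(np.linspace(1.0, 10.0, 26), size=50)
X = np.vstack([group_a, group_b])

# perplexity=15 as in the text; it must be smaller than the number of samples
tsne = TSNE(n_components=2, perplexity=15, init="pca", random_state=0)
embedding = tsne.fit_transform(X)
print(embedding.shape)
```

Plotting `embedding` uncolored and then colored by cluster labels reproduces the two-column comparison used for the AE data. Recall that the axes of the plot carry no meaning.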

Framework precision
Matching rates and ARIs for AE data from three SiC/SiC specimens are presented (Tables 1-3) for partitions made from (i) sensors of the same model fixed on opposite ends of the specimen gauge and (ii) sensors of different models fixed at the same gauge location. Specimen 1 had a misaligned sensor on its larger epoxy tab that did not fully overlay the minicomposite; this specimen is included to show that the ARI metric can also be used to detect experimental issues of this kind. The ARI of zero for sensor pairs that include sensor S9225-b results from this improper placement.
From the high matching rates and ARIs obtained across specimens, we conclude that labels are not assigned randomly between sensors, based either on sensor model or on location. A corollary of this observation is that stochastic experimental effects that are known to influence frequency spectra, such as source-to-sensor distance 17 or proximity to a free surface 54 , do not drive label assignment when using the approach presented herein.
The clustered events at each individual sensor exhibit distinct activities in the stress domain, which are strongly characteristic of how damage progresses in a CMC (shown for Specimen 3 in Fig. 3). These clusters follow the established chronology of damage accumulation in CMCs, wherein cracks initiate in the SiC matrix early in the loading profile and evolve throughout the specimen lifetime. As such, the initially activated cluster is designated as matrix cracking. After the onset of major matrix cracking, fiber failure begins, and fibers continue to fail until rupture 55 . The secondary cluster becomes significantly active at 70-85% of the ultimate tensile strength (UTS), which agrees with experimental observations of fiber failure in SiC/SiC 11 .

Partial power trends
To further explore the hypothesis that frequency trends allow for discrimination between matrix cracking and fiber failure, it is useful to inspect the partial powers of AE signals (the input representation) as a function of stress. Select frequency bands were found to exhibit similar behaviors, as shown in Fig. 4. Specifically, AE events occurring at stresses >70% UTS exhibited tighter distributions in partial power. For the selected frequency bands, a two-sample Kolmogorov-Smirnov test shows that the partial powers sampled below 70% UTS come from a different distribution than those sampled above 70% UTS, at a significance level of α = 0.01 (Fig. 4). This decrease in partial power scatter coincided with the activation of the fiber failure cluster. However, it is currently unclear whether this correlation is (i) characteristic of the damage mechanism or (ii) characteristic of the differential strain-at-failure of the constituents 56,57 (e.g., a weak fiber failing at low strain is indistinguishable from matrix cracking at the same strain). The second possibility would still produce the stepped cluster activity observed, because the strains-to-failure of the fiber and matrix are disparate. The first possibility is more consequential, as it would indicate that the ML algorithm is learning the differences between dominant damage mechanisms. Given that the homogeneous orthotropic model of wave propagation does not predict meaningful frequency trends for elastically similar material systems, additional experimentation and explicit modeling of AE in SiC/SiC minicomposite geometries are needed to understand the physical origins of the observed trends.
Figure (panels a-d): The resultant identification of AE events closely follows CMC damage chronology, wherein early matrix cracking is later followed by fiber breaks, even though the approach is fully domain-knowledge agnostic. The cluster that becomes active at ≈85% of the UTS is labeled as fiber failure, consistent with experiment 11 . Additionally, the partitions were highly precise across both the model and location of the sensors, and the partitioning was repeatable.
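The below/above-70% UTS comparison corresponds to a standard two-sample Kolmogorov-Smirnov test. The sketch below uses SciPy's `ks_2samp` on hypothetical data. The stress fractions and the partial power in a single band are simulated, with tighter scatter above 70% UTS; none of these values are measured data.

```python
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(0)
# Hypothetical events: stress as a fraction of UTS, and the partial power in one band
stress_frac = rng.uniform(0.2, 1.0, size=800)
scatter = np.where(stress_frac > 0.7, 0.005, 0.02)   # tighter scatter above 70% UTS
partial_power = 0.1 + scatter * rng.standard_normal(stress_frac.size)

below = partial_power[stress_frac <= 0.7]
above = partial_power[stress_frac > 0.7]

result = ks_2samp(below, above)   # two-sided test by default
alpha = 0.01
print(f"D = {result.statistic:.3f}, p = {result.pvalue:.2e}")
print("different distributions at alpha = 0.01:", result.pvalue < alpha)
```

The KS statistic compares the empirical CDFs of the two samples. It therefore detects a change in spread even when, as here, the means are identical.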

Feature vector cluster structure
The partitioning of each AE event into its associated cluster is the result of intrinsic cluster-like structure and not an artifact of the chosen algorithm. This finding is evident in the visualization of the input feature vectors via t-SNE. In Fig. 5, the left-hand column shows the raw feature vectors plotted via t-SNE, and the right-hand column shows the same feature vectors colored according to the labels assigned by our ML framework. In Fig. 5a, b, the two clusters are sufficiently distinct to be visually identified in the unlabeled data. The unlabeled cluster structures in Fig. 5c, d are less readily discernible, yet still evident. This behavior is likely a result of dimension reduction rather than an intrinsic lack of cluster structure, as the validity metrics show that the events are given the same labels.

DISCUSSION
Damage mechanism identification in SiC/SiC CMCs from AE data is of interest for both lifetime prediction and SHM. While computational frameworks exist for damage mechanism identification, these are predominantly successful only when the composite constituents are elastically dissimilar. As such, differentiating between the dominant damage mechanisms in SiC/SiC architectures has remained a long-standing challenge.
In this work, we develop and evaluate an AE-ML framework to overcome these difficulties. A modified partial power representation scheme is adopted that allows inspection of local changes in frequency content. This representation scheme is combined with the spectral clustering algorithm, which is well suited for partitioning AE data. The framework is then applied to waveforms collected by our four-sensor configuration, which allows us to draw the following conclusions:
1. Damage mechanism identification in elastically similar composite systems is possible when salient features are chosen to represent AE waveforms and a suitable ML approach is applied.
2. Partial power is a salient feature for damage mechanism discrimination. It is not significantly perturbed by stochastic experimental factors, as evidenced by consistent labeling between sensors, independent of both location and model. Moreover, t-SNE plots show that AE data intrinsically adopt cluster-like structures when represented by this scheme.
3. There are meaningful frequency trends, not predicted by the orthotropic model of wave propagation, that enable damage mechanism discrimination in SiC/SiC CMCs. Further investigation is needed to determine their physical basis.
This AE-ML framework is domain-knowledge agnostic, yet when contextualized with domain knowledge, it is clear that cluster activities are consistent with a domain-based assessment of accuracy, indicating promising robustness of this approach. Still, a full characterization of this framework's behavior is needed before it can be more broadly applied. While precision of clustering results is high in our model minicomposite system, it is unclear how precision will be affected when this framework is applied to more complex CMC architectures, where more damage mechanisms are active. Additionally, it is not known how specimen geometry, architecture, and scale influence AE features 58,59 ; this is an active area of research. Answering both questions will require testing large-scale composites in conjunction with in situ microstructural observations for error quantification; this is the subject of future work. We hypothesize that the shift in partial power is a result of a shift in active damage mechanism from matrix cracking to fiber failure; however, further experimentation and modeling are needed.
Finally, it is desirable to understand the drivers for observed trends in partial power in order to remove irrelevant frequency bands from the feature space and increase precision. Future investigations will aim to address this.

METHODS
Experimental configuration
SiC/SiC minicomposites (Rolls-Royce High Temperature Composites, Cypress, CA) were loaded under uniaxial tension using a custom load frame built in-house with a 220 N load cell and a cross-head displacement rate of 0.120 mm/min, corresponding to a nominal strain rate of 7.5 × 10^−5 s^−1. Specimens consisted of 500 Hi-Nicalon™ Type S SiC fibers (HNS fibers), a BN interphase, and an overlayer of chemical vapor-infiltrated SiC matrix.
The microstructural and damage characteristics of these minicomposites are described in Swaminathan et al. 7 , where they are referred to as low fiber content (LFC) minicomposites. Three tension tests on LFC minicomposites are presented here to demonstrate the ML framework. AE activity was recorded using a four-channel fracture wave detector acquisition system (Digital Wave Corporation, Centennial, CO). The threshold voltage was set to 0.1 V, the number of pre-trigger points was set to 256, and the total length of signal captured was 1024 points at a rate of 10 MHz. Two sets of AE sensor models were mounted (Fig. 1) on Duralco 132 epoxy tabs with vacuum grease (Fig. 1b). The tab thickness was 0.35 mm, as measured from the surface of the minicomposite to the sensor. Tabs were created using three-dimensional-printed molds of 10 mm in diameter.
Fig. 5 …-a on the left and are subsequently colored on the right according to the labels given after spectral clustering as described in Fig. 3. While the t-SNE axes have no intrinsic meaning, well-formed clusters indicate that partitions were made according to similarity between feature vectors and are not an artifact of the clustering routine.
The choice of coupling medium is significant for waveform transmission through an interface. The transmission coefficients are determined by how closely the acoustic impedances of the specimen and the couplant match: as the impedance mismatch between the two increases, the transmission coefficient decreases 60 . Of the available medium choices examined, Duralco 132 exhibited the lowest impedance mismatch to SiC/SiC. Additionally, the thickness of the epoxy tabs controls the quality of data collected. Previous work has reported that transmission coefficients decrease as a function of coupling medium thickness 60 . Thinner epoxy tabs resulted in more consistent labeling, which is hypothesized to be a result of their higher transmission coefficients at all frequencies compared with thick (>1.5 mm) epoxy tabs.
AE signals were recorded using two miniature S9225 piezoelectric AE transducers (Physical Acoustics, Princeton, NJ) with a broadband response of 300-1800 kHz and two B1025 transducers (Digital Wave Corporation, Centennial, CO) with a broadband response of 50-2000 kHz. The sensors were designated S9225-a, S9225-b, B1025-a, and B1025-b, where the letter corresponds to the position on the minicomposite (Fig. 1). The acquisition system was synchronized; when one sensor was triggered, all sensors recorded waveform data simultaneously.
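The impedance-matching argument for the couplant can be made concrete with the textbook normal-incidence result for a plane wave crossing a single boundary. The intensity transmission coefficient is T = 4 Z1 Z2 / (Z1 + Z2)^2, where Z = ρc is the characteristic acoustic impedance. This is a simplified sketch: it treats a single interface between two half-spaces and ignores the couplant-layer thickness and frequency dependence also discussed above, which require a layered model. All impedance values are hypothetical placeholders, not measured properties of SiC/SiC or Duralco 132.

```python
def intensity_transmission(z1, z2):
    """Normal-incidence intensity transmission coefficient across a single
    boundary between half-spaces with acoustic impedances z1 and z2 (Z = rho * c)."""
    return 4.0 * z1 * z2 / (z1 + z2) ** 2


z_specimen = 30.0                            # hypothetical impedance (MRayl)
couplants = (3.0, 6.0, 12.0, 30.0)           # hypothetical couplant impedances (MRayl)
coefficients = [intensity_transmission(z_specimen, z) for z in couplants]
for z, coeff in zip(couplants, coefficients):
    print(f"Z_couplant = {z:5.1f} MRayl -> T = {coeff:.3f}")
```

T reaches 1 only for a perfect match and falls as the ratio Z1/Z2 departs from 1. This is the basis for choosing the couplant with the lowest impedance mismatch.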

Data processing
After acquisition, the raw AE data were cleaned to remove events not suitable as input to the ML framework. First, clearly identifiable noise events were removed (Fig. 6a). These events were either characterized by a single voltage spike in the time domain of the waveform or presented with energies <0.001 V (low signal-to-noise ratio). Recorded waveforms that showed multiple damage events within the same time window (Fig. 6b) were also not considered; <1% of all recorded waveforms were of type (b). If the signal saturated the sensor (Fig. 6c, d), the event was not considered. The majority of waveforms removed were of types (a), (c), and (d). Then a location analysis was performed to remove out-of-gauge events 29 , based on the sensor separation distance x, the difference in time-of-arrival Δt, and the difference in time-of-arrival for an out-of-gauge event, Δt_x, as a function of the damage parameter. This analysis ensured that only signals arising from in-gauge damage events were analyzed. During this process, it was found that the majority of type (a) waveforms (>90%) came from out-of-gauge events. In total, this cleaning removed approximately 35% of all recorded AE waveforms, including out-of-gauge events.
Fig. 6 Examples of removed waveforms. a Low signal-to-noise ratio, b two distinct damage events occurring in the same time window, and c, d events that saturated the AE sensor. The majority of removed waveforms were of types a, c, and d. Fewer than ten waveforms were of type b across the three specimens.
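An out-of-gauge filter of this kind can be sketched with the common two-sensor linear-location criterion. This is an assumption, not necessarily the exact expression of ref. 29. An event between the sensors has |Δt| < Δt_x, and its position from the midpoint is z = x·Δt / (2·Δt_x) for a constant wave speed v = x / Δt_x. The sensor separation and Δt_x values are hypothetical. In practice, Δt_x would be updated with the damage parameter, and the function names are illustrative only.

```python
import numpy as np


def in_gauge_mask(delta_t, delta_t_x):
    """Keep events whose arrival-time difference is smaller in magnitude than
    that of an event originating outside the gauge (|dt| < dt_x)."""
    return np.abs(np.asarray(delta_t, dtype=float)) < np.asarray(delta_t_x, dtype=float)


def linear_location(delta_t, delta_t_x, x):
    """Source position relative to the midpoint between two sensors separated by x,
    assuming a constant wave speed v = x / dt_x. With dt = t2 - t1, z = v * dt / 2,
    and positive z points toward sensor 1."""
    return 0.5 * x * np.asarray(delta_t, dtype=float) / np.asarray(delta_t_x, dtype=float)


# Hypothetical values: sensor separation and out-of-gauge arrival-time difference
x = 20e-3                      # m
delta_t_x = 2.0e-6             # s; would be updated as damage lowers the wave speed
delta_t = np.array([0.5e-6, -1.9e-6, 2.0e-6, 3.0e-6])

keep = in_gauge_mask(delta_t, delta_t_x)
print(keep)
print(linear_location(delta_t[keep], delta_t_x, x))
```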