Investigation into using resonant frequency measurements to predict the mechanical properties of Ti-6Al-4V manufactured by selective laser melting

There is a need to qualify additively manufactured parts that are used in highly regulated industries such as aerospace and nuclear power. This paper investigates the use of resonant ultrasound measurements to predict the mechanical properties of Ti-6Al-4V manufactured by selective laser melting using a Renishaw AM 250. It is first demonstrated why R2 should not be used to assess the predictive capability of a model, before introducing a method for calculating predicted R2, which is then used to assess the models. It is found that a linear model with the resonant frequency peaks as predictors cannot be used to predict elongation at failure or reduction in area. However, linear models did demonstrate better predictive capabilities for Young’s modulus, yield strength, and especially ultimate tensile strength.

www.nature.com/scientificreports www.nature.com/scientificreports/ statistical models and a discussion of those results. The final part of this work gives the authors' conclusions about the usefulness of resonant frequency data in predicting tensile properties.

Method
Experimental method. Selective laser melting. A batch of 60 mechanical test blanks was additively manufactured using a Renishaw AM 250 selective laser melter. The AM 250 uses a 50 ms pulsed ytterbium fibre laser of wave length 1060 nm. As these parts were primarily intended for fatigue testing, the part geometry was based on the 12.5 mm 2 cross section FCE type A cylindrical fatigue specimen detailed in BS EN 6072:2010 5 . To enable the parts to be machined to final size before mechanical testing, extra material was included by adding an offset of 0.5 mm to all surfaces in the CAD model. The parts were designed around a fatigue geometry to support another element of work under this grant, however tensile tests were used in this part of the work due to significantly narrower distribution of results obtained by tensile testing than by fatigue testing.
Following creation of the 3D CAD representation of the test geometry the file was processed using Materialise Magics. Specimen samples were located on the build plate as indicated in Fig. 1. Individual specimen ID's were extruded on to the top surface of each specimen. These specimens required no additional support structures, however, the bottom face was extruded by 0.5 mm to facilitate removal via wire electro-discharge machining (EDM) following completion of the build.
The parts were manufactured using a 30 μm layer thickness, and a meander scan strategy with a 67° hatching direction rotation between layers. The laser power, exposure time, point distance, and hatch distance values are given in Table 1. The mean travel speed of the laser spot is inferred by the exposure time, point distance, and inter-point travel speed. The powder used was Renishaw Ti-6Al-4V ELI-0406 powder, which has a nominal particle size range of 15-45 μm. The parts were built with the longitudinal axis parallel to the machine Z-axis.
The parts were removed from the build plate using EDM prior to annealing to enable resonant testing in both the as built and annealed conditions. The annealing heat treatment, which was performed in a vacuum furnace, was: 1. Evacuate the chamber. 2. Heat to 350 °C in 1 hour. 3. Hold at 350 °C for 1 2 hour. 4. Heat to 850 °C in 1 hour. 5. Hold at 850 °C for 1 hour. 6. Turn-off heating and allow to furnace cool. 7. Turn-off the vacuum pumps once the temperature was below 100 °C.
Resonant frequency measurements. The resonant ultrasound testing was performed using a system provided by Vibrant GmbH (Elz, Germany), which was designed for PCRT. The measurement system included a piezoelectric driver for inducing the resonance and two piezoelectric sensors for measuring the resonance. The driver and sensors were brought into contact with the part by placing the part in a purpose built jig that was provided with the measurement system (see Fig. 2). The primary purpose of the jig was to ensure consistent placement of the driver and sensors.  www.nature.com/scientificreports www.nature.com/scientificreports/ Resonance ultrasound measurements were taken from the parts in the as built state and after annealing. The frequency scanning ranges were selected by the RUT instrument manufacturer as being from 500 Hz to 226.5 kHz for the as built parts, and from 500 Hz to 600.5 kHz for the annealed parts. The measurement of 8 resonant frequency peaks was selected by the instrument manufacturer for parts in the as built state and 17 for after annealing. Further details about the instrument manufacturer's process for selecting the resonant frequencies can be found in Sloan et al. 2 . The driver and sensors were controlled by the instrument manufacturer's data acquisition system, which was in turn connected to a personal computer (PC) running the manufacturer's Quasar/Galaxy software. The Galaxy software analyzed the sensor data to identify the resonant frequencies of the part. When a peak could not be identified during data collection the software issued a warning message, in which case the scan was repeated for that part. This would sometimes enable the identification of all peaks, but would other times still lead to incomplete results, in which case the scan was repeated for a third and final time.
Tensile. To assess the mechanical properties of the samples, fifteen parts were sent to Exova Ltd (Middlesborough, UK) for room temperature tensile testing according to ASTM E8 6 . The samples were machined by the testing company prior to performing the tensile testing. The machined sample dimensions are according to specimen 4 of figure 8 in ASTM E8 6 for test specimens with a gauge length four times the diameter (gauge length = 16 mm, diameter = 4 mm).
Statistical method. Introduction to cross-validation, predicted R 2 , and the lasso. An often used statistic to assess the goodness of fit of a statistical model, and to select between different models, is R 2 . R 2 is the ratio of variability in the response variable that can be explained by the model to the total variability in the response variable. The total variability in the response variable is measured by the total sum of squares (TSS). The variability that can be explained by the model is equal to the TSS less the residual sum of squares (RSS) (1), where y i is the value of the i th response variable, ŷ i is the predicted value of the i th response variable, and y is the mean value of the response variables. R 2 is often seen as being relatively easy to interpret, as it always takes a dimensionless value between 0 and 1, with 0 indicating no fit whatsoever, and 1 indicating a model with a perfect fit to the data. However, a significant problem with R 2 is that the measure improves in the presence of over-fitting. Over-fitting occurs when the model complexity is increased in an attempt to reduce the residual error that is due to noise. Unfortunately, by reducing the residual error for the training set, this can increase the error when making predictions. For example, consider an experiment to measure the thermal expansion of a material over some range where the expansion is actually a linear function of temperature (l = 8.64 × 10 −6 ⋅ T + 1), but where the length measurement is subject to a Gaussian error. Table 2 shows some randomly generated data for such an experiment.
Due to the Gaussian noise in the measurement of the length, the residual error can be reduced by over-fitting. Figure 3 shows the case where both a linear model and a 9th order polynomial have been fitted to the ten data points of Table 2. The linear model (Eq. 2) has an R 2 value of 0.9447, indicating an excellent fit.   Using the R 2 measure, the polynomial model would appear to be better. But it is important to note that whilst the polynomial model is indeed a better fit for this training data, if this model is used to make predictions of sample length based on temperature, then on average the linear model will perform better. For example, the observed value at 30 °C is 1.000419 m. The polynomial model predicts a mean sample length of 1.000424 m at 30 °C. The linear model predicts a mean length of 1.000264 m at the same temperature. The true mean length of the underlying model used in the simulation at 30 °C is actually 1.000259 m. So whilst some repeat measurements at 30 °C may be closer to the value predicted by the polynomial model, on average the measurements at 30 °C will be closer to the value predicted by the linear model.
There are a number of available statistical techniques (see James et al. 7 for some examples) that can be used to assess the ability of a model to make predictions and help the designer to avoid over-fitting; one class of methods is cross-validation. In cross-validation the training data is first split into groups. Then all but one of the groups are used to fit the statistical model. The excluded group is then used as a test set to assess the performance of the fitted  Table 2. Randomly generated data for the measured length of a heated sample with Gaussian measurement noise. Observations Linear model Polynomial model Figure 3. Computer simulated measurements from a linear system with Gaussian noise. The linear model is an accurate model for the system, but is not able completely remove the residual error. The polynomial model is a poor approximation to the underlying regression, but through extreme over-fitting is able to almost eliminate the residual error. (2019) 9:9278 | https://doi.org/10.1038/s41598-019-45696-w www.nature.com/scientificreports www.nature.com/scientificreports/ model. The excluded group is then returned to the training set, and the process repeated with a different excluded group. This process continues until all groups have been used as the test set once. If the training data is split into groups of one, the method is referred to as leave-one-out cross-validation (LOOCV).
One metric that can be calculated using leave-one-out cross-validation to assess the predictive power of a model is the prediction sum of squares (PRESS) 8 . The PRESS is calculated as the sum of the squared differences between the actual value of the response variable (y i ) and the value predicted by the model fitted using the set excluding the value being predicted (ŷ i ( ) ) (4). The PRESS can also be extended to the more general case of cross-validation, in which case it is the sum of the squared differences between the actual response variable value and the value predicted by the model fitted using the test set excluding all members of the same group as the member being predicted. If the PRESS statistic is divided by the population size of the training set, this mean cross-validation error becomes an estimator of the test mean squared error (MSE).
Whilst the estimated test MSE is a widely used performance measure, interpretation can be difficult as the magnitude of the MSE will depend upon the magnitude of the response variable and will have squared units of the response variable. To put the PRESS statistic into perspective, it can be useful to compute the PRESS of the null model. The null model is the model with all of the coefficients set to zero, which is equal to the mean of the response variable. The PRESS of the null model (5) is the sum of the squared differences between the actual value of the response variable (y i ) and the mean value of the response variable excluding the samples from the same group as the sample currently being predicted (y i ( ) ).
The total sum of squares that appears in Eq. 1 can also be considered as the sum of squares of the null model, which makes R 2 a measure that compares the error (sum of squares) of the model to the error of the null model. Following this definition of R 2 , the predicted R 2 can be defined using LOOCV, which replaces the numerator with the PRESS of the model, and the denominator with the PRESS of the null model (6). It should be noted whilst the maximum value of predicted R 2 is one, unlike R 2 , predicted R 2 can take negative values. This will be the case if the model has a higher sum of squares error than the null model when tested using LOOCV. If the predicted R 2 is calculated for the example models in Fig. 3, the value for the linear model is 0.93, suggesting that the model is a good approximation for the underlying regression and can be used with some confidence to make predictions about sample length based on temperature. However, the predicted R 2 value for the polynomial model is −119, which indicates that the polynomial model is a poor approximation to the underlying system, and should not be used to make predictions about sample length based on temperature measurements.
In the case of a polynomial model with over-fitting, the over-fitting can be reduced by reducing the order of the polynomial without affecting the number of predictors used in the model. In the case of a multiple linear regression, LOOCV still enables the assessment of model over-fitting, but this then leads to the question of how select the predictors to remove from the model to reduce the over-fitting. There are a range of variable selection techniques that can be used, but the method used in this work is the lasso 9 (least absolute shrinkage and selection operator). The lasso (Eq. 7) introduces an extra term to the regression equation that is the sum of the absolute value of the coefficients multiplied by a tuning parameter (λ). This extra term is the lasso penalty, which has the effect of reducing the coefficients towards zero. This effect will be greater the larger the value of the tuning parameter. If the tuning parameter is equal to zero, the equation reverts to the standard equation for multiple linear regression. Statistical models. To analyze whether the resonant frequency data could be used to predict the tensile properties of the samples, multiple linear regressions were fitted to the frequencies of the resonant peaks (predictors) and each tensile property (response variable). The predictors were taken to be either the set of measured resonant frequencies for as built condition or the annealed condition. Given that the training set size was only 15 and there are 8 predictors (resonant frequencies) for the as built condition, and 17 predictors for the annealed condition, variable selection was performed using the lasso. In this analysis, the linear model with the lasso is calculated by co-ordinate descent using the Glmnet library 10 in the R programming language. The near-optimum tuning parameter is determined by calculating the lasso for a range of discrete values, and using LOOCV to calculate which value of λ gives the lowest estimate of test MSE. 1000 different values of λ between 10 3 and 10 −3 were tested in descending order, as given by (8). The ordinary least squares (OLS) multiple linear regressions and the multiple linear regressions with the lasso were calculated using the as built predictors. As the number of annealed predictors is greater than the training set size, only the multiple linear regressions with the lasso were calculated using these predictors.

Results
Experimental results. Tensile tests. The results of the room temperature tensile testing are presented in Table 3. AMS4999A 11 specifies minimum values in the Z-direction for additively manufactured Ti-6Al-4V of 855 MPa for ultimate tensile strength (UTS), 765 MPa for yield strength at 0.2% offset, and 5% elongation at failure. It can be seen from Table 3 that all samples met the minimum requirements for UTS and yield stress, but that seven of the samples failed to meet the required standard for elongation.
Resonant frequency measurements. The measured resonant frequency data for the as built state can be found in Table 4 and for the annealed state in Table 5. It should be noted that for one of the parts (FCE B6) one of the resonant frequency peaks could not be detected in the annealed condition. This part was excluded from the training set for the annealed model, which led to different training sets for the two conditions. It is for this reason that two null models were calculated for each tensile property.     www.nature.com/scientificreports www.nature.com/scientificreports/ presented graphically in Fig. 4, which for each tensile property plots the value of the response variable as predicted by the model built with that sample excluded against the actual value of the response variable. A line is also included in the plot to indicate where the points should lie if the predicted value equals the actual value.    www.nature.com/scientificreports www.nature.com/scientificreports/

Discussion
The results for all six models indicate that a multiple linear regression using resonant frequency peaks cannot be used to make predictions about elongation at failure or reduction in area. The two ordinary least squares models have predicted R 2 values less than zero. Two of the models using the lasso have reduced to the null model with no non-zero coefficients, and the remaining two models that use the lasso have predicted R 2 values of just 0.14 and 0.02.
For the Young's modulus, the yield strength, and the ultimate tensile strength, the models have varying levels of predictive capability. The ordinary least squares models using the as built predictors for the Young's modulus and the yield strength performed worse than the null model, having predicted R 2 values of −1.06 and −0.54. The equivalent model for predicting ultimate tensile strength performed better, having a predicted R 2 value of 0.18. For the models built using the as built predictors with the lasso, the predictive accuarcy was increased for all three tensile properties, with the model for ultimate tensile strength again demonstrating the greatest predictive accuracy. For the Young's modulus, the yield strength, and the ultimate tensile strength, the multiple linear regressions that gave the lowest prediction errors were those using the annealed frequency peaks with the lasso. Of these three tensile properties, the Yield strength model predictions had the greatest error, with only a modest improvement over the null model ( = . R 0 27 PRED 2 ). The model using the annealed predictors showed some ability in being able to predict the Young's modulus with = . R 0 54 PRED 2 . This would suggest, given the better performance of the UTS model, that although resonant frequency peaks can be used to make predictions about the Young's modulus of additively manufactured Ti-6Al-4V, a multiple linear regression using resonant frequency peaks is insufficient to properly model the relationship (bias error).
The models for predicting the ultimate tensile strength had the lowest error of all the multiple linear regressions. The model using the annealed predictors achieved = . R 0 74 PRED 2 , suggesting that this model may be useful for predicting the UTS. It is unclear whether the variance unexplained by the model is due to the inherent distribution of the response variable (variance error), or whether there is an element of bias error, either due to an incorrect model type, or the exclusion of predictors other than resonant frequency -for example the part temperature or mass.
It should be noted that for all of the models that have at least one non-zero coefficient that the value of R 2 is greater than the value of predicted R 2 . For some of the models the values of the two measures of R 2 are sufficiently close that the same interpretation of predictive ability would be reached from either measure. For example, the model to predict UTS using the as built predictors with the lasso has an R 2 of 0.78 and a predicted R 2 value of 0.71. It is quite possible that both of these results could be read to suggest that the model has some predictive capability. Similarly, the model to predict the reduction in area using the annealed predictors with the lasso has an R 2 value of 0.12 and a predicted R 2 value of 0.02. Again it is reasonable to assume that both of these results would be interpreted in the same way: this model has no predictive ability. However, for the ordinary least squares model using the as built predictors to predict reduction in area, the R 2 value of 0.69 could be understood to say that the model does have some predictive use, whereas the predicted R 2 value of −1.65 clearly indicates that no predictive use should be made of the model. This final example illustrates why the authors believe that predicted R 2 should be used in place of R 2 when considering the ability of a statistical model to make predictions.

Conclusion
It has been demonstrated that R 2 should not be used to measure the ability of multiple linear regressions to make predictions, as low residual error between model and training data does not necessarily translate to low prediction error when used with test data. The predicted R 2 value has been introduced, which is calculated using leave-one-out cross-validation, and is a more useful measure of the prediction accuracy of a statistical model. The calculated value of predicted R 2 has been used to assess the predictive ability of a series of linear models using resonant frequency peaks to predict the mechanical properties of samples built using selective laser melting.   www.nature.com/scientificreports www.nature.com/scientificreports/ The statistical results indicate that the resonant frequency peaks as measured in the samples either in the as built or the annealed states can be used to make predictions about the ultimate tensile strength of the samples. It should be recognized that although these two linear models were able to explain the majority of the variability in the measured values of UTS, the residual errors are not insignificant when compared to the limited scatter in the measured UTS.

AB Null AB OLS AB Lasso Ann Null Ann Lasso
The linear models were less able to accurately predict the measured value of Young's modulus than the UTS. However, a predicted R 2 value of 0.54 using the annealed resonant frequency data does suggest the possibility that a different type of model could potentially make useful predictions about Young's modulus using resonant frequency measurements.
There is very limited evidence to support the conclusion that resonant frequency measurements can be used with a linear model to predict yield strength. There is no evidence that a linear model can use resonant frequency data to predict either elongation at failure or reduction in area. It is slightly disappointing that the models were unable to make useful predictions about elongation or reduction in area, as these were the properties that were found to have the most variability in the mechanical testing results.