Abstract
A hybrid rule-based/ML approach using linear regression and artificial neural networks (ANNs) determined pitting corrosion descriptors from high-throughput data obtained with Scanning Electrochemical Cell Microscopy (SECCM) on 316 L stainless steel. Nonparametric density estimation determined the central tendencies of the Epit/log(jpit) and Epass/log(jpass) distributions. Descriptors estimated using conditional mean or median curves were compared to their central tendency values, with the conditional medians providing more accurate results. Due to their lower sensitivity to high outliers, the conditional medians were more robust representations of the log(j) vs. E distributions. An observed trend of passive range shortening with increasing testing aggressiveness was attributed to delayed stabilisation of the passive film, rather than early passivity breakdown.
Introduction
Despite considerable achievements in the predictive modelling of pitting corrosion^{1,2,3,4,5,6}, more research is still undoubtedly needed. The challenge of estimating relevant pitting descriptors from experimental data is still seldom addressed in literature^{7}.
Potentiodynamic polarisation (PP) curves are one of the main electrochemical techniques used for corrosion research in academia, also with a particularly high acceptance in the industry^{8}, as a benchmark test for examining the resistance to localised corrosion. As summarised by Hughes et al.^{1}: “the cyclic polarisation (CP) method, such as the standard ASTM G61^{9}, is probably the only standardised, traditional electrochemical method used to determine the relatively localised corrosion susceptibility. It involves the anodic polarisation of a specimen until localised corrosion initiates, as indicated by a large increase in the applied current. An indication of the susceptibility to initiation of localised corrosion in this test method is given by the potential (E) at which the anodic current increases rapidly, i.e., the breakdown potential. The nobler (more positive) this potential, the less susceptible the alloy is to initiate localised corrosion. The conventional understanding is that the breakdown potential is the potential above which pits are initiated”^{1}.
Not only do corrosion experts^{7,10} often rely on a rather qualitative description of the pitting potential (“Epit is defined as a potential above which there is a rapid increase in the current on a polarisation curve”^{10}), but the cited standard^{9} is also vague on how to extract the descriptor from PP (or CP) curves (“the potential in which a sharp rise in current is observed”). According to another standard, ISO 15158^{11,12}, “Epit is defined as the potential corresponding to the anodic current density of 10 μA cm^{−2} in the region of stable pit growth”. Nonetheless, such a definition, although quantitative, is potentially problematic since it is based on a fixed, static value and does not consider the likely high variability of responses.
Beyond the sensitivity to the concentration and combination of aggressive species^{13} and the scan rate^{14}, the determination of Epit was found to be dependent on the experimental method used^{15}. Simple potentiodynamic polarisation experiments have shown extremely variable results in pitting potential^{16}, exhibiting wide experimental scatters of hundreds of millivolts^{6}.
Previously, it was believed that Epit had a sharp threshold value below which all specimens would exhibit infinite immunity to pitting, and any observed data scatter was attributed to poor experimental control^{17}. However, Nathan and Dulaney^{18} were the first to challenge this notion, emphasising the importance of statistical approaches to localised corrosion. Subsequently, Shibata and Takeyama^{19} demonstrated that the random variation in data is an intrinsic property of pitting corrosion and should be analysed statistically. More recently, Nyby et al.^{7} precisely observed that the rapid increase in current density (j) occurs “when the applied potential is more noble than a specific range of values”. One research paper even argues that it is questionable that exact values of pitting potential can be experimentally measured^{10}.
Difficulties in determining a generalised value for Epit are associated with events (stable pitting growth) that are very dynamic in nature (e.g., high pit growth rates and extreme pit chemistry changes) and take place on a nanometre scale^{15}. Aggressive species (Cl^{−}), in combination with surface heterogeneities, trigger a dynamic degradation process in which transient passivity breakdown/repassivation events occur over a large population of initiated pits^{4}. The study of localised corrosion triggered by chloride remains a relevant topic within the scope of the targets set by the blue economy, as pitting corrosion is particularly harmful in marine environments and coastal areas^{20}. The development of advanced scanning electrochemical techniques, such as the scanning vibrating electrode technique (SVET) and scanning electrochemical microscopy (SECM)^{21}, has facilitated substantial progress in research on localised corrosion^{1}. Scanning Electrochemical Cell Microscopy (SECCM) is the next generation of the well-known electrochemical droplet cell technique^{22} and differs from the more commonly used SECM^{23,24}, as only small portions of a surface are exposed to electrolyte through brief meniscus contact from a nanopipet probe, and electrochemical signals are measured directly^{25,26}. In this work, the SECCM was selected as an experimental tool mainly due to its proven high-throughput capabilities^{22,27,28}. The collection of statistically representative amounts of data is key when high variance is expected in the target feature^{29}.
Only a limited number of works in localised corrosion have used data-driven approaches^{2,7,30,31,32,33,34,35,36,37,38} so far. The major reasons for their limited application are that the community has traditionally relied on low-throughput means for data generation, focusing on specific input-output relationships, and that feature engineering is complicated by the vast number of influencing variables. The conjecture of pitting corrosion should ideally be faced in the light of data-centric approaches. As shown in our previous work^{29}, the distributions of the local current density at potential regions associated with pitting are potentially uniform (high randomness). This means that, as the observation error tends to decrease with increased sample size^{39}, if only a few samples are considered, the actual underlying distributions are not captured (under-representation). Similarly to what has been done for ML modelling of corrosion inhibitors^{40,41,42,43,44,45}, the creation of structured databases for pitting corrosion is urged^{31,37,46}.
As explained by Weaver et al. in a 2022 communication on the unsupervised learning of voltammetric data^{47}, deviations from the model behaviour can significantly enhance the complexity of the data extraction. Therefore, instead of performing the task by hand, as traditionally done by electrochemists^{47}, there is an emerging call for recording data in (semi-)automated ways, including high-throughput screening^{48}.
This work elaborates on 5 datasets of log(j) vs. E (PP) curves obtained in a high-throughput fashion with the SECCM on 316 L stainless steel. We provide a methodology for estimating Epass (passive potential) and Epit from: 1. typical log(j) vs. E curves with a straightforward passivity breakdown (using an algorithm based on linear regression (LR)); 2. PP curves with more unique profiles mainly due to metastable events (using artificial neural networks (ANNs) trained on the LR estimates). The estimated Epit and Epass descriptors of 316 L are included in this article (Dataset 1, .ipynb files) and available to download in a public repository^{49}.
Furthermore, as the best estimate of the conditional distribution of y given x (log(j) given E) is not always a conditional mean (although this is most common^{50}), we also considered the analysis of quantile curves (the conditional median, in particular). The main advantage of conditional quantiles is to give a more comprehensive analysis of the relationship between E and log(j) at different points in the conditional distribution of log(j)^{50}. Therefore, we also propose a simplified methodology for determining the central tendency of the Epit/log(jpit) and Epass/log(jpass) distributions using the conditional median (or mean) of the log(j) vs. E curves. These proxy estimations were compared against the outputs of nonparametric density estimations, considered as the ground truth of the central tendencies of the descriptors. The related code is available (https://github.com/bcoelholeonardo/Estimatingpittingdescriptorsof316Lstainlesssteelbymachinelearningandstatisticalanalysis/tree/5c7c8eac41907667f94c22881650f23a6aee0d64), and is expected to serve as a toolkit for future localised corrosion works dealing with big data. The same code can be a basis for extracting meaningful descriptors for other potentiostatic or potentiodynamic experiments important in electrodeposition, electrocatalysis, and other electrochemical processes^{51,52}.
Many decades ago, Evans^{53} noted that studying the probability of corrosion is more practically important than determining the exact corrosion rate values. We expect to provide a foundation for the future development of monitoring tools (based on current or potential measurements) capable of predicting stable pitting with secure margins.
This work provides three main contributions: 1. a robust ML-based method for estimating Epit/log(jpit) and Epass/log(jpass) descriptors from individual polarisation curves; 2. an accurate proxy model (the conditional median of log(j)) for estimating the central tendencies of the descriptor distributions for a given dataset; 3. insights into localised corrosion mechanisms gained by interpreting the proxy models and also by selecting a subset of log(j) vs. E examples presenting the highest activities (high outliers).
Results and discussion
Density estimation of passivity and pitting descriptors
In this work, we were initially concerned with the problem of estimating conditional quantiles, as such analysis often yields further insights from the distributions of our random variable (log(j) given E)^{50}.
Figure 1 shows the kernel density estimations of the Epass/log(jpass) and Epit/log(jpit) for the 5 experimental datasets. The quantiles of the log(j) vs. E curves are superimposed in the plots to illustrate the high dispersion of both passivity and pitting descriptors. As a general trend, the high dispersion of the descriptors observed in the log(j) direction seems relatively constant for all sets, while the dispersion in the E direction seems to increase with the testing aggressiveness. The distributions of Epass/log(jpass) and Epit/log(jpit) generally extend as far as the corresponding Qmin and Qmax curves, except where individual outliers are present (such as in Fig. 1c, d). In any case, the distributions of the descriptors clearly spread beyond the so-called interquartile ranges (the IQR is the middle half of a dataset, comprising the range between the first and third quartiles).
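The idea of taking the maximum of a kernel density estimate as the central tendency of a descriptor pair can be sketched as follows. This is a minimal illustration rather than the code used in this work: it assumes a joint two-dimensional Gaussian KDE with SciPy's default bandwidth, a hypothetical helper named `kde_mode`, and purely synthetic (E, log(j)) values that do not reproduce any of the 5 datasets.

```python
import numpy as np
from scipy.stats import gaussian_kde

def kde_mode(e_values, logj_values, grid_size=100):
    """Return the (E, log j) grid point at which a 2D Gaussian KDE is maximal."""
    data = np.vstack([e_values, logj_values])
    kde = gaussian_kde(data)  # default bandwidth (Scott's rule); an assumption
    e_grid = np.linspace(data[0].min(), data[0].max(), grid_size)
    j_grid = np.linspace(data[1].min(), data[1].max(), grid_size)
    ee, jj = np.meshgrid(e_grid, j_grid)
    density = kde(np.vstack([ee.ravel(), jj.ravel()]))
    best = np.argmax(density)  # index of the highest-density grid point
    return ee.ravel()[best], jj.ravel()[best]

# Synthetic descriptor cloud (illustrative values only)
rng = np.random.default_rng(0)
e_pit = rng.normal(1.10, 0.03, 300)
logj_pit = rng.normal(-4.0, 0.4, 300)
e_mode, logj_mode = kde_mode(e_pit, logj_pit)
print(f"KDE maximum at E = {e_mode:.3f} V, log(j) = {logj_mode:.2f}")
```

The grid resolution limits the precision of the located maximum; a finer grid or a local optimiser could be used where that matters.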
Central tendency estimations of descriptors based on the mean and median models
While the estimation of descriptors based on the conditional mean of a distribution might serve, there exists a large class of problems (outlier detection^{54,55}, risk assessment^{56}) where estimating a quantile, such as the median, would be a better choice^{50}.
The same hybrid LR-based ANN approach employed on each individual log(j) vs. E curve was applied to the conditional mean/median curves of the populations, to estimate their central tendency values. Figures 2 and 3 show the Epass/log(jpass) and Epit/log(jpit) values obtained from the mean and median curves for extreme case datasets (the least and the most aggressive conditions, respectively). The results corresponding to the intermediate conditions are displayed in Supplementary Figs. 1, 2, 3. In plots a of Figs. 2 and 3, the conditional means (with their conditional standard deviations (SD) and errors (SE)) are plotted with the Epass/Epit estimates provided by the mean model; while plots b of Figs. 2 and 3 display the conditional medians (with their conditional median absolute deviations (MAD)) with the outputs of the median model (all estimates are represented by cross markers, in reference to “dart attempts”). In all of these plots, the ground truth for the central tendencies of the descriptors (maximum kernel density estimation (KDE) values) is plotted as a benchmark (represented by circle markers in reference to “target locations”).
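The conditional statistics referred to above can be computed pointwise along the potential axis. The sketch below is illustrative only: it assumes that all curves of a set have been sampled on a shared E grid and stacked in an array of shape (number of curves, number of potentials), and the function name `conditional_curves` and the synthetic curves are hypothetical.

```python
import numpy as np

def conditional_curves(logj):
    """Pointwise statistics of log(j) at each potential.

    logj: array of shape (n_curves, n_potentials) on a shared E grid.
    """
    n_curves = logj.shape[0]
    median = np.median(logj, axis=0)
    sd = logj.std(axis=0, ddof=1)
    q1, q3 = np.percentile(logj, [25, 75], axis=0)
    return {
        "mean": logj.mean(axis=0),
        "sd": sd,                                   # conditional standard deviation
        "se": sd / np.sqrt(n_curves),               # conditional standard error
        "median": median,
        "mad": np.median(np.abs(logj - median), axis=0),  # median absolute deviation
        "q1": q1, "q3": q3,
        "qmin": logj.min(axis=0), "qmax": logj.max(axis=0),
    }

# Synthetic set: 100 curves around a baseline, 5 of them shifted up by 3 decades
rng = np.random.default_rng(0)
e_grid = np.linspace(0.0, 1.2, 50)
base = -6.0 + 0.5 * e_grid
logj = base + rng.normal(0.0, 0.1, size=(100, e_grid.size))
logj[:5] += 3.0
stats = conditional_curves(logj)
print("largest mean - median gap:", (stats["mean"] - stats["median"]).max())
```

In this toy set the few shifted curves pull the conditional mean upwards at every potential, whereas the conditional median stays close to the baseline followed by most curves.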
It could be observed that the conditional mean of log(j), as a function of E, was generally a poor representative of a set of polarisation curves (in line with the log(j) distributions scrutinised in ref. ^{29}). As illustrated in Fig. 3a (0.05 M NaCl, 50 mV s^{−1}), the averaged values do not outline a typical polarisation curve. In this most aggressive condition, the Epass feature could not even be estimated from the mean curve. On the contrary, the conditional median seemed to better capture the expected overall behaviour of this set of curves (Fig. 3b).
By analysing the Epass/log(jpass) and Epit/log(jpit) ground truth values with respect to the conditional mean, one could see they were relatively far from these curves, mostly lying outside the standard error limits (plot a, Fig. 2; Supplementary Figs. 1, 2, 3). This mismatch between the “target locations” and the mean curves explains the mean model’s failure to provide accurate values for the descriptors (as those are estimated by fitting the conditional mean curves).
On the contrary, estimating the Epass/log(jpass) and Epit/log(jpit) values from the conditional median curve of a set provided an accurate alternative for obtaining their central tendencies. One could observe the median curves generally crossing (or at least touching) the “target” markers (plot b, Fig. 2, Supplementary Fig. 1, Supplementary Fig. 2, Supplementary Fig. 3); the only exception was the Epit in 0.05 M NaCl (50 mV s^{−1}), which was the most difficult value to be appraised out of the 10 descriptors considered (high uncertainty of j values in E regions associated with pitting^{29}).
Moreover, one could see that the data distribution (KDE) extended beyond the standard deviation (by comparing plots a of Fig. 2, Fig. 3, Supplementary Fig. 1, Supplementary Fig. 2, Supplementary Fig. 3 with Fig. 1). The analysis of the quantile curves was illustrative of the high data dispersion, with the descriptor distributions extending as far as the Qmin and Qmax curves in some cases.
Evaluation of the central tendency estimates based on residuals
At 0.05 M NaCl (50 mV s^{−1}), the estimated location of Epit from the conditional median curve lagged behind the ground truth of this descriptor for this set (Fig. 3b). When evaluating the accuracy of estimation, not only the absolute distance between the estimate and the actual value is relevant, but also the sign of that difference; in other words, the sign of the model bias.
Residual analysis provides a basis for diagnosis checking while assessing model biases. The following bar charts present the residuals of estimation of log(jpass), Epass, log(jpit) and Epit, as a function of testing corrosiveness (Fig. 4). The shorter the bar, the more accurate the model estimation. Again, the ground truth for the central tendencies of the descriptors was the maximum KDE values of their distributions. The horizontal line (residual equal to zero) represents the ideal benchmark, where a regression would be 100% accurate. Results from both the conditional mean and conditional median models are displayed.
When estimating the central tendency of passivity descriptors (Fig. 4a), the residuals of log(jpass) were consistently and significantly smaller for the median-based model than for the mean-based one. With respect to Epass, the median-based approach was also generally more accurate, the only significant exception being at 0.005 M NaCl, where the residual was nevertheless relatively small (−0.0067 V). Again, the residual comparison between both strategies is not possible for the most aggressive scenario (0.05 M NaCl, 50 mV s^{−1}), as the mean model could not even provide passivity descriptors. In the next most aggressive condition (0.05 M NaCl, 100 mV s^{−1}), the estimation errors related to log(jpass) and Epass were generally the largest for both models. Nonetheless, even in this case, the median model could reduce the estimation residuals by 54.2% and 73.2%, respectively, compared to the mean model.
Regarding the estimation of the central tendencies of pitting descriptors (Fig. 4b), the same overall observations made for the passivity features could be replicated here. First, the log(jpit) residuals of the median model were systematically lower than those of the mean model (reduction in residuals of 89.5, 77.4, 53.8, 25.8 and 97.1% for increasing testing aggressiveness). Secondly, the median model generally yielded smaller residuals than the mean for Epit; when not lower, the magnitude of the error was still acceptable (0.0047 V for 0.01 M NaCl (50 mV s^{−1})). The only case where the median estimator underperformed was for Epit in 0.05 M NaCl (50 mV s^{−1}).
As already mentioned, the picture was somewhat less clear in the most aggressive set, likely the most challenging condition for the regression of features. Nevertheless, in the addressed cases where the median model produced larger residuals than the mean model, it is important to note that the estimations were at least negatively biased (positive residuals). When attempting the prediction of pitting corrosion, as no model is perfectly accurate, a negative bias would be favoured over a positive bias: underestimation of Epit (or log(jpit)) is preferable to its overestimation. In other words, if estimation errors are unavoidable to a certain degree, it is more desirable to have stable pitting growth occurring at higher potentials (or current densities) than expected, as the opposite situation would imply catastrophic failure based on overly optimistic predictions.
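The residual bookkeeping discussed above can be written compactly. The sketch below assumes that a residual is defined as ground truth minus estimate (so that a positive residual corresponds to an underestimation, in line with the sign convention stated above) and that the reduction in residuals is the relative decrease in absolute residual; both definitions and the numerical values are illustrative assumptions, not results of this work.

```python
def residual(ground_truth, estimate):
    """Assumed convention: positive residual means the model underestimates."""
    return ground_truth - estimate

def residual_reduction(res_mean, res_median):
    """Relative decrease (%) in absolute residual of the median vs. mean model."""
    return 100.0 * (abs(res_mean) - abs(res_median)) / abs(res_mean)

# Hypothetical Epit values in V (not taken from the datasets)
e_pit_truth = 1.10        # maximum-KDE ground truth
e_pit_mean_model = 1.16   # estimate from the conditional mean curve
e_pit_median_model = 1.08 # estimate from the conditional median curve

r_mean = residual(e_pit_truth, e_pit_mean_model)
r_median = residual(e_pit_truth, e_pit_median_model)
reduction = residual_reduction(r_mean, r_median)
print(f"mean model residual:   {r_mean:+.3f} V (overestimation)")
print(f"median model residual: {r_median:+.3f} V (underestimation)")
print(f"reduction in residual: {reduction:.1f}%")
```

In this made-up case the median model is both closer to the ground truth and biased towards the preferable side, since it predicts pitting at a lower potential than the one at which it actually occurs.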
The magnitude and sign of the estimations of log(jpass) and log(jpit) can be appraised in Fig. 5a, in which the “passive current density ranges” are defined by the yellow bars. One can judge that the median model (green markers) outperformed the mean model (blue markers) in accurately estimating the central tendency ground truth values for both current density descriptors.
Similarly, Fig. 5b presents the estimates of Epass and Epit compared against the corresponding “passivity ranges” (yellow bars). Both models produced relatively tight errors, although the median model resulted in overall better estimations. In general, the residuals of estimation were proportionally lower for the E descriptors than for the log(j) ones (further discussed in the section “Proxy models for estimating the central tendency of pitting descriptors”, including the analysis of coefficients of variation). As mentioned above, when the median model underperformed, at least an underestimation of Epit was verified (“preferable sign of bias” indicated in the plot). If one task is utterly crucial for the models, this would be the estimation of the Epit feature.
Larger errors of estimation generally occurred in the two most aggressive conditions (0.05 M NaCl media) (considering all descriptors in Fig. 5), while the errors related to log(jpass) and log(jpit), in particular, increased with the testing corrosiveness (Fig. 5a).
This investigation provides a solid and simplified framework for estimating the central tendency of passivity/pitting descriptors. Instead of individually assessing an entire set of log(j) vs. E curves, estimating Epass/log(jpass) and Epit/log(jpit) from the conditional median curve can provide satisfactory outcomes, assuming that the data size is large enough. In the present case, all estimations of Epass and Epit (either from the conditional median of log(j) of a set or from the individual log(j) vs. E curves) were done using the same hybrid LR/ANN approach for a fair basis of comparison. By doing so, the authors avoided introducing additional sources of bias to the estimations. Nonetheless, other simplified median-based methods could be thought of (an expert could even proceed with “by hand” selection^{47} of Epass/Epit).
Interpreting the higher robustness of the median model
In the case of polarisation curves displaying pitting corrosion, the conditional median of log(j) was qualitatively shown to be representative of the population of curves. As exemplified in Fig. 6, the location of the conditional median curve (plot b, green curve) largely coincided with the regions with the highest data density in the corresponding log(j) vs. E plot (plot a). Figure 6 illustrates the effect of high outliers (log(j) vs. E curves lying more than 1.5 times the IQR above Q3) on the conditional mean of log(j). The result is the shift of the conditional mean to log(j) values consistently higher than the conditional median curve.
The difference between the conditional mean and the conditional median tends to increase with E, because the high outliers present particularly high log(j) values at high potential regions (E > 1.15 V (vs. Ag/AgCl)). As demonstrated in ref. ^{29}, the log(j) distributions become more positively skewed with increased corrosiveness (more positive potential and higher [Cl^{−}]). The occurrence of high outliers with particularly high j values at high E regions results from pitting corrosion processes. Indeed, applying more positive potentials increases the likelihood of metastable pitting (accompanied by repassivation events), which may gradually change into stable pitting growth.
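The high-outlier criterion described above (more than 1.5 times the IQR above Q3) can be sketched as a simple Tukey fence. The source does not spell out which per-curve quantity the fence is applied to, so the example below assumes that each curve is scored by its mean log(j); the function name `high_outlier_mask` and the synthetic data are hypothetical.

```python
import numpy as np

def high_outlier_mask(logj):
    """Flag curves whose mean log(j) exceeds Q3 + 1.5 * IQR of all curve means.

    logj: array of shape (n_curves, n_potentials). Scoring curves by their
    mean log(j) is an assumption made for this illustration.
    """
    scores = logj.mean(axis=1)
    q1, q3 = np.percentile(scores, [25, 75])
    return scores > q3 + 1.5 * (q3 - q1)

# Synthetic set: 100 curves, the first 3 shifted up by 3 decades
rng = np.random.default_rng(0)
logj = rng.normal(-6.0, 0.1, size=(100, 50))
logj[:3] += 3.0
mask = high_outlier_mask(logj)
print("number of high outliers flagged:", int(mask.sum()))
```

Removing the flagged curves before averaging would bring the conditional mean closer to the conditional median, which is one way to visualise how strongly the mean depends on a handful of highly active sites.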
As stated by Koenker^{50}, assessing a set of conditional quantile curves provides a more informative description of the relationship among variables, especially in cases of: 1. non-constant variance; 2. non-normality of the noise distribution. The described picture illustrates well the datasets in question, in which: 1. the log(j) distributions are heteroscedastic (increased conditional variance as a function of E and testing aggressiveness)^{29}; 2. the anomalous high j values at high potentials, observed for the high outliers (Fig. 6), could be seen as “noise”, as they positively skewed conditional means that would otherwise (in the absence of pitting) be normally distributed, as expected for electrochemical descriptors derived from PP curves of passive systems^{57,58}. If the pitting activity could be considered as “noise”, it would undoubtedly imply the referred “non-normality of the noise distribution”, as most of the population (grey curves in Fig. 6b) would have relatively low and similar noise levels with only a few examples (the high outliers) displaying significantly higher levels of noise.
To illustrate the positive skewness of log(j) beyond their conditional distributions, Fig. 7 displays the histograms of the log(jpass) and log(jpit) descriptors for the 5 datasets. From Fig. 7, it is confirmed that the medians of the log(jpass) and log(jpit) descriptors are more representative of the underlying distributions than the respective means, as the former were closer to the ground truth central tendencies of the distributions (maximum KDE values). Similar to what was observed for the conditional log(j) distributions (Fig. 6), the means of the descriptors were generally higher than the corresponding medians due to the presence of high outliers (Fig. 7). The only exception (median larger than mean) was at 0.05 M NaCl (100 mV s^{−1}) (Fig. 7d), where low outliers were more prominent than the high outliers (the lowest conditional Qmin curve was computed for this dataset, Fig. 1d).
These statistical analyses further explain why quantile analysis (the conditional median of log(j), in particular) provided a robust model for a simplified estimation of passivity/pitting descriptors. In future investigations on predictive ML, instead of traditional least-squares regression, quantile (or robust) regression^{59,60} might be a promising route for approaching pitting corrosion^{50,61}. As a perspective, the analysis of the quantile curves (Fig. 1) might also help locate data clusters (preliminarily defined before the application of the rule-based algorithm).
Effect of corrosiveness on the pitting susceptibility
Comparison of the central tendency values of the log(j) descriptors did not indicate a clear trend with increased testing aggressiveness (Fig. 7). On the contrary, by comparing the distributions of Epit (and Epass) (Fig. 8), a few tendencies as a function of corrosiveness could be appraised. First, as previously determined for the conditional log(j)^{29}, the higher the aggressiveness, the more spread the distributions of the E descriptors tend to be (clear trend in Fig. 8, from b to e). Secondly, all these distributions were continuous and roughly unimodal, with increased multimodality with corrosiveness, ultimately leading to a uniform function-like behaviour at 0.05 M NaCl, 50 mV s^{−1} (Fig. 8e). Despite Shibata and Takeyama’s conclusion that the random variation of the Epit of stainless steels (in macroscale polarisation) obeys a normal distribution^{6,19}, all the achieved unimodal distributions failed to be formally described as normal, even with the removal of outliers (for illustration purposes, the normal curves plotted in Fig. 8 were fitted to data without outliers). Even when considering the highest p-values obtained (from D’Agostino and Pearson’s test), these values consistently equalled or were lower than the significance level: 0.00 and 0.03 (0.05 M NaCl, 50 mV s^{−1}), 0.05 and 0.00 (0.05 M NaCl, 100 mV s^{−1}), 0.00 and 0.00 (0.01 M NaCl, 50 mV s^{−1}), 0.00 and 0.00 (0.01 M NaCl, 100 mV s^{−1}), 0.01 and 0.00 (0.005 M NaCl, 100 mV s^{−1}). In summary, the null hypothesis “the data is normally distributed” was consistently rejected for all sets based on the various normality tests employed.
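A normality check of the kind described above can be sketched with SciPy, whose `scipy.stats.normaltest` implements D'Agostino and Pearson's test. The significance level of 0.05 is inferred from the p-values quoted above, the rejection rule (p equal to or lower than the significance level) follows the wording of the text, and the skewed sample is synthetic; the helper name `normality_rejected` is hypothetical.

```python
import numpy as np
from scipy.stats import normaltest

def normality_rejected(values, alpha=0.05):
    """D'Agostino-Pearson test: reject normality when p <= alpha (assumed rule)."""
    statistic, p_value = normaltest(values)
    return p_value <= alpha, p_value

# Synthetic, positively skewed sample standing in for a descriptor distribution
rng = np.random.default_rng(1)
skewed = rng.lognormal(mean=0.0, sigma=1.0, size=200)
rejected, p_value = normality_rejected(skewed)
print(f"p = {p_value:.3g}, normality rejected: {bool(rejected)}")
```

The test combines skewness and kurtosis, so it is sensitive to both the asymmetry introduced by high outliers and the heavy tails of strongly dispersed sets.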
Most importantly, the Epass distribution generally presented a linear increase as a function of the considered testing corrosiveness, while the Epit distribution displayed a relatively constant behaviour in the least aggressive conditions (Fig. 8a–c), with a pronounced decrease in the most aggressive scenario (Fig. 8e). These trends could be appraised by following the evolution of the descriptors’ maximum KDE, as indicated by the two arrows in Fig. 8. It should be noted that the 0.05 M NaCl set (100 mV s^{−1}) (Fig. 8d) again behaved as a group outlier (for the same reason as elaborated above for the log(j) descriptors, Fig. 7d).
As expected, the combined effect of the Epass and Epit progression with increased corrosiveness resulted in an overall decrease in the passivity range. Analysis of Fig. 8 also reveals that this passivity range shortening was generally more affected by the increase in Epass than by the decrease in Epit. Although corrosionists often pay more attention to the upper end of the passivity range, framing Epit as the main predictor of passivity breakdown, our data-driven analysis suggests that the robustness against pitting (related to the passivity range^{1,62}) would be highly sensitive to the lower end of passivity (Epass). That is, the trend of passive range shortening seemed to be primarily influenced by the delayed stabilisation of the passive film rather than its early disruption. For instance, based on XPS measurements, Cl^{−} was reported to cause thinning of the Fe passive film, even under conditions where pitting did not occur (passivity)^{63}. Based on the film-breaking mechanism of pit initiation, the thin passive film is in a continual state of breakdown and repair^{64,65}; and in chloride media, there would be a lower likelihood for such a breakdown to heal (inhibition of repassivation by chloride)^{6}.
In Shibata’s stochastic theory of pitting corrosion^{19}, the coefficient of variation (CV) of Epit was calculated for the polished 316 stainless steel dataset, and a value of 9.6% was obtained. In our cases, the following CV values were obtained for the Epit distributions, from the least to the most aggressive conditions: 4.3, 1.8, 11.1, 11.4 and 32.7%. If we remove the uncertain Epit values assigned as 0.5 V by default, the obtained CVs were even lower: 2.6, 1.8, 3.0, 7.0, and 7.9% (highest variations obtained at 0.05 M NaCl, 50 mV s^{−1}, as expected). Interestingly, the CV values obtained were lower than the 9.6% reported for 316 under classic PP^{19}, which is somewhat surprising, as our Epit values, derived from microscale PP measurements, are more sensitive to local surface heterogeneities. The reasons for the overall low relative variability of our sets as compared to the referred benchmark^{19} might be related to: the 316 L grade being more resistant to corrosion than the 316; the expected higher surface quality of our electropolishing in comparison to (2/0) emery polishing; our Cl^{−} media being at least 1 order of magnitude less aggressive than their 3.5% NaCl solution; and, most importantly, our large number of samples, generally over one hundred per set (in ref. ^{19}, estimated to be only ~20). As a higher CV might indicate a greater degree of uncertainty in the shape of the underlying distribution, our sets arguably provide more representative distributions of Epit as compared to Shibata’s set^{19} (most likely resulting from the referred discrepancy in the sample sizes).
In any case, the CVs calculated for the log(jpit) without considering the 0.5 V data points (10.0, 8.1, 10.3, 21.8 and 17.1%) were much larger than the corresponding values determined for Epit (3.8, 4.5, 3.4, 3.1 and 2.2 times larger, respectively), as a function of aggressiveness. Since the CV is a statistical measure that represents the relative variability in a set, this quantitative outcome illustrates why the comparison of the distributions across the sets was less straightforward for log(jpit) (Fig. 7) than for Epit (Fig. 8), and also explains the generally more accurate estimation of the E descriptors than of the log(j) descriptors (Figs. 4 and 5).
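The coefficient of variation used above, and the effect of excluding the default 0.5 V entries, can be sketched as follows. The use of the sample standard deviation (ddof=1) is an assumption, the helper name `coefficient_of_variation` is hypothetical, and the small Epit array is made up for illustration; only the two lists of CV values are taken from the text, to check the quoted ratios.

```python
import numpy as np

def coefficient_of_variation(values, default_value=None):
    """CV in %, optionally dropping entries equal to a default placeholder."""
    values = np.asarray(values, dtype=float)
    if default_value is not None:
        values = values[~np.isclose(values, default_value)]
    return 100.0 * values.std(ddof=1) / abs(values.mean())

# Hypothetical Epit values in V; 0.5 V marks curves without a determinable Epit
e_pit = np.array([1.10, 1.12, 1.08, 1.11, 1.09, 0.5, 0.5])
cv_all = coefficient_of_variation(e_pit)
cv_filtered = coefficient_of_variation(e_pit, default_value=0.5)
print(f"CV with defaults: {cv_all:.1f}%, without: {cv_filtered:.1f}%")

# Ratios between the reported log(jpit) and Epit CVs (values quoted in the text)
cv_logj_pit = np.array([10.0, 8.1, 10.3, 21.8, 17.1])
cv_e_pit = np.array([2.6, 1.8, 3.0, 7.0, 7.9])
ratios = np.round(cv_logj_pit / cv_e_pit, 1)
print("log(jpit) CV / Epit CV:", ratios)
```

The ratios reproduce the factors of 3.8, 4.5, 3.4, 3.1 and 2.2 quoted above, and the toy array shows how a few placeholder values can inflate the CV of an otherwise narrow distribution.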
To further assess the different susceptibility to pitting among the sets, as suggested in Fig. 8, we extended the comparative analysis to a selection of the largest log(jpit) values achieved (Fig. 9), in line with the “weakest link” theory applied to pitting corrosion^{19}.
The “weakest link” concept has allowed advances in the statistical strength theory^{66,67}, explaining the high randomness observed in fracture stress resulting from flaws with varying dimensions in a solid material^{68}. The stochastic approach developed for the effect of the body volume on fracture stress^{69}, and often applied to describe sensitive structure properties such as fatigue life^{68}, was generalised to pitting corrosion^{69,70}. Concerning failure by pitting, the presence of a precursor or active state in the film is responsible for pit generation^{19}. Extreme value analysis developed by Gumbel^{71} was applied to pitting corrosion of Al by Aziz^{72} and Eldredge^{73}.
Likewise, the “weakest link” concept applies to other electrochemical fields beyond pitting corrosion. For instance, in electrodeposition, nucleation of a new phase on a foreign substrate is primarily driven by the most active sites, similar to pit nucleation. In both cases, the growth of a film (or pit) is driven by the most active sites where nucleation (or initiation) preferably takes place. Recently, we demonstrated that the macroscopic electrodeposition response, described by the onset potential for nucleation, corresponds to that of the most active sites (more positive onset potentials) of a distribution of hundreds of voltammetric curves obtained by SECCM^{51,52}.
Hence, in our “weakest link” problem, one could expect the most active sites to ultimately determine the overall macroscopic electrochemical response in the pitting corrosion of stainless steel. To trace the most active pits, the four highest log(j) vs. E curves of each set were plotted with their corresponding Epit/log(jpit) pairs of estimates (Fig. 9). The higher the testing aggressiveness: 1. the higher the “top 4 highest log(j) vs. E curves” (ranking based on the mean of log(j)); 2. the lower the Epit and the higher the log(jpit). For a few particularly active log(j) vs. E curves, the Epit could not be determined due to a large uncertainty or the absence of a passivity breakdown (Epit modelled as 0.5 V by default). The same tendency was observed when examining the “top 4 conditional jmax curves as a function of E”, as presented in Supplementary Fig. 4. To conclude, by selecting only high-j examples, we corroborate the notion (seen as tacit knowledge in macroscale polarisation) that Epit substantially drops with corrosiveness.
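The ranking used to select the “top 4” curves (mean of log(j) per curve) can be sketched as follows. This is a minimal illustration on synthetic data; the array layout (one curve per row) and the function name top_k_curves are assumptions made for the example.

```python
import numpy as np

def top_k_curves(log_j, k=4):
    """Indices of the k curves with the highest mean log(j), highest first."""
    log_j = np.asarray(log_j, dtype=float)
    mean_per_curve = log_j.mean(axis=1)          # one mean per curve (row)
    return np.argsort(mean_per_curve)[::-1][:k]

# Synthetic population: 50 curves x 200 potential steps (placeholder values)
rng = np.random.default_rng(0)
curves = rng.normal(size=(50, 200))
curves[[3, 17, 29, 41]] += 5.0                   # four artificially "active" curves
top4 = top_k_curves(curves)
```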
As the range of applicability of the models is somewhat restricted to the limits of the training data, modelling based on local techniques is hardly generalisable. Additional log(j) vs. E datasets obtained under varied conditions (macro experiments, different substrates, alternative PP parameters, etc.) would be recommended for evaluating the robustness of the developed modelling methodology. Particular efforts should be focused on validating the approach extended to classic potentiodynamic polarisation curves. Our estimation modelling strategy is expected to perform well on the macroscale, where less variability is anticipated. In fact, as only the most active pitting sites (Fig. 9) would drive the resulting overall corrosion behaviour of a macro surface, they would likely dominate the intensity of the electrochemical signal measured, thus possibly resulting in less data dispersion. Whether the conditional median of log(j), as a proxy model for estimating the central tendencies of pitting descriptors from a population of microscale polarisation curves, would also be accurate in macroscale experiments warrants further investigation.
In conclusion, our hybrid rule-based/ML approach, combining an LR-based algorithm with a supervised ANN, was able to determine relevant pitting corrosion descriptors of electropolished 316 L from populations of localised polarisation curves with different testing aggressiveness. The rule-based LR provided initial estimates of Epit (or Epass) descriptors by fitting two independent linear regression lines to smoothed polarisation curves. However, unsatisfactory results were observed for some sets, indicating the need for an improved estimation strategy. To address this limitation, we leveraged supervised deep learning: the ANN was trained on sets with satisfactory estimates and then deployed on the sets with unsatisfactory estimates, significantly improving the estimation task. The ANN model was designed with feature engineering methods for selecting input features representative of the researched behaviour (passivity or pitting). To ensure the network’s ability to handle the complexity of the data, a data reduction step was performed, reducing the curves to a selection of log(j) values linearly spaced in terms of their “potential distances”. The training process involved hyperparameter tuning and pruning to achieve satisfactory validation performances. The resulting ANN accurately mapped the relationships in the data, with final R² values of 0.90–0.97, supporting its generalisation ability on the unseen (unlabelled) data (our method was somewhat similar to active learning). Throughout this study, ML played a pivotal role in addressing the challenges of estimating pitting-related predictors and interpreting high-throughput data for improved mechanistic understanding. Looking ahead, the potential of ML as a framework for pitting prediction is promising. The automated extraction developed for pitting descriptors could be extended to larger datasets and diverse experimental conditions, seeking improved robustness and generalisability of the models.
Methods
The employed substrate and the data acquisition methodology were identical to the ones described in ref. ^{29}. Briefly, an industrial electropolished 316 L stainless steel sample was subjected to potentiodynamic polarisation (PP) tests using an SECCM platform in hopping-mode protocol^{27,28}. Five different combinations of [NaCl] and voltammetric scan rates were employed: 0.005 M NaCl (100 mV s^{−1}), 0.01 M NaCl (100 mV s^{−1}), 0.01 M NaCl (50 mV s^{−1}), 0.05 M NaCl (100 mV s^{−1}) and 0.05 M NaCl (50 mV s^{−1}). Single-barrel pipets (borosilicate) with a final internal circular diameter of ~2 µm were used as SECCM probes. The starting potential was –0.5 V, and the end anodic potential was 1.355 V (vs. Ag/AgCl) (LabVIEW (2019, National Instruments) interface running Warwick’s software (WEC-SPM, www.Warwick.ac.UK/electrochemistry)).
Data analysis
The code for data processing and visualisation was written in Python 3.8 and is available on GitHub (as Jupyter Notebook files): https://github.com/bcoelholeonardo/Estimatingpittingdescriptorsof316Lstainlesssteelbymachinelearningandstatisticalanalysis/tree/5c7c8eac41907667f94c22881650f23a6aee0d64.
The log(j) vs. E datasets: this work employed the same datasets reported in ref. ^{29}. The only difference is that any missing values were filled with an iterative imputer. The IterativeImputer class (from sklearn.impute) models each feature with missing values as a function of the other features and uses that estimate for imputation. All 955 data samples (polarisation curves) with this update (filled missing values) are accessible at Mendeley Data^{74}.
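A minimal sketch of this imputation step is given below, on a small synthetic array standing in for the log(j) curves (one curve per row, one potential step per column). The array sizes, the max_iter and random_state settings, and the positions of the missing values are assumptions for illustration; the settings used in the study are not stated here. Note that scikit-learn requires the experimental import shown before IterativeImputer can be used.

```python
import numpy as np
# IterativeImputer is experimental in scikit-learn and must be enabled explicitly
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer

rng = np.random.default_rng(0)
# Hypothetical stand-in for log(j) curves: 30 curves x 8 potential steps
base = np.linspace(-1.0, 1.0, 8)
curves = base + rng.normal(scale=0.05, size=(30, 8))

# Introduce two missing values (placeholder positions)
curves_missing = curves.copy()
curves_missing[2, 3] = np.nan
curves_missing[10, 6] = np.nan

# Each column with missing values is modelled from the other columns
imputer = IterativeImputer(max_iter=10, random_state=0)
curves_filled = imputer.fit_transform(curves_missing)
```

Only the missing entries are replaced; observed values are passed through unchanged.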
As in ref. ^{29}, the datasets considered were sliced upward from 0.5 V (considerably more positive than the open circuit potential (OCP)). Passivity was presumably reached at potentials less positive than 0.5 V for only a few curves, while in a few other cases passivity was not observed within this potential range. In those respective cases, as an approximation, the 0.5 V value was assigned to Epass and, where applicable, to Epit (early passivity and pitting occurrences, respectively). The self-passivation behaviour of 316 L (relative passive state already at OCP) was not considered.
Supervised hybrid rule-based/machine-learning algorithm
A deterministic rule-based algorithm, based on linear regression (LR), was developed to estimate Epit/jpit (or Epass/jpass) descriptor pairs (continuous values) from polarisation curves. Figure 10 schematically shows the overall hybrid rule-based/ML approach employed, comprising the LR steps. As illustrated in the plots with borders (in green and grey), two independent linear regression lines fit the smoothed data, one fitted line starting at “low E” and the other ending at “high E” values (~0.7 and ~1.25 V, respectively). The obtained location for Epit (or Epass) is the one that maximises the sum of the R² (goodness of fit) for the two LRs.
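The two-line criterion described above can be sketched as a search over candidate breakpoints. The example below is a simplified illustration, not the dataset-specific code of the study: the function names, the min_points guard, the exhaustive search and the noise-free synthetic curve are all assumptions made for the example.

```python
import numpy as np

def r_squared(x, y):
    """R² of an ordinary least-squares line fitted to (x, y)."""
    slope, intercept = np.polyfit(x, y, 1)
    ss_res = np.sum((y - (slope * x + intercept)) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return 1.0 - ss_res / ss_tot if ss_tot > 0 else 0.0

def find_breakpoint(e, log_j, min_points=10):
    """Index where the summed R² of the 'low E' and 'high E' lines is maximal."""
    best_idx, best_score = None, -np.inf
    for idx in range(min_points, len(e) - min_points):
        # One line from the start up to idx, the other from idx to the end
        score = r_squared(e[:idx + 1], log_j[:idx + 1]) + r_squared(e[idx:], log_j[idx:])
        if score > best_score:
            best_idx, best_score = idx, score
    return best_idx, best_score

# Synthetic smoothed curve: gentle passive slope, then a steep rise above 1.0 V
e = np.linspace(0.7, 1.25, 111)
log_j = np.where(e < 1.0, 0.2 * (e - 0.7), 0.06 + 8.0 * (e - 1.0))
idx, score = find_breakpoint(e, log_j)
e_pit, log_j_pit = e[idx], log_j[idx]
```

On real (noisy) curves, the position of the maximum depends on the smoothing and on the chosen “low E” and “high E” limits, which is one reason the rules needed to be adapted per class of curves.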
This handcrafted method (rules manually created by domain experts to define the system’s behaviour) is adaptive, requiring user input to define threshold values for the potential (thereby separating the “low E” and “high E” regions to which the LRs are separately applied). The different E thresholds define the existing groups (classes) of log(j) vs. E curves. The definition and validation of classes were qualitatively based on the degree of similarity among the PP curves^{7}. Therefore, the developed code is dataset-specific, with specific conditional statements for the different classes.
In summary, the described method was our initial labelling strategy, providing labels (target attributes) to the unlabelled data. The labels were validated by visual examination of the Epit/log(jpit) (or Epass/log(jpass)) estimates in the individual curves. Although the LR-based model generally led to satisfactory estimates, unsatisfactory results were observed in some cases for the 0.05 M NaCl (100 mV s^{−1}) and 0.005 M NaCl (100 mV s^{−1}) sets (exemplified in the plot with grey border (Fig. 10)).
In the case of unsatisfactory estimates, instead of further handcrafting the algorithm to improve the performance of the linear regressors, the strategy was to employ a supervised ANN for this task. The ANN was trained on the set of satisfactory estimates and then deployed on the set of unsatisfactory examples (schematic plots with green and grey borders, respectively, in Fig. 10). Contrary to standard practice, where a fixed proportion of the data (e.g., 20%) is randomly selected for testing, our test sets comprised specifically challenging samples. This “stress-testing” approach offered a more stringent test of the models’ robustness and generalisation ability, as the predictions were made on samples where the simpler model had failed. As a result, the proportion of samples in our test sets relative to the entire datasets varied (10%, 17%, and 3%, depending on the set).
The selection of input features was based on feature engineering, aiming to identify relevant features representative of the descriptor of interest, thus allowing the ANN to estimate the data targets accurately. To focus specifically on the relevant regions of the PP curves that encompass the passivity/pitting descriptors, the log(j) vs. E (smoothed) curves were partitioned through a data slicing procedure (Supplementary Fig. 5). The sliced curves presented 700 or 500 data points (0.539 or 0.385 V, in potential range), capturing the regions related to Epass or Epit with ample safety margins. Given the complexity that an ANN would face in processing thousands of data points as features, a second data reduction step was undertaken to further decrease the dimension of the log(j) input array. Sparse sampling was conducted at every 40th (or 60th) point from the sliced log(j) array, leading to a final selection of 13 (or 12) log(j) values (for Epit or Epass, respectively). These numbers of input features were found to represent the target regions of the PP curves adequately.
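The sparse sampling step can be illustrated with plain array slicing. In the sketch below, the pairing of slice length and sampling step (500 points with every 40th point, 700 points with every 60th point) is an assumption inferred from the arithmetic, since these combinations yield the reported 13 and 12 features; the array contents are placeholders.

```python
import numpy as np

def sparse_sample(log_j_slice, step):
    """Keep every `step`-th point of a sliced log(j) array, starting at index 0."""
    return np.asarray(log_j_slice)[::step]

# Hypothetical sliced curves (placeholder values, equally spaced in potential)
slice_500 = np.linspace(-2.0, 1.0, 500)
slice_700 = np.linspace(-2.0, 0.0, 700)

features_13 = sparse_sample(slice_500, 40)   # indices 0, 40, ..., 480
features_12 = sparse_sample(slice_700, 60)   # indices 0, 60, ..., 660
```

Because the sliced points are equally spaced in potential, the retained log(j) values are also equidistant on the potential scale.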
Reducing the curves to a selection of log(j) values that are linearly spaced in terms of their “E (V) stamps” was sufficient to describe the relevant regions in the curves. This is demonstrated by the 13 “descriptor blue dots” in the “1. neural network training” plot (Fig. 10), which represent the selected log(j) input features (equidistant on the potential (V) scale). The potential was treated as a constant feature, and related values were not included as input. Finally, to improve the model convergence, we applied the StandardScaler class (sklearn.preprocessing package) to the sliced log(j) data for standardisation of both the input and output data. This class standardises features by removing the mean and scaling to unit variance.
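A minimal sketch of the standardisation step follows, using separate scalers for inputs and targets so that predictions can be mapped back to physical units. The use of two scaler objects and the synthetic data are assumptions for illustration.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.normal(loc=-1.0, scale=0.5, size=(100, 13))   # hypothetical log(j) features
y = rng.uniform(0.9, 1.2, size=(100, 1))              # hypothetical Epit targets (V)

x_scaler = StandardScaler().fit(X)
y_scaler = StandardScaler().fit(y)

X_std = x_scaler.transform(X)                 # zero mean, unit variance per feature
y_std = y_scaler.transform(y)
y_back = y_scaler.inverse_transform(y_std)    # back to the original scale
```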
A sequential model (keras.models.Sequential) was defined, generating a classic multilayer perceptron network (also known as a feedforward neural network). As shown in Fig. 10, the number of nodes in the input layer was equal to the number of input descriptors (12 or 13 log(j) values). The output layer consisted of a single node, providing only one output (either Epit or Epass). Given that log(j) is a function of E, the Epit and Epass estimates sufficed for finding the corresponding log(jpit) and log(jpass) values. The network’s topology, including the optimal number of hidden layers and nodes within each layer, was determined through exploratory testing and visual validation. Specifically, it was found that two hidden layers were sufficient to accurately map the relationships in the data, with the optimal numbers of nodes in the first and second hidden layers being 12 and 11, respectively. Attempts to reduce the number of nodes in these hidden layers led to an unsatisfactory generalisation of the learning process (increasing the number of nodes beyond 12 would deviate from the logical progression of reducing the node count as approaching the output layer).
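The topology described above (13 inputs, hidden layers of 12 and 11 nodes, one output) can be sketched as follows. To keep the example lightweight, scikit-learn's MLPRegressor is used here as a stand-in for the Keras Sequential model of the study; this is a swapped-in implementation, and the synthetic data and training settings are assumptions for illustration only.

```python
import warnings
import numpy as np
from sklearn.exceptions import ConvergenceWarning
from sklearn.neural_network import MLPRegressor

# The short demonstration run may stop before convergence; silence that warning
warnings.filterwarnings("ignore", category=ConvergenceWarning)

rng = np.random.default_rng(0)
n_samples, n_features = 120, 13
X = rng.normal(size=(n_samples, n_features))           # standardised log(j) features
y = 0.1 * X[:, :3].sum(axis=1) + rng.normal(scale=0.01, size=n_samples)

model = MLPRegressor(
    hidden_layer_sizes=(12, 11),   # two hidden layers, as in the text
    activation="relu",
    solver="adam",
    learning_rate_init=1e-3,
    max_iter=200,
    random_state=0,
)
model.fit(X, y)

layer_shapes = [w.shape for w in model.coefs_]   # weight matrices between layers
predictions = model.predict(X)
```

The weight matrices have shapes (13, 12), (12, 11) and (11, 1), matching the decreasing node count towards the single output node.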
The training started with 20-fold cross-validation (CV) (KFold class from sklearn.model_selection), allowing hyperparameter tuning by monitoring the loss function of the validation set. The loss function was the mean squared error (MSE), minimised with the Adam optimiser. The ReLU activation was used in the input/hidden layers. The number of batches was equal to the number of training samples (110 and 278 for 0.05 M and 0.005 M NaCl (100 mV s^{−1}), respectively). Although after 200 epochs of training the validation losses were generally fairly low (below 0.0001 (µA cm^{−2})²), for a few validation sets the losses reached relatively higher values (~0.025 (µA cm^{−2})²). After this initial training, the network was pruned (using the tensorflow_model_optimization module) and further validated by random sampling (validation_split=0.1) of the labelled dataset. The number of epochs increased to 4500–6000 for the pruned network, and the learning rates were in some cases decreased from 10^{−3} (default) to 10^{−12}–10^{−5}. After achieving satisfactory validation performance, the final stage consisted of retraining the model with the entire labelled dataset. The final MSE (Eq. 1) values were 9.87 × 10^{−5}, 1.35 × 10^{−4} and 1.72 × 10^{−4} (µA cm^{−2})²; and the final R² (Eq. 2) values achieved were 0.9025, 0.9707 and 0.9653, respectively, for 0.005 M NaCl 100 mV s^{−1} (Epit) and 0.05 M NaCl 100 mV s^{−1} (Epass and Epit).
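The 20-fold cross-validation loop and the two reported metrics (MSE and R²) can be sketched as below. As an assumption for a lightweight example, scikit-learn's MLPRegressor again stands in for the Keras model, pruning is omitted, and the data are synthetic; the fold shuffling and the pooled out-of-fold R² are illustrative choices, not details taken from the study.

```python
import warnings
import numpy as np
from sklearn.exceptions import ConvergenceWarning
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import KFold
from sklearn.neural_network import MLPRegressor

warnings.filterwarnings("ignore", category=ConvergenceWarning)

rng = np.random.default_rng(0)
X = rng.normal(size=(110, 13))    # 110 labelled curves, 13 standardised features
y = 0.1 * X[:, :3].sum(axis=1) + rng.normal(scale=0.01, size=110)

kfold = KFold(n_splits=20, shuffle=True, random_state=0)
fold_mse = []
out_of_fold = np.empty_like(y)

for train_idx, val_idx in kfold.split(X):
    # A fresh network is trained for each fold and scored on the held-out part
    model = MLPRegressor(hidden_layer_sizes=(12, 11), max_iter=200, random_state=0)
    model.fit(X[train_idx], y[train_idx])
    pred = model.predict(X[val_idx])
    out_of_fold[val_idx] = pred
    fold_mse.append(mean_squared_error(y[val_idx], pred))

overall_mse = mean_squared_error(y, out_of_fold)   # mean of squared residuals
overall_r2 = r2_score(y, out_of_fold)              # 1 - SS_res / SS_tot
```

Inspecting fold_mse per fold mirrors the monitoring described above, where a few validation sets can show markedly higher losses than the rest.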
As with the LR-based estimates, the validation of the ANN estimates was conducted through visual examination of the obtained Epit/log(jpit) (or Epass/log(jpass)) in the individual log(j) vs. E curves. Again, the cost of pre-labelling the deployed set would be prohibitively high, implying further hard coding of the rule-based model. It is important to note that if the LR-based approach alone could solve our estimation problem, resorting to ML would not be necessary. It was precisely because the labels obtained were unsatisfactory that the rule-based algorithm was complemented with an ANN. In summary, we employed a supervised learning strategy on unlabelled targets^{75}. Our ANN strategy, in particular, is somewhat similar to a transductive transfer learning framework, where labelled data is only available for the source domain but not for the target domain^{75,76}. In our case, the unlabelled sets from the target domain served as our test sets, allowing us to evaluate the models’ generalisation ability on the unseen data in the target domain.
An alternative strategy based on active learning could be considered: when the LR estimation is uncertain, a human expert could provide the label. This process could be repeated iteratively, selecting the most informative instances for labelling based on the model’s uncertainty, thus improving the accuracy of subsequent ML modelling^{77,78}. As a perspective, convolutional neural networks (CNNs) may be a promising alternative for feature extraction from similar univariate electrochemical signals, given their ability to capture local patterns and spatial hierarchies in data^{79,80}.
Ground truth for the central tendencies of passivity and pitting descriptors
As explained above, a hybrid rule-based/ML supervised approach was used to estimate passivity/pitting descriptors from the populations of log(j) vs. E curves from the 5 different sets (Fig. 11a and b). Next, the bivariate distributions of the Epass/log(jpass) (or Epit/log(jpit)) estimates obtained for each set were modelled with Gaussian KDE (gaussian_kde, scipy.stats module) and qualitatively validated (Fig. 11c). The maximum KDE values of the distributions were considered the ground truth of the central tendency values of Epass/log(jpass) and Epit/log(jpit) (Fig. 11d).
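The maximum-KDE estimate of the central tendency can be sketched as follows. In this minimal example, the density is evaluated at the sample points themselves and the densest point is returned; evaluating on a regular grid would be an equally valid choice, and which one was used in the study is not stated here. The synthetic descriptor values and the default bandwidth are assumptions.

```python
import numpy as np
from scipy.stats import gaussian_kde

def kde_mode(e_values, log_j_values):
    """Return the (E, log(j)) estimate located at the highest bivariate KDE density."""
    data = np.vstack([e_values, log_j_values])   # shape (2, n_samples)
    kde = gaussian_kde(data)                     # default (Scott) bandwidth
    density = kde(data)                          # density at each sample point
    best = int(np.argmax(density))
    return e_values[best], log_j_values[best]

# Hypothetical descriptor estimates clustered around (1.0 V, 1.5)
rng = np.random.default_rng(0)
e_pit = rng.normal(loc=1.0, scale=0.02, size=300)
log_j_pit = rng.normal(loc=1.5, scale=0.2, size=300)
e_mode, log_j_mode = kde_mode(e_pit, log_j_pit)
```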
Epass and Epit distributions: normality tests
We applied three normality tests to assess whether the E descriptors followed a normal distribution: Shapiro–Wilk, D’Agostino and Pearson’s test, and Anderson–Darling. We used the shapiro, normaltest and anderson functions from the scipy.stats package to perform these tests. The null hypothesis for all tests was that “the E descriptors followed a normal distribution”. The decision whether or not to reject the null hypothesis was based on comparing the obtained p-values (or the test statistic values for Anderson–Darling) with the significance level (alpha = 0.05 by default).
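The decision rule for the three tests can be sketched as below. The helper name normality_rejections and the synthetic sample are assumptions; the Anderson–Darling branch assumes a SciPy version in which anderson returns critical values and their significance levels (in percent), and compares the statistic with the critical value at the 5% level.

```python
import numpy as np
from scipy import stats

def normality_rejections(values, alpha=0.05):
    """Return, for each test, True if the normality hypothesis is rejected."""
    values = np.asarray(values, dtype=float)
    p_shapiro = stats.shapiro(values)[1]
    p_dagostino = stats.normaltest(values)[1]
    ad = stats.anderson(values, dist="norm")
    # Pick the tabulated critical value closest to the requested level (in %)
    level_idx = int(np.argmin(np.abs(ad.significance_level - 100 * alpha)))
    return {
        "shapiro": bool(p_shapiro < alpha),
        "dagostino_pearson": bool(p_dagostino < alpha),
        "anderson_darling": bool(ad.statistic > ad.critical_values[level_idx]),
    }

# Strongly skewed synthetic sample, for which all three tests should reject
rng = np.random.default_rng(0)
skewed = rng.exponential(size=500)
report = normality_rejections(skewed)
```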
Proxy models for estimating the central tendency of pitting descriptors
Two statistical estimation strategies were tested to verify whether the central tendency values of the Epass/log(jpass) and Epit/log(jpit) distributions could be estimated in a reduced manner. These simplified approaches were either mean-based or median-based (illustrated with the median in Fig. 11e). Our research question was whether the conditional mean (or conditional median) of log(j), as a function of E, could be used as a proxy model for accurate estimation of the central tendencies of the pitting descriptors. To test such hypotheses, the conditional mean and conditional median of log(j) were used as input features in the ANN (Fig. 10) to obtain Epass (or Epit) outputs (Fig. 11f). The biases of the trained ANN were transmitted to the median/mean-based estimations, thus establishing a fair basis of comparison with the ground truth data (also derived from the ANN estimates from individual curves).
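The conditional mean and median curves are simply column-wise statistics over the population of curves. The sketch below uses synthetic curves (one per row, sharing a common potential axis) with a few artificially high outlier curves, to illustrate why the median is less affected by high outliers; all values are placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)
e = np.linspace(0.5, 1.355, 50)                      # common potential axis (V)
base = np.linspace(-1.0, 1.0, 50)                    # hypothetical underlying log(j) trend
curves = np.tile(base, (100, 1)) + rng.normal(scale=0.05, size=(100, 50))
curves[:5] += 3.0                                    # five high outlier curves

# Conditional statistics of log(j) at each potential (over the 100 curves)
cond_mean = curves.mean(axis=0)
cond_median = np.median(curves, axis=0)

mean_shift = cond_mean - base        # pulled upwards by the outliers
median_shift = cond_median - base    # nearly unaffected
```

Either curve can then be sliced and sparsely sampled in the same way as an individual curve before being passed to the trained network.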
Quantiles are data values that divide a dataset into adjacent intervals containing the same number of data samples^{81}. Quantiles display variation in population samples without making assumptions about the underlying distribution. They are useful to gain insight into the distribution of a random value compared to its mean value. The conditional quantiles 0.35, 25, 50, 75 and 99.65% (referred to as Qmin, Q1, median, Q3 and Qmax) were used for representing the log(j) distributions as well as for estimation purposes (median).
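The five conditional quantile curves can be obtained in a single call, as sketched below on synthetic curves. The default linear interpolation of np.percentile is an assumption, as the interpolation scheme is not specified in the text.

```python
import numpy as np

rng = np.random.default_rng(0)
curves = rng.normal(size=(200, 50))          # hypothetical log(j) curves (rows)

levels = [0.35, 25, 50, 75, 99.65]           # Qmin, Q1, median, Q3, Qmax (in %)
q_min, q1, q_median, q3, q_max = np.percentile(curves, levels, axis=0)
```

Each returned array has one value per potential step, so the five curves can be plotted directly against E.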
The model assessment of the mean- and median-based approaches was based on separate residual analysis for the central tendencies of Epass, log(jpass), Epit and log(jpit). The actual central tendencies of the descriptors corresponded to their maximum KDE values. As presented in Eq. 3, residuals are calculated by subtracting the estimated ŷ_{i} from the actual y_{i} value (y_{i} − ŷ_{i}) for the different descriptors y (Epass, log(jpass), Epit, log(jpit)) and sets i.
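The residual analysis can be sketched as below for one descriptor across the five sets. All numbers are hypothetical placeholders chosen only to show the computation, and the mean absolute residual used for the comparison is an illustrative summary, not necessarily the one used in the study.

```python
import numpy as np

def residuals(actual, estimated):
    """Residual per set: actual (maximum-KDE) value minus estimated value."""
    return np.asarray(actual, dtype=float) - np.asarray(estimated, dtype=float)

# Hypothetical Epit central tendencies (V) for five sets
actual = [1.10, 1.08, 1.05, 1.00, 0.95]
from_median = [1.09, 1.08, 1.06, 1.01, 0.95]   # median-based estimates (placeholder)
from_mean = [1.05, 1.03, 1.01, 0.93, 0.90]     # mean-based estimates (placeholder)

res_median = residuals(actual, from_median)
res_mean = residuals(actual, from_mean)
mae_median = np.mean(np.abs(res_median))
mae_mean = np.mean(np.abs(res_mean))
```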
Data availability
All data generated or analysed during this study are included in this published article (and its Supplementary Information files) and are available in the Mendeley Data repositories, https://data.mendeley.com/datasets/5x4dmc38bg/1,
Code availability
The code required to reproduce these findings is included in this published article as Dataset 1 (.ipynb extension, Jupyter Notebook) and is available to download from GitHub: https://github.com/bcoelholeonardo/Estimatingpittingdescriptorsof316Lstainlesssteelbymachinelearningandstatisticalanalysis/tree/5c7c8eac41907667f94c22881650f23a6aee0d64.
References
Hughes, A. et al. Corrosion inhibition, inhibitor environments, and the role of machine learning. Corros. Mater. Degrad. 3, 672–693 (2022).
Qu, Z. et al. Pitting judgment model based on machine learning and feature optimization methods. Front. Mater. 8, 1–8 (2021).
Wei, R. P. & Harlow, D. G. Mechanistically based probability modelling, life prediction and reliability assessment. Model. Simul. Mater. Sci. Eng. 13, R33–R51 (2005).
Macdonald, D. D. Passivity–the key to our metals-based civilization. Pure Appl. Chem. 71, 951–978 (1999).
Macdonald, D. D. & Engelhardt, G. R. Predictive Modeling of Corrosion. In Shreir’s Corrosion, Vol. 2 (eds. Richardson, J. A. et al.) 1630–1679 (Elsevier, Amsterdam, 2010).
Frankel, G. S. Pitting corrosion of metals: a review of the critical factors. J. Electrochem. Soc. 145, 2186–2198 (1998).
Nyby, C. et al. Electrochemical metrics for corrosion resistant alloys. Sci. Data 8, 58 (2021).
Coelho, L. B. et al. Corrosion inhibition of AA6060 by silicate and phosphate in automotive organic additive technology coolants. Corros. Sci. 199, 110188 (2022).
ASTM. ASTM G61-86(2018). Standard Test Method for Conducting Cyclic Potentiodynamic Polarization Measurements for Localized Corrosion Susceptibility of Iron-, Nickel-, or Cobalt-Based Alloys. (ASTM, 2018).
Yi, Y., Cho, P., Al Zaabi, A., Addad, Y. & Jang, C. Potentiodynamic polarization behaviour of AISI type 316 stainless steel in NaCl solution. Corros. Sci. 74, 92–97 (2013).
Jegdic, B. V., Bobić, B., Bošnjakov, M. & Alić, B. Testing of intergranular and pitting corrosion in sensitized welded joints of austenitic stainless steel. Metall. Mater. Eng. 23, 109–117 (2017).
ISO. ISO 15158:2014 Corrosion of Metals and Alloys—Method of Measuring the Pitting Potential for Stainless Steels by Potentiodynamic Control in Sodium Chloride Solution. (ISO, 2014).
Anderko, A., Sridhar, N. & Dunn, D. S. A general model for the repassivation potential as a function of multiple aqueous solution species. Corros. Sci. 46, 1583–1612 (2004).
Wilde, B. E. & Williams, E. The relevance of accelerated electrochemical pitting tests to the long-term pitting and crevice corrosion behavior of stainless steels in marine environments. J. Electrochem. Soc. 118, 1057 (1971).
Soltis, J. Passivity breakdown, pit initiation and propagation of pits in metallic materials–review. Corros. Sci. 90, 5–22 (2015).
Williams, D. E., Westcott, C. & Fleischmann, M. Stochastic models of pitting corrosion of stainless steels: I. Modeling of the initiation and growth of pits at constant potential. J. Electrochem. Soc. 132, 1796–1804 (1985).
Freiman, L. I. & Metallov, Z. Potentiodynamic determination of stainless steel repassivation and pitting formation potentials. Corros. Sci. 8, 693–695 (1972).
Dulaney, C. C. N. & C. L. Localized Corrosion. NACE 184 (NACE, 1974).
Shibata, T. & Takeyama, T. Stochastic theory of pitting corrosion. Corrosion 33, 243–251 (1977).
Pereira, V. J. et al. In Blue Economy. 191–220 (Springer Nature Singapore, 2022).
Izquierdo, J. et al. Resolution of the apparent experimental discrepancies observed between SVET and SECM for the characterization of galvanic corrosion reactions. Electrochem. commun. 27, 50–53 (2013).
Bentley, C. L., Kang, M. & Unwin, P. R. Scanning electrochemical cell microscopy: new perspectives on electrode processes in action. Curr. Opin. Electrochem. 6, 23–30 (2017).
Bard, A. J., Fan, F. R. F., Kwak, J. & Lev, O. Scanning electrochemical microscopy. Introduction and principles. Anal. Chem. 61, 132–138 (1989).
Payne, N. A., Stephens, L. I. & Mauzeroll, J. The application of scanning electrochemical microscopy to corrosion research. Corrosion 73, 759–780 (2017).
Yule, L. C. et al. Nanoscale active sites for the hydrogen evolution reaction on low carbon steel. J. Phys. Chem. C. 123, 24146–24155 (2019).
Gateman, S. M., Georgescu, N. S., Kim, M.-K., Jung, I.-H. & Mauzeroll, J. Efficient measurement of the influence of chemical composition on corrosion: analysis of an Mg-Al diffusion couple using scanning micropipette contact method. J. Electrochem. Soc. 166, C624–C630 (2019).
Shkirskiy, V. et al. Nanoscale scanning electrochemical cell microscopy and correlative surface structural analysis to map anodic and cathodic reactions on polycrystalline Zn in acid media. J. Electrochem. Soc. 167, 041507 (2020).
Yule, L. C., Bentley, C. L., West, G., Shollock, B. A. & Unwin, P. R. Scanning electrochemical cell microscopy: a versatile method for highly localised corrosion related measurements on metal surfaces. Electrochim. Acta 298, 80–88 (2019).
Coelho, L. B. et al. Probing the randomness of the local current distributions of 316 L stainless steel corrosion in NaCl solution. Corros. Sci. 217, 111104 (2023).
Salami, B. A., Rahman, S. M., Oyehan, T. A., Maslehuddin, M. & Al Dulaijan, S. U. Ensemble machine learning model for corrosion initiation time estimation of embedded steel reinforced self-compacting concrete. Measurement 165, 108141 (2020).
Coelho, L. B. et al. Reviewing machine learning of corrosion prediction in a data-oriented perspective. npj Mater. Degrad. 6, 8 (2022).
Enikeev, M., Enikeeva, L., Maleeva, M. & Gubaydullin, I. Machine learning in the problem of recognition of pitting corrosion on aluminum surfaces. CEUR Workshop Proc. 2212, 186–192 (2018).
Sasidhar, K. N., Siboni, N. H., Mianroodi, J. R. & Rohwerder, M. Deep learning framework for uncovering compositional and environmental contributions to pitting resistance in passivating alloys. npj Mater. Degrad. 6, 71 (2022).
Yidong, X. Use of time series models to forecast the evolution of corrosion pit in steel rebars. Funct. Mater. 23, 457–462 (2016).
Yang, X. et al. A new understanding of the effect of Cr on the corrosion resistance evolution of weathering steel based on big data technology. J. Mater. Sci. Technol. 104, 67–80 (2022).
Kamrunnahar, M. & Urquidi-Macdonald, M. Prediction of corrosion behavior using neural network as a data mining tool. Corros. Sci. 52, 669–677 (2010).
Jiang, X., Yan, Y. & Su, Y. Data-driven pitting evolution prediction for corrosion-resistant alloys by time-series analysis. npj Mater. Degrad. 6, 2–9 (2022).
Zhu, Y., Macdonald, D. D., Qiu, J. & Urquidi-Macdonald, M. Corrosion of rebar in concrete. Part III: artificial neural network analysis of chloride threshold data. Corros. Sci. 185, 109438 (2021).
Borboudakis, G. et al. Chemically intuited, large-scale screening of MOFs by machine learning techniques. npj Comput. Mater. 3, 1–6 (2017).
Würger, T. et al. Data science based Mg corrosion engineering. Front. Mater. 6, 1–9 (2019).
Feiler, C. et al. In silico screening of modulators of magnesium dissolution. Corros. Sci. 163, 108245 (2020).
Würger, T. et al. Exploring structure-property relationships in magnesium dissolution modulators. npj Mater. Degrad. 5, 2 (2021).
Schiessler, E. J. et al. Predicting the inhibition efficiencies of magnesium dissolution modulators using sparse machine learning models. npj Comput. Mater. 7, 193 (2021).
Galvão, T. L. P. et al. CORDATA: an open data management web application to select corrosion inhibitors. 4–7 https://doi.org/10.1038/s41529022002599 (2022).
Galvão, T. L. P., Novell-Leruth, G., Kuznetsova, A., Tedim, J. & Gomes, J. R. B. Elucidating structure–property relationships in aluminum alloy corrosion inhibitors by machine learning. J. Phys. Chem. C. 124, 5624–5635 (2020).
Sridhar, N., Brossia, C. S., Dunn, D. S. & Anderko, A. Predicting localized corrosion in seawater. Corrosion 60, 915–936 (2004).
Weaver, C., Fortuin, A. C., Vladyka, A. & Albrecht, T. Unsupervised classification of voltammetric data beyond principal component analysis. Chem. Commun. 58, 10170–10173 (2022).
Godfrey, D., Bannock, J. H., Kuzmina, O., Welton, T. & Albrecht, T. A robotic platform for high-throughput electrochemical analysis of chalcopyrite leaching. Green. Chem. 18, 1930–1937 (2016).
Coelho, L. B. & Ustarroz, J. Epit and Epass descriptors of 316L stainless steel estimated by machine learningdatasets, Mendeley Data. https://doi.org/10.17632/5x4dmc38bg.1 (2023).
Takeuchi, I., Le, Q. V., Sears, T. D. & Smola, A. Nonparametric Quantile Regression. https://www.semanticscholar.org/paper/NonparametricQuantileRegressionTakeuchiLe/5f7e54b38d096f202236f44e2561a6d635bdb79c (2005).
Torres, D. et al. Distribution of copper electrochemical nucleation activities on glassy carbon: a new perspective based on local electrochemistry. J. Electrochem. Soc. 169, 102513 (2022).
Bernal, M. et al. A microscopic view on the electrochemical deposition and dissolution of Au with scanning electrochemical cell microscopy–Part I. Electrochim. Acta 445, 142023 (2023).
Evans, U. R. Localized corrosion. NACE 144 (NACE, 1974).
Blázquez-García, A., Conde, A., Mori, U. & Lozano, J. A. A review on outlier/anomaly detection in time series data. ACM Comput. Surv. 54, 1–33 (2021).
Hu, H., Nguyen, N., He, C. & Li, P. Advanced outlier detection using unsupervised learning for screening potential customer returns. in 2020 IEEE International Test Conference (ITC) 1–10 (IEEE, 2020).
Aprillia, H., Yang, H.-T. & Huang, C.-M. Statistical load forecasting using optimal Quantile Regression Random Forest and Risk Assessment index. IEEE Trans. Smart Grid 12, 1467–1480 (2021).
Torbati-Sarraf, H., Ding, L., Khakpour, I. & Poursaee, A. Electrochemical Impedance Spectroscopic Analyses of the Influence of the Surface Nanocrystallization on the Passivation of Carbon Steel in the Pore Solution. J. Mater. Civ. Eng. 33, 40204191–0402041910 (2021).
Horta, D. G., Beviláqua, D., Acciari, H. A., Júnior, O. G. & Benedetti, A. V. Optimization of the use of carbon paste electrodes (cpe) for electrochemical study of the chalcopyrite. Quim. Nova 32, 1734–1738 (2009).
Dartois, J. E., Knefati, A., Boukhobza, J. & Barais, O. Using quantile regression for reclaiming unused cloud resources while achieving SLA. CloudCom 2018—10th IEEE International Conference on Cloud Computing Technology and Science, Dec 2018, Nicosia, Cyprus. pp. 89–98 (2018).
Koenker, R. & Bassett, G. Regression quantiles. Econometrica 46, 33 (1978).
Mohammad Zubir, W. M. A., Abdul Aziz, I. & Jaafar, J. In Computational and Statistical Methods in Intelligent (eds. Silhavy, R., Silhavy, P. & Prokopova, Z.) vol. 859, 236–254 (Springer International Publishing, 2019).
Esmailzadeh, S., Aliofkhazraei, M. & Sarlak, H. Interpretation of cyclic potentiodynamic polarization test results for study of corrosion behavior of metals: a review. Prot. Met. Phys. Chem. Surf. 54, 976–989 (2018).
Strehblow, H.-H. In Corrosion Mechanisms in Theory and Practice (eds. Marcus, P. & Oudar, J.) 201 (Marcel Dekker, Inc., New York, 1995).
Sato, N. A theory for breakdown of anodic oxide films on metals. Electrochim. Acta 16, 1683–1692 (1971).
Richardson, J. A. & Wood, G. C. A study of the pitting corrosion of Al by scanning electron microscopy. Corros. Sci. 10, 313–323 (1970).
Weibull, W. A Statistical Theory of Strength of Materials. Generalstabens Litografiska Anstalts Förlag, Stockholm (1939).
Volkov, S. D. Statistical strength theory. FOREIGN Technol. DIV WRIGHTPATTERSON AFB OHIO (Society for Industrial and Applied Mathematics, 1962).
Davidenkov, N., Shevandin, E. & Wittmann, F. The influence of size on the brittle strength of steel. J. Appl. Mech. 14, A63–A67 (1947).
Hirata, M. Statistical phenomena in science and engineering. KikainoKenkyu 1, 231 (1949).
Hori, M. Statistical aspects of fracture in concrete, I. An analysis of flexural failure of portland cement mortar from the standpoint of stochastic theory. J. Phys. Soc. Jpn. 14, 1444–1452 (1959).
Gumbel, E. J. Statistics of Extremes. (Columbia University Press, 1958).
Aziz, P. M. Application of the statistical theory of extreme values to the analysis of maximum pit depth data for aluminum. Corrosion 12, 35–46 (1956).
Eldredge, G. G. Analysis of corrosion pitting by extreme-value statistics and its application to oil well tubing caliper surveys. Corrosion 13, 67–76 (1957).
Coelho, L. B. & Ustarroz, J. MicroScale Potentiodynamic Polarisation (log(j)) Curves of 316L Stainless Steel—Datasets, Mendeley Data. https://doi.org/10.17632/7j6b6y48jw.1 (2023).
Weber, M., Auch, M., Doblander, C., Mandl, P. & Jacobsen, H. A. Transfer learning with time series data: a systematic mapping study. IEEE Access 9, 165409–165432 (2021).
Pan, S. J. & Yang, Q. A survey on transfer learning. IEEE Trans. Knowl. Data Eng. 22, 1345–1359 (2010).
Yu, J., Li, X. & Zheng, M. Current status of active learning for drug discovery. Artif. Intell. Life Sci. 1, 100023 (2021).
Warmuth, M. K. in Advances in Neural Information Processing Systems. 14 (The MIT Press, 2001).
Nash, W., Drummond, T. & Birbilis, N. A review of deep learning in the study of materials degradation. npj Mater. Degrad. 2, 37 (2018).
Ricolfe-Viala, C. & Blanes, C. Improving robot perception skills using a fast image-labelling method with minimal human intervention. Appl. Sci. 12, 1557 (2022).
Benson, F. A note on the estimation of mean and standard deviation from quantiles. J. R. Stat. Soc. Ser. B 11, 91–100 (1949).
Acknowledgements
The author, L.B. Coelho, is a Postdoctoral Researcher of the Fonds de la Recherche Scientifique - FNRS (Belgium), which is gratefully acknowledged. D.T. acknowledges financial support from the Fonds de Recherche dans l’Industrie et dans l’Agriculture (FRIA). J.U. and M.B. acknowledge financial support from the Fonds de la Recherche Scientifique de Belgique (F.R.S.-FNRS) under Grant No. F.4531.19 and from the Fonds Wetenschappelijk Onderzoek (FWO) under contract G0C3121N. G.B. and G.P. are supported by the Service Public de Wallonie Recherche under grant no. 2010235–ARIAC by DigitalWallonia4.ai. The authors acknowledge Prof. Marjorie Olivier (University of Mons) for providing treated stainless steel plates. L.B. Coelho would like to thank Dr. Denis Steckelmacher for fruitful discussions on data manipulation and analysis.
Author information
Contributions
L.B.C.: conceptualisation, methodology, software, formal analysis, data curation, writing—original draft, visualisation, project administration, funding acquisition. D.T.: validation, investigation, writing—review & editing, visualisation. V.V.: methodology, software. M.B.: investigation. G.P.: methodology, software. G.B.: validation, formal analysis, writing—review & editing, visualisation. J.U.: validation, formal analysis, resources, writing—review & editing, visualisation, supervision.
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Coelho, L.B., Torres, D., Vangrunderbeek, V. et al. Estimating pitting descriptors of 316 L stainless steel by machine learning and statistical analysis. npj Mater Degrad 7, 82 (2023). https://doi.org/10.1038/s41529-023-00403-z