Introduction

Despite considerable achievements in the predictive modelling of pitting corrosion1,2,3,4,5,6, more research is still undoubtedly needed. The challenge of estimating relevant pitting descriptors from experimental data is still seldom addressed in literature7.

Potentiodynamic polarisation (PP) curves are one of the main electrochemical techniques used for corrosion research in academia, also with a particularly high acceptance in the industry8, as a benchmark test for examining the resistance to localised corrosion. As summarised by Hughes et al.1: “the cyclic polarisation (CP) method, such as the standard ASTM G619, is probably the only standardised, traditional electrochemical method used to determine the relatively localised corrosion susceptibility. It involves the anodic polarisation of a specimen until localised corrosion initiates, as indicated by a large increase in the applied current. An indication of the susceptibility to initiation of localised corrosion in this test method is given by the potential (E) at which the anodic current increases rapidly, i.e., the breakdown potential. The nobler (more positive) this potential, the less susceptible the alloy is to initiate localised corrosion. The conventional understanding is that the breakdown potential is the potential above which pits are initiated”1.

Not only do corrosion experts7,10 often rely on a rather qualitative description of the pitting potential (“Epit is defined as a potential above which there is a rapid increase in the current on a polarisation curve”10), but the referred standard9 is also vague on how to extract the descriptor from PP (or CP) curves (“the potential in which a sharp rise in current is observed”). According to another standard, ISO 1515811,12, “Epit is defined as the potential corresponding to the anodic current density of 10 μA cm−2 in the region of stable pit growth”. Nonetheless, such a definition, although quantitative, is potentially problematic since it relies on a fixed, static value and does not account for the likely high variability of responses.
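As an illustration only, the fixed-threshold definition quoted above can be sketched as a threshold-crossing search on a single anodic sweep. The function name `epit_fixed_threshold` and the synthetic curve are hypothetical; the sketch assumes strictly positive current densities in A cm-2 and does not verify that the crossing lies “in the region of stable pit growth”, so a metastable transient exceeding 10 μA cm−2 would also trigger it.

```python
import numpy as np

def epit_fixed_threshold(E, j, j_threshold=1e-5):
    """First potential (V) at which j reaches j_threshold (A cm^-2).

    E, j: 1-D arrays from one anodic sweep, ordered by increasing E, with j > 0.
    The crossing is located by linear interpolation of log10(j) vs. E.
    Returns None if the threshold is never reached.
    """
    E = np.asarray(E, dtype=float)
    log_j = np.log10(np.asarray(j, dtype=float))
    log_thr = np.log10(j_threshold)
    above = log_j >= log_thr
    if not above.any():
        return None
    k = int(np.argmax(above))  # index of the first point at or above the threshold
    if k == 0:
        return float(E[0])
    # Interpolate between point k-1 (below) and point k (at or above)
    frac = (log_thr - log_j[k - 1]) / (log_j[k] - log_j[k - 1])
    return float(E[k - 1] + frac * (E[k] - E[k - 1]))

# Synthetic curve (hypothetical values): passive plateau followed by a sharp rise
E_demo = np.array([0.0, 0.2, 0.4, 0.6, 0.8])
j_demo = np.array([1e-7, 1e-7, 1e-6, 1e-4, 1e-3])
epit_demo = epit_fixed_threshold(E_demo, j_demo)
```

Because the threshold is static, two curves with the same breakdown potential but different passive current levels may return different values, which is the limitation noted above.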

Beyond the sensitivity to the concentration and combination of aggressive species13 and the scan rate14, the determination of Epit was found to be dependent on the experimental method used15. Simple potentiodynamic polarisation experiments have shown extremely variable results in pitting potential16, exhibiting wide experimental scatters of hundreds of millivolts6.

Previously, it was believed that Epit had a sharp threshold value below which all specimens would exhibit infinite immunity to pitting, and any observed data scatter was attributed to poor experimental control17. However, Nathan and Dulaney18 were the first to challenge this notion, emphasising the importance of statistical approaches to localised corrosion. Subsequently, Shibata and Takeyama19 demonstrated that the random variation in data is an intrinsic property of pitting corrosion and should be analysed statistically. More recently, Nyby et al.7 precisely observed that the rapid increase in current density (j) occurs “when the applied potential is more noble than a specific range of values”. One research paper even argues that it is questionable that exact values of pitting potential can be experimentally measured10.

Difficulties in determining a generalised value for Epit are associated with events (stable pitting growth) that are very dynamic in nature (e.g., high pit growth rates and extreme pit chemistry changes) and take place on a nanometre scale15. Aggressive species (Cl-), in combination with surface heterogeneities, trigger a dynamic degradation process in which transient passivity breakdown/repassivation events occur over a large population of initiated pits4. The study of localised corrosion triggered by chloride remains a relevant topic within the scope of the targets set by the blue economy, as pitting corrosion is particularly harmful in marine environments and coastal areas20. The development of advanced scanning electrochemical techniques, such as the scanning vibrating electrode technique (SVET) and scanning electrochemical microscopy (SECM)21, has facilitated substantial progress in research on localised corrosion1. Scanning electrochemical cell microscopy (SECCM) is the next generation of the well-known electrochemical droplet cell technique22 and differs from the more commonly used SECM23,24 in that only small portions of a surface are exposed to electrolyte through brief meniscus contact from a nanopipet probe, and electrochemical signals are measured directly25,26. In this work, SECCM was selected as the experimental tool mainly due to its proven high-throughput capabilities22,27,28. The collection of statistically representative amounts of data is key when high variance is expected in the target feature29.

Only a limited number of works in localised corrosion have used data-driven approaches2,7,30,31,32,33,34,35,36,37,38 so far. The major reasons for their limited application are that the community has traditionally relied on low-throughput means of data generation, focusing on specific input-output relationships, and that feature engineering is complicated by the vast number of influencing variables. The problem of pitting corrosion should ideally be approached with data-centric methods. As shown in our previous work29, the distributions of the local current density at potential regions associated with pitting are potentially uniform (high randomness). This means that, as the observation error tends to decrease with increased sample size39, the actual underlying distributions are not captured if only a few samples are considered (under-representation). Similarly to what has been done for ML modelling of corrosion inhibitors40,41,42,43,44,45, the creation of structured databases for pitting corrosion is urged31,37,46.

As explained by Weaver et al. in a 2022 communication on the unsupervised learning of voltammetric data47, deviations from the model behaviour can significantly enhance the complexity of the data extraction. Therefore, instead of performing the task by hand, as traditionally done by electrochemists47, there is an emerging call for recording data in (semi-) automated ways, including high-throughput screening48.

This work elaborates on 5 datasets of log(j) vs. E (PP) curves obtained in a high-throughput fashion with the SECCM on 316L stainless steel. We provide a methodology for estimating Epass (passive potential) and Epit from: 1. typical log(j) vs. E curves with a straightforward passivity breakdown (using an algorithm based on linear regression (LR)); 2. PP curves with more unique profiles, mainly due to metastable events (using artificial neural networks (ANNs) trained on the LR estimates). The estimated Epit and Epass descriptors of 316L are included in this article (Dataset 1, .ipynb files) and available to download in a public repository49.

Furthermore, as the conditional mean is not always the most suitable estimate of the conditional distribution of y given x (here, log(j) given E), although it is the most common one50, we also considered the analysis of quantile curves (the conditional median, in particular). The main advantage of conditional quantiles is that they give a more comprehensive analysis of the relationship between E and log(j) at different points in the conditional distribution of log(j)50. Therefore, we also propose a simplified methodology for determining the central tendency of the Epit/log(jpit) and Epass/log(jpass) distributions using the conditional median (or mean) of the log(j) vs. E curves. These proxy estimations were compared against the outputs of non-parametric density estimations, considered as the ground truth of the central tendencies of the descriptors. The related code is available (https://github.com/bcoelho-leonardo/Estimating-pitting-descriptors-of-316L-stainless-steel-by-machine-learning-and-statistical-analysis/tree/5c7c8eac41907667f94c22881650f23a6aee0d64), and is expected to serve as a toolkit for future localised corrosion works dealing with big data. The same code can be a basis for extracting meaningful descriptors for other potentiostatic or potentiodynamic experiments important in electrodeposition, electrocatalysis, and other electrochemical processes51,52.

Many decades ago, Evans53 noted that studying the probability of corrosion is more practically important than determining the exact corrosion rate values. We expect to provide a foundation for the future development of monitoring tools (based on current or potential measurements) capable of predicting stable pitting with secure margins.

This work provides three main contributions: 1. a robust ML-based method for estimating Epit/log(jpit) and Epass/log(jpass) descriptors from individual polarisation curves; 2. an accurate proxy model (the conditional median of log(j)) for estimating the central tendencies of the descriptor distributions for a given dataset; 3. insights into localised corrosion mechanisms gained by interpreting the proxy models and also by selecting a subset of log(j) vs. E examples presenting the highest activities (high outliers).

Results and discussion

Density estimation of passivity and pitting descriptors

In this work, we were initially concerned with the problem of estimating conditional quantiles, as such analysis often results in further insights out of the distributions of our random variable (log(j)|E)50.

Figure 1 shows the kernel density estimations of the Epass/log(jpass) and Epit/log(jpit) for the 5 experimental datasets. The quantiles of the log(j) vs. E curves are superimposed in the plots to illustrate the high dispersion of both passivity and pitting descriptors. As a general trend, the high dispersion of the descriptors observed in the log(j) direction seems relatively constant for all sets, while the dispersion in the E direction seems to increase with the testing aggressiveness. The distributions of Epass/log(jpass) and Epit/log(jpit) generally extend as far as the corresponding Qmin and Qmax curves, except where individual outliers are present (such as in Fig. 1c, d). In any case, the distributions of the descriptors clearly spread beyond the so-called interquartile ranges (the IQR is the middle half of a dataset, comprising the range between the first and third quartiles).
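A minimal sketch of how such quantile curves could be computed is given below, assuming that all curves of a set have been resampled onto a shared potential grid. The function name `quantile_curves` and the synthetic data are hypothetical. Taking Qmin and Qmax as the fences Q1 − 1.5 IQR and Q3 + 1.5 IQR is an assumption made here for illustration, as this section does not define them explicitly.

```python
import numpy as np

def quantile_curves(log_j):
    """log_j: 2-D array, one row per curve, one column per potential of a shared E grid."""
    q1, q2, q3 = np.percentile(log_j, [25, 50, 75], axis=0)
    iqr = q3 - q1  # interquartile range: spread between the first and third quartiles
    return {
        "Q1": q1, "Q2": q2, "Q3": q3, "IQR": iqr,
        "Qmin": q1 - 1.5 * iqr,  # assumed lower fence
        "Qmax": q3 + 1.5 * iqr,  # assumed upper fence
    }

# Synthetic population (hypothetical): 200 noisy curves on a 51-point grid
rng = np.random.default_rng(0)
E_grid = np.linspace(0.0, 1.0, 51)
log_j_demo = -7 + 2 * E_grid + rng.normal(0, 0.3, size=(200, E_grid.size))
curves = quantile_curves(log_j_demo)
```

Each entry of `curves` is itself a curve over `E_grid`, so the quartiles can be superimposed directly on the log(j) vs. E plot.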

Fig. 1: Kernel density estimation of the bivariate distributions of Epass/log(jpass) and Epit/log(jpit).

All plots share the same colorbars (green for Epass/log(jpass) and red for Epit/log(jpit)), expressed in normalised density. The descriptors were estimated by the hybrid LR-based/ANN method. a 0.005 M NaCl (100 mV s-1), b 0.01 M NaCl (100 mV s-1), c 0.01 M NaCl (50 mV s-1), d 0.05 M NaCl (100 mV s-1), and e and f 0.05 M NaCl (50 mV s-1) (Epass/log(jpass) and Epit/log(jpit), respectively).
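For readers wishing to reproduce this kind of analysis, a bivariate kernel density estimate and the location of its maximum could be obtained along the following lines. This is a sketch under stated assumptions and not necessarily the procedure used for Fig. 1: the descriptor values are synthetic, the default bandwidth of `scipy.stats.gaussian_kde` (Scott's rule) is used, and the maximum is approximated by the sample with the highest estimated density rather than by a grid search.

```python
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(1)
# Hypothetical descriptor cloud: one (E_pit, log(j_pit)) pair per polarisation curve
e_pit = rng.normal(1.10, 0.03, 300)      # V
log_j_pit = rng.normal(-5.0, 0.4, 300)   # log10 of current density

points = np.vstack([e_pit, log_j_pit])   # shape (2, n_samples), as gaussian_kde expects
kde = gaussian_kde(points)               # default bandwidth (Scott's rule)
density = kde(points)                    # estimated density at each sample
i_max = int(np.argmax(density))
mode_estimate = (e_pit[i_max], log_j_pit[i_max])  # sample located at the highest density
```

Normalising `density` by its maximum gives values between 0 and 1, comparable to a normalised-density colorbar.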

Central tendency estimations of descriptors based on the mean and median models

While the estimation of descriptors based on the conditional mean of distribution might serve, there exists a large area of problems (outliers detection54,55, risk assessment56) where estimating a quantile, such as the median, would be a better choice50.

The same hybrid LR-based/ANN approach employed on each individual log(j) vs. E curve was applied to the conditional mean/median curves of the populations to estimate their central tendency values. Figures 2 and 3 show the Epass/log(jpass) and Epit/log(jpit) values obtained from the mean and median curves for extreme case datasets (the least and the most aggressive conditions, respectively). The results corresponding to the intermediate conditions are displayed in Supplementary Figs. 1, 2, 3. In plots a of Figs. 2 and 3, the conditional means (with their conditional standard deviations (SD) and errors (SE)) are plotted with the Epass/Epit estimates provided by the mean model; while plots b of Figs. 2 and 3 display the conditional medians (with their conditional median absolute deviations (MAD)) with the outputs of the median model (all estimates are represented by cross markers, in reference to “dart attempts”). In all of these plots, the ground truth for the central tendencies of the descriptors (maximum kernel density estimation (KDE) values) is plotted as a benchmark (represented by circle markers in reference to “target locations”).
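The conditional statistics named above can be computed potential by potential, as sketched below. The function name `central_curves` is hypothetical, the curves are assumed to share a common E grid, and the MAD is left unscaled (no normal-consistency factor), which is an assumption of this sketch.

```python
import numpy as np

def central_curves(log_j):
    """log_j: 2-D array, one row per curve, one column per potential."""
    n = log_j.shape[0]
    mean = log_j.mean(axis=0)                         # conditional mean
    sd = log_j.std(axis=0, ddof=1)                    # conditional sample standard deviation
    se = sd / np.sqrt(n)                              # conditional standard error of the mean
    median = np.median(log_j, axis=0)                 # conditional median
    mad = np.median(np.abs(log_j - median), axis=0)   # unscaled median absolute deviation
    return mean, sd, se, median, mad

# Three hypothetical curves at two potentials; the third is a high outlier
log_j_demo = np.array([[-7.0, -6.0],
                       [-7.0, -6.0],
                       [-4.0, -3.0]])
mean, sd, se, median, mad = central_curves(log_j_demo)
```

In this toy example, the single outlier raises the conditional mean by one decade while leaving the conditional median and the MAD unchanged.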

Fig. 2: Central tendency estimation.

The central tendency values of Epass/log(jpass) (in green) and Epit/log(jpit) (in red) estimated by the: maximum KDE of the descriptors distributions (ground truth), represented as “target” markers; mean and median models, displayed as “dart attempt” markers. Superimpositions as a function of potential (V) of the: a conditional mean of the log(j) (with conditional SD and SE); b the conditional median of the log(j) (with conditional MAD). Results related to the 0.005 M NaCl (100 mV s-1) set.

Fig. 3: The central tendency values of Epass/log(jpass) (in green) and Epit/log(jpit) (in red) estimated by the: maximum KDE of the descriptors distributions (ground truth), represented as “target” markers; mean and median models, displayed as “dart attempt” markers.

Superimpositions as a function of potential (V) of the: a conditional mean of the log(j) (with conditional SD and SE); b the conditional median of the log(j) (with conditional MAD). Results related to the 0.05 M NaCl (50 mV s-1) set.

It could be observed that the conditional mean of log(j), as a function of E, was generally a poor representative of a set of polarisation curves (in line with the log(j) distributions scrutinised in ref. 29). As illustrated in Fig. 3a (0.05 M NaCl, 50 mV s-1), the averaged values do not outline a typical polarisation curve. In this most aggressive condition, the Epass feature could not even be estimated from the mean curve. In contrast, the conditional median seemed to better capture the expected overall behaviour of this set of curves (Fig. 3b).

By analysing the Epass/log(jpass) and Epit/log(jpit) ground truth values with respect to the conditional mean, one could see they were relatively far from these curves, mostly lying outside the standard error limits (plot a, Fig. 2; Supplementary Figs. 1, 2, 4). This mismatch between the “target locations” and the mean curves explains the mean model’s failure to provide accurate values for the descriptors (as those are estimated by fitting the conditional mean curves).

In contrast, estimating the Epass/log(jpass) and Epit/log(jpit) values from the conditional median curve of a set provided an accurate alternative for obtaining their central tendencies. One could observe the median curves generally crossing (or at least touching) the “target” markers (plot b, Fig. 2, Supplementary Fig. 1, Supplementary Fig. 2, Supplementary Fig. 3); the only exception was the Epit in 0.05 M NaCl (50 mV s-1), which was the most difficult value to be appraised out of the 10 descriptors considered (high uncertainty of j values in E regions associated with pitting29).

Moreover, one could see that the data distribution (KDE) extended beyond the standard deviation (by comparing plots a of Fig. 2, Fig. 3, Supplementary Fig. 1, Supplementary Fig. 2, Supplementary Fig. 3 with Fig. 1). The analysis of the quantile curves was illustrative of the high data dispersion, with the descriptors distributions extending as far as the Qmin and Qmax curves in some cases.

Evaluation of the central tendency estimates based on residuals

At 0.05 M NaCl (50 mV s-1), the estimated location of Epit from the conditional median curve lagged behind the ground truth of this descriptor for this set (Fig. 3b). When evaluating the accuracy of an estimation, not only is the absolute distance between the estimate and the actual value relevant, but also the sign of that difference; in other words, the sign of the model bias.

Residual analysis provides a basis for diagnosis checking while assessing model biases. The following bar charts present the residuals of estimation of log(jpass), Epass, log(jpit) and Epit, as a function of testing corrosiveness (Fig. 4). The shorter the bar, the more accurate the model estimation. Again, the ground truth for the central tendencies of the descriptors was the maximum KDE values of their distributions. The horizontal line (residual equal to zero) represents the ideal benchmark, where a regression would be 100% accurate. Results from both the conditional mean and conditional median models are displayed.

Fig. 4: Residuals of the central tendency estimation of the passivity/pitting descriptors (with respect to their ground truth) obtained by the mean model (in blue tone) and the median model (in green) as a function of testing aggressiveness.

a log(jpass) and Epass (left and right vertical axes, respectively); b log(jpit) and Epit (left and right vertical axes, respectively). In a, to make the mean bar visible at 0.005 M NaCl (100 mV s-1), the edges of the median bar were traced in dashed lines.

When estimating the central tendency of passivity descriptors (Fig. 4a), the residuals of log(jpass) were consistently and significantly smaller for the median-based model than for the mean-based one. With respect to Epass, the median-based approach was also generally more accurate, the only significant exception being at 0.005 M NaCl, although still with a relatively small residual (–0.0067 V). Again, the residual comparison between both strategies is not possible for the most aggressive scenario (0.05 M NaCl, 50 mV s-1), as the mean model could not even provide passivity descriptors. In the next most aggressive condition (0.05 M NaCl, 100 mV s-1), the estimation errors related to log(jpass) and Epass were generally the largest for both models. Nonetheless, even in this case, the median model could reduce the estimation residuals by 54.2% and 73.2%, respectively, compared to the mean model.

Regarding the estimation of the central tendencies of pitting descriptors (Fig. 4b), the same overall observations made for the passivity features could be replicated here. First, the log(jpit) residuals of the median model were systematically lower than those of the mean model (reduction in residuals of 89.5, 77.4, 53.8, 25.8 and 97.1% for increasing testing aggressiveness). Secondly, the median model generally yielded smaller residuals than the mean model for Epit; when not lower, the magnitude of the error was still acceptable (0.0047 V for 0.01 M NaCl (50 mV s-1)). The only case where the median estimator underperformed was for Epit in 0.05 M NaCl (50 mV s-1).

As already mentioned, the picture was somewhat less clear in the most aggressive set, likely the most challenging condition for the regression of features. Nevertheless, in the addressed cases where the median model produced larger residuals than the mean model, it is important to note that the estimations were at least negatively biased (positive residuals). When attempting the prediction of pitting corrosion, as no model is perfectly accurate, a negative bias would be favoured over a positive bias: underestimation of Epit (or log(jpit)) is preferable to its overestimation. In other words, if estimation errors are unavoidable to a certain degree, it is more desirable to have stable pitting growth occurring at higher potentials (or current densities) than expected, as the opposite situation would imply catastrophic failure based on overly optimistic predictions.
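The sign convention discussed above can be made explicit with a short sketch. The residual is taken here as ground truth minus estimate, so that a positive residual corresponds to an underestimation (negative bias), consistent with the text. The formula used for the percentage reduction in residuals is an assumption of this sketch, and all function names and numerical values are hypothetical.

```python
def residual(ground_truth, estimate):
    """Positive when the model underestimates the descriptor (negative bias)."""
    return ground_truth - estimate

def reduction_percent(res_mean_model, res_median_model):
    """Assumed definition: relative decrease in absolute residual, in percent."""
    return 100.0 * (abs(res_mean_model) - abs(res_median_model)) / abs(res_mean_model)

def is_conservative(res):
    """True when the descriptor is not overestimated (the preferable sign of bias)."""
    return res >= 0

# Hypothetical E_pit values (V): ground truth, mean-model and median-model estimates
truth, est_mean, est_median = 1.10, 1.14, 1.09
res_mean = residual(truth, est_mean)       # negative: overestimation of E_pit
res_median = residual(truth, est_median)   # positive: underestimation of E_pit
gain = reduction_percent(res_mean, res_median)
```

With these illustrative numbers, the median-model residual is both smaller in magnitude and of the preferable sign.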

The magnitude and sign of the estimations of log(jpass) and log(jpit) can be appraised in Fig. 5a, in which the “passive current density ranges” are defined by the yellow bars. One can judge that the median model (green markers) outperformed the mean model (blue markers) in accurately estimating the central tendency ground truth values for both current density descriptors.

Fig. 5: Definition of passive ranges.

The passive current density range (a) and the passivity range (b), as respectively defined by the ground truth values of the log(jpass)/log(jpit) and Epass/Epit pairs, as a function of testing aggressiveness. Superimposed to these ranges, the central tendency estimates obtained by the mean model (in blue) and the median model (in green). In b indication of the preferable sign of the model bias (if any).

Similarly, Fig. 5b presents the estimates of Epass and Epit compared against the corresponding “passivity ranges” (yellow bars). Both models produced relatively tight errors, although the median model resulted in overall better estimations. In general, the residuals of estimation were proportionally lower for the E descriptors than for the log(j) ones (further discussed in the section “Proxy models for estimating the central tendency of pitting descriptors”, including the analysis of coefficients of variation). As mentioned above, when the median model underperformed, at least an underestimation of Epit was verified (“preferable sign of bias” indicated in the plot). If one task is utterly crucial for the models, this would be the estimation of the Epit feature.

Larger errors of estimation were achieved in the two most aggressive conditions (0.05 M NaCl media) in general (considering all descriptors in Fig. 5), while the errors related to log(jpass) and log(jpit) in particular, increased with the testing corrosiveness (Fig. 5a).

This investigation provides a solid and simplified framework for estimating the central tendency of passivity/pitting descriptors. Instead of individually assessing an entire set of log(j) vs. E curves, estimating Epass/log(jpass) and Epit/log(jpit) from the conditional median curve can provide satisfactory outcomes, assuming that the data size is large enough. In the present case, all estimations of Epass and Epit (either from the conditional median of log(j) of a set or from the individual log(j) vs. E curves) were done using the same hybrid LR/ANN approach for a fair basis of comparison. By doing so, the authors avoided introducing additional sources of bias to the estimations. Nonetheless, other simplified median-based methods could be thought of (an expert could even proceed with “by hand” selection47 of Epass/Epit).

Interpreting the higher robustness of the median model

In the case of polarisation curves displaying pitting corrosion, the conditional median of log(j) has qualitatively been shown to be representative of the population of curves. As exemplified in Fig. 6, the location of the conditional median curve (plot b, green curve) largely coincided with the regions with the highest data density in the corresponding log(j) vs. E plot (plot a). Figure 6 illustrates the effect of high outliers (log(j) vs. E curves lying more than 1.5 times the IQR above Q3) on the conditional mean of log(j). The result is the shift of the conditional mean to log(j) values consistently higher than the conditional median curve.

Fig. 6: Schematic of the data distribution from the 0.05 M NaCl (100 mV s-1) set illustrating how high outliers shift the conditional mean of log(j) from the regions with the highest data density.

a Data density estimated by KDE (proportional to the colourmap); b population of the log(j) vs. E curves (grey) superimposed to the conditional mean and median of log(j) (in red and green, respectively).
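The outlier rule stated above (more than 1.5 times the IQR above Q3) can be sketched as follows. Scoring each curve by its mean log(j) over the scanned range is an assumption of this sketch, and the function name `high_outlier_mask` and the synthetic population are hypothetical.

```python
import numpy as np

def high_outlier_mask(log_j):
    """Flag curves whose score exceeds Q3 + 1.5 * IQR of the population scores.

    log_j: 2-D array, one row per curve. The per-curve score is the mean of
    log(j) over all potentials (an assumption of this sketch).
    """
    score = log_j.mean(axis=1)
    q1, q3 = np.percentile(score, [25, 75])
    return score > q3 + 1.5 * (q3 - q1)

# Hypothetical population: 19 similar flat curves plus one highly active curve
levels = np.append(np.linspace(-7.0, -6.0, 19), -3.0)
log_j_demo = np.repeat(levels[:, None], 5, axis=1)
mask = high_outlier_mask(log_j_demo)

cond_mean = log_j_demo.mean(axis=0)           # pulled upwards by the outlier
cond_median = np.median(log_j_demo, axis=0)   # stays with the bulk of the curves
```

Here a single flagged curve is enough to place the conditional mean above the conditional median at every potential, which is the shift illustrated in Fig. 6b.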

The difference between the conditional mean and the conditional median tends to increase with E, because the high outliers present particularly high log(j) values at high potential regions (E > 1.15 V (vs. Ag/AgCl)). As demonstrated in ref. 29, the log(j) distributions become more positively skewed with increased corrosiveness (more positive potential and higher [Cl-]). The occurrence of high outliers with particularly high j values at high E regions results from pitting corrosion processes. Indeed, applying more positive potentials increases the likelihood of metastable pitting (accompanied by repassivation events), which may gradually change into stable pitting growth.

As stated by Koenker50, assessing a set of conditional quantile curves provides a more informative description of the relationship among variables, especially in cases of: 1. non-constant variance; 2. non-normality of the noise distribution. The described picture illustrates well the datasets in question, in which: 1. the log(j) distributions are heteroscedastic (increased conditional variance as a function of E and testing aggressiveness)29; 2. the anomalous high j values at high potentials, observed for the high outliers (Fig. 6), could be seen as “noise”, as they positively skewed conditional means that would otherwise (in the absence of pitting) be normally distributed, as expected for electrochemical descriptors derived from PP curves of passive systems57,58. If the pitting activity could be considered as “noise”, it would undoubtedly imply the referred “non-normality of the noise distribution”, as most of the population (grey curves in Fig. 6b) would have relatively low and similar noise levels, with only a few examples (the high outliers) displaying significantly higher levels of noise.

To illustrate the positive skewness of log(j) beyond their conditional distributions, Fig. 7 displays the histograms of the log(jpass) and log(jpit) descriptors for the 5 datasets. From Fig. 7, it is confirmed that the medians of the log(jpass) and log(jpit) descriptors are more representative of the underlying distributions than the respective means, as the former were closer to the ground truth central tendencies of the distributions (maximum KDE values). Similar to what was observed for the conditional log(j) distributions (Fig. 6), the means of the descriptors were generally higher than the corresponding medians due to the presence of high outliers (Fig. 7). The only exception (median larger than mean) was at 0.05 M NaCl (100 mV s-1) (Fig. 7d), where low outliers were more prominent than the high outliers (the lowest conditional Qmin curve was computed for this dataset, Fig. 1d).

Fig. 7: Histograms of the estimated log(jpass) and log(jpit) descriptors for different testing aggressiveness.

a 0.005 M NaCl (100 mV s-1), b 0.01 M NaCl (100 mV s-1), c 0.01 M NaCl (50 mV s-1), d 0.05 M NaCl (100 mV s-1) and e 0.05 M NaCl (50 mV s-1). Each distribution’s mean and median are plotted as solid and dashed lines, respectively. The maximum KDE values are represented by empty circle markers (located at the frequency axis origin for illustrative purposes). The arrows indicate the effect of the high outliers, which shift the mean to values larger than the corresponding median.
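The relationship between positive skewness and the ordering of the mean and the median can be checked numerically, as in the sketch below. The descriptor values are hypothetical and the moment-based (biased) sample skewness is used; the function name `sample_skewness` is illustrative.

```python
import numpy as np

def sample_skewness(x):
    """Moment-based sample skewness: third central moment over variance^(3/2)."""
    x = np.asarray(x, dtype=float)
    d = x - x.mean()
    return float(np.mean(d**3) / np.mean(d**2) ** 1.5)

# Hypothetical log(j_pit) values: a compact bulk plus two high outliers
values = np.array([-6.2, -6.1, -6.0, -6.0, -5.9, -5.8, -4.0, -3.5])
skewness = sample_skewness(values)
mean_value = float(values.mean())
median_value = float(np.median(values))
```

For this sample the skewness is positive and the mean lies above the median, mirroring the direction of the arrows in Fig. 7.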

These statistical analyses further explain why quantile analysis (the conditional median of log(j), in particular) provided a robust model for a simplified estimation of passivity/pitting descriptors. In future investigations on predictive ML, instead of traditional least squares regression, quantile (or robust) regression59,60 might be a promising route for approaching pitting corrosion50,61. As a perspective, the analysis of the quantile curves (Fig. 1) might also help locate data clusters (preliminarily defined before the application of the rule-based algorithm).
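To illustrate why quantile regression is less sensitive to high outliers than least squares regression, the sketch below compares the constant that minimises the pinball (quantile) loss at τ = 0.5 with the constant that minimises the squared error. It is a toy example on hypothetical values, using a brute-force search over a grid of candidates rather than a regression library.

```python
import numpy as np

def pinball_loss(y, pred, tau):
    """Mean pinball loss for quantile level tau (tau = 0.5 targets the median)."""
    diff = y - pred
    return float(np.mean(np.maximum(tau * diff, (tau - 1) * diff)))

# Hypothetical log(j) values at one potential, with one high outlier
y = np.array([-6.2, -6.1, -6.0, -5.9, -3.0])
candidates = np.linspace(-7.0, -2.0, 501)  # 0.01-wide grid of constant predictions

best_pinball = candidates[np.argmin([pinball_loss(y, c, 0.5) for c in candidates])]
best_squared = candidates[np.argmin([np.mean((y - c) ** 2) for c in candidates])]
```

The pinball minimiser coincides with the sample median and remains within the bulk of the data, whereas the least squares minimiser coincides with the mean and is displaced towards the outlier.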

Effect of corrosiveness on the pitting susceptibility

Comparison of the central tendency values of the log(j) descriptors did not indicate a clear trend with increased testing aggressiveness (Fig. 7). In contrast, by comparing the distributions of Epit (and Epass) (Fig. 8), a few tendencies as a function of corrosiveness could be appraised. First, as previously determined for the conditional log(j)29, the higher the aggressiveness, the more spread the distributions of the E descriptors tend to be (clear trend in Fig. 8, from b to e). Secondly, all these distributions were continuous and roughly unimodal, with increased multimodality with corrosiveness, ultimately leading to a uniform function-like behaviour at 0.05 M NaCl, 50 mV s-1 (Fig. 8e). Despite Shibata and Takeyama’s conclusion that the random variation of the Epit of stainless steels (in macro-scale polarisation) obeys a normal distribution6,19, all the achieved unimodal distributions failed to be formally described as normal, even with the removal of outliers (for illustration purposes, the normal curves plotted in Fig. 8 were fitted to data without outliers). Even when considering the highest p-values obtained (from D’Agostino and Pearson’s test), these values were consistently equal to or lower than the significance level: 0.00 and 0.03 (0.05 M NaCl, 50 mV s-1), 0.05 and 0.00 (0.05 M NaCl, 100 mV s-1), 0.00 and 0.00 (0.01 M NaCl, 50 mV s-1), 0.00 and 0.00 (0.01 M NaCl, 100 mV s-1), 0.01 and 0.00 (0.005 M NaCl, 100 mV s-1). In summary, the null hypothesis “the data is normally distributed” was consistently rejected for all sets based on the various normality tests employed.

Fig. 8: Histograms of the estimated Epass and Epit descriptors for different testing aggressiveness.

a 0.005 M NaCl (100 mV s-1), b 0.01 M NaCl (100 mV s-1), c 0.01 M NaCl (50 mV s-1), d 0.05 M NaCl (100 mV s-1) and e 0.05 M NaCl (50 mV s-1). Dashed lines indicate the maximum KDE values. All plots are in the same E scale. The two arrows indicate the overall trends observed for Epass and Epit, from a to e (except for d, which presented low outliers in larger amounts). The normal distribution fitting lines excluded data outliers, which were here defined based on the Z-score threshold of 3: points more than 3 standard deviations away from the mean were considered outliers (in d, the low outliers were pre-filtered by removing points lower than the first quartile).
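The Z-score filter and the normality check described above could be reproduced along the following lines, using `scipy.stats.normaltest`, which implements D’Agostino and Pearson’s test. The sample is synthetic (uniform-like, as an analogue of the most aggressive set), and the significance level of 0.05, with p-values equal to the level counted as a rejection, is an assumption inferred from the reported p-values.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
e_pit = rng.uniform(0.9, 1.2, 300)  # hypothetical, uniform-like E_pit sample (V)

# Z-score rule: discard points more than 3 standard deviations from the mean
z = (e_pit - e_pit.mean()) / e_pit.std(ddof=1)
kept = e_pit[np.abs(z) <= 3]

# D'Agostino and Pearson's omnibus test; null hypothesis: the data are normal
statistic, p_value = stats.normaltest(kept)
alpha = 0.05                          # assumed significance level
reject_normality = p_value <= alpha   # p equal to alpha is treated as a rejection
```

For a uniform-like sample no point exceeds the Z-score threshold, and the test rejects normality because of the flat-topped (platykurtic) shape of the distribution.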

Most importantly, the Epass distribution generally presented a linear increase as a function of the considered testing corrosiveness, while the Epit distribution displayed a relatively constant behaviour in the least aggressive conditions (Fig. 8a–c), with a pronounced decrease in the most aggressive scenario (Fig. 8e). These trends could be appraised by following the evolution of the descriptors’ maximum KDE, as indicated by the two arrows in Fig. 8. It should be noted that the 0.05 M NaCl set (100 mV s-1) (Fig. 8d) again behaved as a group outlier (for the same reason as elaborated above for the log(j) descriptors, Fig. 7d).

As expected, the combined effect of the Epass and Epit progression with increased corrosiveness resulted in an overall decrease in the passivity range. Analysis of Fig. 8 also reveals that this passivity range shortening was generally more affected by the increase in Epass than by the decrease in Epit. Although corrosionists often pay more attention to the upper end of the passivity range, framing Epit as the main predictor of passivity breakdown, our data-driven analysis suggests that the robustness against pitting (related to the passivity range1,62) would be highly sensitive to the lower end of passivity (Epass); i.e., the trend of passive range shortening seemed to be primarily influenced by the delayed stabilisation of the passive film rather than its early disruption. For instance, based on XPS measurements, Cl- was reported to cause thinning of the Fe passive film, even under conditions where pitting did not occur (passivity)63. Based on the film-breaking mechanism of pit initiation, the thin passive film is in a continual state of breakdown and repair64,65; and in chloride media, there would be a lower likelihood for such a breakdown to heal (inhibition of repassivation by chloride)6.

In Shibata’s stochastic theory of pitting corrosion19, the coefficient of variation (CV) of Epit was calculated for the polished 316 stainless steel dataset, and a value of 9.6% was obtained. In our cases, the following CV values were obtained for the Epit distributions, from the least to the most aggressive conditions: 4.3, 1.8, 11.1, 11.4 and 32.7%. If we remove the uncertain Epit values assigned as 0.5 V by default, the obtained CVs are even lower: 2.6, 1.8, 3.0, 7.0, and 7.9% (highest variations obtained at 0.05 M NaCl, 50 mV s-1, as expected). Interestingly, these CV values were lower than the 9.6% reported for 316 under classic PP19, which is somewhat surprising, as our Epit values, derived from micro-scale PP measurements, are more sensitive to local surface heterogeneities. The reasons for the overall low relative variability of our sets as compared to the referred benchmark19 might be related to: the 316L grade being more resistant to corrosion than the 316; the expected higher surface quality of our electropolishing in comparison to (2/0) emery polishing; our Cl- media being at least 1 order of magnitude less aggressive than their 3.5% NaCl solution; and, most importantly, our large number of samples, generally over one hundred per set (in ref. 19, estimated to be only ~20). As a higher CV might indicate a greater degree of uncertainty in the shape of the underlying distribution, our sets arguably provide more representative distributions of Epit as compared to Shibata’s set19 (most likely resulting from the referred discrepancy in the sample sizes).

In any case, the CVs calculated for log(jpit) without considering the 0.5 V data points (10.0, 8.1, 10.3, 21.8 and 17.1%, in order of increasing aggressiveness) were much larger than the corresponding values determined for Epit (3.8, 4.5, 3.4, 3.1 and 2.2 times larger, respectively). Since the CV is a statistical measure of the relative variability in a set, this quantitative outcome illustrates why the comparison of the distributions across the sets was less straightforward for log(jpit) (Fig. 7) than for Epit (Fig. 8), and also explains the generally more accurate estimation of the E descriptors than of the log(j) descriptors (Figs. 4 and 5).
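The CV comparison above can be illustrated with a minimal sketch. The Epit values below are hypothetical placeholders, not our data, and the use of the sample standard deviation (ddof=1) is an assumption of this example.

```python
import numpy as np

def coefficient_of_variation(values, ddof=1):
    """Return the CV in percent: 100 * standard deviation / |mean|."""
    values = np.asarray(values, dtype=float)
    return 100.0 * values.std(ddof=ddof) / abs(values.mean())

# Hypothetical Epit estimates (V); 0.5 V marks a value assigned by default.
e_pit = np.array([1.02, 0.98, 1.05, 0.50, 1.00, 0.97, 0.50, 1.03])

cv_all = coefficient_of_variation(e_pit)
# Drop the default 0.5 V assignments before recomputing the CV.
cv_filtered = coefficient_of_variation(e_pit[e_pit > 0.5])
print(f"CV with defaults: {cv_all:.1f}%, without defaults: {cv_filtered:.1f}%")
```

The absolute value of the mean is used so that the same helper also applies to log(jpit), whose mean may be negative.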

To further assess the different susceptibility to pitting among the sets, as suggested in Fig. 8, we extended the comparative analysis to a selection of the largest log(jpit) values achieved (Fig. 9), in line with the “weakest link” theory applied to pitting corrosion19.

Fig. 9: The top 4 highest log(j) vs. E curves for each dataset superimposed with their Epit/log(jpit) estimates.

The ranking was based on the mean of log(j). Epit estimation was not possible for a few curves (absence of passivity in the considered range and/or early pitting cases).

The “weakest link” concept has allowed advances in the statistical theory of strength66,67, explaining the high randomness observed in fracture stress resulting from flaws with varying dimensions in a solid material68. The stochastic approach developed for the effect of the body volume on fracture stress69, and often applied to describe structure-sensitive properties such as fatigue life68, was generalised to pitting corrosion69,70. Concerning failure by pitting, the presence of a precursor or active state in the film is responsible for pit generation19. Extreme value analysis developed by Gumbel71 was applied to pitting corrosion of Al by Aziz72 and Eldredge73.

Likewise, the “weakest link” concept applies to other electrochemical fields beyond pitting corrosion. For instance, in electrodeposition, nucleation of a new phase on a foreign substrate is primarily driven by the most active sites, similar to pit nucleation. In both cases, the growth of a film (or pit) is driven by the most active sites where nucleation (or initiation) preferably takes place. Recently, we demonstrated that the macroscopic electrodeposition response, described by the onset potential for nucleation, corresponds to that of the most active sites (more positive onset potentials) of a distribution of hundreds of voltammetric curves obtained by SECCM51,52.

Hence, in our “weakest link” problem, one could expect the most active sites to ultimately determine the overall macroscopic electrochemical response in the pitting corrosion of stainless steel. To trace the most active pits, the four highest log(j) vs. E curves of each set were plotted with their corresponding Epit/log(jpit) pairs of estimates (Fig. 9). The higher the testing aggressiveness, (1) the higher the four top-ranked log(j) vs. E curves (ranking based on the mean of log(j)), and (2) the lower the Epit and the higher the log(jpit). For a few particularly active log(j) vs. E curves, Epit could not be determined due to a large uncertainty or the absence of a passivity breakdown (Epit modelled as 0.5 V by default). The same tendency was observed when examining the “top 4 conditional jmax curves as a function of E”, as presented in Supplementary Fig. 4. To conclude, by selecting only high-j examples, we corroborate the notion (seen as tacit knowledge in macro-scale polarisation) that Epit substantially drops with corrosiveness.
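The ranking used to select the most active curves can be sketched as follows. This is a minimal illustration with hypothetical log(j) values; the function name and the array layout (one curve per row, sampled on a common potential grid) are assumptions of this example.

```python
import numpy as np

def top_curves_by_mean(log_j_curves, n_top=4):
    """Return the row indices of the n_top curves with the highest mean log(j)."""
    curves = np.asarray(log_j_curves, dtype=float)
    order = np.argsort(curves.mean(axis=1))[::-1]  # descending mean log(j)
    return order[:n_top]

# Six hypothetical curves, three potential steps each.
curves = np.array([
    [0.0, 0.1, 0.2],   # mean 0.1
    [0.8, 0.9, 1.0],   # mean 0.9
    [0.4, 0.5, 0.6],   # mean 0.5
    [0.6, 0.7, 0.8],   # mean 0.7
    [0.2, 0.3, 0.4],   # mean 0.3
    [1.0, 1.1, 1.2],   # mean 1.1
])
top4 = top_curves_by_mean(curves)
print(top4)
```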

As the range of applicability of the models is somewhat restricted to the limits of the training data, modelling based on local techniques is hardly generalisable. Additional log(j) vs. E datasets obtained under varied conditions (macro experiments, different substrates, alternative PP parameters, etc.) would be recommended for evaluating the robustness of the developed modelling methodology. Particular efforts should be focused on validating the approach when extended to classic potentiodynamic polarisation curves. Our estimation modelling strategy is expected to perform well on the macroscale, where less variability is anticipated. In fact, as only the most active pitting sites (Fig. 9) would drive the overall corrosion behaviour of a macro surface, they would likely dominate the intensity of the electrochemical signal measured, thus possibly resulting in less data dispersion. Whether the conditional median of log(j), as a proxy model for estimating the central tendencies of pitting descriptors from a population of micro-scale polarisation curves, would also be accurate in macro-scale experiments warrants further investigation.

In conclusion, our hybrid rule-based/ML approach, combining an LR-based algorithm with a supervised ANN, was able to determine relevant pitting corrosion descriptors of electropolished 316 L from populations of localised polarisation curves with different testing aggressiveness. The rule-based LR provided initial estimates of Epit (or Epass) descriptors by fitting two independent linear regression lines to smoothed polarisation curves. However, unsatisfactory results were observed for some sets, indicating the need for an improved estimation strategy. To address this limitation, we leveraged supervised deep learning: the ANN was trained on sets with satisfactory estimates and then deployed on the sets with unsatisfactory estimates, significantly improving the estimates. The ANN model was designed with feature engineering methods for selecting input features representative of the researched behaviour (passivity or pitting). To ensure the network’s ability to handle the complexity of the data, a data reduction step was performed, reducing the curves to a selection of log(j) values linearly spaced in terms of their “potential distances”. The training process involved hyperparameter tuning and pruning to achieve satisfactory validation performance. The resulting ANN demonstrated accurate mapping of the relationships in the data, with final MSE values of the order of 10⁻⁴ and R² values between 0.90 and 0.97, supporting its generalisation ability on the unseen (unlabelled) data (our method was somewhat similar to active learning). Throughout this study, ML played a pivotal role in addressing the challenges of estimating pitting-related predictors and interpreting high-throughput data for improved mechanistic understanding. Looking ahead, the potential of ML as a framework for pitting prediction is promising. The automated extraction developed for pitting descriptors could be extended to larger datasets and diverse experimental conditions, seeking improved robustness and generalisability of the models.

Methods

The employed substrate and the data acquisition methodology were identical to those described in ref. 29. Briefly, an industrial electropolished 316 L stainless steel sample was subjected to potentiodynamic polarisation (PP) tests using an SECCM platform in a hopping-mode protocol27,28. Five different combinations of [NaCl] and voltammetric scan rate were employed: 0.005 M NaCl (100 mV s-1), 0.01 M NaCl (100 mV s-1), 0.01 M NaCl (50 mV s-1), 0.05 M NaCl (100 mV s-1) and 0.05 M NaCl (50 mV s-1). Single-barrel borosilicate pipets with a final internal circular diameter of ~2 µm were used as SECCM probes. The starting potential was –0.5 V, and the end anodic potential was 1.355 V (vs. Ag/AgCl). The measurements were performed using a LabVIEW (2019, National Instruments) interface running Warwick’s software (WEC-SPM, www.Warwick.ac.UK/electrochemistry).

Data analysis

The code for data processing and visualisation was written in Python 3.8 and is available on GitHub (as Jupyter Notebook files): https://github.com/bcoelho-leonardo/Estimating-pitting-descriptors-of-316L-stainless-steel-by-machine-learning-and-statistical-analysis/tree/5c7c8eac41907667f94c22881650f23a6aee0d64.

The log(j) vs. E datasets: this work employed the same datasets reported in ref. 29. The only difference is that any missing values were filled with an iterative imputer. The IterativeImputer class (from sklearn.impute) models each feature with missing values as a function of the other features and uses that estimate for imputation. All 955 data samples (polarisation curves) with the referred update (filled missing values) are accessible at Mendeley Data74.

As in ref. 29, the datasets considered were sliced upward from 0.5 V (considerably more positive than the open circuit potential (OCP)). For only a few curves, passivity was presumably reached at potentials less positive than 0.5 V, while in a few other cases passivity was not observed within this potential range. In those cases, as an approximation, the 0.5 V value was assigned to Epass and to Epit, respectively (early passivity and early pitting occurrences). The self-passivation behaviour of 316 L (relative passive state already at OCP) was not considered.

Supervised hybrid rule-based/machine-learning algorithm

A deterministic rule-based algorithm, based on linear regression (LR), was developed to estimate Epit/jpit (or Epass/jpass) descriptor pairs (continuous values) from polarisation curves. Figure 10 schematically shows the overall hybrid rule-based/ML approach employed, including the LR steps. As illustrated in the plots with borders (in green and grey), two independent linear regression lines are fitted to the smoothed data, one starting at “low E” and the other ending at “high E” values (~0.7 and ~1.25 V, respectively). The obtained location for Epit (or Epass) is the one that maximises the sum of the R² (goodness of fit) for the two LRs.
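The two-line criterion described above can be sketched in a few lines of NumPy. This is a simplified illustration under stated assumptions, not the published code: it omits the class-specific potential thresholds, the function names and the minimum segment length (min_points) are hypothetical, and the curve is a noise-free synthetic example with a breakdown placed at 1.0 V.

```python
import numpy as np

def r_squared(x, y):
    """R² of an ordinary least-squares line fitted to (x, y)."""
    slope, intercept = np.polyfit(x, y, 1)
    ss_res = np.sum((y - (slope * x + intercept)) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return 1.0 - ss_res / ss_tot if ss_tot > 0 else 0.0

def breakpoint_by_two_lines(e, log_j, min_points=5):
    """Return (E, log j) at the split that maximises R²(left) + R²(right)."""
    best_score, best_idx = -np.inf, None
    for k in range(min_points, len(e) - min_points):
        # Left line covers the start up to point k; right line covers k to the end.
        score = r_squared(e[:k + 1], log_j[:k + 1]) + r_squared(e[k:], log_j[k:])
        if score > best_score:
            best_score, best_idx = score, k
    return e[best_idx], log_j[best_idx]

# Synthetic smoothed curve: gently sloped passive region, then a steep rise at 1.0 V.
e = np.linspace(0.7, 1.25, 111)
log_j = np.where(e < 1.0, 0.2 * (e - 0.7), 0.06 + 6.0 * (e - 1.0))

e_pit, log_j_pit = breakpoint_by_two_lines(e, log_j)
print(f"Estimated Epit = {e_pit:.3f} V, log(jpit) = {log_j_pit:.3f}")
```

On noisy data, a nearly flat passive segment yields a low R² regardless of the split, which is one reason why smoothing precedes the fit.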

Fig. 10: Schematic of the supervised hybrid rule-based/machine-learning approach employed for estimating Epit (or Epass).

As a “zero step”, a linear regression algorithm was employed but could not generalise well to all examples of curves. The sets with satisfactory estimates were used to train supervised artificial neural networks (1). Then, the trained ANN was deployed on the set of unsatisfactory estimates (2) and considerably improved the task of estimating Epit (or Epass) (3).

This hand-crafted method (rules manually created by domain experts to define the system’s behaviour) is adaptive, requiring user input to define threshold values for the potential (thereby separating the “low E” and “high E” regions to which the LRs are separately applied). The different E thresholds define the existing groups (classes) of log(j) vs. E curves. The definition and validation of classes were qualitatively based on the degree of similarity among the PP curves7. Therefore, the developed code is dataset-specific, with specific condition statements for the different classes.

In summary, the described method was our initial labelling strategy, thus providing labels (target attributes) to the unlabelled data. The label validation was done by visual examination of the Epit/log(jpit) (or Epass/log(jpass)) estimates in the individual curves. Although the LR-based model generally led to satisfactory estimates, unsatisfactory results were observed in some cases for the 0.05 M NaCl (100 mV s-1) and 0.005 M NaCl (100 mV s-1) sets (exemplified in the plot with grey border in Fig. 10).

In the case of unsatisfactory estimates, instead of further hand-crafting the algorithm to improve the performance of the linear regressors, the strategy was to employ a supervised ANN for this task. The ANN was trained on the set of satisfactory estimates and then deployed on the set of unsatisfactory examples (schematic plots with green and grey borders, respectively, in Fig. 10). Contrary to standard practice, where a fixed proportion of the data (e.g., 20%) is randomly selected for testing, our test sets comprised specifically challenging samples. This “stress-testing” approach offered a more stringent test of the models’ robustness and generalisation ability, as the predictions were made on samples for which the simpler model had failed. As a result, the proportion of samples in our test sets relative to the entire datasets varied (10%, 17%, and 3%, depending on the set).

The selection of input features was based on feature engineering, aiming to identify relevant features representative of the descriptor of interest, thus allowing the ANN to estimate the data targets accurately. To focus specifically on the relevant regions of the PP curves that encompass the passivity/pitting descriptors, the log(j) vs. E (smoothed) curves were partitioned through a data slicing procedure (Supplementary Fig. 5). The sliced curves presented 700 or 500 data points (0.539 or 0.385 V in potential range), capturing the regions related to Epass or Epit, respectively, with wide safety margins. Given the complexity that an ANN would face in processing thousands of data points as features, a second data reduction step was undertaken to further decrease the dimension of the log(j) input array. Sparse sampling was conducted at every 40th (or 60th) point from the sliced log(j) array, leading to a final selection of 13 (or 12) log(j) values (for Epit or Epass, respectively). These numbers of input features were found to represent the target regions of the PP curves adequately.
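The arithmetic of the sparse sampling step can be checked with a short sketch. The arrays are placeholders rather than measured curves, the helper name is hypothetical, and starting the stride at index 0 is an assumption of this example (it reproduces the stated feature counts of 13 and 12).

```python
import numpy as np

def reduce_curve(log_j_slice, stride):
    """Keep every `stride`-th point of a sliced log(j) array, starting at index 0."""
    return np.asarray(log_j_slice)[::stride]

# Placeholder sliced curves: 500 points (Epit region) and 700 points (Epass region).
pit_slice = np.linspace(-1.0, 1.0, 500)
pass_slice = np.linspace(-1.0, 0.0, 700)

pit_features = reduce_curve(pit_slice, 40)    # 500 points, stride 40
pass_features = reduce_curve(pass_slice, 60)  # 700 points, stride 60
print(pit_features.size, pass_features.size)  # 13 12
```

Because the stride is constant, the retained log(j) values are equally spaced on the potential axis, so the potential itself does not need to be passed to the network.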

Reducing the curves to a selection of log(j) values that are linearly spaced in terms of their “E (V) stamps” was sufficient to describe the relevant regions in the curves. This is demonstrated by the 13 “descriptor blue dots” in the “1. neural network training” plot (Fig. 10), which represent the selected log(j) input features (equidistant on the potential (V) scale). The potential was treated as a constant feature, and related values were not included as input. Finally, to improve the model convergence, we applied the StandardScaler class (sklearn.preprocessing package) to the sliced log(j) data for standardisation of both the input and output data. This class standardises features by removing the mean and scaling to unit variance.
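For readers unfamiliar with this standardisation, the operation is equivalent to the NumPy sketch below. It is a stand-in written for illustration, not a call to StandardScaler; the helper names and the random feature matrix are hypothetical.

```python
import numpy as np

def fit_standardiser(x):
    """Return the per-feature mean and (population) standard deviation of x."""
    mean = x.mean(axis=0)
    std = x.std(axis=0)                   # ddof=0, i.e. population standard deviation
    std = np.where(std == 0.0, 1.0, std)  # leave constant features unscaled
    return mean, std

def standardise(x, mean, std):
    """Remove the mean and scale to unit variance."""
    return (x - mean) / std

def inverse_standardise(z, mean, std):
    """Map standardised values back to the original scale."""
    return z * std + mean

# Hypothetical feature matrix: 50 curves x 13 log(j) features.
rng = np.random.default_rng(0)
x = rng.normal(loc=-2.0, scale=0.5, size=(50, 13))

mean, std = fit_standardiser(x)
z = standardise(x, mean, std)
x_back = inverse_standardise(z, mean, std)
```

The inverse transform matters for the output: an ANN trained on standardised targets returns standardised estimates, which must be mapped back to volts before being read as Epit or Epass.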

A sequential model (keras.models.Sequential) was defined, generating a classic multi-layer perceptron network (also known as a feedforward neural network). As shown in Fig. 10, the number of nodes in the input layer was equal to the number of input descriptors (12 or 13 log(j) values). The output layer consisted of a single node, providing only one output (either Epit or Epass). Given that log(j) is a function of E, the Epit and Epass estimates sufficed for finding the corresponding log(jpit) and log(jpass) values. The network’s topology, including the optimal number of hidden layers and nodes within each layer, was determined through exploratory testing and visual validation. Specifically, it was found that two hidden layers were sufficient to accurately map the relationships in the data, with the optimal numbers of nodes in the first and second hidden layers being 12 and 11, respectively. Attempts to reduce the number of nodes in these hidden layers led to an unsatisfactory generalisation of the learning process (increasing the number of nodes beyond 12 would deviate from the logical progression of reducing the node count when approaching the output layer).
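The layer sizes described above can be made concrete with a NumPy forward pass. This sketch only illustrates the shapes and the parameter count; it is not the Keras model. It assumes that the input layer is a pass-through of the 13 features, that the hidden layers use ReLU and that the output node is linear; the random weights are placeholders.

```python
import numpy as np

def init_mlp(layer_sizes, seed=0):
    """Create placeholder (weights, bias) pairs for consecutive layer sizes."""
    rng = np.random.default_rng(seed)
    return [(rng.normal(0.0, 0.1, size=(n_in, n_out)), np.zeros(n_out))
            for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])]

def forward(params, x):
    """Forward pass: ReLU on hidden layers, linear activation on the output."""
    a = x
    for w, b in params[:-1]:
        a = np.maximum(a @ w + b, 0.0)
    w, b = params[-1]
    return a @ w + b

# 13 log(j) inputs -> 12 nodes -> 11 nodes -> 1 output (Epit); use 12 inputs for Epass.
params = init_mlp([13, 12, 11, 1])
n_parameters = sum(w.size + b.size for w, b in params)

batch = np.zeros((4, 13))        # four hypothetical standardised curves
output = forward(params, batch)
print(n_parameters, output.shape)
```

Under these assumptions the network has only a few hundred trainable parameters, which is commensurate with training sets of the order of one hundred to a few hundred curves.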

The training started with 20-fold cross-validation (CV) (KFold class from sklearn.model_selection), allowing hyperparameter tuning by monitoring the loss function of the validation set. The loss function was the mean squared error (MSE), minimised with the Adam optimiser. The ReLU activation was used in the input/hidden layers. The number of batches was equal to the number of training samples (110 and 278 for the 0.05 M and 0.005 M NaCl (100 mV s-1) sets, respectively). Although the validation losses were generally fairly low after 200 epochs of training (below 0.0001 (µA cm-2)²), for a few validation sets the losses reached relatively higher values (~0.025 (µA cm-2)²). After this initial training, the network was pruned (using the tensorflow_model_optimization module) and further validated by random sampling (validation_split=0.1) of the labelled dataset. The number of epochs was increased to 4500–6000 for the pruned network, and the learning rates were in some cases decreased from 10⁻³ (default) to values between 10⁻¹² and 10⁻⁵. After achieving satisfactory validation performance, the final stage consisted of retraining the model with the entire labelled dataset. The final MSE (Eq. 1) values were 9.87 × 10⁻⁵, 1.35 × 10⁻⁴ and 1.72 × 10⁻⁴ (µA cm-2)², and the final R² (Eq. 2) values were 0.9025, 0.9707 and 0.9653, for 0.005 M NaCl, 100 mV s-1 (Epit) and 0.05 M NaCl, 100 mV s-1 (Epass and Epit), respectively.
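To show what a 20-fold split implies for a set of 110 labelled curves, the partition can be sketched in NumPy. This is a stand-in for KFold written for illustration; the function name is hypothetical, and shuffling the indices before splitting is an assumption of this example.

```python
import numpy as np

def k_fold_indices(n_samples, n_splits=20, seed=0):
    """Yield (train_idx, val_idx) pairs for a shuffled k-fold partition."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(n_samples), n_splits)
    for i, val_idx in enumerate(folds):
        # All folds except the i-th one form the training set.
        train_idx = np.concatenate([f for j, f in enumerate(folds) if j != i])
        yield train_idx, val_idx

splits = list(k_fold_indices(110))
fold_sizes = [len(val_idx) for _, val_idx in splits]
print(len(splits), min(fold_sizes), max(fold_sizes))
```

With 110 samples, each validation fold holds only five or six curves, which may help explain why the validation loss was noticeably higher for a few folds.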

$${MSE}=\frac{1}{N}\sum _{i=1}^{N}{({y}_{i}-{\hat{y}}_{i})}^{2}$$
(1)
$${R}^{2}=1-\frac{{\sum }_{i}{({y}_{i}-{\hat{y}}_{i})}^{2}}{{\sum }_{i}{({y}_{i}-\bar{y})}^{2}}$$
(2)
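Eqs. 1 and 2 translate directly into NumPy, where y are the labels, ŷ the ANN estimates and ȳ the mean of the labels. The numerical values below are hypothetical and serve only as a worked example.

```python
import numpy as np

def mse(y_true, y_pred):
    """Mean squared error (Eq. 1)."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return np.mean((y_true - y_pred) ** 2)

def r2(y_true, y_pred):
    """Coefficient of determination (Eq. 2)."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

# Hypothetical labels and estimates.
y_true = [1.0, 1.1, 0.9, 1.2]
y_pred = [1.0, 1.0, 1.0, 1.1]
print(mse(y_true, y_pred), r2(y_true, y_pred))  # approximately 0.0075 and 0.4
```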

As for the LR-based estimates, the validation of the ANN estimates was conducted through visual examination of the obtained Epit/log(jpit) (or Epass/log(jpass)) in the individual log(j) vs. E curves. Again, the cost of pre-labelling the deployed set would be prohibitively high, implying further hard coding of the rule-based model. It is important to note that if the LR-based approach alone could solve our estimation problem, resorting to ML would not be necessary. It was precisely because the labels obtained were unsatisfactory that the rule-based algorithm was complemented with an ANN. In summary, we employed a supervised learning strategy on unlabelled targets75. Our ANN strategy, in particular, is somewhat similar to a transductive transfer learning framework, where labelled data is only available for the source domain but not for the target domain75,76. In our case, the unlabelled sets from the target domain served as our test sets, allowing us to evaluate the models’ generalisation ability on the unseen data in the target domain.

An alternative strategy based on active learning could be envisaged: when the LR estimation is uncertain, a human expert could proceed with the labelling. This process could be repeated iteratively, selecting the most informative instances for labelling based on the model’s uncertainty, thus improving the accuracy of subsequent ML modelling77,78. As a perspective, convolutional neural networks (CNNs) may be a promising alternative for feature extraction from similar univariate electrochemical signals, given their ability to capture local patterns and spatial hierarchies in data79,80.

Ground truth for the central tendencies of passivity and pitting descriptors

As explained above, a hybrid rule-based/ML supervised approach was used to estimate passivity/pitting descriptors from the populations of log(j) vs. E curves from the 5 different sets (Fig. 11a and b). Next, the bivariate distributions of the Epass/log(jpass) (or Epit/log(jpit)) estimates obtained for each set were modelled with Gaussian KDE (kde.gaussian_kde, scipy.stats module) and qualitatively validated (Fig. 11c). The maximum KDE values of the distributions were considered the ground truth of the central tendency values of Epass/log(jpass) and Epit/log(jpit) (Fig. 11d).
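The idea of taking the KDE maximum as the central tendency can be sketched with a small NumPy stand-in for a bivariate Gaussian KDE. This is an illustration rather than the gaussian_kde implementation itself: it assumes a Scott's-rule bandwidth, locates the maximum by a simple grid search (the grid size is arbitrary), and uses synthetic descriptor pairs instead of our estimates.

```python
import numpy as np

def gaussian_kde_2d(samples, points):
    """Evaluate a bivariate Gaussian KDE (Scott's rule bandwidth) at `points`."""
    n, d = samples.shape                      # d is expected to be 2
    factor = n ** (-1.0 / (d + 4))            # Scott's rule
    cov = np.cov(samples, rowvar=False) * factor ** 2
    inv_cov = np.linalg.inv(cov)
    norm = n * 2.0 * np.pi * np.sqrt(np.linalg.det(cov))
    diff = points[:, None, :] - samples[None, :, :]          # shape (m, n, 2)
    mahal = np.einsum("mni,ij,mnj->mn", diff, inv_cov, diff)
    return np.exp(-0.5 * mahal).sum(axis=1) / norm

def kde_mode(samples, grid_size=60):
    """Return the grid point (E, log j) where the KDE is highest."""
    e_axis = np.linspace(samples[:, 0].min(), samples[:, 0].max(), grid_size)
    j_axis = np.linspace(samples[:, 1].min(), samples[:, 1].max(), grid_size)
    ee, jj = np.meshgrid(e_axis, j_axis, indexing="ij")
    grid = np.column_stack([ee.ravel(), jj.ravel()])
    return grid[np.argmax(gaussian_kde_2d(samples, grid))]

# Synthetic Epit/log(jpit) pairs centred on (1.0 V, 0.5).
rng = np.random.default_rng(0)
samples = np.column_stack([rng.normal(1.0, 0.03, 200), rng.normal(0.5, 0.2, 200)])

mode_e, mode_log_j = kde_mode(samples)
print(f"KDE maximum at E = {mode_e:.3f} V, log(j) = {mode_log_j:.3f}")
```

Unlike the mean, the KDE maximum is insensitive to isolated outliers such as the values assigned to 0.5 V by default.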

Fig. 11: Schematic of the two modelling strategies for estimating the central tendencies of the pitting descriptors.

a Population of experimental log(j) vs. E curves (here exemplified with the 0.005 M NaCl (100 mV s-1) dataset). b Descriptor distributions obtained by the hybrid rule-based/machine-learning algorithm. c Density estimation of the descriptor distributions using KDE. d Definition of the central tendencies of the descriptors as the maximum KDE values (ground truth). e Proxy model (mean or median-based) for representing an entire set of polarisation curves. f Simplified approach for estimating the central tendency values of Epass, log(jpass), Epit, log(jpit): estimation of the pitting descriptors directly from the conditional median curve (e). For the sake of comparison, the same ML algorithm was employed in both modelling approaches (between steps a and b and between steps e and f).

Epass and Epit distributions: normality tests

We applied three normality tests to assess whether the E descriptors followed a normal distribution: Shapiro-Wilk, D’Agostino and Pearson’s test, and Anderson-Darling. We used the shapiro, normaltest and anderson functions from the scipy.stats package to perform these tests. The null hypothesis for all tests was that “the E descriptors followed a normal distribution”. The decision whether or not to reject the null hypothesis was based on comparing the obtained p-values (or the test statistic values for Anderson-Darling) with the significance level (alpha = 0.05 by default).
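The two decision rules differ in form: for Shapiro-Wilk and D’Agostino-Pearson, normality is rejected when the p-value falls below alpha, whereas for Anderson-Darling it is rejected when the statistic exceeds a critical value. The standard-library sketch below illustrates the latter. It is not the scipy implementation; the 5% critical value formula is an assumption believed to mirror the one used by scipy.stats.anderson for the normal case, and the two datasets are deterministic synthetic examples.

```python
import math
from statistics import NormalDist, mean, stdev

def anderson_darling_normal(values):
    """Anderson-Darling A² statistic against a normal with fitted mean and std."""
    x = sorted(values)
    n = len(x)
    mu, sigma = mean(x), stdev(x)            # sample standard deviation
    std_normal = NormalDist()
    cdf = [std_normal.cdf((v - mu) / sigma) for v in x]
    s = sum((2 * i + 1) * (math.log(cdf[i]) + math.log(1.0 - cdf[n - 1 - i]))
            for i in range(n))
    return -n - s / n

def reject_normality(values):
    """Reject the null hypothesis at the 5% level if A² exceeds the critical value."""
    n = len(values)
    # Assumed 5% critical value with small-sample correction (see lead-in).
    critical = 0.787 / (1.0 + 4.0 / n - 25.0 / n ** 2)
    return anderson_darling_normal(values) > critical

n = 200
# Deterministic examples: exact normal quantiles and exact exponential quantiles.
normal_like = [NormalDist(1.0, 0.03).inv_cdf((i + 0.5) / n) for i in range(n)]
skewed = [-math.log(1.0 - (i + 0.5) / n) for i in range(n)]

print(reject_normality(normal_like), reject_normality(skewed))
```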

Proxy models for estimating the central tendency of pitting descriptors

Two statistical estimation strategies were tested to verify whether the central tendency values of the Epass/log(jpass) and Epit/log(jpit) distributions could be estimated in a reduced manner. These simplified approaches were either mean-based or median-based (illustrated with the median in Fig. 11e). Our research question was whether the conditional mean (or conditional median) of log(j), as a function of E, could be used as a proxy model for accurate estimation of the central tendencies of the pitting descriptors. To test such hypotheses, the conditional mean and conditional median of log(j) were used as input features in the ANN (Fig. 10) to obtain Epass (or Epit) outputs (Fig. 11f). The biases of the trained ANN were transmitted to the median/mean-based estimations, thus establishing a fair basis of comparison with the ground truth data (also derived from the ANN estimates from individual curves).

Quantiles are data values that divide a dataset into adjacent intervals containing the same number of data samples81. Quantiles display variation in population samples without making assumptions about the underlying distribution. They are useful to gain insight into the distribution of a random variable compared to its mean value. The conditional quantiles 0.35, 25, 50, 75 and 99.65% (referred to as Qmin, Q1, median, Q3 and Qmax) were used for representing the log(j) distributions as well as for estimation purposes (median).
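The conditional statistics described above amount to computing, at each potential step, the mean and the stated quantiles across all curves of a set. A minimal NumPy sketch follows; the array layout (one curve per row, common potential grid), the function name and the random data are assumptions of this example.

```python
import numpy as np

QUANTILES_PERCENT = [0.35, 25, 50, 75, 99.65]   # Qmin, Q1, median, Q3, Qmax

def conditional_summary(log_j_curves):
    """Per-potential mean and quantiles of a population of log(j) curves."""
    curves = np.asarray(log_j_curves, dtype=float)
    q_min, q1, median, q3, q_max = np.percentile(curves, QUANTILES_PERCENT, axis=0)
    return {"mean": curves.mean(axis=0), "Qmin": q_min, "Q1": q1,
            "median": median, "Q3": q3, "Qmax": q_max}

# Hypothetical population: 150 curves sampled at 50 potential steps.
rng = np.random.default_rng(0)
curves = rng.normal(loc=-1.0, scale=0.3, size=(150, 50))

summary = conditional_summary(curves)
print(summary["median"].shape)
```

The resulting conditional median (or mean) is itself a single log(j) vs. E curve, so it can be sliced, subsampled and passed to the trained ANN exactly like an individual measurement.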

The model assessment of the mean and median-based approaches was based on separate residual analysis for the central tendencies of Epass, log(jpass), Epit and log(jpit). The actual central tendencies of the descriptors corresponded to their maximum KDE values. As presented in Eq. 3, residuals are calculated by subtracting the estimated ŷi from the actual yi value for the different descriptors y (Epass, log(jpass), Epit, log(jpit)) and sets i.

$${residuals}={actual}\,y\left({y}_{i}\right)-{estimated}\,y\left({\hat{y}}_{i}\right)$$
(3)