Introduction

Petrochemical plants and refineries, among others, have been fighting corrosion under insulation (CUI) for decades. Despite these efforts, CUI recurs and remains a major challenge to safe operation in these industries. Approximately 40–60% of pipe maintenance costs are related to CUI detection and damage repairs1,2.

CUI damage is not visible from the outside and cannot be detected easily3. Some non-destructive examination (NDE) techniques can detect CUI without removing the insulation and jacketing or cladding, or with only partial removal4. However, a complex industrial site always contains areas that are impossible to access, and in those areas full removal of the insulation is the most effective way to assess the severity of CUI5. This process is expensive and time-consuming, as it involves, for example, inspecting extensive networks of insulated pipes and components, long process downtime and high labour costs6,7.

Risk-based methodologies are commonly used to decide where and when to carry out inspection campaigns8. However, the hit rate in identifying truly active CUI sites could be improved through in situ CUI monitoring. Early CUI detection is expected to help locate affected areas for inspection, thereby reducing overall maintenance costs9.

Several on-line monitoring techniques have been investigated in CUI environments. Ayello et al.9 proposed the use of a radio frequency identification (RFID) tag connected to a Cu–Mg galvanic couple for CUI detection. When the galvanic couple becomes wet, the RFID tag is activated, indicating the presence of moisture in the insulation system. Similarly, He10 developed a CUI sensor based on a passive RFID tag, with a thin steel sheet attached to the surface of the RFID transponder as an RF shielding layer. As the steel sheet corroded, its shielding effectiveness decreased and the strength of the RF signal received by the RFID receiver increased. A correlation between signal strength and corrosion rate was established for CUI monitoring. However, RFID tags that can withstand high temperatures may be required under hot service conditions5. Moreover, metallic jacketing may interfere with the RFID signals, and this technique is not sensitive to localised corrosion10. Cho et al.11 used an optical fibre humidity sensor and the acoustic emission (AE) technique for CUI detection. Although this approach can provide an early warning of water intrusion and active CUI, it cannot estimate the intensity or form of corrosion. In addition, partial removal of the insulation is required for the AE measurements, which makes the approach unsuitable for continuous CUI monitoring.

It is widely accepted that the root cause of CUI is the presence of water or moisture at the interface between the external surface of the metal component and the thermal insulation, which creates electrochemical corrosion cells12,13,14. Conventional corrosion monitoring methods, such as electrochemical impedance spectroscopy (EIS) and electrical resistance (ER) probes, can estimate general corrosion rates; however, they are inadequate for detecting localised corrosion15,16,17, which is highly likely to be associated with CUI18,19. Moreover, valid electrochemical results are difficult to obtain in the low-moisture, high-resistance environments commonly present in CUI situations20. Electrochemical noise (EN), on the other hand, has been reported as a promising tool for detecting localised corrosion21, including under low-moisture conditions such as atmospheric corrosion22 and soil corrosion23. Recently, Yang et al. introduced a corrosion monitoring approach based on recurrence quantification analysis (RQA) of EN and machine learning (ML) methods, capable of differentiating between pitting and uniform corrosion24,25. This approach was also successfully applied to monitor the corrosion mechanisms of carbon steel buried in various ore cargoes with low moisture contents26. However, research on EN-based approaches for studying CUI rates and mechanisms remains limited.

The objective of this work was, therefore, to evaluate the use of EN for real-time CUI monitoring. Herein, electrochemical current noise (ECN) was measured with a custom-built sensor comprising two identical half-ring steel samples. The corrosion rates and mechanisms of the top and bottom sections of a pipe under mineral wool thermal insulation were investigated using the methodology developed by Yang et al.26. The standard deviation (SD) of the measured ECN data was correlated with the wet/dry condition and the severity of corrosion under different experimental variables. Additionally, a random forest (RF) model based on the recurrence quantification feature variables extracted from the ECN signals was established for detecting localised CUI. The potential of EN and RF methods for CUI monitoring is discussed.

Results and discussions

Corrosion rate and maximum pit depths

Figure 1 shows the surface morphologies of half-ring samples after experiments with and without the addition of a volatile corrosion inhibitor (VCI). The corrosion products were removed according to the procedures described in the ASTM G1 standard. As can be seen, in both situations, the samples placed at the 6 o’clock (i.e., bottom) position were uniformly corroded, while localised corrosion occurred on the top samples (i.e., 12 o’clock position).

Fig. 1: Surface morphologies of carbon steel half-ring samples after corrosion product removal.

a Sample at the top section of the test rig without VCI. b Sample at the top with VCI. c Sample at the bottom without VCI. d Sample at the bottom with VCI.

The corrosion rates based on weight loss and maximum pit depths were measured, and the results are presented in Fig. 2a, b, respectively. It can be seen that, in the absence of VCI, the top samples (i.e., 12 o’clock position) experienced a high uniform corrosion rate of 1.88 mm year−1 while that of the bottom samples (i.e., 6 o’clock position) was 0.2 mm year−1. The addition of the VCI reduced the corrosion rate to 0.53 mm year−1 for top samples. However, the corrosion rate of the bottom samples was 0.16 mm year−1, i.e., similar to that without VCI. Regarding the extent of localised corrosion, a maximum pit depth of 844 µm was found on the top samples in the absence of VCI. In contrast, the addition of VCI reduced the maximum pit depth of the top samples to 148 µm.

Fig. 2: Effects of VCI on the corrosion of carbon steel under mineral wool insulation.

a Uniform corrosion rates of carbon steel half-ring specimens with and without VCI. Bars represent the average value of triplicate samples, and the error bars indicate the corresponding standard deviation. b Maximum pit depths of carbon steel half-ring specimens with and without VCI. The pit depths were associated with the deepest pits found among the triplicate specimens; thus, no error bars are shown.

Estimation of corrosion intensities under insulation

Figure 3 shows ECN segments associated with dry and wet insulation in the presence and absence of VCI, as indicated. During the dry cycles, the surface of the EN sensor remained dry and no active corrosion propagated, so the ECN signal resembled random noise with an extremely low amplitude of around 10−10 A cm−2. In comparison, the ECN signals associated with wet insulation were approximately three orders of magnitude larger than those under dry conditions, suggesting high corrosion intensities. Additionally, the transients in the ECN signals obtained with the top sensors suggested the occurrence of pitting corrosion27.

Fig. 3: ECN segments obtained with top and bottom sensors covered by dry insulation and wet insulation with and without the addition of VCI.

a ECN segment associated with dry insulation. b ECN segment related to the top section under wet insulation without VCI addition. c ECN segment related to the bottom section under wet insulation without VCI addition. d ECN segment related to the top section under wet insulation with VCI addition. e ECN segment related to the bottom section under wet insulation with VCI addition.

Figure 4 shows the SDs calculated from segmented ECN signals obtained during the experiments; the horizontal line at 10−10 A cm−2 is the dry-condition baseline derived from the ECN segment shown in Fig. 3a. SD values below the baseline indicate no corrosion, while values above it suggest active CUI propagation. The current SDs associated with both top and bottom specimens remained above the baseline throughout the 14-day tests, indicating that corrosion was continuously active on the steel samples under the mineral wool insulation. Moreover, the addition of VCI decreased the current SD values for both top and bottom sections, indicating lower corrosion intensities, in agreement with the corrosion rates shown in Fig. 2a.

Fig. 4: Variations of the current SDs with exposure time.

Triangles represent the top section and dots represent the bottom section. The area above the dry baseline is related to wet steel surface and active corrosion under insulation, while that below the baseline means the insulation system is dry and safe from corrosion attack. Lines added to aid trend visualisation.

A separate experiment, in which the drying process was accelerated, was conducted to determine whether the current SD can serve as an indicator of insulation dry-out. In this experiment, the mineral wool insulation was exposed directly to the ambient air without metallic jacketing. Two wet/dry cycles were carried out. In the first cycle, the insulation was soaked in artificial seawater overnight, and ECN was measured once the temperature of the test rig reached 80 °C. The second wetting cycle was initiated on day 3 by injecting 1000 mL of artificial seawater into the top section of the insulation using a syringe. Figure 5 shows the variations of current SDs obtained from this experiment.

Fig. 5: Current SDs obtained from two wet/dry cycles with mineral wool insulation without jacketing.

Horizontal line indicates the dry-condition baseline. Values above the baseline indicate wet conditions with active corrosion, while those at or below the baseline suggest dry conditions without corrosion activity. Lines added to aid trend visualisation.

In the first cycle, both top and bottom sections dried out in less than 2 days, because the absence of the metal jacket facilitated water evaporation. The second wetting cycle, started on day 3, caused a sharp increase in current SD for both sections, suggesting increased corrosion activity. Subsequently, the top section dried out within 2 days, while the bottom section remained wet for about 3 days before it was completely dry. For the bottom section, the current SD decreased gradually with time, indicating that corrosion activity diminished as the insulation dried. Although the current SD of the top section was approximately two orders of magnitude higher than that of the bottom section during the wet period, its shorter time of wetness reduced the effective exposure time and could therefore limit the overall corrosion attack. As a result, the average corrosion rates of the top and bottom sections over the full exposure period could be similar. Indeed, weight-loss measurements after 6 days confirmed this hypothesis: both the top and bottom sections had a corrosion rate of around 0.09 mm year−1.

Post-test analysis also revealed that the top ECN sensor mainly underwent uniform corrosion, although some shallow pits (2–14 μm deep) initiated during each wet period, as shown in Fig. 6a. In contrast, distinct isolated pits were found on the bottom ECN sensor, with a maximum pit depth of 37 µm, as shown in Fig. 6b.

Fig. 6: Surface morphologies of ECN sensors after corrosion product removal.

a Top ECN sensor showing superficial general corrosion. b Bottom ECN sensor showing distinct pitting corrosion. The sensors were retrieved from the mineral wool insulation system without jacketing.

Development of RF classification model

As discussed previously, mineral wool insulation without VCI resulted in localised corrosion of the top samples and uniform corrosion of the bottom samples. An RF model based on tree bagging was developed to distinguish between these two types of corrosion processes. First, 12 feature variables were extracted from the ECN signals using the RQA method26. Second, the resulting feature vectors were labelled as uniform or pitting to reflect the corrosion processes. Third, 70% of the labelled feature vectors were randomly selected to train an RF model containing 100 decision trees. Once the model was trained, its classification error was estimated using the out-of-bag (OOB) error28. Finally, the accuracy of the model was validated by comparing the predicted and actual corrosion types for the remaining 30% of the labelled feature vectors. After validation, the RF model can be used to automatically identify the corrosion mechanisms in other mineral-wool-insulated systems. Similar approaches have been applied to assess the forms of corrosion of carbon steel buried in iron ore and coal26. Details of the feature extraction, the concept of OOB error, and the training and validation procedures of the RF model are presented in the Methods section.

Figure 7 shows the OOB error as a function of the number of trees grown in the forest. Five independent training runs gave similar results, indicating that the RF algorithm was robust. The OOB error decreased as the number of trees increased from 1 to 10 and then stabilised at around 0.09. In other words, the RF model could discriminate between localised and uniform corrosion with a classification error below 10% when 20 or more trees were present in the forest.

Fig. 7: Out-of-bag classification error of the RF model.

The model can discriminate between uniform and localised corrosion with an error below 10% when the forest contains more than 20 independent decision trees.

Once the accuracy of the model had been confirmed, the test dataset was classified by the RF model. Figure 8 compares the original and predicted labels. Most of the test data associated with the bottom section were correctly identified as uniform corrosion, and most of those from the top section as localised corrosion. The overall prediction accuracy, calculated as the ratio of correctly classified test data to the total number of test data, was 91%.

Fig. 8: Comparison between the actual corrosion types and the predictions of the RF model for the test dataset associated with mineral wool insulation.

Overlapping circles and crosses indicate that the corrosion forms associated with the bottom and top sections were correctly identified as uniform and localised, respectively.

Application of the classification model

The trained RF model was applied to identify the dominant corrosion type (i.e., localised vs. uniform) of the carbon steel samples during the 14-day experiment with VCI and the 6-day test with two wet/dry cycles. The raw ECN data were first divided into consecutive segments of equal length, and RQA was then used to extract a feature vector from each segment. Depending on the duration of ECN sampling, the number of segments varied from 8 to 30 per day. All the feature vectors were then submitted to the RF model to predict the corrosion type. The predicted ‘% localised corrosion’ for each day was obtained from Eq. (1):

$${\mathrm{\% }}\,{\mathrm{localised}}\,{\mathrm{corrosion}} = \frac{{{\mathrm{Number}}\,{\mathrm{of}}\,{\mathrm{segments}}\,{\mathrm{predicted}}\,{\mathrm{as}}\,{\mathrm{localised}}\,{\mathrm{corrosion}}}}{{{\mathrm{Total}}\,{\mathrm{number}}\,{\mathrm{of}}\,{\mathrm{segments}}}} \times 100{\mathrm{\% }}.$$
(1)
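Eq. (1) is a simple ratio over one day's segments. The sketch below assumes the label convention of the Methods (1 = uniform, 2 = localised) and takes a hypothetical list of per-segment RF predictions; the function name is illustrative only.

```python
from typing import Sequence

UNIFORM, LOCALISED = 1, 2  # label convention from step (4) of the Methods


def percent_localised(predicted_labels: Sequence[int]) -> float:
    """Eq. (1): share (in %) of one day's ECN segments predicted as localised."""
    if len(predicted_labels) == 0:
        raise ValueError("no ECN segments recorded for this day")
    n_localised = sum(1 for label in predicted_labels if label == LOCALISED)
    return 100.0 * n_localised / len(predicted_labels)


# Hypothetical RF predictions for the 8 segments of one day (not measured data)
day_predictions = [2, 2, 1, 1, 2, 1, 1, 1]
print(f"{percent_localised(day_predictions):.1f} % localised corrosion")  # prints "37.5 % localised corrosion"
```

Applying the function day by day to the predictions for each sensor gives curves of the kind plotted in Figs. 9 and 10.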

Figure 9 shows the percentage of ECN data predicted as localised corrosion over 14 days under mineral wool insulation in the presence of VCI. According to the model, the bottom section suffered mainly uniform corrosion, with up to 10% of the readings identified as localised corrosion. In comparison, a larger fraction of the ECN data from the top samples was classified as localised corrosion during the first 10 days, suggesting a greater extent of localised corrosion than on the bottom samples. Since the overall fraction of localised corrosion was less than 50%, it is plausible that localised corrosion of the top section was less severe in the presence of VCI than in the VCI-free condition. These inferences agree with the surface morphologies and maximum pit depths shown in Figs. 1 and 2.

Fig. 9: Percentage of ECN data predicted by the RF model as localised corrosion for both top and bottom samples under mineral wool insulation with VCI addition.

Lines added to aid trend visualisation. This figure implies that the top section suffered from a greater extent of localised corrosion attack than the bottom section.

Figure 10 shows the prediction results for the validation test with mineral wool insulation without jacketing and without VCI. As indicated in Fig. 5, active corrosion occurred only on days 1 and 3 for the top section, and additionally on days 4 and 5 for the bottom section. ECN data measured on these days were selected for corrosion type identification. Most of the ECN segments related to the bottom section were predicted as localised corrosion, consistent with the surface morphology shown in Fig. 6b.

Fig. 10: Percentage of localised corrosion predicted by the RF model for top and bottom samples under mineral wool insulation without VCI addition and jacketing.

Lines added to aid trend visualisation. The dominant form of corrosion predicted for the bottom section is localised, in line with the surface morphology of the ECN sensor after corrosion product removal shown in Fig. 6b.

In contrast, only 35% of the ECN data obtained on day 3 were predicted as localised corrosion for the top section. Given that the top section had a relatively short time of wetness, pit propagation was limited by the cathodic reaction on the reduced wet surface area surrounding a given pit29,30. Therefore, the overall predictions can be viewed as consistent with the surface morphologies shown in Fig. 6a.

Limitations of the methodology

The use of ECN data to detect insulation dry-out and the application of the RF model to predict the forms of corrosion were shown to be promising tools for in situ CUI monitoring. Nonetheless, some limitations still exist, which require further investigation:

(1) Only qualitative information on corrosion intensity was obtained from the current SD. More experimental data are required to quantify the relationship between SD and corrosion rate.

(2) The RF model presented in this work was trained with experimental data from the mineral wool insulation case and may not apply to other types of insulation material. Future investigations will therefore focus on expanding the training database to include a variety of insulation materials and on exploring other methods of extracting feature variables from ECN data. Additionally, EN data collected over longer exposure times could be used to predict the intensity of localised corrosion.

Methods

Experimental

Half-ring samples were cut from seamless UNS G10220 carbon steel pipe. The half-ring samples had an outer diameter of 60 mm and a width of 10 mm. The chemical composition of the steel was (wt.%): 0.22 C, 0.94 Mn, 0.24 Si, 0.05 Cr, 0.01 P, 0.08 Cu, 0.03 Mo, 0.02 Ni, 0.01 Al and Fe (balance). Before each experiment, the steel samples were electro-coated with POWERCRON® 6000CX, and the outer surface was then wet ground with SiC paper up to 600 grit. The samples were subsequently rinsed with deionised water and ethanol and dried with compressed nitrogen gas. The ECN sensor consisted of two half-ring samples cut from the same steel pipe, with a conducting wire soldered onto each sample.

A mineral wool insulation material was used in this study. Artificial seawater prepared according to ASTM D1141 was used as the test solution. A commercially available VCI was used in the insulated system with mineral wool to evaluate its effectiveness in terms of CUI mitigation.

The test assembly used in this work was adapted from ASTM G189. Figure 11a–c shows a 3D rendering of the fully assembled setup, the specimens and EN sensors beneath the insulation and jacketing, and a cross-sectional view of the assembly, respectively. Figure 11d presents the actual assembly, with the right-hand side showing the location of the specimens. This design allows two tests under different conditions to be run simultaneously and the corrosion behaviour of the top and bottom sections to be investigated separately.

Fig. 11: Different views of the CUI test assembly.

a 3D drawing—full setup. b 3D drawing—insulation and jacket removed. c 3D drawing—cross-section. d Photo of the setup with the right-hand side showing the arrangement of specimens and EN sensors.

First, three half-ring samples and one ECN sensor were placed at each of the top and bottom positions of the test rig. The insulation was then placed around the samples and jacketed with a UNS S31603 sheet. Two 6 mm diameter holes, spaced 50 mm apart, were drilled at the bottom of the jacketing. The insulation and jacketing were held in place using a stainless steel ring clamp. Both ends of the jacketed insulation were closed with polycarbonate caps, and the lap joints were sealed with silicone sealant. The surface temperature of the test rig was controlled using an immersion heater. Artificial seawater was slowly injected into the insulation once the surface temperature of the test rig reached 80 °C. The total volume of injected seawater was 2.5 L, corresponding to about five times the mass of the dry insulation used. Table 1 shows the test conditions applied in this study.

Table 1: Experimental conditions applied in the study.

For Test no. 2, once the surface temperature of the test rig had stabilised at 80 °C, 2 mL of a commercially available VCI used to treat CUI was injected into the dry insulation before the wetting step, equivalent to 660 mL per cubic metre (m3) of insulation. Test no. 3, in which the insulation was not jacketed, was performed to evaluate the sensitivity of the ECN technique for CUI monitoring. The insulation was soaked in artificial seawater overnight before testing and re-wetted with 1 L of seawater after 2 days. The test was stopped on day 6, when the insulation was completely dry.
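As a simple consistency check, the dosing figures above imply the volume of insulation treated with VCI. The arithmetic below only rearranges the two numbers given in the text, so the result is an inference rather than a reported measurement:

$$V_{\mathrm{ins}} = \frac{2\,\mathrm{mL}}{660\,\mathrm{mL\,m^{-3}}} \approx 3.0 \times 10^{-3}\,\mathrm{m^{3}} \approx 3.0\,\mathrm{L}.$$

Likewise, taking the density of artificial seawater as roughly 1 kg L−1 (an approximation), the 2.5 L injected in the wetting step is about 2.5 kg of water, so the stated fivefold ratio implies a dry insulation mass of roughly

$$m_{\mathrm{ins}} \approx \frac{2.5\,\mathrm{kg}}{5} = 0.5\,\mathrm{kg}.$$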

ECN was measured using the ESA410 data acquisition software with a Gamry Reference 600 potentiostat in ZRA mode. The sampling rate was 2 Hz. ECN was recorded for at least 2 h daily for each test, with a built-in low-pass filter applied to avoid aliasing. MATLAB R2019a with the built-in Statistics and Machine Learning Toolbox was used to process the ECN data.

At the end of the exposure, samples were retrieved and cleaned as per ASTM G1 (ref. 31) to obtain corrosion rate values based on mass loss. The extent of localised corrosion was examined based on the maximum pit depths, which were measured using an infinite focus microscope (Alicona Instruments).
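The mass-loss corrosion rates in Fig. 2a are presumably computed with the standard ASTM G1 relation CR = K·W/(A·T·D), where K = 8.76 × 10^4 converts a mass loss W in g, an area A in cm², a time T in h and a density D in g cm−3 into mm year−1. The sketch below encodes only that relation. The default density of 7.86 g cm−3 (a typical carbon-steel value), the assumption that only the ground outer surface of a half ring is exposed, and the 0.1 g mass loss in the example are all illustrative assumptions, not values reported here.

```python
import math

K_MM_PER_YEAR = 8.76e4  # ASTM G1 constant for W [g], A [cm^2], T [h], D [g/cm^3] -> mm/year


def corrosion_rate_mm_per_year(mass_loss_g: float, area_cm2: float,
                               exposure_h: float, density_g_cm3: float = 7.86) -> float:
    """Uniform corrosion rate from mass loss: CR = K * W / (A * T * D).

    The default density (7.86 g/cm^3) is a typical carbon-steel value (assumption).
    """
    if area_cm2 <= 0 or exposure_h <= 0 or density_g_cm3 <= 0:
        raise ValueError("area, exposure time and density must be positive")
    return K_MM_PER_YEAR * mass_loss_g / (area_cm2 * exposure_h * density_g_cm3)


# Hypothetical example: outer surface of one half ring (60 mm OD, 10 mm wide), assumed to be
# the only uncoated surface, over a 14-day test with an invented mass loss of 0.1 g.
outer_area_cm2 = math.pi * 6.0 * 1.0 / 2   # half the circumference (cm) times the width (cm)
rate = corrosion_rate_mm_per_year(0.1, outer_area_cm2, 14 * 24)
print(f"{rate:.2f} mm/year")
```

The reported values are averages over triplicate specimens, so in practice the function would be applied to each specimen and the results averaged.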

Methodology for ECN data processing

The proposed method for CUI monitoring has two main objectives: (1) estimating corrosion intensities and (2) identifying the prevalent form of corrosion when corrosion is active. Figure 12 shows the framework of the proposed method, which involves two stages: an off-line stage, in which the baseline for corrosion intensity and dominant corrosion types is established, and an on-line stage, in which newly measured ECN data are assessed against the baseline to predict the corrosion activity of the insulated system.

The SD of the measured ECN is used to indicate corrosion intensity. Before the SD is extracted, the direct current (DC) drift must be removed from the raw data to make the signal quasi-stationary27. Different drift removal methods have been shown to yield very different SD values32. However, as a statistical parameter, the SD is the square root of the variance, which equals the total power of the drift-free signal across all the frequencies it contains27,32,33. For corrosion monitoring, changes in SD values may therefore provide information beyond a simple comparison of absolute values, provided that the same trend removal method is used in all cases32. A simple linear detrending method was therefore employed instead of more complicated methods such as fifth-order polynomial fitting or wavelet analysis. The current SD obtained under dry conditions served as the baseline, and corrosion was considered active only when the SD of the measured ECN exceeded it. RQA was used to extract feature variables from the ECN signals, and the extracted feature vectors were used as inputs to train an RF classifier. After training, the RF model was used to predict the dominant corrosion type for newly measured ECN data.

Fig. 12: Flowchart of the proposed method for CUI monitoring.

The off-line stage involves measuring the baseline and collecting ECN data from both uniform and localised corrosion systems to train the model to distinguish the two forms of corrosion. In on-line monitoring, newly measured ECN data are first compared with the baseline to determine the wet/dry condition of the insulation system. If a wet condition is indicated, the ECN data are further examined using the model developed in the off-line stage to predict the dominant form of corrosion.

Figure 13 illustrates the procedure used to generate the representative current SD for each day. For instance, if 7200 s of ECN data were recorded on day 1, the data would be divided into n = 4 segments of 1800 s each. For each segment, the DC drift was removed by subtracting the linear trend from the raw data32, and the SD was then calculated. The average of the four SD values was reported as the current SD for that day.
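The daily SD procedure and the wet/dry test against the dry baseline can be sketched with NumPy as follows. The segment length (1800 s), sampling rate (2 Hz), linear detrending and the 10−10 A cm−2 baseline come from the text. The use of the sample SD (N − 1 denominator, which is also MATLAB's default for `std`), discarding any incomplete final segment, the strict "greater than" comparison and all function names are assumptions of this sketch, not the authors' code.

```python
import numpy as np

FS_HZ = 2.0               # sampling rate (Methods)
SEGMENT_S = 1800.0        # segment length used for the daily SD (Fig. 13)
DRY_BASELINE = 1e-10      # dry-condition baseline, A cm^-2 (Fig. 4)


def segment_sds(current, fs_hz: float = FS_HZ, segment_s: float = SEGMENT_S) -> list:
    """SD of each consecutive segment after subtracting its least-squares linear trend."""
    n_per_seg = int(round(fs_hz * segment_s))
    n_segments = len(current) // n_per_seg        # an incomplete tail is discarded (assumption)
    if n_segments == 0:
        raise ValueError("less than one full segment recorded")
    t = np.arange(n_per_seg) / fs_hz
    sds = []
    for k in range(n_segments):
        seg = np.asarray(current[k * n_per_seg:(k + 1) * n_per_seg], dtype=float)
        slope, intercept = np.polyfit(t, seg, 1)   # linear DC drift of this segment
        residual = seg - (slope * t + intercept)
        sds.append(float(residual.std(ddof=1)))    # sample SD (assumption, as MATLAB's std)
    return sds


def daily_current_sd(current) -> float:
    """Representative SD for one day: mean of the segment SDs."""
    return float(np.mean(segment_sds(current)))


def insulation_is_wet(daily_sd: float, baseline: float = DRY_BASELINE) -> bool:
    """Corrosion is treated as active only when the SD exceeds the dry baseline."""
    return daily_sd > baseline


# Synthetic 7200 s record (NOT measured data): 1e-7 A cm^-2 white noise on a linear drift
rng = np.random.default_rng(0)
t_all = np.arange(int(7200 * FS_HZ)) / FS_HZ
record = 1e-7 * rng.standard_normal(t_all.size) + 2e-11 * t_all
sd_day = daily_current_sd(record)
print(len(segment_sds(record)), f"{sd_day:.2e}", insulation_is_wet(sd_day))
```

Because each segment is detrended separately, a slow drift spanning the whole day does not inflate the SD, which is the purpose of the drift-removal step described above.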

Fig. 13: Procedures to generate the current standard deviation (SD) for each day.

The ECN data collected on each day are divided into consecutive 1800 s segments. DC drift is removed before the standard deviation is calculated.

Recurrence quantification analysis

RQA refers to the characterisation of small-scale structures in a recurrence plot (RP). The RP is a graphical tool, developed in the late 20th century, for visualising recurrence behaviour in dynamical systems34. RQA-based methods have been successfully applied in numerous fields, such as the detection of dynamical transitions35, ecological regimes36,37, economic dynamics38, medical signal analysis39,40, chemical reactions41,42 and damage detection43. In recent years, RQA has also been employed to analyse electrochemical noise data24,44,45,46.

An RP can be expressed mathematically by Eq. (2). In this study, xi and xj represent the measured current at times i and j, ||·|| denotes the Euclidean distance between two data points, N is the total number of data points in the current noise signal, ε is a predefined threshold, and H is the Heaviside function, which equals one when its argument is non-negative and zero otherwise34. The resulting R is therefore a matrix of zeros and ones, with Rij = 1 whenever xi and xj lie within ε of each other. Several options for selecting ε have been proposed in the literature47,48,49. In this study, the threshold was fixed at 50% of the SD of the Euclidean distances for all pairs of data points in the measured ECN signal.

$$R_{i,j} = {\mathbf{H}}\left( {\varepsilon - \left\| {x_i - x_j} \right\|} \right),i,j = 1,2, \ldots ,N.$$
(2)
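Eq. (2), together with the threshold rule above, can be sketched in NumPy for a scalar ECN segment. As described in the text, distances are taken directly between data points (no phase-space embedding). Whether "all pairs" includes the self-pairs i = j is not stated, so this sketch assumes the distinct pairs i < j and uses the population SD; the function name is hypothetical.

```python
import numpy as np


def recurrence_matrix(x, eps_fraction: float = 0.5):
    """Binary recurrence matrix of Eq. (2) for a scalar ECN segment (no embedding).

    The threshold is eps_fraction times the SD of the pairwise distances, taken here
    over the distinct pairs i < j (whether self-pairs are included is an assumption).
    """
    x = np.asarray(x, dtype=float)
    dist = np.abs(x[:, None] - x[None, :])   # Euclidean distance between scalar samples
    iu = np.triu_indices(x.size, k=1)        # indices of the distinct pairs i < j
    eps = eps_fraction * dist[iu].std()      # population SD of the pairwise distances
    R = (eps - dist >= 0).astype(np.uint8)   # Heaviside: 1 where the argument is >= 0
    return R, eps


R, eps = recurrence_matrix([0.0, 0.0, 1.0, 1.0])
print(R)
```

The distance matrix has N × N entries, so a full 1800 s segment at 2 Hz (3600 points) already needs on the order of 100 MB in double precision; shorter segments or chunked computation keep memory manageable.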

If a black dot is assigned to Rij = 1 and a white dot to Rij = 0, the matrix R can be displayed as a graph, the RP. Figure 14 presents an example of an ECN segment (a) and the associated RP (b). The black dots form small-scale structures that can be quantified by a number of variables. In this work, 12 feature variables were extracted and collectively used to quantify each RP. Because an RP can be viewed as a graphical transformation of the original ECN data, these feature variables can be considered characteristics of the ECN signal. The publicly available MATLAB toolbox ‘Cross Recurrence Plot (CRP)’, developed by Norbert Marwan34, was used to generate the RPs and quantification variables. Details of what the variables represent and how they are calculated can be found in the authors' previous work24,25 and in Marwan's original work34.
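The full set of 12 variables is computed by the CRP toolbox and is not listed here. As an illustration only, the sketch below computes two widely used RQA measures, the recurrence rate (RR, the fraction of black dots) and determinism (DET, the fraction of black dots lying on diagonal lines of at least `l_min` points). This sketch scans every diagonal, including the main one; many RQA tools exclude that line of identity, so values need not match the toolbox output. The function names are hypothetical.

```python
import numpy as np


def recurrence_rate(R) -> float:
    """RR: fraction of recurrence points (black dots) in the RP."""
    R = np.asarray(R, dtype=bool)
    return float(R.sum() / R.size)


def determinism(R, l_min: int = 2) -> float:
    """DET: fraction of recurrence points lying on diagonal lines of length >= l_min.

    All diagonals, including the main one, are scanned (an assumption of this sketch).
    """
    R = np.asarray(R, dtype=bool)
    total = int(R.sum())
    if total == 0:
        return 0.0
    n = R.shape[0]
    on_lines = 0
    for offset in range(-(n - 1), n):
        run = 0
        # a trailing False acts as a sentinel that closes a run at the end of the diagonal
        for value in list(np.diagonal(R, offset=offset)) + [False]:
            if value:
                run += 1
            else:
                if run >= l_min:
                    on_lines += run
                run = 0
    return on_lines / total


R = np.array([[1, 1, 0, 0],
              [1, 1, 0, 0],
              [0, 0, 1, 1],
              [0, 0, 1, 1]])
print(recurrence_rate(R), determinism(R))  # prints "0.5 0.5"
```

Stacking such measures for each segment yields one feature vector per ECN segment, the form of input the RF classifier expects.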

Fig. 14: An example of the ECN signal obtained in this work and associated recurrence plot.

a ECN signal. b Recurrence plot generated from the ECN signal.

Random forest

RF is a supervised ML algorithm that can be used for both classification and regression problems28. An RF consists of multiple decision trees. A decision tree learning algorithm recursively searches for binary partitions of the input feature variables that yield subsets with homogeneous class labels. However, a single decision tree is susceptible to overfitting: it may classify the training data with 100% accuracy yet fail to predict the class labels of test data correctly.

RF mitigates the overfitting problem of single decision trees28,50. An RF model is built by bootstrap aggregating (bagging) of the training data: each decision tree is trained on a bootstrap sample, i.e., a random sample of the training data drawn with replacement. In addition, at each node of a tree, only a randomly selected subset of the feature variables (predictors) is considered for splitting. Because of sampling with replacement, some observations (on average about 37% for large training sets) are not drawn for a given tree and are therefore excluded from its training; these are the OOB data for that tree. The OOB data are used to estimate the classification error of the RF model. Specifically, each observation is classified by majority vote of the trees for which it is OOB, and the fraction of misclassified observations, namely the OOB error, is used as the classification error of the RF model50,51,52.
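The bootstrap and OOB bookkeeping described above can be sketched in NumPy. The first function draws one bootstrap sample and returns the indices left out of it; the second computes an OOB error from per-tree predictions and OOB masks, taking a majority vote over only the trees for which each observation was out of bag. Tie-breaking towards the smaller label, skipping never-OOB observations and the function names are assumptions of this sketch.

```python
import numpy as np


def bootstrap_sample(n_samples: int, rng: np.random.Generator):
    """Draw one bootstrap sample (with replacement); return in-bag and OOB indices."""
    in_bag = rng.integers(0, n_samples, size=n_samples)
    oob = np.setdiff1d(np.arange(n_samples), in_bag)
    return in_bag, oob


def oob_error(tree_predictions: np.ndarray, oob_masks: np.ndarray, y: np.ndarray) -> float:
    """OOB error: each sample is classified by majority vote of the trees for which
    it was out of bag; samples that were never OOB are skipped.

    tree_predictions, oob_masks: arrays of shape (n_trees, n_samples).
    """
    errors, counted = 0, 0
    for i in range(y.size):
        votes = tree_predictions[oob_masks[:, i], i]
        if votes.size == 0:
            continue
        labels, counts = np.unique(votes, return_counts=True)
        majority = labels[np.argmax(counts)]   # ties resolved towards the smaller label
        errors += int(majority != y[i])
        counted += 1
    return errors / counted if counted else float("nan")


rng = np.random.default_rng(42)
n_train = 1111   # training-set size reported in the Methods
oob_fractions = [bootstrap_sample(n_train, rng)[1].size / n_train for _ in range(100)]
print(f"mean OOB fraction over 100 bootstrap samples: {np.mean(oob_fractions):.3f}")
```

The mean OOB fraction comes out close to 1/e ≈ 0.368, which is where the "about 37%" figure comes from.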

Figure 15 shows the step-by-step procedure of data preparation for training an RF model based on ECN data. The training data were collected with the top and bottom ECN sensors in Test 1. Specific steps include:

(1) Divide the raw ECN data into short segments and remove the DC drift, as illustrated in Fig. 13. In total, the 14-day experiment yielded 727 and 860 segments for the bottom and top sections, respectively.

(2) Transform the digital data into RPs.

(3) Extract a feature vector containing the 12 variables that quantify each RP.

(4) Label all the feature vectors related to the bottom ECN segments as ‘1’, representing a uniform-corrosion-dominated process, and those related to the top segments as ‘2’, representing localised corrosion.

(5) Randomly select 70% of all the labelled feature vectors (1111 vectors) for training the RF model. The remaining 30% (476 vectors) serve as the test dataset and are not involved in training.

(6) Evaluate the classification accuracy of the trained RF model by comparing the predicted and original labels for the test dataset, using Eq. (3):

$${\mathrm{Accuracy}} = \frac{{{\mathrm{Sum}}\,({\mathrm{predicted}}\,{\mathrm{labels}} = {\mathrm{original}}\,{\mathrm{labels}})}}{{{\mathrm{Total}}\,{\mathrm{number}}\,{\mathrm{of}}\,{\mathrm{vectors}}\,{\mathrm{in}}\,{\mathrm{the}}\,{\mathrm{test}}\,{\mathrm{dataset}}}} \times 100\%.$$
(3)
Fig. 15: Steps for training a random forest model to identify different forms of CUI.

Briefly, the procedure comprises (1) ECN data segmentation and pre-treatment, (2) conversion of digital data into recurrence plots, (3) feature vector extraction, (4) labelling of feature vectors based on data source, (5) model training and (6) model validation.
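Steps (5) and (6) map onto a few lines of scikit-learn. This is an analogue for illustration, not the authors' MATLAB implementation. The feature vectors are synthetic Gaussian stand-ins, not the measured RQA features, so the printed errors say nothing about the real data. Only the class sizes (727 and 860), the 12 features, the 70/30 split, the 100 trees and Eq. (3) come from the text.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

N_FEATURES = 12              # RQA variables per feature vector
N_BOTTOM, N_TOP = 727, 860   # segment counts reported for Test 1
rng = np.random.default_rng(0)

# Synthetic stand-ins for the RQA feature vectors (NOT the measured data):
# label 1 = uniform (bottom), label 2 = localised (top), as in step (4).
X = np.vstack([rng.normal(0.0, 1.0, size=(N_BOTTOM, N_FEATURES)),
               rng.normal(1.5, 1.0, size=(N_TOP, N_FEATURES))])
y = np.concatenate([np.full(N_BOTTOM, 1), np.full(N_TOP, 2)])

# Step (5): 70% for training (1111 vectors); the remaining 476 are held out for testing
n_train = round(0.7 * y.size)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, train_size=n_train, random_state=0)

# 100 bagged trees; oob_score=True makes scikit-learn report the OOB accuracy
forest = RandomForestClassifier(n_estimators=100, oob_score=True, random_state=0)
forest.fit(X_train, y_train)
oob_error = 1.0 - forest.oob_score_

# Step (6) / Eq. (3): accuracy on the held-out test vectors
y_pred = forest.predict(X_test)
accuracy = 100.0 * np.mean(y_pred == y_test)
print(f"OOB error: {oob_error:.3f}, test accuracy: {accuracy:.1f} %")
```

The same fitted `forest` would then be applied to feature vectors from new segments, and its per-segment predictions could be summarised per day with Eq. (1).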

The trained model was then used to predict the dominant corrosion type in Tests 2 and 3. For these predictions, feature vectors were generated from each day's ECN data using the same methods as in the training process.