Introduction

Although pyrometallurgical refining can yield copper with low impurity levels, the product often fails to meet the stringent quality standards for high-purity copper. Consequently, most crude copper undergoes electrorefining to remove the impurities that resist pyrometallurgical refining, thereby improving the quality of electrolytic copper1. Under direct current, copper from the anode dissolves into the electrolyte and is preferentially deposited on the cathode, yielding what is termed electrolytic copper. The operating parameters of copper electrolysis have traditionally been set according to operator experience, which introduces considerable subjectivity and arbitrariness and leaves the process susceptible to various disturbances. As a result, the quality of electrolytic copper can be inconsistent, as evidenced by a low proportion of first-grade product and by defects such as uneven copper distribution, frequent fins, and granular protrusions2. It is therefore imperative to investigate and control the factors that influence the quality of electrolytic copper. Current research is merging quality prediction with control, shifting from conventional reactive methodologies to proactive quality prediction techniques. These proactive approaches allow early detection of potential production issues, facilitating timely remediation and minimizing quality degradation. Predictive control of electrolytic copper quality is therefore a pivotal concern in producing high-purity cathode copper.

There is growing interest in predictive control of electrolytic copper quality. For instance, Zhao et al.3 used atomic force microscopy and image scaling analysis to predict the influence of current density, temperature, and leveling agent on the morphology of electrolytically produced copper. The main difficulty, however, is that the production of electrolytic copper involves numerous intricate physical and chemical reactions. The interplay among these reactions makes the copper electrolysis process highly nonlinear and complex, so that traditional statistical methods struggle with quality prediction and control. As a result, limited literature addresses the quality issues of copper electrolysis. By contrast, publications on quality prediction in general have become increasingly prevalent in recent years, with publication counts rising overall: 9736 (2018), 10,945 (2019), 12,312 (2020), 12,182 (2021), and 12,334 (2022)4,5,6,7. Research methodologies for product quality have been broadly categorized into (1) conventional statistical process control theories, exemplified by classic control charts, and (2) contemporary intelligent prediction and control algorithms, notably epitomized by artificial neural networks (ANN). Control charts are effective in large-scale production, where the data means span wide ranges and the shifts are large, and they are simple to operate8,9. However, the sensitivity of control charts wanes when the mean or the shift of the production data is small. Intelligent algorithms are rooted in the principles or mechanisms of natural phenomena or entities and were predominantly employed in earlier studies on product prediction10. Early research leaned towards traditional linear regression and conventional neural network models11.
Currently, artificial neural network methodologies are attracting substantial interest worldwide, spurring diverse lines of research. Widely used intelligent algorithms include support vector machine (SVM)12, particle swarm optimization (PSO)13, random forest (RF)14, relevance vector machine (RVM)15, and other machine learning algorithms16,17,18,19. Such advances facilitate the effective integration of intelligent algorithms into engineering domains (e.g., chemical engineering or metallurgical engineering), opening innovative avenues for industrial research. This integration not only improves prediction accuracy but also broadens applicability.

Numerous studies focus on the modeling and design of industrial processes20,21. For instance, Zang et al.22 developed an Arrhenius model coupled with a radial basis function (RBF) neural network to forecast oxidative changes in whole egg powder. In a different approach, Ma et al.23 combined partial least squares regression analysis of water quality with morphological spatial pattern analysis data to assess holistically the effects of land-use variations and landscape patterns on basin water quality. Similarly, Wang et al.24 employed spatially adaptive machine learning models to predict water quality in Hong Kong. Artificial neural networks offer a promising way to probe industrial processes more deeply where traditional methodologies have not yielded satisfactory outcomes25,26,27,28. Collectively, these investigations underscore the efficacy of neural networks for predicting industrial parameters. However, neural networks also have inherent limitations, including the need for predefined network structures, susceptibility to local optima, and limited generalization capability29. Prevailing quality prediction techniques often emphasize model-centric approaches and thereby sideline direct influencers such as production equipment, operational environment, and workforce dynamics. To address this issue, Zhang et al.30 combined principal component analysis (PCA) with the support vector machine model to devise a quality prediction framework tailored to diverse, small-batch products. Similarly, Bai et al.31 used principal component analysis to extract low-dimensional data and then applied support vector machine to model desensitized data from China’s Tianchi Big Data Contest. PCA can reject components that are not relevant to parameter estimation32. However, the principal components derived by PCA might not yield optimal results when the data follow non-Gaussian distributions.
Moreover, as mentioned above, the intricate physical and chemical interplay within the copper electrolysis process often gives rise to pronounced nonlinearity. In this context, He et al.33 introduced a product quality model based on relevance vector machine, which maps the raw input into a feature-rich space via kernel functions and offers a promising framework for quality prediction and control of electrolytic copper.

As highlighted above, even when the significance of the various influencing factors is taken into account, it remains challenging to integrate diverse and disjointed factors effectively into a quality prediction model for electrolytic copper. Notably, there is a paucity of research on using known quantitative factors in the electrolytic copper production process to reduce data wastage and improve prediction precision. This work aims at a high-quality, energy-efficient production process for electrolytic copper, based on multi-factorial, small-batch industrial experiments. Specifically, the literature on copper quality prediction and control, covering both traditional and contemporary methodologies, is analyzed thoroughly, culminating in the formulation of a novel predictive model for copper quality. The innovation of this work lies in identifying, using the random forest algorithm, the five primary control factors that affect electrolytic copper quality. Moreover, a hybrid model integrating particle swarm optimization with least squares support vector machine (PSO-LSSVM) is introduced for predicting electrolytic copper quality from the nineteen associated factors. In parallel, a hybrid model combining random forest with relevance vector machine (RF-RVM) is built for quality prediction from these primary control factors. The interference of extraneous variables is then minimized by capturing the effect of the main control factors as realistically as possible, laying a foundation for understanding the mechanisms that influence electrolytic copper quality. In this way, the inherent inadequacies and suboptimal predictive capability of conventional linear regression and neural network models are addressed.
The newly introduced hybrid models bolster the dependability of predictions pertaining to electrolytic copper quality. Hence, the innovative strategy for in-depth exploration of industrial site data bears significant implications for the precise control of electrolytic copper quality when the specific attributes of electrolytic copper control objects and the exigencies of production are considered.

The remainder of this article is organized as follows. Section "Method and data" describes the methods and data, including the least squares support vector machine, the relevance vector machine, the evaluation indices, and the data description. Section "Result and discussion" presents the primary controlling factors and the associated prediction models. Conclusions are drawn in Section "Conclusions".

Method and data

LSSVM

Least squares support vector machine is a modified version of the conventional support vector machine34. It offers the benefits of straightforward computation, effortless operation, rapid learning, and convenient implementation. In terms of implementation, the linear regression function \(y(x)\) of least squares support vector machine is defined by35

$$y\left(x\right)=w\cdot \varphi \left(x\right)+b,$$
(1)

where \(w\) represents the weight vector, \(\varphi (x)\) stands for the mapping function, and \(b\) is the scalar bias (offset) term. Following the structural risk minimization principle, the optimization problem of LSSVM is formulated as35

$$\left\{\begin{array}{l}\underset{w, b,e}{\mathrm{min}}J(w,e)=\frac{1}{2}{w}^{\mathrm{T}}w+\frac{1}{2}\gamma \sum_{k=1}^{N}{e}_{k}^{2}\\ {y}_{k}={w}^{\text{T}}\varphi \left({x}_{k}\right)+b+{e}_{k}\end{array},\right.$$
(2)

where \(k\) ranges from 1 to \(N\), \(\gamma\) refers to the penalty coefficient, \({e}_{k}\) refers to the fitting error, and \(b\) refers to the bias (threshold) value. To solve this problem, the Lagrange function is formulated by introducing the Lagrange multipliers \({\alpha }_{k}\); because the constraints in Eq. (2) are equalities, the multipliers are not restricted in sign. Then,

$$L(w,b,e,\alpha )=J(w,e)-\sum_{k=1}^{N}{\alpha }_{k}\left[{w}^{\mathrm{T}}\varphi ({x}_{k})+b+{e}_{k}-{y}_{k}\right],$$
(3)

Setting the partial derivatives of \(L\) to zero yields35

$$\left\{\begin{array}{lll}\frac{\partial L}{\partial w}=0& \Rightarrow & w=\sum_{k=1}^{N}{\alpha }_{k}\varphi ({x}_{k})\\ \frac{\partial L}{\partial b}=0& \Rightarrow & \sum_{k=1}^{N}{\alpha }_{k}=0\\ \frac{\partial L}{\partial {e}_{k}}=0& \Rightarrow & {\alpha }_{k}=\gamma {e}_{k}\\ \frac{\partial L}{\partial {\alpha }_{k}}=0& \Rightarrow & {w}^{\mathrm{T}}\varphi ({x}_{k})+b+{e}_{k}-{y}_{k}=0\end{array},\right.$$
(4)

where \(k\) ranges from 1 to \(N\). The variables \(w\) and \({e}_{k}\) are then eliminated, and a kernel function is introduced as

$$K\left({x}_{m},{x}_{n}\right)={\varphi \left({x}_{m}\right)}^{\mathrm{T}}\varphi \left({x}_{n}\right),$$
(5)

where both \(m\) and \(n\) range from 1 to \(N\). This leads to the following matrix equation which is given by

$$\left[\begin{array}{cc}0& {1}^{\mathrm{T}}\\ 1& \Omega +{\gamma }^{-1}I\end{array}\right]\left[\begin{array}{c}b\\ \boldsymbol{\alpha }\end{array}\right]=\left[\begin{array}{c}0\\ y\end{array}\right],$$
(6)

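where \({1}^{\mathrm{T}}=[1,1,\cdots ,1]\) is a row vector of \(N\) ones, \(I\) is the \(N\times N\) identity matrix, \(\Omega\) is the kernel matrix with entries \({\Omega }_{mn}=K({x}_{m},{x}_{n})\), \(y={[{y}_{1},{y}_{2},\cdots ,{y}_{N}]}^{\text{T}}\), and \(\boldsymbol{\alpha }={[{\alpha }_{1},{\alpha }_{2},\cdots ,{\alpha }_{N}]}^{\text{T}}\). In this work, the radial basis function was chosen as the kernel function, which is given by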

$$K\left(x,{x}_{k}\right)=\mathrm{exp}\left[-\frac{{\Vert x-{x}_{k}\Vert }^{2}}{2{\sigma }^{2}}\right],$$
(7)

where \(\sigma\) refers to the width of kernel function. The LSSVM predictive model is subsequently derived by

$$y\left(x\right)=\sum_{k=1}^{N}{\alpha }_{k}K\left(x,{x}_{k}\right)+b.$$
(8)

The choice of parameters therefore strongly influences the complexity and accuracy of the LSSVM model. In particular, both the penalty coefficient \(\gamma\) and the kernel width \(\sigma\) must be selected carefully.
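The derivation in Eqs. (6) to (8) can be illustrated with a minimal NumPy sketch. It is not the implementation used in this work: the data are synthetic, the function names (`rbf_kernel`, `lssvm_fit`, `lssvm_predict`) are hypothetical, the kernel uses the squared Euclidean distance between input vectors, and the values of \(\gamma\) and \(\sigma\) are arbitrary illustrative choices.

```python
import numpy as np

def rbf_kernel(A, B, sigma):
    """Eq. (7) for every pair of rows: exp(-||a - b||^2 / (2 sigma^2))."""
    sq_dist = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-sq_dist / (2.0 * sigma ** 2))

def lssvm_fit(X, y, gamma, sigma):
    """Assemble and solve the linear system of Eq. (6) for b and alpha."""
    n = X.shape[0]
    omega = rbf_kernel(X, X, sigma)          # kernel matrix Omega
    A = np.zeros((n + 1, n + 1))
    A[0, 1:] = 1.0                           # top row: [0, 1^T]
    A[1:, 0] = 1.0                           # left column of ones
    A[1:, 1:] = omega + np.eye(n) / gamma    # Omega + gamma^{-1} I
    rhs = np.concatenate(([0.0], y))         # right-hand side [0, y]
    solution = np.linalg.solve(A, rhs)
    return solution[0], solution[1:]         # b, alpha

def lssvm_predict(X_train, b, alpha, sigma, X_new):
    """Eq. (8): y(x) = sum_k alpha_k K(x, x_k) + b."""
    return rbf_kernel(X_new, X_train, sigma) @ alpha + b

# Synthetic stand-in data (NOT the paper's data): 36 samples, 19 inputs in [0, 1].
rng = np.random.default_rng(0)
X = rng.random((36, 19))
y = np.sin(3.0 * X[:, 0]) + 0.5 * X[:, 1]

# Illustrative (assumed) hyperparameters; 27 training and 9 test samples.
b, alpha = lssvm_fit(X[:27], y[:27], gamma=5.0, sigma=1.0)
y_hat = lssvm_predict(X[:27], b, alpha, 1.0, X[27:])
print(y_hat.shape)
```

Because the first row of the system enforces \(\sum_{k}{\alpha }_{k}=0\), the sum of the fitted multipliers is a quick sanity check on the solution.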

RVM

Relevance vector machine is a relatively new approach that has not yet been widely used in metallurgical processes. Both relevance vector machine and support vector machine use kernel functions to convert a problem that is linearly inseparable in a lower-dimensional space into one that is linearly separable in a higher-dimensional space36,37. Relevance vector machine adopts a decision function of a form similar to that of support vector machine; the salient distinction is that its choice of kernel function is more flexible. The classification function is obtained by maximizing the likelihood function over the training set. For classification with relevance vector machine, the Laplace method can be employed for the required approximation. Both the weight posterior probability \(p(w|t,\alpha )\) and the marginal likelihood function \(p(t|\alpha )\) can be derived through integration. Consequently, the classification problem of relevance vector machine can be reframed as a regression problem.

Evaluation indices

Here, the prediction results are assessed using mean absolute error (MAE) and root mean square error (RMSE). Mean absolute error measures the average magnitude of the prediction discrepancies, while root mean square error quantifies the deviation between forecasted and actual values and weights large errors more heavily38. For the \(j\)-th quality component of electrolytic copper, the two indices are computed as38

$$\mathrm{MAE}\left(j\right)=\frac{1}{N}\sum_{k=1}^{N}\left|{y}_{j}\left(k\right)-{\widehat{y}}_{j}\left(k\right)\right|,$$
(9)

and

$$\mathrm{RMSE}\left(j\right)=\sqrt{\frac{1}{N}\sum_{k=1}^{N}{\left({y}_{j}\left(k\right)-{\widehat{y}}_{j}\left(k\right)\right)}^{2}},$$
(10)

where \({y}_{j}(k)\) denotes the actual value of the \(j\)-th quality component of electrolytic copper for the \(k\)-th experimental instance, and \({\widehat{y}}_{j}(k)\) signifies the predicted value of the same component for the \(k\)-th experimental instance.
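As a small worked illustration of Eqs. (9) and (10), the sketch below computes both indices for one quality component. The function names and the four sample values are made up for illustration and are not taken from the experimental data.

```python
import math

def mae(y_true, y_pred):
    """Eq. (9): mean of the absolute differences."""
    n = len(y_true)
    return sum(abs(a - p) for a, p in zip(y_true, y_pred)) / n

def rmse(y_true, y_pred):
    """Eq. (10): square root of the mean squared difference."""
    n = len(y_true)
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(y_true, y_pred)) / n)

# Hypothetical normalized values for one quality component.
y_true = [0.2, 0.5, 0.9, 0.4]
y_pred = [0.1, 0.5, 0.7, 0.8]

print(mae(y_true, y_pred))   # about 0.175
print(rmse(y_true, y_pred))  # about 0.229
```

The single large error (0.4) raises the RMSE well above the MAE, which is why the two indices are reported together.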

Data description

The experimental data used in this work were sourced from publicly available literature. Nineteen primary factors influencing product quality were identified from the product data, with each factor comprising \(N=36\) representative test data points. An investigation based on technical standards was conducted to examine the various factors influencing the quality of electrolytic copper. The nineteen influencing factors are anode copper periphery (X1), anode copper surface (X2), starting sheet periphery (X3), starting sheet surface (X4), starting sheet toughness (X5), Cu content in the anode copper chemical composition (X6), As content in the anode copper chemical composition (X7), cell voltage (X8), current density (X9), electrolyte temperature (X10), electrolyte flow (X11), number of short circuits (X12), Cu content in the electrolyte (X13), H2SO4 content in the electrolyte (X14), As content in the electrolyte (X15), gelatin content in additives (X16), thiourea content in additives (X17), casein content in additives (X18), and hydrochloric acid content in additives (X19). The quality of electrolytic copper, derived from both its chemical composition and its physical specification indices, was decomposed with the chemical composition restricted to the components Cu and As. The resulting quality components were defined as electrolytic copper periphery (Y1), electrolytic copper surface (Y2), electrolytic copper toughness (Y3), copper content in electrolytic copper (Y4), and arsenic content in electrolytic copper (Y5).

To improve the convergence speed and accuracy of the proposed models, the data were first normalized. Because all data points are fixed, min–max standardization, which linearly transforms the original data so that the results fall within the interval \([0, 1]\), was employed. For \(i=1, 2,\cdots , 19 \mathrm { ~and~ } j=1, 2, 3, 4, 5\), the transformed variables are given by

$$x_{i} = \frac{{X_{i} - X_{i}^{{{\text{min}}}} }}{{X_{i}^{{{\text{max}}}} - X_{i}^{{{\text{min}}}} }},$$
(11)

and

$$y_{j} = \frac{{Y_{j} - Y_{j}^{{{\text{min}}}} }}{{Y_{j}^{{{\text{max}}}} - Y_{j}^{{{\text{min}}}} }},$$
(12)

where \({x}_{i}\) denotes the transformed factors affecting electrolytic copper quality, and \({y}_{j}\) represents the transformed quality of electrolytic copper. Additionally, \({X}_{i}\) refers to the factors affecting electrolytic copper quality before the transformation, and \({Y}_{j}\) indicates the quality of electrolytic copper before the transformation. \({X}_{i}^{\text{max}}\) and \({X}_{i}^{\text{min}}\) refer to the maximum and minimum, respectively, among the thirty-six test data sets for the \(i\)-th quality-affecting factor of electrolytic copper. Similarly, \({Y}_{j}^{\text{max}}\) and \({Y}_{j}^{\text{min}}\) refer to the maximum and minimum, respectively, among the thirty-six test data sets for the \(j\)-th quality component of electrolytic copper. Normalized box plots of the quality of electrolytic copper and its influencing factors are depicted in Figs. 1 and 2. The plots reveal outliers for both Y4 and X6, with data values lying beyond the upper and lower boundaries. The experimental data on electrolytic copper quality and its associated factors have varying medians. With the exceptions of X1, X11, X18, and X19, the data distribution is relatively uniform. Thus, from a macroscopic data perspective, the relationship between the influencing factors and the quality of electrolytic copper appears intricately complex and profoundly nonlinear.
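The min–max transformation of Eqs. (11) and (12) can be sketched for a single factor as follows. The function name and the sample values are hypothetical, and returning zeros for a constant column is an assumption made here to avoid division by zero; the source does not discuss that case.

```python
def min_max_normalize(values):
    """Eqs. (11)/(12): map values linearly onto [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Assumed convention for a constant column (not covered in the text).
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

# Hypothetical raw readings of one influencing factor.
raw = [10.0, 20.0, 15.0, 30.0]
scaled = min_max_normalize(raw)
print(scaled)  # [0.0, 0.5, 0.25, 1.0]
```

The same function would be applied column by column to each of the nineteen factors and five quality components, with the extrema taken over the thirty-six data sets as described above.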

Figure 1

Box plot of the normalized value of five-status indicator system of electrolytic copper quality.

Figure 2

Box plot of the normalized values of nineteen factors affecting the electrolytic copper quality.

Result and discussion

Non-linear correlation analysis

According to relevant research findings, the quality of electrolytic copper is influenced by various factors, including personnel, equipment, environment, operation, and raw materials. These factors are nonlinearly related, interacting with and constraining one another, and together they determine the quality of electrolytic copper. Traditional linear statistical approaches have difficulty untangling such multifaceted influences. Notably, researchers from the Broad Institute of MIT and Harvard introduced a robust statistical method based on the maximal information coefficient (MIC) for detecting significant relationships39. MIC values from 0.90 to 1.00 signify an exceptionally high correlation, values between 0.70 and 0.90 a high correlation, values between 0.40 and 0.70 a moderate correlation, values between 0.20 and 0.40 a low correlation, values from 0.10 to 0.20 a very low correlation, and values below 0.10 a lack of correlation. Consequently, this work employs MIC to quantify the nonlinear association among the factors influencing the quality of electrolytic copper, offering valuable insights into the critical determinants for its quality management.
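The grading scale above can be written as a small lookup, shown here as a sketch. The function name is hypothetical, and treating each lower bound as inclusive is an assumption, since the text does not state how boundary values are assigned. Computing MIC itself is not shown.

```python
def mic_category(mic):
    """Map a MIC value in [0, 1] to the qualitative grade used in the text."""
    if not 0.0 <= mic <= 1.0:
        raise ValueError("MIC must lie in [0, 1]")
    # Lower bounds are treated as inclusive (an assumption).
    if mic >= 0.90:
        return "exceptionally high"
    if mic >= 0.70:
        return "high"
    if mic >= 0.40:
        return "moderate"
    if mic >= 0.20:
        return "low"
    if mic >= 0.10:
        return "very low"
    return "none"

# Illustrative values only.
for value in (0.94, 0.79, 0.71, 0.35, 0.05):
    print(value, mic_category(value))
```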

MIC values were calculated with a mathematical software package from the thirty-six sets of experimental data covering the nineteen influencing factors. The results are depicted in Fig. 3. In this figure, the yellow region at the bottom right covers a comparatively large area; yellow indicates a strong correlation between two factors. Specifically, the computed MIC values between X8 and X9, X13, X14, X17, and X18 are 0.94, 0.92, 0.92, 0.94, and 0.94, respectively. Similarly, the MIC values between X9 and each of X13, X4, X17, and X18 are all 0.94. Factors X17 and X18 also have a MIC value of 0.94. All of these pairs are exceptionally highly correlated (i.e., MIC values exceed 0.90). In contrast, the MIC value between X3 and X17 is 0.71, and that between X4 and X5 is 0.79. Additionally, the values between X8 and X12 and between X8 and X15 are 0.80 and 0.81, respectively. These relationships reflect a high correlation, as the MIC values lie between 0.70 and 0.90. These results show that exploiting the variability of the electrolytic copper data can improve the analysis of the correlation among its quality-affecting factors. The maximal information coefficient is also found to be well suited to exploring correlations among complex variables, as exemplified by the fluctuations in the factors influencing electrolytic copper quality.

Figure 3

Non-linear analysis results of factors affecting the electrolytic copper quality using maximal information coefficient.

In the actual hydrometallurgical production process, a change in one influencing factor of electrolytic copper often causes changes in other influencing factors. Furthermore, these changes are difficult to observe during copper electrolysis. Resolving this problem through industrial testing is not only costly but also unlikely to achieve the expected objective. By contrast, calculating the maximal information coefficient displays the relationships among the various factors intuitively. Hence, the dynamic correlations among the influencing factors can be evaluated holistically when formulating quality control protocols for electrolytic copper, avoiding an undue focus on isolated variables.

Primary influencing factors

Random forest is one of the most influential techniques in machine learning40. This method uses multiple decision trees for classification, correlation analysis, prediction, and data interpretation41. In this work, the dependent variable is the quality of electrolytic copper, which is the target of the decision classification. The independent variables are the factors that potentially affect the quality of electrolytic copper, such as the starting sheet quality and the chemical composition of the anode copper; they serve as predictors of the dependent variable. Constructing the random forest model involves the following steps. (1) Training samples are drawn from the original dataset using the Bootstrap method, and \(n\) trees are established from them. (2) During tree generation, \(m\) variables are randomly chosen at each tree node, and the one with the highest classification efficacy is selected to split the data. (3) The data left out by the Bootstrap sampling serve as the test sample for appraising the performance of each tree. Together, the \(n\) trees constitute a random forest for data prediction.
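The sampling parts of steps (1) to (3) can be sketched with the standard library alone. This is not a full random forest: tree growing and split selection are omitted, the function names are hypothetical, and the values \(n=5\) trees and \(m=4\) candidate variables are arbitrary assumptions for illustration.

```python
import random

def bootstrap_split(n_samples, rng):
    """Steps (1) and (3): draw a bootstrap sample with replacement and
    return it together with the out-of-bag indices left for testing."""
    in_bag = [rng.randrange(n_samples) for _ in range(n_samples)]
    chosen = set(in_bag)
    out_of_bag = [i for i in range(n_samples) if i not in chosen]
    return in_bag, out_of_bag

def candidate_features(n_features, m, rng):
    """Step (2): pick m distinct candidate variables for one tree node."""
    return sorted(rng.sample(range(n_features), m))

rng = random.Random(0)
n_samples, n_features = 36, 19   # sizes matching the dataset described above
n_trees, m = 5, 4                # assumed values, for illustration only

for tree in range(n_trees):
    in_bag, oob = bootstrap_split(n_samples, rng)
    # In a real forest this draw is repeated at every node of the tree.
    features = candidate_features(n_features, m, rng)
    print(tree, len(set(in_bag)), len(oob), features)
```

Each tree sees a different subset of the thirty-six samples, and the samples it never saw form its out-of-bag test set.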

The random forest algorithm was employed to evaluate the importance of the factors affecting the variability in the quality of electrolytic copper. The results are listed in Table 1. According to this table, the primary influential factors for electrolytic copper periphery (Y1) are the starting sheet periphery (X3), thiourea content in additives (X17), H2SO4 content in the electrolyte (X14), Cu content in the electrolyte (X13), and casein content in additives (X18). For electrolytic copper surface (Y2), the principal determinants are thiourea content in additives (X17), the starting sheet periphery (X3), casein content in additives (X18), cell voltage (X8), and Cu content in the electrolyte (X13). For electrolytic copper toughness (Y3), the significant factors are the starting sheet periphery (X3), thiourea content in additives (X17), H2SO4 content in the electrolyte (X14), cell voltage (X8), and casein content in additives (X18). For copper content in electrolytic copper (Y4), the prevailing factors are Cu content in the electrolyte (X13), H2SO4 content in the electrolyte (X14), the number of short circuits (X12), thiourea content in additives (X17), and gelatin content in additives (X16). For arsenic content in electrolytic copper (Y5), the primary influencers are H2SO4 content in the electrolyte (X14), Cu content in the electrolyte (X13), current density (X9), casein content in additives (X18), and cell voltage (X8). The nonlinear correlations obtained with the maximal information coefficient are consistent with the primary determinants of electrolytic copper quality identified by the random forest approach.

Table 1 Importance of factors affecting the electrolytic copper quality by using random forest algorithm.

In this way, the primary factors for each of the five quality control indicators of electrolytic copper were obtained. Although quality control of copper electrolysis could be achieved by studying all nineteen influencing factors, identifying the controlling factors greatly simplifies the research process. Especially in complex industrial production processes, controlling the five primary factors is likely to improve both production efficiency and product quality quickly. At the same time, the primary factors provide a foundation for predicting the quality of electrolytic copper.

Comparison of prediction methods

The quality indices of electrolytic copper and the nineteen factors influencing this quality, as detailed in the literature, were compiled into a sample library. For training and testing, \({N}_{1}=27\) groups of electrolytic copper experimental data constituted the training set, while the remaining \({N}_{2}=9\) groups formed the testing set. For comparison, four algorithms, namely back propagation neural network, least squares support vector machine, relevance vector machine, and least squares support vector machine optimized by particle swarm optimization (PSO-LSSVM), were employed to develop the predictive control model of electrolytic copper quality. Specifically, the back propagation neural network used a three-layer structure with a maximum of 1000 iterations, a learning rate of 0.01, a training error threshold of 0.0001, a momentum factor of 0.01, a minimum performance gradient of \(10^{-6}\), and a maximum failure count of 6. For least squares support vector machine, the primary computational parameters were a kernel width of sig2 = 500 and a regularization parameter of gam = 5, with the RBF kernel function selected. Using these methods, data from the thirty-six actual production instances in the publicly available dataset were modeled and predicted. The predictive outcomes are presented in Table 2. Note that the parameters of the particle swarm optimization algorithm were set in advance and achieved the anticipated optimization outcomes. Furthermore, the results of the multiple linear regression model in this table are based on linear regression equations for the various attributes of electrolytic copper taken from public literature. All other results come from the four artificial intelligence algorithms used in this work for the predictive control of electrolytic copper quality.

Table 2 Prediction results of electrolytic copper quality by different prediction methods.

The results indicate that the predictive accuracy of the PSO-LSSVM model surpasses that of the other artificial intelligence techniques, whether it is evaluated using mean absolute error or root mean square error. Nevertheless, the accuracy of relevance vector machine closely trails that of PSO-LSSVM, with discrepancies of 4.45% and 14.16%, respectively. These outcomes suggest that the PSO-LSSVM prediction method is suitable for forecasting in multi-variety, small-batch production, which extends the applicability of PSO-LSSVM within hydrometallurgical processes.

Effect of steps on predicting accuracy

The appropriate number of samples, \({N}_{2}\), used for forecasting the quality control status of electrolytic copper is not known in advance. A sensitivity analysis with respect to this data volume is therefore necessary, based on the evaluation metrics of mean absolute error and root mean square error. Table 3 shows the influence of the prediction step size on the prediction of electrolytic copper quality using relevance vector machine, and Table 4 shows the same for PSO-LSSVM. As listed in Tables 3 and 4, for training set proportions of 80%, 85%, 90%, and 95%, the corresponding training sample sizes are 29, 31, 32, and 34, while the testing set sizes are 7, 5, 4, and 2, respectively. In this work, the sensitivity of the predicted values for the electrolytic copper quality control status was investigated. Notably, as the proportion of training samples in the complete dataset increases from 80% to 95%, the mean absolute error and root mean square error of the relevance vector machine model first decline and then rise, reaching their minimum at 90%.
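The training and testing set sizes quoted above follow from the thirty-six samples by simple arithmetic, as the sketch below shows. Rounding the training size to the nearest integer is an assumption; it reproduces the reported sizes, but the source does not state the rule that was used.

```python
N_TOTAL = 36  # number of experimental data sets

def split_sizes(train_percent, n_total=N_TOTAL):
    """Return (training size, testing size) for a given training percentage."""
    n_train = round(n_total * train_percent / 100)  # assumed rounding rule
    return n_train, n_total - n_train

sizes = {pct: split_sizes(pct) for pct in (80, 85, 90, 95)}
print(sizes)  # {80: (29, 7), 85: (31, 5), 90: (32, 4), 95: (34, 2)}
```

With only two to seven test samples per split, the error metrics are sensitive to individual samples, which should be kept in mind when comparing the splits.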

Table 3 Influence of prediction step size on prediction performance of relevance vector machine for electrolytic copper quality.
Table 4 Influence of prediction step size on prediction performance of PSO-LSSVM for electrolytic copper quality.

Likewise, as the proportion of training samples in the entire dataset increases from 80% to 95%, the mean absolute error and root mean square error of the PSO-LSSVM model first decline and then increase, with the minimum values observed at 90%. In general, relevance vector machine and the hybrid PSO-LSSVM model are close to each other in accuracy for predicting the quality of electrolytic copper, and the hybrid PSO-LSSVM model is slightly more stable than relevance vector machine during prediction. The proposed hybrid PSO-LSSVM model may therefore be a good choice for production processes in which all the influencing factors need to be considered. However, more input factors are not always better, because the accuracy of a model does not necessarily increase with the number of input factors. In industrial practice, the aim is to predict the desired outcome with as few factors as possible.

Prediction of electrolytic copper quality

Based on the findings above, five primary control factors were identified among the determinants of electrolytic copper quality. Using \({N}_{1}=32\) groups of electrolytic copper experimental data as the training set and the remaining \({N}_{2}=4\) groups as the test set, the RF-RVM model was developed to provide intelligent predictions for metallurgical engineering. Table 5 shows the predicted indicators of electrolytic copper quality using the relevance vector machine and RF-RVM models. It can be seen from this table that the prediction accuracy of the two models is satisfactory. For instance, the maximal mean absolute error is 0.1352, obtained with relevance vector machine for electrolytic copper periphery (Y1), and the maximal root mean square error is 0.1889, obtained with relevance vector machine for copper content in electrolytic copper (Y4). Under both metrics (i.e., mean absolute error and root mean square error), relevance vector machine yields the higher error. The hybrid RF-RVM model demonstrates superior predictive performance compared to relevance vector machine, achieving a minimum root mean square error of 0.0427 for electrolytic copper periphery (Y1), i.e., less than 5%. Furthermore, the maximal values of prediction performance of relevance vector machine and the RF-RVM model are 0.0925 and 0.0842, respectively, both for electrolytic copper periphery (Y1). These results clearly demonstrate the necessity and merit of the proposed hybrid model.

Table 5 Indicators prediction of electrolytic copper quality using relevance vector machine and RF-RVM models.

Consequently, this novel hybrid model leverages the strength of the random forest algorithm in extracting features of electrolytic copper quality (i.e., five pivotal controlling factors are extracted from the nineteen determining factors). Specifically, random forest filters out redundant information among the numerous influencing factors, which addresses the difficulties associated with small samples, high dimensionality, and nonlinearity in hydrometallurgical engineering. In addition, the proposed RF-RVM hybrid model not only extracts the primary factors of the copper electrolysis process but also serves the intelligent and digital development needs of metallurgical enterprises and hydrometallurgical processes. In other words, digital metallurgical engineering brings data science, machine learning, and computational science to bear on metallurgical engineering problems.

Conclusions

Based on an in-depth analysis of the production mechanism of copper electrolysis and on factors such as electrolyte composition and power consumption, a predictive model was established for the quality control of electrolytic copper. The primary findings are as follows. (1) The random forest algorithm effectively delineates the intricate nonlinear relationship among the factors that determine electrolytic copper quality. Five pivotal controlling factors of electrolytic copper have been identified and further corroborated by nonlinear correlation analysis using the maximal information coefficient. (2) With all nineteen determining factors as input, the predictive accuracy of relevance vector machine closely parallels that of the PSO-LSSVM model, with deviations of 4.45% and 14.16%, respectively. Notably, relevance vector machine surpasses the conventional multiple linear regression and traditional neural network models in this regard. (3) The electrolytic copper quality prediction model based on RF-RVM yields a prediction error on the test data set that is notably smaller than that of relevance vector machine, with the minimum error index below 5%. In summary, the machine learning techniques employed in this work discern the latent correlations within the electrolytic copper experimental data, reduce computational complexity, and show potential applicability to other quality prediction problems in various metallurgical processes.