Estimation of the flow rate of pyrolysis gasoline, ethylene, and propylene in an industrial olefin plant using machine learning approaches

Light olefins, as the backbone of the chemical and petrochemical industries, are produced mainly via the steam cracking route. Predicting the effects of operating variables on the product yield distribution through mechanistic approaches is complex and time-consuming. With increasing industrial automation and the availability of high-throughput data, machine learning approaches have gained much attention owing to their simplicity and lower computational effort. In this study, the potential of four powerful machine learning models, i.e., the multilayer perceptron (MLP) neural network, adaptive boosting-support vector regression (AdaBoost-SVR), recurrent neural network (RNN), and deep belief network (DBN), was investigated for predicting the product distribution of an industrial-scale olefin plant. To this end, an extensive data set including 1184 actual data points was gathered during four successive years under various practical conditions. Twenty-four varying independent parameters, including the flow rates of different feedstocks, the numbers of active furnaces, and the coil outlet temperatures, were chosen as the input variables of the models, and the outputs were the flow rates of the main products, i.e., pyrolysis gasoline, ethylene, and propylene. The accuracy of the models was assessed by different statistical techniques. Based on the obtained results, the RNN model accurately predicted the main product flow rates with average absolute percent relative error (AAPRE) and determination coefficient (R2) values of 1.94% and 0.97, 1.29% and 0.99, and 0.70% and 0.99 for pyrolysis gasoline, propylene, and ethylene, respectively. The influence of the various parameters on the product flow rates (estimated by the RNN model) was studied via relevancy factor calculation. Accordingly, the number of furnaces in service and the flow rates of some feedstocks had the most positive impacts on the outputs.
In addition, the effects of different operating conditions on the propylene/ethylene (P/E) ratio as a cracking severity factor were also discussed. This research proved that intelligent approaches, despite being simple and straightforward, can predict complex unit performance. Thus, they can be efficiently utilized to control and optimize different industrial-scale units.


The olefin plant description
The simulation was carried out for the olefin plant of the Jam petrochemical company (JPC), located in the Pars special economic energy zone, as an industrial case study. The different sections of the olefin plant are briefly described below and schematically presented in Fig. 1. The furnace section is located at the start of the olefin plant. Based on feedstock type, the furnaces are classified into liquid, gas, and dual furnaces. The feedstock comprises four streams provided from the battery limit (BL) and three other streams recycled from downstream units. After thermal cracking, the product is directed to the quench tower, where the cracked gas is cooled rapidly to stop the undesired coke-formation reactions. The recovery of the different products is achieved by condensation followed by demethanization. The stream from the top of the de-methanizer flows to the pressure swing adsorption unit, where an ultra-high-purity H2 stream is produced. The flow from the bottom of the demethanizer is transferred to the de-ethanizer section to separate ethane and ethylene. C3 and C4 cut fractionation is continuously carried out in the de-propanizer and de-butanizer, respectively. The final products of the olefin unit are ethylene, propylene, and PYG. However, different intermediates can be produced due to the complex reaction networks.

The theory of ML models
Multilayer perceptron (MLP) neural network. MLP is one of the most frequently used artificial intelligence approaches, developed in the 1980s. A typical MLP model is composed of different layers: one input layer, one or more hidden layers, and, at the end, one output layer 25. The hidden layer(s) are responsible for establishing the connection between the input and output layers using different transfer functions 26. Table 1 lists the transfer functions widely used in different layers together with their mathematical equations. Tangent sigmoid (tansig) and log-sigmoid (logsig) transfer functions are normally employed for the hidden layers, whereas purelin, as a linear function, is employed for the output layer. In MLP, each layer possesses a determined number of neurons.
The number of input layer neurons is equal to the number of input parameters. However, the number of hidden layers and the number of neurons in each hidden layer are tuned to estimate the experimental data with the best accuracy 27, while the output layer typically includes only one neuron. The neurons of adjacent layers are connected by tunable parameters named biases and weights. The relationship between the input and output parameters is determined by training the network, which is a process of determining the weights and biases that minimize the errors between the model-predicted and real experimental values 28. Different optimization methods, such as Levenberg-Marquardt (LM), can be used in network training. The output value of an MLP model with two hidden layers is calculated based on Eq. (1), in which tansig and logsig are the transfer functions of the first and second hidden layers, respectively, and purelin belongs to the output layer 26.
y = purelin(w3 logsig(w2 tansig(w1 x + b1) + b2) + b3)    (1)

where w, b, and x denote the weights, biases, and input data vector, respectively, and logsig, tansig, and purelin are the activation functions.
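The layered computation of Eq. (1) can be sketched in a few lines of NumPy. The layer sizes and random weights below are purely illustrative, not the tuned network of this study:

```python
import numpy as np

def tansig(x):
    return np.tanh(x)

def logsig(x):
    return 1.0 / (1.0 + np.exp(-x))

def mlp_forward(x, w1, b1, w2, b2, w3, b3):
    """Forward pass of a two-hidden-layer MLP following Eq. (1):
    tansig in the first hidden layer, logsig in the second,
    and purelin (identity) in the output layer."""
    h1 = tansig(w1 @ x + b1)   # first hidden layer
    h2 = logsig(w2 @ h1 + b2)  # second hidden layer
    return w3 @ h2 + b3        # purelin output layer

# Illustrative sizes: 24 inputs (as in this study), two hidden layers, 1 output
rng = np.random.default_rng(0)
x = rng.normal(size=24)
w1, b1 = rng.normal(size=(10, 24)), rng.normal(size=10)
w2, b2 = rng.normal(size=(5, 10)), rng.normal(size=5)
w3, b3 = rng.normal(size=(1, 5)), rng.normal(size=1)
y = mlp_forward(x, w1, b1, w2, b2, w3, b3)
```

In training, the weights w1..w3 and biases b1..b3 would be adjusted (e.g., by Levenberg-Marquardt) to minimize the error against the plant data.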

Adaptive boosting (AdaBoost).
Freund and Schapire 29,30 invented AdaBoost, a reasonably popular approach. The AdaBoost approach is based on gradually integrating multiple weak learners, each learning from the flaws of the previous ones, to create a more accurate model. The basic phases of the AdaBoost method are shown in Fig. 2.
The theory of the AdaBoost model is described as follows 29:
• Assign a weight to every data point. All samples receive the same weight in the first step: w_j = 1/n, j = 1, 2, …, n, where n represents the number of training data.
• Provide the weighted training data set to the weak learner Wl_i(x) and compute the weighted error, where i runs from 1 to N and N denotes the total number of weak learners.
• For i = 1 to N, define the weight of each weak learner by Eq. (3).
• Update all of the dataset weights for the weak learners from 1 to N.
• Apply the combined weak learners to the test data (x) to produce the result.

Table 1. Frequently used transfer functions.

Transfer function    Mathematical expression
tansig (tanh)        f(x) = (e^x - e^(-x)) / (e^x + e^(-x))
logsig               f(x) = 1 / (1 + e^(-x))
purelin              f(x) = x
The support vector regression (SVR) was the weak learner in this work's AdaBoost system (Fig. 2).
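As a rough sketch of this combination, scikit-learn's AdaBoostRegressor can boost SVR weak learners directly. The synthetic data and hyperparameters below are illustrative assumptions, not the plant data or the tuned values of Table 3:

```python
import numpy as np
from sklearn.ensemble import AdaBoostRegressor
from sklearn.svm import SVR

# Synthetic stand-in data (the plant data itself is not public):
# 24 inputs -> 1 product flow rate, mirroring the study's setup.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 24))
y = 2.0 * X[:, 0] + np.sin(X[:, 1]) + rng.normal(scale=0.1, size=200)

# SVR as the weak learner inside AdaBoost. The weak learner is passed as the
# first argument for compatibility across scikit-learn versions.
model = AdaBoostRegressor(SVR(kernel="rbf", C=1.0), n_estimators=20, random_state=0)
model.fit(X, y)
pred = model.predict(X[:5])
```

Each boosting round reweights the training samples so that later SVR learners focus on the points the earlier ones predicted poorly.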

Support vector machine (SVM).
Support vector machine (SVM) is a type of ML model that may be used for classification (SVC) and regression (SVR) 31. SVR has received a lot of interest because of its ability to represent complicated structures. The basic principle of the SVR model is concisely given in this paper 31,32.
In the case of a data set (x_i, y_i), in which x ∈ R^d is the d-dimensional input data and y ∈ R is the output, the SVR model gives a function f(x) to estimate the output value as Eq. (4) 20.
The weight vector, bias, and non-linear mapping are denoted by w, b, and φ(x), respectively. The φ(x) function maps x from the low-dimensional space into an n-dimensional feature space. Vapnik created an optimization approach to achieve good values of the w and b vectors 33,34.
ζ_j^+ and ζ_j^- indicate the lower and upper excess deviations, respectively. C is a positive regularization parameter that penalizes deviations larger than ε. This constrained optimization problem is converted to a dual function using Lagrange multipliers, yielding the final solution 20, in which K(x_k, x_l) is the kernel function and a_k and a_k^* are the Lagrange multipliers subject to the restrictions 0 ≤ a_k ≤ C and 0 ≤ a_k^* ≤ C.
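For illustration, a common choice for the kernel function K(x_k, x_l) is the Gaussian (RBF) kernel; the sketch below, with an assumed width parameter gamma, shows its basic behavior:

```python
import numpy as np

def rbf_kernel(x_k, x_l, gamma=0.1):
    """Gaussian (RBF) kernel K(x_k, x_l) = exp(-gamma * ||x_k - x_l||^2),
    a common choice of kernel function in SVR."""
    return np.exp(-gamma * np.sum((x_k - x_l) ** 2))

a = np.array([1.0, 2.0])
k_same = rbf_kernel(a, a)        # identical points: kernel value of 1.0
k_far = rbf_kernel(a, a + 10.0)  # distant points: value near 0
```

The kernel implicitly plays the role of φ(x_k)·φ(x_l), so the feature mapping never has to be computed explicitly.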

Recurrent neural network (RNN).
Deep learning approaches outperform statistical types of ML methods, mainly when modeling non-linear situations. There are several forms of deep learning systems, with the RNN being one of the most used 35. The main difference between a basic RNN and a simple ANN approach is that the former permits the hidden layer output to be utilized as an input. Figure 3 depicts a basic ANN approach with one hidden layer.
Figure 4 also depicts a typical node of the hidden layer in the ANN and RNN models. The hidden layer nodes in the RNN and ANN systems behave differently, as shown in Fig. 4. The RNN comprises a unit delay (D) through which the hidden layer's output is delivered back to the neurons.
The fundamental equations of the RNN model are as follows 35. As shown in Eqs. (8) and (9), h_1t varies as a function of h_{t-1} and X_t, where:
• h_t represents the hidden state at time step t; it captures the information from the previous time step and influences the current prediction.
• h_1t denotes the hidden state at time step t in the first recurrent layer; RNNs can have multiple recurrent layers, and this variable refers to the first layer's hidden state.
• X_t refers to the input at time step t; it can be a single data point or a sequence of data points, depending on the nature of the RNN application.
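The recurrence can be sketched as a single Elman-style update in NumPy. The tanh activation, layer sizes, and random weights are illustrative assumptions rather than the exact form of Eqs. (8) and (9):

```python
import numpy as np

def rnn_step(x_t, h_prev, W_x, W_h, b):
    """One Elman-style recurrent step: the new hidden state depends on the
    current input x_t and the delayed hidden state h_{t-1}, matching the
    role of the unit delay D in Fig. 4."""
    return np.tanh(W_x @ x_t + W_h @ h_prev + b)

rng = np.random.default_rng(2)
n_in, n_hidden = 24, 8
W_x = rng.normal(size=(n_hidden, n_in))
W_h = rng.normal(size=(n_hidden, n_hidden))
b = np.zeros(n_hidden)

h = np.zeros(n_hidden)                 # initial hidden state
sequence = rng.normal(size=(5, n_in))  # five daily input vectors
for x_t in sequence:
    h = rnn_step(x_t, h, W_x, W_h, b)  # h carries information forward in time
```

Because h enters the next step as an input, each prediction depends on the history of operating conditions, not only on the current day's inputs.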

Restricted Boltzmann machine (RBM).
The RBM is one of several types of Boltzmann machines. The RBM architecture is depicted in Fig. 5. As shown, the RBM is a neural network with two layers: the visible layer is the first, and the hidden layer is the second. Each layer is made up of neurons, represented by circles. The circles of the visible layer are linked to the hidden layer via arrows, but the neurons within the same layer are not connected; the connections of the RBM thus form a bipartite graph. The RBM method is based on learning a statistical distribution over the inputs 36.
A predetermined energy function (given in Eq. (10)) may be used to establish the probability distribution, which should be minimized to achieve the proper values of the hyperparameters 36.
Consequently, the activation of the neurons in the hidden layer is independent of the visible layer's activation and vice versa. Equation (11) expresses the conditional probability of the visible neurons in terms of the hidden nodes 36. The activation probability of the jth hidden neuron may be generated from Eq. (12). Using the activation probability of Eq. (13), the ith visible neuron can likewise be reached from the hidden layer 36.
As a result, the RBM requires the determination of three variables: W, a, and b 36, where:
• h represents the hidden units in the RBM; these units capture latent features and interact with the visible units to learn patterns in the data.
• v refers to the visible units in the RBM; these units are directly connected to the data and represent the observed variables.
• a and b are the bias terms for the visible (a) and hidden (b) units; they allow the model to shift the activation threshold of the units.
• W denotes the weight matrix that defines the connections and strengths between the visible and hidden units; it plays a crucial role in learning the underlying data distribution.
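Since the bodies of Eqs. (10)-(13) are not reproduced here, the sketch below assumes the standard binary-RBM forms: the energy E(v, h) = -a·v - b·h - v·W·h and sigmoid conditional activation probabilities:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def rbm_energy(v, h, W, a, b):
    """Energy E(v, h) = -a^T v - b^T h - v^T W h of a binary RBM."""
    return -a @ v - b @ h - v @ W @ h

def p_h_given_v(v, W, b):
    """Activation probability of each hidden unit given the visible layer."""
    return sigmoid(b + v @ W)

def p_v_given_h(h, W, a):
    """Activation probability of each visible unit given the hidden layer."""
    return sigmoid(a + W @ h)

rng = np.random.default_rng(3)
n_v, n_h = 6, 4
W = rng.normal(scale=0.1, size=(n_v, n_h))
a, b = np.zeros(n_v), np.zeros(n_h)
v = rng.integers(0, 2, size=n_v).astype(float)
h = rng.integers(0, 2, size=n_h).astype(float)
E = rbm_energy(v, h, W, a, b)
ph = p_h_given_v(v, W, b)
```

The factorized conditionals are what make the alternating (Gibbs) sampling between the two layers cheap: each layer can be sampled in one vectorized step given the other.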

Deep belief network (DBN).
Hinton 37 was the first to suggest the DBN approach. This method is a graphical model that may be applied to various applications, including classification, image recognition, and collaborative filtering 38. The DBN model comprises numerous layers of RBMs, each of which can serve as the visible layer for the following one. These layers act as a feature extractor. Finally, the model's predictions are produced in a logistic regression layer. Figure 6 depicts a schematic representation of the DBN model. The lines in this diagram indicate the (1) alternating sampling, (2) layer-wise pre-training, and (3) fine-tuning processes, respectively. A symmetric weight matrix connects the visible layer of the first RBM with its hidden layer. The neurons in the first hidden layer are independent of the visible neurons. As a result, sampling a vector of hidden neurons from the conditional distributions P(v|h) and P(h|v) is easy. The alternating sampling procedure thus produces a learned feature that indicates the discrepancy between the neurons in the visible layer and the first hidden layer at the start and end of the sampling phase 39.
To find the intrinsic characteristics of the input data and to train the first RBM, the alternating sampling procedure is used. After training the first RBM, the output of its hidden layer is used as the input of the second hidden layer. As a result, these two hidden layers are referred to as the second RBM, and they may be pre-trained using the alternating sampling method. With this progressive training technique, the greedy layer-wise pre-training method learns the nth hidden layer as a new RBM. Overfitting may be avoided by using the greedy layer-wise pre-training method and alternating sampling 40. Finally, utilizing the fine-tuning method, the parameters of the entire DBN are slightly optimized.
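The greedy layer-wise pre-training loop can be sketched with scikit-learn's BernoulliRBM, where the hidden activations of one RBM become the visible layer of the next. The binary synthetic data and layer sizes are illustrative assumptions, and the supervised fine-tuning stage is omitted:

```python
import numpy as np
from sklearn.neural_network import BernoulliRBM

# Synthetic binary data standing in for the real inputs.
rng = np.random.default_rng(4)
X = (rng.random(size=(100, 24)) > 0.5).astype(float)

# Greedy layer-wise pre-training: the hidden activations of the first RBM
# become the visible layer of the second.
rbm1 = BernoulliRBM(n_components=16, learning_rate=0.05, n_iter=10, random_state=0)
H1 = rbm1.fit_transform(X)   # features learned by the first RBM
rbm2 = BernoulliRBM(n_components=8, learning_rate=0.05, n_iter=10, random_state=0)
H2 = rbm2.fit_transform(H1)  # second RBM stacked on the first
```

In a full DBN, a supervised output layer would then be attached on top of H2 and the whole stack fine-tuned end to end.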

Model development
Data assembling. The aim of this study is to assess the ability of different robust ML methods to simulate an industrial-scale olefin plant. For this purpose, the actual operating data of JPC, i.e., 1184 data sets collected during four successive years, were utilized to develop the DBN, RNN, MLP, and AdaBoost-SVR models. In this regard, the number of liquid and gas furnaces in service, the coil outlet temperature of the different furnaces, and the mass flow rates of the different feedstocks were considered independent variables. These 24 independent variables were implemented by the different ML models as input data to estimate the flow rates of ethylene, propylene, and PYG as the model outputs. Over the 1184 days, the mean value of each of the 24 parameters (i.e., input variables) was recorded every day. The statistical information of the dataset gathered in this work is tabulated in Table 2. More details about the actual operating data are provided in the Supplementary Information. Python was used for the implementation of the different models. The training procedure of the models was carried out using 85% of the dataset, denoted as the train subset, and the ability of the developed model was assessed using the remaining 15% of the dataset, called the test subset. In this study, the optimization of the hyperparameters of the models was conducted through a grid search on the dataset. The selection of hyperparameters was carefully considered to enhance the models' performance. Additionally, to prevent potential overfitting issues during the training phase, a sixfold cross-validation technique was utilized. The dataset was thereby divided into six subsets, allowing each subset to serve as both training and validation data iteratively. By employing this well-established validation method, the models' generalization capabilities were effectively assessed, enhancing the reliability of the results obtained.
The grid-search hyperparameters differed for each model, and the importance of each hyperparameter was determined by theoretical and practical considerations. Table 3 shows the optimal values of the major hyperparameters, as well as the search intervals set for the machine learning models used in this study.
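The split-and-search procedure described above can be sketched with scikit-learn. SVR and the grid below are an illustrative stand-in, since each model in this study had its own hyperparameter grid (Table 3), and the data are synthetic:

```python
import numpy as np
from sklearn.model_selection import GridSearchCV, KFold, train_test_split
from sklearn.svm import SVR

# Synthetic stand-in for the 1184-sample, 24-input plant dataset.
rng = np.random.default_rng(5)
X = rng.normal(size=(300, 24))
y = X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.1, size=300)

# 85% / 15% train-test split, as in the study.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.15, random_state=0)

# Grid search over illustrative SVR hyperparameters with sixfold cross-validation.
grid = {"C": [0.1, 1.0, 10.0], "epsilon": [0.01, 0.1]}
search = GridSearchCV(SVR(), grid, cv=KFold(n_splits=6, shuffle=True, random_state=0))
search.fit(X_tr, y_tr)
best = search.best_params_
```

Each of the six folds serves once as validation data, so every candidate hyperparameter combination is scored on data it was not trained on before the final model is refit on the full train subset.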
Detection of the outliers. The reliability and accuracy of the models are highly dependent on the quality of the actual employed data. Imprecise measurements can always be found among the recorded data points, and these can drastically degrade the accuracy of the developed models. Therefore, to develop a reliable model, such outliers should be detected and eliminated from the recorded data set. In this study, the Leverage method 41-43 was used for finding outliers, and the Hat matrix of the Leverage method was constructed as Eq. (14) 20,44:

H = X (X^T X)^(-1) X^T    (14)

In Eq. (14), X is a matrix with dimensions of p × q, where p is the number of data points and q is the number of independent variables. The diagonal elements of the Hat matrix denote the hat values of the data. Outliers can be identified from William's plot, in which the normalized residuals are illustrated with respect to the hat value. The warning Leverage value (H*), which can be calculated by Eq. (15), is also shown on William's plot 19,41-43:

H* = 3(q + 1)/p    (15)
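Assuming the standard forms H = X (X^T X)^(-1) X^T and H* = 3(q + 1)/p for Eqs. (14) and (15), the hat values can be computed as follows (random data for illustration):

```python
import numpy as np

def hat_diagonal(X):
    """Diagonal of the Hat matrix H = X (X^T X)^(-1) X^T used by the
    Leverage outlier-detection method."""
    H = X @ np.linalg.inv(X.T @ X) @ X.T
    return np.diag(H)

rng = np.random.default_rng(6)
p, q = 50, 3                 # p data points, q independent variables
X = rng.normal(size=(p, q))
hat = hat_diagonal(X)
H_star = 3 * (q + 1) / p     # warning Leverage value, H* = 3(q + 1)/p
high_leverage = hat > H_star # candidates for William's plot inspection
```

Points with hat values above H*, or with normalized residuals beyond ±3, are the ones flagged as doubtful on William's plot.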
Assessment of the accuracy. The capability of the developed models was assessed using different statistical techniques, described below 45,46. The average absolute error between the model-predicted and the target values is calculated by the average absolute percent relative error (AAPRE) using Eq. (16). The root mean square error (RMSE), given by Eq. (17), was used to determine the error dispersion. The data dispersion is indicated by the standard deviation of the error (STD) using Eq. (18); the lower the STD value, the lower the data dispersion.
Another parameter for assessing model accuracy is the coefficient of determination, R2; the closer the R2 value is to 1, the better the experimental data are predicted. The R2 value can be calculated using Eq. (19). In Eqs. (16)-(19), O is the flow rate of the produced PYG, propylene, or ethylene, and n denotes the number of industrial data points. The calculated statistical criteria of the developed models are listed in Tables 4, 5 and 6. It can be seen that for all three products, the RNN model showed the best performance, with the highest R2 and the lowest AAPRE, RMSE, and STD for the total data set. In addition, the RNN model exhibited excellent performance in both the training and testing stages. Based on Tables 4, 5 and 6, the best accuracy obtained from the RNN model (i.e., a total AAPRE of 0.7%) is related to ethylene. The lowest accuracy is assigned to PYG, with a total AAPRE of 1.9%. However, these low levels of AAPRE confirm that the RNN model developed in this work is reliable for predicting the flow rates of the main products of the industrial-scale olefin unit.
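The statistical criteria of Eqs. (16)-(19) can be sketched in NumPy as below. Since the exact equation bodies are not reproduced here, the standard definitions are assumed; in particular, the standard deviation of the relative error is one common reading of Eq. (18):

```python
import numpy as np

def aapre(actual, pred):
    """Average absolute percent relative error, Eq. (16)."""
    return 100.0 * np.mean(np.abs((actual - pred) / actual))

def rmse(actual, pred):
    """Root mean square error, Eq. (17)."""
    return np.sqrt(np.mean((actual - pred) ** 2))

def std_error(actual, pred):
    """Standard deviation of the relative errors, one common form of Eq. (18)."""
    rel = (actual - pred) / actual
    return np.std(rel, ddof=1)

def r2(actual, pred):
    """Coefficient of determination, Eq. (19)."""
    ss_res = np.sum((actual - pred) ** 2)
    ss_tot = np.sum((actual - np.mean(actual)) ** 2)
    return 1.0 - ss_res / ss_tot

# Illustrative flow-rate values (t/h), not plant measurements.
actual = np.array([100.0, 110.0, 95.0, 105.0])
pred = np.array([101.0, 108.0, 96.0, 104.0])
```

A perfect model gives AAPRE = 0, RMSE = 0, STD = 0, and R2 = 1, which is why low AAPRE and near-unity R2 in Tables 4-6 indicate an accurate model.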
Error distributions for both the train and test data sets obtained by the different developed models are plotted in Figs. 10, 11 and 12. Error fluctuations on both sides of the zero line indicate that all models provided acceptable accuracy for all three main products. The slight deviations of the RNN model from the zero line indicate the greater accuracy of this model, especially for PYG and propylene estimation. The cumulative frequency of the data points with respect to the absolute relative error is illustrated in Fig. 13. In these diagrams, a model whose curve lies closer to the vertical axis has higher accuracy in predicting the experimental data. Based on Fig. 13, for ethylene prediction using the RNN model, approximately 85% of the data points have a relative error smaller than 2%, whereas, for example, when the MLP model was employed, less than 25% of the data predicted for ethylene had a relative error lower than 2%. Accordingly, the RNN model also presented higher accuracy than the other models for predicting PYG and propylene.

Applicability range of the RNN model.
To evaluate the applicability domain of the developed RNN model, William's plots for PYG, propylene, and ethylene are illustrated in Fig. 14. Based on the Leverage method 41-43, the cut-off values are recommended as ±3 for the normalized residuals on the y-axis and 0 to H* on the x-axis. The data dispersed within these boundaries are considered valid 47. It can be seen in Fig. 14 that, for all three main products, the majority of the data points lie within the acceptable domain, indicating that the developed RNN model is statistically accurate and reliable. However, a few points of the data set fall outside the specified range; accordingly, these points were considered doubtful data. In this study, an industrial-scale olefin plant was simulated over four years, so the existence of some outliers is inevitable. Some of them are related to the complexity of the plant, and some inaccurate data may have been recorded during unexpected situations. Since the suspected data were insignificant compared to the entire data set, it can be concluded that both the actual plant data and the developed RNN model are accurate and reliable.

Sensitivity analysis.
The sensitivity analysis was performed to evaluate the extent of the effects of all input variables on the PYG, propylene, and ethylene flow rates predicted using the RNN model. The relevancy factor, r, determines the magnitude of the effect of each input variable on the main product flow rates as the model outputs 48. The value of r can be negative or positive: a positive r for an input variable indicates a direct relationship between that input and the output, whereas a negative r confirms an inverse relationship 49. The greater the absolute value of r for an input variable, the more significant the impact of that parameter on the model output 50. The relevancy factor (r) is calculated using Eq. (20) 19, where ω_j and ω̄ represent the jth value and the mean value of the predicted product flow rate, respectively; I_ij and Ī_i denote the jth value and the mean value of the ith input parameter, respectively; and n is the number of data sets.
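Equation (20) is a Pearson-type relevancy factor; a minimal NumPy sketch, with synthetic directly and inversely related variables, is:

```python
import numpy as np

def relevancy_factor(inputs_i, outputs):
    """Relevancy factor r between one input variable and the predicted output,
    following the Pearson-type form described for Eq. (20)."""
    di = inputs_i - inputs_i.mean()
    do = outputs - outputs.mean()
    return np.sum(di * do) / np.sqrt(np.sum(di ** 2) * np.sum(do ** 2))

rng = np.random.default_rng(7)
x = rng.normal(size=100)
y_direct = 2.0 * x + rng.normal(scale=0.1, size=100)  # direct relationship
y_inverse = -x + rng.normal(scale=0.1, size=100)      # inverse relationship
r_pos = relevancy_factor(x, y_direct)   # positive r: direct interaction
r_neg = relevancy_factor(x, y_inverse)  # negative r: inverse relationship
```

In this study, the same calculation would be repeated for each of the 24 inputs against each RNN-predicted product flow rate to build Fig. 15.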
The relative importance of all the inputs on the PYG, propylene, and ethylene flow rates calculated based on the RNN model is shown in Fig. 15. As mentioned earlier, the input variables were the flow rates of the different feedstocks, the numbers of active furnaces, and the coil outlet temperatures of the furnaces. Among them, the numbers of operational furnaces and the flow rates of some feedstocks, especially Light ends, affected the main product flow rates most significantly, while the coil outlet temperature had negligible effects on the model outputs. As seen in Fig. 15, the number of liquid furnaces in service, as well as the flow rates of Light ends and the C9+ cut, were the input variables that exhibited the most positive effect on the produced PYG flow rate. This means that any increase in these particular inputs would increase the amount of PYG. For the propylene flow rate, the inputs with the most positive effects were likewise the number of active liquid furnaces and the Light ends flow rate, while the ethylene flow rate was highly influenced by the number of gas furnaces. Although the temperature of the F-101 furnace and the C5+ flow rate inversely affected all three products' flow rates, their influences were minor; as these particular inputs increase, the amounts of PYG, ethylene, and propylene decrease slightly. The propylene flow rate also decreased slightly with increasing temperature of the F-106 furnace.
Effects of the operating conditions on the cracking severity factor. Understanding the effects of different operating conditions on the cracking process is important to ensure optimum reaction conditions. In cracking processes, cracking severity factors characterize the operating conditions so as to maximize the main product yields and minimize operational costs 51. The cracking severity factors determine the extent of the breakdown of the feed molecules and the distribution of the products. These factors, as indices independent of the reactor scale, allow one to investigate the effects of different operating conditions and maintain the desired product distribution. There are various practical definitions of cracking severity indices 52. One of the most currently used indices is the propylene-to-ethylene (P/E) ratio 51. In industrial practice, the target value of the P/E ratio is around 0.55 to avoid over- and under-cracking 53. Higher P/E ratios indicate under-cracking and the production of large amounts of liquid by-products, whereas lower P/E ratios correlate with increased production of coke.
In this study, the P/E ratios predicted using the RNN model at different operating conditions are compared with the experimental values in Fig. 16. The operating conditions were the numbers of active liquid and gas furnaces, the outlet temperatures of the F-101 and F-108 furnaces, and the flow rates of Light ends and LPG, selected as the most effective operating parameters based on the sensitivity analysis. As shown in Fig. 16, the experimental values of the P/E ratio were dispersed in the range of 0.15-0.35 under different conditions, resulting in high coke formation. The developed RNN model predicted the experimental P/E ratios with high accuracy at the different operating conditions. As shown in Fig. 16a, b, the P/E ratios gradually increased with an increasing number of operational liquid furnaces, while increasing the number of active gas furnaces favored the ethylene yield. The feed composition is one of the crucial factors affecting the product distribution in cracking processes and has been extensively investigated by different researchers 54,55. Light olefins are produced mainly from alkanes, and higher amounts of n-alkanes result in a higher ethylene yield. When the feedstock changes from alkanes to liquid compounds, cracking does not proceed as far, and large quantities of liquid products such as PYG are produced.
The effects of the feed composition on the P/E ratio are also illustrated in Fig. 16e at different Light ends flow rates. From Fig. 16e, the P/E ratio increases slightly with increasing Light ends flow rate. This is primarily due to the higher concentration of liquid feedstock, causing under-cracking and producing more liquid products 52. As seen in Fig. 16f, the LPG flow rate exhibited a negligible effect on the P/E ratio. This observation confirms that referring solely to the feedstock composition is not sufficient for controlling the P/E ratio. To keep the process favorable towards light olefin production, other important factors, such as the temperature of the furnaces, need to be adjusted. The effects of the coil outlet temperatures of the F-102 and F-108 furnaces on the P/E ratio are shown in Fig. 16c, d. The temperature of these furnaces was not changed much and was kept in the range of 835-850 °C. Since cracking reactions are endothermic in nature, the furnace temperature needs to be maintained as high as about 850 °C. Beyond the mentioned range, side reactions are accelerated, and different by-products are formed. Higher temperatures favor coke formation inside the reactor tubes, decreasing the furnace lifetime 54. In addition, it has been reported that higher temperatures cause ethane and ethylene to react with other components, leading to the production of more methane and other by-products 52. At lower temperatures, the yields of secondary products increase owing to the acceleration of oligomerization reactions 56. As a result, it is essential to employ the optimum furnace temperature to prevent the formation of by-products and maximize ethylene and propylene formation.
As mentioned above, the effects of the operating conditions on the product distribution of cracking can be well studied via cracking severity factors. The analysis of severity factors enables engineers to evaluate the cracking performance from different points of view to ensure optimum olefin yields. This study confirmed that the severity factors can be well predicted by the ML approaches, and these models could be used as powerful and robust alternatives to expensive and time-consuming experimental investigations for the control and optimization of industrial plants.
It should be pointed out here that the proposed RNN model should be coupled with optimization algorithms, such as the genetic algorithm or particle swarm optimization, to optimize the real JPC plant. This could be the objective of our future research. Indeed, after finding the optimal conditions, those conditions should be applied in the real JPC plant to confirm the good performance of the optimization algorithm.

Conclusions
In this study, the ability of four powerful ML models, namely MLP, AdaBoost-SVR, RNN, and DBN, was evaluated to predict the main product flow rates of an industrial-scale olefin plant using actual data gathered over four years. The flow rates of the different feedstocks, the coil outlet temperatures of the different furnaces, and the numbers of liquid and gas furnaces in service were used as input variables to forecast the ethylene, propylene, and PYG flow rates. Among the developed models, the RNN was the most accurate, predicting the output flow rates with AAPREs of 0.7% to 1.9%. The extent of the effects of the different input variables on the products was studied using the relevancy factor. In addition, the P/E ratio, as a cracking severity factor, was employed to investigate the effects of different operating conditions on the product distribution. This study proved that these simple and cost-effective models can be beneficially employed for studying and simulating complex industrial-scale real plants, replacing costly and time-consuming experimental work (Supplementary Information). https://doi.org/10.1038/s41598-023-41273-4

Figure 1.
Figure 1. Schematic representation of different sections of the olefin plant of JPC.

Figure 2.
Figure 2. Schematic representation of the developed AdaBoost model.

Figure 3 .
Figure 3. Schematic representation of the simple ANN with one hidden layer.

Figure 4 .
Figure 4. Connections around a hidden layer node in (a) ANN and (b) RNN.

Figure 7.
Figure 7. Crossplots of the proposed ML models for PYG in this study.

Figure 8 .
Figure 8. Crossplots of the proposed ML models for propylene in this study.

Figure 9 .
Figure 9. Crossplots of the proposed ML models for ethylene in this study.

Figure 10 .
Figure 10. Error distribution plots of ML models for PYG training and test sets.

Figure 11 .
Figure 11. Error distribution plots of ML models for propylene training and test sets.

Figure 12 .
Figure 12. Error distribution plots of ML models for ethylene training and test sets.

Table 2 .
Statistical details of the dataset gathered in this paper.

Table 3 .
Optimal features for implemented models.

Table 4 .
Calculated statistical criteria for the developed models for PYG.

Table 5 .
Calculated statistical criteria for the developed models for propylene.

Table 6 .
Calculated statistical criteria for the developed models for ethylene.