Introduction

Fluid viscosity plays a central role in many industries, particularly in the oil and gas sector, where hydrocarbon fluids are of major economic and energy significance. Viscosity enters the calculation of pressure drop in porous media and pipelines, heat transfer computations, and the study of fluid rheology. It is also central to the design of surface equipment, the simulation of reservoir (underground porous media) behavior, and the prediction of oil recovery. A thorough understanding of fluid properties in petroleum reservoirs is therefore essential for optimizing production and transportation [1]. However, accurately quantifying fluid viscosity is challenging, because a single mathematical relationship is often inadequate to describe it across different fluids and conditions.

There are several classification approaches for categorizing crude oil based on its production and refining processes. As per the definition offered by the U.S. Department of Energy, heavy crude oil is distinguished by its viscosity falling within the range of 100 to 10,000 centipoises at a particular temperature within a reservoir [2]. These classifications delineate various crude oil types based on their inherent properties and are crucial for effective management and utilization within the industry.

Given the importance of hydrocarbon fluid viscosity and the sustained effort to predict it, estimating the viscosity of heavy and extra-heavy fluids is arguably one of the most intricate and specialized challenges. This complexity arises from the large changes in viscosity that accompany temperature changes. Because such fluids may be encountered in diverse scenarios, their viscosity must be determined accurately. The viscosity of heavy and extra-heavy oils is a pivotal parameter that significantly affects engineering design and simulation, from production to transportation.

Beyond the aforementioned applications, extra-heavy crude may even be encountered within high-pressure reservoirs [3]. Given the substantial influence of viscosity on fluid movement in porous media and wellbores, its determination is crucial for assessing the production from the reservoir. Consequently, the prediction of viscosity holds paramount importance, and extensive efforts have been invested to forecast the viscosity of bitumen and heavy oils [4, 5]. These efforts encompass an array of studies [3, 6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22], aiming to decipher the complex interplay of viscosity with temperature, pressure, and fluid properties in intricate reservoir conditions.

Hydrocarbon fluid viscosity has been the subject of extensive study over many years. While viscosity is influenced by parameters such as pressure and gas content, it is primarily a function of the crude oil’s density and temperature [23]. A review of the research on viscosity prediction underscores the significance of the subject. Various researchers have considered different factors as influential on viscosity. For instance, in 2020, Safdari Shadloo and Shadloo regarded chemical composition as a determining factor [24]. In 2019, Rowane et al. not only emphasized chemical composition but also identified pressure as a contributing factor [25]. In 2021, Giwa et al. [26] highlighted temperature, while Holstein et al. in 2007 [27] pointed to pressure as the primary influencing factor on viscosity. This illustrates that viscosity depends on a spectrum of interrelated factors.

Studies in the field of viscosity can be categorized into three main groups: dead oil, undersaturated oil, and saturated oil. In this research, which covers fluids from light crude oil to extra-heavy oil, we concentrate on relationships specifically designed for dead oil. These relationships can further be classified by the primary influencing parameters identified by researchers. For example, various empirical relationships have been proposed to correlate viscosity with API gravity, defined as

$$^\circ API=\frac{141.5}{Specific \,gravity}-131.5$$

and temperature, such as Beal in 1946 [28]; viscosity with temperature, as in Khan, Mehrotra, and Svrcek in 1984 [29]; viscosity with pressure, as presented by Poling et al. in 2001 [30]; and viscosity with both pressure and temperature, like the models by Appeldorn in 1963 [7], Martín-Alfonso in 2006 [16], and Alade et al. in 2016 [6]. Given the nature of extra-heavy crude or bitumen, we focus solely on models pertinent to dead oil and extra-heavy crude in this segment.
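The API gravity relation quoted above is easy to get backwards in practice, so a minimal Python sketch may help. The function names are illustrative, and the class boundaries are the ones this study adopts for dead oil (API below 10, between 10 and 20, above 20); how a value of exactly 10 or 20 is classified is an assumption of the sketch.

```python
def api_from_specific_gravity(sg: float) -> float:
    """Degrees API from specific gravity: API = 141.5 / SG - 131.5."""
    if sg <= 0:
        raise ValueError("specific gravity must be positive")
    return 141.5 / sg - 131.5


def specific_gravity_from_api(api: float) -> float:
    """Inverse of the relation above: SG = 141.5 / (API + 131.5)."""
    return 141.5 / (api + 131.5)


def classify_dead_oil(api: float) -> str:
    """Classes used in this study; treatment of exactly 10 or 20 is assumed."""
    if api < 10:
        return "bitumen or extra-heavy oil"
    if api < 20:
        return "heavy oil"
    return "intermediate to light oil"


# The five sample API values reported in this study.
for api in (8.87, 12.92, 15.4, 40.37, 44.17):
    print(api, round(specific_gravity_from_api(api), 4), classify_dead_oil(api))
```

A specific gravity of 1.0 (water) corresponds to 10 °API, which is why oils below 10 °API are denser than water.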

Numerous researchers have sought to establish relationships between the viscosity of dead oil and its API gravity and temperature. In 1946, Beal proposed a viscosity relationship for dead oil [28]. In 1975, Beggs and Robinson extended this approach by presenting a relationship utilizing similar input variables, drawing upon data from 93 different oil samples [8]. In 1995, Giambattista De Ghetto et al. evaluated the accuracy of empirical correlations for estimating reservoir fluid properties when PVT data are unavailable. Using 65 heavy and extra-heavy oil samples and 1200 data points, they calculated statistical parameters such as relative deviation and average error. The samples were divided into extra-heavy oils (API < 10) and heavy oils (API 10–22.3) [31]. In 1980, Glaso recognized that paraffinic and naphthenic crudes with similar API degrees exhibit disparate viscosities at a specific temperature. Glaso introduced a model that accounts for corrections to API and is applicable within an API range of 20.1–48.1 and a temperature range of 10–149°C [13]. In 1987, Al-Khafaji et al. proposed an enhanced model derived from Beal’s relationship, tailored for the viscosity prediction of dead oil in the Middle East region and applicable within an API range of 15–51 [32]. Svrcek and Mehrotra introduced a one-parameter model for bitumen in 1988 [33]. In 1990, Egbogah and Ng presented two distinct relationships for predicting the viscosity of dead oil. The first was an improved form of the Beggs and Robinson model (1975); the second introduced the pour point as a new parameter for deriving crude oil viscosity [34]. In 1992, Labedi introduced a relationship for predicting the viscosity of light crude oil specific to one of the major oil producers in Africa, namely Libya. The proposed relationships were simplified functions of reservoir temperature, reservoir pressure, and stock-tank gravity. This model, applicable within a temperature range of 38–152°C, provided a straightforward means of estimating viscosity [35]. In 1998, Bennison’s work was motivated by the challenges in obtaining reliable viscosity measurements for live oils from viscous oil reservoirs. Correlations for estimating fluid viscosity offer reservoir engineers a practical way to obtain preliminary values for calculations. That study evaluated different correlations using data provided by the DTI to assess their effectiveness in predicting viscosity for several heavy oil reservoirs in the North Sea [36].

Walter and Mehrotra [37] employed logarithmic relationships for kinematic and dynamic viscosity, respectively. Both relationships are written in logarithmic form with a constant 'c' whose value varies. In Walter’s method, this constant ranged from 0.6 to 0.8, with smaller values required for higher viscosities. Mehrotra took the constant 'c' to be 0.8. It’s worth noting that the ASTM standard [38] also relies on the Walter relationship [39].

In 1940, Umstätter introduced an inverse hyperbolic sine (arg sinh) relationship to describe the dependence of viscosity on temperature [39]. Calculating viscosity from this equation is not very convenient, because the natural logarithm of viscosity appears as the argument of an inverse hyperbolic sine. It may therefore be more appropriate to use the logarithmic definition of the inverse hyperbolic sine for ease of computation.

In 1993, Puttagunta et al. introduced an enhancement to these equations for predicting a new set of viscosity data for various bitumen samples. These samples were Canadian heavy oils, measured within a temperature range of 20 to 120°C and at pressures from atmospheric up to 18 MPa [3]. Other similar relationships have also been presented as correlations between viscosity, temperature, and API. For instance, Hossain in 2005 provided such a relationship applicable to APIs ranging from 10 to 22.3, offering an improvement of 3 to 50 percent compared to other equations for dead oil [40].

In 2014, a model for Kuwaiti dead crude oil was presented by Alomair et al. In this study, the API and viscosity of 50 samples of dead crude oil from various regions of Kuwait’s oil fields were collected and measured. These samples had API gravities in the range of 10 to 20°API, and their viscosity was determined for the temperature range of 20 to 160°C. The data used for model development comprised 374 data points, and the dataset for testing the model consisted of 118 randomly selected data points. It should be noted that the model proposed by Alomair et al. is specifically tailored for the mentioned regional fluid and is not applicable to all fluids [41].

The power-law function proposed by Alade et al. in 2016 [6] is one of the empirical relationships presented for predicting the viscosity of compressed Nigerian bitumen. This relationship is formulated based on measured data within the temperature range of 85 to 150°C.

In the most recent endeavor in this field, in 2022, Bahonar et al. introduced a novel relationship for calculating oil viscosity across a wide range of pressures, from subsurface conditions to the surface, utilizing data mining algorithms. They allocated 75% of their data for training and used the remaining 25% for model validation and accuracy assessment. Employing symbolic regression analysis, they covered the entire pressure range, encompassing dead oil viscosity, bubble point oil viscosity, below and above the bubble point pressure. Ultimately, they presented their new relationship [42].

Other functions have also been proposed for predicting heavy oil viscosity at various pressures and temperatures. Among these efforts are the works of Appeldorn [7], Farobie et al. [43], Barus [44], Kartoatmodjo [15], Petrosky and Farshad [20], and Elsharkawy [12].

In recent years, Artificial Neural Network (ANN) and Machine Learning (ML) techniques have been employed for practical applications in various fields, including engineering. The ability of these tools to approximate nonlinear functions quickly has been used to derive nonlinear relationships between input and output data [45,46,47,48,49,50]. Given that the viscosity of heavy oil and bitumen varies nonlinearly with temperature, such techniques have been applied in this domain as well. In 1998, Elsharkawy presented an ANN model for dead oil viscosity using the Radial Basis Function Neural Network Method (RBFNM), considering pressure, temperature, API, and gas specific gravity [51]. Furthermore, Naseri et al. in 2012 [52], Adeeyo and Saaid in 2017 [53], and Al-Amoudi et al. in 2019 [54] introduced models in this context with temperature and API as input parameters.

One of the studies carried out in this field in 2019 was conducted by Alade et al. In this study, the viscosity data of two heavy oil samples labeled as X and Y, with asphaltene contents of 24.8% w/w and 18.5% w/w, were correlated with temperature and pressure. The measured viscosity values of these samples span a temperature range between 70 and 150 °C and pressures from atmospheric to 7 MPa. The ANN model presented by Alade et al. exhibits remarkable accuracy, showcasing a \({R}^{2}\) value of 0.9995 for sample X [55].

Among the efforts to use machine learning for viscosity prediction, the model proposed by Sinha et al. in 2020 is noteworthy. The input parameters considered in this model include molecular weight, temperature, and API. They aimed to model viscosity using various machine learning techniques such as K-Nearest Neighbor (KNN) and Kernel-based Support Vector Machine (KSVM) [56]. In 2021, Hadavimoghaddam et al. expanded on this by incorporating additional input parameters such as bubble point pressure and solution gas-oil ratio, alongside API and temperature. They employed diverse machine learning methods including RF, MLP, SVR, LightGBM, XGBoost, and SuperLearner. Notably, in all of these methods, the \({R}^{2}\) value exceeds 0.9 [57].

Another powerful method for predicting fluid viscosity is the friction theory method [58,59,60,61,62]. In this method, fluid viscosity is related to the fluid’s equation of state. It requires more information about the fluid, such as its composition. The advantage of the friction theory method is that viscosity can be determined with good accuracy at different pressures and temperatures. However, if the composition of the fluid is not available, the friction theory cannot be used.

In conclusion, fluid viscosity’s complex interplay and its essential role in multiple industries have prompted extensive research to predict its behavior. The intricate dependencies on various factors and the nonlinear relationships involved make this field both challenging and rewarding. The focus of this article lies in comprehensively reviewing the advancements and approaches toward predicting viscosity, particularly for dead oil.

In this study, a high-accuracy mathematical model is presented for estimating the viscosity of various hydrocarbon fluids within the limited temperature range for which fluid data are available. Using nonlinear regression, the viscosity of the target fluid is modeled as a function of temperature. The research investigates the change of viscosity with temperature at atmospheric pressure for three heavy or fuel oil samples with APIs of 8.87, 12.92, and 15.4, and two light crude oil samples with APIs of 40.37 and 44.17, all produced in Iran. The measured viscosity values comprise 243 data points within the temperature range of 10 to 180°C. A separate mathematical relationship is proposed for each fluid sample, allowing for individual modeling. One significant advantage of the presented model is its ability to achieve accurate results with less than 30% of the data used as training data, eliminating the need for a large dataset. For a more comprehensive assessment of the case study samples, the study first focuses on a visual analysis of the fuel oil sample with an API of 12.92. For the other samples, the performance of the model was evaluated graphically and through statistical analysis in comparison to other models. In this work, dead oil with an API below 10 is considered bitumen or extra-heavy oil, dead oil with an API between 10 and 20 is considered heavy oil, and dead oil with an API above 20 is considered intermediate to light oil. Accordingly, the two samples with API values above 40 are treated as light crude oil, and the three samples with API values below 20 are treated as fuel oil.

In this article, we aim to provide a mathematical model that, given limited temperature and viscosity data at a specific pressure, yields a relationship between fluid viscosity and temperature at that pressure. In this study, the available data were for the fluid at atmospheric pressure.

Material and methods

Different developed models

To examine and compare the performance of the proposed model in this study, a group of relationships for predicting fluid viscosity has been selected. In order to formulate the temperature-dependent viscosity relationship for hydrocarbon fluids, numerous equations have been proposed. The logarithmic values of dynamic or kinematic viscosity have been expressed as a power function of absolute temperature, as shown below:

$$\text{log}\left(\mu +c\right)=a{T}^{b}$$
(1)
$$\text{log}\left(\upsilon +c\right)=a{T}^{b}$$
(2)

Walter and Mehrotra employed logarithmic relationships of this form for kinematic viscosity (Eq. (2)) and dynamic viscosity (Eq. (1)), respectively. Both equations are written in logarithmic form with a constant 'c' whose value varies. In the Walter method, this constant ranged from 0.6 to 0.8, with smaller values required for higher viscosities. Mehrotra took the constant 'c' to be 0.8 [37,38,39]. The coefficients 'a' and 'b' can be obtained from two viscosities measured at different temperatures. The ASTM standard also relies on the Walter relationship, as expressed in Eq. (3). It’s worth noting that the power-law relationship proposed by Alade et al. exhibits notable similarities to these equations [6].

$$\text{log}\left(\text{log}\left(\upsilon +f\left(\upsilon \right)\right)\right)=a+b.\text{log}\left(T\right)$$
$$\upsilon \ge 1.5 : f\left(\upsilon \right)=0.7$$
$$\upsilon \le 1.5 : f\left(\upsilon \right)=0.7+0.085{\left(\upsilon -1.5\right)}^{2}$$
(3)
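Since the two coefficients of Eq. (3) can be fixed from two viscosity measurements, a minimal Python sketch of that procedure may be useful. It only covers the branch \(\upsilon \ge 1.5\) with \(f\left(\upsilon \right)=0.7\), takes temperature as absolute temperature in Kelvin and logarithms as base 10, and uses two hypothetical measurements that are not taken from this study; the function names are illustrative.

```python
import math


def _double_log(nu_cst: float) -> float:
    """Left-hand side of Eq. (3), log10(log10(nu + 0.7)), for nu >= 1.5 only."""
    if nu_cst < 1.5:
        raise ValueError("this sketch only covers nu >= 1.5, where f(nu) = 0.7")
    return math.log10(math.log10(nu_cst + 0.7))


def fit_walter(t1_k: float, nu1: float, t2_k: float, nu2: float):
    """Solve a and b in Eq. (3) from two (temperature, viscosity) points."""
    b = (_double_log(nu2) - _double_log(nu1)) / (math.log10(t2_k) - math.log10(t1_k))
    a = _double_log(nu1) - b * math.log10(t1_k)
    return a, b


def predict_walter(a: float, b: float, t_k: float) -> float:
    """Invert the double logarithm to recover kinematic viscosity."""
    return 10.0 ** (10.0 ** (a + b * math.log10(t_k))) - 0.7


# Hypothetical measurements (assumed for illustration): 500 cSt at 313.15 K
# and 20 cSt at 373.15 K.
a, b = fit_walter(313.15, 500.0, 373.15, 20.0)
nu_mid = predict_walter(a, b, 343.15)  # interpolated value between the two points
print(a, b, nu_mid)
```

Because viscosity decreases with temperature, the fitted slope `b` comes out negative for any physically sensible pair of points.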

Another form of this equation is given as Eq. (4) [39].

$$k\times \text{log}\left(\text{log}\left(\mu +0.8\right)\times k\right)=a+b.\text{log}\left(T\right)$$
$$k=0.434294$$
(4)

In 1940, Umstätter introduced an inverse hyperbolic sine (arg sinh) relationship to describe the dependence of viscosity on temperature, as shown in Eq. (5) [39].

$$\text{arg sinh}\left(\text{ln}\left(\upsilon \right)\right)=a+b.\text{ln}\left(T\right)$$
(5)

Calculating viscosity based on Eq. (5) is not very convenient, because the natural logarithm of viscosity appears as the argument of an inverse hyperbolic sine. It is therefore easier to use the logarithmic definition of the inverse hyperbolic sine:

$$\text{arg sinh}\left(x\right)=\text{ln}\left(x+\sqrt{{x}^{2}+1}\right)=a+b.\text{ln}\left(T\right)$$
(6)

By substituting \(x=\text{ln}\left(\upsilon \right)\), we can transform Eq. (5) into a logarithmic form, as shown in Eq. (7) [39]:

$$\text{arg sinh}\left(\text{ln}\left(\upsilon \right)\right)=\text{ln}\left(\text{ln}\left(\upsilon \right)+\sqrt{{\left(\text{ln}\left(\upsilon \right)\right)}^{2}+1}\right)$$
(7)
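The logarithmic form in Eqs. (6) and (7) can be checked numerically, and Eq. (5) can be inverted to give viscosity directly as \(\upsilon =\text{exp}\left(\text{sinh}\left(a+b.\text{ln}\left(T\right)\right)\right)\). The Python sketch below does both. It assumes temperature in Kelvin and a two-point determination of the coefficients, and its sample points are hypothetical rather than taken from this study.

```python
import math


def asinh_via_log(x: float) -> float:
    """Eq. (6): arg sinh(x) = ln(x + sqrt(x^2 + 1))."""
    return math.log(x + math.sqrt(x * x + 1.0))


def umstatter_lhs(nu: float) -> float:
    """Left-hand side of Eq. (7) for a kinematic viscosity nu > 0."""
    return asinh_via_log(math.log(nu))


def fit_umstatter(t1_k: float, nu1: float, t2_k: float, nu2: float):
    """Solve a and b in Eq. (5) from two (temperature, viscosity) points."""
    b = (umstatter_lhs(nu2) - umstatter_lhs(nu1)) / (math.log(t2_k) - math.log(t1_k))
    a = umstatter_lhs(nu1) - b * math.log(t1_k)
    return a, b


def umstatter_viscosity(a: float, b: float, t_k: float) -> float:
    """Invert Eq. (5): nu = exp(sinh(a + b ln T))."""
    return math.exp(math.sinh(a + b * math.log(t_k)))


# Hypothetical measurements (assumed for illustration only).
a, b = fit_umstatter(313.15, 500.0, 373.15, 20.0)
print(umstatter_viscosity(a, b, 343.15))
```

The hand-written `asinh_via_log` agrees with the standard-library `math.asinh`, which can be used instead in practice.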

The power-law function proposed by Alade et al. [6] is one of the empirical relationships presented for predicting the viscosity of Nigerian bitumen under pressure. This relationship is formulated based on measured data within the temperature range of 85 to 150 °C. It is expressed as Eq. (8), where \(\mu\) represents viscosity in cp, \(P\) is pressure in MPa, and \(T\) is temperature in °C. The constants \(\upsilon ,\theta ,\phi\) are system-specific and are determined from the data of the problem at hand.

$$\mu =\upsilon {T}^{\theta }{P}^{\phi }$$
(8)
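Taking logarithms of Eq. (8) gives \(\text{ln}\left(\mu \right)=\text{ln}\left(\upsilon \right)+\theta \,\text{ln}\left(T\right)+\phi \,\text{ln}\left(P\right)\), which is linear in the three constants. The Python sketch below uses this to recover the constants from exactly three data points; with more points a least-squares fit would be used instead. The constants and points in the example are hypothetical and serve only to show that the procedure recovers what was put in.

```python
import math


def solve_3x3(m, rhs):
    """Gaussian elimination with partial pivoting for a 3x3 linear system."""
    a = [row[:] + [r] for row, r in zip(m, rhs)]  # augmented matrix
    n = 3
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            raise ValueError("singular system: points do not determine the constants")
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n + 1):
                a[r][c] -= factor * a[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(a[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (a[i][n] - s) / a[i][i]
    return x


def fit_power_law(points):
    """points: three (T in degC, P in MPa, mu in cp) tuples -> (upsilon, theta, phi)."""
    m = [[1.0, math.log(t), math.log(p)] for t, p, _ in points]
    rhs = [math.log(mu) for _, _, mu in points]
    ln_upsilon, theta, phi = solve_3x3(m, rhs)
    return math.exp(ln_upsilon), theta, phi


def power_law_viscosity(upsilon, theta, phi, t_c, p_mpa):
    """Eq. (8): mu = upsilon * T^theta * P^phi."""
    return upsilon * t_c ** theta * p_mpa ** phi


# Hypothetical constants, used only to generate three synthetic points.
true_consts = (2.0e8, -3.0, 0.1)
conditions = [(85.0, 1.0), (150.0, 1.0), (100.0, 7.0)]
pts = [(t, p, power_law_viscosity(*true_consts, t, p)) for t, p in conditions]
upsilon, theta, phi = fit_power_law(pts)
print(upsilon, theta, phi)
```

The three points must not all share the same temperature or the same pressure, otherwise the linear system is singular and the sketch raises an error.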

Other equations considered in this study for predicting fluid viscosity are summarized in Table 1. In all the equations presented in Table 1, temperature is expressed in degrees Celsius.

Table 1 Different correlations used in this research.

In this study, we focus on examining the presented models and aim to improve the estimated values of viscosity for a sample of light crude oil by introducing a model based on Newton interpolation and finding a logarithmic polynomial relationship. The collected data for this research pertains to the changes in viscosity with respect to temperature at atmospheric pressure for three fuel oil samples with APIs of 8.87, 12.92, and 15.4, as well as two light crude oil samples with APIs of 40.37 and 44.17, produced in Iran. The measured viscosity values consist of 243 data points within the temperature range of 10 to 180°C. The highest recorded viscosity value within this range is 5328.74 cp, while the lowest value is 0.29 cp.

In the fundamentals of machine learning, it’s often recommended to use around 70–80% of the data for training and the remaining 20–30% for testing when dealing with large datasets, and many viscosity studies have adhered to this practice with significantly larger datasets. However, in this research, there are only 183 data points available for all the fuel oil samples (8.87, 12.92, and 15.4 API—61 data points for each one) and 60 data points for both light crude oil samples (44.17 & 40.37 API – 30 data points for each one). Thus, considering the separate modeling of each fluid, the amount of data is limited. Nonetheless, one of the advantages of the presented model is its capability to achieve accurate results with a smaller dataset. For instance, for one of the fuel oil samples in this study, less than 23% of the data, specifically 14 data points, were used for modeling. The remaining 47 data points were utilized for validating the obtained relationship. It’s natural that a larger number of points used to derive the relationship would lead to better accuracy of the model and also result in a more complex relationship.
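The split described above, 14 training points out of 61 with the rest held back for validation, can be sketched in a few lines of Python. The sketch assumes the data are ordered by temperature and always keeps the first and last points in the training set, as an interpolation model requires; the random choice of the interior points, the seed, and the function name are assumptions of the example.

```python
import random


def split_with_endpoints(n_points: int, n_train: int, seed: int = 0):
    """Return (train_idx, test_idx); the first and last index are always training points."""
    if not 2 <= n_train <= n_points:
        raise ValueError("need 2 <= n_train <= n_points")
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    interior = rng.sample(range(1, n_points - 1), n_train - 2)
    train = sorted([0, n_points - 1] + interior)
    train_set = set(train)
    test = [i for i in range(n_points) if i not in train_set]
    return train, test


# One fuel oil sample: 61 measured points, 14 of them used for modeling.
train_idx, test_idx = split_with_endpoints(61, 14)
print(len(train_idx), len(test_idx))
```

The indices refer to positions in the temperature-sorted data, so `train_idx[0]` and `train_idx[-1]` are the lowest and highest measured temperatures.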

The viscosity values of the received samples are illustrated in Fig. 1. Because the viscosities at low temperatures are far larger than those measured above 40°C, and because the fuel oil and light crude oil values differ markedly, the measured values are presented on a semi-logarithmic scale. As evident in Fig. 1, the viscosity of the fuel oil sample with an API of 8.87 is 5328.74 cp at 30°C and decreases to 8.65 cp as the temperature rises by 150°C. The viscosities of the two light crude oils with API values of 40.37 and 44.17 at 10°C are 3.0 cp and 1.48 cp, respectively, which are significantly lower than those of the other three samples.

Fig. 1 Measured viscosity data for different crude samples of Iran.

As depicted in Fig. 1, for the fuel oil samples the rate of viscosity decrease with temperature is much more pronounced at lower temperatures than at higher temperatures. For the light crude oil samples, Fig. 1 shows that the relationship between temperature and viscosity is almost linear on the semi-logarithmic scale. The coefficients of the presented relationships were calculated using the Singular Value Decomposition (SVD) method, described in Appendix A. The least-squares criterion was used to minimize the vertical distance between the data points and the chosen curve, in order to find the best-fitting curve for the data.

Model development

In this study, a suitable relationship was sought by trial and error based on the measured values. Initially, it was observed that the most effective approach for viscosity calculation was to use the Newton interpolation method to establish a reference relationship. According to this method, for a set of \(n\) data points, a polynomial of degree \(n-1\) of the form of Eq. (18) can be written as a basis for interpolation between the points. However, significant discrepancies persisted between the calculated and measured values.

After extensive attempts to reduce the errors, it was found that applying a logarithmic scale to temperature only, while keeping viscosity on a linear scale, reduced the error trend. The temperature was therefore transformed by introducing \(u=\text{ln}\left(T\right)\). Even so, the error values remained considerable, indicating that this change alone was not sufficient to achieve accurate results.

$$\mu \left(u\right)={a}_{0}+{a}_{1}u+{a}_{2}{u}^{2}+\dots +{a}_{n-1}{u}^{n-1}$$
(18)

After many further trials, a scaling coefficient was introduced for temperature and its effect on the error was examined. This proved highly effective: using \(u=ln\left(\frac{T}{1000}\right)\) significantly improved the results and marked the decisive step towards a suitable interpolation relationship for viscosity, which is presented in this study. When only two data points are available and the aim is to determine viscosity within the temperature range between them, the relationship reduces to Eq. (19), where temperature is in Kelvin. When selecting training data, the initial and final points of the provided data should always be included, because the proposed model is an interpolation model and can only cover the entire temperature range if both end points are part of the training data. Increasing the number of data points and decreasing the distance between them naturally enhances the model’s accuracy, but it also increases the complexity of the derived mathematical relationship.

$$\mu \left(T\right)={a}_{0}+{a}_{1}\text{ln}\left(\frac{T}{1000}\right)$$
(19)
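To make the structure of Eqs. (18) and (19) concrete, the Python sketch below builds the interpolating polynomial in \(u=\text{ln}\left(T/1000\right)\) with Newton divided differences. This is an illustrative stand-in: the coefficients in this study are reported as computed by SVD, and the degree used for each sample is given in Table 4, whereas the sketch always passes a polynomial of degree \(n-1\) through all \(n\) training points. The function names are hypothetical. The two-point example uses the end points quoted for the API 8.87 sample (5328.74 cp at 30°C and 8.65 cp at 180°C).

```python
import math


def to_u(t_kelvin: float) -> float:
    """Scaled logarithmic temperature u = ln(T / 1000), with T in Kelvin."""
    return math.log(t_kelvin / 1000.0)


def newton_coefficients(us, mus):
    """Divided-difference coefficients of the polynomial through (u_i, mu_i)."""
    coef = list(mus)
    n = len(us)
    for order in range(1, n):
        for i in range(n - 1, order - 1, -1):
            coef[i] = (coef[i] - coef[i - 1]) / (us[i] - us[i - order])
    return coef


def newton_eval(us, coef, u):
    """Horner-style evaluation of the Newton form at u."""
    result = coef[-1]
    for i in range(len(coef) - 2, -1, -1):
        result = result * (u - us[i]) + coef[i]
    return result


def fit_viscosity_model(temps_k, viscosities):
    """Return a function mu(T) that interpolates the given training points."""
    us = [to_u(t) for t in temps_k]
    coef = newton_coefficients(us, viscosities)
    return lambda t_k: newton_eval(us, coef, to_u(t_k))


# Two-point case: equivalent to Eq. (19), mu = a0 + a1 * ln(T / 1000).
model = fit_viscosity_model([303.15, 453.15], [5328.74, 8.65])
print(model(303.15), model(453.15))  # reproduces the two training viscosities up to rounding
```

With only two points the sketch is a straight line in \(u\); it reproduces the end points but should not be expected to follow the strongly curved measured trend in between, which is why more training points are used in practice.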

In the end, this method was applied to various fluid samples, yielding remarkable results. It’s important to note that the proposed model, unlike machine learning models and artificial neural networks (ANN), does not require a large number of training data points. Achieving sufficient accuracy does not demand that 70–80% of the data be used as training data. Additionally, unlike empirical models, this approach is not restricted to a specific range of API values or a particular region and can be employed for all fluid samples. Nevertheless, this method is designed exclusively for interpolation under constant pressure and API conditions. It requires a minimum of two viscosity data points at measured temperatures to formulate an interpolation relationship.

Statistical error analysis

The accuracy of the presented models was assessed using statistical parameters: the average absolute percentage deviation (%AAD), the root-mean-square error (RMSE), and the coefficient of determination (\({R}^{2}\)). The formulas for these parameters are provided in Eqs. (20) to (22). Lower values of %AAD indicate a better relationship and less error in the predicted values. RMSE serves a similar purpose but is expressed in the units of viscosity. The maximum value of \({R}^{2}\) is 1, and the higher the value of \({R}^{2}\), the better the alignment between measured and predicted data points.

$$\%AAD=\frac{100}{{N}_{d}}\sum_{i=1}^{{N}_{d}}\left|\frac{{\mu }_{exp}-{\mu }_{cal}}{{\mu }_{exp}}\right|$$
(20)
$$RMSE=\sqrt{\frac{1}{{N}_{d}}\sum_{i=1}^{{N}_{d}}{\left({\mu }_{exp}-{\mu }_{cal}\right)}^{2}}$$
(21)
$${R}^{2}=1-\frac{\sum_{i=1}^{{N}_{d}}{\left({\mu }_{exp}-{\mu }_{cal}\right)}^{2}}{\sum_{i=1}^{{N}_{d}}{\left({\mu }_{avg}-{\mu }_{cal}\right)}^{2}}$$
(22)
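The three statistics can be computed directly from paired lists of measured and calculated viscosities. The Python sketch below follows Eqs. (20) to (22) as written; it assumes that \({\mu }_{avg}\) is the mean of the measured values, which the text does not state explicitly, and the function names are illustrative.

```python
import math


def aad_percent(mu_exp, mu_cal):
    """Eq. (20): average absolute percentage deviation."""
    n = len(mu_exp)
    return 100.0 / n * sum(abs((e - c) / e) for e, c in zip(mu_exp, mu_cal))


def rmse(mu_exp, mu_cal):
    """Eq. (21): root-mean-square error, in the units of viscosity."""
    n = len(mu_exp)
    return math.sqrt(sum((e - c) ** 2 for e, c in zip(mu_exp, mu_cal)) / n)


def r_squared(mu_exp, mu_cal):
    """Eq. (22) as written: the denominator uses calculated values about mu_avg.

    mu_avg is assumed here to be the mean of the measured values. The more
    common textbook definition uses the measured values about their own mean
    in the denominator instead.
    """
    mu_avg = sum(mu_exp) / len(mu_exp)
    ss_res = sum((e - c) ** 2 for e, c in zip(mu_exp, mu_cal))
    ss_ref = sum((mu_avg - c) ** 2 for c in mu_cal)
    return 1.0 - ss_res / ss_ref


# Small hypothetical example (values assumed for illustration only).
measured = [100.0, 200.0]
calculated = [110.0, 190.0]
print(aad_percent(measured, calculated), rmse(measured, calculated), r_squared(measured, calculated))
```

Because %AAD is relative and RMSE is absolute, a model can rank well on one and poorly on the other when viscosities span several orders of magnitude, as they do for the fuel oil samples.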

Results and discussion

Iranian fuel oil I

In this section, we begin by examining a sample of fuel oil with an API of 12.92. We then calculate the coefficients of the mathematical equations considered in this study, which must be determined from the provided data. The calculations were performed using the Singular Value Decomposition (SVD) method on 14 randomly selected data points, and the resulting coefficients for the relevant equations are presented in Table 2.

Table 2 First Investigation of Different Models for Iranian Fuel oil with API = 12.92.

The coefficients for the proposed model are presented in Table 3. It’s worth noting that choosing fewer or more data points as training data is also possible. Although a good model can still be developed with a smaller number of training data points, using a larger number enhances the accuracy of the model but also increases its complexity. The number of data points considered for each fluid sample as training and testing data, along with the degree of the polynomial used for each sample, are provided in Table 4.

Table 3 Coefficient of present model in Table 2.
Table 4 Number of training and testing data and order of proposed model for different samples.

For better visualization and to reduce the number of charts, we have categorized the presented models into two groups based on their relative percentage errors and on whether their coefficients must be determined. The first group includes the Walter, Umstätter, and Power Law models, which require coefficient determination and have a relative error smaller than 100%. The remaining models fall into the second group; they do not require coefficient determination and have a relative error higher than 100%.

Group I – II Models

In Fig. 2, the values predicted by the Group I models are compared with the measured values. Approximately 40% of the testing data are recorded at temperatures below 80°C, while 60% are at temperatures above 80°C. For better visualization, the chart of predicted values has been divided into two temperature ranges, below 80°C and above 80°C, plotted in Fig. 2-(a,b). As seen in Fig. 2-(a,b), the predicted and measured values appear very close to each other, but closer inspection is warranted. In the cross-plot in Fig. 2-(c), the Group I models predict the measured values relatively well, but the discrepancies are somewhat more noticeable. We therefore examine the errors associated with the different models in more detail.

Fig. 2 Iranian Fuel Oil with API = 12.92 (testing data): (a) Comparison between predicted viscosity by models of Group I and measured viscosity (below 80°C), (b) Comparison between predicted viscosity by models of Group I and measured viscosity (above 80°C), (c) Validation cross-plot of predicted viscosity by models of Group I, (d) Comparison between relative error of Group I.

According to the relative error curves shown in Fig. 2-(d), the maximum relative errors for the Power Law, Umstätter, and Walter models are 11.24%, 17.47%, and 18.53%, respectively, which do not appear large. However, given the definition of relative error and the fact that the maximum relative error occurs at temperatures below 45°C, where the measured viscosity values are large, we also investigate the maximum absolute errors. It should be noted that the maximum absolute error does not necessarily occur at the same point as the maximum relative error. In this case, the maximum absolute errors for the Power Law, Umstätter, and Walter models are 201.83, 646.31, and 685.64 centipoise, respectively. These values are obtained at the first temperature data point, indicating the largest deviations for all three models, whereas the maximum relative error for the Power Law model occurs at a temperature of 40°C.

One issue that is often overlooked when examining the Umstätter and Walter models is that their graphs are commonly plotted on natural logarithmic and inverse hyperbolic sine scales, following their respective equations. Consequently, the graphs of these two models closely align with the graph of measured values, and the error magnitude is also calculated and examined on the same scale. This scale transformation only reduces the apparent error magnitude visually; before the errors are analyzed, the values should be converted back to a linear scale to enable a more accurate comparison.

The Power Law model presented by Alade et al. [6] also follows the measured trend and exhibits a much lower error than the other two models, yet its maximum absolute error is still 201.83 cP. To illustrate the practical impact, assume this fluid flows at a velocity of 3.6 m/s in a 16-inch diameter pipe. Even at the point of maximum relative error, where the absolute error is only 82.84 cP, the Reynolds number based on the measured viscosity is 1944, whereas the value based on the Power Law prediction is 2191. A Reynolds number below 2000 indicates laminar flow, and a value between 2000 and 4000 indicates transitional flow [63]. This discrepancy would therefore misidentify the flow regime of the fluid, leading to incorrect friction factors and pressure-drop calculations for the pipeline.
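The Reynolds number argument can be reproduced approximately with the sketch below. Several inputs are assumptions rather than values stated in the text: the density is estimated from the API gravity of 12.92 with the standard specific-gravity relation and no temperature correction, the measured viscosity of about 737 cP is back-calculated from the quoted 82.84 cP absolute error and 11.24% relative error, and the model is assumed to under-predict.

```python
def api_to_density(api, water_density=1000.0):
    """Density (kg/m^3) from API gravity via SG = 141.5 / (API + 131.5)."""
    return 141.5 / (api + 131.5) * water_density

def reynolds(density, velocity, diameter, viscosity_cp):
    """Re = rho * v * D / mu, with the viscosity converted from cP to Pa.s."""
    return density * velocity * diameter / (viscosity_cp * 1e-3)

def flow_regime(re):
    """Classify the regime with the thresholds quoted in the text."""
    if re < 2000:
        return "laminar"
    if re <= 4000:
        return "transitional"
    return "turbulent"

rho = api_to_density(12.92)          # assumption: no temperature correction
velocity = 3.6                       # m/s, from the text
diameter = 16 * 0.0254               # 16 in converted to m

mu_measured = 737.0                  # cP, assumed (82.84 cP / 11.24 %)
mu_predicted = mu_measured - 82.84   # cP, assumes the model under-predicts

re_measured = reynolds(rho, velocity, diameter, mu_measured)
re_predicted = reynolds(rho, velocity, diameter, mu_predicted)

print(f"measured:  Re = {re_measured:.0f} ({flow_regime(re_measured)})")
print(f"predicted: Re = {re_predicted:.0f} ({flow_regime(re_predicted)})")
```

With these assumed inputs the two Reynolds numbers fall on opposite sides of the 2000 threshold, so a viscosity error of roughly 11% is enough to change the predicted flow regime.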

The performance of the Group II models in predicting the viscosity of the sample with an API of 12.92 is presented in Fig. 3. Because the model predictions vary widely, the graphs in Fig. 3-(a,b) are displayed on a logarithmic scale, while Fig. 3-(c) uses a semi-logarithmic scale. Changing the scale calls for extra caution: it should not be assumed that all the curves closely follow the measured values. The comparison of these correlations with the measured values and their relative errors are displayed in Fig. 3-(c). The values predicted by these models deviate substantially from the measured values, and the models are unable to make accurate predictions. One model examined in this study is that proposed by Alomair et al., which shows a maximum relative error of 51.1% for Iranian fuel oil sample I, equivalent to an absolute error of 1890.2 cP. Since this model is based on measurements of heavy oil samples from Kuwait, it is reasonable that it performs poorly for other samples. The maximum absolute error in this group belongs to the model of Hossain et al.: 6209.04 cP, corresponding to a relative error of 167.85% (Table 6). Nevertheless, the \({R}^{2}\) value of the Hossain et al. model is better than that of the Alomair et al. model.

Fig. 3

Iranian Fuel Oil with API = 12.92 (testing data) (a) Comparison between measured viscosity and predicted viscosity by Group II, (b) Validation cross-plot of Predicted Viscosity by Group II, (c) Comparison between relative error of Group II.

Overall, the models in this group exhibit high errors, possibly because they were developed for fluids from particular regions.

Proposed model

The models examined in the previous section failed to predict the viscosity of the Iranian fuel oil sample accurately. The Group I models exhibited lower relative errors than those in Group II, but their maximum absolute errors were still considerable. This section analyzes the proposed model. First, a set of data points was randomly selected from the Fuel Oil I sample, with the first and last measured points always included so that interpolation covers the entire temperature range. Then, using the present model, the maximum absolute error in predicting the test data was examined for each set of training points. This analysis is shown in Fig. 4. The model was evaluated with anywhere from 2 data points (only the initial and final points of the temperature range) to 15 data points selected from the available data. The lowest maximum absolute error occurred when 14 data points were selected, giving a 13th-degree model. Although increasing the number of fitted data points can improve accuracy, our goal is to evaluate the model with at most 30% of the data points.

Fig. 4

Comparison of Maximum Absolute Error in Viscosity [cP] of Test Data versus Number of Training Data for Present Model. This Comparison is done for Fuel Oil I with API = 12.92.
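The training-set sweep described above can be sketched as follows. This is only an illustration of Newton divided-difference interpolation with the end points always retained; it is not the authors' full algorithm, which also involves non-linear regression. The temperature grid and viscosity curve are synthetic, and interpolating the natural logarithm of viscosity is an assumption made here for numerical conditioning.

```python
import numpy as np

def divided_differences(x, y):
    """Newton divided-difference coefficients for the nodes (x, y)."""
    x = np.asarray(x, dtype=float)
    coef = np.array(y, dtype=float)
    for j in range(1, len(x)):
        coef[j:] = (coef[j:] - coef[j - 1:-1]) / (x[j:] - x[:-j])
    return coef

def newton_eval(coef, nodes, x):
    """Evaluate the Newton-form polynomial at x by nested multiplication."""
    x = np.asarray(x, dtype=float)
    result = np.full_like(x, coef[-1])
    for k in range(len(coef) - 2, -1, -1):
        result = result * (x - nodes[k]) + coef[k]
    return result

def pick_training_indices(n_total, n_train, rng):
    """Random training indices that always include the first and last point."""
    interior = rng.permutation(np.arange(1, n_total - 1))[: n_train - 2]
    return np.sort(np.concatenate(([0], interior, [n_total - 1])))

# Synthetic viscosity-temperature data (hypothetical, not the measured samples)
temps = np.arange(25.0, 170.0, 5.0)                  # degrees C
visc = np.exp(-14.0 + 6700.0 / (temps + 273.15))     # cP

rng = np.random.default_rng(0)
max_abs_error = {}
for n_train in range(2, 9):
    train = pick_training_indices(len(temps), n_train, rng)
    test = np.setdiff1d(np.arange(len(temps)), train)
    # n_train nodes give an interpolating polynomial of degree n_train - 1
    coef = divided_differences(temps[train], np.log(visc[train]))
    pred = np.exp(newton_eval(coef, temps[train], temps[test]))
    max_abs_error[n_train] = float(np.max(np.abs(pred - visc[test])))

for n_train, err in max_abs_error.items():
    print(f"{n_train:2d} training points: max abs error = {err:.3f} cP")
```

Because the selection is random, the error does not have to decrease monotonically with the number of training points, which is consistent with reporting the best result at a specific count rather than at the largest one.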

The graphs for the proposed model are shown in Fig. 5. In the cross-plot of Fig. 5-(a), the predicted values lie close to the Fit Line, indicating a good prediction of the measured values. Fig. 5-(b) displays the relative error of the proposed method. The error is nearly zero at temperatures below 40°C, where the viscosity values are extremely high. As the temperature increases, the relative error also rises, reaching its maximum of 6.037% (equivalent to 0.5 cP) at 165°C. The minimum relative error is 0.0134%, occurring at 30°C and corresponding to an absolute error of 0.15 cP. These errors are negligible compared with those of the models in Groups I and II.

Fig. 5

Iranian fuel oil with API = 12.92 (testing data) (a) validation cross plot of proposed model, (b) relative error of proposed model.

Because the measured viscosity varies strongly with temperature, the maximum absolute error does not necessarily coincide with the maximum relative error. The maximum absolute error of the proposed model is 1.25 cP for the testing data and 1.32 cP for the training data.

For a more rigorous comparison, the statistical parameters of the various models are examined in this section. The values for the training data of this fluid sample are presented in Table 5. The proposed model yields an \({R}^{2}\) value close to unity, indicating significantly better performance than the other models. Its maximum absolute error (\(MAE\)) and maximum relative error (\(MRE\)) are 1.32 cP and 5.24%, respectively, which are much smaller than those of the other models and practically negligible. Even compared with more recent models such as the Power Law, the proposed model gives much better values. The \(\%ADD\) value of the proposed model is 2.43%, nearly half that of the Power Law model. Its RMSE of 0.866 is significantly smaller than that of the other models, confirming its accuracy in predicting the measured values.

Table 5 Comparison of statistical parameters of the different models for predicting Iranian fuel oil viscosity (training data, API = 12.92).
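The statistical parameters discussed above can be computed with a short helper such as the one below. The definitions used are the usual ones and are assumptions here, since they are not restated in this section; in particular, \(\%ADD\) is taken to be the mean absolute percentage deviation. The three data points are hypothetical and chosen so that the maximum absolute and maximum relative errors fall at different points.

```python
import numpy as np

def viscosity_metrics(measured, predicted):
    """Return common error statistics for predicted vs. measured viscosity (cP)."""
    measured = np.asarray(measured, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    abs_err = np.abs(predicted - measured)
    rel_err_pct = 100.0 * abs_err / measured
    ss_res = np.sum((measured - predicted) ** 2)
    ss_tot = np.sum((measured - measured.mean()) ** 2)
    return {
        "R2": 1.0 - ss_res / ss_tot,
        "RMSE": float(np.sqrt(np.mean((predicted - measured) ** 2))),
        "max_abs_error": float(abs_err.max()),
        "max_rel_error_pct": float(rel_err_pct.max()),
        "avg_abs_dev_pct": float(rel_err_pct.mean()),  # assumed meaning of %ADD
        "index_of_max_abs": int(abs_err.argmax()),
        "index_of_max_rel": int(rel_err_pct.argmax()),
    }

# Hypothetical data: viscosity falls steeply as temperature rises
measured = [1000.0, 100.0, 10.0]
predicted = [990.0, 98.0, 9.0]

stats = viscosity_metrics(measured, predicted)
for name, value in stats.items():
    print(f"{name}: {value}")
```

In this toy case the largest absolute error occurs at the most viscous point, while the largest relative error occurs at the least viscous one, which is why both quantities are reported.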

As the statistical performance also indicates, the proposed model predicts values very close to the measured values. The weakness of the existing models in predicting this study's fluid sample could have several causes, for example that these relationships are specific to fluid samples from a particular region. Among all the existing models in Groups I and II, the Power Law model exhibits the best statistical performance. Compared with the model proposed in this study, however, its performance is still clearly inadequate.

The same evaluation for the testing data is given in Table 6. As shown there, the proposed model again performs significantly better than the other models: its \({R}^{2}\) value is nearly unity, with RMSE and \(\text{\%}ADD\) values of 0.8726 and 2.4766%, respectively.

Table 6 Comparison of statistical parameters of the different models for predicting Iranian fuel oil viscosity (testing data, API = 12.92).

The Power Law model performs best among the Group I models and the Hossain et al. model best among the Group II models, with \({R}^{2}\) values of 0.9946 and 0.5471, respectively. However, their maximum absolute errors (\(MAE\)) are high, at 201.83 cP and 8579.7 cP, respectively, indicating poor performance in comparison with the proposed model. Overall, the Power Law model shows the best statistical performance among the models in Groups I and II. This performance is still unsatisfactory when compared with the model presented in this study, which outperforms the other models in predicting the behavior of this fluid sample.

Other samples

Given the performance of the proposed model on the first sample, we now examine its capability in predicting the behavior of the four additional fluid samples discussed in this study. As indicated in Table 4, models of different degrees have been obtained for each of these fluid samples, demonstrating the versatility of the proposed model. Furthermore, the proposed model does not require an extensive dataset for its formulation. For instance, Fuel Oil II has the highest number of training data points, which still accounts for less than 30% of the data for this fluid. Nonetheless, as shown in Fig. 6-(a) for this fluid, the testing data align well with the Fit Line, illustrating the predictive ability of the proposed relationship for viscosity estimation. The same trend is shown for the other three samples in Fig. 6-(b-d).

Fig. 6

(a) Validation cross-plot of predicted viscosity by proposed model for Iranian fuel oil with API = 8.87, (b) Validation cross-plot of predicted viscosity by proposed model for Iranian fuel oil with API = 15.4, (c) Validation cross-plot of predicted viscosity by proposed model for Iranian light crude oil with API = 40.37, (d) Validation cross-plot of predicted viscosity by proposed model for Iranian light crude oil with API = 44.17.

All values estimated by the proposed model align closely with the Fit Line, and the \({R}^{2}\) value is nearly unity, indicating excellent model performance. The proposed model is thus applicable across a wide range of API values for hydrocarbon fluids and can find practical applications. Fig. 7 presents the relative error for these four fluid samples. The relative error of the proposed model is below one percent for all four samples, which demonstrates its accuracy and effectiveness.

Fig. 7

(a) Relative Error of predicted viscosity by proposed model for Iranian fuel oil with API = 8.87, (b) Relative Error of predicted viscosity by proposed model for Iranian fuel oil with API = 15.4, (c) Relative Error of predicted viscosity by proposed model for Iranian light crude oil with API = 40.37, (d) Relative Error of predicted viscosity by proposed model for Iranian light crude oil with API = 44.17.

In light of this modeling approach, it is incorrect to calculate the statistical parameters for the entire testing data as a whole. Instead, they should be calculated and analyzed separately for each fluid. To this end, the statistical performance for two fuel oil samples has been presented in Table 7, and for two light crude oil samples, it has been presented in Table 8.

Table 7 Comparison of statistical parameters of the different models for predicting Iranian fuel oil viscosity (testing data).
Table 8 Comparison of statistical parameters of the different models for predicting Iranian light crude oil viscosity (testing data).

As evident from Table 7, the proposed model predicts viscosity with very high accuracy. The absolute error for the fuel oil with an API of 8.87 is only 0.758 cP, that is, below 1 cP. This is in stark contrast to the Group I and II models, where the lowest maximum absolute error is 216.7 cP, obtained with the Power Law model. Similarly, for the fuel oil with an API of 15.4, the absolute error is a mere 0.34 cP, which is negligible in comparison with the other models. Among the Group I models the Power Law model, and among the Group II models the Kartoatmodjo model (for an API of 15.4), show the better \({R}^{2}\) values. Within Group II, however, only the Hossain et al. model is applicable for an API of 8.87, with a maximum absolute error of 15,855.7 cP.

Clearly, for fuel oil, the proposed model has demonstrated exceptional performance in comparison to other models, outperforming them by a significant margin.

The statistical performance of the models examined in this study for the two light crude oil samples is presented in Table 8. The proposed model performs exceptionally well even for light crude oil, whose viscosity is significantly lower than that of fuel oil, and its results are far better than those of the other models. Interestingly, the Umstätter and Walter models within Group I, despite being older than the Power Law model, provide better results for light crude oil.

The performance of the Group II models for light crude oil is better than for fuel oil but still does not yield satisfactory results. Although four models within Group II exhibit maximum absolute errors of less than 3 cP, the viscosity of both fluid samples is itself less than 3 cP, so this level of error is not acceptable for viscosity prediction. For instance, the Labedi et al. model shows a maximum absolute error of 0.6 cP, occurring at an API of 44.17 and a temperature of 15°C, where the measured viscosity is 1.38 cP, leading to a substantial relative error of 43.94%.

In conclusion, the proposed model has demonstrated remarkable performance across a wide range of hydrocarbon fluids, including both fuel oil and light crude oil, outperforming the other models. Compared with the Group II models, which are fixed correlations built on a specific database, the proposed model performs far better. The Group I models perform better than those in Group II; however, although they are also fitted to laboratory values, their performance is much weaker than that of the presented algorithm.

Consequently, based on the investigations conducted for the case study samples, the excellent performance of the presented model in predicting the viscosity of each fluid sample is confirmed.

Conclusion

In this study, a mathematical algorithm based on Newton interpolation and non-linear regression has been introduced to model the viscosity-temperature relationship for hydrocarbon fluids. To validate the accuracy of the proposed algorithm in modeling viscosity changes with temperature, 243 measured data points of viscosity and corresponding temperature were collected for 5 fluid samples with different API gravities, all measured at atmospheric pressure. Using the proposed algorithm, a mathematical relationship was calculated for each fluid sample. The presented algorithm is an interpolation algorithm and is not applicable to viscosity calculations outside the measured temperature range. To cover a wide temperature range during modeling, the minimum and maximum temperatures must be included in the randomly selected training data points.

The performance of this algorithm has been compared with various models used to calculate viscosity for light dead oil, heavy oil, and extra heavy oil samples. Apart from its significantly higher accuracy, the presented model requires a relatively small amount of training data to achieve suitable accuracy: implementing the proposed algorithm on each fluid sample required at most about 29.5% of the available data points. A smaller or larger number of random data points from the available dataset can also be used, which will affect the accuracy of the resulting viscosity-temperature relationship.

The comparisons demonstrate the significantly superior performance of the presented model in formulating the viscosity-temperature relationship for different hydrocarbon fluids, compared with other models and algorithms. The proposed algorithm can be applied to a wide range of API values for dead oil, as investigated in this study within the range of 8.87 to 44.17.

Data availability

No datasets were generated or analysed during the current study.