Abstract
Accurately estimating Battery State of Charge (SOC) is essential for safe and optimal electric vehicle operation. This paper presents a comparative assessment of multiple machine learning regression algorithms including Support Vector Machine, Neural Network, Ensemble Method, and Gaussian Process Regression for modelling the complex relationship between real-time driving data and battery SOC. The models are trained and tested on extensive field data collected from diverse drivers across varying conditions. Statistical performance metrics evaluate the SOC prediction accuracy on the test set. Gaussian process regression demonstrates superior precision surpassing the other techniques with the lowest errors. Case studies analyse model competence in mimicking actual battery charge/discharge characteristics responding to changing drivers, temperatures, and drive cycles. The research provides a reliable data-driven framework leveraging advanced analytics for precise real-time SOC monitoring to enhance battery management.
Similar content being viewed by others
Introduction
Due to the severe consequences of climate change, environmentally favourable initiatives to reduce greenhouse gas emissions from fossil fuel combustion are urgently needed1. In 2023, transportation accounted for about 24% of global CO2 emissions from fuel combustion2. Electric vehicles (EVs), which produce fewer emissions than conventional petrol or diesel vehicles3, are seen as a potential solution. Given the urgency of climate change, precise State of Charge (SOC) estimation becomes crucial for optimizing EV performance and maximizing their environmental benefits. The choice of battery technology critically impacts EV performance. Lead-acid, nickel-metal hydride (NiMH), and lithium-ion batteries have been considered for EV applications4. Lithium-ion batteries (LIB) offer higher energy density, lower self-discharge, higher cell voltage, and longer cycle life compared to the other batteries5. LIBs also possess low weight, wide temperature range, no memory effect, and eco-friendliness6. The Characteristic comparison of various lithium-ion batteries is shown in Table 1. Though LIB has many advantage compared to other batteries, still battery safety, reliability, efficiency, and sustainability remain critical challenges for EVs7. LIBs face significant challenges, including potential failures leading to thermal runaway and accidents, raising safety concerns8,9,10.
In order to secure safe and reliable operation of batteries, the battery management system (BMS) is highly predominant. BMS is defined as the electronic circuit that includes both hardware and software systems that collects the battery data, monitor its operation and controls its functions. The major functions of BMS includes, prediction of SOH, SOC, SOE, SOP, thermal management, and cell balancing. The structure of BMS is shown in Fig. 1. Among all the functions, SOC prediction plays the crucial role. SOC of the battery is defined as the amount of charge present in the battery to its nominal capacity under the same operating environment. Determining battery SOC is critical for EV battery management systems to ensure safe, reliable, and efficient operation13. However, precisely and adaptively estimating SOC under dynamic driving conditions remains an enormous challenge14. Better State of Charge (SOC) estimates of Lithium Ion Batteries (LIBs) used in Electric Vehicles (EVs) might minimise greenhouse gas emissions in several ways:
Enhanced range and efficiency
Accurate battery charge information enables drivers to optimise driving patterns and avoid unexpected power outages. This can minimise EV drivers’ range anxiety and encourage more people to switch to EVs.
Reduced regenerative braking
This technique transforms kinetic energy from the car’s motion into electrical energy for battery recharge. At high speeds, excessive regenerative braking is inefficient. By knowing the SOC better, the regenerative braking system may be optimised to capture energy efficiently without squandering it.
Enhanced battery life
Accurate SOC estimation enables improved battery management, reducing deep discharges and high temperatures. This can lengthen battery life, decreasing replacements and the environmental effect of battery production and disposal.
Optimized charging
Accurate SOC enables optimised charging strategies. Avoiding unnecessary battery topping off can increase battery health and efficiency. EV charging during off-peak hours, when the electricity grid may have more renewable energy, can further lower its carbon footprint.
Numerous methods has been proposed by the researchers for the prediction of SOC of LIB15. Classical SOC determination techniques include Coulomb Counting and Open Circuit Voltage method. Coulomb counting simply integrates cell current over time to calculate SOC based on known initial state and rated capacity16. However, error accumulation from current measurement and capacity deviation degrades its fidelity over operating life17. The OCV methods are used in predicting SOC of batteries used in laboratories only since they requirement of high battery resting time. Later model based approaches (Electric Circuit Model, Electro chemical model) observer (Sliding Mode Observer, Luenberger Observer, Non liner Observer) and filter (Kalman, particle, H infinity) based approaches has been proposed. The major drawback of model based estimation is they are affected by the aging of the battery and environmental conditions. More complex equivalent circuit modelling aims to capture cell impedance changes but still demonstrates simplistic assumptions of linear or simplified nonlinear battery behaviors, unable to mimic intricacies under varying loads, temperatures, and lifetimes18,19. The primary limitation of filter and observer based methods is its reliance on hardware setup and the impact of parameter variations on the accuracy of similar models. The estimation technique and battery model are unable to achieve sufficient accuracy in SOC estimation due to non-linearity, inaccurate modelling, environmental effects, and ageing factors. Several academics have suggested utilising machine learning to predict SOC in order to address these problems. Data-driven solutions that extract insights purely from sensor measurements have shown vast promise in transportation electrification owing to the proliferation of automotive sensors providing substantial telemetry data related to cell voltages, currents, and temperatures across drive cycles20. By directly modelling machine learning systems using input/output data, their dependence on the battery model is reduced, allowing them to be applied to any battery regardless of its technology. Recent research has actively explored more sophisticated machine learning techniques leveraging their powerful function approximation capabilities to accurately reconstruct the implicit relationship between multivariate time-series driving signals and battery internal states like SOC by mining data correlations21. Unlike physics-based models, machine learning algorithms automatically learn the complex electrochemical mappings from data, eliminating difficult-to-measure or unidentified cell dynamics that introduce modelling deficiencies22. The comparison of various SOC estimation methods using data-driven methods is shown in Table 2.
Support vector machines are among the most widely investigated machine learning strategies for SOC estimation given their proficiency in resolving nonlinear patterns and noise tolerance stemming from kernel-based function transformation33. Various neural network architectures have also been evaluated given their theoretical potential to approximate virtually any nonlinear system arbitrarily well34. Their layered hierarchical feature transformations provide representational power to capture subtle nuances and adapt to evolving cell characteristics. Comparatively, Gaussian process regression offers a probabilistic Bayesian perspective for uncertainty quantification along with SOC prediction through kernel-based covariance modelling to inform reliability35.
While data-driven approaches have demonstrated promising outcomes in enhancing SOC accuracy, they also possess several limitations. The SOC prediction effectiveness is determined by a number of data-driven factors, such as the training procedure, input selection, and tuning of hyperparameters. The computational complexity of data-driven algorithms can be decreased, as well as the problems of data under fitting and overfitting, by carefully choosing the hyperparameters36. The hyperparameters and functions that data-driven approaches are often trained with are typically chosen through a time-consuming and inefficient trial-and-error procedure. Consequently, a great deal of investigation is needed to find a workable framework and optimise the hyperparameters of data-driven algorithms to achieve accurate SOC estimates. While the hybrid models have shown encouraging results, there is still room for development as these models have only taken into account a portion of the variables that could influence predictions and have not taken environmental factors into account. In37, the effect of auxiliary loads like heating and air conditioning should have been taken into account. Numerous studies38 discovered that an electric vehicle’s energy consumption is influenced by various factors, including road condition, traffic and environmental condition. Therefore, for a correct calculation of SOC, the impact of all these elements must be considered.
The Key contribution of the proposed article is:
-
Estimate SOC of LIB using various ML regression models and compare their performance by considering EV motor characteristics, vehicle speed and environmental conditions as the input parameters along with battery parameters.
-
Selection of highly correlated input parameters by applying MRMR algorithm based feature selection
-
Optimal hyperparameter selection using Bayesian optimisation technique.
The research will provide significant insights into designing next-generation intelligent EV battery monitoring systems leveraging advanced data science and real-time sensor integration.
Significance of machine learning algorithms
The accurate determination of battery SOC is vital for ensuring the safe, reliable and optimal performance of lithium-ion batteries in EV applications21. However, precisely estimating SOC is enormously challenging owing to the intricate nonlinear dynamics governing battery charge and discharge behaviours in response to real-world operating conditions. Factors such as cell chemistry, capacity fading, temperature fluctuations, load profiles and aging effects interact in complex ways to impact SOC evolution. Conventional model-based techniques like ampere-hour counting and open circuit voltage mapping demonstrate limited accuracy due to simplified assumptions and lack of adaptability39. Data-driven ML algorithms present a sophisticated solution through their unparalleled ability to uncover subtle patterns in multivariate time-series data40. By recognizing the implicit correlations between sensor measurements of current, voltage, temperature and SOC, they can effectively learn the battery’s internal charge dynamics41. The key reasons ML methods are hugely significant for realizing accurate SOC prediction are superior function approximation capabilities, adaptability to changing conditions, high noise tolerance, modelling complexity handling, and real-time prediction.
Support vector machines
SVMs are widely utilized in classification and regression problems owing to their proficiency in nonlinear function approximation, high dimensional feature space handling and global optima convergence41. At the core, SVMs employ kernel functions to implicitly transform input data into elevated dimensional spaces where linear decision boundaries can separate classes or fit nonlinear trends. The key concept underpinning the working of SVMs is the kernel trick efficiently computing dot products between inputs in the transformed space using a kernel function without needing to explicitly calculate the high dimensional mapping. Some commonly used SVM kernels include polynomial, radial basis function, and sigmoid kernels. The basic architecture of SVM is given in Fig. 2 and its kernel is listed in Table 3.
Artificial neural networks
Artificial neural networks are computational models inspired by the biological neural networks in human brains. They consist of an interconnected web of artificial neurons organized in layers that transmit signals via weighted connections analogous to biological synapses43. The hierarchical multilayer network architecture provides the representational power to approximate virtually any nonlinear function with arbitrarily high accuracy given sufficient hidden neurons44. At the heart of a neural network’s computational logic is its ability to learn from experience by tuning its synaptic weights using the backpropagation algorithm minimizing the error in network’s outputs by recursively propagating derivatives towards earlier layers45. The universal approximation theorem establishes that even simple feedforward networks with a single layer can represent any continuous function, while deeper networks learn highly nonlinear relationships more efficiently46. The basic architecture of SVM is given in Fig. 3 and its kernel is listed in Table 4.
Gaussian process regression
While neural networks formulate function approximation as optimization over parametric model families, Gaussian processes provide a Bayesian nonparametric perspective48. A Gaussian process defines a distribution over possible functions consistent with the training data instead of explicitly learning network weights. It is fully specified by its mean and covariance functions, with the latter quantifying the correlation structure governing how function values co-vary across input points49. Gaussian process regression considers function evaluation as a stochastic process, with observations getting projected to an associated normal distribution characterized by uncertainty40. The basic architecture of SVM is given in Fig. 4 and its kernel is listed in Table 5.
The proposed SOC estimation methodology
The workflow of SOC estimation of batteries using ML algorithms is given in the Fig. 5. The methodology for estimating SOC follows a systematic flowchart consisting of key steps including data separation, data pre-processing, model selection, evaluation, and hyperparameter tuning. This allows for an organized approach to developing and optimizing an accurate SOC estimation model.
Data collection and preprocessing
The utilization of the DAQ module facilitated the extraction of real-time data from a battery-EV across diverse scenarios. The driver of the vehicle was altered on a daily basis during the data collecting period, leading to fluctuations in both age and weight. It was believed that there would be a variation in driving style corresponding to the age of the driver. Furthermore, a data collection process was conducted for a duration of 30 consecutive days in order to facilitate the observation and recording of externally temperature variations. The purpose of this experiment was to investigate the potential impact of varying temperature conditions on the extracted data. Additionally, the collection of data also accounted for traffic circumstances and road conditions, as these were deemed to potentially influence the SOC of the vehicle. As a result, a complete dataset was acquired, encompassing over 30 parameters such as battery voltage, battery current, temperature, and motor speed. Various pre-processing steps were conducted to prepare the data for subsequent analysis. The process included the rounding of data, removal of duplicates, and adjustment of variable values to enhance accessibility and user-friendliness. Furthermore, the process of feature scaling was implemented on many variables, resulting in their values being transformed to a standardized range of 0–1, while simultaneously maintaining the original parameter value. The objective of feature scaling was to enhance the training process of the algorithmic model. Finally, in order to conduct training and testing on the data, the 60/40 principle was adhered to. This involved allocating 60% of the data for training the model, while the remaining 40% was used for testing purposes.
Feature selection
The procedure of feature selection is conducted using the MATLAB software. The utilization of feature selection approaches in data analysis is of great significance, especially due to a convergence of compelling factors. First and foremost, the process of feature selection plays a significant role in improving the interpretability of a model51. By identifying and retaining just the most relevant input variables, feature selection simplifies the underlying model for estimating SOC. The process of simplification serves two important purposes: enhancing comprehension of the connections between input characteristics and SOC, and improving the computational efficiency of the model52. Additionally, the utilization of feature selection techniques assists in addressing the challenge of the curse of dimensionality, a phenomenon that is especially pertinent in the analysis of battery data. This is due to the fact that battery datasets frequently consist of a large number of input factors. Feature selection mitigates overfitting, enhances the generalizability of models, and guarantees reliable state-of-charge (SOC) predictions in the presence of diverse operating conditions through the reduction of data dimensionality. Moreover, it improves the model’s flexibility, making it more resistant to variations in the battery system’s arrangement or chemical composition, which is a vital aspect in accurately estimating the SOC in real-time for different types of batteries. In addition, the process of feature selection contributes to the optimization of resources by minimizing computing demands and memory requirements. This aligns with the efficiency requirements of real-time applications. Finally, it improves the dependability of the model by reducing the impact of extraneous or repetitive characteristics, hence strengthening the precision of SOC estimations. Figure 6 is plotted with all accessible datasets that can be retrieved in real time using DAQ, and it ranks itself based on its strong association with SOC. From the same figure we were able to conclude that top eight factors named as pack cell temperature, pack current, FET temperature, humidity, ambient temperature, motor temperature, pack voltage and motor speed are associated in in predicting SOC. The input parameters selected for simulation are plotted in Fig. 7.
Model selection
Since the objective of this paper is to predict and estimate the SOC using machine learning techniques, the regression models available in the field of machine learning have been utilized. The reason behind using regression models lies in their ability to effectively capture the relationship between variables and make accurate predictions. Various regression analysis techniques are employed when there exists a linear or non-linear association between the target and independent variables, and the target variable comprises continuous values. Regression analysis is widely recognized as the predominant methodology employed in machine learning for addressing regression problems through the application of data modelling techniques. The process includes the identification of the optimal fit line, which is a line that intersects with all data points in a manner that minimizes the distance between the line and each individual data point. Based on the present study, a number of machine learning models were chosen, including SVM, NN, Ensemble of Trees, and Gaussian Process Regression (GPR). Additionally, to mitigate the issue of overfitting, the extended Kalman filter (EKF) was employed. The EKF method possesses the capability to adaptively modify the estimation procedure in order to reduce the issue of overfitting, the utilization of a system model in the estimate process is a fundamental characteristic of the EKF.
Hyper parameter optimization
Choosing proper model hyperparameters goes about as a difficult undertaking in AI calculations. The model’s performance can be affected by fine-tuning its hyperparameters. A few examples of hyperparameters for profound learning techniques include the number of units in each layer, the number of layers, the activation function in each layer, the optimizer, the learning rate, and so on. More model parameters require a more extensive search hyperspace. Exhaustive techniques, such as grid search (GS) or random search (RS), are computationally costly and tedious because of the huge dimensionality of hyperspace. Thus, probabilistic models, including the Bayesian Optimization algorithm, have been investigated to tune high-cost evaluation models. The hyperparameters of the ML algorithms model displayed here were chosen utilizing a Bayesian Optimization method. A surrogate model of the goal function is created in Bayesian optimization, and only the hyperparameters that demonstrate the greatest promise based on the outcomes of previous trials are evaluated.
Support vector machine (SVM)
The SVM algorithm has hyperparameters that control the kernel type, the kernel coefficients, and the regularization parameter. These parameters affect the decision boundary learned by the SVM. Kernel parameters control the shape of the decision boundary, while C controls the trade-off between misclassification and simplicity of the boundary. The details of SVM hyperparameter range and its optimised hyperparameter is given in Table 6.
Gaussian process regression (GPR)
Key hyperparameters of Gaussian process regression include the kernel function and its parameters (analogous to the SVM), along with parameters that determine the noise level of the data and optimize the trade-off between data fit and model complexity. Common kernels include squared exponential, Matern, and rational quadratic. Kernels have parameters like length scales that control how quickly the correlations decay with distance. The details of GPR hyperparameter range and its optimised hyperparameter is given in Table 7.
Neural network (NN)
Neural networks have numerous hyperparameters including the number and width of hidden layers, activation functions, weight initialization schemes, batch size, number of training epochs, and optimizers. Most modern NNs use adaptive optimization methods like Adam or RMSprop which have additional hyperparameters like learning rate and scheduling. Early stopping is also commonly used, adding another hyperparameter for patience. The hyperparameter values for the machine learning models used here were selected using Bayesian optimization. Bayesian optimization builds a probabilistic surrogate model for the objective and samples hyperparameters that appear most promising based on previous evaluation results. This allows efficient tuning of costly models with many hyperparameters. The details of ANN hyperparameter range and its optimised hyperparameter is given in Table 8.
Validation of the ML model
To comprehensively assess the effectiveness of the constructed machine learning (ML) models, a range of statistical error functions were employed. The root-mean-squared error (RMSE), mean absolute error (MAE), and the coefficient of determination (R^2) were calculated to evaluate the models’ predictive capabilities.
\({z }_{j}\) Represents the predicted data, \(\widehat{z}_{j}\) stands for the target datasets and the term “n” indicates the number of dataset observations.
The RMSE is a widely adopted metric that quantifies the square root of the mean squared differences between the predicted and actual values. Lower RMSE values indicate better model performance, as they signify that the predicted values are more concentrated around the line of best fit. However, it is important to note that the RMSE is sensitive to outliers, as squaring the errors amplifies the impact of larger deviations. In contrast, the MAE calculates the average magnitude of the errors without considering their direction. Unlike the RMSE, the MAE is less influenced by outliers, as it does not involve squaring the errors. Nonetheless, interpreting the MAE can be more challenging compared to the RMSE, as it lacks the same natural interpretation in terms of the data’s units. The coefficient of determination (R2) provides a measure of how well the predicted values from the model fit the actual data. It represents the proportion of the variance in the dependent variable that can be explained by the independent variables in the model. R2 values range from 0 to 1, with higher values indicating a better model fit. However, it is crucial to recognize that a high R2 value does not necessarily imply a good model, as it can be influenced by the model’s complexity and the presence of irrelevant variables.
Results and discussion
Experimental setup
In order to conduct real-time data extraction for a variety of driving cycles, it is vitally important to employ a meticulously developed experimental configuration. This configuration is suitable for gathering essential performance data throughout the operation of a vehicle. The experimental setup of the proposed work is shown in the Fig. 8. An appropriate EV, equipped with the requisite sensors, is chosen ensuring that the vehicle’s motor control system supports data output via the On-Board Diagnostics (OBD) interface. The NI is safely fitted into the designated EV, enabling the acquisition of analogue and digital input channels with superior quality. The car is equipped with supplementary sensors, including a GPS system for tracking its position, a thermocouple for monitoring engine temperature, a temperature sensor for monitoring battery temperature, FET and ambient temperature, humidity sensor to monitor humidity, and a battery voltage and current sensor.
The connection interface serves as an intermediary between the sensors and the data acquisition (DAQ) system. The utilization of a specialized data logging programme that establishes a connection with the DAQ hardware enables the acquisition of real-time data from vehicle sensors, thereby initiating the process of data storage. The DAQ system is linked to the OBD port of the vehicle in order to get essential vehicle parameters, including speed, throttle position, RPM, and other relevant data. The integration of GPS data with the DAQ system enables the precise recording of vehicle location and speed, which is of utmost importance in the generation of a driving cycle. The monitoring of data quality and sensor accuracy is an ongoing process throughout several driving cycles. The data that has been retrieved is safely stored in the CSV format, allowing for easy accessibility. The CSV format includes time stamp data that utilizes GPS information. The retrieved data undergoes essential filtration processes in order to generate a SOC value for the whole driving cycle.
Model evaluation without hyper parameter tuning
The model training and testing was carried out in Matlab 2023a version. Several machine learning models were trained to predict an output variable from input data. The models included trees, Gaussian process regression, linear regression, support vector machines, neural networks, ensembles, and kernel methods. Each algorithm has a set of tuneable hyperparameters that control model complexity, overfitting, and other properties. The models were trained using default hyperparameters without tuning on a separate validation dataset.
Train results
The training results for the different models with hyperparameter tuning are shown in Figs. 9, 10, 11, 12 and 13. The Gaussian process regression model achieved the lowest root mean squared error (RMSE) of 0.0015 and highest R-Squared of 0.9999 during training. Other top-performing models on the validation data included the stepwise linear regression, narrow neural network, and Matern 5/2 Gaussian process regression which obtained RMSE scores below 0.003.
Test results
The test results evaluate model generalization error to new unseen data. As shown in Figs. 14, 15, 16, 17 and 18, many models exhibited overfitting with much higher error metrics on the test data versus validation. The ensemble boosted trees model performed the best on testing with the lowest mean absolute error of 0.22 and R-Squared of 0.23. Other models with reasonable test performance include the exponential Gaussian process model and wide neural network. The extreme overfitting of linear regression and SVM variants highlights the need for proper regularization and hyperparameter tuning through cross-validation.
Model evaluation with hyper parameter
To reduce overfitting exhibited by models on the test data, hyperparameter tuning was performed through cross-validation on the training set. The best cross-validation hyperparameter configuration for each model is shown in Table 9 and the overall comparison of the various ML models with hyperparameter tuning is shown in Fig. 19. By observing various ML models during testing and training with hyperparameter tuning, it is observed that the GPR has high accuracy, reduced RMSE and have high generalisation.
The Gaussian process model used the Matern 5/2 kernel along with a characteristic length scale parameter that controls how quickly the correlations decay between data points. The neural network contains hyperparameters defining the topology and learning process. The support vector machine has kernel parameters along with the regularization strength C. Finally, the random forest ensemble has tuning hyperparameters for the number of decision trees and their depth. Tuning was done through random search and Bayesian optimization of ~ 20 trials per algorithm. The best configurations discovered through this tuning procedure likely improved model generalization and test performance. However, further tuning work could better optimize predictive accuracy. Overall this highlights model selection and tuning of appropriate hyperparameters as a key machine learning pipeline step before final evaluation.
SOC estimation fidelity analysis
Further analysis was conducted using GPR model case studies across different scenarios to evaluate its fidelity in mimicking real battery behaviors:
-
For a typical driver with gradual acceleration and deceleration, the GPR model accurately depicted the linear SOC discharge behaviour from 100% starting SOC down to 92% after 30 min of driving. This aligns with the stabilized discharge profile expected at high SOCs.
-
At a lower starting SOC of 50%, GPR correctly captured the slightly increased battery discharge rate with SOC reducing to 35% over 30 min for the same driver and driving cycle. This reflects the nonlinear capacity effects and voltage drop generally seen at low SOCs.
-
The GPR model effectively took into account the impact of diverse driving cycles on battery usage. Model testing showed much faster SOC depletion in an aggressive NYC cycle compared to an LA highway cycle for the same distance covered. This mimics increased power demand under city driving.
-
GPR accurately incorporated driver behaviour effects with a calm driver discharging the battery slower compared to an aggressive driver over the identical route and distance travelled. This matches expected higher current draws with harsh acceleration/braking.
-
The model reliably predicted higher battery discharge rates at elevated temperatures above 35 °C versus slower discharge at lower temperatures below 15 °C, replicating increased internal resistance losses at high temperatures.
-
For a 20 mile trip, the projected ending SOCs were 91%, 72%, 65% for starting SOCs of 100%, 75% and 50% respectively. This demonstrates GPR’s competence in relating starting SOC and distance travelled to final SOC after a drive cycle.
The case study analysis highlights GPR’s excellence in capturing the intricate charge and discharge relationships of batteries in response to changing real-world conditions. The findings showcase GPR’s high model fidelity for accurately mimicking diverse scenarios reflecting actual EV operation.
Conclusion
Accurate estimation of battery SOC is critical for effective battery management and safe operation of EVs. This study presented a comparative analysis of multiple machine learning regression techniques for modelling the complex nonlinear relationship between real-time driving data and battery SOC. The models will be improved with the incorporation of the Minimum Redundancy Maximum Relevance (MRMR) technique for feature selection and Bayesian optimisation for hyperparameter tuning. In order to improve estimation accuracy, the input parameters include battery voltage, current, environmental condition like humidity and ambient temperature and along with battery temperature, motor and FET temperature in EV. Their performance was quantified using statistical measures of coefficient of determination, root mean squared error, mean absolute error, mean absolute percentage error and mean squared error. The results demonstrated the superior accuracy and robustness of Gaussian process regression compared to the other methods. It consistently achieved the lowest errors with RMSE of 0.8% and MSE of 0.6 on the test dataset. The GPR model also exhibited excellent fidelity in mimicking real battery behaviors under diverse operating conditions. The potential challenge in training this model is the requirement of high speed system processors and the collection of real time data. In the future work, these models will be applying the developed models in practical BMS and analyse its performance.
Data availability
The datasets generated during and/or analysed during the current study are available from the corresponding author on reasonable request.
Abbreviations
- SOC:
-
State of charge
- ML:
-
Machine learning
- SVM:
-
Support vector machine
- NN:
-
Neural network
- GPR:
-
Gaussian process regression
- RMSE:
-
Root mean squared error
- EV:
-
Electric vehicle
- SOH:
-
State of health
- MAE:
-
Mean absolute error
- MAPE:
-
Mean absolute percentage error
- R2 :
-
Coefficient of determination (squared)
- DAQ:
-
Data acquisition
- BMS:
-
Battery management system
- EKF:
-
Extended Kalman Filter
- SOP:
-
State of power
- SOE:
-
State of energy
References
Xie, J. et al. Battery electric vehicles for modern power systems—A comprehensive review on technological evolutions and integration paradigms. Renew. Energy 183, 459–489 (2022).
Liu, Z., Zhang, Q., Huisingh, D. & Wang, Y. Carbon emissions of electric vehicles based on electricity generation mix: A regional assessment of China. Renew. Energy 163, 1217–1233 (2021).
Wang, Z., Zhang, X., Sun, Y. & Liu, W. A comprehensive overview of hybrid electric vehicles. Appl. Energy 269, 115054 (2020).
Zhang, X., Sahinoglu, Z., Wada, T., Hara, S. & Sakurai, J. Recent progress on lithium ion battery performance degradation analysis and state estimation technologies: A review. J. Energy Storage 28, 101230 (2020).
Chu, S., Majumdar, A., Pan, J., Chiang, Y. M. & Wu, Z. Why are commercial lithium ion batteries unstable? An overview of stability issues, consequences, and remedies. J. Power Sources 493, 229562 (2021).
Jiang, J. & Dahn, J. R. Effects of particle size, electronic conductivity and amount of conductive carbon on performance of LiFePO4 cathodes. Electrochim. Acta 331, 135409 (2020).
Zubi, G., Dufo-López, R., Carvalho, M. & Pasaoglu, G. The lithium-ion battery: State of the art and critical review of modelling approaches. Renew. Sustain. Energy Rev. 129, 109918 (2021).
Yang, X. G., Leng, F., Zhang, G., Ge, S. & Wang, C. Y. Modeling of lithium plating induced aging of lithium-ion batteries: Transition from lithium plating to the solid electrolyte interphase growth. Electrochim. Acta 358, 136843 (2021).
Zheng, H., Sun, Q., Liu, K., Song, X. & Battaglia, V. S. Correlation between dissolution behavior and thermal stability of charged cathode materials in lithium ion batteries. J. Power Sources 448, 227433 (2020).
Li, J. et al. Managing lithium-ion battery safety—A review of fundamentals, solutions and future directions. Renew. Energy 183, 19–45 (2022).
Selvaraj, V. & Vairavasundaram, I. A comprehensive review of state of charge estimation in lithium-ion batteries used in electric vehicles. J. Energy Storage 72, 108777 (2023).
Vedhanayaki, S. & Indragandhi, V. A Bayesian optimized deep learning approach for accurate state of charge estimation of lithium ion batteries used for electric vehicle application. IEEE Access 12, 43308–43327 (2024).
Shen, J., Dusmez, S. & Khaligh, A. Optimization of sizing and battery cycle life in battery/ultracapacitor hybrid energy storage systems for electric vehicle applications. IEEE Trans. Ind. Inf. 10(4), 2112–2121 (2015).
Xiong, R., Cao, J., Yu, Q., He, H. & Sun, F. Critical review on the battery state of charge estimation methods for electric vehicles. IEEE Access 6, 1832–1843 (2018).
Zhang, C. et al. Challenges and methodologies on high-accuracy lithium-ion battery SOC estimation for electric vehicles: A comprehensive review. Energy Rep. 7, 4413–4429 (2021).
Li, J., Barillas, J. K., Guenther, C. & Danzer, M. A. A comparative study of state of charge estimation algorithms for LiFePO4 batteries used in electric vehicles. J. Power Sources 230, 244–250 (2013).
Waag, W., Fleischer, C. & Sauer, D. U. Critical review of the methods for monitoring of lithium-ion batteries in electric and hybrid vehicles. J. Power Sources 258, 321–339 (2014).
Bruen, T. & Marco, J. Modelling and experimental evaluation of parallel connected lithium ion cells for an electric vehicle battery system. J. Power Sources 310, 91–101 (2016).
Goutam, S., Timmermans, J. M., Omar, N., Van Mierlo, J. & Van den Bossche, P. Comparative study of surface charge estimation methods for lithium ion batteries. Int. J. Energy Res. 39(14), 1878–1895 (2015).
Zhou, L., Pan, R., Xi, Y. & Chen, Z. Lithium-ion battery modeling and state of charge estimation approaches: A review. Comput. Ind. Eng. 149, 106882 (2020).
Hannan, M. A., Lipu, M. S. H., Hussain, A. & Saad, M. H. M. Neural network approach for estimating state of charge of lithium-ion battery using backtracking search algorithm. IEEE Access 8, 50107–50115 (2020).
Hu, X. et al. Data-driven methodologies for battery state of health and state of function assessment—A review. Renew. Sustain. Energy Rev. 143, 110898 (2021).
Hannan, M. A. et al. Deep learning approach towards accurate state of charge estimation for lithium-ion batteries using self-supervised transformer model. Sci. Rep. 11(1), 19541 (2021).
Chandran, V. et al. State of charge estimation of lithium-ion battery for electric vehicles using machine learning algorithms. World Electr. Vehicle J. 12(1), 38 (2021).
Li, C., Xiao, F. & Fan, Y. An approach to state of charge estimation of lithium-ion batteries based on recurrent neural networks with gated recurrent unit. Energies 12(9), 1592 (2019).
How, D. N. et al. SOC estimation using deep bidirectional gated recurrent units with tree parzen estimator hyperparameter optimization. IEEE Trans. Ind. Appl. 58(5), 6629–6638 (2022).
Caliwag, A., Muh, K. L., Kang, S. H., Park, J., & Lim, W. (2020, February). Design of modular battery management system with point-to-point SoC estimation algorithm. In 2020 International Conference on Artificial Intelligence in Information and Communication (ICAIIC) (pp. 701–704). IEEE.
Dubey, A., Zaidi, A., & Kulshreshtha, A. (2022). State-of-charge estimation algorithm for Li-ion batteries using long short-term memory network with Bayesian optimization. In 2022 Second International Conference on Interdisciplinary Cyber Physical Systems (ICPS) (pp. 68–73). IEEE.
Lipu, M. S. H. et al. Extreme learning machine model for state-of-charge estimation of lithium-ion battery using gravitational search algorithm. IEEE Trans. Ind. Appl. 55(4), 4225–4234 (2019).
Zhao, X., Qian, X., Xuan, D. & Jung, S. State of charge estimation of lithium-ion battery based on multi-input extreme learning machine using online model parameter identification. J. Energy Storage 56, 105796 (2022).
Jumah, S., Elezab, A., Zayed, O., Ahmed, R., Narimani, M., & Emadi, A. (2022). State of charge estimation for ev batteries using support vector regression. in 2022 IEEE Transportation Electrification Conference & Expo (ITEC) (pp. 964–969). IEEE.
Ismail, M., Dlyma, R., Elrakaybi, A., Ahmed, R., & Habibi, S. (2017). Battery state of charge estimation using an Artificial Neural Network. In 2017 IEEE transportation electrification conference and expo (ITEC) (pp. 342–349). IEEE.
Cui, S., Wang, Z., Pu, J., Ma, Y. & Ouyang, M. Estimation of state of charge of lithium-ion rechargeable batteries based on support vector machine. Int. J. Hydrogen Energy 37(22), 17209–17216 (2012).
Chaoui, H. & Ibe-Ekeocha, C. C. State of charge and state of health estimation for lithium batteries using recurrent neural networks. IEEE Trans. Veh. Technol. 66(12), 10773–10783 (2017).
Liu, J., Saxena, S., Goebel, K., Saha, B. & Wang, W. 2010. An adaptive recurrent neural network for remaining useful life prediction of lithium-ion batteries. In Annual conference of the prognostics and health management society (pp. 1–10).
Hannan, M. A. et al. Data-driven state of charge estimation of lithium-ion batteries: Algorithms, implementation factors, limitations and future trends. J. Clean. Prod. 277, 124110 (2020).
Elmi S, Tan KL (2021) DeepFEC: Energy consumption prediction under real-world driving conditions for smart cities. In: The web conference 2021—Proceedings of the world wide web conference, WWW 2021, ACM, New York, NY, USA, 2021, pp 1880–1890
Yi, Z. & Bauer, P. H. Effects of environmental factors on electric vehicle energy consumption: A sensitivity analysis. IET Electr. Syst. Transp. 7(1), 3–13 (2017).
Liu, N., Zhang, M., Zheng, C. & Lu, B. Ensemble learning for lithium-ion battery state of health prediction. Appl. Energy 279, 115673 (2020).
Liu, D., Zhou, L., Pan, R., Chen, Z. & Bie, Z. Lithium-ion battery state-of-health online estimation based on multi-scale relevance vector machine. Appl. Energy 306, 118174 (2021).
Yan, J., Liu, C., Han, X., Ji, B. & Sun, H. Lithium-ion battery health prognosis using brown exponential smoothing model-an experimental study. IEEE Access 8, 36367–36377 (2020).
Cristianini, N. and Shawe-Taylor, J., An Introduction to Support Vector Machines and other kernel-based learning methods. Cambridge University Press, 2000. Retrieved from https://www.cs.ox.ac.uk/activities/ieg/e-library/sources/online/cristianini1/index.html.
Lei, X. et al. Review of statistical and mathematical models for lithium-ion battery health monitoring and degradation prediction. J. Power Sources 481, 226893 (2020).
Jaguemont, J., Boulon, L. & Dubé, Y. A comprehensive review of lithium-ion batteries used in hybrid and electric vehicles at cold temperatures. Appl. Energy 281, 116099 (2021).
Liu, K., Li, Y., Hu, X. & Lucu, M. Lithium-ion battery state of health monitoring and remaining useful life prediction based on support vector regression-particle filter. J. Power Sources 479, 228736 (2020).
Li, Z., Outbib, R., Giurgea, S., Hissel, D. & Gao, B. 2021. An Adaptive Machine Learning Method for Li-Ion Battery State of Charge Estimation. IEEE Transactions on Transportation Electrification.
Goodfellow, I., Bengio, Y. and Courville, A., Deep Learning. MIT Press, 2016. Retrieved from https://www.deeplearningbook.org/.
Liu, K., Li, Y., Hu, X., Lucu, M. & Widanage, W. D. Gaussian process regression with automatic relevance determination kernel function for uncertainty quantification in lithium-ion battery modelling. Appl. Energy 283, 116335 (2021).
Yan, B., Xu, B., Shi, H. & Zhang, X. Adaptive state of health estimation for lithium-ion batteries based on an improved gaussian process regression model. IEEE Access 8, 87191–87204 (2020).
Rasmussen, C.E. and Williams, C.K.I., Gaussian Processes for Machine Learning. MIT Press, 2006. Retrieved from https://gaussianprocess.org/gpml/chapters/RW.pdf.
Lei, X., Foley, A. M., Crow, M. L., Liu, Y. & Zheng, H. A review of feature selection methods for lithium-ion batteries state of health and critical safety indicators monitoring and prognosis. J. Power Sources 481, 228860 (2021).
Liu, G. et al. Lithium-ion battery health prognosis using brown exponential smoothing model-an experimental study. IEEE Access 9, 26004–26012 (2021).
Acknowledgements
The authors acknowledge the funding support for carrying this research activity. This research work is supported and carried as part of the funding from the organizations like “Indian Council of Social Science Research (ICSSR/RPD/MN/2023-24/G/39)” and ‘‘Royal Academy of Engineering, UK (ATSP-2325-5-IN_172)”.
Funding
Open access funding provided by Vellore Institute of Technology.
Author information
Authors and Affiliations
Contributions
S.V, O.P and P.B wrote the main manuscript, collected data and executed the simulation. I.V, A.B, K.C supervised the project and verified the final draft.
Corresponding authors
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher's note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
D., O.P., Babu, P.S., V., I. et al. Enhanced SOC estimation of lithium ion batteries with RealTime data using machine learning algorithms. Sci Rep 14, 16036 (2024). https://doi.org/10.1038/s41598-024-66997-9
Received:
Accepted:
Published:
DOI: https://doi.org/10.1038/s41598-024-66997-9
Keywords
Comments
By submitting a comment you agree to abide by our Terms and Community Guidelines. If you find something abusive or that does not comply with our terms or guidelines please flag it as inappropriate.